author | Erik van Oosten <e.vanoosten@grons.nl> | 2016-06-29 13:24:00 +0200 |
---|---|---|
committer | Jens Geyer <jensg@apache.org> | 2016-09-21 22:21:34 +0200 |
commit | 3f5fa5fa43e5d83f6b3ab7d441ffaa7e578340c6 (patch) | |
tree | c4b2b9b05ff7562a903f3371c24ee86e5eb0c9f6 /doc/specs | |
parent | 04e6f62c8fc68a1e846544c45943aad76934ce56 (diff) | |
THRIFT-3867 Specify BinaryProtocol and CompactProtocol
Component: Documentation
Patch: Erik van Oosten <e.vanoosten@grons.nl>
This closes #1036
Diffstat (limited to 'doc/specs')
-rw-r--r-- | doc/specs/thrift-binary-protocol.md | 252 |
-rw-r--r-- | doc/specs/thrift-compact-protocol.md | 292 |
-rw-r--r-- | doc/specs/thrift-rpc.md | 176 |
3 files changed, 720 insertions, 0 deletions
diff --git a/doc/specs/thrift-binary-protocol.md b/doc/specs/thrift-binary-protocol.md
new file mode 100644
index 000000000..b56d261dc
--- /dev/null
+++ b/doc/specs/thrift-binary-protocol.md
@@ -0,0 +1,252 @@
+Thrift Binary protocol encoding
+===============================
+
+--------------------------------------------------------------------
+
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+
+--------------------------------------------------------------------
+
+This document describes the wire encoding for RPC using the older Thrift *binary protocol*.
+
+The information here is _mostly_ based on the Java implementation in the Apache thrift library (version 0.9.1 and
+0.9.3). Other implementations, however, should behave the same.
+
+For background on Thrift see the [Thrift whitepaper (pdf)](https://thrift.apache.org/static/files/thrift-20070401.pdf).
+
+# Contents
+
+* Binary protocol
+  * Base types
+  * Message
+  * Struct
+  * List and Set
+  * Map
+* BNF notation used in this document
+
+# Binary protocol
+
+## Base types
+
+### Integer encoding
+
+In the _binary protocol_ integers are encoded with the most significant byte first (big endian byte order, aka network
+order). An `int8` needs 1 byte, an `int16` 2, an `int32` 4 and an `int64` needs 8 bytes.
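The fixed-width, big endian integer layout described above can be illustrated with a minimal Python sketch. The function names (`encode_i8`, `encode_i16`, `encode_i32`, `encode_i64`, `decode_i32`) are hypothetical helpers chosen for this illustration; they are not part of any Thrift library. Only the standard `struct` module is used.

```python
import struct

# Hypothetical helper names; ">" selects big endian (network) byte order.

def encode_i8(n: int) -> bytes:
    """Signed 8 bit integer, 1 byte."""
    return struct.pack(">b", n)

def encode_i16(n: int) -> bytes:
    """Signed 16 bit integer, 2 bytes, most significant byte first."""
    return struct.pack(">h", n)

def encode_i32(n: int) -> bytes:
    """Signed 32 bit integer, 4 bytes, most significant byte first."""
    return struct.pack(">i", n)

def encode_i64(n: int) -> bytes:
    """Signed 64 bit integer, 8 bytes, most significant byte first."""
    return struct.pack(">q", n)

def decode_i32(data: bytes) -> int:
    """Read a signed 32 bit integer from the first 4 bytes of data."""
    return struct.unpack(">i", data[:4])[0]
```

For example, `encode_i32(1)` yields the bytes `00 00 00 01` and `encode_i16(-2)` yields `ff fe` (two's complement).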
+
+The C++ implementation has the option to use the binary protocol with little endian order. Little endian gives a small but
+noticeable performance boost because contemporary CPUs use little endian when storing integers to RAM.
+
+### Enum encoding
+
+The generated code encodes `Enum`s by taking the ordinal value and then encoding that as an int32.
+
+### Binary encoding
+
+Binary is sent as follows:
+
+```
+Binary protocol, binary data, 4+ bytes:
++--------+--------+--------+--------+--------+...+--------+
+| byte length                       | bytes                |
++--------+--------+--------+--------+--------+...+--------+
+```
+
+Where:
+
+* `byte length` is the length of the byte array, a signed 32 bit integer encoded in network (big endian) order (must be >= 0).
+* `bytes` are the bytes of the byte array.
+
+### String encoding
+
+*String*s are first encoded to UTF-8, and then sent as binary.
+
+### Double encoding
+
+Values of type `double` are first converted to an int64 according to the IEEE 754 floating-point "double format" bit
+layout. Most run-times provide a library to make this conversion. The binary protocol then encodes the int64 in
+8 bytes in big endian order.
+
+### Boolean encoding
+
+Values of `bool` type are first converted to an int8. True is converted to `1`, false to `0`.
+
+## Message
+
+A `Message` can be encoded in two different ways:
+
+```
+Binary protocol Message, strict encoding, 12+ bytes:
++--------+--------+--------+--------+--------+--------+--------+--------+--------+...+--------+--------+--------+--------+--------+
+|1vvvvvvv|vvvvvvvv|unused  |00000mmm| name length                       | name                | seq id                            |
++--------+--------+--------+--------+--------+--------+--------+--------+--------+...+--------+--------+--------+--------+--------+
+```
+
+Where:
+
+* `vvvvvvvvvvvvvvv` is the version, an unsigned 15 bit number fixed to `1` (in binary: `000 0000 0000 0001`).
+  The leading bit is `1`.
+* `unused` is an ignored byte.
+* `mmm` is the message type, an unsigned 3 bit integer. The 5 leading bits must be `0` as some clients (checked for
+  Java in 0.9.1) take the whole byte.
+* `name length` is the byte length of the name field, a signed 32 bit integer encoded in network (big endian) order (must be >= 0).
+* `name` is the method name, a UTF-8 encoded string.
+* `seq id` is the sequence id, a signed 32 bit integer encoded in network (big endian) order.
+
+The second, older encoding (aka non-strict) is:
+
+```
+Binary protocol Message, old encoding, 9+ bytes:
++--------+--------+--------+--------+--------+...+--------+--------+--------+--------+--------+--------+
+| name length                       | name                |00000mmm| seq id                            |
++--------+--------+--------+--------+--------+...+--------+--------+--------+--------+--------+--------+
+```
+
+Where `name length`, `name`, `mmm`, `seq id` are as above.
+
+Because `name length` must be non-negative (therefore its first bit is always `0`), the first bit allows the receiver to see
+whether the strict format or the old format is used. Therefore a server and client using the different variants of the
+binary protocol can transparently talk with each other. However, when strict mode is enforced, the old format is
+rejected.
+
+Message types are encoded with the following values:
+
+* _Call_: 1
+* _Reply_: 2
+* _Exception_: 3
+* _Oneway_: 4
+
+## Struct
+
+A *Struct* is a sequence of zero or more fields, followed by a stop field. Each field starts with a field header and
+is followed by the encoded field value. The encoding can be summarized by the following BNF:
+
+```
+struct        ::= ( field-header field-value )* stop-field
+field-header  ::= field-type field-id
+```
+
+Because each field header contains the field-id (as defined by the Thrift IDL file), the fields can be encoded in any
+order. Thrift's type system is not extensible; you can only encode the primitive types and structs. Therefore it is also
+possible to handle unknown fields while decoding; these are simply ignored. While decoding, the field type can be used to
+determine how to decode the field value.
+
+Note that the field name is not encoded, so field renames in the IDL do not affect forward and backward compatibility.
+
+The default Java implementation (Apache Thrift 0.9.1) has undefined behavior when it tries to decode a field that has
+a different field-type than what is expected. Theoretically this could be detected at the cost of some additional checking.
+Other implementations may perform this check and then either ignore the field, or return a protocol exception.
+
+A *Union* is encoded exactly the same as a struct with the additional restriction that at most 1 field may be encoded.
+
+An *Exception* is encoded exactly the same as a struct.
+
+### Struct encoding
+
+In the binary protocol field headers and the stop field are encoded as follows:
+
+```
+Binary protocol field header and field value:
++--------+--------+--------+--------+...+--------+
+|tttttttt| field id        | field value         |
++--------+--------+--------+--------+...+--------+
+
+Binary protocol stop field:
++--------+
+|00000000|
++--------+
+```
+
+Where:
+
+* `tttttttt` is the field-type, a signed 8 bit integer.
+* `field id` is the field-id, a signed 16 bit integer in big endian order.
+* `field value` is the encoded field value.
+
+The following field-types are used:
+
+* `BOOL`, encoded as `2`
+* `BYTE`, encoded as `3`
+* `DOUBLE`, encoded as `4`
+* `I16`, encoded as `6`
+* `I32`, encoded as `8`
+* `I64`, encoded as `10`
+* `STRING`, used for binary and string fields, encoded as `11`
+* `STRUCT`, used for structs and union fields, encoded as `12`
+* `MAP`, encoded as `13`
+* `SET`, encoded as `14`
+* `LIST`, encoded as `15`
+
+## List and Set
+
+Lists and sets are encoded the same way: a header indicating the size and the element-type of the elements, followed by the
+encoded elements.
+
+```
+Binary protocol list (5+ bytes) and elements:
++--------+--------+--------+--------+--------+--------+...+--------+
+|tttttttt| size                              | elements            |
++--------+--------+--------+--------+--------+--------+...+--------+
+```
+
+Where:
+
+* `tttttttt` is the element-type, encoded as an int8
+* `size` is the size, encoded as an int32, positive values only
+* `elements` the element values
+
+The element-type values are the same as field-types. The full list is included in the struct section above.
+
+The maximum list/set size is configurable. By default there is no limit (meaning the limit is the maximum int32 value:
+2147483647).
+
+## Map
+
+Maps are encoded with a header indicating the size, the element-type of the keys and the element-type of the elements,
+followed by the encoded elements. The encoding follows this BNF:
+
+```
+map  ::=  key-element-type value-element-type size ( key value )*
+```
+
+```
+Binary protocol map (6+ bytes) and key value pairs:
++--------+--------+--------+--------+--------+--------+--------+...+--------+
+|kkkkkkkk|vvvvvvvv| size                              | key value pairs     |
++--------+--------+--------+--------+--------+--------+--------+...+--------+
+```
+
+Where:
+
+* `kkkkkkkk` is the key element-type, encoded as an int8
+* `vvvvvvvv` is the value element-type, encoded as an int8
+* `size` is the size of the map, encoded as an int32, positive values only
+* `key value pairs` are the encoded keys and values
+
+The element-type values are the same as field-types. The full list is included in the struct section above.
+
+The maximum map size is configurable. By default there is no limit (meaning the limit is the maximum int32 value:
+2147483647).
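The binary protocol layouts described above (strict message header, field header, stop field, list header and map header) can be tied together in a short Python sketch. All function and constant names below are hypothetical and exist only for this illustration; the numeric type codes and byte layouts are the ones given in this specification. It is a sketch of the encoding side only, under the assumption that values fit their declared widths.

```python
import struct

# Field/element type codes and message type taken from the lists above.
T_STOP, T_I32, T_STRING = 0, 8, 11
CALL = 1

def write_binary(data: bytes) -> bytes:
    """4 byte big endian length followed by the raw bytes."""
    return struct.pack(">i", len(data)) + data

def write_string(text: str) -> bytes:
    """Strings are UTF-8 encoded and then sent as binary."""
    return write_binary(text.encode("utf-8"))

def write_message_begin_strict(name: str, msg_type: int, seq_id: int) -> bytes:
    """Strict message header: 0x8001 version word, unused byte, type, name, seq id."""
    return (struct.pack(">HBB", 0x8001, 0, msg_type)
            + write_string(name)
            + struct.pack(">i", seq_id))

def write_i32_field(field_id: int, value: int) -> bytes:
    """Field header (type byte + 16 bit field id) followed by an int32 value."""
    return struct.pack(">bh", T_I32, field_id) + struct.pack(">i", value)

STOP_FIELD = bytes([T_STOP])

def write_list_i32(values) -> bytes:
    """List header (element type + int32 size) followed by the elements."""
    body = b"".join(struct.pack(">i", v) for v in values)
    return struct.pack(">bi", T_I32, len(values)) + body

def write_map_string_i32(mapping) -> bytes:
    """Map header (key type, value type, int32 size) followed by key/value pairs."""
    out = struct.pack(">bbi", T_STRING, T_I32, len(mapping))
    for key, value in mapping.items():
        out += write_string(key) + struct.pack(">i", value)
    return out
```

For instance, a struct with a single `i32` field (id 1, value 42) would be `write_i32_field(1, 42) + STOP_FIELD`, i.e. the bytes `08 00 01 00 00 00 2a 00`.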
+
+# BNF notation used in this document
+
+The following BNF notation is used:
+
+* a plus `+` appended to an item represents repetition; the item is repeated 1 or more times
+* a star `*` appended to an item represents optional repetition; the item is repeated 0 or more times
+* a pipe `|` between items represents choice, the first matching item is selected
+* parentheses `(` and `)` are used for grouping multiple items
diff --git a/doc/specs/thrift-compact-protocol.md b/doc/specs/thrift-compact-protocol.md
new file mode 100644
index 000000000..96e7b0eee
--- /dev/null
+++ b/doc/specs/thrift-compact-protocol.md
@@ -0,0 +1,292 @@
+Thrift Compact protocol encoding
+================================
+
+--------------------------------------------------------------------
+
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+
+--------------------------------------------------------------------
+
+This document describes the wire encoding for RPC using the Thrift *compact protocol*.
+
+The information here is _mostly_ based on the Java implementation in the Apache thrift library (version 0.9.1) and
+[THRIFT-110 A more compact format](https://issues.apache.org/jira/browse/THRIFT-110). Other implementations, however,
+should behave the same.
+
+For background on Thrift see the [Thrift whitepaper (pdf)](https://thrift.apache.org/static/files/thrift-20070401.pdf).
+
+# Contents
+
+* Compact protocol
+  * Base types
+  * Message
+  * Struct
+  * List and Set
+  * Map
+* BNF notation used in this document
+
+# Compact protocol
+
+## Base types
+
+### Integer encoding
+
+The _compact protocol_ uses multiple encodings for ints: the _zigzag int_, and the _var int_.
+
+Values of type `int32` and `int64` are first transformed to a *zigzag int*. A zigzag int folds positive and negative
+numbers into the positive number space. When we read 0, 1, 2, 3 or 4 from the wire, this is translated to 0, -1, 1,
+-2 or 2 respectively. Here are the (Scala) formulas to convert from int32/int64 to a zigzag int and back:
+
+```scala
+def intToZigZag(n: Int): Int = (n << 1) ^ (n >> 31)
+def zigzagToInt(n: Int): Int = (n >>> 1) ^ - (n & 1)
+def longToZigZag(n: Long): Long = (n << 1) ^ (n >> 63)
+def zigzagToLong(n: Long): Long = (n >>> 1) ^ - (n & 1)
+```
+
+The zigzag int is then encoded as a *var int*. Var ints take 1 to 5 bytes (int32) or 1 to 10 bytes (int64). The most
+significant bit of each byte indicates if more bytes follow. The concatenation of the least significant 7 bits from each
+byte forms the number, where the first byte has the least significant bits (so the 7 bit groups are in little endian order).
+
+Var ints are sometimes used directly inside the compact protocol to represent positive numbers.
+
+To encode an `int16` as zigzag int, it is first converted to an `int32` and then encoded as such. The type `int8` simply
+uses a single byte as in the binary protocol.
+
+### Enum encoding
+
+The generated code encodes `Enum`s by taking the ordinal value and then encoding that as an int32.
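The zigzag and var int steps of the integer encoding section above can be sketched in Python as follows. The function names are hypothetical. One assumption is stated explicitly: the sketch emits the least significant 7 bit group first, which is the order the Java `TCompactProtocol` implementation writes var ints in. Because Python integers are unbounded, the zigzag result is masked to the target width.

```python
def int_to_zigzag(n: int, bits: int = 32) -> int:
    """Fold a signed integer into the non-negative range: 0, -1, 1, -2 -> 0, 1, 2, 3."""
    mask = (1 << bits) - 1
    return ((n << 1) ^ (n >> (bits - 1))) & mask

def zigzag_to_int(z: int) -> int:
    """Inverse of int_to_zigzag."""
    return (z >> 1) ^ -(z & 1)

def write_varint(z: int) -> bytes:
    """Encode a non-negative integer, 7 bits per byte, least significant group first.

    The high bit of each byte is set when more bytes follow.
    """
    out = bytearray()
    while True:
        if z & ~0x7F == 0:
            out.append(z)
            return bytes(out)
        out.append((z & 0x7F) | 0x80)
        z >>= 7

def read_varint(data: bytes, pos: int = 0):
    """Decode a var int starting at pos; returns (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7

def encode_i32(n: int) -> bytes:
    """int32 on the wire: zigzag first, then var int."""
    return write_varint(int_to_zigzag(n, 32))

def encode_i64(n: int) -> bytes:
    """int64 on the wire: zigzag first, then var int."""
    return write_varint(int_to_zigzag(n, 64))
```

With these helpers `encode_i32(-1)` is the single byte `01`, and `encode_i32(150)` (zigzag value 300) is the two bytes `ac 02`.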
+
+### Binary encoding
+
+Binary is sent as follows:
+
+```
+Compact protocol, binary data, 1+ bytes:
++--------+...+--------+--------+...+--------+
+| byte length         | bytes                |
++--------+...+--------+--------+...+--------+
+```
+
+Where:
+
+* `byte length` is the length of the byte array, using var int encoding (must be >= 0).
+* `bytes` are the bytes of the byte array.
+
+### String encoding
+
+*String*s are first encoded to UTF-8, and then sent as binary.
+
+### Double encoding
+
+Values of type `double` are first converted to an int64 according to the IEEE 754 floating-point "double format" bit
+layout. Most run-times provide a library to make this conversion. Unlike the binary protocol, which encodes the int64
+in 8 bytes in big endian order, the Java implementation of the compact protocol writes these 8 bytes in little endian order.
+
+### Boolean encoding
+
+Booleans are encoded differently depending on whether they are a field value (in a struct) or an element value (in a set,
+list or map). Field values are encoded directly in the field header. Element values of type `bool` are sent as an int8;
+true as `1` and false as `0`.
+
+## Message
+
+A `Message` on the wire looks as follows:
+
+```
+Compact protocol Message (4+ bytes):
++--------+--------+--------+...+--------+--------+...+--------+--------+...+--------+
+|pppppppp|mmmvvvvv| seq id              | name length         | name                |
++--------+--------+--------+...+--------+--------+...+--------+--------+...+--------+
+```
+
+Where:
+
+* `pppppppp` is the protocol id, fixed to `1000 0010`, 0x82.
+* `mmm` is the message type, an unsigned 3 bit integer.
+* `vvvvv` is the version, an unsigned 5 bit integer, fixed to `00001`.
+* `seq id` is the sequence id, a signed 32 bit integer encoded as a var int.
+* `name length` is the byte length of the name field, a signed 32 bit integer encoded as a var int (must be >= 0).
+* `name` is the method name to invoke, a UTF-8 encoded string.
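As an illustration of the compact message layout above, here is a minimal Python sketch that builds the header bytes. The names `write_varint` and `write_compact_message_begin` are hypothetical. Two assumptions are made: var ints are written least significant 7 bit group first (as the Java implementation does), and the sequence id is written as a plain var int of its unsigned 32 bit value, without zigzag.

```python
PROTOCOL_ID = 0x82  # pppppppp, fixed to 1000 0010
VERSION = 1         # vvvvv, fixed to 00001
CALL = 1            # message type Call

def write_varint(n: int) -> bytes:
    """Var int: 7 bits per byte, least significant group first, high bit = more bytes."""
    out = bytearray()
    while True:
        if n & ~0x7F == 0:
            out.append(n)
            return bytes(out)
        out.append((n & 0x7F) | 0x80)
        n >>= 7

def write_compact_message_begin(name: str, msg_type: int, seq_id: int) -> bytes:
    """Protocol id, type/version byte (mmmvvvvv), seq id, name length, name."""
    name_bytes = name.encode("utf-8")
    type_and_version = ((msg_type & 0x07) << 5) | (VERSION & 0x1F)
    return (bytes([PROTOCOL_ID, type_and_version])
            + write_varint(seq_id & 0xFFFFFFFF)  # assumption: no zigzag for seq id
            + write_varint(len(name_bytes))
            + name_bytes)
```

A `Call` to a method named `ping` with sequence id 7 would then be `82 21 07 04 70 69 6e 67`; with an empty name and sequence id 0 the header shrinks to the 4 byte minimum mentioned in the diagram.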
+
+Message types are encoded with the following values:
+
+* _Call_: 1
+* _Reply_: 2
+* _Exception_: 3
+* _Oneway_: 4
+
+## Struct
+
+A *Struct* is a sequence of zero or more fields, followed by a stop field. Each field starts with a field header and
+is followed by the encoded field value. The encoding can be summarized by the following BNF:
+
+```
+struct        ::= ( field-header field-value )* stop-field
+field-header  ::= field-type field-id
+```
+
+Because each field header contains the field-id (as defined by the Thrift IDL file), the fields can be encoded in any
+order. Thrift's type system is not extensible; you can only encode the primitive types and structs. Therefore it is also
+possible to handle unknown fields while decoding; these are simply ignored. While decoding, the field type can be used to
+determine how to decode the field value.
+
+Note that the field name is not encoded, so field renames in the IDL do not affect forward and backward compatibility.
+
+The default Java implementation (Apache Thrift 0.9.1) has undefined behavior when it tries to decode a field that has
+a different field-type than what is expected. Theoretically this could be detected at the cost of some additional checking.
+Other implementations may perform this check and then either ignore the field, or return a protocol exception.
+
+A *Union* is encoded exactly the same as a struct with the additional restriction that at most 1 field may be encoded.
+
+An *Exception* is encoded exactly the same as a struct.
+
+### Struct encoding
+
+```
+Compact protocol field header (short form) and field value:
++--------+--------+...+--------+
+|ddddtttt| field value         |
++--------+--------+...+--------+
+
+Compact protocol field header (1 to 3 bytes, long form) and field value:
++--------+--------+...+--------+--------+...+--------+
+|0000tttt| field id            | field value         |
++--------+--------+...+--------+--------+...+--------+
+
+Compact protocol stop field:
++--------+
+|00000000|
++--------+
+```
+
+Where:
+
+* `dddd` is the field id delta, an unsigned 4 bit integer, strictly positive.
+* `tttt` is the field-type id, an unsigned 4 bit integer.
+* `field id` is the field id, a signed 16 bit integer encoded as zigzag int.
+* `field value` is the encoded field value.
+
+The field id delta can be computed by `current-field-id - previous-field-id`, or just `current-field-id` if this is the
+first of the struct. The short form should be used when the field id delta is in the range 1 - 15 (inclusive).
+
+The following field-types can be encoded:
+
+* `BOOLEAN_TRUE`, encoded as `1`
+* `BOOLEAN_FALSE`, encoded as `2`
+* `BYTE`, encoded as `3`
+* `I16`, encoded as `4`
+* `I32`, encoded as `5`
+* `I64`, encoded as `6`
+* `DOUBLE`, encoded as `7`
+* `BINARY`, used for binary and string fields, encoded as `8`
+* `LIST`, encoded as `9`
+* `SET`, encoded as `10`
+* `MAP`, encoded as `11`
+* `STRUCT`, used for both structs and union fields, encoded as `12`
+
+Note that because there are 2 specific field types for the boolean values, the encoding of a boolean field value has no
+length (0 bytes).
+
+## List and Set
+
+Lists and sets are encoded the same way: a header indicating the size and the element-type of the elements, followed by the
+encoded elements.
+
+```
+Compact protocol list header (1 byte, short form) and elements:
++--------+--------+...+--------+
+|sssstttt| elements            |
++--------+--------+...+--------+
+
+Compact protocol list header (2+ bytes, long form) and elements:
++--------+--------+...+--------+--------+...+--------+
+|1111tttt| size                | elements            |
++--------+--------+...+--------+--------+...+--------+
+```
+
+Where:
+
+* `ssss` is the size, 4 bit unsigned int, values `0` - `14`
+* `tttt` is the element-type, a 4 bit unsigned int
+* `size` is the size, a var int (int32), positive values `15` or higher
+* `elements` are the encoded elements
+
+The short form should be used when the length is in the range 0 - 14 (inclusive).
+
+The following element-types are used (note that these are _different_ from the field-types):
+
+* `BOOL`, encoded as `2`
+* `BYTE`, encoded as `3`
+* `DOUBLE`, encoded as `4`
+* `I16`, encoded as `6`
+* `I32`, encoded as `8`
+* `I64`, encoded as `10`
+* `STRING`, used for binary and string fields, encoded as `11`
+* `STRUCT`, used for structs and union fields, encoded as `12`
+* `MAP`, encoded as `13`
+* `SET`, encoded as `14`
+* `LIST`, encoded as `15`
+
+
+The maximum list/set size is configurable. By default there is no limit (meaning the limit is the maximum int32 value:
+2147483647).
+
+## Map
+
+Maps are encoded with a header indicating the size, the type of the keys and the element-type of the elements, followed
+by the encoded elements. The encoding follows this BNF:
+
+```
+map           ::= empty-map | non-empty-map
+empty-map     ::= `0`
+non-empty-map ::= size key-element-type value-element-type (key value)+
+```
+
+```
+Compact protocol map header (1 byte, empty map):
++--------+
+|00000000|
++--------+
+
+Compact protocol map header (2+ bytes, non empty map) and key value pairs:
++--------+...+--------+--------+--------+...+--------+
+| size                |kkkkvvvv| key value pairs     |
++--------+...+--------+--------+--------+...+--------+
+```
+
+Where:
+
+* `size` is the size, a var int (int32), strictly positive values
+* `kkkk` is the key element-type, a 4 bit unsigned int
+* `vvvv` is the value element-type, a 4 bit unsigned int
+* `key value pairs` are the encoded keys and values
+
+The element-types are the same as for lists. The full list is included in the 'List and set' section.
+
+The maximum map size is configurable. By default there is no limit (meaning the limit is the maximum int32 value:
+2147483647).
+
+# BNF notation used in this document
+
+The following BNF notation is used:
+
+* a plus `+` appended to an item represents repetition; the item is repeated 1 or more times
+* a star `*` appended to an item represents optional repetition; the item is repeated 0 or more times
+* a pipe `|` between items represents choice, the first matching item is selected
+* parentheses `(` and `)` are used for grouping multiple items
diff --git a/doc/specs/thrift-rpc.md b/doc/specs/thrift-rpc.md
new file mode 100644
index 000000000..1c59abd08
--- /dev/null
+++ b/doc/specs/thrift-rpc.md
@@ -0,0 +1,176 @@
+Thrift Remote Procedure Call
+============================
+
+--------------------------------------------------------------------
+
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+
+--------------------------------------------------------------------
+
+This document describes the high level message exchange between the Thrift RPC client and server.
+See [thrift-binary-protocol.md] and [thrift-compact-protocol.md] for a description of how the exchanges are encoded on
+the wire.
+
+In addition, this document compares the binary protocol with the compact protocol. Finally it describes the framed vs.
+unframed transport.
+
+The information here is _mostly_ based on the Java implementation in the Apache thrift library (version 0.9.1 and
+0.9.3). Other implementations, however, should behave the same.
+
+For background on Thrift see the [Thrift whitepaper (pdf)](https://thrift.apache.org/static/files/thrift-20070401.pdf).
+
+# Contents
+
+* Thrift Message exchange for Remote Procedure Call
+  * Message
+  * Request struct
+  * Response struct
+* Protocol considerations
+  * Comparing binary and compact protocol
+  * Compatibility
+  * Framed vs unframed transport
+
+# Thrift Remote Procedure Call Message exchange
+
+Both the binary protocol and the compact protocol assume a transport layer that exposes a bi-directional byte stream,
+for example a TCP socket. Both use the following exchange:
+
+1. Client sends a `Message` (type `Call` or `Oneway`). The TMessage contains some metadata and the name of the method
+   to invoke.
+2. Client sends method arguments (a struct defined by the generated code).
+3. Server sends a `Message` (type `Reply` or `Exception`) to start the response.
+4. Server sends a struct containing the method result or exception.
+
+The pattern is a simple half duplex protocol where the parties alternate in sending a `Message` followed by a struct.
+What these are is described below.
+
+Although the standard Apache Thrift Java clients do not support pipelining (sending multiple requests without waiting
+for a response), the standard Apache Thrift Java servers do support it.
+
+## Message
+
+A *Message* contains:
+
+* _Name_, a string (can be empty).
+* _Message type_, a message type, one of `Call`, `Reply`, `Exception` and `Oneway`.
+* _Sequence id_, a signed int32 integer.
+
+The *sequence id* is a simple message id assigned by the client. The server will use the same sequence id in the
+message of the response. The client uses this number to detect out of order responses. Each client has an int32 field
+which is increased for each message. The sequence id simply wraps around when it overflows.
+
+The *name* indicates the service method name to invoke. The server copies the name in the response message.
+
+When the *multiplexed protocol* is used, the name contains the service name, a colon `:` and the method name. The
+multiplexed protocol is not compatible with other protocols.
+
+The *message type* indicates what kind of message is sent. Clients send requests with TMessages of type `Call` or
+`Oneway` (step 1 in the protocol exchange). Servers send responses with messages of type `Exception` or `Reply` (step
+3).
+
+Type `Reply` is used when the service method completes normally. That is, it returns a value or it throws one of the
+exceptions defined in the Thrift IDL file.
+
+Type `Exception` is used for other exceptions. That is: when the service method throws an exception that is not declared
+in the Thrift IDL file, or some other part of the Thrift stack throws an exception. For example when the server could
+not encode or decode a message or struct.
+
+In the Java implementation (0.9.3) there is different behavior for the synchronous and asynchronous server. In the async
+server all exceptions are sent as a `TApplicationException` (see 'Response struct' below). In the synchronous Java
+implementation only (undeclared) exceptions that extend `TException` are sent as a `TApplicationException`. Unchecked
+exceptions lead to an immediate close of the connection.
+
+Type `Oneway` is only used starting from Apache Thrift 0.9.3. Earlier versions do _not_ send TMessages of type `Oneway`,
+even for service methods defined with the `oneway` modifier.
+
+When a client sends a request with type `Oneway`, the server must _not_ send a response (steps 3 and 4 are skipped). Note
+that the Thrift IDL enforces a return type of `void` and does not allow exceptions for oneway services.
+
+## Request struct
+
+The struct that follows the message of type `Call` or `Oneway` contains the arguments of the service method. The
+argument ids correspond to the field ids. The name of the struct is the name of the method with `_args` appended.
+For methods without arguments a struct without fields is sent.
+
+## Response struct
+
+The struct that follows the message of type `Reply` is a struct in which exactly 1 of the following fields is encoded:
+
+* A field with name `success` and id `0`, used in case the method completed normally.
+* An exception field, name and id are as defined in the `throws` clause in the Thrift IDL's service method definition.
+
+When the message is of type `Exception` the struct is encoded as if it was declared by the following IDL:
+
+```
+exception TApplicationException {
+  1: string message,
+  2: i32 type
+}
+```
+
+The following exception types are defined in the Java implementation (0.9.3):
+
+* _unknown_: 0, used in case the type from the peer is unknown.
+* _unknown method_: 1, used in case the method requested by the client is unknown by the server.
+* _invalid message type_: 2, no usage was found.
+* _wrong method name_: 3, no usage was found.
+* _bad sequence id_: 4, used internally by the client to indicate a wrong sequence id in the response.
+* _missing result_: 5, used internally by the client to indicate a response without any field (result nor exception).
+* _internal error_: 6, used when the server throws an exception that is not declared in the Thrift IDL file.
+* _protocol error_: 7, used when something goes wrong during decoding. For example when a list is too long or a required
+  field is missing.
+* _invalid transform_: 8, no usage was found.
+* _invalid protocol_: 9, no usage was found.
+* _unsupported client type_: 10, no usage was found.
+
+# Protocol considerations
+
+## Comparing binary and compact protocol
+
+The binary protocol is fairly simple and therefore easy to process. The compact protocol needs fewer bytes to send the
+same data at the cost of additional processing. As bandwidth is usually the bottleneck, the compact protocol is almost
+always slightly faster.
+
+## Compatibility
+
+A server could automatically determine whether a client talks the binary protocol or the compact protocol by
+investigating the first byte. If the value is `1000 0000` or `0000 0000` (assuming a name shorter than ±16 MB) it is the
+binary protocol. When the value is `1000 0010` it is talking the compact protocol.
+
+## Framed vs. unframed transport
+
+The first thrift binary wire format was unframed. This means that information is sent out in a single stream of bytes.
+With unframed transport the (generated) processors will read directly from the socket (though Apache Thrift does try to
+grab all available bytes from the socket in a buffer when it can).
+
+Later, Thrift introduced the framed transport.
+
+With framed transport the full request and response (the TMessage and the following struct) are first written to a
+buffer. Then when the struct is complete (transport method `flush` is hijacked for this), the length of the buffer is
+written to the socket first, followed by the buffered bytes. The combination is called a _frame_. On the receiver side
+the complete frame is first read in a buffer before the message is passed to a processor.
+
+The length prefix is a 4 byte signed int, sent in network (big endian) order.
+The following must be true: `0` <= length <= `16384000` (16M).
+
+Framed transport was introduced to ease the implementation of async processors. An async processor is only invoked when
+all data is received. Unfortunately, framed transport is not ideal for large messages as the entire frame stays in
+memory until the message has been processed. In addition, the Java implementation merges the incoming data to a single,
+growing byte array. Every time the byte array is full it needs to be copied to a new larger byte array.
+
+Framed and unframed transports are not compatible with each other.
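Two of the points above lend themselves to a short Python sketch: the frame length prefix and the first-byte protocol check from the Compatibility section. The names `write_frame`, `read_frame` and `detect_protocol` are hypothetical. The sketch assumes an unframed stream for `detect_protocol` (on a framed transport the first bytes are the frame length), and it takes `0x80` as the first byte of a strict binary message, which follows from the strict message layout `1vvvvvvv` with the version fixed to `1`.

```python
import struct

MAX_FRAME_LENGTH = 16384000  # upper bound on the frame length given above

def write_frame(payload: bytes) -> bytes:
    """Prefix the buffered message bytes with a 4 byte big endian length."""
    if len(payload) > MAX_FRAME_LENGTH:
        raise ValueError("frame too large")
    return struct.pack(">i", len(payload)) + payload

def read_frame(data: bytes, pos: int = 0):
    """Read one frame starting at pos; returns (payload, next position)."""
    (length,) = struct.unpack_from(">i", data, pos)
    if not 0 <= length <= MAX_FRAME_LENGTH:
        raise ValueError("invalid frame length")
    start = pos + 4
    end = start + length
    if end > len(data):
        raise ValueError("truncated frame")
    return data[start:end], end

def detect_protocol(first_byte: int) -> str:
    """Guess the protocol from the first byte of an unframed message."""
    if first_byte == 0x82:          # compact protocol id
        return "compact"
    if first_byte in (0x80, 0x00):  # strict binary, or old binary with a short name
        return "binary"
    return "unknown"
```

A receiver would typically call `read_frame` only once at least 4 bytes are buffered, and keep buffering until the full payload has arrived; that buffering is what keeps the whole frame in memory, as noted above.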