Network Working Group                                        S. Ferguson
Internet-Draft                                                    E. Ong
Intended status: Standards Track                  Caucho Technology Inc.
Expires: February 27, 2008                               August 26, 2007


Hessian 2.0 Serialization Protocol
hessian.txt

Status of this Memo

By submitting this Internet-Draft, each author represents that any applicable patent or other IPR claims of which he or she is aware have been or will be disclosed, and any of which he or she becomes aware will be disclosed, in accordance with Section 6 of BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as “work in progress.”

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

This Internet-Draft will expire on February 27, 2008.

Copyright Notice

Copyright © The IETF Trust (2007).



Table of Contents

1.  Introduction
2.  Design Goals
3.  Hessian Grammar
4.  Serialization
    4.1.  binary data
        4.1.1.  Compact: short binary
        4.1.2.  Binary Examples
    4.2.  boolean
        4.2.1.  Boolean Examples
    4.3.  date
        4.3.1.  Date Examples
    4.4.  double
        4.4.1.  Compact: double zero
        4.4.2.  Compact: double one
        4.4.3.  Compact: double octet
        4.4.4.  Compact: double short
        4.4.5.  Compact: double float
        4.4.6.  Double Examples
    4.5.  int
        4.5.1.  Compact: single octet integers
        4.5.2.  Compact: two octet integers
        4.5.3.  Compact: three octet integers
        4.5.4.  Integer Examples
    4.6.  list
        4.6.1.  Compact: repeated list
        4.6.2.  List examples
    4.7.  long
        4.7.1.  Compact: single octet longs
        4.7.2.  Compact: two octet longs
        4.7.3.  Compact: three octet longs
        4.7.4.  Compact: four octet longs
        4.7.5.  Long Examples
    4.8.  map
        4.8.1.  Map examples
    4.9.  null
    4.10.  object
        4.10.1.  Compact: class definition
        4.10.2.  Compact: object instantiation
        4.10.3.  Object examples
    4.11.  ref
        4.11.1.  Compact: two octet reference
        4.11.2.  Compact: three octet reference
        4.11.3.  Ref Examples
    4.12.  string
        4.12.1.  Compact: short strings
        4.12.2.  String Examples
    4.13.  type
    4.14.  Compact: type references
5.  Reference Maps
    5.1.  value reference
    5.2.  class reference
    5.3.  type reference
6.  Bytecode map
§  Authors' Addresses
§  Intellectual Property and Copyright Statements





1.  Introduction

Hessian is a dynamically-typed, binary serialization and Web Services protocol designed for object-oriented transmission.




2.  Design Goals

Hessian is dynamically-typed, compact, and portable across languages.

The Hessian protocol has the following design goals:




3.  Hessian Grammar



Serialization Grammar

           # starting production
top        ::= value

           # 8-bit binary data split into 64k chunks
binary     ::= 'b' b1 b0 <binary-data> binary # non-final chunk
           ::= 'B' b1 b0 <binary-data>        # final chunk
           ::= [x20-x2f] <binary-data>        # binary data of
                                                 #  length 0-15

           # boolean true/false
boolean    ::= 'T'
           ::= 'F'

           # definition for an object (compact map)
class-def  ::= 'O' type int string*

           # time in UTC encoded as 64-bit long milliseconds since
           #  epoch
date       ::= 'd' b7 b6 b5 b4 b3 b2 b1 b0

           # 64-bit IEEE double
double     ::= 'D' b7 b6 b5 b4 b3 b2 b1 b0
           ::= x67                   # 0.0
           ::= x68                   # 1.0
           ::= x69 b0                # byte cast to double
                                     #  (-128.0 to 127.0)
           ::= x6a b1 b0             # short cast to double
           ::= x6b b3 b2 b1 b0       # 32-bit float cast to double

           # 32-bit signed integer
int        ::= 'I' b3 b2 b1 b0
           ::= [x80-xbf]             # -x10 to x2f
           ::= [xc0-xcf] b0          # -x800 to x7ff
           ::= [xd0-xd7] b1 b0       # -x40000 to x3ffff

           # list/vector length
length     ::= 'l' b3 b2 b1 b0
           ::= x6e int

           # list/vector
list       ::= 'V' type? length? value* 'z'
           ::= 'v' int int value*    # type-ref, length

           # 64-bit signed long integer
long       ::= 'L' b7 b6 b5 b4 b3 b2 b1 b0
           ::= [xd8-xef]             # -x08 to x0f
           ::= [xf0-xff] b0          # -x800 to x7ff
           ::= [x38-x3f] b1 b0       # -x40000 to x3ffff
           ::= x77 b3 b2 b1 b0       # 32-bit integer cast to long

           # map/object
map        ::= 'M' type? (value value)* 'z'  # key, value map pairs

           # null value
null       ::= 'N'

           # Object instance
object     ::= 'o' int value*

           # value reference (e.g. circular trees and graphs)
ref        ::= 'R' b3 b2 b1 b0    # reference to nth map/list/object in
                                  #  stream
           ::= x4a b0             # reference to map/list/object 0-255
           ::= x4b b1 b0          # reference to map/list/object 0-65535

           # UTF-8 encoded character string split into 64k chunks
string     ::= 's' b1 b0 <utf8-data> string  # non-final chunk
           ::= 'S' b1 b0 <utf8-data>         # string of length
                                             #  0-65535
           ::= [x00-x1f] <utf8-data>         # string of length
                                             #  0-31

           # map/list types for OO languages
type       ::= 't' b1 b0 <type-string>         # type name
           ::= x75 int                         # type reference

           # main production
value      ::= null
           ::= binary
           ::= boolean
           ::= date
           ::= double
           ::= int
           ::= list
           ::= long
           ::= map
           ::= class-def value
           ::= object
           ::= ref
           ::= string
 Figure 1 




4.  Serialization

Hessian's object serialization has 8 primitive types:

  1. raw binary data (binary data)
  2. boolean (boolean)
  3. 64-bit date (date)
  4. 64-bit double (double)
  5. 32-bit int (int)
  6. 64-bit long (long)
  7. null (null)
  8. UTF8-encoded string (string)

It has 3 recursive types:

  1. list (list) for lists and arrays
  2. map (map) for maps and dictionaries
  3. object (object) for objects

Finally, it has one special construct:

  1. ref (ref) for shared and circular object references.

Hessian 2.0 has 3 internal reference maps:

  1. A map/object/list reference map (value reference).
  2. A class definition reference map (class reference).
  3. A type (class name) reference map (type reference).



4.1.  binary data



Binary Grammar

binary ::= 'b' b1 b0 <binary-data> binary
       ::= 'B' b1 b0 <binary-data>
       ::= [x20-x2f] <binary-data>
 Figure 2 

Binary data is encoded in chunks. The octet x42 ('B') encodes the final chunk and x62 ('b') represents any non-final chunk. Each chunk has a 16-bit length value.

len = 256 * b1 + b0




4.1.1.  Compact: short binary

Binary data with a length of 15 octets or less may be encoded with a single length octet in the range [x20-x2f].

len = code - 0x20
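
The chunk rules above can be sketched as a small reader. This is only an illustrative sketch, not part of the specification: the class name HessianBinaryReader and the choice of a ByteBuffer as input are assumptions made for the example. It concatenates non-final 'b' chunks until it reaches a final 'B' chunk or a compact [x20-x2f] chunk.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

// Hypothetical helper for illustration; the name is not defined by Hessian.
final class HessianBinaryReader {
    /** Reads one complete binary value, concatenating all of its chunks. */
    static byte[] readBinary(ByteBuffer in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while (true) {
            int code = in.get() & 0xff;
            int len;
            boolean last;
            if (code >= 0x20 && code <= 0x2f) {
                len = code - 0x20;            // compact form, length 0-15
                last = true;
            } else if (code == 'B' || code == 'b') {
                int b1 = in.get() & 0xff;
                int b0 = in.get() & 0xff;
                len = 256 * b1 + b0;          // 16-bit big-endian chunk length
                last = (code == 'B');         // 'B' marks the final chunk
            } else {
                throw new IllegalArgumentException(
                    "not a binary chunk: 0x" + Integer.toHexString(code));
            }
            byte[] chunk = new byte[len];
            in.get(chunk);                    // throws if the input is truncated
            out.write(chunk, 0, len);
            if (last) {
                return out.toByteArray();
            }
        }
    }
}
```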




4.1.2.  Binary Examples



x20               # zero-length binary data

x23 x01 x02 x03   # 3 octet data

B x10 x00 ....    # 4k final chunk of data

b x04 x00 ....    # 1k non-final chunk of data
 Figure 3 




4.2.  boolean



Boolean Grammar

boolean ::= 'T'
        ::= 'F'
 Figure 4 

The octet 'F' represents false and the octet 'T' represents true.




4.2.1.  Boolean Examples



T   # true
F   # false
 Figure 5 




4.3.  date



Date Grammar

date ::= 'd' b7 b6 b5 b4 b3 b2 b1 b0
 Figure 6 

A date is represented by a 64-bit long holding the number of milliseconds since 00:00 on January 1, 1970, UTC.
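
As a minimal sketch of this encoding, the following Java maps the nine octets to and from java.time.Instant. The class and method names are hypothetical, and the sketch relies on ByteBuffer defaulting to big-endian byte order.

```java
import java.nio.ByteBuffer;
import java.time.Instant;

// Hypothetical helper for illustration; the names are not defined by Hessian.
final class HessianDate {
    /** Decodes 'd' followed by eight big-endian octets of epoch milliseconds. */
    static Instant readDate(ByteBuffer in) {
        int code = in.get() & 0xff;
        if (code != 'd') {
            throw new IllegalArgumentException("not a date: 0x" + Integer.toHexString(code));
        }
        return Instant.ofEpochMilli(in.getLong());
    }

    /** Encodes an instant as 'd' plus its millisecond count since the epoch. */
    static byte[] writeDate(Instant t) {
        return ByteBuffer.allocate(9).put((byte) 'd').putLong(t.toEpochMilli()).array();
    }
}
```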




4.3.1.  Date Examples



d x00 x00 x00 xd0 x4b x92 x84 xb8   # 09:51:31 May 8, 1998 UTC
 Figure 7 




4.4.  double



Double Grammar

double ::= 'D' b7 b6 b5 b4 b3 b2 b1 b0
       ::= x67
       ::= x68
       ::= x69 b0
       ::= x6a b1 b0
       ::= x6b b3 b2 b1 b0
 Figure 8 

A 64-bit IEEE floating-point number.




4.4.1.  Compact: double zero

The double 0.0 can be represented by the octet x67.




4.4.2.  Compact: double one

The double 1.0 can be represented by the octet x68.




4.4.3.  Compact: double octet

Doubles between -128.0 and 127.0 with no fractional component can be represented in two octets by casting the byte value to a double.

value = (double) b0




4.4.4.  Compact: double short

Doubles between -32768.0 and 32767.0 with no fractional component can be represented in three octets by casting the short value to a double.

value = (double) (256 * b1 + b0)




4.4.5.  Compact: double float

Doubles which are equivalent to their 32-bit float representation can be represented as the 4-octet float and then cast to double.
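
The compact double forms can be sketched as a decoder plus an encoder that picks the shortest applicable form. This is one possible reading of the rules above, not a reference implementation: the class name is hypothetical, the byte and short operands are treated as signed, and the sketch chooses to send negative zero in the full 'D' form so that its sign bit survives.

```java
import java.nio.ByteBuffer;

// Hypothetical helper for illustration; the names are not defined by Hessian.
final class HessianDouble {
    static double readDouble(ByteBuffer in) {
        int code = in.get() & 0xff;
        switch (code) {
            case 'D':  return in.getDouble();           // full 64-bit IEEE value
            case 0x67: return 0.0;
            case 0x68: return 1.0;
            case 0x69: return (double) in.get();        // signed byte cast to double
            case 0x6a: return (double) in.getShort();   // signed 16-bit cast to double
            case 0x6b: return (double) in.getFloat();   // 32-bit float cast to double
            default:
                throw new IllegalArgumentException("not a double: 0x" + Integer.toHexString(code));
        }
    }

    static byte[] writeDouble(double v) {
        if (Double.doubleToLongBits(v) == 0L) return new byte[] {0x67};  // +0.0 only
        if (v == 1.0) return new byte[] {0x68};
        boolean whole = v != 0.0 && v == Math.rint(v);   // no fractional component
        if (whole && v >= -128 && v <= 127) {
            return new byte[] {0x69, (byte) v};
        }
        if (whole && v >= -32768 && v <= 32767) {
            return ByteBuffer.allocate(3).put((byte) 0x6a).putShort((short) v).array();
        }
        if (v != 0.0 && (double) (float) v == v) {       // exactly representable as a float
            return ByteBuffer.allocate(5).put((byte) 0x6b).putFloat((float) v).array();
        }
        return ByteBuffer.allocate(9).put((byte) 'D').putDouble(v).array();
    }
}
```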




4.4.6.  Double Examples



x67          # 0.0
x68          # 1.0

x69 x00      # 0.0
x69 x80      # -128.0
x69 x7f      # 127.0

x6a x00 x00  # 0.0
x6a x80 x00  # -32768.0
x6a x7f xff  # 32767.0

D x40 x28 x80 x00 x00 x00 x00 x00  # 12.25
 Figure 9 




4.5.  int



Integer Grammar

int ::= 'I' b3 b2 b1 b0
    ::= [x80-xbf]
    ::= [xc0-xcf] b0
    ::= [xd0-xd7] b1 b0
 Figure 10 

A 32-bit signed integer. An integer is represented by the octet x49 ('I') followed by the 4 octets of the integer in big-endian order.

value = (b3 << 24) + (b2 << 16) + (b1 << 8) + b0;




4.5.1.  Compact: single octet integers

Integers between -16 and 47 can be encoded by a single octet in the range x80 to xbf.

value = code - 0x90




4.5.2.  Compact: two octet integers

Integers between -2048 and 2047 can be encoded in two octets with the leading byte in the range xc0 to xcf.

value = ((code - 0xc8) << 8) + b0;




4.5.3.  Compact: three octet integers

Integers between -262144 and 262143 can be encoded in three bytes with the leading byte in the range xd0 to xd7.

value = ((code - 0xd4) << 16) + (b1 << 8) + b0;
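
Taken together, the integer forms can be sketched as an encoder that picks the shortest form and a decoder that applies the formulas above. The class and method names are hypothetical and the code is an illustration rather than a reference implementation.

```java
import java.nio.ByteBuffer;

// Hypothetical helper for illustration; the names are not defined by Hessian.
final class HessianInt {
    static byte[] writeInt(int v) {
        if (v >= -16 && v <= 47) {
            return new byte[] {(byte) (v + 0x90)};
        }
        if (v >= -2048 && v <= 2047) {
            return new byte[] {(byte) ((v >> 8) + 0xc8), (byte) v};
        }
        if (v >= -262144 && v <= 262143) {
            return new byte[] {(byte) ((v >> 16) + 0xd4), (byte) (v >> 8), (byte) v};
        }
        return new byte[] {'I', (byte) (v >> 24), (byte) (v >> 16), (byte) (v >> 8), (byte) v};
    }

    static int readInt(ByteBuffer in) {
        int code = in.get() & 0xff;
        if (code == 'I') {
            return in.getInt();                           // four big-endian octets
        }
        if (code >= 0x80 && code <= 0xbf) {
            return code - 0x90;
        }
        if (code >= 0xc0 && code <= 0xcf) {
            int b0 = in.get() & 0xff;
            return ((code - 0xc8) << 8) + b0;
        }
        if (code >= 0xd0 && code <= 0xd7) {
            int b1 = in.get() & 0xff;
            int b0 = in.get() & 0xff;
            return ((code - 0xd4) << 16) + (b1 << 8) + b0;
        }
        throw new IllegalArgumentException("not an int: 0x" + Integer.toHexString(code));
    }
}
```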




4.5.4.  Integer Examples



x90                # 0
x80                # -16
xbf                # 47

xc8 x00            # 0
xc0 x00            # -2048
xc7 x00            # -256
xcf xff            # 2047

xd4 x00 x00        # 0
xd0 x00 x00        # -262144
xd7 xff xff        # 262143

I x00 x00 x00 x00  # 0
I x00 x00 x01 x2c  # 300
 Figure 11 




4.6.  list



List Grammar

list ::= 'V' type? length? value* 'z'
     ::= 'v' int int value*
 Figure 12 

An ordered list, like an array. In the 'V' form, a list consists of an optional type string, an optional length, a sequence of values, and a trailing octet x7a ('z'). The type string may be an arbitrary UTF-8 string understood by the service. The length may be omitted to indicate that the list is variable length.

Each list is added to the reference list to handle shared and circular elements. See the ref element.

Any parser expecting a list must also accept a null or a shared ref.

The valid values of type are not specified in this document and may depend on the specific application. For example, a server implemented in a statically-typed language which exposes a Hessian interface can use the type information to instantiate the specific array type. On the other hand, a server written in a dynamically-typed language would likely ignore the contents of type entirely and create a generic array.




4.6.1.  Compact: repeated list

Hessian 2.0 allows a compact form of the list for successive lists of the same type where the length is known beforehand. The type and length are encoded by integers, where the type is a reference to an earlier specified type.
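
A writer might combine the two list forms as sketched below: the first list of a given type is written in the 'V' form with its type string, and later lists of the same type use the 'v' form with an integer type reference. The sketch rests on several assumptions that are not stated in this section: type references are numbered from zero, type names are ASCII, the long form writes its length with the 'l' production, and all names in the code are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical writer for illustration; the names are not defined by Hessian.
final class HessianIntListWriter {
    private final List<String> typeMap = new ArrayList<>();   // assumed zero-based
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();

    void writeIntList(String type, int[] values) {
        int ref = typeMap.indexOf(type);
        if (ref < 0) {
            typeMap.add(type);                    // remember the type for later lists
            out.write('V');
            byte[] name = type.getBytes(StandardCharsets.US_ASCII);
            out.write('t');
            out.write(name.length >> 8);
            out.write(name.length);
            out.write(name, 0, name.length);
            out.write('l');                       // 'l' b3 b2 b1 b0 length form
            out.write(values.length >> 24);
            out.write(values.length >> 16);
            out.write(values.length >> 8);
            out.write(values.length);
            for (int v : values) writeInt(v);
            out.write('z');
        } else {
            out.write('v');                       // compact form: type-ref, length, values
            writeInt(ref);
            writeInt(values.length);
            for (int v : values) writeInt(v);
        }
    }

    byte[] toByteArray() {
        return out.toByteArray();
    }

    // Shortest-form int encoding; write(int) keeps only the low eight bits.
    private void writeInt(int v) {
        if (v >= -16 && v <= 47) {
            out.write(v + 0x90);
        } else if (v >= -2048 && v <= 2047) {
            out.write((v >> 8) + 0xc8);
            out.write(v);
        } else if (v >= -262144 && v <= 262143) {
            out.write((v >> 16) + 0xd4);
            out.write(v >> 8);
            out.write(v);
        } else {
            out.write('I');
            out.write(v >> 24);
            out.write(v >> 16);
            out.write(v >> 8);
            out.write(v);
        }
    }
}
```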




4.6.2.  List examples



Serialization of a typed int array: int[] = {0, 1}

V
  t x00 x04 [int     # encoding of int[] type
  x6e x02            # length = 2
  x90                # integer 0
  x91                # integer 1
  z
 Figure 13 



Anonymous variable-length list = {0, "foobar"}

V
  x90                # integer 0
  x06 foobar         # string "foobar"
  z
 Figure 14 



Repeated list type

V
  t x00 x04 [int   # type for int[] (save as type #0)
  x6e x02          # length 2
  x90              # integer 0
  x91              # integer 1
  z

v
  x90              # type reference to int[] (type #0)
  x92              # length 2
  x92              # integer 2
  x93              # integer 3
 Figure 15 




4.7.  long



Long Grammar

long ::= 'L' b7 b6 b5 b4 b3 b2 b1 b0
     ::= [xd8-xef]
     ::= [xf0-xff] b0
     ::= [x38-x3f] b1 b0
     ::= x77 b3 b2 b1 b0
 Figure 16 

A 64-bit signed integer. A long is represented by the octet x4c ('L') followed by the 8 octets of the integer in big-endian order.




4.7.1.  Compact: single octet longs

Longs between -8 and 15 are represented by a single octet in the range xd8 to xef.

value = (code - 0xe0)




4.7.2.  Compact: two octet longs

Longs between -2048 and 2047 are encoded in two octets with the leading byte in the range xf0 to xff.

value = ((code - 0xf8) << 8) + b0




4.7.3.  Compact: three octet longs

Longs between -262144 and 262143 are encoded in three octets with the leading byte in the range x38 to x3f.

value = ((code - 0x3c) << 16) + (b1 << 8) + b0




4.7.4.  Compact: four octet longs

Longs which fit into 32 bits are encoded in five octets with the leading octet x77.

value = (b3 << 24) + (b2 << 16) + (b1 << 8) + b0
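
The long forms can be sketched in the same way as the integer forms. The names below are hypothetical. In the x77 form the four octets are read as a signed 32-bit integer and then widened, so that negative values keep their sign.

```java
import java.nio.ByteBuffer;

// Hypothetical helper for illustration; the names are not defined by Hessian.
final class HessianLong {
    static byte[] writeLong(long v) {
        if (v >= -8 && v <= 15) {
            return new byte[] {(byte) (v + 0xe0)};
        }
        if (v >= -2048 && v <= 2047) {
            return new byte[] {(byte) ((v >> 8) + 0xf8), (byte) v};
        }
        if (v >= -262144 && v <= 262143) {
            return new byte[] {(byte) ((v >> 16) + 0x3c), (byte) (v >> 8), (byte) v};
        }
        if (v >= Integer.MIN_VALUE && v <= Integer.MAX_VALUE) {
            return ByteBuffer.allocate(5).put((byte) 0x77).putInt((int) v).array();
        }
        return ByteBuffer.allocate(9).put((byte) 'L').putLong(v).array();
    }

    static long readLong(ByteBuffer in) {
        int code = in.get() & 0xff;
        if (code == 'L') {
            return in.getLong();                          // eight big-endian octets
        }
        if (code >= 0xd8 && code <= 0xef) {
            return code - 0xe0;
        }
        if (code >= 0xf0) {                               // xf0-xff
            int b0 = in.get() & 0xff;
            return ((code - 0xf8) << 8) + b0;
        }
        if (code >= 0x38 && code <= 0x3f) {
            int b1 = in.get() & 0xff;
            int b0 = in.get() & 0xff;
            return ((code - 0x3c) << 16) + (b1 << 8) + b0;
        }
        if (code == 0x77) {
            return in.getInt();                           // sign-extended to 64 bits
        }
        throw new IllegalArgumentException("not a long: 0x" + Integer.toHexString(code));
    }
}
```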




4.7.5.  Long Examples



xe0                  # 0
xd8                  # -8
xef                  # 15

xf8 x00              # 0
xf0 x00              # -2048
xf7 x00              # -256
xff xff              # 2047

x3c x00 x00          # 0
x38 x00 x00          # -262144
x3f xff xff          # 262143

x77 x00 x00 x00 x00  # 0
x77 x00 x00 x01 x2c  # 300

L x00 x00 x00 x00 x00 x00 x01 x2c  # 300
 Figure 17 




4.8.  map



Map Grammar

map        ::= 'M' type? (value value)* 'z'
 Figure 18 

Represents serialized maps and can represent objects. The type element describes the type of the map.

The type may be empty, i.e. a zero length. The parser is responsible for choosing a type if one is not specified. For objects, unrecognized keys will be ignored.

Each map is added to the reference list. Any time the parser expects a map, it must also be able to support a null or a ref.

The type is chosen by the service.




4.8.1.  Map examples



A sparse array

map = new HashMap();
map.put(new Integer(1), "fee");
map.put(new Integer(16), "fie");
map.put(new Integer(256), "foe");

---

M
  x91       # 1
  x03 fee   # "fee"

  xa0       # 16
  x03 fie   # "fie"

  xc9 x00   # 256
  x03 foe   # "foe"

  z
 Figure 19 



Map Representation of a Java Object

public class Car implements Serializable {
  String color = "aquamarine";
  String model = "Beetle";
  int mileage = 65536;
}

---
M
  t x00 x13 com.caucho.test.Car  # type

  x05 color                # color field
  x0a aquamarine

  x05 model                # model field
  x06 Beetle

  x07 mileage              # mileage field
  I x00 x01 x00 x00
  z
 Figure 20 




4.9.  null



Null Grammar

null ::= 'N'
 Figure 21 

Null represents a null pointer.

The octet 'N' represents the null value.




4.10.  object



Object Grammar

object     ::= 'o' int value*

class-def  ::= 'O' type int string*
 Figure 22 




4.10.1.  Compact: class definition

Hessian 2.0 has a compact object form where the field names are only serialized once. Subsequent objects only need to serialize their values.

The object definition includes a mandatory type string, the number of fields, and the field names. The object definition is stored in the object definition map and will be referenced by object instances with an integer reference.




4.10.2.  Compact: object instantiation

The object instantiation creates a new object based on a previous definition. The integer value refers to the object definition.
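
The interplay between class definitions and object instances can be sketched as a writer that emits the 'O' definition the first time a type is seen and only the 'o' instance afterwards. To stay short, the sketch assumes that all field values are ASCII strings of at most 31 characters and that there are fewer than 48 fields and class definitions, so every integer fits the single-octet form. All names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical writer for illustration; the names are not defined by Hessian.
final class HessianObjectWriter {
    private final List<String> classMap = new ArrayList<>();  // definition #0, #1, ...
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();

    void writeObject(String type, String[] fieldNames, String[] fieldValues) {
        int def = classMap.indexOf(type);
        if (def < 0) {
            def = classMap.size();
            classMap.add(type);
            out.write('O');                         // class definition
            out.write('t');
            out.write(type.length() >> 8);
            out.write(type.length());
            writeAscii(type);
            out.write(0x90 + fieldNames.length);    // field count as a compact int
            for (String name : fieldNames) writeShortString(name);
        }
        out.write('o');                             // object instance
        out.write(0x90 + def);                      // reference to the definition
        for (String value : fieldValues) writeShortString(value);
    }

    byte[] toByteArray() {
        return out.toByteArray();
    }

    private void writeShortString(String s) {
        if (s.length() > 31) {
            throw new IllegalArgumentException("sketch only handles strings of length 0-31");
        }
        out.write(s.length());
        writeAscii(s);
    }

    private void writeAscii(String s) {
        byte[] b = s.getBytes(StandardCharsets.US_ASCII);
        out.write(b, 0, b.length);
    }
}
```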




4.10.3.  Object examples



Object serialization

class Car {
  String color;
  String model;
}

out.writeObject(new Car("red", "corvette"));
out.writeObject(new Car("green", "civic"));

---

O                        # object definition (#0)
  t x00 x0b example.Car  # type is example.Car
  x92                    # two fields
  x05 color              # color field name
  x05 model              # model field name

o
  x90                    # object definition #0
  x03 red                # color field value
  x08 corvette           # model field value

o
  x90                    # object definition #0
  x05 green              # color field value
  x05 civic              # model field value
 Figure 23 



enum Color {
  RED,
  GREEN,
  BLUE,
}

out.writeObject(Color.RED);
out.writeObject(Color.GREEN);
out.writeObject(Color.BLUE);
out.writeObject(Color.GREEN);

---

O                         # object definition #0
  t x00 x0d example.Color # type is example.Color
  x91                     # one field
  x04 name                # enumeration field is "name"

o                         # object #0
  x90                     # object definition ref #0
  x03 RED                 # RED value

o                         # object #1
  x90                     # object definition ref #0
  x05 GREEN               # GREEN value

o                         # object #2
  x90                     # object definition ref #0
  x04 BLUE                # BLUE value

x4a x01                   # object ref #1, i.e. Color.GREEN
 Figure 24 




4.11.  ref



Ref Grammar

ref ::= 'R' b3 b2 b1 b0
    ::= x4a b0
    ::= x4b b1 b0
 Figure 25 

An integer referring to a previous list, map, or object instance. As each list, map or object is read from the input stream, it is assigned the integer position in the stream, i.e. the first list or map is '0', the next is '1', etc. A later ref can then use the previous object. Writers MAY generate refs. Parsers MUST be able to recognize them.

ref can refer to incompletely-read items. For example, a circular linked-list will refer to the first link before the entire list has been read.

A possible implementation would add each map, list, and object to an array as it is read. The ref will return the corresponding value from the array. To support circular structures, the implementation would store the map, list or object immediately, before filling in the contents.

Each map or list is stored into an array as it is parsed. ref selects one of the stored objects. The first object is numbered '0'.
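
The implementation strategy described above can be sketched with a deliberately tiny reader. It understands only untyped 'V' lists, single-octet integers, and the two-octet ref form, which is enough to show why a container is registered before its contents are read. The class name and the restricted subset are assumptions made for the example.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical reader for illustration; it handles only a small subset of Hessian.
final class HessianRefReader {
    private final List<Object> refs = new ArrayList<>();  // first stored value is #0
    private final ByteBuffer in;

    HessianRefReader(byte[] data) {
        this.in = ByteBuffer.wrap(data);
    }

    Object readValue() {
        int code = in.get() & 0xff;
        if (code >= 0x80 && code <= 0xbf) {
            return code - 0x90;                      // single-octet int
        }
        if (code == 0x4a) {
            return refs.get(in.get() & 0xff);        // two-octet ref
        }
        if (code == 'V') {
            List<Object> list = new ArrayList<>();
            refs.add(list);                          // store first, then fill in
            while ((in.get(in.position()) & 0xff) != 'z') {
                list.add(readValue());
            }
            in.get();                                // consume the trailing 'z'
            return list;
        }
        throw new IllegalArgumentException("unsupported code: 0x" + Integer.toHexString(code));
    }
}
```

Because the list is added to refs before its elements are parsed, an element that refers back to the enclosing list resolves to the same object, giving a circular structure.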




4.11.1.  Compact: two octet reference

References between 0 and 255 can be encoded in two octets.

value = b0




4.11.2.  Compact: three octet reference

References between 0 and 65535 can be encoded in three octets.

value = (b1 << 8) + b0
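
A writer can choose among the three ref forms by the size of the index, as in the following sketch. The names are hypothetical. The sketch takes the three-octet form to cover every index that fits in 16 bits, which is what the formula above admits.

```java
import java.nio.ByteBuffer;

// Hypothetical helper for illustration; the names are not defined by Hessian.
final class HessianRef {
    static byte[] writeRef(int index) {
        if (index < 0) {
            throw new IllegalArgumentException("negative reference: " + index);
        }
        if (index <= 0xff) {
            return new byte[] {0x4a, (byte) index};
        }
        if (index <= 0xffff) {
            return new byte[] {0x4b, (byte) (index >> 8), (byte) index};
        }
        return ByteBuffer.allocate(5).put((byte) 'R').putInt(index).array();
    }

    static int readRef(ByteBuffer in) {
        int code = in.get() & 0xff;
        switch (code) {
            case 0x4a: return in.get() & 0xff;          // value = b0
            case 0x4b: return in.getShort() & 0xffff;   // value = (b1 << 8) + b0
            case 'R':  return in.getInt();              // four big-endian octets
            default:
                throw new IllegalArgumentException("not a ref: 0x" + Integer.toHexString(code));
        }
    }
}
```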




4.11.3.  Ref Examples



Circular list

list = new LinkedList();
list.data = 1;
list.tail = list;

---
O
  t x00 x0a LinkedList  # type is LinkedList
  x92                   # two fields
  x04 data
  x04 tail

o x90      # object stores ref #0
  x91      # data = 1
  x4a x00  # tail field refers to itself, i.e. ref #0
 Figure 26 

ref only refers to list, map, and object elements. Strings and binary data, in particular, will only share references if they are wrapped in a list or map.




4.12.  string



String Grammar

string ::= 's' b1 b0 <utf8-data> string
       ::= 'S' b1 b0 <utf8-data>
       ::= [x00-x1f] <utf8-data>
 Figure 27 

A 16-bit unicode character string encoded in UTF-8. Strings are encoded in chunks. x53 ('S') represents the final chunk and x73 ('s') represents any non-final chunk. Each chunk has a 16-bit length value.

The length is the number of characters, which may be different than the number of bytes.

String chunks may not split surrogate pairs.




4.12.1.  Compact: short strings

Strings with length less than 32 may be encoded with a single octet length [x00-x1f].

value = code
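
The string rules can be sketched as an encoder that writes non-final 's' chunks of up to 65535 characters and finishes with either an 'S' chunk or the compact short form. The sketch assumes that a "character" is one 16-bit Java char, so that a surrogate pair counts as two, and it moves a chunk boundary back by one when it would otherwise fall between the two halves of a pair. The class name is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; the names are not defined by Hessian.
final class HessianString {
    private static final int MAX_CHUNK = 0xffff;   // largest 16-bit chunk length

    static byte[] writeString(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int pos = 0;
        while (s.length() - pos > MAX_CHUNK) {
            int end = pos + MAX_CHUNK;
            if (Character.isHighSurrogate(s.charAt(end - 1))) {
                end--;                             // do not split a surrogate pair
            }
            writeChunk(out, 's', s.substring(pos, end));
            pos = end;
        }
        String rest = s.substring(pos);
        if (rest.length() <= 31) {
            out.write(rest.length());              // compact form: the code is the length
            writeUtf8(out, rest);
        } else {
            writeChunk(out, 'S', rest);
        }
        return out.toByteArray();
    }

    private static void writeChunk(ByteArrayOutputStream out, int code, String chunk) {
        out.write(code);
        out.write(chunk.length() >> 8);            // length counts characters, not bytes
        out.write(chunk.length());
        writeUtf8(out, chunk);
    }

    private static void writeUtf8(ByteArrayOutputStream out, String s) {
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        out.write(b, 0, b.length);
    }
}
```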




4.12.2.  String Examples



x00               # "", empty string
x05 hello         # "hello"
x01 xc3 x83       # "\u00c3"

S x00 x05 hello   # "hello" in long form

s x00 x07 hello,  # "hello, world" split into two chunks
  x05 world
 Figure 28 




4.13.  type



Type Grammar

type ::= 't' b1 b0 <type-string>
     ::= x75 int
 Figure 29 

A map (map) or list (list) MAY include a type attribute indicating the type name of the map or list for object-oriented languages.

Each type is added to the type map (type reference) for future reference.




4.14.  Compact: type references

Repeated type strings MAY use the type map (type reference) to refer to a previously used type. The type reference is zero-based over all the types encountered during parsing.




5.  Reference Maps

Hessian 2.0 has 3 internal reference maps:

  1. A map/object/list reference map.
  2. A class definition map.
  3. A type (class name) map.

The value reference map lets Hessian support arbitrary graphs, and recursive and circular data structures.

The class and type maps improve Hessian efficiency by avoiding repetition of common string data.




5.1.  value reference

Hessian supports arbitrary graphs by adding each list (list), object (object), and map (map) to the reference map as it encounters them in the bytecode stream.

Parsers MUST store each list, object and map in the reference map as they are encountered.

The stored objects can be used with a ref (ref) bytecode.




5.2.  class reference

Each object definition (object) is automatically added to the class map. Parsers MUST add a class definition to the class map as each is encountered. Subsequent object instances will refer to the defined class.




5.3.  type reference

The type (type) strings for map (map) and list (list) values are stored in a type map for reference.

Parsers MUST add a type string to the type map as each is encountered.




6.  Bytecode map

Hessian is organized as a bytecode protocol. A Hessian reader is essentially a switch statement on the initial octet.
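
The switch on the initial octet might look like the following sketch, which only names the kind of value that a leading octet introduces. The class name, the method name, and the returned labels are hypothetical. Octets that are reserved, or that appear only inside another production (such as 't', 'l', and 'z'), are reported as "other".

```java
// Hypothetical classifier for illustration; the names are not defined by Hessian.
final class HessianBytecode {
    static String classify(int code) {
        if (code < 0 || code > 0xff) {
            throw new IllegalArgumentException("not an octet: " + code);
        }
        if (code <= 0x1f) return "string";                    // compact string
        if (code <= 0x2f) return "binary";                    // compact binary
        if (code >= 0x38 && code <= 0x3f) return "long";      // three-octet long
        if (code >= 0x80 && code <= 0xd7) return "int";       // compact ints
        if (code >= 0xd8) return "long";                      // one- and two-octet longs
        switch (code) {
            case 'B': case 'b':
                return "binary";
            case 'D': case 0x67: case 0x68: case 0x69: case 0x6a: case 0x6b:
                return "double";
            case 'F': case 'T':
                return "boolean";
            case 'I':
                return "int";
            case 'R': case 0x4a: case 0x4b:
                return "ref";
            case 'L': case 0x77:
                return "long";
            case 'M':
                return "map";
            case 'N':
                return "null";
            case 'O':
                return "class-def";
            case 'S': case 's':
                return "string";
            case 'V': case 'v':
                return "list";
            case 'd':
                return "date";
            case 'o':
                return "object";
            default:
                return "other";
        }
    }
}
```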



Bytecode Encoding

x00 - x1f    # utf-8 string length 0-31
x20 - x2f    # binary data length 0-15
x30 - x37    # reserved
x38 - x3f    # long from -x40000 to x3ffff
x40 - x41    # reserved
x42          # 8-bit binary data final chunk ('B')
x43          # reserved ('C' streaming call)
x44          # 64-bit IEEE encoded double ('D')
x45          # reserved ('E' envelope)
x46          # boolean false ('F')
x47          # reserved
x48          # reserved ('H' header)
x49          # 32-bit signed integer ('I')
x4a          # reference to map/list/object 0-255
x4b          # reference to map/list/object 0-65535
x4c          # 64-bit signed long integer ('L')
x4d          # map with optional type ('M')
x4e          # null ('N')
x4f          # object definition ('O')
x50          # reserved ('P' streaming message/post)
x51          # reserved
x52          # reference to map/list - integer ('R')
x53          # utf-8 string final chunk ('S')
x54          # boolean true ('T')
x55          # reserved
x56          # list/vector ('V')
x57 - x61    # reserved
x62          # 8-bit binary data non-final chunk ('b')
x63          # reserved ('c' call for RPC)
x64          # UTC time encoded as 64-bit long milliseconds since
             #  epoch ('d')
x65          # reserved
x66          # reserved ('f' for fault for RPC)
x67          # double 0.0
x68          # double 1.0
x69          # double represented as byte (-128.0 to 127.0)
x6a          # double represented as short (-32768.0 to 32767.0)
x6b          # double represented as float
x6c          # list/vector length ('l')
x6d          # reserved ('m' method for RPC call)
x6e          # list/vector compact length
x6f          # object instance ('o')
x70          # reserved ('p' - message/post)
x71          # reserved
x72          # reserved ('r' reply for message/RPC)
x73          # utf-8 string non-final chunk ('s')
x74          # map/list type ('t')
x75          # type-ref
x76          # compact vector ('v')
x77          # long encoded as 32-bit int
x78 - x79    # reserved
x7a          # list/map terminator ('z')
x7b - x7f    # reserved
x80 - xbf    # one-octet compact int (-x10 to x2f, x90 is 0)
xc0 - xcf    # two-octet compact int (-x800 to x7ff)
xd0 - xd7    # three-octet compact int (-x40000 to x3ffff)
xd8 - xef    # one-octet compact long (-x8 to xf, xe0 is 0)
xf0 - xff    # two-octet compact long (-x800 to x7ff, xf8 is 0)
 Figure 30 




Authors' Addresses

  Scott Ferguson
  Caucho Technology Inc.
  P.O. Box 9001
  La Jolla, CA 92038
  USA
Email:  ferg@caucho.com
  
  Emil Ong
  Caucho Technology Inc.
  P.O. Box 9001
  La Jolla, CA 92038
  USA
Email:  emil@caucho.com



Full Copyright Statement

Intellectual Property

Acknowledgment