resin-ee


December 31, 2001

As described in the Burlap draft spec, we created Burlap to implement Enterprise Java Beans (EJB) using an XML-based protocol with reasonable performance. Although many RPC protocols already exist, including several based on XML, none met our application's needs. The name "Burlap" was chosen for a simple reason: it's boring. Unlike the exciting protocols defining "Internet 3.0", SOAP and XML-RPC, Burlap is just a boring text-based protocol to make testing and debugging EJB a little bit easier.

Because we're an engineering-driven company and lack the resources to effectively lobby for Burlap as the standard internet services wire-protocol, we have the opportunity to write these design notes from an engineering perspective, devoid of marketing hype. Several of the examples use SOAP and XML-RPC to contrast Burlap's design decisions, but as those protocols have different goals, the contrasts should not be taken as criticisms of the protocols. Most of the design of Burlap, after all, is merely a modification of SOAP and XML-RPC for EJB's needs.

The Burlap protocol was created to solve a specific problem: to provide remote procedure calls for Enterprise Java Beans (EJB) using an XML-based protocol without limiting the protocol to Java servers and clients.

EJB is often used for distributed computing on a small scale. One development team controls the design for both clients and servers. Most often, the application uses a single subnet for all communication. As long as the RPC protocol is fast and reliable, most EJB developers don't care what protocol is used. "Small-scale" might be misleading: a site like eBay using EJB internally would be small-scale distributed computing, but it needs large-scale performance and reliability.

It must have sufficient power to support EJB.
It must be simple so it can be effectively tested.
It must be as fast as possible.
It must only require Java introspection. It must not require external IDL or schema definitions.
It must use a subset of XML.
It must allow EJB servers to be deployed as a Servlet.
It should support transaction contexts.
It should allow non-Java clients to use EJB servers.

Fortunately, we know exactly how powerful Burlap needs to be: it needs to serialize and deserialize arbitrary Java objects. In contrast, SOAP and XML-RPC have the much tougher responsibility of an open-ended sufficiency requirement. Although Burlap may be sufficiently powerful for essentially any service, that generality is not a design goal.

Specific requirements:

Serialize Java primitive types (boolean, int, long, double, String).
Support nulls.
Permit shared and circular data structures.
Support objects, arrays, lists, and maps.
Deserialize subclassed objects, including Object variables.
Support remote object references (EJBHome and EJBObject references).

Surprisingly, many of the existing protocols are insufficiently powerful to handle these requirements. Even IIOP/CORBA needed protocol additions to support full Java serialization.

XML-RPC is missing a number of the features needed to support EJBs: 64-bit integer (Java long), nulls, shared data structures, subclassed objects, and remote object references.

SOAP explicitly does not support remote object references because they were perceived as too difficult. In other respects SOAP seems to be sufficient. In fact, it defines a large number of types unusable for EJB, including everything in the XML Schema: sparse arrays, enumerations, integer subranges, durations.

Maps, including Java Hashtables and HashMaps, are not explicitly supported by other protocols, even though they are very common in Java programming. All the protocols support special String to Object maps, i.e. structures, but none support Object to Object maps. Of course, it's always possible to serialize the Hashtable structure itself, as RMI does, but that exposes the implementation details of the hashtable. So a Hashtable would look very different from a HashMap serialization, making it difficult for different languages to interoperate.

Testability is the key design goal of Burlap. At this early stage of development, we have 43k lines in the EJB test suite, but only 28k lines of code in the EJB implementation. Granted, the EJB implementation uses Resin libraries heavily which weren't included in the count, but it should give an idea of the work involved in testing.

The Burlap spec is as small as possible to reduce the test suite size. For example, eliminating XML attributes, namespaces, and XML Schema radically reduces the test suite size and complexity without losing any power. The EJB spec only requires CORBA/IIOP as a wire protocol, but IIOP is a huge ungainly beast and is a binary protocol. Because IIOP is huge, testing it is a large task and leaves many places where bugs can spawn. By using Burlap as the primary wire protocol, we can make it more reliable because it's more testable, and when we do implement IIOP we can easily localize the bugs.

Since ambiguity and complexity make testing difficult, both have been eliminated where possible in Burlap. Even simple ambiguity can make testing more difficult. For example, XML-RPC allows either <int> or <i4> to represent 32-bit integers. In theory, an implementation could parse integers using different code for <int> and for <i4>, so <int> might be carefully tested and <i4> spot-checked, but the <i4> parse might be buggy. If a ServerA implementation generally uses <int>, a fully conforming ClientB might use <i4> and run into a bug undetected for months. So ClientB needs to use <int> for ServerA. The XML-RPC case is trivial, but for more complicated and verbose specs like IIOP/CORBA and SOAP, full testing may be impossible, forcing bake-off testing so the mutually-incompatible implementations can be forced to work together.

Ambiguity is not an academic problem. Every HTTP server implementor needs to work around various non-conforming clients. Clients interpret the HTTP cookie spec in many strange and non-conformant ways. For example, one client needs spaces in a "; secure" attribute, but another client can't deal with the spaces. This is a real example. If browsers mess up the simple cookie spec, the SOAP, XML Schema, SOAP Attachment, WSDL (and more!) specs are likely to cause interoperability and testing problems, as CORBA/IIOP did. Burlap tries to eliminate ambiguity so these problems never show up in the first place.

The primary motivation for using a text-based protocol, like XML, is the resulting simplicity of each test. The following is one of Caucho's tests for integer serialization.

    <title>burlap: server int</title>

    <file file='file:/tmp/caucho/qa/WEB-INF/classes/test/MyBean.java'>
    package test;

    import com.caucho.burlap.*;

    public class MyBean extends BurlapServlet {
      public int add(int a, int b)
      {
        return a + b;
      }
    }
    </file>

    <script out='stdout'>
    http = new caucho.server.http.TestHttp(File("file:/tmp/caucho/qa"));

    http.request(@<<END, out);
    POST /servlet/test.MyBean HTTP/1.0
    Content-Length: 120

    <burlap:call>
      <method>add</method>
      <int>32000</int>
      <int>-1000</int>
    </burlap:call>
    END

    http.close();
    </script>

    <compare file='/stdout'>
    HTTP/1.0 200 OK
    Server: Resin/1.1
    Content-Length: 60
    Date: Fri, 08 May 1998 09:51:31 GMT

    <burlap:reply><value><int>31000</int></value></burlap:reply>
    </compare>

Testability is a true measure of the complexity of a spec. Writing a sample server is easy for any spec, but writing a fully-conforming, tested server that works with all the clients is a much larger issue. Unfortunately, most spec writers are "white-paper engineers" who don't even implement their design, much less test it. Given the computer industry's reputation for buggy products, it just seems prudent to design the spec to minimize testing and bugs.

The decision to use a minimal XML subset is obvious. Burlap doesn't need the additional complexity of full XML, namespaces, or schema, so there's no need to add them. They would just explode the test suite, complicate the parsers, and make conformance harder. And all that complexity gives zero added power.

The XML specs give no added power because Burlap is an existence proof for the sufficiency of SML, the minimal subset of XML (elements and text, no attributes) that Burlap uses. Here's a rough sketch of a formal proof:

It encodes Java primitives: <boolean>, <int>, <long>, <double>, <string>, <null>, and is non-lossy for the additional byte, short, char, float primitives.
It encodes Java arrays (<list> with <type>[foo</type>)
It encodes Java classes (serialization with <map>)
It supports Java inheritance (<type> for <list> and <map>).
All Java objects are generated using a combination of the above.

(The additional tags <date>, <base64>, <remote>, using <list> for List objects and <map> for Map objects are not strictly necessary, but make the protocol Java-independent.)
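The first step of that argument can be sketched in Java. This is only an illustration under stated assumptions: the class name BurlapPrimitives is hypothetical, the 1/0 form for booleans and the entity escaping for strings are assumed from the draft spec rather than taken from the Resin source, and only the tags named above are used. It shows the narrower types widening into <int>, <double>, and <string> without loss.

```java
// Hypothetical sketch: maps each Java primitive (boxed) onto one of the six tags.
final class BurlapPrimitives {

    /** Encodes a null, boxed primitive, or String value as a Burlap element. */
    static String encode(Object value) {
        if (value == null) {
            return "<null></null>";
        }
        if (value instanceof Boolean) {
            // Assumption: booleans are written as 1 or 0.
            return "<boolean>" + (((Boolean) value) ? "1" : "0") + "</boolean>";
        }
        if (value instanceof Byte || value instanceof Short || value instanceof Integer) {
            // byte and short widen to int, so no information is lost.
            return "<int>" + ((Number) value).intValue() + "</int>";
        }
        if (value instanceof Long) {
            return "<long>" + value + "</long>";
        }
        if (value instanceof Float || value instanceof Double) {
            // float widens exactly to double.
            return "<double>" + ((Number) value).doubleValue() + "</double>";
        }
        if (value instanceof Character || value instanceof String) {
            // A char travels as a one-character string.
            return "<string>" + escape(value.toString()) + "</string>";
        }
        throw new IllegalArgumentException("not a primitive: " + value.getClass());
    }

    /** Assumption: markup characters in strings are escaped as XML entities. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

For example, BurlapPrimitives.encode((short) 7) and BurlapPrimitives.encode(7) produce the same <int> element, which is the lossy-but-sufficient mapping the list describes.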

Since SML is sufficient, any added XML "features" can add nothing to the expressive power because there's nothing left to add. But adding XML attributes and namespaces would have a huge cost in conformance, testing, and performance. It's a good idea to remember that Burlap will always be machine-generated. Unlike HTML, no graphic designer will ever write Burlap documents.

The trivial <null> tag has only one encoding in Burlap: <null></null>.

Having only one option not only simplifies the tests and interoperability, but it improves the parser performance. The following is explicitly forbidden in Burlap, though allowed in XML:

<null/>

At first glance, the XML encoding appears more efficient, because it saves 6 characters (<null/> is 7 characters, <null></null> is 13). In reality, that savings of network bandwidth is in the noise performance-wise. When a typical page, even over a 56k modem, can be 100k with additional GIFs, 6 characters don't matter. Burlap's use is for single subnets, usually running at 100Mb/s or possibly 1Gb/s. The minuscule savings aren't worth the complexity.

For Burlap's encoding it's easy to write an efficient parser, knowing what kind of results are expected:

    switch (is.readToken()) {
    case NULL:
      is.readEndTag("null");
      break;

    case STRING:
      String v = is.readString();
      is.readEndTag("string");
      break;
    ...
    }

With the XML short-tag "feature", the code would need to return a different token for "null-with-trailing-slash", the test suite would need to add cases and all servers and clients would need to parse both (with possible buggy clients only understanding one form). This example is relatively simple; the addition of attributes and namespaces would increase the complexity.
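The cost of a second spelling can be seen in a toy reader. The following Java sketch is hypothetical (TinyBurlapReader is not a Resin class, and it skips whitespace handling and entity decoding), but it suggests how a parser that accepts exactly one form of each tag stays small: the short form <null/> needs no special case because it simply fails as an unknown tag.

```java
// Hypothetical sketch of a reader that accepts exactly one encoding per value.
final class TinyBurlapReader {
    private final String in;
    private int pos;

    TinyBurlapReader(String in) {
        this.in = in;
    }

    /** Reads one value: <null></null>, <int>n</int>, or <string>s</string>. */
    Object readValue() {
        String tag = readStartTag();
        switch (tag) {
            case "null":
                readEndTag("null");
                return null;
            case "int": {
                int v = Integer.parseInt(readText());
                readEndTag("int");
                return v;
            }
            case "string": {
                String v = readText();
                readEndTag("string");
                return v;
            }
            default:
                // "<null/>" arrives here as the tag name "null/".
                throw new IllegalStateException("unexpected tag <" + tag + ">");
        }
    }

    private String readStartTag() {
        if (pos >= in.length() || in.charAt(pos) != '<') {
            throw new IllegalStateException("expected '<' at offset " + pos);
        }
        int end = in.indexOf('>', pos);
        if (end < 0) {
            throw new IllegalStateException("unterminated tag");
        }
        String name = in.substring(pos + 1, end);
        pos = end + 1;
        return name;
    }

    private String readText() {
        int end = in.indexOf('<', pos);
        if (end < 0) {
            throw new IllegalStateException("missing end tag");
        }
        String text = in.substring(pos, end);
        pos = end;
        return text;
    }

    private void readEndTag(String name) {
        String expected = "</" + name + ">";
        if (!in.startsWith(expected, pos)) {
            throw new IllegalStateException("expected " + expected);
        }
        pos += expected.length();
    }
}
```

Supporting the short form would mean another branch in readStartTag, another token kind for every caller, and another family of test cases, which is the trade-off the paragraph above describes.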

XML has two main use patterns: a static syntax described by a DTD (document type definition), and an extensible syntax using XML namespaces and presumably defined by an XML schema. XML-RPC chose the DTD and SOAP chose XML schema. Burlap chose the DTD.

A major advantage of the DTD choice is that any Burlap request or response is verifiable without any external schema. The DTD defines the entire grammar for Burlap. So a tester can use that grammar to validate that a Burlap client or server is conformant. The grammar serves as a testplan.

The schema route gives more flexibility without adding expressive power. The same object might have two different serializations if the XML schema differs. With Burlap or XML-RPC, the class has a unique serialization. With SOAP the different representations leave open the possibility of interoperability issues.

Burlap uses a URL to locate the server object. Usually, this will be an HTTP URL, but nothing in the Burlap protocol requires the use of HTTP. Since URLs are sufficient and already familiar, it seems obvious to use them.

An advantage of the particular encoding used for EJB is that intelligent load balancers can use it to direct the request to an owning server. It's not required that the URL for Burlap servers follow the same conventions as the EJB encoding. In the EJB encoding, the bean home interface is also a URL, easily determined by removing the object identifier, which is the last path component. This makes getting meta information easier.
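As an illustration of how cheaply the home URL can be recovered, the sketch below strips the last path component from an object URL. The URL layout (for example http://localhost:8080/ejb/house/42, with 42 as the object identifier) and the class name BurlapUrls are assumptions made for the example, not the actual EJB encoding.

```java
import java.net.URI;

// Hypothetical helper: assumes the object identifier is the last path segment.
final class BurlapUrls {

    /** Returns the home URL by dropping the trailing object identifier. */
    static URI homeOf(URI objectUrl) {
        String path = objectUrl.getPath();
        int slash = (path == null) ? -1 : path.lastIndexOf('/');
        if (slash <= 0) {
            throw new IllegalArgumentException("no object identifier in " + objectUrl);
        }
        // Resolving an absolute path keeps the scheme, host, and port.
        return objectUrl.resolve(path.substring(0, slash));
    }

    /** Returns the trailing path segment, taken here to be the object identifier. */
    static String objectIdOf(URI objectUrl) {
        String path = objectUrl.getPath();
        return path.substring(path.lastIndexOf('/') + 1);
    }
}
```

A load balancer could apply the same split to route on the identifier without parsing the request body.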

XML-RPC, in contrast, uses a combination of an HTTP URL and an object identifier embedded in the request. The object identifier is part of the <methodName> item.

    <?xml version="1.0"?>
    <methodCall>
      <methodName>examples.getStateName</methodName>
      <params>
        <param>
          <value><i4>41</i4></value>
        </param>
      </params>
    </methodCall>

SOAP also uses the URL to identify the target object. The primary model for SOAP appears to be service-based rather than object-based. In other words, it appears that most SOAP servers will only have a single URL and not have separate URLs for object instances. In contrast, Burlap heavily uses sub-URLs (path-info) to identify object instances. Nothing in the SOAP protocol precludes using sub-URLs, so this isn't a limitation of the SOAP spec. It just appears to be counter to the culture of SOAP to use lots of URLs. As the SOAP spec says, objects-by-reference are not part of the SOAP specification.

Since EJB requires method overloading, Burlap supports it. Burlap implements method overloading with "method name mangling". The types of the method arguments become part of the method name. To support simpler clients, like script-based clients, one of the methods will also typically respond to the unadorned method name.

    <burlap:call>
      <method>add_int_int</method>
      <int>2</int>
      <int>3</int>
    </burlap:call>

Because Burlap's encoding is lossy, some overloaded methods are indistinguishable. For example, add(int) and add(short) are both encoded with an integer argument, so only the integer version will be callable. The main reason for the lossy limitation is to make Burlap less Java-dependent: int, long, and double are supported by most languages. We expect this won't be a large limitation. Few objects have overloaded methods that alias this way.
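A minimal sketch of name mangling through Java introspection, assuming the name_type_type pattern shown in the add_int_int example. The class BurlapMangler, the Calculator interface, and the rule for non-primitive types (the full class name) are hypothetical; only the int case is confirmed by the text above. Mapping each parameter to its wire type makes the aliasing visible: add(short) and add(int) produce the same mangled name.

```java
import java.lang.reflect.Method;

// Hypothetical sketch: builds a mangled method name from the wire types.
final class BurlapMangler {

    /** Example interface with overloads, two of which alias on the wire. */
    interface Calculator {
        int add(int a, int b);
        int add(int a);
        int add(short a);
    }

    /** Appends one "_type" suffix per parameter to the method name. */
    static String mangle(Method m) {
        StringBuilder sb = new StringBuilder(m.getName());
        for (Class<?> p : m.getParameterTypes()) {
            sb.append('_').append(wireName(p));
        }
        return sb.toString();
    }

    /** Convenience lookup that hides the checked reflection exception. */
    static String mangle(Class<?> owner, String name, Class<?>... params) {
        try {
            return mangle(owner.getMethod(name, params));
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Maps a Java parameter type to the tag it is encoded with. */
    static String wireName(Class<?> c) {
        if (c == byte.class || c == short.class || c == int.class) return "int";
        if (c == float.class || c == double.class) return "double";
        if (c == char.class || c == String.class) return "string";
        // boolean and long keep their own names; object types are an assumption.
        return c.getName();
    }
}
```

With this rule, a server that finds two methods with the same mangled name has to pick one, which is why only the integer version stays callable.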

XML-RPC doesn't specifically address this issue. Since its method selection is very similar to Burlap's, name mangling would be easy to implement in XML-RPC.

SOAP's method selection is very different. Each method has its own XML element and namespace, presumably defined in some external schema. It could also support overloading with mangled element names. SOAP adds additional complexity to the method call with the addition of a namespace. It's not clear what value this adds, other than additional complications, more tests, and interoperability issues.

Since a service/remote-object has a single set of methods, selecting the object implicitly selects the method set or object signature. In theory, namespaces would allow a client to choose a different object signature, but it's not clear what application would need that. Java and EJB don't need that complexity.

    ...
    <m:add xmlns:m="http://www.caucho.com/soap/calculator">
      <a>2</a>
      <b>3</b>
    </m:add>
    ...

Burlap method parameters are order-based, like Java and essentially every major programming language. The number of parameters is fixed for each method. Varying-length parameters and extra parameters are forbidden. This is easily tested and unambiguous, and it makes a fast parser easy to write. The choice seems obvious and hardly worth discussion, but SOAP dismissed position-based arguments.
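Positional arguments map directly onto Java reflection, which is part of why no external IDL is needed. The sketch below is an assumption-laden illustration (PositionalDispatch and Adder are hypothetical names, and it ignores overloading, which name mangling would resolve): the decoded arguments are passed in order, and a call with the wrong number of arguments is rejected.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.List;

// Hypothetical sketch: invokes a method using arguments in wire order.
final class PositionalDispatch {

    /** Example target with a single two-argument method. */
    public static class Adder {
        public int add(int a, int b) {
            return a + b;
        }
    }

    /** Finds a public method by name and exact argument count, then calls it. */
    static Object call(Object target, String name, List<?> args) {
        for (Method m : target.getClass().getMethods()) {
            if (m.getName().equals(name) && m.getParameterCount() == args.size()) {
                try {
                    // Arguments are matched purely by position.
                    return m.invoke(target, args.toArray());
                } catch (IllegalAccessException | InvocationTargetException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
        // Extra or missing parameters are an error, never silently ignored.
        throw new IllegalArgumentException(
            "no method " + name + " taking " + args.size() + " arguments");
    }
}
```

Nothing in the request has to name the parameters, so a client needs only the method name and the argument order.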

SOAP's method arguments are named by schema-defined elements and can appear in any order:

    ...
    <m:myMethodCall xmlns:m="my-namespace">
      <z>13</z>
      <a>4</a>
      <m>13</m>
    </m:myMethodCall>

For an EJB implementation, the SOAP method arguments create several problems. What name should the first argument be given? Does a client need an external schema or IDL to map the SOAP argument to the function call? How intelligent does a client or server have to be to handle all the possible argument name assignments? And how can all this be tested? The SOAP designers must have had some reason for this choice, but it directly opposes the needs of an EJB server and client.

Although most of Burlap's design avoids Java-specific requirements, solving the inheritance problem requires the deserializer know what Java class to create. Burlap's solution makes Java implementations easy, but doesn't significantly impede other languages and is no less language-independent than SOAP or XML-RPC.

Since Java is an object-oriented language, the type of a value may not equal the declared type. For example, an Object field might contain a Car object or a Truck object. To serialize the Car or Truck, Burlap needs to add the object type in the protocol. XML-RPC, in contrast, has no accompanying type information, so a Java XML-RPC implementation can't create a Car from information in the protocol. The Java deserializer also needs to know if an array will be a Java array, Vector, or ArrayList. That's done with the <type> tag in <list> and <map>.

    <map>
      <type>com.caucho.example.Car</type>
      <string>color</string><string>red</string>
    </map>
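A rough Java sketch of how a serializer could produce that <map> using introspection alone. The class BurlapObjectWriter and the nested Car and Truck classes are hypothetical, string escaping and shared references are left out, and only String and Integer leaf values are handled. The point is that the <type> element carries the runtime class, so a Truck stored in an Object or Car variable is still written as a Truck.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Hypothetical sketch: writes an object as <map> with its runtime <type>.
final class BurlapObjectWriter {

    static class Car {
        String color = "red";
    }

    static class Truck extends Car {
        int axles = 3;
    }

    /** Serializes a String, an Integer, or a plain object made of such fields. */
    static String write(Object obj) {
        if (obj == null) return "<null></null>";
        if (obj instanceof String) return "<string>" + obj + "</string>";
        if (obj instanceof Integer) return "<int>" + obj + "</int>";

        StringBuilder sb = new StringBuilder("<map><type>");
        // The runtime class, not the declared type, goes on the wire.
        sb.append(obj.getClass().getName()).append("</type>");
        for (Class<?> c = obj.getClass(); c != Object.class; c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                int mods = f.getModifiers();
                if (Modifier.isStatic(mods) || Modifier.isTransient(mods) || f.isSynthetic()) {
                    continue;
                }
                f.setAccessible(true);
                try {
                    // Each field becomes a name/value pair inside the map.
                    sb.append("<string>").append(f.getName()).append("</string>");
                    sb.append(write(f.get(obj)));
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
        return sb.append("</map>").toString();
    }
}
```

A deserializer can then read the <type> text, load that class, and fill in the fields, which is the step XML-RPC has no information to perform.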

This is a thorny problem. The serializer must know the Java type, but we'd prefer to keep Java-specific information out of the protocol. Burlap's solution allows non-Java clients with a little extra work, and makes Java implementations easy.

The heart of the problem is how to map a type key in the protocol to a language-specific type. That map might appear in a repository, in some external schema like IDL, or it can be encoded in the protocol itself for a specific language. Some simple clients may not need this mapping. A simple client in Perl, JavaScript, or Java might use generic types, like ArrayList and Hashtable, to read the serialized values. When they write a request to a Java server, they'll need to add the type, but protocol writing is much easier than protocol parsing.

It's not clear whether SOAP uses an external schema or a repository service (WSDL) to map to language types, but it's clear something external is required. The following is a typical SOAP serialization of the Car:

<e:Car xmlns:e="http://www.caucho.com/soap/cars.xml"> <color>red</color> </e:Car>

How do you map from "e:Car" with namespace "http://www.caucho.com/soap/cars.xml" to the Java class com.caucho.example.Car? Since that's not part of the SOAP spec, there must be an additional spec defining that mapping. Since our requirements called for a testable implementation (a repository is extra work) and no external IDL (so schema is out), the technique used by SOAP was not appropriate for Burlap.

Burlap doesn't preclude clients in other languages from using their own types. It would just require the addition of a repository service or an external IDL. In other words, it's no more complicated than the type lookup SOAP requires for every language. So Burlap makes Java implementation easy, but doesn't make other languages hard. When a Python client first encountered a "com.caucho.example.Car", it could either use "Car" as a Python type, or query some yet-to-be-defined service mapping to Python. In other words, it's a trade-off, but it's unavoidable and we believe the solution is fair to other languages.

Shared references are vital for serializing any significant data structure. Serializing a tree, for example, needs to link a node's children and the parent in a circular set of references. Shared references are one of the necessary capabilities missing from XML-RPC that Burlap adds.

Burlap implements shared references with an implicit array of all <list> and <map> objects. Linking to an old object can then just refer to the object's position in the array. An advantage of Burlap's approach is that it only requires a single pass through the object.

    <map>
      <type>com.muggle.Car</type>
      <string>model</string>
      <string>Ford Anglia</string>
    </map>
    ...
    <map>
      <type>com.muggle.Person</type>
      <string>name</string>
      <string>Arthur Weasley</string>
      <string>car</string>
      <ref>1</ref>
    </map>
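The single-pass property can be sketched in Java with an identity map that plays the role of the implicit array. This is an illustration only: BurlapRefWriter is a hypothetical name, numbering from 0 is an assumption, and <type> and string escaping are omitted. Each <list> or <map> is registered before its contents are written, so both shared and circular references resolve to a <ref>.

```java
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: one pass, with <ref> for lists and maps already seen.
final class BurlapRefWriter {
    // Position of every <list> and <map> written so far, keyed by identity.
    private final IdentityHashMap<Object, Integer> seen = new IdentityHashMap<>();
    private final StringBuilder out = new StringBuilder();

    /** Serializes strings, integers, lists, and maps in a single traversal. */
    static String write(Object root) {
        BurlapRefWriter w = new BurlapRefWriter();
        w.writeValue(root);
        return w.out.toString();
    }

    private void writeValue(Object v) {
        if (v == null) { out.append("<null></null>"); return; }
        if (v instanceof String) { out.append("<string>").append(v).append("</string>"); return; }
        if (v instanceof Integer) { out.append("<int>").append(v).append("</int>"); return; }
        if (!(v instanceof Map) && !(v instanceof List)) {
            throw new IllegalArgumentException("unsupported: " + v.getClass());
        }
        Integer index = seen.get(v);
        if (index != null) {
            // Already written: refer to its position instead of repeating it.
            out.append("<ref>").append(index).append("</ref>");
            return;
        }
        // Register before descending so a cycle back to v finds its index.
        seen.put(v, seen.size());
        if (v instanceof Map) {
            out.append("<map>");
            for (Map.Entry<?, ?> e : ((Map<?, ?>) v).entrySet()) {
                writeValue(e.getKey());
                writeValue(e.getValue());
            }
            out.append("</map>");
        } else {
            out.append("<list>");
            for (Object item : (List<?>) v) {
                writeValue(item);
            }
            out.append("</list>");
        }
    }
}
```

The reader mirrors this by appending every <list> and <map> it creates to an array and resolving each <ref> by index, which is the bookkeeping cost of the single pass.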

SOAP takes a different approach for shared references. Declaring an object uses an id attribute and referring to the object uses a href attribute:

    <e:Car id="1">
      <e:Model>Ford Anglia</e:Model>
    </e:Car>
    ...
    <e:Person>
      <e:Name>Arthur Weasley</e:Name>
      <e:Car href="1"/>
    </e:Person>

The SOAP approach has the advantage that the receiving server doesn't need to keep track of every object it's received. In addition, it can refer to any object, including strings. With Burlap, only <map> and <list> can share references; strings and byte arrays can't be shared. As a disadvantage, SOAP requires two passes over the object before sending it, and its approach requires the use of XML attributes.

The Burlap design aimed at reducing the testing and implementation complexity, and should give decent performance. Because Burlap is a wire protocol, EJB users don't care about the protocol details and we could tailor Burlap to the specific requirements needed to support EJB.

The small size of the Burlap specification should not be confused with its expressive power. Burlap is fully capable of handling Java/EJB calls. The added complexity and "flexibility" of a spec like SOAP just introduces ambiguity and adds to the testing requirements without adding expressive power.