|
resin-ee
$ee/ejb-ref/burlap-notes.xtp
December 31, 2001
As described in the Burlap draft spec,
we created Burlap to implement Enterprise Java Beans (EJB) using
an XML-based protocol with reasonable performance. Although many
RPC protocols already exist, including several based on XML, none met
our application's needs. The name "Burlap" was chosed for a simple
reason: it's boring. Unlike the exciting protocols defining "Internet 3.0",
SOAP and XML-RPC, Burlap is just boring text-based protocol to make
testing and debugging EJB a little bit easier.
Because we're an engineering-driven company and lack the resources
to effectively lobby for Burlap as the standard internet
services wire-protocol,
we have the opportunity to write these design notes from an
engineering perspective, devoid of marketing hype. Several of the
examples use SOAP and
XML-RPC to
contrast Burlap's design decisions, but as those protocols have
different goals, the contrasts should not be taken as criticisms
of the protocols. Most of the design of Burlap, after all, is merely a
modification of SOAP and XML-RPC for EJB's needs.
 |  |
|
The Burlap protocol was created to solve a specific problem: to
provide remote procedure calls for Java Enterprise Java Beans (EJB)
using an XML-based protocol without limiting the
protocol to Java servers and clients.
EJB is often used for distributed computing on a small scale. One
development team controls the design for both clients and servers.
Most often, the application uses a single subnet for all
communication. As long as the RPC protocol is fast and reliable, most
EJB developers don't care what protocol is used. "Small-scale" might
be misleading: a site like eBay using EJB internally would be small-scale
distributed computing, but it needs large-scale performance
and reliability.
- It must have sufficient power to support EJB.
- It must be simple so it can be effectively tested.
- It must be as fast as possible.
- It must only require Java introspection. It must not require external
IDL or schema definitions.
- It must use a subset of XML.
- It must allow EJB servers to deployed as a Servlet.
- It should support transaction contexts.
- It should allow non-Java clients to use EJB servers.
Fortunately, we know exactly how powerful Burlap needs to be: it needs to
serialize and deserialize arbitrary Java objects. In contrast, SOAP and
XML-RPC have the much tougher responsibility of an open-ended sufficiency
requirement. Although Burlap may be sufficiently powerful
for essentially any service, it's not a design goal.
Specific requirements:
- Serialize Java primitive types (boolean, int, long, double, String).
- Support nulls.
- Permit shared and circular data structures.
- Support objects, arrays, lists, and maps.
- Deserialize subclassed objects, including Object variables..
- Support remote object references (EJBHome and EJBObject references.)
Surprisingly, many of the existing protocols are insufficiently
powerful to handle these requirements. Even IIOP/CORBA needed protocol
additions to support full Java serialization.
XML-RPC is missing a number of the features needed to support
EJBs: 64-bit integer (Java long), nulls, shared data structures, subclassed
objects, and remote object references.
SOAP explicitly does not support remote object references because it
was perceived as too difficult. In other respects SOAP seems to be
sufficient. In fact, it defines a large number of types unusable for
EJB, including everying in the
XML Schema: sparse arrays,
enumerations, integer subranges, durations.
Maps, including Java Hashtables and HashMaps, are not explicitly
supported by other protocols, even though they are very common in
Java programming. All the protocols support special String to Object
maps, i.e. structures, but none support Object to Object maps. Of course,
it's always possible to serialize the Hashtable structure itself, as RMI
does, but that exposes the implementation details of the hashtable. So a
Hashtable would look very different from a HashMap serialization, making
it difficult for different languages to interoperate.
Testability is the key design goal of Burlap. At this early stage
of development, we have 43k lines in the EJB test suite, but only 28k
lines of code in the EJB implementation. Granted, the EJB
implementation uses Resin libraries heavily which weren't included
in the count, but it should give an idea of the work involved in testing.
The Burlap spec is as small as possible to reduce the
test suite size. For example, eliminating XML attributes, namespaces, and
XML Schema
radically reduces the test suite size and complexity without losing any power.
The EJB spec only requires CORBA/IIOP as a wire protocol, but
IIOP is a huge ungainly beast and is a binary protocol. Because
IIOP is huge, testing it is a large task and leaves many places where
bugs can spawn. By using Burlap as the primary wire protocol, we can
make it more reliable because it's more testable, and when we do implement
IIOP we can easily localize the bugs.
Since ambiguity and complexity make testing difficult, both have
been eliminated where possible in Burlap. Even simple ambiguity can make
testing more difficult. For example, XML-RPC allows either <int>
or <i4> to represent 32-bit integers. In theory, an implementation
could parse integers using different code for <int> and for <i4>,
so <int> might be carefully tested and <i4> spot-checked, but the
<i4> parse might be buggy. If a ServerA implementation
generally uses <int>, a fully conforming ClientB might use <i4> and
run into a bug undetected for months. So ClientB needs to use
<int> for ServerA. The XML-RPC case is trivial, but for more complicated
and verbose specs like IIOP/CORBA and SOAP, full testing may be impossible,
forcing bake-off testing so the mutually-incompatible implementations can
be forced to work together.
Ambiguity is not an academic problem. Every HTTP server
implementor needs to work around various non-conforming clients.
Clients interpret the HTTP cookie spec in many strange an
non-conformant ways. For example, one client needs spaces in a
"; secure" attribute, but another client can't deal with the spaces.
This is a real example. If browsers mess up the simple cookie spec, the
SOAP, XML Schema, SOAP Attachment, WSDL, and more! specs, are likely
to cause interoperability and testing problems, like CORBA/IIOP did.
Burlap tries to eliminate ambiguity so these problems never show up
in the first place.
The primary motivation for using a text-based protocol, like XML, is the
resulting simplicity of each test. The following is
one of Caucho's tests for integer serialization.
<title>burlap: server int</title>
<file file='file:/tmp/caucho/qa/WEB-INF/classes/test/MyBean.java'>
package test;
import com.caucho.burlap.*;
public class MyBean extends BurlapServlet {
public int add(int a, int b) { return a + b; }
}
</file>
<script out='stdout'>
http = new caucho.server.http.TestHttp(File("file:/tmp/caucho/qa"));
http.request(@<<END, out);
POST /servlet/test.MyBean HTTP/1.0
Content-Length: 120
<burlap:call>
<method>add</method>
<int>32000</int>
<int>-1000</int>
</burlap:call>
END
http.close();
</script>
<compare file='/stdout'>
HTTP/1.0 200 OK
Server: Resin/1.1
Content-Length: 60
Date: Fri, 08 May 1998 09:51:31 GMT
<burlap:reply><value><int>31000</int></value></burlap:reply>
</compare>
Testability is a true measure of the complexity of a spec. Writing a
sample server easy for any spec, but writing a
fully-conforming, tested servers that works with all the clients is a
much larger issue. Unfortunately, most spec writers are "white-paper
engineers" who don't even implement their design, much less test it.
Given the computer industry's reputation for buggy products, it just
seems prudent to design the spec to minimize testing and bugs.
This decision is obvious. Burlap doesn't need the additional
complexity with XML or namespaces or schema, so there's no
need to add them. It would just explode the test-suite, complicate the
parsers, and make conformance harder. And all that complexity gives
zero added power.
The XML-specs give no added power because Burlap is an existence
proof for the sufficiency of SML. Here's a rough sketch of a formal proof:
- It encodes Java primitives: <boolean>, <int>, <long>, <double>,
<string>, <null>, and is non-lossy for the additional byte,
short, char, float primitives.
- It encodes Java arrays (<list> with <type>[foo</type>)
- It encodes Java classes (serialization with <map>)
- It supports Java inheritance (<type> for <list> and
<map>).
- All Java objects are generated using a combination of the above.
(The additional tags <date>, <base64>, <remote>,
using <list> for List objects and <map> for Map objects are
not strictly necessary, but make the protocol Java-independent.)
Since SML is sufficient, any added XML "features" can add nothing to
the expressive power because there's nothing left to add. But adding
XML attributes and namespaces and have a huge cost in conformance, testing
and performance. It's a good idea to remember that Burlap will always be
machine-generated. Unlike HTML, no graphic designer will
ever write Burlap documents.
The trivial <null> tag has only one encoding in Burlap as follows.
<null></null>
Having only one option not only simplifies the tests and
interoperability, but it improves the parser performance. The
following is explicitly forbidden in Burlap, though allowed in XML:
<null/>
At first glance, the XML encoding appears more efficient, because
it saves 5 characters. In reality, that savings of network bandwidth
is in the noise performance-wise. When a typical page even over
a 56k modem can be 100k with additional gifs, 5 characters
doesn't matter. Burlap's use is for single subnets, usually
running at 100Mb or possibly 1Gb. The miniscule savings aren't worth
the complexity.
For Burlap's encoding it's easy to write an efficient parser, knowing
what kind of results are expected:
switch (is.readToken()) {
case NULL:
is.readEndTag("null");
break;
case STRING:
String v = is.readString();
is.readEndTag("string");
break;
...
}
With the XML short-tag "feature", the code would need to
return a different token for "null-with-trailing-slash", the test
suite would need to add cases and all servers and clients would
need to parse both (with possible buggy clients only understanding
one form). This example is relatively simple; the addition of
attributes and namespaces would increase the complexity.
XML has two main use patterns: a static syntax described by a DTD (document
type descriptor), and an extensible syntax using XML namespaces and
presumably defined by an XML schema. XML-RPC choose the DTD
and SOAP choose XML schema. Burlap choose the DTD.
A major advantage of the DTD choice is that any Burlap request or
response is verifiable without any external schema. The DTD defines
the entire grammar for Burlap. So a tester can use that grammar to validate
that a Burlap client or server is conformant. The grammar serves as
a testplan.
The schema route gives more flexibility without adding expressive power.
The same object might have two different serializations if the
XML schema differs. With Burlap or XML-RPC, the class has a
unique serialization. With SOAP the different representations leave open
the possibility of interoperability issues.
Burlap uses a URL to locate the server object. Usually, this will
be an HTTP url, but nothing in the Burlap protocol requires the
use of HTTP. Since URLs are sufficient and already familiar, it seems
obvious to use them.
An advantage of the particular encoding used for
EJB is that intelligent load balancers can use it to direct the
request to an owning server. It's not required that the URL for
Burlap servers follow the same conventions as the EJB encoding. In
the EJB encoding, the bean home interface is also a URL, easily
determined by removing the object identifier as the last name. This
makes getting meta information easier.
XML-RPC, in contrast, uses a combination of an HTTP URL and an
object identifier embedded in the request. The object identifier is
part of the <methodName> item.
<?xml version="1.0"?>
<methodCall>
<methodName>examples.getStateName</methodName>
<params>
<param>
<value><i4>41</i4></value>
</param>
</params>
</methodCall>
SOAP also uses the URL to identify the target object. The primary
model for SOAP appears to be more service-based rather than
object-based. In other words, it appears that most SOAP servers will
only have a single URL and not have separate URLs for object
instances. In constrast, Burlap heavily uses sub-URLs (path-info) to
identify object instances. Nothing in the SOAP protocol precludes
using sub-URLs, so this isn't a limitation of the SOAP spec. It just
appears to be counter to the culture of SOAP to use lots of URLs. As
the SOAP spec says, objects-by-reference are not part of
the SOAP specification.
Since EJB requires method overloading, Burlap supports it. Burlap
implements method overloading with "method name mangling". The type
of the method arguments becomes part of the method name. To support
simpler clients, like script-based clients, one of the methods will
also typically respond to an unadorned method.
<burlap:call>
<method>add_int_int</method>
<int>2</int>
<int>3</int>
</burlap:call>
Because Burlap's encoding is lossy, some overloaded methods are
indistinguishable. For example add(int) and add(short)
are both encoded as integer argument. So only the integer version
will be callable. The main reason for the lossy limitation is to
make Burlap less Java-dependent. int, long, and
double are supported by most languages. We expect this won't
be a large limitation. Few objects have methods with aliased methods.
XML-RPC doesn't specifically address this issue. Since its
method selection is very similar to Burlap's, name mangling would
be easy to implement in XML-RPC.
SOAP's method selection is very different. Each method has its
own XML element and namespace, presumably defined in some external
schema. It could also support overloading with mangled element
names. SOAP adds additional complexity to the method call with the
addition of a namespace. It's not clear what value this adds, other
than additional complications, more tests, and interoperability
issues.
Since a service/remote-object has a single set of methods,
selecting the object implicitly selects the method set or object
signature. In theory, namespaces would allow a client to choose a
different object signature, but it's not clear what application would
need that. Java and EJB doesn't need that complexity.
...
<m:add xmlns:m="http://www.caucho.com/soap/calculator">
<a>2</a>
<b>3</b>
</m:add>
...
Burlap method parameters are order-based like Java and essentially
every major programming language. The number of parameters is fixed
for each method. Varying-length parameters and extra parameters are
forbidden. It's easily tested, unambiguous, and it's easy to write a fast
parsers. The choice seems obvious and hardly worth discussion, but
SOAP dismissed position-based arguments.
SOAP's method arguments are named by scheme-defined elements and
can appear in any order:
...
<m:myMethodCall xmlns:m="my-namespace">
<z>13</z>
<a>4</a>
<m>13</m>
</m:myMethodCall>
For an EJB-implementation, the SOAP method arguments creates
several problems. What name should the first argument be given? Does
a client need an external schema or IDL to map the SOAP argument to
the function call? How intelligent does a client or server have to be
to handle all the possible argument name assignments? And how can all
this be tested? The SOAP designers must have had some reason for this
choice, but it directly opposes the needs of an EJB server and client.
Although most of Burlap's design avoids Java-specific requirements,
solving the inheritance problem requires the deserializer know what
Java class to create. Burlap's solution makes Java implementations
easy, but doesn't significantly impede other languages and is
no less language-independent than SOAP or XML-RPC.
Since Java is an object-oriented language, the type of a value may not
equal the declared type. For example, an Object field might contain a
Car object or a Truck object. To serialize the Car or Truck, Burlap
needs to add the object type in the protocol. XML-RPC, in contrast,
has no accompanying type information, so a Java XML-RPC implementation
can't to create a Car from information in the protocol. The Java serializer
also need to know if an array will be a Java array, Vector, or
ArrayList. That's done with the <type> tag in <list> and <map>.
<map>
<type>com.caucho.example.Car</type>
<string>color</string><string>red</string>
</map>
This is a thorny problem. The serializer must know the Java type,
but we'd prefer to keep Java-specific information out of the
protocol. Burlap's solution allows non-Java clients with a
little extra work, and makes Java implementations easy.
The heart of the problem how to map a type key in the
protocol to a language-specific type. That map might appear in a
repository, in some external schema like IDL, or it can be encoded
in the protocol itself for a specific language. Some simple clients
may not need this mapping. A simple client in Perl, JavaScript or
Java client might use generic types, like ArrayList and Hashtables to
read the serialized values. When they write request to a Java server,
they'll need to add the type, but protocol writing is much easier than
protocol parsing.
It's not clear whether SOAP uses an external schema or a
repository service (WSDL) to
map to language types, but it's clear something external is required.
The following is a typical SOAP serialization of the Car:
<e:Car xmlns:e="http://www.caucho.com/soap/cars.xml">
<color>red</color>
</e:Car>
How do you map from "e:Car" with namespace
"http://www.caucho.com/soap/cars" to the Java class
com.caucho.example.Car? Since that's not part of the SOAP spec, there
must be an additional spec defining that mapping. Since our
requirements required a testable implementation (repository is extra
work), no external IDL (so schema is out), the technique used by SOAP
was not appropriate for Burlap.
Burlap doesn't preclude clients in other languages from using
their own types. It would just require the addition of a repository
service or an external IDL. In other words, it's no more complicated
than the type lookup SOAP requires for every language. So Burlap
makes Java implementation easy, but doesn't make other languages hard.
When a Python client first encountered a "com.caucho.example.Car", it could
either use "Car" as a Python type, or query some yet-to-be-defined
service mapping to Python. In other words, it's a trade-off, but it's
unavoidable and we believe the solution is fair to other languages.
Shared references are an integral vital for serializing any
significant data structure. Serializing a tree, for example, needs to
link a node's children and the parent in a circular set of
references. Shared references are one of the necessary capabilities
missing from XML-RPC that Burlap adds.
Burlap implements shared references with an implicit array of all
<list> and <map> objects. Linking to an old object can then
just refer to the object's position in the array. An advantage of
Burlap's approach is that it only requires a single pass through the
object.
<map>
<type>com.muggle.Car</type>
<string>model</string>
<string>Ford Anglia</string>
</map>
...
<map>
<type>com.muggle.Person</type>
<string>name</string>
<string>Arthur Weasley</string>
<string>car</string>
<ref>1</ref>
</e:Person>
SOAP takes a different approach for shared references. Declaring an object
uses an id attribute and referring to the object uses a
href attribute:
<e:Car id="1">
<e:Model>Ford Anglia</e:Model>
</e:Car>
...
<e:Person>
<e:Name>Arthur Weasley</e:Name>
<e:Car href="1"/>
</e:Person>
The SOAP approach has the advantage that the receiving server
doesn't need to keep track of every object it's received. In
addition, it can refer to any object, including strings. With Burlap,
only <map> and <list> can share references, strings and byte
array can't be shared. As a disadvantage, SOAP requires two passes
over the object before sending it and SOAP's approach requires the
use of XML attributes.
The Burlap design aimed at reducing the testing and implementation
complexity, and should give decent performance. Because Burlap is a
wire protocol, EJB users don't care about the protocol details and we
could tailor Burlap to the specific requirements needed to support
EJB.
The small size of the Burlap specification should not be confused
with its expressive power. Burlap is fully capable of handling
Java/EJB calls. The added complexity and "flexibility" of a
spec like SOAP just introduces ambiguity and adds to the testing
requirements without adding expressive power.
Copyright (c) 1998-2009 Caucho Technology, Inc. All rights reserved. caucho® ,
resin® and
quercus®
are registered trademarks of Caucho Technology, Inc.
|
Copyright (c) 1998-2009 Caucho Technology, Inc. All rights reserved. caucho® ,
resin® and
quercus®
are registered trademarks of Caucho Technology, Inc.
|