Note: The left side frame may take a few seconds (< 30) to come up. Please be patient.
This web page presents a simple tool to compare the interprocess communication overhead of a variety of distributed computing technologies. The tests are conducted as follows. Every test involves two processes (in the Unix sense): roughly speaking, one is a client and the other is a server. The client initiates a request to transmit a number of bytes to the server. The call is synchronous, and the client waits for the request to complete. The server performs no processing; it merely returns, and the request completes as the reply winds back to the client process. The duration of this round-trip invocation is measured using gettimeofday(). For Java applications, gettimeofday() is called via JNI, since the standard Java clock routine does not have the resolution of gettimeofday() (at least through Java 1.3+). The overhead of calling gettimeofday() is approximately 0.5 usec for C/C++ programs and approximately 0.66 usec for Java programs using JNI. The round-trip times are recorded in a histogram; this record operation adds an additional overhead of approximately 0.25 usec per sample. These overheads are small compared to the intervals we are measuring. (Possible exceptions are some shared memory tests, where round-trip latency values are in the 5-6 usec range; there, an overhead of ~1 usec starts to be large enough to worry somewhat. In most cases, however, we are measuring intervals in the tens, if not hundreds, of usec, and the measurement and recording overheads are negligible.)
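The measurement loop described above can be sketched roughly as follows. This is a hypothetical illustration in Python, not the actual C/C++/Java harness: the real tests use gettimeofday() (via JNI for Java), and the payload sizes and iteration counts here are arbitrary placeholders.

```python
# Sketch of the synchronous round-trip measurement: a forked child process
# echoes each request, and the parent records per-round-trip latencies.
# (Illustrative only; the real harness uses gettimeofday() and a histogram.)
import os
import socket
import time

def measure_roundtrips(payload_size=64, iters=200):
    """Echo `payload_size` bytes between two processes; return (min, mean, max) usec."""
    parent, child = socket.socketpair()
    pid = os.fork()
    if pid == 0:                        # server process: echo each request back
        parent.close()
        while True:
            data = child.recv(65536)
            if not data:                # client closed its end; we are done
                os._exit(0)
            child.sendall(data)
    child.close()
    msg = b"x" * payload_size
    samples = []
    for _ in range(iters):
        t0 = time.perf_counter()        # stands in for gettimeofday()
        parent.sendall(msg)             # client initiates the request ...
        buf = b""
        while len(buf) < payload_size:  # ... and synchronously waits for the reply
            buf += parent.recv(65536)
        samples.append((time.perf_counter() - t0) * 1e6)  # usec
    parent.close()
    os.waitpid(pid, 0)
    return min(samples), sum(samples) / len(samples), max(samples)

if __name__ == "__main__":
    lo, mean, hi = measure_roundtrips()
    print("min=%.1f mean=%.1f max=%.1f usec" % (lo, mean, hi))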
To the extent possible we have tried to stick to our central notion: measuring average and worst-case latencies for transporting n bytes from process A to process B. This holds even when we test database technologies. In such tests, "n" bytes are first sent from process A to a database. Then process B is notified (often via a byte sent down a socket to wake it up) that it should go read the "n" bytes from the database. Upon completing this read, process B signals back to process A that the read is done, which completes one cycle of sending information from process A to process B. Naturally, the use of a 1-byte socket write from process A to process B to wake it up adds some extra overhead to this measurement. However, measurements show that this overhead is less than 5% of the overall time measured. Thus, while not perfect, this technique lets us quantify the latency of exchanging "n" bytes between processes via a datastore.
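One cycle of this datastore exchange might look like the following sketch. It is hypothetical: a plain file stands in for the database, both "processes" are collapsed into one for brevity, and the function and path names are inventions for illustration; the real tests use two separate processes and an actual database.

```python
# Sketch of one A -> B cycle via a datastore: A writes n bytes to the store,
# pokes B with a 1-byte socket write, B reads the store and signals back.
import os
import socket
import tempfile

def datastore_roundtrip(payload, store_path, a_sock, b_sock):
    """One cycle of sending `payload` from "process A" to "process B" via a file."""
    with open(store_path, "wb") as f:   # A: write the n bytes to the "database"
        f.write(payload)
    a_sock.sendall(b"\x01")             # A: 1-byte socket write to wake B up
    b_sock.recv(1)                      # B: wakes on the notification byte ...
    with open(store_path, "rb") as f:   # ... and reads the n bytes back out
        data = f.read()
    b_sock.sendall(b"\x02")             # B: signal A that the read is done
    a_sock.recv(1)                      # A: cycle complete
    return data

if __name__ == "__main__":
    a, b = socket.socketpair()
    fd, path = tempfile.mkstemp()
    os.close(fd)
    print(datastore_roundtrip(b"hello", path, a, b))
    os.unlink(path)
```

Timing this whole cycle (minus the wakeup byte, whose cost is under 5% as noted above) is what yields the datastore latency figures.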
We use three categories of platforms to do our tests.
To repeat, where the term "two_hosts" is used it signifies that some network communication (10 Mbps, 100 Mbps, or 1000 Mbps Ethernet, or some other network technology) is involved. Where "one_host" is used it means that both the sending and the receiving processes are on the same host. This host may be a uniprocessor or an SMP, but it is basically under one kernel.
Some of our tests are conducted using University of Utah's Emulab computers.
For synchronous tests, in which the client makes a request and waits on the server, uniprocessor architectures are expected to behave somewhat similarly to SMP architectures. A major difference may be that each processor in an SMP has its own cache, and this could lead to a significant benefit.
These two test configurations are shown in the following figures.
Under each hardware setup there are multiple categories of tests. These include various network transports (only TCP for now, but SCTP and others to follow), a variety of CORBA ORBs, some CORBA Components implementation measurements, RMI, and some EJB tests.
Most of the ORBs are tested with 3 different types of "in" argument in the method call.
Some results have "_opt" or "_default" tagged to them. This signifies whether a svc.conf file providing various optimizations was applied; this is especially true for many of our TAO tests. "Default" means no svc.conf file was used.
As the number of available results has grown, we have added a "filter" capability that can be used to obtain a subset of the results. This can often be of considerable use. Please follow the "filter" link near the top of the left frame for an explanation of this feature.
The above procedure will generate a graph showing curves of "mean" round-trip latency values. It is often useful to see the range of observations; by this we mean the (min, mean, max) values shown as an error bar. (We use gnuplot to generate these plots.) To see the ranges, select the "Range (< 5)" check box near the top. The "(< 5)" means that range values can be shown for at most 4 different curves; beyond that the graph becomes very cluttered. (This restriction may be relaxed in the future.)
Besides the "mean" value it is sometimes useful to plot just either the minimum or the maximum values for a test. Checkboxes are provided near the top of the left hand panel to make these selections.
By default, the CGI script permits gnuplot to auto-select the Y-axis range. However, the ymin and ymax values can be set manually at the bottom of the left frame (the frame showing the check boxes for the possible selections). If you wish to specify only one of ymin or ymax, you may enter a "*" for the other, which permits gnuplot to auto-select that value. For example, ymin=* and ymax=3000 limits the Y-axis maximum to 3000 while permitting gnuplot to auto-select ymin based on the values being graphed.
After you hit the "plot" button you should see a graphic on the right hand side panel. This is a .png file generated by gnuplot. Plots are ordered from low latency values to high latency values. Each of the individual graphs are labeled. In order to keep label strings as compact as possible a bit of processing is done and common elements of the graph labels are factored out and shown in the overall title at the top of the graphic. For example, if you plot the graphs:
smp/orb/mico/2.3.6 and smp/orb/tao/1.2
the common text in the title will be
smp/orb/./.

where the "." marks an uncommon element. The titles of the two curves shown will be "mico/2.3.6" and "tao/1.2". Non-leading "." elements in the legend for each curve are shown and correspond to the common element at the same relative position in the title.
Besides making the legend text more compact, this processing has the added benefit of showing what is common to all the data shown.
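The factoring described above can be sketched as follows. This is a hypothetical reimplementation for illustration (the function name and the exact handling of labels with differing depths are our assumptions, not the site's actual CGI code): positions where every label agrees go into the shared title, and each curve's legend drops leading common elements and replaces interior common ones with ".".

```python
def factor_labels(labels):
    """Factor slash-separated labels into a shared title and per-curve legends.

    Positions common to all labels appear in the title; uncommon positions
    appear as "." in the title and are spelled out in each legend.
    """
    split = [lab.split("/") for lab in labels]
    n = min(len(parts) for parts in split)   # only compare positions present everywhere
    is_common = [len({parts[i] for parts in split}) == 1 for i in range(n)]
    title = "/".join(split[0][i] if is_common[i] else "." for i in range(n))
    legends = []
    for parts in split:
        start = 0
        while start < n and is_common[start]:
            start += 1                       # leading common elements are dropped
        legends.append("/".join(
            parts[i] if i >= n or not is_common[i] else "."
            for i in range(start, len(parts))))
    return title, legends

if __name__ == "__main__":
    title, legends = factor_labels(["smp/orb/mico/2.3.6", "smp/orb/tao/1.2"])
    print(title)    # smp/orb/./.
    print(legends)  # ['mico/2.3.6', 'tao/1.2']
```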
Some of the interprocess latency tests are conducted with varying types of load on the computers involved in the test. Basically there can be 4 test conditions.
We show below some of the pre-generated graphs with brief discussion. The purpose of this tool, we hope, is to permit readers to generate graphs making the comparisons that interest them.
As it turns out, intra-container invocations in JBOSS (at least in the 2.4.x series) merely transfer the byte array by reference. Thus the round-trip latency costs remain flat for all sizes of the byte array (which are plotted along the x-axis). This "by reference" semantics is obviously different from the "copy" semantics used for parameter passing between a client and a bean when they are in two separate processes.
To approximate the "copy" semantics, we also show the results for intra-container invocations where the receiving bean makes a copy of the byte array being sent to it.
We see that this curve, shown in blue, essentially tracks the JBOSS client-to-bean (two-process) curve, which is shown in brown.
We also show the shared-memory communication curve, which forms the bound that the intra-container-with-copying curve approaches.