Thick Servers
Halliburton Systems Inc - THEOS Networking Series

We've seen what a "thin" server looks like; now what is "thick" about a thick server?

Generally speaking, a thin server is indistinguishable to the application programmer from normal operating system function. A thick server is an attempt to move some of the logic you might be used to seeing in the application into the server. As an example, suppose you have a customer file that uses a customer number as the key and the company name as one of the data fields. In your application, you want to search for all companies that have "SMITH" anywhere in the company name field. A normal THEOS application would use readn() to read all the records in the file and then look at each record to see if it has a "SMITH" anywhere.

The drawback to this approach when using network-connected database servers is that there is a lot of unnecessary network traffic associated with transmitting all the records that *don't* have a "SMITH" in the name field. On our network, the best round-trip time we have seen for reading a single record is 0.7 ms; figure in a fudge factor and call it 1 ms per I/O. This means that the best performance we could expect on the search is a sustained rate of about 1000 records per second. Various real-life factors conspire to keep the actual performance below this. On our single-machine applications, we would expect to see performance on the scale of 4000 records/second or more, so this is a serious difference. In a multi-user case the network congestion can get so bad that total I/O bandwidth is much less than the 1000 records/second level. On a network with complicated routing, WAN bridges, slower PPP links, and other such bottlenecks, performance could easily drop below 100 records/second.
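To put those rates in perspective, here is a rough back-of-the-envelope calculation in Python. The rates are the ones quoted above; the file size of 100,000 records and the function names are assumptions chosen only for illustration.

```python
# Back-of-the-envelope scan times using the rates quoted in the text.
# RECORDS is an assumed file size for illustration, not a measurement.

def rate_from_round_trip(milliseconds_per_io):
    """Records per second when each record costs one network round trip."""
    return 1000.0 / milliseconds_per_io

def scan_seconds(record_count, records_per_second):
    """Time needed to read every record once at a sustained rate."""
    return record_count / records_per_second

RECORDS = 100_000                            # assumed file size
NETWORK_RATE = rate_from_round_trip(1.0)     # 1 ms per I/O
LOCAL_RATE = 4000.0                          # single-machine figure from the text
SLOW_LINK_RATE = 100.0                       # congested or WAN-routed network

network_time = scan_seconds(RECORDS, NETWORK_RATE)
local_time = scan_seconds(RECORDS, LOCAL_RATE)
slow_link_time = scan_seconds(RECORDS, SLOW_LINK_RATE)

print("network scan:   %.0f seconds" % network_time)
print("local scan:     %.0f seconds" % local_time)
print("slow-link scan: %.0f seconds" % slow_link_time)
```

Under these assumptions the full scan takes four times as long over the network as it does locally, and forty times as long over the slow link.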

Suppose we make the data server a little "thicker" so that all we need to do is send it a request that says "return the (next) customer record that has 'SMITH' in the name field". The server can read records at the higher rate and return over the network only records that are of interest to the application. The performance improvement from the network standpoint is considerable.
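The difference can be sketched with a small simulation. This is not THEOS code; the class names, method names, and sample data below are hypothetical, and the "network" is just a counter that goes up by one for every request the client sends.

```python
# Simulated thin and thick servers; round_trips counts client requests.
# All names and data here are hypothetical stand-ins for illustration.

CUSTOMERS = {
    1001: "SMITH HARDWARE",
    1002: "JONES SUPPLY",
    1003: "ACME TOOLS",
    1004: "BLACKSMITH FORGE",
    1005: "GARDEN PAINT",
}

class ThinServer:
    """Returns one record per request, whether or not it is wanted."""
    def __init__(self, records):
        self.records = records
        self.keys = sorted(records)
        self.round_trips = 0

    def read_next(self, position):
        self.round_trips += 1
        if position >= len(self.keys):
            return None                      # end of file
        key = self.keys[position]
        return key, self.records[key]

class ThickServer(ThinServer):
    """Scans locally and returns only the next matching record."""
    def find_next(self, position, text):
        self.round_trips += 1
        for index in range(position, len(self.keys)):
            key = self.keys[index]
            if text in self.records[key]:
                return index + 1, key, self.records[key]
        return None                          # no further matches

def thin_search(server, text):
    matches, position = [], 0
    while True:
        record = server.read_next(position)
        if record is None:
            return matches
        if text in record[1]:                # filtering happens in the client
            matches.append(record)
        position += 1

def thick_search(server, text):
    matches, position = [], 0
    while True:
        result = server.find_next(position, text)
        if result is None:
            return matches
        position, key, name = result         # filtering happened on the server
        matches.append((key, name))

thin = ThinServer(CUSTOMERS)
thick = ThickServer(CUSTOMERS)
thin_matches = thin_search(thin, "SMITH")
thick_matches = thick_search(thick, "SMITH")
print(thin.round_trips, thick.round_trips)
```

Both searches return the same two records. The thin client makes one request per record in the file plus one to detect end of file, while the thick client makes one request per match plus one to learn there are no more.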

So what's the catch here? The answer is that the server becomes very complicated very quickly. In effect, we could move the entire application to the server logic and just use the client machine as a dumb terminal. Sound familiar? It should: this is the difference between THEOS and Netware that we have all been selling and exploiting for years.

In an attempt to strike a compromise position between the thinnest and thickest servers, database systems like SQL define a language for specifying requests from the client to the server. This looks almost like a programming language, so the request packets are actually text versions of little programs. The resulting flexibility allows almost any type of request to be specified. The drawback is that we now have yet another language interpreter in action, with the usual reduced efficiency. The database design is pretty much fixed by the server design, with "tables" the current fashion. A "table" is very much like a THEOS keyed file (not ISAM since they are usually not ordered), so access to individual records is fairly efficient. The "rows" are THEOS records, and the "columns" are variable names or fields in the record layout.

The internals of an SQL server look a lot like a THEOS application in terms of the record-level I/O that is done. To satisfy the request for the "SMITH" records, the server would have to read all the customer records and return the ones that matched. In terms of performance over the network, an SQL-type server will usually beat a "thin" server on complicated requests; it will in turn be beaten badly by a well-designed "very thick" server that knows exactly how to satisfy a particular request. The main failure of the SQL design is that there is usually no way to take advantage of special situations known to the programmer; there is also a limit to the complexity of requests. If we wanted only records with "SMITH" that also were in New Jersey, we might have to read all the "SMITH" records and look for the "NJ" in the application, or vice-versa. Depending on the relative number of "SMITH" records to "NJ" records, one way might be much faster than the other using SQL, but the distinction does not exist in a "very thick" server.
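The "SMITH" in New Jersey example can be illustrated by counting how many records cross the network under each strategy. The sample data, field names, and helper functions below are hypothetical; the sketch models a server limited to one condition per request against a "very thick" server that applies both conditions itself.

```python
# Count records sent over the network for three ways of answering
# "companies with SMITH in the name that are in NJ".
# Data and function names are hypothetical, for illustration only.

CUSTOMERS = [
    {"number": 1, "name": "SMITH HARDWARE", "state": "NJ"},
    {"number": 2, "name": "JONES SUPPLY", "state": "NJ"},
    {"number": 3, "name": "SMITH & SONS", "state": "NY"},
    {"number": 4, "name": "ACME TOOLS", "state": "NJ"},
    {"number": 5, "name": "BLACKSMITH FORGE", "state": "PA"},
    {"number": 6, "name": "GARDEN STATE PAINT", "state": "NJ"},
]

def server_by_name(records, text):
    """Server applies only the name condition."""
    return [r for r in records if text in r["name"]]

def server_by_state(records, state):
    """Server applies only the state condition."""
    return [r for r in records if r["state"] == state]

def server_by_name_and_state(records, text, state):
    """A very thick server applies both conditions before replying."""
    return [r for r in records if text in r["name"] and r["state"] == state]

# Strategy A: ask for SMITH records, check the state in the application.
sent_a = server_by_name(CUSTOMERS, "SMITH")
result_a = [r["number"] for r in sent_a if r["state"] == "NJ"]

# Strategy B: ask for NJ records, check the name in the application.
sent_b = server_by_state(CUSTOMERS, "NJ")
result_b = [r["number"] for r in sent_b if "SMITH" in r["name"]]

# Strategy C: the server does all the filtering.
sent_c = server_by_name_and_state(CUSTOMERS, "SMITH", "NJ")
result_c = [r["number"] for r in sent_c]

print(len(sent_a), len(sent_b), len(sent_c))
```

All three strategies produce the same answer. Which of A or B is cheaper depends entirely on the data, which is the point made above; only C avoids the question.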

Are thick servers always better than thin ones? Nothing in life is ever that simple, and programming is no exception. Some applications never have to search for records; they always know the proper key to use. In this case, the overhead of request languages, larger server programs, and complicated request-processing logic is all wasted. A very thin server will respond much more quickly.

One final note on network database design: up until now we have been considering the usual case of having all the data stored in one place with a single server. There is nothing to stop us from distributing the database in one of two ways. The first is to keep data that is more often used from a particular machine on that machine. This means that only occasional requests will go via the network; most will be satisfied locally. An example of this might be the two-store inventory database mentioned earlier. We had one case in which the headquarters machine received only summary sales information from the remote locations, with updates being sent once a day. Customer and invoice records were transmitted only if a refund was necessary since all refund checks were written on the headquarters machine.

The other case is data with a very high read:write ratio. The solution to this might be to keep a copy of the data on each machine and send all updates to all machines. One of our web server applications works this way because we have about 100,000 data records that are searched thousands of times per day, but only about 200 records are updated per day. The updates occupy a fraction of a percent of total I/O, so read performance is at a premium. Since there are two parallel machines to handle requests, they both have complete databases and update each other on writes. There is also an inherent backup here; if one machine fails the other simply handles all requests until the failing machine is back online. Update transactions are spooled to disk on the working machine. The two machines can be connected by a slow link and still perform quite well.
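The update-each-other-and-spool arrangement can be sketched roughly as follows. This is a minimal illustration, not our production code: the Node class and its methods are hypothetical, the spool is an in-memory list standing in for a disk file, and conflicting writes made on both machines during an outage are not handled.

```python
# Two replicas that forward every write to each other and spool updates
# while the peer is down. Hypothetical sketch; the spool list stands in
# for an on-disk transaction file.

class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}       # complete local copy of the database
        self.spool = []      # updates waiting for the peer to return
        self.online = True
        self.peer = None

    def read(self, key):
        """Reads are always satisfied locally; no network traffic."""
        return self.data.get(key)

    def write(self, key, value):
        """Apply the update locally, then forward it or spool it."""
        self.data[key] = value
        if self.peer is not None and self.peer.online:
            self.peer.data[key] = value
        else:
            self.spool.append((key, value))

    def replay_spool(self):
        """Send spooled updates, oldest first, once the peer is back."""
        if self.peer is None or not self.peer.online:
            return 0
        sent = 0
        while self.spool:
            key, value = self.spool.pop(0)
            self.peer.data[key] = value
            sent += 1
        return sent

a = Node("A")
b = Node("B")
a.peer, b.peer = b, a

a.write(1001, "SMITH HARDWARE")    # both machines get the update
b.online = False                   # machine B fails
a.write(1002, "JONES SUPPLY")      # spooled on machine A
spooled_during_outage = len(a.spool)
b.online = True                    # machine B comes back
replayed = a.replay_spool()
print(replayed, b.read(1002))
```

Because reads never leave the local machine, the speed of the link between the two machines matters only for the small volume of updates.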


Have any questions? E-mail us at info@hsix.com

