
We've seen what a "thin" server looks like; now what is "thick"
about a thick server?
Generally speaking, a thin server is indistinguishable to the application
programmer from normal operating system functions. A thick server is
an attempt to move some of the logic you might be used to seeing in
the application into the server. As an example, suppose you have a
customer file that uses a customer number as the key and the company
name as one of the data fields. In your application, you want to
search for all companies that have "SMITH" anywhere in the company
name field. A normal THEOS application would use readn() to read
all the records in the file and then look at each record to see if
it has a "SMITH" anywhere.
The drawback to this approach when using network-connected database
servers is that there is a lot of unnecessary network traffic
associated with transmitting all the records that *don't* have
a "SMITH" in the name field. On our network, the best round-trip
time we have seen for reading a single record is 0.7 ms; figure in
a fudge factor and call it 1 ms per I/O. This means that the best
performance we could expect on the search is a sustained rate of
about 1000 records per second. Various real-life factors conspire
to keep the actual performance below this. On our single-machine
applications, we would expect to see performance on the scale of
4000 records/second or more, so this is a serious difference.
In a multi-user case the network congestion can get so bad that
total I/O bandwidth is much less than the 1000 records/second
level. On a network with complicated routing, WAN bridges,
slower PPP links, and other such bottlenecks, performance could
easily drop below 100/second.
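
To make the arithmetic concrete, here is a rough sketch in Python of
how long a full-file search takes at each of the rates quoted above.
The 100,000-record file size is an assumption chosen for illustration;
the rates are the round figures from the text, not new measurements.

```python
# Back-of-the-envelope scan times for a search that reads every record.
# RECORD_COUNT is an assumed file size; the rates are the round figures
# quoted in the text.
RECORD_COUNT = 100_000

RATES = {
    "single machine, local disk": 4000,   # records/second
    "network, best case (1 ms/IO)": 1000,
    "congested or routed network": 100,
}

def scan_seconds(records, records_per_second):
    """Time to read every record once at a sustained rate."""
    return records / records_per_second

for label, rate in RATES.items():
    print(f"{label:30s} {scan_seconds(RECORD_COUNT, rate):8.1f} s")
```

Under these assumptions the same search goes from under half a minute
locally to well over a quarter of an hour on a poor network.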
Suppose we make the data server a little "thicker" so that all we
need to do is send it a request that says "return the (next)
customer record that has 'SMITH' in the name field". The server
can read records at the higher rate and return over the network
only records that are of interest to the application. The performance
improvement from the network standpoint is considerable.
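
The difference can be sketched as a toy simulation that counts network
round trips. Everything here is hypothetical: ThinServer, ThickServer,
read_next() and find_next() are illustrative names, not THEOS calls,
and the "file" is an in-memory table of 1000 made-up records, 10 of
which contain "SMITH".

```python
# Toy model: count one round trip per request sent to the server.
# All class and method names are invented for this illustration.

def make_records():
    # 1000 customers; every 100th company name contains "SMITH".
    return sorted(
        (n, f"SMITH SUPPLY {n}" if n % 100 == 0 else f"COMPANY {n}")
        for n in range(1, 1001)
    )

class ThinServer:
    """Returns exactly one record per request, like a remote sequential read."""
    def __init__(self, records):
        self.records = records
        self.round_trips = 0

    def read_next(self, position):
        self.round_trips += 1
        if position < len(self.records):
            return self.records[position]
        return None  # end of file

class ThickServer:
    """Scans locally and sends back only the next matching record."""
    def __init__(self, records):
        self.records = records
        self.round_trips = 0

    def find_next(self, text, position):
        self.round_trips += 1
        for i in range(position, len(self.records)):
            if text in self.records[i][1]:
                return i + 1, self.records[i]
        return len(self.records), None  # no more matches

def search_thin(server, text):
    # The application reads everything and filters on the client.
    matches, pos = [], 0
    while True:
        rec = server.read_next(pos)
        if rec is None:
            return matches
        if text in rec[1]:
            matches.append(rec)
        pos += 1

def search_thick(server, text):
    # The application only ever sees matching records.
    matches, pos = [], 0
    while True:
        pos, rec = server.find_next(text, pos)
        if rec is None:
            return matches
        matches.append(rec)

thin, thick = ThinServer(make_records()), ThickServer(make_records())
assert search_thin(thin, "SMITH") == search_thick(thick, "SMITH")
print("thin round trips: ", thin.round_trips)
print("thick round trips:", thick.round_trips)
```

Both searches return the same 10 records, but in this model the thin
client pays for a round trip on every record in the file while the
thick client pays only for the matches plus one final "no more" reply.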
So what's the catch here? The answer is that the server becomes very
complicated very quickly. In effect, we could move the entire application
to the server logic and just use the client machine as a dumb terminal.
Sound familiar? It should... this is the difference between THEOS and
Netware that we have all been selling and exploiting for years.
In an attempt to strike a compromise position between the thinnest and
thickest servers, database systems like SQL define a language for
specifying requests from the client to the server. This looks almost
like a programming language, so the request packets are actually
text versions of little programs. The resulting flexibility allows
almost any type of request to be specified. The drawback is that we
now have yet another language interpreter in action, with the usual
reduced efficiency. The database design is pretty much fixed by the
server design, with "tables" the current fashion. A "table" is
very much like a THEOS keyed file (not ISAM since they are usually
not ordered), so access to individual records is fairly efficient.
The "rows" are THEOS records, and the "columns" are variable names
or fields in the record layout.
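
As a small illustration of the table/row/column mapping, the sketch
below uses Python's built-in sqlite3 module as a stand-in for an SQL
server (SQLite runs in-process, so no network is involved here). The
table name "customer", its column names, and the sample data are all
assumptions made up for this example.

```python
import sqlite3

# In-memory database standing in for the SQL server.
conn = sqlite3.connect(":memory:")

# The "table" plays the role of the keyed customer file: cust_no is
# the key, name is one of the data fields.
conn.execute(
    "CREATE TABLE customer (cust_no INTEGER PRIMARY KEY, name TEXT)"
)
conn.executemany(
    "INSERT INTO customer (cust_no, name) VALUES (?, ?)",
    [
        (1001, "SMITH HARDWARE"),
        (1002, "ACME SUPPLY"),
        (1003, "JONES AND SMITH LLC"),
        (1004, "BLACKSMITH FORGE"),
        (1005, "NORTHERN TOOL"),
    ],
)

# The request is a small text program sent to the server, which
# interprets it and returns only the matching rows.
request = (
    "SELECT cust_no, name FROM customer "
    "WHERE name LIKE '%SMITH%' ORDER BY cust_no"
)
rows = conn.execute(request).fetchall()
for cust_no, name in rows:
    print(cust_no, name)
```

Only the three rows containing "SMITH" come back to the application;
the other two never leave the database engine.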
The internals of an SQL server look a lot like a THEOS application
in terms of the record-level I/O that is done. To satisfy the
request for the "SMITH" records, the server would have to read
all the customer records and return the ones that matched. In
terms of performance over the network, an SQL-type server will
usually beat a "thin" server on complicated requests; it will
in turn be beaten badly by a well-designed "very thick" server
that knows exactly how to satisfy a particular request. The
main failure of the SQL design is that there is usually no way
to take advantage of special situations known to the programmer;
there is also a limit to the complexity of requests. If we wanted
only records with "SMITH" that also were in New Jersey, we might
have to read all the "SMITH" records and look for the "NJ" in
the application, or vice-versa. Depending on the relative
number of "SMITH" records to "NJ" records, one way might be much
faster than the other using SQL, but the distinction does not
exist in a "very thick" server.
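
A rough way to see why the order matters is to count how many records
cross the network under each strategy. The sketch below is a toy model
with made-up data (1000 records, 50 with "SMITH", 333 in "NJ", 16 with
both); fetch() is a hypothetical stand-in for a single-condition
server request, not a real API.

```python
# Hypothetical customer file: customer number -> (name, state).
RECORDS = {
    n: (f"SMITH CO {n}" if n % 20 == 0 else f"COMPANY {n}",
        "NJ" if n % 3 == 0 else "NY")
    for n in range(1, 1001)
}

def fetch(predicate):
    """Stand-in for a server request; the result is what crosses the network."""
    return [rec for rec in RECORDS.values() if predicate(rec)]

def has_smith(rec):
    return "SMITH" in rec[0]

def in_nj(rec):
    return rec[1] == "NJ"

# Strategy 1: ask for SMITH, check the state in the application.
smith_first = fetch(has_smith)
result_1 = [rec for rec in smith_first if in_nj(rec)]

# Strategy 2: ask for NJ, check the name in the application.
nj_first = fetch(in_nj)
result_2 = [rec for rec in nj_first if has_smith(rec)]

# "Very thick" server: both tests run on the server in one pass.
both = fetch(lambda rec: has_smith(rec) and in_nj(rec))

assert result_1 == result_2 == both
print("SMITH first:", len(smith_first), "records transferred")
print("NJ first:   ", len(nj_first), "records transferred")
print("very thick: ", len(both), "records transferred")
```

All three approaches produce the same answer; they differ only in how
many records are shipped to the application to get it.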
Are thick servers always better than thin ones? Nothing in life is
ever that simple, and programming is no exception. Some applications
never have to search for records; they always know the proper key
to use. In this case, the overhead of request languages, larger
server programs, and complicated request-processing logic is all
wasted. A very thin server will respond much more quickly.
One final note on network database design: up until now we have been
considering the usual case of having all the data stored in one
place with a single server. There is nothing to stop us from
distributing the database in one of two ways. The first is to keep
data that is more often used from a particular machine on that
machine. This means that only occasional requests will go via the
network; most will be satisfied locally. An example of this might
be the two-store inventory database mentioned earlier. We had one
case in which the headquarters machine received only summary sales
information from the remote locations, with updates being sent once
a day. Customer and invoice records were transmitted only if a
refund was necessary since all refund checks were written on the
headquarters machine.
The other case is data with a very high read:write ratio. The solution
to this might be to keep a copy of the data on each machine and send
all updates to all machines. One of our web server applications works
this way because we have about 100,000 data records that are searched
thousands of times per day, but only about 200 records are updated
per day. The updates occupy a fraction of a percent of total I/O, so read
performance is at a premium. Since there are two parallel machines to
handle requests, they both have complete databases and update each
other on writes. There is also an inherent backup here; if one
machine fails, the other simply handles all requests until the failing
machine is back online. Update transactions are spooled to disk
on the working machine. The two machines can be connected by a
slow link and still perform quite well.
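
The two-machine arrangement can be sketched roughly as follows. This
is a minimal model, not our production code: the Node class and its
method names are invented for illustration, the spool is an in-memory
list rather than a disk file, and there is no handling of conflicting
writes made on both machines at once.

```python
class Node:
    """One of two machines, each holding a complete copy of the data."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.peer = None
        self.online = True
        self.spool = []  # updates the peer has missed (disk in real life)

    def read(self, key):
        # Reads are always local, so they never touch the link.
        return self.data.get(key)

    def write(self, key, value):
        # Apply locally, then forward to the peer or spool for later.
        self.data[key] = value
        if self.peer.online:
            self.peer.apply(key, value)
        else:
            self.spool.append((key, value))

    def apply(self, key, value):
        # Update received from the other machine.
        self.data[key] = value

    def peer_recovered(self):
        # Replay spooled updates, in order, once the peer is back.
        for key, value in self.spool:
            self.peer.apply(key, value)
        self.spool.clear()

a, b = Node("A"), Node("B")
a.peer, b.peer = b, a

a.write(1, "first")    # both machines now hold record 1
b.online = False       # machine B fails
a.write(2, "second")   # A keeps working and spools the update
b.online = True        # B comes back
a.peer_recovered()     # B catches up from A's spool
print(b.read(1), b.read(2))
```

Because only writes cross the link, and writes are a tiny share of the
traffic, the link speed has little effect on read performance.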
Have any questions?
E-mail us at info@hsix.com