Q1.10: What is a SQL Server?


Overview

Before Sybase System 10 (as they call it) we had Sybase 4.x. Sybase System 10 has some significant improvements over Sybase 4.x product line. Namely:

Background and More Terminology

A SQL Server is simply a Unix process. It is also known as the database engine. It has multiple threads to handle asynchronous I/O and other tasks. The number of threads spawned is the number of engines (more on this in a second) times five. This is the current implementation of Sybase System 10, 10.0.1 and 10.0.2 on IRIX 5.3.

Each SQL dataserver allocates the following resources from a host machine:

Each SQL dataserver can have up to 255 databases. In most implementations the number of databases is limited to what seems reasonable based on the load on the SQL dataserver. That is, it would be impractical to house all of a large company's databases under one SQL dataserver because the SQL dataserver (a Unix process) will become overloaded.

That's where the DBA's experience comes in with interrogation of the user community to determine how much activity is going to result on a given database or databases and from that we determine whether to create a new SQL Server or to house the new database under an existing SQL Server. We do make mistakes (and businesses grow) and have to move databases from one SQL Server to another. And at times SQL Servers need to move from one CPU server to another.

With Sybase System 10, each SQL Server can be configured to have more than one engine (each engine is again a Unix process). There's one primary engine that is the master engine and the rest of the engines are subordinates. They are assigned tasks by the master.

Interprocess communication among all these engines is accomplished with shared memory.

Some times when a DBA issues a Unix kill command to extinguish a maverick SQL Server, the subordinate engines are forgotten. This leaves the shared memory allocated and eventually we may get in to situations where swapping occurs because this memory is locked. To find engines that belong to no master SQL Server, simple look for engines owned by /etc/init (process id 1). These engines can be killed -- this is just FYI and is a DBA duty.

Before presenting an example of a SQL Server, some other topics should be covered.

Connections

A SQL Server has connections to it. A connection can be viewed as a user login but it's not necessarily so. That is, a client (a user) can spark up multiple instances of their application and each client establishes its own connection to the SQL dataserver. Some clients may require two or more per invocation. So typically DBA's are only concerned with the number of connections because the number of users typically does not provide sufficient information for us to do our job.
Connections take up SQL Server resources, namely memory, leaving less memory for the SQL Servers' available cache.

SQL Server Buffer Cache

In Sybase 4.0.1 there was a limit to the amount of memory that could be allocated to a SQL Server. It was around 80MB, with 40MB being the typical max. This was due to internal implementations of Sybase's data structures.

With Sybase System 10 there really is no limit. For instance, we have a SQL Server cranked up to 300MB.

The memory in a SQL Server is primarily used to cache data pages from disk. Consider that the SQL Server is a light weight Operating System: handling user (connections), allocating memory to users, keeping track of which data pages need to be flushed to disk and the sort. Very sophisticated and complex. Obviously if a data page is found in memory it's much faster to retrieve than going out to disk.

Each connection takes away a little bit from the available memory that is used to cache disk pages. Upon startup, the SQL Server pre-allocates the memory that is needed for each connection so it's not prudent to configure 500 connections when only 300 are needed. We'd waste 200 connections and the memory associated with that. On the other hand, it is also imprudent to under configure the number of connections; users have a way of soaking up a resource (like a SQL Server) and if users have all the connections a DBA cannot get into the server to allocate more connections.

One of the neat things about a SQL Server is that it reaches (just like a Unix process) a working set. That is, upon startup it'll do a lot of physical I/O's to seed its cache, to get lookup information for typical transactions and the like. So initially, the first users have heavy hits because their requests have to be performed as a physical I/O. Subsequent transactions have less physical I/O and more logical I/O's. Logical I/O is an I/O that is satisfied in the SQL Servers' buffer cache. Obviously, this is the preferred condition.

DSS vs OLTP

We throw around terms like everyone is supposed to know this high tech lingo. The problem is that they are two different animals that require a SQL Server to be tuned accordingly for each.

Well, here's the low down.

DSS
Decision Support System
OLTP
Online Transaction Processing

What do these mean? OLTP applications are those that have very short orders of work for each connection: fetch this row and with the results of it update one or two other rows. Basically, small number of rows affected per transaction in rapid sucession, with no significant wait times between operations in a transaction.

DSS is the lumbering elephant in the database world (unless you do some tricks... out of this scope). DSS requires a user to comb through gobs of data to aggregate some values. So the transactions typically involve thousands of rows. Big difference than OLTP.

We never want to have DSS and OLTP on the same SQL Server because the nature of OLTP is to grab things quickly but the nature of DSS is to stick around for a long time reading tons of information and summarizing the results.

What a DSS application does is flush out the SQL Server's data page cache because of the tremendous amount of I/O's. This is obviously very bad for OTLP applications because the small transactions are now hurt by this trauma. When it was only OLTP a great percentage of I/O was logical (satisfied in the cache); now transactions must perform physical I/O.

That's why it's important in Sybase not to mix DSS and OLTP, at least until System 11 arrives.

Sybase System 11 release will allow for the mixing of OLTP and DSS by allowing the DBA to partition (and name) the SQL Server's buffer cache and assign it to different databases and/or objects. The idea is to allow DSS to only affect their pool of memory and thus allowing OLTP to maintain its working set of memory.

Asynchronous I/O

Why async I/O? The idea is in a typical online transaction processing (OLTP) application you have many connections (over 200 connections) and short transactions: get this row, update that row. These transactions are typically spread across different tables of the databases. The SQL Server can then perform each one of these asynchronously without having to wait for others to finish. Hence the importance of having async I/O fixed on our platform.

Engines

Sybase System 10 can have more than one engine (as stated above). Sybase has trace flags to pin the engines to a given CPU processor but we typically don't do this. It appears that the master engine goes to processor 0 and subsequent subordinates to the next processor.

Currently, Sybase does not scale linearly. That is, five engines doesn't make Sybase perform five times as fast however we do max out with four engines. After that, performs starts to degrade. This is supposed to be fixed with Sybase System 11.

Putting Everything Together

As previously mentioned, a SQL Server is a collection of databases with connections (that are the users) to apply and retrieve information to and from these containers of information (databases).

The SQL Server is built and its master device is typically built over a medium sized (50MB) raw partition. The tempdb is built over a cooked (regular - as opposed to a raw device) file system to realize any performance gains by buffered writes. The databases themselves are built over the raw logical devices to ensure their integrity.

Physical and Logical Devices

Sybase likes to live in its own little world. This shields the DBA from the outside world known as Unix (or VMS). However, it needs to have a conduit to the outside world and this is accomplished via devices.

All physical devices are mapped to logical devices. That is, given a physical device (such as /lv1/dumps/tempdb_01.efs or /dev/rdsk/dks1ds0) it is mapped by the DBA to a logical device. Depending on the type of the device, it is allocated, by the DBA, to the appropriate place (vague enough?).

Okay, let's try and clear this up...

Dump Device

The DBA may decide to create a device for dumping the database nightly. The DBA needs to create a dump device.

We'll call that logically in the database datadump_for_my_db but we'll map it to the physical world as /lv1/dumps/in_your_eye.dat So the DBA will write a script that connects to the SQL Server and issues a command like this:

dump database my_stinking_db to datadump_for_my_db
go

and the backupserver (out of this scope) takes the contents of my_stinking_db and writes it out to the disk file /lv1/dumps/in_your_eye.dat

That's a dump device. The thing is that it's not preallocated. This special device is simply a window to the operating system.

Data and Log Devices

Ah, now we are getting into the world of pre-allocation. Databases are built over raw partitions. The reason for this is because Sybase needs to be guaranteed that all its writes complete successfully. Otherwise, if it posted to a file system buffer (as in a cooked file system) and the machine crashed, as far as Sybase is concerned the write was committed. It was not, however, and integrity of the database was lost. That is why Sybase needs raw partitions. But back to the matter at hand...

When building a new SQL Server, the DBA determines how much space they'll need for all the databases that will be housed in this SQL Server.

Each production database is composed of data and log.

The data is where the actual information resides. The log are where the changes are kept. That is, every row that is updated/deleted/inserted gets placed into the log portion then applied to the data portion of the database.

That's why DBA strives to place the raw devices for logs on separate disks because everything has to single thread through the log.
A transaction is a collection of SQL statements (insert/delete/update) that are grouped together to form a single unit of work. Typically they map very closely to the business.

I'll quote the Sybase SQL Server System Administration guide on the role of the log:

The transaction log is a write-ahead log. When a user issues a statement that would modify the database, SQL Server automatically writes the changes to the log. After all changes for a statement have been recorded in the log, they are written to an in-cache copy of the data page. The data page remains in cache until the memory is needed for another database page. At that time, it is written to disk. If any statement in a transaction fails to complete, SQL Server reverses all changes made by the transaction. SQL Server writes an "end transaction" record to the log at the end of each transaction, recording the status (success or failure) of the transaction
As such, the log will grow as user connections affect changes to the database. The need arises to then clear out the log of all transactions that have been flushed to disk. This is performed by issuing the following command:
dump transaction my_stinking_db to logdump_for_my_db
go
The SQL Server will write to the dumpdevice all transactions that have been committed to disk and will delete the entries from its copy, thus freeing up space in the log. Dumping of the transaction logs is accomplished via cron. We schedule the heavily hit databases every 20 minutes during peak times.
A single user can fill up the log by having begin transaction with no corresponding commit/rollback transaction. This is because all their changes are being applied to the log as an open-ended transaction, which is never closed. This open-ended transaction cannot be flushed from the log, and therefore grows until it occupies all of the free space on the log device.
And the way we dump it is with a dump device. :-)

An Example

If the DBA has four databases to plop on this SQL Server and they need a total of 800MB of data and 100MB of log (because that's what really matters to us), then they'd probably do something like this:
  1. allocate sufficient raw devices to cover the data portion of all the databases
  2. allocate sufficient raw devices to cover the log portion of all the databases
  3. start allocating the databases to the devices.
For example, assuming the following database requirements:

Database Requirements
DB Data Log
a 300 30
b 400 40
c 100 10

and the following devices:

Devices
Logical Physical Size
dks3d1s2_data /dev/rdsk/dks3d1s2 500
dks4d1s2_data /dev/rdsk/dks4d1s2 500
dks5d1s0_log /dev/rdsk/dks5d1s0 200

then the DBA may elect to create the databases as follows:

create database a on dks3d1s2_data = 300 log on dks5d1s0_log = 30
create database b on dks4d1s2_data = 400 log on dks5d1s0_log = 40
create database c on dks3d1s2_data = 50, dks4d1s2_data = 50 log on dks5d1s0_log = 10
Some of the devices will have extra space available because out database allocations didn't use up all the space. That's fine because it can be used for future growth. While the Sybase SQL Server is running, no other Sybase SQL Server can re-allocate these physical devices.

TempDB

TempDB is simply a scratch pad database. It gets recreated when a SQL Server is rebooted. The information held in this database is temporary data. A query may build a temporary table to assist it; the Sybase optimizer may decide to create a temporary table to assist itself.

Since this is an area of constant activity we create this database over a cooked file system which has historically proven to have better performance than raw - due to the buffered writes provided by the Operating System.

Port Numbers

When creating a new SQL Server, we allocate a port to it (currently, DBA reserves ports 1500 through 1899 for its use). We then map a host name to the different ports: hera, fddi-hera and so forth. We can actually have more than one port number for a SQL Server but we typically don't do this.