Q8.1: Sybase SQL Server Performance and Tuning
All Components Affect Response Time & Throughput
We often think that high performance is defined as a fast
data server, but the picture is not that simple. Performance
is determined by all these factors:
- The client application itself:
- How efficiently is it written?
- We will return to this later, when we look at application tuning.
- The client-side library:
- What facilities does it make available to the application?
- How easy are they to use?
- The network:
- How efficiently is it used by the client/server connection?
- The DBMS:
- How effectively can it use the hardware?
- What facilities does it supply to help build efficient fast
applications?
- The size of the database:
- How long does it take to dump the database?
- How long to recreate it after a media failure?
Unlike some products which aim at performance on paper,
Sybase aims at solving the multi-dimensional problem of
delivering high performance for real applications.
OBJECTIVES
To gain an overview of important considerations and
alternatives for the design, development, and implementation
of high performance systems in the Sybase client/server
environment. The issues we will address are:
- Client Application and API Issues
- Physical Database Design Issues
- Networking Issues
- Operating System Configuration Issues
- Hardware Configuration Issues
- SQL Server Configuration Issues
Client Application and Physical Database Design design
decisions will account for over 80% of your system's "tuneable"
performance so ...
plan your project resources accordingly !
It is highly recommended that every project include
individuals who have taken Sybase Education's Performance and
Tuning course. This 5-day course provides the hands-on
experience essential for success.
Client Application Issues
- Tuning Transact-SQL Queries
- Locking and Concurrency
- ANSI Changes Affecting Concurrency
- Application Deadlocking
- Optimizing Cursors in v10
- Special Issues for Batch Applications
- Asynchronous Queries
- Generating Sequential Numbers
- Other Application Issues
Tuning Transact-SQL Queries
- Learn the Strengths and Weaknesses of the Optimizer
- One of the largest factors determining performance is TSQL!
Test not only for efficient plans but also semantic correctness.
- Optimizer will cost every permutation of accesses for
queries involving 4 tables or less. Joins of more than 4
tables are "planned" 4-tables at a time (as listed in the
FROM clause) so not all permutations are evaluated. You can
influence the plans for these large joins by the order of
tables in the FROM clause.
- Avoid the following, if possible:
- Use Tools to Evaluate and Tune Important/Problem Queries
- Use the "set showplan on" command to see the plan chosen
as "most efficient" by optimizer. Run all queries through
during development and testing to ensure accurate access
model and known performance. Information comes through the
Error Handler of a DB-Library application.
- Use the "dbcc traceon(3604, 302, 310)" command to see
each alternative plan evaluated by the optimizer. Generally,
this is only necessary to understand why the optimizer won't
give you the plan you want or need (or think you need)!
- Use the "set statistics io on" command to see the number
of logical and physical i/o's for a query. Scrutinize those
queries with high logical i/o's.
- Use the "set statistics time on" command to see the
amount of time (elapsed, execution, parse and compile) a query
takes to run.
- If the optimizer turns out to be a "pessimizer", use the
"set forceplan on" command to change join order to be the
order of the tables in the FROM clause.
- If the optimizer refuses to select the proper index for
a table, you can force it by adding the index id in
parentheses after the table name in the FROM clause.
SELECT * FROM orders(2), order_detail(1) WHERE ...
This may cause portability issues should
index id's vary/change by site !
Locking and Concurrency
- The Optimizer Decides on Lock Type and Granularity
- Decisions on lock type (share, exclusive, or update) and
granularity (page or table) are made during optimization so
make sure your updates and deletes don't scan the table !
- Exclusive Locks are Only Released Upon Commit or
Rollback
- Lock Contention can have a large impact on both
throughput and response time if not considered both in the
application and database design !
- Keep transactions as small and short as possible to
minimize blocking. Consider alternatives to "mass" updates
and deletes such as a v10.0 cursor in a stored procedure
which frequently commits.
- Never include any "user interaction" in the middle of
transactions.
- Shared Locks Generally Released After Page is Read
- Share locks "roll" through result set for concurrency.
Only "HOLDLOCK" or "Isolation Level 3" retain share locks
until commit or rollback. Remember also that HOLDLOCK is for
read-consistency. It doesn't block other readers !
- Use optimistic locking techniques such as timestamps and
the tsequal() function to check for updates to a row since it
was read (rather than holdlock)
ANSI Changes Affecting Concurrency
- Chained Transactions Risk Concurrency if Behavior not
Understood
- Sybase defaults each DML statement to its own
transaction if not specified ;
- ANSI automatically begins a transaction with any SELECT,
FETCH, OPEN, INSERT, UPDATE, or DELETE statement ;
- If Chained Transaction must be used, extreme care must
be taken to ensure locks aren't left held by applications
unaware they are within a transaction! This is especially
crucial if running at Level 3 Isolation
- Lock at the Level of Isolation Required by the Query
- Read Consistency is NOT a requirement of every query.
- Choose level 3 only when the business model requires it
- Running at Level 1 but selectively applying HOLDLOCKs as
needed is safest
- If you must run at Level 3, use the NOHOLDLOCK clause
when you can !
- Beware of (and test) ANSI-compliant third-party
applications for concurrency
Application Deadlocking
Prior to SQL Server 10 cursors, many developers simulated
cursors by using two or more connections (dbproc's) and
divided the processing between them. Often, this meant one
connection had a SELECT open while "positioned" UPDATEs and
DELETEs were issued on the other connection. The approach
inevitably leads to the following problem:
- Connection A holds a share lock on page X (remember
"Rows Pending" on SQL Server leave a share lock on the
"current" page).
- Connection B requests an exclusive lock on the same page
X and waits...
- The APPLICATION waits for connection B to succeed before
invoking whatever logic will remove the share lock (perhaps
dbnextrow). Of course, that never happens ...
Since Connection A never requests a lock which Connection B
holds, this is NOT a true server-side deadlock. It's really
an "application" deadlock !
Design Alternatives
- Buffer additional rows in the client that are "nonupdateable".
This forces the shared lock onto a page on which
the application will not request an exclusive lock.
- Re-code these modules with CT-Library cursors (aka.
server-side cursors). These cursors avoid this problem by
disassociating command structures from connection structures.
- Re-code these modules with DB-Library cursors (aka.
client-side cursors). These cursors avoid this problem
through buffering techniques and re-issuing of SELECTs.
Because of the re-issuing of SELECTs, these cursors are not
recommended for high transaction sites !
Optimizing Cursors with v10.0
Special Issues for Batch Applications
SQL Server was not designed as a batch subsystem! It was
designed as an RBDMS for large multi-user applications.
Designers of batch-oriented applications should consider the
following design alternatives to maximize performance :
Design Alternatives :
- Minimize Client/Server Interaction Whenever Possible
- Don't turn SQL Server into a "file system" by issuing
single table / single row requests when, in actuality, set
logic applies.
- Maximize TDS packet size for efficient Interprocess
Communication (v10 only)
- New SQL Server 10.0 cursors declared and processed
entirely within stored procedures and triggers offer
significant performance gains in batch processing.
- Investigate Opportunities to Parallelize Processing
- Breaking up single processes into multiple, concurrently
executing, connections (where possible) will outperform
single streamed processes everytime.
- Make Use of TEMPDB for Intermediate Storage of Useful
Data
Asynchronous Queries
Many, if not most, applications and 3rd Party tools are coded
to send queries with the DB-Library call dbsqlexec( ) which
is a synchronous call ! It sends a query and then waits for
a response from SQL Server that the query has completed !
Designing your applications for asynchronous queries provides many benefits:
- A "Cooperative" multi-tasking application design under
Windows will allow users to run other Windows applications
while your long queries are processed !
- Provides design opportunities to parallize work across
multiple SQL Server connections.
Implementation Choices:
- System 10 Client Library Applications:
- True asynchronous behaviour is built into the entire
library. Through the appropriate use of call-backs,
asynchronous behavior is the normal processing paradigm.
- Windows DB-Library Applications (not true async but
polling for data):
- Use dbsqlsend(), dbsqlok(), and dbdataready() in
conjunction with some additional code in WinMain() to pass
control to a background process. Code samples which outline
two different Windows programming approaches (a PeekMessage
loop and a Windows Timer approach) are available in the
Microsoft Software Library on Compuserve (GO MSL). Look for
SQLBKGD.ZIP
- Non-PC DB-Library Applications (not true async but
polling for data):
- Use dbsqlsend(), dbsqlok(), and dbpoll() to utilize
non-blocking functions.
Generating Sequential Numbers
Many applications use unique sequentially increasing numbers,
often as primary keys. While there are good benefits to this
approach, generating these keys can be a serious contention
point if not careful. For a complete discussion of the
alternatives, download Malcolm Colton's White Paper on
Sequential Keys from the SQL Server Library of our OpenLine
forum on Compuserve.
The two best alternatives are outlined below.
- "Primary Key" Table Storing Last Key Assigned
- IDENTITY Columns (v10.0 only)
- Last key assigned for each table is stored in memory and
automatically included in all INSERTs (BCP too). This should
be the method of choice for performance.
- Choose a large enough numeric or else all inserts will
stop once the max is hit.
- Potential rollbacks in long transactions may cause gaps
in the sequence !
Other Application Issues
- Transaction Logging Can Bottleneck Some High Transaction
Environments
- Committing a Transaction Must Initiate a Physical Write
for Recoverability
- Implementing multiple statements as a transaction can
assist in these environment by minimizing the number of log
writes (log is flushed to disk on commits).
- Utilizing the Client Machine's Processing Power Balances
Load
- Client/Server doesn't dictate that everything be done on
Server!
- Consider moving "presentation" related tasks such as string
or mathematical manipulations, sorting, or, in some cases,
even aggregating to the client.
- Populating of "Temporary" Tables Should Use "SELECT
INTO" - balance this with dynamic creation of
temporary tables in an OLTP environment. Dynamic creation
may cause blocks in your tempdb.
- "SELECT INTO" operations are not logged and thus are
significantly faster than there INSERT with a nested SELECT
counterparts.
- Consider Porting Applications to Client Library Over
Time
- True Asynchronous Behavior Throughout Library
- Array Binding for SELECTs
- Dynamic SQL
- Support for ClientLib-initiated callback functions
- Support for Server-side Cursors
- Shared Structures with Server Library (Open Server 10)
Physical Database Design Issues
- Normalized -vs- Denormalized Design
- Index Selection
- Promote "Updates-in-Place" Design
- Promote Parallel I/O Opportunities
Normalized -vs- Denormalized
- Always Start with a Completely Normalized Database
- Denormalization should be an optimization taken as a
result of a performance problem
- Benefits of a normalized database include :
- Accelerates searching, sorting, and index creation since
tables are narrower
- Allows more clustered indexes and hence more flexibility
in tuning queries, since there are more tables ;
- Accelerates index searching since indexes tend to be
narrower and perhaps shorter ;
- Allows better use of segments to control physical
placement of tables ;
- Fewer indexes per table, helping UPDATE, INSERT, and
DELETE performance ;
- Fewer NULLs and less redundant data, increasing
compactness of the database ;
- Accelerates trigger execution by minimizing the extra
integrity work of maintaining redundant data.
- Joins are Generally Very Fast Provided Proper Indexes
are Available
- Normal caching and cindextrips parameter (discussed in
Server section) means each join will do on average only 1-2
physical I/Os.
- Cost of a logical I/O (get page from cache) only 1-2
milliseconds.
- There Are Some Good Reasons to Denormalize
- All queries require access to the "full" set of joined
data.
- Majority of applications scan entire tables doing joins.
- Computational complexity of derived columns require
storage for SELECTs
- Others ...
Index Selection
- Without a clustered index, all INSERTs and "out-of-place"
UPDATEs go to the last page. The lock contention in
high transaction environments would be prohibitive. This is
also true for INSERTs to a clustered index on a monotonically
increasing key.
- High INSERT environments should always cluster on a key
which provides the most "randomness" (to minimize lock /
device contention) that is usable in many queries. Note this
is generally not your primary key !
- Prime candidates for clustered index (in addition to the
above) include :
- Columns Accessed by a Range
- Columns Used with Order By, Group By, or Joins
- Indexes Help SELECTs and Hurt INSERTs
- Too many indexes can significantly hurt performance of
INSERTs and "out-of-place" UPDATEs.
- Prime candidates for nonclustered indexes include :
- Columns Used in Queries Requiring Index Coverage
- Columns Used to Access Less than 20% (rule of thumb)
of the Data.
- Unique indexes should be defined as UNIQUE to help the
optimizer
- Minimize index page splits with Fillfactor (helps
concurrency and minimizes deadlocks)
- Keep the Size of the Key as Small as Possible
- Accelerates index scans and tree traversals
- Use small datatypes whenever possible . Numerics should
also be used whenever possible as they compare faster than
strings.
Promote "Update-in-Place" Design
- "Update-in-Place" Faster by Orders of Magnitude
- Performance gain dependent on number of indexes. Recent
benchmark (160 byte rows, 1 clustered index and 2 nonclustered)
showed 800% difference!
- Alternative ("Out-of-Place" Update) implemented as a
physical DELETE followed by a physical INSERT. These tactics
result in:
- Increased Lock Contention
- Increased Chance of Deadlock
- Decreased Response Time and Throughput
- Currently (System 10 and below), Rules for "Update-in-Place" Behavior Include :
- Columns updated can not be variable length or allow nulls
- Columns updated can not be part of an index used to
locate the row to update
- No update trigger on table being updated (because the
inserted and deleted tables used in triggers get their data
from the log)
In v4.9.x and below, only one row may be affected and
the optimizer must know this in advance by choosing a UNIQUE
index. System 10 eliminated this limitation.
Promote Parallel I/O Opportunities
Networking Issues
- Choice of Transport Stacks
- Variable Sized TDS Packets
- TCP/IP Packet Batching
Choice of Transport Stacks for PCs
- Choose a Stack that Supports "Attention Signals" (aka.
"Out of Band Data")
- Provides for the most efficient mechanism to cancel
queries.
- Essential for sites providing ad-hoc query access to
large databases.
- Without "Attention Signal" capabilities (or the urgent
flag in the connection string), the DB-Library functions
DBCANQUERY ( ) and DBCANCEL ( ) will cause SQL Server to send
all rows back to the Client DB-Library as quickly as possible
so as to complete the query. This can be very expensive if
the result set is large and, from the user's perspective,
causes the application to appear as though it has hung.
- With "Attention Signal" capabilities, Net-Library is
able to send an out-of-sequence packet requesting the SQL
Server to physically throw away any remaining results
providing for instantaneous response.
- Currently, the following network vendors and associated
protocols support the an "Attention Signal" capable
implementation:
- NetManage NEWT
- FTP TCP
- Named Pipes (10860) - Do not use urgent parameter with this Netlib
- Novell LAN Workplace v4.1 0 Patch required from Novell
- Novell SPX - Implemented internally through an "In-Band" packet
- Wollongong Pathway
- Microsoft TCP - Patch required from Microsoft
Variable-sized TDS Packets
Pre-v4.6 TDS Does Not Optimize Network Performance
Current SQL Server TDS packet size limited to 512 bytes while
network frame sizes are significantly larger (1508 bytes on
Ethernet and 4120 bytes on Token Ring).
The specific protocol may have other limitations!
For example:
- IPX is limited to 576 bytes in a routed network.
- SPX requires acknowledgement of every packet before it will
send another. A recent benchmark measured a 300% performance
hit over TCP in "large" data transfers (small transfers
showed no difference).
- Open Client Apps can "Request" a Larger Packet Shown to
have significant performance improvement on "large" data
transfers such as BCP, Text / Image Handling, and Large
Result Sets.
- clients:
- isql -Usa -Annnnn
- bcp -Usa -Annnnn
- ct_con_props (connection, CS_SET, CS_PACKETSIZE,
&packetsize, sizeof(packetsize), NULL)
- An "SA" must Configure each Servers' Defaults Properly
- sp_configure "default packet size", nnnnn -
Sets default packet size per client connection (defaults to 512)
- sp_configure "maximum packet size", nnnnn -
Sets maximum TDS packet size per client connection (defaults
to 512)
- sp_configure "additional netmem", nnnnn -
Additional memory for large packets taken from separate
pool. This memory does not come from the sp_configure memory
setting.
Optimal value = ((# connections using large packets
large packetsize * 3) + an additional 1-2% of the above
calculation for overhead)
Each connection using large packets has 3 network
buffers: one to read; one to write; and one overflow.
TCP/IP Packet Batching
Operating System Issues
- Never Let SQL Server Page Fault
- It is better to configure SQL Server with less memory
and do more physical database I/O than to page fault. OS
page faults are synchronous and stop the entire dataserver
engine until the page fault completes. Since database I/O's
are asynchronous, other user tasks can continue!
- Use Process Affinitying in SMP Environments, if
Supported
- Affinitying dataserver engines to specific CPUs
minimizes overhead associated with moving process information
(registers, etc) between CPUs. Most implementations will
preference other tasks onto other CPUs as well allowing even
more CPU time for dataserver engines.
- Watch out for OS's which are not fully symmetric.
Affinitying dataserver engines onto CPUs that are heavily
used by the OS can seriously degrade performance. Benchmark
with your application to find optimal binding.
- Increase priority of dataserver engines, if supported
- Give SQL Server the opportunity to do more work. If SQL
Server has nothing to do, it will voluntarily yield the CPU.
- Watch out for OS's which externalize their async
drivers. They need to run too!
- Use of OS Monitors to Verify Resource Usage
- The OS CPU monitors only "know" that an instruction is
being executed. With SQL Server's own threading and
scheduling, it can routinely be 90% idle when the OS thinks
its 90% busy. SQL Monitor shows real CPU usage.
- Look into high disk I/O wait time or I/O queue lengths.
These indicate physical saturation points in the I/O
subsystem or poor data distribution.
- Disk Utilization above 50% may be subject to queuing
effects which often manifest themselves as uneven response
times.
- Look into high system call counts which may be
symptomatic of problems.
- Look into high context switch counts which may also be
symptomatic of problems.
- Optimize your kernel for SQL Server (minimal OS file
buffering, adequate network buffers, appropriate KEEPALIVE
values, etc).
- Use OS Monitors and SQL Monitor to Determine Bottlenecks
- Most likely "Non-Application" contention points include:
Resource Where to Look
--------- --------------
CPU Performance SQL Monitor - CPU and Trends
Physical I/O Subsystem OS Monitoring tools - iostat, sar...
Transaction Log SQL Monitor - Device I/O and
Device Hit Rate
on Log Device
SQL Server Network Polling SQL Monitor - Network and Benchmark
Baselines
Memory SQL Monitor - Data and Cache
Utilization
- Use of Vendor-support Striping such as LVM and RAID
- These technologies provide a very simple and effective
mechanism of load balancing I/O across physical devices and
channels.
- Use them provided they support asynchronous I/O and
reliable writes.
- These approaches do not eliminate the need for Sybase
segments to ensure minimal contention for internal resources.
- Non-read-only environments should expect performance
degradations when using RAID levels other than level 0.
These levels all include fault tolerance where each write
requires additional reads to calculate a "parity" as well as
the extra write of the parity data.
Hardware Configuration Issues
- Number of CPUs
- Use information from SQL Monitor to assess SQL Server's
CPU usage.
- In SMP environments, dedicate at least one CPU for the
OS.
- Advantages and scaling of VSA is application-dependent.
VSA was architected with large multi-user systems in mind.
- I/O Subsystem Configuration
- Look into high Disk I/O Wait Times or I/O Queue Lengths.
These may indicate physical I/O saturation points or poor
data distribution.
- Disk Utilization above 50% may be subject to queuing
effects which often manifest themselves as uneven response
times.
- Logical Volume configurations can impact performance of
operations such as create database, create index, and bcp.
To optimize for these operations, create Logical Volumes such
that they start on different channels / disks to ensure I/O
is spread across channels.
- Discuss device and controller throughput with hardware
vendors to ensure channel throughput high enough to drive all
devices at maximum rating.
General SQL Server Tuning
- Changing Values with sp_configure or buildmaster
It is imperative that you only use sp_configure to
change those parameters that it currently maintains because
the process of reconfiguring actually recalculates a number
of other buildmaster parameters. Using the Buildmaster
utility to change a parameter "managed" by sp_configure may
result in a mis-configured server and cause adverse
performance or even worse ...
- Sizing Procedure Cache
- SQL Server maintains an MRU-LRU chain of stored
procedure query plans. As users execute sprocs, SQL Server
looks in cache for a query plan to use. However, stored
procedure query plans are currently not re-entrant! If a
query plan is available, it is placed on the MRU and
execution begins. If no plan is in memory, or if all copies
are in use, a new copy is read from the sysprocedures table.
It is then optimized and put on the MRU for execution.
- Use dbcc memusage to evaluate the size and number of
each sproc currently in cache. Use SQL Monitor's cache
statistics to get your average cache hit ratio. Ideally
during production, one would hope to see a high hit ratio to
minimize the procedure reads from disk. Use this information
in conjuction with your desired hit ratio to calculate the
amount of memory needed.
- Memory
- Tuning memory is more a price/performance issue than
anything else ! The more memory you have available, the
greater than probability of minimizing physical I/O. This is
an important goal though. Not only does physical I/O take
significantly longer, but threads doing physical I/O must go
through the scheduler once the I/O completes. This means
that work on behalf of the thread may not actually continue
to execute for quite a while !
- There are no longer (as of v4.8) any inherent
limitations in SQL Server which cause a point of diminishing
returns on memory size.
- Calculate Memory based on the following algorithm :
Total Memory = Dataserver Executable Size (in bytes) +
Static Overhead of 1 Mb +
User Connections x 40,960 bytes +
Open Databases x 644 bytes +
Locks x 32 bytes +
Devices x 45,056 bytes +
Procedure Cache +
Data Cache
- Recovery Interval
- As users change data in SQL Server, only the transaction
log is written to disk right away for recoverability.
"Dirty" data and index pages are kept in cache and written to
disk at a later time. This provides two major benefits:
- Many transactions may change a page yet only one
physical write is done
- SQL Server can schedule the physical writes "when appropriate"
- SQL Server must eventually write these "dirty" pages to
disk.
- A checkpoint process wakes up periodically and "walks"
the cache chain looking for dirty pages to write to disk
- The recovery interval controls how often checkpoint
writes dirty pages.
- Tuning Recovery Interval
- A low value may cause unnecessary physical I/O lowering
throughput of the system. Automatic recovery is generally
much faster during boot-up.
- A high value minimizes unnecessary physical I/O and
helps throughput of the system. Automatic recovery may take
substantial time during boot-up.
Audit Performance Tuning for v10.0
- Potentially as Write Intensive as Logging
- Isolate Audit I/O from other components.
- Since auditing nearly always involves sequential writes,
RAID Level 0 disk striping or other byte-level striping
technology should provide the best performance
(theoretically).
- Size Audit Queue Carefully
- Audit records generated by clients are stored in an in memory
audit queue until they can be processed.
- Tune the queue's size with sp_configure "audit queue
size", nnnn (in rows).
- Sizing this queue too small will seriously impact
performance since all user processes who generate audit
activity will sleep if the queue fills up.
- Size Audit Database Carefully
- Each audit row could require up to 416 bytes depending
on what is audited.
- Sizing this database too small will seriously impact
performance since all user processes who generate audit
activity will sleep if the database fills up.