|
| |
Database Application Performance: Elements of Performance
by Jacques Roy jroy@informix.com
Tech Notes 1998, Volume 8, Issue 4
Overview
According to Moore's law, computer performance doubles every 18 months. We have seen
PCs go from an Intel 8088 at 4.77MHz and a maximum of 640KB of memory to the current
machines with multiple (4 today) 450MHz Pentium II processors and up to 4GB of memory. In
other classes of computers, we find machines with up to 64 processors and 64GB of memory.
Machines are also available in clusters to provide massively parallel processing (MPP).
With all this power, why worry about database application performance?
Database applications are more complex, and the amount of data has grown over the
years. We are already dealing with multi-terabyte databases. When can we expect petabyte
(1000TB) databases? It probably won't surprise you to find that a few already exist. As a
result, the processing is becoming more complex and the number of users is increasing.
This document provides an overview of the factors that impact performance, and will
describe guidelines that will help in application development and tuning. It covers
hardware, operating system, database engine, application development, and database access.
Hardware is the first element of performance. Given everything else being constant,
performance can be improved, if only marginally, by throwing hardware at a problem. In
some cases it is the most cost-effective solution. Then the question is: what hardware
should be added? Let's look at the major hardware components of a system, using the Sun
Enterprise Server 10000 as an example.
- CPU: The central processing unit is what executes program
instructions. Since the late '80s, UNIX vendors have supported symmetrical multiprocessing
(SMP). The scalability of these systems has increased to support a large number of
processors. SMP means that a machine contains several CPUs that are all equal in
functionality: any CPU can handle any task in the system, including interrupts and I/O.
High-performance
CPUs include on-chip memory and often use a secondary cache. For example, each UltraSPARC
processor in the Sun ES10000 has 16KB of instruction cache and 16KB of data cache. It also
has a 4MB secondary cache.
Why have a cache and so much of it? Let's again consider the Sun ES10000. The processor
speed is listed at 336MHz and it uses a 128 bit (16 bytes) interface. For each CPU cycle,
it can get 16 bytes:
So, at peak performance, one UltraSPARC processor can go through 5.2GB per second.
Memory caches provide faster access to instructions and data in hopes of keeping the CPU
busy
- Memory: Memory sub-systems use the concept of interleaving to increase data
transfer performance. In the Sun ES 100000, a memory module achieves transfer rates of
1.3GB per second.
- System bus: The system bus is the critical component of the I/O sub-system that
allows communication between the I/O controllers and the system. The standard Sun SBus
provides a data throughput rate of 100MB per second.
- Controllers: Several types of controllers are available. They are used to access
disk drives, networks, tape drives, and so forth. The standard SCSI controller can
accommodate the performance of the SCSI bus at 20MB/sec. Sun has come out with controllers
for disk arrays based on fiber optic technology that are rated at 100MB/sec. This matches
the speed of the system bus.
- Disks: The standard disk drive used in high-performance systems is the SCSI
drive. You can calculate the transfer rate of a disk by looking at its capacity, rotation
speed, latency, seek time, track-to-track seek, etc. The real throughput is a factor of
all these factors and the size of the transfer requested. If you ask for disk transfers of
a few disk blocks scattered all over the disk, you will incur a high percentage of seek
time and latency. However, if you are transferring a large number of contiguous disk
blocks, most of the time will be spent transferring data.
The average seek time,
average latency, and the track-to-track seek times are measured in milliseconds. A top of
the line drive may have an average seek time of 5.4 msec. During that time, the UltraSPARC
processor can execute about 7 million instructions. In an effort to reduce the impact of
this overhead, most disk manufacturers include memory cache in their disk drives.
The previous paragraph explains why it is always recommended to allocate contiguous
space and avoid disk fragmentation. So what transfer rate should we expect from a disk
drive?
Some disk drives have peak transfer rates of up to 80MB/sec. Of course this is a
"best case" scenario. The performance will vary greatly depending on the type of
I/O being done and the impact of latency and seek time. In a "worst case"
scenario, we could require the average seek time and average latency for each page, which
would reduce our performance to around 240KB/sec (2KB every 8.4 msec). This is a huge
range. The real performance will have to be evaluated by benchmarking the application.

Figure 1: Some Disk Drive Characteristics.1
1 Data extracted from http://www.seagate.com/disc/cheetah.shtml
- Network: The most popular network is the ethernet. The usual transfer rate is
10Mbit/sec. Considering the standard overhead of control bits and data used for error
detection, we can expect a transfer rate of, at most, 1MB/sec. However, new network
technology is rapidly appearing and provides transfer rates of 100Mbits/sec (10MB/sec).
The
ethernet protocol defines a format for sending information. Each packet of information has
a maximum size of 1526 bytes. This includes 26 bytes of control information.2
The size of the data transfer will have a major impact on the effective transfer rate; for
example, the transfer of 20 bytes of data will still incur 26 bytes of control data for
more than 50 percent overhead.
2 Douglas Comer, Internetworking with TCP/IP, pp. 18-19.
- Modems: Modems are available at various speeds. The most popular are
14.4Kbits/sec, 28.8Kbits/sec, and 56Kbits/sec. Because we've been dealing mostly with
MB/sec up to now, let's convert the previous ratings. They translate to around
0.0014MB/sec, 0.0028MB/sec, and 0.0056MB/sec respectively.
- Interactive users: Finally, we arrive to the user. Depending on the application,
a user could have a relatively high receiving rate but, in general, should be limited to
short bursts. However, the transmission rate is generally very low. People think, issue
commands, and analyze the result. Let's consider a data entry clerk that types 120 words
per minute. If we assume an average word of 8 letters, we get a rate of 960 bytes/minute
or 16 bytes/sec. This translates to 0.000016MB/sec.
This constitutes a basic set of numbers that allows some generalizations on the
performance of the system. This has to be used carefully, because it does not take into
consideration any of the interactions between components. For example, a Sun ES10000 can
contain up to 16 UltraSPARC daughterboards. Each board includes four processors, one or
two memory modules, two SBus', and a number of controllers with devices attached to them.
In addition, to consider all the interactions on a daughterboard, we have to consider the
communication between daughterboards and the nature of the processing done on the machine.
Instead of getting lost in the complexity of modern computers, let's consider some
simple guidelines that could help the database environment. Some may be outside of our
control, but should still be kept in mind.

Figure 2: System Components' Peak Performance
Looking at the numbers in Figure 2, some people may get the impression that systems are
not well balanced. Large benchmarks have demonstrated that systems scale well, which shows
that no component is an obvious bottleneck. This is explained by a few simple points:
- CPUs have a cache containing data and instructions that may be reused
several times before the CPU requests something that is not already stored in the cache.
- Components can be added to scale performance. Depending on the processing environment,
an optimal ratio of CPUs, memory, controllers, and disk drives can be achieved.
- Disk drives include a local cache used to reduce the impact of latency and seek time.
Furthermore, if the data written is read back, the cache may provide it without accessing
the drive.
From the numbers in Figure 2, we can extract the following guidelines:
- Keep the CPU busy.
The CPU is the key to all processing. If it becomes idle, the processing cycles are lost.
The CPU will become idle if the cache does not contain the data it requires. The cache
will have to wait if the memory does not contain the proper information. The memory will
then wait for I/O completion. We will see later some ways to impact this.
- Balance your I/O.
The most common performance problem is when a large portion of the I/O goes to one disk
drive while other disks are idle. A variation on this is if all the I/O goes through the
same controller. This can be caused by database tables that are not distributed properly,
or because all the system I/O is done on one drive. To solve these hot spots in the
database, analyze your I/O and distribute your tables properly. This can include moving
the logical logs to different dbspaces. If the hot spot occurs within one table, take
advantage of the Informix table partitioning so the table spans several disk drives. You
can also improve the partitioning by using hardware disk striping. On the system side, the
hot spots will most likely occur on the system disk. Consider moving the swap area and the
temp directory (/tmp) to other drives.
- Avoid disk fragmentation.
We saw earlier that fragmented I/O can have a dramatic impact on disk throughput. Make
sure you allocate large chunks, and try to allocate your tables contiguously.
- Limit your network traffic.
Because the network has a relatively low bandwidth, increased traffic can generate
collisions that force re-transmission which, in turn, adds more overhead to the network
communication. Keep this in mind when developing applications. The network is a valuable
resource - avoid transferring large amounts of data when the database server can perform
the computation and only return a small result.
These general guidelines take different forms at other levels of the computing
environment. It always comes back to a balance of activities in your system
The main goal of an operating system is to optimize the use of the hardware resources.
This simply stated goal involves very sophisticated decisions. Let's consider again the
Sun ES10000. A fully configured machine contains 16 system boards communicating together
through a crossbar interconnect. Each system board contains four UltraSPARC CPUs with
their secondary cache, up to 4GB of memory, two SBus', and a number of controllers.

Figure 3: Block Diagram of an OS Kernel.3
3 Modified version from Maurice J. Bach, The Design of the UNIX Operating
System, p. 20
Figure 3 shows the major components of an operating system. A user is an executing
program called a process. It interacts with the system through system calls. The Solaris
operating system, for example, provides about 185 system calls. Through system calls, a
process can access devices, file systems, and communicate with other local or remote
processes.
The operating system must optimize the use of memory to accommodate as many processes
as possible. Memory is used by the operating system and its system processes, device
drivers, the file system, user processes, etc.
Let's take a high-level look at the major components of a system:
- Process
A process is an executing program that sees memory as a contiguous addressable space. In
32-bit systems, a process can address 4GB of memory. The operating system divides the
process address space into pages. At any one time, a process requires a minimum number of
pages to be able to execute. This is referred to as working set size. A process
goes through several states of execution-it can be running, ready to run, or waiting for
an event (sleeping). This allows the operating system to decide how to schedule processes
for execution.
- Thread
Today, most UNIX implementations include the concept of "thread of execution"
within a process. The simplest model is where a process includes one thread of execution.
It is possible to start an arbitrary number of threads of execution within a process. When
using a native thread implementation, a single process can execute on multiple CPUs
simultaneously. Since a thread executes in the context of a process, it uses less
resources, and the communication between threads is simpler. Furthermore, thread creation
is significantly less expensive than process creation.
- Process control and scheduling
The operating system schedules processes for execution after a specified time period has
elapsed, a system interrupt occurred, or when the currently running process executes a
system call. If the process is still ready to run, the scheduler will decide if it should
continue its execution, or if another process should take its place. In large systems,
the process scheduler must also consider where a process ran last. This way, it is
possible that the process memory pages will already be available in the processor cache.
The processor can then be productive right away.
- Memory management
When the demand on physical memory becomes too great, the operating system takes entire
working sets of processes and swaps them to disk to accommodate memory demands. It may
reach a point where the memory demand is so high that the operating system spends a large
portion of the CPU time administering the system. This is often reflected by a high
percentage of system time reported by system administration tools.
- File system
Most UNIX file systems have their origin in the BSD Fast File System. This file system
structure implemented several improvements over the original UNIX file system. These
improvements include: increased block size and creation of cylinder groups where file
descriptors (inodes) are closer to the data they describe. The simple act of doubling
the block size from its original 512 bytes doubled performance. Current file systems often
use a default block size of 8KB. Some UNIX vendors provide file systems with a tunable
block size.
You can find detailed information on file systems in the reference manuals listed at
the end of this article. For our purpose, two internal structures are of particular
interests: inode and directory.
An inode describes a specific file. It contains information including the owner, group,
file type, access permission, access time, number of links, and file size. It also
includes a set of 15 pointers that identify the location of the data. Figure 4 illustrates
the structure of an inode, including its data pointers. The first 12 data pointers
reference data blocks directly. They are identified on figure 4 as "Direct0" to
"Direct11." The following three pointers, PTR 1, PTR 2, and PTR 3, are pointers
to index blocks. Each of these pointers add a level of indirection. PTR 3 points to an
index block that contains pointers to index blocks. As Figure 4 shows, PTR 3 points to 3
levels of index blocks.
To be able to read a file, the operating system needs to have the inode, the required
index block, and finally the data block. The first time a file is read, there may be one
disk access to get the inode, other disk accesses to get an index block, and additional
accesses to get the needed data blocks. The file system optimizes disk access by caching
the disk blocks in memory. When a process requests a data page from a frequently accessed
file, the data cache may be able to return it without any disk I/O.
The inode describes a file, but the description does not include a name. A directory
file provides a mapping between names and inodes. This scheme allows the possibility to
refer to an inode through several names. Directories also provide the hierarchical
structure of a file system.

Figure 4: I-node data pointers.
Figure 5 illustrates the format of a directory. It consists of a list of variable
length entries aligned on 4-byte boundaries. A sequential search is executed to find a
file. The average length of a search will increase with the number of entries.

Figure 5: Directory entries.
- Network access
UNIX systems provide access to the network through library functions. The two standard
libraries are sockets and TLI (Transport Layer Interface). A user program will establish a
network connection by issuing calls to the network library, which, in turn, will issue
system calls. Once the connection is established, either library calls or system calls can
be used to transfer data over the network. Let's consider a TCP/IP connection. The TCP
protocol (Transmission Control Protocol) prepares data for transmission and adds 24 bytes
of control information before passing it to the IP layer. The IP (Internet Protocol) also
adds 24 bytes of control information and passes it to the physical layer. The physical
layer needs to break down the data it receives into packets that include 32 bytes of
control information.
If a TCP packet fits into a physical frame, we incur an 80-byte overhead in our
communication. Otherwise, the TCP, or more precisely, the IP packet, is divided in a
number of physical packets. The maximum size of a physical packet varies with the network
interface used. The ethernet format defines a maximum size of 1526 bytes including the
control information. The FDDI format defines a maximum size of 4500 bytes.

Figure 6: TCP/IP Layering Model
Some system calls do more processing than others. Consider the following statement:
fd = open("/usr/informix/online.log", O_RDONLY);
This system call parses the file path, find "/". It then opens the directory
and finds the entry for "usr". The "usr"
directory is then opened and scanned for "informix", which is open.
Finally, we find the inode number for "online.log". All these
operations involve disk I/O and buffer cache operations.
We see that the same concerns apply at the operating system level as at the hardware
level. The operating system gives the CPUs to a different process each time a process has
to wait for an event. This way the operating system keeps the CPUs as busy as possible.
Some adjustments can be made to the operating system so the scheduling method better fits
the execution environment. For example, a machine that is a dedicated database server
should not be set up for interactive users.
The file system tries to optimize performance by caching the disk information.
Furthermore, it is using a relatively large I/O size. This is in line with what we saw in
the hardware section.
From the information discovered in the operating system section, we can create the
following guidelines:
- Limit system calls usage.
As we saw earlier, a system call requires quite a bit of processing even before it does
anything. There are situations where it can easily be done-a system call could be taken
out of a loop and still provide the same functionality. Consider the following code:
This loop issues one system call per loop. This may be because your application logic
needs to process one character at a time. If this is not a hard requirement, you can
buffer your reads by using the standard file access functions and still process one
character at a time:
You can change the definition of a function that opens a file and instead pass it a
file descriptor. Instead of looping on a system call to check on some status, it may be
better to block on an event. Finally, some work could be done in user space instead of
always executing a system call. Threads can provide major benefits in that area instead of
using cooperating processes.
- Localize your code and data.
A process needs a certain amount of memory to execute. This memory includes instruction
and data pages. The minimum amount of memory required by a process is called its working
set size. You should put functions that are often used together close to each other so
they will be in contiguous pages, if not in the same page. This also applies to data
pages.
- Buffer your I/O.
Take a hint from the file system and try to buffer your I/O. For example, instead of
writing one character at a time, accumulate the data until a certain size or condition
occurs and then execute the I/O. This is done automatically for you by some programming
libraries like the fputc() function in "C".
- Limit network activities.
Networks are getting faster and faster; however, the other components are also getting
faster, which keeps the performance ratio virtually intact. The number of users on the
network exacerbates the contention problem. For large organizations, a small utilization
difference per user can make a big difference. Consider that if the network utilization
goes down 10 percent on a saturated network that has 1000 users, it means that an
additional 100 users could use the network.
If you have a performance problem, you must first find out where the bottleneck is, and
then review the situation to decide if you should add hardware (if so, which hardware) and
where, or if, you should review the software implementation.
You should also be familiar with the capabilities provided by your operating system,
specifically in the area of tuning parameters, scheduling, and thread support.

Figure 7: Simplified Server Architecture view.
Informix Dynamic Server is a multithreaded database server. It implements its threading
model using a concept of virtual processors (VPs). The server architecture is depicted in
Figure 7.
Virtual processors are divided in specialized classes:
PIO, LIO: these virtual processors handle the physical and logical logs,
respectively.
NET: The NET virtual processors handle network connections. The database server
can be configured with multiple NET virtual processors and handle several separate
connection services. An alternative is to use the CPU virtual processors to handle network
connections.
AIO: In the case where the disk storage is not on raw devices, or when a
specific platform does not support kernel asynchronous I/O, AIO virtual processors are
used to optimize disk access.
CPU: The CPU virtual processor is the heart of the system. It manages the user
session threads and all of the processing threads. It also takes care of disk I/O when the
platform supports kernel asynchronous I/O. It can also be used to manage network
connections.
EVP: The Extended Virtual Processor is a variation of the CPU VP that can be
used in the Universal Data Option of Informix Dynamic Server. It is used to execute
user-defined routines that are "misbehaved." Refer to the Universal Data Option
documentation for more information.

Figure 8: Example of query plan.
This database server architecture provides "fan-in" of user connections into
a few processes. It also provides "fan-out" of queries. Queries that are coming
in are divided into multiple threads of execution. The Informix Dynamic Server
architecture can execute all threads in parallel, taking advantage of all the computer
resources, including all CPUs and all I/O devices provided. Figure 8 shows a query plan
example where the query is divided in several operations that execute in separate threads.
The query execution is impacted by several factors:
- Partitioning: Parallelism depends on data partitioning. The engine
will start as many scan threads as there are dbspaces involved in the execution. The
result of the scan operation is passed through exchanges to other threads that execute
operations including joins, group, sort, and merge.
- Indexes: Indexes can be used to reduce the I/O required to perform the query. The
basic indexing scheme is based on b+ trees.
- Statistics: Informix Dynamic Server keeps, among other things, statistics on the
amount of data and its distribution in tables. Statistics are vital in the search for the
best query plan.
- Optimizer: The optimizer uses an exhaustive search algorithm looking at all
possible ways to process the query. During query optimization, the optimizer may rewrite
the query, which may generate query plans that were not obvious when looking at the
original text of the query. The cost of each plan is established by getting the cost of
each operation, and by getting an estimate of the amount of data that will be processed.
We saw that Informix Dynamic Server tries to optimize CPU usage and reduce the I/O
processing. There are several tuning parameters that relate to CPU VPs, memory allocation,
and network connections, among other things. The database administrator, with the help of
the application system architect, can still do more to improve performance.
You should fragment tables on as many dbspaces as appropriate. You should also analyze
the queries that have the most impact on your application to determine the fragmentation
strategy.
Indexes can help performance, but too many indexes can reduce insert and update
performance. Some indexes are obvious. For others, you have to look at the query plans to
see if some indexes are ever used. In the case where the query plan shows table scans, you
may be able to create an index that will speed up the processing. Keep in mind that a
table scan is not necessarily a bad thing.

Table 1: Some onconfig server parameters.
When it comes to finding the best way to process a query, the most important thing for
the optimizer is to have updated statistics. This should be done any time a significant
number of rows are added or modified and after indexes are added. Informix recommends the
following UPDATE STATISTICS methods:
- Medium distribution only: for each table or the entire database.
- High: for all columns that head an index
- Low: for all columns in a multi-column index.
Applications are becoming more complex. We now have to choose between two-tier,
three-tier, N-tier applications, multithreading, component architecture (including CORBA
and DCOM), etc. These decisions have a huge impact on application performance. These
topics are too large to be covered here but, hopefully, the basic concepts discussed in
this article will help you improve application performance.
When it comes down to it, the way you decide to solve a problem has the biggest impact
on performance. Many books are available on algorithms that show how to represent some
types of data and how to perform sorts and searches, etc. Finding the right algorithm
starts with defining the right problem.
There are some very good examples of algorithms application and finding the right
problem in "Programming Pearls," listed in the reference section. In chapter 5
of his book, author Jon Bentley describes a case study where a different approach provided
a 400 times performance improvement.
Of course, you can't always expect dramatic performance improvement. The key is to try
to come up with several ways to solve a problem, and evaluate the solutions before
implementing one. When using Informix Dynamic Server with Universal Data Option, many new
possibilities open up. By adding business logic in the server, you can take advantage of
the database engine's sophisticated algorithms and parallelism.
This can be as simple as providing the proper comparison operators or date arithmetic.
You can provide functions that will be used for grouping, allowing the server to do the
aggregation instead of pulling all the information in the application program and perform
the aggregation there. You can find more information on how to take advantage of Universal
Data Option features in the "Best Practices" book listed in the reference
section.
It is sometimes easy to forget the simplest things. I have seen production code that
was compiled for debugging. This probably happened because the development team had
forgotten to change the compiler options. The difference is so small. Instead of
"-O" they kept "-g".
What is the impact of the compilation options? It affects the total number of
instructions and the number of instructions needed to set up a function call. Let's look
at a simple example using the following program:
#include
int fn1(int i) {
int j, k;
k = 0;
for (j = 0; j < i; j++) {
k += j;
}
return(k);
}
int main(int argc, char **argv) {
int ret;
ret = fn1(1000000);
printf("ret = %d\n", ret);
return(0);
}
Keep in mind that this program is an example. The result of fn1(1000000)
overflows and does not give the proper result.
We can compile this program with or without optimization and with options for profiling
and debugging. The following table reports some statistics on the assembler file generated
by compilation on a specific platform. These results will vary depending on the platform
and the compiler used.

Table 2: Assembler File Statistics.
The debugging option and the optimization option used to be mutually exclusive. This is
why you may still find make files that will not include optimization when compiling for
debugging.
Most "C" compilers provide an option to generate assembler code (-S). This is
what was used to determine the number of instructions generated with each set of compiler
options. The assembler source generated also included comments and assembler directives
that are not included in the count. The sizes for object file size and executable file
size are in bytes.
The execution times were obtained using the timex(1) function. The elapse
time represents the time it took to complete the execution. This is measured in a
"stop watch" manner. If the system had been busy, it would have also included
any time spent in a queue waiting to execute. The user and system time represent the CPU
cycles spent in user space and the cycles spent in kernel mode.
Most of the execution time is spent in the loop of function fn1(). All of
the execution times are within one second, which could be dismissed as insignificant. But
when we look at ratios between executions, we see that the time for standard compilation
takes almost three times as long in user time (0.11 sec) as the optimized version (0.04
sec). In other words, the optimized version is 63% faster!
To improve performance in your application, you first need to find out where the time
is spent. You can compile your program so it generates information on function calls, and
how much time was spent in each function. Here is an example using the program listed
above. You first need to compile the program with the profiling option:
Then you execute the program normally. A file named mon.out is generated. You can get
the profiling statistics by executing the prof command:
prof -V prg1
%Time |
Seconds |
Cumsecs |
#Calls |
msec/call |
Name |
100.0 |
0.11 |
0.11 |
1 |
110. |
fn1 |
| 0.0 |
0.00 |
0.11 |
1 |
0. |
main |
Now we know that the time is spent in fn1() so we can see if it can be optimized. You
may have realized that this is really the classic problem of adding all the numbers
between 1 and N. The loop can be replaced with:
This will make the fn1() execution time insignificant for any number. Of course, there
is still the problem of the overflow.
Here are a few simple suggestions to improve performance:
- Move code out of a loop: make sure you don't have extra code in a loop.
Sometimes some code can be moved out without impacting the result.
- Merge loops: if you have two loops that operate on the same range, you may be
able to perform the two loops together.
- Reorder tests: a complex test will terminate as soon as it can determine the
result. For "or" conditions, the first true value determines the result. For
"and" conditions, the first false value determines the result. Based on your
knowledge of the data, you can reorder the tests so the result can be found as soon as
possible.
- Use sentinel values: a sentinel is a value that will ensure that a condition will
eventually be met. Assuming that you have a text string and you need to split it into
words, you could use a condition like:
This code looks for spaces as word separators. It also tests for the end of the string.
You can replace the NULL at the end of the string with a space, saving one comparison per
loop. The statement would become:
You just have to make sure you put back the NULL value once you are done.
- Table lookup: when doing a table lookup, you may want to order it with the most
common pick first, so that most of your searches will terminate after one comparison. In
other situations, you may want to order them alphabetically and do a binary search. For a
table of 100 elements of equal probability, you'll use, at most, seven comparisons to find
your answer, instead of averaging 50 with a sequential search.
Adding database access to an application raises a number of issues that relate to the
communication with the database server. This section discusses some of the most important
points to consider.
Establishing a connection with the database server is an expensive operation. You can
look at it as a login procedure where some validation needs to be done and execution
threads need to be created. You should try to keep your connection for the duration of the
application. If you need to change connection, you should still preserve your other
connections for reuse. This is done in different ways depending on the client interface
used. Using INFORMIX-ESQL/C, you can name your connections:
This way you can refer to your connections by name. Assuming that you opened two
connections named "connection1" and "connection2", you can set
"connection1" as the current one with:
In a multithreaded environment, connections can move from thread to thread. A
connection must be set "dormant" before it can be used by another thread. To
make "connection1" available to other threads, you can execute the following
statement:
If "connection1" is the current connection, it can be done with:
The connection names can be passed through host variables. This makes it easier to
manage a pool of connections. Refer to the Informix documentation for more information.
Before an SQL statement can be executed, it must first be parsed, analyzed, and
optimized. This represents a significant overhead for statements that are executed
frequently. A statement can be prepared once for multiple execution:
The statement is then executed by passing arguments to take the place of the question
marks:
By preparing SQL statements once for multiple executions, you can notice a significant
performance improvement.
When you have a large number of rows to insert, you can optimize the communication with
the database server by buffering the rows. The general idea is to open an insert cursor
and use the PUT command to insert the row. Here is some pseudo-code that illustrates the
process:
EXEC SQL PREPARE stmt FROM
"INSERT INTO customer VALUES(?,?,?,?);
EXEC SQL DECLARE cur CURSOR FOR stmt;
EXEC SQL OPEN cur;
for (....) {
....
EXEC SQL PUT cur FROM :val1, :val2, :val3, :val4;
}
EXEC SQL CLOSE cur;
EXEC SQL FREE cur;
EXEC SQL FREE stmt;
It is also possible to tune the buffer size. This could provide additional performance
benefits. Please see the INFORMIX-SQL reference manual for information on the FET_BUF_SIZE
environment variable. The default can also be set in the sqlhosts file.
Informix Dynamic Server with Universal Data Option provides a new set of possibilities.
According to Michael Stonebraker ("Betting on ORDBMS," Byte Magazine,
April 1998), we can now choose between thick client, thick middleware, and thick database.
In fact, it could be a mix of the three, depending on the processing requirements.
Universal Data Option can be part of the application instead of being a rigid
persistence mechanism. Most businesses have particular ways to manipulate their business
data. This often requires custom application development, with all the additional
functionality required for reporting, printing, etc. This particular way to manipulate
data can be included in the server to provide the proper answer. This way, common
off-the-shelf products can be used to provide the user interface and additional
functionality.
The benefits extend to performance. The full power of sophisticated algorithms and
parallelism in the engine can be tapped by adding simple functions that will provide a
better way to manipulate a specific type of data. Here is a simple example. Some
Departments of Motor Vehicles store a license number that also includes the county of
registration and the type of vehicle. To find out how many vehicles of each type have been
registered for the month of September, you would need to pull all the records in a custom
application and count them. With Informix Dynamic Server with Universal Data Option, you
can add a function that will extract the type of vehicle from the license number.
Answering the question becomes a simple SQL statement like:
The vehicleType() function is very simple. It takes a license number as
input and returns the type of vehicle it represents. A custom application would have
required this function and additional code to process each row coming from the database.
This represents more complexity than just adding the function in the server. Furthermore,
the cost of transferring data to the application is significantly greater than doing the
processing in the server. The result of the function vehicleType() could also
be indexed. The index would be used, if it made sense, and performance would increase
accordingly.
A lot more can be done with Universal Data Option. You can create new types, new
operators, and even new aggregation functions. This is discussed in greater detail in the
book "Informix Dynamic Server with Universal Data Option: Best Practices"
listed in the reference section.
Performance is a complicated subject, and estimating performance is even more complex.
This article shows that processing costs increase dramatically with the distance from the
processor: cache, memory, disk, and network. Furthermore, since the CPU is orders of
magnitude faster than I/O devices, we should always try to optimize the I/O by balancing
the requests on as many devices as it makes sense for a specific application.
Operating systems try to optimize resource utilization. Each request made by
application programs incurs a significant overhead. It is sometime possible to reduce the
number of requests by grouping them together. This frees precious CPU cycles for more
productive work.
Informix Dynamic Server is a sophisticated server that can be tuned to use as much
resources as you see fit. Make sure it is tuned properly. Distribute your I/O on as many
devices as possible, and don't forget to update the statistics so the optimizer can pick
the right query plan.
The application code has the most impact on performance. The key is how you solve the
problem. Some solutions processing increase performance exponentially, others linearly.
Some programming techniques will improve performance with minimal effort. This includes
optimized compilation and simple programming techniques like binary searches.
By keeping these simple concepts in mind during the design and implementation phases,
it is possible to have applications that provide adequate performance at delivery.
- Jon Bentley, Programming Pearls, ISBN 0-201-10331-1, Addison-Wesley, 1986
- Jon Bentley, More Programming Pearls, ISBN 0-201-11889-0, Addison-Wesley, 1988
- Maurice J. Bach, The Design of the UNIX Operating System, ISBN 0-13-201799-5,
Prentice-Hall, 1986
- Samuel J., Leffler ...[et al.], The Design and Implementation of the 4.3 BSD UNIX
Operating System, ISBN 0-201-06196-1, Addison-Wesley, 1989
- Andrew S. Tanenbaum, Operating Systems Design and Implementation, ISBN 0-13-637406-9,
Prentice-Hall, 1987
- Bil Lewis and Daniel J Berg, Threads Primer, ISBN 0-13-443698-9, SunSoft Press, 1996
- Douglas Comer, Internetworking with TCP/IP, ISBN 0-13-470154-2, Prentice-Hall, 1988
- Jacques Roy, Threaded Application Development Using Informix-Client SDK: A Practical
Guide, Informix Tech Notes Volume 8, Issue 3, 1998
- Informix Press, Informix Dynamic Server with Universal Data Option: Best Practices, ISBN
0-13-911074-7, 1998
Jacques Roy, a frequent Tech Notes contributor, is a member of the Informix Advanced
Technology Group in Denver, Colorado. He can be contacted via email at jroy@informix.com.
|