Checkpoint tuning: how to find the bottlenecks in the checkpoint duration time (Technote)


Technote (FAQ)
Question
This article describes how to find the bottlenecks that lengthen the checkpoint duration.
Answer
Why is it important to reduce the checkpoint duration?
The database server prevents user threads from entering a critical section while a checkpoint is in progress. A checkpoint can sometimes take several seconds or even a minute to complete, which can delay user activity and degrade performance.
How to reduce the checkpoint duration
In a full checkpoint, the database server flushes all modified pages in the shared-memory buffer pool to disk. The duration of this activity therefore depends on:
Disk access,
CPU processing, to find the dirty pages and sort them so they can be written with chunk writes,
Informix processes, to manage the FLRU queues,
The number and length of the LRU queues.
The DBA can check how many user threads had to wait for a checkpoint to complete, which indicates how serious the problem is: run onstat -p and check the value of "ckpwaits". Reducing the time needed to perform a checkpoint reduces the number of user threads that have to wait for it to complete, and thus improves performance.
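If you want to track ckpwaits over time, a small script can pull the counter out of the onstat -p output. The following is a minimal Python sketch, assuming the usual onstat -p layout in which a row of column names (including ckpwaits) is directly followed by a row of values; parse_onstat_field and current_ckpwaits are hypothetical helper names, not part of Informix.

```python
import subprocess
from typing import Optional


def parse_onstat_field(text: str, field: str) -> Optional[int]:
    """Return the integer under `field` in onstat -p style output.

    Assumes (hypothetically) that a header row of column names is directly
    followed by a value row with the same number of columns.
    """
    rows = [line.split() for line in text.splitlines() if line.strip()]
    for header, values in zip(rows, rows[1:]):
        if field in header and len(values) == len(header):
            return int(values[header.index(field)])
    return None


def current_ckpwaits() -> Optional[int]:
    """Run onstat -p on this host and return the ckpwaits counter."""
    # Requires the Informix environment (INFORMIXDIR, INFORMIXSERVER, PATH).
    result = subprocess.run(["onstat", "-p"], capture_output=True, text=True, check=True)
    return parse_onstat_field(result.stdout, "ckpwaits")
```

Calling current_ckpwaits() before and after a tuning change, at the same time of day, is one simple way to see whether fewer threads are waiting on checkpoints. The onstat -p counters are cumulative since the server started or since the statistics were last reset with onstat -z, so compare differences rather than raw values.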
The bottleneck is usually in one or more of the following areas, each of which is explained below:
Disk device(s)
Disk controller(s)
CPU(s)
AIO VP(s)
CPU VP(s)
Page cleaners
Number of LRU queues
Number of modified pages in the buffer cache
Disk device
If the disk drives you are writing to are already 100% busy, spread the data more evenly across the disks you have available, by using partitioning or fragmentation.
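Once per-disk %busy figures have been collected during checkpoints (for example with sar -d or iostat, whose output formats differ by platform and are not parsed here), a short Python sketch can pick out the saturated disks and the least busy disk that could take some of the load. The helper names, the input dictionary and the comparison by average are illustrative assumptions, not part of the technote.

```python
from statistics import mean


def saturated_disks(busy_samples, limit=100.0):
    """Disks whose %busy reached `limit` in every sample.

    busy_samples maps a disk name to a list of %busy readings taken
    during checkpoints (hypothetical input collected by the DBA).
    """
    return sorted(disk for disk, s in busy_samples.items() if s and min(s) >= limit)


def busiest_and_idlest(busy_samples):
    """Return (busiest disk, least busy disk) by average %busy."""
    averages = {disk: mean(s) for disk, s in busy_samples.items() if s}
    busiest = max(averages, key=averages.get)
    idlest = min(averages, key=averages.get)
    return busiest, idlest
```

Disks that stay on the saturated list are candidates for moving tables or fragments toward the less busy disks.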
The Physical CPU(s)
On UNIX, CPU usage can be monitored with sar, vmstat, or glance.
If the CPU(s) are 100% busy all of the time, reduce contention or add more or faster CPU(s).
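As a sketch of the CPU check, the Python below averages the idle ("id") column of captured vmstat output and reports whether the CPUs look saturated. It assumes the vmstat layout used on Linux and AIX, where a header row contains the token id and data rows start with a number; the 5% idle threshold is a hypothetical value, not one given in the technote.

```python
def average_idle(vmstat_text):
    """Average the "id" (idle %) column over all data rows of vmstat output."""
    idle_col = None
    values = []
    for line in vmstat_text.splitlines():
        fields = line.split()
        if "id" in fields:
            idle_col = fields.index("id")  # header row: remember the column position
        elif idle_col is not None and len(fields) > idle_col and fields[0].isdigit():
            values.append(float(fields[idle_col]))
    if not values:
        raise ValueError("no vmstat data rows found")
    return sum(values) / len(values)


def cpu_saturated(vmstat_text, idle_threshold=5.0):
    """True if the CPUs were almost never idle (hypothetical threshold)."""
    return average_idle(vmstat_text) < idle_threshold
```

On Linux the first vmstat data row reports averages since boot, so you may want to drop it from the capture before averaging.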
AIO VP(s)/KAIO VP(s)
Use onstat -g ioq to monitor the queue lengths and maximum queue lengths of the AIO VP/KAIO VP queues and of the GFD queues. A large maximum length (greater than 25) or a consistently high length (greater than 10) on any of these queues means that those I/O requests are not being serviced fast enough. If you are using AIO, add more AIO VPs; if you are using KAIO, add more CPU VPs.
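The two thresholds above can be applied mechanically to several onstat -g ioq snapshots taken during a checkpoint. This Python sketch assumes the column order q name, id, len, maxlen, totalops, dskread, dskwrite, dskcopy; check it against the output of your version. A queue is flagged if its maximum length exceeds 25, or if its current length exceeds 10 in every snapshot, which is one possible reading of "consistent length". The function names are hypothetical.

```python
from collections import defaultdict

MAX_LEN_LIMIT = 25     # "large max length (>25)"
STEADY_LEN_LIMIT = 10  # "consistent length (>10)"


def parse_ioq(text):
    """Parse onstat -g ioq rows into {(queue name, id): (len, maxlen)}.

    Assumed columns: q name, id, len, maxlen, totalops, dskread, dskwrite, dskcopy.
    Header and title lines are skipped because their fields are not numeric.
    """
    rows = {}
    for line in text.splitlines():
        f = line.split()
        if len(f) >= 8 and all(x.isdigit() for x in f[1:8]):
            rows[(f[0], int(f[1]))] = (int(f[2]), int(f[3]))
    return rows


def slow_queues(snapshots):
    """Flag queues with maxlen > 25, or len > 10 in every snapshot."""
    lengths = defaultdict(list)
    maxlens = {}
    for text in snapshots:
        for key, (qlen, maxlen) in parse_ioq(text).items():
            lengths[key].append(qlen)
            maxlens[key] = max(maxlens.get(key, 0), maxlen)
    flagged = []
    for key in sorted(lengths):
        steady = len(lengths[key]) == len(snapshots) and min(lengths[key]) > STEADY_LEN_LIMIT
        if maxlens[key] > MAX_LEN_LIMIT or steady:
            flagged.append(key)
    return flagged
```

Flagged aio queues point toward adding AIO VPs; flagged kio queues point toward adding CPU VPs, following the rule in the paragraph above.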
Check the GFD
In the onstat -g ioq output, the GFDs (global file descriptors) are the open file handles to the chunks; the instance has one GFD per chunk. Compare the dskread and dskwrite columns across all chunks to see whether one or more chunks have a much higher number of reads or writes than the others. If so, some table in that chunk has a high level of activity, so try to distribute the tables more evenly.
Note that the second column, "ID", is the chunk ID, which you can use with the onstat -d output to find the chunk name and position. Use oncheck -pe to find the contents of the chunk.
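To compare GFD activity across chunks, the Python sketch below sums dskread and dskwrite for each gfd row of onstat -g ioq output and flags chunks whose total is far above the others. It assumes the column order q name, id, len, maxlen, totalops, dskread, dskwrite, dskcopy. The technote does not define "very high", so the factor of 3 relative to the mean of the other chunks is an arbitrary, hypothetical choice, as are the function names.

```python
def gfd_activity(text):
    """Return {chunk_id: dskread + dskwrite} for the gfd rows of onstat -g ioq."""
    activity = {}
    for line in text.splitlines():
        f = line.split()
        if len(f) >= 8 and f[0] == "gfd" and all(x.isdigit() for x in f[1:8]):
            activity[int(f[1])] = int(f[5]) + int(f[6])
    return activity


def hot_chunks(activity, factor=3.0):
    """Chunks whose I/O count exceeds `factor` times the mean of the other chunks.

    Comparing against the other chunks keeps a single hot chunk from
    inflating the average it is measured against.
    """
    hot = []
    for chunk, ops in activity.items():
        others = [v for c, v in activity.items() if c != chunk]
        if others and ops > factor * (sum(others) / len(others)):
            hot.append(chunk)
    return sorted(hot)
```

The chunk IDs returned can then be looked up in onstat -d and examined with oncheck -pe, as described above.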
If the AIO queues are keeping up but performance is still not satisfactory, the requests may not be reaching the queues fast enough; in that case, check the page cleaners.
The page cleaners
The page cleaners are responsible for placing the write requests on the queues, so adding more page cleaners can get those requests out faster. Usually there should be one cleaner per pair of LRU queues or one cleaner per chunk. The status of the page cleaners can be checked with onstat -F. While the checkpoint is running, execute onstat -F -r 1 and check the "state" column, which should show "C" (checkpoint); then check the "data" column, which shows the ID of the chunk the cleaner is working on. If the cleaners spend more time on some chunks than on others, those chunks may be congested, and you need to spread the tables more evenly by performing a database reorganization.
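To see where the cleaners spend their time, you can capture onstat -F -r 1 output to a file during a checkpoint and tally the "data" column for the rows whose state is "C". This Python sketch assumes table rows of the form address, flusher, state, data (with data printed first as a decimal chunk number) under a header line "address flusher state data"; verify this against the output of your version. The function name is hypothetical.

```python
from collections import Counter


def cleaner_chunk_counts(onstat_f_text):
    """Count how often page cleaners in state "C" were seen on each chunk.

    Works on repeated output (onstat -F -r 1): every table section is tallied.
    """
    counts = Counter()
    in_table = False
    for line in onstat_f_text.splitlines():
        f = line.split()
        if f[:4] == ["address", "flusher", "state", "data"]:
            in_table = True  # start of a cleaner table
        elif in_table and len(f) >= 4 and f[1].isdigit() and f[3].isdigit():
            if f[2] == "C":
                counts[int(f[3])] += 1  # data column = chunk ID
    return counts
```

counts.most_common(3) then lists the chunks the cleaners were seen on most often, which are the candidates for reorganization.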
LRU
Make sure there is at least one LRU queue per CPU VP. A CPU VP places a mutex on the LRU queue it is accessing in order to change a page in the buffer cache, so if there are fewer queues than CPU VPs, contention can occur. Try increasing the number of LRU queues and check the performance again. With a large buffer cache, more LRU queues also make each queue shorter.
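The LRU rule of thumb and the queue-length effect come down to simple arithmetic. The Python below is an illustrative sketch with a hypothetical function name; its inputs correspond to your buffer count, number of LRU queues and number of CPU VPs from the ONCONFIG (the parameter names differ between versions), and it assumes the buffers are spread evenly across the queues, which is an approximation.

```python
def lru_report(buffers, lru_queues, cpu_vps):
    """Apply the one-LRU-queue-per-CPU-VP minimum and estimate queue length.

    Assumes buffers are spread evenly over the LRU queues (an approximation).
    """
    if lru_queues <= 0 or cpu_vps <= 0:
        raise ValueError("lru_queues and cpu_vps must be positive")
    return {
        "enough_queues": lru_queues >= cpu_vps,      # at least one queue per CPU VP
        "buffers_per_queue": buffers // lru_queues,  # approximate queue length
    }
```

For example, with 200,000 buffers and 4 CPU VPs, 8 LRU queues satisfy the minimum and give about 25,000 buffers per queue; 32 queues would give about 6,250.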
KAIO vs AIO
Check whether you are using KAIO or AIO; KAIO is usually faster.
CKPTINTVL
Check the checkpoint interval (CKPTINTVL) in the ONCONFIG file. If it is too high, a large number of modified pages accumulates between checkpoints and must be flushed to disk when the checkpoint runs. Try reducing the value of CKPTINTVL.
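Before changing CKPTINTVL, it helps to record the current value. This Python sketch reads ONCONFIG text (parameter name, whitespace, value, optional # comment) and returns CKPTINTVL in seconds. The helper names are hypothetical, and the file location (usually $INFORMIXDIR/etc/$ONCONFIG) should be checked on your system.

```python
from typing import Dict, Optional


def read_onconfig(text: str) -> Dict[str, str]:
    """Parse ONCONFIG text into {PARAMETER: value}, ignoring # comments."""
    params = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split(None, 1)
        # In this sketch, a later occurrence of a parameter overwrites an earlier one.
        params[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return params


def checkpoint_interval(text: str) -> Optional[int]:
    """Return CKPTINTVL in seconds, or None if it is not set."""
    value = read_onconfig(text).get("CKPTINTVL")
    return int(value) if value else None
```

Recording the value this way before each trial makes it easy to put the previous setting back if a reduction does not help.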
Important: Apply these suggestions one at a time, and monitor the effect of each change on the system. Also note that when tuning a system, a change can sometimes improve performance in one area to the detriment of others, so a methodical approach is needed.
The suggested procedure is:
1. Perform the tuning under the same conditions each time, for example every day at the same time of day.
2. Apply only one change at a time, then compare the performance. If performance improves, keep the change; otherwise restore the previous values. If performance becomes acceptable after a specific change, there may be no need for further tuning.
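The keep-or-revert rule in step 2 can be written down as a small Python sketch. It assumes you measure a single metric where lower is better (for example the checkpoint duration, or the increase in ckpwaits over the same time window) before and after each change; the Trial record, the decide function and the optional target are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Trial:
    parameter: str   # e.g. "CKPTINTVL"
    old_value: str
    new_value: str
    baseline: float  # metric before the change (lower is better)
    measured: float  # same metric, same conditions, after the change


def decide(trial: Trial, target: Optional[float] = None) -> Tuple[str, bool]:
    """Return (value to keep, whether tuning can stop)."""
    improved = trial.measured < trial.baseline
    value = trial.new_value if improved else trial.old_value
    best = min(trial.measured, trial.baseline)
    done = target is not None and best <= target
    return value, done
```

Keeping one Trial per change also gives a simple record of what was tried and why it was kept or reverted.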
More information about the checkpoint process can be found in the IBM Informix manuals "Administrator's Guide" and "Performance Tuning".
Related information
What is Checkpoint duration?

Document information

Product categories: Software > Data Servers (Database Management Systems) > Informix Dynamic Server (IDS)

Operating system(s): AIX, DYNIX/ptx, Digital Unix (OSF1) (TRU64), HP-UX, IRIX, Linux, Reliant UNIX, Sinix, Solaris, Windows

Software version: 7.3, 9.4, 10.0

Software edition: Enterprise, Workgroup

Reference #: 1250841

IBM Group: Software Group

Modified date: 2008-11-25