Overview of AIX page replacement (David Hepkin, IBM developerWorks, January 2008)

Table of contents
Introduction
Page classification
Page replacement
Monitoring a system's memory usage
Displaying and setting tunable parameters
Suggested tunable parameter settings
Conclusion
Resources
About the author
The AIX® virtual memory manager (AIX VMM) is a page-based virtual memory manager. A page is a fixed-size block of data. A page might be resident in memory (that is, mapped into a location in physical memory), or a page might be resident on a disk (that is, paged out of physical memory into paging space or a file system).
One unique aspect of the AIX VMM is the management of cached file data. The AIX VMM integrates cached file data with the management of other types of virtual memory (for example, process data, process stack, and so forth). It caches the file data as pages, just like virtual memory for processes.
AIX maps pages into real memory based on demand. When an application references a page that is not mapped into real memory, the system generates a page fault. To resolve the page fault, the AIX kernel loads the referenced page to a location in real memory. If the referenced page is a new page (that is, a page in a data heap of the process that has never been previously referenced), "loading" the referenced page simply means filling a real memory location with zeros (that is, providing a zero-filled page). If the referenced page is a pre-existing page (that is, a page in a file or a previously paged out page), loading the referenced page involves reading the page from the disk (paging space or disk file system) into a location in real memory.
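The fault-handling path described above can be sketched as a small Python model. This is an illustration only, not AIX kernel code: the names `resolve_page_fault`, `real_memory`, and `backing_store` are hypothetical, and the 4096-byte page size is an assumption made for the sketch.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


def resolve_page_fault(page_id, real_memory, backing_store):
    """Load page_id into real_memory and report how it was loaded.

    real_memory:   dict of page_id -> bytes for resident pages (hypothetical)
    backing_store: dict of page_id -> bytes for pages that exist on disk,
                   that is, in paging space or a file system (hypothetical)
    """
    if page_id in real_memory:
        return "resident"  # already mapped, so no page fault occurs
    if page_id in backing_store:
        # Pre-existing page: read it from disk into real memory.
        real_memory[page_id] = backing_store[page_id]
        return "read-from-disk"
    # New page that was never referenced before: provide a zero-filled page.
    real_memory[page_id] = bytes(PAGE_SIZE)
    return "zero-fill"
```

A first reference to a heap page returns "zero-fill", while a first reference to a page that already exists in a file returns "read-from-disk".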
Once a page is loaded into real memory, it is marked as unmodified. If a process or the kernel modifies the page, the state of the page changes to modified. This allows AIX to keep track of whether a page has been modified after it was loaded into memory.
As the system adds more pages into real memory, the number of empty locations in real memory that can contain pages decreases. The number of empty locations is also referred to as the number of free page frames. When the number of free page frames gets to a low value, the AIX kernel must empty out some locations in real memory so that they can be reused for new pages. This process is known as page replacement.
The AIX VMM has background daemons responsible for doing page replacement. A page replacement daemon is referred to as lrud (it shows up as lrud in the output of ps -k). The lrud daemons are responsible for scanning in-memory pages and evicting pages in order to empty locations in real memory. When a page replacement daemon determines that it wants to evict a specific page, the page replacement daemon does one of two things:
If the page is modified, the page replacement daemon writes the page out to a secondary storage location (for example, paging space or file system disk). The physical memory block that contains the page is marked as free and ready for reuse for additional pages.
If the page is unmodified, the page replacement daemon can simply mark the physical memory block as free, and the physical memory block can then be re-used for another page. In this case, the page replacement daemon does not have to write the page out to disk, because the in-memory version of the page is unmodified, and thus is identical to the copy of the page that resides on the disk (in paging space or on a disk file system).
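The two eviction cases above can be modelled in a few lines of Python. This is a simplified sketch, not the lrud implementation: the page dictionary and the `write_out` callback are hypothetical stand-ins for kernel structures.

```python
def evict_page(page, write_out):
    """Evict one page and return True if a disk write was needed.

    page:      dict with boolean "modified" and "resident" keys (hypothetical)
    write_out: callable that writes the page to secondary storage
               (paging space or a file system disk)
    """
    wrote = False
    if page["modified"]:
        write_out(page)            # the disk copy is stale, so flush it first
        page["modified"] = False   # memory and disk now hold the same data
        wrote = True
    page["resident"] = False       # the page frame is free for reuse
    return wrote
```

Only the modified page costs an I/O operation, which is why evicting unmodified pages is cheaper.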
The page replacement daemons target different types of pages for eviction based on system memory usage and tunable parameters. The remainder of this article provides details on how the page replacement daemons target pages for eviction.
Fundamentally, there are two types of pages on AIX:
Working storage pages
Permanent storage pages
Working storage pages are pages that contain volatile data (in other words, data that is not preserved across a reboot). On other platforms, working storage memory is sometimes referred to as anonymous memory. Examples of virtual memory regions that consist of working storage pages are:
Process data
Stack
Shared memory
Kernel data
When modified working storage pages need to be paged out (moved from memory to the disk), they are written to paging space. Working storage pages are never written to a file system.
When a process exits, the system releases all of its private working storage pages. Thus, the system releases the working storage pages for a process's data and stack when the process exits. The working storage pages for shared memory regions are not released until the shared memory region is deleted.
Permanent storage pages are pages that contain permanent data (that is, data that is preserved across a reboot). This permanent data is just file data. So, permanent storage pages are basically just pieces of files cached in memory.
When a modified permanent storage page needs to be paged out (moved from memory to disk), it is written to a file system. As mentioned earlier, an unmodified permanent storage page can just be released without being written to the file system, since the file system contains a pristine copy of the data.
For example, if an application is reading a file, the file data is cached in memory in permanent storage pages. These permanent storage pages are unmodified, meaning that the pages have not been modified in memory. So, the in-memory permanent storage pages are equivalent to the file data on the disk. When AIX needs to free up memory, it can just "release" these pages without having to write anything to disk. If the application had been doing writes to the file instead of reads, the permanent storage pages would be "modified," and AIX would have to flush the pages to disk before releasing the pages.
You can divide permanent storage pages into two sub-types:
Client pages
Non-client pages
Non-client pages are pages containing cached Journaled File System (JFS) file data. Non-client pages are sometimes referred to as persistent pages. Client pages are pages containing cached data for all other file systems (for example, JFS2 and Network File System (NFS)).
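The storage classification described so far can be summarized as a lookup. The sketch below is illustrative only: the function names and the string labels (such as "anonymous" for process data, stack, shared memory, and kernel data) are assumptions chosen for this example, not AIX identifiers.

```python
def page_type(backing):
    """Map what backs a page to its storage type.

    backing: "anonymous" for volatile data, or a file system name
             such as "jfs", "jfs2", or "nfs" (hypothetical labels)
    """
    if backing == "anonymous":
        return "working"
    if backing == "jfs":
        return "non-client"  # also called persistent
    return "client"          # every other file system


def pageout_target(ptype):
    """Where a modified page of the given type is written on page-out."""
    return "paging space" if ptype == "working" else "file system"
```

For example, `pageout_target(page_type("jfs2"))` gives "file system", while a modified stack page goes to "paging space".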
To help optimize which pages are selected for replacement by the page replacement daemons, AIX classifies pages into one of two types:
Computational pages
Non-computational pages
Computational pages are pages used for the text, data, stack, and shared memory of a process. Non-computational pages are pages containing file data for files that are being read and written.
All working storage pages are computational. A working storage page is never marked as non-computational.
Depending on how they are used, permanent storage pages can be computational or non-computational. If a file contains executable text for a process, the system treats the file as computational and marks all of the permanent storage pages in the file as computational. If the file does not contain executable text, the system treats the file as a non-computational file and marks all of the pages in the file as non-computational.
When you first open a file, the AIX kernel creates an internal VMM object to represent the file. It marks it as non-computational, meaning all files start out as non-computational.
As a program does reads and writes to the file, the AIX kernel caches the file's data in memory as non-computational permanent storage pages.
If the file is closed, the AIX kernel continues to cache the file data in memory (in permanent storage pages). The kernel continues to cache the file for performance; for example, if another process comes along later and uses the same file, the file data is still in memory, and the AIX kernel does not have to read the file data in from disk.
A non-computational file transitions to the computational state only when a page fault is taken on the file due to an instruction fetch. A process page faults on a file when it references a part of the file that is not currently cached in memory in a permanent storage page. If the page fault is due to an instruction fetch (meaning the process was trying to load an instruction from the page to execute), the kernel marks the file as computational. This involves marking all pages in the file as computational. A file is either completely computational or completely non-computational.
Once a file has been marked as computational, it remains marked as a computational file until the file is deleted (or the system is rebooted). Thus, a file remains marked as computational even after it is moved or renamed.
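The life cycle of a file's computational state, as described above, behaves like a one-way flag. The following Python class is a hypothetical stand-in for the internal VMM object; its name and methods are assumptions made for illustration.

```python
class FileObject:
    """Hypothetical model of the VMM object that represents a file."""

    def __init__(self, name):
        self.name = name
        self.computational = False  # every file starts out non-computational

    def page_fault(self, instruction_fetch):
        # Only an instruction-fetch fault changes the state, and the change
        # covers the whole file rather than individual pages.
        if instruction_fetch:
            self.computational = True

    def close(self):
        # Closing the file does not reset the flag; its pages stay cached.
        pass

    def rename(self, new_name):
        # The flag survives a move or rename.
        self.name = new_name
```

Once `page_fault(True)` has been called, no later read, close, or rename returns the object to the non-computational state; in the article's terms, only deleting the file or rebooting does.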
The AIX page replacement daemons scan memory a page at a time to find pages to evict in order to free up memory. The page replacement daemons must choose pages carefully to minimize the performance impact of paging on the system, and the page replacement daemons target pages of different classes based on tunable parameter settings and system conditions.
There are a number of tunable parameters that you can use to control how AIX selects pages to replace.
The two most basic page replacement tunable parameters are minperm and maxperm. These tunable parameters are used to indicate how much memory the AIX kernel should use to cache non-computational pages. The maxperm tunable parameter indicates the maximum amount of memory that should be used to cache non-computational pages.
By default, maxperm is an "un-strict" limit, meaning that the limit can be exceeded. Making maxperm an un-strict limit allows more non-computational files to be cached in memory when there is available free memory. The maxperm limit can be made a "strict" limit by setting the strict_maxperm tunable parameter to 1. When maxperm is a strict limit, the kernel does not allow the number of non-computational pages to exceed maxperm, even if there is free memory available. Thus, the disadvantage of making maxperm a strict limit is that the number of non-computational pages cannot grow beyond maxperm and consume more memory when there is free memory on the system.
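The difference between a strict and an un-strict maxperm limit can be expressed as a small predicate. This is a deliberately simplified sketch: the function name and its arguments are hypothetical, all quantities are plain page counts, and the real kernel decision involves more state than shown here.

```python
def may_cache_file_page(numperm, maxperm, strict_maxperm, free_frames):
    """Return True if one more non-computational page may be cached.

    numperm, maxperm: page counts (the tunable itself is a percentage)
    strict_maxperm:   1 for a strict limit, 0 for the default un-strict limit
    free_frames:      number of free page frames
    """
    if numperm < maxperm:
        return True            # still under the limit in either mode
    if strict_maxperm:
        return False           # hard cap, even when memory is free
    return free_frames > 0     # un-strict: may exceed while memory is free
```

With `numperm` equal to `maxperm` and free frames available, the un-strict setting allows the file cache to keep growing and the strict setting does not.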
The minperm limit indicates the target minimum amount of memory that should be used for non-computational pages.
The number of non-computational pages is referred to as numperm. The vmstat -v command displays the numperm value for a system as a percentage of a system's real memory.
Figure 1 below gives an overview of how these tunable parameters work under different system conditions:

When the number of non-computational pages (numperm) is greater than or equal to maxperm, the AIX page replacement daemons strictly target non-computational pages (for example, cached files that are not executables).
When the number of non-computational pages (numperm) is less than or equal to minperm, the AIX page replacement daemons target both computational and non-computational pages. In this case, AIX scans both classes of pages and evicts the least recently used pages.
When the number of non-computational pages (numperm) is between minperm and maxperm, the lru_file_repage tunable parameter controls what kind of pages the AIX page replacement daemons should steal (see Figure 2).

When numperm is between minperm and maxperm, the AIX page replacement daemons determine what type of pages to target based on their internal re-paging table when the lru_file_repage tunable parameter is set to 1.
The AIX kernel maintains a re-paging table in order to identify pages that are paged out and then quickly paged back in. When the kernel pages a page out and then pages it back in, it usually indicates that there is strong demand for the page and that the page should stay in memory. The kernel maintains an indication of how many times it re-pages computational pages and how many times it re-pages non-computational pages. The AIX kernel can then use this information to determine which class of pages is being re-paged more heavily (thus, indicating which class of pages is experiencing higher demand). When the lru_file_repage tunable parameter is set to 1, the AIX kernel uses this re-paging information to determine whether to target just non-computational pages or target both computational and non-computational pages. If the rate of re-paging computational pages is higher than that of non-computational pages, the AIX kernel just targets non-computational pages (since there appears to be stronger demand for computational pages). If the rate of re-paging non-computational pages is higher than that of computational pages, the AIX kernel targets both computational as well as non-computational pages.
In most customer environments, it is optimal to have the kernel always target non-computational pages, because paging computational pages (for example, a process's stack, data, and so forth) usually has a much higher performance cost on a process than paging non-computational pages (that is, data file cache). Thus, the lru_file_repage tunable parameter can be set to 0. In this case, the AIX kernel always targets non-computational pages when numperm is between minperm and maxperm.
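The targeting rules for the three numperm ranges can be collected into one decision function. The sketch below follows the rules as stated in the text; the function and argument names are hypothetical, and the handling of exactly equal re-paging rates is an assumption, because the article does not cover that case.

```python
FILE_ONLY = frozenset({"non-computational"})
BOTH = frozenset({"computational", "non-computational"})


def replacement_targets(numperm, minperm, maxperm, lru_file_repage,
                        comp_repage_rate=0.0, file_repage_rate=0.0):
    """Return the page classes the replacement daemons would target.

    numperm, minperm, maxperm: values in the same unit (for example, percent)
    lru_file_repage:           0 or 1, as set with vmo
    *_repage_rate:             hypothetical re-paging rates for each class
    """
    if numperm >= maxperm:
        return FILE_ONLY
    if numperm <= minperm:
        return BOTH
    # From here on, minperm < numperm < maxperm.
    if lru_file_repage == 0:
        return FILE_ONLY
    if file_repage_rate > comp_repage_rate:
        return BOTH
    # Computational pages are re-paged more heavily. Equal rates also land
    # here, which is an assumption of this sketch.
    return FILE_ONLY
```

With the suggested setting `lru_file_repage=0`, the function returns only the non-computational class for every numperm value above minperm.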
In addition to the minperm and maxperm tunable parameters, there is also a maxclient tunable parameter. The maxclient tunable parameter specifies a limit on the maximum amount of memory that should be used to cache non-computational client pages. Because all non-computational client pages are a subset of the total number of non-computational permanent storage pages, the maxclient limit must always be less than or equal to the maxperm limit.
The number of non-computational client pages is referred to as numclient. The vmstat -v command displays the numclient value for a system as a percentage of a system's real memory.
By default, the maxclient limit is a strict limit. This means that the AIX kernel does not allow the non-computational client file cache to exceed the maxclient limit (that is, the AIX kernel does not allow numclient to exceed maxclient). When numclient reaches the maxclient limit, the AIX kernel starts page replacement in a special, client-only mode. In this client-only mode, the AIX page replacement daemons strictly target client pages.
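The maxclient rules above reduce to two checks, sketched below in Python. Both function names are hypothetical, and the first function models only the default case in which maxclient is a strict limit.

```python
def client_only_mode(numclient, maxclient):
    """True when page replacement would run in client-only mode.

    Models the default, strict maxclient limit: the mode starts as soon
    as numclient reaches maxclient.
    """
    return numclient >= maxclient


def check_limits(maxclient, maxperm):
    """Raise ValueError if maxclient is larger than maxperm."""
    if maxclient > maxperm:
        raise ValueError("maxclient must be less than or equal to maxperm")
```

A call such as `check_limits(90, 90)` passes, which matches the suggested settings where both limits are 90 percent.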
AIX provides several tools that report counts of the different types of pages on the system.
The vmstat command reports information about a system's memory usage and statistics about VMM operations like page replacement.
The -v option specified with the vmstat command displays the percentage of real memory being used for different classifications of pages (see Listing 1):
# vmstat -v
4980736 memory pages
739175 lruable pages
432957 free pages
1 memory pools
84650 pinned pages
80.0 maxpin percentage
20.0 minperm percentage <<- system's minperm% setting
80.0 maxperm percentage <<- system's maxperm% setting
2.2 numperm percentage <<- % of memory containing non-comp. pages
16529 file pages <<- # of non-comp. pages
0.0 compressed percentage
0 compressed pages
2.2 numclient percentage <<- % of memory containing non-comp. client pages
80.0 maxclient percentage <<- system's maxclient% setting
16503 client pages <<- # of client pages
0 remote pageouts scheduled
0 pending disk I/Os blocked with no pbuf
0 paging space I/Os blocked with no psbuf
2484 filesystem I/Os blocked with no fsbuf
0 client filesystem I/Os blocked with no fsbuf
0 external pager filesystem I/Os blocked with no fsbuf
0 Virtualized Partition Memory Page Faults
0.00 Time resolving virtualized partition memory page faults
So, in the above example, there are 16529 non-computational file pages mapped into memory. These non-computational pages consume 2.2 percent of memory. Of these 16529 non-computational file pages, 16503 of them are client pages.
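Reading these counters by hand is error-prone, so a script can extract them. The following Python sketch parses text in the "value label" layout of the listing above. The sample string repeats a few lines of that listing without the "<<-" annotations (which are the article's notes, not command output), and the function name `parse_vmstat_v` is hypothetical.

```python
import re

SAMPLE = """\
   4980736 memory pages
      20.0 minperm percentage
      80.0 maxperm percentage
       2.2 numperm percentage
     16529 file pages
       2.2 numclient percentage
      80.0 maxclient percentage
     16503 client pages
"""


def parse_vmstat_v(text):
    """Return {label: number} for each 'value label' line in text."""
    stats = {}
    for line in text.splitlines():
        match = re.match(r"\s*(\d+(?:\.\d+)?)\s+(.+?)\s*$", line)
        if match:
            value, label = match.groups()
            # Keep counts as int and percentages as float.
            stats[label] = float(value) if "." in value else int(value)
    return stats


stats = parse_vmstat_v(SAMPLE)
# Non-computational pages that are not client pages (that is, JFS pages).
non_client_file_pages = stats["file pages"] - stats["client pages"]
```

For the sample, `non_client_file_pages` is 26, the difference between the 16529 file pages and the 16503 client pages.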
The vmstat output does not provide information about computational file pages. Information about computational file pages can be gathered from the svmon command.
Another command that can be used to display information about a system's memory usage is the svmon command. The svmon command supports a number of different options for providing very detailed information about a system's memory usage.
The -G option to the svmon command displays information about how much memory is being used for different types of pages (see Listing 2):
# svmon -G
               size      inuse       free        pin    virtual
memory       786432     209710     576722     133537     188426
pg space     131072       1121

               work       pers       clnt
pin          133537          0          0
in use       188426          0      21284

PageSize   PoolSize      inuse       pgsp        pin    virtual
s    4 KB         -     103966       1121      68929      82682
m   64 KB         -       6609          0       4038       6609
To understand how a system's real memory is being used, svmon displays three columns:
work—working storage
pers—persistent storage (Persistent storage pages are non-client pages—that is, JFS pages.)
clnt—client storage
For each page type, svmon displays two rows:
inuse—number of 4K pages mapped into memory
pin—number of 4K pages mapped into memory and pinned (pin is a subset of inuse)
So, in the above example, there are 188426 working storage pages mapped into memory. Of those 188426 working storage pages, 133537 of them are pinned (that is, can't be paged out).
There are no persistent storage pages (because there are no JFS file systems in use on the system). There are 21284 client storage pages, and none of them are pinned.
The svmon command does not display the number of permanent storage pages, but it can be calculated from the svmon output. As mentioned earlier, the number of permanent storage pages is the sum of the number of persistent storage pages and the number of client storage pages. So, in the above example, there are a total of 21284 permanent storage pages on the system:
0 persistent storage pages + 21284 client storage pages = 21284 permanent storage pages
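The same arithmetic can be written as a short Python check. The values are copied from the "in use" row of the svmon listing above; the dictionary layout and function names are hypothetical.

```python
# Values from the "in use" row of the svmon -G listing.
in_use = {"work": 188426, "pers": 0, "clnt": 21284}


def permanent_pages(row):
    """Permanent storage = persistent (JFS) pages + client pages."""
    return row["pers"] + row["clnt"]


def total_in_use(row):
    """All in-memory pages = working storage + permanent storage."""
    return row["work"] + permanent_pages(row)
```

Here `total_in_use(in_use)` gives 209710, which agrees with the inuse value on the memory line of the same listing.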
The type of information reported by svmon is slightly different from that reported by vmstat. svmon reports information about the number of in-memory pages of different types: working, persistent (that is, non-client), and client. svmon does not report information about computational versus non-computational pages. svmon just reports the total number of in-memory pages of each page type.
In contrast, vmstat reports information about non-computational versus computational pages.
To illustrate this difference, consider the above example of svmon output. Some of the 21284 client pages will be computational, and the rest of the 21284 client pages will be non-computational. To determine the breakdown of these client pages between computational and non-computational, use the vmstat command to determine how many of the 21284 client pages are non-computational.
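Combining the two commands amounts to a subtraction, sketched below. The function name is hypothetical, and both inputs must come from the same system at about the same time. The two listings in this article report different memory sizes, so pairing their numbers, as the usage note does, is purely illustrative.

```python
def client_breakdown(svmon_clnt_in_use, vmstat_client_pages):
    """Split client pages into computational and non-computational.

    svmon_clnt_in_use:   "clnt" value from the "in use" row of svmon -G
    vmstat_client_pages: "client pages" value from vmstat -v, which the
                         article reads as non-computational client pages
    """
    if vmstat_client_pages > svmon_clnt_in_use:
        raise ValueError("counts are inconsistent; sample both commands together")
    return {
        "non-computational": vmstat_client_pages,
        "computational": svmon_clnt_in_use - vmstat_client_pages,
    }
```

For instance, `client_breakdown(21284, 16503)` would attribute 4781 client pages to computational use.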
The vmo command is used to interact with VMM tunable parameters. The vmo command can be used to display information about tunable parameters as well as to set the values for tunable parameters.
To display the current values of all VMM tunable parameters, run the vmo command with the -L option:
# vmo -L
To display the current values of select VMM tunable parameters, use the -L option to list names of tunable parameters. For example, the following command snapshot shows output when listing the current values of the minperm%, maxperm%, maxclient%, and lru_file_repage tunable parameters (see Listing 3):
# vmo -L minperm% -L maxperm% -L maxclient% -L lru_file_repage
NAME                  CUR    DEF    BOOT   MIN    MAX    UNIT         TYPE
     DEPENDENCIES
--------------------------------------------------------------------------------
lru_file_repage       1      1      1      0      1      boolean      D
--------------------------------------------------------------------------------
maxclient%            80     80     80     1      100    % memory     D
     maxperm%
     minperm%
--------------------------------------------------------------------------------
maxperm%              80     80     80     1      100    % memory     D
     minperm%
     maxclient%
--------------------------------------------------------------------------------
minperm%              20     20     20     1      100    % memory     D
     maxperm%
     maxclient%
Column Description
CUR This column lists the current values of the tunable parameters.
DEF This column lists the default values.
BOOT This column lists the values of the tunable parameters at the time the system was booted.
MIN This column lists the minimum values of the tunable parameters.
MAX This column lists the maximum values of the tunable parameters.
UNIT This column lists the unit in which the tunable parameter is specified.
The vmo command supports changing the value of a tunable parameter immediately or deferring the change until the system is rebooted. To change the above tunable parameters and have the changes take effect immediately and on all subsequent reboots, specify the -p option. Here is an example (see Listing 4):
# vmo -p -o lru_file_repage=0 -o maxclient%=90 -o maxperm%=90 -o minperm%=3
Setting minperm% to 3 in nextboot file
Setting maxperm% to 90 in nextboot file
Setting maxclient% to 90 in nextboot file
Setting lru_file_repage to 0 in nextboot file
Setting minperm% to 3
Setting maxperm% to 90
Setting maxclient% to 90
Setting lru_file_repage to 0
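When the same settings must be applied to several systems, the command line can be generated instead of typed. The Python helper below is a hypothetical sketch that only builds the command string; it uses just the -p and -o flags shown in the example above and does not run vmo.

```python
def build_vmo_command(tunables):
    """Build a 'vmo -p -o name=value ...' command line.

    tunables: dict of tunable name -> value, emitted in insertion order
    """
    args = ["vmo", "-p"]
    for name, value in tunables.items():
        args += ["-o", f"{name}={value}"]
    return " ".join(args)


command = build_vmo_command({
    "lru_file_repage": 0,
    "maxclient%": 90,
    "maxperm%": 90,
    "minperm%": 3,
})
```

The resulting `command` string is identical to the command line in the example above, so it can be reviewed before being run on an AIX system.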
The vast majority of workloads benefit from the VMM page replacement daemons targeting non-computational pages. Thus, the following suggested tunable parameters provide the best performance for the majority of workloads (see Listing 5):
lru_file_repage = 0
maxperm = 90%
maxclient = 90%
minperm = 3%
strict_maxclient = 1 (default)
strict_maxperm = 0 (default)
These tunable parameters can be set with the vmo command (see Listing 6):
# vmo -p -o lru_file_repage=0 -o maxclient%=90 -o maxperm%=90 -o minperm%=3
# vmo -p -o strict_maxclient=1 -o strict_maxperm=0
The settings can be viewed with the vmo -L command.
These tunable parameter settings apply to AIX Version 5.2 and AIX Version 5.3. To set these tunable parameters on AIX Version 5.2, AIX Version 5.2 TL6 or later is required. To set the above tunable parameters on AIX Version 5.3, AIX Version 5.3 TL1 or later is required.
The above tunable parameter settings are the default settings for AIX Version 6.1.
The AIX VMM classifies pages based on use. You can use system tunable parameters to control the behavior of the AIX page replacement daemons and control how AIX treats different classes of pages during page replacement. Tuning the AIX VMM can result in significant performance improvements for workloads.
David is an AIX kernel architect. His responsibilities include designing and developing new technology for the AIX operating system. His background is in kernel development. You can contact him at dhepkin@us.ibm.com.