Peering Inside the PE: A Tour of the Win32 Portable Executable File Format


Matt Pietrek

March 1994

Matt Pietrek is the author of Windows Internals (Addison-Wesley, 1993). He works at Nu-Mega Technologies Inc., and can be reached via CompuServe: 71774,362

This article is reproduced from the March 1994 issue of Microsoft Systems Journal. Copyright © 1994 by Miller Freeman, Inc. All rights are reserved. No part of this article may be reproduced in any fashion (except in brief quotations used in critical articles and reviews) without the prior consent of Miller Freeman.

To contact Miller Freeman regarding subscription information, call (800) 666-1084 in the U.S., or (303) 447-9330 in all other countries. For other inquiries, call (415) 358-9500.


The format of an operating system's executable file is in many ways a mirror of the operating system. Although studying an executable file format isn't usually high on most programmers' list of things to do, a great deal of knowledge can be gleaned this way. In this article, I'll give a tour of the Portable Executable (PE) file format that Microsoft has designed for use by all their Win32®-based systems: Windows NT®, Win32s™, and Windows® 95. The PE format plays a key role in all of Microsoft's operating systems for the foreseeable future, including Windows 2000. If you use Win32s or Windows NT, you're already using PE files. Even if you program only for Windows 3.1 using Visual C++®, you're still using PE files (the 32-bit MS-DOS® extended components of Visual C++ use this format). In short, PEs are already pervasive and will become unavoidable in the near future. Now is the time to find out what this new type of executable file brings to the operating system party.

I'm not going to make you stare at endless hex dumps and chew over the significance of individual bits for pages on end. Instead, I'll present the concepts embedded in the PE file format and relate them to things you encounter every day. For example, the notion of thread local variables, as in

__declspec(thread) int i;

drove me crazy until I saw how it was implemented with elegant simplicity in the executable file. Since many of you are coming from a background in 16-bit Windows, I'll correlate the constructs of the Win32 PE file format back to their 16-bit NE file format equivalents.

In addition to a different executable format, Microsoft also introduced a new object module format produced by their compilers and assemblers. This new OBJ file format has many things in common with the PE executable format. I've searched in vain to find any documentation on the new OBJ file format. So I deciphered it on my own, and will describe parts of it here in addition to the PE format.

It's common knowledge that Windows NT has a VAX® VMS® and UNIX® heritage. Many of the Windows NT creators designed and coded for those platforms before coming to Microsoft. When it came time to design Windows NT, it was only natural that they tried to minimize their bootstrap time by using previously written and tested tools. The executable and object module format that these tools produced and worked with is called COFF (an acronym for Common Object File Format). The relative age of COFF can be seen by things such as fields specified in octal format. The COFF format by itself was a good starting point, but needed to be extended to meet all the needs of a modern operating system like Windows NT or Windows 95. The result of this updating is the Portable Executable format. It's called "portable" because all the implementations of Windows NT on various platforms (x86, MIPS®, Alpha, and so on) use the same executable format. Sure, there are differences in things like the binary encodings of CPU instructions. The important thing is that the operating system loader and programming tools don't have to be completely rewritten for each new CPU that arrives on the scene.

The strength of Microsoft's commitment to get Windows NT up and running quickly is evidenced by the fact that they abandoned existing 32-bit tools and file formats. Virtual device drivers written for 16-bit Windows were using a different 32-bit file layout (the LE format) long before Windows NT appeared on the scene. More important than that is the shift of OBJ formats. Prior to the Windows NT C compiler, all Microsoft compilers used the Intel OMF (Object Module Format) specification. As mentioned earlier, the Microsoft compilers for Win32 produce COFF-format OBJ files. Some Microsoft competitors such as Borland and Symantec have chosen to forgo the COFF format OBJs and stick with the Intel OMF format. The upshot of this is that companies producing OBJs or LIBs for use with multiple compilers will need to go back to distributing separate versions of their products for different compilers (if they weren't already).

The PE format is documented (in the loosest sense of the word) in the WINNT.H header file. About midway through WINNT.H is a section titled "Image Format." This section starts out with small tidbits from the old familiar MS-DOS MZ format and NE format headers before moving into the newer PE information. WINNT.H provides definitions of the raw data structures used by PE files, but contains only a few useful comments to make sense of what the structures and flags mean. Whoever wrote the header file for the PE format (the name Michael J. O'Leary keeps popping up) is certainly a believer in long, descriptive names, along with deeply nested structures and macros. When coding with WINNT.H, it's not uncommon to have expressions like this:

pNTHeader->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_DEBUG].VirtualAddress;

To help make logical sense of the information in WINNT.H, read the Portable Executable and Common Object File Format Specification, available on MSDN Library quarterly CD-ROM releases up to and including October 2001.

Turning momentarily to the subject of COFF-format OBJs, the WINNT.H header file includes structure definitions and typedefs for COFF OBJ and LIB files. Unfortunately, I've been unable to find any documentation on this similar to that for the executable file mentioned above. Since PE files and COFF OBJ files are so similar, I decided that it was time to bring these files out into the light and document them as well.

Beyond just reading about what PE files are composed of, you'll also want to dump some PE files to see these concepts for yourself. If you use Microsoft® tools for Win32-based development, the DUMPBIN program will dissect and output PE files and COFF OBJ/LIB files in readable form. Of all the PE file dumpers, DUMPBIN is easily the most comprehensive. It even has a nifty option to disassemble the code sections in the file it's taking apart. Borland users can use TDUMP to view PE executable files, but TDUMP doesn't understand the COFF OBJ files. This isn't a big deal since the Borland compiler doesn't produce COFF-format OBJs in the first place.

I've written a PE and COFF OBJ file dumping program, PEDUMP (see Table 1), that I think provides more understandable output than DUMPBIN. Although it doesn't have a disassembler or work with LIB files, it is otherwise functionally equivalent to DUMPBIN, and adds a few new features to make it worth considering. The source code for PEDUMP is available on any MSJ bulletin board, so I won't list it here in its entirety. Instead, I'll show sample output from PEDUMP to illustrate the concepts as I describe them.

Table 1. PEDUMP.C

//--------------------
// PROGRAM: PEDUMP
// FILE:    PEDUMP.C
// AUTHOR:  Matt Pietrek - 1993
//--------------------
#include <windows.h>
#include <stdio.h>
#include <string.h>     // strupr
#include "objdump.h"
#include "exedump.h"
#include "extrnvar.h"

// Global variables set here, and used in EXEDUMP.C and OBJDUMP.C
BOOL fShowRelocations = FALSE;
BOOL fShowRawSectionData = FALSE;
BOOL fShowSymbolTable = FALSE;
BOOL fShowLineNumbers = FALSE;

char HelpText[] =
"PEDUMP - Win32/COFF .EXE/.OBJ file dumper - 1993 Matt Pietrek\n\n"
"Syntax: PEDUMP [switches] filename\n\n"
"  /A    include everything in dump\n"
"  /H    include hex dump of sections\n"
"  /L    include line number information\n"
"  /R    show base relocations\n"
"  /S    show symbol table\n";

// Open up a file, memory map it, and call the appropriate dumping routine
void DumpFile(LPSTR filename)
{
    HANDLE hFile;
    HANDLE hFileMapping;
    LPVOID lpFileBase;
    PIMAGE_DOS_HEADER dosHeader;

    hFile = CreateFile(filename, GENERIC_READ, FILE_SHARE_READ, NULL,
                       OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, 0);
    if ( hFile == INVALID_HANDLE_VALUE )
    {
        printf("Couldn't open file with CreateFile()\n");
        return;
    }

    hFileMapping = CreateFileMapping(hFile, NULL, PAGE_READONLY, 0, 0, NULL);
    if ( hFileMapping == 0 )
    {
        CloseHandle(hFile);
        printf("Couldn't open file mapping with CreateFileMapping()\n");
        return;
    }

    lpFileBase = MapViewOfFile(hFileMapping, FILE_MAP_READ, 0, 0, 0);
    if ( lpFileBase == 0 )
    {
        CloseHandle(hFileMapping);
        CloseHandle(hFile);
        printf("Couldn't map view of file with MapViewOfFile()\n");
        return;
    }

    printf("Dump of file %s\n\n", filename);

    dosHeader = (PIMAGE_DOS_HEADER)lpFileBase;
    if ( dosHeader->e_magic == IMAGE_DOS_SIGNATURE )
    {
        DumpExeFile( dosHeader );
    }
    else if ( (dosHeader->e_magic == 0x014C)    // Does it look like a i386
              && (dosHeader->e_sp == 0) )       // COFF OBJ file???
    {
        // The two tests above aren't what they look like.  They're
        // really checking for IMAGE_FILE_HEADER.Machine == i386 (0x14C)
        // and IMAGE_FILE_HEADER.SizeOfOptionalHeader == 0;
        DumpObjFile( (PIMAGE_FILE_HEADER)lpFileBase );
    }
    else
        printf("unrecognized file format\n");

    UnmapViewOfFile(lpFileBase);
    CloseHandle(hFileMapping);
    CloseHandle(hFile);
}

// process all the command line arguments and return a pointer to
// the filename argument.
PSTR ProcessCommandLine(int argc, char *argv[])
{
    int i;

    for ( i=1; i < argc; i++ )
    {
        strupr(argv[i]);

        // Is it a switch character?
        if ( (argv[i][0] == '-') || (argv[i][0] == '/') )
        {
            if ( argv[i][1] == 'A' )
            {
                fShowRelocations = TRUE;
                fShowRawSectionData = TRUE;
                fShowSymbolTable = TRUE;
                fShowLineNumbers = TRUE;
            }
            else if ( argv[i][1] == 'H' )
                fShowRawSectionData = TRUE;
            else if ( argv[i][1] == 'L' )
                fShowLineNumbers = TRUE;
            else if ( argv[i][1] == 'R' )
                fShowRelocations = TRUE;
            else if ( argv[i][1] == 'S' )
                fShowSymbolTable = TRUE;
        }
        else    // Not a switch character.  Must be the filename
        {
            return argv[i];
        }
    }

    return NULL;    // Only switches were given; no filename found
}

int main(int argc, char *argv[])
{
    PSTR filename;

    if ( argc == 1 )
    {
        printf( HelpText );
        return 1;
    }

    filename = ProcessCommandLine(argc, argv);
    if ( filename )
        DumpFile( filename );

    return 0;
}

Win32 and PE Basic Concepts

Let's go over a few fundamental ideas that permeate the design of a PE file (see Figure 1). I'll use the term "module" to mean the code, data, and resources of an executable file or DLL that have been loaded into memory. Besides code and data that your program uses directly, a module is also composed of the supporting data structures used by Windows to determine where the code and data are located in memory. In 16-bit Windows, the supporting data structures are in the module database (the segment referred to by an HMODULE). In Win32, these data structures are in the PE header, which I'll explain shortly.

Figure 1. The PE file format

The first important thing to know about PE files is that the executable file on disk is very similar to what the module will look like after Windows has loaded it. The Windows loader doesn't need to work extremely hard to create a process from the disk file. The loader uses the memory-mapped file mechanism to map the appropriate pieces of the file into the virtual address space. To use a construction analogy, a PE file is like a prefabricated home. It's essentially brought into place in one piece, followed by a small amount of work to wire it up to the rest of the world (that is, to connect it to its DLLs and so on). This same ease of loading applies to PE-format DLLs as well. Once the module has been loaded, Windows can effectively treat it like any other memory-mapped file.

This is in marked contrast to the situation in 16-bit Windows. The 16-bit NE file loader reads in portions of the file and creates completely different data structures to represent the module in memory. When a code or data segment needs to be loaded, the loader has to allocate a new segment from the global heap, find where the raw data is stored in the executable file, seek to that location, read in the raw data, and apply any applicable fixups. In addition, each 16-bit module is responsible for remembering all the selectors it's currently using, whether the segment has been discarded, and so on.

For Win32, all the memory used by the module for code, data, resources, import tables, export tables, and other required module data structures is in one contiguous block of memory. All you need to know in this situation is where the loader mapped the file into memory. You can easily find all the various pieces of the module by following pointers that are stored as part of the image.

Another idea you should be acquainted with is the Relative Virtual Address (RVA). Many fields in PE files are specified in terms of RVAs. An RVA is simply the offset of some item, relative to where the file is memory-mapped. For example, let's say the loader maps a PE file into memory starting at address 0x10000 in the virtual address space. If a certain table in the image starts at address 0x10464, then the table's RVA is 0x464.

(Virtual address 0x10464) - (base address 0x10000) = RVA 0x00464

To convert an RVA into a usable pointer, simply add the RVA to the base address of the module. The base address is the starting address of a memory-mapped EXE or DLL and is an important concept in Win32. For the sake of convenience, Windows NT and Windows 95 use the base address of a module as the module's instance handle (HINSTANCE). In Win32, calling the base address of a module an HINSTANCE is somewhat confusing, because the term "instance handle" comes from 16-bit Windows. Each copy of an application in 16-bit Windows gets its own separate data segment (and an associated global handle) that distinguishes it from other copies of the application, hence the term instance handle. In Win32, applications don't need to be distinguished from one another because they don't share the same address space. Still, the term HINSTANCE persists to keep continuity between 16-bit Windows and Win32. What's important for Win32 is that you can call GetModuleHandle for any DLL that your process uses to get a pointer for accessing the module's components.
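The RVA arithmetic above can be sketched in a few lines of portable C. The helper names rva_to_ptr and ptr_to_rva are my own illustrations and do not appear in WINNT.H; the sketch assumes the whole image is mapped contiguously starting at image_base.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers; the names are illustrative, not from WINNT.H. */

/* Convert an RVA into a usable pointer by adding it to the module's base. */
static void *rva_to_ptr(void *image_base, uint32_t rva)
{
    return (unsigned char *)image_base + rva;
}

/* Recover the RVA of an item that lives inside the mapped image. */
static uint32_t ptr_to_rva(const void *image_base, const void *item)
{
    return (uint32_t)((const unsigned char *)item -
                      (const unsigned char *)image_base);
}
```

In a Win32 process, image_base would typically be the value returned by GetModuleHandle, treated as a pointer. With a base of 0x10000 and an RVA of 0x464, rva_to_ptr yields the address 0x10464 from the example above.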

The final concept that you need to know about PE files is sections. A section in a PE file is roughly equivalent to a segment or the resources in an NE file. Sections contain either code or data. Unlike segments, sections are blocks of contiguous memory with no size constraints. Some sections contain code or data that your program declared and uses directly, while other data sections are created for you by the linker and librarian, and contain information vital to the operating system. In some descriptions of the PE format, sections are also referred to as objects. The term object has so many overloaded meanings that I'll stick to calling the code and data areas sections.

The PE Header

Like all other executable file formats, the PE file has a collection of fields at a known (or easy to find) location that define what the rest of the file looks like. This header contains information such as the locations and sizes of the code and data areas, what operating system the file is intended for, the initial stack size, and other vital pieces of information that I'll discuss shortly. As with other executable formats from Microsoft, this main header isn't at the very beginning of the file. The first few hundred bytes of the typical PE file are taken up by the MS-DOS stub. This stub is a tiny program that prints out something to the effect of "This program cannot be run in MS-DOS mode." So if you run a Win32-based program in an environment that doesn't support Win32, you'll get this informative error message. When the Win32 loader memory maps a PE file, the first byte of the mapped file corresponds to the first byte of the MS-DOS stub. That's right. With every Win32-based program you start up, you get an MS-DOS-based program loaded for free!

As in other Microsoft executable formats, you find the real header by looking up its starting offset, which is stored in the MS-DOS stub header. The WINNT.H file includes a structure definition for the MS-DOS stub header that makes it very easy to look up where the PE header starts. The e_lfanew field is a relative offset (or RVA, if you prefer) to the actual PE header. To get a pointer to the PE header in memory, just add that field's value to the image base:

// Ignoring typecasts and pointer conversion issues for clarity...
pNTHeader = dosHeader + dosHeader->e_lfanew;

Once you have a pointer to the main PE header, the fun can begin. The main PE header is a structure of type IMAGE_NT_HEADERS, which is defined in WINNT.H. This structure is composed of a DWORD and two substructures and is laid out as follows:

DWORD Signature;
IMAGE_FILE_HEADER FileHeader;
IMAGE_OPTIONAL_HEADER OptionalHeader;

The Signature field viewed as ASCII text is "PE\0\0". If after using the e_lfanew field in the MS-DOS header, you find an NE signature here rather than a PE, you're working with a 16-bit Windows NE file. Likewise, an LE in the signature field would indicate a Windows 3.x virtual device driver (VxD). An LX here would be the mark of a file for OS/2 2.0.
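As a rough sketch of the lookup just described, the following portable C works on a raw buffer rather than on the WINNT.H structures, so it compiles without Windows headers. The function name classify_exe is hypothetical. The offset 0x3C is where e_lfanew sits in the MS-DOS header, and the bounds checks are my additions so that a truncated or damaged file is rejected instead of read out of range.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Offset of e_lfanew within the MS-DOS header (IMAGE_DOS_HEADER). */
#define DOS_E_LFANEW_OFFSET 0x3C

/* Read a little-endian DWORD without assuming alignment or host byte order. */
static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: returns "PE", "NE", "LE", "LX", or NULL if unknown. */
static const char *classify_exe(const unsigned char *file, size_t size)
{
    uint32_t hdr;

    if (size < DOS_E_LFANEW_OFFSET + 4 || file[0] != 'M' || file[1] != 'Z')
        return NULL;                    /* no MS-DOS stub header */

    hdr = read_le32(file + DOS_E_LFANEW_OFFSET);
    if (hdr > size || size - hdr < 4)
        return NULL;                    /* e_lfanew points outside the file */

    if (memcmp(file + hdr, "PE\0\0", 4) == 0) return "PE";
    if (memcmp(file + hdr, "NE", 2) == 0)     return "NE";
    if (memcmp(file + hdr, "LE", 2) == 0)     return "LE";
    if (memcmp(file + hdr, "LX", 2) == 0)     return "LX";
    return NULL;
}
```

The buffer could come from a memory-mapped view like the one PEDUMP creates, or from a plain file read.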

Following the PE signature DWORD in the PE header is a structure of type IMAGE_FILE_HEADER. The fields of this structure contain only the most basic information about the file. The structure appears to be unmodified from its original COFF implementations. Besides being part of the PE header, it also appears at the very beginning of the COFF OBJs produced by the Microsoft Win32 compilers. The fields of the IMAGE_FILE_HEADER are shown in Table 2.

Table 2. IMAGE_FILE_HEADER Fields

WORD Machine
The CPU that this file is intended for. The following CPU IDs are defined:
0x14d Intel i860
0x14c Intel i386 (same ID used for 486 and 586)
0x162 MIPS R3000
0x166 MIPS R4000
0x184 DEC Alpha AXP
WORD NumberOfSections
The number of sections in the file.
DWORD TimeDateStamp
The time that the linker (or compiler for an OBJ file) produced this file. This field holds the number of seconds since midnight UTC on January 1, 1970 (which is 4:00 P.M. on December 31, 1969, in Pacific Standard Time).
DWORD PointerToSymbolTable
The file offset of the COFF symbol table. This field is only used in OBJ files and PE files with COFF debug information. PE files support multiple debug formats, so debuggers should refer to the IMAGE_DIRECTORY_ENTRY_DEBUG entry in the data directory (defined later).
DWORD NumberOfSymbols
The number of symbols in the COFF symbol table. See above.
WORD SizeOfOptionalHeader
The size of an optional header that can follow this structure. In OBJs, the field is 0. In executables, it is the size of the IMAGE_OPTIONAL_HEADER structure that follows this structure.
WORD Characteristics
Flags with information about the file. Some important flags:
0x0001 There are no relocations in this file
0x0002 File is an executable image (not an OBJ or LIB)
0x2000 File is a dynamic-link library, not a program
Other flags are defined in WINNT.H.
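To make a few of these fields concrete, here is a small sketch that interprets TimeDateStamp, Characteristics, and Machine. The three flag names match the WINNT.H definitions; the helper functions (format_timestamp, is_dll, machine_name) are hypothetical names of my own, and machine_name covers only a subset of the CPU IDs in the table.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Characteristics bits listed in Table 2 (names follow WINNT.H). */
#define IMAGE_FILE_RELOCS_STRIPPED   0x0001
#define IMAGE_FILE_EXECUTABLE_IMAGE  0x0002
#define IMAGE_FILE_DLL               0x2000

/* Hypothetical helper: map a few Machine values to readable names. */
static const char *machine_name(uint16_t machine)
{
    switch (machine) {
    case 0x14c: return "Intel i386";
    case 0x14d: return "Intel i860";
    case 0x162: return "MIPS R3000";
    case 0x166: return "MIPS R4000";
    default:    return "unknown";
    }
}

/* Format TimeDateStamp (seconds since 1970-01-01 00:00:00 UTC) as UTC text.
   Returns buf, or NULL if the conversion fails or buf is too small. */
static char *format_timestamp(uint32_t stamp, char *buf, size_t len)
{
    time_t t = (time_t)stamp;
    struct tm *utc = gmtime(&t);

    if (utc == NULL || strftime(buf, len, "%Y-%m-%d %H:%M:%S", utc) == 0)
        return NULL;
    return buf;
}

/* Returns 1 if the file is an executable image with the DLL bit set. */
static int is_dll(uint16_t characteristics)
{
    return (characteristics & IMAGE_FILE_EXECUTABLE_IMAGE) &&
           (characteristics & IMAGE_FILE_DLL);
}
```

Formatting in UTC rather than local time keeps the output independent of the machine's time zone, which matters when comparing stamps from files built elsewhere.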

The third component of the PE header is a structure of type IMAGE_OPTIONAL_HEADER. For PE files, this portion certainly isn't optional. The COFF format allows individual implementations to define a structure of additional information beyond the standard IMAGE_FILE_HEADER. The fields in the IMAGE_OPTIONAL_HEADER are what the PE designers felt was critical information beyond the basic information in the IMAGE_FILE_HEADER.

Not all of the fields of the IMAGE_OPTIONAL_HEADER are necessarily important to know about (see Table 3). The more important ones to be aware of are the ImageBase and the Subsystem fields. You can skim or skip the description of the fields.

Table 3. IMAGE_OPTIONAL_HEADER Fields

WORD Magic
Appears to be a signature WORD of some sort. Always appears to be set to 0x010B.
BYTE MajorLinkerVersion
BYTE MinorLinkerVersion
The version of the linker that produced this file. The numbers should be displayed as decimal values, rather than as hex. A typical linker version is 2.23.
DWORD SizeOfCode
The combined and rounded-up size of all the code sections. Usually, most files only have one code section, so this field matches the size of the .text section.
DWORD SizeOfInitializedData
This is supposedly the total size of all the sections that are composed of initialized data (not including code segments). However, it doesn't seem to be consistent with what appears in the file.
DWORD SizeOfUninitializedData
The size of the sections that the loader commits space for in the virtual address space, but that don't take up any space in the disk file. These sections don't need to have specific values at program startup, hence the term uninitialized data. Uninitialized data usually goes into a section called .bss.
DWORD AddressOfEntryPoint
The address where the loader will begin execution. This is an RVA, and can usually be found in the .text section.
DWORD BaseOfCode
The RVA where the file's code sections begin. The code sections typically come before the data sections and after the PE header in memory. This RVA is usually 0x1000 in Microsoft Linker-produced EXEs. Borland's TLINK32 looks like it adds the image base to the RVA of the first code section and stores the result in this field.
DWORD BaseOfData
The RVA where the file's data sections begin. The data sections typically come last in memory, after the PE header and the code sections.
DWORD ImageBase
When the linker creates an executable, it assumes that the file will be memory-mapped to a specific location in memory. That address is stored in this field. Assuming a load address allows linker optimizations to take place. If the file really is memory-mapped to that address by the loader, the code doesn't need any patching before it can be run. In executables produced for Windows NT, the default image base is 0x10000. For DLLs, the default is 0x400000. In Windows 95, the address 0x10000 can't be used to load 32-bit EXEs because it lies within a linear address region shared by all processes. Because of this, Microsoft has changed the default base address for Win32 executables to 0x400000. Older programs that were linked assuming a base address of 0x10000 will take longer to load under Windows 95 because the loader needs to apply the base relocations.
DWORD SectionAlignment
When mapped into memory, each section is guaranteed to start at a virtual address that's a multiple of this value. For paging purposes, the default section alignment is 0x1000.
DWORD FileAlignment
In the PE file, the raw data that comprises each section is guaranteed to start at a multiple of this value. The default value is 0x200 bytes, probably to ensure that sections always start at the beginning of a disk sector (sectors are also 0x200 bytes in length). This field is equivalent to the segment/resource alignment size in NE files. Unlike NE files, PE files typically don't have hundreds of sections, so the space wasted by aligning the file sections is almost always very small.
WORD MajorOperatingSystemVersion
WORD MinorOperatingSystemVersion
The minimum version of the operating system required to use this executable. This field is somewhat ambiguous since the subsystem fields (a few fields later) appear to serve a similar purpose. This field defaults to 1.0 in all Win32 EXEs to date.
WORD MajorImageVersion
WORD MinorImageVersion
A user-definable field. This allows you to have different versions of an EXE or DLL. You set these fields via the linker /VERSION switch. For example, "LINK /VERSION:2.0 myobj.obj".
WORD MajorSubsystemVersion
WORD MinorSubsystemVersion
Contains the minimum subsystem version required to run the executable. A typical value for this field is 3.10 (meaning Windows NT 3.1).
DWORD Reserved1
Seems to always be 0.
DWORD SizeOfImage
This appears to be the total size of the portions of the image that the loader has to worry about. It is the size of the region starting at the image base up to the end of the last section. The end of the last section is rounded up to the nearest multiple of the section alignment.
DWORD SizeOfHeaders
The size of the PE header and the section (object) table. The raw data for the sections starts immediately after all the header components.
DWORD CheckSum
Supposedly a CRC checksum of the file. As in other Microsoft executable formats, this field is ignored and set to 0. The one exception to this rule is for trusted services; these EXEs must have a valid checksum.
WORD Subsystem
The type of subsystem that this executable uses for its user interface. WINNT.H defines the following values:
NATIVE (1) Doesn't require a subsystem (such as a device driver)
WINDOWS_GUI (2) Runs in the Windows GUI subsystem
WINDOWS_CUI (3) Runs in the Windows character subsystem (a console app)
OS2_CUI (5) Runs in the OS/2 character subsystem (OS/2 1.x apps only)
POSIX_CUI (7) Runs in the Posix character subsystem

WORD DllCharacteristics
A set of flags indicating under which circumstances a DLL's initialization function (such as DllMain) will be called. This value appears to always be set to 0, yet the operating system still calls the DLL initialization function for all four events.
The following values are defined:
1 Call when DLL is first loaded into a process's address space
2 Call when a thread terminates
4 Call when a thread starts up
8 Call when DLL exits

DWORD SizeOfStackReserve
The amount of virtual memory to reserve for the initial thread's stack. Not all of this memory is committed, however (see the next field). This field defaults to 0x100000 (1MB). If you specify 0 as the stack size to CreateThread, the resulting thread will also have a stack of this same size.
DWORD SizeOfStackCommit
The amount of memory initially committed for the initial thread's stack. This field defaults to 0x1000 bytes (1 page) for the Microsoft Linker while TLINK32 makes it two pages.
DWORD SizeOfHeapReserve
The amount of virtual memory to reserve for the initial process heap. This heap's handle can be obtained by calling GetProcessHeap. Not all of this memory is committed (see the next field).
DWORD SizeOfHeapCommit
The amount of memory initially committed in the process heap. The default is one page.
DWORD LoaderFlags
From WINNT.H, these appear to be fields related to debugging support. I've never seen an executable with either of these bits enabled, nor is it clear how to get the linker to set them. The following values are defined:
1 Invoke a breakpoint instruction before starting the process
2 Invoke a debugger on the process after it's been loaded
DWORD NumberOfRvaAndSizes
The number of entries in the DataDirectory array (below). This value is always set to 16 by the current tools.
IMAGE_DATA_DIRECTORY DataDirectory[IMAGE_NUMBEROF_DIRECTORY_ENTRIES]
An array of IMAGE_DATA_DIRECTORY structures. The initial array elements contain the starting RVA and sizes of important portions of the executable file. Some elements at the end of the array are currently unused. The first element of the array is always the address and size of the exported function table (if present). The second array entry is the address and size of the imported function table, and so on. For a complete list of defined array entries, see the IMAGE_DIRECTORY_ENTRY_XXX #defines in WINNT.H. This array allows the loader to quickly find a particular section of the image (for example, the imported function table), without needing to iterate through each of the image's sections, comparing names as it goes along. Most array entries describe an entire section's data. However, the IMAGE_DIRECTORY_ENTRY_DEBUG element only encompasses a small portion of the bytes in the .rdata section.
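The SectionAlignment and FileAlignment fields both come down to rounding a value up to the next multiple of the alignment. The sketch below is one way to write that rounding. The name align_up is hypothetical, and the bit trick assumes the alignment is a power of two, as the default values 0x1000 and 0x200 are.

```c
#include <assert.h>
#include <stdint.h>

/* Round value up to the next multiple of alignment.
   Assumes alignment is a nonzero power of two (0x200, 0x1000, ...). */
static uint32_t align_up(uint32_t value, uint32_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (value + alignment - 1) & ~(alignment - 1);
}
```

With the defaults, a code section of 0x5AFA bytes occupies align_up(0x5AFA, 0x200) = 0x5C00 bytes in the file, and if it starts at RVA 0x1000 the next section begins at align_up(0x1000 + 0x5AFA, 0x1000) = 0x7000. Those are the figures that appear for the .text and .bss sections in Table 4.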

The Section Table

Between the PE header and the raw data for the image's sections lies the section table. The section table is essentially a phone book containing information about each section in the image. The sections in the image are sorted by their starting address (RVAs), rather than alphabetically.

Now I can better clarify what a section is. In an NE file, your program's code and data are stored in distinct "segments" in the file. Part of the NE header is an array of structures, one for each segment your program uses. Each structure in the array contains information about one segment. The information stored includes the segment's type (code or data), its size, and its location elsewhere in the file. In a PE file, the section table is analogous to the segment table in the NE file. Unlike an NE file segment table, though, a PE section table doesn't store a selector value for each code or data chunk. Instead, each section table entry stores an address where the file's raw data has been mapped into memory. While sections are analogous to 32-bit segments, they really aren't individual segments. They're really just memory ranges in a process's virtual address space.

Another area where PE files differ from NE files is how they manage the supporting data that your program doesn't use, but the operating system does; for example, the list of DLLs that the executable uses or the location of the fixup table. In an NE file, resources aren't considered segments. Even though they have selectors assigned to them, information about resources is not stored in the NE header's segment table. Instead, resources are relegated to a separate table towards the end of the NE header. Information about imported and exported functions also doesn't warrant its own segment; it's crammed into the NE header.

The story with PE files is different. Anything that might be considered vital code or data is stored in a full-fledged section. Thus, information about imported functions is stored in its own section, as is the table of functions that the module exports. The same goes for the relocation data. Any code or data that might be needed by either the program or the operating system gets its own section.

Before I discuss specific sections, I need to describe the data that the operating system manages the sections with. Immediately following the PE header in memory is an array of IMAGE_SECTION_HEADERs. The number of elements in this array is given in the PE header (the IMAGE_NT_HEADERS.FileHeader.NumberOfSections field). I used PEDUMP to output the section table and all of the sections' fields and attributes. Table 4 shows the PEDUMP output of a section table for a typical EXE file, and Table 5 shows the section table in an OBJ file.

Table 4. A Typical Section Table from an EXE File

01 .text     VirtSize: 00005AFA  VirtAddr:  00001000
    raw data offs:   00000400  raw data size: 00005C00
    relocation offs: 00000000  relocations:   00000000
    line # offs:     00009220  line #'s:      0000020C
    characteristics: 60000020
      CODE  MEM_EXECUTE  MEM_READ
02 .bss      VirtSize: 00001438  VirtAddr:  00007000
    raw data offs:   00000000  raw data size: 00001600
    relocation offs: 00000000  relocations:   00000000
    line # offs:     00000000  line #'s:      00000000
    characteristics: C0000080
      UNINITIALIZED_DATA  MEM_READ  MEM_WRITE
03 .rdata    VirtSize: 0000015C  VirtAddr:  00009000
    raw data offs:   00006000  raw data size: 00000200
    relocation offs: 00000000  relocations:   00000000
    line # offs:     00000000  line #'s:      00000000
    characteristics: 40000040
      INITIALIZED_DATA  MEM_READ
04 .data     VirtSize: 0000239C  VirtAddr:  0000A000
    raw data offs:   00006200  raw data size: 00002400
    relocation offs: 00000000  relocations:   00000000
    line # offs:     00000000  line #'s:      00000000
    characteristics: C0000040
      INITIALIZED_DATA  MEM_READ  MEM_WRITE
05 .idata    VirtSize: 0000033E  VirtAddr:  0000D000
    raw data offs:   00008600  raw data size: 00000400
    relocation offs: 00000000  relocations:   00000000
    line # offs:     00000000  line #'s:      00000000
    characteristics: C0000040
      INITIALIZED_DATA  MEM_READ  MEM_WRITE
06 .reloc    VirtSize: 000006CE  VirtAddr:  0000E000
    raw data offs:   00008A00  raw data size: 00000800
    relocation offs: 00000000  relocations:   00000000
    line # offs:     00000000  line #'s:      00000000
    characteristics: 42000040
      INITIALIZED_DATA  MEM_DISCARDABLE  MEM_READ

Table 5. A Typical Section Table from an OBJ File

01 .drectve  PhysAddr: 00000000  VirtAddr:  00000000
    raw data offs:   000000DC  raw data size: 00000026
    relocation offs: 00000000  relocations:   00000000
    line # offs:     00000000  line #'s:      00000000
    characteristics: 00100A00
      LNK_INFO  LNK_REMOVE
02 .debug$S  PhysAddr: 00000026  VirtAddr:  00000000
    raw data offs:   00000102  raw data size: 000016D0
    relocation offs: 000017D2  relocations:   00000032
    line # offs:     00000000  line #'s:      00000000
    characteristics: 42100048
      INITIALIZED_DATA  MEM_DISCARDABLE  MEM_READ
03 .data     PhysAddr: 000016F6  VirtAddr:  00000000
    raw data offs:   000019C6  raw data size: 00000D87
    relocation offs: 0000274D  relocations:   00000045
    line # offs:     00000000  line #'s:      00000000
    characteristics: C0400040
      INITIALIZED_DATA  MEM_READ  MEM_WRITE
04 .text     PhysAddr: 0000247D  VirtAddr:  00000000
    raw data offs:   000029FF  raw data size: 000010DA
    relocation offs: 00003AD9  relocations:   000000E9
    line # offs:     000043F3  line #'s:      000000D9
    characteristics: 60500020
      CODE  MEM_EXECUTE  MEM_READ
05 .debug$T  PhysAddr: 00003557  VirtAddr:  00000000
    raw data offs:   00004909  raw data size: 00000030
    relocation offs: 00000000  relocations:   00000000
    line # offs:     00000000  line #'s:      00000000
    characteristics: 42100048
      INITIALIZED_DATA  MEM_DISCARDABLE  MEM_READ
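
As a rough sketch of how a tool might find the section table in a raw file buffer (rather than in a loaded image), the helper below relies on the standard header layout: the DWORD at offset 0x3C of the DOS header gives the offset of the PE signature, a 20-byte file header follows the 4-byte signature, and the section table begins right after the optional header. The names find_section_table, rd16, and rd32 are illustrative only; they are not taken from PEDUMP or WINNT.H.

```c
#include <stdint.h>
#include <string.h>

/* Read little-endian values from a raw file image (PE files are little-endian). */
static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: returns the file offset of the first
   IMAGE_SECTION_HEADER and stores the section count in *count.
   Returns 0 if the buffer does not look like a PE file. */
uint32_t find_section_table(const uint8_t *file, size_t size, uint16_t *count)
{
    if (size < 0x40 || file[0] != 'M' || file[1] != 'Z')
        return 0;
    uint32_t pe = rd32(file + 0x3C);             /* e_lfanew in the DOS header */
    if (pe > size || size - pe < 24 || memcmp(file + pe, "PE\0\0", 4) != 0)
        return 0;
    *count = rd16(file + pe + 6);                /* FileHeader.NumberOfSections */
    uint16_t opt_size = rd16(file + pe + 20);    /* FileHeader.SizeOfOptionalHeader */
    return pe + 24 + opt_size;                   /* signature (4) + file header (20) */
}
```

Each entry at the returned offset is 40 bytes long, so entry n of the table sits at the returned offset plus 40 * n.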

Each IMAGE_SECTION_HEADER has the format described in Table 6. It's interesting to note what's missing from the information stored for each section. First off, notice that there's no indication of any PRELOAD attributes. The NE file format allows you to specify with the PRELOAD attribute which segments should be loaded at module load time. The OS/2® 2.0 LX format has something similar, allowing you to specify up to eight pages to preload. The PE format has nothing like this. Microsoft must be confident in the performance of Win32 demand-paged loading.

Table 6. IMAGE_SECTION_HEADER Formats

BYTE Name[IMAGE_SIZEOF_SHORT_NAME]
This is an 8-byte ANSI name (not UNICODE) that names the section. Most section names start with a . (such as ".text"), but this is not a requirement, as some PE documentation would have you believe. You can name your own sections with either the segment directive in assembly language, or with "#pragma data_seg" and "#pragma code_seg" in the Microsoft C/C++ compiler. It's important to note that if the section name takes up the full 8 bytes, there's no NULL terminator byte. If you're a printf devotee, you can use %.8s to avoid copying the name string to another buffer where you can NULL-terminate it.
union {
DWORD PhysicalAddress
DWORD VirtualSize
} Misc;
This field has different meanings in EXEs and OBJs. In an EXE, it holds the actual size of the code or data. This is the size before rounding up to the nearest file alignment multiple. The SizeOfRawData field (which seems a bit of a misnomer) later on in the structure holds the rounded-up value. The Borland linker reverses the meaning of these two fields and appears to be correct. For OBJ files, this field indicates the physical address of the section. The first section starts at address 0. To find the physical address in an OBJ file of the next section, add the SizeOfRawData value to the physical address of the current section.
DWORD VirtualAddress
In EXEs, this field holds the RVA to where the loader should map the section. To calculate the real starting address of a given section in memory, add the base address of the image to the section's VirtualAddress stored in this field. With Microsoft tools, the first section defaults to an RVA of 0x1000. In OBJs, this field is meaningless and is set to 0.
DWORD SizeOfRawData
In EXEs, this field contains the size of the section after it's been rounded up to the file alignment size. For example, assume a file alignment size of 0x200. If the VirtualSize field from above says that the section is 0x35A bytes in length, this field will say that the section is 0x400 bytes long. In OBJs, this field contains the exact size of the section emitted by the compiler or assembler. In other words, for OBJs, it's equivalent to the VirtualSize field in EXEs.
DWORD PointerToRawData
This is the file-based offset of where the raw data emitted by the compiler or assembler can be found. If your program memory maps a PE or COFF file itself (rather than letting the operating system load it), this field is more important than the VirtualAddress field. You'll have a completely linear file mapping in this situation, so you'll find the data for the sections at this offset, rather than at the RVA specified in the VirtualAddress field.
DWORD PointerToRelocations
In OBJs, this is the file-based offset to the relocation information for this section. The relocation information for each OBJ section immediately follows the raw data for that section. In EXEs, this field (and the subsequent field) are meaningless, and set to 0. When the linker creates the EXE, it resolves most of the fixups, leaving only base address relocations and imported functions to be resolved at load time. The information about base relocations and imported functions is kept in their own sections, so there's no need for an EXE to have per-section relocation data following the raw section data.
DWORD PointerToLinenumbers
This is the file-based offset of the line number table. A line number table correlates source file line numbers to the addresses of the code generated for a given line. In modern debug formats like the CodeView format, line number information is stored as part of the debug information. In the COFF debug format, however, the line number information is stored separately from the symbolic name/type information. Usually, only code sections (such as .text) have line numbers. In EXE files, the line numbers are collected towards the end of the file, after the raw data for the sections. In OBJ files, the line number table for a section comes after the raw section data and the relocation table for that section.
WORD NumberOfRelocations
The number of relocations in the relocation table for this section (the PointerToRelocations field from above). This field seems relevant only for OBJ files.
WORD NumberOfLinenumbers
The number of line numbers in the line number table for this section (the PointerToLinenumbers field from above).
DWORD Characteristics
What most programmers call flags, the COFF/PE format calls characteristics. This field is a set of flags that indicate the section's attributes (such as code/data, readable, or writeable). For a complete list of all possible section attributes, see the IMAGE_SCN_XXX_XXX #defines in WINNT.H. Some of the more important flags are shown below:
0x00000020 This section contains code. Usually set in conjunction with the executable flag (0x20000000).
0x00000040 This section contains initialized data. Almost all sections except executable and the .bss section have this flag set.
0x00000080 This section contains uninitialized data (for example, the .bss section).
0x00000200 This section contains comments or some other type of information. A typical use of this section is the .drectve section emitted by the compiler, which contains commands for the linker.
0x00000800 This section's contents shouldn't be put in the final EXE file. These sections are used by the compiler/assembler to pass information to the linker.
0x02000000 This section can be discarded, since it's not needed by the process once it's been loaded. The most common discardable section is the base relocations (.reloc).
0x10000000 This section is shareable. When used with a DLL, the data in this section will be shared among all processes using the DLL. The default is for data sections to be nonshared, meaning that each process using a DLL gets its own copy of this section's data. In more technical terms, a shared section tells the memory manager to set the page mappings for this section such that all processes using the DLL refer to the same physical page in memory. To make a section shareable, use the SHARED attribute at link time. For example
LINK /SECTION:MYDATA,RWS ...
tells the linker that the section called MYDATA should be readable, writeable, and shared.
0x20000000 This section is executable. This flag is usually set whenever the "contains code" flag (0x00000020) is set.
0x40000000 This section is readable. This flag is almost always set for sections in EXE files.
0x80000000 The section is writeable. If this flag isn't set in an EXE's section, the loader should mark the memory mapped pages as read-only or execute-only. Typical sections with this attribute are .data and .bss. Interestingly, the .idata section also has this attribute set.
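
To tie the fields together, here is a minimal sketch that mirrors the structure with fixed-width types and prints a section's name and a few of its characteristics. The type name SectionHeader, the function describe_section, and the short flag labels are made up for this example; WINNT.H declares the real structure with BYTE, WORD, and DWORD fields and names the flags IMAGE_SCN_XXX. The %.8s format is the trick mentioned under the Name field.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Local mirror of IMAGE_SECTION_HEADER (40 bytes, no padding). */
typedef struct {
    uint8_t  Name[8];
    uint32_t VirtualSize;            /* Misc.PhysicalAddress in OBJs */
    uint32_t VirtualAddress;
    uint32_t SizeOfRawData;
    uint32_t PointerToRawData;
    uint32_t PointerToRelocations;
    uint32_t PointerToLinenumbers;
    uint16_t NumberOfRelocations;
    uint16_t NumberOfLinenumbers;
    uint32_t Characteristics;
} SectionHeader;

/* Hypothetical helper: writes text such as ".text CODE EXEC READ" into out. */
void describe_section(const SectionHeader *s, char *out, size_t out_size)
{
    /* The table is static, so it is built once rather than on every call. */
    static const struct { uint32_t bit; const char *text; } flags[] = {
        { 0x00000020u, " CODE" },    { 0x00000040u, " INIT_DATA" },
        { 0x00000080u, " UNINIT_DATA" }, { 0x02000000u, " DISCARD" },
        { 0x10000000u, " SHARED" },  { 0x20000000u, " EXEC" },
        { 0x40000000u, " READ" },    { 0x80000000u, " WRITE" },
    };
    /* %.8s reads at most 8 bytes, so a name with no NUL terminator is safe. */
    int n = snprintf(out, out_size, "%.8s", (const char *)s->Name);
    for (size_t i = 0; i < sizeof flags / sizeof flags[0]; i++) {
        /* Stop appending once the buffer is full. */
        if ((s->Characteristics & flags[i].bit) && n >= 0 && (size_t)n < out_size)
            n += snprintf(out + n, out_size - (size_t)n, "%s", flags[i].text);
    }
}
```

Fed the .text entry from Table 4 (characteristics 60000020), the sketch reports the code, execute, and read flags, matching the PEDUMP listing.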

Also missing from the PE format is the notion of page tables. The OS/2 equivalent of an IMAGE_SECTION_HEADER in the LX format doesn't point directly to where the code or data for a section can be found in the file. Instead, it refers to a page lookup table that specifies attributes and the locations of specific ranges of pages within a section. The PE format dispenses with all that, and guarantees that a section's data will be stored contiguously within the file. Of the two formats, the LX method may allow more flexibility, but the PE style is significantly simpler and easier to work with. Having written file dumpers for both formats, I can vouch for this!
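
Because each section's data is contiguous in the file, turning an RVA into a file offset takes only a linear scan of the section table. The sketch below shows one way to do it, along with the round-up rule that relates VirtualSize to SizeOfRawData. SectionRange, align_up, and rva_to_file_offset are names invented for this example, and the struct carries only the three IMAGE_SECTION_HEADER fields the lookup needs.

```c
#include <stdint.h>
#include <stddef.h>

/* Only the fields this sketch needs from IMAGE_SECTION_HEADER. */
typedef struct {
    uint32_t VirtualAddress;
    uint32_t SizeOfRawData;
    uint32_t PointerToRawData;
} SectionRange;

/* Round size up to a multiple of alignment (alignment must be a power of two). */
uint32_t align_up(uint32_t size, uint32_t alignment)
{
    return (size + alignment - 1) & ~(alignment - 1);
}

/* Hypothetical helper: map an RVA to a file offset. Returns 0 when no
   section's raw data covers the RVA. */
uint32_t rva_to_file_offset(const SectionRange *sec, size_t count, uint32_t rva)
{
    for (size_t i = 0; i < count; i++) {
        if (sec[i].PointerToRawData == 0)
            continue;                      /* no data in the file (for example, .bss) */
        uint32_t start = sec[i].VirtualAddress;
        if (rva >= start && rva - start < sec[i].SizeOfRawData)
            return sec[i].PointerToRawData + (rva - start);
    }
    return 0;
}
```

With the values from Table 4, RVA 0x1010 falls in .text (RVA 0x1000, raw data at 0x400) and maps to file offset 0x410, while align_up(0x35A, 0x200) gives the 0x400 used in the SizeOfRawData description.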

Another welcome change in the PE format is that the locations of items are stored as simple DWORD offsets. In the NE format, the location of almost everything is stored as a sector value. To find the real offset, you need to first look up the alignment unit size in the NE header and convert it to a sector size (typically 16 or 512 bytes). You then need to multiply the sector size by the specified sector offset to get an actual file offset. If by chance something isn't stored as a sector offset in an NE file, it is probably stored as an offset relative to the NE header. Since the NE header isn't at the beginning of the file, you need to drag around the file offset of the NE header in your code. All in all, the PE format is much easier to work with than the NE, LX, or LE formats (assuming you can use memory-mapped files).
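
For comparison, the NE conversion described above amounts to a shift, since the NE header stores the alignment as a shift count rather than a byte size. This is a minimal sketch; ne_sector_to_offset is a name made up here, and reading the shift count out of the NE header is left to the caller.

```c
#include <stdint.h>

/* The sector size is 1 << align_shift: a shift of 4 gives 16-byte sectors,
   a shift of 9 gives 512-byte sectors. */
uint32_t ne_sector_to_offset(uint16_t sector, uint16_t align_shift)
{
    return (uint32_t)sector << align_shift;
}
```

In a PE file no such step exists; a field like PointerToRawData is already the file offset.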

Common Sections

Having seen what sections are in general and where they're located, let's look at the common sections that you'll find in EXE and OBJ files. The list is by no means complete, but includes the sections you encounter every day (even if you're not aware of it).

The .text section is where all general-purpose code emitted by the compiler or assembler ends up. Since PE files run in 32-bit mode and aren't restricted to 16-bit segments, there's no reason to break the code from separate source files into separate sections. Instead, the linker concatenates all the .text sections from the various OBJs into one big .text section in the EXE. If you use Borland C++, the compiler emits its code to a segment named CODE. PE files produced with Borland C++ have a section named CODE rather than one called .text. I'll explain this in a minute.

It was somewhat interesting to me to find out that there was additional code in the .text section beyond what I created with the compiler or used from the run-time libraries. In a PE file, when you call a function in another module (for example, GetMessage in USER32.DLL), the CALL instruction emitted by the compiler doesn't transfer control directly to the function in the DLL (see Figure 2). Instead, the call instruction transfers control to a

JMP DWORD PTR [XXXXXXXX]

instruction that's also in the .text section. The JMP instruction indirects through a DWORD variable in the .idata section. This .idata section DWORD contains the real address of the operating system function entry point. After thinking about this for a while, I came to understand why DLL calls are implemented this way. By funneling all calls to a given DLL function through one location, the loader doesn't need to patch every instruction that calls a DLL. All the PE loader has to do is put the correct address of the target function into the DWORD in the .idata section. No call instructions need to be patched. This is in marked contrast to NE files, where each segment contains a list of fixups that need to be applied to the segment. If the segment calls a given DLL function 20 times, the loader must write the address of that function 20 times into the segment. The downside to the PE method is that you can't initialize a variable with the true address of a DLL function. For example, you would think that something like

Figure 2. Calling a function in another module

FARPROC pfnGetMessage = GetMessage;

would put the address of GetMessage into the variable pfnGetMessage. In 16-bit Windows, this works, while in Win32 it doesn't. In Win32, the variable pfnGetMessage will end up holding the address of the JMP DWORD PTR [XXXXXXXX] thunk that I mentioned earlier. If you wanted to call through the function pointer, things would work as you'd expect. However, if you want to read the bytes at the beginning of GetMessage, you're out of luck (unless you do additional work to follow the .idata "pointer" yourself). I'll come back to this topic later, in the discussion of the import table.
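
The "additional work" could look something like the following sketch, which assumes 32-bit x86 code, where JMP DWORD PTR [addr] is encoded as the bytes FF 25 followed by the 32-bit address of the .idata DWORD. The function name decode_import_thunk is hypothetical. It only extracts the address of the slot; reading the real function address out of that slot has to happen inside the process after the loader has filled it in.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper: if code points at "JMP DWORD PTR [addr]"
   (FF 25 followed by a little-endian 32-bit address), store addr in
   *slot and return 1. Otherwise return 0 and leave *slot alone. */
int decode_import_thunk(const uint8_t *code, size_t len, uint32_t *slot)
{
    if (len < 6 || code[0] != 0xFF || code[1] != 0x25)
        return 0;
    *slot = (uint32_t)code[2] | ((uint32_t)code[3] << 8) |
            ((uint32_t)code[4] << 16) | ((uint32_t)code[5] << 24);
    return 1;
}
```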

Although Borland could have had the compiler emit segments with a name of .text, it chose a default segment name of CODE. To determine a section name in the PE file, the Borland linker (TLINK32.EXE) takes the segment name from the OBJ file and truncates it to 8 characters (if necessary).

While the difference in the section names is a small matter, there is a more important difference in how Borland PE files link to other modules. As I mentioned in the .text description, all calls to functions in other DLLs go through a JMP DWORD PTR [XXXXXXXX] thunk. Under the Microsoft system, this thunk comes to the EXE from the .text section of an import library. Because the library manager (LIB32) creates the import library (and the thunk) when you link the external DLL, the linker doesn't have to "know" how to generate these thunks itself. The import library is really just some more code and data to link into the PE file.

The Borland system of dealing with imported functions is simply an extension of the way things were done for 16-bit NE files. The import libraries that the Borland linker uses are really just a list of function names along with the name of the DLL they're in. TLINK32 is therefore responsible for determining which fixups are to external DLLs, and generating an appropriate JMP DWORD PTR [XXXXXXXX] thunk for each one. TLINK32 stores the thunks that it creates in a section named .icode.

Just as .text is the default section for code, the .data section is where your initialized data goes. This data consists of global and static variables that are initialized at compile time. It also includes string literals. The linker combines all the .data sections from the OBJ and LIB files into one .data section in the EXE. Local variables are located on a thread's stack, and take no room in the .data or .bss sections.

The .bss section is where any uninitialized static and global variables are stored. The linker combines all the .bss sections in the OBJ and LIB files into one .bss section in the EXE. In the section table, the PointerToRawData field for the .bss section is set to 0, indicating that this section doesn't take up any space in the file. TLINK doesn't emit this section. Instead it extends the virtual size of the DATA section.

.CRT is another initialized data section utilized by the Microsoft C/C++ run-time libraries (hence the name). Why this data couldn't go into the standard .data section is beyond me.

The .rsrc section contains all the resources for the module. In the early days of Windows NT, the RES file output of the 16-bit RC.EXE wasn't in a format that the Microsoft PE linker could understand. The CVTRES program converted these RES files into a COFF-format OBJ, placing the resource data into a .rsrc section within the OBJ. The linker could then treat the resource OBJ as just another OBJ to link in, allowing the linker to not "know" anything special about resources. More recent linkers from Microsoft appear to be able to process RES files directly.

The .idata section contains information about functions (and data) that the module imports from other DLLs. This section is equivalent to an NE file's module reference table. A key difference is that each function that a PE file imports is specifically listed in this section. To find the equivalent information in an NE file, you'd have to go digging through the relocations at the end of the raw data for each of the segments.

The .edata section is a list of the functions and data that the PE file exports for other modules. Its NE file equivalent is the combination of the entry table, the resident names table, and the nonresident names table. Unlike in 16-bit Windows, there's seldom a reason to export anything from an EXE file, so you usually only see .edata sections in DLLs. When using Microsoft tools, the data in the .edata section comes to the PE file via the EXP file. Put another way, the linker doesn't generate this information on its own. Instead, it relies on the library manager (LIB32) to scan the OBJ files and create the EXP file that the linker adds to its list of modules to link. Yes, that's right! Those pesky EXP files are really just OBJ files with a different extension.

The .reloc section holds a table of base relocations. A base relocation is an adjustment to an instruction or initialized variable value that's needed if the loader couldn't load the file where the linker assumed it would. If the loader is able to load the image at the linker's preferred base address, the loader completely ignores the relocation information in this section. If you want to take a chance and hope that the loader can always load the image at the assumed base address, you can tell the linker to strip this information with the /FIXED option. While this may save space in the executable file, it may cause the executable not to work on other Win32-based implementations. For example, say you built an EXE for Windows NT and based the EXE at 0x10000. If you told the linker to strip the relocations, the EXE wouldn't run under Windows 95, where the address 0x10000 is already in use.

It's important to note that the JMP and CALL instructions that the compiler generates use offsets relative to the instruction, rather than actual offsets in the 32-bit flat segment. If the image needs to be loaded somewhere other than where the linker assumed for a base address, these instructions don't need to change, since they use relative addressing. As a result, there are not as many relocations as you might think. Relocations are usually only needed for instructions that use a 32-bit offset to some data. For example, let's say you had the following global variable declarations:

int i;
int *ptr = &i;

If the linker assumed an image base of 0x10000, the variable i will end up at an address something like 0x12004. In the memory used to hold the pointer "ptr", the linker will have written out 0x12004, since that's the address of the variable i. If the loader for whatever reason decided to load the file at a base address of 0x70000, the address of i would be 0x72004. The .reloc section is a list of places in the image where the difference between the linker-assumed load address and the actual load address needs to be factored in.
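
The adjustment itself is simple arithmetic: add (actual base - preferred base) to every 32-bit value that holds an absolute address. The sketch below applies that delta to a list of fixup locations. The function apply_base_delta and its flat array of RVAs are assumptions made for illustration; the flat list is a simplification and not the on-disk layout of the .reloc section.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: image is the mapped module, fixup_rvas lists the
   RVAs of 32-bit values that hold absolute addresses. */
void apply_base_delta(uint8_t *image, const uint32_t *fixup_rvas, size_t count,
                      uint32_t preferred_base, uint32_t actual_base)
{
    uint32_t delta = actual_base - preferred_base;
    for (size_t i = 0; i < count; i++) {
        uint32_t value;
        memcpy(&value, image + fixup_rvas[i], sizeof value);  /* unaligned-safe read */
        value += delta;
        memcpy(image + fixup_rvas[i], &value, sizeof value);
    }
}
```

For the example above, the DWORD holding ptr starts out as 0x12004; with a preferred base of 0x10000 and an actual base of 0x70000 the delta is 0x60000, and the value becomes 0x72004.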

When you use the compiler directive __declspec(thread), the data that you define doesn't go into either the .data or .bss sections. It ends up in the .tls section, which refers to "thread local storage," and is related to the TlsAlloc family of Win32 functions. When dealing with a .tls section, the memory manager sets up the page tables so that whenever a process switches threads, a new set of physical memory pages is mapped to the .tls section's address space. This permits per-thread global variables. In most cases, it is much easier to use this mechanism than to allocate memory on a per-thread basis and store its pointer in a TlsAlloc'ed slot.

There's one unfortunate note that must be added about the .tls section and __declspec(thread) variables. In Windows NT and Windows 95, this thread local storage mechanism won't work in a DLL if the DLL is loaded dynamically by LoadLibrary. In an EXE or an implicitly loaded DLL, everything works fine. If you can't implicitly link to the DLL, but need per-thread data, you'll have to fall back to using TlsAlloc and TlsGetValue with dynamically allocated memory.

Although the .rdata section usually falls between the .data and .bss sections, your program generally doesn't see or use the data in this section. The .rdata section is used for at least two things. First, in Microsoft linker-produced EXEs, the .rdata section holds the debug directory, which is only present in EXE files. (In TLINK32 EXEs, the debug directory is in a section named .debug.) The debug directory is an array of IMAGE_DEBUG_DIRECTORY structures. These structures hold information about the type, size, and location of the various types of debug information stored in the file. Three main types of debug information appear: CodeView®, COFF, and FPO. Table 7 shows the PEDUMP output for a typical debug directory.

Table 7. A Typical Debug Directory

Type      Size      Address   FilePtr   Charactr  TimeData  Version
COFF      000065C5  00000000  00009200  00000000  2CF8CF3D  0.00
          00000114  00000000  0000F7C8  00000000  2CF8CF3D  0.00
FPO       000004B0  00000000  0000F8DC  00000000  2CF8CF3D  0.00
CODEVIEW  0000B0B4  00000000  0000FD8C  00000000  2CF8CF3D  0.00

The debug directory isn't necessarily found at the beginning of the .rdata section. To find the start of the debug directory table, use the RVA in the seventh entry (IMAGE_DIRECTORY_ENTRY_DEBUG) of the data directory. The data directory is at the end of the PE header portion of the file. To determine the number of entries in the Microsoft linker-generated debug directory, divide the size of the debug directory (found in the size field of the data directory entry) by the size of an IMAGE_DEBUG_DIRECTORY structure. TLINK32 emits a simple count, usually 1. The PEDUMP sample program demonstrates this.
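
A minimal sketch of that calculation follows. DebugDirectory is a local mirror of IMAGE_DEBUG_DIRECTORY using fixed-width types, and debug_entry_count with its size_is_count flag is a hypothetical helper. How a tool decides that a file came from TLINK32 is left to the caller and is not shown here.

```c
#include <stdint.h>

/* Local mirror of IMAGE_DEBUG_DIRECTORY (28 bytes, no padding). */
typedef struct {
    uint32_t Characteristics;
    uint32_t TimeDateStamp;
    uint16_t MajorVersion;
    uint16_t MinorVersion;
    uint32_t Type;
    uint32_t SizeOfData;
    uint32_t AddressOfRawData;
    uint32_t PointerToRawData;
} DebugDirectory;

/* dir_size is the Size field of data directory entry 6
   (IMAGE_DIRECTORY_ENTRY_DEBUG). Pass size_is_count = 1 for files where
   that field already holds a count, as described for TLINK32. */
uint32_t debug_entry_count(uint32_t dir_size, int size_is_count)
{
    if (size_is_count)
        return dir_size;
    return dir_size / (uint32_t)sizeof(DebugDirectory);
}
```

A Microsoft-linked file with four entries, like the one in Table 7, would report a directory size of 4 * 28 = 112 bytes.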

The other useful portion of an .rdata section is the description string. If you specified a DESCRIPTION entry in your program's DEF file, the specified description string appears in the .rdata section. In the NE format, the description string is always the first entry of the nonresident names table. The description string is intended to hold a useful text string describing the file. Unfortunately, I haven't found an easy way to find it. I've seen PE files that had the description string before the debug directory, and other files that had it after the debug directory. I'm not aware of any consistent method of finding the description string (or even if it's present at all).

The .debug$S and .debug$T sections only appear in OBJs. They store the CodeView symbol and type information. The section names are derived from the segment names used for this purpose by previous 16-bit compilers ($$SYMBOLS and $$TYPES). The sole purpose of the .debug$T section is to hold the pathname to the PDB file that contains the CodeView information for all the OBJs in the project. The linker reads in the PDB and uses it to create portions of the CodeView information that it places at the end of the finished PE file.

The .drectve section only appears in OBJ files. It contains text representations of commands for the linker. For example, in any OBJ I compile with the Microsoft compiler, the following strings appear in the .drectve section:

-defaultlib:LIBC -defaultlib:OLDNAMES

When you use __declspec(dllexport) in your code, the compiler simply emits the command-line equivalent into the .drectve section (for instance, "-export:MyFunction").
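
Since the section is just text, pulling the individual directives apart is a matter of splitting on spaces. The sketch below assumes the raw .drectve bytes have already been copied into a NUL-terminated string and that no directive contains a quoted argument with embedded spaces. The name get_directive is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the index-th space-separated directive from
   text into out. Returns 1 on success, 0 if there is no such directive
   or it does not fit in out. */
int get_directive(const char *text, size_t index, char *out, size_t out_size)
{
    const char *p = text;
    for (;;) {
        p += strspn(p, " ");               /* skip separators */
        if (*p == '\0')
            return 0;                      /* ran out of directives */
        size_t len = strcspn(p, " ");      /* length of this directive */
        if (index == 0) {
            if (len >= out_size)
                return 0;
            memcpy(out, p, len);
            out[len] = '\0';
            return 1;
        }
        index--;
        p += len;
    }
}
```

Applied to the string shown above, index 0 yields -defaultlib:LIBC and index 1 yields -defaultlib:OLDNAMES.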

In playing around with PEDUMP, I've encountered other sections from time to time. For instance, in the Windows 95 KERNEL32.DLL, there are LOCKCODE and LOCKDATA sections. Presumably these are sections that will get special paging treatment so that they're never paged out of memory.

There are two lessons to be learned from this. First, don't feel constrained to use only the standard sections provided by the compiler or assembler. If you need a separate section for some reason, don't hesitate to create your own. In the C/C++ compiler, use #pragma code_seg and #pragma data_seg. In assembly language, just create a 32-bit segment (which becomes a section) with a name different from the standard sections. If using TLINK32, you must use a different class or turn off code segment packing. The other thing to remember is that section names that are out of the ordinary can often give a deeper insight into the purpose and implementation of a particular PE file.

PE File Imports

Earlier, I described how function calls to outside DLLs don't call the DLL directly. Instead, the CALL instruction goes to a JMP DWORD PTR [XXXXXXXX] instruction somewhere in the executable's .text section (or .icode section if you're using Borland C++). The address that the JMP instruction looks up and transfers control to is the real target address. The PE file's .idata section contains the information necessary for the loader to determine the addresses of the target functions and patch them into the executable image.

The .idata section (or import table, as I prefer to call it) begins with an array of IMAGE_IMPORT_DESCRIPTORs. There is one IMAGE_IMPORT_DESCRIPTOR for each DLL that the PE file implicitly links to. There's no field indicating the number of structures in this array. Instead, the last element of the array is indicated by an IMAGE_IMPORT_DESCRIPTOR that has fields filled with NULLs. The format of an IMAGE_IMPORT_DESCRIPTOR is shown in Table 8.

Table 8. IMAGE_IMPORT_DESCRIPTOR Format

DWORD Characteristics
At one time, this may have been a set of flags. However, Microsoft changed its meaning and never bothered to update WINNT.H. This field is really an offset (an RVA) to an array of pointers. Each of these pointers points to an IMAGE_IMPORT_BY_NAME structure.
DWORD TimeDateStamp
The time/date stamp indicating when the file was built.
DWORD ForwarderChain
This field relates to forwarding. Forwarding involves one DLL sending on references to one of its functions to another DLL. For example, in Windows NT, KERNEL32.DLL appears to forward some of its exported functions to NTDLL.DLL. An application may think it's calling a function in KERNEL32.DLL, but it actually ends up calling into NTDLL.DLL. This field contains an index into the FirstThunk array (described momentarily). The function indexed by this field will be forwarded to another DLL. Unfortunately, the format of how a function is forwarded isn't documented, and examples of forwarded functions are hard to find.
DWORD Name
This is an RVA to a NULL-terminated ASCII string containing the imported DLL's name. Common examples are "KERNEL32.DLL" and "USER32.DLL".
PIMAGE_THUNK_DATA FirstThunk
This field is an offset (an RVA) to an IMAGE_THUNK_DATA union. In almost every case, the union is interpreted as a pointer to an IMAGE_IMPORT_BY_NAME structure. If the field isn't one of these pointers, then it's supposedly treated as an export ordinal value for the DLL that's being imported. It's not clear from the documentation if you really can import a function by ordinal rather than by name.
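
Because the array carries no count, a tool has to walk it until it reaches the terminating entry. The sketch below mirrors the descriptor with fixed-width types and counts entries up to an all-zero one, following the description above. ImportDescriptor and count_import_descriptors are names chosen for this example, not declarations from WINNT.H.

```c
#include <stdint.h>
#include <stddef.h>

/* Local mirror of IMAGE_IMPORT_DESCRIPTOR. Every field is 32 bits wide
   in a 32-bit PE file, including the FirstThunk RVA. */
typedef struct {
    uint32_t Characteristics;   /* RVA of the hint-name array */
    uint32_t TimeDateStamp;
    uint32_t ForwarderChain;
    uint32_t Name;              /* RVA of the DLL name string */
    uint32_t FirstThunk;        /* RVA of the second pointer array */
} ImportDescriptor;

/* Count descriptors up to, but not including, the all-zero terminator. */
size_t count_import_descriptors(const ImportDescriptor *d)
{
    size_t n = 0;
    while (d[n].Characteristics || d[n].TimeDateStamp || d[n].ForwarderChain ||
           d[n].Name || d[n].FirstThunk)
        n++;
    return n;
}
```

The caller is assumed to have already converted the import table's RVA into a pointer and to trust that a terminator exists; a defensive tool would also bound the walk by the section's size.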

The important parts of an IMAGE_IMPORT_DESCRIPTOR are the imported DLL name and the two arrays of IMAGE_IMPORT_BY_NAME pointers. In the EXE file, the two arrays (pointed to by the Characteristics and FirstThunk fields) run parallel to each other, and are terminated by a NULL pointer entry at the end of each array. The pointers in both arrays point to an IMAGE_IMPORT_BY_NAME structure. Figure 3 shows the situation graphically. Table 9 shows the PEDUMP output for an imports table.

Figure 3. Two parallel arrays of pointers

Table 9. Imports Table from an EXE File

GDI32.dll
  Hint/Name Table: 00013064
  TimeDateStamp:   2C51B75B
  ForwarderChain:  FFFFFFFF
  First thunk RVA: 00013214
  Ordn  Name
    48  CreatePen
    57  CreateSolidBrush
    62  DeleteObject
   160  GetDeviceCaps
  //  Rest of table omitted...
KERNEL32.dll
  Hint/Name Table: 0001309C
  TimeDateStamp:   2C4865A0
  ForwarderChain:  00000014
  First thunk RVA: 0001324C
  Ordn  Name
    83  ExitProcess
   137  GetCommandLineA
   179  GetEnvironmentStrings
   202  GetModuleHandleA
  //  Rest of table omitted...
SHELL32.dll
  Hint/Name Table: 00013138
  TimeDateStamp:   2C41A383
  ForwarderChain:  FFFFFFFF
  First thunk RVA: 000132E8
  Ordn  Name
    46  ShellAboutA
USER32.dll
  Hint/Name Table: 00013140
  TimeDateStamp:   2C474EDF
  ForwarderChain:  FFFFFFFF
  First thunk RVA: 000132F0
  Ordn  Name
    10  BeginPaint
    35  CharUpperA
    39  CheckDlgButton
    40  CheckMenuItem
  //  Rest of table omitted...

There is one IMAGE_IMPORT_BY_NAME structure for each function that the PE file imports. An IMAGE_IMPORT_BY_NAME structure is very simple, and looks like this:

WORD    Hint;
BYTE    Name[?];

The first field is the best guess as to what the export ordinal for the imported function is. Unlike with NE files, this value doesn't have to be correct. Instead, the loader uses it as a suggested starting value for its binary search for the exported function. Next is an ASCIIZ string with the name of the imported function.

Why are there two parallel arrays of pointers to the IMAGE_IMPORT_BY_NAME structures? The first array (the one pointed at by the Characteristics field) is left alone, and never modified. It's sometimes called the hint-name table. The second array (pointed at by the FirstThunk field) is overwritten by the PE loader. The loader iterates through each pointer in the array and finds the address of the function that each IMAGE_IMPORT_BY_NAME structure refers to. The loader then overwrites the pointer to IMAGE_IMPORT_BY_NAME with the found function's address. The [XXXXXXXX] portion of the JMP DWORD PTR [XXXXXXXX] thunk refers to one of the entries in the FirstThunk array. Since the array of pointers that's overwritten by the loader eventually holds the addresses of all the imported functions, it's called the Import Address Table.
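
The loader's pass over the two arrays can be pictured with the following simulation. It is a sketch under stated assumptions, not the real loader: bind_imports works on a byte buffer standing in for the mapped image, it reads the hint-name array and writes the import address table, and it asks a caller-supplied resolver for each address. The resolver demo_resolve and its addresses are made up; a real loader searches the target DLL's export table instead.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical resolver type: returns the address for an exported name,
   or 0 if the name is unknown. */
typedef uint32_t (*resolve_fn)(const char *name, uint16_t hint);

/* image is the mapped module, so RVAs index into it directly.
   Returns the number of import address table entries overwritten. */
size_t bind_imports(uint8_t *image, uint32_t hint_name_rva, uint32_t iat_rva,
                    resolve_fn resolve)
{
    size_t n = 0;
    for (;; n++) {
        uint32_t entry;                        /* RVA of an IMAGE_IMPORT_BY_NAME */
        memcpy(&entry, image + hint_name_rva + 4 * n, 4);
        if (entry == 0)
            break;                             /* NULL entry ends the array */
        uint16_t hint;
        memcpy(&hint, image + entry, 2);       /* WORD Hint */
        const char *name = (const char *)(image + entry + 2);  /* ASCIIZ name */
        uint32_t address = resolve(name, hint);
        memcpy(image + iat_rva + 4 * n, &address, 4);  /* slot now holds the address */
    }
    return n;
}

/* Stand-in resolver for the sketch, with made-up addresses. */
uint32_t demo_resolve(const char *name, uint16_t hint)
{
    (void)hint;                                /* a real loader would try the hint first */
    if (strcmp(name, "ExitProcess") == 0)     return 0x77F01000u;
    if (strcmp(name, "GetCommandLineA") == 0) return 0x77F02000u;
    return 0;
}
```

Note that only the second array is written; the hint-name array keeps its RVAs, which is why a dump tool can still recover the function names after binding.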

For you Borland users, there's a slight twist to the above description. A PE file produced by TLINK32 is missing one of the arrays. In such an executable, the Characteristics field in the IMAGE_IMPORT_DESCRIPTOR (aka the hint-name array) is 0. Therefore, only the array that's pointed at by the FirstThunk field (the Import Address Table) is guaranteed to exist in all PE files. The story would end here, except that I ran into an interesting problem when writing PEDUMP. In the never ending search for optimizations, Microsoft "optimized" the thunk array in the system DLLs for Windows NT (KERNEL32.DLL and so on). In this optimization, the pointers in the array don't point to an IMAGE_IMPORT_BY_NAME structure; rather, they already contain the address of the imported function. In other words, the loader doesn't need to look up function addresses and overwrite the thunk array with the imported function's addresses. This causes a problem for PE dumping programs that are expecting the array to contain pointers to IMAGE_IMPORT_BY_NAME structures. You might be thinking, "But Matt, why don't you just use the hint-name table array?" That would be an ideal solution, except that the hint-name table array doesn't exist in Borland files. The PEDUMP program handles all these situations, but the code is understandably messy.
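
One way to express the decision a dump tool faces is sketched below. It is not PEDUMP's actual logic: pick_name_array is a hypothetical helper, and the thunks_already_bound flag stands for whatever test the tool uses to recognize a pre-filled thunk array, which is not shown.

```c
#include <stdint.h>

/* Hypothetical helper: choose which array's RVA to walk when looking for
   imported function names. Returns 0 when neither array can supply names. */
uint32_t pick_name_array(uint32_t characteristics, uint32_t first_thunk,
                         int thunks_already_bound)
{
    if (characteristics != 0)
        return characteristics;    /* hint-name array present, never modified */
    if (!thunks_already_bound)
        return first_thunk;        /* TLINK32 style: only the FirstThunk array exists */
    return 0;                      /* addresses only, no names to recover */
}
```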

Since the import address table is in a writeable section, it's relatively easy to intercept calls that an EXE or DLL makes to another DLL. Simply patch the appropriate import address table entry to point at the desired interception function. There's no need to modify any code in either the caller or callee images. What could be easier?
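
The idea can be modeled in portable C by treating the import address table as an array of function pointers. Everything in this sketch is a stand-in: hook_iat_slot, the import_fn signature, imported_double, and interceptor are invented for illustration, and locating the real slot inside a loaded module is not shown.

```c
typedef int (*import_fn)(int);

/* Hypothetical helper: replace one import address table slot and return
   the previous entry so the interceptor can forward to it. */
import_fn hook_iat_slot(import_fn *slot, import_fn replacement)
{
    import_fn original = *slot;
    *slot = replacement;
    return original;
}

/* Stand-ins for an imported function and its interceptor. */
static import_fn real_target;
static int calls_seen;

int imported_double(int x) { return 2 * x; }

int interceptor(int x)
{
    calls_seen++;                  /* record the call, then forward it */
    return real_target(x);
}
```

A caller would save the result of hook_iat_slot in real_target; from then on every call made through the slot passes through interceptor first, with no change to the calling code.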

It's interesting to note that in Microsoft-produced PE files, the import table is not something wholly synthesized by the linker. All the pieces necessary to call a function in another DLL reside in an import library. When you link a DLL, the library manager (LIB32.EXE or LIB.EXE) scans the OBJ files being linked and creates an import library. This import library is completely different from the import libraries used by 16-bit NE file linkers. The import library that the 32-bit LIB produces has a .text section and several .idata$ sections. The .text section in the import library contains the JMP DWORD PTR [XXXXXXXX] thunk, which has a name stored for it in the OBJ's symbol table. The name of the symbol is identical to the name of the function being exported by the DLL (for example, _Dispatch_Message@4). One of the .idata$ sections in the import library contains the DWORD that the thunk dereferences through. Another of the .idata$ sections has a space for the hint ordinal followed by the imported function's name. These two fields make up an IMAGE_IMPORT_BY_NAME structure. When you later link a PE file that uses the import library, the import library's sections are added to the list of sections from your OBJs that the linker needs to process. Since the thunk in the import library has the same name as the function being imported, the linker assumes the thunk is really the imported function, and fixes up calls to the imported function to point at the thunk. The thunk in the import library is essentially "seen" as the imported function.

Besides providing the code portion of an imported function thunk, the import library provides the pieces of the PE file's .idata section (or import table). These pieces come from the various .idata$ sections that the library manager put into the import library. In short, the linker doesn't really know the differences between imported functions and functions that appear in a different OBJ file. The linker just follows its preset rules for building and combining sections, and everything falls into place naturally.

PE File Exports

The opposite of importing a function is exporting a function for use by EXEs or other DLLs. A PE file stores information about its exported functions in the .edata section. Generally, Microsoft linker-generated PE EXE files don't export anything, so they don't have an .edata section. Borland's TLINK32 always exports at least one symbol from an EXE. Most DLLs do export functions and have an .edata section. The primary components of an .edata section (aka the export table) are tables of function names, entry point addresses, and export ordinal values. In an NE file, the equivalents of an export table are the entry table, the resident names table, and the nonresident names table. These tables are stored as part of the NE header, rather than in distinct segments or resources.

At the start of an .edata section is an IMAGE_EXPORT_DIRECTORY structure (see Table 10). This structure is immediately followed by data pointed to by fields in the structure.

Table 10. IMAGE_EXPORT_DIRECTORY Format

DWORD Characteristics
This field appears to be unused and is always set to 0.
DWORD TimeDateStamp
The time/date stamp indicating when this file was created.
WORD MajorVersion
WORD MinorVersion
These fields appear to be unused and are set to 0.
DWORD Name
The RVA of an ASCIIZ string with the name of this DLL.
DWORD Base
The starting ordinal number for exported functions. For example, if the file exports functions with ordinal values of 10, 11, and 12, this field contains 10. To obtain the exported ordinal for a function, you need to add this value to the appropriate element of the AddressOfNameOrdinals array.
DWORD NumberOfFunctions
The number of elements in the AddressOfFunctions array. This value is also the number of functions exported by this module. Theoretically, this value could be different than the NumberOfNames field (next), but actually they're always the same.
DWORD NumberOfNames
The number of elements in the AddressOfNames array. This value seems always to be identical to the NumberOfFunctions field, and so is the number of exported functions.
PDWORD *AddressOfFunctions
This field is an RVA and points to an array of function addresses. The function addresses are the entry points (RVAs) for each exported function in this module.
PDWORD *AddressOfNames
This field is an RVA and points to an array of string pointers. The strings are the names of the exported functions in this module.
PWORD *AddressOfNameOrdinals
This field is an RVA and points to an array of WORDs. The WORDs are the export ordinals of all the exported functions in this module. However, don't forget to add in the starting ordinal number specified in the Base field.

The layout of the export table is somewhat odd (see Figure 4 and Table 10). As I mentioned earlier, the requirements for exporting a function are a name, an address, and an export ordinal. You'd think that the designers of the PE format would have put all three of these items into a structure, and then have an array of these structures. Instead, each component of an exported entry is an element in an array. There are three of these arrays (AddressOfFunctions, AddressOfNames, AddressOfNameOrdinals), and they are all parallel to one another. To find all the information about the fourth function, you need to look up the fourth element in each array.
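A lookup by name over these arrays might look roughly like the sketch below. It works on a simplified, already-resolved view of the table (plain C pointers instead of RVAs), and the export_view type, the find_export function, and the sample data are all hypothetical names introduced here. One assumption deserves flagging: the sketch uses the value from the name-ordinal array as the index into the function array. When the arrays are strictly parallel, as described above, entry i of the ordinal array is simply i and the two readings coincide. The sample values come from the first three rows of Table 11.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified view of an export table. In a real image the three arrays
   are reached through RVAs stored in IMAGE_EXPORT_DIRECTORY. */
typedef struct {
    uint32_t base;                  /* Base: starting ordinal            */
    uint32_t number_of_functions;   /* elements in functions[]           */
    uint32_t number_of_names;       /* elements in names[], ordinals[]   */
    const uint32_t *functions;      /* AddressOfFunctions (entry RVAs)   */
    const char *const *names;       /* AddressOfNames                    */
    const uint16_t *name_ordinals;  /* AddressOfNameOrdinals (unbiased)  */
} export_view;

/* Returns the entry-point RVA for `name`, or 0 if it is not exported.
   On success *ordinal_out (if non-NULL) receives Base + array value. */
uint32_t find_export(const export_view *ex, const char *name,
                     uint32_t *ordinal_out)
{
    uint32_t i;
    assert(ex != NULL && name != NULL);
    for (i = 0; i < ex->number_of_names; i++) {
        if (strcmp(ex->names[i], name) == 0) {
            uint16_t index = ex->name_ordinals[i];
            if (index >= ex->number_of_functions)
                return 0;   /* malformed table */
            if (ordinal_out != NULL)
                *ordinal_out = ex->base + index;
            return ex->functions[index];
        }
    }
    return 0;
}

/* Sample data: the first three rows of Table 11, ordinal base 1. */
static const uint32_t sample_functions[] = { 0x00005090, 0x00005100, 0x00025540 };
static const char *const sample_names[] = { "AddAtomA", "AddAtomW", "AddConsoleAliasA" };
static const uint16_t sample_ordinals[] = { 0, 1, 2 };
static const export_view sample = {
    1, 3, 3, sample_functions, sample_names, sample_ordinals
};
```

Calling find_export(&sample, "AddAtomW", &ord) walks the name array, finds the match at index 1, and reports the entry point and the ordinal with Base added back in.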

Figure 4. Export table layout

Table 11. Typical Exports Table from an EXE File

Name:            KERNEL32.dll
Characteristics: 00000000
TimeDateStamp:   2C4857D3
Version:         0.00
Ordinal base:    00000001
# of functions:  0000021F
# of Names:      0000021F

Entry Pt  Ordn  Name
00005090     1  AddAtomA
00005100     2  AddAtomW
00025540     3  AddConsoleAliasA
00025500     4  AddConsoleAliasW
00026AC0     5  AllocConsole
00001000     6  BackupRead
00001E90     7  BackupSeek
00002100     8  BackupWrite
0002520C     9  BaseAttachCompleteThunk
00024C50    10  BasepDebugDump
// Rest of table omitted...

Incidentally, if you dump out the exports from the Windows NT system DLLs (for example, KERNEL32.DLL and USER32.DLL), you'll note that in many cases there are two functions that only differ by one character at the end of the name, for instance CreateWindowExA and CreateWindowExW. This is how UNICODE support is implemented transparently. The functions that end with A are the ASCII (or ANSI) compatible functions, while those ending in W are the UNICODE versions of the function. In your code, you don't explicitly specify which function to call. Instead, the appropriate function is selected in WINDOWS.H, via preprocessor #ifdefs. This excerpt from the Windows NT WINDOWS.H shows an example of how this works:

#ifdef UNICODE
#define DefWindowProc  DefWindowProcW
#else
#define DefWindowProc  DefWindowProcA
#endif // !UNICODE

PE File Resources

Finding resources in a PE file is quite a bit more complicated than in an NE file. The formats of the individual resources (for example, a menu) haven't changed significantly, but you need to traverse a strange hierarchy to find them.

Navigating the resource directory hierarchy is like navigating a hard disk. There's a master directory (the root directory), which has subdirectories. The subdirectories have subdirectories of their own that may point to the raw resource data for things like dialog templates. In the PE format, both the root directory of the resource directory hierarchy and all of its subdirectories are structures of type IMAGE_RESOURCE_DIRECTORY (see Table 12).

Table 12. IMAGE_RESOURCE_DIRECTORY Format

DWORD Characteristics
Theoretically this field could hold flags for the resource, but appears to always be 0.
DWORD TimeDateStamp
The time/date stamp describing the creation time of the resource.
WORD MajorVersion
WORD MinorVersion
Theoretically these fields would hold a version number for the resource. These fields appear to always be set to 0.
WORD NumberOfNamedEntries
The number of array elements that use names and that follow this structure.
WORD NumberOfIdEntries
The number of array elements that use integer IDs, and which follow this structure.
IMAGE_RESOURCE_DIRECTORY_ENTRY DirectoryEntries[]
This field isn't really part of the IMAGE_RESOURCE_DIRECTORY structure. Rather, it's an array of IMAGE_RESOURCE_DIRECTORY_ENTRY structures that immediately follow the IMAGE_RESOURCE_DIRECTORY structure. The number of elements in the array is the sum of the NumberOfNamedEntries and NumberOfIdEntries fields. The directory entry elements that have name identifiers (rather than integer IDs) come first in the array.

A directory entry can either point at a subdirectory (that is, to another IMAGE_RESOURCE_DIRECTORY), or it can point to the raw data for a resource. Generally, there are at least three directory levels before you get to the actual raw resource data. The top-level directory (of which there's only one) is always found at the beginning of the resource section (.rsrc). The subdirectories of the top-level directory correspond to the various types of resources found in the file. For example, if a PE file includes dialogs, string tables, and menus, there will be three subdirectories: a dialog directory, a string table directory, and a menu directory. Each of these type subdirectories will in turn have ID subdirectories. There will be one ID subdirectory for each instance of a given resource type. In the above example, if there are three dialog boxes, the dialog directory will have three ID subdirectories. Each ID subdirectory will have either a string name (such as "MyDialog") or the integer ID used to identify the resource in the RC file. Figure 5 shows a resource directory hierarchy example in visual form. Table 13 shows the PEDUMP output for the resources in the Windows NT CLOCK.EXE.

Figure 5. Resource directory hierarchy

Table 13. Resources Hierarchy for CLOCK.EXE

ResDir (0) Named:00 ID:06 TimeDate:2C3601DB Vers:0.00 Char:0
    ResDir (ICON) Named:00 ID:02 TimeDate:2C3601DB Vers:0.00 Char:0
        ResDir (1) Named:00 ID:01 TimeDate:2C3601DB Vers:0.00 Char:0
            ID: 00000409  Offset: 00000200
        ResDir (2) Named:00 ID:01 TimeDate:2C3601DB Vers:0.00 Char:0
            ID: 00000409  Offset: 00000210
    ResDir (MENU) Named:02 ID:00 TimeDate:2C3601DB Vers:0.00 Char:0
        ResDir (CLOCK) Named:00 ID:01 TimeDate:2C3601DB Vers:0.00 Char:0
            ID: 00000409  Offset: 00000220
        ResDir (GENERICMENU) Named:00 ID:01 TimeDate:2C3601DB Vers:0.00 Char:0
            ID: 00000409  Offset: 00000230
    ResDir (DIALOG) Named:01 ID:01 TimeDate:2C3601DB Vers:0.00 Char:0
        ResDir (ABOUTBOX) Named:00 ID:01 TimeDate:2C3601DB Vers:0.00 Char:0
            ID: 00000409  Offset: 00000240
        ResDir (64) Named:00 ID:01 TimeDate:2C3601DB Vers:0.00 Char:0
            ID: 00000409  Offset: 00000250
    ResDir (STRING) Named:00 ID:03 TimeDate:2C3601DB Vers:0.00 Char:0
        ResDir (1) Named:00 ID:01 TimeDate:2C3601DB Vers:0.00 Char:0
            ID: 00000409  Offset: 00000260
        ResDir (2) Named:00 ID:01 TimeDate:2C3601DB Vers:0.00 Char:0
            ID: 00000409  Offset: 00000270
        ResDir (3) Named:00 ID:01 TimeDate:2C3601DB Vers:0.00 Char:0
            ID: 00000409  Offset: 00000280
    ResDir (GROUP_ICON) Named:01 ID:00 TimeDate:2C3601DB Vers:0.00 Char:0
        ResDir (CCKK) Named:00 ID:01 TimeDate:2C3601DB Vers:0.00 Char:0
            ID: 00000409  Offset: 00000290
    ResDir (VERSION) Named:00 ID:01 TimeDate:2C3601DB Vers:0.00 Char:0
        ResDir (1) Named:00 ID:01 TimeDate:2C3601DB Vers:0.00 Char:0
            ID: 00000409  Offset: 000002A0

As mentioned earlier, each directory entry is a structure of type IMAGE_RESOURCE_DIRECTORY_ENTRY (boy, these names are getting long!). Each IMAGE_RESOURCE_DIRECTORY_ENTRY has the format shown in Table 14.

Table 14. IMAGE_RESOURCE_DIRECTORY_ENTRY Format

DWORD Name
This field contains either an integer ID or a pointer to a structure that contains a string name. If the high bit (0x80000000) is zero, this field is interpreted as an integer ID. If the high bit is nonzero, the lower 31 bits are an offset (relative to the start of the resources) to an IMAGE_RESOURCE_DIR_STRING_U structure. This structure contains a WORD character count, followed by a UNICODE string with the resource name. Yes, even PE files intended for non-UNICODE Win32 implementations use UNICODE here. To convert the UNICODE string to an ANSI string, use the WideCharToMultiByte function.
DWORD OffsetToData
This field is either an offset to another resource directory or a pointer to information about a specific resource instance. If the high bit (0x80000000) is set, this directory entry refers to a subdirectory. The lower 31 bits are an offset (relative to the start of the resources) to another IMAGE_RESOURCE_DIRECTORY. If the high bit isn't set, the lower 31 bits point to an IMAGE_RESOURCE_DATA_ENTRY structure. The IMAGE_RESOURCE_DATA_ENTRY structure contains the location of the resource's raw data, its size, and its code page.
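The high-bit tests described for both fields reduce to a couple of masks. The sketch below shows one way to write them in portable C. The resource_dir_entry type and the helper names (entry_is_named, entry_is_subdirectory, low31) are made up for this illustration and are not the WINNT.H definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RES_HIGH_BIT 0x80000000u   /* hypothetical name for the flag bit */

/* Mirrors the two DWORDs of an IMAGE_RESOURCE_DIRECTORY_ENTRY. */
typedef struct {
    uint32_t Name;
    uint32_t OffsetToData;
} resource_dir_entry;

/* Nonzero if Name is an offset to a string rather than an integer ID. */
int entry_is_named(const resource_dir_entry *e)
{
    assert(e != NULL);
    return (e->Name & RES_HIGH_BIT) != 0;
}

/* Nonzero if OffsetToData refers to another resource directory. */
int entry_is_subdirectory(const resource_dir_entry *e)
{
    assert(e != NULL);
    return (e->OffsetToData & RES_HIGH_BIT) != 0;
}

/* Lower 31 bits of either field: an integer ID, or an offset measured
   from the start of the resources. */
uint32_t low31(uint32_t value)
{
    return value & 0x7FFFFFFFu;
}
```

For a leaf entry such as the "ID: 00000409  Offset: 00000200" lines in Table 13, both tests come back zero: the Name is an integer ID and the offset leads to an IMAGE_RESOURCE_DATA_ENTRY.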

To go further into the resource formats, I'd need to discuss the format of each resource type (dialogs, menus, and so on). Covering these topics could easily fill up an entire article on its own.

PE File Base Relocations

When the linker creates an EXE file, it makes an assumption about where the file will be mapped into memory. Based on this, the linker puts the real addresses of code and data items into the executable file. If for whatever reason the executable ends up being loaded somewhere else in the virtual address space, the addresses the linker plugged into the image are wrong. The information stored in the .reloc section allows the PE loader to fix these addresses in the loaded image so that they're correct again. On the other hand, if the loader was able to load the file at the base address assumed by the linker, the .reloc section data isn't needed and is ignored. The entries in the .reloc section are called base relocations since their use depends on the base address of the loaded image.

Unlike relocations in the NE file format, base relocations are extremely simple. They boil down to a list of locations in the image that need a value added to them. The format of the base relocation data is somewhat quirky. The base relocation entries are packaged in a series of variable length chunks. Each chunk describes the relocations for one 4KB page in the image. Let's look at an example to see how base relocations work. An executable file is linked assuming a base address of 0x10000. At offset 0x2134 within the image is a pointer containing the address of a string. The string starts at virtual address 0x14002, so the pointer contains the value 0x14002. You then load the file, but the loader decides that it needs to map the image starting at virtual address 0x60000. The difference between the linker-assumed base load address and the actual load address is called the delta. In this case, the delta is 0x50000. Since the entire image is 0x50000 bytes higher in memory, so is the string (now at address 0x64002). The pointer to the string is now incorrect. The executable file contains a base relocation for the memory location where the pointer to the string resides. To resolve a base relocation, the loader adds the delta value to the original value at the base relocation address. In this case, the loader would add 0x50000 to the original pointer value (0x14002), and store the result (0x64002) back into the pointer's memory. Since the string really is at 0x64002, everything is fine with the world.
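The arithmetic in this example is small enough to write out. Here is a minimal sketch, assuming the image is available as a byte buffer and that the value being fixed is a 32-bit pointer stored in the machine's native byte order; apply_base_relocation is a name invented for this illustration, not a loader API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Adds (actual_base - preferred_base) to the 32-bit value stored at
   `offset` inside `image`. memcpy is used so that the location does not
   have to be aligned. */
void apply_base_relocation(uint8_t *image, uint32_t offset,
                           uint32_t preferred_base, uint32_t actual_base)
{
    uint32_t delta = actual_base - preferred_base;
    uint32_t value;
    assert(image != NULL);
    memcpy(&value, image + offset, sizeof value);
    value += delta;
    memcpy(image + offset, &value, sizeof value);
}
```

With the numbers from the text (offset 0x2134, preferred base 0x10000, actual base 0x60000), a stored pointer of 0x14002 becomes 0x64002. Because the subtraction is done in unsigned arithmetic, the same code also handles an image that lands below its preferred base.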

Each chunk of base relocation data begins with an IMAGE_BASE_RELOCATION structure that looks like Table 15. Table 16 shows some base relocations as shown by PEDUMP. Note that the RVA values shown have already been displaced by the VirtualAddress field of the IMAGE_BASE_RELOCATION structure.

Table 15. IMAGE_BASE_RELOCATION Format

DWORD VirtualAddress
This field contains the starting RVA for this chunk of relocations. The offset of each relocation that follows is added to this value to form the actual RVA where the relocation needs to be applied.
DWORD SizeOfBlock
The size of this structure plus all the WORD relocations that follow. To determine the number of relocations in this block, subtract the size of an IMAGE_BASE_RELOCATION (8 bytes) from the value of this field, and then divide by 2 (the size of a WORD). For example, if this field contains 44, there are 18 relocations that immediately follow:
 (44 - sizeof(IMAGE_BASE_RELOCATION)) / sizeof(WORD) = 18
WORD TypeOffset
This isn't just a single WORD, but rather an array of WORDs, the number of which is calculated by the above formula. The bottom 12 bits of each WORD are a relocation offset, and need to be added to the value of the VirtualAddress field from this relocation block's header. The high 4 bits of each WORD are a relocation type. For PE files that run on Intel CPUs, you'll only see two types of relocations:
0 IMAGE_REL_BASED_ABSOLUTE This relocation is meaningless and is only used as a place holder to round relocation blocks up to a DWORD multiple size.
3 IMAGE_REL_BASED_HIGHLOW This relocation means add both the high and low 16 bits of the delta to the DWORD specified by the calculated RVA.
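The field descriptions above translate into three small helpers: one for the entry count, one for the type nibble, and one for the target RVA. This is only a sketch. The function names and the shortened REL_BASED_* macros are invented here (the real constants live in WINNT.H under the IMAGE_REL_BASED_ prefix), and the header size of 8 bytes is taken from the text.

```c
#include <assert.h>
#include <stdint.h>

#define REL_BASED_ABSOLUTE 0    /* stands in for IMAGE_REL_BASED_ABSOLUTE */
#define REL_BASED_HIGHLOW  3    /* stands in for IMAGE_REL_BASED_HIGHLOW  */
#define RELOC_HEADER_SIZE  8u   /* sizeof(IMAGE_BASE_RELOCATION)          */

/* Number of WORD entries that follow a block header. */
uint32_t reloc_count(uint32_t size_of_block)
{
    assert(size_of_block >= RELOC_HEADER_SIZE);
    return (size_of_block - RELOC_HEADER_SIZE) / 2u;
}

/* High 4 bits of a TypeOffset WORD: the relocation type. */
unsigned reloc_type(uint16_t type_offset)
{
    return (unsigned)(type_offset >> 12);
}

/* Bottom 12 bits of a TypeOffset WORD, added to the block's
   VirtualAddress to give the RVA that needs fixing. */
uint32_t reloc_rva(uint32_t virtual_address, uint16_t type_offset)
{
    return virtual_address + (type_offset & 0x0FFFu);
}
```

A TypeOffset value of 0x3032 in a block whose VirtualAddress is 0x1000 decodes to a HIGHLOW relocation at RVA 0x1032. A loader walking the block would skip entries whose type is ABSOLUTE, since they are only padding.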

Table 16. The Base Relocations from an EXE File

Virtual Address: 00001000  size: 0000012C
  00001032 HIGHLOW
  0000106D HIGHLOW
  000010AF HIGHLOW
  000010C5 HIGHLOW
// Rest of chunk omitted...
Virtual Address: 00002000  size: 0000009C
  000020A6 HIGHLOW
  00002110 HIGHLOW
  00002136 HIGHLOW
  00002156 HIGHLOW
// Rest of chunk omitted...
Virtual Address: 00003000  size: 00000114
  0000300A HIGHLOW
  0000301E HIGHLOW
  0000303B HIGHLOW
  0000306A HIGHLOW
// Rest of relocations omitted...

Differences Between PE and COFF OBJ Files

There are two portions of the PE file that are not used by the operating system. These are the COFF symbol table and the COFF debug information. Why would anyone need COFF debug information when the much more complete CodeView information is available? If you intend to use the Windows NT system debugger (NTSD) or the Windows NT kernel debugger (KD), COFF is the only game in town. For those of you who are interested, I've included a detailed description of these parts of the PE file in the online posting that accompanies this article (available on all MSJ bulletin boards).

At many points throughout the preceding discussion, I've noted that many structures and tables are the same in both a COFF OBJ file and the PE file created from it. Both COFF OBJ and PE files have an IMAGE_FILE_HEADER at or near their beginning. This header is followed by a section table that contains information about all the sections in the file. The two formats also share the same line number and symbol table formats, although the PE file can have additional non-COFF symbol tables as well. The amount of commonality between the OBJ and PE EXE formats is evidenced by the large amount of common code in PEDUMP (see COMMON.C on any MSJ bulletin board).

This similarity between the two file formats isn't happenstance. The goal of this design is to make the linker's job as easy as possible. Theoretically, creating an EXE file from a single OBJ should be just a matter of inserting a few tables and modifying a couple of file offsets within the image. With this in mind, you can think of a COFF file as an embryonic PE file. Only a few things are missing or different, so I'll list them here.

  • COFF OBJ files don't have an MS-DOS stub preceding the IMAGE_FILE_HEADER, nor is there a "PE" signature preceding the IMAGE_FILE_HEADER.
  • OBJ files don't have the IMAGE_OPTIONAL_HEADER. In a PE file, this structure immediately follows the IMAGE_FILE_HEADER. Interestingly, COFF LIB files do have an IMAGE_OPTIONAL_HEADER. Space constraints prevent me from talking about LIB files here.
  • OBJ files don't have base relocations. Instead, they have regular symbol-based fixups. I haven't gone into the format of the COFF OBJ file relocations because they're fairly obscure. If you want to dig into this particular area, the PointerToRelocations and NumberOfRelocations fields in the section table entries point to the relocations for each section. The relocations are an array of IMAGE_RELOCATION structures, which are defined in WINNT.H. The PEDUMP program can show OBJ file relocations if you enable the proper switch.
  • The CodeView information in an OBJ file is stored in two sections (.debug$S and .debug$T). When the linker processes the OBJ files, it doesn't put these sections in the PE file. Instead, it collects all these sections and builds a single symbol table stored at the end of the file. This symbol table isn't a formal section (that is, there's no entry for it in the PE's section table).
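The first difference in the list gives a quick way to tell the two file types apart. The sketch below is one possible check, not how PEDUMP does it: it looks for the "MZ" bytes of the MS-DOS stub, reads the 32-bit offset stored at 0x3C (the e_lfanew field of IMAGE_DOS_HEADER in WINNT.H), and then tests for the "PE\0\0" signature there. The function name looks_like_pe is hypothetical, and a buffer that fails the check is not necessarily a valid OBJ file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if `buf` begins with an MS-DOS stub whose e_lfanew field
   (offset 0x3C) leads to a "PE\0\0" signature, and 0 otherwise. */
int looks_like_pe(const uint8_t *buf, size_t len)
{
    uint32_t pe_offset;
    assert(buf != NULL);
    if (len < 0x40 || buf[0] != 'M' || buf[1] != 'Z')
        return 0;
    /* e_lfanew is stored little-endian; assemble it byte by byte so the
       check works on any host. */
    pe_offset = (uint32_t)buf[0x3C]
              | ((uint32_t)buf[0x3D] << 8)
              | ((uint32_t)buf[0x3E] << 16)
              | ((uint32_t)buf[0x3F] << 24);
    if (pe_offset > len || len - pe_offset < 4)
        return 0;   /* signature would lie outside the buffer */
    return memcmp(buf + pe_offset, "PE\0\0", 4) == 0;
}
```

A COFF OBJ file starts directly with its IMAGE_FILE_HEADER, so its first two bytes are the Machine field rather than "MZ", and the function returns 0 for it.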

Using PEDUMP

PEDUMP is a command-line utility for dumping PE files and COFF OBJ format files. It uses the Win32 console capabilities to eliminate the need for extensive user interface work. The syntax for PEDUMP is as follows:

PEDUMP [switches] filename

The switches can be seen by running PEDUMP with no arguments. PEDUMP uses the switches shown in Table 17. By default, none of the switches are enabled. Running PEDUMP without any of the switches provides most of the useful information without creating a huge amount of output. PEDUMP sends its output to the standard output file, so its output can be redirected to a file with a > on the command line.

Table 17. PEDUMP Switches

/A Include everything in dump (essentially, enable all the switches)
/H Include a hex dump of each section at the end of the dump
/L Include line number information (both PE and COFF OBJ files)
/R Show base relocations (PE files only)
/S Show symbol table (both PE and COFF OBJ files)

Summary

With the advent of Win32, Microsoft made sweeping changes in the OBJ and executable file formats to save time and build on work previously done for other operating systems. A primary goal of these file formats is to enhance portability across different platforms.