PE file format

Each section has a distinct name, which is intended to convey the purpose of the section. Section names exist solely for the benefit of humans and are insignificant to the operating system.

Microsoft typically prefixes their section names with a period, but it's not a requirement. While compilers have a standard set of sections that they generate, there's nothing magical about them. You can create and name your own sections, and the linker happily includes them in the executable.

Most programs are fine using the default sections emitted by the compiler, but occasionally you may have unusual requirements that necessitate putting code or data into a separate section.

Sections don't spring fully formed from the linker; rather, they start out in OBJ files, usually placed there by the compiler. The linker's job is to combine all the required sections from OBJ files and libraries into the appropriate final section in the PE file. For example, each OBJ file in your project probably has at least one section that contains code. The linker takes all the sections with the same name from the various OBJ files and combines them into a single section of that name in the PE file. Code and data from LIB files are also typically included in an executable, but that subject is outside the scope of this article.

There is a rather complete set of rules that linkers follow to decide which sections to combine and how. A section in an OBJ file may be intended for the linker's use, and not make it into the final executable. A section like this would be intended for the compiler to pass information to the linker.

Sections have two alignment values, one within the disk file and the other in memory. The PE file header specifies both of these values, which can differ. Each section starts at an offset that's some multiple of the alignment value. In the PE file, every section begins at a file offset that's a multiple of the file alignment. Once mapped into memory, sections always start on at least a page boundary. That is, when a PE section is mapped into memory, the first byte of each section corresponds to the start of a memory page.
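The rounding that alignment implies can be sketched in a few lines of Python. This is a minimal illustration, not code from any particular linker; the alignment values 0x200 and 0x1000 and the helper name align_up are assumptions chosen only for the example.

```python
def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment (a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    return (value + alignment - 1) & ~(alignment - 1)


# Hypothetical alignment values, chosen only for illustration.
FILE_ALIGNMENT = 0x200
SECTION_ALIGNMENT = 0x1000

# Hypothetical end of the previous section's data.
previous_end = 0x2A7

print(hex(align_up(previous_end, FILE_ALIGNMENT)))     # 0x400
print(hex(align_up(previous_end, SECTION_ALIGNMENT)))  # 0x1000
```

The same helper applies to both the on-disk and the in-memory layout; only the alignment value differs.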

It's possible to create PE files in which the sections start at the same offset in the file as they start from the load address in memory. This makes for larger executables, but can speed loading under Windows 9x or Windows Me. An interesting linker feature is the ability to merge sections.

If two sections have similar, compatible attributes, they can usually be combined into a single section at link time. The advantage to merging sections is that it saves space, both on disk and in memory. At a minimum, each section occupies one page in memory. If you can reduce the number of sections in an executable from four to three, there's a decent chance you'll use one less page of memory. Of course, this depends on whether the unused space at the end of the two merged sections adds up to a page.

Things can get interesting when you're merging sections, as there are no hard and fast rules as to what's allowed.

For example, some merges are fine while others should be avoided. Prior to Visual Studio .NET, you could merge the imports data into other sections. In Visual Studio .NET, this is not allowed, but the linker itself often merges parts of the imports data into other sections. Since portions of the imports data are written to by the Windows loader when they are loaded into memory, you might wonder how they can be put in a read-only section.

The answer is that the loader temporarily makes the affected pages writable while it fills in the imports. Once the imports table is initialized, the pages are then set back to their original protection attributes.

In an executable file, there are many places where an in-memory address needs to be specified. For instance, the address of a global variable is needed when referencing it. PE files can load just about anywhere in the process address space. While they do have a preferred load address, you can't rely on the executable file actually loading there.

For this reason, it's important to have some way of specifying addresses that are independent of where the executable file loads. The relative virtual address (RVA) serves this purpose: the RVA of an item is its address in memory minus the address at which the file was loaded. To convert an RVA to an actual address, simply reverse the process: add the RVA to the actual load address to find the actual memory address.
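The arithmetic is small enough to show directly. The following Python sketch uses made-up addresses (a load address of 0x400000 and a code section at 0x401000); both values and the function names are assumptions for illustration only.

```python
def va_to_rva(address: int, load_base: int) -> int:
    """RVA = in-memory address minus the address the image was loaded at."""
    return address - load_base


def rva_to_va(rva: int, load_base: int) -> int:
    """Reverse the process: add the RVA to the actual load address."""
    return load_base + rva


# Hypothetical addresses, chosen only for illustration.
load_base = 0x400000
code_section = 0x401000

rva = va_to_rva(code_section, load_base)
print(hex(rva))  # 0x1000

# The same RVA stays valid if the image loads somewhere else.
print(hex(rva_to_va(rva, 0x10000000)))  # 0x10001000
```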

Like other executable files, a PE file has a collection of fields that defines what the rest of the file looks like.

The header contains information such as the location and size of code.

Signature: This field contains only the signature, which lets the Windows loader recognize the file as a PE image.

The signature is the letters PE followed by two null bytes.

Machine: A number that identifies the type of machine on the target system, such as Intel or AMD.

Number of sections: The size of the section table, which immediately follows the header.

Size of the optional header: The number of bytes between the top of the optional header and the start of the section table.

This is the size of the optional header that is required for an executable file. This value should be zero for an object file.

Characteristics: Flags that indicate attributes of the object or image file. Other flags exist that are not needed here.

The optional header contains many fields, and it is not possible to cover every one in detail due to space limitations, so we will discuss the important ones.
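As a rough sketch of how these fields sit in the file, the following Python snippet unpacks the signature and the header fields that follow it from a byte buffer. The field order and sizes follow the published COFF file header layout; the function name, the dictionary keys, and the sample values (machine 0x014C, three sections, an optional header of 0xE0 bytes) are assumptions made for the example.

```python
import struct

# Signature (4 bytes) followed by the 20-byte COFF file header, little-endian.
PE_HEADER = struct.Struct("<4sHHIIIHH")


def parse_pe_header(buf: bytes, offset: int = 0) -> dict:
    """Unpack the PE signature and file header found at offset in buf."""
    (signature, machine, number_of_sections, time_date_stamp,
     pointer_to_symbol_table, number_of_symbols,
     size_of_optional_header, characteristics) = PE_HEADER.unpack_from(buf, offset)
    if signature != b"PE\0\0":
        raise ValueError("missing PE signature")
    return {
        "machine": machine,
        "number_of_sections": number_of_sections,
        "time_date_stamp": time_date_stamp,
        "pointer_to_symbol_table": pointer_to_symbol_table,
        "number_of_symbols": number_of_symbols,
        "size_of_optional_header": size_of_optional_header,
        "characteristics": characteristics,
    }


# Synthetic header built for the example; none of these values come from a real file.
sample = PE_HEADER.pack(b"PE\0\0", 0x014C, 3, 0, 0, 0, 0xE0, 0x0102)
header = parse_pe_header(sample)
print(header["number_of_sections"])  # 3

# The section table starts right after the optional header.
section_table_offset = PE_HEADER.size + header["size_of_optional_header"]
print(section_table_offset)  # 248
```

The last two lines show why the size of the optional header matters: it is what lets a reader skip from the file header to the section table.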

The optional header's fields are grouped into two categories. Magic: The unsigned integer that identifies the state of the image file. The most common values are 0x10b for 32-bit (PE32) images and 0x20b for 64-bit (PE32+) images. Before getting into the details, we should cover some PE concepts that are required here.

RVA (relative virtual address): An RVA is the offset of an item relative to where the image is memory-mapped. Put another way, it is the address of the item after the image file is loaded into memory, with the base address of the image subtracted from it.

The address of the entry point is the address where the PE loader will begin execution; this is the address that is relative to image base when the executable is loaded into memory. For the program image, this is the starting address; for device drivers, this is the address of the initialization function and, for the DLL, this is optional.

Image base: The preferred address of the image when loaded into memory. The linker supplies a default when none is specified.

Section alignment: The alignment of the sections when loaded into memory. Section alignment can be no less than the page size (currently 4,096 bytes on Windows x86).

File alignment: The granularity of the alignment of the sections in the file. Each section must start at a file offset that is a multiple of the value in this field.

Size of image: The size of the image in memory, including all of the headers.

If you have custom data that you wish to embed inside the executable, placing it in its own section and identifying it by the section's name can be a good idea, since you won't be changing the PE format and your executable will remain compatible with PE tools.

The RVA is the address at which something exists once it's loaded into memory, rather than an offset into the file. To calculate the file offset from an RVA without actually loading the sections into memory, you can use the table of section entries. By using the virtual address and size of each section you can find which section the RVA belongs to, then subtract the difference between the section's virtual address and its file offset.
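A minimal Python sketch of that lookup follows. The Section tuple, its field names, and the sample section table are hypothetical stand-ins for already-parsed section entries.

```python
from typing import NamedTuple, Optional, Sequence


class Section(NamedTuple):
    # Hypothetical, already-parsed section entry.
    name: str
    virtual_address: int   # RVA of the section in memory
    virtual_size: int      # size of the section in memory
    file_offset: int       # where the section's data starts in the file


def rva_to_file_offset(rva: int, sections: Sequence[Section]) -> Optional[int]:
    """Find the section containing rva and translate it to a file offset."""
    for section in sections:
        start = section.virtual_address
        if start <= rva < start + section.virtual_size:
            # Subtract the difference between virtual address and file offset.
            return rva - (section.virtual_address - section.file_offset)
    return None  # the RVA is not covered by any section


# Made-up section table for illustration.
sections = [
    Section("code", 0x1000, 0x1800, 0x400),
    Section("data", 0x3000, 0x0200, 0x1C00),
]

print(hex(rva_to_file_offset(0x1234, sections)))  # 0x634
print(rva_to_file_offset(0x9000, sections))       # None
```

Returning None for an uncovered RVA is a design choice for the sketch; a real tool might instead raise an error or treat header RVAs specially.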

The segments will appear as sections. Using this, it is possible to keep C and assembly code separate, as a linker will not automatically merge them.

If each section specifies which virtual address to load it into, you may be wondering how multiple DLLs can exist in one virtual address space without conflict. It is true that most code you'll find in a PE file (DLL or otherwise) is position dependent and linked to a specific address. However, to resolve this issue there exists a structure called a relocation table that is attached to each section entry.

The table is basically a huge list of every address stored in that section, so you can offset each one to the location where you loaded the section. Because addresses can point across section borders, relocations should be done after every section is loaded into memory.

Many PE executables (most notably all Microsoft updates) are signed with a certificate.

This information is stored in the Attribute Certificate Table, pointed to by the Data Directory's 5th entry. The format is a series of concatenated signatures. If the firmware allows installing additional KEKs (not typical), then you can use other certificates as well. Microsoft uses signtool to create these signatures.

CLI works alongside the PE format. Rather than being an extension to the format, it really exists as its own format inside of a format, with a completely different way of storing tables and values.

All the .NET data and headers exist inside sections that are loaded into memory. They are loaded into memory because CLI involves heavy language reflection, which requires the metadata to be available without thrashing the disk.

The second reason that the .NET metadata exists inside the sections rather than the PE headers is that the PE loader actually has no concept of .NET.

To determine whether the name itself or an offset is given, test the first 4 bytes for equality to zero. Normally, the Section Value field in a symbol table entry is a one-based index into the section table. However, this field is a signed integer and can take negative values.

The following values, less than one, have special meanings. The Type field of a symbol table entry contains 2 bytes, where each byte represents type information.

The following values are defined for base type, although Microsoft tools generally do not use this field and set the LSB to 0. However, the possible COFF values are listed here for completeness. The most significant byte specifies whether the symbol is a pointer to, function returning, or array of the base type that is specified in the LSB.

Microsoft tools use this field only to indicate whether the symbol is a function, so that the only two resulting values are 0x0 and 0x20 for the Type field. However, other tools can use this field to communicate more information. It is very important to specify the function attribute correctly. This information is required for incremental linking to work correctly.

For some architectures, the information may be required for other purposes. The StorageClass field of the symbol table indicates what kind of definition a symbol represents. The following table shows possible values. Note that the StorageClass field is an unsigned 1-byte integer. The special value -1 should therefore be taken to mean its unsigned equivalent, 0xFF.

Except in the second column heading below, "Value" should be taken to mean the Value field of the symbol record whose interpretation depends on the number found as the storage class.

Auxiliary symbol table records always follow, and apply to, some standard symbol table record. An auxiliary record can have any format that the tools can recognize, but 18 bytes must be allocated for them so that the symbol table is maintained as an array of regular size.

Currently, Microsoft tools recognize auxiliary formats for the following kinds of records: function definitions, function begin and end symbols. The traditional COFF design also includes auxiliary-record formats for arrays and structures. A symbol table record marks the beginning of a function definition if it has all of the following: a storage class of EXTERNAL (2), a Type value that indicates it is a function (0x20), and a section number that is greater than zero.

Function-definition symbol records are followed by an auxiliary record. For each function definition in the symbol table, three items describe the beginning, ending, and number of lines.

Each of the three symbol records uses its Value field differently. In the record that counts lines, the Value field gives the number of lines in the function. In the record that marks the end of the function, the Value field has the same number as the Total Size field in the function-definition symbol record.

A module can contain an unresolved external symbol (sym1), but it can also include an auxiliary record that indicates that if sym1 is not present at link time, another external symbol (sym2) is used to resolve references instead.

If a definition of sym1 is linked, then an external reference to the symbol is resolved normally. If a definition of sym1 is not linked, then all references to the weak external for sym1 refer to sym2 instead. The external symbol, sym2, must always be linked; typically, it is defined in the module that contains the weak reference to sym1. The weak-external symbol record is followed by an auxiliary record.

A further auxiliary format follows a symbol-table record with storage class FILE.

Another format follows a symbol-table record that defines a section. Such a record has a symbol name that is the name of a section. The auxiliary record provides information about the section to which it refers. Thus, it duplicates some of the information in the section header.

The CLR token auxiliary record is used to associate a token with the COFF symbol table's namespace.

The COFF string table immediately follows the COFF symbol table. Its position is found by taking the symbol table address in the COFF header and adding the number of symbols multiplied by the size of a symbol. At the beginning of the COFF string table are 4 bytes that contain the total size in bytes of the string table.

This size includes the size field itself, so that the value in this location would be 4 if no strings were present. Following the size are null-terminated strings that are pointed to by symbols in the COFF symbol table.
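The string table layout, together with the rule of testing the first 4 bytes of a symbol's name field for zero, can be sketched in Python as follows. The helper names are hypothetical. The sketch assumes the usual COFF convention that a name field is 8 bytes and that, when the first 4 bytes are zero, the next 4 bytes hold an offset measured from the start of the string table.

```python
import struct


def build_string_table(strings):
    """Build a string table: 4-byte total size, then null-terminated strings."""
    body = b"".join(s.encode("ascii") + b"\0" for s in strings)
    # The size field counts itself, so an empty table holds the value 4.
    return struct.pack("<I", 4 + len(body)) + body


def symbol_name(name_field: bytes, string_table: bytes) -> str:
    """Resolve an 8-byte symbol name field to a string."""
    if len(name_field) != 8:
        raise ValueError("name field must be 8 bytes")
    first_four, offset = struct.unpack("<II", name_field)
    if first_four != 0:
        # Short name stored inline, padded with nulls.
        return name_field.rstrip(b"\0").decode("ascii")
    (total_size,) = struct.unpack_from("<I", string_table, 0)
    if not 4 <= offset < total_size:
        raise ValueError("string table offset out of range")
    end = string_table.index(b"\0", offset)
    return string_table[offset:end].decode("ascii")


table = build_string_table(["a_rather_long_symbol"])
print(symbol_name(b"main\0\0\0\0", table))          # main
print(symbol_name(struct.pack("<II", 0, 4), table))  # a_rather_long_symbol
```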

Attribute certificates can be associated with an image by adding an attribute certificate table. The attribute certificate table is composed of a set of contiguous, quadword-aligned attribute certificate entries.

Zero padding is inserted between the original end of the file and the beginning of the attribute certificate table to achieve this alignment. Each attribute certificate entry contains the following fields. The virtual address value from the Certificate Table entry in the Optional Header Data Directory is a file offset to the first attribute certificate entry. Subsequent entries are accessed by advancing that entry's dwLength bytes, rounded up to an 8-byte multiple, from the start of the current attribute certificate entry.

This continues until the sum of the rounded dwLength values equals the Size value from the Certificates Table entry in the Optional Header Data Directory. If the sum of the rounded dwLength values does not equal the Size value, then either the attribute certificate table or the Size field is corrupted.

To advance through all the attribute certificate entries, start at the file offset given by the Certificate Table entry and repeat the step above until the Size value is used up. Alternatively, you can enumerate the certificate entries by calling the Win32 ImageEnumerateCertificates function in a loop.

Attribute certificate table entries can contain any certificate type, as long as the entry has the correct dwLength value, a unique wRevision value, and a unique wCertificateType value. Note that some values are not currently supported. If the bCertificate content does not end on a quadword boundary, the attribute certificate entry is padded with zeros, from the end of bCertificate to the next quadword boundary. As stated above, the certificates in the attribute certificate table can contain any certificate type.
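The walk over the attribute certificate table can be sketched in Python. This is an illustration under stated assumptions, not the behavior of any Windows API: it assumes each entry begins with a 4-byte dwLength, a 2-byte wRevision, and a 2-byte wCertificateType, that dwLength counts this 8-byte header plus bCertificate, and that the function names and sample bytes are invented for the example.

```python
import struct


def round_up_8(n: int) -> int:
    """Round n up to the next 8-byte (quadword) multiple."""
    return (n + 7) & ~7


def walk_certificates(data: bytes, table_offset: int, table_size: int):
    """Return (wRevision, wCertificateType, bCertificate) for each entry."""
    entries = []
    pos = table_offset
    end = table_offset + table_size
    while pos < end:
        if pos + 8 > end:
            raise ValueError("truncated certificate entry header")
        dw_length, w_revision, w_type = struct.unpack_from("<IHH", data, pos)
        if dw_length < 8:
            raise ValueError("corrupt dwLength")
        entries.append((w_revision, w_type, data[pos + 8:pos + dw_length]))
        pos += round_up_8(dw_length)
    if pos != end:
        # The rounded lengths must add up to the Size value exactly.
        raise ValueError("certificate table or Size field is corrupted")
    return entries


def make_entry(revision: int, cert_type: int, payload: bytes) -> bytes:
    """Build one zero-padded entry (helper for the synthetic sample)."""
    raw = struct.pack("<IHH", 8 + len(payload), revision, cert_type) + payload
    return raw.ljust(round_up_8(len(raw)), b"\0")


# Synthetic file: 16 bytes of other data, then two entries.
table = make_entry(0x0200, 0x0002, b"abc") + make_entry(0x0200, 0x0002, b"12345678")
blob = b"\0" * 16 + table

for revision, cert_type, payload in walk_certificates(blob, 16, len(table)):
    print(hex(revision), hex(cert_type), payload)
```

The final check mirrors the corruption rule in the text: if the rounded lengths do not sum to the Size value, the sketch refuses to return a result.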

Certificates that ensure a PE file's integrity may include a PE image hash. A PE image hash or file hash is similar to a file checksum in that the hash algorithm produces a message digest that is related to the integrity of a file. However, a checksum is produced by a simple algorithm and is used primarily to detect whether a block of memory on disk has gone bad and the values stored there have become corrupted. A file hash is similar to a checksum in that it also detects file corruption.

However, unlike most checksum algorithms, it is very difficult to modify a file without changing the file hash from its original unmodified value. A file hash can thus be used to detect intentional and even subtle modifications to a file, such as those introduced by viruses, hackers, or Trojan horse programs.

Some header fields are excluded from the PE image hash. This is because the act of adding a certificate changes these fields and would cause a different hash value to be calculated. This data stream remains consistent when certificates are added to or removed from a PE file. Based on the parameters that are passed to ImageGetDigestStream, other data from the PE image can be omitted from the hash computation.

The delay-load import tables were added to the image to support a uniform mechanism for applications to delay the loading of a DLL until the first call into that DLL.

The layout of the tables matches that of the traditional import tables. The delay-load directory table is the counterpart to the import directory table. It can be retrieved through the Delay Import Descriptor entry in the optional header data directories list. The tables that are referenced in this data structure are organized and sorted just as their counterparts are for traditional imports.

As yet, no attribute flags are defined. The linker sets this field to zero in the image. This field can be used to extend the record by indicating the presence of new fields, or it can be used to indicate behaviors to the delay or unload helper functions. The name of the DLL to be delay-loaded resides in the read-only data section of the image. It is referenced through the szName field. The handle of the DLL to be delay-loaded is in the data section of the image. The phmod field points to the handle.

The supplied delay-load helper uses this location to store the handle to the loaded DLL. The delay-load helper updates the function pointers in the delay import address table (IAT) with the real entry points so that the thunks are no longer in the calling loop. The delay import name table (INT) contains the names of the imports that might require loading.

They are ordered in the same fashion as the function pointers in the IAT. The optional delay unload import address table consists of initialized data in the read-only section that is an exact copy of the original IAT that referred the code to the delay-load thunks.

Typical COFF sections contain code or data that linkers and Microsoft Win32 loaders process without special knowledge of the section contents. The contents are relevant only to the application that is being linked or executed.

However, some COFF sections have special meanings when found in object files or image files. Tools and loaders recognize these sections because they have special flags set in the section header, because special locations in the image optional header point to them, or because the section name itself indicates a special function of the section.

Even if the section name itself does not indicate a special function of the section, the section name is dictated by convention, so the authors of this specification can refer to a section name in all cases. The reserved sections and their attributes are described in the table below, followed by detailed descriptions for the section types that are persisted into executables and the section types that contain metadata for extensions.

Some of the sections listed here are marked "object only" or "image only" to indicate that their special semantics are relevant only for object files or image files, respectively. A section that is marked "image only" might still appear in an object file as a way of getting into the image file, but the section has no special meaning to the linker, only to the image file loader.

This section describes the packaging of debug information in object and image files. The next section describes the format of the debug directory, which can be anywhere in the image. Subsequent sections describe the "groups" in object files that contain debug information. The default for the linker is that debug information is not mapped into the address space of the image.

Image files contain an optional debug directory that indicates what form of debug information is present and where it is. This directory consists of an array of debug directory entries whose location and size are indicated in the image optional header. The debug directory can be in a discardable section. Each debug directory entry identifies the location and size of a block of debug information. The specified RVA can be zero if the debug information is not covered by a section header (that is, it resides in the image file and is not mapped into the run-time address space).

If it is mapped, the RVA is its address.

Frame pointer omission (FPO) information tells the debugger how to interpret nonstandard stack frames. Those functions that do not have FPO information are assumed to have normal stack frames.

For a reproducible build, if the input does not change, the output PE file is guaranteed to be bit-for-bit identical no matter when or where the PE is produced. The raw data of this debug entry may be empty, or may contain a calculated hash value preceded by a four-byte value that represents the hash value length.

The linker recognizes the debug sections in object files. Types from a precompiled header are shared among all of the objects that were compiled by using the precompiled header that was generated with this object. The linker gathers all relevant debug data from the object files, processes that data along with the linker-generated debugging information into the PDB file, and creates a debug directory entry to refer to it.

The linker removes the section that carries linker directives after processing it. The directive string is a series of linker options that are separated by spaces. Each option contains a hyphen, the option name, and any appropriate attribute. If an option contains spaces, the option must be enclosed in quotes.

The export data section contains information about the symbols that an image exports. An overview of the general structure of the export section is described below.

The tables described are usually contiguous in the file in the order shown (though this is not required). Only the export directory table and export address table are required to export symbols as ordinals. An ordinal is an export that is accessed directly by its export address table index. The name pointer table, ordinal table, and export name table all exist to support use of export names. When another image file imports a symbol by name, the Win32 loader searches the name pointer table for a matching string.

If a matching string is found, the associated ordinal is identified by looking up the corresponding member in the ordinal table (that is, the member of the ordinal table with the same index as the string pointer found in the name pointer table). The resulting ordinal is an index into the export address table, which gives the actual location of the desired symbol. Every export symbol can be accessed by an ordinal. When another image file imports a symbol by ordinal, it is unnecessary to search the name pointer table for a matching string.

Direct use of an ordinal is therefore more efficient. However, an export name is easier to remember and does not require the user to know the table index for the symbol. The export symbol information begins with the export directory table, which describes the remainder of the export symbol information. The export directory table contains address information that is used to resolve imports to the entry points within this image. The export address table contains the address of exported entry points and exported data and absolutes.

An ordinal number is used as an index into the export address table. Each entry in the export address table is a field that uses one of two formats. If the address specified is not within the export section (as defined by the address and length that are indicated in the optional header), the field is an export RVA, which is an actual address in code or data. Otherwise, the field is a forwarder RVA.

A forwarder RVA exports a definition from some other image, making it appear as if it were being exported by the current image.

Thus, the symbol is simultaneously imported and exported. For example, Kernel32.dll in Windows XP forwards one of its exports to another DLL, while the application's import table refers only to Kernel32.dll. Therefore, the application is not specific to Windows XP and can run on any Win32 system.

The export name pointer table is an array of addresses (RVAs) into the export name table. The pointers are 32 bits each and are relative to the image base. The pointers are ordered lexically to allow binary searches.

The export ordinal table is an array of 16-bit unbiased indexes into the export address table.

Ordinals are biased by the Ordinal Base field of the export directory table. In other words, the ordinal base must be subtracted from the ordinals to obtain true indexes into the export address table.

The export name pointer table and the export ordinal table form two parallel arrays that are separated to allow natural field alignment. These two tables, in effect, operate as one table, in which the Export Name Pointer column points to a public exported name and the Export Ordinal column gives the corresponding ordinal for that public name. A member of the export name pointer table and a member of the export ordinal table are associated by having the same position index in their respective arrays.

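Thus, when the export name pointer table is searched and a matching string is found at position i, the symbol's RVA and biased ordinal are found as follows: the member of the export ordinal table at position i is an index into the export address table, which yields the RVA, and adding the Ordinal Base to that index yields the biased ordinal. When searching for a symbol by biased ordinal, subtract the Ordinal Base to obtain the index into the export address table, then look for that index in the export ordinal table to find the position of the name.

The export name table contains the actual string data that was pointed to by the export name pointer table. The strings in this table are public names that other images can use to import the symbols.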
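Both lookups can be sketched in Python over already-parsed tables. The function names and the sample tables are hypothetical; the sketch assumes that the ordinal table stores unbiased indexes and that the names are sorted so a binary search applies.

```python
from bisect import bisect_left


def find_export_by_name(name, names, ordinal_table, address_table, ordinal_base):
    """Return (rva, biased_ordinal) for name, or None if it is not exported."""
    i = bisect_left(names, name)            # binary search of the sorted names
    if i == len(names) or names[i] != name:
        return None
    index = ordinal_table[i]                # same position in the parallel array
    return address_table[index], index + ordinal_base


def find_export_by_ordinal(biased_ordinal, names, ordinal_table, address_table,
                           ordinal_base):
    """Return (rva, name_or_None) for a biased ordinal, or None if out of range."""
    index = biased_ordinal - ordinal_base   # remove the bias to get the true index
    if not 0 <= index < len(address_table):
        return None
    name = None
    for i, candidate in enumerate(ordinal_table):
        if candidate == index:
            name = names[i]
            break
    return address_table[index], name


# Made-up tables for illustration.
names = ["Alpha", "Beta", "Gamma"]          # export names, sorted lexically
ordinal_table = [2, 0, 1]                   # parallel to names
address_table = [0x1100, 0x1200, 0x1000]    # export RVAs
ordinal_base = 5

print(find_export_by_name("Beta", names, ordinal_table, address_table, ordinal_base))
print(find_export_by_ordinal(7, names, ordinal_table, address_table, ordinal_base))
```

With these tables, "Beta" resolves to RVA 0x1100 with biased ordinal 5, and biased ordinal 7 resolves to RVA 0x1000, which is exported under the name "Alpha".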

These public export names are not necessarily the same as the private symbol names that the symbols have in their own image file and source code, although they can be. Every exported symbol has an ordinal value, which is just the index into the export address table. Use of export names, however, is optional. Some, all, or none of the exported symbols can have export names.

For exported symbols that do have export names, corresponding entries in the export name pointer table and export ordinal table work together to associate each name with an ordinal. The structure of the export name table is a series of null-terminated ASCII strings of variable length.

All image files that import symbols, including virtually all executable (EXE) files, have an import data section.

The import information begins with the import directory table, which describes the remainder of the import information.

The import directory table contains address information that is used to resolve fixup references to the entry points within a DLL image. The import directory table consists of an array of import directory entries, one entry for each DLL to which the image refers. The last directory entry is empty (filled with null values), which indicates the end of the directory table.

Each import lookup table entry uses a bit-field format. The collection of these entries describes all imports from a given DLL.

The last entry is set to zero (NULL) to indicate the end of the table. The structure and content of the import address table are identical to those of the import lookup table, until the file is bound. These addresses are the actual memory addresses of the symbols, although technically they are still called "virtual addresses."

The function table used for exception handling is pointed to by the exception table entry in the image data directory. The entries must be sorted according to the function addresses (the first field in each structure) before being emitted into the final image.

The target platform determines which of the three function table entry format variations is used.

The base relocation table contains entries for all base relocations in the image. The Base Relocation Table field in the optional header data directories gives the number of bytes in the base relocation table. The base relocation table is divided into blocks. Each block represents the base relocations for a 4K page.

Each block must start on a 32-bit boundary. The loader is not required to process base relocations that are resolved by the linker, unless the load image cannot be loaded at the image base that is specified in the PE header.

The Block Size field is then followed by any number of Type or Offset field entries. Each entry is a WORD (2 bytes): the high 4 bits hold the relocation type, and the low 12 bits hold an offset from the start of the block's page. To apply a base relocation, the difference is calculated between the preferred base address and the base where the image is actually loaded. If the image is loaded at its preferred base, the difference is zero and thus the base relocations do not have to be applied.
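A sketch of applying base relocations in Python follows. It rests on several labeled assumptions: each block begins with a 4-byte page RVA and a 4-byte block size that counts this header; each WORD entry carries the type in its high 4 bits and the offset in its low 12 bits; type 0 is padding to be skipped and type 3 adds the full 32-bit difference. Other relocation types are deliberately left unimplemented, and the function name and sample data are invented.

```python
import struct

REL_ABSOLUTE = 0   # padding entry, skipped
REL_HIGHLOW = 3    # add the full 32-bit difference


def apply_relocations(image: bytearray, reloc: bytes,
                      preferred_base: int, actual_base: int) -> int:
    """Patch image (indexed by RVA) in place; return the number of fixups."""
    delta = actual_base - preferred_base
    if delta == 0:
        return 0                            # loaded at the preferred base
    applied = 0
    pos = 0
    while pos + 8 <= len(reloc):
        page_rva, block_size = struct.unpack_from("<II", reloc, pos)
        if block_size < 8 or pos + block_size > len(reloc):
            raise ValueError("corrupt relocation block")
        for entry_pos in range(pos + 8, pos + block_size, 2):
            (entry,) = struct.unpack_from("<H", reloc, entry_pos)
            kind, offset = entry >> 12, entry & 0x0FFF
            if kind == REL_ABSOLUTE:
                continue
            if kind != REL_HIGHLOW:
                raise NotImplementedError(f"relocation type {kind}")
            target = page_rva + offset
            (value,) = struct.unpack_from("<I", image, target)
            struct.pack_into("<I", image, target, (value + delta) & 0xFFFFFFFF)
            applied += 1
        pos += block_size
    return applied


# Synthetic image: one absolute address stored at RVA 0x1004.
image = bytearray(0x2000)
struct.pack_into("<I", image, 0x1004, 0x00401234)
# One block for page 0x1000: a HIGHLOW entry at offset 4 plus one padding entry.
reloc = struct.pack("<IIHH", 0x1000, 12, (REL_HIGHLOW << 12) | 0x004, 0)

count = apply_relocations(image, reloc, 0x00400000, 0x10000000)
print(count, hex(struct.unpack_from("<I", image, 0x1004)[0]))  # 1 0x10001234
```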

TLS is a special storage class that Windows supports in which a data object is not an automatic stack variable, yet is local to each individual thread that runs the code.

Thus, each thread can maintain a different value for a variable declared by using TLS. This implementation enables TLS data to be defined and initialized similarly to ordinary static variables in a program.

Statically declared TLS data objects can be used only in statically loaded image files. The Address of Index field points to a location where the program expects to receive the TLS index.

The linker looks for this memory image and uses the data there to create the TLS directory. Other compilers that support TLS and work with the Microsoft linker must use this same technique. When a thread is created, the loader communicates the address of the thread's TLS array by placing the address of the thread environment block (TEB) in the FS register.

This behavior is Intel x86-specific. The loader assigns the value of the TLS index to the place that was indicated by the Address of Index field. The code uses the TLS index and the TLS array location (multiplying the index by 4 and using it as an offset to the array) to get the address of the TLS data area for the given program and module. Each thread has its own TLS data area, but this is transparent to the program, which does not need to know how data is allocated for individual threads.

The TLS array is an array of addresses that the system maintains for each thread. The TLS index indicates which member of the array to use. The index is a number meaningful only to the system that identifies the module. The program can provide one or more TLS callback functions to support additional initialization and termination for TLS data objects. A typical use for such a callback function would be to call constructors and destructors for objects.

Although there is typically no more than one callback function, a callback is implemented as an array to make it possible to add additional callback functions if desired.

If there is more than one callback function, each function is called in the order in which its address appears in the array. A null pointer terminates the array. It is perfectly valid to have an empty list (no callback supported), in which case the callback array has exactly one member, a null pointer.

The Reserved parameter should be set to zero. The Reason parameter indicates why the callback is being invoked.

Current versions of the Microsoft linker and Windows XP and later versions of Windows use a new version of the load configuration structure for 32-bit x86-based systems that include reserved SEH technology. This provides a list of safe structured exception handlers that the operating system uses during exception dispatching.

If an exception handler's address lies within the image but is not in this list, the operating system terminates the application. This helps prevent the "x86 exception handler hijacking" exploit that has been used in the past to take control of the operating system. The Microsoft linker automatically provides a default load configuration structure to include the reserved SEH data.

If the user code already provides a load configuration structure, it must include the new reserved SEH fields.
