PDB age discrepancy

Started by lilhoser, February 20, 2012, 05:47:17 PM

lilhoser

Hello GoAsm community,

I'm new to this list, so I apologize if I have posted in the wrong forum.  I have a question about what appears to be a PDB age discrepancy.

Taking this article as a starting point:

http://www.godevtool.com/Other/pdb.htm

To summarize, this article has an example that illustrates the structure of the PDB match information stored inside the second stream of a PDB file (the age+GUID is used to match a PDB to its corresponding EXE when loading symbols into a debugger).  In this article, the age found in the EXE's CODEVIEW data (age 7) matched the age found in the second stream of the PDB file (again, age 7).  However, I am finding PDBs whose internal age does not match the age recorded in the corresponding executable.  This is what I did...

dumpbin /headers c:\windows\system32\ntoskrnl.exe

        Time Type       Size      RVA  Pointer
    -------- ------ -------- -------- --------
    4E02AAA3 cv           25 001A300C   1A260C    Format: RSDS, {47F5C3BF-9E0A-493C-9F63-BB8F6413358B}, 2, ntkrnlmp.pdb
    4E02AAA3 (   A)        4 001A3008   1A2608    BB03197E

Here we see the age is 2 and the GUID is  {47F5C3BF-9E0A-493C-9F63-BB8F6413358B}

So then I asked the MS Symbol server for the PDB symbol file for the same ntoskrnl.exe using symchk:

symchk c:\downloads\ntos\ntoskrnl.exe /v /s SRV*c:\downloads\ntos*http://msdl.microsoft.com/download/symbols

...
DBGHELP: ntoskrnl - public symbols
         c:\downloads\ntos\ntkrnlmp.pdb\47F5C3BF9E0A493C9F63BB8F6413358B2\ntkrnlmp.pdb
[SYMCHK] MODULE64 Info ----------------------
[SYMCHK] Struct size: 1680 bytes
[SYMCHK] Base: 0x0000000140000000
[SYMCHK] Image size: 6197248 bytes
[SYMCHK] Date: 0x4e02aaa3
[SYMCHK] Checksum: 0x0055c228
[SYMCHK] NumSyms: 0
[SYMCHK] SymType: SymPDB
[SYMCHK] ModName: ntoskrnl
[SYMCHK] ImageName: c:\downloads\ntos\ntoskrnl.exe
[SYMCHK] LoadedImage: c:\downloads\ntos\ntoskrnl.exe
[SYMCHK] PDB: "c:\downloads\ntos\ntkrnlmp.pdb\47F5C3BF9E0A493C9F63BB8F6413358B2\ntkrnlmp.pdb"
[SYMCHK] CV: RSDS
[SYMCHK] CV DWORD: 0x53445352
[SYMCHK] CV Data:  ntkrnlmp.pdb
[SYMCHK] PDB Sig:  0
[SYMCHK] PDB7 Sig: {47F5C3BF-9E0A-493C-9F63-BB8F6413358B}
[SYMCHK] Age: 2
[SYMCHK] PDB Matched:  TRUE
[SYMCHK] DBG Matched:  TRUE
[SYMCHK] Line nubmers: FALSE
[SYMCHK] Global syms:  FALSE
[SYMCHK] Type Info:    TRUE
[SYMCHK] ------------------------------------
SymbolCheckVersion  0x00000002
Result              0x00130001
DbgFilename
DbgTimeDateStamp    0x4e02aaa3
DbgSizeOfImage      0x005e9000
DbgChecksum         0x0055c228
PdbFilename
c:\downloads\ntos\ntkrnlmp.pdb\47F5C3BF9E0A493C9F63BB8F6413358B2\ntkrnlmp.pdb
PdbSignature        {47F5C3BF-9E0A-493C-9F63-BB8F6413358B}
PdbDbiAge           0x00000002
[SYMCHK] [ 0x00000000 - 0x00130001 ] Checked "c:\downloads\ntos\ntoskrnl.exe"

SYMCHK: FAILED files = 0
SYMCHK: PASSED + IGNORED files = 1

So, symchk got the PDB with the same GUID and age as what was stored in the binary.  Cool.

However, when I dump the streams from that pdb (using any number of free tools that do so) and view the second stream in a hex editor, the age is not 2:

94 2E 31 01 A7 AA 02 4E [05 00 00 00] [BF C3 F5 47
0A 9E 3C 49 9F 63 BB 8F 64 13 35 8B] 0A 00 00 00
2F 4C 69 6E 6B 49 6E 66 6F 00 01 00 00 00 02 00
00 00 01 00 00 00 02 00 00 00 00 00 00 00 00 00

The GUID in brackets matches the one that was downloaded, but the age is 5 -- not 2.  Can anyone explain this discrepancy?
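
For anyone who wants to reproduce the reading of that hex dump, here is a minimal Python sketch that decodes the first 28 bytes of the stream. The field order (version, timestamp, age, GUID) is my assumption based on the dump and the linked article, and the names `STREAM` and `parse_pdb_info_header` are made up for illustration.

```python
import struct
import uuid

# First 32 bytes of the second PDB stream, copied from the hex dump above.
STREAM = bytes.fromhex(
    "942E3101 A7AA024E 05000000 BFC3F547"
    "0A9E3C49 9F63BB8F 6413358B 0A000000"
)

def parse_pdb_info_header(data):
    """Decode version, timestamp, age and GUID (assumed layout)."""
    # Three little-endian DWORDs, then 16 GUID bytes.
    version, timestamp, age = struct.unpack_from("<III", data, 0)
    # bytes_le applies the usual Windows GUID byte order to the first 3 fields.
    guid = uuid.UUID(bytes_le=bytes(data[12:28]))
    return version, timestamp, age, guid

version, timestamp, age, guid = parse_pdb_info_header(STREAM)
print(version, hex(timestamp), age, "{%s}" % str(guid).upper())
# prints: 20000404 0x4e02aaa7 5 {47F5C3BF-9E0A-493C-9F63-BB8F6413358B}
```

So the GUID decodes to the same value dumpbin reports, while the age field in the stream really is 5.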

Thanks in advance,

clive

Quote from: lilhoser
Can anyone explain this discrepancy?

Probably not totally satisfactorily, but the data on the symbol server and the executables have been heavily post-processed. That is to say, they have been altered by tools you don't have that strip out information Microsoft doesn't want you to see, and the code has been profiled and optimization-mapped after the PDB was created at link time.

The discrepancy is observable in quite a number of files. I personally use the time of compilation/linkage, to confirm A goes with B. I'm pretty confident that Microsoft provides the appropriate/useable file when the symbol server is queried.

DUMPPE...

c:\windows\system32\ntoskrnl.exe   (hex)           (dec)
..

Portable Executable starts at                278
Signature                               00004550 (PE)
Machine                                     014C (Intel 386)
Sections                                    0016
Time Date Stamp                         4E02A381 Wed Jun 22 21:22:57 2011
..

Debug Entry

Chars    TimeDate Maj  Min  Type                   Size     AddrRaw  PtrRaw
-------- -------- ---- ---- ---------------------- -------- -------- --------
00000000 4E02A381 0000 0000 00000002 CODEVIEW      00000025 00115DFC 001155FC
00000000 4E02A381 0235 197E 0000000A RESERVED10    00000004 00115DF8 001155F8

CODEVIEW Debug Info

ntkrnlmp.pdb - 5BADD44F6B58A34D931DBDFE9100357D - 00000002

GUID     {4FD4AD5B-586B-4DA3-931D-BDFE9100357D}

Opening Socket..
Getting Host..
msdl.microsoft.com -> 65.55.10.11
Connecting..
Sending..
Receiving..
Content-Length 2110369


DUMPPDB...

sn     iBlk       Size     Module
0001 : 00000019 : 00000059
iBlk     Blk      FileOffs Size
00000019 00001A18 00686000   59

4E02A381 - 00000005 - TDS/Version - Wed Jun 22 21:22:57 2011
5BADD44F6B58A34D931DBDFE9100357D - 00000005 - GUID/Version
00000000 /LinkInfo
0000000A /names
00000002 00000004 00000001 00000006
0 00000000 0000000A
1 00000009 00000000
2 00000004 00000000

00000000: 94 2E 31 01 81 A3 02 4E - 05 00 00 00 5B AD D4 4F  ..1....N....[..O
00000010: 6B 58 A3 4D 93 1D BD FE - 91 00 35 7D 11 00 00 00  kX.M......5}....
00000020: 2F 4C 69 6E 6B 49 6E 66 - 6F 00 2F 6E 61 6D 65 73  /LinkInfo./names
00000030: 00 02 00 00 00 04 00 00 - 00 01 00 00 00 06 00 00  ................
00000040: 00 00 00 00 00 0A 00 00 - 00 09 00 00 00 00 00 00  ................
00000050: 00 04 00 00 00 00 00 00 - 00                       .........
It could be a random act of randomness. Those happen a lot as well.

dedndave

this may or may not be relevant, but...

there are different versions of ntoskrnl
one of them is selected to become the ntoskrnl that is actually used

ntoskrnl.exe - "standard"
ntkrnlmp.exe - multiprocessor
ntkrnlpa.exe - physical address extension
ntkrpamp.exe - multiprocessor and physical address extension

also, there are different versions of the linker that generate different pdb formats

lilhoser

I appreciate the ideas.

@clive, Even if the executables were post-processed/stripped/whatever, windbg/symchk still have to know how to match the executable in my windows installation with a PDB file on a symbol server.  It was my understanding that this is done by combining the GUID with the age into a unique string that you see in the symbol path (C:\symbols\<file>\<guid+age>\<file>.pdb).  In both your sample below and my sample, the symbol server retrieves the correct PDB despite the age value in the second PDB stream not matching what's reported by both dumpbin and symchk/windbg.  So my question remains, how is it doing that?  Subtracting some pre-determined value from the pdb age?

@dedndave, correct.  I am testing on ntoskrnl.exe and the symbol server retrieves the symbols for the mp version.  So, I guess that's correct?  There actually aren't that many differences in the two binaries, so it makes sense that the symbol files would be the same.

clive

The symbol fetch, whose URL I redacted, pulls the PDB based on the GUID/Version (nee Age) in the PE executable's RSDS tag.

The name/path on the symbol server is unique based on those codes, as the PDB name is rather ambiguous. The local caching replicates the tree structure of the server as I recall.

So it pulls the 5BADD44F6B58A34D931DBDFE9100357D/00000002 version, you get the 5BADD44F6B58A34D931DBDFE9100357D/00000005 variant which is usable.

dedndave

Quote from: lilhoser on February 21, 2012, 06:41:24 PM
@dedndave, correct.  I am testing on ntoskrnl.exe and the symbol server retrieves the symbols for the mp version.  So, I guess that's correct?  There actually aren't that many differences in the two binaries, so it makes sense that the symbol files would be the same.

well - there are more differences than one might think   :P
i cannot say that one would have more or different symbols than another
but, it would seem possible or even likely

i ran across this issue when i noticed that the ntoskrnl.exe file in my
System32 folder did not match the one in my System32\dllcache folder

the one in System32 named ntoskrnl.exe does match the
one in System32\dllcache named ntkrnlmp (whew !)

i thought this was a strange way to handle it
it would seem more logical for both files to be named ntoskrnl,
as there is little need for the other versions unless i change out the motherboard

but, the one that gets loaded is selected by the boot.ini file
i guess, if i wanted to, i could boot up with the "standard" version and not have multiprocessing   :P
the naming method they use allows all versions to be modified during hotfix or service pack updates

clive

NT classically booted the MP version during setup from CD, and then installed the appropriate NTOSKRNL based on the BIOS, APIC, etc. I also seem to vaguely recall Compaq having their own version for high processor count systems, which may have been NUMA vs SMP, but was definitely related to the localization of threads/processes/memory in a manner more appropriate to the hardware than the bog-standard Microsoft release, which didn't scale as well.

Basically the differences between builds were in how the locking/mutex/semaphore type operations were in-lined in a system-appropriate way, as a single processor version could assume a serialized execution of the instruction stream without worrying about coherency across processors/cache/memory, and would consequently run much faster than the multi-processor version, which had to care about such things.

lilhoser

Quote from: clive on February 21, 2012, 06:53:54 PM
The symbol fetch, whose URL I redacted, pulls the PDB based on the GUID/Version (nee Age) in the PE executable's RSDS tag.

I kind of feel like we are going in circles here..my apologies.

All of the research I can find on this topic indicates this is not how the debugger matches a PDB to an EXE.  It uses the GUID+age, because those are the only two bits of information that are stored both in the EXE's PE debug info AND in the PDB stream data (i.e., there is no timestamp in the PDB format).  Yet, I am seeing Microsoft EXEs whose debug entry references a PDB age that does not match what is stored in the PDB retrieved from the symbol server.  That is my question.

Quote from: clive on February 21, 2012, 06:53:54 PM
The name/path on the symbol server is unique based on those codes, as the PDB name is rather ambiguous. The local caching replicates the tree structure of the server as I recall.
So it pulls the 5BADD44F6B58A34D931DBDFE9100357D/00000002 version, you get the 5BADD44F6B58A34D931DBDFE9100357D/00000005 variant which is usable.

The folder name is <GUID>+<age>.  Ie, for my original example with ntoskrnl.exe:

[SYMCHK] PDB: "c:\downloads\ntos\ntkrnlmp.pdb\47F5C3BF9E0A493C9F63BB8F6413358B2\ntkrnlmp.pdb"
...
PdbSignature        {47F5C3BF-9E0A-493C-9F63-BB8F6413358B}
PdbDbiAge           0x00000002

The GUID is 47F5C3BF-9E0A-493C-9F63-BB8F6413358B and the age is 2, so the folder name is 47F5C3BF9E0A493C9F63BB8F6413358B2
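
To make the folder naming concrete, a small Python sketch that rebuilds the path symchk reported. The function names are made up, and writing the age as unpadded uppercase hex is my assumption (the thread only shows single-digit ages).

```python
import uuid

def symbol_store_key(guid, age):
    """GUID as 32 uppercase hex digits, followed by the age (assumed hex, unpadded)."""
    return guid.hex.upper() + format(age, "X")

def symbol_store_path(root, pdb_name, guid, age):
    """Build <root>\\<pdb name>\\<GUID+age>\\<pdb name>, the layout seen above."""
    return "\\".join([root, pdb_name, symbol_store_key(guid, age), pdb_name])

# GUID and age as reported by dumpbin for the EXE (not the age inside the PDB).
guid = uuid.UUID("{47F5C3BF-9E0A-493C-9F63-BB8F6413358B}")
print(symbol_store_path(r"c:\downloads\ntos", "ntkrnlmp.pdb", guid, 2))
# prints: c:\downloads\ntos\ntkrnlmp.pdb\47F5C3BF9E0A493C9F63BB8F6413358B2\ntkrnlmp.pdb
```

Note that the key is built entirely from the EXE-side values, so the age stored inside the PDB never enters into the path.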

clive

Quote from: lilhoser on February 21, 2012, 08:33:44 PM
I kind of feel like we are going in circles here..my apologies.

It is cataloguing and caching the PDB files based on the values in the EXE file, i.e. 2.

You, in this case, request version 2, it sends you a PDB image with version 5 inside. The treeing/naming all reflects version 2.

The age/version values in the PDB don't always match the EXE values; Microsoft provides you with an equivalent file with a level of detail that mirrors your need-to-know.

PDB files have always had time date stamps; the GUID method only appeared in later versions of the PDB format. As Dave noted, there are several versions, and I'll note that some of Microsoft's own tools are incapable of parsing older versions properly.

PdbDbiAge           0x00000002 <<<<

This is a ruse, the file is 0x00000005

Notice also the discrepancy between when the PDB was created by LINK and its file date from the server.

Microsoft (R) Cabinet Extraction Tool - Version 6.1.7600.16385
Copyright (c) Microsoft Corporation. All rights reserved..

Cabinet symget.cab

06-23-2011 11:43:50p A---     8,752,128 ntkrnlmp.pdb
                 1 File       8,752,128 bytes


06/23/2011  11:43 PM         8,752,128 ntkrnlmp.pdb

sn     iBlk       Size     Module
0001 : 00000020 : 00000059
iBlk     Blk      FileOffs Size
00000020 0000213D 0084F400   59

4E02AAA7 - 00000005 - TDS/Version - Wed Jun 22 21:53:27 2011
BFC3F5470A9E3C499F63BB8F6413358B - 00000005 - GUID/Version


All that matters to the debugger, or more importantly to you the user, is that the symbolic information contained in the PDB is in sync with the executable being debugged, i.e. a direct and correct correlation between addresses and symbols in the two pieces.

I should probably improve the byte swizzling of the GUID in DumpPDB, I just dumped the 16 linear bytes because it was easier.
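
For reference, the swizzle amounts to reversing the first three fields of the 16 linear bytes. A minimal Python sketch (not the DumpPDB code; `swizzle_guid` is a name invented here):

```python
def swizzle_guid(raw):
    """Format 16 linear GUID bytes in the usual {Data1-Data2-Data3-...} form."""
    if len(raw) != 16:
        raise ValueError("a GUID is exactly 16 bytes")
    parts = [
        raw[0:4][::-1],   # Data1: 32-bit little-endian, so reverse 4 bytes
        raw[4:6][::-1],   # Data2: 16-bit little-endian
        raw[6:8][::-1],   # Data3: 16-bit little-endian
        raw[8:10],        # Data4, first two bytes, kept in order
        raw[10:16],       # Data4, last six bytes, kept in order
    ]
    return "{" + "-".join(p.hex().upper() for p in parts) + "}"

# Linear bytes as printed by DUMPPDB earlier in the thread.
print(swizzle_guid(bytes.fromhex("5BADD44F6B58A34D931DBDFE9100357D")))
# prints: {4FD4AD5B-586B-4DA3-931D-BDFE9100357D}
print(swizzle_guid(bytes.fromhex("BFC3F5470A9E3C499F63BB8F6413358B")))
# prints: {47F5C3BF-9E0A-493C-9F63-BB8F6413358B}
```

Both results agree with the GUIDs shown by DUMPPE and dumpbin for the matching executables.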