EXE Jump Tables

Started by dedndave, May 29, 2009, 05:51:54 PM

jj2007

Quote from: hutch-- on June 02, 2009, 02:04:20 AM
I think I have already answered that one: a call to a function outside the calling app's memory space is REEEEEEEEEEELLLLY SLOOOOOOOOOOOOW

Quote from: NightWare on June 02, 2009, 02:41:32 AM
Quote from: jj2007 on June 01, 2009, 05:39:37 PM
Which means the latter pulls an awful amount of code into the cache.
::) I'm just curious, can you explain to me why code executed once or twice SHOULD put SOMETHING IN the (code/trace) cache?
(yeah, now I'm going to proceed by asking questions... maybe it will give better results... :P)

I am pleased to see that a simple assembly-related question can still provoke such strong reactions outside the Colosseum.

@Hutch: GetTickCount is outside the calling app's memory space, too. Same behaviour for GetDesktopWindow, see below.
@NightWare: No, I can't explain it. That's why I posted it. Just guessing: could it be that the code performs some loops during those 8,000 cycles?? And that these loops end up in the cache??
But maybe you can explain it, and are willing to share your knowledge with us earthlings?

Celeron M:
7268    cycles for 1000*GetTickCount, indirect
5500    cycles for 1000*GetTickCount, direct
16340   cycles for 1000*GetDesktopWindow, indirect
15045   cycles for 1000*GetDesktopWindow, direct
8107    cycles for PostMessage, indirect
8106    cycles for PostMessage, direct



[attachment deleted by admin]

UtillMasm

 :U
12367   cycles for 1000*GetTickCount, indirect
10703   cycles for 1000*GetTickCount, direct
16101   cycles for 1000*GetDesktopWindow, indirect
15087   cycles for 1000*GetDesktopWindow, direct
7229    cycles for PostMessage, indirect
6681    cycles for PostMessage, direct

--- ok ---

sinsi

Surely *any* code we call is in our 'address space' by definition. I think the problem is when we get into the APIs that call low-level stuff, ring3 to ring0.
There is a fair bit of overhead involved in that.
Light travels faster than sound, that's why some people seem bright until you hear them.

hutch--

JJ,

> @Hutch: GetTickCount is outside the calling app's memory space, too. Same behaviour for GetDesktopWindow, see below.

So is every other Windows API. You seem to have missed the value of the comment: using the CALL mnemonic with an address outside the app's address space is REAAAAAAAALLLLY SLOOOOOOOOOWWW. I mentioned that a local CALL to a local label followed by a direct jump to the start address is a faster pair of instructions than a single CALL directly to an external address.
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

hutch--

sinsi,

> Surely *any* code we call is in our 'address space' by definition.

Nope, system DLLs are loaded at addresses above 2 gig, which is above the normal load address range of a non-system DLL.

sinsi

Well, system DLLs are still in our 4 gig address space, otherwise we couldn't call them.

I think I'm being pedantic about 'address space'. To someone from the roll-your-own-OS crowd, I think we're talking about different things...

MichaelW

I thought one of the main points of putting system code in DLLs was to avoid having the same code mapped (or perhaps copied is a better term) into multiple processes. I think "virtual" is the key word here.
eschew obfuscation

Tedd

Any DLL you 'import' is loaded into your address space - thus, GetTickCount and GetDesktopWindow are also mapped into your address space -- that's the very reason you can call them.
The physical pages for system DLLs are mapped (once) into the virtual address space of each process that loads them (in the 'shared area', which is usually above the 2GB mark). User DLLs have an option to make them shared too, so they're probably not shared by default (except by multiple instances of the same application).
No snowflake in an avalanche feels responsible.

jj2007

Quote from: hutch-- on June 02, 2009, 07:49:09 AM
JJ,

> @Hutch: GetTickCount is outside the calling app's memory space, too. Same behaviour for GetDesktopWindow, see below.

So is every other Windows API. You seem to have missed the value of the comment: using the CALL mnemonic with an address outside the app's address space is REAAAAAAAALLLLY SLOOOOOOOOOWWW. I mentioned that a local CALL to a local label followed by a direct jump to the start address is a faster pair of instructions than a single CALL directly to an external address.

Hutch,

I read your comments, and in general I understand them, too. Sorry that I am not able to see their value; perhaps because my timings say the opposite? I chose GetTickCount because, first, it is indeed outside what you call either "the app's address space" or "the app's memory space" (my best guess is you mean "close to the app's core code"), and second, it has little overhead. The timings show that without the extra jmp, it takes 3 or 4 cycles less. Why it sometimes behaves differently with PostMessage, which takes about 8,000 cycles, is beyond my knowledge.

redskull

I don't know if it's applicable, but you can "really" call GetTickCount directly by just executing interrupt 2A (at least, you used to).

-r
Strange women, lying in ponds, distributing swords, is no basis for a system of government

hutch--

JJ,

You worry me at times. The info I posted is straight, well-known system information: Windows API functions reside in system DLLs that are loaded at a DIFFERENT memory address range than application DLLs. Above 2 gig is the action here, and within the framework of Windows you can call and run the functions as executable code but cannot write to those address ranges in ring3. Now it's not a matter of guesswork, it's an OS-defined limitation: you can allocate and use the bottom 2 gig and the OS controls the upper 2 gig.

Now it does not matter which system API call you make, it's in the same class as the rest, loaded ABOVE 2 gig, which is above application address space. With a system DLL you don't reload it like an application DLL; it's already there in memory, loaded at startup. That's how Windows is designed. Now come back to the comparison of a direct address CALL to an indirect CALL and JMP: the indirect CALL in local memory is MUCH FASTER than a CALL outside the app's address space. The argument remains as to whether the following unconditional JMP is as slow as a direct CALL to an external address.

Now, with testing results you will get variation depending on the age of the hardware and its bus speed: older hardware hides the difference, later hardware favours the faster pair of instructions. Another factor interferes with your timing results: do your testing in REAL TIME with absolutely no interruption for durations of over 500 ms and you will get the variation down under 1% most of the time. The testing usually requires REAL TIME priority for the most accurate results.

The architectural model you are having problems with has been around for about 15 years, since WinNT 3.5; there is nothing new, exciting or different. They are 32-bit address range operating systems that have remained more or less compatible for many years. It's not a matter of conjecture, it's a matter of simply looking up the reference material.

PS: I should have added that since NT4 you have the layering of system DLLs: NTDLL.DLL, and below that NTOSKRNL.EXE. Disassemble them to see where the work is done and why the choice of the CALL mnemonic or the alternative is irrelevant. The system was designed by the VAX guys in the early 90s for Microsoft, and among the design considerations is the address table at the end of the executable code.

jj2007

Quote from: hutch-- on June 02, 2009, 02:43:46 PM
JJ,

You worry me at times...

:bg
Hutch,

I can only return the compliment - and apologies if I have failed in using your terminology (app memory space, app address space etc) correctly. You might have a look at the posts of Tedd, Sinsi and Michael, they know more about these subtle distinctions than I do. So I limit myself to the observation that the direct call to a "fast" WinApi of the GetTickCount and GetDesktopWindow type is some cycles faster than the indirect version using a call plus a jmp table. Which is precisely the topic of this thread. By the way: What does your P4 say? Haven't seen any P4 timings yet...
:thumbu

Mark Jones

AMD x2 4000+ / Win7 x64
18407   cycles for 1000*GetTickCount, indirect
13751   cycles for 1000*GetTickCount, direct
37443   cycles for 1000*GetDesktopWindow, indirect
56384   cycles for 1000*GetDesktopWindow, direct
45891   cycles for PostMessage, indirect
45762   cycles for PostMessage, direct
"To deny our impulses... foolish; to revel in them, chaos." MCJ 2003.08

dedndave

Prescotts give funky numbers
they are high and inconsistent
which brings us full-circle back to why I was asking about the jump tables
as you know, Jochen, I am working on "super-daves" (tongue-in-cheek) timing routines for multi/single cores
I am playing with a few different methods of synchronizing threads
when I am done, I'll probably be the only one that uses the code - lol
but, hey, at least someone will be happy   :bg

dedndave

Prescott dual-core @ 3GHz - XP MCE 2005 SP2

18320   cycles for 1000*GetTickCount, indirect
16449   cycles for 1000*GetTickCount, direct
32506   cycles for 1000*GetDesktopWindow, indirect
30223   cycles for 1000*GetDesktopWindow, direct
18201   cycles for PostMessage, indirect
18099   cycles for PostMessage, direct

18297   cycles for 1000*GetTickCount, indirect
14920   cycles for 1000*GetTickCount, direct
32634   cycles for 1000*GetDesktopWindow, indirect
30162   cycles for 1000*GetDesktopWindow, direct
18170   cycles for PostMessage, indirect
18300   cycles for PostMessage, direct

18557   cycles for 1000*GetTickCount, indirect
15239   cycles for 1000*GetTickCount, direct
32524   cycles for 1000*GetDesktopWindow, indirect
30973   cycles for 1000*GetDesktopWindow, direct
18179   cycles for PostMessage, indirect
18276   cycles for PostMessage, direct