EXE Jump Tables

Started by dedndave, May 29, 2009, 05:51:54 PM


Mark Jones

#60
AMD x2 4000+ / Win7 Beta x64

For this code, I get
592426

direct indirect
------ --------
36445   37438
4197    24960
35124   23647
36914   25259
30691   7555
32651   40061
26122   16160
34643   33773
11805   26739
24058   33236
17137   26391
35018   21328
23747   35509
29827   30289
31598   28075
17094   32144
27323   29540
21413   26742
26123   33217
35882   35033
------ --------
537812  567096


For the latest,
2 cycles, indirect
0 cycles, direct


Edit: Thanks Dave.
"To deny our impulses... foolish; to revel in them, chaos." MCJ 2003.08

dedndave

you have to have "Target" running (a window app) - then run "Test" from a command line

0 cycles - lol - well, that's just wrong - very nice, but wrong

jj2007

Celeron M timings:

7321    cycles for 1000*GetTickCount, indirect
5501    cycles for 1000*GetTickCount, direct
8327    cycles for PostMessage, indirect
8311    cycles for PostMessage, direct


Shorter and faster...

[attachment deleted by admin]

UtillMasm

Core Duo timings:
13243   cycles for 1000*GetTickCount, indirect
10443   cycles for 1000*GetTickCount, direct
10851   cycles for PostMessage, indirect
12192   cycles for PostMessage, direct

--- ok ---

MichaelW

Quote from: dedndave on May 31, 2009, 03:52:29 PM
0 cycles - lol - well, that's just wrong - very nice, but wrong

Zero is an entirely reasonable result under the circumstances. The resolution of the TSC is no better than one clock cycle, and recent processors can execute as many as four instructions per cycle. And then you have the inability to completely isolate the timed instructions from the timing instructions, so some of the timed instructions can end up executing in parallel with the timing instructions.
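For anyone curious why the overlap matters, the measurement is essentially of this shape (a minimal sketch, not the actual macros from the attached code; pGetTickCount is a hypothetical pointer variable):

.data?
t1      dd ?                        ; low dword of the starting count

.code
    xor   eax, eax
    cpuid                           ; serialize so earlier instructions retire first
    rdtsc                           ; EDX:EAX = time-stamp counter
    mov   t1, eax                   ; save the start count

    ; --- code under test, e.g. the indirect call being measured ---
    call  dword ptr [pGetTickCount]

    xor   eax, eax
    cpuid                           ; serialize again before reading the stop count
    rdtsc
    sub   eax, t1                   ; EAX = elapsed cycles (low dword only)

The cpuid is only there to keep the code under test from drifting across the rdtsc reads; even so, the isolation is not perfect, so a very short timed sequence can vanish into the timing overhead and report zero.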
eschew obfuscation

BogdanOntanu

I have split the talks about OBJ generation into this new topic:
http://www.masm32.com/board/index.php?topic=11555.0
Ambition is a lame excuse for the ones not brave enough to be lazy.
http://www.oby.ro

ToutEnMasm


Dynamic linking removes the need for a jump table.
An interesting question is: what is faster, a dynamic link or a link with a library?

hutch--

Yves,

That one is simple: a library gets built into the EXE, so its address is within the local memory space, whereas a DLL procedure has to be loaded. In many instances it does not matter, but if the called routine is very small you will see the difference. With a DLL, if you get the address from it and load it into a variable or even a register, it will tend to be faster, as the DLL is also mapped into the EXE's memory space.
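A minimal sketch of what I mean, assuming user32.dll is already mapped and hWnd is a hypothetical window handle variable (error checking omitted):

.data
szUser32  db "user32.dll",0
szProc    db "PostMessageA",0
pProc     dd 0                      ; cached address of the export

.code
    invoke GetModuleHandle, addr szUser32
    invoke GetProcAddress, eax, addr szProc
    mov    pProc, eax               ; resolve once, keep the address locally

    ; later: one indirect call through the cached pointer, no extra stub
    push   0                        ; lParam
    push   0                        ; wParam
    push   WM_NULL                  ; message
    push   hWnd                     ; hypothetical window handle variable
    call   pProc                    ; assembles as call dword ptr [pProc]

In a tight loop you can go one step further and keep the address in a register, which removes even the memory read from the call.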
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

BogdanOntanu

Quote from: ToutEnMasm on June 01, 2009, 07:28:01 AM

Dynamic linking removes the need for a jump table.

NO. It is exactly the opposite.

In static linking, the linker adds the called procedure's code to your code (let us say at the end), and the address is known at link time. Because of this the call is relative, and there is no need for anything else to be done at run time. The problem with static linking is that you cannot load/unload procedures/libraries at run time, and you cannot use static linking for calling the OS APIs.

Dynamic linking is used mainly because the address of the API is NOT known at compile or link time.

Some prefer a direct CALL dword ptr [IAT.API_address], others a near CALL to a jmp dword ptr [IAT.API_address], but one way or another the value of API_address will be fixed up at run time by the OS loader, and this cannot be done statically.

The whole talk in this thread refers to the advantages or disadvantages of using or not using a jump table as an intermediate, central step between your "invoke API_xxx, ..., ..." in the code and the API address in the IAT. Some assemblers/linkers do generate such a table and some do not.
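Roughly, the forms being compared look like this (the decorated import names are only illustrative):

; "direct" form: one memory indirection through the import address table
    call  dword ptr [__imp__PostMessageA@16]    ; IAT slot, patched by the OS loader

; "jump table" form: a near CALL to a local thunk, which then jumps through the IAT
    call  PostMessageA_thunk                    ; step 1: near call into the jump table
    ...
PostMessageA_thunk:
    jmp   dword ptr [__imp__PostMessageA@16]    ; step 2: indirect jump to the API

; static linking, for comparison: a plain near call to code linked into the EXE
    call  MyLibProc                             ; address known at link time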

Quote
An interesting question is: what is faster, a dynamic link or a link with a library?

Faster when? At execution time or at compile time?

At execution time, logically the static solution is faster because it is only a near CALL to a well-known address, BUT it is not possible to use it for APIs, because their code and addresses change with every new OS version or security update.

The dynamic linking solution can be slightly faster at compile time, because part of the work to be done by the linker is left to the OS loader. However, logically it will be slower at run time, because at least one intermediate step has to be taken for a call to an API. In the case of a jump table there are 2 (two) such steps to be taken.

Anyway, I think the speed differences at execution time are not worth considering, because the API itself will spend more time on parameter checking than the very few cycles saved by avoiding one jump.

Dynamic linking has the advantage of DLL loading / unloading at runtime.

Ambition is a lame excuse for the ones not brave enough to be lazy.
http://www.oby.ro

ecube


11228   cycles for 1000*GetTickCount, indirect
8283    cycles for 1000*GetTickCount, direct
5401    cycles for PostMessage, indirect
5576    cycles for PostMessage, direct

--- ok ---


I wonder how come PostMessage is apparently faster using the indirect call?

jj2007

Quote from: E^cube on June 01, 2009, 03:04:13 PM

11228   cycles for 1000*GetTickCount, indirect
8283    cycles for 1000*GetTickCount, direct
5401    cycles for PostMessage, indirect
5576    cycles for PostMessage, direct

--- ok ---


I wonder how come PostMessage is apparently faster using the indirect call?

For my Celeron, the direct call is a little bit faster, but UtillMasm's Core Duo favours the indirect call, too. It might be a cache effect of some sort. Here is an interesting quote:

Quote
main memory is very slow compared to the CPU cache, so code that is slightly larger can cause more cache misses and therefore be slower, even if significantly fewer commands are executed.

In addition, frequently the effect isn't direct (i.e. no noticeable difference in the code you are changing), but instead the change makes other code slower as it gets evicted from the cache.

Note that the timings are for 1000 calls to GetTickCount (roughly 11 and 8 cycles per call, on average) but only one call to PostMessage, which means the latter pulls an awful amount of code into the cache.

hutch--

 :bg

I think I have already answered that one: a call to a function outside the calling app's memory space is REEEEEEEEEEELLLLY SLOOOOOOOOOOOOW, whereas a call to a local label followed by a jump is not. Older hardware will hide the difference, but any later PIV, Core 2 Duo, quad etc .... will respond better to a fast pair of opcodes than to a single slow one.
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

NightWare

Quote from: jj2007 on June 01, 2009, 05:39:37 PM
Which means the latter pulls an awful amount of code into the cache.
::) i'm just curious, can you explain to me why code executed only once or twice SHOULD put SOMETHING IN the (code/trace) cache ?
(yeah, now i'm going to proceed by asking questions... maybe it will give better results...  :P)

dedndave

lol - i don't even think the guys at intel know how the cache works
and - sometimes, it doesn't

UtillMasm