The MASM Forum Archive 2004 to 2012

General Forums => The Campus => Topic started by: dedndave on May 29, 2009, 05:51:54 PM

Title: EXE Jump Tables
Post by: dedndave on May 29, 2009, 05:51:54 PM
do any of you experienced programmers have a simple method of eliminating the need for jump tables at the end of an EXE ?
i am guessing they are inserted by the assembler ?
Title: Re: EXE Jump Tables
Post by: jj2007 on May 29, 2009, 06:22:44 PM
Good question indeed. They don't look very efficient. Maybe it's a noob question: how does the assembler "know" the memory location of GetStdHandle, i.e. 7C812F3A? Is it guaranteed to be the same in all versions of Windows??

0040106A              .  6A F5                       push -0B  ; StdHandle = STD_OUTPUT_HANDLE
0040106C              .  E8 A7000000                 call <jmp.&kernel32.GetStdHandle>
...
00401118               $ FF25 74114000              jmp near dword ptr [<&kernel32.GetStdHandle>]
...
GetStdHandle            8BFF                         mov edi, edi  ; HANDLE kernel32.GetStdHandle(StdHandle)
7C812F3B                55                           push ebp
7C812F3C                8BEC                         mov ebp, esp
Title: Re: EXE Jump Tables
Post by: PBrennick on May 29, 2009, 06:31:06 PM
GetStdHandle, for example, has its address stored in a lookup table (the export table) in kernel32.dll, so even though there are differing versions of that DLL, the address, which can vary, will always be found.

Paul
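A rough C model of what Paul describes, built on a made-up miniature export table. Only the GetStdHandle RVA comes from jj's listing, and it assumes a kernel32 base of 0x7C800000; the other entries are invented. The loader looks the name up in the DLL's own table and adds the DLL's base address. That is why each Windows version can put the function somewhere else without the EXE caring. Real loaders also try the import's hint index first, and that step is left out here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical miniature export table for one version of kernel32.dll.
   Names are sorted so the lookup can binary-search them. Only the
   GetStdHandle RVA is derived from the thread (0x7C812F3A - 0x7C800000);
   the other RVAs are made up. */
struct export_entry { const char *name; unsigned long rva; };

static const struct export_entry kernel32_exports[] = {
    { "ExitProcess",  0x1CAFAul },
    { "GetStdHandle", 0x12F3Aul },
    { "WriteFile",    0x10E27ul },
};
static const size_t kernel32_count =
    sizeof kernel32_exports / sizeof kernel32_exports[0];

/* Returns the RVA of an exported name, or 0 if the DLL does not export it. */
static unsigned long lookup_export(const char *name) {
    size_t lo = 0, hi = kernel32_count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int c = strcmp(name, kernel32_exports[mid].name);
        if (c == 0) return kernel32_exports[mid].rva;
        if (c < 0) hi = mid; else lo = mid + 1;
    }
    return 0;
}

/* What ends up in the EXE's IAT slot: DLL base + RVA of the export. */
static unsigned long resolve_import(unsigned long dll_base, const char *name) {
    unsigned long rva = lookup_export(name);
    return rva ? dll_base + rva : 0;
}
```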
Title: Re: EXE Jump Tables
Post by: BogdanOntanu on May 29, 2009, 06:31:52 PM
Quote from: dedndave on May 29, 2009, 05:51:54 PM
do any of you experienced programmers have a simple method of eliminating the need for jump tables at the end of an EXE ?

Simple? Well ...NO. Possible? Yes.

See here for a possible solution: http://www.masm32.com/board/index.php?topic=6519.0;topicseen

The MS libs contain two kinds of API "glue" code: one that will generate a jump table and one that will call "directly".

The jump table is more efficient for many reasons.

AFAIK there are other older threads about this issue also.

Quote
i am guessing they are inserted by the assembler ?

No. Usually this jump table is generated / inserted by the linker when it links extern / API's from LIBs.

However my Sol_Asm assembler does generate the jump tables :D but this is an exception to confirm the above rule.


Title: Re: EXE Jump Tables
Post by: dedndave on May 29, 2009, 06:38:33 PM
well - i read many of the other related threads - i did not see anything specific about eliminating the tables
much ado about why they are there, however
i seem to recall something someplace about eliminating them
and, yes, i can see how a program may load faster with the tables
once it has loaded, however, i would have to think it would be faster without them
as always, the best solution is probably a hybrid, where functions that are speed-critical are referenced without tables
and functions that are used several times, but are not speed-critical use the tables (similar to procs vs macros argument)

EDIT
@JJ - i don't think it matters if they are the same or not
perhaps running under one OS, they are one value and under a different OS, a different value
from what i gather, they are externals that are unresolved until run-time

surely, if you reference a function 100 different places in the code, the table may be a good way to go
that way, the OS only has to set the value one time when the program is loaded
Title: Re: EXE Jump Tables
Post by: BogdanOntanu on May 29, 2009, 06:48:50 PM
Quote from: jj2007 on May 29, 2009, 06:22:44 PM
Good question indeed.

In fact a relatively irrelevant question unless you are writing a compiler.

Quote
They don't look very efficient.

It depends on the "angle" of view.

- Each "direct" call is still "indirect" in fact from the CPU's point of view.
- Each "direct" call is 1 byte longer than a call through the jump table, and this adds up when you use a lot of API in your code.
- Each "direct" call needs relocations and more solving by the linker and thus makes assembly/ linking slower and DLL's bigger
- Indirect calls give you an extra central "hooking" location that can be useful with portable applications and other OS'es (GOT/PLT like/ ready)

Quote
Maybe it's a noob question: how does the assembler "know" the memory location of GetStdHandle, i.e. 7C812F3A? Is it guaranteed to be the same in all versions of Windows??

It does not. IF you generate an OBJ (most common) then the assembler leaves this task to the linker. The linker uses the information in the LIB's to glue together a jump table or "direct" indirect calls. Both methods make reference to the IAT structure in the PE specification and further fixing is deferred to the OS loader.

The OS loader KNOWS where the API address is in the current OS version/ layout. The OS loader loads the PE executable and patches the IAT with the correct address and since the jump table or the "direct" calls are both referencing the IAT this "magically" and finally solves the problem.

DLL's need further relocations solving but everything else is the same.
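Bogdan's description can be modelled in portable C. This is a minimal sketch: an array of function pointers plays the IAT, a one-line thunk plays the jump-table entry, and a call straight through the slot plays the "call dword ptr [...]" form. The fake_ functions and slot names are hypothetical, and no real PE loading happens. The sketch shows that both call styles end up in the same slot, so only the "loader" step has to know the real address.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for two kernel32 exports. */
static int fake_GetStdHandle(int which) { return 100 + which; }
static int fake_GetTickCount(int unused) { (void)unused; return 42; }

typedef int (*api_fn)(int);

/* The IAT: one slot per imported function. In the "cold" file these slots
   do not hold usable addresses yet; NULL stands in for that here. */
enum { SLOT_GetStdHandle, SLOT_GetTickCount, SLOT_COUNT };
static api_fn iat[SLOT_COUNT];

/* The OS loader's job: write each real address into its slot, once. */
static void loader_patch_iat(void) {
    iat[SLOT_GetStdHandle] = fake_GetStdHandle;
    iat[SLOT_GetTickCount] = fake_GetTickCount;
}

/* Jump-table form: call sites call this thunk ("jmp dword ptr [slot]"). */
static int thunk_GetStdHandle(int which) { return iat[SLOT_GetStdHandle](which); }
static int call_via_jump_table(int which) { return thunk_GetStdHandle(which); }

/* "Direct" form: the call site itself goes through the slot
   ("call dword ptr [slot]"), with no thunk in between. */
static int call_direct(int which) { return iat[SLOT_GetStdHandle](which); }
```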


Quote
0040106A              .  6A F5                       push -0B  ; StdHandle = STD_OUTPUT_HANDLE
0040106C              .  E8 A7000000                 call <jmp.&kernel32.GetStdHandle>
...


This "jmp.&kernel32.GetStdHandle" is in fact "jmp [iat.dll_01.function_01]" UNTIL the OS loader fixes the correct address, and Olly is kind enough to show you friendly names. However Olly does this after the OS loader has performed its job.

Check with a hex editor / disassembler to see that in the "cold" executable the values are different.
Title: Re: EXE Jump Tables
Post by: dedndave on May 29, 2009, 06:59:23 PM
i knew i saw it someplace - it was in Hutch's scrap-book of source code

the method was devised by EliCZ - and it does not look all that simple - lol
i may have a go, just as a learning experience

the first download on the page....
http://movsd.com/source.htm
Title: Re: EXE Jump Tables
Post by: BogdanOntanu on May 29, 2009, 07:05:22 PM
Quote from: dedndave on May 29, 2009, 06:38:33 PM
well - i read many of the other related threads - i did not see anything specific about eliminating the tables

The info is there somewhere... I do not recall the more detailed threads exactly but this kind of subject pops in and out periodically in the advanced sections.

Basically IF you import your API functions with specific names like this: __imp__ExitProcess@4 THEN the linker selects glue code that does not generate a jump table (if the LIB does provide such glue code). The exact details might be slightly different but this is the idea.


Quote
and, yes, i can see how a program may load faster with the tables

There are fewer places to fix at linking (only the jump table), fewer relocations at DLL loading time, and the jump table is slightly smaller than direct calls when there are a lot of API calls (common case).

Quote
once it has loaded, however, i would have to think it would be faster without them

Oh well... debatable but anyway when you call an API "speed" is no longer of the essence.

Quote
as always, the best solution is probably a hybrid, where functions that are speed-critical are referenced without tables
and functions that are used several times, but are not speed-critical use the tables (similar to procs vs macros argument)

The jump table is generated ONLY for the API and not for your own functions. None of the API calls should be considered speed critical.


Quote
surely, if you reference a function 100 different places in the code, the table may be a good way to go
that way, the OS only has to set the value one time when the program is loaded

The OS loader only fixes one value anyway... and this value is the function address in the PE's IAT (not the jump table).

For DLL's indeed there are more relocations to fix (only if DLL is relocated). One relocation fix is needed for each direct call in code.
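The load-time side of this argument is a simple count, sketched below in C. The sketch assumes each call site is described only by the name of the API it calls; the names in the usage are arbitrary. A direct "call dword ptr [slot]" embeds an absolute address, so every call site needs a fixup if the DLL is relocated. With a jump table, the call sites are relative and only one thunk per distinct API holds an absolute address.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Each element of `sites` names the API called at one call site in a DLL.
   The functions count the base-relocation fixups the loader must apply
   if that DLL is relocated. */

/* Direct form: every "call dword ptr [IAT slot]" embeds an absolute
   address, so every call site needs its own fixup. */
static size_t fixups_direct(size_t n_sites) { return n_sites; }

/* Jump-table form: call sites use relative "E8 rel32" calls (no fixup);
   only the one thunk per distinct API holds an absolute address. */
static size_t fixups_jump_table(const char *const *sites, size_t n_sites) {
    size_t distinct = 0;
    for (size_t i = 0; i < n_sites; i++) {
        int seen_before = 0;
        for (size_t j = 0; j < i && !seen_before; j++)
            seen_before = strcmp(sites[i], sites[j]) == 0;
        if (!seen_before) distinct++;
    }
    return distinct;
}
```

With six call sites spread over three APIs, that comes to six fixups for the direct form versus three for the jump table.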
Title: Re: EXE Jump Tables
Post by: jj2007 on May 29, 2009, 07:08:07 PM
Very nicely explained, thanxalot, Bogdan :U
Title: Re: EXE Jump Tables
Post by: Vortex on May 29, 2009, 07:18:15 PM
The trick to generate direct calls is based on the declaration of external symbols :

EXTERNDEF _imp__ExitProcess@4:PTR pr1
ExitProcess EQU <_imp__ExitProcess@4>

EXTERNDEF _imp__GetCommandLineA@0:PTR pr0
GetCommandLine EQU <_imp__GetCommandLineA@0>

EXTERNDEF _imp__GetModuleHandleA@4:PTR pr1
GetModuleHandle EQU <_imp__GetModuleHandleA@4>

EXTERNDEF _imp__CreateWindowExA@48:PTR pr12
CreateWindowEx EQU <_imp__CreateWindowExA@48>


pr0, pr1, pr2, pr3 etc. are defined in windows.inc

The creation of the EXTERNDEFs above is automated with Scan.exe
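The same trick has a direct C analogue, sketched below with hypothetical names (real_ExitProcess, imp_ExitProcess). The symbol you declare is the import slot itself, a pointer. A macro then renames the API to "call through that pointer", just as the EQU does. On MSVC, `__declspec(dllimport)` gives a similar result by making the compiler emit calls through the `__imp_` pointer. Here the static initializer stands in for the loader.

```c
#include <assert.h>

typedef void (*exit_fn)(unsigned);

static unsigned last_exit_code = 0xFFFFFFFFu;

/* Hypothetical function the loader would have stored in the IAT slot. */
static void real_ExitProcess(unsigned code) { last_exit_code = code; }

/* Like EXTERNDEF _imp__ExitProcess@4:PTR pr1 : the symbol IS the IAT slot,
   a pointer, not a code label. The initializer stands in for the loader. */
static exit_fn imp_ExitProcess = real_ExitProcess;

/* Like ExitProcess EQU <_imp__ExitProcess@4> : every use of the name now
   expands to a call through the slot, so no jump-table thunk is involved. */
#define ExitProcess (*imp_ExitProcess)

static void shut_down(void) { ExitProcess(3); }
```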

Title: Re: EXE Jump Tables
Post by: dedndave on May 29, 2009, 07:23:40 PM
ahhhh - cool
let me play with that one, too, Vortex - thank you
Title: Re: EXE Jump Tables
Post by: jj2007 on May 29, 2009, 09:05:48 PM
Cool indeed, Vortex - thanks. So that is how the crt_ imp stuff was created.

If I understand correctly, placing the call in the jmp table makes sense for calls that are used more than a few times. So is there a good reason to place GetCommandLine and ExitProcess there?
Title: Re: EXE Jump Tables
Post by: dedndave on May 29, 2009, 09:22:13 PM
i think i will use a selective approach
there are very few instances where i want to reduce overhead as much as possible
these are primarily timing and thread synchronization related functions
i understand that system calls are inherently long-winded, but there is no reason i can't try to get the most out of it
other than that, the tables are probably a much better deal
i am even open to using both methods for a function if i think it is the best approach
for example, if i have a function in one critical spot - i want no table branch
i use the same function several other places - use the table for those

now, if i can just get access to the function-name strings for an error routine, i will be a happy camper - lol
i think i can figure that one out for myself

yes - thank you Bogdan and Vortex both
that is exactly what i was looking for Vortex - very simple
i like the growing window thingy too - lol
Title: Re: EXE Jump Tables
Post by: mitchi on May 29, 2009, 09:30:02 PM
Very interesting read Bogdan. I've learned a few things here!  :bg
What exactly are the relocations you talk about?

Vortex :

Nice tool, nice explanation. So it's really all about the symbols declared in the obj :)
Title: Re: EXE Jump Tables
Post by: dedndave on May 30, 2009, 02:48:13 AM
Jochen -

Quote
If I understand correctly, placing the call in the jmp table makes sense for calls that are used more than a few times. So is there a good reason to place GetCommandLine and ExitProcess there?

the answer is here, i suspect...

Quote
The creation of the EXTERNDEFs above is automated with Scan.exe

it is fairly simple to create them manually - or let the Scan.exe make the IMP file and remove the unwanted ones

btw - it seems imperative to use PoLink
Title: Re: EXE Jump Tables
Post by: PBrennick on May 30, 2009, 03:58:45 AM
In this case, yes. PoLink gives you more latitude to do such things. Especially libraries. A lot of the things that are done in the installation of the GeneSys SDK rely on such latitude and Vortex is the one I thank for that. He has put a lot of effort into being a toolmaker. It would probably be a good idea to explore his other tools, also. They are pretty fantastic.

Paul
Title: Re: EXE Jump Tables
Post by: dedndave on May 30, 2009, 04:07:49 AM
funny thing you should mention it Paul
i had just added his site to my bookmarks - lol
there are a lot of nice toys in there - not only for general use, but for learning (which is where i am)
Title: Re: EXE Jump Tables
Post by: jj2007 on May 30, 2009, 05:53:11 AM
Quote from: dedndave on May 30, 2009, 02:48:13 AM
btw - it seems imperative to use PoLink

Not sure what you mean. Code below assembles & links fine with link.exe and polink.exe...

include \masm32\include\masm32rt.inc

EXTERNDEF _imp__ExitProcess@4:PTR pr1

.code
start:
   ; invoke ExitProcess, 0
   invoke _imp__ExitProcess@4, 0

end start

Title: Re: EXE Jump Tables
Post by: dedndave on May 30, 2009, 06:30:18 AM
ahh - it must be the includes - i have a small program i am working on
my only includes are....

        include    \masm32\include\windows.inc
        include    \masm32\include\kernel32.inc
        includelib \masm32\lib\kernel32.lib

i was trying to write some of the basic functions with no crt or masm32 files - lol

i tried the method in there and get unresolved external with link

anyways - that is a very neat technique
Title: Re: EXE Jump Tables
Post by: Vortex on May 30, 2009, 07:01:17 AM
Hi Jochen,

Quote
Cool indeed, Vortex - thanks. So that is how the crt_ imp stuff was created.

If I understand correctly, placing the call in the jmp table makes sense for calls that are used more than a few times. So is there a good reason to place GetCommandLine and ExitProcess there?

In my opinion, all the calls should be placed in the jump table. It's practical for daily programming. It would be interesting to make include files generating direct calls.

Polink is not the only option. It's my favourite MS COFF linker. MS link.exe can be used too.
Title: Re: EXE Jump Tables
Post by: hutch-- on May 30, 2009, 07:21:17 AM
The answer to the question is contained in the masm32 project. Look in "tools\l2extia\", read the text file and see how to use the exe file to create as many of your own include files as you need. This allows you to use the less efficient direct call form in the binary output code. For what it's worth the jump table is more efficient.
Title: Re: EXE Jump Tables
Post by: sinsi on May 30, 2009, 07:27:52 AM
If you have multiple calls to an API in a proc, it is nice to be able to load a register from the import dword and invoke using that register, that way you get the checking that invoke uses (and the code is smaller).
Title: Re: EXE Jump Tables
Post by: dedndave on May 30, 2009, 07:33:42 AM
i don't understand what you mean sinsi
Title: Re: EXE Jump Tables
Post by: jj2007 on May 30, 2009, 07:34:03 AM
Quote from: BogdanOntanu on May 29, 2009, 06:31:52 PM
The jump table is more efficient for many reasons.
Quote from: hutch-- on May 30, 2009, 07:21:17 AM
For what its worth the jump table is more efficient.

I hear the message but I don't get it. Why is a call plus a jmp, e.g. for ExitProcess, more efficient than a call without a jmp? Because the linker and/or the OS loader need a few nanoseconds less? That can't be the reason...
Title: Re: EXE Jump Tables
Post by: dedndave on May 30, 2009, 07:37:12 AM
well - i can see it if the function is used several times - well - more than 2, let's say
Title: Re: EXE Jump Tables
Post by: sinsi on May 30, 2009, 07:55:56 AM
dedndave, here's what I meant

prwsprintf TYPEDEF PROTO C :DWORD, :VARARG
pwsprintf  TYPEDEF PTR prwsprintf
EXTERNDEF _imp__wsprintfA:pwsprintf
wsprintf TEXTEQU <_imp__wsprintfA>

...
  mov esi,wsprintf
  assume esi:pwsprintf
  invoke esi,blah,blah
  ...
  invoke esi,blah,blah,blah
  ...
  ret
  assume esi:nothing

Of course, that was my noob days, now I push/push/call like a real asm programmer  :bdg

I asked about this once before here - http://www.masm32.com/board/index.php?topic=5486.15
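In C terms, sinsi's trick amounts to copying the pointer out of the import slot into a local once and calling through that local, which a compiler may then keep in a register. This minimal sketch uses a hypothetical slot (imp_snprintf) that holds the C library's snprintf. snprintf is a C-convention VARARG function like wsprintfA, but it is a stand-in, not the Windows API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef int (*fmt_fn)(char *, size_t, const char *, ...);

/* Hypothetical import slot; here it simply holds the C library's snprintf,
   a VARARG C-convention function like wsprintfA. */
static fmt_fn imp_snprintf = snprintf;

static int format_two(char *a, char *b, size_t n) {
    fmt_fn f = imp_snprintf;          /* like "mov esi,wsprintf": read the slot once */
    int total = f(a, n, "%d", 12);    /* like "invoke esi,..." */
    total += f(b, n, "%s-%d", "x", 3);
    return total;                     /* characters written by both calls */
}
```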
Title: Re: EXE Jump Tables
Post by: jj2007 on May 30, 2009, 08:17:48 AM
Quote from: dedndave on May 30, 2009, 07:37:12 AM
well - i can see it if the function is used several times - well - more than 2, let's say

5*5+6=31 (jump table: five 5-byte E8 calls plus one 6-byte FF25 jmp)
5*6=30 (direct: five 6-byte FF15 calls)

More than 5...

invoke ExitProcess, 0

Quote
00401001                  6A 00                      push 0
00401003                  E8 00000000                call <jmp.&kernel32.ExitProcess>
00401008                  FF25 40104000              jmp near dword ptr [<&kernel32.ExitProcess>]

The red bytes are the offset :bg
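jj's byte arithmetic can be written out as a small C sketch. It assumes 32-bit x86 encodings: a relative call (E8 rel32) is 5 bytes, an indirect call or jmp through memory (FF15 / FF25 plus a 32-bit address) is 6 bytes, and the IAT slot is needed either way, so it is not counted.

```c
#include <assert.h>

/* Bytes of code spent on n call sites to one API (32-bit x86).
   Jump table: n x "E8 rel32" (5 bytes) + one "FF25 [slot]" thunk (6 bytes).
   Direct:     n x "FF15 [slot]" (6 bytes). The IAT slot is the same in both. */
static unsigned bytes_jump_table(unsigned n) { return 5u * n + 6u; }
static unsigned bytes_direct(unsigned n)     { return 6u * n; }

/* Smallest number of call sites for which the jump table is strictly smaller. */
static unsigned first_n_table_smaller(void) {
    unsigned n = 1;
    while (bytes_jump_table(n) >= bytes_direct(n)) n++;
    return n;
}
```

At five call sites the direct form still wins by one byte (31 vs 30). At six the two tie at 36 bytes, and from seven on the jump table is smaller.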

Title: Re: EXE Jump Tables
Post by: dedndave on May 30, 2009, 07:08:01 PM
ahhh - that's a good one to know about also sinsi - thanks

EDIT - is call/jmp from a register faster than immediate?
Title: Re: EXE Jump Tables
Post by: mitchi on May 30, 2009, 07:40:40 PM
The Visual C++ optimizer does that with ESI or EDI when you call the same function a lot of times...
Since they have contacts with the Intel guys and AMD guys, I assume that it's a bit faster.
Title: Re: EXE Jump Tables
Post by: dedndave on May 30, 2009, 08:28:29 PM
lol @ "contacts" - they see each other every night at bedtime
Title: Re: EXE Jump Tables
Post by: dedndave on May 30, 2009, 09:25:38 PM
at the location of the invoke is a CALL relative
that branches to a JMP dword ptr [nnnnnnnn] indirect
the value at that nnnnnnnn address is the address of the api function
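The three-step chain described above (rel32 of the CALL, then the FF25 operand, then the IAT slot contents) can be checked on a simulated memory image. This C sketch assumes a tiny little-endian address space starting at 0x00401000. The thunk and slot addresses used in the usage are made up, and the API address is taken from jj's listing.

```c
#include <assert.h>
#include <stdint.h>

/* A tiny simulated 32-bit address space starting at BASE. */
#define BASE 0x00401000u
static uint8_t mem[0x200];

static uint32_t rd32(uint32_t addr) {            /* little-endian dword load */
    const uint8_t *p = &mem[addr - BASE];
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}
static void wr32(uint32_t addr, uint32_t v) {    /* little-endian dword store */
    uint8_t *p = &mem[addr - BASE];
    p[0] = (uint8_t)v; p[1] = (uint8_t)(v >> 8); p[2] = (uint8_t)(v >> 16); p[3] = (uint8_t)(v >> 24);
}

/* labelA is the address just after the 5-byte "E8 rel32" of the INVOKE. */
static uint32_t resolve_api(uint32_t labelA) {
    uint32_t rel   = rd32(labelA - 4);   /* mov esi,labelA-4: the CALL's rel32    */
    uint32_t thunk = labelA + rel;       /* address of the "FF25 [slot]" JMP      */
    uint32_t slot  = rd32(thunk + 2);    /* mov esi,labelA[esi+2]: slot address   */
    return rd32(slot);                   /* mov eax,[esi]: the API address itself */
}
```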

this code works

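        mov     esi,labelA-4       ;get the relative address from INVOKE
        mov     esi,labelA[esi+2]  ;get the address part of the indirect JMP
        mov     eax,[esi]          ;get the API target referred to in the JMP
        call    eax                ;call the API directly, bypassing the JMP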
        exit

        INVOKE  GetCurrentProcess
labelA  label   dword


but this code does not work

        jmp short test01

test00: INVOKE  GetCurrentProcess
labelA  label   dword
        exit

test01: mov     esi,labelA-4
        mov     esi,labelA[esi+2]
        mov     eax,[esi]
        sub     eax,offset labelA
        mov     labelA-4,eax
        jmp     test00


just like microsoft - take me right up to the point of almost an orgasm,
then show me a picture of rosie o'donnell and spray cold water on me
Title: Re: EXE Jump Tables
Post by: hutch-- on May 31, 2009, 01:09:46 AM
There were always alternatives: GetProcAddress gives you a callable DWORD address, but you can cut another corner: copy the API to local app memory set to execute and run the API within your own app. On win9x systems you got a speed increase; I don't know about NT based versions.

The reason why I never lost much sleep over it is most API calls are so slow that a million cycles here and there don't matter much and you lose nothing like that much with address call variations.
Title: Re: EXE Jump Tables
Post by: dedndave on May 31, 2009, 01:21:48 AM
i suspect that would get you a security violation with win2K or higher, Hutch
but that would be a nice technique to see how some of the functions work
Title: Re: EXE Jump Tables
Post by: hutch-- on May 31, 2009, 01:36:40 AM
Nah, that's not the problem; if you can call an address you can also copy from it. It's more to do with how the internals of the OS work: the API calls into NTDLL.DLL, and some procedures within that call even lower-level DLLs, so the best you can get from it is one level of reduction in the call layers. Back in the win9x days the technique seemed to work best on GDI calls, but that was for a simple reason: a lot of Win9x GDI was written in MASM.
Title: Re: EXE Jump Tables
Post by: dedndave on May 31, 2009, 03:46:08 AM
ok - i isolated the bad guy....
Quote
but this code does not work

        jmp short test01

test00: INVOKE  GetCurrentProcess
labelA  label   dword
        exit

test01: mov     esi,labelA-4
        mov     esi,labelA[esi+2]
        mov     eax,[esi]
        sub     eax,offset labelA
        mov     labelA-4,eax
        jmp     test00

this line assembles fine, but crashes the program

        mov     labelA-4,eax

i made a temporary work-around by placing the "labelA-4" address in esi then mov [esi],eax
that crashes also

i am working on that

now, i need somebody really sharp to tell me about re-based PE's - lol
Vortex maybe ?
here is my question....
if the PE gets re-based at load-time, do these tables become far jumps ?
and another question.....
is there a way to force an exe to be re-based for testing purposes ?
Title: Re: EXE Jump Tables
Post by: Neo on May 31, 2009, 06:27:11 AM
This is a bit of a tangent, but with the assembler I've built into Inventor IDE (http://www.codecortex.com/ide/), I don't generate jump tables, since despite Bogdan's explanation of efficiency:

All that said, I really don't think there's a huge performance difference (since imported functions are usually pretty lengthy anyway), but if I had to choose, my money would be on that not using jump tables is slightly more efficient overall.  Anyone up for a tough performance testing challenge?  :wink

P.S. There are currently other issues with importing libraries other than kernel32/user32/gdi32 in Inventor IDE (it is just an alpha after all), so it's far from perfect.  I'm just using it as an example w.r.t. jump tables versus no jump tables.
Title: Re: EXE Jump Tables
Post by: BogdanOntanu on May 31, 2009, 07:04:14 AM
Quote from: Neo on May 31, 2009, 06:27:11 AM
This is a bit of a tangent, but with the assembler I've built into Inventor IDE (http://www.codecortex.com/ide/), I don't generate jump tables, since despite Bogdan's explanation of efficiency:

It is my own preference to have jump tables. I do not claim "efficiency" at run time. I claim that it does not matter much at run time. I would add an option to disable jump table generation for my own assembler if this makes users happy.

Quote

  • You need to call each imported function an average of >5 times before jump tables are more space efficient in terms of the code alone (or at all if the executable isn't relocatable.)

Yes this is true but not related to relocations... it is related to the code size of a relative jump versus an absolute indirect jump.

Quote

  • The extra relocations are only there when you specify that you want the executable/library to be relocatable, which isn't the default for executables.

But it is the default and needed for DLL's.

Besides run-time or load-time relocations there is another kind of relocation that is generated inside the OBJ. Unlike the run-time kind, these compile-time relocations are mandatory if you generate OBJ's and link multiple modules.

It will take one such relocation for each API call in an executable with no jump table. With jump tables it will only take one for each API used. Hence compilation and linking speed is helped here and this was my primary concern since I create huge ASM projects.

Quote
...
  Plus, relocations are only resolved upon starting the application, whereas the extra jump is done every time an import is called.  Each import appears only once in the Import Address Table and Import Lookup Table, regardless of whether there are any relocations or not.

Yes, true but once you call an API speed is no longer of the essence.

Yes each import only appears once in the IAT table BUT each direct call requires a run-time relocation (in a DLL).


Quote

  • You don't need the standard lib files at all if the assembler knows that it can call the functions through the import table, which is why you don't need any lib files to assemble Windows apps with Inventor IDE.

Ok, this is nice advertising for your Inventor IDE... I will check it out. Is it written in full ASM?

FYI Sol_Asm does not require any kind of libs when directly creating an Executable/DLL/binary. Neither does FASM or NASM AFAIK etc... In fact neither does MASM for generating the OBJ... The libs are only needed by the linker when it links multiple OBJ's.

This feature is in no way related to the subject.

However, an assembler that can NOT produce OBJ's in order to be linked together by a linker has a huge missing feature. "Most" professional projects out there involve generating OBJ's and then linking them together to create the final executable.

After all, the jump table method also calls through the very same import table. It is the API calls in code that are relative in one case and absolute indirect in the other, but both methods reach the very same IAT table in the end.

Quote
All that said, I really don't think there's a huge performance difference (since imported functions are usually pretty lengthy anyway), but if I had to choose, my money would be on that not using jump tables is slightly more efficient overall.

I prefer jump tables because the run-time speed improvement is not worth it in this case, the size of the executable/DLL is potentially smaller, compilation is faster, load time is faster and OBJ size is smaller. If I want speed then I choose better algorithms and write my own functions to reduce API overhead, but I do not try to optimize every opcode/byte/cycle.

However this is my personal preference.


Title: Re: EXE Jump Tables
Post by: BogdanOntanu on May 31, 2009, 07:27:59 AM
Quote from: dedndave on May 31, 2009, 03:46:08 AM
...
now, i need somebody really sharp to tell me about re-based PE's - lol
Vortex maybe ?
here is my question....
if the PE gets re-based at load-time, do these tables become far jumps ?
and another question.....
is there a way to force an exe to be re-based for testing purposes ?

By "re-based" I guess you mean relocated at run time. There is another tool named exactly "rebase" that can "cold" change the preferred load address of an executable or DLL after compile time.

Quote
if the PE gets re-based at load-time, do these tables become far jumps ?

No. Everything that is absolute must be relocated in this case but the jumps remain "near".

There is no use for "far" jumps in normal user mode win32 programming. Everything is near in flat protected mode (win32) but some addresses in code are absolute (not relative) and those addresses need to be changed IF the base address is changed.
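What "those addresses need to be changed" means in practice can be sketched in C as follows. The loader computes delta = actual base - preferred base and adds it to every absolute dword listed in the image's relocation data. Relative operands such as the E8 rel32 of a call are not listed and stay untouched. This is a simplified model: real PE files group the entries into IMAGE_BASE_RELOCATION blocks, and that layout is omitted here. The image contents and bases in the usage are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified base relocation: every absolute dword recorded in the
   relocation list gets the same delta added. `fixups` holds dword indexes
   into `image`; real PE relocation blocks are not modelled. */
static void apply_relocations(uint32_t *image, const size_t *fixups, size_t nfix,
                              uint32_t preferred_base, uint32_t actual_base) {
    uint32_t delta = actual_base - preferred_base;   /* arithmetic modulo 2^32 */
    if (delta == 0) return;                          /* loaded where linked: nothing to do */
    for (size_t i = 0; i < nfix; i++)
        image[fixups[i]] += delta;
}
```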

Quote
is there a way to force an exe to be re-based for testing purposes ?

EXE's are rarely (if ever) relocated in Win32. Only if they are DLL's in disguise or plugins to be loaded by another EXE. The default load / base address of a PE EXE is normally free at the EXE's load time.

However DLL's are often relocated because you can not be sure of the load order and memory position of all DLL's needed for an EXE / process.

One way to force a run-time relocation to occur is to have 2 DLL's compiled for the very same preferred base address and then load them by hand one after another. The second one must be relocated by the OS loader because its address space is already occupied by the first DLL.

Another method would be to compile an EXE for a preferred address (other than the default 0x40_0000) that is already in use by the OS.

If the PE EXE has run-time relocations stored inside then you can use that "rebase" tool to change its base address even after compile time.
Title: Re: EXE Jump Tables
Post by: jj2007 on May 31, 2009, 07:40:03 AM
Quote from: dedndave on May 30, 2009, 09:25:38 PM
at the location of the invoke is a CALL relative
that branches to a JMP dword ptr [nnnnnnnn] indirect
the value at that nnnnnnnn address is the address of the api function


Here is the simplest variant for calling with a register:

include \masm32\include\masm32rt.inc

.code
start:
mov esi, MessageBox
push MB_OK
push chr$("Hello")
push chr$("Called via esi")
push 0
call esi

exit

end start


Slightly more sophisticated:
include \masm32\include\masm32rt.inc

MBox = 0
Exit = 4

.data
MyJumpTable dd MessageBox, ExitProcess

.code
start:
mov esi, offset MyJumpTable
push MB_OK
push chr$("Hello")
push chr$("Called via esi")
push 0
call dword ptr [esi+MBox]
push 0
call dword ptr [esi+Exit]

end start


But whether that is more efficient... no idea
Title: Re: EXE Jump Tables
Post by: UtillMasm on May 31, 2009, 08:16:25 AM
 :U
very clean, i like this more:
comment #
@echo off
\masm32\bin\ml.exe /c /coff /Focall2.obj /nologo call2.asm
\masm32\bin\link.exe /subsystem:windows /out:call2.exe call2.obj /nologo
pause
#
include\masm32\include\masm32rt.inc
MBox=0
Exit=4
.data
MyJumpTable dd MessageBox,ExitProcess
.code
start:mov esi,offset MyJumpTable
push MB_OK
push chr$("Hello")
push chr$("Called via esi")
push 0
call dword ptr[esi+MBox]
push 0
call dword ptr[esi+Exit]
end start

and like radasm msg jump table too.
:wink
Title: Re: EXE Jump Tables
Post by: Vortex on May 31, 2009, 08:37:08 AM
Hi dedndave,

Quote
is there a way to force an exe to be re-based for testing purposes ?

You would like to have a look at the thread Loading and running EXEs and DLLs from memory (http://www.masm32.com/board/index.php?topic=3150.0) The EXE\DLL is loaded to a memory address allocated by VirtualAlloc
Title: Re: EXE Jump Tables
Post by: hutch-- on May 31, 2009, 08:50:36 AM
First, there is more to an API call than the difference between a direct call in code and an indirect call through an address table. A call outside the running app memory space is measurably slower than an internal call where a direct JMP to an address usually is not. For the indirect method you get a fast call and a fast JMP, with the direct method you get a slow call. The example I have picked is SendMessageA which gets bashed in an app a massive number of times which justifies it being placed in an address table to save space and usually be in cache.

I doubt you could successfully benchmark the difference but indirect calls never went slower than the direct call.


; ÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷
; ÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷
; ÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷


004011FB 6A00                   push    0
004011FD 6A02                   push    2
004011FF 6811010000             push    111h
00401204 FF3550304000           push    dword ptr [403050h]
0040120A E86D000000             call    jmp_SendMessageA

jmp_SendMessageA:               jmp     dword ptr [SendMessageA]

; ÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷
; ÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷
; ÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷


00401212 6A00                   push    0
00401214 6A02                   push    2
00401216 6811010000             push    111h
0040121B FF3550304000           push    dword ptr [403050h]
00401221 FF1518204000           call    dword ptr [SendMessageA]


; ÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷
; ÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷
; ÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷·÷
Title: Re: EXE Jump Tables
Post by: jj2007 on May 31, 2009, 09:42:06 AM
Since we are all in the brainstorming mode now, here one more idea to play with:

include \masm32\include\masm32rt.inc

MBox = 121
Exit = 0

.data
MyJumpTable dd ExitProcess
dd 120 dup(0) ; 120 slots for other API's
dd MessageBox

.data?
RetAdd dd ?
ChkEsp dd ?

.code
start:
mov ChkEsp, esp

mov esi, Scheduler

push MB_OK
push chr$("Hello")
push chr$("Called via esi")
push 0
push MBox ; MBox = 121
call esi ; works but is only one byte shorter

invoke MessageBox, 0, chr$("The conventional way"), chr$("Title"), MB_OK

sub ChkEsp, esp
MsgBox 0, str$(ChkEsp), "Esp diff=0?", MB_OK

push 0 ; ret 0
push 0 ; Exit
call esi

Scheduler proc
  pop RetAdd
  pop eax
  lea eax, [MyJumpTable+4*eax]
  call dword ptr [eax]
  jmp RetAdd
Scheduler endp

end start


It works, it's probably utterly slow, but for code size freaks it might be interesting :bg
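The same index-to-table dispatch can be sketched in C with hypothetical stand-ins for ExitProcess and MessageBox. Here the index is an ordinary argument rather than a value popped off the stack. As a result, no global return-address slot is needed; in the asm version, a nested or concurrent call would overwrite RetAdd. The sketch also rejects empty or out-of-range slots instead of calling through a zero entry.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*api_fn)(int);

/* Hypothetical stand-ins for ExitProcess and MessageBox. */
static int fake_exit(int code)    { return 1000 + code; }
static int fake_msgbox(int flags) { return 2000 + flags; }

/* Sparse table like MyJumpTable: slot 0 and slot 121 filled, 120 empty slots between. */
enum { SLOT_EXIT = 0, SLOT_MBOX = 121, SLOT_COUNT = 122 };
static const api_fn table[SLOT_COUNT] = { [SLOT_EXIT] = fake_exit, [SLOT_MBOX] = fake_msgbox };

/* The caller passes only an index; the dispatcher looks it up and calls. */
static int scheduler(int index, int arg) {
    if (index < 0 || index >= SLOT_COUNT || table[index] == NULL)
        return -1;                       /* unknown or empty slot */
    return table[index](arg);
}
```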
Title: Re: EXE Jump Tables
Post by: dedndave on May 31, 2009, 10:28:58 AM
Quote
this code works

        mov     esi,labelA-4       ;get the relative address from INVOKE
        mov     esi,labelA[esi+2]  ;get the address part of the indirect JMP
        mov     eax,[esi]          ;get the API target referred to in the JMP
        call    eax
        exit

        INVOKE  GetCurrentProcess
labelA  label   dword

notice that the IAT method takes:
4 bytes in the INVOKE code
6 bytes for the indirect JMP
4 more bytes for the target
-------------------
14 bytes total

and, while it may be true that "CALL reg" direct may be faster than "CALL near rel"
let's not forget that we have to get the target address into the register to begin with

i think, overall, the fastest would be "CALL near rel" (E8 nn nn nn nn)
which is what the INVOKE currently uses
if we eliminate the IAT table, as well as the target address, we reduce the byte-count by 10
reducing bytes is nice, but let's face it, not a big issue with today's storage sizes
if i have 100 different API calls, that's only 1 KB - not an issue

the only problem i am having at the moment is that the OS
will not let me overwrite the 4 bytes in the CALL instruction of the INVOKE sequence
i suspect that this is a write protection fault, for obvious security reasons

because i intend to replace the operand in only a few select places,
i can work around this by using something other than an INVOKE or CALL
if need be, i can hard-code it like this:

        db 0E8h
labelB  db 4 dup(?)

and fill it in during initialization
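Filling in labelB during initialization means computing the rel32 operand by hand: it is the target minus the address of the byte after the 5-byte CALL. A small C sketch of that arithmetic; encode_call_rel32 is a hypothetical helper, and it assumes the destination bytes are writable at run time. A normal .code section is mapped read-only, so the same fault described above would presumably still occur unless the bytes live in a writable section or the page is first made writable (for example with VirtualProtect); that part is not shown.

```c
#include <stdint.h>

/* Hypothetical helper: build the 5 bytes of a CALL rel32 located at
   call_va so that it transfers control to target_va. The displacement
   is measured from the end of the instruction (call_va + 5). */
static void encode_call_rel32(uint32_t call_va, uint32_t target_va, uint8_t out[5])
{
    uint32_t rel = target_va - (call_va + 5u);   /* wraps mod 2^32 for backward calls */

    out[0] = 0xE8;                               /* CALL rel32 opcode */
    out[1] = (uint8_t)(rel & 0xFFu);             /* little-endian displacement */
    out[2] = (uint8_t)((rel >> 8) & 0xFFu);
    out[3] = (uint8_t)((rel >> 16) & 0xFFu);
    out[4] = (uint8_t)((rel >> 24) & 0xFFu);
}
```

As a check against MichaelW's listing in this thread, a CALL at 00401003 targeting 004011FA encodes as E8 F2 01 00 00.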

while it is true that most API calls are slow to begin with, there are a few that are relatively fast
i would have to think that QueryPerformanceCounter is fairly fast, as an example,
because there isn't a lot of decision-making to be done - just gimme 2 dword values

as i mentioned before, i am interested in synchronizing threads with the "highest resolution possible"
i am trying to develop a technique for timing evaluation code on single/multi core machines
the idea is, to have one thread perform the timing operation, while another thread runs the code
the eval code thread needs to be ready to run, then
once the time-keeping thread has read its initial timer value, it will release the eval code thread for execution
the reason for the dual-thread method is that some machines have more than one core
on those machines, the TSC needs to be run with a process affinity mask of only one selected core
the eval thread can be run with all cores selected, or whatever the test calls for
i am trying to keep the overhead of the SetProcessAffinityMask function out of the evaluation measurement
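The two-thread scheme can be sketched roughly as follows. This is a POSIX/C11 stand-in, not the Windows code being developed here: pthreads and C11 atomics replace the Win32 thread calls, timespec_get (a wall-clock source) replaces the TSC or QueryPerformanceCounter, no affinity masks are set, and code_under_test is a hypothetical placeholder. The eval thread signals that it is ready, then spins until the timing thread has read its start time and released it.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <time.h>

static atomic_int eval_ready;   /* eval thread is running and waiting */
static atomic_int release_flag; /* timing thread says "go" */
static atomic_int eval_done;    /* eval thread has finished */

/* Hypothetical placeholder for the code being timed. */
static void code_under_test(void)
{
    volatile unsigned sink = 0;
    for (unsigned i = 0; i < 1000u; i++)
        sink += i;
}

static void *eval_thread(void *arg)
{
    (void)arg;
    atomic_store_explicit(&eval_ready, 1, memory_order_release);
    while (!atomic_load_explicit(&release_flag, memory_order_acquire))
        ;                                   /* spin until released */
    code_under_test();
    atomic_store_explicit(&eval_done, 1, memory_order_release);
    return NULL;
}

/* Wall-clock stand-in for the TSC / QueryPerformanceCounter. */
static double now_ns(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec * 1e9 + (double)ts.tv_nsec;
}

/* Timing side: wait until the eval thread is ready, read the start time,
   release it, wait for it to finish, read the end time.
   Returns elapsed nanoseconds, or -1.0 if the thread could not be created. */
static double time_eval_ns(void)
{
    pthread_t worker;
    double t0, t1;

    atomic_store(&eval_ready, 0);
    atomic_store(&release_flag, 0);
    atomic_store(&eval_done, 0);
    if (pthread_create(&worker, NULL, eval_thread, NULL) != 0)
        return -1.0;

    while (!atomic_load_explicit(&eval_ready, memory_order_acquire))
        ;                                   /* eval thread not started yet */
    t0 = now_ns();
    atomic_store_explicit(&release_flag, 1, memory_order_release);
    while (!atomic_load_explicit(&eval_done, memory_order_acquire))
        ;                                   /* wait for the timed code */
    t1 = now_ns();

    pthread_join(worker, NULL);
    return t1 - t0;
}
```

Spinning keeps the release latency low at the price of keeping a core busy while waiting; on a single-core machine a blocking wait would probably be the kinder choice.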
Title: Re: EXE Jump Tables
Post by: MichaelW on May 31, 2009, 11:19:59 AM
I'm not sure that all of this is correct. I selected PostMessage instead of SendMessage because PostMessage returns immediately without waiting for the window procedure to process the message. If the cycle count is not more than a few hundred cycles my P3 normally returns very consistent counts. I can't get consistent results here, partly because the cycle counts are too high, and I think partly because the called function has a variable execution time. In any case, under Windows 2000 I can see no significant difference (or if there is, it's smaller than the variation).

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    include \masm32\include\masm32rt.inc
    .686
    include \masm32\macros\timers.asm
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    .data
      hwndTarget dd 0
      itotal     dd 0
      dtotal     dd 0
    .code
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
start:
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    EXTERNDEF _imp__PostMessageA@16:NEAR PTR

    invoke FindWindow, NULL, chr$("TARGET")
    mov hwndTarget, eax
    print ustr$(hwndTarget),13,10,13,10
    .IF hwndTarget

      nops 3

      push 0
      push 0
      push WM_NULL
      push hwndTarget
      call _imp__PostMessageA@16

      nops 3

      invoke PostMessage, hwndTarget, WM_NULL, 0, 0

      nops 3

      print "direct indirect",13,10
      print "------ -------- ",13,10

      invoke Sleep, 4000

      REPEAT 20

        counter_begin 1000, REALTIME_PRIORITY_CLASS
          push 0
          push 0
          push WM_NULL
          push hwndTarget
          call _imp__PostMessageA@16
        counter_end
        add dtotal, eax
        print ustr$(eax),9

        counter_begin 1000, REALTIME_PRIORITY_CLASS
          invoke PostMessage, hwndTarget, WM_NULL, 0, 0
        counter_end
        add itotal, eax
        print ustr$(eax),13,10

      ENDM

      print "------ -------- ",13,10
      print ustr$(dtotal), 9
      print ustr$(itotal),13,10,13,10

    .ENDIF

    inkey "Press any key to exit..."
    exit
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
end start


00401048 90                     nop
00401049 90                     nop
0040104A 90                     nop
0040104B 6A00                   push    0
0040104D 6A00                   push    0
0040104F 6A00                   push    0
00401051 FF3500504000           push    dword ptr [405000h]
00401057 FF1534404000           call    dword ptr [PostMessageA]
0040105D 90                     nop
0040105E 90                     nop
0040105F 90                     nop
00401060 6A00                   push    0
00401062 6A00                   push    0
00401064 6A00                   push    0
00401066 FF3500504000           push    dword ptr [405000h]
0040106C E827270000             call    fn_00403798
00401071 90                     nop
00401072 90                     nop
00401073 90                     nop
. . .
00403798                    fn_00403798:
00403798 FF2534404000           jmp     dword ptr [PostMessageA]

Typical results on my P3:

direct indirect
------ --------
1332    1206
1202    1198
1191    1204
1189    1192
1192    1206
1192    1199
1190    1252
1189    1196
1210    1202
1190    1192
1197    1209
1201    1196
1190    1201
1188    1193
1190    1206
1190    1192
1191    1201
1187    1191
1192    1203
1190    1191
------ --------
23993   24030



Title: Re: EXE Jump Tables
Post by: dedndave on May 31, 2009, 11:28:13 AM
dual-core Prescott:

13828614

direct indirect
------ --------
32608   32794
32285   33110
33037   33008
32435   33149
33087   32608
32551   33496
32165   28947
31687   31494
29901   31369
31580   30794
31328   31350
30840   30946
31516   30736
31468   31378
30782   31489
31116   31100
30823   31301
31092   32632
33431   31313
30686   30589
------ --------
634418  633603

as you can see, my numbers are slightly higher - lol
sumpin's not right, here - this dual-core @ 3 GHz performs fairly well
Title: Re: EXE Jump Tables
Post by: MichaelW on May 31, 2009, 11:38:15 AM
You're running XP? My numbers were for 2000.
Title: Re: EXE Jump Tables
Post by: dedndave on May 31, 2009, 11:39:07 AM
yes - xp sp2 - and holding - lol
Title: Re: EXE Jump Tables
Post by: jj2007 on May 31, 2009, 11:40:05 AM
Why not try a "faster" API?

.nolist
include \masm32\include\masm32rt.inc
.686
include \masm32\macros\timers.asm
LOOP_COUNT = 100000

EXTERNDEF _imp__GetTickCount@0:PTR pr0

.code
start:
counter_begin LOOP_COUNT, HIGH_PRIORITY_CLASS
REPEAT 100
   invoke GetTickCount
ENDM
counter_end
print str$(eax), 9, "cycles for 100*GetTickCount, indirect", 13, 10

counter_begin LOOP_COUNT, HIGH_PRIORITY_CLASS
REPEAT 100
  invoke _imp__GetTickCount@0
ENDM
counter_end
print str$(eax), 9, "cycles for 100*GetTickCount, direct", 13, 10

inkey chr$(13, 10, "--- ok ---", 13)
exit
end start
Title: Re: EXE Jump Tables
Post by: dedndave on May 31, 2009, 11:46:16 AM
yes - i was thinking the same thing
here is a short list of some that i think should be fast...

GetProcessAffinityMask
QueryPerformanceCounter
GetCurrentProcess
CreateTimerQueue
EnterCriticalSection
LeaveCriticalSection
Title: Re: EXE Jump Tables
Post by: dedndave on May 31, 2009, 11:50:39 AM
i am seeing a 300 cycle diff JJ - 1700 vs 1400

but - all these timing measurements are going to prevent peeps from seeing my post about the replacement code - lol
(this sentence is intended to make them go back and look)
Title: Re: EXE Jump Tables
Post by: UtillMasm on May 31, 2009, 11:51:42 AM
Intel Core Duo 1.83 GHz with Vista SP2
9570552

direct indirect
------ --------
3661    4399
28143   9589
29073   23326
28886   28982
21307   16514
29851   37303
34544   14749
20351   1805
12799   10889
14832   28986
21583   13989
15321   6758
16165   15626
9959    1466
15746   23869
3090    2793
4223    2726
4613    3774
3178    1507
2827    1698
------ --------
320152  250748

Press any key to exit...
Title: Re: EXE Jump Tables
Post by: MichaelW on May 31, 2009, 11:51:56 AM
Quote
Why not try a "faster" API?

I did, I selected one that was faster than SendMessage. If you want to test something fast, forget the API and code a DLL with a procedure that contains only a RET, and call it by the same mechanisms.
Title: Re: EXE Jump Tables
Post by: dedndave on May 31, 2009, 11:55:36 AM
well - we know that result
it is certain synchronization calls that are of primary interest
but there are some fast APIs (just not the ones we always want to be fast - lol)

btw Michael - you are making me think about switching to Win2K - lol
Title: Re: EXE Jump Tables
Post by: MichaelW on May 31, 2009, 12:33:01 PM
For a minimal procedure the direct call is consistently 2 cycles faster.

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    include \masm32\include\masm32rt.inc

    aret PROTO
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    .data
      hInstance dd 0
    .code
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

LibMain proc instance:DWORD,reason:DWORD,unused:DWORD

    .if reason == DLL_PROCESS_ATTACH
      push instance
      pop hInstance
      mov eax, TRUE

    .elseif reason == DLL_PROCESS_DETACH

    .elseif reason == DLL_THREAD_ATTACH

    .elseif reason == DLL_THREAD_DETACH

    .endif

    ret

LibMain endp

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

aret proc
    ret
aret endp

; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««

end LibMain


; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    include \masm32\include\masm32rt.inc
    .686
    include \masm32\macros\timers.asm

    aret PROTO

    EXTERNDEF _imp__aret@0:NEAR PTR
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    .data
    .code
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
start:
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    nops 3
    call aret
    nops 3
    call _imp__aret@0
    nops 3

    invoke Sleep, 4000

    counter_begin 1000, HIGH_PRIORITY_CLASS
      call aret
    counter_end
    print ustr$(eax)," cycles, indirect",13,10

    counter_begin 1000, HIGH_PRIORITY_CLASS
      call _imp__aret@0
    counter_end
    print ustr$(eax)," cycles, direct",13,10,13,10

    inkey "Press any key to exit..."
    exit
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
end start


00401000 90                     nop
00401001 90                     nop
00401002 90                     nop
00401003 E8F2010000             call    fn_004011FA
00401008 90                     nop
00401009 90                     nop
0040100A 90                     nop
0040100B FF1530204000           call    dword ptr [aret]
00401011 90                     nop
00401012 90                     nop
00401013 90                     nop
. . .
004011FA                    fn_004011FA:
004011FA FF2530204000           jmp     dword ptr [aret]


3 cycles, indirect
1 cycles, direct



Title: Re: EXE Jump Tables
Post by: dedndave on May 31, 2009, 12:39:09 PM
you guys are not paying attention
look at the code i posted
GET THE TARGET ADDRESS
Title: Re: EXE Jump Tables
Post by: MichaelW on May 31, 2009, 12:44:55 PM
Who pays attention over the weekend?
Title: Re: EXE Jump Tables
Post by: dedndave on May 31, 2009, 12:46:44 PM
lol - is it the weekend already ?
Title: Re: EXE Jump Tables
Post by: UtillMasm on May 31, 2009, 01:12:18 PM
Have a great weekend. :wink

btw, dear MichaelW: which encoding should I use for these text files? :wink

[attachment deleted by admin]
Title: Re: EXE Jump Tables
Post by: MichaelW on May 31, 2009, 01:39:39 PM
I can only guess ANSI or Western European.
Title: Re: EXE Jump Tables
Post by: Mark Jones on May 31, 2009, 03:49:56 PM
AMD x2 4000+ / Win7 Beta x64

For this (http://www.masm32.com/board/index.php?topic=11541.msg86615#msg86615) code, I get
592426

direct indirect
------ --------
36445   37438
4197    24960
35124   23647
36914   25259
30691   7555
32651   40061
26122   16160
34643   33773
11805   26739
24058   33236
17137   26391
35018   21328
23747   35509
29827   30289
31598   28075
17094   32144
27323   29540
21413   26742
26123   33217
35882   35033
------ --------
537812  567096


For the latest,
2 cycles, indirect
0 cycles, direct


Edit: Thanks Dave.
Title: Re: EXE Jump Tables
Post by: dedndave on May 31, 2009, 03:52:29 PM
you have to have "Target" running (a window app) - then run "Test" from a command line

0 cycles - lol - well, that's just wrong - very nice, but wrong
Title: Re: EXE Jump Tables
Post by: jj2007 on May 31, 2009, 05:23:56 PM
Celeron M timings:

7321    cycles for 1000*GetTickCount, indirect
5501    cycles for 1000*GetTickCount, direct
8327    cycles for PostMessage, indirect
8311    cycles for PostMessage, direct


Shorter and faster...

Title: Re: EXE Jump Tables
Post by: UtillMasm on May 31, 2009, 05:55:59 PM
Core Duo timings:
13243   cycles for 1000*GetTickCount, indirect
10443   cycles for 1000*GetTickCount, direct
10851   cycles for PostMessage, indirect
12192   cycles for PostMessage, direct

--- ok ---
Title: Re: EXE Jump Tables
Post by: MichaelW on May 31, 2009, 10:45:26 PM
Quote from: dedndave on May 31, 2009, 03:52:29 PM
0 cycles - lol - well, that's just wrong - very nice, but wrong

Zero is an entirely reasonable result under the circumstances. The resolution of the TSC is no better than one clock cycle, and recent processors can execute as many as four instructions per cycle. And then you have the inability to completely isolate the timed instructions from the timing instructions, so some of the timed instructions can end up executing in parallel with the timing instructions.
Title: Re: EXE Jump Tables
Post by: BogdanOntanu on June 01, 2009, 07:23:01 AM
I have split the talks about OBJ generation into this new topic:
http://www.masm32.com/board/index.php?topic=11555.0
Title: Re: EXE Jump Tables
Post by: ToutEnMasm on June 01, 2009, 07:28:01 AM

Dynamic linking suppresses the need for a jump table.
An interesting question is: what is faster, a dynamic link or a link with a library?
Title: Re: EXE Jump Tables
Post by: hutch-- on June 01, 2009, 08:00:46 AM
Yves,

That one is simple: a library gets built into the exe, so its address is within local memory space, whereas a DLL procedure has to be loaded. It does not matter in many instances, but if the called routine is very small you will see the difference. With a DLL, if you get the address from it and load it into a variable or even a register, it will tend to be faster, as the DLL is also mapped into the EXE memory space.
Title: Re: EXE Jump Tables
Post by: BogdanOntanu on June 01, 2009, 10:54:34 AM
Quote from: ToutEnMasm on June 01, 2009, 07:28:01 AM

Dynamic linking suppresses the need for a jump table.

NO. It is exactly the opposite.

In static linking the linker adds the called procedure's code to your code (let us say at the end) and the address is known at link time. Because of this the call is relative, and nothing else needs to be done at run time. The problem with static linking is that you cannot load/unload procedures or libraries at run time, and you cannot use static linking to call OS APIs.

Dynamic linking is used mainly because the address of the API is NOT known at compile or link time.

Some prefer a direct CALL dword ptr [IAT.API_address], others a near CALL to a JMP [IAT.API_address], but one way or another the value of API_address will be fixed at run time by the OS loader, and this cannot be done statically.

The whole talk in this thread is about the advantages and disadvantages of using a jump table as an intermediate central step between your "invoke API_xxx, ..., ..." in the code and the API address in the IAT. Some assemblers/linkers do generate such a table and some do not.
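The two call forms described above can be modelled in C with a function-pointer slot that a stand-in "loader" fills at run time. This is a conceptual sketch only: all names are hypothetical, and a C compiler is free to inline or merge these calls, so it says nothing about the instruction-level cost being measured in this thread.

```c
#include <stddef.h>

typedef int (*api_fn)(void);

/* Hypothetical stand-in for an API routine living in a DLL. */
static int api_get_value(void) { return 42; }

/* The IAT slot: empty at link time, filled in when the image is loaded. */
static api_fn iat_get_value = NULL;

/* Stand-in for the OS loader resolving the import at run time. */
static void loader_fill_iat(void)
{
    iat_get_value = api_get_value;
}

/* Form 1: CALL dword ptr [IAT slot], one indirect call. */
static int call_through_iat(void)
{
    return iat_get_value();
}

/* Form 2: CALL to a local stub whose only job is JMP dword ptr [IAT slot].
   In C the stub is a function that forwards through the same slot. */
static int jump_stub_get_value(void)
{
    return iat_get_value();
}

static int call_through_stub(void)
{
    return jump_stub_get_value();
}
```

Either way the address only exists after loader_fill_iat() has run, which mirrors the point that the API address cannot be fixed at link time.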

Quote
An interesting question is: what is faster, a dynamic link or a link with a library?

Faster when? At execution time or at compile time?

At execution time it is logical that the static solution is faster, because it is only a near CALL to a well-known address, BUT it is not possible to use it for APIs, because their code and address change with every new OS version or security update.

The dynamic linking solution can be slightly faster at compile time, because part of the work to be done by the linker is left to the OS loader. However, it is logical that it will be slower at run time, because at least one intermediate step has to be taken for a call to an API. In the case of a jump table there are 2 (two) such steps to be taken.

Anyway, I think that the speed differences at execution time are not worth considering, because the API itself will perform more operations (parameter checking and so on) than the very few cycles saved by avoiding one jump.

Dynamic linking has the advantage of DLL loading / unloading at runtime.

Title: Re: EXE Jump Tables
Post by: ecube on June 01, 2009, 03:04:13 PM

11228   cycles for 1000*GetTickCount, indirect
8283    cycles for 1000*GetTickCount, direct
5401    cycles for PostMessage, indirect
5576    cycles for PostMessage, direct

--- ok ---


I wonder how come PostMessage is apparently faster using the indirect call?
Title: Re: EXE Jump Tables
Post by: jj2007 on June 01, 2009, 05:39:37 PM
Quote from: E^cube on June 01, 2009, 03:04:13 PM

11228   cycles for 1000*GetTickCount, indirect
8283    cycles for 1000*GetTickCount, direct
5401    cycles for PostMessage, indirect
5576    cycles for PostMessage, direct

--- ok ---


I wonder how come PostMessage is apparently faster using the indirect call?

On my Celeron, direct is a little bit faster, but UtillMasm's Core Duo favours indirect, too. It might be a cache effect of some sort. Here is an interesting quote (http://kerneltrap.org/node/553/2131):

Quote
main memory is very slow compared to the CPU cache, so code that is slightly larger can cause more cache misses and therefore be slower, even if significantly fewer commands are executed.

in addition, frequently the effect isn't direct (i.e. no noticeable difference in the code you are changing; instead the change makes other code slower as it gets evicted from the cache).

Note that the timings are for 1000 calls to GetTickCount (about 11 and 8 cycles per call on average) but for only one call to PostMessage. Which means the latter pulls an awful amount of code into the cache.
Title: Re: EXE Jump Tables
Post by: hutch-- on June 02, 2009, 02:04:20 AM
 :bg

I think I have already answered that one: a call to a function outside the calling app's memory space is REEEEEEEEEEELLLLY SLOOOOOOOOOOOOW, whereas a call to a local label followed by a jump is not. Older hardware will hide the difference, but any later PIV, Core 2 Duo, quad etc .... will respond better to a faster pair than to a single slow opcode.
Title: Re: EXE Jump Tables
Post by: NightWare on June 02, 2009, 02:41:32 AM
Quote from: jj2007 on June 01, 2009, 05:39:37 PM
Which means the latter pulls an awful amount of code into the cache.
::) i'm just curious, can you explain to me why code executed once or twice SHOULD put SOMETHING IN the (code/trace) cache?
(yeah, now i'm going to proceed by asking questions... maybe it will give better results...  :P)
Title: Re: EXE Jump Tables
Post by: dedndave on June 02, 2009, 02:48:41 AM
lol - i don't even think the guys at intel know how the cache works
and - sometimes, it doesn't
Title: Re: EXE Jump Tables
Post by: UtillMasm on June 02, 2009, 03:02:18 AM
 :clap: :green2
Title: Re: EXE Jump Tables
Post by: jj2007 on June 02, 2009, 06:30:38 AM
Quote from: hutch-- on June 02, 2009, 02:04:20 AM
I think I have already answered that one, a call to a function outside the calling app's memory space is REEEEEEEEEEELLLLY SLOOOOOOOOOOOOW

Quote from: NightWare on June 02, 2009, 02:41:32 AM
Quote from: jj2007 on June 01, 2009, 05:39:37 PM
Which means the latter pulls an awful amount of code into the cache.
::) i'm just curious, can you explain to me why code executed once or twice SHOULD put SOMETHING IN the (code/trace) cache?
(yeah, now i'm going to proceed by asking questions... maybe it will give better results...  :P)

I am pleased to see that a simple assembly-related question can still provoke so strong reactions, outside the Colosseum.

@Hutch: GetTickCount is outside the calling app's memory space, too. Same behaviour for GetDesktopWindow, see below.
@NightWare: No, I can't explain it. That's why I posted it. Just guessing: Could it be that the code is performing some loops during the 8,000 cycles?? And that these loops finish in the cache??
But maybe you can explain it, and are willing to share your knowledge with us earthlings?

Celeron M:
7268    cycles for 1000*GetTickCount, indirect
5500    cycles for 1000*GetTickCount, direct
16340   cycles for 1000*GetDesktopWindow, indirect
15045   cycles for 1000*GetDesktopWindow, direct
8107    cycles for PostMessage, indirect
8106    cycles for PostMessage, direct



Title: Re: EXE Jump Tables
Post by: UtillMasm on June 02, 2009, 06:45:03 AM
 :U
12367   cycles for 1000*GetTickCount, indirect
10703   cycles for 1000*GetTickCount, direct
16101   cycles for 1000*GetDesktopWindow, indirect
15087   cycles for 1000*GetDesktopWindow, direct
7229    cycles for PostMessage, indirect
6681    cycles for PostMessage, direct

--- ok ---
Title: Re: EXE Jump Tables
Post by: sinsi on June 02, 2009, 07:44:32 AM
Surely *any* code we call is in our 'address space' by definition. I think the problem is when we get into the API's that call low-level stuff - ring3 to ring0.
There is a fair bit of overhead involved in that.
Title: Re: EXE Jump Tables
Post by: hutch-- on June 02, 2009, 07:49:09 AM
JJ,

> @Hutch: GetTickCount is outside the calling app's memory space, too. Same behaviour for GetDesktopWindow, see below.

So is every other Windows API. You seem to have missed the value of the comment, using the CALL mnemonic with an address outside the app's address space is REAAAAAAAALLLLY SLOOOOOOOOOWWW. I mentioned that a local CALL to a local label followed by a direct jump to the start address is a pair of faster mnemonics than the single CALL directly to an external address.
Title: Re: EXE Jump Tables
Post by: hutch-- on June 02, 2009, 07:52:03 AM
sinsi,

> Surely *any* code we call is in our 'address space' by definition.

Nope, system DLLs are loaded at addresses above 2 gig which is above the normal address load range of a non system DLL.
Title: Re: EXE Jump Tables
Post by: sinsi on June 02, 2009, 07:59:14 AM
Well, system DLLs are still in our 4 gig address space, otherwise we couldn't call them.

I think I'm being pedantic about 'address space' - to someone of the roll-your-own-os crowd, I think we're talking about different things...
Title: Re: EXE Jump Tables
Post by: MichaelW on June 02, 2009, 10:05:07 AM
I thought one of the main points of putting system code in DLLs was to avoid having the same code mapped (or perhaps copied is a better term) into multiple processes. I think "virtual" is the key word here.
Title: Re: EXE Jump Tables
Post by: Tedd on June 02, 2009, 12:05:14 PM
Any DLL you 'import' is loaded into your address space - thus, GetTickCount and GetDesktopWindow are also mapped into your address space -- that's the very reason you can call them.
The physical pages for system DLLs are mapped (once) into the virtual address space of each process that loads them (in the 'shared area' which is usually above the 2GB mark.) User DLLs have an option to make them shared too, so they're probably not shared by default (except by multiple instances of the same application.)
Title: Re: EXE Jump Tables
Post by: jj2007 on June 02, 2009, 12:43:05 PM
Quote from: hutch-- on June 02, 2009, 07:49:09 AM
JJ,

> @Hutch: GetTickCount is outside the calling app's memory space, too. Same behaviour for GetDesktopWindow, see below.

So is every other Windows API. You seem to have missed the value of the comment, using the CALL mnemonic with an address outside the app's address space is REAAAAAAAALLLLY SLOOOOOOOOOWWW. I mentioned that a local CALL to a local label followed by a direct jump to the start address is a pair of faster mnemonics than the single CALL directly to an external address.

Hutch,

I read your comments, and in general I understand them, too. Sorry that I am not able to see its value. Perhaps because my timings say the opposite? I chose GetTickCount because, first, it is indeed outside what you call either "the app's address space" or "the app's memory space" (my best guess is you mean "close to the app's core code"), and second, it has little overhead. The timings show that without the extra jmp, it takes 3 or 4 cycles less. Why it sometimes behaves differently with the 8,000-cycle PostMessage call is beyond my knowledge.
Title: Re: EXE Jump Tables
Post by: redskull on June 02, 2009, 02:15:14 PM
I don't know if it's applicable, but you can "really" directly call GetTickCount by just executing interrupt 2A (at least, you used to)

-r
Title: Re: EXE Jump Tables
Post by: hutch-- on June 02, 2009, 02:43:46 PM
JJ,

You worry me at times; the info I posted is straight, well-known system information: Windows API functions reside in system DLLs that are loaded at a DIFFERENT memory address range than application DLLs. Above 2 gig is the action here, and within the framework of Windows you can call and run the functions as executable code but cannot write to those address ranges in ring3. Now it's not a matter of guesswork, it's an OS-defined limitation, so that you can allocate and use the bottom 2 gig and the OS controls the upper 2 gig.

Now it does not matter which system API call you make, it's in the same class as the rest, loaded ABOVE 2 gig, which is above application address space. With a system DLL you don't reload it like an application DLL; it's already there in memory, loaded at startup, and that's how Windows is designed. Now come back to the comparison of a direct address CALL to an indirect CALL and JMP: the indirect CALL in local memory is MUCH FASTER than a CALL outside the app's address space. The argument remains as to whether the following unconditional JMP is as slow as a direct CALL to an external address.

Now with testing results, you will get variation depending on the age of the hardware and its bus speed: older hardware hides the difference, later hardware favours the faster pair of instructions. Another factor interferes with your timing results: do your testing in REAL TIME with absolutely no interpretation for durations of over 500 ms and you will get down under 1% most of the time. The testing usually requires REAL TIME priority for the most accurate results.

The architectural model you are having problems with has been around for about 15 years, WinNT 3.5 and later; there is nothing new, exciting or different. They are 32-bit address range operating systems that have remained more or less compatible for many years. It's not a matter of conjecture, it's a matter of simply looking up the reference material.

PS: I should have added, since NT4 you have the layering of system DLLs, NTDLL.DLL and below that you have NTOSKRNL.EXE, disassemble them to see where the work is done and why the choice of CALL mnemonic or the alternative is irrelevant. The system was designed by the VAX guys in the early 90s for Microsoft and among the design considerations is the address table at the end of the executable code.
Title: Re: EXE Jump Tables
Post by: jj2007 on June 02, 2009, 04:42:55 PM
Quote from: hutch-- on June 02, 2009, 02:43:46 PM
JJ,

You worry me at times, ...

:bg
Hutch,

I can only return the compliment - and apologies if I have failed in using your terminology (app memory space, app address space etc) correctly. You might have a look at the posts of Tedd, Sinsi and Michael, they know more about these subtle distinctions than I do. So I limit myself to the observation that the direct call to a "fast" WinApi of the GetTickCount and GetDesktopWindow type is some cycles faster than the indirect version using a call plus a jmp table. Which is precisely the topic of this thread. By the way: What does your P4 say? Haven't seen any P4 timings yet...
:thumbu
Title: Re: EXE Jump Tables
Post by: Mark Jones on June 02, 2009, 06:10:44 PM
AMD x2 4000+ / Win7 x64
18407   cycles for 1000*GetTickCount, indirect
13751   cycles for 1000*GetTickCount, direct
37443   cycles for 1000*GetDesktopWindow, indirect
56384   cycles for 1000*GetDesktopWindow, direct
45891   cycles for PostMessage, indirect
45762   cycles for PostMessage, direct
Title: Re: EXE Jump Tables
Post by: dedndave on June 02, 2009, 06:14:47 PM
prescotts give funky numbers
they are high and inconsistent
which brings us full-circle back to why I was asking about the jump tables
as you know, Jochen, I am working on "super-daves" (tongue-in-cheek) timing routines for multi/single cores
I am playing with a few different methods of synchronizing threads
when I am done, I'll probably be the only one that uses the code - lol
but, hey, at least someone will be happy   :bg
Title: Re: EXE Jump Tables
Post by: dedndave on June 02, 2009, 06:18:31 PM
Prescott dual-core @ 3GHz - XP MCE 2005 SP2

18320   cycles for 1000*GetTickCount, indirect
16449   cycles for 1000*GetTickCount, direct
32506   cycles for 1000*GetDesktopWindow, indirect
30223   cycles for 1000*GetDesktopWindow, direct
18201   cycles for PostMessage, indirect
18099   cycles for PostMessage, direct

18297   cycles for 1000*GetTickCount, indirect
14920   cycles for 1000*GetTickCount, direct
32634   cycles for 1000*GetDesktopWindow, indirect
30162   cycles for 1000*GetDesktopWindow, direct
18170   cycles for PostMessage, indirect
18300   cycles for PostMessage, direct

18557   cycles for 1000*GetTickCount, indirect
15239   cycles for 1000*GetTickCount, direct
32524   cycles for 1000*GetDesktopWindow, indirect
30973   cycles for 1000*GetDesktopWindow, direct
18179   cycles for PostMessage, indirect
18276   cycles for PostMessage, direct
Title: Re: EXE Jump Tables
Post by: jj2007 on June 02, 2009, 06:22:37 PM
Quote from: dedndave on June 02, 2009, 06:14:47 PM
prescotts give funky numbers
they are high and inconsistent
which brings us full-circle back to why I was asking about the jump tables
as you know, Jochen, I am working on "super-daves" (tongue-in-cheek) timing routines for multi/single cores
I am playing with a few different methods of synchronizing threads
when I am done, I'll probably be the only one that uses the code - lol
but, hey, at least someone will be happy   :bg

Maybe I'll use them, too - if the price is right :bg

@Mark: Thanks for the timings, and for increasing the confusion :wink
45,000 instead of 8,000 for PostMessage is quite a big jmp - is that 64-bit progress??? ::)
Title: Re: EXE Jump Tables
Post by: dedndave on June 02, 2009, 06:25:11 PM
that is MS progress - they are getting smarter
newer OS's have built-in obsolescence
with XP, they have to look for ways to make it bad
they learned their lesson
when they want you to buy "Windows 8", they already have a plan for making you dislike 7
Title: Re: EXE Jump Tables
Post by: NightWare on June 02, 2009, 10:03:22 PM
Quote from: jj2007 on June 02, 2009, 06:30:38 AM
@NightWare: No, I can't explain it. That's why I posted it. Just guessing: Could it be that the code is performing some loops during the 8,000 cycles?? And that these loops finish in the cache??
But maybe you can explain it, and are willing to share your knowledge with us earthlings?
well, the problem when you use a WinAPI is that you use an API TREE: in the case of PostMessage you call an API from a DLL, and this DLL will call a function from another DLL (ntdll.dll). so the function puts something in the cache (coz there is a loop for printing digits, certainly rep movsb or something like that...). but it's only a few bytes, not enough to explain the slowdown... for MOST of the algo nothing will be put in the cache, coz there is no reason! (no code is called frequently enough...). now why is it slow?
1. because most of the algo reads (and then executes) data from memory or the L2 cache.
2. because you call/read code from SEVERAL memory or L2 cache locations (coz several DLLs), and remember that memory IS SLOW.

EDIT:
3. i forgot the cost of the mispredictions generated by the calls and the libraries' jmp tables. (note: and it's something useful to know, a ret is very rarely mispredicted, because the return address pushed by the call instruction is also kept in the CPU's return stack buffer...)
Title: Re: EXE Jump Tables
Post by: jj2007 on June 02, 2009, 11:09:47 PM
Quote from: redskull on June 02, 2009, 02:15:14 PM
I don't know if it's applicable, but you can "really" directly call GetTickCount by just executing interrupt 2A (at least, you used to)

-r

Strangely enough, that still works under XP SP2...

include \masm32\include\masm32rt.inc

.code
start:
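print "Diff int 2A - GetTickCount = "
invoke GetTickCount     ; first reading, through the import
push eax                ; keep it
invoke Sleep, 500       ; wait about half a second
int 2ah                 ; NT's KiGetTickCount service (32-bit Windows), tick count in EAX
pop ecx                 ; first reading
sub eax, ecx            ; elapsed ms, should be close to 500
print str$(eax), 13, 10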
getkey
exit

end start

Title: Re: EXE Jump Tables
Post by: dedndave on June 03, 2009, 02:02:20 AM
hmmmmmm - "int 2Ah" - that looks familiar, somehow - is that a foreign language?
i wonder what other interrupts work with 32-bit
Title: Re: EXE Jump Tables
Post by: hutch-- on June 03, 2009, 01:12:05 PM
Here is a simple benchmark that tests in real time, has no interpretive code embedded in it and makes no assumptions about how to get the results; it simply adds each result to a variable, then after the test has completed it divides each total by 8 to get the average.


1500 indirect call
1515 direct call
1500 indirect call
1500 direct call
1500 indirect call
1516 direct call
1515 indirect call
1516 direct call
1500 indirect call
1515 direct call
1516 indirect call
1516 direct call
1500 indirect call
1516 direct call
1500 indirect call
1500 direct call

1503 average indirect call timing
1511 average direct call timing

Press any key to continue ...


The only deviations in the results are due to the granularity of GetTickCount, and with this sample size the deviation is far under 1%. Timing shows there is no meaningful difference between the speed of indirect and direct calls on the Prescott 3.2 GHz PIV I am using. This is consistent with my other PIVs. Uninterpreted real-time testing is the only safe way to decide matters like this, for the very little it is worth; the more complicated and interpreted the testing method becomes, the more unreliable its results are.

The test piece.


; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
    include \masm32\include\masm32rt.inc
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

comment * -----------------------------------------------------
                        Build this  template with
                       "CONSOLE ASSEMBLE AND LINK"
        ----------------------------------------------------- *


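    externdef _imp__GetTickCount@0:PTR pr0      ; the IAT slot for GetTickCount, typed as a pointer to a no-argument proc
    GetTickCountX equ <_imp__GetTickCount@0>    ; invoke GetTickCountX -> call DWORD PTR [slot]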


    .data?
      value dd ?

    .data
      item dd 0

    .code

start:
   
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    call main
    inkey
    exit

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

main proc

    LOCAL cnt1  :DWORD
    LOCAL cnt2  :DWORD

    mov cnt1, 0
    mov cnt2, 0

    lpcnt equ <300000000>

    invoke SetPriorityClass,rv(GetCurrentProcess),REALTIME_PRIORITY_CLASS

    invoke SleepEx,100,0

    push esi

  ; =================================================

    REPEAT 8

    mov esi, lpcnt

    invoke GetTickCount
    push eax

  @@:
    invoke GetTickCount     ; << tested API (call to the jmp [slot] stub)
    sub esi, 1
    jnz @B

    invoke GetTickCount
    pop ecx
    sub eax, ecx
    add cnt1, eax
    print str$(eax)," indirect call",13,10

    invoke SleepEx,100,0

    mov esi, lpcnt

    invoke GetTickCount
    push eax

  @@:
    invoke GetTickCountX    ; << tested API (call [slot] directly)
    sub esi, 1
    jnz @B

    invoke GetTickCount
    pop ecx
    sub eax, ecx
    add cnt2, eax
    print str$(eax)," direct call",13,10

    invoke SleepEx,100,0

    ENDM

  ; =================================================

    invoke SetPriorityClass,rv(GetCurrentProcess),NORMAL_PRIORITY_CLASS

  ; format the output
  ; -----------------

    print chr$(13,10)

    shr cnt1, 3
    print str$(cnt1)," average indirect call timing",13,10

    shr cnt2, 3
    print str$(cnt2)," average direct call timing",13,10,13,10


    pop esi

    ret

main endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

end start
Title: Re: EXE Jump Tables
Post by: UtillMasm on June 03, 2009, 01:37:44 PM
 :U

@echo off
\masm32\bin\ml.exe /c /coff /Fohutch.obj /nologo hutch.asm
\masm32\bin\link.exe /subsystem:console /out:hutch.exe hutch.obj /nologo
pause

2407 indirect call
2047 direct call
2390 indirect call
2046 direct call
2219 indirect call
2047 direct call
2219 indirect call
2031 direct call
2218 indirect call
1875 direct call
2219 indirect call
2031 direct call
2218 indirect call
2047 direct call
2204 indirect call
2047 direct call

2261 average indirect call timing
2021 average direct call timing

Press any key to continue ...
Title: Re: EXE Jump Tables
Post by: hutch-- on June 03, 2009, 01:39:43 PM
UtillMasm,

Are you using a Core 2 Duo ?
Title: Re: EXE Jump Tables
Post by: UtillMasm on June 03, 2009, 01:40:27 PM
Core Duo 1.83ghz
Title: Re: EXE Jump Tables
Post by: ecube on June 04, 2009, 07:16:53 AM
This explains the int 2ah http://www.masm32.com/board/index.php?topic=7010.0
Title: Re: EXE Jump Tables
Post by: Vortex on June 07, 2009, 06:27:19 PM
Direct call function declarations moved to a custom invoke macro :


_invoke MACRO FuncName:REQ,args:VARARG

LOCAL counter,counter2,params

params      TEXTEQU <>
counter      = 0
counter2     = 0

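    FOR param,<args>            ;; count the arguments

        counter=counter+1

    ENDM

    counter2 = 4*counter        ;; stdcall decoration: 4 bytes per DWORD argument


    ;; declare the IAT slot, e.g. _imp__ExitProcess@4 : PTR pr1
    EXTERNDEF @CatStr(_imp__&FuncName&@,%counter2) : PTR @CatStr(<pr>,%counter)

    ;; from here on the API name stands for its IAT slot, so invoke emits call [slot]
    FuncName EQU <@CatStr(_imp__&FuncName&@,%counter2)>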

    IF counter
        invoke  FuncName,args
       
    ELSE
        invoke  FuncName
       
    ENDIF

ENDM

Title: Re: EXE Jump Tables
Post by: BlackVortex on June 07, 2009, 06:36:55 PM
I haven't read the whole thread, but I just want to add that goasm+golink don't use import jump table. The calls to the API point directly to the import table.   :cheekygreen:
Title: Re: EXE Jump Tables
Post by: dedndave on June 07, 2009, 07:09:59 PM
i think some guys are still missing the point altogether

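        mov     esi,labelA-4            ;ESI = rel32 of the CALL that ends at labelA
        mov     esi,labelA[esi+2]       ;ESI = operand of the JMP [slot] stub = IAT slot
        mov     eax,[esi]               ;EAX = API address held in the slot
        call    eax                     ;direct call to GetCurrentProcess
        exit

        INVOKE  GetCurrentProcess       ;never executed, only read: E8 rel32 to the stub
labelA  label   dword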

this code gets the address for a direct API call
in this example, it is not the same thing as _imp__GetCurrentProcess
it's all good, though - i have what i wanted - lol
Title: Re: EXE Jump Tables
Post by: mitchi on June 07, 2009, 07:14:44 PM
Quote from: Vortex on June 07, 2009, 06:27:19 PM
Direct call function declarations moved to a custom invoke macro :


WoW!!! That's just sweet Vortex!
So the next time we need help in a thread, we can do _invoke Vortex now  :green

Title: Re: EXE Jump Tables
Post by: ecube on June 07, 2009, 07:26:38 PM
Quote from: BlackVortex on June 07, 2009, 06:36:55 PM
I haven't read the whole thread, but I just want to add that goasm+golink don't use import jump table. The calls to the API point directly to the import table.   :cheekygreen:

that's because Jeremy is a genius.
Title: Re: EXE Jump Tables
Post by: BlackVortex on June 07, 2009, 09:24:20 PM
Quote from: E^cube on June 07, 2009, 07:26:38 PM
Quote from: BlackVortex on June 07, 2009, 06:36:55 PM
I haven't read the whole thread, but I just want to add that goasm+golink don't use import jump table. The calls to the API point directly to the import table.   :cheekygreen:

that's because Jeremy is a genius.
IDA drops the ball: it doesn't recognize/resolve imports correctly on the GoTools executable I tried   :green2

And I thought it was a very advanced analyzer/disassembler (I absolutely never use it)  :P
Title: Re: EXE Jump Tables
Post by: hutch-- on June 08, 2009, 03:11:07 AM
Fortunately MASM users have the choice of either.  :bg
Title: Re: EXE Jump Tables
Post by: BlackVortex on June 08, 2009, 04:49:21 AM
Quote from: hutch-- on June 08, 2009, 03:11:07 AM
Fortunately MASM users have the choice of either.  :bg
How ? You mean the MS linker has an option for that ?

EDIT: Oh, I see custom macro weirdness. Goddamnit I hate macros   :eek
Title: Re: EXE Jump Tables
Post by: dedndave on June 08, 2009, 04:52:42 AM
well - Vortex has one method - i think i have a better one, though
Title: Re: EXE Jump Tables
Post by: UtillMasm on June 08, 2009, 05:11:22 AM
i hate macro and the damnit english.
:wink
Title: Re: EXE Jump Tables
Post by: hutch-- on June 08, 2009, 05:17:03 AM
 :bg

> How ? You mean the MS linker has an option for that ?

No, ML.EXE, pick your prototype type, get the style of calling you want, either direct or indirect. Ain't MASM great !  :P
Title: Re: EXE Jump Tables
Post by: dedndave on June 08, 2009, 05:23:46 AM
Quote
i hate macro and the damnit english.
:wink

that's "damned English" - lol
and - we are all glad ms is not in Beijing
writing code in Chinese - instructions actually WOULD execute - it'd be like a death sentence on every line

somehow, i don't imagine UtillMasm is much of a swearer
Title: Re: EXE Jump Tables
Post by: rags on June 08, 2009, 09:45:37 AM
Quote
No, ML.EXE, pick your prototype type, get the style of calling you want, either direct or indirect. Ain't MASM great !
Hutch (or anyone) , how does a function's prototype affect whether the function is called directly or indirectly through a jump table?
How is a function prototyped to get a direct call to the api?
Title: Re: EXE Jump Tables
Post by: hutch-- on June 08, 2009, 09:57:23 AM
Mike,

Without knowing the mechanism and how MASM was written, the best I can offer is that when you use one type of prototype, the assembler produces the code for an indirect call, when you use another type of prototype you get a direct call.
Title: Re: EXE Jump Tables
Post by: sinsi on June 08, 2009, 10:27:52 AM
indirect

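ExitProcess proto :dword                    ; CALL goes to the linker's JMP [slot] stub


direct

p1 typedef proto :dword                     ; prototype type: one DWORD argument

EXTERNDEF _imp__ExitProcess@4:PTR p1        ; the IAT slot, typed as a pointer to p1
ExitProcess TEXTEQU <_imp__ExitProcess@4>   ; invoke ExitProcess -> CALL DWORD PTR [slot]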


I think the library defines which one will be used - if you declare using _imp__ prefix then direct will be used.
Title: Re: EXE Jump Tables
Post by: Vortex on June 08, 2009, 04:44:10 PM
Quote from: UtillMasm on June 08, 2009, 05:11:22 AM
i hate macro and the damnit english.
:wink

Why do you hate macros? Don't you use invoke? It's a macro.
Title: Re: EXE Jump Tables
Post by: BlackVortex on June 08, 2009, 05:12:52 PM
I started the macro hating trend, so I will respond.  I hate macros that I don't know about. Invoke rocks my socks !

When I use macros and then look at my disassembled code when debugging, I see all kinds of weird crap between my nice code. It feels so out of place, like it's not my code. Also, their implementation feels weird and unintuitive, so I've never learned how to create even the simplest macro.
Title: Re: EXE Jump Tables
Post by: dedndave on June 08, 2009, 05:53:29 PM
Quote
I started the macro hating trend, so I will respond.
that's not fair - i have been no big fan of them for a long time - lol
actually, macros can be your friend
the problems stem from using macros created by someone other than yourself
some of them are helpful in getting a program up and running
but, i would prefer to replace many of them with my own code at the end of the day
this is true for any kind of program that i intend to distribute
for programs of my own use, or for forum discussion/distribution - the macros are great
we all speak the same "macro" language in here
i also have to say - i have learned a lot by looking at how the macros were written
Title: Re: EXE Jump Tables
Post by: rags on June 09, 2009, 12:22:55 AM
Thanks Hutch and Sinsi for the explanations. :U
Title: Re: EXE Jump Tables
Post by: hutch-- on June 09, 2009, 12:28:55 AM
BlackVortex,

There is a trick to it: read the documentation for the macro, look at how it's written, and if you don't like it, improve it. The action with macros is multifold: at the simplest it's just a shortcut to get something done; at its most sophisticated it puts the programmer in charge of language design without some compiler designer holding your hand, telling you what you can and cannot do.
Title: Re: EXE Jump Tables
Post by: BlackVortex on June 09, 2009, 01:07:56 AM
Quote from: hutch-- on June 09, 2009, 12:28:55 AM
BlackVortex,

There is a trick to it: read the documentation for the macro, look at how it's written, and if you don't like it, improve it.
Touché   :thumbu

But I don't really need it, using procs is enough for me. The last thing I need is more red tape.
Title: Re: EXE Jump Tables
Post by: Vortex on June 09, 2009, 10:00:55 AM
Hi BlackVortex,

This document from MS can help you :

MASM Programmer's Guide - Chapter Nine: Using Macros (http://webster.cs.ucr.edu/Page_TechDocs/MASMDoc/ProgrammersGuide/Chap_09.htm)
Title: Re: EXE Jump Tables
Post by: dedndave on June 11, 2009, 10:59:05 PM
the Relat routine fixes INVOKE CALLs to always be relative
it has been adapted to work with the "Vortex" method, as well

        INCLUDE   \masm32\include\masm32rt.inc

        EXTERNDEF _imp__GetCurrentProcess@0:PTR pr0

        .CODE

;-----------------------------------------------------------------------------

_main   PROC

;modify the addresses

        mov     esi,offset LabelA
        call    Relat

        mov     esi,offset LabelB
        call    Relat

;test the functions after modification

        call    Test1

        exit

_main   ENDP

;-----------------------------------------------------------------------------

Test1   PROC

        INVOKE  GetCurrentProcess
LabelA  label   dword

        print   uhex$(eax),13,10

        INVOKE  _imp__GetCurrentProcess@0
LabelB  label   dword

        print   uhex$(eax),13,10
        ret

Test1   ENDP

;-----------------------------------------------------------------------------

Relat   PROC

;Adjust the CALL address of an INVOKE to eliminate the IAT JMP

;Call With: ESI = address of code just after the INVOKE

;  Returns: modifies the address of the INVOKE

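        sub     esi,6                   ;ESI = start of the 6-byte window before the label
        push    esi                     ;save it
        sub     esp,4                   ;room for the old protection flags
        INVOKE  VirtualProtect,
                esi,
                6,
                PAGE_EXECUTE_READWRITE,
                esp                     ;lpflOldProtect -> the reserved dword
        pop     edx                     ;EDX = old protection
        pop     esi                     ;ESI = start of the window again
        or      eax,eax                 ;VirtualProtect failed?
        jz      Relat3                  ;yes - leave the code alone

        cld
        lodsw                           ;AL,AH = first two bytes of the window
        cmp     ah,0E8h                 ;5-byte CALL rel32 (to the IAT JMP stub)?
        jz      Relat0

        cmp     ax,15FFh                ;6-byte CALL DWORD PTR [slot]?
        lodsd                           ;EAX = slot address, ESI = end (flags unchanged)
        jnz     Relat2                  ;neither form - restore protection and leave

        mov word ptr [esi-6],0E890h     ;rewrite FF 15 as NOP, CALL rel32
        jmp short Relat1

Relat0: lodsd                           ;EAX = rel32, ESI = end of the CALL
        add     eax,esi                 ;EAX = address of the JMP stub
        push    esi
        xchg    eax,esi                 ;ESI = stub
        lodsw                           ;AX = stub opcode bytes
        cmp     ax,25FFh                ;JMP DWORD PTR [slot]?
        lodsd                           ;EAX = slot address
        pop     esi                     ;ESI = end of the CALL
        jnz     Relat2                  ;not a stub - leave the CALL alone

Relat1: mov     eax,[eax]               ;EAX = API address from the IAT
        sub     eax,esi                 ;rel32 = target - address of next instruction
        mov     [esi-4],eax             ;store the new displacement

Relat2: sub     esi,6                   ;back to the start of the window
        sub     esp,4                   ;room for the (discarded) previous flags
        INVOKE  VirtualProtect,
                esi,
                6,
                edx,
                esp                     ;restore the old protection
        add     esp,4                   ;drop the reserved dword

Relat3: ret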

Relat   ENDP

;-----------------------------------------------------------------------------

        END     _main

Title: Re: EXE Jump Tables
Post by: dedndave on June 13, 2009, 02:38:38 AM

Before modification:
      INVOKE GetCurrentProcess
Address: 00401419 Code: E8 000001FA     CALL           00401618
Address: 00401618 Code: FF 25 00402000  JMP  DWord Ptr [00402000]
Address: 00402000 Data: 7C80E00D

      INVOKE _imp__GetCurrentProcess@0
Address: 00401447 Code: FF 15 00402000  CALL DWord Ptr [00402000]
Address: 00402000 Data: 7C80E00D


After modification:
      INVOKE GetCurrentProcess
Address: 00401419 Code: E8 7C40CBEF     CALL           7C80E00D

      INVOKE _imp__GetCurrentProcess@0
Address: 00401447 Code: 90              NOP
Address: 00401448 Code: E8 7C40CBC0     CALL           7C80E00D


Function Test Results:
        GetCurrentProcess: FFFFFFFF
_imp__GetCurrentProcess@0: FFFFFFFF

i must be the only one that thinks this is cool as hell - lol
