Title: EXE Jump Tables Post by: dedndave on May 29, 2009, 05:51:54 PM do any of you experienced programmers have a simple method of eliminating the need for jump tables at the end of an EXE ?
i am guessing they are inserted by the assembler ? Title: Re: EXE Jump Tables Post by: jj2007 on May 29, 2009, 06:22:44 PM Good question indeed. They don't look very efficient. Maybe it's a noob question: how does the assembler "know" the memory location of GetStdHandle, i.e. 7C812F3A? Is it guaranteed to be the same in all versions of Windows??
Code:
0040106A   . 6A F5           push -0B                                        ; StdHandle = STD_OUTPUT_HANDLE
0040106C   . E8 A7000000     call <jmp.&kernel32.GetStdHandle>
...
00401118   $ FF25 74114000   jmp near dword ptr [<&kernel32.GetStdHandle>]
...
GetStdHandle 8BFF            mov edi, edi                                    ; HANDLE kernel32.GetStdHandle(StdHandle)
7C812F3B     55              push ebp
7C812F3C     8BEC            mov ebp, esp

Title: Re: EXE Jump Tables Post by: PBrennick on May 29, 2009, 06:31:06 PM
GetStdHandle, for example, has its address stored in a lookup table (the export table) in kernel32.dll, so even though there are differing versions of that DLL, the address, which can vary, will always be found.
Paul

Title: Re: EXE Jump Tables Post by: BogdanOntanu on May 29, 2009, 06:31:52 PM
Quote from: dedndave on May 29, 2009, 05:51:54 PM
do any of you experienced programmers have a simple method of eliminating the need for jump tables at the end of an EXE ?

Simple? Well... NO. Possible? Yes. See here for a possible solution: http://www.masm32.com/board/index.php?topic=6519.0;topicseen
The MS libs contain two kinds of API "glue" code: one that will generate a jump table and one that will call "directly". The jump table is more efficient for many reasons. AFAIK there are other older threads about this issue also.

Quote: i am guessing they are inserted by the assembler ?

No. Usually this jump table is generated / inserted by the linker when it links externs / APIs from LIBs. However, my Sol_Asm assembler does generate the jump tables :D but this is the exception that confirms the above rule.

Title: Re: EXE Jump Tables Post by: dedndave on May 29, 2009, 06:38:33 PM
well - i read many of the other related threads - i did not see anything specific about eliminating the tables
much ado about why they are there, however
i seem to recall something someplace about eliminating them
and, yes, i can see how a program may load faster with the tables
once it has loaded, however, i would have to think it would be faster without them
as always, the best solution is probably a hybrid, where functions that are speed-critical are referenced without tables and functions that are used several times, but are not speed-critical use the tables (similar to procs vs macros argument)
EDIT
@JJ - i don't think it matters if they are the same or not
perhaps running under one OS, they are one value and under a different OS, a different value
from what i gather, they are externals that are unresolved until run-time
surely, if you reference a function 100 different places in the code, the table may be a good way to go
that way, the OS only has to set the value one time when the program is loaded

Title: Re: EXE Jump Tables Post by: BogdanOntanu on May 29, 2009, 06:48:50 PM
Quote from: jj2007 on May 29, 2009, 06:22:44 PM
Good question indeed.

In fact a relatively irrelevant question unless you are writing a compiler.

Quote: They don't look very efficient.

It depends on the "angle" of view.
- Each "direct" call is still "indirect" in fact from the CPU's point of view.
- Each "direct" call is 1 byte longer than a call through the jump table and this adds up when you use a lot of APIs in your code.
- Each "direct" call needs relocations and more solving by the linker and thus makes assembly / linking slower and DLLs bigger.
- Indirect calls allow you an extra central "hooking" location that can be useful with portable applications and other OSes (GOT/PLT-like / ready).

Quote: Maybe it's a noob question: how does the assembler "know" the memory location of GetStdHandle, i.e. 7C812F3A? Is it guaranteed to be the same in all versions of Windows??

It does not. IF you generate an OBJ (most common) then the assembler leaves this task to the linker.
The linker uses the information in the LIBs to glue together a jump table or "direct" indirect calls. Both methods make reference to the IAT structure in the PE specification, and further fixing is deferred to the OS loader.
The OS loader KNOWS where the API address is in the current OS version / layout. The OS loader loads the PE executable and patches the IAT with the correct address, and since the jump table and the "direct" calls both reference the IAT, this "magically" and finally solves the problem.
DLLs need further relocation solving but everything else is the same.

Quote:
Code:
0040106A   . 6A F5           push -0B                                        ; StdHandle = STD_OUTPUT_HANDLE
0040106C   . E8 A7000000     call <jmp.&kernel32.GetStdHandle>
...

This "jmp.&Kernel32.getStdhandle" is in fact "jmp [iat.dll_01.function_01]" UNTIL the OS loader fixes the correct address, and Olly is kind enough to show you friendly names. However, Olly does this after the OS loader has performed its job. Check with a hex editor / disassembler to see that in the "cold" executable the values are different.

Title: Re: EXE Jump Tables Post by: dedndave on May 29, 2009, 06:59:23 PM
i knew i saw it someplace - it was in Hutch's scrap-book of source code
the method was devised by EliCZ - and it does not look all that simple - lol
i may have a go, just as a learning experience
the first download on the page....
http://movsd.com/source.htm

Title: Re: EXE Jump Tables Post by: BogdanOntanu on May 29, 2009, 07:05:22 PM
Quote from: dedndave on May 29, 2009, 06:38:33 PM
well - i read many of the other related threads - i did not see anything specific about eliminating the tables

The info is there somewhere... I do not recall the more detailed threads exactly but this kind of subject pops in and out periodically in the advanced sections.
Basically IF you import your API functions with specific names like this: __imp__ExitProcess@4 THEN the linker selects glue code that does not generate a jump table (if the LIB does provide such glue code). The exact details might be slightly different but this is the idea.

Quote: and, yes, i can see how a program may load faster with the tables

There are fewer places to fix at linking (only the jump table) and fewer relocations at DLL loading time, and the jump table is slightly smaller than direct calls when there are a lot of API calls (the common case).

Quote: once it has loaded, however, i would have to think it would be faster without them

Oh well... debatable, but anyway when you call an API "speed" is no longer of the essence.

Quote: as always, the best solution is probably a hybrid, where functions that are speed-critical are referenced without tables and functions that are used several times, but are not speed-critical use the tables (similar to procs vs macros argument)

The jump table is generated ONLY for the API and not for your own functions. None of the API calls should be considered speed critical.

Quote: surely, if you reference a function 100 different places in the code, the table may be a good way to go that way, the OS only has to set the value one time when the program is loaded

The OS loader only fixes one value anyway...
and this value is the function address in IAT PE's table (not the jump table). For DLL's indeed there are more relocations to fix (only if DLL is relocated). One relocation fix is needed for each direct call in code. Title: Re: EXE Jump Tables Post by: jj2007 on May 29, 2009, 07:08:07 PM Very nicely explained, thanxalot, Bogdan :U
Title: Re: EXE Jump Tables Post by: Vortex on May 29, 2009, 07:18:15 PM The trick to generate direct calls is based on the declaration of external symbols :
Code:
EXTERNDEF _imp__ExitProcess@4:PTR pr1
ExitProcess EQU <_imp__ExitProcess@4>

EXTERNDEF _imp__GetCommandLineA@0:PTR pr0
GetCommandLine EQU <_imp__GetCommandLineA@0>

EXTERNDEF _imp__GetModuleHandleA@4:PTR pr1
GetModuleHandle EQU <_imp__GetModuleHandleA@4>

EXTERNDEF _imp__CreateWindowExA@48:PTR pr12
CreateWindowEx EQU <_imp__CreateWindowExA@48>

pr0, pr1, pr2, pr3 etc. are defined in windows.inc
The creation of the EXTERNDEFs above is automated with Scan.exe
[attachment deleted by admin]

Title: Re: EXE Jump Tables Post by: dedndave on May 29, 2009, 07:23:40 PM
ahhhh - cool
let me play with that one, too, Vortex - thank you Title: Re: EXE Jump Tables Post by: jj2007 on May 29, 2009, 09:05:48 PM Cool indeed, Vortex - thanks. So that is how the crt_ imp stuff was created.
If I understand correctly, placing the call in the jmp table makes sense for calls that are used more than a few times. So is there a good reason to place GetCommandLine and ExitProcess there? Title: Re: EXE Jump Tables Post by: dedndave on May 29, 2009, 09:22:13 PM i think i will use a selective approach
there are very few instances where i want to reduce overhead as much as possible these are primarily timing and thread synchronization related functions i understand that system calls are inherently long-winded, but there is no reason i can't try to get the most out of it other than that, the tables are probably a much better deal i am even open to using both methods for a function if i think it is the best approach for example, if i have a function in one critical spot - i want no table branch i use the same function several other places - use the table for those now, if i can just get access to the function-name strings for an error routine, i will be a happy camper - lol i think i can figure that one out for myself yes - thank you Bogdan and Vortex both that is exactly what i was looking for Vortex - very simple i like the growing window thingy too - lol Title: Re: EXE Jump Tables Post by: mitchi on May 29, 2009, 09:30:02 PM Very interesting read Bogdan. I've learned a few things here! :bg
What exactly are the relocations you talk about? Vortex : Nice tool, nice explanation. So it's really all about the symbols declared in the obj :) Title: Re: EXE Jump Tables Post by: dedndave on May 30, 2009, 02:48:13 AM Jochen -
Quote: If I understand correctly, placing the call in the jmp table makes sense for calls that are used more than a few times. So is there a good reason to place GetCommandLine and ExitProcess there? the answer is here, i suspect... Quote: The creation of the EXTERNDEFs above is automated with Scan.exe it is fairly simple to create them manually - or let the Scan.exe make the IMP file and remove the unwanted ones btw - it seems imperative to use PoLink Title: Re: EXE Jump Tables Post by: PBrennick on May 30, 2009, 03:58:45 AM In this case, yes. PoLink gives you more latitude to do such things. Especially libraries. A lot of the things that are done in the installation of the GeneSys SDK rely on such latitude and Vortex is the one I thank for that. He has put a lot of effort into being a toolmaker. It would probably be a good idea to explore his other tools, also. They are pretty fantastic.
Paul Title: Re: EXE Jump Tables Post by: dedndave on May 30, 2009, 04:07:49 AM funny thing you should mention it Paul
i had just added his site to my bookmarks - lol
there are a lot of nice toys in there - not only for general use, but for learning (which is where i am)

Title: Re: EXE Jump Tables Post by: jj2007 on May 30, 2009, 05:53:11 AM
Quote from: dedndave on May 30, 2009, 02:48:13 AM
btw - it seems imperative to use PoLink

Not sure what you mean. Code below assembles & links fine with link.exe and polink.exe...
include \masm32\include\masm32rt.inc
EXTERNDEF _imp__ExitProcess@4:PTR pr1

.code
start:
; invoke ExitProcess, 0
invoke _imp__ExitProcess@4, 0
end start

Title: Re: EXE Jump Tables Post by: dedndave on May 30, 2009, 06:30:18 AM
ahh - it must be the includes - i have a small program i am working on
my only includes are.... include \masm32\include\windows.inc include \masm32\include\kernel32.inc includelib \masm32\lib\kernel32.lib i was trying to write some of the basic functions with no crt or masm32 files - lol i tried the method in there and get unresolved external with link anyways - that is a very neat technique Title: Re: EXE Jump Tables Post by: Vortex on May 30, 2009, 07:01:17 AM Hi Jochen,
Quote: Cool indeed, Vortex - thanks. So that is how the crt_ imp stuff was created. If I understand correctly, placing the call in the jmp table makes sense for calls that are used more than a few times. So is there a good reason to place GetCommandLine and ExitProcess there? In my opinion, all the calls should be placed in the jump table. It's practical for daily programming. It would be interesting to make include files generating direct calls. Polink is not the only option. It's my favourite MS COFF linker. MS link.exe can be used too. Title: Re: EXE Jump Tables Post by: hutch-- on May 30, 2009, 07:21:17 AM The answer to the question is contained in the masm32 project. Look in "tools\l2extia\" read the text file and how to use the exe file to create as many of your own include files as you need. This allows you to use the less efficient direct call form in the binary output code.. For what its worth the jump table is more efficient.
Title: Re: EXE Jump Tables Post by: sinsi on May 30, 2009, 07:27:52 AM If you have multiple calls to an API in a proc, it is nice to be able to load a register from the import dword and invoke using that register, that way you get the checking that invoke uses (and the code is smaller).
Title: Re: EXE Jump Tables Post by: dedndave on May 30, 2009, 07:33:42 AM i don't understand what you mean sinsi
Title: Re: EXE Jump Tables Post by: jj2007 on May 30, 2009, 07:34:03 AM Quote from: BogdanOntanu on May 29, 2009, 06:31:52 PM The jump table is more efficient for many reasons. Quote from: hutch-- on May 30, 2009, 07:21:17 AM For what its worth the jump table is more efficient. I hear the message but I don't get it. Why is a call plus a jmp, e.g. for ExitProcess, more efficient than a call without a jmp? Because the linker and/or the OS loader need a few nanoseconds less? That can't be the reason... Title: Re: EXE Jump Tables Post by: dedndave on May 30, 2009, 07:37:12 AM well - i can see it if the function is used several times - well - more than 2, let's say
Title: Re: EXE Jump Tables Post by: sinsi on May 30, 2009, 07:55:56 AM dedndave, here's what I meant
Code:
prwsprintf TYPEDEF PROTO C :DWORD, :VARARG
pwsprintf TYPEDEF PTR prwsprintf
EXTERNDEF _imp__wsprintfA:pwsprintf
wsprintf TEXTEQU <_imp__wsprintfA>
...
mov esi,wsprintf
assume esi:pwsprintf
invoke esi,blah,blah
...
invoke esi,blah,blah,blah
...
ret
assume esi:nothing

Of course, that was my noob days, now I push/push/call like a real asm programmer :bdg
I asked about this once before here - http://www.masm32.com/board/index.php?topic=5486.15

Title: Re: EXE Jump Tables Post by: jj2007 on May 30, 2009, 08:17:48 AM
Quote from: dedndave on May 30, 2009, 07:37:12 AM
well - i can see it if the function is used several times - well - more than 2, let's say

5*5+6=31
5*6=30
More than 5...

invoke ExitProcess, 0
Quote:
00401001   ? 6A 00           push 0
00401003   ? E8 00000000     call <jmp.&kernel32.ExitProcess>
00401008   ? FF25 40104000   jmp near dword ptr [<&kernel32.ExitProcess>]

The red bytes are the offset :bg

Title: Re: EXE Jump Tables Post by: dedndave on May 30, 2009, 07:08:01 PM
ahhh - that's a good one to know about also sinsi - thanks
EDIT - is call/jmp from a register faster than immediate ? Title: Re: EXE Jump Tables Post by: mitchi on May 30, 2009, 07:40:40 PM The Visual C++ optimizer does that with ESI or EDI when you call the same function a lot of times...
Since they have contacts with the Intel guys and AMD guys, I assume that it's a bit faster. Title: Re: EXE Jump Tables Post by: dedndave on May 30, 2009, 08:28:29 PM lol @ "contacts" - they see each other every night at bedtime
Title: Re: EXE Jump Tables Post by: dedndave on May 30, 2009, 09:25:38 PM at the location of the invoke is a CALL relative
that branches to a JMP dword ptr [nnnnnnnn] indirect
the value at that nnnnnnnn address is the address of the api function
this code works
        mov     esi,labelA-4
        mov     esi,labelA[esi+2]
        mov     eax,[esi]
        call    eax
        exit

        INVOKE  GetCurrentProcess
labelA  label   dword

but this code does not work
        jmp short test01

test00: INVOKE  GetCurrentProcess
labelA  label   dword
        exit

test01: mov     esi,labelA-4
        mov     esi,labelA[esi+2]
        mov     eax,[esi]
        sub     eax,offset labelA
        mov     labelA-4,eax
        jmp     test00

just like microsoft - take me right up to the point of almost an orgasm, then show me a picture of rosie o'donnell and spray cold water on me

Title: Re: EXE Jump Tables Post by: hutch-- on May 31, 2009, 01:09:46 AM
There were always alternatives, GetProcAddress gives you a callable DWORD address but you can cut another corner, copy the API to local app memory set to execute and run the API within your own app. On win9x systems you got a speed increase, don't know about NT based versions.
The reason why I never lost much sleep over it is most API calls are so slow that a million cycles here and there don't matter much and you lose nothing like that much with address call variations. Title: Re: EXE Jump Tables Post by: dedndave on May 31, 2009, 01:21:48 AM i suspect that would get you a security violation with win2K or higher, Hutch
but that would be a nice technique to see how some of the functions work Title: Re: EXE Jump Tables Post by: hutch-- on May 31, 2009, 01:36:40 AM Nah, thats not the problem, if you can call an address you can also copy from it, its more to do with how the internals of the OS work, API calls the NTDLL.DLL, some procedures within that call even lower level DLLs so the best you can get from it is one level reduction in the call layers. Back in the win9x days the technique seemed to work best on GDI calls but that was for a simple reason, a lot of Win9x GDI was written in MASM.
Title: Re: EXE Jump Tables Post by: dedndave on May 31, 2009, 03:46:08 AM ok - i isolated the bad guy....
Quote: but this code does not work
        jmp short test01

test00: INVOKE  GetCurrentProcess
labelA  label   dword
        exit

test01: mov     esi,labelA-4
        mov     esi,labelA[esi+2]
        mov     eax,[esi]
        sub     eax,offset labelA
        mov     labelA-4,eax
        jmp     test00

this line assembles fine, but crashes the program
        mov     labelA-4,eax
i made a temporary work-around by placing the "labelA-4" address in esi
then mov [esi],eax
that crashes also
i am working on that
now, i need somebody really sharp to tell me about re-based PE's - lol
Vortex maybe ?
here is my question....
if the PE gets re-based at load-time, do these tables become far jumps ?
and another question.....
is there a way to force an exe to be re-based for testing purposes ?

Title: Re: EXE Jump Tables Post by: Neo on May 31, 2009, 06:27:11 AM
This is a bit of a tangent, but with the assembler I've built into Inventor IDE (http://www.codecortex.com/ide/), I don't generate jump tables, since despite Bogdan's explanation of efficiency:
All that said, I really don't think there's a huge performance difference (since imported functions are usually pretty lengthy anyway), but if I had to choose, my money would be on that not using jump tables is slightly more efficient overall. Anyone up for a tough performance testing challenge? :wink P.S. There are currently other issues with importing libraries other than kernel32/user32/gdi32 in Inventor IDE (it is just an alpha after all), so it's far from perfect. I'm just using it as an example w.r.t. jump tables versus no jump tables. Title: Re: EXE Jump Tables Post by: BogdanOntanu on May 31, 2009, 07:04:14 AM Quote from: Neo on May 31, 2009, 06:27:11 AM This is a bit of a tangent, but with the assembler I've built into Inventor IDE (http://www.codecortex.com/ide/), I don't generate jump tables, since despite Bogdan's explanation of efficiency: It is my own preference to have jump tables. I do not claim "efficiency" at run time. I claim that it does not matter much at run time. I would add an option to disable jump table generation for my own assembler if this makes users happy. Quote:
Yes this is true but not related to relocations... it is related to the code size of a relative jump versus and absolute indirect jump. Quote:
But it is the default and needed for DLL's. Besides run-time or load time relocations there is another kind of relocations that are generated inside the OBJ. Unlike the run-time kind of relocations those kind of compile time relocations are mandatory if you generate OBJ's and link multiple modules. It will take one such relocation for each API call in an executable with no jump table. With jump tables it will only take one for each API used. Hence compilation and linking speed is helped here and this was my primary concern since I create huge ASM projects. Quote: ... Plus, relocations are only resolved upon starting the application, whereas the extra jump is done every time an import is called. Each import appears only once in the Import Address Table and Import Lookup Table, regardless of whether there are any relocations or not. Yes, true but once you call an API speed is no longer of the essence. Yes each import only appears once in the IAT table BUT each direct call requires an run time relocation (in a DLL). Quote:
Ok, this is nice advertising for your Inventor IDE... I will check it out. Is it written in full ASM?
FYI Sol_Asm does not require any kind of libs when directly creating an Executable/DLL/binary. Neither does FASM or NASM AFAIK etc... In fact neither does MASM for generating the OBJ... The libs are only needed by the linker when it links multiple OBJs. This feature is in no way related to the subject.
However, an assembler that can NOT produce OBJs to be linked together by a linker is missing a huge feature. "Most" professional projects out there involve generating OBJs and then linking them together to create the final executable.
After all, the jump table method also calls through the very same import table. It is the API calls in code that are relative in one case and absolute indirect in the other, but both methods reach the very same IAT in the end.

Quote: All that said, I really don't think there's a huge performance difference (since imported functions are usually pretty lengthy anyway), but if I had to choose, my money would be on that not using jump tables is slightly more efficient overall.

I prefer jump tables because the run time speed improvement is not worth it in this case, the size of the executable/DLL is potentially smaller, the compilation speed is higher, the load time is faster and the OBJ size is smaller.
If I want speed then I choose better algorithms and write my own functions to reduce the APIs' overhead, but I do not try to optimize every opcode/byte/cycle.
However this is my personal preference.

Title: Re: EXE Jump Tables Post by: BogdanOntanu on May 31, 2009, 07:27:59 AM
Quote from: dedndave on May 31, 2009, 03:46:08 AM
... now, i need somebody really sharp to tell me about re-based PE's - lol Vortex maybe ? here is my question.... if the PE gets re-based at load-time, do these tables become far jumps ? and another question..... is there a way to force an exe to be re-based for testing purposes ?
By "re-based" I guess you mean relocated at run time. There is another tool named exactly "rebase" that can "cold" change the preferred load address of an executable or DLL after compile time.

Quote: if the PE gets re-based at load-time, do these tables become far jumps ?

No. Everything that is absolute must be relocated in this case but the jumps remain "near". There is no use for "far" jumps in normal user mode win32 programming. Everything is near in flat protected mode (win32), but some addresses in code are absolute (not relative) and those addresses need to be changed IF the base address is changed.

Quote: is there a way to force an exe to be re-based for testing purposes ?

EXEs are rarely (if ever) relocated in Win32. Only if they are DLLs in disguise or plugins to be loaded by another EXE. The default load / base address of a PE EXE is normally free at the EXE's load time.
However, DLLs are often relocated because you can not be sure of the load order and memory position of all DLLs needed for an EXE / process.
One way to force a run time relocation to occur is to have 2 DLLs compiled for the very same preferred base address and then load them by hand one after another. The second one must be relocated by the OS loader because its address space is already occupied by the first DLL.
Another method would be to compile an EXE for a preferred address (other than the default 0x40_0000) that is already in use by the OS.
If the PE EXE has run time relocations stored inside, then you can use that "re-base" tool to change its base address even after compile time.
Title: Re: EXE Jump Tables Post by: jj2007 on May 31, 2009, 07:40:03 AM
Quote from: dedndave on May 30, 2009, 09:25:38 PM
at the location of the invoke is a CALL relative that branches to a JMP dword ptr [nnnnnnnn] indirect the value at that nnnnnnnn address is the address of the api function

Here is the simplest variant for calling with a register:
Code:
include \masm32\include\masm32rt.inc

.code
start:
    mov esi, MessageBox
    push MB_OK
    push chr$("Hello")
    push chr$("Called via esi")
    push 0
    call esi
    exit
end start

Slightly more sophisticated:
Code:
include \masm32\include\masm32rt.inc

MBox = 0
Exit = 4

.data
MyJumpTable dd MessageBox, ExitProcess

.code
start:
    mov esi, offset MyJumpTable
    push MB_OK
    push chr$("Hello")
    push chr$("Called via esi")
    push 0
    call dword ptr [esi+MBox]
    push 0
    call dword ptr [esi+Exit]
end start

But whether that is more efficient... no idea

Title: Re: EXE Jump Tables Post by: UtillMasm on May 31, 2009, 08:16:25 AM
:U
very clean, i like this more:
and like radasm msg jump table too.
Code:
comment #
@echo off
\masm32\bin\ml.exe /c /coff /Focall2.obj /nologo call2.asm
\masm32\bin\link.exe /subsystem:windows /out:call2.exe call2.obj /nologo
pause
#
include \masm32\include\masm32rt.inc
MBox=0
Exit=4
.data
MyJumpTable dd MessageBox,ExitProcess
.code
start:mov esi,offset MyJumpTable
push MB_OK
push chr$("Hello")
push chr$("Called via esi")
push 0
call dword ptr[esi+MBox]
push 0
call dword ptr[esi+Exit]
end start

:wink

Title: Re: EXE Jump Tables Post by: Vortex on May 31, 2009, 08:37:08 AM
Hi dedndave,
Quote: is there a way to force an exe to be re-based for testing purposes ? You would like to have a look at the thread Loading and running EXEs and DLLs from memory (http://www.masm32.com/board/index.php?topic=3150.0) The EXE\DLL is loaded to a memory address allocated by VirtualAlloc Title: Re: EXE Jump Tables Post by: hutch-- on May 31, 2009, 08:50:36 AM First, there is more to an API call than the difference between a direct call in code and an indirect call through an address table. A call outside the running app memory space is measurably slower than an internal call where a direct JMP to an address usually is not. For the indirect method you get a fast call and a fast JMP, with the direct method you get a slow call. The example I have picked is SendMessageA which gets bashed in an app a massive number of times which justifies it being placed in an address table to save space and usually be in cache.
I doubt you could successfully benchmark the difference but indirect calls never went slower than the direct call.
Code:
; -------------------------------------------------------------------------
004011FB 6A00                   push    0
004011FD 6A02                   push    2
004011FF 6811010000             push    111h
00401204 FF3550304000           push    dword ptr [403050h]
0040120A E86D000000             call    jmp_SendMessageA

jmp_SendMessageA:
                                jmp     dword ptr [SendMessageA]
; -------------------------------------------------------------------------
00401212 6A00                   push    0
00401214 6A02                   push    2
00401216 6811010000             push    111h
0040121B FF3550304000           push    dword ptr [403050h]
00401221 FF1518204000           call    dword ptr [SendMessageA]
; -------------------------------------------------------------------------

Title: Re: EXE Jump Tables Post by: jj2007 on May 31, 2009, 09:42:06 AM
Since we are all in brainstorming mode now, here is one more idea to play with:
Code:
include \masm32\include\masm32rt.inc

MBox = 121
Exit = 0

.data
MyJumpTable dd ExitProcess
    dd 120 dup(0)   ; 120 slots for other API's
    dd MessageBox

.data?
RetAdd dd ?
ChkEsp dd ?

.code
start:
    mov ChkEsp, esp
    mov esi, Scheduler
    push MB_OK
    push chr$("Hello")
    push chr$("Called via esi")
    push 0
    push MBox       ; MBox = 121
    call esi        ; works but is only one byte shorter
    invoke MessageBox, 0, chr$("The conventional way"), chr$("Title"), MB_OK
    sub ChkEsp, esp
    MsgBox 0, str$(ChkEsp), "Esp diff=0?", MB_OK
    push 0          ; ret 0
    push 0          ; Exit
    call esi

Scheduler proc
    pop RetAdd
    pop eax
    lea eax, [MyJumpTable+4*eax]
    call dword ptr [eax]
    jmp RetAdd
Scheduler endp

end start

It works, it's probably utterly slow, but for code size freaks it might be interesting :bg

Title: Re: EXE Jump Tables Post by: dedndave on May 31, 2009, 10:28:58 AM
Quote: this code works
        mov     esi,labelA-4          ;get the relative address from INVOKE
        mov     esi,labelA[esi+2]     ;get the address part of the indirect JMP
        mov     eax,[esi]             ;get the API target refered to in the JMP
        call    eax
        exit

        INVOKE  GetCurrentProcess
labelA  label   dword

notice that the IAT method takes:
 4 bytes in the INVOKE code
 6 bytes for the indirect JMP
 4 more bytes for the target
-------------------
14 bytes total

and, while it may be true that "CALL reg" direct may be faster than "CALL near rel"
let's not forget that we have to get the target address into the register to begin with
i think, overall, the fastest would be "CALL near rel" (E8 nn nn nn nn) which is what the INVOKE currently uses
if we eliminate the IAT table, as well as the target address, we reduce the byte-count by 10
reducing bytes is nice, but let's face it, not a big issue with todays storage sizes
if i have 100 different API calls, that's only 1 KB - not an issue
the only problem i am having at the moment is that the OS will not let me over-write the 4 bytes in the CALL instruction of the INVOKE sequence
i suspect that this is a write protection fault, for obvious security reasons
because i intend
to replace the operand in only a few select places, i can work around this by using something other than an INVOKE or CALL need be, i can hard code it like this: db 0E8h labelB db 4 dup(?) and fill it in during initialization while it is true that most API calls are slow to begin with, there are a few that are relatively fast i would have to think that QueryPerformanceCounter is fairly fast, as an example, because there isn't a lot of decision-making to be done - just gimme 2 dword values as i mentioned before, i am interested in synchronizing threads with the "highest resolution possible" i am trying to develop a technique for timing evaluation code on single/multi core machines the idea is, to have one thread perform the timing operation, while another thread runs the code the eval code thread needs to be ready to run, then once the time-keeping thread has read it's initial timer value, it will release the eval code thread for execution the reason for the dual-thread method is that some machines have more than one core on those machines, the TSC needs to be run with a process affinity mask of only one selected core the eval thread can be run with all cores selected, or whatever the test calls for i am trying to keep the overhead of the SetProcessAffinityMask function out of the evaluation measurement Title: Re: EXE Jump Tables Post by: MichaelW on May 31, 2009, 11:19:59 AM I’m not sure that all of this is correct. I selected PostMessage instead of SendMessage because PostMessage returns immediately without waiting for the window procedure to process the message. If the cycle count is not more than a few hundred cycles my P3 normally returns very consistent counts. I can’t get consistent results here, partly because the cycle counts are too high, and I think partly because the called function has a variable execution time. In any case, under Windows 2000 I can see no significant difference (or if there is, it's smaller than the variation).
Code:
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    include \masm32\include\masm32rt.inc
    .686
    include \masm32\macros\timers.asm
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    .data
        hwndTarget dd 0
        itotal     dd 0
        dtotal     dd 0
    .code
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
start:
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    EXTERNDEF _imp__PostMessageA@16:NEAR PTR
    invoke FindWindow, NULL, chr$("TARGET")
    mov hwndTarget, eax
    print ustr$(hwndTarget),13,10,13,10
    .IF hwndTarget
      nops 3
      push 0
      push 0
      push WM_NULL
      push hwndTarget
      call _imp__PostMessageA@16
      nops 3
      invoke PostMessage, hwndTarget, WM_NULL, 0, 0
      nops 3
      print "direct indirect",13,10
      print "------ -------- ",13,10
      invoke Sleep, 4000
      REPEAT 20
        counter_begin 1000, REALTIME_PRIORITY_CLASS
          push 0
          push 0
          push WM_NULL
          push hwndTarget
          call _imp__PostMessageA@16
        counter_end
        add dtotal, eax
        print ustr$(eax),9
        counter_begin 1000, REALTIME_PRIORITY_CLASS
          invoke PostMessage, hwndTarget, WM_NULL, 0, 0
        counter_end
        add itotal, eax
        print ustr$(eax),13,10
      ENDM
      print "------ -------- ",13,10
      print ustr$(dtotal), 9
      print ustr$(itotal),13,10,13,10
    .ENDIF
    inkey "Press any key to exit..."
    exit
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
end start

Code:
00401048 90             nop
00401049 90             nop
0040104A 90             nop
0040104B 6A00           push 0
0040104D 6A00           push 0
0040104F 6A00           push 0
00401051 FF3500504000   push dword ptr [405000h]
00401057 FF1534404000   call dword ptr [PostMessageA]
0040105D 90             nop
0040105E 90             nop
0040105F 90             nop
00401060 6A00           push 0
00401062 6A00           push 0
00401064 6A00           push 0
00401066 FF3500504000   push dword ptr [405000h]
0040106C E827270000     call fn_00403798
00401071 90             nop
00401072 90             nop
00401073 90             nop
. . .
00403798 fn_00403798:
00403798 FF2534404000   jmp dword ptr [PostMessageA]

Typical results on my P3:

Code:
direct  indirect
------  --------
1332    1206
1202    1198
1191    1204
1189    1192
1192    1206
1192    1199
1190    1252
1189    1196
1210    1202
1190    1192
1197    1209
1201    1196
1190    1201
1188    1193
1190    1206
1190    1192
1191    1201
1187    1191
1192    1203
1190    1191
------  --------
23993   24030

Title: Re: EXE Jump Tables Post by: dedndave on May 31, 2009, 11:28:13 AM dual-core prescott....
13828614

direct  indirect
------  --------
32608   32794
32285   33110
33037   33008
32435   33149
33087   32608
32551   33496
32165   28947
31687   31494
29901   31369
31580   30794
31328   31350
30840   30946
31516   30736
31468   31378
30782   31489
31116   31100
30823   31301
31092   32632
33431   31313
30686   30589
------  --------
634418  633603

as you can see, my numbers are slightly higher - lol
sumpin's not right, here - this dual-core @ 3 Ghz performs fairly well
Title: Re: EXE Jump Tables Post by: MichaelW on May 31, 2009, 11:38:15 AM You're running XP? My numbers were for 2000.
Title: Re: EXE Jump Tables Post by: dedndave on May 31, 2009, 11:39:07 AM yes - xp sp2 - and holding - lol
Title: Re: EXE Jump Tables Post by: jj2007 on May 31, 2009, 11:40:05 AM Why not try a "faster" API?
Code: .nolist include \masm32\include\masm32rt.inc .686 include \masm32\macros\timers.asm LOOP_COUNT = 100000 EXTERNDEF _imp__GetTickCount@0:PTR pr0 .code start: counter_begin LOOP_COUNT, HIGH_PRIORITY_CLASS REPEAT 100 invoke GetTickCount ENDM counter_end print str$(eax), 9, "cycles for 100*GetTickCount, indirect", 13, 10 counter_begin LOOP_COUNT, HIGH_PRIORITY_CLASS REPEAT 100 invoke _imp__GetTickCount@0 ENDM counter_end print str$(eax), 9, "cycles for 100*GetTickCount, direct", 13, 10 inkey chr$(13, 10, "--- ok ---", 13) exit end start Title: Re: EXE Jump Tables Post by: dedndave on May 31, 2009, 11:46:16 AM yes - i was thinking the same thing
here is a short list of some that i think should be fast... GetProcessAffinityMask QueryPerformanceCounter GetCurrentProcess CreateTimerQueue EnterCriticalSection LeaveCriticalSection Title: Re: EXE Jump Tables Post by: dedndave on May 31, 2009, 11:50:39 AM i am seeing a 300 cycle diff JJ - 1700 vs 1400
but - all these timing measurements are going to prevent peeps from seeing my post about the replacement code - lol (this sentence is intended to make them go back and look) Title: Re: EXE Jump Tables Post by: UtillMasm on May 31, 2009, 11:51:42 AM Intel Core Duo 1.83Ghz with Vista SP2
Code:
9570552

direct  indirect
------  --------
3661    4399
28143   9589
29073   23326
28886   28982
21307   16514
29851   37303
34544   14749
20351   1805
12799   10889
14832   28986
21583   13989
15321   6758
16165   15626
9959    1466
15746   23869
3090    2793
4223    2726
4613    3774
3178    1507
2827    1698
------  --------
320152  250748

Press any key to exit...

Title: Re: EXE Jump Tables Post by: MichaelW on May 31, 2009, 11:51:56 AM Quote: Why not try a "faster" API? I did, I selected one that was faster than SendMessage. If you want to test something fast, forget the API and code a DLL with a procedure that contains only a RET, and call it by the same mechanisms. Title: Re: EXE Jump Tables Post by: dedndave on May 31, 2009, 11:55:36 AM well - we know that result
it is certain synchronization calls that are of primary interest but there are some fast APIs (just not the ones we always want to be fast - lol) btw Michael - you are making me think about switching to Win2K - lol Title: Re: EXE Jump Tables Post by: MichaelW on May 31, 2009, 12:33:01 PM For a minimal procedure the direct call is consistently 2 cycles faster.
Code: ; ««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««« include \masm32\include\masm32rt.inc aret PROTO ; ««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««« .data hInstance dd 0 .code ; ««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««« LibMain proc instance:DWORD,reason:DWORD,unused:DWORD .if reason == DLL_PROCESS_ATTACH push instance pop hInstance mov eax, TRUE .elseif reason == DLL_PROCESS_DETACH .elseif reason == DLL_THREAD_ATTACH .elseif reason == DLL_THREAD_DETACH .endif ret LibMain endp ; ««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««« aret proc ret aret endp ; ««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««« end LibMain Code: ; ««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««« include \masm32\include\masm32rt.inc .686 include \masm32\macros\timers.asm aret PROTO EXTERNDEF _imp__aret@0:NEAR PTR ; ««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««« .data .code ; ««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««« start: ; ««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««« nops 3 call aret nops 3 call _imp__aret@0 nops 3 invoke Sleep, 4000 counter_begin 1000, HIGH_PRIORITY_CLASS call aret counter_end print ustr$(eax)," cycles, indirect",13,10 counter_begin 1000, HIGH_PRIORITY_CLASS call _imp__aret@0 counter_end print ustr$(eax)," cycles, direct",13,10,13,10 inkey "Press any key to exit..." exit ; ««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««« end start Code: 00401000 90 nop 00401001 90 nop 00401002 90 nop 00401003 E8F2010000 call fn_004011FA 00401008 90 nop 00401009 90 nop 0040100A 90 nop 0040100B FF1530204000 call dword ptr [aret] 00401011 90 nop 00401012 90 nop 00401013 90 nop . . . 004011FA fn_004011FA: 004011FA FF2530204000 jmp dword ptr [aret] Code: 3 cycles, indirect
1 cycles, direct Title: Re: EXE Jump Tables Post by: dedndave on May 31, 2009, 12:39:09 PM you guys are not paying attention
look at the code i posted GET THE TARGET ADDRESS Title: Re: EXE Jump Tables Post by: MichaelW on May 31, 2009, 12:44:55 PM Who pays attention over the weekend?
Title: Re: EXE Jump Tables Post by: dedndave on May 31, 2009, 12:46:44 PM lol - is it the weekend already ?
Title: Re: EXE Jump Tables Post by: UtillMasm on May 31, 2009, 01:12:18 PM for great weekend. :wink
btw: to dear MichaelW, which one for these text files? :wink [attachment deleted by admin] Title: Re: EXE Jump Tables Post by: MichaelW on May 31, 2009, 01:39:39 PM I can only guess ANSI or Western European.
Title: Re: EXE Jump Tables Post by: Mark Jones on May 31, 2009, 03:49:56 PM AMD x2 4000+ / Win7 Beta x64
For this (http://www.masm32.com/board/index.php?topic=11541.msg86615#msg86615) code, I get

Code:
592426

direct  indirect
------  --------
36445   37438
4197    24960
35124   23647
36914   25259
30691   7555
32651   40061
26122   16160
34643   33773
11805   26739
24058   33236
17137   26391
35018   21328
23747   35509
29827   30289
31598   28075
17094   32144
27323   29540
21413   26742
26123   33217
35882   35033
------  --------
537812  567096

For the latest,

Code:
2 cycles, indirect
0 cycles, direct

Edit: Thanks Dave.
Title: Re: EXE Jump Tables Post by: dedndave on May 31, 2009, 03:52:29 PM you have to have "Target" running (a window app) - then run "Test" from a command line
0 cycles - lol - well, that's just wrong - very nice, but wrong Title: Re: EXE Jump Tables Post by: jj2007 on May 31, 2009, 05:23:56 PM Celeron M timings:
Code: 7321 cycles for 1000*GetTickCount, indirect 5501 cycles for 1000*GetTickCount, direct 8327 cycles for PostMessage, indirect 8311 cycles for PostMessage, direct Shorter and faster... [attachment deleted by admin] Title: Re: EXE Jump Tables Post by: UtillMasm on May 31, 2009, 05:55:59 PM Core Duo timings:
Code: 13243 cycles for 1000*GetTickCount, indirect 10443 cycles for 1000*GetTickCount, direct 10851 cycles for PostMessage, indirect 12192 cycles for PostMessage, direct --- ok --- Title: Re: EXE Jump Tables Post by: MichaelW on May 31, 2009, 10:45:26 PM Quote from: dedndave on May 31, 2009, 03:52:29 PM 0 cycles - lol - well, that's just wrong - very nice, but wrong Zero is an entirely reasonable result under the circumstances. The resolution of the TSC is no better than one clock cycle, and recent processors can execute as many a four instructions per cycle. And then you have the inability to completely isolate the timed instructions from the timing instructions, so some of the timed instructions can end up executing in parallel with the timing instructions. Title: Re: EXE Jump Tables Post by: BogdanOntanu on June 01, 2009, 07:23:01 AM I have split the talks about OBJ generation into this new topic:
http://www.masm32.com/board/index.php?topic=11555.0 Title: Re: EXE Jump Tables Post by: ToutEnMasm on June 01, 2009, 07:28:01 AM Dynamic linking removes the need for a jump table. The interesting question is: which is faster, a dynamic link or a link with a library? Title: Re: EXE Jump Tables Post by: hutch-- on June 01, 2009, 08:00:46 AM Yves,
That one is simple: a library gets built into the exe, so its address is within local memory space, whereas a DLL procedure has to be loaded. It does not matter in many instances, but if the called routine is very small you will see the difference. With a DLL, if you get the address from it and load it into a variable or even a register, it will tend to be faster, as the DLL is also mapped into the EXE memory space. Title: Re: EXE Jump Tables Post by: BogdanOntanu on June 01, 2009, 10:54:34 AM Quote from: ToutEnMasm on June 01, 2009, 07:28:01 AM Dynamic linking removes the need for a jump table. NO. It is exactly the opposite. In static linking the linker adds the called procedure's code to your code (let us say at the end) and the address is known at link time. Because of this the call is relative and there is no need for anything else to be done at run time. The problem with static linking is that you cannot load/unload procedures or libraries at runtime, and you cannot use static linking for calling the OS API. Dynamic linking is used mainly because the address of the API is NOT known at compile or link time. Some prefer a direct CALL dword ptr [IAT.API_address], others a near CALL to a jmp.[IAT.Api_address], but one way or another the value of the API_address will be fixed up at run time by the OS loader, and this cannot be done statically. The whole talk in this thread refers to the advantages or disadvantages of using or not using a jump table as an intermediate central step between your "invoke API_xxx, ..., ... " in the code and the API address in the IAT. Some assemblers/linkers do generate such a table and some do not. Quote: The interesting question is: which is faster, a dynamic link or a link with a library? Faster when? At execution time or at compile time? 
At execution time it is logical that the static solution is faster because it is only a near CALL to a well-known address, BUT it is not possible to use it for APIs because their code and addresses change with every new OS version or security update. The dynamic linking solution can be slightly faster at compile time because a part of the work to be done by the linker is left for the OS loader. However, it is logical that it will be slower at run time because at least one intermediate step has to be taken for a call to an API. In the case of a jump table there are 2 (two) such steps to be taken. Anyway, I think that the speed differences at execution time are not worth considering, because the API itself will perform more operations on parameter checking than the very few cycles saved by avoiding one jump. Dynamic linking has the advantage of DLL loading / unloading at runtime. Title: Re: EXE Jump Tables Post by: ecube on June 01, 2009, 03:04:13 PM Code: 11228 cycles for 1000*GetTickCount, indirect 8283 cycles for 1000*GetTickCount, direct 5401 cycles for PostMessage, indirect 5576 cycles for PostMessage, direct --- ok --- I wonder how come PostMessage is apparently faster using the indirect? Title: Re: EXE Jump Tables Post by: jj2007 on June 01, 2009, 05:39:37 PM Quote from: E^cube on June 01, 2009, 03:04:13 PM Code: 11228 cycles for 1000*GetTickCount, indirect 8283 cycles for 1000*GetTickCount, direct 5401 cycles for PostMessage, indirect 5576 cycles for PostMessage, direct --- ok --- I wonder how come PostMessage is apparently faster using the indirect? On my Celeron, direct is a little bit faster, but UtillMasm's Core Duo favours indirect, too. It might be a cache effect of some sort. Here is an interesting quote (http://kerneltrap.org/node/553/2131): Quote: main memory is very slow compared to the CPU cache, so code that is slightly larger can cause more cache misses and therefore be slower, even if significantly fewer commands are executed. 
in addition, frequently the effect isn't direct (i.e. no noticeable difference on the code you are changing, but instead the change makes other code slower as it gets evicted from the cache). Note timings are for 1000 calls to GetTickCount (about 11 cycles per indirect call and 8 per direct call on average) but only one to PostMessage. Which means the latter pulls an awful amount of code into the cache. Title: Re: EXE Jump Tables Post by: hutch-- on June 02, 2009, 02:04:20 AM :bg
I think I have already answered that one, a call to a function outside the calling app's memory space is REEEEEEEEEEELLLLY SLOOOOOOOOOOOOW where a call to a local label then a jump is not. Older hardware will hide the difference but any later PIV, Core 2 duo, quad etc .... will respond better to a faster pair than a single slow opcode. Title: Re: EXE Jump Tables Post by: NightWare on June 02, 2009, 02:41:32 AM Quote from: jj2007 on June 01, 2009, 05:39:37 PM Which means the latter pulls an awful amount of code into the cache. ::) i'm just curious, can you explain me why a code executed once or two times SHOULD put SOMETHING IN the (code/trace) cache ? (yeah, now i'm going to proceed by asking question... maybe it will give better results... :P) Title: Re: EXE Jump Tables Post by: dedndave on June 02, 2009, 02:48:41 AM lol - i don't even think the guys at intel know how the cache works
and - sometimes, it doesn't Title: Re: EXE Jump Tables Post by: UtillMasm on June 02, 2009, 03:02:18 AM :clap: :green2
Title: Re: EXE Jump Tables Post by: jj2007 on June 02, 2009, 06:30:38 AM Quote from: hutch-- on June 02, 2009, 02:04:20 AM I think I have already answerd that one, a call to a function outside the calling app's memory space is REEEEEEEEEEELLLLY SLOOOOOOOOOOOOW Quote from: NightWare on June 02, 2009, 02:41:32 AM Quote from: jj2007 on June 01, 2009, 05:39:37 PM Which means the latter pulls an awful amount of code into the cache. ::) i'm just curious, can you explain me why a code executed once or two times SHOULD put SOMETHING IN the (code/trace) cache ? (yeah, now i'm going to proceed by asking question... maybe it will give better results... :P) I am pleased to see that a simple assembly-related question can still provoke so strong reactions, outside the Colosseum. @Hutch: GetTickCount is outside the calling app's memory space, too. Same behaviour for GetDesktopWindow, see below. @NightWare: No, I can't explain it. That's why I posted it. Just guessing: Could it be that the code is performing some loops during the 8,000 cycles?? And that these loops finish in the cache?? But maybe you can explain it, and are willing to share your knowledge with us earthlings? Celeron M: Code: 7268 cycles for 1000*GetTickCount, indirect
5500 cycles for 1000*GetTickCount, direct 16340 cycles for 1000*GetDesktopWindow, indirect 15045 cycles for 1000*GetDesktopWindow, direct 8107 cycles for PostMessage, indirect 8106 cycles for PostMessage, direct Title: Re: EXE Jump Tables Post by: UtillMasm on June 02, 2009, 06:45:03 AM :U
Code: 12367 cycles for 1000*GetTickCount, indirect 10703 cycles for 1000*GetTickCount, direct 16101 cycles for 1000*GetDesktopWindow, indirect 15087 cycles for 1000*GetDesktopWindow, direct 7229 cycles for PostMessage, indirect 6681 cycles for PostMessage, direct --- ok --- Title: Re: EXE Jump Tables Post by: sinsi on June 02, 2009, 07:44:32 AM Surely *any* code we call is in our 'address space' by definition. I think the problem is when we get into the API's that call low-level stuff - ring3 to ring0.
There is a fair bit of overhead involved in that. Title: Re: EXE Jump Tables Post by: hutch-- on June 02, 2009, 07:49:09 AM JJ,
> @Hutch: GetTickCount is outside the calling app's memory space, too. Same behaviour for GetDesktopWindow, see below. So is every other Windows API. You seem to have missed the value of the comment, using the CALL mnemonic with an address outside the app's address space is REAAAAAAAALLLLY SLOOOOOOOOOWWW. I mentioned that a local CALL to a local label followed by a direct jump to the start address is a pair of faster mnemonics than the single CALL directly to an external address. Title: Re: EXE Jump Tables Post by: hutch-- on June 02, 2009, 07:52:03 AM sinsi,
> Surely *any* code we call is in our 'address space' by definition. Nope, system DLLs are loaded at addresses above 2 gig which is above the normal address load range of a non system DLL. Title: Re: EXE Jump Tables Post by: sinsi on June 02, 2009, 07:59:14 AM Well, system DLL's are still in our 4gig address space, otherwise we couldn't call them.
I think I'm being pedantic about 'address space' - to someone of the roll-your-own-os crowd, I think we're talking about different things... Title: Re: EXE Jump Tables Post by: MichaelW on June 02, 2009, 10:05:07 AM I thought one of the main points of putting system code in DLLs was to avoid having the same code mapped (or perhaps copied is a better term) into multiple processes. I think "virtual" is the key word here.
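For anyone who wants to check where an imported function actually sits in the process, one option is to read its import table slot and print the value. This is only a minimal sketch, assuming the MASM32 package (`masm32rt.inc` with its `print`, `hex$` and `inkey` macros) and the same `_imp__GetTickCount@0` declaration used in the benchmarks in this thread. The addresses it prints depend on the Windows version and are not fixed. For reference, the disassembly at the top of this thread shows kernel32's GetStdHandle at 7C812F3A, which is below 80000000h (the 2 GB mark).

```masm
include \masm32\include\masm32rt.inc

; Import table slot for GetTickCount, declared the same way as in the
; benchmarks posted in this thread.
EXTERNDEF _imp__GetTickCount@0:PTR pr0

.code
start:
    mov esi, _imp__GetTickCount@0       ; address the OS loader wrote into the slot
    print hex$(esi), "h = GetTickCount in kernel32", 13, 10

    mov eax, OFFSET start               ; our own code, for comparison
    print hex$(eax), "h = start label in the EXE", 13, 10

    .if esi < 80000000h                 ; unsigned compare against 2 GB
        print "GetTickCount is mapped below the 2 GB mark", 13, 10
    .else
        print "GetTickCount is mapped at or above the 2 GB mark", 13, 10
    .endif

    inkey "Press any key to exit..."
    exit
end start
```

Either way the value is a virtual address inside the calling process, which is why the call works at all.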
Title: Re: EXE Jump Tables Post by: Tedd on June 02, 2009, 12:05:14 PM Any DLL you 'import' is loaded into your address space - thus, GetTickCount and GetDesktopWindow are also mapped into your address space -- that's the very reason you can call them.
The physical pages for system DLLs are mapped (once) into the virtual address space of each process that loads them (in the 'shared area' which is usually above the 2GB mark.) User DLLs have an option to make them shared too, so they're probably not shared by default (except by multiple instances of the same application.) Title: Re: EXE Jump Tables Post by: jj2007 on June 02, 2009, 12:43:05 PM Quote from: hutch-- on June 02, 2009, 07:49:09 AM JJ, > @Hutch: GetTickCount is outside the calling app's memory space, too. Same behaviour for GetDesktopWindow, see below. So is every other Windows API. You seem to have missed the value of the comment, using the CALL mnemonic with an address outside the app's address space is REAAAAAAAALLLLY SLOOOOOOOOOWWW. I mentioned that a local CALL to a local label followed by a direct jump to the start address is a pair of faster mnemonics than the single CALL directly to an external address. Hutch, I read your comments, and in general I understand them, too. Sorry that I am not able to see its value. Perhaps because my timings say the opposite? I chose GetTickCount because first, it is indeed outside what you call either "the app's address space" or "the app's memory space" (my best guess is you mean "close to the app's core code"), and second, it has little overhead. The timings show that without the extra jmp, it takes 3 or 4 cycles less. Why it sometimes behaves different with the 8,000 cycles instruction PostMessage is beyond my knowledge. Title: Re: EXE Jump Tables Post by: redskull on June 02, 2009, 02:15:14 PM I don't know if it's applicable, but you can "really" directly call GetTickCount by just executing interrupt 2A (at least, you used to)
-r Title: Re: EXE Jump Tables Post by: hutch-- on June 02, 2009, 02:43:46 PM JJ,
You worry me at times. The info I posted is straight, well-known system information: Windows API functions reside in system DLLs that are loaded at a DIFFERENT memory address range than application DLLs. Above 2 gig is the action here, and within the framework of Windows you can call and run the functions as executable code but cannot write to those address ranges in ring3. Now it's not a matter of guesswork, it's an OS-defined limitation so that you can allocate and use the bottom 2 gig and the OS controls the upper 2 gig. Now it does not matter which system API call you make, it's in the same class as the rest, loaded ABOVE 2 gig, which is above application address space. With a system DLL you don't reload it like an application DLL, it's already there in memory, loaded at startup; that's how Windows is designed. Now come back to the comparison of direct address CALL to indirect CALL and JMP: the indirect CALL in local memory is MUCH FASTER than a CALL outside the app's address space. The argument remains as to whether the following unconditional JMP is as slow as a direct CALL to an external address. Now with testing results, you will get variation depending on the age of the hardware and its BUS speed; older hardware hides the difference, later stuff favours the faster pair of instructions. Another factor that interferes with your timing results: do your testing in REAL TIME with absolutely no interpretation for durations of over 500 ms and you will get down under 1% most of the time. The testing usually requires REAL TIME priority for the most accurate results. The architectural model you are having problems with has been around for about 15 years, winNT 3.5 and later; there is nothing new, exciting or different, they are 32 bit address range operating systems that have remained more or less compatible for many years. It's not a matter of conjecture, it's a matter of simply looking up the reference material. 
PS: I should have added, since NT4 you have the layering of system DLLs, NTDLL.DLL and below that you have NTOSKRNL.EXE, disassemble them to see where the work is done and why the choice of CALL mnemonic or the alternative is irrelevant. The system was designed by the VAX guys in the early 90s for Microsoft and among the design considerations is the address table at the end of the executable code. Title: Re: EXE Jump Tables Post by: jj2007 on June 02, 2009, 04:42:55 PM Quote from: hutch-- on June 02, 2009, 02:43:46 PM JJ, Yopu worry me at times, ... :bg Hutch, I can only return the compliment - and apologies if I have failed in using your terminology (app memory space, app address space etc) correctly. You might have a look at the posts of Tedd, Sinsi and Michael, they know more about these subtle distinctions than I do. So I limit myself to the observation that the direct call to a "fast" WinApi of the GetTickCount and GetDesktopWindow type is some cycles faster than the indirect version using a call plus a jmp table. Which is precisely the topic of this thread. By the way: What does your P4 say? Haven't seen any P4 timings yet... :thumbu Title: Re: EXE Jump Tables Post by: Mark Jones on June 02, 2009, 06:10:44 PM AMD x2 4000+ / Win7 x64
Code: 18407 cycles for 1000*GetTickCount, indirect 13751 cycles for 1000*GetTickCount, direct 37443 cycles for 1000*GetDesktopWindow, indirect 56384 cycles for 1000*GetDesktopWindow, direct 45891 cycles for PostMessage, indirect 45762 cycles for PostMessage, direct Title: Re: EXE Jump Tables Post by: dedndave on June 02, 2009, 06:14:47 PM prescotts give funky numbers
they are high and inconsistent which brings us full-circle back to why I was asking about the jump tables as you know, Jochen, I am working on "super-daves" (tongue-in-cheek) timing routines for multi/single cores I am playing with a few different methods of synchronizing threads when I am done, I'll probably be the only one that uses the code - lol but, hey, at least someone will be happy :bg Title: Re: EXE Jump Tables Post by: dedndave on June 02, 2009, 06:18:31 PM Prescott dual-core @ 3GHz - XP MCE 2005 SP2
18320 cycles for 1000*GetTickCount, indirect 16449 cycles for 1000*GetTickCount, direct 32506 cycles for 1000*GetDesktopWindow, indirect 30223 cycles for 1000*GetDesktopWindow, direct 18201 cycles for PostMessage, indirect 18099 cycles for PostMessage, direct 18297 cycles for 1000*GetTickCount, indirect 14920 cycles for 1000*GetTickCount, direct 32634 cycles for 1000*GetDesktopWindow, indirect 30162 cycles for 1000*GetDesktopWindow, direct 18170 cycles for PostMessage, indirect 18300 cycles for PostMessage, direct 18557 cycles for 1000*GetTickCount, indirect 15239 cycles for 1000*GetTickCount, direct 32524 cycles for 1000*GetDesktopWindow, indirect 30973 cycles for 1000*GetDesktopWindow, direct 18179 cycles for PostMessage, indirect 18276 cycles for PostMessage, direct Title: Re: EXE Jump Tables Post by: jj2007 on June 02, 2009, 06:22:37 PM Quote from: dedndave on June 02, 2009, 06:14:47 PM prescotts give funky numbers they are high and inconsistant which brings us full-circle back to why I was asking about the jump tables as you know, Jochen, I am working on "super-daves" (tongue-in-cheek) timing routines for multi/single cores i am playing with a few different methods of synchronizing threads when I am done, I'll probably be the only one that uses the code - lol but, hey, at least someone will be happy :bg Maybe I'll use them, too - if the price is right :bg @Mark: Thanks for the timings, and for increasing the confusion :wink 45,000 instead of 8,000 for PostMessage is quite a big jmp - is that 64-bit progress??? ::) Title: Re: EXE Jump Tables Post by: dedndave on June 02, 2009, 06:25:11 PM that is MS progress - they are getting smarter
newer OS's have built-in obsolescence
with XP, they have to look for ways to make it bad
they learned their lesson
when they want you to buy "Windows 8", they already have a plan for making you dislike 7
Title: Re: EXE Jump Tables Post by: NightWare on June 02, 2009, 10:03:22 PM Quote from: jj2007 on June 02, 2009, 06:30:38 AM @NightWare: No, I can't explain it. That's why I posted it. Just guessing: Could it be that the code is performing some loops during the 8,000 cycles?? And that these loops finish in the cache?? But maybe you can explain it, and are willing to share your knowledge with us earthlings?
well, the problem when you use a WINapi, is you use an API TREE, in the case of PostMessage you call an api from a dll, and this dll will call a function from another dll (ntdll.dll). so the function put something in the cache (coz there is a loop for printing digits, certainly rep movsb or something like that...). but it's only few bytes, anyway not enough to explain the slowdown... for MOST of the algo nothing will be put on the cache, coz there is no reason ! (no code called frequently enough...). now why it's slow ?
1. because most of the algo read (and execute after) data from memory or l2 cache.
2. because you call/read code from SEVERAL memory or l2 cache locations (coz several dlls), and remember that memory IS SLOW.
EDIT : 3. i've forgotten the cost of the mispredictions generated by the calls and libraries jmp table.
(note : and it's something useful to know, there is no possible misprediction for ret, coz the return address is automatically stored by the call instruction...)
Title: Re: EXE Jump Tables Post by: jj2007 on June 02, 2009, 11:09:47 PM Quote from: redskull on June 02, 2009, 02:15:14 PM I don't know if it's applicable, but you can "really" directly call GetTickCount by just executing interrupt 2A (at least, you used to) -r Strangely enough, that still works under XP SP2... 
Code:
include \masm32\include\masm32rt.inc

.code
start:
    print "Diff int 2A - GetTickCount = "
    invoke GetTickCount
    push eax
    invoke Sleep, 500
    int 2ah
    pop ecx
    sub eax, ecx
    print str$(eax), 13, 10
    getkey
    exit
end start

Title: Re: EXE Jump Tables Post by: dedndave on June 03, 2009, 02:02:20 AM hmmmmmm - "int 2Ah" - that looks familiar, somehow - is that a foreign language?
i wonder what other interrupts work with 32-bit Title: Re: EXE Jump Tables Post by: hutch-- on June 03, 2009, 01:12:05 PM Here is a simple benchmark that tests in real time, has no interpretive code embedded in it and makes no assumptions about how to get the results, it simply adds each result to a variable then after the test has completed it divided each total by 8 to get the average.
Code: 1500 indirect call 1515 direct call 1500 indirect call 1500 direct call 1500 indirect call 1516 direct call 1515 indirect call 1516 direct call 1500 indirect call 1515 direct call 1516 indirect call 1516 direct call 1500 indirect call 1516 direct call 1500 indirect call 1500 direct call 1503 average indirect call timing 1511 average direct call timing Press any key to continue ... The only deviation in the results are due to granularity in GetTickCount results but with the sample size the deviation is far under 1%. Timing shows there is no meaningful difference between the speed of indirect versus direct calls on the Prescott 3.2 gig PIV I am using. This is consistent with my other PIVs. Uninterpreted real time testing is the only safe way to decide matter like this for the very little it is worth, the more complicated and interpreted the testing method becomes, the more unreliable its results are. The test piece. Code: ; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤ include \masm32\include\masm32rt.inc ; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤ comment * ----------------------------------------------------- Build this template with "CONSOLE ASSEMBLE AND LINK" ----------------------------------------------------- * externdef _imp__GetTickCount@0:PTR pr0 GetTickCountX equ <_imp__GetTickCount@0> .data? value dd ? 
.data item dd 0 .code start: ; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤ call main inkey exit ; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤ main proc LOCAL cnt1 :DWORD LOCAL cnt2 :DWORD mov cnt1, 0 mov cnt2, 0 lpcnt equ <300000000> invoke SetPriorityClass,rv(GetCurrentProcess),REALTIME_PRIORITY_CLASS invoke SleepEx,100,0 push esi ; ================================================= REPEAT 8 mov esi, lpcnt invoke GetTickCount push eax @@: invoke GetTickCount ; << tested API sub esi, 1 jnz @B invoke GetTickCount pop ecx sub eax, ecx add cnt1, eax print str$(eax)," indirect call",13,10 invoke SleepEx,100,0 mov esi, lpcnt invoke GetTickCount push eax @@: invoke GetTickCountX ; << tested API sub esi, 1 jnz @B invoke GetTickCount pop ecx sub eax, ecx add cnt2, eax print str$(eax)," direct call",13,10 invoke SleepEx,100,0 ENDM ; ================================================= invoke SetPriorityClass,rv(GetCurrentProcess),NORMAL_PRIORITY_CLASS ; format the output ; ----------------- print chr$(13,10) shr cnt1, 3 print str$(cnt1)," average indirect call timing",13,10 shr cnt2, 3 print str$(cnt2)," average direct call timing",13,10,13,10 pop esi ret main endp ; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤ end start Title: Re: EXE Jump Tables Post by: UtillMasm on June 03, 2009, 01:37:44 PM :U
Code:
@echo off
\masm32\bin\ml.exe /c /coff /Fohutch.obj /nologo hutch.asm
\masm32\bin\link.exe /subsystem:console /out:hutch.exe hutch.obj /nologo
pause

Code:
2407 indirect call
2047 direct call
2390 indirect call
2046 direct call
2219 indirect call
2047 direct call
2219 indirect call
2031 direct call
2218 indirect call
1875 direct call
2219 indirect call
2031 direct call
2218 indirect call
2047 direct call
2204 indirect call
2047 direct call

2261 average indirect call timing
2021 average direct call timing

Press any key to continue ...

Title: Re: EXE Jump Tables
Post by: hutch-- on June 03, 2009, 01:39:43 PM
UtillMasm,
Are you using a Core 2 Duo ?

Title: Re: EXE Jump Tables
Post by: UtillMasm on June 03, 2009, 01:40:27 PM
Core Duo 1.83ghz
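
For readers comparing the two listings, here is a rough sketch (in Python rather than MASM, so it can be run anywhere) that redoes the averaging the test piece performs with `shr cntN, 3` and expresses the gap between the two call styles as a percentage. The timings are copied from the two posts above; the names `RUNS`, `shr3_average`, `gap_percent` and `summarize` are made up for this illustration and appear nowhere in the thread.

```python
# Millisecond timings copied from the two result listings above.
RUNS = {
    "hutch (Prescott PIV)": {
        "indirect": [1500, 1500, 1500, 1515, 1500, 1516, 1500, 1500],
        "direct": [1515, 1500, 1516, 1516, 1515, 1516, 1516, 1500],
    },
    "UtillMasm (Core Duo)": {
        "indirect": [2407, 2390, 2219, 2219, 2218, 2219, 2218, 2204],
        "direct": [2047, 2046, 2047, 2031, 1875, 2031, 2047, 2047],
    },
}


def shr3_average(samples):
    """Average eight samples like the test piece: sum them, then shift right by 3."""
    if len(samples) != 8:
        raise ValueError("the shift-by-3 average assumes exactly eight samples")
    return sum(samples) >> 3


def gap_percent(indirect_avg, direct_avg):
    """Direct-call average relative to the indirect-call average, in percent."""
    return 100.0 * (direct_avg - indirect_avg) / indirect_avg


def summarize(runs):
    """Return {machine: (indirect_avg, direct_avg, gap_percent)}."""
    summary = {}
    for machine, timings in runs.items():
        indirect_avg = shr3_average(timings["indirect"])
        direct_avg = shr3_average(timings["direct"])
        summary[machine] = (indirect_avg, direct_avg,
                            gap_percent(indirect_avg, direct_avg))
    return summary


if __name__ == "__main__":
    for machine, (ind, dirc, gap) in summarize(RUNS).items():
        print(f"{machine}: indirect {ind} ms, direct {dirc} ms, gap {gap:+.1f}%")
```

On these figures the gap is roughly half a percent on hutch's Prescott and about 10.6% in favour of the direct form on UtillMasm's Core Duo, which is why the processor question matters.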

Title: Re: EXE Jump Tables
Post by: ecube on June 04, 2009, 07:16:53 AM
This explains the int 2ah http://www.masm32.com/board/index.php?topic=7010.0

Title: Re: EXE Jump Tables
Post by: Vortex on June 07, 2009, 06:27:19 PM
Direct call function declarations moved to a custom invoke macro :

Code:
_invoke MACRO FuncName:REQ,args:VARARG
    LOCAL counter,counter2,params

    params TEXTEQU <>
    counter = 0
    counter2 = 0

    FOR param,<args>
        counter=counter+1
    ENDM

    counter2 = 4*counter

    EXTERNDEF @CatStr(_imp__&FuncName&@,%counter2) : PTR @CatStr(<pr>,%counter)
    FuncName EQU <@CatStr(_imp__&FuncName&@,%counter2)>

    IF counter
        invoke FuncName,args
    ELSE
        invoke FuncName
    ENDIF

ENDM

[attachment deleted by admin]

Title: Re: EXE Jump Tables
Post by: BlackVortex on June 07, 2009, 06:36:55 PM
I haven't read the whole thread, but I just want to add that goasm+golink don't use import jump table. The calls to the API point directly to the import table. :cheekygreen:

Title: Re: EXE Jump Tables
Post by: dedndave on June 07, 2009, 07:09:59 PM
i think some guys are still missing the point altogether

        mov     esi,labelA-4          ;rel32 operand of the CALL emitted by INVOKE
        mov     esi,labelA[esi+2]     ;operand of the JMP stub = address of the IAT entry
        mov     eax,[esi]             ;API entry point stored in the IAT
        call    eax
        exit

        INVOKE  GetCurrentProcess
labelA  label   dword

this code gets the address for a direct API call
in this example, it is not the same thing as _imp__GetCurrentProcess
(http://www.awicons.com/stock-icons/aero-icons/preview/arrow-up-red.png)
it's all good, though - i have what i wanted - lol

Title: Re: EXE Jump Tables
Post by: mitchi on June 07, 2009, 07:14:44 PM
Quote from: Vortex on June 07, 2009, 06:27:19 PM
Direct call function declarations moved to a custom invoke macro :

Code:
_invoke MACRO FuncName:REQ,args:VARARG
    LOCAL counter,counter2,params

    params TEXTEQU <>
    counter = 0
    counter2 = 0

    FOR param,<args>
        counter=counter+1
    ENDM

    counter2 = 4*counter

    EXTERNDEF @CatStr(_imp__&FuncName&@,%counter2) : PTR @CatStr(<pr>,%counter)
    FuncName EQU <@CatStr(_imp__&FuncName&@,%counter2)>

    IF counter
        invoke FuncName,args
    ELSE
        invoke FuncName
    ENDIF

ENDM

WoW!!! That's just sweet Vortex!
So the next time we need help in a thread, we can do _invoke Vortex now :green

Title: Re: EXE Jump Tables
Post by: ecube on June 07, 2009, 07:26:38 PM
Quote from: BlackVortex on June 07, 2009, 06:36:55 PM
I haven't read the whole thread, but I just want to add that goasm+golink don't use import jump table. The calls to the API point directly to the import table. :cheekygreen:

that's because Jeremy is a genius.

Title: Re: EXE Jump Tables
Post by: BlackVortex on June 07, 2009, 09:24:20 PM
Quote from: E^cube on June 07, 2009, 07:26:38 PM
Quote from: BlackVortex on June 07, 2009, 06:36:55 PM
I haven't read the whole thread, but I just want to add that goasm+golink don't use import jump table. The calls to the API point directly to the import table. :cheekygreen:
that's because Jeremy is a genius.

IDA loses the ball, it doesn't recognize/resolve imports correctly on the GoTools executable I tried :green2
And I thought it was a very advanced analyzer/disassembler (I absolutely never use it) :P

Title: Re: EXE Jump Tables
Post by: hutch-- on June 08, 2009, 03:11:07 AM
Fortunately MASM users have the choice of either. :bg

Title: Re: EXE Jump Tables
Post by: BlackVortex on June 08, 2009, 04:49:21 AM
Quote from: hutch-- on June 08, 2009, 03:11:07 AM
Fortunately MASM users have the choice of either. :bg

How ? You mean the MS linker has an option for that ?

EDIT: Oh, I see custom macro weirdness. Goddamnit I hate macros :eek

Title: Re: EXE Jump Tables
Post by: dedndave on June 08, 2009, 04:52:42 AM
well - Vortex has one method - i think i have a better one, though

Title: Re: EXE Jump Tables
Post by: UtillMasm on June 08, 2009, 05:11:22 AM
i hate macro and the damnit english.
:wink

Title: Re: EXE Jump Tables
Post by: hutch-- on June 08, 2009, 05:17:03 AM
:bg
> How ? You mean the MS linker has an option for that ?

No, ML.EXE, pick your prototype type, get the style of calling you want, either direct or indirect. Ain't MASM great ! :P

Title: Re: EXE Jump Tables
Post by: dedndave on June 08, 2009, 05:23:46 AM
Quote:
i hate macro and the damnit english. :wink

that's "damned English" - lol
and - we are all glad ms is not in Beijing writing code in Chinese - instructions actually WOULD execute - it'd be like a death sentence on every line
somehow, i don't imagine UtillMasm is much of a swearer

Title: Re: EXE Jump Tables
Post by: rags on June 08, 2009, 09:45:37 AM
Quote:
No, ML.EXE, pick your prototype type, get the style of calling you want, either direct or indirect. Ain't MASM great !

Hutch (or anyone), how does a function's prototype affect whether the function is called directly or indirectly through a jump table?
How is a function prototyped to get a direct call to the api?

Title: Re: EXE Jump Tables
Post by: hutch-- on June 08, 2009, 09:57:23 AM
Mike,
Without knowing the mechanism and how MASM was written, the best I can offer is that when you use one type of prototype, the assembler produces the code for an indirect call; when you use another type of prototype you get a direct call.

Title: Re: EXE Jump Tables
Post by: sinsi on June 08, 2009, 10:27:52 AM
indirect
Code:
ExitProcess proto :dword

direct
Code:
p1 typedef proto :dword
EXTERNDEF _imp__ExitProcess@4:PTR p1
ExitProcess TEXTEQU <_imp__ExitProcess@4>

I think the library defines which one will be used - if you declare using _imp__ prefix then direct will be used.

Title: Re: EXE Jump Tables
Post by: Vortex on June 08, 2009, 04:44:10 PM
Quote from: UtillMasm on June 08, 2009, 05:11:22 AM
i hate macro and the damnit english.
:wink

Why do you hate macros? Don't you use invoke? It's a macro.

Title: Re: EXE Jump Tables
Post by: BlackVortex on June 08, 2009, 05:12:52 PM
I started the macro hating trend, so I will respond. I hate macros that I don't know about. Invoke rocks my socks !
When I use macros and then look at my disassembled code when debugging, I see all kinds of weird crap between my nice code. It feels so out of place, like it's not my code. Also, their implementation feels weird and unintuitive, so I've never learned how to create even the simplest macro.

Title: Re: EXE Jump Tables
Post by: dedndave on June 08, 2009, 05:53:29 PM
Quote:
I started the macro hating trend, so I will respond.

that's not fair - i have been no big fan of them for a long time - lol
actually, macros can be your friend
the problems stem from using macros created by someone other than yourself
some of them are helpful in getting a program up and running
but, i would prefer to replace many of them with my own code at the end of the day
this is true for any kind of program that i intend to distribute
for programs of my own use, or for forum discussion/distribution - the macros are great
we all speak the same "macro" language in here
i also have to say - i have learned a lot by looking at how the macros were written

Title: Re: EXE Jump Tables
Post by: rags on June 09, 2009, 12:22:55 AM
Thanks Hutch and Sinsi for the explanations. :U
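
To make the two call shapes behind sinsi's prototypes concrete, the sketch below models them in Python (a byte-level simulation, not MASM). It uses the addresses from jj2007's GetStdHandle disassembly earlier in the thread for the jump-table form; the `FF 15` call at `0x00401080` is a hypothetical placement added only for comparison, and the helper names (`poke`, `read_u32`, `call_target`) are invented for this illustration.

```python
def poke(memory, address, data):
    """Store a byte sequence into the sparse memory model (a dict)."""
    for offset, byte in enumerate(data):
        memory[address + offset] = byte


def read_u32(memory, address):
    """Read a little-endian 32-bit value from the memory model."""
    return int.from_bytes(bytes(memory[address + i] for i in range(4)), "little")


def call_target(memory, address):
    """Return (shape, final target) for the CALL instruction at address."""
    if memory[address] == 0xE8:                      # CALL rel32
        stub = (address + 5 + read_u32(memory, address + 1)) & 0xFFFFFFFF
        if memory[stub] == 0xFF and memory[stub + 1] == 0x25:   # JMP dword ptr [slot]
            slot = read_u32(memory, stub + 2)
            return "via jump table", read_u32(memory, slot)
        return "plain relative call", stub
    if memory[address] == 0xFF and memory[address + 1] == 0x15:  # CALL dword ptr [slot]
        slot = read_u32(memory, address + 2)
        return "through import slot", read_u32(memory, slot)
    raise ValueError("not a CALL form handled by this sketch")


def build_memory():
    memory = {}
    # From jj2007's listing: call -> jmp stub -> import slot -> kernel32.
    poke(memory, 0x0040106C, bytes.fromhex("E8A7000000"))    # call <jmp.&GetStdHandle>
    poke(memory, 0x00401118, bytes.fromhex("FF2574114000"))  # jmp dword ptr [00401174]
    poke(memory, 0x00401174, bytes.fromhex("3A2F817C"))      # slot holds 7C812F3A
    # Hypothetical address: what an _imp__-style prototype would emit instead.
    poke(memory, 0x00401080, bytes.fromhex("FF1574114000"))  # call dword ptr [00401174]
    return memory


if __name__ == "__main__":
    mem = build_memory()
    for addr in (0x0040106C, 0x00401080):
        shape, target = call_target(mem, addr)
        print(f"{addr:08X}: {shape} -> {target:08X}")
```

Both forms end up at the same API address; the plain prototype reaches it through the 6-byte JMP stub, while the `_imp__` prototype reads the import slot in the CALL itself (what this thread calls the "direct" call).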

Title: Re: EXE Jump Tables
Post by: hutch-- on June 09, 2009, 12:28:55 AM
BlackVortex,
There is a trick to it: read the documentation for the macro, look at how it's written and if you don't like it, improve it. The action with macros is multifold: at the simplest it's just a shortcut to get something done, at its most sophisticated it puts the programmer in charge of language design without having some compiler designer holding your hand telling you what you can and cannot do.

Title: Re: EXE Jump Tables
Post by: BlackVortex on June 09, 2009, 01:07:56 AM
Quote from: hutch-- on June 09, 2009, 12:28:55 AM
BlackVortex,
There is a trick to it, read the documentation for the macro, look at how its written and if you don't like it, improve it.

Touche :thumbu But I don't really need it, using procs is enough for me. The last thing I need is more red tape.

Title: Re: EXE Jump Tables
Post by: Vortex on June 09, 2009, 10:00:55 AM
Hi BlackVortex,
This document from MS can help you : MASM Programmer's Guide - Chapter Nine: Using Macros (http://webster.cs.ucr.edu/Page_TechDocs/MASMDoc/ProgrammersGuide/Chap_09.htm)

Title: Re: EXE Jump Tables
Post by: dedndave on June 11, 2009, 10:59:05 PM
the Relat routine fixes INVOKE CALLs to always be relative
it has been adapted to work with the "Vortex" method, as well

        INCLUDE \masm32\include\masm32rt.inc

        EXTERNDEF _imp__GetCurrentProcess@0:PTR pr0

        .CODE

;-----------------------------------------------------------------------------

_main   PROC

;modify the addresses

        mov     esi,offset LabelA
        call    Relat
        mov     esi,offset LabelB
        call    Relat

;test the functions after modification

        call    Test1
        exit

_main   ENDP

;-----------------------------------------------------------------------------

Test1   PROC

        INVOKE  GetCurrentProcess
LabelA  label   dword
        print   uhex$(eax),13,10

        INVOKE  _imp__GetCurrentProcess@0
LabelB  label   dword
        print   uhex$(eax),13,10
        ret

Test1   ENDP

;-----------------------------------------------------------------------------

Relat   PROC

;Adjust the CALL address of an INVOKE to eliminate the IAT JMP
;Call With: ESI = address of code just after the INVOKE
;  Returns: modifies the address of the INVOKE

        sub     esi,6
        push    esi
        sub     esp,4                   ;scratch dword for the old protection
        INVOKE  VirtualProtect, esi, 6, PAGE_EXECUTE_READWRITE, esp
        pop     edx                     ;EDX = old protection
        pop     esi
        or      eax,eax
        jz      Relat3

        cld
        lodsw
        cmp     ah,0E8h                 ;E8 = CALL rel32 (through the jump table)
        jz      Relat0

        cmp     ax,15FFh                ;FF 15 = CALL DWord Ptr [IAT]
        lodsd
        jnz     Relat2

        mov word ptr [esi-6],0E890h     ;replace with NOP + CALL rel32
        jmp short Relat1

Relat0: lodsd
        add     eax,esi                 ;EAX = address of the JMP stub
        push    esi
        xchg    eax,esi
        lodsw
        cmp     ax,25FFh                ;FF 25 = JMP DWord Ptr [IAT]
        lodsd
        pop     esi
        jnz     Relat2

Relat1: mov     eax,[eax]               ;API address from the IAT
        sub     eax,esi                 ;make it relative to the next instruction
        mov     [esi-4],eax

Relat2: sub     esi,6
        sub     esp,4
        INVOKE  VirtualProtect, esi, 6, edx, esp
        add     esp,4

Relat3: ret

Relat   ENDP

;-----------------------------------------------------------------------------

        END     _main

Title: Re: EXE Jump Tables
Post by: dedndave on June 13, 2009, 02:38:38 AM
Before modification:

INVOKE GetCurrentProcess
Address: 00401419    Code: E8 000001FA       CALL 00401618
Address: 00401618    Code: FF 25 00402000    JMP DWord Ptr [00402000]
Address: 00402000    Data: 7C80E00D

INVOKE _imp__GetCurrentProcess@0
Address: 00401447    Code: FF 15 00402000    CALL DWord Ptr [00402000]
Address: 00402000    Data: 7C80E00D

After modification:

INVOKE GetCurrentProcess
Address: 00401419    Code: E8 7C40CBEF       CALL 7C80E00D

INVOKE _imp__GetCurrentProcess@0
Address: 00401447    Code: 90                NOP
Address: 00401448    Code: E8 7C40CBC0       CALL 7C80E00D

Function Test Results:
GetCurrentProcess:         FFFFFFFF
_imp__GetCurrentProcess@0: FFFFFFFF

i must be the only one that thinks this is cool as hell - lol

[attachment deleted by admin]
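
The arithmetic in the dump above can be replayed with a small Python model of what Relat does to the bytes. This is only a sketch: it leaves out VirtualProtect and real memory entirely, the two label addresses are derived from the dump (instruction address plus instruction length) rather than quoted from the post, and the helper names are invented here.

```python
def poke(memory, address, data):
    """Store a byte sequence into the sparse memory model (a dict)."""
    for offset, byte in enumerate(data):
        memory[address + offset] = byte


def read_u32(memory, address):
    """Read a little-endian 32-bit value from the memory model."""
    return int.from_bytes(bytes(memory[address + i] for i in range(4)), "little")


def write_u32(memory, address, value):
    """Write a little-endian 32-bit value into the memory model."""
    poke(memory, address, (value & 0xFFFFFFFF).to_bytes(4, "little"))


def relat(memory, label):
    """Mirror Relat's byte logic; label is the address just after the INVOKE."""
    if memory.get(label - 5) == 0xE8:                 # CALL rel32 to the JMP stub
        stub = (label + read_u32(memory, label - 4)) & 0xFFFFFFFF
        if (memory.get(stub), memory.get(stub + 1)) != (0xFF, 0x25):
            return False                              # not a JMP [IAT] stub
        slot = read_u32(memory, stub + 2)
    elif (memory.get(label - 6), memory.get(label - 5)) == (0xFF, 0x15):
        slot = read_u32(memory, label - 4)            # CALL DWord Ptr [IAT]
        poke(memory, label - 6, b"\x90\xE8")          # becomes NOP + CALL rel32
    else:
        return False
    api = read_u32(memory, slot)
    write_u32(memory, label - 4, api - label)         # rel32 = target - next address
    return True


LABEL_A = 0x00401419 + 5   # derived: 5-byte E8 call at 00401419
LABEL_B = 0x00401447 + 6   # derived: 6-byte FF 15 call at 00401447


def build_memory():
    """The 'before modification' bytes from the dump, in little-endian order."""
    memory = {}
    poke(memory, 0x00401419, bytes.fromhex("E8FA010000"))
    poke(memory, 0x00401618, bytes.fromhex("FF2500204000"))
    poke(memory, 0x00402000, bytes.fromhex("0DE0807C"))
    poke(memory, 0x00401447, bytes.fromhex("FF1500204000"))
    return memory


if __name__ == "__main__":
    mem = build_memory()
    for label in (LABEL_A, LABEL_B):
        relat(mem, label)
        print(f"rel32 before {label:08X}: {read_u32(mem, label - 4):08X}")
```

The displacement is computed from the address of the instruction that follows the CALL, which is why the two patched calls carry different rel32 values even though both land on 7C80E00D.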
The MASM Forum Archive 2004 to 2012 | Powered by SMF 1.0.12.