EXE Jump Tables

Started by dedndave, May 29, 2009, 05:51:54 PM

jj2007

Quote from: dedndave on June 02, 2009, 06:14:47 PM
prescotts give funky numbers
they are high and inconsistent
which brings us full-circle back to why I was asking about the jump tables
as you know, Jochen, I am working on "super-daves" (tongue-in-cheek) timing routines for multi/single cores
i am playing with a few different methods of synchronizing threads
when I am done, I'll probably be the only one that uses the code - lol
but, hey, at least someone will be happy   :bg

Maybe I'll use them, too - if the price is right :bg

@Mark: Thanks for the timings, and for increasing the confusion :wink
45,000 instead of 8,000 for PostMessage is quite a big jmp - is that 64-bit progress??? ::)

dedndave

that is MS progress - they are getting smarter
newer OS's have built-in obsolescence
with XP, they have to look for ways to make it bad
they learned their lesson
when they want you to buy "Windows 8", they already have a plan for making you dislike 7

NightWare

Quote from: jj2007 on June 02, 2009, 06:30:38 AM
@NightWare: No, I can't explain it. That's why I posted it. Just guessing: Could it be that the code is performing some loops during the 8,000 cycles?? And that these loops finish in the cache??
But maybe you can explain it, and are willing to share your knowledge with us earthlings?
Well, the problem when you use a WinAPI function is that you are calling into an API TREE. In the case of PostMessage, you call an API in one DLL, and that DLL calls a function in another DLL (ntdll.dll). Some of that code ends up in the cache (there is a loop for printing digits, most likely rep movsb or something like that), but it's only a few bytes, not nearly enough to explain the slowdown. For MOST of the algo nothing gets cached, because there is no reason to: no code is called frequently enough. So why is it slow?
1. Because most of the algo reads (and then executes) code and data from memory or the L2 cache.
2. Because you call/read code from SEVERAL memory or L2 cache locations (several DLLs), and remember that memory IS SLOW.

EDIT:
3. I forgot the cost of the mispredictions generated by the calls and the libraries' jump tables. (Note, and this is useful to know: ret is rarely mispredicted, because the CPU's return stack buffer records the return address that the call instruction pushes.)

jj2007

Quote from: redskull on June 02, 2009, 02:15:14 PM
I don't know if it's applicable, but you can "really" directly call GetTickCount by just executing interrupt 2A (at least, you used to)

-r

Strangely enough, that still works under XP SP2...

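include \masm32\include\masm32rt.inc

.code
start:
print "Diff int 2A - GetTickCount = "
invoke GetTickCount     ; EAX = tick count in milliseconds
push eax                ; save the starting count
invoke Sleep, 500
int 2ah                 ; tick-count interrupt (KiGetTickCount on 32-bit NT), result in EAX
pop ecx                 ; ECX = starting count from GetTickCount
sub eax, ecx            ; elapsed milliseconds, expected to be about 500
print str$(eax), 13, 10
getkey
exit

end start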


dedndave

hmmmmmm - "int 2Ah" - that looks familiar, somehow - is that a foreign language?
i wonder what other interrupts work with 32-bit

hutch--

Here is a simple benchmark that tests in real time, has no interpretive code embedded in it and makes no assumptions about how to get the results: it simply adds each result to a running total and, after the test has completed, divides each total by 8 to get the average.


1500 indirect call
1515 direct call
1500 indirect call
1500 direct call
1500 indirect call
1516 direct call
1515 indirect call
1516 direct call
1500 indirect call
1515 direct call
1516 indirect call
1516 direct call
1500 indirect call
1516 direct call
1500 indirect call
1500 direct call

1503 average indirect call timing
1511 average direct call timing

Press any key to continue ...


The only deviation in the results is due to the granularity of GetTickCount, and with this sample size it is far under 1%. The timings show no meaningful difference between the speed of indirect and direct calls on the Prescott 3.2 GHz PIV I am using, which is consistent with my other PIVs. Uninterpreted real-time testing is the only safe way to decide matters like this, for the very little it is worth; the more complicated and interpreted the testing method becomes, the less reliable its results are.

The test piece.


; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
    include \masm32\include\masm32rt.inc
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

comment * -----------------------------------------------------
                        Build this  template with
                       "CONSOLE ASSEMBLE AND LINK"
        ----------------------------------------------------- *


    externdef _imp__GetTickCount@0:PTR pr0
    GetTickCountX equ <_imp__GetTickCount@0>


    .data?
      value dd ?

    .data
      item dd 0

    .code

start:
   
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    call main
    inkey
    exit

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

main proc

    LOCAL cnt1  :DWORD
    LOCAL cnt2  :DWORD

    mov cnt1, 0
    mov cnt2, 0

    lpcnt equ <300000000>

    invoke SetPriorityClass,rv(GetCurrentProcess),REALTIME_PRIORITY_CLASS

    invoke SleepEx,100,0

    push esi

  ; =================================================

    REPEAT 8

    mov esi, lpcnt

    invoke GetTickCount
    push eax

  @@:
    invoke GetTickCount     ; << tested API
    sub esi, 1
    jnz @B

    invoke GetTickCount
    pop ecx
    sub eax, ecx
    add cnt1, eax
    print str$(eax)," indirect call",13,10

    invoke SleepEx,100,0

    mov esi, lpcnt

    invoke GetTickCount
    push eax

  @@:
    invoke GetTickCountX    ; << tested API
    sub esi, 1
    jnz @B

    invoke GetTickCount
    pop ecx
    sub eax, ecx
    add cnt2, eax
    print str$(eax)," direct call",13,10

    invoke SleepEx,100,0

    ENDM

  ; =================================================

    invoke SetPriorityClass,rv(GetCurrentProcess),NORMAL_PRIORITY_CLASS

  ; format the output
  ; -----------------

    print chr$(13,10)

    shr cnt1, 3
    print str$(cnt1)," average indirect call timing",13,10

    shr cnt2, 3
    print str$(cnt2)," average direct call timing",13,10,13,10


    pop esi

    ret

main endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

end start
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

UtillMasm

@echo off
\masm32\bin\ml.exe /c /coff /Fohutch.obj /nologo hutch.asm
\masm32\bin\link.exe /subsystem:console /out:hutch.exe hutch.obj /nologo
pause

2407 indirect call
2047 direct call
2390 indirect call
2046 direct call
2219 indirect call
2047 direct call
2219 indirect call
2031 direct call
2218 indirect call
1875 direct call
2219 indirect call
2031 direct call
2218 indirect call
2047 direct call
2204 indirect call
2047 direct call

2261 average indirect call timing
2021 average direct call timing

Press any key to continue ...

hutch--

UtillMasm,

Are you using a Core 2 Duo ?

UtillMasm



Vortex

Direct call function declarations moved to a custom invoke macro :


_invoke MACRO FuncName:REQ,args:VARARG

LOCAL counter,counter2,params

params      TEXTEQU <>
counter      = 0
counter2     = 0

    FOR param,<args>

        counter=counter+1

    ENDM

    counter2 = 4*counter


    EXTERNDEF @CatStr(_imp__&FuncName&@,%counter2) : PTR @CatStr(<pr>,%counter)

    FuncName EQU <@CatStr(_imp__&FuncName&@,%counter2)>

    IF counter
        invoke  FuncName,args
       
    ELSE
        invoke  FuncName
       
    ENDIF

ENDM


BlackVortex

I haven't read the whole thread, but I just want to add that GoAsm+GoLink don't use an import jump table: calls to an API point directly to the import table.   :cheekygreen:

dedndave

i think some guys are still missing the point altogether

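        mov     esi,labelA-4         ; ESI = rel32 of the CALL that ends at labelA
        mov     esi,labelA[esi+2]    ; labelA+rel32 is the stub "jmp dword ptr [slot]" (FF 25 slot),
                                     ; so this loads the slot address stored 2 bytes into the stub
        mov     eax,[esi]            ; EAX = API address read from the import table slot
        call    eax                  ; call the API without going through the jump table
        exit

        INVOKE  GetCurrentProcess    ; assembles to E8 rel32, a call into the jump table stub
labelA  label   dword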

this code gets the address for a direct API call
in this example, it is not the same thing as _imp__GetCurrentProcess

it's all good, though - i have what i wanted - lol

mitchi

Quote from: Vortex on June 07, 2009, 06:27:19 PM
Direct call function declarations moved to a custom invoke macro :

WoW!!! That's just sweet Vortex!
So the next time we need help in a thread, we can do _invoke Vortex now  :green


ecube

Quote from: BlackVortex on June 07, 2009, 06:36:55 PM
I haven't read the whole thread, but I just want to add that GoAsm+GoLink don't use an import jump table: calls to an API point directly to the import table.   :cheekygreen:

that's because Jeremy is a genius.