Code timing macros

Started by MichaelW, February 16, 2005, 03:21:52 AM

sysfce2

Sorry, I didn't word that very well.  I am talking about counter2.asm.

I guess what I'm asking is if the jump on line 101 is intended to jump to the local label on line 98 (as is the current behaviour) or the local label on line 60 (which is unused).

I believe counters2 is for the lowest time taken while timers is an average.  Am I correct and what is the preferred choice for benchmarking?

Thanks,
sysfce2

MichaelW

The jump on line 101 is supposed to jump to the local label on line 98. The local label on line 60 is not supposed to be there.
eschew obfuscation

dedndave

hate to wake up an old thread   :P

as many of you know, i have always had a bit of trouble getting reliable timing numbers on my p4 prescott
while researching CPUID, i came across a little piece of info on some gamer forum
their thread had nothing to do with timing code, but it rang a bell in my head
i decided to try it out for the fun of it, and what do ya know - it helped my timing issue
it is a simple change in the registry

of course, unless you are having trouble getting reliable readings, i wouldn't suggest altering the registry

REGEDIT4

[HKEY_LOCAL_MACHINE\SYSTEM\ControlSet002\Control\Session Manager\Throttle]
"PerfEnablePackageIdle"=dword:00000001


any non-zero value turns it on - zero turns it off
reg file attached

another item i saw in the same thread added this to the boot.ini line
this one is more likely to cause trouble than to fix it, but i thought i would mention it for the sake of completeness

/usepmtimer

MichaelW

Interesting, but does it actually improve the repeatability on your system? Some additional information:

http://support.microsoft.com/kb/896256


dedndave

yah - i already have that KB installed   :P
it makes a big difference, Michael
i wish i had known about it when i was working on ling long kai fang - would have saved me a lot of time

KeepingRealBusy

MichaelW,

I have been using the timing macros in the Compare Two Strings topic here in the Laboratory. I added a .list at the start of the .code segment and the /Fl option to create a .lst file so I could look at the generated code, mainly in the test cases. I noticed that the timing and counting macros themselves were hard to follow, with jumps to labels that did not appear in the listing:


000000D0      2       ??001F:                 
000000D0  68 00000001 R     1 push offset Src1
000000D5  68 0000D702 R     1 push offset Src2
000000DA  E8 00003ADC      1 call StrCmpSSE
000000DF  83 2D 0001D5FC R  2         sub   __counter__loop__counter__, 1
   01
000000E6  75 E8      2         jnz   __counter__loop__label__
000000E8  33 C0      2         xor   eax, eax


Reading the jnz offset of E8 as a signed 8-bit displacement (-18h) and adding it to the next-instruction offset of E8 gives the correct target of D0 (??001F). If you modify these macros by adding .listmacroall and .listmacro to the counter_begin and timer_begin macros as follows, you get a clearer listing (at least you get a label that matches the jnz target):


.listmacroall
          label:                ;; Start test loop
            __timer__loop__label__ equ <label>
.listmacro


This is the resulting listing fragment:


     2 .listmacroall
000000D0      2       ??001F:                 
= ??001F      2         __counter__loop__label__ equ <??001F>
     2 .listmacro
000000D0  68 00000001 R     1 push offset Src1
000000D5  68 0000D702 R     1 push offset Src2
000000DA  E8 00003ADC      1 call StrCmpSSE
000000DF  83 2D 0001D5FC R  2         sub   __counter__loop__counter__, 1
   01
000000E6  75 E8      2         jnz   __counter__loop__label__
000000E8  33 C0      2         xor   eax, eax


Of course, turning off .list removes all the code from the listing, leaving only the symbols, and removing the ML option /Fl means no .lst file is produced at all.
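
For anyone who wants to double-check the jump arithmetic above, here is a minimal C sketch of how a short jnz (opcode 75 followed by one displacement byte) resolves its target. The function name short_jump_target is made up for this illustration and is not part of the timing macros.

```c
#include <stdint.h>

/* Hypothetical helper, for illustration only.
   next_ip: offset of the instruction that follows the 2-byte jnz.
   rel8:    raw displacement byte as shown in the listing (e.g. E8). */
static uint32_t short_jump_target(uint32_t next_ip, uint8_t rel8)
{
    /* The displacement is signed: bytes 80h..FFh mean -128..-1,
       so E8h is -24 (-18h), not +232. */
    int32_t disp = (rel8 >= 0x80) ? (int32_t)rel8 - 0x100 : (int32_t)rel8;

    /* Unsigned wrap-around makes adding a negative displacement work. */
    return next_ip + (uint32_t)disp;
}
```

With the values from the listing, short_jump_target(0xE8, 0xE8) returns 0xD0, which is the offset of ??001F.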

Dave.

ecube

MichaelW, I have been using a lot of Nt functions lately and came across NtDelayExecution. Its significance over the regular WinAPI Sleep is the resolution: the delay is given in 100-ns units. There are 1,000,000 nanoseconds in 1 millisecond, so this is much finer than Sleep's 10 ms delay, which could make your macros that much more accurate.
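
To put numbers on the units, here is a rough C sketch of the conversion from milliseconds to 100-ns units. It only does the arithmetic and does not call NtDelayExecution. The helper name ms_to_relative_delay is hypothetical, and the negative sign assumes the usual convention for this undocumented call that a negative interval means a delay relative to now.

```c
#include <stdint.h>

/* 1 ms = 1,000,000 ns = 10,000 units of 100 ns each. */
#define UNITS_PER_MS 10000

/* Hypothetical helper, for illustration only.
   Converts a delay in milliseconds to a count of 100-ns units.
   The result is negated on the assumption that a negative
   interval is treated as relative rather than absolute. */
static int64_t ms_to_relative_delay(uint32_t ms)
{
    return -((int64_t)ms * UNITS_PER_MS);
}
```

So a 1 ms delay is -10000 units, and the 10 ms mentioned for Sleep is -100000 units.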

oex

There is some interesting info on timing posted here:

http://www.masm32.com/board/index.php?topic=14031.0

NtDelayExecution wasn't suggested, but other 100 ns timing was; it may be of interest for comparison.
We are all of us insane, just to varying degrees and intelligently balanced through networking

http://www.hereford.tv

dedndave

i don't think Michael's code relies on the resolution of Sleep
he uses Sleep simply to relinquish execution until the next time-slice
it may still provide a slight improvement

milomir

pentium dual-core e5500 @3.2ghz
Quote
20 cycles
24 cycles
60 ms
75 ms

Quote
HIGH_PRIORITY_CLASS
-14 cycles, empty
-14 cycles, mov eax,1
-14 cycles, mov eax,1 mov eax,2
-14 cycles, nops 4
-14 cycles, mul ecx
-14 cycles, rol ecx,32
0 cycles, rcr ecx,31
14 cycles, div ecx
14 cycles, StrLen

REALTIME_PRIORITY_CLASS
0 cycles, empty
-210 cycles, mov eax,1
-14 cycles, mov eax,1 mov eax,2
-182 cycles, nops 4
-14 cycles, mul ecx
-28 cycles, rol ecx,32
0 cycles, rcr ecx,31
14 cycles, div ecx
14 cycles, StrLen

dedndave

what code are you running ?
there may be some things we can do to improve the results

ok - i think i found it
it is either c2test2x.exe or c2test2x-idle.exe

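try placing this code at the beginning of the program, then re-assemble:
        INVOKE  GetCurrentProcess               ; pseudo-handle for this process, returned in eax
        INVOKE  SetProcessAffinityMask,eax,1    ; mask 1 = run on the first core only
        INVOKE  Sleep,750                       ; give the affinity change 750 ms to take effect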


my machine is known for turning out unusual results
Michael suggested using 500 ms
but, on my machine, if i wait 750 ms for it to bind, i get much nicer numbers