
An Easter egg for DednDave

Started by jj2007, April 11, 2012, 11:00:12 PM


jj2007

... who refuses to push flags because it's so awfully slow. Here is a workaround :bg

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
2608    cycles for pushf/popf
306     cycles for lahf/sahf
893     cycles for lahf/sahf with push/pop eax

2607    cycles for pushf/popf
306     cycles for lahf/sahf
892     cycles for lahf/sahf with push/pop eax

2608    cycles for pushf/popf
306     cycles for lahf/sahf
882     cycles for lahf/sahf with push/pop eax

dedndave

 :bg

LAHF/PUSHFD aren't too bad - it's SAHF/POPFD that are slow - the difference is that the latter pair writes the flags
also - CLD/STD are slow and probably STI/CLI (never tried those - lol)
STC, CLC, and CMC seem to be ok   :U

dedndave


prescott w/htt
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
9693    cycles for pushf/popf
851     cycles for lahf/sahf
1405    cycles for lahf/sahf with push/pop eax

9726    cycles for pushf/popf
847     cycles for lahf/sahf
1405    cycles for lahf/sahf with push/pop eax

9685    cycles for pushf/popf
845     cycles for lahf/sahf
1404    cycles for lahf/sahf with push/pop eax


i am still waiting for the egg   :bg

jj2007

Quote from: dedndave on April 11, 2012, 11:13:10 PM
i am still waiting for the egg   :bg

Hey, I gave you a factor 11 speed up for your prescott w/handbrakes on :bg


FORTRANS

Hi,

   Another set of data points.

Cheers,

Steve


pre-P4 (SSE1)
2521 cycles for pushf/popf
301 cycles for lahf/sahf
1208 cycles for lahf/sahf with push/pop eax

2521 cycles for pushf/popf
301 cycles for lahf/sahf
1208 cycles for lahf/sahf with push/pop eax

2522 cycles for pushf/popf
301 cycles for lahf/sahf
1209 cycles for lahf/sahf with push/pop eax


--- ok ---

pre-P4
919 cycles for pushf/popf
613 cycles for lahf/sahf
715 cycles for lahf/sahf with push/pop eax

918 cycles for pushf/popf
614 cycles for lahf/sahf
718 cycles for lahf/sahf with push/pop eax

924 cycles for pushf/popf
614 cycles for lahf/sahf
715 cycles for lahf/sahf with push/pop eax


--- ok ---

Intel(R) Pentium(R) M processor 1.70GHz (SSE2)
2727 cycles for pushf/popf
309 cycles for lahf/sahf
696 cycles for lahf/sahf with push/pop eax

2728 cycles for pushf/popf
309 cycles for lahf/sahf
721 cycles for lahf/sahf with push/pop eax

2727 cycles for pushf/popf
309 cycles for lahf/sahf
680 cycles for lahf/sahf with push/pop eax


--- ok ---

TASMUser

In case somebody is interested in this...

AMD Athlon(tm) II X4 635 Processor (SSE3)
1495 cycles for pushf/popf
395 cycles for lahf/sahf
883 cycles for lahf/sahf with push/pop eax

1500 cycles for pushf/popf
395 cycles for lahf/sahf
883 cycles for lahf/sahf with push/pop eax

1501 cycles for pushf/popf
394 cycles for lahf/sahf
883 cycles for lahf/sahf with push/pop eax


--- ok ---


jj2007

Thanks. So it seems that on all CPUs tested here, a lahf/sahf combination is a faster way to save the flags, often by a wide margin. Good to know :P
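
To put a number on "faster", here is a small sketch that divides the pushf/popf cycle count by the lahf/sahf cycle count for each CPU posted above. The figures are copied from the first run in each post; the CPU labels and the helper name `speedup` are only for illustration and are not part of the original test program.

```python
# Cycle counts taken from the first run each poster reported in this thread:
# (pushf/popf cycles, lahf/sahf cycles)
results = {
    "Celeron M 420": (2608, 306),
    "Pentium 4 Prescott": (9693, 851),
    "pre-P4 (SSE1)": (2521, 301),
    "pre-P4": (919, 613),
    "Pentium M 1.70GHz": (2727, 309),
    "Athlon II X4 635": (1495, 395),
}


def speedup(pushf_cycles: int, lahf_cycles: int) -> float:
    """Hypothetical helper: how many times faster lahf/sahf ran than pushf/popf."""
    return pushf_cycles / lahf_cycles


for cpu, (pushf_cycles, lahf_cycles) in results.items():
    print(f"{cpu:20s} {speedup(pushf_cycles, lahf_cycles):5.1f}x")
```

On these numbers the ratio ranges from roughly 1.5x on the second pre-P4 machine to roughly 11.4x on the Prescott, so the gain is real everywhere but far from uniform.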

dedndave

yah - if only they stored the overflow flag   ::)
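
The overflow flag remark can be illustrated with a small model of the EFLAGS layout. This is a sketch in Python, not the code that was timed: the bit positions are the architectural ones (CF=0, PF=2, AF=4, ZF=6, SF=7, OF=11), while the function names `lahf`, `sahf` and `flag` and the starting flag state are made up for the example.

```python
# Bit positions in the x86 EFLAGS register.
CF, PF, AF, ZF, SF, OF = 0, 2, 4, 6, 7, 11

# LAHF/SAHF only transfer these five status flags, all in the low byte.
LAHF_MASK = (1 << SF) | (1 << ZF) | (1 << AF) | (1 << PF) | (1 << CF)


def lahf(eflags: int) -> int:
    """Model LAHF: return the AH value, laid out as SF:ZF:0:AF:0:PF:1:CF."""
    return (eflags & LAHF_MASK) | 0b10  # bit 1 always reads as 1


def sahf(eflags: int, ah: int) -> int:
    """Model SAHF: overwrite SF, ZF, AF, PF, CF from AH; leave other bits alone."""
    return (eflags & ~LAHF_MASK) | (ah & LAHF_MASK)


def flag(eflags: int, bit: int) -> bool:
    """Return True if the given flag bit is set."""
    return bool((eflags >> bit) & 1)


# Hypothetical starting state: carry, zero and overflow set.
before = (1 << CF) | (1 << ZF) | (1 << OF)
saved_ah = lahf(before)

# Pretend the intervening code cleared every status flag.
clobbered = 0
after = sahf(clobbered, saved_ah)

print(f"AH after lahf: {saved_ah:08b}")
print("CF restored:", flag(after, CF))
print("ZF restored:", flag(after, ZF))
print("OF restored:", flag(after, OF))
```

OF sits at bit 11, outside the byte that LAHF copies, so it does not survive the round trip. A commonly used workaround, not benchmarked in this thread, is to capture OF separately with seto al after lahf and to recreate it with add al, 127 just before sahf.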

hutch--

Isn't this technology something like trying to tune the last 2% of performance out of a T model Ford ?
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

jj2007

Quote from: hutch-- on April 13, 2012, 06:13:59 AM
Isn't this technology something like trying to tune the last 2% of performance out of a T model Ford ?

It is true that there is rarely a reason to save and restore flags in a speed-critical loop. On the other hand, 3 instead of 15 cycles is a good argument. Unless you have a better technology to offer, of course.
