The MASM Forum Archive 2004 to 2012
Welcome, Guest. Please login or register.
October 01, 2020, 05:04:17 PM

Login with username, password and session length
Search:     Advanced search
128553 Posts in 15254 Topics by 684 Members
Latest Member: mottt
* Home Help Search Login Register
+  The MASM Forum Archive 2004 to 2012
|-+  General Forums
| |-+  The Workshop
| | |-+  Fast bitwise swap?
« previous next »
Pages: 1 [2] 3 Print
Author Topic: Fast bitwise swap?  (Read 17386 times)
dedndave
Member
*****
Posts: 12523


Re: Fast bitwise swap?
« Reply #15 on: November 18, 2009, 02:47:12 PM »

i would think a 256 byte LUT would be a good comprimise between size and speed
really, how fast does this thing need to be ?
just my 2 cents - lol
now, you are 2 cents richer, i guess   BigGrin
don't spend it all at once
« Last Edit: November 18, 2009, 03:56:01 PM by dedndave » Logged
drizz
Member
*****
Gender: Male
Posts: 628



Re: Fast bitwise swap?
« Reply #16 on: November 19, 2009, 01:02:31 AM »

Hello everyone,
Cool
Code:
MirrorEAX:
bswap eax
mov edx,00001111000011110000111100001111b
and edx,eax
and eax,11110000111100001111000011110000b
shl edx,4
shr eax,4
or eax,edx
mov edx,00110011001100110011001100110011b
and edx,eax
and eax,11001100110011001100110011001100b
shr eax,2
lea eax,[edx*4+eax]; shl edx,2; or eax,edx
mov edx,01010101010101010101010101010101b
and edx,eax
and eax,10101010101010101010101010101010b
shr eax,1
lea eax,[edx*2+eax]; shl edx,1; or eax,edx
Logged

The truth cannot be learned ... it can only be recognized.
dedndave
Member
*****
Posts: 12523


Re: Fast bitwise swap?
« Reply #17 on: November 19, 2009, 01:55:54 AM »

very cool, Drizz   ThumbsUp
i will have to remember that one - lol
did you just make that up ? - or have you used it before ?
Logged
MichaelW
Global Moderator
Member
*****
Gender: Male
Posts: 5161


Re: Fast bitwise swap?
« Reply #18 on: November 19, 2009, 03:57:06 AM »

The attachment tests everything except the SSE2 code, all for 32-bit values and encapsulated in procedures without a stack frame.
Cycle counts on a P3:
Code:
27 cycles, qword 256-byte-lut, 313 bytes
101 cycles, dave shr-rcl-loop, 19 bytes
12 cycles, drizz stir-fry, 64 bytes
13 cycles, drizz/lingo, 68 bytes

* BitwiseSwap.zip (3.05 KB - downloaded 255 times.)
« Last Edit: November 19, 2009, 07:26:57 AM by MichaelW » Logged

eschew obfuscation
sinsi
Member
*****
Gender: Male
Posts: 1758


RIP Bodie 1999-2011


Re: Fast bitwise swap?
« Reply #19 on: November 19, 2009, 04:06:31 AM »

Q6600
Code:
7 cycles, qword 256-byte-lut, 313 bytes
63 cycles, dave shr-rcl-loop, 19 bytes
6 cycles, drizz stir-fry, 64 bytes

One day drizz you will comment your code so we know what in the hell you are on about  BigGrin very cool.
Logged

Light travels faster than sound, that's why some people seem bright until you hear them.
lingo
Member
*****
Posts: 625



Re: Fast bitwise swap?
« Reply #20 on: November 19, 2009, 07:08:51 AM »

MichaelW,
Please include this in your test as a bitswap_drizz_1:(not tested) wink
Code:
align 16
bitswap_drizz proc ddval:DWORD
pop ecx
pop eax
bswap        eax
mov edx,eax
and eax,11110000111100001111000011110000b
shr eax,4
and edx,00001111000011110000111100001111b
shl edx,4
mov ecx, 00110011001100110011001100110011b
or eax, edx
mov edx, eax
and eax,11001100110011001100110011001100b
shr eax,2
and edx,ecx
lea eax,[edx*4+eax]; shl edx,2; or eax,edx
mov ecx,01010101010101010101010101010101b
        mov             edx,eax
        and eax,10101010101010101010101010101010b
        shr eax,1
and edx,ecx
lea eax,[edx*2+eax]; shl edx,1; or eax,edx
jmp dword ptr [esp-2*4]
Logged
sinsi
Member
*****
Gender: Male
Posts: 1758


RIP Bodie 1999-2011


Re: Fast bitwise swap?
« Reply #21 on: November 19, 2009, 07:26:09 AM »

Code:
7 cycles, qword 256-byte-lut, 313 bytes
63 cycles, dave shr-rcl-loop, 19 bytes
6 cycles, drizz stir-fry, 68 bytes
Replaced. 4 bytes more, jj will not be happy, jan.

edit:replaced into the drizz stir-fry, sorry lingo. redface
Logged

Light travels faster than sound, that's why some people seem bright until you hear them.
lingo
Member
*****
Posts: 625



Re: Fast bitwise swap?
« Reply #22 on: November 19, 2009, 02:30:08 PM »

sinsi,
Please test it for me too..:
The code is by bitRake and Nexo wink
Code:

align 16
bitNexo proc ddval:DWORD
pop  edx
pop  eax
mov  edx,eax
shr  eax,1
and  edx,055555555h
and  eax,055555555h
lea  eax,[2*edx+eax]
mov  edx,eax
shr  eax,2
and  edx,033333333h
and  eax,033333333h
lea  eax,[4*edx+eax]
mov  edx,eax
shr  eax,4
and  edx,0F0F0F0Fh
and  eax,0F0F0F0Fh
shl  edx,4
add  eax,edx
bswap eax
jmp  dword ptr [esp-2*4]
Logged
drizz
Member
*****
Gender: Male
Posts: 628



Re: Fast bitwise swap?
« Reply #23 on: November 19, 2009, 02:37:56 PM »

did you just make that up ? - or have you used it before ?
I remembered that i used similar technique for bit count (popcnt), so I just applied that for mirroring.
One day drizz you will comment your code so we know what in the hell you are on about  BigGrin very cool.
Code:
0123 4567 89AB CDEF | GHIJ KLMN OPQR STUV
------------------------------------------- swap 16 bits _
GHIJ KLMN OPQR STUV | 0123 4567 89AB CDEF                 \ bswap
------------------------------------------- swap 8 bits  _/
OPQR STUV GHIJ KLMN | 89AB CDEF 0123 4567
------------------------------------------- swap 4 bits
STUV OPQR KLMN GHIJ | CDEF 89AB 4567 0123
------------------------------------------- swap 2 bits
UVST QROP MNKL IJGH | EFCD AB89 6745 2301
------------------------------------------- swap 1 bit
VUTS RQPO NMLK JIHG | FEDC BA98 7654 3210
Logged

The truth cannot be learned ... it can only be recognized.
dedndave
Member
*****
Posts: 12523


Re: Fast bitwise swap?
« Reply #24 on: November 19, 2009, 03:09:36 PM »

        lea     [reg32+n*reg32]

for n <> 1, it isn't a very fast instruction on many processors
Logged
lingo
Member
*****
Posts: 625



Re: Fast bitwise swap?
« Reply #25 on: November 19, 2009, 03:51:07 PM »

You can try without lea..:
Code:
align 16
swapbits proc ddval:DWORD
pop  ecx
pop  eax
        bswap   eax           
mov     edx,eax       
and     eax,00F0F0F0Fh
xor     edx,eax       
rol     eax,8           
or      eax,edx
mov     edx,eax
and     eax,033333333h
xor     edx,eax
rol     eax,4
or      eax,edx       
mov     edx,eax       
and     eax,055555555h
xor     edx,eax
rol     eax,2
or      eax,edx
ror     eax,7           
jmp     ecx
Logged
dedndave
Member
*****
Posts: 12523


Re: Fast bitwise swap?
« Reply #26 on: November 19, 2009, 05:35:59 PM »

i like how you TRIED to sneak those XOR's in there on us - lol
that is the best non-simd code yet, i think
Logged
sinsi
Member
*****
Gender: Male
Posts: 1758


RIP Bodie 1999-2011


Re: Fast bitwise swap?
« Reply #27 on: November 20, 2009, 07:21:40 AM »

Code:
align 16
swapbits proc ddval:DWORD
pop ecx
pop eax
...
jmp ecx
Just a question, aren't cpu's optimised for call/ret?
Wouldn't a simple "mov eax,[esp+4]" with a "ret 4" be quicker?
Logged

Light travels faster than sound, that's why some people seem bright until you hear them.
jj2007
Member
*****
Gender: Male
Posts: 6011



Re: Fast bitwise swap?
« Reply #28 on: November 20, 2009, 08:00:42 AM »

The attachment tests everything except the SSE2 code, all for 32-bit values and encapsulated in procedures without a stack frame.
Cycle counts on a P3:
Code:
27 cycles, qword 256-byte-lut, 313 bytes
101 cycles, dave shr-rcl-loop, 19 bytes
12 cycles, drizz stir-fry, 64 bytes
13 cycles, drizz/lingo, 68 bytes


Code:
bitswap_qword endp
size_qword = $-bitswap_qword+256
Interesting technique to get the code size, thanks Michael. It fails for ml 9.0 but JWasm and ml 6.14 are fine.
My timings on Celeron M:
Code:
9 cycles, qword 256-byte-lut, 313 bytes
114 cycles, dave shr-rcl-loop, 19 bytes
9 cycles, drizz stir-fry, 64 bytes
9 cycles, drizz/lingo, 68 bytes
New algos proposed by Lingo:
Code:
9       cycles for swapbits
7       cycles for bitNexo
9       cycles for bitswap_drizz
Logged

lingo
Member
*****
Posts: 625



Re: Fast bitwise swap?
« Reply #29 on: November 21, 2009, 03:37:46 PM »

"Wouldn't a simple "mov eax,[esp+4]" with a "ret 4" be quicker?"

With my CPU NO...I like drizz's algo so, just copied and changed "mov eax,[esp+4]"&"ret 4"
with "pop ecx"&"jmp ecx" and you can see the difference:
Code:

C:\My Documents\ASM\bitswap>test
00010010 00110100 01010110 01111000
00011110 01101010 00101100 01001000
00011110 01101010 00101100 01001000
00011110 01101010 00101100 01001000

8 cycles, qword 256-byte-lut, 313 bytes
6 cycles, drizz stir-fry, 64 bytes
3 cycles, drizzNew stir-fry, 61 bytes

Press any key to exit...

3 cycles and 3 bytes less. wink

* bitswap.zip (2.73 KB - downloaded 275 times.)
Logged
Pages: 1 [2] 3 Print 
« previous next »
Jump to:  

Powered by MySQL Powered by PHP The MASM Forum Archive 2004 to 2012 | Powered by SMF 1.0.12.
© 2001-2005, Lewis Media. All Rights Reserved.
Valid XHTML 1.0! Valid CSS!