News:

MASM32 SDK Description, downloads and other helpful links
MASM32.com New Forum Link
masmforum WebSite

Counting 1s in a Bit-Stream

Started by Torben, November 10, 2010, 04:11:38 PM

Previous topic - Next topic

FORTRANS

Hi,

   PIII

Regards,

Steve

pre-P4 (SSE1)
28      cycles for Clive, result=4
34      cycles for Dave, result=4
16      cycles for Jochen, result=4
19      cycles for Brute1, result=4
19      cycles for Brute2, result=4

28      cycles for Clive, result=4
34      cycles for Dave, result=4
16      cycles for Jochen, result=4
19      cycles for Brute1, result=4
19      cycles for Brute2, result=4


--- ok ---

dedndave

you guys just enjoy making me look bad   :lol

MichaelW

For doing a single byte at a time, why not a table lookup?
eschew obfuscation

dedndave

yah - 256 bytes is a bit much - lol
but, you could split the byte in 2 and have a 16 byte LUT
it would probably rival the shake-n-bake method

jj2007

Popcount2.zip:
Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
16      cycles for Clive, result=4
22      cycles for Dave, result=4
16      cycles for Jochen, result=4
14      cycles for Brute1, result=4
15      cycles for Brute2, result=4

jj2007

#20
Quote from: MichaelW on November 10, 2010, 07:09:23 PM
For doing a single byte at a time, why not a table lookup?

.data
pcTable db 0,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4,1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5
db 1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5,2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6
db 1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5,2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6
db 2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7
db 1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5,2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6
db 2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7
db 2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7
db 3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7,4,5,5,6,5,6,6,7,5,6,6,7,6,7,7,8
...

mov eax, Source
movzx eax, byte ptr pcTable[eax]


Certainly fast - one cycle on my Celeron.

Here is the DWORD version, running in 7 cycles:
push ecx
push ebx
mov ebx, Source
movzx ecx, bh
movzx eax, byte ptr pcTable[ecx]
movzx ecx, bl
movzx edx, byte ptr pcTable[ecx]
bswap ebx
movzx ecx, bh
add eax, edx
movzx edx, byte ptr pcTable[ecx]
add eax, edx
movzx ecx, bl
movzx edx, byte ptr pcTable[ecx]
add eax, edx
pop ebx
pop ecx

drizz

algo from hackers delight (http://www.hackersdelight.org/)
mov eax,123

imul eax,08040201h
shr eax,3
and eax,11111111h
imul eax,11111111h
shr eax,28


The truth cannot be learned ... it can only be recognized.


Antariy

Used not proper constant at my test...  :green2

Dave,  :P

Antariy

Quote from: dedndave on November 11, 2010, 12:52:11 AM
i don't see how you'd get anything but 0   :lol

Dave, BYTEs, not DWORDs. Put 0FFh into EAX, not 100h or 0FFFFFFFFh

drizz

Quote from: Antariy on November 11, 2010, 12:42:17 AM
Which is intention and usage of the algo?
Byte quantities, I thought it was obvious.

Quote from: Antariy on November 11, 2010, 12:42:17 AM
For 123T I got 00654321h.
it's impossible to get that after "SHR 28". You obviusly have different radix set.

123 is sample byte value
The truth cannot be learned ... it can only be recognized.

dedndave

ahhhhhhhhh - i see where i went wrong
it is 11111111h - not 11111111b
i saw all the ones and was thinking binary - lol

Antariy

Dave entangle hex with binary, I'm used hex constant instead of decimal  :P

dedndave

i would have seen it, had i looked in a debugger - lol

i saw "11111111" and thought in my head "0FFh"

Mirno

For longer than one byte (which is short enough to do tricks or LUTS with) the fastest way has always been something like:

  xor ecx, ecx
@@:
  lea edx, [eax - 1]
  inc ecx
  and eax, edx
  jnz @B


It should only loop once for each set bit, so on average it should beat the slightly simpler shr/adc as it must loop through zeros at the start of it's number.
As always, the "best" method depends on the "average" data (and also what you can bear as a worst case).

Thanks,

Mirno