ASM for FUN - #5 SUB [Displaying random numbers generated]

Started by frktons, April 25, 2010, 06:17:46 AM


frktons

After generating the sequence of random numbers in the previous SUB,
it's now time to "see" what we have produced. It's always better to check that everything
is in place before moving on to the difficult STEP #6.  :P

Working on #5 right now.

Enjoy
Mind is like a parachute. You know what to do in order to use it :-)

frktons

Here we are with the generating package and the displaying one.
In the attached file:

1) RandGen3.bas  -- Source code
2) RandGen3.exe  -- the executable; it generates the numbers and creates the file RandGen.dat. Run this one first
3) RandGen.scn    -- Screen format to display the CPU cycles and the milliseconds elapsed while generating the numbers
4) RandGenView.bas -- Source code
5) RandGenView.exe -- executable to "see" what we produced in the first step
6) RandGenView.scn -- Screen format to display the groups of numbers, with navigation keys

In RandGenView.exe, pressing PgUp takes you back 100 datarec, and PgDown takes you forward 100 datarec.

Now we go back to SUB #4 for the optimization task.

If there is any inconsistency, please let me know.

Cheers

Frank

dedndave

i come up with 1736 clock cycles per 4-value group on my prescott
i am guessing we should be able to get that down to something like 100 clock cycles

dang - i ran it several more times and can't get under 2175 with it for some reason

frktons

Quote from: dedndave on April 25, 2010, 02:26:21 PM
i come up with 1736 clock cycles per 4-value group on my prescott
i am guessing we should be able to get that down to something like 100 clock cycles

dang - i ran it several more times and can't get under 2175 with it for some reason

If we get down to 100 cycles per 4-value group, that will be great. I'll try to come up with
something myself as algorithms come to mind  ::)

dedndave

that's only an "order of magnitude" estimation
it may be as high as 200

frktons

Quote from: dedndave on April 25, 2010, 04:27:43 PM
that's only an "order of magnitude" estimation
it may be as high as 200

While I was looking for a good idea, I tried isolating just the RND function and calling it
320,000 times to see how it performs. It turned out to be very slow, as you
already knew.

The RND function alone takes about 600 CPU cycles per 4-value group on my Core Duo processor.
That is quite a lot.  ::)

dedndave

it probably generates some very "random" data, too (i.e., it's a good generator)
the one we use may not be as good
but, we only need to generate 320,000 random numbers
these generators repeat a pattern after so many pulls
that's not the only measure of a random number generator, though
for what we want, we can sacrifice some randomness for some speed

EDIT - that brings up another point
random numbers may not be the best approach for testing what you want to do
because, so much of the performance measurement is based on the generator
in the real-world application, these values probably come from a file or something
it would be more realistic to test file reading speed than random number generation   :bg
you could remove the random number generation from your test completely
they might be pre-generated or even set to 0
then, you would be spending time optimizing the part of the code that will actually be used
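
A rough C sketch of that suggestion, assuming the values can be prepared up front: the buffer is filled once outside the timed region, and only the processing step is measured. The names `fill_values`, `process_groups` and `time_processing` are made up for this illustration, the summing loop is just a placeholder for the real work, and `clock()` is a much coarser stand-in for the CPU-cycle counts used in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#define N_VALUES 320000u   /* same count as in the thread */

static uint32_t values[N_VALUES];

/* Fill the buffer once, outside the timed region. A fixed pattern
   (1..50 repeating) stands in for data read from a file. */
static void fill_values(void)
{
    for (size_t i = 0; i < N_VALUES; i++)
        values[i] = (uint32_t)(i % 50u) + 1u;
}

/* Hypothetical placeholder for the real per-group work: sums every
   value, walking the buffer in groups of four. */
static uint64_t process_groups(const uint32_t *v, size_t n)
{
    uint64_t total = 0;
    assert(n % 4 == 0);
    for (size_t i = 0; i < n; i += 4)
        total += (uint64_t)v[i] + v[i + 1] + v[i + 2] + v[i + 3];
    return total;
}

/* Times only the processing step; returns elapsed CPU seconds. */
static double time_processing(uint64_t *result)
{
    clock_t start = clock();
    *result = process_groups(values, N_VALUES);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Call `fill_values()` once, then `time_processing(&result)` as many times as needed; the generator no longer shows up in the measurement at all.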

frktons

Quote from: dedndave on April 25, 2010, 05:13:44 PM
it probably generates some very "random" data, too (i.e., it's a good generator)
the one we use may not be as good
but, we only need to generate 320,000 random numbers
these generators repeat a pattern after so many pulls
that's not the only measure of a random number generator, though
for what we want, we can sacrifice some randomness for some speed

EDIT - that brings up another point
random numbers may not be the best approach for testing what you want to do
because, so much of the performance measurement is based on the generator
in the real-world application, these values probably come from a file or something
it would be more realistic to test file reading speed than random number generation   :bg
you could remove the random number generation from your test completely
they might be pre-generated or even set to 0
then, you would be spending time optimizing the part of the code that will actually be used

Agreed. I think we can move on to STEP #6, where we can probably apply some ASM tricks
to what I have in mind.  :P

MichaelW

This one should be much faster, 18 cycles on my P3:

rand32 proc
    mov eax, rand_seed
    mov ecx, 16807              ; a = 7^5
    mul ecx                     ; edx:eax == a*seed == D:A
    mov ecx, 7fffffffh          ; ecx = m
    add edx, edx                ; edx = 2*D
    cmp eax, ecx                ; eax = A
    jna @F
    sub eax, ecx                ; if A>m, A = A - m
  @@:
    add eax, edx                ; eax = A + 2*D
    jns @F
    sub eax, ecx                ; If (A + 2*D)>m
  @@:
    mov rand_seed, eax          ; save new seed
    ret
rand32  endp


It is the Rand32 code posted by Abel here, without the scaling code.
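
For readers who don't follow the assembly, here is a minimal C sketch of what the routine above appears to compute: one step of the Park-Miller "minimal standard" generator, seed = 16807 * seed mod (2^31 - 1), done without a division. The name `rand31_next` and passing the seed as a parameter (instead of the global `rand_seed`) are choices made for this sketch, not part of the posted code.

```c
#include <assert.h>
#include <stdint.h>

#define RAND_A 16807u        /* multiplier a = 7^5, as in the assembly */
#define RAND_M 0x7fffffffu   /* modulus m = 2^31 - 1 */

/* One generator step: returns (RAND_A * seed) mod RAND_M.
   The seed must be in 1..RAND_M-1; a seed of 0 would stay 0 forever. */
static uint32_t rand31_next(uint32_t seed)
{
    assert(seed > 0 && seed < RAND_M);

    uint64_t product = (uint64_t)seed * RAND_A;     /* the EDX:EAX pair */
    uint32_t low  = (uint32_t)(product & RAND_M);   /* low 31 bits */
    uint32_t high = (uint32_t)(product >> 31);      /* remaining high bits */

    /* Because 2^31 is congruent to 1 (mod m), product mod m equals
       low + high, minus m at most once. */
    uint32_t sum = low + high;
    if (sum >= RAND_M)
        sum -= RAND_M;
    return sum;
}
```

Starting from a seed of 1, the first two results are 16807 and 282475249, and the value after 10,000 steps is 1043618065, the check value usually quoted for this generator.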
eschew obfuscation

frktons

Quote from: MichaelW on April 25, 2010, 11:41:00 PM
This one should be much faster, 18 cycles on my P3:

It is the Rand32 code posted by Abel here, without the scaling code.


Nice one, I'll have a look at that. I'm quite slow at learning, so
it could take a while.

Only one question: does it take 18 cycles for each generated number?
That is much faster than the RND function in PB  :U

MichaelW

Quote
Only one question: it takes 18 cycles for each generated number?

Yes, and scaling the value with DIV would more than double the cycles. I eliminated the scaling because I was experimenting with extracting 4 numbers from each 32-bit value, with each of the numbers scaled to a limited range (1 to 50). The problem is with doing this efficiently and in a way that will result in a uniform distribution of the values. I was hoping to somehow combine the extraction operation with the scaling operation, but I'm nowhere near anything workable.
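
One way to picture the problem, as a rough C sketch rather than anything posted in this thread: cut the 31-bit result into four 7-bit fields (0 to 127) and keep a field only when it is below 100, so that `field % 50` stays uniform. The function name `extract_1_to_50`, the 7-bit field width and the rejection step are all assumptions made for this illustration. The rejection means fewer than four values may come out of one 32-bit value, and whether the individual bit fields of this generator are random enough has not been checked here.

```c
#include <assert.h>
#include <stdint.h>

/* Splits the low 28 bits of `value` into four 7-bit fields and writes
   the accepted results (each in 1..50) to `out`. Returns how many were
   written (0 to 4). Fields of 100..127 are rejected: 128 is not a
   multiple of 50, so reducing them too would favor some results. */
static int extract_1_to_50(uint32_t value, int out[4])
{
    int count = 0;
    assert(out != 0);
    for (int i = 0; i < 4; i++) {
        uint32_t field = (value >> (7 * i)) & 0x7fu;   /* 0..127 */
        if (field < 100u)                              /* 100 = 2 * 50 */
            out[count++] = (int)(field % 50u) + 1;
    }
    return count;
}
```

The `% 50` here is by a constant, which compilers typically turn into a multiply and shift rather than a DIV, but on average about a fifth of the fields are thrown away (28 of 128), so this is not yet the efficient combined extract-and-scale step described above.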


frktons

Quote from: MichaelW on April 26, 2010, 01:40:56 AM
Quote
Only one question: it takes 18 cycles for each generated number?

Yes, and scaling the value with DIV would more than double the cycles. I eliminated the scaling because I was experimenting with extracting 4 numbers from each 32-bit value, with each of the numbers scaled to a limited range (1 to 50). The problem is with doing this efficiently and in a way that will result in a uniform distribution of the values. I was hoping to somehow combine the extraction operation with the scaling operation, but I'm nowhere near anything workable.



The idea is interesting. Let me know if you find something doable and fast enough.  :U