News:

MASM32 SDK Description, downloads and other helpful links
MASM32.com New Forum Link
masmforum WebSite

Random Generator

Started by Neil, June 17, 2009, 12:44:13 PM

Previous topic - Next topic

Antariy

Quote from: FORTRANS on November 05, 2010, 10:09:30 PM
   Do you mean you want it to output a limited set of integers?

Yes, Steve. Just at programming we more frequentrly have need in limited output value, instead of "true random long pad". For generation of the coordinates etc. That makes algo more useful in general usage.



Alex

dedndave

Alex,
you can add some clipping/scaling (i like to clip the ends off) code that can be used with any generator
the scaling code wouldn't change, and thus not affect the timing

this would be a great routine to add some auto-reseed code, too
i seem to recall we played with MWC in the forum before

Antariy

Quote from: dedndave on November 05, 2010, 10:19:13 PM
you can add some clipping/scaling (i like to clip the ends off) code that can be used with any generator
the scaling code wouldn't change, and thus not affect the timing

this would be a great routine to add some auto-reseed code, too
i seem to recall we played with MWC in the forum before

Dave, thanks for advice, but I know that :P
Just I'm dislike constructions, which is convert something to something other  :lol If algo can do this internally - that would be better.



Alex

dedndave

SeedCnt dd      1
Rnd32lo dd      ?
Rnd32hi dd      ?


ASeed   PROC

        dec     SeedCnt
        jnz     ASeed0

        push    edx
        push    eax
        INVOKE  QueryPerformanceCounter,esp
        pop     Rnd32hi
        pop     eax
        mov     Rnd32lo,eax
        and     eax,07Fh
        or      al,80h
        mov     SeedCnt,eax

ASeed0: mov     edx,1517746329
        mov     eax,Rnd32lo
        mul     edx
        add     eax,Rnd32hi
        adc     edx,0
        mov     Rnd32lo,eax
        mov     Rnd32hi,edx
        ret

ASeed   ENDP

Antariy

The faster, simple to using, RNG with DWORD output is:

mov eax,[esp+123]


:P

:green2

The seed is "123" - next time needed to use something other  :green2



Alex

dedndave

what ? - you guys don't like my routine ?

Antariy

Quote from: dedndave on November 06, 2010, 01:52:39 AM
what ? - you guys don't like my routine ?

You mistaked - I like it.  :bg

Just mov eax,[esp+123] is faster :P



Alex

dedndave

#127
that thing will spit out some nice garbage   :P

here is a slight improvement...
SeedCnt dd      1
Rnd32hi dd      ?
Rnd32lo dd      ?


ASeed   PROC

        dec     SeedCnt
        jnz     ASeed0

        INVOKE  QueryPerformanceCounter,offset Rnd32hi
        movzx   eax,byte ptr Rnd32hi
        or      al,80h
        mov     SeedCnt,eax

ASeed0: mov     edx,1517746329
        mov     eax,Rnd32lo
        mul     edx
        add     eax,Rnd32hi
        adc     edx,0
        mov     Rnd32lo,eax
        mov     Rnd32hi,edx
        ret

ASeed   ENDP


EDIT: changed this...
        movzx   eax,byte ptr Rnd32lo
to this...
        movzx   eax,byte ptr Rnd32hi
for improved randomnicity

Antariy

Quote from: dedndave on November 06, 2010, 03:45:42 AM
that thing will spit out some nice garbage   :P

here is a slight improvement...

This is improvement of my algo :P


OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
align 16
Axrand_s:
Axrand proc STDCALL dwRange:DWORD
mov edx,Initial
mov eax,Initial
imul edx,12347
add edx,17
bswap edx
mov Initial,edx
mul dword ptr [esp+4]
mov eax,edx
ret 4
Axrand endp
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef


4 cycles faster on my CPU

ECX is not used intentionally.



Alex

Antariy

There is Jochen's test, with my slightly optimized algo, and with Steve's algo.

There is results which I gets:

Intel(R) Celeron(R) CPU 2.13GHz (SSE3)
16      cycles for Axrand
17      cycles for Axrand2
14      cycles for Axrand3
113     cycles for LimitRand
13      cycles for Rand()

15      cycles for Axrand
16      cycles for Axrand2
14      cycles for Axrand3
113     cycles for LimitRand
13      cycles for Rand()

15      cycles for Axrand
17      cycles for Axrand2
14      cycles for Axrand3
113     cycles for LimitRand
12      cycles for Rand()

14      cycles for Axrand
17      cycles for Axrand2
14      cycles for Axrand3
112     cycles for LimitRand
13      cycles for Rand()

13      cycles for Axrand
17      cycles for Axrand2
14      cycles for Axrand3
113     cycles for LimitRand
12      cycles for Rand()

37       bytes for Axrand
30       bytes for Axrand2
31       bytes for Axrand3
29       bytes for Rand()
Add 7-10 bytes per call


Working with carry on PIV is very slow  :eek
Steve, have a look - maybe something wrong?  :eek I only make returning and getting results standartized - get range from stack, and return number in EAX. But this must slowdown algo by ~2-4 clocks at maximum only.  :eek

Test attached in archive. Run it, please.



Alex

oex

AMD Sempron(tm) Processor 3100+ (SSE3)
2       cycles for Axrand
4       cycles for Axrand2
1       cycles for Axrand3
15      cycles for LimitRand
4       cycles for Rand()

2       cycles for Axrand
4       cycles for Axrand2
2       cycles for Axrand3
15      cycles for LimitRand
4       cycles for Rand()

2       cycles for Axrand
4       cycles for Axrand2
1       cycles for Axrand3
14      cycles for LimitRand
4       cycles for Rand()

2       cycles for Axrand
4       cycles for Axrand2
1       cycles for Axrand3
15      cycles for LimitRand
4       cycles for Rand()

2       cycles for Axrand
4       cycles for Axrand2
2       cycles for Axrand3
15      cycles for LimitRand
4       cycles for Rand()

37       bytes for Axrand
30       bytes for Axrand2
31       bytes for Axrand3
29       bytes for Rand()
Add 7-10 bytes per call

--- ok ---
We are all of us insane, just to varying degrees and intelligently balanced through networking

http://www.hereford.tv

Antariy

Quote from: oex on November 06, 2010, 10:11:21 PM
AMD Sempron(tm) Processor 3100+ (SSE3)

Hi, Peter!  :bg

AMD works with pushs-pops very well.
So, problem is not work with carry - it is - many memory references.



Alex

jj2007

Here is an update with my latest AxRand3 and Rand() versions. I have multiplied the loops by 10 to prevent unrealistically short timings. So the real timings are around 12...20 cycles:
Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
145     cycles for 10*Axrand
121     cycles for 10*Axrand3
201     cycles for 10*LimitRand
115     cycles for 10*Rand()

142     cycles for 10*Axrand
121     cycles for 10*Axrand3
200     cycles for 10*LimitRand
115     cycles for 10*Rand()

142     cycles for 10*Axrand
121     cycles for 10*Axrand3
201     cycles for 10*LimitRand
115     cycles for 10*Rand()

37       bytes for Axrand
27       bytes for Axrand3
25       bytes for Rand()
Add 7-10 bytes per call


New version of timings:
LOOP_COUNT = 1000000 ; One Mio would be a typical value
RepCt = 10 ; multiple timings prevent negative cycles ;-)
invoke Sleep, 50
counter_begin LOOP_COUNT, HIGH_PRIORITY_CLASS
REPEAT RepCt
push 1000
call Axrand
mov ecx, eax
ENDM
counter_end
print str$(eax), 9, "cycles for "
print str$(RepCt), "*Axrand", 13, 10


And here my latest version:
Axrand3 proc STDCALL dwRange:DWORD ; 27 bytes short
mov eax, Initial ; get the seed
imul eax, -127 ; randomise
add eax, -124 ; -127/-124 is the perfect randomness pair
bswap eax ; we need to shuffle a bit to get the high order bits
mov Initial, eax ; save for next round
mul dword ptr [esp+4] ; multiply Initial with range
mov eax, edx ; return random number (could also be returned in edx)
ret 4
Axrand3 endp

Antariy

That is time for Hutch to make great real-time test-bed.  :bg

Timings:


Intel(R) Celeron(R) CPU 2.13GHz (SSE3)
209     cycles for 10*Axrand
203     cycles for 10*Axrand3
253     cycles for 10*LimitRand
176     cycles for 10*Rand()

209     cycles for 10*Axrand
198     cycles for 10*Axrand3
253     cycles for 10*LimitRand
174     cycles for 10*Rand()

208     cycles for 10*Axrand
204     cycles for 10*Axrand3
251     cycles for 10*LimitRand
172     cycles for 10*Rand()

37       bytes for Axrand
27       bytes for Axrand3
25       bytes for Rand()
Add 7-10 bytes per call




Alex

dedndave

you didn't put mine in there ?   :(