Random Generator

Started by Neil, June 17, 2009, 12:44:13 PM

oex

AMD Sempron(tm) Processor 3100+ (SSE3)
144     cycles for 10*Axrand
99      cycles for 10*Axrand3
212     cycles for 10*LimitRand
102     cycles for 10*Rand()

97      cycles for 10*Axrand
91      cycles for 10*Axrand3
215     cycles for 10*LimitRand
99      cycles for 10*Rand()

96      cycles for 10*Axrand
95      cycles for 10*Axrand3
210     cycles for 10*LimitRand
102     cycles for 10*Rand()

37       bytes for Axrand
27       bytes for Axrand3
25       bytes for Rand()
Add 7-10 bytes per call

--- ok ---
We are all of us insane, just to varying degrees and intelligently balanced through networking

http://www.hereford.tv

Antariy

Quote from: dedndave on November 07, 2010, 12:11:39 AM
you didn't put mine in there ?   :(

Sorry, Dave!

Here it is:

Intel(R) Celeron(R) CPU 2.13GHz (SSE3)
208     cycles for 10*Axrand
204     cycles for 10*Axrand3
252     cycles for 10*LimitRand
293     cycles for 10*ASeed
176     cycles for 10*Rand()

210     cycles for 10*Axrand
203     cycles for 10*Axrand3
252     cycles for 10*LimitRand
291     cycles for 10*ASeed
170     cycles for 10*Rand()

211     cycles for 10*Axrand
205     cycles for 10*Axrand3
251     cycles for 10*LimitRand
292     cycles for 10*ASeed
179     cycles for 10*Rand()

37       bytes for Axrand
27       bytes for Axrand3
25       bytes for Rand()
Add 7-10 bytes per call




Alex


dedndave

thanks Alex
problem is....
you need to run it ~192 times to get an average time
that is because it reseeds once every 192 pulls (on average)
otherwise, it's going to look slower than it actually is

edit - and, i haven't tried to optimize it, at all - lol
i can extrapolate that it should be about 20 to 21 cycles
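
The amortization argument above can be written out as plain arithmetic. This is a minimal sketch, assuming one reseed per 192 pulls on average as stated in the post; `base_cycles` and `reseed_cycles` are hypothetical placeholders, not measurements from this thread.

```python
def amortized_cycles(base_cycles, reseed_cycles, pulls_per_reseed=192):
    """Long-run mean cost of one pull when a reseed costing
    `reseed_cycles` extra cycles happens once every
    `pulls_per_reseed` pulls on average."""
    return base_cycles + reseed_cycles / pulls_per_reseed


def short_run_cycles(base_cycles, reseed_cycles, pulls, reseeds_seen):
    """Mean cost seen over a short run of `pulls` calls that happened
    to contain `reseeds_seen` reseeds."""
    return base_cycles + reseed_cycles * reseeds_seen / pulls


# Hypothetical numbers: 20 cycles per pull, 384 extra cycles per reseed.
print(amortized_cycles(20, 384))          # 22.0
print(short_run_cycles(20, 384, 10, 1))   # 58.4
```

With these made-up figures, a 10-call run that happens to hit one reseed looks far slower than the long-run mean, which is the effect described above.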

jj2007

With Dave's code and code sizes:
Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
143     cycles for 10*Axrand
121     cycles for 10*Axrand3
202     cycles for 10*LimitRand
270     cycles for 10*ASeed
115     cycles for 10*Rand()

142     cycles for 10*Axrand
121     cycles for 10*Axrand3
200     cycles for 10*LimitRand
270     cycles for 10*ASeed
115     cycles for 10*Rand()

143     cycles for 10*Axrand
121     cycles for 10*Axrand3
200     cycles for 10*LimitRand
268     cycles for 10*ASeed
115     cycles for 10*Rand()

37       bytes for Axrand
27       bytes for Axrand3
65       bytes for ASeed
47       bytes for LimitRand
25       bytes for Rand()


Antariy

Quote from: dedndave on November 07, 2010, 12:21:49 AM
thanks Alex
problem is....
you need to run it ~192 times to get an average time
that is because it reseeds once every 192 pulls (on average)

Dave, it was re-run:

........
LOOP_COUNT = 1000000 ; One Mio would be a typical value


I.e. - 1 million times.  :bg

And reseeding is part of the algo, so its timing should be included too. That is the price of auto-reseeding; the other algos do not have that feature.



Alex
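
The point about LOOP_COUNT can be illustrated with a small simulation: over a million pulls the occasional reseed is already averaged in at its natural rate. This is only a sketch under stated assumptions. The costs (`base`, `reseed`) are hypothetical, and the reseed is modelled as a random event with probability 1/192 per pull, which may not match how ASeed actually decides to reseed.

```python
import random


def mean_cost(loop_count, base=20.0, reseed=384.0, p_reseed=1 / 192, seed=1):
    """Average simulated cost per pull over `loop_count` pulls.

    Each pull costs `base`; with probability `p_reseed` it also pays
    `reseed`. All cost figures are hypothetical, not measured.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    total = 0.0
    for _ in range(loop_count):
        total += base
        if rng.random() < p_reseed:
            total += reseed
    return total / loop_count


expected = 20.0 + 384.0 / 192  # long-run mean for these made-up costs
for n in (10, 1000, 1000000):
    print(n, round(mean_cost(n), 2), "expected", expected)
```

Short runs scatter around the expected value, while the million-pull run should land close to it.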

dedndave

oh - ok - gotcha   :U
it should compare favorably in a randomness test

i can get it down to 27 cycles easily, i think   :bg

Antariy


dedndave

lol
well - that was a simplification
what i mean is, i haven't timed it to select faster instructions

Antariy

Quote from: dedndave on November 07, 2010, 12:21:49 AM
i can extrapolate that it should be about 20 to 21 cycles

The method of placing many calls one after another has many disadvantages. First, it eats a lot of resources by itself.
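
As a reading aid for the result lines in this thread: lines labelled `10*Name` appear to time ten calls placed one after another, so dividing by the call count gives a rough per-call figure. The sketch below assumes the line format seen in the posted output and ignores any overhead added by the back-to-back layout itself; `per_call` is a hypothetical helper, not part of the test bed.

```python
import re

# "<cycles>  cycles for [<count>*]<name>", as seen in the posted output
LINE = re.compile(r"^\s*(\d+)\s+cycles for (?:(\d+)\*)?(\S+)\s*$")


def per_call(line):
    """Return (name, cycles per call) for one result line, or None
    if the line is not a cycle-count line."""
    m = LINE.match(line)
    if m is None:
        return None
    cycles = int(m.group(1))
    calls = int(m.group(2)) if m.group(2) else 1  # no "N*" prefix: one call
    return m.group(3), cycles / calls


print(per_call("208     cycles for 10*Axrand"))  # ('Axrand', 20.8)
print(per_call("9   cycles for Axrand"))         # ('Axrand', 9.0)
```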

Antariy

Quote from: dedndave on November 07, 2010, 12:30:12 AM
lol
well - that was a simplification
what i mean is, i haven't timed it to select faster instructions

I'm joking, Dave  :bg

Your code is good - it cannot be made much faster. QPC is a very heavy API.



Alex

oex

AMD Sempron(tm) Processor 3100+ (SSE3)
97      cycles for 10*Axrand
96      cycles for 10*Axrand3
229     cycles for 10*LimitRand
230     cycles for 10*ASeed
100     cycles for 10*Rand()

97      cycles for 10*Axrand
92      cycles for 10*Axrand3
246     cycles for 10*LimitRand
229     cycles for 10*ASeed
100     cycles for 10*Rand()

97      cycles for 10*Axrand
92      cycles for 10*Axrand3
248     cycles for 10*LimitRand
234     cycles for 10*ASeed
103     cycles for 10*Rand()

37       bytes for Axrand
27       bytes for Axrand3
65       bytes for ASeed
47       bytes for LimitRand
25       bytes for Rand()
Add 7-10 bytes per call

--- ok ---

Antariy


dedndave

results for a Prescott...
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
204     cycles for 10*Axrand
199     cycles for 10*Axrand3
245     cycles for 10*LimitRand
311     cycles for 10*ASeed
170     cycles for 10*Rand()

204     cycles for 10*Axrand
198     cycles for 10*Axrand3
246     cycles for 10*LimitRand
316     cycles for 10*ASeed
170     cycles for 10*Rand()

203     cycles for 10*Axrand
198     cycles for 10*Axrand3
274     cycles for 10*LimitRand
312     cycles for 10*ASeed
174     cycles for 10*Rand()

FORTRANS

Hi,

   Well, here are my timings.  One from Alex and one from
jj2007.  I must say that I am a bit surprised by the slow
MWC timings.  I tried a variant that commented out the
ADC to see if that was the culprit.

Regards

Steve N.


PIII Win 2000.
Reply #129  Test.zip
pre-P4 (SSE1)
9   cycles for Axrand
11   cycles for Axrand2
9   cycles for Axrand3
28   cycles for LimitRand
9   cycles for Rand()

9   cycles for Axrand
11   cycles for Axrand2
9   cycles for Axrand3
28   cycles for LimitRand
8   cycles for Rand()

9   cycles for Axrand
11   cycles for Axrand2
9   cycles for Axrand3
28   cycles for LimitRand
8   cycles for Rand()

9   cycles for Axrand
11   cycles for Axrand2
9   cycles for Axrand3
28   cycles for LimitRand
8   cycles for Rand()

9   cycles for Axrand
11   cycles for Axrand2
9   cycles for Axrand3
29   cycles for LimitRand
8   cycles for Rand()

37    bytes for Axrand
30    bytes for Axrand2
31    bytes for Axrand3
29    bytes for Rand()
Add 7-10 bytes per call

--- ok ---
Above edited to comment out the ADC.
pre-P4 (SSE1)
9   cycles for Axrand
11   cycles for Axrand2
9   cycles for Axrand3
27   cycles for LimitRand
8   cycles for Rand()

9   cycles for Axrand
11   cycles for Axrand2
9   cycles for Axrand3
27   cycles for LimitRand
9   cycles for Rand()

9   cycles for Axrand
11   cycles for Axrand2
9   cycles for Axrand3
27   cycles for LimitRand
8   cycles for Rand()

9   cycles for Axrand
11   cycles for Axrand2
9   cycles for Axrand3
27   cycles for LimitRand
8   cycles for Rand()

9   cycles for Axrand
11   cycles for Axrand2
9   cycles for Axrand3
27   cycles for LimitRand
8   cycles for Rand()

37    bytes for Axrand
30    bytes for Axrand2
31    bytes for Axrand3
29    bytes for Rand()
Add 7-10 bytes per call

--- ok ---
Reply #138  axRandTimings.zip
pre-P4 (SSE1)
133   cycles for 10*Axrand
126   cycles for 10*Axrand3
194   cycles for 10*LimitRand
205   cycles for 10*ASeed
119   cycles for 10*Rand()

133   cycles for 10*Axrand
126   cycles for 10*Axrand3
195   cycles for 10*LimitRand
205   cycles for 10*ASeed
120   cycles for 10*Rand()

133   cycles for 10*Axrand
125   cycles for 10*Axrand3
194   cycles for 10*LimitRand
206   cycles for 10*ASeed
120   cycles for 10*Rand()

37    bytes for Axrand
27    bytes for Axrand3
65    bytes for ASeed
47    bytes for LimitRand
25    bytes for Rand()
Add 7-10 bytes per call

--- ok ---

FORTRANS

Hi,

   Tried an excursion.  Compare to above results.  Some
improvement.

Regards,

Steve


.data
ALIGN 4
Rand32  DD      31415926        ; a la K
        DD      1013904223      ; As per NR RandC
RandA   DD      1517746329      ; See RandMWC
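
The routine that uses these values is not shown in the post, so the following is only a guess at the technique. The `RandMWC` comment suggests a multiply-with-carry generator, and this sketch assumes the first dword of `Rand32` is the state, the second dword is the initial carry, and `RandA` is the multiplier. In assembly the carry into the high half is presumably where an ADC would be used.

```python
MASK32 = 0xFFFFFFFF
RAND_A = 1517746329   # RandA from the .data block
SEED_X = 31415926     # Rand32, first dword
SEED_C = 1013904223   # second dword, assumed here to be the initial carry


def mwc_next(x, c, a=RAND_A):
    """One multiply-with-carry step: t = a*x + c; the low 32 bits
    become the new value, the high bits become the new carry."""
    t = a * x + c
    return t & MASK32, t >> 32


x, c = SEED_X, SEED_C
for _ in range(3):
    x, c = mwc_next(x, c)
    print(x)
```

As long as the starting carry is below the multiplier, the carry stays below the multiplier on every step, so both halves of the state fit in 32 bits.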



pre-P4 (SSE1)
133   cycles for 10*Axrand
125   cycles for 10*Axrand3
163   cycles for 10*LimitRand
200   cycles for 10*ASeed
120   cycles for 10*Rand()

133   cycles for 10*Axrand
126   cycles for 10*Axrand3
162   cycles for 10*LimitRand
200   cycles for 10*ASeed
120   cycles for 10*Rand()

134   cycles for 10*Axrand
126   cycles for 10*Axrand3
162   cycles for 10*LimitRand
200   cycles for 10*ASeed
120   cycles for 10*Rand()

37    bytes for Axrand
27    bytes for Axrand3
65    bytes for ASeed
47    bytes for LimitRand
25    bytes for Rand()
Add 7-10 bytes per call

--- ok ---