I was reading some recent posts & came across one about generating random sequences, I think it was jj who said that you need an outside influence to create truly random sequences. To that end I've adapted a routine from one of my programs whereby the keyboard is the outside influence, I've called it Password generator (it's not a cracker). The choice is 6 to 18 characters & these are generated from upper & lower case characters + the digits 0 to 9. I have made it so that each character can only be selected once, so to that end it is not truly random, more of a demo of what can be achieved. 12 'Passwords' can be saved of varying lengths so that they can be compared. If the moderators are happy with this post i'll upload the source.
[attachment deleted by admin]
Doesn't seem to be any complaints, so here's the source.
Use 'Console Assemble & Link' to build.
[attachment deleted by admin]
I've had another look at this program & decided that It could be made useful (my opinion). To that end I've removed the code that only allowed one instance of a character so that now it is about as random as you can get. I've also added a routine that allows you to save a password to a text file which makes it easy to copy & paste. Inside the zip file is a self extracting exe file that places the program in a folder in the root of drive C, no other changes are made so to remove it just delete the folder named 'Password'.
If anyone would like the source I'll be happy to post it.
http://www.sctzine.com/rastgele-sayi-ureteci-ve-gettickcount-rdtsc-nrandom-fonksiyonlari/
A random generator with sources. This is for newbies. Maybe useful!
Neil,
It depends on how good a random sequence you need, the best stuff is external, the sound of the universe, radio noise, wind patterns on microphones etc .... but you will come close if you use a large external source and a pair of different random algos that repeatedly re-seed each other. The crypto guys are the fussiest and the actions is always on making the sequence unreproducable. Long ago I played with a 256 square pad that you scribbled on with the mouse until you got the sample size you wanted, it then checked if the range was large enough and rejected the result if it was too small.
If you have not already got it, get John Walker's ENT program, it is on the forum web site, does very useful analysis on random sequences.
There is also a random generator in the crt library (c++ express 10) with source code in c.
Name rand_s.c.
Quote from: silentenigma on October 08, 2010, 07:14:11 PM
http://www.sctzine.com/rastgele-sayi-ureteci-ve-gettickcount-rdtsc-nrandom-fonksiyonlari/
A random generator with sources. This is for newbies. Maybe useful!
Not that useful, actually. Apart from the source being non-English, the code simply uses the Masm32
nrandom function that is documented in \masm32\help\masmlib.chm
Thanks for pointing out the link hutch, I've got it & will look at it when I've got time.
Going back to random, most peoples perception of random is nothing like reality. I remember a few years ago one of the universities did a test which went like this :-
A large number of students were given a blank sheet of A4 paper & told to put 1000 random dots on it, almost without fail they covered the sheet with a pretty uniform pattern of dots. Then they were given another sheet & told to put one dot anywhere on it, these dots were then combined onto one sheet & the result was very different. The sheet had clusters of dots with large blank spaces in between. This is how random works in the real world.
Quote from: hutch-- on October 09, 2010, 07:23:07 AM
It depends on how good a random sequence you need, the best stuff is external, the sound of the universe, radio noise, wind patterns on microphones etc...
Unless you need extreme security, I guess the (GetTickCount and 255) you get after a Sleep(0) is already pretty "random". Multiply it with PI, take the (whatever and 15)th decimal after the comma, and there you are with a namber between 0 and 9. Note this is not a high speed random generator, as Sleep(0) costs quite a bit of cycles.
on the subject of random number generators i've just done a quick asm translation of this version of the well512 rng http://www.lomont.org/Math/Papers/2008/Lomont_PRNG_2008.pdf
/* initialize state to random bits */
static unsigned long state[16];
/* init should also reset this to 0 */
static unsigned int index = 0;
/* return 32 bit random number */
unsigned long WELLRNG512(void)
{
unsigned long a, b, c, d;
a = state[index];
c = state[(index+13)&15];
b = a^c^(a<<16)^(c<<15);
c = state[(index+9)&15];
c ^= (c>>11);
a = state[index] = b^c;
d = a^((a<<5)&0xDA442D20UL);
index = (index + 15)&15;
a = state[index];
state[index] = a^b^d^(a<<2)^(b<<18)^(c<<28);
return state[index];
}
btw its only been quickly tested as i've not got round to writing a proc to seed the state so heres hoping there are no glaring errors:)
.data
state DWORD 16 DUP(?) ;make sure you initialise the state to random bits
index DWORD 0 ;at init you should reset this to 0
;return 32 bit random number in eax
.code
WELLRNG512 PROC USES ebx ecx edx esi
LOCAL tmp1:DWORD, tmp2:DWORD, tmp3:DWORD
mov esi, index
mov eax, state[esi*4]
add esi, 13
and esi, 15
mov ecx, state[esi*4]
mov tmp1, eax
shl tmp1, 16
mov tmp2, ecx
shl tmp2, 15
mov ebx, eax
xor ebx, ecx
xor ebx, tmp1
xor ebx, tmp2
mov esi, index
add esi, 9
and esi, 15
mov ecx, state[esi*4]
mov tmp1, ecx
shr tmp1, 11
xor ecx, tmp1
mov esi, index
mov tmp1, ebx
xor tmp1, ecx
mov eax, tmp1
mov state[esi*4], eax
shl tmp1, 5
and tmp1, 0DA442D20h
xor tmp1, eax
mov edx, tmp1
add index, 15
and index, 15
mov esi, index
mov eax, state[esi*4]
mov tmp1, eax
shl tmp1, 2
mov tmp2, ebx
shl tmp2, 18
mov tmp3, ecx
shl tmp3, 28
xor eax, ebx
xor eax, edx
xor eax, tmp1
xor eax, tmp2
xor eax, tmp3
mov state[esi*4], eax
ret
WELLRNG512 ENDP
END
:bg
JJ,
you would like my "quick and dirty" technique with GetTickCount. Convert it to string, reverse the string then chop a random offset of characters from the left side of the string and convert it back to an integer. I would not use it for crypto but it seems to do most things well and it is very hard to reproduce. I use this as a seeding technique to feed two different random algos that reseed each other while producing the random pad I am after.
Hutch,
Sounds convincing. Why not start a little competition? Attached is a testbed that returns a number between 0 and 15 and counts its frequency:
15 499766
14 500351
13 499320
12 499228
11 499175
10 499827
9 500228
8 500401
7 499839
6 500762
5 499522
4 500195
3 500908
2 500337
1 500529
0 499613
Code size is 72 bytes only, and it's pretty fast, too.
Here is another test bed that in addition to the distribution also checks the arithmetic mean, and does a scatter plot. This version tests only Rand15.
;====================================================================
; Build as console app so print and printf will work.
;====================================================================
include \masm32\include\masm32rt.inc
.686
;====================================================================
.data?
array dd 100000 dup(?)
.data
result REAL8 ?
counts dd 16 dup(0)
hInstance dd 0
hDlg dd 0
hdc dd 0
msg MSG <>
rc RECT <>
sz db 30 dup(0)
ctr dd 0
.code
;====================================================================
Rand15 proc base:DWORD
.data?
r9Seed dd ?
.code
mov eax, r9Seed
push edx
push ecx
test eax, eax
jne @F
invoke Sleep, eax ; spend some cycles here and there
invoke GetTickCount ; get a seed value
mov r9Seed, eax
@@: ffree st(7) ; free a rarely used FP register
mov ecx, 0FFFFFFH
and eax, ecx
inc eax ; make sure it's not zero
push eax
fldpi ; push PI on FPU
fimul dword ptr [esp] ; 3.14159*[1...edx+1]
rdtsc ; we'll use only the loword
and eax, ecx ; and 0FFFFFFH
inc eax ; make sure it's not zero
mov [esp], eax ; second external influence is rdtsc
fimul dword ptr [esp] ; *(rdtsc and 0FFFFFFH)
pop eax
fstp r9Seed ; now a cheap trick: pop a Real4 from the FPU ...
mov eax, r9Seed ; ... but mov an integer ;-)
;and eax, 15 ; return a number between 0 and 15
xor edx, edx
mov ecx, base
div ecx
mov eax, edx
pop ecx ; restore two
pop edx ; precious registers
ret
Rand15 endp
;====================================================================
;-------------------------------------------------------
; This procedure calculates the arithmetic mean for the
; specified dword array and leaves the result in ST(0).
;-------------------------------------------------------
amean proc pddArray:DWORD, count:DWORD
LOCAL sum:QWORD
mov DWORD PTR [sum], 0
mov DWORD PTR [sum+4], 0
mov edx, pddArray
mov ecx, count
dec ecx
@@:
mov eax, [edx+ecx*4]
add DWORD PTR [sum], eax
adc DWORD PTR [sum+4], 0
dec ecx
jns @B
finit
fild sum
fild count
fdiv
ret
amean endp
;====================================================================
DialogProc proc hwndDlg:DWORD, uMsg:DWORD, wParam:DWORD, lParam:DWORD
SWITCH uMsg
CASE WM_CTLCOLORDLG
invoke GetStockObject, WHITE_BRUSH
ret
CASE WM_SIZE
invoke InvalidateRgn, hwndDlg, NULL, TRUE
CASE WM_CLOSE
invoke DestroyWindow, hwndDlg
CASE WM_DESTROY
invoke PostQuitMessage, NULL
ENDSW
xor eax, eax
ret
DialogProc endp
;====================================================================
start:
;====================================================================
xor ebx, ebx
.WHILE ebx < 8000000
invoke Rand15, 16
inc DWORD PTR [counts+eax*4]
inc ebx
.ENDW
N=0
REPEAT 16
print str$(counts+N),13,10
N=N+4
ENDM
xor ebx, ebx
.WHILE ebx < 100000
invoke Rand15, 16
mov [array+ebx*4], eax
inc ebx
.ENDW
invoke amean, ADDR array, 100000
fstp result
invoke crt_printf, cfm$("\n%f\n"), result
invoke GetModuleHandle, NULL
mov hInstance, eax
Dialog "Test", \
"FixedSys", 11, \
WS_VISIBLE or WS_OVERLAPPEDWINDOW or \
DS_CENTER, \
0, \
0,0,100,75, \
1024
CallModelessDialog hInstance, 0, DialogProc, NULL
mov hDlg, eax
invoke GetDC, hDlg
mov hdc, eax
invoke Sleep, 1000
msgLoop:
invoke PeekMessage, ADDR msg, NULL, 0, 0, PM_REMOVE
.IF msg.message != WM_QUIT
.IF eax != 0
invoke TranslateMessage, ADDR msg
invoke DispatchMessage, ADDR msg
.ELSE
invoke GetClientRect, hDlg, ADDR rc
inc ctr
cmp ctr, 1000
jb @f
invoke szappend, ADDR sz, chr$(" "), 0
mov esi, eax
invoke szappend, ADDR sz, str$(rc.right), esi
mov esi, eax
invoke szappend, ADDR sz, chr$(" x "), esi
mov esi, eax
invoke szappend, ADDR sz, str$(rc.bottom), esi
invoke SetWindowText, hDlg, ADDR sz
mov ctr, 0
@@:
invoke Rand15, rc.right
mov ebx, eax
;------------------------------------------
; Avoid a divide by zero when user resizes
; the dialog such that rc.bottom == 0.
;------------------------------------------
.IF rc.bottom == 0
mov rc.bottom, 1
.ENDIF
invoke Rand15, rc.bottom
mov esi, eax
invoke Rand15, 1 SHL 24
mov edi, eax
invoke SetPixel, hdc, ebx, esi, edi
.ENDIF
jmp msgLoop
.ENDIF
invoke ExitProcess, eax
;====================================================================
end start
heres the well512 random number generator i posted the other day:)
btw its been tested and its output matches the output from the reference implementation
anyway some info...
QuoteWELL Algorithm
Matsumoto (co-creator of the Mersenne Twister), L'Ecuyer (a major RNG researcher),
and Panneton introduced another class of TGFSR PRNGs in 2006 [Panneton06]. These
algorithms produce numbers with better equidistribution than MT19937 and improve
upon "bit-mixing" properties. WELL stands for "Well Equidistributed Long-period
Linear," and they seem to be better choices for anywhere MT19937 is currently used.
They are fast, come in many sizes, and produce higher quality random numbers.
WELL period sizes are presented for period 2^n for n = 512, 521, 607, 800, 1024, 19937,
21701, 23209, and 44497, with corresponding state sizes. This allows a user to trade
period length for state size. All run at similar speed. 2^512 is about 10^154, and it is
unlikely any video game will ever need that many random numbers, since it is far larger
then the number of particles in the universe. The larger periods ones aren't really needed
except for computation like weather modeling or earth simulations. A standard PC needs
over a googol7 of years to count to 2^512.
A significant place the WELL PRNGs perform better than MT19937 is in escaping states
with a large number of zeros. If MT19937 is seeded with many zeros, or somehow falls
into such a state, then the generated numbers have heavy bias towards zeros for a many
iterations. The WELL algorithms behave much better, escaping zero bias states quickly.
The only downside is that they are slightly slower than MT19937, but not much. The
upside is the numbers are considered to be higher quality, and the code is significantly
simpler. Here is WELL512 C/C++ code written by the author and placed in the public
domain8. It is about 40% faster than the code presented on L'Ecuyer's site, and is about
40% faster than MT19937 presented on Matsumoto's site.
here the ent results for 200000 random dword generated by the well512 algo
QuoteEntropy = 7.999756 bits per byte.
Optimum compression would reduce the size
of this 800000 byte file by 0 percent.
Chi square distribution for 800000 samples is 270.10, and randomly
would exceed this value 25.00 percent of the times.
Arithmetic mean value of data bytes is 127.5372 (127.5 = random).
Monte Carlo value for Pi is 3.146257866 (error 0.15 percent).
Serial correlation coefficient is -0.000857 (totally uncorrelated = 0.0).
and the same file using the -b flag
QuoteEntropy = 1.000000 bits per bit.
Optimum compression would reduce the size
of this 6400000 bit file by 0 percent.
Chi square distribution for 6400000 samples is 0.15, and randomly
would exceed this value 50.00 percent of the times.
Arithmetic mean value of data bits is 0.4999 (0.5 = random).
Monte Carlo value for Pi is 3.146257866 (error 0.15 percent).
Serial correlation coefficient is 0.000212 (totally uncorrelated = 0.0).
Well, this is my random numeric generator, which is not very fast, and not very big: :P
.data
Initial dd "RAND"
.code
option prologue:none
option epilogue:none
Axrand proc STDCALL dwRange:DWORD
xor edx,edx
mov eax,Initial
mov ecx,eax
div dword ptr [esp+4]
push edx
mov eax,ecx
xor eax,dword ptr [esp+4] ; just one more seed
imul eax,12347
add eax,17
bswap eax
mov Initial,eax
pop eax
ret 4
Axrand endp
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
Of course, this is not "great thing", but with using of ENT.EXE, I got these results of generation of 200,000 random numbers with using of DWORD range in all calls:
Entropy = 7.999750 bits per byte.
Optimum compression would reduce the size
of this 800000 byte file by 0 percent.
Chi square distribution for 800000 samples is 277.21, and randomly
would exceed this value 25.00 percent of the times.
Arithmetic mean value of data bytes is 127.6224 (127.5 = random).
Monte Carlo value for Pi is 3.140137850 (error 0.05 percent).
Serial correlation coefficient is -0.001622 (totally uncorrelated = 0.0).
With -b switch
Entropy = 1.000000 bits per bit.
Optimum compression would reduce the size
of this 6400000 bit file by 0 percent.
Chi square distribution for 6400000 samples is 2.16, and randomly
would exceed this value 25.00 percent of the times.
Arithmetic mean value of data bits is 0.5003 (0.5 = random).
Monte Carlo value for Pi is 3.140137850 (error 0.05 percent).
Serial correlation coefficient is 0.000450 (totally uncorrelated = 0.0).
So, have look at results - they not so bad :bg
Hutch, I'm allow using of this algo in any case as you want.
But using and "optimization" of this algo is strictly forbidden for hater of "asian lamers" :green2
Alex
P.S. Code size is 40 bytes.
Initial seed at programmers responsibility. When I made test in post above, I don't made initial seed at all. And this is not make algo worst :P
Why used DIV, not "reversed MUL"? Well, couple years ago, I rewrite this algo from ASM to C, and I been wanted to use only language constructions. So, modulo of division is one from that. Initially algo been slightly different.
Alex
Excellent, Alex! The div is slow, but I don't see any alternative. Have a look at the variant, it saves ecx.
Intel(R) Celeron(R) M CPU 420 @ 1.60GHz (SSE3)
32 cycles for Axrand
30 cycles for Axrand2
31 cycles for Axrand
30 cycles for Axrand2
40 bytes for Axrand
38 bytes for Axrand2
Quote from: jj2007 on October 24, 2010, 01:30:26 AM
Excellent, Alex! The div is slow, but I don't see any alternative. Have a look at the variant, it saves ecx.
If change DIV to MUL, the algorithm almost the same - HIGH DWORD of resulting QWORD in EDX would contain number between
0 and
Range-1. What is needed.
Alex
There is slightly optimized version, which is used MUL instead DIV, 2-3 times faster than previous (but not portable to HLL).
.data
Initial dd "RAND"
.code
option prologue:none
option epilogue:none
Axrand proc STDCALL dwRange:DWORD
mov eax,Initial
mul dword ptr [esp+4]
mov eax,edx
mov edx,Initial
xor edx,dword ptr [esp]
imul edx,12347
add edx,17
bswap edx
mov Initial,edx
ret 4
Axrand endp
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
There is test of 200,000 randomly generated DWORDs in file, with ENT.EXE:
Entropy = 7.999761 bits per byte.
Optimum compression would reduce the size
of this 800000 byte file by 0 percent.
Chi square distribution for 800000 samples is 264.80, and randomly
would exceed this value 50.00 percent of the times.
Arithmetic mean value of data bytes is 127.6013 (127.5 = random).
Monte Carlo value for Pi is 3.141307853 (error 0.01 percent).
Serial correlation coefficient is -0.001857 (totally uncorrelated = 0.0).
With -b switch:
Entropy = 1.000000 bits per bit.
Optimum compression would reduce the size
of this 6400000 bit file by 0 percent.
Chi square distribution for 6400000 samples is 2.58, and randomly
would exceed this value 25.00 percent of the times.
Arithmetic mean value of data bits is 0.5003 (0.5 = random).
Monte Carlo value for Pi is 3.141307853 (error 0.01 percent).
Serial correlation coefficient is 0.000411 (totally uncorrelated = 0.0).
EDITED: In case of randomness, these results is slightly better than in DIV version.
Test by clocks:
Intel(R) Celeron(R) CPU 2.13GHz (SSE3)
16 cycles for Axrand
47 cycles for Axrand2
16 cycles for Axrand
46 cycles for Axrand2
40 bytes for Axrand
38 bytes for Axrand2
EDITED: Axrand - is new proc, with using of MUL; Axrand2 - is old proc, changed by Jochen to save ECX.
New proc with using MUL also saves an ECX.
Alex
Alex,
At 10 cycles running on a P3, your latest version is the fastest generator (of apparently good random numbers) that I have tested. The attachment contains the results for George Marsaglia's Diehard tests, available here (http://www.stat.fsu.edu/pub/diehard/).
Perfect! I attach your new version, Alex.
Intel(R) Celeron(R) M CPU 420 @ 1.60GHz (SSE3)
9 cycles for Axrand
30 cycles for Axrand2
alex, did you implement this prng yourself or is it from an existing algorithm?
Quote from: brethren on October 25, 2010, 08:55:41 PM
alex, did you implement this prng yourself or is it from an existing algorithm?
I have implemented this algo myself. Of course, I cannot know - maybe someone on other side of Earth implement this algo, too.
To be honest, I have implemented this algo much before I have good connection to the Internet, and at that days I don't made search for existence of this algo - I just have no time for that.
Hopefully, this is my own, copyrighted algo :bg
Alex
Quote from: MichaelW on October 25, 2010, 07:38:17 PM
At 10 cycles running on a P3, your latest version is the fastest generator (of apparently good random numbers) that I have tested. The attachment contains the results for George Marsaglia's Diehard tests, available here (http://www.stat.fsu.edu/pub/diehard/).
Thanks,
Michael !
Thanks again - I have downloaded sources of pointed program, too.
Thanks
Jochen!
There is a timings for the last your archive posted:
Intel(R) Celeron(R) CPU 2.13GHz (SSE3)
16 cycles for Axrand
47 cycles for Axrand2
40 bytes for Axrand
38 bytes for Axrand2
Alex
alex, would you mind if i used axrand in my own library?
Quote from: brethren on October 26, 2010, 01:15:08 PM
alex, would you mind if i used axrand in my own library?
Which library do you made? Can I know some details? Just curious.
Of course, you can use it in good project, if you are write about copyright of it somewhere. :bg
Alex
nothing special, just my own library of optimized procedures that i use for cross-platform stuff:)
Hi,
Just as a comparison, here are the ENT results from a
simple 32-bit Linear Congruential "Random" number Generator.
Partly initialized using the current time. First one million bytes
using the high byte from the DWORD random number. A
fairly good cheap pseudorandom value. Next a million bytes
using the low byte. A known bad sequence as it repeats
every 256 values.
Regards,
Steve N.
One Million high bytes from DWORD (LCG) Linear Congruential "Random"
number Generator derived from Knuth/Numerical Recipies.
ENT.EXE results.
Entropy = 7.999825 bits per byte.
Optimum compression would reduce the size
of this 1000000 byte file by 0 percent.
Chi square distribution for 1000000 samples is 242.07, and randomly
would exceed this value 50.00 percent of the times.
Arithmetic mean value of data bytes is 127.5479 (127.5 = random).
Monte Carlo value for Pi is 3.143268573 (error 0.05 percent).
Serial correlation coefficient is -0.000500 (totally uncorrelated = 0.0).
One Million low bytes from DWORD LCG Linear Congruential "Random"
number Generator derived from Knuth/Numerical Recipies. (Known
poor to bad.)
ENT.EXE results.
Entropy = 8.000000 bits per byte.
Optimum compression would reduce the size
of this 1000000 byte file by 0 percent.
Chi square distribution for 1000000 samples is 0.01, and randomly
would exceed this value 99.99 percent of the times.
Arithmetic mean value of data bytes is 127.4999 (127.5 = random).
Monte Carlo value for Pi is 3.218724875 (error 2.46 percent).
Serial correlation coefficient is -0.049491 (totally uncorrelated = 0.0).
QuoteA known bad sequence as it repeats every 256 values.
that can depend on the constants used, no ?
what multiplier and addend did you use for the test routine ?
Hi Dave,
Yes that "feature" can be changed. In the general case a
linear congruential generator is of the form;
RandN+1 = MOD( (RandN * MulConst + AddConst), Modulus )
In this case I used a modulo of 2**32 as it comes for free
with the X86 multiply instruction. And that is were the short run
repeating comes from. I have a program that shows the
kind of repetition you can expect. First bit alternates. Second
bit, cycle of four. And so on.
Knuth showed some code that uses 2**N + 1 and 2**N - 1
as a modulus that avoids that kind of obvious repetitive behavior.
The code is somewhat more complex though. Using a prime
modulus helps as well. But that requires an explicit modulo
operation in most cases, which messes with the quick and easy
(dirty?) philosophy. And they only make the repeat cycle to
not be as obvious to humans. See Knuth's AoCP for the real
nitty gritty.
The multiplier is 1664525 and the additive constant is
1013904223. These were used in "Numerical Recipes the Art
of Scientific Computing". Knuth analyzed a number of
multipliers for common word sizes, and that was one of the
better 32-bit ones. He says the additive part just has to
be nonzero.
Does that help, or just muddy things?
Regards,
Steve N.
This is an attempt to determine if each of the byte values generated by Alex's generator is equally probable.
;=========================================================================
include \masm32\include\masm32rt.inc
;=========================================================================
.data
mean REAL8 ?
total dq 0
min dd -1
max dd 0
range dd 0
counts dd 256 dup(0)
.code
;=========================================================================
.data
align 4
Initial dd "RAND"
.code
option prologue:none
option epilogue:none
align 4
Axrand proc STDCALL dwRange:DWORD
mov eax,Initial
mul dword ptr [esp+4]
mov eax,edx
mov edx,Initial
xor edx,dword ptr [esp]
imul edx,12347
add edx,17
bswap edx
mov Initial,edx
ret 4
Axrand endp
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
;=========================================================================
start:
;=========================================================================
N = 1000000000
;------------------------------------------------
; Allocate a buffer and generate N random bytes.
;------------------------------------------------
mov esi, alloc(N)
xor ebx, ebx
.WHILE ebx < N/4
invoke Axrand, 0ffffffffh
mov [esi+ebx*4], eax
inc ebx
.ENDW
;--------------------------------------
; Count the occurrences of the values.
;--------------------------------------
xor ebx, ebx
.WHILE ebx < N
movzx eax, BYTE PTR [esi+ebx]
inc DWORD PTR [counts+eax*4]
inc ebx
.ENDW
;-----------------------------------------------------------
; Get and display the minimum and maximum counts, the range
; between the minimum and maximum, and the mean count.
;-----------------------------------------------------------
xor ebx, ebx
.WHILE ebx < 256
mov eax, [counts+ebx*4]
.IF eax < min
mov min, eax
.ENDIF
.IF eax > max
mov max, eax
.ENDIF
add DWORD PTR total, eax
adc DWORD PTR total+4, 0
inc ebx
.ENDW
mov eax, max
sub eax, min
mov range, eax
fild total
push 256
fild DWORD PTR [esp]
add esp, 4
fdiv
fstp mean
invoke crt_printf, cfm$("%d\t%d\t%d\t%.3f\n"), min, max, range, mean
free esi
inkey "Press any key to exit..."
exit
;=========================================================================
end start
As coded, I get these results:
3901388 3912076 10688 3906250.000
And with a 50% increase in N:
5852804 5867231 14427 5859375.000
The mean in both cases is where it should be, and the range shrunk in proportion to N, so the minimum and maximum do appear to be converging on the mean. I would be interested to see the results for a much larger N (that my wimpy system cannot reasonably handle).
Quote from: FORTRANS on October 27, 2010, 01:29:23 PM
Hi,
Just as a comparison, here are the ENT results from a
simple 32-bit Linear Congruential "Random" number Generator.
Partly initialized using the current time. First one million bytes
using the high byte from the DWORD random number.
Hi, Steve!
Just note - when I make test for Axrand - I'm using all bytes from resulting DWORD - not any selective bytes.
I have just:
invoke Axrand,-1
and write *full* EAX to the output file.
After that I test resulting file with ENT.EXE
Alex
Hi Alex,
Yes, testing a DWORD would be the best plan. But ENT
does bytes. And the results I posted shows that the low byte
is bad, while the high byte is rather good. For most uses of
random numbers the DWORD should be good. Only if your
application depends on the low byte(s) separate from the rest
of the number should this generator be considered suspect.
Of course it also shows why LCG is not considered optimal.
Anyway, for what it's worth here is the ENT results for
250,000 DWORDs. If anybody cares I can do separate byte
values for the other two bytes in the DWORD.
Regards,
Steve N.
250,000 DWORDs as 1 million bytes
ENT.EXE results.
Entropy = 7.999902 bits per byte.
Optimum compression would reduce the size
of this 1000000 byte file by 0 percent.
Chi square distribution for 1000000 samples is 135.95, and randomly
would exceed this value 99.99 percent of the times.
Arithmetic mean value of data bytes is 127.5447 (127.5 = random).
Monte Carlo value for Pi is 3.140940564 (error 0.02 percent).
Serial correlation coefficient is -0.000114 (totally uncorrelated = 0.0).
not considered optimal, thats an understatement:)
QuoteIf the percentage is greater than 99% or less than 1%, the sequence is almost certainly not random.
Hi,
And in my earlier post I showed it got 50% if you treat
them one at a time. When you treat them four at a time,
or use the wrong parts, it fails.
Cheers,
Steve
it seems to me that the test, itself, needs a bit of an update
it was designed to make measurements on random byte values
trying to run dwords through there, no matter how you do it, is like jamming a square peg into a round hole
on the flip side, we could design byte-oriented random generators
that seems stupid because, these days, we all want to use dwords or even larger
the only thing that makes sense is to re-vamp the test program to make measurements on dword files
This does seem like a weird random discussion to me though I am no maths expert.... How can a byte stream be random and the same stream in dwords not? Surely the test is duff unless you arent using the full range maybe in which case your randomness algorithm is incomplete?
i think they are comparing apples to orangutans :lol
Hi Dave,
Did my earlier answer actually answer your question?
Quote from: dedndave on October 28, 2010, 08:29:14 PM
it seems to me that the test, itself, needs a bit of an update
it was designed to make measurements on random byte values
trying to run dwords through there, no matter how you do it, is like jamming a square peg into a round hole
Quite right, that was a point I tried to make. Trying to answer Alex
seems to stirred up some silt. Looks like a weekend (weakened?)
project. I wrote some statistics software back when. And I can look
at Knuth's set of tests, though if I remember, I kind of glazed over
looking at them.
Regards,
Steve
Hi,
Quote from: oex on October 28, 2010, 08:39:35 PM
How can a byte stream be random and the same stream in dwords not?
Because this byte stream is made up of random bytes, it is random.
Because the bytes in this DWORD stream are not independent, the byte
stream made from the DWORDs need not (and in this case is not) very
random. Even if the DWORDs are perfectly good random DWORDs.
Regards,
Steve N.
Steve,
Could post your code? It seems to me that if the bytes don't form a random sequence then the dwords won't either. Both will be biased in some way.
oh yes, Steve - it did
you have to admit - the LCG method is simple and fast
i think it makes a decent generator if you re-seed it from time-to-time
Quote from: FORTRANS on October 28, 2010, 09:27:50 PM
Quote from: oex on October 28, 2010, 08:39:35 PM
How can a byte stream be random and the same stream in dwords not?
Because this byte stream is made up of random bytes, it is random.
Because the bytes in this DWORD stream are not independent, the byte
stream made from the DWORDs need not (and in this case is not) very
random. Even if the DWORDs are perfectly good random DWORDs.
I was never any good at probabilities I think I'll just have to accept I cannot understand this.... If there are patterns in a DWORD stream (ie it is not random) I do not see how the same stream in BYTE form cannot still have the same patterns (Except strung out every 4 bytes).... Unless the full DWORD range is not used in which case there is no point in analysing a DWORD of data at a time in the first place....
It just seems to me that you are looking at a piece of paper in a 3D world from the side on and delaring the world 2D.... Surely random is random in any dimension or vunerable to very simple attack and not really random at all? (Note. This is a genuine question not a rhetorical question)
It seems to me according to your logic that if I looked at that BYTE stream in BITS it would no longer be random? Alternatively if you were to use more than one BYTE the second BYTE would not be random based on the first....
:lol I'm probably just confusing myself (and everybody else) at this stage.... I am kinda interested in this but I'll try and pick up some scraps as you guys discuss it :bg
Hi,
Yesterday I replied from my parent's machine using IE that
was not cooperating with entering text, and may have not been
clear on some points. Sorry. I will make a last attempt to clean
things up now that I can edit text again in Netscape. Sigh.
In Reply #28 the ENT results were for byte streams created
from bytes taken from DWORDs created by a LCG algorithm.
One good using the high byte. One bad using the low byte.
Things seem to have gone south from there.
In Reply #32, Alex mentioned using all bytes in the DWORD
in his algorithm. I use all bytes in the LCG, but reported only
byte values in #28, so I reported DWORD ENT results in
Reply #33, noting that the results would appear bad. I should
have emphasized that the DWORDs are correlated as to the bytes
that are contained in the DWORD. That is a consequence of
the LCG algorithm and why it "is not optimum". An optimum
algorithm would not only generate decent pseudorandom values,
it would not have discernable internal structure.
brethren in Reply #34 made a comment on that comment
of mine that I tried to clarify in Reply #35 referring back to #28
and the "good" byte stream. Probably a mistake as it was too
brief?
Reply #36, #38, and #39 tried to point out the DWORD and
byte streams are different and should not be confused. Reply
#40 should have mentioned that the "byte stream" was from the
first part of Reply #28 and the DWORD stream was from #33.
MichaelW, I have had 16-bit and 32-bit assembly, plus FORTRAN
code from Numerical Recipes (and I thought my own?). All produce
the same results. Most of that is on a "dead" machine of one sort
or another. 16-bit code appended to the end, pseudo-code is;
Rand
N+1 = 1664525 * Rand
N + 1013904223
plus an initial value, plus the current time. The DWORDs, considered
as DWORDs, are a fairly good random sequence, if used properly. If
used as a stream of bytes, it is crummy. You can possibly find
a way to invalidate the use of the DWORDs, which is why more
complex algorithms are now used. For simple cases it works well.
Reply #43
QuoteSurely random is random in any dimension or vulnerable to very simple attack and not really random at all? (Note. This is a genuine question not a rhetorical question)
Right. Pseudorandom is not random. Simple is not infallible.
I was trying to show a simple algorithm to generate a decent
"random" sequence as cheaply as possible. As opposed to a
"costly approach". The simple route was muddied by the ENT
results? As I am not making sense to you (sorry), maybe "The
Art of Computer Programming", Volume 2 / Seminumerical
Algorithms, Chapter 3, Random Numbers, by Donald E. Knuth
will help. I have the second edition of 1981 and rather liked it.
Regards all,
Steve N.
DATASEG SEGMENT PUBLIC
; - - - - - -
; "Random number routine" from Knuth/Numerical Recipies
;RandLCG: ; Linear Congruential "Random" number Generator from Knuth/Numerical Recipies
Rand32 DD 31415926 ; a la K, Use zero to test NR results
RandA DD 1664525 ; As per NR
RandC DD 1013904223 ; As per NR
OutputFile DB 'RandLGC.Out', 0
DATASEG ENDS
; - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
CODE SEGMENT USE16 PUBLIC
ASSUME CS:CODE, DS:DATASEG, SS:STCKSEG
; - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
; Initialize Random Number generator...
RInit:
SCALL GTIME ; Macro to call MS-DOS Time function.
XOR Word Ptr [Rand32],DX
XOR Word Ptr [Rand32+2],CX
RET
; - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
; Linear Congruential "Random" number Generator from Knuth/Numerical Recipies
; IDUM = IDUM*IA + IC = MOD ( IDUM*IA + IC ), IM )
;
; Because this will run on an HP200LX, use 16 bit math and discard
; the high order results.
; 30 January 2001, SRN
; 8 July 2004, tidy up... [Rand32+0]*[RandA+0] => [I1][LOR]
; [Rand32+0]*[RandA+2] => [I3][I2]
; [Rand32+2]*[RandA+2] => [I5][I4]
; 9 July 2004, conform to Numerical Recipes values
RandLCG:
; Multiply part
MOV AX,WORD PTR [Rand32]
MOV BX,AX
MUL WORD PTR [RandA]; Low order words => DX = I1, AX = LOR
PUSH AX ; save low order result
MOV CX,DX ; save intermediate result 1
MOV AX,BX
MUL WORD PTR [RandA+2]; Low & Medium words => DX = I3, AX = I2
MOV BX,AX ; save intermediate result 2
MOV AX,WORD PTR [Rand32+2]
MUL WORD PTR [RandA]; Medium & Low order words => DX = I5, AX = I4
; Accumulate part
ADD BX,AX ; I2 + I4
ADD BX,CX ; I2 + I4 + I1 ; 9 July 2004 (was ADC)
POP AX ; LOR
; Add part
ADD AX,WORD PTR [RandC]
ADC BX,WORD PTR [RandC+2]
MOV WORD PTR [Rand32],AX
MOV WORD PTR [Rand32+2],BX
RET
; - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
CODE ENDS
Quote from: FORTRANS on October 29, 2010, 02:05:38 PM
As I am not making sense to you (sorry), maybe "The
Art of Computer Programming", Volume 2 / Seminumerical
Algorithms, Chapter 3, Random Numbers, by Donald E. Knuth
will help. I have the second edition of 1981 and rather liked it.
:lol cheers Steve, the logic of randomness isnt the easiest thing to grasp.... Thank you for your attempts :bg
Hi to all!
I have made test for full testing of Axrand proc.
The generation file "Axrand.exe" gets an one parameter at command line - the number of the required test-values to be generated. If parameter does not exist, or parameter is 0, then used default value 200,000 generated DWORDs and bytes of 4 level - #0, #1, #2, #3 bytes (al,ah,#2,#3 bytes of EAX). Number of test-values, specified in the batch file, is 250,000. At least 2 MB of drive-space is required to run test with current settings.
Source code for the test-files generation program is included. Writing of selective bytes are unified. After each test, the initial value of the seed was reseted.
For made test needed to run an batch file ("RunStrangeTestForBytes.bat").
The testing results output file are named as "Strange_Bytes_Test_Instead_Dwords_Test_results.txt".
Well, there is a results, which I get on my machine. I'm just don't see any "bad bytes" at these tests, sorry.
Running a strange bytes test for an algo which intended to DWORDs
Generation of the files was done
#############################################################
Test for #0 byte
Entropy = 7.999184 bits per byte.
Optimum compression would reduce the size
of this 250000 byte file by 0 percent.
Chi square distribution for 250000 samples is 282.80, and randomly
would exceed this value 25.00 percent of the times.
Arithmetic mean value of data bytes is 127.4872 (127.5 = random).
Monte Carlo value for Pi is 3.147410359 (error 0.19 percent).
Serial correlation coefficient is -0.000940 (totally uncorrelated = 0.0).
#############################################################
Test for #1 byte
Entropy = 7.999350 bits per byte.
Optimum compression would reduce the size
of this 250000 byte file by 0 percent.
Chi square distribution for 250000 samples is 224.62, and randomly
would exceed this value 90.00 percent of the times.
Arithmetic mean value of data bytes is 127.5163 (127.5 = random).
Monte Carlo value for Pi is 3.133394134 (error 0.26 percent).
Serial correlation coefficient is -0.000363 (totally uncorrelated = 0.0).
#############################################################
Test for #2 byte
Entropy = 7.999306 bits per byte.
Optimum compression would reduce the size
of this 250000 byte file by 0 percent.
Chi square distribution for 250000 samples is 240.66, and randomly
would exceed this value 50.00 percent of the times.
Arithmetic mean value of data bytes is 127.5753 (127.5 = random).
Monte Carlo value for Pi is 3.144050305 (error 0.08 percent).
Serial correlation coefficient is 0.001299 (totally uncorrelated = 0.0).
#############################################################
Test for #3 byte
Entropy = 7.999183 bits per byte.
Optimum compression would reduce the size
of this 250000 byte file by 0 percent.
Chi square distribution for 250000 samples is 282.86, and randomly
would exceed this value 25.00 percent of the times.
Arithmetic mean value of data bytes is 127.3206 (127.5 = random).
Monte Carlo value for Pi is 3.147506360 (error 0.19 percent).
Serial correlation coefficient is -0.001592 (totally uncorrelated = 0.0).
#############################################################
Test for full DWORD
Entropy = 7.999814 bits per byte.
Optimum compression would reduce the size
of this 1000000 byte file by 0 percent.
Chi square distribution for 1000000 samples is 257.41, and randomly
would exceed this value 50.00 percent of the times.
Arithmetic mean value of data bytes is 127.4748 (127.5 = random).
Monte Carlo value for Pi is 3.143412574 (error 0.06 percent).
Serial correlation coefficient is 0.000313 (totally uncorrelated = 0.0).
All BYTEs results in the good permissible range, as far as I understand them. The DWORD results (which is the main intention of this algo) - is in the best range, again - as far as I understand it.
So, I just *does not understand*, what is bad at this algo, maybe very small size?
If DWORD has very good randomness results, how can it be "bad in bytes", if it contains these bytes - it constructed from them? If these bytes are repeatedly bad, the DWORDs constructed from them - must be repeatedly bad, too. And this should be very clear in tests, due to ENT.EXE test *bytes* anyway.
In my test I see that each byte have very good value for a software RNG. Moreover, with increasing size of test, the randomeness was increased - this point to good side of the algo. So, probably, this algo can be used to generation the entire pads, not only DWORD.
Of course, my point of view can be different from other's, anyway, thanks for attention! Sorry if initially I'm don't understand the point of the discuss :bg
Test attached in the archive. For automating of things needed to run the "RunStrangeTestForBytes.bat" file. The output testing results file would be opened after test automatically.
Alex
it looks like steve is right about the lsb being a poor choice for randomness. not with all prng's, mind, but with lcg and others
Quote...is poor, because the low-order bits of many random number generators are distressingly non-random
http://c-faq.com/lib/randrange.html
Quote from: brethren on October 29, 2010, 10:17:57 PM
it looks like steve is right about the lsb being a poor choice for randomness. not with all prng's, mind, but with lcg and others
Well, have look to Axrand - I *do not see* that its low-order byte is bad. :bg
Have you runned the test?
Alex
sorry, we're getting confused here:) your algo looks fine, its the lcg algo that has the problem
btw i have run your test, yes. i'm seeing good results:)
what steve did was run the lcg algo generating dwords and from there extracted only the least significant bytes, which he then ran through ent.
btw results for axrand:)
Running a strange bytes test for an algo which intended to DWORDs
Generation of the files was done
#############################################################
Test for #0 byte
Entropy = 7.999184 bits per byte.
Optimum compression would reduce the size
of this 250000 byte file by 0 percent.
Chi square distribution for 250000 samples is 282.80, and randomly
would exceed this value 25.00 percent of the times.
Arithmetic mean value of data bytes is 127.4872 (127.5 = random).
Monte Carlo value for Pi is 3.147410359 (error 0.19 percent).
Serial correlation coefficient is -0.000940 (totally uncorrelated = 0.0).
#############################################################
Test for #1 byte
Entropy = 7.999350 bits per byte.
Optimum compression would reduce the size
of this 250000 byte file by 0 percent.
Chi square distribution for 250000 samples is 224.62, and randomly
would exceed this value 90.00 percent of the times.
Arithmetic mean value of data bytes is 127.5163 (127.5 = random).
Monte Carlo value for Pi is 3.133394134 (error 0.26 percent).
Serial correlation coefficient is -0.000363 (totally uncorrelated = 0.0).
#############################################################
Test for #2 byte
Entropy = 7.999306 bits per byte.
Optimum compression would reduce the size
of this 250000 byte file by 0 percent.
Chi square distribution for 250000 samples is 240.66, and randomly
would exceed this value 50.00 percent of the times.
Arithmetic mean value of data bytes is 127.5753 (127.5 = random).
Monte Carlo value for Pi is 3.144050305 (error 0.08 percent).
Serial correlation coefficient is 0.001299 (totally uncorrelated = 0.0).
#############################################################
Test for #3 byte
Entropy = 7.999183 bits per byte.
Optimum compression would reduce the size
of this 250000 byte file by 0 percent.
Chi square distribution for 250000 samples is 282.86, and randomly
would exceed this value 25.00 percent of the times.
Arithmetic mean value of data bytes is 127.3206 (127.5 = random).
Monte Carlo value for Pi is 3.147506360 (error 0.19 percent).
Serial correlation coefficient is -0.001592 (totally uncorrelated = 0.0).
#############################################################
Test for full DWORD
Entropy = 7.999814 bits per byte.
Optimum compression would reduce the size
of this 1000000 byte file by 0 percent.
Chi square distribution for 1000000 samples is 257.41, and randomly
would exceed this value 50.00 percent of the times.
Arithmetic mean value of data bytes is 127.4748 (127.5 = random).
Monte Carlo value for Pi is 3.143412574 (error 0.06 percent).
Serial correlation coefficient is 0.000313 (totally uncorrelated = 0.0).
Quote from: brethren on October 29, 2010, 10:27:34 PM
sorry, we're getting confused here:) your algo looks fine, its the lcg algo that has the problem
:eek
If that - sorry, probably I haven't understand about which algo is going the discuss :)
Alex
Quote from: brethren on October 29, 2010, 10:27:34 PM
what steve did was run the lcg algo generating dwords and from there extracted only the least significant bytes, which he then ran through ent.
I understand now.
I have check entire DWORD and the all bytes of DWORD.
Well, my test is not needed, it seems.
Alex
QuoteWell, my test is not needed, it seems.
i'm glad you posted the test:) axrand produces great results with ent
QuoteIt seems to me that if the bytes don't form a random sequence then the dwords won't
either. Both will be biased in some way.
QuoteIf there are patterns in a DWORD stream (ie it is not random) I do not see how the
same stream in BYTE form cannot still have the same patterns (Except strung out every
4 bytes).... Unless the full DWORD range is not used in which case there is no point in
analysing a DWORD of data at a time in the first place....
QuoteIf DWORD has very good randomness results, how can it be "bad in bytes", if it contains
these bytes - it constructed from them?
Hi,
Given the above quotes, I will walk through the workings of the Linear
Congruential "Random" number Generator (LCG) that I am currently using.
It derives from the Knuth book I referenced previously, with some
modifications to match the version from "Numerical Recipes". It is the
following algorithm.
RandN+1 = RandN * 1664525 + 1013904223 (Ordered to make examples easier.)
1664525 = 19660DH = 110010110011000001101B
1013904223 = 3C6EF35FH = 111100011011101111001101011111B
One multiply and one addition.
Okay, let us look at the least significant bit (Bit0) only. From
RandN, it can be either a zero or a one. First case, 1 * 0 + 1 = 1.
Second case, 1 * 1 + 1 = 0. So, if you give it a zero, you get a one.
And if you give it a one, gets you a zero. In actual use you get an
alternating pattern with a period or run length of two. No higher bit
can affect the results.
Second bit, (Bit1) because the actions of multiplication and addition,
actions on the lower bit can affect the outcome. On Bit1, only a carry
from the addition will affect the outcome. (Multiplying by one does not
change things.) With higher bits, the multiply can propagate bits as well.
Four cases possible as follows.
01 * 00 + 11 = 11
01 * 01 + 11 = 00
01 * 10 + 11 = 01
01 * 11 + 11 = 10
A fixed pattern with a period of four. Hm, still does not look very random.
But better than the pattern using only one bit.
I will show you the results with three bits. Here both the multiply
and the addition of Bit0 and Bit1 can have an effect on Bit2.
101 * 000 + 111 = 111
101 * 001 + 111 = 100
101 * 010 + 111 = 001
101 * 011 + 111 = 110
101 * 100 + 111 = 011
101 * 101 + 111 = 000
101 * 110 + 111 = 101
101 * 111 + 111 = 010
A fixed pattern with a period of eight. If you only need eight values,
it's not too bad. If you want more, then it is a bit shabby as a random
number generator.
If you work out the next few cases you will find the run length of a
pattern is determined by the number of bits considered. Eight bits will
generate a pattern with a length of 256. And so the low byte will give
rather bad statistics when you create a byte stream longer than 256 from
them. See the ENT results from my earlier reply. It is looking a bit
more random though, when you first look at it. The higher the bit
considered, there are more lower bits that will affect the outcome.
Now work out the last case. If you do it by hand it will take you a
fair amount of time. There are 4,294,967,296 cases to consider and it
will create a pattern with a length of (left as an exercise for the reader).
And by gosh, if you look at the high byte only (as in Reply #28 using ENT)
The statistics look rather good. And you can effectively ignore the lower
bits for the first hundred million calls, or some other number depending
on what your application requires. You have the multiply propagation of 30
bits and the carry out of Bit30 mucking about with Bit31. That causes
Bit31 to actually look very random.
I fired up one of my seriously ill computers and got one of my programs
to show the above process. (Give or take, I was experimenting on some
"bright" ideas. Ignore the second part for now. If you have question
I might have an answer...) Basically you seed the generator, and generate
a whole bunch of numbers, mask off just one fixed bit in each, and plot
them on a graphic display to see what it looks like. Forget to pause, then
repeat the process for the next bit down the line. Standard DOS graphics.
I do hope this discussion and the program has cleared up at least some
of the points that were in question. Feel free to poke holes in where
you can. Or comment as needed.
"Things should be as simple as possible. But not simpler"
Regards,
Steve
Hi, Steve!
When I meant - if DWORD constructed with "bad bytes" - I meant: it must be much more bad in randomess point, too. My point is: if with ENT you get good results with testing of DWORDs from some algo - then bytes must be good, too - due to ENT tests only bytes, fact that we write DWORD is have no meaning for ENT. Hence - if results for DWORD/BYTEs, with no difference, is bad - this algo produces bad random numbers in any usage - byte/word/dword/...
Is not it?
I.e. - When you check the last byte of LCG - it show bad results, and when you check the DWORD results - they show incredible results for Chi-square, too.
Quote from: FORTRANS on October 30, 2010, 03:00:11 PM
"Things should be as simple as possible. But not simpler"
So, my algo is bad - it too small and simple :bg
Alex
Steve, the program have exiting key? Just I have terminate it only after minimizing the window.
Alex
Quote from: Antariy on October 30, 2010, 09:33:26 PM
Hi, Steve!
When I meant - if DWORD constructed with "bad bytes" - I meant: it must be much more bad in randomess point, too. My point is: if with ENT you get good results with testing of DWORDs from some algo - then bytes must be good, too - due to ENT tests only bytes, fact that we write DWORD is have no meaning for ENT. Hence - if results for DWORD/BYTEs, with no difference, is bad - this algo produces bad random numbers in any usage - byte/word/dword/...
Is not it?
I.e. - When you check the last byte of LCG - it show bad results, and when you check the DWORD results - they show incredible results for Chi-square, too.
Hi Alex.
If it is checked using DWORDs the LCG would show good results.
ENT tests bytes, and that means it sees the number broken up. The
LGC numbers are tested in Knuth's book, and show as okay. It
passes the Chi-square test.
Quote
Quote from: FORTRANS on October 30, 2010, 03:00:11 PM
"Things should be as simple as possible. But not simpler"
So, my algo is bad - it too small and simple :bg
Simple is good. Too simple is bad. LGC is about the simplest
random number generator you can have that is reasonably
useful. And LCG with a bad multiplier can be bad as well.
Regards,
Steve N.
Quote from: Antariy on October 30, 2010, 09:44:31 PM
Steve, the program have exiting key? Just I have terminate it only after minimizing the window.
The program should change video mode, post a message,
and wait for a key press. Then it cycles through the 32 bit
planes as screens full of pixels, it posts a message, and then
waits for a key press so you can read the somewhat bad
message. It then cycles through 16 bit planes of a 16-bit
LCG, and waits for a key press to exit. Tested with DOS,
OS/2 VDM, Windows 2000 and Windows XP command
prompts. All tests were done in the full screen mode not a
window.
Regards,
Steve N.
Quote from: FORTRANS on October 30, 2010, 10:20:37 PM
Simple is good. Too simple is bad. LGC is about the simplest
random number generator you can have that is reasonably
useful. And LCG with a bad multiplier can be bad as well.
So, in short: Axrand is bad, or it is good? If bytes generated by it is good - it shold by good in any usage, probably?
I waits the answer! :bg
Alex
Quote from: FORTRANS on October 30, 2010, 10:20:37 PM
If it is checked using DWORDs the LCG would show good results.
ENT tests bytes, and that means it sees the number broken up. The
LGC numbers are tested in Knuth's book, and show as okay. It
passes the Chi-square test.
Yes, this is which I meant also - if check the output file of LCG with current ENT, even if write full output DWORD - results would be incredible anyway, because ENT test bytes.
Probably I must use translate.google.com when I wants to say something :bg
Alex
Quote from: Antariy on October 30, 2010, 10:36:41 PM
So, in short: Axrand is bad, or it is good? If bytes generated by it is good - it shold by good in any usage, probably?
I waits the answer! :bg
Hi Alex,
Well, you tested it with ENT, and ENT thinks it good.
Chi value of 25% should be okay. Knuth uses 10% for
a "Almost Suspect" generator, 5% for "Suspect", and 1%
for "Reject". But the standard for testing random numbers
is the "Diehard" suite of tests.
Cheers,
Steve N.
Edit:
It should be noted that Knuth ran multiple tests with small
sample sizes and some erratic behavior must be expected.
the results posted here with ENT have much, much larger
sample sizes. So the cut off would be larger.
SRN
Hi Alex,
Do you agree that I add AxRand to the next version of the MasmBasic library?
I would implement it as
Rand MACRO range ; based on AxRand by Antariy, returns eax in the range 0...range-1
...
push range
push 0 ; mode is range
call AxRand
Exitm <eax>
ENDM
Rnd MACRO dest ; based on AxRand by Antariy, returns a REAL8 in the range 0...1
...
push dest
push 1 ; mode is offset to destination, or leave result in ST(0)
call AxRandR8
ifb <dest>
Exitm <> ; result in ST(0)
else
Exitm <eax> ; result is pointer to destination
endif
ENDM
What do you think? It's not only the shortest and fastest random generator ever seen, it seems also a real good one according to ENT...
Quote from: FORTRANS on October 30, 2010, 10:55:32 PM
Quote from: Antariy on October 30, 2010, 10:36:41 PM
So, in short: Axrand is bad, or it is good? If bytes generated by it is good - it shold by good in any usage, probably?
I waits the answer! :bg
Well, you tested it with ENT, and ENT thinks it good.
Chi value of 25% should be okay. Knuth uses 10% for
a "Almost Suspect" generator, 5% for "Suspect", and 1%
for "Reject". But the standard for testing random numbers
is the "Diehard" suite of tests.
I think - if bytes generated by algo is good, than any thing constructed from them *must* be good - no any other way. If 4 subsequently bytes is goodly-random, then DWORD, which is contain them - cannot be bad at point of randomness.
Michael already test Axrand with Diehard, as I recall, and he says that results good. I does not build the testing program from sources yet.
Alex
Quote from: jj2007 on October 30, 2010, 11:15:32 PM
Do you agree that I add AxRand to the next version of the MasmBasic library?
.............
What do you think? It's not only the shortest and fastest random generator ever seen, it seems also a real good one according to ENT...
I'm not contrary to using of Axrand :bg
Alex
Quote from: Antariy on October 30, 2010, 11:22:55 PM
Quote from: jj2007 on October 30, 2010, 11:15:32 PM
Do you agree that I add AxRand to the next version of the MasmBasic library?
.............
What do you think? It's not only the shortest and fastest random generator ever seen, it seems also a real good one according to ENT...
I'm not contrary to using of Axrand :bg
Thanks a lot, Alex :U
In the meantime, I have made small changes to optimise the algo, and they seem to work in terms of speed and size, at least for my archaic CPU...
The attachment is close to the "Full_testing_of_Axrand.zip" posted above. Here are my timings:
Intel(R) Celeron(R) M CPU 420 @ 1.60GHz (SSE3)
10 cycles for Axrand
10 cycles for Axrand2
7 cycles for Axrand3
7 cycles for Rand()
37 bytes for Axrand
30 bytes for Axrand2
31 bytes for Axrand3
29 bytes for Rand()
Add 7-10 bytes per call
Regarding size: Calling the macro twice in a program costs 58 bytes, while using twice the Axrand3 version, and assuming it's 16-byte aligned, costs 32+2*10=52 bytes.
Quote from: Antariy on October 30, 2010, 11:21:21 PM
I think - if bytes generated by algo is good, than any thing constructed from them *must* be good - no any other way. If 4 subsequently bytes is goodly-random, then DWORD, which is contain them - cannot be bad at point of randomness.
Axrand seems to be highly random, but independently of that you have raised an interesting question here: Imagine the low bytes are being
calculated from two perfectly random high bytes - would the dword be perfectly random?
Try this in the last attachment, axrand.asm:
Quotemov eax, edx ; return random number
shr edx, 16
not edx
mov ax, dx
Quote from: jj2007 on October 31, 2010, 04:41:33 PM
you have raised an interesting question here: Imagine the low bytes are being calculated from two perfectly random high bytes - would the dword be perfectly random?
Hi,
As I pointed out (or thought I had) in Reply #53 and the
attached program, you can put some hideous stuff in the low
word and still get a good random number. While what you
suggest should not hurt in practice, you should note that it
changes the generator from having 4,294,967,296 distinct
outputs to one having only 65,536 possible numbers. As your
change does not affect the period, it remains 4,294,967,296
long before it starts to repeat. That is assuming I properly
understand his algorithm. So as far as I can see, there will
be no practical difference, unless you really need more than
65,536 distinct outputs.
HTH,
Steve N.
Quote from: FORTRANS on October 31, 2010, 05:11:33 PM
So as far as I can see, there will be no practical difference
Chi square distribution ... would exceed this value 5.00 percent of the times., i.e. "suspect"
But what you write is obviously correct - the # of distinct values is reduced to word size.
Here is a new version of the modified algo - a bit shorter, the macro now weighs in at 23 bytes (25 for ranges>127).
Test for #0 byte
Entropy = 7.999208 bits per byte.
would exceed this value 25.00 percent of the times.
Serial correlation coefficient is 0.001851 (totally uncorrelated = 0.0).
Test for #1 byte
Entropy = 7.999241 bits per byte.
would exceed this value 50.00 percent of the times.
Serial correlation coefficient is -0.000120 (totally uncorrelated = 0.0).
Test for #2 byte
Entropy = 7.999224 bits per byte.
would exceed this value 50.00 percent of the times.
Serial correlation coefficient is -0.000343 (totally uncorrelated = 0.0).
Test for #3 byte
Entropy = 7.999208 bits per byte.
would exceed this value 25.00 percent of the times.
Serial correlation coefficient is 0.002037 (totally uncorrelated = 0.0).
Test for full DWORD
Entropy = 7.999803 bits per byte.
would exceed this value 25.00 percent of the times.
Serial correlation coefficient is 0.003151 (totally uncorrelated = 0.0).
Quote from: jj2007 on October 31, 2010, 05:29:15 PM
Quote from: FORTRANS on October 31, 2010, 05:11:33 PM
So as far as I can see, there will be no practical difference
Chi square distribution ... would exceed this value 5.00 percent of the times., i.e. "suspect"
Hi jj2007,
You cannot use ENT on a DWORD stream and expect the
correct result for the random numbers involved. But yes, that
is an expected result from ENT as the two words are now
correlated. And it is indeed nice that Axrand is not affected
by that inconsistency.
Working on a Chi-Squared test program using "Numerical
Recipes" subroutines and have duplicated some of the results
in Knuth's AoCP. And it is working with the LCG as well
Regards,
Steve N.
Edit: removed comment about Numerical Recipes" data not
found.
SRN
furthermore, you cannot take bytes from a dword generator in any form and test them in a byte-oriented test program
don't make me come over there and re-write the test code - lol
Quote from: dedndave on October 31, 2010, 08:50:00 PM
furthermore, you cannot take bytes from a dword generator in any form and test them in a byte-oriented test program
Shouldn't be a problem is the range is -1, i.e. all elements of a dword enjoy the same probability to appear...
Quote
don't make me come over there and re-write the test code - lol
Come over, Dave, but I warn you it's raining cats and dogs right now :green2
i may get over that way, one of these days
better keep a few beers nice and frosty - us yanks like em cold
Quote from: dedndave on October 31, 2010, 08:50:00 PM
furthermore, you cannot take bytes from a dword generator in any form and test them in a byte-oriented test program
don't make me come over there and re-write the test code - lol
Dave, this is not dword oriented algo (in full meaning of hard-coded result, as C/C++ generator) - it
have range specifier.
Well, now I do test, replace
invoke Axrand,-1
to
invoke Axrand,256
To force algo to generate output values in rage of the BYTE only (0-255).
So, there are results which I get for the lower byte (other bytes are 0 - as must be):
#############################################################
Test for #0 byte
Entropy = 7.999183 bits per byte.
Optimum compression would reduce the size
of this 250000 byte file by 0 percent.
Chi square distribution for 250000 samples is 282.86, and randomly
would exceed this value 25.00 percent of the times.
Arithmetic mean value of data bytes is 127.3206 (127.5 = random).
Monte Carlo value for Pi is 3.147506360 (error 0.19 percent).
Serial correlation coefficient is -0.001592 (totally uncorrelated = 0.0).
So, you don't needed to write new test :P - results still good, even with generation of the BYTEs, not DWORDs :bg
NOTE: test maded with generation of 250,000 of BYTEs - and results are showed above. So, this is probably very good generator, because it generate good stream which is in ~1000 times bigger than the required range :bg
And note: there is "dirty low byte" :bg
Alex
Hi,
For what it is worth, I coded up one of the tests suggested
by Knuth in his chapter on random numbers. The test is a
simulation of throwing two dice. I took the chi-squared routines
from "Numerical Recipes" and fed it some data. So I am posting
some results. The first three were from Knuth's discussion to
verify it was working. And then fed it some data from RandLCG,
RAND, and AxRand.
The results were not what I expected. It seems too sensitive
to variation? Does anyone else have this coded up to compare
against? Suggestions for another set of data to test?
Regards,
Steve N.
Knuth sample (144 real dice throws).
Number of bins = 11, Constraint = 1
Expected events 2. 4. 10. 12. 22. 29. 21. 15. 14. 9. 6.
Observed events 4. 8. 12. 16. 20. 24. 20. 16. 12. 8. 4.
DF, CHSQ, PROB 10.00 7.15 0.71
Knuth example #1 (Too high variance.)
Number of bins = 11, Constraint = 1
Expected events 4. 10. 10. 13. 20. 18. 18. 11. 13. 14. 13.
Observed events 4. 8. 12. 16. 20. 24. 20. 16. 12. 8. 4.
DF, CHSQ, PROB 10.00 29.49 0.10
Knuth example #2 (Too low variance.)
Number of bins = 11, Constraint = 1
Expected events 3. 7. 11. 15. 19. 24. 21. 17. 13. 9. 5.
Observed events 4. 8. 12. 16. 20. 24. 20. 16. 12. 8. 4.
DF, CHSQ, PROB 10.00 1.14 99.97
RANLCG Random number was not working as planned during debug and used
a fixed seed.
WFL386 RANLCG 103kb
Number of bins = 11, Constraint = 1
Observed events 42. 84. 131. 167. 187. 221. 221. 152. 109. 90. 36.
Expected events 40. 80. 120. 160. 200. 240. 200. 160. 120. 80. 40.
DF, CHSQ, PROB 10.00 9.23 51.07
RMFORT RANLCG 36kb
Number of bins = 11, Constraint = 1
Observed events 42. 84. 131. 167. 187. 221. 221. 152. 109. 90. 36.
Expected events 40. 80. 120. 160. 200. 240. 200. 160. 120. 80. 40.
DF, CHSQ, PROB 10.00 9.23 51.07
WFL386 RAND (ACM) 103kb (not warmed up)
Number of bins = 11, Constraint = 1
Observed events 44. 78. 124. 173. 199. 226. 202. 154. 129. 76. 35.
Expected events 40. 80. 120. 160. 200. 240. 200. 160. 120. 80. 40.
DF, CHSQ, PROB 10.00 4.21 93.76
WFL386 RAND (ACM) 103kb (warmed upish)
Number of bins = 11, Constraint = 1
Observed events 42. 69. 115. 153. 206. 268. 188. 152. 113. 79. 55.
Expected events 40. 80. 120. 160. 200. 240. 200. 160. 120. 80. 40.
DF, CHSQ, PROB 10.00 12.74 23.86
RANLCG (fixed) 1440 events to compare with above runs.
EOF
Roll of Dice Simulation, Chi-Squared Test
Number of bins = 11, Constraint = 1, Number of events = 1440
Observed evnt 47. 82. 92. 169. 171. 258. 209. 171. 125. 79. 37.
Expected evnt 40. 80. 120. 160. 200. 240. 200. 160. 120. 80. 40.
DF, CHSQ, PROB 10.00 15.48 11.56
RANLCG max events.
Roll of Dice Simulation, Chi-Squared Test
Number of bins = 11, Constraint = 1, Number of events =100000
Observed evnt 2758. 5559. 8481.11053.13894.16801.13804.11032. 8223. 5581. 2814.
Expected evnt 2778. 5556. 8333.11111.13889.16667.13889.11111. 8333. 5556. 2778.
DF, CHSQ, PROB 10.00 7.28 69.88
RANLCG (Allow for random seed.) 1440 events to compare with above runs.
EOF
Roll of Dice Simulation, Chi-Squared Test
Number of bins = 11, Constraint = 1, Number of events = 1440
Observed evnt 34. 85. 104. 163. 188. 238. 209. 156. 146. 87. 30.
Expected evnt 40. 80. 120. 160. 200. 240. 200. 160. 120. 80. 40.
DF, CHSQ, PROB 10.00 13.39 20.27
AxRand DWORD limited to first 100,000 x2 DWORDS.
10/31/2010 01:13p 1,000,000 rand_dword.bin
Roll of Dice Simulation, Chi-Squared Test
Number of bins = 11, Constraint = 1, Number of events =100000
Observed evnt 2777. 5584. 8279.11167.13793.16660.13855.11232. 8306. 5526. 2821.
Expected evnt 2778. 5556. 8333.11111.13889.16667.13889.11111. 8333. 5556. 2778.
DF, CHSQ, PROB 10.00 3.76 95.74
Hi, Steve!
Do you have the Diehard compiled with good Fortran compiler?
I have downloaded sources for C some days ago (when Michael made the test), but tried to build the program yesterday only.
I have builded it with different versions of MSVC (6 and 10), with PellesC, but program seems to work badly. Floating point results output says about errors or NaNs. Sources is ported not cleanly, too - compilers have many "asks" to code.
So, I'll ask you for the EXE which is compiled with Fortran compiler, as I understand - it is the original language on which is written that test.
Alex
Quote from: FORTRANS on November 01, 2010, 10:00:52 PM
The results were not what I expected. It seems too sensitive
to variation? Does anyone else have this coded up to compare
against? Suggestions for another set of data to test?
What you are expected?
Michael, which version of Diehard do you used for testing of Axrand some days ago? The output is very different from output of C-port, in format of the tables, and in results (many results not shown anyway - FPU errors occurs). Program tested on different files (even not random), with different sizes.
Alex
I used the diehard.exe from the archive here (this is the Windows Software (732 Kb) link on this page (http://www.stat.fsu.edu/pub/diehard/)):
http://www.stat.fsu.edu/pub/diehard/diehard.zip
The author admitted that the C source is a mess, having been translated from old Fortran code, so I decided not to try building it, and now it looks like I made the right decision :bg
Quote from: MichaelW on November 01, 2010, 10:48:56 PM
The author admitted that the C source is a mess, having been translated from old Fortran code, so I decided not to try building it.
That is been a bit hm-m-m-m... tricky to build it, really strange port. You are decided rightly, but I'm does not know this info at that time :bg
Thank you for link to the program!
Alex
This code is supposed to determine the period of the generated sequence:
;=========================================================================
include \masm32\include\masm32rt.inc
;=========================================================================
.data
count dq 0
sysTime dq 0
buffer dd 1000 dup(0)
hTimer dd 0
hThread dd 0
.code
;=========================================================================
option prologue:none
option epilogue:none
align 16
Axrand proc STDCALL dwRange:DWORD
mov eax,Initial
mul dword ptr [esp+4]
mov eax,edx
mov edx,Initial
;xor edx,dword ptr [esp] ; for making results clearness, I have remove this, due to this line is compile-time dependent.
imul edx,12347
add edx,17
bswap edx
mov Initial,edx
ret 4
.data
align 4
Initial dd "RAND"
.code
Axrand endp
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
;=========================================================================
ThreadProc proc lpParameter:DWORD
@@:
invoke WaitForSingleObject, hTimer, INFINITE
loc 0, 0
invoke crt_printf, cfm$("%I64d\n"), count
jmp @B
ThreadProc endp
;=========================================================================
start:
;=========================================================================
;-------------------------------------------------------------------
; To avoid slowing down the L1 loop code, monitor the running count
; from a separate thread that runs only once every 2 seconds.
;-------------------------------------------------------------------
invoke CreateWaitableTimer, 0, 0, 0
mov hTimer, eax
invoke GetSystemTimeAsFileTime, ADDR sysTime
invoke SetWaitableTimer, hTimer, ADDR sysTime, 2000, 0, 0, 0
invoke CreateThread, 0, 0, ThreadProc, 0, 0, 0
mov esi, OFFSET buffer
xor ebx, ebx
.WHILE ebx < 1000
invoke Axrand, -1
mov [esi+ebx*4], eax
add DWORD PTR count, 1
adc DWORD PTR count+4, 0
inc ebx
.ENDW
L1:
invoke Axrand, -1
add DWORD PTR count, 1
adc DWORD PTR count+4, 0
cmp [esi], eax
jne L1
mov ebx, 999
xor edi, edi
L2:
inc edi
invoke Axrand, -1
cmp [esi+edi*4], eax
jne L1
dec ebx
jnz L2
inkey "Press any key to exit..."
exit
;=========================================================================
end start
For Axrand as currently coded I get a period of 4,138,934,403. The period seems to vary a lot depending on the value of the constants in the imul and add instructions. For example, if I change the 12347 to 12346, then I get a very long period. I never reached the end of it, but the count was around 438,000,000,000 when I stopped it after the app had run for ~5 hours on my 500 MHz P3. If I change it to 12345 then I get a period of 1,937,267,874. If I restore the 12347 and change the 17 to 18, I get a period of 1,299,107,688. I didn't test the effect of these changes on the randomness of the generated sequence, but I think it might be worth trying to find other constants that will produce a good random sequence with a longer period.
Quote from: MichaelW on November 02, 2010, 12:24:10 AM
For Axrand as currently coded I get a period of 4,138,934,403.
I guess, period can and must repeats for nearly range of DWORD, or not? After generation of allmost all possible DWORDs, generator just impossible to not-generate the same thing. Much nearl the "end" - high value for range - the more probably can be generated previous DWORD. If the same dword is not generated after very big time (much bigger than DWORD's range) - that is not says about fact that generator generate values from other part of range, and probably - very repeatedly?
Alex
Results for a shorter and faster variant:
Test for #0 byte
Entropy = 7.999208 bits per byte.
would exceed this value 25.00 percent of the times.
Serial correlation coefficient is 0.001851 (totally uncorrelated = 0.0).
Test for #1 byte
Entropy = 7.999241 bits per byte.
would exceed this value 50.00 percent of the times.
Serial correlation coefficient is -0.000120 (totally uncorrelated = 0.0).
Test for #2 byte
Entropy = 7.999224 bits per byte.
would exceed this value 50.00 percent of the times.
Serial correlation coefficient is -0.000343 (totally uncorrelated = 0.0).
Test for #3 byte
Entropy = 7.999208 bits per byte.
would exceed this value 25.00 percent of the times.
Serial correlation coefficient is 0.002037 (totally uncorrelated = 0.0).
Test for full DWORD
Entropy = 7.999803 bits per byte.
would exceed this value 25.00 percent of the times.
Serial correlation coefficient is 0.003151 (totally uncorrelated = 0.0).
Axrand3 proc STDCALL dwRange:DWORD
mov eax, Initial ; get the seed
imul eax, 127 ; randomise
add eax, 127
bswap eax ; we need the high order bits
mov Initial, eax ; save for next round
mul dword ptr [esp+4] ; multiply Initial with range
mov eax, edx ; return random number
ret 4 ; could also be returned in edx
Axrand3 endp
Quote from: jj2007 on November 02, 2010, 12:43:22 AM
Axrand3 proc STDCALL dwRange:DWORD
.......
mov eax, edx ; return random number
ret 4 ; could also be returned in edx
Axrand3 endp
xchg eax,edx would be smaller :P
:bg
Alex
Quote from: jj2007 on November 02, 2010, 12:43:22 AM
bswap eax ; we need the high order bits
I used BSWAP not for fact of getting some "bits" (if DWORD have bad high bits - this is not make it good contrary to DWORD with bad low bits). I use this due to I have used very small step with addition and multiply, so, I used BSWAP as shuffling. Just remove BSWAP from generator...
Alex
Quote from: Antariy on November 02, 2010, 12:46:35 AM
xchg eax,edx would be smaller :P
:bg
Yes, but it's a cycle slower :wink
The odd thing is when you drop the mov eax, edx, i.e. you simply use edx as return value, it's even slower, at least on my machine ::)
What about this pair:
imul eax, -127 ; randomise
add eax, -128
Axrand3 proc STDCALL dwRange:DWORD
mov eax, Initial ; get the seed
imul eax, -127 ; randomise
add eax, -128
bswap eax ; we need the high order bits
mov Initial, eax ; save for next round
mul dword ptr [esp+4] ; multiply Initial with range
mov eax, edx ; return random number
ret 4 ; could also be returned in edx
Axrand3 endp
QuoteTest for full DWORD
Entropy = 7.999845 bits per byte.
would exceed this value 95.00 percent of the times.
Quote from: MichaelW on November 02, 2010, 12:24:10 AM
when I stopped it after the app had run for ~5 hours on my 500 MHz P3.
..........
but I think it might be worth trying to find other constants that will produce a good random sequence with a longer period.
You did great thing, but probably finding of the best constants will require too long time to do.
Alex
QuoteYes, but it's a cycle slower
not so sure about that :P
Quote from: jj2007 on November 02, 2010, 12:58:39 AM
Yes, but it's a cycle slower :wink
QuoteTest for full DWORD
Entropy = 7.999845 bits per byte.
would exceed this value 95.00 percent of the times.
I not say *faster*, I say - smaller :bg
I agree that it can (and probably - must) be slower, because must use temporary place at least, or delays.
95% is bad value, as sayed in "help" for ENT, and as Steve said.
Alex
Quote from: Antariy on November 02, 2010, 01:03:53 AM
95% is bad value, as sayed in "help" for ENT, and as Steve said.
ent.html: If the percentage is greater than 99% or less than 1%, the sequence is almost certainly not random. If the percentage is between 99% and 95% or between 1% and 5%, the sequence is suspect. Percentages between 90% and 95% and 5% and 10% indicate the sequence is "almost suspect"
So it is "almost suspect" but apparently still a lot better than the standard UNIX rand() function. Besides, by choosing e.g.
Ciao instead of
RAND for Initial, you can push all values into the green zone.
Let's not forget these are all pseudo-random generators... put the same initial value, and you will always get the same identical sequence :wink
Quote from: jj2007 on November 02, 2010, 01:38:54 AM
So it is "almost suspect" but apparently still a lot better than the standard UNIX rand() function. Besides, by choosing e.g. Ciao instead of RAND for Initial, you can push all values into the green zone.
Let's not forget these are all pseudo-random generators... put the same initial value, and you will always get the same identical sequence :wink
I wait your answer a long time :bg
Well, change the Initial to other value with using of -127/-128 constants, and which is the results?
Note (again): the key is the BSWAP, not other things. Not for getting other bits - for *shuffling*.
Alex
Quote from: jj2007 on November 02, 2010, 01:38:54 AM
by choosing e.g. Ciao
You can use the
Пока и
Хайр, too :bg
Alex
Quote from: Antariy on November 02, 2010, 01:47:33 AM
Quote from: jj2007 on November 02, 2010, 01:38:54 AM
by choosing e.g. Ciao
You can use the Пока и Хайр, too :bg
Almost everything goes, of course. But caution with exotic values such as 0 or -1. And the bswap is needed, too - try removing that, and the randomness parameters drop dramatically.
Quote from: jj2007 on November 02, 2010, 02:04:53 AM
Almost everything goes, of course. But caution with exotic values such as 0 or -1. And the bswap is needed, too - try removing that, and the randomness parameters drop dramatically.
Jochen, read the ~3 my previous posts, please. In them *I said: BSWAP IS THE KEY OF THE ALGO* !!! Probably you miss that posts.
I suggested to remove bswap and see results too, see previous posts :wink
Alex
The 12346 multiplier is no good:
Entropy = 7.951915 bits per byte.
Optimum compression would reduce the size
of this 10000000 byte file by 0 percent.
Chi square distribution for 10000000 samples is 661132.45, and randomly
would exceed this value 0.01 percent of the times.
Arithmetic mean value of data bytes is 127.3905 (127.5 = random).
Monte Carlo value for Pi is 3.145268458 (error 0.12 percent).
Serial correlation coefficient is -0.001283 (totally uncorrelated = 0.0).
Quote from: Antariy on November 01, 2010, 10:06:11 PM
Hi, Steve!
Do you have the Diehard compiled with good Fortran compiler?
Hi Alex,
No. But I will look into it.
Quote
I have downloaded sources for C some days ago (when Michael made the test), but tried to build the program yesterday only.
I have builded it with different versions of MSVC (6 and 10), with PellesC, but program seems to work badly. Floating point results output says about errors or NaNs. Sources is ported not cleanly, too - compilers have many "asks" to code.
So, I'll ask you for the EXE which is compiled with Fortran compiler, as I understand - it is the original language on which is written that test.
Okay.
Regards,
Steve N
Hi,
Here is a executable created with Microsoft FORTRAN.
Regards,
Steve N
http://www.stat.fsu.edu/pub/diehard/cdrom/dos/diehard.exe
Quote from: MichaelW on November 02, 2010, 04:10:28 AM
The 12346 multiplier is no good:
It seems, that my supposition is right - about fact that if generator does not generate repeated value nearly to the end of the period - it is not good. Generator should generate the same value after all possible variants (i.e. - period length) is generated for targer datatype - I guess. For very long period with fixed datatype, needed to increase size of that type - maybe use QWORDs. Or use byte-range to generate bytes, with eventually seed of Initial. And construct needed datatype with that bytes.
Alex
Quote from: FORTRANS on November 02, 2010, 06:26:49 PM
Here is a executable created with Microsoft FORTRAN.
Thank you! I have downloaded it now, too.
Alex
The Axrand code is not big. Moreover, it have only 2 places which is needed to be frequently changed for testing - that is the constants. So, if we initially write
imul edx,12345678h
add edx,12345678h
then compile this - constants would be have 4 bytes in the binary code.
We can use the piece, which contain full code of Axrand in this way:
1. Create different thread.
2. New thread copy the code of Axrand to its own space allocated (with executable attribs - for DEP).
3. New thread replace the constants at a fixed offset from the start of the Axrand binary code. New constants will be generated at runtime at main procedure - each new thread can get it's starting number for using in 2 constants from its parameter.
4. New thread works to find the best period.
5. After founding repeated thing, thread put the value of its maximal period into array, its member - the starting number which it is have at start, for example.
Testing on multicore/CPUs machines would help a lot.
To be honest - I have satisfyed with my current (i.e. - second posted at this thread) proc - it have good results in tests, and it have period which is very close to the maximal possible - the DWORD's range. :bg
Alex
I tried a number of multipliers, and while some of them worked well for the ENT tests, they all produced shorter periods than the current Axrand coding. So I decided to try the multiplier and increment values from George Marsaglia's congruential generator CONG, which has a period of 2^32:
x(n)=69069x(n-1)+1234567
This worked well for the ENT tests, but had a shorter period than the current Axrand coding. After I eliminated the BSWAP instruction, then in addition to the P3 cycles dropping to 8, I got a period of 4,294,967,297 (which I see now is off by one because my code goes one count past the end of the period). Unfortunately, without the BSWAP the generated sequence does poorly on the Chi square test:
Entropy = 7.999992 bits per byte.
Optimum compression would reduce the size
of this 10000000 byte file by 0 percent.
Chi square distribution for 10000000 samples is 107.67, and randomly
would exceed this value 99.99 percent of the times.
Arithmetic mean value of data bytes is 127.4951 (127.5 = random).
Monte Carlo value for Pi is 3.140778056 (error 0.03 percent).
Serial correlation coefficient is 0.000210 (totally uncorrelated = 0.0).
Quote from: MichaelW on November 03, 2010, 02:04:08 AM
I tried a number of multipliers ...
Hard matter to full-testing. At least needed multithreaded test, or that requires too long time, probably.
Alex
Quote from: MichaelW on November 03, 2010, 02:04:08 AM
... 1234567
Maybe that is too big constant - it does not make small step?
Alex
Michael, if change ADD EDX,17 to ADD EDX,12347, I get these results, all other things is the same:
Entropy = 7.999818 bits per byte.
Optimum compression would reduce the size
of this 1000000 byte file by 0 percent.
Chi square distribution for 1000000 samples is 251.95, and randomly
would exceed this value 50.00 percent of the times.
Arithmetic mean value of data bytes is 127.4555 (127.5 = random).
Monte Carlo value for Pi is 3.152196609 (error 0.34 percent).
Serial correlation coefficient is 0.000846 (totally uncorrelated = 0.0).
What you think?
Alex
The period is much shorter: 667,579,016
The reason I used the values from CONG was that those values satisfy the requirements for an LCG to have a maximum period. But because of the BSWAP, Axrand is not exactly an LCG.
Quote from: MichaelW on November 03, 2010, 03:07:40 AM
The period is much shorter: 667,579,016
The reason I used the values from CONG was that those values satisfy the requirements for an LCG to have a maximum period. But because of the BSWAP, Axrand is not exactly an LCG.
Is worth to make test, which is "designed" at reply #97 ?
Somebody with multicore CPU desire to run heavy test? :green2
Alex
Quote from: Antariy on November 03, 2010, 10:57:19 PM
Is worth to make test, which is "designed" at reply #97 ?
Somebody with multicore CPU desire to run heavy test? :green2
Yes, but choose the fastest algo, and start with my favourite values :wink:
mov esi, -127
mov edi, -128
mov ebx, -1
call AxRand
Axrand proc ; STDCALL dwRange:DWORD
mov eax, Initial ; get the seed
imul eax, esi ; -127 ; randomise
add eax, edi ; -128
bswap eax ; we do need the high order bits
mov Initial, eax ; save for next round
mul ebx ; -1 aka dword ptr [esp+4] ; multiply Initial with range
; mov eax, edx ; return random number (could also be returned in edx)
ret ; 4
Axrand endp
Quote from: jj2007 on November 04, 2010, 12:42:11 AM
Yes, but choose the fastest algo, and start with my favourite values :wink:
Well, anybody can choose its own algo :bg, which period have your algo?
Reply #97 is the skeleton :P
Alex
Quote from: Antariy on November 04, 2010, 12:47:57 AM
Quote from: jj2007 on November 04, 2010, 12:42:11 AM
Yes, but choose the fastest algo, and start with my favourite values :wink:
Well, anybody can choose its own algo :bg, which period have your algo?
Reply #97 is the skeleton :P
1898349916 for imul eax, -127 and add eax, -128
Quote from: jj2007 on November 04, 2010, 01:09:32 AM
Quote from: Antariy on November 04, 2010, 12:47:57 AM
Quote from: jj2007 on November 04, 2010, 12:42:11 AM
Yes, but choose the fastest algo, and start with my favourite values :wink:
Well, anybody can choose its own algo :bg, which period have your algo?
Reply #97 is the skeleton :P
1898349916 for imul eax, -127 and add eax, -128
Algorithm, which is initially posted at second my post at this thread, have period which is close to full DWORDs range: 4,138,934,403 (posted by Michael at reply #78).
Alex
The -127/-124 pair has both a long period and excellent randomness:
Quote; with imul -127, add eax, n: ; period, chi square 0123/dword
; -128 1898349916
; -127 2766784829
; -126 786943256
; -125 1492884433
; -124 4158707804, 5*50
; -123 3720667977, 75, 75, 50, 75, 50
; -122 3008980430, 75, 50, 97.5, 75, 50
; -121 3461554912, 50, 75, 75, 50, 50
; -119 115388264, 50, 75, 50, 50, 50
; -118 1121767446
; -117 4071617755, 25, 25, 90, 25, 5
; with imul -128, add eax, n: ; period, chi square 0123/dword
; -128 0 aka 2^32, chi square horrible
; -127 0 aka 2^32, chi square horrible
; with imul -123, add eax, n: ; period, chi square 0123/dword
; -128 662055917
; -127 368088112
; -123 693250349
start:
push -1
call Axrand3
mov ebx, eax
xor esi, esi
.Repeat
push -1
call Axrand3
inc esi
.Until Zero? || eax==ebx ; 2147483648
print chr$(13, 10, "The period: ")
inkey ustr$(esi), 7, 13, 10
exit
-117 is better than other - other is too long (bigger than DWORD range).
Alex
P.S. But it is still less than DWORD can holds. So, this is interesting thing - find values which will generate period close to DWORD range, but not bigger and not lot smaller.
Yes indeed, Alex. Besides, the period also depends on the range, in ways that are not obvious:
Quote ; -127 2766784829, 50, 90, 50, 50, 50, periods 100/1000 etc: 5, 1584, 5216, 5216, 339035, ..
; -124 4158707804, 5*50, periods 100/1000 etc: 153, 2861, 10095, 111553, 956647, 956647, 956647, 279161941
However, these periods are not important when the range is -1, e.g. when you want to return a float as shown below.
RndSeed MACRO SeedName:=<MbSeed>
ifndef SeedName
LOCAL SeedName:QWORD
endif
rdtsc
mov dword ptr SeedName, eax
and dword ptr SeedName+4, 0
ifdif <SeedName>, <MbSeed>
MbSeed equ SeedName
endif
ENDM
Rnd MACRO dest
mov eax, dword ptr MbSeed ; Initial
imul eax, -127 ; randomise
add eax, -124
bswap eax ; we need the high order bits
mov dword ptr MbSeed, eax ; save for next round
ffree st(7)
if type(MbSeed) ne QWORD
% .err <MbSeed must be a qword>
endif
fild MbSeed
fmul RndMul
fstp dest
ENDM
Quote from: jj2007 on November 04, 2010, 02:07:33 AM
However, these periods are not important when the range is -1, e.g. when you want to return a float as shown below.
No, it is important at point how fastly would appear the same value, after some number of generations. Anyway, your constants gives the good results, too.
Alex
Jochen, still do you want to have ENT as DLL? Just I have no look to it, yet. This is allowed - build it as unindepended executable (i.e. - DLL)?
Alex
GregL quoted a pretty liberal license, so I guess it is allowed to turn it into a dll. It would speed up testing dramatically - imagine how many millions of cycles you need just to open a console window :lol
The terse mode would be enough. My preferred syntax would be invoke ENT, ptrCommandLine, ptrSourceBuffer, bytes, ptrResults, i.e. no need to write to a file.
Greg (http://www.masm32.com/board/index.php?topic=15067.msg122299#msg122299):
This software is in the public domain. Permission to use, copy, modify, and distribute this software and its documentation for any purpose and without fee is hereby granted, without any conditions or restrictions.
Quote from: jj2007 on November 04, 2010, 02:23:59 AM
imagine how many millions of cycles you need just to open a console window :lol
Okay, okay... :bg
Alex
Hi,
While browsing Marsaglia's Diehard notes, I see that he likes
what he calls "Multiply With Carry" random number generators.
Using his MAKEWHAT.EXE as a reference, I coded up an MWC
generator. Seems good overall, except for the name. Since
"carry" has a specific meaning for assembly programmers.
ENT results show all bytes are good. See Reply#33 to
compare to the LCG results.
Full ENT.EXE results for RandMWC first file.
Entropy = 7.999839 bits per byte.
Optimum compression would reduce the size
of this 1000000 byte file by 0 percent.
Chi square distribution for 1000000 samples is 222.84, and randomly
would exceed this value 90.00 percent of the times.
Arithmetic mean value of data bytes is 127.5642 (127.5 = random).
Monte Carlo value for Pi is 3.130236521 (error 0.36 percent).
Serial correlation coefficient is 0.001554 (totally uncorrelated = 0.0).
Condensed
Run#, Chi square percent
1 222.84, 90.00
2 240.36, 50.00
3 247.61, 50.00
4 320.84, 0.50
5 273.72, 25.00
6 253.78, 50.00
7 258.05, 50.00
8 250.85, 50.00
9 249.83, 50.00
10 255.38, 50.00
11 239.91, 50.00
12 216.47, 95.00
And using my dice throw simulation to compare MWC and LCG.
RANLCG max events.
Roll of Dice Simulation, Chi-Squared Test
Number of bins = 11, Constraint = 1, Number of events =100000
Observed evnt 2746. 5502. 8375.11029.13836.16580.13814.11160. 8462. 5661. 2835.
Expected evnt 2778. 5556. 8333.11111.13889.16667.13889.11111. 8333. 5556. 2778.
DF, CHSQ, PROB 10.00 8.13 61.59
1, CHSQ, PROB 8.13 61.59%
2, CHSQ, PROB 12.18 27.30%
3, CHSQ, PROB 7.47 68.03%
4, CHSQ, PROB 4.31 93.23%
5, CHSQ, PROB 9.13 52.02%
6, CHSQ, PROB 5.11 88.34%
RandMWC max events.
Roll of Dice Simulation, Chi-Squared Test
1, CHSQ, PROB 10.02 43.91%
2, CHSQ, PROB 11.62 31.14%
3, CHSQ, PROB 6.25 79.40%
4, CHSQ, PROB 6.27 79.25%
5, CHSQ, PROB 5.77 83.44%
6, CHSQ, PROB 10.02 43.92%
My implementation.
;RandMWC: ; Multiply with carry "Random" number Generator from Marsaglia
Rand32 DD 31415926 ; a la K
DD 1013904223 ; As per NR RandC
RandA DD 1517746329 ; See RandMWC
; - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
; 3 November 2010, try Marsaglia's "Multiply With Carry".
; His notation:
; RandN+1 = A*RandN + Carry Mod B, M = A * B - 1 = Prime Safe.
; for my understanding:
; RandN+1 = RandA*RandN + High DWordN
; Fold would be better than carry?
;(In general, for any choice of `a`, let m=a*2^32-1. If both m
;and (m-1)/2 are prime then the period will be (m-1)/2).
RandMWC:
MOV EAX,[Rand32]
MUL [RandA]
ADD EAX,[Rand32+4]
ADC EDX,0
MOV [Rand32],EAX
MOV [Rand32+4],EDX
RET
Thoughts?
Regards,
Steve N.
It looks like it should be fairly fast.
Quote from: MichaelW on November 05, 2010, 06:59:18 PM
It looks like it should be fairly fast.
Yes, about ~8 cycles on your CPU, I guess.
Steve, if add a range specifier - that is will be best algo, probably.
Alex
Hi Alex,
Do you mean you want it to output a limited set of integers?
I would warm up the generator and use the current random
number, then fall through to refresh the random number.
Something like:
; - - - - - -
; Limit random number to 0 <= returned < LIMIT.
; Enter wwith LIMIT in ECX.
; Return result in in ECX.
LimitRand:
MOV EAX,[Rand32]
MUL ECX
MOV ECX,EDX
; - - - - - -
RandMWC:
MOV EAX,[Rand32]
MUL [RandA]
ADD EAX,[Rand32+4]
ADC EDX,0
MOV [Rand32],EAX
MOV [Rand32+4],EDX
RET
Does that make sense to you?
Regards,
Steve N.
Quote from: FORTRANS on November 05, 2010, 10:09:30 PM
Do you mean you want it to output a limited set of integers?
Yes,
Steve. Just at programming we more frequentrly have need in limited output value, instead of "true random long pad". For generation of the coordinates etc. That makes algo more useful in general usage.
Alex
Alex,
you can add some clipping/scaling (i like to clip the ends off) code that can be used with any generator
the scaling code wouldn't change, and thus not affect the timing
this would be a great routine to add some auto-reseed code, too
i seem to recall we played with MWC in the forum before
Quote from: dedndave on November 05, 2010, 10:19:13 PM
you can add some clipping/scaling (i like to clip the ends off) code that can be used with any generator
the scaling code wouldn't change, and thus not affect the timing
this would be a great routine to add some auto-reseed code, too
i seem to recall we played with MWC in the forum before
Dave, thanks for advice, but I know that :P
Just I'm dislike constructions, which is convert something to something other :lol If algo can do this internally - that would be better.
Alex
SeedCnt dd 1
Rnd32lo dd ?
Rnd32hi dd ?
ASeed PROC
dec SeedCnt
jnz ASeed0
push edx
push eax
INVOKE QueryPerformanceCounter,esp
pop Rnd32hi
pop eax
mov Rnd32lo,eax
and eax,07Fh
or al,80h
mov SeedCnt,eax
ASeed0: mov edx,1517746329
mov eax,Rnd32lo
mul edx
add eax,Rnd32hi
adc edx,0
mov Rnd32lo,eax
mov Rnd32hi,edx
ret
ASeed ENDP
The faster, simple to using, RNG with DWORD output is:
mov eax,[esp+123]
:P
:green2
The seed is "123" - next time needed to use something other :green2
Alex
what ? - you guys don't like my routine ?
Quote from: dedndave on November 06, 2010, 01:52:39 AM
what ? - you guys don't like my routine ?
You mistaked - I like it. :bg
Just mov eax,[esp+123] is faster :P
Alex
that thing will spit out some nice garbage :P
here is a slight improvement...
SeedCnt dd 1
Rnd32hi dd ?
Rnd32lo dd ?
ASeed PROC
dec SeedCnt
jnz ASeed0
INVOKE QueryPerformanceCounter,offset Rnd32hi
movzx eax,byte ptr Rnd32hi
or al,80h
mov SeedCnt,eax
ASeed0: mov edx,1517746329
mov eax,Rnd32lo
mul edx
add eax,Rnd32hi
adc edx,0
mov Rnd32lo,eax
mov Rnd32hi,edx
ret
ASeed ENDP
EDIT: changed this...
movzx eax,byte ptr Rnd32lo
to this...
movzx eax,byte ptr Rnd32hi
for improved randomnicity
Quote from: dedndave on November 06, 2010, 03:45:42 AM
that thing will spit out some nice garbage :P
here is a slight improvement...
This is improvement of my algo :P
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
align 16
Axrand_s:
Axrand proc STDCALL dwRange:DWORD
mov edx,Initial
mov eax,Initial
imul edx,12347
add edx,17
bswap edx
mov Initial,edx
mul dword ptr [esp+4]
mov eax,edx
ret 4
Axrand endp
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef
4 cycles faster on my CPU
ECX is not used intentionally.
Alex
There is Jochen's test, with my slightly optimized algo, and with Steve's algo.
There is results which I gets:
Intel(R) Celeron(R) CPU 2.13GHz (SSE3)
16 cycles for Axrand
17 cycles for Axrand2
14 cycles for Axrand3
113 cycles for LimitRand
13 cycles for Rand()
15 cycles for Axrand
16 cycles for Axrand2
14 cycles for Axrand3
113 cycles for LimitRand
13 cycles for Rand()
15 cycles for Axrand
17 cycles for Axrand2
14 cycles for Axrand3
113 cycles for LimitRand
12 cycles for Rand()
14 cycles for Axrand
17 cycles for Axrand2
14 cycles for Axrand3
112 cycles for LimitRand
13 cycles for Rand()
13 cycles for Axrand
17 cycles for Axrand2
14 cycles for Axrand3
113 cycles for LimitRand
12 cycles for Rand()
37 bytes for Axrand
30 bytes for Axrand2
31 bytes for Axrand3
29 bytes for Rand()
Add 7-10 bytes per call
Working with carry on PIV is very slow :eek
Steve, have a look - maybe something wrong? :eek I only make returning and getting results standartized - get range from stack, and return number in EAX. But this must slowdown algo by ~2-4 clocks at maximum only. :eek
Test attached in archive. Run it, please.
Alex
AMD Sempron(tm) Processor 3100+ (SSE3)
2 cycles for Axrand
4 cycles for Axrand2
1 cycles for Axrand3
15 cycles for LimitRand
4 cycles for Rand()
2 cycles for Axrand
4 cycles for Axrand2
2 cycles for Axrand3
15 cycles for LimitRand
4 cycles for Rand()
2 cycles for Axrand
4 cycles for Axrand2
1 cycles for Axrand3
14 cycles for LimitRand
4 cycles for Rand()
2 cycles for Axrand
4 cycles for Axrand2
1 cycles for Axrand3
15 cycles for LimitRand
4 cycles for Rand()
2 cycles for Axrand
4 cycles for Axrand2
2 cycles for Axrand3
15 cycles for LimitRand
4 cycles for Rand()
37 bytes for Axrand
30 bytes for Axrand2
31 bytes for Axrand3
29 bytes for Rand()
Add 7-10 bytes per call
--- ok ---
Quote from: oex on November 06, 2010, 10:11:21 PM
AMD Sempron(tm) Processor 3100+ (SSE3)
Hi, Peter! :bg
AMD works with pushs-pops very well.
So, problem is not work with carry - it is - many memory references.
Alex
Here is an update with my latest AxRand3 and Rand() versions. I have multiplied the loops by 10 to prevent unrealistically short timings. So the real timings are around 12...20 cycles:
Intel(R) Celeron(R) M CPU 420 @ 1.60GHz (SSE3)
145 cycles for 10*Axrand
121 cycles for 10*Axrand3
201 cycles for 10*LimitRand
115 cycles for 10*Rand()
142 cycles for 10*Axrand
121 cycles for 10*Axrand3
200 cycles for 10*LimitRand
115 cycles for 10*Rand()
142 cycles for 10*Axrand
121 cycles for 10*Axrand3
201 cycles for 10*LimitRand
115 cycles for 10*Rand()
37 bytes for Axrand
27 bytes for Axrand3
25 bytes for Rand()
Add 7-10 bytes per call
New version of timings:
LOOP_COUNT = 1000000 ; One Mio would be a typical value
RepCt = 10 ; multiple timings prevent negative cycles ;-)
invoke Sleep, 50
counter_begin LOOP_COUNT, HIGH_PRIORITY_CLASS
REPEAT RepCt
push 1000
call Axrand
mov ecx, eax
ENDM
counter_end
print str$(eax), 9, "cycles for "
print str$(RepCt), "*Axrand", 13, 10
And here my latest version:
Axrand3 proc STDCALL dwRange:DWORD ; 27 bytes short
mov eax, Initial ; get the seed
imul eax, -127 ; randomise
add eax, -124 ; -127/-124 is the perfect randomness pair
bswap eax ; we need to shuffle a bit to get the high order bits
mov Initial, eax ; save for next round
mul dword ptr [esp+4] ; multiply Initial with range
mov eax, edx ; return random number (could also be returned in edx)
ret 4
Axrand3 endp
That is time for Hutch to make great real-time test-bed. :bg
Timings:
Intel(R) Celeron(R) CPU 2.13GHz (SSE3)
209 cycles for 10*Axrand
203 cycles for 10*Axrand3
253 cycles for 10*LimitRand
176 cycles for 10*Rand()
209 cycles for 10*Axrand
198 cycles for 10*Axrand3
253 cycles for 10*LimitRand
174 cycles for 10*Rand()
208 cycles for 10*Axrand
204 cycles for 10*Axrand3
251 cycles for 10*LimitRand
172 cycles for 10*Rand()
37 bytes for Axrand
27 bytes for Axrand3
25 bytes for Rand()
Add 7-10 bytes per call
Alex
you didn't put mine in there ? :(
AMD Sempron(tm) Processor 3100+ (SSE3)
144 cycles for 10*Axrand
99 cycles for 10*Axrand3
212 cycles for 10*LimitRand
102 cycles for 10*Rand()
97 cycles for 10*Axrand
91 cycles for 10*Axrand3
215 cycles for 10*LimitRand
99 cycles for 10*Rand()
96 cycles for 10*Axrand
95 cycles for 10*Axrand3
210 cycles for 10*LimitRand
102 cycles for 10*Rand()
37 bytes for Axrand
27 bytes for Axrand3
25 bytes for Rand()
Add 7-10 bytes per call
--- ok ---
Quote from: dedndave on November 07, 2010, 12:11:39 AM
you didn't put mine in there ? :(
Sorry, Dave, please!
Here it is:
Intel(R) Celeron(R) CPU 2.13GHz (SSE3)
208 cycles for 10*Axrand
204 cycles for 10*Axrand3
252 cycles for 10*LimitRand
293 cycles for 10*ASeed
176 cycles for 10*Rand()
210 cycles for 10*Axrand
203 cycles for 10*Axrand3
252 cycles for 10*LimitRand
291 cycles for 10*ASeed
170 cycles for 10*Rand()
211 cycles for 10*Axrand
205 cycles for 10*Axrand3
251 cycles for 10*LimitRand
292 cycles for 10*ASeed
179 cycles for 10*Rand()
37 bytes for Axrand
27 bytes for Axrand3
25 bytes for Rand()
Add 7-10 bytes per call
Alex
thanks Alex
problem is....
you need to run it ~192 times to get an average time
that is because it reseeds once every 192 pulls (on average)
otherwise, it's going to look slower than it actually is
edit - and, i haven't tried to optimize it, at all - lol
i can extrapolate that it should be about 20 to 21 cycles
With Dave's code and code sizes:
Intel(R) Celeron(R) M CPU 420 @ 1.60GHz (SSE3)
143 cycles for 10*Axrand
121 cycles for 10*Axrand3
202 cycles for 10*LimitRand
270 cycles for 10*ASeed
115 cycles for 10*Rand()
142 cycles for 10*Axrand
121 cycles for 10*Axrand3
200 cycles for 10*LimitRand
270 cycles for 10*ASeed
115 cycles for 10*Rand()
143 cycles for 10*Axrand
121 cycles for 10*Axrand3
200 cycles for 10*LimitRand
268 cycles for 10*ASeed
115 cycles for 10*Rand()
37 bytes for Axrand
27 bytes for Axrand3
65 bytes for ASeed
47 bytes for LimitRand
25 bytes for Rand()
Quote from: dedndave on November 07, 2010, 12:21:49 AM
thanks Alex
problem is....
you need to run it ~192 times to get an average time
that is because it reseeds once every 192 pulls (on average)
Dave, it was re-runned:
........
LOOP_COUNT = 1000000 ; One Mio would be a typical value
I.e. - 1 million times. :bg
And reseeding is the part of the algo - so its timing should be included too. That is price of auto-reseeding - other algos does not have that feature.
Alex
oh - ok - gotcha :U
it should compare favorably in a randomness test
i can get it down to 27 cycles easily, i think :bg
Quote from: dedndave on November 07, 2010, 12:21:49 AM
edit - and, i haven't tried to optimize it, at all - lol
"http://www.masm32.com/board/index.php?topic=11679.msg123943#msg123943" :P
lol
well - that was a simplification
what i mean is, i haven't timed it to select faster instructions
Quote from: dedndave on November 07, 2010, 12:21:49 AM
i can extrapolate that it should be about 20 to 21 cycles
Method of placing of many calls subsequently have many disadvantages. First - it is eat many resource for itself only.
Quote from: dedndave on November 07, 2010, 12:30:12 AM
lol
well - that was a simplification
what i mean is, i haven't timed it to select faster instructions
I'm joking, Dave :bg
Your code is good - it cannot be faster very much. QPC very heavy API.
Alex
AMD Sempron(tm) Processor 3100+ (SSE3)
97 cycles for 10*Axrand
96 cycles for 10*Axrand3
229 cycles for 10*LimitRand
230 cycles for 10*ASeed
100 cycles for 10*Rand()
97 cycles for 10*Axrand
92 cycles for 10*Axrand3
246 cycles for 10*LimitRand
229 cycles for 10*ASeed
100 cycles for 10*Rand()
97 cycles for 10*Axrand
92 cycles for 10*Axrand3
248 cycles for 10*LimitRand
234 cycles for 10*ASeed
103 cycles for 10*Rand()
37 bytes for Axrand
27 bytes for Axrand3
65 bytes for ASeed
47 bytes for LimitRand
25 bytes for Rand()
Add 7-10 bytes per call
--- ok ---
Quote from: oex on November 07, 2010, 01:00:59 AM
AMD Sempron(tm) Processor 3100+ (SSE3)
Thanks :U
Alex
results for a Prescott...
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
204 cycles for 10*Axrand
199 cycles for 10*Axrand3
245 cycles for 10*LimitRand
311 cycles for 10*ASeed
170 cycles for 10*Rand()
204 cycles for 10*Axrand
198 cycles for 10*Axrand3
246 cycles for 10*LimitRand
316 cycles for 10*ASeed
170 cycles for 10*Rand()
203 cycles for 10*Axrand
198 cycles for 10*Axrand3
274 cycles for 10*LimitRand
312 cycles for 10*ASeed
174 cycles for 10*Rand()
Hi,
Well, here are my timings. One from Alex and one from
jj2007. I must say that I am a bit surprised by the slow
MCW timings. I tried a variant that commented out the
ADC to see if that was the culprit.
Regards
Steve N.
PIII Win 2000.
Reply #129 Test.zip
pre-P4 (SSE1)
9 cycles for Axrand
11 cycles for Axrand2
9 cycles for Axrand3
28 cycles for LimitRand
9 cycles for Rand()
9 cycles for Axrand
11 cycles for Axrand2
9 cycles for Axrand3
28 cycles for LimitRand
8 cycles for Rand()
9 cycles for Axrand
11 cycles for Axrand2
9 cycles for Axrand3
28 cycles for LimitRand
8 cycles for Rand()
9 cycles for Axrand
11 cycles for Axrand2
9 cycles for Axrand3
28 cycles for LimitRand
8 cycles for Rand()
9 cycles for Axrand
11 cycles for Axrand2
9 cycles for Axrand3
29 cycles for LimitRand
8 cycles for Rand()
37 bytes for Axrand
30 bytes for Axrand2
31 bytes for Axrand3
29 bytes for Rand()
Add 7-10 bytes per call
--- ok ---
Above edited to comment ADC.
pre-P4 (SSE1)
9 cycles for Axrand
11 cycles for Axrand2
9 cycles for Axrand3
27 cycles for LimitRand
8 cycles for Rand()
9 cycles for Axrand
11 cycles for Axrand2
9 cycles for Axrand3
27 cycles for LimitRand
9 cycles for Rand()
9 cycles for Axrand
11 cycles for Axrand2
9 cycles for Axrand3
27 cycles for LimitRand
8 cycles for Rand()
9 cycles for Axrand
11 cycles for Axrand2
9 cycles for Axrand3
27 cycles for LimitRand
8 cycles for Rand()
9 cycles for Axrand
11 cycles for Axrand2
9 cycles for Axrand3
27 cycles for LimitRand
8 cycles for Rand()
37 bytes for Axrand
30 bytes for Axrand2
31 bytes for Axrand3
29 bytes for Rand()
Add 7-10 bytes per call
--- ok ---
Reply #138 axRandTimings.zip
pre-P4 (SSE1)
133 cycles for 10*Axrand
126 cycles for 10*Axrand3
194 cycles for 10*LimitRand
205 cycles for 10*ASeed
119 cycles for 10*Rand()
133 cycles for 10*Axrand
126 cycles for 10*Axrand3
195 cycles for 10*LimitRand
205 cycles for 10*ASeed
120 cycles for 10*Rand()
133 cycles for 10*Axrand
125 cycles for 10*Axrand3
194 cycles for 10*LimitRand
206 cycles for 10*ASeed
120 cycles for 10*Rand()
37 bytes for Axrand
27 bytes for Axrand3
65 bytes for ASeed
47 bytes for LimitRand
25 bytes for Rand()
Add 7-10 bytes per call
--- ok ---
Hi,
Tried an excursion. Compare to above results. Some
improvement.
Regards,
Steve
.data
ALIGN 4
Rand32 DD 31415926 ; a la K
DD 1013904223 ; As per NR RandC
RandA DD 1517746329 ; See RandMWC
pre-P4 (SSE1)
133 cycles for 10*Axrand
125 cycles for 10*Axrand3
163 cycles for 10*LimitRand
200 cycles for 10*ASeed
120 cycles for 10*Rand()
133 cycles for 10*Axrand
126 cycles for 10*Axrand3
162 cycles for 10*LimitRand
200 cycles for 10*ASeed
120 cycles for 10*Rand()
134 cycles for 10*Axrand
126 cycles for 10*Axrand3
162 cycles for 10*LimitRand
200 cycles for 10*ASeed
120 cycles for 10*Rand()
37 bytes for Axrand
27 bytes for Axrand3
65 bytes for ASeed
47 bytes for LimitRand
25 bytes for Rand()
Add 7-10 bytes per call
--- ok ---
the MUL instruction is a bit finicky
MUL reg32 seems to be prefered over MUL [Val32], even if you have to load it
but, it is nice if the instruction just before MUL is not related to EAX, EDX, or the multiplicand
and it is nice if the instruction that follows MUL is not related EAX or EDX
well - none of those guidelines can really be followed with MWC - lol
Hi Dave,
Quote from: dedndave on November 07, 2010, 02:40:54 PM
the MUL instruction is a bit finicky
MUL reg32 seems to be prefered over MUL [Val32],
Right.
Quoteeven if you have to load it
Hmm. By how much? Guess it's processor specific.
Oh well. Put it on the ToDo list...
Quote
but, it is nice if the instruction just before MUL is not related to EAX, EDX, or the multiplicand
and it is nice if the instruction that follows MUL is not related EAX or EDX
well - none of those guidelines can really be followed with MWC - lol
Except loading registers? But MWC is 64-bit, sort of, as
one needs to save both EAX and EDX contents. Whereas
the others need to save 32-bit EAX.
Cheers,
Steve N.