Random Generator

Started by Neil, June 17, 2009, 12:44:13 PM


oex

Quote from: FORTRANS on October 29, 2010, 02:05:38 PM
As I am not making sense to you (sorry), maybe "The
Art of Computer Programming", Volume 2 / Seminumerical
Algorithms, Chapter 3, Random Numbers, by Donald E. Knuth
will help.  I have the second edition of 1981 and rather liked it.

:lol cheers Steve, the logic of randomness isn't the easiest thing to grasp.... Thank you for your attempts :bg
We are all of us insane, just to varying degrees and intelligently balanced through networking

http://www.hereford.tv

Antariy

#46
Hi to all!

I have made a test for fully testing the Axrand proc.

The generation program "Axrand.exe" takes one command-line parameter - the number of test values to generate. If the parameter is missing, or the parameter is 0, a default of 200,000 generated DWORDs is used. For each DWORD, all 4 byte lanes are written out - the #0, #1, #2 and #3 bytes of EAX (AL, AH, and the two high bytes). The number of test values specified in the batch file is 250,000. At least 2 MB of drive space is required to run the test with the current settings.

Source code for the test-file generation program is included. The writing of the selected bytes is unified. After each test, the seed is reset to its initial value.

To run the test, execute the batch file ("RunStrangeTestForBytes.bat").
The test results are written to a file named "Strange_Bytes_Test_Instead_Dwords_Test_results.txt".
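
For illustration, here is a minimal C sketch of this splitting scheme. It is not the attached source - rand32() is a hypothetical stand-in generator (a simple LCG here), not the real Axrand:

    #include <stdio.h>
    #include <stdlib.h>

    static unsigned int seed = 1;              /* reset before each test     */

    static unsigned int rand32(void)           /* stand-in generator, NOT    */
    {                                          /* the real Axrand            */
        seed = seed * 1664525u + 1013904223u;
        return seed;
    }

    int main(int argc, char *argv[])
    {
        unsigned long n = (argc > 1) ? strtoul(argv[1], NULL, 10) : 0;
        FILE *f[4], *d = fopen("dword.bin", "wb");
        char name[] = "byteX.bin";

        if (n == 0) n = 200000;                /* default count              */
        for (int b = 0; b < 4; b++) {          /* one file per byte lane     */
            name[4] = (char)('0' + b);
            f[b] = fopen(name, "wb");
        }
        for (unsigned long i = 0; i < n; i++) {
            unsigned int r = rand32();
            for (int b = 0; b < 4; b++)        /* #0..#3 bytes of EAX        */
                fputc((r >> (8 * b)) & 0xFF, f[b]);
            fwrite(&r, 4, 1, d);               /* full DWORD stream          */
        }
        return 0;
    }

Each of the five output files can then be fed to ENT separately.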

Well, here are the results I get on my machine. I just don't see any "bad bytes" in these tests, sorry.


Running a strange bytes test for an algo which intended to DWORDs
Generation of the files was done


#############################################################
Test for #0 byte
Entropy = 7.999184 bits per byte.

Optimum compression would reduce the size
of this 250000 byte file by 0 percent.

Chi square distribution for 250000 samples is 282.80, and randomly
would exceed this value 25.00 percent of the times.

Arithmetic mean value of data bytes is 127.4872 (127.5 = random).
Monte Carlo value for Pi is 3.147410359 (error 0.19 percent).
Serial correlation coefficient is -0.000940 (totally uncorrelated = 0.0).



#############################################################
Test for #1 byte
Entropy = 7.999350 bits per byte.

Optimum compression would reduce the size
of this 250000 byte file by 0 percent.

Chi square distribution for 250000 samples is 224.62, and randomly
would exceed this value 90.00 percent of the times.

Arithmetic mean value of data bytes is 127.5163 (127.5 = random).
Monte Carlo value for Pi is 3.133394134 (error 0.26 percent).
Serial correlation coefficient is -0.000363 (totally uncorrelated = 0.0).



#############################################################
Test for #2 byte
Entropy = 7.999306 bits per byte.

Optimum compression would reduce the size
of this 250000 byte file by 0 percent.

Chi square distribution for 250000 samples is 240.66, and randomly
would exceed this value 50.00 percent of the times.

Arithmetic mean value of data bytes is 127.5753 (127.5 = random).
Monte Carlo value for Pi is 3.144050305 (error 0.08 percent).
Serial correlation coefficient is 0.001299 (totally uncorrelated = 0.0).



#############################################################
Test for #3 byte
Entropy = 7.999183 bits per byte.

Optimum compression would reduce the size
of this 250000 byte file by 0 percent.

Chi square distribution for 250000 samples is 282.86, and randomly
would exceed this value 25.00 percent of the times.

Arithmetic mean value of data bytes is 127.3206 (127.5 = random).
Monte Carlo value for Pi is 3.147506360 (error 0.19 percent).
Serial correlation coefficient is -0.001592 (totally uncorrelated = 0.0).



#############################################################
Test for full DWORD
Entropy = 7.999814 bits per byte.

Optimum compression would reduce the size
of this 1000000 byte file by 0 percent.

Chi square distribution for 1000000 samples is 257.41, and randomly
would exceed this value 50.00 percent of the times.

Arithmetic mean value of data bytes is 127.4748 (127.5 = random).
Monte Carlo value for Pi is 3.143412574 (error 0.06 percent).
Serial correlation coefficient is 0.000313 (totally uncorrelated = 0.0).



All the BYTE results are in a good, permissible range, as far as I understand them. The DWORD results (the main purpose of this algo) are in the best range - again, as far as I understand it.
So, I just *do not understand* what is bad about this algo - maybe its very small size?

If the DWORD has very good randomness results, how can it be "bad in bytes", when it contains these bytes - it is constructed from them? If these bytes are repeatedly bad, the DWORDs constructed from them must be repeatedly bad, too. And this should be very clear in the tests, since ENT.EXE tests *bytes* anyway.
In my test I see that each byte has a very good value for a software RNG. Moreover, as the size of the test increases, the randomness increases - this points to a good side of the algo. So this algo can probably be used to generate entire pads, not only a DWORD.

Of course, my point of view may differ from others'; anyway, thanks for your attention! Sorry if I didn't understand the point of the discussion initially  :bg

The test is attached in the archive. To automate things, run the "RunStrangeTestForBytes.bat" file. The output file with the test results will be opened automatically after the test.




Alex

brethren

it looks like steve is right about the lsb being a poor choice for randomness. not with all prngs, mind, but with lcgs and others

Quote...is poor, because the low-order bits of many random number generators are distressingly non-random

http://c-faq.com/lib/randrange.html
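
The fix that FAQ entry suggests is to derive ranged values from the high-order bits instead of the low ones - for example, with the standard C rand():

    #include <stdlib.h>

    /* Leans on the weak low-order bits of a poor LCG: */
    int roll_bad(int n)  { return rand() % n; }

    /* The c-faq suggestion: use the high-order bits,
       which are far better behaved in an LCG.        */
    int roll_good(int n) { return rand() / (RAND_MAX / n + 1); }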


Antariy

Quote from: brethren on October 29, 2010, 10:17:57 PM
it looks like steve is right about the lsb being a poor choice for randomness. not with all prng's, mind, but with lcg and others

Well, have a look at Axrand - I *do not see* that its low-order byte is bad.  :bg
Have you run the test?



Alex

brethren

sorry, we're getting confused here:) your algo looks fine, it's the lcg algo that has the problem

btw i have run your test, yes. i'm seeing good results:)

what steve did was run the lcg algo generating dwords and from there extracted only the least significant bytes, which he then ran through ent.

btw results for axrand:)
Running a strange bytes test for an algo which intended to DWORDs
Generation of the files was done


#############################################################
Test for #0 byte
Entropy = 7.999184 bits per byte.

Optimum compression would reduce the size
of this 250000 byte file by 0 percent.

Chi square distribution for 250000 samples is 282.80, and randomly
would exceed this value 25.00 percent of the times.

Arithmetic mean value of data bytes is 127.4872 (127.5 = random).
Monte Carlo value for Pi is 3.147410359 (error 0.19 percent).
Serial correlation coefficient is -0.000940 (totally uncorrelated = 0.0).



#############################################################
Test for #1 byte
Entropy = 7.999350 bits per byte.

Optimum compression would reduce the size
of this 250000 byte file by 0 percent.

Chi square distribution for 250000 samples is 224.62, and randomly
would exceed this value 90.00 percent of the times.

Arithmetic mean value of data bytes is 127.5163 (127.5 = random).
Monte Carlo value for Pi is 3.133394134 (error 0.26 percent).
Serial correlation coefficient is -0.000363 (totally uncorrelated = 0.0).



#############################################################
Test for #2 byte
Entropy = 7.999306 bits per byte.

Optimum compression would reduce the size
of this 250000 byte file by 0 percent.

Chi square distribution for 250000 samples is 240.66, and randomly
would exceed this value 50.00 percent of the times.

Arithmetic mean value of data bytes is 127.5753 (127.5 = random).
Monte Carlo value for Pi is 3.144050305 (error 0.08 percent).
Serial correlation coefficient is 0.001299 (totally uncorrelated = 0.0).



#############################################################
Test for #3 byte
Entropy = 7.999183 bits per byte.

Optimum compression would reduce the size
of this 250000 byte file by 0 percent.

Chi square distribution for 250000 samples is 282.86, and randomly
would exceed this value 25.00 percent of the times.

Arithmetic mean value of data bytes is 127.3206 (127.5 = random).
Monte Carlo value for Pi is 3.147506360 (error 0.19 percent).
Serial correlation coefficient is -0.001592 (totally uncorrelated = 0.0).



#############################################################
Test for full DWORD
Entropy = 7.999814 bits per byte.

Optimum compression would reduce the size
of this 1000000 byte file by 0 percent.

Chi square distribution for 1000000 samples is 257.41, and randomly
would exceed this value 50.00 percent of the times.

Arithmetic mean value of data bytes is 127.4748 (127.5 = random).
Monte Carlo value for Pi is 3.143412574 (error 0.06 percent).
Serial correlation coefficient is 0.000313 (totally uncorrelated = 0.0).

Antariy

Quote from: brethren on October 29, 2010, 10:27:34 PM
sorry, we're getting confused here:) your algo looks fine, its the lcg algo that has the problem

:eek

If that's so - sorry, probably I didn't understand which algo the discussion was about :)



Alex

Antariy

Quote from: brethren on October 29, 2010, 10:27:34 PM
what steve did was run the lcg algo generating dwords and from there extracted only the least significant bytes, which he then ran through ent.

I understand now.
I have checked the entire DWORD and all the bytes of the DWORD.
Well, my test is not needed, it seems.



Alex

brethren

QuoteWell, my test is not needed, it seems.

i'm glad you posted the test:) axrand produces great results with ent

FORTRANS

#53
QuoteIt seems to me that if the bytes don't form a random sequence then the dwords won't
either. Both will be biased in some way.

QuoteIf there are patterns in a DWORD stream (ie it is not random) I do not see how the
same stream in BYTE form cannot still have the same patterns (Except strung out every
4 bytes).... Unless the full DWORD range is not used in which case there is no point in
analysing a DWORD of data at a time in the first place....

QuoteIf the DWORD has very good randomness results, how can it be "bad in bytes", when it contains
these bytes - it is constructed from them?


Hi,

   Given the above quotes, I will walk through the workings of the Linear
Congruential "Random" number Generator (LCG) that I am currently using.
It derives from the Knuth book I referenced previously, with some
modifications to match the version from "Numerical Recipes".  It is the
following algorithm.

   RandN+1 = RandN * 1664525 + 1013904223 (Ordered to make examples easier.)

      1664525 =   19660DH =          110010110011000001101B
   1013904223 = 3C6EF35FH = 111100011011101111001101011111B

One multiply and one addition.
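
For illustration, the same step in C - a minimal sketch, assuming a 32-bit unsigned int so the wraparound gives the mod 2^32 for free:

    #include <stdio.h>

    static unsigned int state = 1;          /* RandN; seed is arbitrary   */

    static unsigned int lcg(void)           /* returns RandN+1            */
    {
        state = state * 1664525u + 1013904223u;  /* wraps mod 2^32        */
        return state;
    }

    int main(void)                          /* print a few values         */
    {
        for (int i = 0; i < 5; i++)
            printf("%08X\n", lcg());
        return 0;
    }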

   Okay, let us look at the least significant bit (Bit0) only.  From
RandN, it can be either a zero or a one.  First case, 1 * 0 + 1 = 1.
Second case, 1 * 1 + 1 = 10B, so Bit0 becomes 0 (with a carry out).
So, if you give it a zero, you get a one; and if you give it a one,
you get a zero.  In actual use you get an alternating pattern with a
period or run length of two.  No higher bit can affect the results.

   For the second bit (Bit1), because of the actions of multiplication
and addition, action on the lower bit can affect the outcome.  On Bit1,
only a carry from the addition will affect the outcome.  (Multiplying
by one does not change things.)  With higher bits, the multiply can
propagate bits as well.  The four possible cases are as follows.

   01 * 00 + 11 = 11
   01 * 01 + 11 = 00
   01 * 10 + 11 = 01
   01 * 11 + 11 = 10

A fixed pattern with a period of four.  Hm, still does not look very random.
But better than the pattern using only one bit.

   I will show you the results with three bits.  Here both the multiply
and the addition of Bit0 and Bit1 can have an effect on Bit2.

   101 * 000 + 111 = 111
   101 * 001 + 111 = 100
   101 * 010 + 111 = 001
   101 * 011 + 111 = 110
   101 * 100 + 111 = 011
   101 * 101 + 111 = 000
   101 * 110 + 111 = 101
   101 * 111 + 111 = 010

A fixed pattern with a period of eight.  If you only need eight values,
it's not too bad.  If you want more, then it is a bit shabby as a random
number generator.
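
A quick C sketch can reproduce these tables - walk the low k bits until the cycle closes; for k = 1, 2, 3 it prints periods of two, four and eight:

    #include <stdio.h>

    /* Walk the low k bits of the LCG and print each cycle.  Starting
       from 0, the sequence returns to 0 after exactly 2^k steps for
       these constants, reproducing the tables above.                */
    int main(void)
    {
        for (int k = 1; k <= 3; k++) {
            unsigned int mask = (1u << k) - 1, x = 0;
            printf("low %d bit(s):", k);
            do {
                x = (x * 1664525u + 1013904223u) & mask;
                printf(" %u", x);
            } while (x != 0);
            printf("\n");
        }
        return 0;
    }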

   If you work out the next few cases you will find the run length of a
pattern is determined by the number of bits considered.  Eight bits will
generate a pattern with a length of 256.  And so the low byte will give
rather bad statistics when you create a byte stream longer than 256 from
it.  See the ENT results in my earlier reply.  It does look a bit more
random at first glance, though.  The higher the bit considered, the more
lower bits there are that can affect the outcome.
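
Counting instead of printing shows the doubling directly - a small sketch, assuming the same constants:

    #include <stdio.h>

    /* Measure the period of the low k bits.  For these constants the
       Hull-Dobell conditions hold for m = 2^k (odd increment,
       multiplier = 1 mod 4), so the period comes out as exactly 2^k. */
    int main(void)
    {
        for (int k = 1; k <= 16; k++) {
            unsigned int mask = (1u << k) - 1, x = 0;
            unsigned long period = 0;
            do {
                x = (x * 1664525u + 1013904223u) & mask;
                period++;
            } while (x != 0);
            printf("low %2d bits: period %lu\n", k, period);
        }
        return 0;
    }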

   Now work out the last case.  If you do it by hand it will take you a
fair amount of time.  There are 4,294,967,296 cases to consider and it
will create a pattern with a length of (left as an exercise for the reader).
And by gosh, if you look at the high byte only (as in Reply #28 using ENT)
the statistics look rather good.  And you can effectively ignore the lower
bits for the first hundred million calls, or some other number depending
on what your application requires.  You have the multiply propagation of 30
bits and the carry out of Bit30 mucking about with Bit31.  That causes
Bit31 to actually look very random.

   I fired up one of my seriously ill computers and got one of my programs
to show the above process.  (Give or take, I was experimenting on some
"bright" ideas.  Ignore the second part for now.  If you have questions
I might have an answer...)  Basically you seed the generator, generate
a whole bunch of numbers, mask off just one fixed bit in each, and plot
them on a graphic display to see what it looks like.  I forgot to add a
pause before it repeats the process for the next bit down the line.
Standard DOS graphics.
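
Not that program, but a rough portable sketch of the same idea in C - mask one bit plane and "plot" it as text; bit 0 comes out as rigid stripes, bit 31 as noise:

    #include <stdio.h>

    /* Dump one bit plane of the LCG output as ASCII "pixels".
       Bit 0 alternates on every call, so it prints as rigid
       vertical stripes; bit 31 looks like noise.             */
    int main(void)
    {
        static const int planes[] = { 0, 31 };
        unsigned int x = 1;                 /* arbitrary seed */

        for (int p = 0; p < 2; p++) {
            printf("bit %d:\n", planes[p]);
            for (int row = 0; row < 8; row++) {
                for (int col = 0; col < 64; col++) {
                    x = x * 1664525u + 1013904223u;
                    putchar((x >> planes[p]) & 1 ? '#' : '.');
                }
                putchar('\n');
            }
        }
        return 0;
    }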

   I do hope this discussion and the program have cleared up at least some
of the points that were in question.  Feel free to poke holes in it where
you can.  Or comment as needed.

   "Things should be as simple as possible.  But not simpler"

Regards,

Steve

Antariy

Hi, Steve!

What I meant was: if a DWORD is constructed from "bad bytes", it must be much worse from the randomness point of view, too. My point is: if you get good results when testing the DWORDs from some algo with ENT, then the bytes must be good, too, because ENT tests only bytes - the fact that we write DWORDs has no meaning for ENT. Hence, if the results for DWORDs or BYTEs (no difference) are bad, this algo produces bad random numbers in any usage - byte/word/dword/...
Isn't that so?
I.e., when you check the low byte of the LCG it shows bad results, and when you check the DWORD results they show incredible Chi-square results, too.

Quote from: FORTRANS on October 30, 2010, 03:00:11 PM
 "Things should be as simple as possible.  But not simpler"

So, my algo is bad - it is too small and simple  :bg



Alex

Antariy

Steve, does the program have an exit key? I could terminate it only after minimizing the window.



Alex

FORTRANS

Quote from: Antariy on October 30, 2010, 09:33:26 PM
Hi, Steve!

What I meant was: if a DWORD is constructed from "bad bytes", it must be much worse from the randomness point of view, too. My point is: if you get good results when testing the DWORDs from some algo with ENT, then the bytes must be good, too, because ENT tests only bytes - the fact that we write DWORDs has no meaning for ENT. Hence, if the results for DWORDs or BYTEs (no difference) are bad, this algo produces bad random numbers in any usage - byte/word/dword/...
Isn't that so?
I.e., when you check the low byte of the LCG it shows bad results, and when you check the DWORD results they show incredible Chi-square results, too.

Hi Alex.

   If it is checked using DWORDs the LCG would show good results.
ENT tests bytes, and that means it sees the number broken up.  The
LCG numbers are tested in Knuth's book, and show as okay.  It
passes the Chi-square test.

Quote
Quote from: FORTRANS on October 30, 2010, 03:00:11 PM
  "Things should be as simple as possible.  But not simpler"

So, my algo is bad - it is too small and simple  :bg

   Simple is good.  Too simple is bad.  An LCG is about the simplest
random number generator you can have that is reasonably
useful.  And an LCG with a bad multiplier can be bad as well.

Regards,

Steve N.

FORTRANS

Quote from: Antariy on October 30, 2010, 09:44:31 PM
Steve, does the program have an exit key? I could terminate it only after minimizing the window.

   The program should change video mode, post a message,
and wait for a key press.  Then it cycles through the 32 bit
planes as screens full of pixels, posts a message, and then
waits for a key press so you can read the somewhat bad
message.  It then cycles through the 16 bit planes of a 16-bit
LCG, and waits for a key press to exit.  Tested with DOS,
OS/2 VDM, Windows 2000 and Windows XP command
prompts.  All tests were done in full screen mode, not a
window.

Regards,

Steve N.


Antariy

Quote from: FORTRANS on October 30, 2010, 10:20:37 PM
   Simple is good.  Too simple is bad.  An LCG is about the simplest
random number generator you can have that is reasonably
useful.  And an LCG with a bad multiplier can be bad as well.

So, in short: is Axrand bad, or is it good? If the bytes generated by it are good, it should be good in any usage, probably?
I await the answer!  :bg



Alex

Antariy

Quote from: FORTRANS on October 30, 2010, 10:20:37 PM
   If it is checked using DWORDs the LCG would show good results.
ENT tests bytes, and that means it sees the number broken up.  The
LGC numbers are tested in Knuth's book, and show as okay.  It
passes the Chi-square test.

Yes, this is what I meant as well - if you check the output file of the LCG with the current ENT, even if you write the full output DWORDs, the results would look incredible anyway, because ENT tests bytes.
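
To make that concrete, here is a toy stand-in for ENT's byte counting (a sketch only, not ENT itself): the pure low-byte stream of the LCG is perfectly uniform - each value appears once per 256 outputs - so its chi-square is suspiciously near zero, while the interleaved full-DWORD stream hides that.

    #include <stdio.h>

    /* Compare byte-frequency chi-square for (a) the low-byte stream
       and (b) all four bytes of each DWORD, from the same LCG.  The
       low-byte stream gives an impossibly perfect value near 0.     */
    int main(void)
    {
        unsigned long lo[256] = {0}, all[256] = {0};
        unsigned int x = 1;
        const unsigned long n = 250000;

        for (unsigned long i = 0; i < n; i++) {
            x = x * 1664525u + 1013904223u;
            lo[x & 0xFF]++;                      /* low byte only      */
            for (int b = 0; b < 4; b++)          /* all four bytes     */
                all[(x >> (8 * b)) & 0xFF]++;
        }

        double chi_lo = 0, chi_all = 0;
        double e_lo = n / 256.0, e_all = 4.0 * n / 256.0;
        for (int v = 0; v < 256; v++) {
            chi_lo  += (lo[v]  - e_lo)  * (lo[v]  - e_lo)  / e_lo;
            chi_all += (all[v] - e_all) * (all[v] - e_all) / e_all;
        }
        printf("chi-square, low bytes : %.2f\n", chi_lo);
        printf("chi-square, all bytes : %.2f\n", chi_all);
        return 0;
    }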

Probably I should use translate.google.com when I want to say something :bg



Alex