News:

MASM32 SDK Description, downloads and other helpful links
MASM32.com New Forum Link
masmforum WebSite

Random Generator

Started by Neil, June 17, 2009, 12:44:13 PM

Previous topic - Next topic

FORTRANS

Hi Dave,

   Yes that "feature" can be changed.  In the general case a
linear congruential generator is of the form;


RandN+1 = MOD( (RandN * MulConst + AddConst), Modulus )


   In this case I used a modulo of 2**32 as it comes for free
with the X86 multiply instruction.  And that is were the short run
repeating comes from.  I have a program that shows the
kind of repetition you can expect.  First bit alternates.  Second
bit, cycle of four.  And so on.

   Knuth showed some code that uses 2**N + 1 and 2**N - 1
as a modulus that avoids that kind of obvious repetitive behavior.
The code is somewhat more complex though.  Using a prime
modulus helps as well.  But that requires an explicit modulo
operation in most cases, which messes with the quick and easy
(dirty?) philosophy.  And they only make the repeat cycle to
not be as obvious to humans.  See Knuth's AoCP for the real
nitty gritty.

   The multiplier is 1664525 and the additive constant is
1013904223.  These were used in "Numerical Recipes the Art
of Scientific Computing".  Knuth analyzed a number of
multipliers for common word sizes, and that was one of the
better 32-bit ones.  He says the additive part just has to
be nonzero.

   Does that help, or just muddy things?

Regards,

Steve N.

MichaelW

This is an attempt to determine if each of the byte values generated by Alex's generator is equally probable.

;=========================================================================
    include \masm32\include\masm32rt.inc
;=========================================================================
    .data
        mean    REAL8 ?
        total   dq    0
        min     dd    -1
        max     dd    0
        range   dd    0
        counts  dd    256 dup(0)
    .code
;=========================================================================

.data
align 4
Initial dd "RAND"
.code

option prologue:none
option epilogue:none

align 4
Axrand proc STDCALL dwRange:DWORD
mov eax,Initial
mul dword ptr [esp+4]
mov eax,edx
mov edx,Initial
xor edx,dword ptr [esp]
imul edx,12347
add edx,17
bswap edx
mov Initial,edx
ret 4
Axrand endp

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef

;=========================================================================
start:
;=========================================================================

    N = 1000000000

    ;------------------------------------------------
    ; Allocate a buffer and generate N random bytes.
    ;------------------------------------------------

    mov esi, alloc(N)
    xor ebx, ebx
    .WHILE ebx < N/4
        invoke Axrand, 0ffffffffh
        mov [esi+ebx*4], eax
        inc ebx
    .ENDW

    ;--------------------------------------
    ; Count the occurrences of the values.
    ;--------------------------------------

    xor ebx, ebx
    .WHILE ebx < N
        movzx eax, BYTE PTR [esi+ebx]
        inc DWORD PTR [counts+eax*4]
        inc ebx
    .ENDW

    ;-----------------------------------------------------------
    ; Get and display the minimum and maximum counts, the range
    ; between the minimum and maximum, and the mean count.
    ;-----------------------------------------------------------

    xor ebx, ebx
    .WHILE ebx < 256
        mov eax, [counts+ebx*4]
        .IF eax < min
            mov min, eax
        .ENDIF
        .IF eax > max
            mov max, eax
        .ENDIF
        add DWORD PTR total, eax
        adc DWORD PTR total+4, 0
        inc ebx
    .ENDW
    mov eax, max
    sub eax, min
    mov range, eax
    fild total
    push 256
    fild DWORD PTR [esp]
    add esp, 4
    fdiv
    fstp mean
    invoke crt_printf, cfm$("%d\t%d\t%d\t%.3f\n"), min, max, range, mean

    free esi

    inkey "Press any key to exit..."
    exit
;=========================================================================
end start


As coded, I get these results:

3901388 3912076 10688   3906250.000

And with a 50% increase in N:

5852804 5867231 14427   5859375.000

The mean in both cases is where it should be, and the range shrunk in proportion to N, so the minimum and maximum do appear to be converging on the mean. I would be interested to see the results for a much larger N (that my wimpy system cannot reasonably handle).
eschew obfuscation

Antariy

Quote from: FORTRANS on October 27, 2010, 01:29:23 PM
Hi,

   Just as a comparison, here are the ENT results from a
simple 32-bit Linear Congruential "Random" number Generator.
Partly initialized using the current time.  First one million bytes
using the high byte from the DWORD random number.

Hi, Steve!

Just note - when I make test for Axrand - I'm using all bytes from resulting DWORD - not any selective bytes.
I have just:

invoke Axrand,-1


and write *full* EAX to the output file.
After that I test resulting file with ENT.EXE



Alex

FORTRANS

Hi Alex,

   Yes, testing a DWORD would be the best plan.  But ENT
does bytes.  And the results I posted shows that the low byte
is bad, while the high byte is rather good.  For most uses of
random numbers the DWORD should be good.  Only if your
application depends on the low byte(s) separate from the rest
of the number should this generator be considered suspect.
Of course it also shows why LCG is not considered optimal.

   Anyway, for what it's worth here is the ENT results for
250,000 DWORDs.  If anybody cares I can do separate byte
values for the other two bytes in the DWORD.

Regards,

Steve N.


250,000 DWORDs as 1 million bytes
  ENT.EXE results.

Entropy = 7.999902 bits per byte.

Optimum compression would reduce the size
of this 1000000 byte file by 0 percent.

Chi square distribution for 1000000 samples is 135.95, and randomly
would exceed this value 99.99 percent of the times.

Arithmetic mean value of data bytes is 127.5447 (127.5 = random).
Monte Carlo value for Pi is 3.140940564 (error 0.02 percent).
Serial correlation coefficient is -0.000114 (totally uncorrelated = 0.0).

brethren

not considered optimal, thats an understatement:)

QuoteIf the percentage is greater than 99% or less than 1%, the sequence is almost certainly not random.

FORTRANS

Hi,

   And in my earlier post I showed it got 50% if you treat
them one at a time.  When you treat them four at a time,
or use the wrong parts, it fails.

Cheers,

Steve

dedndave

it seems to me that the test, itself, needs a bit of an update
it was designed to make measurements on random byte values
trying to run dwords through there, no matter how you do it, is like jamming a square peg into a round hole

on the flip side, we could design byte-oriented random generators
that seems stupid because, these days, we all want to use dwords or even larger

the only thing that makes sense is to re-vamp the test program to make measurements on dword files

oex

This does seem like a weird random discussion to me though I am no maths expert.... How can a byte stream be random and the same stream in dwords not? Surely the test is duff unless you arent using the full range maybe in which case your randomness algorithm is incomplete?
We are all of us insane, just to varying degrees and intelligently balanced through networking

http://www.hereford.tv

dedndave

i think they are comparing apples to orangutans   :lol

FORTRANS

Hi Dave,

   Did my earlier answer actually answer your question?

Quote from: dedndave on October 28, 2010, 08:29:14 PM
it seems to me that the test, itself, needs a bit of an update
it was designed to make measurements on random byte values
trying to run dwords through there, no matter how you do it, is like jamming a square peg into a round hole

   Quite right, that was a point I tried to make.  Trying to answer Alex
seems to stirred up some silt.  Looks like a weekend (weakened?)
project.  I wrote some statistics software back when.  And I can look
at Knuth's set of tests, though if I remember, I kind of glazed over
looking at them.

Regards,

Steve

FORTRANS

Hi,

Quote from: oex on October 28, 2010, 08:39:35 PM
How can a byte stream be random and the same stream in dwords not?

   Because this byte stream is made up of random bytes, it is random.
Because the bytes in this DWORD stream are not independent, the byte
stream made from the DWORDs need not (and in this case is not) very
random.  Even if the DWORDs are perfectly good random DWORDs.

Regards,

Steve N.

MichaelW

Steve,

Could post your code? It seems to me that if the bytes don't form a random sequence then the dwords won't either. Both will be biased in some way.


eschew obfuscation

dedndave

oh yes, Steve - it did
you have to admit - the LCG method is simple and fast
i think it makes a decent generator if you re-seed it from time-to-time

oex

Quote from: FORTRANS on October 28, 2010, 09:27:50 PM

Quote from: oex on October 28, 2010, 08:39:35 PM
How can a byte stream be random and the same stream in dwords not?

   Because this byte stream is made up of random bytes, it is random.
Because the bytes in this DWORD stream are not independent, the byte
stream made from the DWORDs need not (and in this case is not) very
random.  Even if the DWORDs are perfectly good random DWORDs.

I was never any good at probabilities I think I'll just have to accept I cannot understand this.... If there are patterns in a DWORD stream (ie it is not random) I do not see how the same stream in BYTE form cannot still have the same patterns (Except strung out every 4 bytes).... Unless the full DWORD range is not used in which case there is no point in analysing a DWORD of data at a time in the first place....

It just seems to me that you are looking at a piece of paper in a 3D world from the side on and delaring the world 2D.... Surely random is random in any dimension or vunerable to very simple attack and not really random at all? (Note. This is a genuine question not a rhetorical question)

It seems to me according to your logic that if I looked at that BYTE stream in BITS it would no longer be random? Alternatively if you were to use more than one BYTE the second BYTE would not be random based on the first....

:lol I'm probably just confusing myself (and everybody else) at this stage.... I am kinda interested in this but I'll try and pick up some scraps as you guys discuss it :bg
We are all of us insane, just to varying degrees and intelligently balanced through networking

http://www.hereford.tv

FORTRANS

Hi,

   Yesterday I replied from my parent's machine using IE that
was not cooperating with entering text, and may have not been
clear on some points.  Sorry.  I will make a last attempt to clean
things up now that I can edit text again in Netscape.  Sigh.

   In Reply #28 the ENT results were for byte streams created
from bytes taken from DWORDs created by a LCG algorithm.
One good using the high byte. One bad using the low byte.
Things seem to have gone south from there.

   In Reply #32, Alex mentioned using all bytes in the DWORD
in his algorithm.  I use all bytes in the LCG, but reported only
byte values in #28, so I reported DWORD ENT results in
Reply #33, noting that the results would appear bad.  I should
have emphasized that the DWORDs are correlated as to the bytes
that are contained in the DWORD.  That is a consequence of
the LCG algorithm and why it "is not optimum".  An optimum
algorithm would not only generate decent pseudorandom values,
it would not have discernable internal structure.

   brethren in Reply #34 made a comment on that comment
of mine that I tried to clarify in Reply #35 referring back to #28
and the "good" byte stream.  Probably a mistake as it was too
brief?

   Reply #36, #38, and #39 tried to point out the DWORD and
byte streams are different and should not be confused.  Reply
#40 should have mentioned that the "byte stream" was from the
first part of Reply #28 and the DWORD stream was from #33.

   MichaelW, I have had 16-bit and 32-bit assembly, plus FORTRAN
code from Numerical Recipes (and I thought my own?). All produce
the same results.   Most of that is on a "dead" machine of one sort
or another.  16-bit code appended to the end, pseudo-code is;

    RandN+1 = 1664525 *  RandN + 1013904223

plus an initial value, plus the current time.  The DWORDs, considered
as DWORDs, are a fairly good random sequence, if used properly.  If
used as a stream of bytes, it is crummy.  You can possibly find
a way to invalidate the use of the DWORDs, which is why more
complex algorithms are now used.  For simple cases it works well.

   Reply #43
QuoteSurely random is random in any dimension or vulnerable to very simple attack and not really random at all? (Note. This is a genuine question not a rhetorical question)

   Right.  Pseudorandom is not random.  Simple is not infallible.
I was trying to show a simple algorithm to generate a decent
"random" sequence as cheaply as possible.  As opposed to a
"costly approach".  The simple route was muddied by the ENT
results?  As I am not making sense to you (sorry), maybe "The
Art of Computer Programming", Volume 2 / Seminumerical
Algorithms, Chapter 3, Random Numbers, by Donald E. Knuth
will help.  I have the second edition of 1981 and rather liked it.

Regards all,

Steve N.


DATASEG SEGMENT PUBLIC

; - - - - - -
; "Random number routine" from Knuth/Numerical Recipies
;RandLCG:        ; Linear Congruential "Random" number Generator from Knuth/Numerical Recipies

Rand32  DD      31415926        ; a la K, Use zero to test NR results
RandA   DD      1664525         ; As per NR
RandC   DD      1013904223      ; As per NR

OutputFile DB   'RandLGC.Out', 0

DATASEG ENDS

; - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
CODE    SEGMENT USE16 PUBLIC
        ASSUME  CS:CODE, DS:DATASEG, SS:STCKSEG

; - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
; Initialize Random Number generator...
RInit:
        SCALL GTIME    ; Macro to call MS-DOS Time function.
        XOR     Word Ptr [Rand32],DX
        XOR     Word Ptr [Rand32+2],CX
        RET
; - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
; Linear Congruential "Random" number Generator from Knuth/Numerical Recipies
;     IDUM = IDUM*IA + IC = MOD ( IDUM*IA + IC ), IM )
;
;   Because this will run on an HP200LX, use 16 bit math and discard
; the high order results.
; 30 January 2001, SRN
; 8 July 2004, tidy up...    [Rand32+0]*[RandA+0] => [I1][LOR]
;                            [Rand32+0]*[RandA+2] => [I3][I2]
;                            [Rand32+2]*[RandA+2] => [I5][I4]
; 9 July 2004, conform to Numerical Recipes values

RandLCG:
; Multiply part
        MOV     AX,WORD PTR [Rand32]
        MOV     BX,AX
        MUL     WORD PTR [RandA]; Low order words => DX = I1, AX = LOR
        PUSH    AX              ; save low order result
        MOV     CX,DX           ; save intermediate result 1
        MOV     AX,BX
        MUL     WORD PTR [RandA+2]; Low & Medium words => DX = I3, AX = I2
        MOV     BX,AX           ; save intermediate result 2
        MOV     AX,WORD PTR [Rand32+2]
        MUL     WORD PTR [RandA]; Medium & Low order words => DX = I5, AX = I4
; Accumulate part
        ADD     BX,AX           ; I2 + I4
        ADD     BX,CX           ; I2 + I4 + I1  ; 9 July 2004 (was ADC)
        POP     AX              ; LOR

; Add part
        ADD     AX,WORD PTR [RandC]
        ADC     BX,WORD PTR [RandC+2]
        MOV     WORD PTR [Rand32],AX
        MOV     WORD PTR [Rand32+2],BX

        RET

; - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
CODE    ENDS