News:

MASM32 SDK Description, downloads and other helpful links
MASM32.com New Forum Link
masmforum WebSite

A brief history

Started by Phil, June 09, 2005, 05:54:36 AM

Previous topic - Next topic

Phil

  It was back in March or April that I started working on re-building some of the assembly and systems level skills that I've had on the back burner for several years now. I've always liked numbers and I've always understood that they can become quite large although I've never really grasped the true meaning of things like +/- infinity. At any rate, I started working on multiple-precision routines to see how large I could get the Fibonacci series to grow on the little PC I was working with at the time. Well, I was really proud because I had written a dos based console app that could write the first 150000 terms into the Bit-Bucket, the nul device that is, in about 3 hours and a few odd minutes.
  Enter Ed Beroset ... I searched on the net for someone who seemed to know something about this ole guy Fibonacci and the other fellow named Lucas and I found Ed! Turns out that he even knew about some fancy tricks with log's that could be used to figure out how many digits a particular Fibonacci term would have. So, I sent him a copy of my 'fine' program and asked him for comments. He was wonderful and in a few days sent me back a tiny program that did the same thing I was doing in about 3 minutes! It was written using 16-bit code where I had used 32-bits and what I had imagined would be pretty quick code! Ah, it was pretty and I don't even want to think about how many weeks it took me to get that first working model tuned to the point where it didn't take a day to do what I was trying to accomplish!
  Well, I certainly learned a lot from his code and I changed my ways! Instead of using binary arithmetic and going thru the drudgery of converting all those bits back into base 10 digits I decided to give BCD a try in 32-bit mode. Ed had used unpacked BCD and the 16-bit instructions and his program flew! I searched around on the net and found an algorithm that I had seen implemented in hardware that did the same thing ... it was amazing and I've been trying to finish my program ever since I found it and actually got it working!
  Now, I'm near the limit of what I can do in 32-bit ALU mode with instructions that still work on those ancient 386's. How close to that limit, I really don't know. When I get there I'll probably know. The algorithm I'm using is reasonable when I'm actually writing every term. The conversion cost, even when working with numbers stored using a base of 1,000,000,000 so you can easily convert 9 digits at a time, is more costly than doing the math in BCD as you add each term. At least, that's what I've experienced. With help and suggestions from many of you I can now produce the first 150000 terms in just under 10 seconds when dumping them into the Bit-Bucket. I've also verified that the output is actually correct by using md5sums to validate the huge output files. The time to write the files is variable and much more dependent on the state of the system and file system so that's why I use the Bit-Bucket for my standard elapsed time test.
  Finally, enter MichaelW and his fantastic timers.asm ... I've built a little test here that produces the first 300 terms of the Fibonacci series and displays the last. Even created an ad-hoc bcd$ macro that works with the print macro like hex$ and chr$. I've really been learning quite a bit ... but I'm *still* not quite doing it *right* ... I know because it just doesn't feel right yet!
  If anyone would like to have a look and help me refine this algorithm I would certainly be grateful. I know the world doesn't need yet another Fibonacci program but, for me it's the experience of making it better incrementally that's been very benificial for me. I am probably close to my own limits of being able to see new ways to approach further improvements ... at least, without suggestions or pointers. Anyway, here's the algorithm in its current state.
;****************************************************************************
; BCD addition algorithm courtesy Douglas W. Jones at University Of Iowa
; Details available at http://www.cs.uiowa.edu/~jones/bcd/bcd.html#packed
; This is my implementation of his posted C algorithm.
;****************************************************************************

; Quick and dirty implementation ... uses globals and preserves nothing!

; Called with:
;   esi -> valid BCD source operands
;   edi -> valid BCD destination operands
;   wksize = number of dwords used in both source and destination
;   ddsize = maximum number of dwords available for each
; Returns with:
;   destination = source + destination
;   wksize incremented if another dword was needed to store final carry
;   carry clear if no errors occured
;   carry flag set on error
;   all registers destroyed


OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

        align   16

AddBCD  proc C

        mov     eax,wksize              ; Number of dwords currently used
        mov     ecx,eax
        shl     eax,2
        neg     ecx                     ; Loop counts up from minus wksize
        add     esi,eax                 ; Point 1 dword past most significant
        add     edi,eax                 ;   source and destination
nocry:  mov     eax,[esi+4*ecx]         ; Start with no carry-in for least
        mov     ebx,[edi+4*ecx]         ;  significant source and destination
        mov     edx,eax                 ;
        add     eax,66666666h           ; eax = source + excess_6
        jc      foul                    ; No carry for 6's + 9's, only F's
        add     eax,ebx                 ; eax = dst+src+excess_6
        jc      cry1                    ; Do carry-out if set
nocry1: xor     edx,ebx
        xor     edx,eax
        not     edx
        and     edx,11111110h           ; Leave a 1 for every BCD digit
        mov     ebx,edx                 ;  that did not produce a carry-out
        shr     edx,2
        shr     ebx,3
        or      edx,ebx
        or      edx,60000000h           ; Leave a 6 for each digit that did
        sub     eax,edx                 ;  not carry and remove excess_6
        mov     [edi+4*ecx],eax         ; Store valid result in destination
        add     ecx,1                   ; Count up to 0 and do each dword
        jnz     nocry                   ; Remember, no carry-in for the next
        jmp short ok

cryx:   add     eax,ebx                 ; Source was 99999999 + carry-in
        jmp short cry1

cry:    mov     eax,[esi+4*ecx]
        mov     ebx,[edi+4*ecx]
        mov     edx,eax
        add     eax,66666667h           ; Excess_6 + carry-in
        jc      cryx
        add     eax,ebx
        jnc     nocry1                  ; No carry-out or carry-in for next
cry1:   xor     edx,ebx
        xor     edx,eax
        not     edx
        and     edx,11111110h           ; Same as above except all digits
        mov     ebx,edx                 ;  can be handled with mask because
        shr     edx,2                   ;  the most significant digit DID
        shr     ebx,3                   ;  produce a carry and doesn't need
        or      edx,ebx                 ;  to be adjusted to remove excess_6
        sub     eax,edx
        mov     [edi+4*ecx],eax         ; Store valid result in destination
        add     ecx,1
        jnz     cry                     ; Do carry-in for next

        mov     eax,wksize              ; Send carry to the next dword
        cmp     eax,ddsize
        jae     foul                    ; Oops! Not enough dwords available
        add     wksize,1                ; Bump number of dwords being used
        mov dword ptr [esi],0           ; Clear next source
        mov dword ptr [edi],1           ; Store the carry and return

ok:     clc
        ret

foul:   stc
        ret

AddBCD  endp

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef


And here are the results of 3 currently identical timing loops that are included in the zip.

C:\ASM\test>bcdtime
28826 AddBCD: 0222232244629420445529739893461909967206666939096499764990979600
28838 Test A: 0222232244629420445529739893461909967206666939096499764990979600
28813 Test B: 0222232244629420445529739893461909967206666939096499764990979600

Press enter to exit...


Test A and B are intened for anyone to modify and play with. Right now, they are identical to AddBCD. As you can see the timings won't be identical because I'm only looping 10000 times since the algorithm is so large. I'm looking for differences that make it up into the hundreds or thousands range ... should be possible and I'll keep working on it too. Just wanted to post it and give my little spiel about the wonders of BCD for very simplistic applications like this.



[attachment deleted by admin]