News:

MASM32 SDK Description, downloads and other helpful links
MASM32.com New Forum Link
masmforum WebSite

Multiplication using shift/add 32-bit

Started by delhibelly, June 28, 2011, 12:06:03 AM

Previous topic - Next topic

delhibelly

Hi, I am currently working on writing a prog that does 32-by-32bit multiplication using the shift/add algorithm. I managed to get a 16-by-16bit multiplication working using DX-AX to store the result and storing the multiplicand in BX register.
But can't seem to be able to extend that to 32bit version. If someone can point me out in the right direction, that would be great. I'm using x86. Thanks

clive

Post the code you have for the 16x16-bit version and where you've got with the 32x32-bit.

Are you trying to implement it with the 16-bit instruction set, or can you use the 32-bit instructions/registers of the 80386?

You could also probably optimize by having a 32x16 version, or switching the multiplier/multiplicand.

For non-homework assignments it's easier just to use the multiply instructions.
It could be a random act of randomness. Those happen a lot as well.

dedndave

... and faster   :P

welcome to the forum   :U

delhibelly

well the assignment is to use shift/add to multiply two 32-bit numbers. I can only use the 16-bit registers in my emulator.

Well here is the code I have so far, I know it's not the best way to do things but I'm kinda new to assembly and did the best I could.

mov si, offset A
mov di, offset B
    mov bx, offset product         
    mov dx, 0
    mov ax, 0
    mov bx, ds:di
    mov cx, 0
   
    BEGIN:   
        push bx
        and bx, 1h
        cmp bx, 0
        JE SHIFT_IT
           
            add dx, word ptr ds:si
           
        SHIFT_IT:
            pop bx
            push dx
            and dx, 01h
            cmp dx, 0
            JE DX_ZERO
           
            shr ax, 1
            or ax, 8000h
            pop dx
            shr dx, 1
            shr bx, 1
            inc cx
           
            JMP BEGIN
           
            DX_ZERO:
           
            shr ax, 1
            pop dx
            shr dx, 1
            shr bx, 1
            inc cx
           
            cmp bx, 0h
            JE DONE_PROD
           
            JMP BEGIN
           
DONE_PROD:
    mov temp, 10h
    sub temp, cx
    mov cx, temp
           
    LOOP_SET:
        push dx
        and dx, 1h
        cmp dx, 0h
        JE NO_ONE
            shr ax, 1
            or ax, 8000h
            pop dx
            shr dx, 1
            JMP END_LOOP
           
        NO_ONE:
            shr ax, 1
            pop dx
            shr dx, 1
           
        END_LOOP:
        loop LOOP_SET
           
so at the end of all this the product is stored in DX-AX and i store them in the memory allocated for the product. I'm lost as to how I can extend this to 32bits. I'm using up all my registers here. If I do store the multiplicand in DX-AX and multiplier in BX-CX then where do i temporarily store my 64bit product? That is my main concern as to how I can extend this to 32 bits. This code works a charm for 16bit multiplication.

dedndave

    mov bx, ds:di
            add dx, word ptr ds:si
try replacing those lines with
    mov bx,[di]
            add dx,[si]
the brackets tell the assembler you want the word at the address that is in DI or SI
the DS: segment override is not required, as DS is the default segment register for DI and SI in this addressing mode
WORD PTR is not required, because DX is a word size register - it can be no other size

that looks like a lot of code for a relatively simple operation   :P
(i have only glanced through it briefly)
although, you are out of registers quick because the product is 64 bits
good effort on your part   :U

delhibelly

Quote from: dedndave on June 28, 2011, 03:34:24 AM
    mov bx, ds:di
            add dx, word ptr ds:si
try replacing those lines with
    mov bx,[di]
            add dx,[si]
the brackets tell the assembler you want the word at the address that is in DI or SI
the DS: segment override is not required, as DS is the default segment register for DI and SI in this addressing mode
WORD PTR is not required, because DX is a word size register - it can be no other size

that looks like a lot of code for a relatively simple operation   :P
(i have only glanced through it briefly)
although, you are out of registers quick because the product is 64 bits
good effort on your part   :U

thanks for your help, I knew I had some redundant stuff that could be replaced with something faster or more efficient. I guess I could try making it more efficient. thing is I need to extend this to 32-by-32bit, any suggestions on how that can be achieved. whether it be pushing or storing results in temporary variables?

hutch--

Here is a really crude ADD version.  :bg

Note that if it exceeds DWORD range that it just rolls over EAX and keeps adding.


IF 0  ; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
                      Build this template with "CONSOLE ASSEMBLE AND LINK"
ENDIF ; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    include \masm32\include\masm32rt.inc

    slowmul PROTO :DWORD,:DWORD

    .code

start:
   
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    call main
    inkey
    exit

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

main proc

    LOCAL var   :DWORD

    mov var, 1024*1024

    invoke slowmul,var,1024
    print ustr$(eax),13,10

    invoke slowmul,1024,var
    print ustr$(eax),13,10


    ret

main endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

slowmul proc arg1:DWORD,arg2:DWORD

    xor eax, eax
    mov ecx, arg1

    .if ecx > arg2
      mov ecx, arg1
      mov edx, arg2
    .else
      mov edx, arg1
      mov ecx, arg2
    .endif

  lbl0:
    add eax, ecx
    sub edx, 1
    jnz lbl0

    ret

slowmul endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

end start
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

dedndave

well - the product needs to be 64 bits, as i said
because you are limited to 16-bit registers, i would use the product variable as the accumulator
then, use 4 general registers as the shifter (which is the multiplicand or addend, depending how you look at it)
this requires that you zero out the product before starting

dedndave

#8
this isn't tested and may have bugs - lol
also, it may not be the fastest
but, it gives you an idea of where to start
        .MODEL  Small
        .STACK  4096

;-------------------------------------------------------------

        .DATA

A       dd      12345678h
B       dd      23456789h

;------------------------

        .DATA?

product dw      4 dup(?)
count   dw      ?

;-------------------------------------------------------------

        .CODE
        ASSUME  DS:DGROUP

_main   PROC

        mov     ax,@DATA
        mov     ds,ax

        mov     si,offset A
        mov     di,offset B
        mov     bx,offset product
        call    LongMul

        mov     ax,4C00h
        int     21h

_main   ENDP

;-------------------------------------------------------------

LongMul PROC

        push    si
        push    di
        push    bp
        xor     ax,ax               ;zero AX register

;zero out the product

        mov     [bx],ax
        mov     [bx+2],ax
        mov     [bx+4],ax
        mov     [bx+6],ax

;load the multiplicand into DX:AX:DI:CX

        mov     dx,ax
        mov     cx,[di]
        mov     di,[di+2]

;test bit in BP and initialize count

        mov     bp,1
        mov     count,2

top_of_loop:
        test    bp,[si]
        jz      next_bit

        add     [bx],cx
        adc     [bx+2],di
        adc     [bx+4],ax
        adc     [bx+6],dx

next_bit:
        shl     cx,1
        rcl     di,1
        rcl     ax,1
        rcl     dx,1
        rol     bp,1
        jnc     top_of_loop

        add     si,2
        dec     count
        jnz     top_of_loop

        pop     bp
        pop     di
        pop     si
        ret

LongMul ENDP

;-------------------------------------------------------------

        END     _main

hutch--

#9
From the description of the original assignment, the term Russian Peasant Multiply comes to mind as it uses and ADD / div by 2 (shr reg, 1) technique and from memory its no big deal to do on a computer.

Somewhat later: This is a simple example of peasant multiply using SHIFT ADD. Not exhaustively tested but appears to be generating the correct numbers.

This is the URL for the method used. http://mathforum.org/dr.math/faq/faq.peasant.html


IF 0  ; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
                      Build this template with "CONSOLE ASSEMBLE AND LINK"
ENDIF ; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    include \masm32\include\masm32rt.inc

    peasant_multiply PROTO :DWORD,:DWORD

    .code

start:
   
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    call main
    inkey
    exit

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

main proc

    invoke peasant_multiply,65536,16
    print ustr$(eax),13,10

    invoke peasant_multiply,9,9
    print ustr$(eax),13,10

    invoke peasant_multiply,1024,16
    print ustr$(eax),13,10

    invoke peasant_multiply,13,59
    print ustr$(eax),13,10

    ret

main endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

peasant_multiply proc arg1:DWORD,arg2:DWORD

    mov ecx, arg1           ; load 1st arg into ECX
    mov edx, arg2           ; load 2nd arg into EDX
    xor eax, eax            ; zero EAX
    jmp testit              ; jump to test if arg2 is ODD or EVEN

  lbl0:
    add ecx, ecx            ; double arg1
    shr edx, 1              ; div arg2 by 2
    cmp edx, 1              ; compare it to 1
    jbe lbl1                ; exit loop if below or equal

  testit:
    test edx, 00000000000000000000000000000001b
    je lbl0                 ; jump back if its even

    add eax, ecx            ; accumulate ECX in EAX if EDX is odd
    jmp lbl0

  lbl1:
    add eax, ecx            ; add last ECX to EAX and exit

    ret

peasant_multiply endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

end start
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

dedndave

Quote from: delhibelly on June 28, 2011, 03:18:10 AM
well the assignment is to use shift/add to multiply two 32-bit numbers. I can only use the 16-bit registers in my emulator.

delhibelly

Wow, this is all great. Thanks a lot for your help. I'll look into what you guys said and try implementing something. My assignment is due tomorrow, I'll let you guys know what happens. Thanks a again.

dedndave

i wanted a 64-bit to ASCII decimal routine in 16-bit, so i added it to the program

see attached...

MichaelW

Implementing the algorithm described here, for a 64-bit result from two 32-bit multiplicands, you can use BSR to find the bit position of the leftmost 1 in the first multiplicand, SHLD to shift a 64-bit working copy of the second multiplicand into position, do a 64-bit addition of the shifted copy with the result using ADD followed by ADC, and use BTR to clear the leftmost 1 in the first multiplicand. Note that the 64-bit working copy is simply the second multiplicand zero-extended into the high-order DWORD, and it must be reloaded from the second multiplicand for each loop, discarding the previously shifted value. In my test the algorithm duplicated the results of MUL over almost the entire 32-bit range (actually 0 to 4294901760), and ran in ~150 cycles on a P3.
eschew obfuscation

delhibelly

Quote from: dedndave on June 28, 2011, 07:08:23 PM
i wanted a 64-bit to ASCII decimal routine in 16-bit, so i added it to the program

see attached...

thanks. understood most of the code, works fine for me. Still working on my code though and think I am almost there.