Has anyone got recent code for dword to ascii, signed and unsigned ?

Started by hutch--, August 17, 2010, 04:40:43 PM


FORTRANS

Hi,

   Did the elections go as you wanted?

Regards,

Steve N.

hutch--

 :bg

Nah,

It's a fiasco. It looks like a hung parliament: neither side has won so far, and the balance of power will be held by a collection of independents including 1 or 2 Greens. The Greens have won the balance of power in the Senate, which was predictable.
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

dedndave

using the value 12345 as a test value is likely to result in a slower-than-optimal selection
i tried 7FFFFFFFh with dwtoa and it took 3 times as long
if you all optimize your routines for 12345, then select a routine based on those times.....   :red

jj2007

Quote from: dedndave on August 23, 2010, 02:26:48 PM
i tried 7FFFFFFFh with dwtoa and it took 3 times as long

Strange. For Str$() it's about 20% more, probably because the resulting string is longer; but 3 times as long?? ::)

dedndave

well - that was my result on a P4 - not representative of modern CPUs
but - that's not the point, really
the point is that you guys have been using a crappy method of selecting a routine
i just didn't want to say it that way   :lol
i suggest writing the testbed to take times from several values and find the typical time
the same testbed should probably be used for optimizing
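
A minimal sketch of such a testbed, assuming the masm32 library's dwtoa as the routine under test and GetTickCount as a coarse timer. The table of values and the iteration count are arbitrary choices for illustration, not something posted in this thread:

```asm
    include \masm32\include\masm32rt.inc

    .data
      ; one value per decimal length (1 to 10 digits) plus the extremes
      testvals dd 0, 7, 42, 123, 1234, 12345, 123456, 1234567
               dd 12345678, 123456789, 1234567890, 7FFFFFFFh, 0FFFFFFFFh
      NUMVALS  equ ($ - testvals) / 4

    .code

start:
    call main
    inkey
    exit

main proc

    LOCAL t0    :DWORD
    LOCAL buffer[32]:BYTE

    push ebx
    push esi
    push edi

    xor esi, esi                        ; index into testvals

  nextval:
    invoke GetTickCount
    mov t0, eax

    mov edi, 10000000                   ; iterations per value (assumed)
  @@:
    invoke dwtoa,testvals[esi*4],ADDR buffer
    sub edi, 1
    jnz @B

    invoke GetTickCount
    sub eax, t0
    mov ebx, eax                        ; elapsed ms for this value

    mov eax, testvals[esi*4]
    print ustr$(eax),9                  ; the value, shown unsigned, then a tab
    print ustr$(ebx)," ms",13,10

    add esi, 1
    cmp esi, NUMVALS
    jb nextval

    pop edi
    pop esi
    pop ebx
    ret

main endp

end start
```

Reading the whole column rather than a single line shows how much a routine's time depends on the length of the result, which is the point being made about 12345.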

JJ - Z sends a K   :bg

hutch--

This post is mainly for Paul Dixon as it is a modified version of his conversion algo that Ian_B modified.

I put it into a test piece after doing minor mods so that it takes normal stack arguments and a remote buffer address, then ran the exhaustive range test for unsigned DWORD values from 0 to 4 gig. The algo works correctly on the first 3 gig but fails in the last gig with a non-numeric character in the result. I tested it against the conversion in MSVCRT, which runs the full range by itself with no errors.


.................................................
3150499999 1st error
3150499999 MSVCRT
31505Φ9998 algo error result
Press any key to continue ...
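
The stray character can be traced to the routine's first multiply (the listing is in the test piece that follows). 2814749768 is 2^48 / 100000 rounded up, so the quotient estimate left in EDX after the 16-bit shift errs slightly high. With 2814749768 * 100000 - 2^48 = 89344, a value whose last five digits are 99999 is rounded up to the next quotient once it passes 2^48 / 89344, roughly 3150463117, and 3150499999 is the first such value. A small sketch that reproduces the step in isolation; the constants are the ones in the routine, the program around them is only an illustration:

```asm
    include \masm32\include\masm32rt.inc

    .code

start:
    mov eax, 3150499999
    mov ecx, 2814749768         ; 2^48 / 100000, rounded up
    mul ecx
    shr edx, 16                 ; estimate of 3150499999 / 100000
    mov esi, edx

    print ustr$(esi),13,10      ; 31505, one more than the true 31504

    ; remainder the way utoa_ex forms it: value - estimate * 100000
    mov eax, esi
    mov ecx, 100000
    mul ecx
    mov edi, 3150499999
    sub edi, eax                ; 3150499999 - 3150500000 wraps to 0FFFFFFFFh

    print ustr$(edi),13,10      ; 4294967295

    inkey
    exit

end start
```

Feeding 0FFFFFFFFh into the next multiply by 429497 leaves 429496 (68DB8h) in EDX instead of a digit. Adding 30h and storing DL writes the byte 0E8h, which code page 437 displays as Φ, and the fraction left in EAX produces the trailing 9998.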


This is the test piece.


IF 0  ; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
                      Build this template with "CONSOLE ASSEMBLE AND LINK"
ENDIF ; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    include \masm32\include\masm32rt.inc

    utoa_ex PROTO :DWORD,:DWORD

    .code

start:
   
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    call main
    inkey
    exit

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

main proc

    LOCAL pbuf  :DWORD
    LOCAL buffer[32]:BYTE

    push ebx
    push esi
    push edi

    xor edi, edi

    xor esi, esi        ; counter, 0 for the full range
    mov esi, 3100500000 ; this run starts just below the first error

  stlp:
    mov pbuf, ptr$(buffer)
    invoke utoa_ex,esi,pbuf                 ; call the procedure

    fn szCmp,ustr$(esi),pbuf                ; compare its results to MSVCRT
    test eax, eax                           ; if identical string continue loop
    jnz @F
    print chr$(13,10)                       ; else display error and exit
    print ustr$(esi)," 1st error",13,10
    print ustr$(esi)," MSVCRT",13,10
    print pbuf," algo error result",13,10
    ret
  @@:

    add edi, 1
    cmp edi, 1000000
    jb nxt
    print "."
    xor edi, edi
  nxt:

    add esi, 1
    cmp esi, -1
    jne stlp

    pop edi
    pop esi
    pop ebx

    ret

main endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

    align 16

utoa_ex proc value:DWORD,buffer:DWORD

comment    * ------------------------------------------------------------
            Convert unsigned DWORD to ASCII string (no sign) by Paul Dixon
            with specific references/optimisations added by IanB.

            On entry:
                value  = value to convert
                buffer = address to write result
            On exit:
                EAX = address of result buffer
                buffer contains the zero terminated result

            Modified to pass arguments on the stack and to use an
            external buffer for the result - hutch
            ------------------------------------------------------------ *

        mov eax, [esp+4]

        push    ebx
        push    edi
        push    eax                 ; save absolute value of number
                                    ; call it 1234567890
        mov     ecx, 2814749768
        mov     edi, [esp+8+12]     ; buffer

        mul     ecx                 ; fast div by 100000
        mov     ecx, 100000         ; prepare to multiply up the top 5 digits
        shr     edx, 16             ; shift top 5 digits into place, EDX=12345
        mov     eax, edx            ; copy to eax to multiply up to real size again
        mov     ebx, edx            ; save a copy of top 5 digits
        mul     ecx                 ; EAX=1234500000
        sub     [esp], eax          ; sub from original number leaves 0000067890

        mov     eax, ebx            ; get back high digits
        mov     ecx, 429497         ; about to do div 10000 by reciprocal multiply
        mov     ebx, 10

        or      eax, eax            ; if top 5 digit = 0 then skip that part
        jz      SkipTop5

        mul     ecx                 ; div top 5 digits by 10000
        jc      digit1              ; if digit is not zero then process it
        mul     ebx                 ; else multiply by 10 to get next digit into EDX
        jc      digit2
        mul     ebx
        jc      digit3
        mul     ebx
        jc      digit4
        mul     ebx
        jc      digit5

SkipTop5:
        pop     eax                 ; retrieve lower 5 digits
        mul     ecx                 ; div 10000
        jc      digit6
        mul     ebx                 ; multiply by 10 to get next digit in EDX
        jc      digit7
        mul     ebx
        jc      digit8
        mul     ebx
        jc      digit9
        mul     ebx
        jmp     digit10

digit1:
        add     edx, 30h            ; top digit is left in EDX, convert to ascii
        mov     [edi], dl           ; store top digit
        mul     ebx                 ; multiply by 10 to get next digit into EDX
        add     edi, 1

digit2:
        add     edx, 30h
        mov     [edi], dl
        mul     ebx
        add     edi, 1

digit3:
        add     edx, 30h
        mov     [edi], dl
        mul     ebx
        add     edi, 1

digit4:
        add     edx, 30h
        mov     [edi], dl
        mul     ebx
        add     edi, 1

digit5:
        add     edx, 30h
        mov     [edi], dl

        pop     eax                 ; retrieve lower 5 digits
        mul     ecx                 ; div 10000
        add     edi, 1

digit6:
        add     edx, 30h
        mov     [edi], dl
        mul     ebx
        add     edi, 1

digit7:
        add     edx, 30h
        mov     [edi], dl
        mul     ebx
        add     edi, 1

digit8:
        add     edx, 30h
        mov     [edi], dl
        mul     ebx
        add     edi, 1

digit9:
        add     edx, 30h
        mov     [edi], dl
        mul     ebx
        add     edi, 1

digit10:
        add     edx, 30h
        mov     eax, [esp+8+8]      ; return buffer address (only 8 bytes still pushed here)
        mov     [edi], dx           ; last digit, store DX not DL to give a
                                    ; zero termination for the result string
        pop     edi
        pop     ebx

        ret 8

utoa_ex    endp

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤


end start
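
One possible repair, offered as a sketch and not as Paul's or Ian's method: keep the fast multiply and check the remainder afterwards. For a 32-bit value the estimate is never more than 1 too high, so a single conditional correction is enough. split100000 below is a hypothetical stand-alone helper that shows the check. Inside utoa_ex the same effect would come from testing the carry after `sub [esp], eax` and, when it is set, subtracting 1 from EBX and adding 100000 to [esp].

```asm
    include \masm32\include\masm32rt.inc

    .code

; hypothetical helper, not part of the posted routine
; on entry: EAX = unsigned value
; on exit:  EAX = value / 100000, EDX = value mod 100000
split100000 proc

    push ebx
    mov ebx, eax                ; keep the original value
    mov ecx, 2814749768         ; same reciprocal as utoa_ex
    mul ecx
    shr edx, 16                 ; quotient estimate, at most 1 too high
    mov ecx, edx
    imul edx, 100000            ; estimate * 100000
    sub ebx, edx                ; remainder, borrows if the estimate was high
    jnc @F
    sub ecx, 1                  ; correct the quotient
    add ebx, 100000             ; and the remainder
  @@:
    mov eax, ecx
    mov edx, ebx
    pop ebx
    ret

split100000 endp

start:
    mov eax, 3150499999         ; the first failing value from the test
    call split100000
    mov esi, eax
    mov edi, edx

    print ustr$(esi)," "        ; 31504
    print ustr$(edi),13,10      ; 99999

    inkey
    exit

end start
```

The extra compare and branch cost a little on every call, so whether this beats a wider reciprocal (as in the 2^45 version) would need to be timed.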

hutch--

This is Paul's original algorithm converted from PowerBASIC to MASM, and this one is correct across the full signed range. The test piece below only walks the positive half (7FFFFFFFh down to 1) and compares against ustr$, but the algo is sound and produces the correct results.


IF 0  ; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
                      Build this template with "CONSOLE ASSEMBLE AND LINK"
ENDIF ; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    include \masm32\include\masm32rt.inc

    ltoa_ex PROTO :DWORD,:DWORD

    .code

start:
   
; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    call main
    inkey
    exit

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

main proc

    LOCAL pbuf  :DWORD
    LOCAL buffer[64]:BYTE

    push ebx
    push esi
    push edi

    mov pbuf, ptr$(buffer)

    mov esi, -1
    shr esi, 1

    print ustr$(esi),13,10


    xor ebx, ebx

  @@:
    mov edi, ustr$(esi)             ; MSVCRT
    invoke ltoa_ex,esi,pbuf         ; algo
    invoke szCmp,pbuf,edi           ; compare strings
    test eax, eax
    jnz forward
    print ustr$(esi),13,10          ; show error on count and exit
    ret

  forward:
    add ebx, 1
    cmp ebx, 10000000
    jl nxt
    print "."
    xor ebx, ebx

  nxt:
    sub esi, 1
    cmp esi, 0     ; stop at 0: covers 7FFFFFFFh down to 1
    jne @B

    pop edi
    pop esi
    pop ebx

    ret

main endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

ltoa_ex proc LongVar:DWORD,answer:DWORD

  ; --------------------------------------------------------------------------------
  ; this algorithm was written by Paul Dixon and has been converted to MASM notation
  ; --------------------------------------------------------------------------------

    push esi
    push edi

    mov eax, LongVar            ; get number
    mov ecx, answer             ; get pointer to answer string
    jmp over
     
    chartab:
      dd "00","10","20","30","40","50","60","70","80","90"
      dd "01","11","21","31","41","51","61","71","81","91"
      dd "02","12","22","32","42","52","62","72","82","92"
      dd "03","13","23","33","43","53","63","73","83","93"
      dd "04","14","24","34","44","54","64","74","84","94"
      dd "05","15","25","35","45","55","65","75","85","95"
      dd "06","16","26","36","46","56","66","76","86","96"
      dd "07","17","27","37","47","57","67","77","87","97"
      dd "08","18","28","38","48","58","68","78","88","98"
      dd "09","19","29","39","49","59","69","79","89","99"

  over:
    ; on entry eax=number to convert, ecx=pointer to answer buffer (minimum 12 bytes)
    ; on exit, eax,ecx,edx are undefined, all other registers are preserved.
    ; answer is in location pointed to by ecx on entry

  signed:
    ; do a signed DWORD to ASCII
    or eax,eax                          ; test sign
    jns udword                          ; if +ve, continue as for unsigned
    neg eax                             ; else, make number positive
    mov byte ptr [ecx],"-"              ; include the - sign
    inc ecx                             ; update the pointer

  udword:
    ; unsigned DWORD to ASCII
    mov esi,ecx                         ; get pointer to answer
    mov edi,eax                         ; save a copy of the number

    mov edx, 0D1B71759h                 ; =2^45\10000    13 bit extra shift
    mul edx                             ; gives 6 high digits in edx

    mov eax, 068DB9h                    ; =2^32\10000+1

    shr edx,13                          ; correct for multiplier offset used to give better accuracy
    jz short skiphighdigits             ; if zero then don't need to process the top 6 digits

    mov ecx,edx                         ; get a copy of high digits
    imul ecx,10000                      ; scale up high digits
    sub edi,ecx                         ; subtract high digits from original. EDI now = lower 4 digits

    mul edx                             ; get first 2 digits in edx
    mov ecx,100                         ; load ready for later

    jnc short next1                     ; if zero, suppress them by ignoring
    cmp edx,9                           ; 1 digit or 2?
    ja  short ZeroSupressed             ; 2 digits, just continue with pairs of digits to the end

    mov edx,chartab[edx*4]              ; look up 2 digits
    mov [esi],dh                        ; but only write the 1 we need, suppress the leading zero
    inc esi                             ; update pointer by 1
    jmp short ZS1                       ; continue with pairs of digits to the end

  next1:
    mul ecx                             ; get next 2 digits
    jnc short next2                     ; if zero, suppress them by ignoring
    cmp edx,9                           ; 1 digit or 2?
    ja  short ZS1a                      ; 2 digits, just continue with pairs of digits to the end

    mov edx,chartab[edx*4]              ; look up 2 digits
    mov [esi],dh                        ; but only write the 1 we need, suppress the leading zero
    inc esi                             ; update pointer by 1
    jmp short ZS2                       ; continue with pairs of digits to the end

  next2:
    mul ecx                             ; get next 2 digits
    jnc short next3                     ; if zero, suppress them by ignoring
    cmp edx,9                           ; 1 digit or 2?
    ja  short ZS2a                      ; 2 digits, just continue with pairs of digits to the end
     
    mov edx,chartab[edx*4]              ; look up 2 digits
    mov [esi],dh                        ; but only write the 1 we need, suppress the leading zero
    inc esi                             ; update pointer by 1
    jmp short ZS3                       ; continue with pairs of digits to the end

  next3:

  skiphighdigits:
    mov eax,edi                         ; get lower 4 digits

    mov ecx,100

    mov edx,28F5C29h                    ; 2^32\100 +1
    mul edx
    jnc short next4                     ; if zero, suppress them by ignoring
    cmp edx,9                           ; 1 digit or 2?
    ja  short ZS3a                      ; 2 digits, just continue with pairs of digits to the end

    mov edx,chartab[edx*4]              ; look up 2 digits
    mov [esi],dh                        ; but only write the 1 we need, suppress the leading zero
    inc esi                             ; update pointer by 1
    jmp short  ZS4                      ; continue with pairs of digits to the end

    next4:
    mul ecx                             ; this is the last pair so don't suppress a single zero
    cmp edx,9                           ; 1 digit or 2?
    ja  short ZS4a                      ; 2 digits, just continue with pairs of digits to the end

    mov edx,chartab[edx*4]              ; look up 2 digits
    mov [esi],dh                        ; but only write the 1 we need, suppress the leading zero
    mov byte ptr [esi+1],0              ; zero terminate string

    jmp short  xit                      ; all done

  ZeroSupressed:
    mov edx,chartab[edx*4]              ; look up 2 digits
    mov [esi],dx
    add esi,2                           ; write them to answer

  ZS1:
    mul ecx                             ; get next 2 digits
    ZS1a:
    mov edx,chartab[edx*4]              ; look up 2 digits
    mov [esi],dx                        ; write them to answer
    add esi,2

  ZS2:
    mul ecx                             ; get next 2 digits
    ZS2a:
    mov edx,chartab[edx*4]              ; look up 2 digits
    mov [esi],dx                        ; write them to answer
    add esi,2

  ZS3:
    mov eax,edi                         ; get lower 4 digits
    mov edx,28F5C29h                    ; 2^32\100 +1
    mul edx                             ; edx= top pair
    ZS3a:
    mov edx,chartab[edx*4]              ; look up 2 digits
    mov [esi],dx                        ; write to answer
    add esi,2                           ; update pointer

  ZS4:
    mul ecx                             ; get final 2 digits
    ZS4a:
    mov edx,chartab[edx*4]              ; look them up
    mov [esi],dx                        ; write to answer

    mov byte ptr [esi+2],0              ; zero terminate string

  xit:
  sdwordend:

    pop edi
    pop esi

    ret

ltoa_ex endp

; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

end start

hutch--

This is the same algo from Paul Dixon but with the stack frame removed and both the table and the algo aligned. That picked up about a 10% improvement, and against the old dwtoa() it's about 3 times faster.


; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

align 16

ltoa_ex proc LongVar:DWORD,answer:DWORD

  ; --------------------------------------------------------------------------------
  ; this algorithm was written by Paul Dixon and has been converted to MASM notation
  ; --------------------------------------------------------------------------------

    push esi
    push edi

    mov eax, [esp+4+8]          ; LongVar            ; get number
    mov ecx, [esp+8+8]          ; answer             ; get pointer to answer string
    jmp over

    align 16
    chartab:
      dd "00","10","20","30","40","50","60","70","80","90"
      dd "01","11","21","31","41","51","61","71","81","91"
      dd "02","12","22","32","42","52","62","72","82","92"
      dd "03","13","23","33","43","53","63","73","83","93"
      dd "04","14","24","34","44","54","64","74","84","94"
      dd "05","15","25","35","45","55","65","75","85","95"
      dd "06","16","26","36","46","56","66","76","86","96"
      dd "07","17","27","37","47","57","67","77","87","97"
      dd "08","18","28","38","48","58","68","78","88","98"
      dd "09","19","29","39","49","59","69","79","89","99"

  over:
    ; on entry eax=number to convert, ecx=pointer to answer buffer (minimum 12 bytes)
    ; on exit, eax,ecx,edx are undefined, all other registers are preserved.
    ; answer is in location pointed to by ecx on entry

  signed:
    ; do a signed DWORD to ASCII
    or eax,eax                          ; test sign
    jns udword                          ; if +ve, continue as for unsigned
    neg eax                             ; else, make number positive
    mov byte ptr [ecx],"-"              ; include the - sign
    add ecx, 1                          ; update the pointer

  udword:
    ; unsigned DWORD to ASCII
    mov esi,ecx                         ; get pointer to answer
    mov edi,eax                         ; save a copy of the number

    mov edx, 0D1B71759h                 ; =2^45\10000    13 bit extra shift
    mul edx                             ; gives 6 high digits in edx

    mov eax, 068DB9h                    ; =2^32\10000+1

    shr edx,13                          ; correct for multiplier offset used to give better accuracy
    jz skiphighdigits                   ; if zero then don't need to process the top 6 digits

    mov ecx,edx                         ; get a copy of high digits
    imul ecx,10000                      ; scale up high digits
    sub edi,ecx                         ; subtract high digits from original. EDI now = lower 4 digits

    mul edx                             ; get first 2 digits in edx
    mov ecx,100                         ; load ready for later

    jnc next1                           ; if zero, suppress them by ignoring
    cmp edx,9                           ; 1 digit or 2?
    ja  ZeroSupressed                   ; 2 digits, just continue with pairs of digits to the end

    mov edx,chartab[edx*4]              ; look up 2 digits
    mov [esi],dh                        ; but only write the 1 we need, suppress the leading zero
    add esi, 1
    jmp ZS1                             ; continue with pairs of digits to the end

  next1:
    mul ecx                             ; get next 2 digits
    jnc next2                           ; if zero, suppress them by ignoring
    cmp edx,9                           ; 1 digit or 2?
    ja  ZS1a                            ; 2 digits, just continue with pairs of digits to the end

    mov edx,chartab[edx*4]              ; look up 2 digits
    mov [esi],dh                        ; but only write the 1 we need, suppress the leading zero
    add esi, 1
    jmp ZS2                             ; continue with pairs of digits to the end

  next2:
    mul ecx                             ; get next 2 digits
    jnc short next3                     ; if zero, suppress them by ignoring
    cmp edx,9                           ; 1 digit or 2?
    ja  ZS2a                            ; 2 digits, just continue with pairs of digits to the end
     
    mov edx,chartab[edx*4]              ; look up 2 digits
    mov [esi],dh                        ; but only write the 1 we need, suppress the leading zero
    add esi, 1
    jmp ZS3                             ; continue with pairs of digits to the end

  next3:

  skiphighdigits:
    mov eax,edi                         ; get lower 4 digits

    mov ecx,100

    mov edx,28F5C29h                    ; 2^32\100 +1
    mul edx
    jnc next4                           ; if zero, suppress them by ignoring
    cmp edx,9                           ; 1 digit or 2?
    ja  ZS3a                            ; 2 digits, just continue with pairs of digits to the end

    mov edx,chartab[edx*4]              ; look up 2 digits
    mov [esi],dh                        ; but only write the 1 we need, suppress the leading zero
    add esi, 1
    jmp  ZS4                            ; continue with pairs of digits to the end

    next4:
    mul ecx                             ; this is the last pair so don't suppress a single zero
    cmp edx,9                           ; 1 digit or 2?
    ja  ZS4a                            ; 2 digits, just continue with pairs of digits to the end

    mov edx,chartab[edx*4]              ; look up 2 digits
    mov [esi],dh                        ; but only write the 1 we need, suppress the leading zero
    mov byte ptr [esi+1],0              ; zero terminate string

    jmp  xit                            ; all done

  ZeroSupressed:
    mov edx,chartab[edx*4]              ; look up 2 digits
    mov [esi],dx
    add esi,2                           ; write them to answer

  ZS1:
    mul ecx                             ; get next 2 digits
    ZS1a:
    mov edx,chartab[edx*4]              ; look up 2 digits
    mov [esi],dx                        ; write them to answer
    add esi,2

  ZS2:
    mul ecx                             ; get next 2 digits
    ZS2a:
    mov edx,chartab[edx*4]              ; look up 2 digits
    mov [esi],dx                        ; write them to answer
    add esi,2

  ZS3:
    mov eax,edi                         ; get lower 4 digits
    mov edx,28F5C29h                    ; 2^32\100 +1
    mul edx                             ; edx= top pair
    ZS3a:
    mov edx,chartab[edx*4]              ; look up 2 digits
    mov [esi],dx                        ; write to answer
    add esi,2                           ; update pointer

  ZS4:
    mul ecx                             ; get final 2 digits
    ZS4a:
    mov edx,chartab[edx*4]              ; look them up
    mov [esi],dx                        ; write to answer

    mov byte ptr [esi+2],0              ; zero terminate string

  xit:
  sdwordend:

    pop edi
    pop esi

    ret 8

ltoa_ex endp

OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef


; ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

lingo

This is the same algo from Paul Dixon but with a table half the size and a better time... :lol
.data
align 2
chartabL        dw "00","10","20","30","40","50","60","70","80","90"
                dw "01","11","21","31","41","51","61","71","81","91"
                dw "02","12","22","32","42","52","62","72","82","92"
                dw "03","13","23","33","43","53","63","73","83","93"
                dw "04","14","24","34","44","54","64","74","84","94"
                dw "05","15","25","35","45","55","65","75","85","95"
                dw "06","16","26","36","46","56","66","76","86","96"
                dw "07","17","27","37","47","57","67","77","87","97"
                dw "08","18","28","38","48","58","68","78","88","98"
                dw "09","19","29","39","49","59","69","79","89","99"
.code
OPTION PROLOGUE:None
OPTION EPILOGUE:None
align 16
ltoa_exLingo proc  LongVar:DWORD,answer:DWORD
        mov     eax, [esp+1*4]          ; eax -> number
        mov     edx, 0D1B71759h
        mov     ecx, [esp+2*4]          ; ecx -> lpResult
        test    eax, eax
        jns     @f
        mov     byte ptr [ecx], "-"
        inc     ecx
        neg     eax
@@:
        mov     [esp+1*4], edi
        mov     edi, eax
        mov     [esp+2*4], esi
        mul     edx
        shr     edx, 13
        mov     eax, 68DB9h
        je      LoNext3
        imul    esi, edx, 10000
        sub     edi, esi
        mul     edx
        mov     esi, 100
        jnc     LoNext1
        cmp     edx, 9
        jbe     Lo0                     ; 1..9: single digit, jbe so that 9 is included
        movzx   edx, word ptr chartabL[edx+edx]
        add     ecx, 8
        mov     [ecx-8], edx
Lo1:
        mul     esi
Lo1a:
        movzx   edx, word ptr chartabL[edx+edx]
        mov     [ecx-6], edx
Lo2:
        mul     esi
Lo2a:
        movzx   edx, word ptr chartabL[edx+edx]
        mov     [ecx-4], edx
Lo3:
        mov     eax, 28F5C29h
        mul     edi
Lo3a:
        movzx   edx, word ptr chartabL[edx+edx]
        mov     [ecx-2], edx
Lo4:
        mul     esi
Lo4a:
        movzx   edx, word ptr chartabL[edx+edx]
        pop     eax
        lea     eax, [ecx+2]
        pop     edi
        mov     [ecx], edx
        pop     esi
        jmp     dword ptr [esp-3*4]
Lo0:
        add     ecx, 7
        add     edx, 30h
        mov     [ecx-7], edx
        jne     Lo1
LoNext1:
        mul     esi
        jnc     @f
        add     ecx, 6
        cmp     edx, 9
        ja      Lo1a
        add     edx, 30h
        sub     ecx, 1
        mov     [ecx-5], edx
        jnz     Lo2
@@:
        mul     esi
        jnc     LoNext3
        add     ecx, 4
        cmp     edx, 9
        ja      Lo2a
        add     edx, 30h
        sub     ecx, 1
        mov     [ecx-3], edx
        jnz     Lo3
LoNext3:
        mov     eax, 28F5C29h
        mul     edi
        mov     esi, 100
        jnc     @f
        add     ecx, 2
        cmp     edx, 9
        ja      Lo3a
        add     edx, 30h
        sub     ecx, 1
        mov     [ecx-1], edx
        jnz     Lo4
@@:
        mul     esi
        cmp     edx, 9
        ja      Lo4a
        add     edx, 30h
        pop     eax
        lea     eax, [ecx+1]
        pop     edi
        mov     [ecx], edx
        pop     esi
        jmp     dword ptr [esp-3*4]
ltoa_exLingo endp
OPTION PROLOGUE:PrologueDef
OPTION EPILOGUE:EpilogueDef

and results:


Intel(R) Core(TM)2 Duo CPU     E8500  @ 3.16GHz (SSE4)
10      cycles for ltoa_exLingo
12      cycles for ltoa_exHutch
13      cycles for ltoa_ex

10      cycles for ltoa_exLingo
12      cycles for ltoa_exHutch
14      cycles for ltoa_ex

10      cycles for ltoa_exLingo
12      cycles for ltoa_exHutch
13      cycles for ltoa_ex

10      cycles for ltoa_exLingo
12      cycles for ltoa_exHutch
13      cycles for ltoa_ex


--- ok ---
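
A note on why the lookup tables in these routines look reversed ("10" at index 1): MASM stores a two-character string constant in a dw or dd as a number, low byte first, so "10" lands in memory as the bytes '0','1', which is the text for index 1. With word entries the table is indexed as chartabL[edx+edx] instead of chartab[edx*4]. A small sketch to confirm the byte order; `pair` is a made-up name for illustration:

```asm
    include \masm32\include\masm32rt.inc

    .data
      pair dw "10", 0           ; 3130h then a zero word: bytes '0','1',0,0

    .code

start:
    print OFFSET pair,13,10     ; shows 01, the text for table index 1
    inkey
    exit

end start
```

This is also why the dd version can take a lone digit from DH: for a value of 0 to 9 the entry holds that digit's character in its second byte.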


jj2007

Congrats, Lingo, that's roughly 45% faster :U

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
30      cycles for ltoa_exLingo
31      cycles for ltoa_exHutch
31      cycles for ltoa_ex

Antariy

Quote from: lingo on August 25, 2010, 09:11:04 PM
This is the same algo from Paul Dixon but with 2 times smaller table and better time... :lol.

Wow, GrandLamer lingo steals and "optimizes" algos of other members? Impossible!  :bdg



Alex

dedndave

these guys with their LUT's take all the fun out of writing math functions - lol
(just kidding, Paul   :bg )
but, seriously, it seems like everything boils down to a collection of huge tables
and the fastest routine is the one that addresses the table faster - lol

maybe what we want is a routine that handles signed OR unsigned and left-justifies
gotta make some kind of challenge out of it   :P

oh - and.....
don't tell me we switched from a single test of 12345 to a single test of -1   :red

prescott w/htt
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
25      cycles for ltoa_exLingo
30      cycles for ltoa_exHutch
30      cycles for ltoa_ex

25      cycles for ltoa_exLingo
29      cycles for ltoa_exHutch
30      cycles for ltoa_ex

25      cycles for ltoa_exLingo
30      cycles for ltoa_exHutch
30      cycles for ltoa_ex

25      cycles for ltoa_exLingo
30      cycles for ltoa_exHutch
30      cycles for ltoa_ex


lingo

"Congrats, Lingo, that's roughly 45% faster"

For me it is enough so far, because P.Dixon is not so stupid like you and tubeteikin.
So, for tubeteikin I understand because it not easy to learn something from the aboriginal people
with archaic computers and living in the asian post communist ignorance and stupidity with 80% radical Muslims, etc...,
but you live in one of the most civilized country.... :lol

hutch--

I have my doubts about the validity of the test piece in this context. If an algorithm of this type has a place over the shorter versions for ordinary usage, it is for streaming output of different numbers over the signed range, and to test the algo over this range you need numbers that range in a gradient from single characters up to 10 characters.

When I tested and timed the second version of Paul's algo I ran it for about 6 seconds over a wider numeric range. A short test may deliver cute numbers that game the test piece, but the test needs to match the usage.
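
A rough sketch of a timing loop closer to that usage, assuming the masm32 library's dwtoa as a stand-in for the routine under test. The stream is the powers of -3 from (-3)^0 to (-3)^19, an arbitrary choice that alternates sign and climbs from 1 to 10 digits while staying inside a signed DWORD. The pass count is also just an assumption:

```asm
    include \masm32\include\masm32rt.inc

    .code

start:
    call main
    inkey
    exit

main proc

    LOCAL t0    :DWORD
    LOCAL buffer[32]:BYTE

    push ebx
    push esi
    push edi

    invoke GetTickCount
    mov t0, eax

    mov edi, 1000000                    ; passes over the stream (assumed)
  nextpass:
    mov esi, 1                          ; (-3)^0
    mov ebx, 20                         ; 20 values per pass
  @@:
    invoke dwtoa,esi,ADDR buffer        ; routine under test
    imul esi, -3                        ; next value, opposite sign
    sub ebx, 1
    jnz @B

    sub edi, 1
    jnz nextpass

    invoke GetTickCount
    sub eax, t0
    print ustr$(eax)," ms for 20 million conversions",13,10

    pop edi
    pop esi
    pop ebx
    ret

main endp

end start
```

A single aggregate figure like this hides the per-length spread, so it complements a per-value table instead of replacing it.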

lingo

I agree, so feel free to change the test as you want. I just have no time for more...Sorry. :wink