News:

MASM32 SDK Description, downloads and other helpful links
MASM32.com New Forum Link
masmforum WebSite

ASM for FUN - #1 SUB

Started by frktons, April 18, 2010, 08:52:38 PM

Previous topic - Next topic

frktons

Hi everybody.

In my spare time I'm trying to learn some PB/ASM  stuff.

Here I started:
an old routine written by J. Fielden with ConASCII to display a text-based screen on a windows console.
The fraction of code I'm trying to optimize is:

---------------------------------------------------------------------------------------------------------------------
    FOR x = 1 to 4000
      sBuf = sBuf + MID$(MainStr, x, 1)+ CHR$(32)
    NEXT x
---------------------------------------------------------------------------------------------------------------------


When I saw this LOOP I remembered what an old ASM programmer once told me:
"why to use this kind of resource consuming instructions when you can do it in a much better way?"

The MID$ funcion is used a lot in strings manipulation, but it is not the most efficient way, i guess.
So I tried to add some speed to it and I rewrote it in this shape:


---------------------------------------------------------------------------------------------------------------------
    sBuf = SPACE$(8000)

    y = STRPTR(MainStr)
    y1 = STRPTR(sBuf)

    FOR x = 1 TO 8000 STEP 2
      POKE y1, PEEK(y)
      y = y + 1
      y1 = y1 + 2
    NEXT x
---------------------------------------------------------------------------------------------------------------------


This version is much faster then the previous one, but I think it is still not the best one.

PEEKing and POKEing single bytes in a 32 bit CPU is not the most efficient way of doing things.

Is anyone [more experienced  in PB/ASM stuff] able to suggest me a better way of doing this LOOP?

Thanks for your time  :8)

Frank
Mind is like a parachute. You know what to do in order to use it :-)

hutch--

Frank,

One thing that is inherantly slow in any dialect of basic is string concatenation as it does a re-allocate for each iteration. The trick is to allocate a buffer larger enough to hold the entire result then just write each byte to the buffer sequentially. Just note though that nothing makes a console display fast, about the best you can do is create the string in large enough pieces and point it at the console in larger blocks and the display will get faster.
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

dedndave

you can use display page switching to make it appear faster
get an undisplayed page set up with the changes, then - bang - switch display pages
then, update the previous page and use it the next time

jj2007

Quote from: hutch-- on April 19, 2010, 01:24:26 AM
One thing that is inherantly slow in any dialect of basic is string concatenation as it does a re-allocate for each iteration.

Hutch,

It's a known problem but not limited to Basic, see e.g. string concatenation extremely slow for Pascal, or StringBuilder vs StringBuffer vs String.concat for Java. The latter says "Creating a String with a length of 65536 (character by character) already takes about 22 seconds on an AMD64 X2 4200+". The same in PureMasmTM (below) takes 0.7 seconds on a P4....:

include \masm32\MasmBasic\MasmBasic.inc

.code
start:
Let esi="x"
push Timer()
For_ n=0 To 65535
Let esi=esi+"x"
Next
void Timer()
pop edx
sub eax, edx
; result: 718 ms on a P4
Inkey Str$("Your puter has constructed a 65k string in %i milliseconds", eax)
Exit
end start


... but that is still a lot of time. How would you speed that up, in the context of Let esi=esi+c$, i.e. in a situation where you a) have no idea how long the final string will be, b) don't know in advance how many strings will be appended, c) don't know in advance how long the string to be appended will be?

hutch--

JJ,

Whilew this is not much use to Frank, this is the method i use in basic to circumvent the concatenation slowdown inherant in the design of the language. I take your point that any language that uses an allocated string will get proresssively slower as the string gets longer, fundamental pascal strings with the descriptor at the front in a block of allocated memory.

the algo below is very adjustable in terms of initial buffer size and realloc interval, speed increases with the reduction in re-allocations but at the expense of using larger blocks of memory initially. basically start with a reasonably big buffer and it keeps reallocating until the appends stop.


#IF 0  ' ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

    ---------------------------------------

    buffer$ = ""
    ' buffer$ = nul$(1024*1024*128)

    addin$ = "12345678901234567890123456789012345678901234567890"+_
             "12345678901234567890123456789012345678901234567890" ' 100 characters

    cloc = 0    ' start at beginning of buffer
    lpcn = 0    ' zero the loop counter

    Do
      cloc = repappend(buffer$,addon$,cloc,1024*1024)
      ! add lpcn, 1
    Loop while lpcn < 1000000               ' 1 million appends at 100 bytes each

    buffer$ = left$(buffer$,cloc)           ' trim off extra memory

    ---------------------------------------

    NOTES

      The algorithm addresses a fundamental problem with sequential string appends in
      basic, the reallocation and copy of memory each time the memory is reallocated.

      This algorithm is at it fastest when the original buffer to receive the appended
      data is large enough to hold it all without any reallocation of memory. It becomes
      progressively slower as the buffer size is decreased as more reallocations must be
      done. This is adjustable with the algorithm in two (2) ways.

      1. The larger the initial buffer is, the less times the memory is reallocated
      2. The larger the extra buffer size is the less times the memory is reallocated

      When the append string operation is completed, trim off any excess memory using
      the last return value from the function using the LEFT$() standard basic string
      function.

#ENDIF ' ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

FUNCTION repappend(src$,addit$,ByVal cloc as DWORD,ByVal iextra as DWORD) as DWORD

    #REGISTER NONE

  ' ------------------------------------------------------
  ' src$        the main buffer to append extra data to.
  ' addit$      the byte data to append to the main buffer
  ' cloc        current location pointer
  ' iextra      size of increase to buffer, min 1 meg
  ' ------------------------------------------------------

    LOCAL pstr as DWORD

    pstr = StrPtr(src$)
    ! mov esi, pstr

    pstr = StrPtr(addit$)
    ! mov edi, pstr

    ! mov ebx, [esi-4]          ' stored buffer length
    ! mov eax, cloc
    ! add eax, [edi-4]          ' write location counter AND addit$ length to EAX

    ! cmp ebx, eax              ' test if combined length greater than source buffer size
    ! jg nxt1

    ! cmp iextra, 1024*1024     ' if realloc, check buffer size
    ! jge nxt0
    ! mov iextra, 1024*1024     ' set a minimum extra buffer size

  nxt0:
      extra$ = nul$(iextra)     ' allocate a buffer of "iextra" size
      src$ = src$ + extra$      ' realloc source string size
      pstr = StrPtr(src$)
    ! mov esi, pstr

  nxt1:
    ! mov ecx, 1
    ! or eax, -1
    ! add esi, cloc             ' add current location pointer for next copy position

  ' -----------
  ' unroll by 2
  ' -----------
  lbl0:
    ! add eax, ecx
    ! movzx edx, BYTE PTR [edi+eax]
    ! mov [esi+eax], dl
    ! test edx, edx
    ! jz lbl1                   ' exit on written terminator

    ! add eax, ecx
    ! movzx ebx, BYTE PTR [edi+eax]
    ! mov [esi+eax], bl
    ! test ebx, ebx
    ! jnz lbl0                  ' exit on written terminator

  lbl1:
    ! add eax, cloc             ' location
    ! mov FUNCTION, eax         ' return updated offset pointer

END FUNCTION

' ¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

jj2007

Quote from: hutch-- on April 19, 2010, 09:28:50 AM
While this is not much use to Frank, this is the method i use in basic to circumvent the concatenation slowdown inherant in the design of the language. I take your point that any language that uses an allocated string will get proresssively slower as the string gets longer, fundamental pascal strings with the descriptor at the front in a block of allocated memory.

Hutch,
Thanks. You demonstrate once more that hand-made assembler beats High-Level Language ;-)

Here is another little exercise: Destination needs itself as source. It works, but of course it cannot be very speedy for high loop counts - either because of reallocs, or because you would need to create a copy of the source before finishing the job.

include \masm32\MasmBasic\MasmBasic.inc

.code
start:
push Timer()
Let esi=" "
For_ n=1 To 20000
Let esi=Str$(n)+" "+esi+"A"
Next
void Timer()
pop edx
sub eax, edx
Inkey Str$("%i ms for ", eax), Left$(esi, 30), " ... ", Right$(esi, 10)
; P4: 1031 ms for 20000 19999 19998 19997 19996  ... AAAAAAAAAA
Exit
end start


Just out of curiosity: How many ticks does that one take in PB?

frktons

OK guys, nice suggestions. :U
I'd like to progress just one step further.
The speed of output in a console is a different problem that I think was resolved in quite a good
way by Fielden in his original routine:

-----------------------------------------------------------------------------------------------------------------------
SUB GetScreen
    LOCAL lpReadRegion AS SMALL_RECT
    REGISTER x AS LONG
    LOCAL SIZE AS DWORD

    sBuf = ""
    SIZE = MAKDWD(80, 25)
    FOR x = 1 TO 4000
      sBuf = sBuf + MID$(MainStr, x, 1)+ " "
    NEXT x
    lpReadRegion.xRight = 79
    lpReadRegion.xBottom = 24
    WriteConsoleOutPut GetStdHandle(%STD_OUTPUT_HANDLE), _
    BYVAL STRPTR(sBuf),BYVAL SIZE, BYVAL 0&, lpReadRegion
END SUB
-----------------------------------------------------------------------------------------------------------------------


as you can see, the SUB is calling an API [WriteConsoleOutPut] passing it the whole
screen to be printed, and this is problably the fastest way of doing it, except for the trick of
printing it on an allocated but invisible console and after switching to it, as dedndave suggested, but
it is just for the impression it gives, not for the real speed. And it is consuming some extra CPU cycles as well
instead of sparing them [that is the first objective I have in mind].

I have focused just in the section of CODE that prepares the area of 8000 bytes[the entire 25*80 screen] for the windows
console, starting with a different format of 4000 bytes that was used in DOS CONSOLE.:

---------------------------------------------------------------------------------------------------------------------
    FOR x = 1 to 4000
      sBuf = sBuf + MID$(MainStr, x, 1)+ CHR$(32)
    NEXT x
---------------------------------------------------------------------------------------------------------------------


This part of the SUB looks like it can be improved a lot.

What Hutch suggests doing is what I realized in PB with POINTERS and PEEK/POKE instructions:

1) I allocated the space for the final string [8000 bytes], fortunately I already know how big it'll be.
2) I MOVed single bytes into it with POKE reading them with PEEK from the original string [4000 bytes]
putting a blank after each byte, or better preserving it skipping every other byte:


---------------------------------------------------------------------------------------------------------------------
    sBuf = SPACE$(8000)

    y = STRPTR(MainStr)
    y1 = STRPTR(sBuf)

    FOR x = 1 TO 8000 STEP 2
      POKE y1, PEEK(y)
      y = y + 1
      y1 = y1 + 2
    NEXT x
---------------------------------------------------------------------------------------------------------------------


But I was thinking about something a little bit faster:

Do you think it'll be faster if I read [MOV] four bytes from the original string into a 32 bits register, and after I MOV
the four single 8 bits registers to the addresses pointed by the pointer of the target string?

In my opinion doing it this way I only use 5 MOVs instead of 8 MOVs, and one of them is faster because it involves a 32 bit MOVs.  Instead of 8 slower 8 bits MOVs I use 4 slower 8 bits MOVs and a faster 32 bits MOV.
But this is only a supposition I'm doing, because of my limited knowledge of ASM.
Can you  please focus your answers about this simple point?
I'll ask more questions as soon as I get this point a little bit clearer. :8)

The SUB that jj2007 is proposing is something alike what I'm looking for, but it is for a general  string
building, a little bit different than the problem I'm facing with.  :bg
The same for the routine hutch is pointing at.

Thanks anyway for your patience and suggestions.

Frank



Mind is like a parachute. You know what to do in order to use it :-)

dedndave

hiya Frank
one thing i would mention is, the "DOS" console mode window is not perfect
it has bugs in it - some of them are hard to work around - as far as i know, all version of windows have the same bugs   :bg
the bugs can make it difficult to make a "professional" program for console mode windows
these bugs manifest themselves when the user starts playing with the mouse - most notably, the copy/paste functions
the OS tends to lose its screen position if the user decides to paste into the console window
there is no way to disable or trap these that i know of, even though the docs say you can
it can make a mess out of your display page and, if you try to set the cursor position for output, it will not be where you think it should be
if you use direct writes to the screen buffer, you at least get the correct screen position
however, that does not eliminate the text that the user may have pasted into the window

some time ago, i was playing with this, trying to figure out a fix - kind of dropped it, though
this post has an attachment that can be used to demonstrate what i am talking about...
http://www.masm32.com/board/index.php?topic=11927.msg90799#msg90799
run that program, then start playing with copy/paste and see what the results are   :dazzled:

frktons

Quote from: dedndave on April 19, 2010, 02:11:18 PM
hiya Frank
one thing i would mention is, the "DOS" console mode window is not perfect
it has bugs in it - some of them are hard to work around - as far as i know, all version of windows have the same bugs   :bg
the bugs can make it difficult to make a "professional" program for console mode windows
these bugs manifest themselves when the user starts playing with the mouse - most notably, the copy/paste functions
the OS tends to lose its screen position if the user decides to paste into the console window
there is no way to disable or trap these that i know of, even though the docs say you can
it can make a mess out of your display page and, if you try to set the cursor position for output, it will not be where you think it should be
if you use direct writes to the screen buffer, you at least get the correct screen position
however, that does not eliminate the text that the user may have pasted into the window

some time ago, i was playing with this, trying to figure out a fix - kind of dropped it, though
this post has an attachment that can be used to demonstrate what i am talking about...
http://www.masm32.com/board/index.php?topic=11927.msg90799#msg90799
run that program, then start playing with copy/paste and see what the results are   :dazzled:

Well, I heard about this problem and fortunately I'm not going to use the mouse for the next 3-5 years  :bg
I've chosen the windows console for learning some PB/ASM just to skip [for now], the GUI weight  ::)
By the way your demo was retrieving the simple keys, not the combination with SHIFT/CTRL/ALT?
This was my impression giving it a try, and maybe it could be useful for something else as well.
I mean I'm just studing ASM, and that is ASM!
Thanks anyway and if you have any idea about my last post please let me know.

Frank
Mind is like a parachute. You know what to do in order to use it :-)

jj2007

Quote from: frktons on April 19, 2010, 01:55:24 PM
But I was thinking about something a little bit faster:

Do you think it'll be faster if I read [MOV] four bytes from the original string into a 32 bits register, and after I MOV
the four single 8 bits registers to the addresses pointed by the pointer of the target string?


Try this one:
include \masm32\include\masm32rt.inc
; frktons 19.4.2010

.data
src db "123456789012345678901234567890"
tgt_desired db "1 2 3 4 5 6 7 8 9"
tgt db "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

.code
start: mov ebx, 5
mov esi, offset src
mov edi, offset tgt
.Repeat
mov eax, [esi]
mov [edi], al
mov [edi+2], ah
bswap eax
mov [edi+6], al
mov [edi+4], ah
add esi, 4
add edi, 8
dec ebx
.Until Zero?
print offset tgt
exit
end start

frktons

Quote from: jj2007 on April 19, 2010, 02:42:33 PM
Quote from: frktons on April 19, 2010, 01:55:24 PM
But I was thinking about something a little bit faster:

Do you think it'll be faster if I read [MOV] four bytes from the original string into a 32 bits register, and after I MOV
the four single 8 bits registers to the addresses pointed by the pointer of the target string?


Try this one:
include \masm32\include\masm32rt.inc
; frktons 19.4.2010

.data
src db "123456789012345678901234567890"
tgt_desired db "1 2 3 4 5 6 7 8 9"
tgt db "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

.code
start: mov ebx, 5
mov esi, offset src
mov edi, offset tgt
.Repeat
mov eax, [esi]
mov [edi], al
mov [edi+2], ah
bswap eax
mov [edi+6], al
mov [edi+4], ah
add esi, 4
add edi, 8
dec ebx
.Until Zero?
print offset tgt
exit
end start


Hi JJ. I compiled and run it, but I didn't see any print to tell me what happened, just a blank line and
the console prompt in the next. Did I forget something?
This is what the assembling says:


---------------------------------------------------------------------------------------------------------------------------------------
Microsoft (R) Macro Assembler Version 6.14.8444
Copyright (C) Microsoft Corp 1981-1997.  All rights reserved.

Assembling: C:\masm32\examples\string19042010a.asm
Microsoft (R) Incremental Linker Version 5.12.8078
Copyright (C) Microsoft Corp 1992-1998. All rights reserved.

Il volume nell'unità C è WIN7-64
Numero di serie del volume: 24E7-E258

Directory di C:\masm32\examples

19/04/2010  16:50               438 string19042010a.asm
19/04/2010  16:54             2.560 string19042010a.exe
19/04/2010  16:54             1.011 string19042010a.obj
               3 File          4.009 byte
               0 Directory  39.452.909.568 byte disponibili
Premere un tasto per continuare . . .
-----------------------------------------------------------------------------------------------------------------------------------------


And this is what happens when I run it:


-----------------------------------------------------------------------------------------------------------------------------------------
Microsoft Windows [Versione 6.1.7600]
Copyright (c) 2009 Microsoft Corporation. Tutti i diritti riservati.

Directory di C:\masm32\examples

19/04/2010  16:50             2.560 string19042010a.exe
               1 File          2.560 byte
               0 Directory  39.452.934.144 byte disponibili

C:\masm32\examples>string19042010a

C:\masm32\examples>
--------------------------------------------------------------------------------------------------------------------------------------


So something should be wrong. The MASM version? The Linker? or what?

Frank
Mind is like a parachute. You know what to do in order to use it :-)

dedndave

i don't see a null string terminator in Jochen's code, but that shouldn't stop it from displaying something - lol
let me assemble it......

dedndave

ok - i replaced the tabs with spaces (my personal preference) and added a null terminator

C:\masm32\asm>frktons
1x2x3x4x5x6x7x8x9x0x1x2x3x4x5x6
C:\masm32\asm>               

see attached...

my guess is, we all have the masm32 package installed
it has includes, libs, and macros that you may not have in place

frktons

Quote from: dedndave on April 19, 2010, 03:23:21 PM
ok - i replaced the tabs with spaces (my personal preference) and added a null terminator

C:\masm32\asm>frktons
1x2x3x4x5x6x7x8x9x0x1x2x3x4x5x6
C:\masm32\asm>               

see attached...

my guess is, we all have the masm32 package installed
it has includes, libs, and macros that you may not have in place

So, missing the null terminator in the "tgt" string prevented the displaying of the string?
As for the "tabs" when I copied the source from the FORUM I didn't see any of them anymore:

include \masm32\include\masm32rt.inc
; frktons 19.4.2010

.data
src db "123456789012345678901234567890"
tgt_desired db "1 2 3 4 5 6 7 8 9"
tgt db "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

.code
start: mov ebx, 5
mov esi, offset src
mov edi, offset tgt
.Repeat
mov eax, [esi]
mov [edi], al
mov [edi+2], ah
bswap eax
mov [edi+6], al
mov [edi+4], ah
add esi, 4
add edi, 8
dec ebx
.Until Zero?
print offset tgt
exit
end start


And the package I have installed I guess is fine, as I assembled your version and it works fine.  :8)

So now we have a good starting point thanks to JJ and Dave for the combined effort.
Would you mind if we use it to move one step further?

I'll tell you what I can understand and you could tell me how far I am from the "truth"  :dazzled:

Here we have the first chunk of CODE:

src db "123456789012345678901234567890"
tgt_desired db "1 2 3 4 5 6 7 8 9"
tgt db "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",0

As far as I know we are declaring the source and the target strings [that was easy, wasn't it?]  :bg

And the next chunk says:

.code
start: mov ebx, 5
mov esi, offset src
mov edi, offset tgt

that should be: we are preparing a loop from 5 to zero using the 32 bit register ebx, than
we move the starting points of the source and target strings to 32 bit registers esi and edi.
first question: esi and edi are just retaining the pointers or the first 4 bytes of our strings?
I mean the offset src and offset tgt are moving pointers to registers or dword data of the strings from that point?

Enough for this post. ::)

Frank
Mind is like a parachute. You know what to do in order to use it :-)

dedndave

that's because the editor expands the tabs to spaces
i have always hated tabbed text - lol - i like to see what is actually there

include \masm32\include\masm32rt.inc
; frktons 19.4.2010

.data
src db "123456789012345678901234567890"
tgt_desired db "1 2 3 4 5 6 7 8 9"
tgt db "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",0

.code
start:  mov ebx, 5
        mov esi, offset src
        mov edi, offset tgt
        .Repeat
                mov eax, [esi]
                mov [edi], al
                mov [edi+2], ah
                bswap eax
                mov [edi+6], al
                mov [edi+4], ah
                add esi, 4
                add edi, 8
                dec ebx
        .Until Zero?
        print offset tgt
exit
end start

let me try it without the null....   :bg

btw - i use a program called "Swiss Army Knife" to de-tabify the file