[Q]string processing

Started by Jeff, June 26, 2005, 03:28:52 AM

Jeff

regarding processing strings (character by character), i was wondering which of these would be more efficient (speed- and/or size-wise): comparing directly against a memory reference, or loading the character into a register first.

MOV esi,pString
.WHILE BYTE PTR [esi] != 0
    .WHILE BYTE PTR [esi] == ' '    ;skip leading spaces
        INC esi
    .ENDW
    .IF BYTE PTR [esi] == '"'
        ;do stuff
    .ELSEIF BYTE PTR [esi] != 0
        ;do more stuff
    .ELSE
        .BREAK
    .ENDIF
    INC esi
.ENDW

VS
MOV esi,pString
CLD
LODSB                   ;load character into al
.WHILE al != 0
    .WHILE al == ' '    ;skip leading spaces
        LODSB
    .ENDW
    .IF al == '"'
        ;do stuff
    .ELSEIF al != 0
        ;do more stuff
    .ELSE
        .BREAK
    .ENDIF
    LODSB
.ENDW

on my first draft, i used the first, but in an attempt to reduce the code and use the string primitive mnemonics, i came up with the second, since i was pretty much copying a string into a buffer.  i'm planning on doing a third and would like to know which approach i should take, or whether to take a bit of both (as i am about to do).

hutch--

Jeff,

Almost without exception, an incremented-pointer style algo is faster than the old string instructions. The string instructions run reasonably well with a REP prefix, but a lone LODSB is slow on most modern hardware. In many instances you can do the task with fewer registers as well. On the other hand, if it's not really speed critical and the algo is simple, you get smaller code using the old string instructions.

An algo of the type you are writing is usually faster without the higher level constructs and reasonably simple to code as well. If you have an algo in mind and don't mind having it beaten to death, post a working version in the Laboratory and it will usually get some interesting variations.
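
As a rough illustration of the two styles (a sketch, not code from this thread): with the direction flag clear, LODSB loads a byte and bumps ESI in one instruction, and the pointer version does the same work in two. One difference worth knowing is that LODSB leaves the flags untouched, while ADD modifies them.

```asm
    ; string-instruction style (assumes DF = 0, e.g. after CLD)
    lodsb                       ; al = [esi], then esi = esi + 1

    ; incremented-pointer style, same result in al and esi
    mov al, BYTE PTR [esi]      ; al = [esi]
    add esi, 1                  ; esi = esi + 1 (unlike LODSB, this changes the flags)
```

The pointer style also lets you compare straight against memory (cmp BYTE PTR [esi], ' ') without tying up al at all.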
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

Jeff

thanks hutch, i am not following any algorithms in particular, but rather "making it up as i go".  im not aiming for "the fastest algorithm" nor the smallest, just trying to force myself into writing my code more efficiently: first a working one that makes the most logical sense, then improve on it a bit, then cut down on unnecessities.  good practice IMO.  the main reason i use the high level constructs is because i dont like making up label names and try to use the @@: wherever i can.  i should probably grow out of that one soon.  :)  (i dont remember who said it but i agree, we need a plain smiley face  :green)

i dont have my first draft around since i had to make drastic changes to get to the second and just tossed it.  when i was coming up with my second draft, i was trying to keep in mind that using registers vs memory locations (and references) usually means a byte or two saved in code.  here's my second and third drafts respectively.

second draft:

OPTION EXPR32
OPTION CASEMAP:NONE

.386
.MODEL FLAT

GetCommandLineA PROTO STDCALL
ExitProcess PROTO STDCALL, :DWORD

.DATA
PUBLIC argc,argv
PUBLIC argv0,argv1,argv2,argv3,argv4,argv5,argv6,argv7,argv8,argv9
argc DWORD 0
argv DWORD OFFSET argv0
    argv0 DWORD 0
    argv1 DWORD 0
    argv2 DWORD 0
    argv3 DWORD 0
    argv4 DWORD 0
    argv5 DWORD 0
    argv6 DWORD 0
    argv7 DWORD 0
    argv8 DWORD 0
    argv9 DWORD 0
buffer BYTE 208 DUP(0)

.CODE
_Start:
    CALL _GetArgs
    XOR eax,eax
    PUSH eax
    CALL ExitProcess

_GetArgs:
    CALL GetCommandLineA
    MOV esi,eax
    MOV edi,OFFSET buffer
    MOV ebx,argv
    CLD
    XOR eax,eax
    LODSB
    .WHILE al != 0
        .WHILE al == ' '                ;skip all spaces between arguments
            LODSB
        .ENDW
        .IF al == '"'                   ;"long" argument
@@:         LODSB
            .IF al == 0
                RET
            .ENDIF
            MOV DWORD PTR [ebx],edi
            INC argc
            .WHILE al != '"'
                .IF al == 0
                    RET
                .ENDIF
                STOSB
                LODSB
            .ENDW
        .ELSEIF al != 0                 ;"short" argument
            MOV DWORD PTR [ebx],edi
            INC argc
            STOSB
            LODSB
            .WHILE al != ' '
                .IF al == 0
                    RET
                .ENDIF
                .IF al == '"'           ;premature start of a "long" argument
                    INC edi
                    ADD ebx,4
                    JMP @B
                .ENDIF
                STOSB
                LODSB
            .ENDW
        .ELSE
            RET
        .ENDIF
        INC edi
        LODSB
        ADD ebx,4
    .ENDW
    RET

END _Start


third draft:

OPTION EXPR32
OPTION CASEMAP:NONE

.386
.MODEL FLAT

GetCommandLineA PROTO STDCALL
ExitProcess PROTO STDCALL, :DWORD

.DATA
PUBLIC argc,argv
PUBLIC argv0,argv1,argv2,argv3,argv4,argv5,argv6,argv7,argv8,argv9
argc DWORD 0
argv DWORD OFFSET argv0
    argv0 DWORD 0
    argv1 DWORD 0
    argv2 DWORD 0
    argv3 DWORD 0
    argv4 DWORD 0
    argv5 DWORD 0
    argv6 DWORD 0
    argv7 DWORD 0
    argv8 DWORD 0
    argv9 DWORD 0
buffer BYTE 208 DUP(0)

.CODE
_Start:
    CALL GetCommandLineA
    MOV esi,eax
    MOV edi,OFFSET buffer
    MOV ebx,argv
    CLD
    .WHILE BYTE PTR [esi] != 0
        .WHILE BYTE PTR [esi] == ' '        ;skip all spaces between arguments
            INC esi
        .ENDW
        .IF BYTE PTR [esi] == '"'           ;"long" argument
@@:         INC esi
            CMP BYTE PTR [esi],0
            JE @F
            MOV DWORD PTR [ebx],edi
            INC argc
            .WHILE BYTE PTR [esi] != '"'
                CMP BYTE PTR [esi],0
                JE @F
                MOVSB
            .ENDW
        .ELSEIF BYTE PTR [esi] != 0         ;"short" argument
            MOV DWORD PTR [ebx],edi
            INC argc
            .WHILE BYTE PTR [esi] != ' '
                CMP BYTE PTR [esi],0
                JE @F
                .IF BYTE PTR [esi] == '"'   ;premature start of a "long" argument
                    INC edi
                    ADD ebx,4
                    JMP @B
                .ENDIF
                MOVSB
            .ENDW
        .ELSE
            JMP @F
        .ENDIF
        INC esi
        INC edi
        ADD ebx,4
    .ENDW
@@: XOR eax,eax
    PUSH eax
    CALL ExitProcess

END _Start

hutch--

Jeff,

Here is a suggestion for cleaning up the lead of a string.


    mov esi, lpstr
    sub esi, 1

  @@:
    add esi, 1
    cmp BYTE PTR [esi], 32  ; normal space
    je @B
    cmp BYTE PTR [esi], 9   ; tab
    je @B


By only jumping back on a match you don't have to test for zero and you can test for a TAB as well so you clean up any mess at the beginning of a string.

The gradient range of smilies are from cheeky to stroppy.

1. :P friendly smile with tongue sticking out.
2. :bg big cheesy grin.
3. :toothy cheesy grin with teeth.
4. :green cheesy grin with teeth turning green.
5. :green2 emphasised cheesy grin with teeth and turning green.
6. :cheekygreen: context dependent green cheesy grin with congratulations.
7. :lol smilie with a touch of sarcasm.
8. :bdg smilie with a LARGE touch of sarcasm.




Jeff

Quote from: hutch-- on June 26, 2005, 06:14:40 AM
Jeff,

Here is a suggestion for cleaning up the lead of a string.


mov esi, lpstr
sub esi, 1

@@:
add esi, 1
cmp BYTE PTR [esi], 32 ; normal space
je @B
cmp BYTE PTR [esi], 9 ; tab
je @B


By only jumping back on a match you don't have to test for zero and you can test for a TAB as well so you clean up any mess at the beginning of a string.

should i really be concerned about finding tab characters?  the procedure will only be reading characters from the command line and, apparently, tabs cannot be entered there (unless there is an ALT+NumPad combination for tab, but i think it's safe to ignore that case).

hutch--

Jeff,

I just tested CMD.EXE in win2k and it takes tabs with no problem. They can be entered before or after a command, or to separate options.

If you are sure that the user will never use tabs, you can remove:


cmp BYTE PTR [esi], 9 ; tab
je @B

MichaelW

I also have no problem entering tabs under Windows 2000. Even if you could not actually enter them with the keyboard, you would still need to allow for input redirection.

eschew obfuscation

Jeff

ok it must be XP then.  there, pressing tab on a blank line cycles through all available file/folder names instead of inserting a tab.  i'll add the tab character for the sake of completeness.

i'll produce a fourth draft without using the high level constructs to see how that goes, although what i come up with will most likely look like what is generated in the listing file.
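
For reference, a .WHILE block typically expands to a jump down to the test, with the loop body sitting above it, so a hand-written version tends to end up with the same shape. The sketch below shows the space-skipping loop both ways; the label names are made up for illustration, and the labels MASM generates in a listing will look different.

```asm
    ; high-level form
    .WHILE BYTE PTR [esi] == ' '
        INC esi
    .ENDW

    ; roughly equivalent hand-written form (SkipBody and SkipTest are hypothetical labels)
    JMP SkipTest                ; test the condition before the first pass
SkipBody:
    INC esi                     ; loop body
SkipTest:
    CMP BYTE PTR [esi],' '
    JE SkipBody                 ; repeat while the condition holds
```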

Jeff

ok, just about finished with this (and no comments :P).  without the stack frame.  man, that made things very complicated but pulled through.  :)

in a flat memory model, the .DATA and .FARDATA still work the same yes?  i only used .FARDATA because i could give it a "name" and so far, the program didnt try to throw it back up.

OPTION EXPR32
OPTION CASEMAP:NONE
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

.386
.MODEL FLAT

INCLUDELIB Kernel32.lib

GetCommandLineA PROTO STDCALL

.FARDATA CommandLineToArgvA_DATA
args DWORD 20h DUP(0)
buff BYTE 1800h DUP(0)

.CODE CommandLineToArgvA_TEXT
CommandLineToArgvA PROC STDCALL, pArgv:DWORD
    PUSH ecx
    PUSH edx
    PUSH esi
    PUSH edi
    CALL GetCommandLineA
    MOV esi,eax
    MOV edi,OFFSET buff
    MOV edx,OFFSET args
    MOV eax,DWORD PTR [esp+20]
    MOV DWORD PTR [eax],edx
    XOR eax,eax
    XOR ecx,ecx
    CLD
    GetArg:
    CMP BYTE PTR [esi],0
    JE Done
        JMP NoWhiteSpace
        WhiteSpace:
            INC esi
        NoWhiteSpace:
        CMP BYTE PTR [esi],' '
        JE WhiteSpace
        CMP BYTE PTR [esi],9
        JE WhiteSpace
        CMP BYTE PTR [esi],0
        JE Done
        CMP BYTE PTR [esi],'"'
        JNE ShortArg
        LongArg:
            MOV DWORD PTR [edx],edi
            INC ecx
            INC esi
            LongArgStart:
            CMP BYTE PTR [esi],'"'
            JE LongArgEnd
                CMP BYTE PTR [esi],0
                JE Done
                MOVSB
            JMP LongArgStart
            LongArgEnd:
        JMP NextArg
        ShortArg:
            MOV DWORD PTR [edx],edi
            INC ecx
            ShortArgStart:
            CMP BYTE PTR [esi],' '
            JE ShortArgEnd
                CMP BYTE PTR [esi],0
                JE Done
                CMP BYTE PTR [esi],'"'
                JNE StillShort
                    STOSB
                    ADD edx,4
                JMP LongArg
                StillShort:
                MOVSB
            JMP ShortArgStart
            ShortArgEnd:
        NextArg:
        INC esi
        STOSB
        ADD edx,4
    JMP GetArg
    Done:
    MOV eax,ecx
    POP edi
    POP esi
    POP edx
    POP ecx
    RET 4
CommandLineToArgvA ENDP
END