News:

MASM32 SDK Description, downloads and other helpful links
MASM32.com New Forum Link
masmforum WebSite

Retrieving from String

Started by Bamizas, April 22, 2012, 06:08:57 PM

Previous topic - Next topic

Bamizas

Heya all,

I'm fairly new to assembly and would like a few pointers to help me out.

I am trying to make a program that can parse threw a string and retrieve the numbers from it. Here is an example :

"456 + 236"

I would like to scan the string for the operator + (which I can do at the moment), I just don't know how to extract the numbers from the string after.

In the end I want to do a calculator where I can write the equation.

Thx in advance!

Bami

NoCforMe

Here's one possible solution.

This is my preferred way of doing any parsing tasks. It might seem a little complicated for what you want to do, and may be overkill for this simple task, but it can be adapted to any number of situations where you need to interpret text given by a user, making it a very useful tool.

This method is called, technically speaking, using a finite state automaton (FSA). Don't let that highfalutin' term scare you; it just means, basically, "a machine with a finite number of states and a finite set of conditions governing moving between those states".

The first thing you have to do is sit down with pencil and paper and draw out a state diagram, like the one attached below. This shows a series of "states" (the circles with numbers inside them), and a bunch of other stuff. The other stuff is the "tokens" (the little boxes with labels), and some other states shown as letters inside boxes.



There's really no other way to start this, unless you're some kind of freaking genius (which I am not). You need to be able to visualize the flow of parsing.

State 0 is the starting state: this is where you are when you're looking at the beginning of your string ("456 + 236" in your example). At each state, you go to another state (or the same state) based on what you see. So at state 0, if you have leading spaces in your string (like "   456 + 236"), then when you see a space, you just go back to node 0, which basically does nothing. This lets you "scan off" leading spaces the user may have typed in by mistake.

When you finally see a numeric digit (the token marked "#"), then you advance to state A, where you set to store your first number. This puts you at node 1, where succeeding numeric digits are stored. If you see a space instead, you go to node 2. Node 2 scans off spaces and looks for a plus sign. And so on and so forth.

(After looking at this, I would make one simple change, which would be to allow for no space before the plus sign. This is really easy to do: just change row 1 of the parsing table (the 2nd row) to allow a move to node B for a plus sign. Shows how easy  it is to change the behavior of the FSA, instead of rewriting tons of code.)

Here's the code to implement this. The code at the beginning is just the "driver" for the FSA: get the next character, try to match it to the list of tokens (space, plus sign, EOL or end-of-line). If the token matches, then it sets up a jump to the proper node, using the current row index (EBX) and the offset from that index, or the "column". If it doesn't match, then if it's a numeric digit (0-9), go to the last column of the current row, which is the "#" column.


data

ParseTable LABEL DWORD
;   SP     +      EOL    #
DD Node0, NodeE, NodeE, NodeA ; row 0
DD Node2, NodeE, NodeE, Node1 ; row 1
DD Node2, NodeB, NodeE, NodeE ; row 2
DD Node3, NodeE, NodeE, NodeC ; row 2
DD NodeE, NodeE, NodeX, Node4 ; row 2

$EOL EQU 0

ParseChars DB ' ', '+', $EOL
$NumParseChars EQU $ - ParseChars

$Row1 EQU ($NumParseChars + 1) * 4 ;DWORD offset
$Row2 EQU $Row1 * 2
$Row3 EQU $Row1 * 3
$Row4 EQU $Row1 * 4

$numOffset EQU  $NumParseChars * 4

.code

XOR EBX, EBX ;Set row = 0.
LEA ESI, [insert name of buffer here]

gnc: LODSB

; Try to match parsing characters:
LEA EDI, ParseChars
MOV EDX, EDI ;Save pointer to list of chars.
MOV ECX, $NumParseChars
REPNE SCASB
JNE isnum ;No match, so see if it's a #.
SUB EDI, EDX ;Get offset from start of list.
DEC EDI
SHL EDI, 2 ;Convert to DWORD offset.
JMP [EBX + EDI + ParseTable]

; No match, so see if character is numeric:
isnum: CMP AL, '0'
JB is_error
CMP AL, '9'
JA NodeE
; It's a numeric digit, so go to the "number" column in table:
JMP [EBX + ParseTable + $numOffset]

;================
; Parsing nodes
;================

NodeE LABEL NEAR
(handle syntax error here)

Node0 LABEL NEAR
JMP gnc

Node1 LABEL NEAR
MOV EDI, NumberStorePtr
STOSB ;Store # digit
MOV NumberStorePtr, EDI
MOV EBX, $row1 ;Set new row index
JMP gnc

Node2 LABEL NEAR
MOV EBX, $row2
JMP gnc

Node3 LABEL NEAR
MOV EBX, $row3
JMP gnc

Node4 LABEL NEAR
MOV EDI, NumberStorePtr
STOSB ;Store # digit
MOV NumberStorePtr, EDI
MOV EBX, $row4
JMP gnc

NodeA LABEL NEAR
LEA EDX, FirstNumberStorageArea
MOV NumberStorePtr, EDX
JMP Node1 ;Start storing #s

NodeB LABEL NEAR
MOV Operation, $addition
JMP Node3

NodeC LABEL NEAR
LEA EDX, SecondNumberStorageArea
MOV NumberStorePtr, EDX
JMP Node4 ;Start storing #s

NodeX LABEL NEAR
; Do any required processing, exit subroutine, etc.


Obviously some details need to be fleshed out here, but this gives you the basic idea. Since I use this so often, I just copy and paste this code as a template. Notice how the parsing table codes directly from the one from the sketch, which in turn comes from the FSA diagram. It's really fast to get a new parsing scheme going with this method. At least for me, it sure beats a bunch of spaghetti code with a lot of SUBSTRs, string comparisons, etc. (Jochen may disagree with this, but I'm sticking to my guns here.)

Anyhow, something to consider. Sorry if this is a little over your head, since you're a beginner. Other folks will no doubt give you other options.

NoCforMe

Instead of storing the numbers as text and then later having to convert them to binary  (using atoi() or some such), maybe easier just to process the numbers in place:


; Set up numeric storage:
NodeA LABEL NEAR
MOV FirstNumber, 0
JMP Node1 ;Start storing #s

; Add numeric digit to number:
Node1 LABEL NEAR
SUB AL, '0' ;Convert ASCII to binary
MOVZX ECX, AL ;Save digit in a DWORD
MOV EAX, 10
MUL FirstNumber ;Multiply partial # by 10
ADD FirstNumber, ECX ;Add in new digit.
MOV EBX, $row1 ;Set new row index
JMP gnc

uosunsetter

#3
Well I am a newbie, however when I read your post and this is the solution came to my mind quickly and is working! IMO, it is easy to understand since a newbie wrote it  :bg I hope you get the idea :)


include     \masm32\include\masm32rt.inc

.data


.data?
strADD db 100 dup(?) ;the whole string
strFirst db 100 dup(?) ;substring until the " +"
strSecond db 100 dup(?);substring after the "+ "


.code


start:


;Read the string
invoke StdIn,ADDR strADD,100


;Get the length of string
xor ecx, ecx ;Zero ecx
mov esi, OFFSET strADD
getstrlen: ;keep increasing ecx until the end of string
mov al, [esi]
inc esi
inc ecx
cmp al, 0
jne getstrlen


;Get the position of "+"
push ecx ;will need it back
mov esi, OFFSET strADD
xor edx, edx ;Zero edx
findplus: ;keep increasing edx until the "+" is found
dec ecx
jz quit; there is no +
mov al, [esi]
inc esi
inc edx
cmp al, "+"
jne findplus

pop  ecx ;we got it back now
mov esi, OFFSET strADD
mov edi, OFFSET strFirst
push edx ;will need it
sub edx, 2 ;we dont want the space char before the +

;Get first part
splitfirstpart:
mov al, [esi]
mov [edi], al
inc esi
inc edi
dec edx
jnz splitfirstpart
mov ah, 0
mov [edi], ah ;close the string


pop edx
add edx, 1; ;we dont want + and the following space char
mov esi, OFFSET strADD
mov edi, OFFSET strSecond
add esi, edx

;Get second part
splitsecondpart:
mov al, [esi]
mov [edi], al
inc esi
inc edi
inc edx
cmp edx, ecx
jne splitsecondpart
mov ah, 0
mov [edi], ah ;close the string


;Convert each and add
invoke atodw, addr strFirst ;convert first string to integer and save it to ecx
mov ebx, eax

invoke atodw, addr strSecond ;convert second string to integer and add it to eax
add eax, ebx

print str$(eax) ;finally print addition
inkey

quit:
invoke ExitProcess,0
END start


edit: typo fixes

uosunsetter

#4
Btw, I am sure that all can be done in a more sophisticated, fast, efficent way but here you can see what is going on I hope. Also I was too lazy to wrote the string to integer code by myself soo well that's all  :toothy

Edit: Well I see NoCforMe posted pro version of converting  :wink I think I posted the rest (the part that is not related to your question :bg) with the ready-to-use atodw function.