Retrieving from String

Bamizas · April 22, 2012, 06:08:57 PM

Heya all,

I'm fairly new to assembly and would like a few pointers to help me out.

I am trying to make a program that can parse threw a string and retrieve the numbers from it. Here is an example :

"456 + 236"

I would like to scan the string for the operator + (which I can do at the moment), I just don't know how to extract the numbers from the string after.

In the end I want to do a calculator where I can write the equation.

Thx in advance!

Bami

NoCforMe · April 22, 2012, 09:32:35 PM

Here's one possible solution.

This is my preferred way of doing any parsing tasks. It might seem a little complicated for what you want to do, and may be overkill for this simple task, but it can be adapted to any number of situations where you need to interpret text given by a user, making it a very useful tool.

This method is called, technically speaking, using a finite state automaton (FSA). Don't let that highfalutin' term scare you; it just means, basically, "a machine with a finite number of states and a finite set of conditions governing moving between those states".

The first thing you have to do is sit down with pencil and paper and draw out a state diagram, like the one attached below. This shows a series of "states" (the circles with numbers inside them), and a bunch of other stuff. The other stuff is the "tokens" (the little boxes with labels), and some other states shown as letters inside boxes.

There's really no other way to start this, unless you're some kind of freaking genius (which I am not). You need to be able to visualize the flow of parsing.

State 0 is the starting state: this is where you are when you're looking at the beginning of your string ("456 + 236" in your example). At each state, you go to another state (or the same state) based on what you see. So at state 0, if you have leading spaces in your string (like " 456 + 236"), then when you see a space, you just go back to node 0, which basically does nothing. This lets you "scan off" leading spaces the user may have typed in by mistake.

When you finally see a numeric digit (the token marked "#"), then you advance to state A, where you set to store your first number. This puts you at node 1, where succeeding numeric digits are stored. If you see a space instead, you go to node 2. Node 2 scans off spaces and looks for a plus sign. And so on and so forth.

(After looking at this, I would make one simple change, which would be to allow for no space before the plus sign. This is really easy to do: just change row 1 of the parsing table (the 2nd row) to allow a move to node B for a plus sign. Shows how easy it is to change the behavior of the FSA, instead of rewriting tons of code.)

Here's the code to implement this. The code at the beginning is just the "driver" for the FSA: get the next character, try to match it to the list of tokens (space, plus sign, EOL or end-of-line). If the token matches, then it sets up a jump to the proper node, using the current row index (EBX) and the offset from that index, or the "column". If it doesn't match, then if it's a numeric digit (0-9), go to the last column of the current row, which is the "#" column.

Code Select


data

ParseTable	LABEL DWORD
;		  SP     +      EOL    #
	DD	Node0, NodeE, NodeE, NodeA	; row 0
	DD	Node2, NodeE, NodeE, Node1	; row 1
	DD	Node2, NodeB, NodeE, NodeE	; row 2
	DD	Node3, NodeE, NodeE, NodeC	; row 2
	DD	NodeE, NodeE, NodeX, Node4	; row 2

$EOL		EQU 0

ParseChars	DB ' ', '+', $EOL
 $NumParseChars	EQU $ - ParseChars

$Row1		EQU ($NumParseChars + 1) * 4	;DWORD offset
$Row2		EQU $Row1 * 2
$Row3		EQU $Row1 * 3
$Row4		EQU $Row1 * 4

$numOffset	EQU  $NumParseChars * 4

.code

	XOR	EBX, EBX		;Set row = 0.
	LEA	ESI, [insert name of buffer here]

gnc:	LODSB

; Try to match parsing characters:
	LEA	EDI, ParseChars
	MOV	EDX, EDI		;Save pointer to list of chars.
	MOV	ECX, $NumParseChars
	REPNE	SCASB
	JNE	isnum			;No match, so see if it's a #.
	SUB	EDI, EDX		;Get offset from start of list.
	DEC	EDI
	SHL	EDI, 2			;Convert to DWORD offset.
	JMP	[EBX + EDI + ParseTable]

; No match, so see if character is numeric:
isnum:	CMP	AL, '0'
	JB	is_error
	CMP	AL, '9'
	JA	NodeE
; It's a numeric digit, so go to the "number" column in table:
	JMP	[EBX + ParseTable + $numOffset]

;================
; Parsing nodes
;================

NodeE	LABEL NEAR
	(handle syntax error here)

Node0	LABEL NEAR
	JMP	gnc

Node1	LABEL NEAR
	MOV	EDI, NumberStorePtr
	STOSB			;Store # digit
	MOV	NumberStorePtr, EDI
	MOV	EBX, $row1	;Set new row index
	JMP	gnc

Node2	LABEL NEAR
	MOV	EBX, $row2
	JMP	gnc

Node3	LABEL NEAR
	MOV	EBX, $row3
	JMP	gnc

Node4	LABEL NEAR
	MOV	EDI, NumberStorePtr
	STOSB			;Store # digit
	MOV	NumberStorePtr, EDI
	MOV	EBX, $row4
	JMP	gnc

NodeA	LABEL NEAR
	LEA	EDX, FirstNumberStorageArea
	MOV	NumberStorePtr, EDX
	JMP	Node1		;Start storing #s

NodeB	LABEL NEAR
	MOV	Operation, $addition
	JMP	Node3

NodeC	LABEL NEAR
	LEA	EDX, SecondNumberStorageArea
	MOV	NumberStorePtr, EDX
	JMP	Node4		;Start storing #s

NodeX	LABEL NEAR
; Do any required processing, exit subroutine, etc.

Obviously some details need to be fleshed out here, but this gives you the basic idea. Since I use this so often, I just copy and paste this code as a template. Notice how the parsing table codes directly from the one from the sketch, which in turn comes from the FSA diagram. It's really fast to get a new parsing scheme going with this method. At least for me, it sure beats a bunch of spaghetti code with a lot of SUBSTRs, string comparisons, etc. (Jochen may disagree with this, but I'm sticking to my guns here.)

Anyhow, something to consider. Sorry if this is a little over your head, since you're a beginner. Other folks will no doubt give you other options.

NoCforMe · April 22, 2012, 10:05:30 PM

Instead of storing the numbers as text and then later having to convert them to binary (using atoi() or some such), maybe easier just to process the numbers in place:

Code Select


; Set up numeric storage:
NodeA	LABEL NEAR
	MOV	FirstNumber, 0
	JMP	Node1		;Start storing #s

; Add numeric digit to number:
Node1	LABEL NEAR
	SUB	AL, '0'		;Convert ASCII to binary
	MOVZX	ECX, AL		;Save digit in a DWORD
	MOV	EAX, 10
	MUL	FirstNumber	;Multiply partial # by 10
	ADD	FirstNumber, ECX ;Add in new digit.
	MOV	EBX, $row1	;Set new row index
	JMP	gnc

uosunsetter · April 22, 2012, 10:31:31 PM

Well I am a newbie, however when I read your post and this is the solution came to my mind quickly and is working! IMO, it is easy to understand since a newbie wrote it :bg I hope you get the idea :)

Code Select


include     \masm32\include\masm32rt.inc

.data


.data?
strADD db 100 dup(?) ;the whole string
strFirst db 100 dup(?) ;substring until the " +"
strSecond db 100 dup(?);substring after the "+ "


.code


start:


;Read the string
	invoke StdIn,ADDR strADD,100


;Get the length of string
	xor ecx, ecx ;Zero ecx
	mov esi, OFFSET strADD
getstrlen: ;keep increasing ecx until the end of string
	mov al, [esi]
	inc esi
	inc ecx
	cmp al, 0
	jne getstrlen


;Get the position of "+"
	push ecx ;will need it back
	mov esi, OFFSET strADD
	xor edx, edx ;Zero edx
findplus: ;keep increasing edx until the "+" is found
	dec ecx
	jz quit; there is no +
	mov al, [esi]
	inc esi
	inc edx
	cmp al, "+"
	jne findplus

	pop  ecx ;we got it back now
	mov esi, OFFSET strADD
	mov edi, OFFSET strFirst
	push edx ;will need it
	sub edx, 2 ;we dont want the space char before the +

;Get first part
splitfirstpart:
	mov al, [esi]
	mov [edi], al
	inc esi
	inc edi
	dec edx
	jnz splitfirstpart
	mov ah, 0
	mov [edi], ah ;close the string

		
	pop edx
	add edx, 1; ;we dont want + and the following space char
	mov esi, OFFSET strADD
	mov edi, OFFSET strSecond
	add esi, edx

;Get second part
splitsecondpart:
	mov al, [esi]
	mov [edi], al
	inc esi
	inc edi
	inc edx
	cmp edx, ecx
	jne splitsecondpart
	mov ah, 0
	mov [edi], ah ;close the string


;Convert each and add
	invoke atodw, addr strFirst ;convert first string to integer and save it to ecx
	mov ebx, eax

	invoke atodw, addr strSecond ;convert second string to integer and add it to eax
	add eax, ebx
	
	print str$(eax) ;finally print addition 
	inkey

quit:
	invoke ExitProcess,0
END start

edit: typo fixes

uosunsetter · April 22, 2012, 10:35:13 PM

Btw, I am sure that all can be done in a more sophisticated, fast, efficent way but here you can see what is going on I hope. Also I was too lazy to wrote the string to integer code by myself soo well that's all :toothy

Edit: Well I see NoCforMe posted pro version of converting :wink I think I posted the rest (the part that is not related to your question :bg) with the ready-to-use atodw function.

News:

Retrieving from String

Bamizas

NoCforMe

NoCforMe

uosunsetter

uosunsetter