Pages: [1]
|
 |
|
Author
|
Topic: Retrieving from String (Read 3569 times)
|
Bamizas
New Member

Posts: 5
|
Heya all,
I'm fairly new to assembly and would like a few pointers to help me out.
I am trying to make a program that can parse threw a string and retrieve the numbers from it. Here is an example :
"456 + 236"
I would like to scan the string for the operator + (which I can do at the moment), I just don't know how to extract the numbers from the string after.
In the end I want to do a calculator where I can write the equation.
Thx in advance!
Bami
|
|
|
Logged
|
|
|
|
NoCforMe
|
Here's one possible solution. This is my preferred way of doing any parsing tasks. It might seem a little complicated for what you want to do, and may be overkill for this simple task, but it can be adapted to any number of situations where you need to interpret text given by a user, making it a very useful tool. This method is called, technically speaking, using a finite state automaton (FSA). Don't let that highfalutin' term scare you; it just means, basically, "a machine with a finite number of states and a finite set of conditions governing moving between those states". The first thing you have to do is sit down with pencil and paper and draw out a state diagram, like the one attached below. This shows a series of "states" (the circles with numbers inside them), and a bunch of other stuff. The other stuff is the "tokens" (the little boxes with labels), and some other states shown as letters inside boxes.  There's really no other way to start this, unless you're some kind of freaking genius (which I am not). You need to be able to visualize the flow of parsing. State 0 is the starting state: this is where you are when you're looking at the beginning of your string ("456 + 236" in your example). At each state, you go to another state (or the same state) based on what you see. So at state 0, if you have leading spaces in your string (like " 456 + 236"), then when you see a space, you just go back to node 0, which basically does nothing. This lets you "scan off" leading spaces the user may have typed in by mistake. When you finally see a numeric digit (the token marked "#"), then you advance to state A, where you set to store your first number. This puts you at node 1, where succeeding numeric digits are stored. If you see a space instead, you go to node 2. Node 2 scans off spaces and looks for a plus sign. And so on and so forth. (After looking at this, I would make one simple change, which would be to allow for no space before the plus sign. This is really easy to do: just change row 1 of the parsing table (the 2nd row) to allow a move to node B for a plus sign. Shows how easy it is to change the behavior of the FSA, instead of rewriting tons of code.) Here's the code to implement this. The code at the beginning is just the "driver" for the FSA: get the next character, try to match it to the list of tokens (space, plus sign, EOL or end-of-line). If the token matches, then it sets up a jump to the proper node, using the current row index (EBX) and the offset from that index, or the "column". If it doesn't match, then if it's a numeric digit (0-9), go to the last column of the current row, which is the "#" column. data
ParseTable LABEL DWORD ; SP + EOL # DD Node0, NodeE, NodeE, NodeA ; row 0 DD Node2, NodeE, NodeE, Node1 ; row 1 DD Node2, NodeB, NodeE, NodeE ; row 2 DD Node3, NodeE, NodeE, NodeC ; row 2 DD NodeE, NodeE, NodeX, Node4 ; row 2
$EOL EQU 0
ParseChars DB ' ', '+', $EOL $NumParseChars EQU $ - ParseChars
$Row1 EQU ($NumParseChars + 1) * 4 ;DWORD offset $Row2 EQU $Row1 * 2 $Row3 EQU $Row1 * 3 $Row4 EQU $Row1 * 4
$numOffset EQU $NumParseChars * 4
.code
XOR EBX, EBX ;Set row = 0. LEA ESI, [insert name of buffer here]
gnc: LODSB
; Try to match parsing characters: LEA EDI, ParseChars MOV EDX, EDI ;Save pointer to list of chars. MOV ECX, $NumParseChars REPNE SCASB JNE isnum ;No match, so see if it's a #. SUB EDI, EDX ;Get offset from start of list. DEC EDI SHL EDI, 2 ;Convert to DWORD offset. JMP [EBX + EDI + ParseTable]
; No match, so see if character is numeric: isnum: CMP AL, '0' JB is_error CMP AL, '9' JA NodeE ; It's a numeric digit, so go to the "number" column in table: JMP [EBX + ParseTable + $numOffset]
;================ ; Parsing nodes ;================
NodeE LABEL NEAR (handle syntax error here)
Node0 LABEL NEAR JMP gnc
Node1 LABEL NEAR MOV EDI, NumberStorePtr STOSB ;Store # digit MOV NumberStorePtr, EDI MOV EBX, $row1 ;Set new row index JMP gnc
Node2 LABEL NEAR MOV EBX, $row2 JMP gnc
Node3 LABEL NEAR MOV EBX, $row3 JMP gnc
Node4 LABEL NEAR MOV EDI, NumberStorePtr STOSB ;Store # digit MOV NumberStorePtr, EDI MOV EBX, $row4 JMP gnc
NodeA LABEL NEAR LEA EDX, FirstNumberStorageArea MOV NumberStorePtr, EDX JMP Node1 ;Start storing #s
NodeB LABEL NEAR MOV Operation, $addition JMP Node3
NodeC LABEL NEAR LEA EDX, SecondNumberStorageArea MOV NumberStorePtr, EDX JMP Node4 ;Start storing #s
NodeX LABEL NEAR ; Do any required processing, exit subroutine, etc.
Obviously some details need to be fleshed out here, but this gives you the basic idea. Since I use this so often, I just copy and paste this code as a template. Notice how the parsing table codes directly from the one from the sketch, which in turn comes from the FSA diagram. It's really fast to get a new parsing scheme going with this method. At least for me, it sure beats a bunch of spaghetti code with a lot of SUBSTRs, string comparisons, etc. (Jochen may disagree with this, but I'm sticking to my guns here.) Anyhow, something to consider. Sorry if this is a little over your head, since you're a beginner. Other folks will no doubt give you other options.
|
|
|
Logged
|
|
|
|
NoCforMe
|
Instead of storing the numbers as text and then later having to convert them to binary (using atoi() or some such), maybe easier just to process the numbers in place: ; Set up numeric storage: NodeA LABEL NEAR MOV FirstNumber, 0 JMP Node1 ;Start storing #s
; Add numeric digit to number: Node1 LABEL NEAR SUB AL, '0' ;Convert ASCII to binary MOVZX ECX, AL ;Save digit in a DWORD MOV EAX, 10 MUL FirstNumber ;Multiply partial # by 10 ADD FirstNumber, ECX ;Add in new digit. MOV EBX, $row1 ;Set new row index JMP gnc
|
|
|
Logged
|
|
|
|
uosunsetter
Member
 
Posts: 26
|
Well I am a newbie, however when I read your post and this is the solution came to my mind quickly and is working! IMO, it is easy to understand since a newbie wrote it  I hope you get the idea :) include \masm32\include\masm32rt.inc
.data
.data? strADD db 100 dup(?) ;the whole string strFirst db 100 dup(?) ;substring until the " +" strSecond db 100 dup(?);substring after the "+ "
.code
start:
;Read the string invoke StdIn,ADDR strADD,100
;Get the length of string xor ecx, ecx ;Zero ecx mov esi, OFFSET strADD getstrlen: ;keep increasing ecx until the end of string mov al, [esi] inc esi inc ecx cmp al, 0 jne getstrlen
;Get the position of "+" push ecx ;will need it back mov esi, OFFSET strADD xor edx, edx ;Zero edx findplus: ;keep increasing edx until the "+" is found dec ecx jz quit; there is no + mov al, [esi] inc esi inc edx cmp al, "+" jne findplus
pop ecx ;we got it back now mov esi, OFFSET strADD mov edi, OFFSET strFirst push edx ;will need it sub edx, 2 ;we dont want the space char before the +
;Get first part splitfirstpart: mov al, [esi] mov [edi], al inc esi inc edi dec edx jnz splitfirstpart mov ah, 0 mov [edi], ah ;close the string
pop edx add edx, 1; ;we dont want + and the following space char mov esi, OFFSET strADD mov edi, OFFSET strSecond add esi, edx
;Get second part splitsecondpart: mov al, [esi] mov [edi], al inc esi inc edi inc edx cmp edx, ecx jne splitsecondpart mov ah, 0 mov [edi], ah ;close the string
;Convert each and add invoke atodw, addr strFirst ;convert first string to integer and save it to ecx mov ebx, eax
invoke atodw, addr strSecond ;convert second string to integer and add it to eax add eax, ebx print str$(eax) ;finally print addition inkey
quit: invoke ExitProcess,0 END start edit: typo fixes
|
|
« Last Edit: April 23, 2012, 12:32:39 AM by uosunsetter »
|
Logged
|
|
|
|
uosunsetter
Member
 
Posts: 26
|
Btw, I am sure that all can be done in a more sophisticated, fast, efficent way but here you can see what is going on I hope. Also I was too lazy to wrote the string to integer code by myself soo well that's all  Edit: Well I see NoCforMe posted pro version of converting  I think I posted the rest (the part that is not related to your question  ) with the ready-to-use atodw function.
|
|
« Last Edit: April 23, 2012, 12:23:51 AM by uosunsetter »
|
Logged
|
|
|
|
|
Pages: [1]
|
|
|
 |