My Assembler Development Update

jj2007 · May 14, 2012, 05:58:44 PM

Quote from: johnsa on May 13, 2012, 09:12:38 PM
Once the symbol table is up and running and all the identifiers, EQUs etc. start being added, by the time I get past windows/winextra.inc I'm already sitting on about 60,000 entries and the table has grown to a whopping 150 MB.

150MB/60k=2500 bytes per entry? Why so much? Can you give an example of such entries?

By the way: we're counting on you. JWasm is on ice.

johnsa · May 14, 2012, 06:30:01 PM

Code:

; Symbol Table Entry Types.
STYPE_LABEL     equ 0
STYPE_PROC      equ 1
STYPE_MACRO     equ 2
STYPE_VARIABLE  equ 3
STYPE_STRUCT    equ 4
STYPE_CONST     equ 5
STYPE_EQUATE    equ 6
STYPE_ENUM      equ 7
STYPE_EXTRN     equ 8
STYPE_LITERAL   equ 9
STYPE_RECORD    equ 10
STYPE_UNION     equ 11
STYPE_NAMESPACE equ 12
STYPE_CLASS     equ 13
STYPE_METHOD    equ 14
STYPE_SEGMENT   equ 15
STYPE_SECTION   equ 16
STYPE_TEXTEQU   equ 17
STYPE_UNKNOWN   equ 18				; For when we create a symbol before we know what to do with it.

ARGTYPE_BYTE  equ 0
ARGTYPE_WORD  equ 1
ARGTYPE_DWORD equ 2

SYMBOL struct
	hash       dd ?
	symType    dd ?					; Symbol Type.
	filePtr    dd ?					; Source File or external lib/obj file containing definition ptr. 
	line       dd ?					; Line number symbol defined on.
	address    dd ?
	sectionDef dd ?					; Section id symbol defined in.
	segmentDef dd ?					; Segment id symbol defined in.
	sSize      db ?					; size in bits if variable or element bit size if DUP/Array.
	sLen       dd ?					; Number of elements if Array/DUP or BSS count.
	int8Value  db ?					; Actual integral value of symbol (8-bit).
	int16Value dw ?					; Actual integral value of symbol (16-bit).
	int32Value dd ?					; Actual integral value of symbol (32-bit).
	int64Value dq ?					; Actual integral value of symbol (64-bit).
	fValue32   REAL4 0.0			; Floating point value.
	fValue64   REAL8 0.0			; Floating point value.
	fValue80   REAL10 0.0			; Floating point value.
	symName    db 64 DUP (?)
	isDeclared db ?
	isDefined  db ?
	isExtern   db ?
	isType     db ?					; Is this symbol a type reference? (IE: struct or DWORD etc).
	usage      dd ?					; Number of times this symbol has been referenced via offset,addr,lea,call,invoke etc.
	argCount   db ?					; If Proc or macro, how many args does it have?
	argTypes   db 64 DUP (?)		; Argument types.
	scopePtr   dd ?					; Pointer to a scope entry (namespace, local etc).
	prevPtr    dd ?
	nextPtr    dd ?					; Pointer to next symbol entry in linked list.
SYMBOL ends
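
As a rough cross-check of the sizes discussed above, here is a minimal Python sketch that adds up the fields of this struct. It assumes byte packing with no alignment padding (the post does not say which alignment is used), and the names SYMBOL_FORMAT and table_bytes are made up for the illustration.

```python
import struct

# Field layout copied from the SYMBOL struct above.
# Assumption: byte packing, no alignment padding ("<" disables padding).
# REAL10 is modelled as 10 raw bytes; Python has no 80-bit float code.
SYMBOL_FORMAT = "<" + "".join([
    "7I",    # hash, symType, filePtr, line, address, sectionDef, segmentDef
    "B",     # sSize
    "I",     # sLen
    "b",     # 8-bit integral value
    "h",     # 16-bit integral value
    "i",     # 32-bit integral value
    "q",     # 64-bit integral value
    "f",     # fValue32
    "d",     # fValue64
    "10s",   # fValue80
    "64s",   # symName
    "4B",    # isDeclared, isDefined, isExtern, isType
    "I",     # usage
    "B",     # argCount
    "64s",   # argTypes
    "3I",    # scopePtr, prevPtr, nextPtr
])

SYMBOL_SIZE = struct.calcsize(SYMBOL_FORMAT)


def table_bytes(entries, entry_size=SYMBOL_SIZE):
    """Total bytes for a table of fixed-size entries."""
    return entries * entry_size


print(SYMBOL_SIZE)              # 219
print(table_bytes(60_000))      # 13140000
print(150_000_000 // 60_000)    # 2500 (the per-entry figure from the question)
```

Under that packing assumption, the struct as posted comes to 219 bytes per entry, so 60,000 entries would take roughly 13 MB.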

That's tentative, it will change no doubt.
I think I've solved the issue by working backwards. I add symbols as "UNKNOWN" on reference, then on the next pass the declaration is used to fill in the symbol information. This means ONLY symbols that are actually used get created, which makes it a lot faster.
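
The add-on-reference idea can be sketched roughly as follows. This is an illustrative Python model, not the actual assembler code: the SymbolTable class, its method names and the dictionary fields are all assumptions, and only the STYPE_* values come from the listing above.

```python
STYPE_EQUATE = 6
STYPE_UNKNOWN = 18


class SymbolTable:
    """Hypothetical table that only creates entries for referenced names."""

    def __init__(self):
        self.symbols = {}

    def reference(self, name):
        # First reference creates a placeholder entry of type UNKNOWN.
        if name not in self.symbols:
            self.symbols[name] = {"type": STYPE_UNKNOWN, "defined": False, "line": None}
        return self.symbols[name]

    def declare(self, name, sym_type, line):
        # Declarations of names nobody has referenced yet are skipped.
        sym = self.symbols.get(name)
        if sym is None:
            return False
        sym.update(type=sym_type, defined=True, line=line)
        return True


# Two declarations (as an include file would contain), one reference.
source = [
    ("declare", "WM_PAINT", STYPE_EQUATE),
    ("declare", "WM_CLOSE", STYPE_EQUATE),
    ("reference", "WM_PAINT", None),
]

table = SymbolTable()
for _pass in range(2):
    for line_no, (kind, name, sym_type) in enumerate(source, start=1):
        if kind == "declare":
            table.declare(name, sym_type, line_no)
        else:
            table.reference(name)

print(sorted(table.symbols))             # ['WM_PAINT']
print(table.symbols["WM_PAINT"]["type"])  # 6
```

In this model the unused WM_CLOSE never gets an entry, and WM_PAINT is UNKNOWN after the first pass and filled in from its declaration on the second.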

I've added the first recursive stage of the parser for blocks/nestings and macros (anything that can be terminated by ENDM: rept, repeat, irp, irpc etc.).
I'm making a few minor adjustments now for the multi-pass, fixing a bug in expression evaluation and having it output some more debug info.
In theory I can then send out another update that should fully handle ORG directives, including symbol references and expressions, e.g. org 20+(-2)+myVariable AND -4
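
To make that ORG expression concrete, here is a small Python sketch of how it would evaluate, assuming MASM-style precedence in which AND binds more loosely than binary +. The function name org_target and the sample value of myVariable are made up for the illustration.

```python
def org_target(my_variable):
    """Evaluate 20+(-2)+myVariable AND -4 with AND applied last."""
    # -4 has its two lowest bits clear, so the AND rounds the sum
    # down to a multiple of 4.
    return (20 + (-2) + my_variable) & -4


print(org_target(7))   # 18 + 7 = 25, 25 AND -4 = 24
print(org_target(0))   # 18 AND -4 = 16
```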

Has there been an official statement on JWasm? I know that 2.07 was supposed to come out at the beginning of the year and that hasn't happened yet...

johnsa · May 21, 2012, 03:37:47 PM

Next update...

Lots of pain ...
Many bugs, much re-factoring, unit testing and regression testing. Changes to lexer and parser.

I've decided the smart thing to do is to group instructions according to shared parser rules. My thinking is that there are a number of instructions which can be handled by exactly the same rule set.
E.g. one group for all instructions that take NO parameters, another group for instructions that take one parameter that is a memory address, and so on.
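
A minimal Python sketch of that grouping idea follows. The group names, the mnemonics chosen for each group and the bracket-based memory-operand check are all simplifying assumptions for illustration, not the real rule set.

```python
GROUP_NO_OPERANDS = "no_operands"
GROUP_ONE_MEMORY = "one_memory_operand"

# Many mnemonics map onto a handful of shared rule groups.
INSTRUCTION_GROUPS = {
    "nop": GROUP_NO_OPERANDS,
    "cld": GROUP_NO_OPERANDS,
    "hlt": GROUP_NO_OPERANDS,
    "cpuid": GROUP_NO_OPERANDS,
    "lgdt": GROUP_ONE_MEMORY,
    "sidt": GROUP_ONE_MEMORY,
    "clflush": GROUP_ONE_MEMORY,
}


def rule_no_operands(operands):
    return len(operands) == 0


def rule_one_memory(operands):
    # Simplification: a memory operand is anything written as [ ... ].
    return (len(operands) == 1
            and operands[0].startswith("[")
            and operands[0].endswith("]"))


RULES = {
    GROUP_NO_OPERANDS: rule_no_operands,
    GROUP_ONE_MEMORY: rule_one_memory,
}


def check(mnemonic, operands):
    """Validate operands using the rule shared by the mnemonic's group."""
    group = INSTRUCTION_GROUPS[mnemonic.lower()]
    return RULES[group](operands)


print(check("nop", []))               # True
print(check("lgdt", ["[gdt_ptr]"]))   # True
print(check("cld", ["eax"]))          # False
```

Adding a new instruction to an existing group is then a one-line table entry rather than a new parser rule.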

I realized my idea with multiple passes was still flawed: you DO need as many passes as it takes to solve the problem, but my approach to resolving the jumps will still work. I've since implemented FULL multi-pass support, so that after each pass it knows the state of forward references and symbol definition completeness.
This was necessary to solve things like this:

Code:

A equ B
B equ C
C equ D
D equ 2

While doing this I noticed that ML/ML64 handle a lot of things FAR better than JWasm. The above, for example, works in ML, but JWasm gives funny errors when these values are all forward references that are defined after use.
This also breaks JWasm:

Code:

A equ B
B equ A

Whereas ML and mine handle this as expected by hitting the maximum-pass warning.
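
Both EQU cases can be modelled with a short Python sketch. It is only an illustration of pass-until-resolved with a pass cap: the function resolve_equates and the MAX_PASSES value of 16 are assumptions, not what ML or this assembler actually use.

```python
MAX_PASSES = 16  # hypothetical cap, chosen only for this example


def resolve_equates(equates, max_passes=MAX_PASSES):
    """Resolve name -> int-or-name equates in source order, pass by pass.

    Returns (values, passes_used, fully_resolved).
    """
    values = {}
    for pass_no in range(1, max_passes + 1):
        for name, expr in equates.items():
            if name in values:
                continue
            if isinstance(expr, int):
                values[name] = expr            # literal value
            elif expr in values:
                values[name] = values[expr]    # target resolved earlier
        if len(values) == len(equates):
            return values, pass_no, True
    # Still unresolved after the cap: where a maximum-pass warning would go.
    return values, max_passes, False


chain = {"A": "B", "B": "C", "C": "D", "D": 2}
print(resolve_equates(chain))   # ({'D': 2, 'C': 2, 'B': 2, 'A': 2}, 4, True)

cycle = {"A": "B", "B": "A"}
print(resolve_equates(cycle))   # ({}, 16, False)
```

The chain needs one pass per level of forward reference, while the circular pair never makes progress and runs into the cap.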

My solution of only adding symbols to the symbol table on reference works nicely. I've fully implemented ORG, EQU, expression evaluation and a bunch of built-in pre-defined symbols for things like $, $$, true, false, null. I've tested some ORGs, forward references, the offset operator and more.

I've added a dump of the symbol table to a .sym file when you build in debug mode with binary output.

The debug mode execution will also demonstrate the parser deciding whether to evaluate through recursion or linear-state matching.

I'm starting to look at building up the line number info necessary for debug mode output. I'm not yet sure what COFF etc. requires for this; I'm assuming it needs a line number + address reference for every instruction, as well as a line number for symbol definitions (which is already stored in the symbol table). Any thoughts?
The one thing I do want to fix here over ML is that the source line number of the actual MACRO must be stored. It annoys me that when debugging you currently can't really step into a macro, and there's no reason why not: it should be much like a proc.
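
For reference on the line number question: the PE/COFF specification describes a 6-byte line number record made of a 4-byte field (an address when the line number is non-zero, a symbol table index when it is zero) followed by a 2-byte line number, and it marks these records as deprecated. The Python sketch below only shows that record layout; the helper name and the sample values are made up, and it says nothing about which debug format this assembler will end up emitting.

```python
import struct

# COFF line number record: 4-byte Type field + 2-byte Linenumber.
LINE_RECORD = struct.Struct("<IH")


def pack_line_number(address, line):
    """Pack one record mapping a code address to a source line.

    Assumes line is 1..65535; a line number of 0 would change the meaning
    of the first field to a symbol table index.
    """
    if not 0 < line <= 0xFFFF:
        raise ValueError("line number must fit in 16 bits and be non-zero")
    return LINE_RECORD.pack(address, line)


# Hypothetical (address, line) pairs for three instructions.
records = [pack_line_number(addr, line) for addr, line in [(0x0, 10), (0x5, 11), (0x7, 13)]]

print(LINE_RECORD.size)     # 6
print(records[1].hex())     # 050000000b00
```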

I've also used all the CPU manuals and the MASM manual to finish capturing every single instruction and directive into the lookups... that was painful.

Attached is the next update, including the usual release/debug versions with added info coming from the SYMBOL TABLE sub-system and EXPRESSION system. I've included a test file which has just about every possible expression I could come up with to test it.

Once I've solved the line number debug info and completed a few more opcode group rules, I should be able to start on the first simple OBJ generation with COFF that will actually link, run and debug properly.
(I will need some assistance or advice around what info needs to be captured to .xdata / .pdata etc).

It's going slower than I would've liked.. but at least it's going :) 8000 lines of code and counting...

BogdanOntanu · May 21, 2012, 08:16:37 PM

I think you should move the posts to the new forums :D

jj2007 · May 21, 2012, 10:42:43 PM

Quote from: johnsa on May 21, 2012, 03:37:47 PM
I'm starting to look at building up the line number info necessary for debug mode output. I'm not yet sure what COFF etc. requires for this; I'm assuming it needs a line number + address reference for every instruction, as well as a line number for symbol definitions (which is already stored in the symbol table). Any thoughts?
The one thing I do want to fix here over ML is that the source line number of the actual MACRO must be stored. It annoys me that when debugging you currently can't really step into a macro, and there's no reason why not: it should be much like a proc.

You probably saw the mapinfo:lines thread.

Good luck with your project :thumbu
