My Assembler Development Update

Started by johnsa, May 06, 2012, 07:01:31 PM


dedndave

not saying it is right or wrong or better
but, masm inserts space for the larger offset form, then reduces if it can
with older versions, you might have seen something like this in the disassembly

        jmp short SomeAddress
        nop(s)


but - that isn't what rubs me - lol
it can be frustrating when you want to do something like this
        ORG     (SomeLabel+3) AND -4
and the assembler spits out an error telling you that the operand must be a constant   :(
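
For readers unfamiliar with the idiom, (SomeLabel+3) AND -4 rounds an address up to the next multiple of 4. A minimal Python sketch of the same arithmetic follows; the function name align_up is made up for illustration and is not part of any assembler.

```python
def align_up(address, alignment=4):
    """Round address up to the next multiple of alignment (a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a power of two")
    # Same arithmetic as (SomeLabel+3) AND -4 when alignment is 4:
    # adding alignment-1 pushes the value past the boundary, and the
    # AND with -alignment clears the low bits.
    return (address + alignment - 1) & -alignment


print(hex(align_up(0x101)))  # one past a boundary rounds up to the next one
print(hex(align_up(0x104)))  # an already aligned address is unchanged
```

The assembler can only fold this into a constant if it knows the final value of SomeLabel at that point, which is why a label-dependent ORG is awkward for a multi-pass design.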

not as pertinent today as it was with 16-bit code
but, if you are writing boot sectors or - especially - ROMable code - it's a pain

i remember when i used to write BIOS's...
i would assemble, creating a MAP file with public symbols
getting the addresses of - say - 5 or 6 symbols
then adjusting EQUates in the program with those addresses
and assemble again - lol

jj2007

Quote from: johnsa on May 06, 2012, 07:01:31 PM
   Performance wise it's doing about 200,000 lines per second over two full passes. The assembler runs itself in 0.4 seconds, as opposed to jwasm's 0.2 on my machine.
   Most of this performance loss is in the lexer's directive/opcode/reg lookup functions, which are just table-scanning string comparisons.
   I have added in hashing and sorting, and once that's finished it should be back to about 500,000 lines per second.

I wonder where the physical limits are. I did some tests on my Celeron M (slow...) for 20*(Windows.inc+WinExtra.inc) and got this:
Reading + tokenising one million lines from disk:
130 ms in the 1st round, 145 ms in the 2nd

156 ms for finding 60 occurrences of WM_PAINT
163 ms for finding 35580 occurrences of EQU


It's SSE2, and the Instr algo is not the absolutely fastest but close. Of course, a lexer is still a different animal...

johnsa

Next Update:

1) Added H/W detection for CRC32 support.
2) Added Fallback ADLER32 for software checksum if no CRC32.
3) Some bug-fixes, refactoring and optimizations.
4) Added full multi-pass support
5) Improved source file management for multiple passes to simply reset each file instead of unload/reload.
6) Added symbol table and lookups.
7) Added two more productions to parser.
8) Updated output to only show on first pass.
9) Added full DB's and hashing system for Lexer lookups for opcodes, registers, directives and symbols.
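
For reference, a software Adler-32 like the fallback in item 2 fits in a few lines. This is a sketch in Python, not the assembler's own routine; the post does not say what the checksum is used for, so no usage is implied here. The result is cross-checked against zlib.adler32 from the Python standard library.

```python
import zlib

MOD_ADLER = 65521  # largest prime below 2**16


def adler32(data):
    """Plain software Adler-32 over a bytes object."""
    a, b = 1, 0
    for byte in data:
        a = (a + byte) % MOD_ADLER   # running sum of the bytes
        b = (b + a) % MOD_ADLER      # running sum of the running sums
    return (b << 16) | a


# Self-check against the standard library implementation.
assert adler32(b"mov eax,ebx") == zlib.adler32(b"mov eax,ebx")
```

One thing to keep in mind: the SSE4.2 CRC32 instruction and Adler-32 produce different values for the same input, so whatever consumes the checksum has to stick to one of them for the whole run.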

Notes:
For the opcode, register and directive tables I opted to make some more work for myself: instead of generating the tables sorted and pre-hashed offline, the actual table is stored in a readable, logical format in the code (IE: grouped by type, alphabetical, whatever), and the hash tables and lookups are built by the assembler on start. This means I can add directives, opcodes etc. without having to re-generate any sort of table outside of the main code.
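
A rough sketch of that idea in Python, purely for illustration: the keyword list stays in a readable, grouped form, and an open-addressing hash table is built from it at startup. The table contents, the FNV-1a hash and the slot count are all assumptions; the post does not say which hash function or table layout the assembler uses.

```python
# Readable source table, grouped by kind (contents are illustrative only).
KEYWORDS = [
    ("directive", [".code", ".data", ".const", "db", "dd", "org"]),
    ("opcode", ["mov", "nop", "ret", "jmp"]),
    ("register", ["eax", "ebx", "ecx", "edx", "al"]),
]


def fnv1a(text):
    """32-bit FNV-1a hash of the lower-cased text (stand-in hash function)."""
    h = 0x811C9DC5
    for byte in text.lower().encode("utf-8"):
        h ^= byte
        h = (h * 0x01000193) & 0xFFFFFFFF
    return h


def build_table(groups, size=64):
    """Build the lookup table once at startup; size must exceed the entry count."""
    slots = [None] * size
    for kind, names in groups:
        for name in names:
            i = fnv1a(name) % size
            while slots[i] is not None:      # linear probing on collision
                i = (i + 1) % size
            slots[i] = (name, kind)
    return slots


def lookup(slots, text):
    """Return the kind of a keyword, or None if text is not a keyword."""
    key = text.lower()
    i = fnv1a(key) % len(slots)
    while slots[i] is not None:
        if slots[i][0] == key:
            return slots[i][1]
        i = (i + 1) % len(slots)
    return None


TABLE = build_table(KEYWORDS)
print(lookup(TABLE, "MOV"), lookup(TABLE, "eax"), lookup(TABLE, "foo"))
```

Adding a directive then means adding one string to KEYWORDS; nothing has to be regenerated offline.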

Updated Performance:
Now that the lexer, hashing and file sub-system are optimized, the assembler runs itself in 78ms for me. This includes all 3 full passes (I'm assuming for now my 3 pass idea will hold). This currently equates to about 1,800,000 lines per second lexed and parsed, with lookups, register values, numerical value calc, symbol table lookup etc.

Update attached.

habran

 :U you are really amazing
If you continue to work like that, in just a few weeks we will be able to use it for work

keep up the excellent work


johnsa

Thanks :)

Luckily I think I spent so much time thinking about it and not coding it, that now I'm coding it's going quite quickly. I'm sure I'll be calling on you all to help test and provide some needed insight!

I think the biggest piece of the work is unfortunately still to come in the form of all 400-500 parser productions with conditions for state, current bit mode, pass no. etc

I am going to try my best to have it generating a BIN file in the next 2-3 days of a simple program



.686p
.mmx
.k3d
.xmm

.const

.data?

.data

myVariable db 10
MyVariable2 dd 20
AnotherOne REAL4 2.5

.code

start:
   mov eax,ebx
   mov ecx,0x20
   mov edx,10
   mov eax,32h
   mov al,10101010b
   nop
   ret

end start


As soon as that works as a bin, I'll finish the MOV opcode completely with memory addressing modes. That should be another 2-3 days.. Call it a week. Then labels, some basic jmps, optimization pass... another 5 days, then I'll try to get a real OBJ file out of it. Not sure how long that will take, I'll give myself a week.
My personal deadline/objective is to have a working asm by end May (obviously far from complete still in terms of all the opcodes/macros/procs), but working in essence.. able to generate an OBJ from the above simple opcodes, optimized and be linkable by LINK with symbolic debug info in 64bit.

Please let me know if you find any bugs or issues along the way and I'll factor that in too.

Cheers!
John

johnsa

Next update:

1. Added some more functionality to my global state, including tracking warnings and errors + counts.
2. Started re-factoring the error system to allow it to accumulate all errors on pass 1 then display. (Almost finished this). At present it just terminates on error.
3. Update the output a bit to include the above.
4. Added the following parser productions (.386 - .686p, .code, .data, .const, .data?, .pdata, .xdata, .mmx, .k3d, .xmm, .32bits, .16bits, .64bits...).
5. 50% implemented the section and segment manager.
6. Added parser validation of entry point and END directive.
7. Added nop, ret, mov r32,r32 opcode productions.
8. Added support to declare a variable with DB... will continue rolling out all the other data types this week.
9. Started implementing basic BIN file... if you do xasm test.asm -b you'll get a .BIN output now.. format is very simple DWORD(Length of section),DATA.... Suggestions here?
10. Fixed a command line arg handling bug.
11. 80% complete on the symbol table implementation (just waiting on the number converters).
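
The BIN layout described in item 9, DWORD(length of section) followed by the section data, can be sketched like this in Python. Little-endian byte order and one length prefix per section are assumptions; the post only gives the one-line description.

```python
import struct


def pack_sections(sections):
    """Write each section as a 32-bit length followed by its raw bytes."""
    out = bytearray()
    for data in sections:
        out += struct.pack("<I", len(data))  # assumed little-endian DWORD
        out += data
    return bytes(out)


def unpack_sections(blob):
    """Inverse of pack_sections, useful for checking the output file."""
    sections, pos = [], 0
    while pos < len(blob):
        (length,) = struct.unpack_from("<I", blob, pos)
        pos += 4
        sections.append(blob[pos:pos + length])
        pos += length
    return sections


blob = pack_sections([b"\x0a", b"\x90\xc3"])
print(blob.hex())
```

A file in this shape cannot be loaded and run directly, because the first bytes are a length field and not code.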

Tonight I need to finish the numerical converters so that I can update the symbol table with those entries and have them write out too into the BIN file (IE: the data section).

jj2007

Quote from: johnsa on May 10, 2012, 09:59:26 AM
2. Started re-factoring the error system to allow it to accumulate all errors on pass 1 then display.

Older versions of Masm display them one by one, newer versions (and JWasm) "en bloc". Personally I prefer the first variant... just a thought.

:U

johnsa

Ok, maybe I can make that a command-line option then..

-e Terminate on Error. or something like that.

dedndave

i think masm stops after it has shown you 100 errors - lol
generally, if you have one problem that creates several errors, the first error listed is the one that will find it for you
i would say 20 or 30 errors is plenty   :P

jj2007

I just realised I meant the progress messages when building a library. Recent ML and JWasm remain silent until the whole library is built, then they dump a long list on you.
But regarding ordinary error messages, I agree with Dave that 20 are enough. If you want a really intelligent solution, suppress repeated "undefined symbol" stuff - once is enough.
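
The two suggestions, a cap of about 20 messages and reporting each undefined symbol only once, could be sketched as follows. This is Python with made-up names (ErrorLog, report); it is not how MASM, JWasm or the assembler in this thread implement it.

```python
class ErrorLog:
    """Collects error messages, capped, with repeated undefined symbols dropped."""

    def __init__(self, limit=20):
        self.limit = limit
        self.messages = []
        self.truncated = False
        self._undefined_seen = set()

    def report(self, line, message, undefined_symbol=None):
        """Record one error; return True if it was kept."""
        if undefined_symbol is not None:
            if undefined_symbol in self._undefined_seen:
                return False                 # once per symbol is enough
            self._undefined_seen.add(undefined_symbol)
        if len(self.messages) >= self.limit:
            self.truncated = True            # further errors are only counted as "more"
            return False
        self.messages.append(f"line {line}: {message}")
        return True


log = ErrorLog(limit=20)
log.report(10, "undefined symbol: MyVar", undefined_symbol="MyVar")
log.report(42, "undefined symbol: MyVar", undefined_symbol="MyVar")
print(log.messages)
```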

mineiro

Nice job Sr johnsa
Quote from: johnsa
9. Started implementing basic BIN file... if you do xasm test.asm -b you'll get a .BIN output now.. format is very simple DWORD(Length of section),DATA.... Suggestions here?

From my point of view, BIN files are a raw output, a mix between data and code, but first comes code, after comes data. Bin files are like .com files, .rom files, I think the only difference between these extensions is the place where they are loaded in memory.
.386
.code
.16bits
start:
org 100h   ;com file, one segment to all, data or code,cs=ds=es=ss
nop
ret
Variable1 db 90
end start

The generated .bin file puts the data variable first, before the code. With this, I cannot rename .bin to .com and run it.
My suggestion is to assume that data can be inside code, and/or vice-versa. Bin files do not have a format, so you can remove the length of section, or turn this into an option.

johnsa

Hey,

Ok done a bit more work.. been going a bit slowly at the moment. I re-factored some things in the lexer to ensure that $,$$ are not operators but identifiers. AND OR NOT XOR POW EXP SHL SHR SIN COS TAN are converted into operators.

I updated the BIN file output to be just that.. a dump of the sections as they are in order. So in a few basic tests that worked ok and generated the same 3 byte output as jwasm for the example mineiro posted.
I did a bit more work on the symbol table and added a few things as pre-defined symbols on init, like $,$$.
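
For the example mineiro posted, the flat output works out as below. This Python sketch only concatenates already-encoded bytes in source order; it is not an encoder. The encodings used are the standard ones: nop is 90h, near ret is C3h, and db 90 is the byte 5Ah.

```python
def flat_bin(items):
    """Concatenate emitted bytes in source order, with no header or length field."""
    return b"".join(data for _label, data in items)


# mineiro's example: nop, ret, then Variable1 db 90, all in one segment.
image = flat_bin([
    ("nop", bytes([0x90])),
    ("ret", bytes([0xC3])),
    ("Variable1", bytes([90])),
])
print(image.hex())
```

Note that org 100h changes the addresses the assembler assigns but adds no padding to the file, which is consistent with the 3-byte output.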

I changed things around in terms of handling $ and $$ to link them to actual segment/section entries and maintain an actual set of tokens to represent these values.

The main reason for that is things like the expression evaluator expect tokens, not just simple numbers but identifiers/symbols so the expression evaluator can now grab these in the right form.
The expression system is almost right, it was a bit fiddly doing the infix->postfix and evaluation, handling symbol lookups as well as negative numbers.
I decided to update the lexer to NOT take something like -5 as a number, but rather as an operator (-) and a number (5).
This should allow simple expressions like
myVar db -5
to be handled by the expression evaluator, as it automatically inserts a zero token when the stack doesn't balance.. so -5 is actually 0-5. If that makes sense.
As well as handling more complex ones like 2*4-5+(-5)/10*SIN(10)
I still need to put in automatic promotion so that the result will convert to the right type.. IE: above SIN(10) would force the expression to require a float or better.
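
A compact sketch of the infix->postfix approach with the zero-token trick, in Python for illustration. It uses the standard shunting-yard conversion and is not the assembler's code. One deliberate difference, stated as an assumption: the zero is inserted during conversion, when a minus appears at the start or right after an opening parenthesis, instead of at evaluation time. A minus directly after another operator (2*-5) is not handled.

```python
import math
import operator
import re

OPS = {"+": (1, operator.add), "-": (1, operator.sub),
       "*": (2, operator.mul), "/": (2, operator.truediv)}
FUNCTIONS = {"SIN": math.sin, "COS": math.cos, "TAN": math.tan}
TOKEN_RE = re.compile(r"\s*(\d+\.\d+|\d+|[A-Za-z_]\w*|[-+*/()])")


def tokenize(text):
    """Split into numbers, identifiers, operators and parentheses; -5 is '-' and '5'."""
    text, tokens, pos = text.rstrip(), [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if not match:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def to_postfix(tokens):
    """Shunting-yard conversion; all binary operators are left-associative."""
    output, stack, prev = [], [], None
    for tok in tokens:
        if tok[0].isdigit():
            output.append(float(tok))
        elif tok.upper() in FUNCTIONS:
            stack.append(tok.upper())
        elif tok in OPS:
            if tok == "-" and prev in (None, "("):
                output.append(0.0)           # zero token: -5 becomes 0-5
            while stack and stack[-1] in OPS and OPS[stack[-1]][0] >= OPS[tok][0]:
                output.append(stack.pop())
            stack.append(tok)
        elif tok == "(":
            stack.append(tok)
        elif tok == ")":
            while stack and stack[-1] != "(":
                output.append(stack.pop())
            if not stack:
                raise ValueError("unbalanced parenthesis")
            stack.pop()                      # discard the "("
            if stack and stack[-1] in FUNCTIONS:
                output.append(stack.pop())   # the call this parenthesis belonged to
        else:
            raise ValueError(f"unknown identifier {tok}")
        prev = tok
    while stack:
        if stack[-1] == "(":
            raise ValueError("unbalanced parenthesis")
        output.append(stack.pop())
    return output


def evaluate(text):
    """Evaluate an expression string; everything is computed as a float here."""
    stack = []
    for item in to_postfix(tokenize(text)):
        if isinstance(item, float):
            stack.append(item)
        elif item in FUNCTIONS:
            stack.append(FUNCTIONS[item](stack.pop()))
        else:
            if len(stack) < 2:
                raise ValueError("operator is missing an operand")
            right, left = stack.pop(), stack.pop()
            stack.append(OPS[item][1](left, right))
    return stack[0]


print(evaluate("-5"))
print(evaluate("2*4-5+(-5)/10*SIN(10)"))
```

The sketch computes everything as a float, which sidesteps the promotion question; a real evaluator would track whether any operand or function forced a floating-point result.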

So now I've run into a small issue/question which maybe you guys can help with.
Once the symbol table starts running now and all identifiers/equ's etc start adding.. by the time I get past windows/winextra.inc I'm already sitting on about 60,000 entries and the table grows to a whopping 150Mb.
At which point VirtualAlloc refuses to allocate any more :) I'm storing the symbol entries one by one in a linked list, so I could change that around and have it allocate in blocks which might stop VirtualAlloc from bombing
and would probably speed things up .. IE: initialize storage for blocks of 1024 symbols at a time.. but it still doesn't change the fact that this is going to run up several hundred Mb of memory to assemble a project just
from what's defined in the standard includes.
I'm not sure if I should be handling EQU's a different way, but even if I stored them in a lighter weight setup, they'd still be using a lot of memory due to the sheer number of them.
My symbol entry (structure) is quite dense as I've put everything in there I think might be useful to allow it to handle struct,macro,proc,arguments,types,dup arrays etc.
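
Some back-of-the-envelope arithmetic on why allocating one symbol at a time can hit a wall early. This Python sketch rests on assumptions the post does not confirm: that every symbol entry is a separate VirtualAlloc call, that the process has the default 2 GB of user address space of a 32-bit Windows process, and the usual x86 values of 4 KB pages and 64 KB allocation granularity. It may or may not be what is happening here.

```python
PAGE_SIZE = 4 * 1024                  # x86 page size
ALLOCATION_GRANULARITY = 64 * 1024    # usual VirtualAlloc reservation granularity
USER_ADDRESS_SPACE = 2 * 1024 ** 3    # assumption: default 32-bit process


def max_separate_allocations():
    """Upper bound on separate reservations if each one takes a 64 KB slot."""
    return USER_ADDRESS_SPACE // ALLOCATION_GRANULARITY


def committed_bytes(allocations, entry_size):
    """Memory actually committed: each allocation is rounded up to whole pages."""
    pages = -(-entry_size // PAGE_SIZE)   # ceiling division
    return allocations * pages * PAGE_SIZE


def blocks_needed(symbols, per_block=1024):
    """Number of allocations when symbols are grouped into fixed-size blocks."""
    return -(-symbols // per_block)


print(max_separate_allocations())
print(committed_bytes(max_separate_allocations(), 256) // 1024 ** 2, "MB committed")
print(blocks_needed(60000), "allocations with 1024 symbols per block")
```

Under these assumptions the address space runs out after a few tens of thousands of allocations, long before physical memory does, and block allocation cuts the number of calls to a few dozen. It does not shrink the per-entry size itself.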

Any thoughts?
John

dedndave

that's a good problem   :P
not that 150 Mb is really all that much, in the grander scheme of things
the bugger is - a lot of those symbols won't be used
and, it's hard to predict which will be needed and which will not
it's not as though you could load them on an as-needed basis

this is especially true with the masm32 package
nearly all of the equates are in windows.inc/winextra.inc
typically - the equates that belong to, say, advapi32 are in advapi32.inc - not windows.inc   :'(

one solution that comes to mind is to perhaps use a temporary file for the "windows" symbol tokens,
and keep the "project" symbol tokens in memory
if the source code accesses a symbol, the token is moved from the temporary file into memory

a little bit of memory manager strategy code is going to be needed
you might look at JwAsm, to get some ideas

johnsa

That's not a bad option.. But how would you determine what's project specific and what is system as it's all just includes.. without pulling some sort of hack, like a precompiled header of sorts specifically for windows.inc

the other option I guess would be to add things to the symbol table on use, not on declaration.
We could do this for all things, or at least just for equates. So on the first pass, we detect a reference to a symbol, we don't find it we create it as unknown and then only on the second pass would we actually update the symbol entry with the values from the
EQU. The only problem I can see with this approach is things like textequ .. if we create a dummy symbol on use, a text substitution for example could cause some havoc if we had it as an empty string..

Seeing as you can do var equ <text here> too.. makes it hard to limit as it's not just textequ.

The only alternative I can see to that, is that we create something like a GUID to fill the text portion of the symbol if it's a literal or quoted literal.

This way only equates and variables that are actually used are pulled into the symbol table.
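
A toy version of the add-on-use idea in Python, to make the two passes concrete. The line parsing is deliberately naive and the function names are made up. It ignores text equates and equates that refer to other equates, which are exactly the hard cases mentioned above.

```python
import re

IDENT_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def split_equ(line):
    """Return (name, value) for 'NAME EQU value' lines, else None."""
    parts = line.split()
    if len(parts) >= 3 and parts[1].upper() == "EQU":
        return parts[0], " ".join(parts[2:])
    return None


def pass1_collect_uses(lines):
    """Pass 1: remember every identifier that appears outside an EQU definition."""
    used = set()
    for line in lines:
        if split_equ(line) is None:
            used.update(IDENT_RE.findall(line))
    return used


def pass2_build_table(lines, used):
    """Pass 2: store only the equates that pass 1 saw being referenced."""
    table = {}
    for line in lines:
        equ = split_equ(line)
        if equ is not None and equ[0] in used:
            table[equ[0]] = equ[1]
    return table


source = [
    "WM_PAINT equ 0Fh",
    "WM_CLOSE equ 10h",
    "cmp eax, WM_PAINT",
]
symbols = pass2_build_table(source, pass1_collect_uses(source))
print(symbols)
```

Here WM_CLOSE never enters the table because nothing references it.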

Maybe Bogdan can shed some light on how he handled this.