make assembler (ml.exe and link.exe and rc.exe)

Started by elmo, May 13, 2011, 04:39:38 AM


elmo

I have a school project to make a compiler, and no one at my school (including the lecturer) has experience in making a compiler.  :(

I think MASM's "compiler" consists of ml.exe, link.exe and rc.exe:
ml.exe => produces file.obj
link.exe => produces file.exe
rc.exe => compiles resources (such as image files) so they can be bound into our application


When I disassemble ml.exe, link.exe and rc.exe,
it shows that ml.exe uses only the following API functions:

00042000  2BF VirtualFree
00042004  19F HeapFree
00042008   7D ExitProcess
0004200C  29E TerminateProcess
00042010   F7 GetCurrentProcess
00042014  170 GetTimeZoneInformation
00042018  15D GetSystemTime
0004201C  11B GetLocalTime
00042020  11A GetLastError
00042024  26A SetFilePointer
00042028  2DF WriteFile
0004202C  218 ReadFile
00042030   1B CloseHandle
00042034  115 GetFileType
00042038   34 CreateFileA
0004203C   F5 GetCurrentDirectoryA
00042040  116 GetFullPathNameA
00042044  104 GetDriveTypeA
00042048   57 DeleteFileA
0004204C  10B GetExitCodeProcess
00042050  2CE WaitForSingleObject
00042054   44 CreateProcessA
00042058  1E4 MultiByteToWideChar
0004205C  22F RtlUnwind
00042060  1A3 HeapSize
00042064  199 HeapAlloc
00042068  1A2 HeapReAlloc
0004206C  124 GetModuleFileNameA
00042070   CA GetCommandLineA
00042074  174 GetVersion
00042078  19D HeapDestroy
0004207C  19B HeapCreate
00042080  241 SetConsoleCtrlHandler
00042084  2BB VirtualAlloc
00042088  27C SetStdHandle
0004208C  26D SetHandleCount
00042090  152 GetStdHandle
00042094  150 GetStartupInfoA
00042098  261 SetEndOfFile
0004209C  2D2 WideCharToMultiByte
000420A0  2AD UnhandledExceptionFilter
000420A4  263 SetEnvironmentVariableW
000420A8  262 SetEnvironmentVariableA
000420AC  1BF LCMapStringA
000420B0  1C0 LCMapStringW
000420B4   BF GetCPInfo
000420B8   B9 GetACP
000420BC  131 GetOEMCP
000420C0   9D FindNextFileA
000420C4   94 FindFirstFileA
000420C8   90 FindClose
000420CC   B2 FreeEnvironmentStringsA
000420D0   B3 FreeEnvironmentStringsW
000420D4  106 GetEnvironmentStrings
000420D8  108 GetEnvironmentStringsW
000420DC  153 GetStringTypeA
000420E0  156 GetStringTypeW
000420E4   21 CompareStringA
000420E8   22 CompareStringW
000420EC   AA FlushFileBuffers
000420F0  13E GetProcAddress
000420F4  1C2 LoadLibraryA
000420F8  10D GetFileAttributesA







and link.exe uses only the following API functions:


Imp Addr Hint Import Name from mspdb50.dll - Not Bound
-------- ---- ---------------------------------------------------------------
00001260   5D PDBOpenValidate
00001264   76 TruncStFromSz
00001268   67 SigForPbCb

Imp Addr Hint Import Name from MSVCRT.dll - Not Bound
-------- ---- ---------------------------------------------------------------
0000109C  1B5 _spawnv
000010A0   75 __p__pgmptr
000010A4  2A8 remove
000010A8   87 __unDName
000010AC  291 malloc
000010B0  2C0 strncmp
000010B4  290 longjmp
000010B8   D5 _fcloseall
000010BC  29F putc
000010C0  2DB vprintf
000010C4  25D fread
000010C8  262 fseek
000010CC  264 ftell
000010D0  268 getc
000010D4  2AA rewind
000010D8  266 fwrite
000010DC  1B7 _spawnvp
000010E0  148 _makepath
000010E4   ED _fsopen
000010E8  1A8 _setjmp3
000010EC  1A3 _seh_longjmp_unwind
000010F0  26A getenv
000010F4   F2 _fullpath
000010F8  1C1 _stricmp
000010FC  12E _ismbcspace
00001100  252 fgets
00001104  17D _mbstok
00001108  1D9 _ultoa
0000110C  177 _mbsrchr
00001110  1D6 _tzset
00001114    F ??2@YAPAXI@Z
00001118  2C3 strrchr
0000111C   49 __CxxFrameHandler
00001120  14F _mbclen
00001124  2D4 toupper
00001128  2D0 time
0000112C  24C fclose
00001130  1D1 _tempnam
00001134  15F _mbsicmp
00001138  258 fprintf
0000113C  277 isprint
00001140  259 fputc
00001144  249 exit
00001148  2A1 puts
0000114C  246 ctime
00001150  25A fputs
00001154  2A7 realloc
00001158  2C7 strtok
0000115C  1BF _strdup
00001160  2BB strcspn
00001164  159 _mbscmp
00001168  165 _mbsnbcmp
0000116C  23D atoi
00001170  2A9 rename
00001174  183 _mktemp
00001178  2C5 strstr
0000117C   DC _filelength
00001180  168 _mbsnbcpy
00001184  15D _mbsdec
00001188  1A2 _searchenv
0000118C  169 _mbsnbicmp
00001190  16E _mbsncmp
00001194  176 _mbspbrk
00001198   55 __dllonexit
0000119C  186 _onexit
000011A0   D3 _exit
000011A4   48 _XcptFilter
000011A8   64 __p___initenv
000011AC   58 __getmainargs
000011B0  10F _initterm
000011B4   83 __setusermatherr
000011B8   9D _adjust_fdiv
000011BC   6A __p__commode
000011C0   6F __p__fmode
000011C4   81 __set_app_type
000011C8   B7 _controlfp
000011CC  158 _mbschr
000011D0  2A4 qsort
000011D4  192 _purecall
000011D8   10 ??3@YAXPAX@Z
000011DC   CA _except_handler3
000011E0  113 _iob
000011E4  2AD setlocale
000011E8  273 isdigit
000011EC  298 memmove
000011F0  1B9 _splitpath
000011F4  1CB _strupr
000011F8  1C5 _strnicmp
000011FC  2AE setvbuf
00001200  29E printf
00001204  24F fflush
00001208  134 _itoa
0000120C  2B2 sprintf
00001210  2B5 sscanf
00001214  2B7 strchr
00001218  217 _write
0000121C  144 _lseek
00001220  198 _read
00001224  1CF _tell
00001228   B3 _close
0000122C  1B0 _sopen
00001230   56 __doserrno
00001234  1BA _stat
00001238   8E _access
0000123C  1DD _unlink
00001240   B1 _chsize
00001244  2C1 strncpy
00001248  25E free
0000124C  240 calloc
00001250  270 isalnum
00001254  2BF strncat
00001258  23F bsearch

Imp Addr Hint Import Name from KERNEL32.dll - Not Bound
-------- ---- ---------------------------------------------------------------
00001000   A2 FreeLibrary
00001004  1F0 RaiseException
00001008  1AF LocalAlloc
0000100C   88 FindClose
00001010   8C FindFirstFileA
00001014   EC GetDiskFreeSpaceA
00001018  295 VirtualAlloc
0000101C  146 GetSystemInfo
00001020  174 GlobalMemoryStatus
00001024  243 SetEnvironmentVariableA
00001028   B6 GetCommandLineA
0000102C   E3 GetCurrentProcess
00001030  158 GetTickCount
00001034  15F GetVersion
00001038   26 CopyFileA
0000103C  216 SearchPathA
00001040  150 GetTempPathA
00001044  14E GetTempFileNameA
00001048  1A9 LoadLibraryA
0000104C  299 VirtualFree
00001050  245 SetErrorMode
00001054  222 SetConsoleCtrlHandler
00001058  273 Sleep
0000105C   54 DeleteFileA
00001060   75 ExitProcess
00001064   33 CreateFileMappingA
00001068  1BE MapViewOfFileEx
0000106C  28B UnmapViewOfFile
00001070  24B SetFilePointer
00001074  242 SetEndOfFile
00001078   19 CloseHandle
0000107C   32 CreateFileA
00001080  105 GetLastError
00001084   FE GetFileSize
00001088  111 GetModuleHandleA
0000108C  129 GetProcAddress
00001090   91 FindNextFileA
00001094  195 InterlockedExchange


Imp Addr Hint Import Name from MSDIS109.DLL - (Delayed)
-------- ---- ---------------------------------------------------------------
000694E8    0 ?PfncchfixupSet@DIS@@QAEP6GIPBV1@_KIPADIPA_K@ZP6GI01I2I3@Z@Z
000694EC    0 ?CchFormatInstr@DIS@@QBEIPADI@Z
000694F0    0 ?PdisNew@DIS@@SGPAV1@W4DIST@1@@Z
000694F4    0 ?CchFormatAddr@DIS@@QBEI_KPADI@Z
000694F8    0 ?Dist@DIS@@QBE?AW4DIST@1@XZ
000694FC    0 ?PvClient@DIS@@QBEPAXXZ
00069500    0 ?PfncchaddrSet@DIS@@QAEP6GIPBV1@_KPADIPA_K@ZP6GI012I3@Z@Z
00069504    0 ?PvClientSet@DIS@@QAEPAXPAX@Z

Imp Addr Hint Import Name from IMAGEHLP.dll - (Delayed)
-------- ---- ---------------------------------------------------------------
000694D4    0 MapFileAndCheckSumA
000694D8    0 CheckSumMappedFile
000694DC    0 ReBaseImage
000694E0    0 BindImageEx





and rc.exe uses only the following API functions:

Imp Addr Hint Import Name from MSVCRT.dll - Bound
-------- ---- ---------------------------------------------------------------
00001008  246 exit
0000100C  2B4 strchr
00001010  28E malloc
00001014   D0 _exit
00001018   63 __p___initenv
0000101C   58 __getmainargs
00001020  10C _initterm
00001024   48 _XcptFilter
00001028   82 __setusermatherr
0000102C   69 __p__commode
00001030  29B printf
00001034   80 __set_app_type
00001038   C7 _except_handler3
0000103C   B4 _controlfp
00001040   9B _adjust_fdiv
00001044   6E __p__fmode

Imp Addr Hint Import Name from KERNEL32.dll - Bound
-------- ---- ---------------------------------------------------------------
00001000  253 SetConsoleCtrlHandler

Imp Addr Hint Import Name from RCDLL.dll - Bound
-------- ---- ---------------------------------------------------------------
0000104C    0 Handler
00001050    1 RC





Look at ml.exe and link.exe:
both import LoadLibraryA.
Do you know which functions ml.exe and link.exe want to load through LoadLibrary?
Sorry for my bad English.
be the king of accounting programmer world!

dedndave

it's quite likely that one of these functions uses LoadLibrary
00001008  246 exit
0000100C  2B4 strchr
00001010  28E malloc
00001014   D0 _exit
00001018   63 __p___initenv
0000101C   58 __getmainargs
00001020  10C _initterm
00001024   48 _XcptFilter
00001028   82 __setusermatherr
0000102C   69 __p__commode
00001030  29B printf
00001034   80 __set_app_type
00001038   C7 _except_handler3
0000103C   B4 _controlfp
00001040   9B _adjust_fdiv
00001044   6E __p__fmode


at any rate, when the OS loads your EXE, it has to load the OS DLL's (like shell32, user32, etc)
i dunno if it uses LoadLibrary, but it would make sense if it did

elmo

hi dedndave, thanks. But I think it also calls StdOut to show something like this on the Command Prompt console:

Microsoft (R) Macro Assembler Version 6.14.8444
Copyright (C) Microsoft Corp 1981-1997.  All rights reserved.

Assembling: test.asm
MASM : fatal error A1000: cannot open file : test.asm




Maybe I can find out how to use all those APIs on MSDN.

But I can't find any info about how to use the following functions  :red
000694E8    0 ?PfncchfixupSet@DIS@@QAEP6GIPBV1@_KIPADIPA_K@ZP6GI01I2I3@Z@Z
000694EC    0 ?CchFormatInstr@DIS@@QBEIPADI@Z
000694F0    0 ?PdisNew@DIS@@SGPAV1@W4DIST@1@@Z
000694F4    0 ?CchFormatAddr@DIS@@QBEI_KPADI@Z
000694F8    0 ?Dist@DIS@@QBE?AW4DIST@1@XZ
000694FC    0 ?PvClient@DIS@@QBEPAXXZ
00069500    0 ?PfncchaddrSet@DIS@@QAEP6GIPBV1@_KPADIPA_K@ZP6GI012I3@Z@Z
00069504    0 ?PvClientSet@DIS@@QAEPAXPAX@Z

Maybe someone knows?

dedndave

i am guessing it's something to do with DOS emulation
you don't want to tear into the console too much - lol
a lot of effort for very little benefit

FORTRANS

Quote from: elmo on May 13, 2011, 04:39:38 AM
I have a school project to make a compiler, and no one at my school (including the lecturer) has experience in making a compiler.  :(

Hi,

   A real compiler?  A mockup?  What exactly is expected?  The
Dragon Book (Compilers: Principles, Techniques, and Tools) by
Aho et al. was the standard back when.

Quote
When I disassemble ml.exe, link.exe and rc.exe,
it shows that ml.exe uses only the following API functions:

   Um, all that is to read an input file, write an output file, and
associated data gathering.  It has little to do with parsing the
program or converting language elements to executable code.
I think you can be more productive trying to define a work flow
structure than disassembling an existing compiler.

   What language are you targeting?  What kind of executable
is expected to be generated?  How much time and resources
are you given?

Regards,

Steve

elmo

Quote
   A real compiler?  A mockup?  What exactly is expected?  The
Dragon Book (Compilers: Principles, Techniques, and Tools) by
Aho et al. was the standard back when.

Yes, my lecturer instructed me to make a compiler.
But he himself doesn't have experience in making a compiler.
I think that's insane!

If I can't, I will fail and must retake his course.  :(
I just want to know how to make a MASM-like compiler (ml.exe, link.exe, rc.exe).
Right now I only know which APIs ml.exe, link.exe and rc.exe use, from disassembling them,
but I don't know how to use all those APIs.

Is there any tutorial that makes this clear?

Sorry for my bad English.

dedndave

i doubt he expects a full-featured compiler or assembler   :P
probably something along the lines of the DEBUG program's assembly capability would satisfy the requirement

also, because you can create your own language, make it something easy to parse   :P
to satisfy the coursework, you could probably use 16-bit, if you are more familiar with it

FORTRANS

Quote from: elmo on May 14, 2011, 08:37:26 AM
Is there any tutorial that makes this clear?

Hi,

   Well, you can look at existing open source code for a compiler.
Not really a tutorial, but will show you the scope of the problem.
Assembler examples are NASM (The Netwide Assembler), FASM
(flat assembler), or WASM (Wasm - Open Watcom).

http://www.nasm.us/
http://flatassembler.net/index.php
http://www.openwatcom.org/index.php/Wasm

JWASM is a fork of WASM.  These are probably easier
to look at rather than disassembling MASM.

   Of course, you could try C instead of assembly.   There is
Small-C discussed in Usenet alt.lang.asm group.  One such
(of many) discussions:

http://groups.google.com/group/alt.os.development/msg/9624c78cf3314fa6?hl=en
http://groups.google.com/group/alt.os.development/browse_thread/thread/8f8f1bae0dcdec9/05a497d04b49f746?hl=en&q=small+c+compiler&lnk=nl&

They mention PCC, GCC, and other C implementations as well.
I don't really "do" C, but the discussions are sometimes fun to
follow.  And there are C Usenet groups as well.  They tend to
end up writing a C compiler in C.

   Forth, and its ilk, are probably the easiest languages to start
with.  But most of those are interpreters, not real compilers.
But if your instructor does not know the difference, you might
get away with it.  "Threaded Interpretive Languages" by R. G.
Loeliger ISBN 0-07-038360-X has a fairly complete FORTH like
language in Z80 assembly.  And there are/were a bunch with
source code out on the internet.

   I have a simple book, "Writing Interactive Compilers and
Interpreters", by P. J. Brown, ISBN 0 471 27609 X, ISBN 0471
10072 2 pbk.  He is aiming at beginners and walks through the
whole process of writing a BASIC compiler.  A bit more than
260 pages (paperback).  He has most of what you need to
write a compiler except for real executable code.  You should
be able to write a BASIC compiler in BASIC from this book.

   I still think you should get a better idea of what exactly is
required rather than "write a compiler".  That is fairly vague.
Talk to your instructor and find out what is a minimum acceptable
project.

HTH,

Steve N.

P.S.  Dave just posted some good advice.  Look at DEBUG.EXE
and its capabilities.

clive

Quote from: elmo on May 13, 2011, 04:39:38 AM
I have a school project to make a compiler, and no one at my school (including the lecturer) has experience in making a compiler.

Academics are such bloody idiots. Why would he ask you to do something he is incapable of demonstrating?

The "dragon" book by Aho, et al, is one of the classic texts.

Looking at SmallC or a simple assembler would be the way to go. Looking at the imports/exports of Microsoft tools probably gives very little insight into the internal mechanics.

For a more complete example, look at GNU/GCC and the binary tools (AS, LD, OBJDUMP, etc).
It could be a random act of randomness. Those happen a lot as well.

MichaelW

There is Paul Vojta's DOS Debug clone:

http://math.berkeley.edu/~vojta/

which Japheth has been developing in recent times:

http://www.japheth.de/debxxf.html


eschew obfuscation

elmo

thank you for all information  :U
I will learn it.

Quote from: clive on May 14, 2011, 04:00:53 PM
Academics are such bloody idiots. Why would he ask you to do something he is incapable of demonstrating?

I agree with you. that's insane!

vanjast

Academics get the students to do their dirty work, then they plonk their name in front of the student's if it's any good.
The poor student doesn't have much choice, and is only glad to get his marks and buzz off to the next stage!!  :'(

ixuta

     Your instructor asked you to make a compiler. First let's see what a compiler does. A compiler converts medium- or high-level languages into low-level languages, such as asm. When you are writing in C or Pascal or Modula-2, you are usually working in an IDE (integrated development environment). Inside the IDE things happen that are hidden from you. The IDE is a combination of an editor, a preprocessor, a compiler, an assembler, a linker and sometimes an extra step called exe2bin. Let's see what each one does. Exe2bin is an ancient tool used to convert .exe files into .com files. COM files are executables that fit in less than 64K of RAM; they existed back in the days of segmented memory models, when a .com lived in a single 64K segment. Let's just talk about today's .exe and leave .coms and segments out of it. In the IDE, when you click compile, the IDE calls the preprocessor, then the compiler, then the assembler, then the linker, which produces the final .exe. For example, let's say you are writing in C; here are the parts of your IDE.
The editor is where you open, write, edit, save and close your source code. An editor is not part of a compiler.
The preprocessor converts all your #define, #include, macros and other stuff into plain C code. Say you had written #include <stdio.h>; the preprocessor opens stdio.h and pastes it into your source code. If you wrote #define MyConstant 5, the preprocessor replaces all occurrences of MyConstant with 5. A "write a compiler" project does not need to include a preprocessor. A preprocessor is not part of a compiler.
The compiler converts medium- or high-level language source code into assembly language. It reads your .c file and outputs a .asm file. For example, if in C you wrote x=5; then the compiler reads that line of C source code and translates it into mov al,5; mov x,al;  If you wrote printf("I am %d years old",x); then the compiler writes mov ax,x; push ax; mov ax,<address of your string>; push ax; call printf; (C pushes its arguments from right to left, so x goes on the stack first). After the compiler has translated all of the C source code into assembler source code and written it out to a .asm file, the compiler's job is done.
The IDE then calls the assembler, such as ml.exe, to assemble the .asm file into machine code. The machine code will be written to an .obj file. If the .asm file says mov al,5 then the assembler will read that line, create the bytes B0 05 (binary 10110000 00000101) and write them out to an .obj file. An assembler is not part of a compiler.
The IDE then calls the linker. The linker takes all of the .obj files, such as your translated asm, and the .lib files, and combines them into a .exe. The linker resolves addresses such as <address of your string> into their actual offsets in the exe. For example, if the linker reads your .obj first and places it at offset 0 bytes from the beginning of the .exe, and <your string> starts 5 bytes from the beginning of that .obj, then the resolved address of <your string> is 0 plus 5 equals 5, so 5 is the operand of the machine code that corresponds to mov ax,<address of your string>. If the linker then reads string.lib into the exe at offset 120,000 bytes from the beginning of the exe, and the printf function is 128 bytes from the beginning of string.lib, then the resolved address of printf is 120,000 plus 128 equals 120,128. The linker combines (links) each obj and lib that make up your exe. After the linker has read in all the pieces, resolved all the addresses and written the exe, its job is done and the entire job of the IDE is done. A linker is not part of a compiler.
      Your instructor only asked you to write a compiler. As you can see, the compiler is only one piece of the whole process. So, your task is much simpler than it sounds. You can use any editor, such as WordPad or edlin or emacs, to create your source code and save it, as long as the editor is capable of writing a pure ASCII text file. For example, WordPad can "save as" a file in Text Document - MS-DOS Format without any kind of control codes, font codes or other junk in it; just plain text. Then all you need to write is a compiler that opens and reads that file, translates it into assembly language and writes out a .asm file. You can then use WordPad to open, display or print your .asm file. At that point your project is done. Whether or not your instructor knew it, that is all he asked you to do.
So, my friend, how to write a compiler; a compiler is simply a string parser. Every time it sees
x=5;
replace it with
mov al,5;
mov x,al;
or any such thing as that.
Make up your own high level language, something like C but limited maybe.
Make up your own processor and processor instruction set, like an 8088 maybe.
Say we are going to parse (or compile) x=5;
A string parser is a beautiful example of a state machine. A state machine is a device (software function) that changes state based on some input. It starts out in state 0; then, depending on its input, say its input is "x", it may move to state 3, where it waits for its next input. Say the input is "=". The state machine now knows enough to perform part of an action: when it changes state (to state 7, say) it performs an action such as writing "mov al,". In state 7 it waits for its next input, say "5", and performs some action such as concatenating "5" onto the end of "mov al,". When it reads ";" the parser knows it has reached the end of an instruction, so it writes a semicolon and a newline (carriage return, line feed). Once it has built up a complete output string, it writes that string as a new line to your .asm file. Then the state machine moves back to state 0, waiting to read and parse (compile) the next line of source code.  The state machine can be built using a bunch of switch(...){} statements, an integer variable to represent its current state and a char or string variable to represent its input.
State machines are easy, they're a blast. A compiler's job is a small piece of the whole. Your instructor only asked for a compiler; not an entire IDE.

Here is a quick and dirty example:
switch(state){
//     . . .  other states/cases
case 3:                     //when I'm in this state and I get some input, I move to the next state
   switch(input){
      case '=':           //if this is my input then I do some certain task
         strcat(output, "mov al,");
         state = 7;             //the state I move to next is determined by what state I was in and what input I got
         break;
//     . . .  other inputs/cases
   }
   break;                   //leave case 3 so we don't fall through into the next state
//     . . .  other states/cases
}
and so on.
So search for, and read up on, "finite state machine"
................. have fun 'piler dude!



anunitu

Check here for the TMA macro assembler, which includes source code for the assembler itself. Very old, but perhaps something to study. The source code is in assembler, and it is assembled using the assembler itself (binary included). I ran into this many years ago.

http://www.bbs.motion-bg.com/index.php?dir=23

The file in this download contains the actual assembler in the file TMABCKUP.COM

As a .com file, it runs in DOS.

jj2007

Quote from: ixuta on July 15, 2011, 12:41:34 AM
     Your instructor asked you to make a compiler. First let's see what a compiler does. A compiler converts ...

Hi ixuta,

Welcome to the Forum :thumbu

(you have passed the "am I a bot" test :bg)