I have a school project to make a compiler, and no one at my school (including the lecturer) has experience making compilers. :(
I think MASM's toolchain consists of ml.exe, link.exe, and rc.exe:
ml.exe => produces file.obj
link.exe => produces file.exe
rc.exe => compiles a resource script (.rc) into a .res file, which the linker binds into the application
When I disassembled ml.exe, link.exe, and rc.exe,
it showed that ml.exe uses only the following APIs:
00042000 2BF VirtualFree
00042004 19F HeapFree
00042008 7D ExitProcess
0004200C 29E TerminateProcess
00042010 F7 GetCurrentProcess
00042014 170 GetTimeZoneInformation
00042018 15D GetSystemTime
0004201C 11B GetLocalTime
00042020 11A GetLastError
00042024 26A SetFilePointer
00042028 2DF WriteFile
0004202C 218 ReadFile
00042030 1B CloseHandle
00042034 115 GetFileType
00042038 34 CreateFileA
0004203C F5 GetCurrentDirectoryA
00042040 116 GetFullPathNameA
00042044 104 GetDriveTypeA
00042048 57 DeleteFileA
0004204C 10B GetExitCodeProcess
00042050 2CE WaitForSingleObject
00042054 44 CreateProcessA
00042058 1E4 MultiByteToWideChar
0004205C 22F RtlUnwind
00042060 1A3 HeapSize
00042064 199 HeapAlloc
00042068 1A2 HeapReAlloc
0004206C 124 GetModuleFileNameA
00042070 CA GetCommandLineA
00042074 174 GetVersion
00042078 19D HeapDestroy
0004207C 19B HeapCreate
00042080 241 SetConsoleCtrlHandler
00042084 2BB VirtualAlloc
00042088 27C SetStdHandle
0004208C 26D SetHandleCount
00042090 152 GetStdHandle
00042094 150 GetStartupInfoA
00042098 261 SetEndOfFile
0004209C 2D2 WideCharToMultiByte
000420A0 2AD UnhandledExceptionFilter
000420A4 263 SetEnvironmentVariableW
000420A8 262 SetEnvironmentVariableA
000420AC 1BF LCMapStringA
000420B0 1C0 LCMapStringW
000420B4 BF GetCPInfo
000420B8 B9 GetACP
000420BC 131 GetOEMCP
000420C0 9D FindNextFileA
000420C4 94 FindFirstFileA
000420C8 90 FindClose
000420CC B2 FreeEnvironmentStringsA
000420D0 B3 FreeEnvironmentStringsW
000420D4 106 GetEnvironmentStrings
000420D8 108 GetEnvironmentStringsW
000420DC 153 GetStringTypeA
000420E0 156 GetStringTypeW
000420E4 21 CompareStringA
000420E8 22 CompareStringW
000420EC AA FlushFileBuffers
000420F0 13E GetProcAddress
000420F4 1C2 LoadLibraryA
000420F8 10D GetFileAttributesA
link.exe uses only the following APIs:
Imp Addr Hint Import Name from mspdb50.dll - Not Bound
-------- ---- ---------------------------------------------------------------
00001260 5D PDBOpenValidate
00001264 76 TruncStFromSz
00001268 67 SigForPbCb
Imp Addr Hint Import Name from MSVCRT.dll - Not Bound
-------- ---- ---------------------------------------------------------------
0000109C 1B5 _spawnv
000010A0 75 __p__pgmptr
000010A4 2A8 remove
000010A8 87 __unDName
000010AC 291 malloc
000010B0 2C0 strncmp
000010B4 290 longjmp
000010B8 D5 _fcloseall
000010BC 29F putc
000010C0 2DB vprintf
000010C4 25D fread
000010C8 262 fseek
000010CC 264 ftell
000010D0 268 getc
000010D4 2AA rewind
000010D8 266 fwrite
000010DC 1B7 _spawnvp
000010E0 148 _makepath
000010E4 ED _fsopen
000010E8 1A8 _setjmp3
000010EC 1A3 _seh_longjmp_unwind
000010F0 26A getenv
000010F4 F2 _fullpath
000010F8 1C1 _stricmp
000010FC 12E _ismbcspace
00001100 252 fgets
00001104 17D _mbstok
00001108 1D9 _ultoa
0000110C 177 _mbsrchr
00001110 1D6 _tzset
00001114 F ??2@YAPAXI@Z
00001118 2C3 strrchr
0000111C 49 __CxxFrameHandler
00001120 14F _mbclen
00001124 2D4 toupper
00001128 2D0 time
0000112C 24C fclose
00001130 1D1 _tempnam
00001134 15F _mbsicmp
00001138 258 fprintf
0000113C 277 isprint
00001140 259 fputc
00001144 249 exit
00001148 2A1 puts
0000114C 246 ctime
00001150 25A fputs
00001154 2A7 realloc
00001158 2C7 strtok
0000115C 1BF _strdup
00001160 2BB strcspn
00001164 159 _mbscmp
00001168 165 _mbsnbcmp
0000116C 23D atoi
00001170 2A9 rename
00001174 183 _mktemp
00001178 2C5 strstr
0000117C DC _filelength
00001180 168 _mbsnbcpy
00001184 15D _mbsdec
00001188 1A2 _searchenv
0000118C 169 _mbsnbicmp
00001190 16E _mbsncmp
00001194 176 _mbspbrk
00001198 55 __dllonexit
0000119C 186 _onexit
000011A0 D3 _exit
000011A4 48 _XcptFilter
000011A8 64 __p___initenv
000011AC 58 __getmainargs
000011B0 10F _initterm
000011B4 83 __setusermatherr
000011B8 9D _adjust_fdiv
000011BC 6A __p__commode
000011C0 6F __p__fmode
000011C4 81 __set_app_type
000011C8 B7 _controlfp
000011CC 158 _mbschr
000011D0 2A4 qsort
000011D4 192 _purecall
000011D8 10 ??3@YAXPAX@Z
000011DC CA _except_handler3
000011E0 113 _iob
000011E4 2AD setlocale
000011E8 273 isdigit
000011EC 298 memmove
000011F0 1B9 _splitpath
000011F4 1CB _strupr
000011F8 1C5 _strnicmp
000011FC 2AE setvbuf
00001200 29E printf
00001204 24F fflush
00001208 134 _itoa
0000120C 2B2 sprintf
00001210 2B5 sscanf
00001214 2B7 strchr
00001218 217 _write
0000121C 144 _lseek
00001220 198 _read
00001224 1CF _tell
00001228 B3 _close
0000122C 1B0 _sopen
00001230 56 __doserrno
00001234 1BA _stat
00001238 8E _access
0000123C 1DD _unlink
00001240 B1 _chsize
00001244 2C1 strncpy
00001248 25E free
0000124C 240 calloc
00001250 270 isalnum
00001254 2BF strncat
00001258 23F bsearch
Imp Addr Hint Import Name from KERNEL32.dll - Not Bound
-------- ---- ---------------------------------------------------------------
00001000 A2 FreeLibrary
00001004 1F0 RaiseException
00001008 1AF LocalAlloc
0000100C 88 FindClose
00001010 8C FindFirstFileA
00001014 EC GetDiskFreeSpaceA
00001018 295 VirtualAlloc
0000101C 146 GetSystemInfo
00001020 174 GlobalMemoryStatus
00001024 243 SetEnvironmentVariableA
00001028 B6 GetCommandLineA
0000102C E3 GetCurrentProcess
00001030 158 GetTickCount
00001034 15F GetVersion
00001038 26 CopyFileA
0000103C 216 SearchPathA
00001040 150 GetTempPathA
00001044 14E GetTempFileNameA
00001048 1A9 LoadLibraryA
0000104C 299 VirtualFree
00001050 245 SetErrorMode
00001054 222 SetConsoleCtrlHandler
00001058 273 Sleep
0000105C 54 DeleteFileA
00001060 75 ExitProcess
00001064 33 CreateFileMappingA
00001068 1BE MapViewOfFileEx
0000106C 28B UnmapViewOfFile
00001070 24B SetFilePointer
00001074 242 SetEndOfFile
00001078 19 CloseHandle
0000107C 32 CreateFileA
00001080 105 GetLastError
00001084 FE GetFileSize
00001088 111 GetModuleHandleA
0000108C 129 GetProcAddress
00001090 91 FindNextFileA
00001094 195 InterlockedExchange
Imp Addr Hint Import Name from MSDIS109.DLL - (Delayed)
-------- ---- ---------------------------------------------------------------
000694E8 0 ?PfncchfixupSet@DIS@@QAEP6GIPBV1@_KIPADIPA_K@ZP6GI01I2I3@Z@Z
000694EC 0 ?CchFormatInstr@DIS@@QBEIPADI@Z
000694F0 0 ?PdisNew@DIS@@SGPAV1@W4DIST@1@@Z
000694F4 0 ?CchFormatAddr@DIS@@QBEI_KPADI@Z
000694F8 0 ?Dist@DIS@@QBE?AW4DIST@1@XZ
000694FC 0 ?PvClient@DIS@@QBEPAXXZ
00069500 0 ?PfncchaddrSet@DIS@@QAEP6GIPBV1@_KPADIPA_K@ZP6GI012I3@Z@Z
00069504 0 ?PvClientSet@DIS@@QAEPAXPAX@Z
Imp Addr Hint Import Name from IMAGEHLP.dll - (Delayed)
-------- ---- ---------------------------------------------------------------
000694D4 0 MapFileAndCheckSumA
000694D8 0 CheckSumMappedFile
000694DC 0 ReBaseImage
000694E0 0 BindImageEx
and rc.exe uses only the following APIs:
Imp Addr Hint Import Name from MSVCRT.dll - Bound
-------- ---- ---------------------------------------------------------------
00001008 246 exit
0000100C 2B4 strchr
00001010 28E malloc
00001014 D0 _exit
00001018 63 __p___initenv
0000101C 58 __getmainargs
00001020 10C _initterm
00001024 48 _XcptFilter
00001028 82 __setusermatherr
0000102C 69 __p__commode
00001030 29B printf
00001034 80 __set_app_type
00001038 C7 _except_handler3
0000103C B4 _controlfp
00001040 9B _adjust_fdiv
00001044 6E __p__fmode
Imp Addr Hint Import Name from KERNEL32.dll - Bound
-------- ---- ---------------------------------------------------------------
00001000 253 SetConsoleCtrlHandler
Imp Addr Hint Import Name from RCDLL.dll - Bound
-------- ---- ---------------------------------------------------------------
0000104C 0 Handler
00001050 1 RC
Looking at ml.exe and link.exe,
both use the LoadLibrary API.
Do you know which functions ml.exe and link.exe want to call through LoadLibrary?
Sorry for my bad English.
It's quite likely that one of these functions uses LoadLibrary:
00001008 246 exit
0000100C 2B4 strchr
00001010 28E malloc
00001014 D0 _exit
00001018 63 __p___initenv
0000101C 58 __getmainargs
00001020 10C _initterm
00001024 48 _XcptFilter
00001028 82 __setusermatherr
0000102C 69 __p__commode
00001030 29B printf
00001034 80 __set_app_type
00001038 C7 _except_handler3
0000103C B4 _controlfp
00001040 9B _adjust_fdiv
00001044 6E __p__fmode
At any rate, when the OS loads your EXE, it has to load the OS DLLs (like shell32, user32, etc.).
I dunno if it uses LoadLibrary for that, but it would make sense if it did.
Hi dedndave, thanks. But I think it also writes to standard output (StdOut) to show something like this in the Command Prompt console:
Microsoft (R) Macro Assembler Version 6.14.8444
Copyright (C) Microsoft Corp 1981-1997. All rights reserved.
Assembling: test.asm
MASM : fatal error A1000: cannot open file : test.asm
Maybe I can find out how to use all those APIs on MSDN,
but I can't find any information about how to use the following ones :red
000694E8 0 ?PfncchfixupSet@DIS@@QAEP6GIPBV1@_KIPADIPA_K@ZP6GI01I2I3@Z@Z
000694EC 0 ?CchFormatInstr@DIS@@QBEIPADI@Z
000694F0 0 ?PdisNew@DIS@@SGPAV1@W4DIST@1@@Z
000694F4 0 ?CchFormatAddr@DIS@@QBEI_KPADI@Z
000694F8 0 ?Dist@DIS@@QBE?AW4DIST@1@XZ
000694FC 0 ?PvClient@DIS@@QBEPAXXZ
00069500 0 ?PfncchaddrSet@DIS@@QAEP6GIPBV1@_KPADIPA_K@ZP6GI012I3@Z@Z
00069504 0 ?PvClientSet@DIS@@QAEPAXPAX@Z
Maybe someone knows?
i am guessing it's something to do with DOS emulation
you don't want to tear into the console too much - lol
a lot of effort for very little benefit
Quote from: elmo on May 13, 2011, 04:39:38 AM
I have a school project to make a compiler, and no one at my school (including the lecturer) has experience making compilers. :(
Hi,
A real compiler? A mockup? What exactly is expected? The
Dragon Book (Compilers: Principles, Techniques, and Tools) by
Aho et al. was the standard back when.
Quote
When I disassembled ml.exe, link.exe, and rc.exe,
it showed that ml.exe uses only the following APIs:
Um, all that is to read an input file, write an output file, and
associated data gathering. It has little to do with parsing the
program or converting language elements to executable code.
I think you can be more productive trying to define a work flow
structure than disassembling an existing compiler.
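To make Steve's point concrete: the file plumbing those imports provide takes only a few lines of C. This is a minimal sketch, not how MASM is actually written; translate_line is a hypothetical placeholder that just echoes each source line as an assembly comment, and the 255-character line limit is an assumption.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical placeholder for the real work: each source line is copied
   into the output as an assembly comment (';' starts a comment in MASM
   syntax). A real compiler would parse the line and emit instructions. */
static void translate_line(const char *line, FILE *out)
{
    fprintf(out, "; %s", line);
    if (strchr(line, '\n') == NULL)   /* the last line may lack a newline */
        fputc('\n', out);
}

/* Read `in` line by line (lines over 255 characters get split) and write
   the translation to `out`. Returns the number of lines processed. */
static int compile_stream(FILE *in, FILE *out)
{
    char line[256];
    int count = 0;
    while (fgets(line, sizeof line, in) != NULL) {
        translate_line(line, out);
        count++;
    }
    return count;
}

/* Open the source and output files, report a MASM-style fatal error if
   that fails, then compile. Returns 0 on success and 1 on error. */
static int compile_file(const char *src_path, const char *asm_path)
{
    FILE *in = fopen(src_path, "r");
    if (in == NULL) {
        fprintf(stderr, "fatal error: cannot open file : %s\n", src_path);
        return 1;
    }
    FILE *out = fopen(asm_path, "w");
    if (out == NULL) {
        fprintf(stderr, "fatal error: cannot create file : %s\n", asm_path);
        fclose(in);
        return 1;
    }
    compile_stream(in, out);
    fclose(in);
    return fclose(out) == 0 ? 0 : 1;
}
```

A main function would only need to call compile_file(argv[1], argv[2]); everything interesting happens inside translate_line.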
What language are you targeting? What kind of executable
is expected to be generated? How much time and resources
are you given?
Regards,
Steve
Quote
A real compiler? A mockup? What exactly is expected? The
Dragon Book (Compilers: Principles, Techniques, and Tools) by
Aho et al. was the standard back when.
Yes, my lecturer told me to make a compiler,
but he himself has no experience making compilers.
I think that's insane!
If I can't do it, I will fail and have to retake his course. :(
I just want to know how to make a compiler like MASM's (ml.exe, link.exe, rc.exe).
Right now I only know which APIs ml.exe, link.exe, and rc.exe use, from disassembling them,
but I don't know how to use all those APIs.
Is there any tutorial that makes this clear?
Sorry for my bad English.
i doubt he expects a full-featured compiler or assembler :P
probably something along the lines of the DEBUG program's assembly capability would satisfy the requirement
also, because you can create your own language, make it something easy to parse :P
to satisfy the coursework, you could probably use 16-bit, if you are more familiar with it
Quote from: elmo on May 14, 2011, 08:37:26 AM
Is there any tutorial that makes this clear?
Hi,
Well, you can look at existing open source code for a compiler.
It's not really a tutorial, but it will show you the scope of the problem.
Assembler examples are NASM (The Netwide Assembler), FASM
(flat assembler), or WASM (Wasm - Open Watcom).
http://www.nasm.us/
http://flatassembler.net/index.php
http://www.openwatcom.org/index.php/Wasm
JWASM is a fork of WASM. These are probably easier
to look at rather than disassembling MASM.
Of course, you could try C instead of assembly. There is
Small-C, which has been discussed in the Usenet group alt.lang.asm.
One such discussion (of many):
http://groups.google.com/group/alt.os.development/msg/9624c78cf3314fa6?hl=en
http://groups.google.com/group/alt.os.development/browse_thread/thread/8f8f1bae0dcdec9/05a497d04b49f746?hl=en&q=small+c+compiler&lnk=nl&
They mention PCC, GCC, and other C implementations as well.
I don't really "do" C, but the discussions are sometimes fun to
follow. And there are C Usenet groups as well. They tend to
end up writing a C compiler in C.
Forth, and its ilk, are probably the easiest languages to start
with. But most of those are interpreters, not real compilers.
But if your instructor does not know the difference, you might
get away with it. "Threaded Interpretive Languages" by R. G.
Loeliger ISBN 0-07-038360-X has a fairly complete FORTH like
language in Z80 assembly. And there are/were a bunch with
source code out on the internet.
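To show why Forth-like languages are so easy to start with, here is a minimal sketch of the core of such an interpreter in C: numbers are pushed on a data stack, and each word operates on that stack. Only + - * and dup are supported; that word set and the name forth_eval are illustrative assumptions, not taken from Loeliger's book.

```c
#include <stdlib.h>
#include <string.h>

#define STACK_MAX 64

/* A tiny Forth-flavoured interpreter: numbers are pushed onto a data
   stack, and the words + - * and dup operate on that stack.
   Returns 0 and stores the top of the stack in *result on success,
   or -1 on an unknown word or a stack underflow/overflow. */
static int forth_eval(const char *program, long *result)
{
    long stack[STACK_MAX];
    int sp = 0;                          /* number of items on the stack */
    char buf[256];
    strncpy(buf, program, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';

    for (char *word = strtok(buf, " \t\n"); word != NULL; word = strtok(NULL, " \t\n")) {
        char *end;
        long value = strtol(word, &end, 10);
        if (*end == '\0') {              /* the whole word is a number: push it */
            if (sp == STACK_MAX) return -1;
            stack[sp++] = value;
        } else if (strcmp(word, "dup") == 0) {
            if (sp == 0 || sp == STACK_MAX) return -1;
            stack[sp] = stack[sp - 1];
            sp++;
        } else if (strlen(word) == 1 && strchr("+-*", word[0]) != NULL) {
            if (sp < 2) return -1;
            long b = stack[--sp];        /* right operand is on top */
            long a = stack[--sp];
            switch (word[0]) {
            case '+': stack[sp++] = a + b; break;
            case '-': stack[sp++] = a - b; break;
            case '*': stack[sp++] = a * b; break;
            }
        } else {
            return -1;                   /* unknown word */
        }
    }
    if (sp == 0) return -1;
    *result = stack[sp - 1];
    return 0;
}
```

For example, "2 3 + 4 *" leaves 20 on the stack. A real Forth adds a dictionary so that new words can be defined in terms of old ones, which is where the "threaded" in threaded interpretive languages comes from.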
I have a simple book, "Writing Interactive Compilers and
Interpreters", by P. J. Brown, ISBN 0 471 27609 X, ISBN 0471
10072 2 pbk. He is aiming at beginners and walks through the
whole process of writing a BASIC compiler. A bit more than
260 pages (paperback). He has most of what you need to
write a compiler except for real executable code. You should
be able to write a BASIC compiler in BASIC from this book.
I still think you should get a better idea of what exactly is
required rather than "write a compiler". That is fairly vague.
Talk to your instructor and find out what is a minimum acceptable
project.
HTH,
Steve N.
P.S. Dave just posted some good advice. Look at DEBUG.EXE
and its capabilities.
Quote from: elmo on May 13, 2011, 04:39:38 AM
I have a school project to make a compiler, and no one at my school (including the lecturer) has experience making compilers.
Academics are such bloody idiots. Why would he ask you to do something he is incapable of demonstrating?
The "dragon" book by Aho, et al, is one of the classic texts.
Looking at SmallC or a simple assembler would be the way to go. Looking at the imports/exports of Microsoft tools probably gives very little insight into the internal mechanics.
For a more complete example, look at GNU GCC and the binary tools (as, ld, objdump, etc.).
There is Paul Vojta's DOS Debug clone:
http://math.berkeley.edu/~vojta/
which Japheth has been developing further in recent times:
http://www.japheth.de/debxxf.html
Thank you for all the information :U
I will study it.
Quote from: clive on May 14, 2011, 04:00:53 PM
Academics are such bloody idiots. Why would he ask you to do something he is incapable of demonstrating?
I agree with you. That's insane!
Academics get the students to do their dirty work, then they plonk their name in front of the students if it's any good.
The poor student doesn't have much choice, and is only glad to get his marks and buzz off to the next stage!! :'(
Your instructor asked you to make a compiler. First, let's see what a compiler does. A compiler converts medium- or high-level languages into low-level languages, such as asm. When you are writing in C, Pascal, or Modula-2, you are usually working in an IDE (integrated development environment). Inside the IDE, things happen that are hidden from you. The IDE is a combination of an editor, a preprocessor, a compiler, an assembler, a linker, and sometimes an extra step called exe2bin. Let's see what each one does. Exe2bin is an ancient tool used to convert .exe files into .com files. COM files are executables whose code and data must fit in a single 64K segment; they date back to the days of segmented memory models, when a .com lived in only one 64K segment. Let's just talk about today's .exe and leave .coms and segments out of it. In the IDE, when you click compile, the IDE calls the preprocessor, then the compiler, then the assembler, then the linker, which produces the final .exe. For example, let's say you are writing in C; here are the parts of your IDE.
The editor is where you open, write, edit, save and close your source code. An editor is not part of a compiler.
The preprocessor converts all your #define, #include, macros and other stuff into C code. Say you had written #include <stdio.h>, the preprocessor opens stdio.h and pastes it into your source code. If you wrote #define MyConstant 5, the preprocessor replaces all occurrences of MyConstant with a 5. A "Write a compiler" project does not need to include a preprocessor. A preprocessor is not part of a compiler.
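The #define part of the preprocessor's job can be sketched as plain text substitution. The function below is a hypothetical, simplified illustration: it handles one object-like macro at a time, replaces only whole identifiers (so MyConstant2 is left alone), and, unlike a real C preprocessor, does not skip string literals or comments.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Replace every whole-identifier occurrence of `name` in `src` with
   `value`, the way an object-like #define would. Writes at most
   out_size-1 characters to `out` (out_size must be at least 1) and
   always NUL-terminates. */
static void expand_define(const char *src, const char *name, const char *value,
                          char *out, size_t out_size)
{
    size_t name_len = strlen(name), n = 0;
    const char *p = src;
    while (*p != '\0' && n + 1 < out_size) {
        if (isalpha((unsigned char)*p) || *p == '_') {
            /* scan one whole identifier */
            const char *start = p;
            while (isalnum((unsigned char)*p) || *p == '_') p++;
            size_t len = (size_t)(p - start);
            const char *text = start;
            if (len == name_len && strncmp(start, name, len) == 0) {
                text = value;            /* identifier matches the macro name */
                len = strlen(value);
            }
            for (size_t i = 0; i < len && n + 1 < out_size; i++) out[n++] = text[i];
        } else {
            out[n++] = *p++;             /* copy any other character unchanged */
        }
    }
    out[n] = '\0';
}
```

With #define MyConstant 5, the line x = MyConstant + MyConstant2; becomes x = 5 + MyConstant2;.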
The compiler converts medium- or high-level source code into assembly language. It reads your .C file and outputs a .asm file. For example, if in C you wrote x=5; then the compiler reads that line of C source code and translates it into mov al,5 followed by mov x,al. If you wrote printf("I am %d years old",x); then the compiler writes the instructions mov ax,x / push ax / mov ax,<address of your string> / push ax / call printf / add sp,4, one per line (C pushes the arguments from right to left, and the caller removes them from the stack afterwards). After the compiler has translated all of the C source code into assembler source code and written it out to a .asm file, the compiler's job is done.
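For a call like printf("I am %d years old",x), a C compiler following the cdecl convention pushes the arguments from right to left, so x goes on the stack first, and the caller removes them afterwards. A code generator for calls could look something like this minimal sketch, which assumes 16-bit code where every argument is one 2-byte word loaded through AX; the function name emit_cdecl_call is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Emit 16-bit 8086-style code for a C call fn(args[0], ..., args[n-1])
   using the cdecl convention: arguments are pushed right to left (last
   argument first) and the caller removes them afterwards (2 bytes per push).
   Each args[i] is an assembly operand such as "x" or "offset msg".
   Returns 0 on success, -1 if `out` is too small (out_size must be >= 1). */
static int emit_cdecl_call(const char *fn, const char *const args[], int nargs,
                           char *out, size_t out_size)
{
    size_t used = 0;
    int w;
    out[0] = '\0';
    for (int i = nargs - 1; i >= 0; i--) {          /* right to left */
        w = snprintf(out + used, out_size - used, "mov ax,%s\npush ax\n", args[i]);
        if (w < 0 || (size_t)w >= out_size - used) return -1;
        used += (size_t)w;
    }
    w = snprintf(out + used, out_size - used, "call %s\n", fn);
    if (w < 0 || (size_t)w >= out_size - used) return -1;
    used += (size_t)w;
    if (nargs > 0) {                                /* caller cleans up the stack */
        w = snprintf(out + used, out_size - used, "add sp,%d\n", 2 * nargs);
        if (w < 0 || (size_t)w >= out_size - used) return -1;
    }
    return 0;
}
```

Called with fn = "printf" and args = { "offset msg", "x" }, it produces the six lines mov ax,x / push ax / mov ax,offset msg / push ax / call printf / add sp,4.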
The IDE then calls the assembler, such as ml.exe, to assemble the .asm file into machine code. The machine code is written to an .obj file. If the .asm file says mov al,5, the assembler reads that line, produces the two bytes B0 05 (binary 10110000 00000101), and writes them to the .obj file. An assembler is not part of a compiler.
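The assembler step can be illustrated with one instruction form. In x86 machine code, mov r8,imm8 is encoded as the opcode byte B0 plus the register number (AL=0, CL=1, DL=2, BL=3, AH=4, CH=5, DH=6, BH=7), followed by the immediate byte. This sketch handles only that form with a decimal constant; the function name is hypothetical, and a real assembler such as ml.exe also handles every other instruction form, expressions, labels, and directives.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Encode the single x86 instruction form  mov r8,imm8  (opcode B0+r).
   The register number is its index in the table below, as the CPU
   defines it. Returns the number of bytes written to `code` (2), or 0
   if the line is not in that form or the constant does not fit a byte. */
static int assemble_mov_r8_imm8(const char *line, uint8_t code[2])
{
    static const char *const regs[8] = { "al", "cl", "dl", "bl", "ah", "ch", "dh", "bh" };
    char reg[3];
    unsigned value;
    if (sscanf(line, " mov %2[a-z] , %u", reg, &value) != 2 || value > 0xFF)
        return 0;
    for (int r = 0; r < 8; r++) {
        if (strcmp(reg, regs[r]) == 0) {
            code[0] = (uint8_t)(0xB0 + r);    /* opcode with the register folded in */
            code[1] = (uint8_t)value;         /* the immediate byte */
            return 2;
        }
    }
    return 0;
}
```

So mov al,5 becomes B0 05 and mov bl,16 becomes B3 10. This is also roughly what DEBUG's a (assemble) command does, one line at a time.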
The IDE then calls the linker. The linker takes all of the .obj files (such as your assembled program) and the .lib files and combines them into a .exe. The linker resolves addresses such as <address of your string> into their actual offsets in the exe. For example, if the linker places your .obj first, at offset 0 from the beginning of the .exe, and <your string> starts 5 bytes from the beginning of that .obj, then the resolved address of <your string> is 0 plus 5 equals 5, so 5 is the operand of the machine code that corresponds to mov ax,<address of your string>. If the linker then places string.lib into the exe at offset 120,000 bytes from the beginning of the exe, and the printf function is 128 bytes from the beginning of string.lib, then the resolved address of printf is 120,000 plus 128 equals 120,128. The linker combines (links) each of the obj and lib files that make up your exe. After the linker has read in all the pieces, resolved all the addresses, and written the exe, its job is done, and so is the entire job of the IDE. A linker is not part of a compiler.
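The address arithmetic in the linker example (0 + 5 = 5, 120,000 + 128 = 120,128) amounts to "module base plus the symbol's offset within its module". Here is a minimal sketch of just that lookup; the struct and function names are hypothetical, and a real linker also has to patch every instruction that refers to the symbol (a fixup, or relocation), which is left out here.

```c
#include <stddef.h>
#include <string.h>

/* One symbol defined inside one module (an .obj or a member of a .lib). */
struct symbol {
    const char *name;
    const char *module;
    unsigned long offset_in_module;   /* where the symbol starts inside its module */
};

/* Where the linker decided to place each module in the output file. */
struct placement {
    const char *module;
    unsigned long base;               /* offset of the module within the .exe */
};

/* Resolve a symbol: its final address is its module's base plus the
   symbol's offset inside that module. Returns 0 and stores the address
   in *addr, or -1 if the symbol or its module is unknown. */
static int resolve(const char *name,
                   const struct symbol *syms, size_t nsyms,
                   const struct placement *places, size_t nplaces,
                   unsigned long *addr)
{
    for (size_t i = 0; i < nsyms; i++) {
        if (strcmp(syms[i].name, name) != 0) continue;
        for (size_t j = 0; j < nplaces; j++) {
            if (strcmp(places[j].module, syms[i].module) == 0) {
                *addr = places[j].base + syms[i].offset_in_module;
                return 0;
            }
        }
        return -1;                    /* symbol found, but its module was never placed */
    }
    return -1;                        /* undefined symbol */
}
```

With the module placed at 0 containing your string at offset 5, and string.lib placed at 120,000 containing printf at offset 128, resolve returns 5 and 120,128 respectively.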
Your instructor only asked you to write a compiler. As you can see, the compiler is only one piece of the whole process, so your task is much simpler than it sounds. You can use any editor, such as WordPad, edlin, or Emacs, to create your source code and save it, as long as the editor can write a pure ASCII text file. For example, WordPad can "save as" a file in Text Document - MS-DOS Format without any kind of control codes, font codes, or other junk in it; just plain text. Then all you need to write is a compiler that opens and reads that file, translates it into assembly language, and writes out a .asm file. You can then use WordPad to open, display, or print your .asm file. At that point your project is done. Whether or not your instructor knew it, that is all he asked you to do.
So, my friend, here is how to write a compiler. A compiler is, at heart, a string parser. Every time it sees
x=5;
it replaces it with
mov al,5
mov x,al
or something like that.
Make up your own high level language, something like C but limited maybe.
Make up your own processor and processor instruction set, like an 8088 maybe.
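If you do make up your own processor, a few lines of C are enough to simulate it, so you can actually run what your compiler produces. The instruction set below (LOADI, STORE, ADD, HALT, two byte registers, 16 bytes of memory) is entirely hypothetical, chosen only to show the fetch-decode-execute loop.

```c
#include <stdint.h>

/* A made-up instruction set (hypothetical, for illustration only). */
enum opcode { OP_LOADI, OP_STORE, OP_ADD, OP_HALT };

struct instr {
    enum opcode op;
    uint8_t a;          /* register number, or memory address for OP_STORE */
    uint8_t b;          /* immediate value, or source register */
};

struct cpu {
    uint8_t reg[2];     /* two byte-sized registers */
    uint8_t mem[16];    /* 16 bytes of memory */
};

/* Fetch, decode, and execute until OP_HALT or the end of the program.
   Returns the number of instructions executed, or -1 on a bad operand. */
static int run(struct cpu *cpu, const struct instr *prog, int len)
{
    int pc = 0, steps = 0;
    while (pc < len) {
        const struct instr *in = &prog[pc++];   /* fetch */
        steps++;
        switch (in->op) {                       /* decode and execute */
        case OP_LOADI:                          /* reg[a] = b */
            if (in->a > 1) return -1;
            cpu->reg[in->a] = in->b;
            break;
        case OP_STORE:                          /* mem[a] = reg[b] */
            if (in->a > 15 || in->b > 1) return -1;
            cpu->mem[in->a] = cpu->reg[in->b];
            break;
        case OP_ADD:                            /* reg[a] += reg[b], wraps at 256 */
            if (in->a > 1 || in->b > 1) return -1;
            cpu->reg[in->a] = (uint8_t)(cpu->reg[in->a] + cpu->reg[in->b]);
            break;
        case OP_HALT:
            return steps;
        default:
            return -1;
        }
    }
    return steps;
}
```

In this made-up machine, x=5; (with x at address 0) would compile to LOADI 0,5 then STORE 0,0, the same two-step shape as mov al,5 / mov x,al.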
Say we are going to parse (or compile) x=5;
A string parser is a beautiful example of a state machine. A state machine is a device (a software function) that changes state based on its input. It starts out in state 0; then, depending on its input, say "x", it may move to state 3, where it waits for its next input. Say the input is "=". The state machine now knows enough to perform part of an action: when it changes state, it performs an action such as writing "mov al,". It then waits for its next input, say "5", moves to state 7, and there performs some action such as appending "5" to the end of "mov al,". When it reads the ";" the parser knows it has reached the end of a statement, so it ends the line with a carriage return and line feed. Once it has built up a complete output string, it writes that string as a new line to your .asm file. Then the state machine moves back to state 0, waiting to read and parse (compile) the next line of source code. The state machine can be built using a bunch of switch(...){} statements, an integer variable to represent its current state, and a char or string variable to represent its input.
State machines are easy; they're a blast. A compiler's job is a small piece of the whole. Your instructor only asked for a compiler, not an entire IDE.
Here is a quick and dirty example:
switch (state) {
    // . . . other states/cases
    case 3: // when I'm in this state and I get some input, I move to the next state
        switch (input) {
            case '=': // if this is my input, then I do some certain task
                strcat(output, "mov al,");
                state = 7; // the next state is determined by the current state and the input
                break;
            // . . . other inputs/cases
        }
        break; // leave the outer switch once case 3 is handled
    // . . . other states/cases
}
and so on.
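Putting the pieces together, here is a minimal sketch of a complete state machine for statements of the form name=digits; that emits mov al,5 and mov x,al for x=5;. The state names are hypothetical (they don't correspond to the state numbers 3 and 7 used in the description), whitespace is simply skipped, and the constant is copied as text without checking that it fits in a byte. Instead of appending to the output as each state is entered, this version collects the name and the number and writes both lines once the ";" arrives.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* States of the hypothetical parser for statements like  x=5;  */
enum state { S_START, S_NAME, S_EQUALS, S_NUMBER, S_DONE, S_ERROR };

/* Feed the statement to a finite state machine one character at a time.
   On success writes the generated assembly into `out` and returns 0;
   returns -1 if the input is not of the form  name=digits;  */
static int fsm_compile(const char *src, char *out, size_t out_size)
{
    enum state state = S_START;
    char name[32] = "", number[16] = "";
    size_t nlen = 0, dlen = 0;

    for (const char *p = src; *p != '\0' && state != S_ERROR; p++) {
        char c = *p;
        if (c == ' ' || c == '\t') continue;   /* whitespace is skipped everywhere */
        switch (state) {
        case S_START:                          /* expecting the first letter of a name */
            if (isalpha((unsigned char)c)) { name[nlen++] = c; state = S_NAME; }
            else state = S_ERROR;
            break;
        case S_NAME:                           /* more name characters, or '=' */
            if (isalnum((unsigned char)c) && nlen < sizeof name - 1) name[nlen++] = c;
            else if (c == '=') state = S_EQUALS;
            else state = S_ERROR;
            break;
        case S_EQUALS:                         /* expecting the first digit */
        case S_NUMBER:                         /* more digits, or ';' */
            if (isdigit((unsigned char)c) && dlen < sizeof number - 1) {
                number[dlen++] = c;
                state = S_NUMBER;
            } else if (c == ';' && state == S_NUMBER) {
                state = S_DONE;                /* end of statement */
            } else {
                state = S_ERROR;
            }
            break;
        case S_DONE:                           /* nothing may follow the ';' */
        default:
            state = S_ERROR;
            break;
        }
    }
    if (state != S_DONE) return -1;
    name[nlen] = '\0';
    number[dlen] = '\0';
    int written = snprintf(out, out_size, "mov al,%s\nmov %s,al\n", number, name);
    return (written < 0 || (size_t)written >= out_size) ? -1 : 0;
}
```

To compile a whole file, call fsm_compile once per source line and append each result to the .asm file; returning to S_START for every line is the "moves back to state 0" step from the description.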
So search for, and read up on, "finite state machine"
................. have fun 'piler dude!
Check here for the TMA macro assembler, which includes source code for the assembler itself. It's very old, but perhaps something to study. The source code is in assembly language and is assembled with the assembler itself (the binary is included). I ran into this many years ago.
http://www.bbs.motion-bg.com/index.php?dir=23
The download contains the actual assembler in the file TMABCKUP.COM.
As a .COM file, it runs in DOS.
Quote from: ixuta on July 15, 2011, 12:41:34 AM
Your instructor asked you to make a compiler. First let's see what a compiler does. A compiler converts ...
Hi ixuta,
Welcome to the Forum :thumbu
(you have passed the "am I a bot" test :bg)