MASM, JWASM and the strange beast DPMI

Started by Gunther, September 23, 2010, 02:44:29 AM

Gunther

I've attached the file PMSHELL.zip. It's a simple 32 bit PROTECTED MODE shell for DOS applications. Nothing special, nothing new, not very exciting. The background is this: I have a lot of old Windows and DOS sources for TASM 4, and my plan was to convert the material step by step for use with MASM. So far so good.

First, I removed all TASM specific stuff and the sources can now be assembled with MASM or JWASM or TASM. The application starts in the REAL MODE (or an emulation), switches to 16 bit PROTECTED MODE, to make some preparations and switches into a flat 4 GB segment. It prints a message, switches back to the 16 bit segment, cleans up and terminates.

Here comes the first surprise. The MZ-EXE generated with TASM runs flawlessly under several configurations: Windows XP with SP 2 (32 bit), Windows 98 SE, DOS emulation of OS/2 Warp 4, Linux DosEmu version 1.4, and plain DOS with himem.sys and Borland's 32RTM.EXE (DPMI Host) - to name a few. The MZ-EXE generated with JWASM prints the message and crashes with a General Protection Fault. The MASM EXE isn't much better. There isn't a GPF, but the screen output is garbage; the program comes back to the command line, but the "DOS Box" is seriously damaged and can't be closed with the close button. One has to finish with EXIT or via the task manager. Altogether: that's not exactly brilliant.

It seems to me, there's something wrong with the generated addresses by both assemblers. I have enough experience with DOS PROTECTED MODE; writing such a program for a DPMI Host is always an adventure. When things work well under VCPI, the DPMI behavior can be very different. In that case the application runs on badly virtualized hardware. The DPMI support of Win 3.1, Win 95, Win 98, and Win ME is okay, but since Win 2000 it's a mess. You'll see that. When the program is started for the first time under XP, things are okay; the message is displayed and the text cursor is hidden. When it is started again in the same box, the message is displayed, but the cursor hangs somewhere else on the screen. That doesn't happen under other environments.

What's the idea behind all that? After checking Jwasm's Dos64.asm example, I had the idea to improve it. There are a few questions. Before switching to 64 bit LONG MODE, we could prepare a few things, to have a full featured 32 bit PROTECTED MODE shell. Of course, that's not possible under DPMI or VCPI, but with XMS support or in clean DOS via INT 15h. Then in 64 bit LONG MODE, we could do for example a mode switch (LEGACY MODE) as a host for 32 bit PM applications.

I made the application as a first test for that purpose. Speed isn't important here; good manners are decisive for such mode switching applications. The program has only 2 segments: 16 and 32 bits. I used the full segment style with an explicit ASSUME, to keep full control. The logic of the source code is straightforward. Of course, with the newer features (INVOKE, LOCAL etc.) the source code would be more readable. But what use is that fancy stuff when the application crashes into nirvana? Those things can be done later.

But before doing so, it must be clear what's wrong? What's a good and reliable PM Debugger? Will Olly do the job?

Gunther
Forgive your enemies, but never forget their names.

japheth

Quote
Here comes the first surprise. The MZ-EXE generated with TASM runs flawlessly under several configurations: Windows XP with SP 2

Not for me. If I run b_ex1.bat and then launch the generated ex1.exe, it crashes as well.

Quote
It seems to me, there's something wrong with the generated addresses by both assemblers.

For me it seems that there is a bug in your program.

Quote
But before doing so, it must be clear what's wrong? What's a good and reliable PM Debugger? Will Olly do the job?

No. DPMI debuggers which will reliably debug 32-bit DPMI hosts are

1. FreeDOS DEBUG (the DEBUGX variant, to be clear, which can debug DPMI clients)
2. MS CDB (but you probably should be familiar with it)
3. Open Watcom WD (in DOS only), GRDB (in DOS only), deb32f

I did a brief test with FreeDOS DEBUGX (it should be mentioned: I'm the maintainer of this tool) and quickly found the error. But I won't tell, because I don't want to spoil the pleasure of finding it out yourself.

http://www.japheth.de/debxxf.html

Gunther

Hallo japheth,

please excuse my late answer. There was a lot to do in the last few days; for example, I had some very interesting debugging sessions.

Quote from: japheth, September 23, 2010, 07:17:39 am
No. DPMI debuggers which will reliably debug 32-bit DPMI hosts are

1. FreeDOS DEBUG (the DEBUGX variant, to be clear, which can debug DPMI clients)

Thank you for that hint. It's a handy, well written, and well maintained tool. It helped a lot.

But now it's time for an interim report about some questions. For debugging purposes, I changed the source ex1.asm a little bit: it first clears the VGA compatible text screen, prints out a message by writing directly into the VGA text buffer, and terminates. The Debug directory of the attached archive contains the changed sources, pre-compiled binary files, and the running applications. Please read the readme.txt file.

All that led to 2 different programs: tg.exe (TASM generated EXE) and jg.exe (JWASM generated EXE). I checked tg.exe first; I traced step by step through the application, checked the register contents at every stage and inspected nearly all variables. It was a very time consuming job. Unfortunately, I couldn't find anything. The program terminated without errors. But a picture is worth a thousand words. Let's have a closer look at the debug session. I built the exe with the following commands:


tasm /ml /m2 /q shell32.asm

Turbo Assembler  Version 4.1  Copyright (c) 1988, 1993 Borland International

Assembling file:   shell32.asm
Error messages:    None
Warning messages:  None
Passes:            2
Remaining memory:  397k

tasm /ml /m2 /q ex1.asm

Turbo Assembler  Version 4.1  Copyright (c) 1988, 1993 Borland International

Assembling file:   ex1.asm
Error messages:    None
Warning messages:  None
Passes:            1
Remaining memory:  411k

tlink /3 /x shell32.obj ex1.obj,tg.exe

Turbo Link  Version 7.00 Copyright (c) 1987, 1994 Borland International


Here is the picture of the session with DEBUGX. Simply load the application into the debugger and type g (for GO).



It looks okay to me.

Now the same source, but assembled with JWASM with the following commands:


jwasm -c shell32.asm

JWasm v2.04RC3, Sep 13 2010, Masm-compatible assembler.
Portions Copyright (c) 1992-2002 Sybase, Inc. All Rights Reserved.
Source code is available under the Sybase Open Watcom Public License.

shell32.asm: 686 lines, 3 passes, 0 ms, 0 warnings, 0 errors

jwasm -c ex1.asm

JWasm v2.04RC3, Sep 13 2010, Masm-compatible assembler.
Portions Copyright (c) 1992-2002 Sybase, Inc. All Rights Reserved.
Source code is available under the Sybase Open Watcom Public License.

ex1.asm: 143 lines, 2 passes, 0 ms, 0 warnings, 0 errors

tlink /3 /x shell32.obj ex1.obj,jg.exe

Turbo Link  Version 7.00 Copyright (c) 1987, 1994 Borland International


Here is the image of the debugger session:



In other words: a classic crash. Please note: 2 stars are missing inside the string, the entire console is damaged (it must be closed via the task manager), and last but not least, a register dump always indicates a serious bug.

Quote from: japheth, September 23, 2010, 07:17:39 am
If I run b_ex1.bat and then launch the generated ex1.exe, it crashes as well.

That seems to me not quite right. If you've used TASM 4.1 or above, the application won't crash. That's tested. Check also the following thread inside the forum: http://www.masm32.com/board/index.php?topic=14887.0 You can find a screen shot of the program there, made by Antariy from scratch. It looks healthy. To be clear, Antariy made it with the program and sources from the old archive.

Quote from: japheth, September 23, 2010, 07:17:39 am
For me it seems that there is a bug in your program.

Why not, nobody is perfect. In that case, it must be the same but "negative" bug in TASM or TLINK, which compensates my bug one to one (see the debug session above).

Quote from: japheth, September 23, 2010, 07:17:39 am
I did a brief test with FreeDOS DEBUGX (it should be mentioned: I'm the maintainer of this tool) and quickly found the error. But I won't tell, ...

Sure. It's your decision to share your knowledge or not.

Just a few notes for other interested members. Please download only the archive PMShellDebug.zip. The main directory PMShellDebug contains of course the full unchanged source and the program, which will run and won't crash. The other 2 archives contain only the screen shots displayed above, so don't waste your bandwidth by downloading that material. Any help and assistance is very welcome.

Gunther

japheth

Quote from: Gunther on September 25, 2010, 12:40:47 AM
Quote from: japheth, September 23, 2010, 07:17:39 am
If I run b_ex1.bat and then launch the generated ex1.exe, it crashes as well.

That seems to me not quite right. If you've used TASM 4.1 or above, the application won't crash. That's tested.
I must apologize. It doesn't crash. On my first test I left a breakpoint in the code32 segment when I assembled with Tasm. When the breakpoint was hit in protected-mode it caused the crash.

Quote
Quote from: japheth, September 23, 2010, 07:17:39 am
For me it seems that there is a bug in your program.

Why not, nobody is perfect. In that case, it must be the same but "negative" bug in TASM or TLINK, which compensates my bug one to one (see the debug session above).

Yes, I checked this: Tasm behaves differently, it generates different fixups, which "cure" the "bug".

I'll give a hint. In this line

mov [ds:code16sel],cs


remove the "ds:" prefix and then try to assemble it with JWasm or Masm. They will report an error. As already mentioned, Tasm is able to "cure" this error, while JWasm and Masm can't. For more details, use the tool tdump.exe and compare the first FIXU32 record of the object files.

Gunther

Hallo japheth,

here is the final report about the JWASM/MASM crash. After another debugging session it's fixed.

My first idea was to solve it with an extra variable, but Antariy aka Alex came up with a better one. He did a clean and elegant hack; I've incorporated it into the source of shell32.asm, and it now assembles with MASM, JWASM, and TASM. Good to know. Moreover, Antariy's idea underlines the entire logic of the program and makes things clearer.

I won't discuss the behavior of JWASM or MASM in generating the right offset sizes for segment switching; that's a fruitless debate. I would like to speak about further plans. First I'll try to do the same and probably a bit more with an XMS environment (only himem.sys loaded) and in a clean raw DOS environment without any memory managers. Having this, we can kick DOS into 64 bit Long Mode and have a full 32 bit legacy mode in the background. It seems also possible in V86 mode (for example emm386.sys loaded), but it's more tricky. A VCPI client must first switch into Ring 0, before enabling 64 bit Long Mode. That's for later.

At the end of the road, we would have a kind of a 64 bit DOS Extender, which could also be a Host for 32 bit Protected Mode clients. I hope that will work.

Gunther

japheth

Quote from: Gunther on September 26, 2010, 10:51:58 PM
I won't discuss the behavior of JWASM or MASM in generating the right offset sizes for segment switching; that's a fruitless debate.
I find this topic interesting. Intel designed the OMF format intentionally with a significant amount of complexity to allow such linker calculations as they are done in this sample. It's strange that Masm never supported it. Anyway, with Masm v6, the focus switched to protected-mode, where those features became more or less useless.

Quote
I would like to speak about further plans. First I'll try to do the same and probably a bit more with an XMS environment (only himem.sys loaded) and in a clean raw DOS environment without any memory managers. Having this, we can kick DOS into 64 bit Long Mode and have a full 32 bit legacy mode in the background.
Interesting. I'd prefer to implement the routing of hardware interrupts in long mode to real-mode and back.

Quote
It seems also possible in V86 mode (for example emm386.sys loaded), but it's more tricky. A VCPI client must first switch into Ring 0, before enabling 64 bit Long Mode. That's for later.
It surely IS possible.

Quote
At the end of the road, we would have a kind of a 64 bit DOS Extender, which could also be a Host for 32 bit Protected Mode clients. I hope that will work.

IMO, the minimal work for a 64-bit extender is:
- routing of hardware interrupts to real-mode
- being able to call real-mode interrupts ( DPMI int 31h, ax=0300h )
- memory management of extended memory. The simplest solution probably is to allocate all memory at startup and tell the program what has been found. ( DPMI int 31h, ax=0500h and/or 0501h )
- ability to terminate the program ( DPMI int 21h, ah=4Ch )
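
A rough sketch of the second item, seen from a 32-bit DPMI client. The field order follows the real-mode call structure of the DPMI 0.9 specification; the names RMCS, rmcs and CallRealModeInt are invented for this example, and the code assumes that DS and ES both hold a selector for the segment that contains rmcs:

    RMCS    struct                      ; DPMI real-mode call structure, 32h bytes
    rEDI    dd ?
    rESI    dd ?
    rEBP    dd ?
    rRes    dd ?                        ; reserved, should be zero
    rEBX    dd ?
    rEDX    dd ?
    rECX    dd ?
    rEAX    dd ?
    rFlags  dw ?
    rES     dw ?
    rDS     dw ?
    rFS     dw ?
    rGS     dw ?
    rIP     dw ?                        ; ignored by function 0300h
    rCS     dw ?                        ; ignored by function 0300h
    rSP     dw ?
    rSS     dw ?
    RMCS    ends

    rmcs    RMCS <>                     ; hypothetical instance, filled by the caller

    ; in:  BL = interrupt number, rmcs = real-mode register values
    ; out: CF clear on success, rmcs = register values returned by the handler
    CallRealModeInt proc
            mov     rmcs.rSP, 0         ; SS:SP = 0:0 -> the host supplies a real-mode stack
            mov     rmcs.rSS, 0
            mov     edi, offset rmcs    ; ES:EDI -> call structure
            xor     bh, bh              ; flags, must be zero
            xor     cx, cx              ; no words copied from the protected-mode stack
            mov     ax, 0300h           ; DPMI: simulate real-mode interrupt
            int     31h
            ret
    CallRealModeInt endp

A 64-bit extender would have to offer something equivalent; whether it keeps exactly this register interface is a design decision.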

sinsi

Quote
Interesting. I'd prefer to implement the routing of hardware interrupts in long mode to real-mode and back.
That would be nice. It should be easy since we have all the documentation but it does seem complicated with paging et al.
Real to long can be done in one sequence (skipping protected mode) but the reverse is harder.

Can an interrupt gate switch from long mode to protected mode? Or even real mode? It has a selector, can that selector point to real mode code?
Light travels faster than sound, that's why some people seem bright until you hear them.

Gunther

Quote from: japheth, September 28, at 08:26:19 AM
Interesting. I'd prefer to implement the routing of hardware interrupts in long mode to real-mode and back.

Okay, but we could have both ways.

Quote from: japheth, September 28, at 08:26:19 AM
The simplest solution probably is to allocate all memory at startup and tell the program what has been found.

Yes, that's the way to go.

Quote from: sinsi, September 28, at 08:58:32 AM
Real to long can be done in one sequence (skipping protected mode) but the reverse is harder.

Much harder.

There's another large problem: At the moment, debugging Long Mode applications is not so easy. Bochs has a built in debugger, but will it work with Bochs? I'm not sure.

Gunther

Antariy

Quote from: Gunther on September 28, 2010, 01:51:43 PM
There's another large problem: At the moment, debugging Long Mode applications is not so easy. Bochs has a built in debugger, but will it work with Bochs? I'm not sure.

What about debugging under a virtual machine? This is really possible.
I don't mean debugging in the usual way, inside the guest OS; I mean debugging the virtual machine as a whole. That gives you something like a hardware debugger for the entire (virtual) machine.
But I can't say anything about x64 and VMware, I have never used them. I have experience with Virtual PC 2007 and with some kinds of debugging of RM apps and 32 bit apps from the *host* OS, with a debugger running there.
Of course, this is an insane amount of work (because it is not as simple as debugging under a running OS), especially for the code of a manager like a DPMI host, but if there is no other way...



Alex

Gunther

Quote from: Antariy, September 29, at 12:11:27 AM
What about debugging under Virtual Machine? This is really possible.

Sure, Alex. I would try Bochs; it's free and comes with a built in hardware debugger. For the development of such applications it's not so bad. But at the end of the day it must run under a native DOS.

Gunther

japheth

Quote from: Gunther on September 28, 2010, 01:51:43 PM
Quote from: sinsi, September 28, at 08:58:32 AM
Real to long can be done in one sequence (skipping protected mode) but the reverse is harder.

Much harder.

Maybe I don't understand you two, but why is it harder? In the DOS64.asm sample there is already a switch back from long mode to real-mode implemented and I cannot see much difference in complexity.

The main reason why this sample isn't really suitable as a base for a "DOS extender" is that it does reprogram the PIC. This most likely can't be done if you want to "route" a hardware interrupt to real-mode.

Quote
There's another large problem: At the moment, debugging Long Mode applications is not so easy. Bochs has a built in debugger, but will it work with Bochs? I'm not sure.

I once tried with Qemu. It does work with 64-bit, but it has the ugly GDB-style interface. IMO simple "printfs" should do the job for most cases.

sinsi

Quote from: japheth on September 29, 2010, 07:01:48 AM
The main reason why this sample isn't really suitable as a base for a "DOS extender" is that it does reprogram the PIC. This most likely can't be done if you want to "route" a hardware interrupt to real-mode.
But in that case can't we just do a switch to real mode and then pushf/call far int*4 ?
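
Once back in real mode, the idea would look roughly like this (vector 09h, the keyboard IRQ, is only an example):

    xor     ax, ax
    mov     es, ax                  ; ES -> interrupt vector table at 0000:0000
    pushf                           ; the handler's IRET pops IP, CS and FLAGS
    cli                             ; a real interrupt enters the handler with IF clear
    call    dword ptr es:[9*4]      ; far call through the vector of INT 09h
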

japheth

Quote from: sinsi on September 29, 2010, 07:41:16 AM
But in that case can't we just do a switch to real mode and then pushf/call far int*4 ?
I guess that won't work. A few interrupt handlers enable interrupts with STI as soon as possible to allow higher interrupts to be handled. At least I found such code in some keyboard interrupt handlers.

Gunther

Quote from: japheth, September 29, at 08:01:48 AM
Maybe I don't understand you two, but why is it harder? In the DOS64.asm sample there is already a switch back from long mode to real-mode implemented and I cannot see much difference in complexity.

Might be. Let me explain a few points; it's only a draft, nothing final.

Our program needs at least 3 different segments (without the stack): 16 bit, 32 bit, and 64 bit. It starts as a normal Real Mode client (simple MZ EXE) in a 16 bit segment and can do some preparations. Let's say, we've only HIMEM.SYS installed, to make things cleaner.

While Protected Mode can run without paging, in Long Mode paging with PAE enabled is absolutely necessary. So, what has to be done to enter Long Mode:


  • Turn off paging, if it's enabled.
  • Set PAE.
  • Create new page tables (they should reside inside the first 4 GB) and load CR3 with the address of the top level table.
  • Enable Long Mode (just enabling, not entering).
  • Enable paging (activates and enters Long Mode).
  • We're now in compatibility mode. Enter 64 bit mode by jumping to a 64 bit code segment. The initial 64 bit code must reside in the lower 4 GB because compatibility mode can't "see" 64 bit addresses.
  • The only thing we have to do in Long Mode is to reset RSP.
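
A minimal sketch of these steps, starting in 32 bit Protected Mode at ring 0 (needs .586p or later for RDMSR/WRMSR and CR4). It assumes that the page tables are already built and identity-map this code, that the GDT contains a 64 bit code descriptor, and that DS is flat; pml4_phys, entry64, Entry64 and sel_code64 are invented names, not taken from DOS64.asm:

    cli                             ; no interrupts during the switch
    mov     eax, cr0
    and     eax, 7FFFFFFFh          ; CR0.PG = 0: paging off
    mov     cr0, eax

    mov     eax, cr4
    or      eax, 20h                ; CR4.PAE = 1
    mov     cr4, eax

    mov     eax, pml4_phys          ; physical address of the PML4, below 4 GB
    mov     cr3, eax

    mov     ecx, 0C0000080h         ; EFER MSR
    rdmsr
    or      eax, 100h               ; EFER.LME = 1: enabled, not yet active
    wrmsr

    mov     eax, cr0
    or      eax, 80000000h          ; CR0.PG = 1: Long Mode active, compatibility mode
    mov     cr0, eax

    jmp     fword ptr [entry64]     ; far jump loads CS with the 64 bit code selector

    ; data used above (hypothetical)
    pml4_phys dd ?                  ; filled in when the page tables are created
    entry64   dd offset Entry64     ; 32 bit offset of the first 64 bit instruction
              dw sel_code64         ; selector of the 64 bit code descriptor
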

That was the way up. It follows the way down:


  • Because 0EAh isn't a valid jump in Long Mode, we use the RET trick to go back to our compatibility mode segment:

    xor       rcx, rcx
    mov       ecx, BackTo32
    push      code32               ; 32 bit code selector
    push      rcx
    retf

  • Load the segment registers with valid 32 bit selectors.
  • Disable paging.
  • De-activate Long Mode.
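
The way down, sketched under the same assumptions (identity-mapped code, stack below 4 GB, ring 0, interrupts off). BackTo32 is the label from the RET trick above; sel_data32 is an invented name for a 32 bit data selector:

    BackTo32:                       ; we arrive here in compatibility mode
    mov     ax, sel_data32          ; reload the data segment registers
    mov     ds, ax
    mov     es, ax
    mov     ss, ax

    mov     eax, cr0
    and     eax, 7FFFFFFFh          ; CR0.PG = 0: paging off, Long Mode no longer active
    mov     cr0, eax

    mov     ecx, 0C0000080h         ; EFER MSR
    rdmsr
    and     eax, 0FFFFFEFFh         ; EFER.LME = 0
    wrmsr

    mov     eax, cr4
    and     eax, 0FFFFFFDFh         ; CR4.PAE = 0 (optional)
    mov     cr4, eax
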

Now, we're back in the "dirty old" 32 bit Protected Mode. We can go back to 16 bit Protected Mode and terminate cleanly. Game over.

It seems to me that could work. But we know: it's always the small things that cause problems.

Gunther

japheth

Quote from: Gunther on September 29, 2010, 02:49:54 PM
Our program needs at least 3 different segments (without the stack): 16 bit, 32 bit, and 64 bit.
Hm, the DOS64 sample just needs 16-bit and 64-bit. You can switch directly from 16-bit protected-mode to 64-bit and vice versa.

Quote
Because 0EAh isn't a valid jump in Long Mode, we use the RET trick to go back to our compatibility mode segment:

    xor       rcx, rcx
    mov       ecx, BackTo32
    push      code32               ; 32 bit code selector
    push      rcx
    retf


The RET trick works, but is unnecessary, because there IS a far jmp/call in long mode, it's just that the argument must not be an immediate operand. So this works also:


    jmp [dstaddr]
dstaddr df <32bit_proc | 16bit_proc>