
Title: Application Blowing up with _asm to MASM64 Conversion
Post by: HooKooDooKu on January 24, 2012, 10:22:54 PM

I use C++ and MFC for the bulk of my programming (Visual Studio 2010). However, I've got some _asm blocks that I'm trying to move to MASM64 as we get our code x64 ready.

I've got an _asm block that uses 'float' values to scale an image. The replacement MASM64 PROC seems to be working, as the resulting pixels coming from the MASM64 PROC match the pixels from the _asm block. But the image never displays, and when I terminate the application I get tons of memory leaks. If I replace the MASM64 PROC with a subroutine that hardcodes the resulting image to a bunch of white pixels, I get a white image and no memory leaks.

I've enclosed the MASM64 PROC below. Comments have been removed for space (and I don't expect anyone to attempt to debug the whole PROC searching for some minor flaw). But I was hoping that someone could review the way the PROC is structured and tell me if there is something at a fundamental level that I might be doing wrong.

In the C++ code, the MASM64 PROC is declared with
extern "C" void SuperScale_asm( COLORREF* pSrc, int uSrcWidth, //uSrcHeight : not required
                                COLORREF* pDst, int uResWidth, int uResHeight,
                                LineContribType* YContrib, LineContribType* XContrib,
                                float* RGBArray );

All variables declared outside of the _asm block yet used within the _asm block have simply been passed to the MASM64 PROC. No functions or memory allocations occur within the code (that is all done before the _asm code executes).

;Content of Fast2PassScale.inc
PDWORD TYPEDEF PTR DWORD
PCOLORREF TYPEDEF PTR DWORD
Cint TYPEDEF SDWORD
Cfloat TYPEDEF REAL4
PCfloat TYPEDEF PTR Cfloat

ContributionType   struct
Left Cint ?
Right Cint ?
Weights    PCfloat ?
ContributionType ends
PContributionType TYPEDEF PTR ContributionType

LineContribType    struct
ContribRow        PContributionType ?
WindowSize Cint           ?
LineLength         Cint           ?
LineContribType ends
PLineContribType TYPEDEF PTR LineContribType

;Content of Fast2PassScale.asm
option casemap :none
include Fast2PassScale64.inc
.data
.code
public SuperScale_asm
SuperScale_asm PROC p1:PCOLORREF, p2:Cint, p3:PCOLORREF, p4:Cint, uResHeight:Cint, YContrib: PLineContribType, XContrib: PLineContribType, RGBArray: PCfloat
   LOCAL ContribPtrX      :PDWORD
   LOCAL ContribTempPtr   :PDWORD
   LOCAL ContribPtrY      :PDWORD
   LOCAL YWeightPtr       :PCfloat
   LOCAL RGBArrPtr        :PCfloat
   LOCAL BVal             :DWORD
   LOCAL GVal             :DWORD
   LOCAL RVal             :DWORD
   LOCAL YDelta           :DWORD
   LOCAL YCounter         :DWORD
   LOCAL XCounter         :DWORD
   LOCAL ColumnCounter    :DWORD
   mov r10, rcx
   mov r11d, edx
   shl edx, 2
   mov r12d, edx
   xor eax, eax
   mov BVal, eax
   mov GVal, eax
   mov RVal, eax
   mov rax, YContrib
   mov rax, [rax]
   sub rax, 12
   mov ContribPtrY, rax
   mov YCounter, 0
   ALIGN 16
   VerticalLoop:
      mov rbx, XContrib
      mov rbx, [rbx]
      sub rbx, 12
      mov ContribPtrX, rbx
      add ContribPtrY, 12
      mov rdi, ContribPtrY
      mov ecx, Cint ptr [rdi]
      mov esi, Cint ptr [rdi + 4]
      sub esi, ecx
      inc esi
      mov YDelta, esi
      mov eax, r12d
      imul eax, ecx
      add rax, r10
      sub rax, 4
      mov rsi, rax
      mov rdi, [rdi + 8]
      mov YWeightPtr, rdi
      mov eax, r11d
      mov rcx, RGBArray
      mov ColumnCounter, eax
      mov RGBArrPtr, rcx
      ALIGN 16
      ColumnLoop:
         mov ecx, YDelta
         mov rdi, YWeightPtr
         add rsi, 4
         mov rdx, rsi
         fldz
         fldz
         fldz
         ALIGN 16
         YWeightingLoop:
            fld dword ptr[rdi]
            movzx eax, byte ptr [rdx]
            movzx ebx, byte ptr [rdx + 1]
            mov BVal, eax
            movzx eax, byte ptr [rdx + 2]
            mov GVal, ebx
            mov RVal, eax
            fild BVal
            fmul st(0), st(1)
            fxch
            add edx, r12d
            fild GVal
            fmul st(0), st(1)
            fxch
            add rdi, 4
            fild RVal;
            fmulp st(1), st(0)
            fxch st(2)
            faddp st(3), st(0)
            faddp st(3), st(0)
            faddp st(3), st(0)
            dec rcx
            jnz short YWeightingLoop
         mov rcx, RGBArrPtr
         fstp dword ptr [ecx]
         fstp dword ptr [ecx + 4]
         fstp dword ptr [ecx + 8]
         add RGBArrPtr, 12
         dec ColumnCounter
      jnz short ColumnLoop
      mov eax, r9d
      mov XCounter, eax
      mov rdx, r8
      mov rax, ContribPtrX
      mov ContribTempPtr, rax
      ALIGN 16
      RowLoop:
         add ContribTempPtr, 12
         mov rax, ContribTempPtr
         mov rbx, RGBArray
         mov rdi, rax
         mov rcx, [rax]
         mov rsi, [rdi + 4]
         sub rsi, rcx
         mov rdi, [rdi + 8]
         lea rax, [rcx * 8 + rbx]
         lea rbx, [rax + rcx * 4]
         inc rsi
         mov rax, 4
         fldz
         fldz
         fldz
         ALIGN 16
         XWeightingLoop:
            fld dword ptr[rdi]
            fld dword ptr [rbx]
            fmul st(0), st(1)
            fxch
            add rdi, rax
            fld dword ptr [rbx + rax]
            fmul st(0), st(1)
            add rbx, 12
            fxch
            fld dword ptr [rbx + 2 * rax - 12]
            fmulp st(1), st(0)
            fxch st(2)
            dec rsi
            faddp st(3), st(0)
            faddp st(3), st(0)
            faddp st(3), st(0)
         jnz short XWeightingLoop
         fistp BVal
         fistp GVal
         fistp RVal
         mov ebx, RVal
         rol ebx, 8
         or ebx, GVal
         rol ebx,8
         or ebx, BVal
         mov dword ptr [rdx], ebx
         lea rdx, [rdx + 4]
         dec XCounter
      jnz RowLoop
      mov r8, rdx
      inc YCounter
      mov eax, YCounter
      cmp eax, uResHeight
   jb VerticalLoop
   ret
SuperScale_asm ENDP
end
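
One detail in the listing above that may be worth checking is the record stride. The loops step through the contribution records 12 bytes at a time, which fits a 32-bit layout (two 4-byte ints plus a 4-byte pointer). The sketch below mirrors the two structs from the .inc file in C++ so their size and field offsets can be inspected in whichever build compiles it. The C++ definitions and the helper names `contribution_stride` and `check_layout` are assumptions reconstructed from the .inc file; the thread does not show the real C++ headers.

```cpp
#include <cassert>
#include <cstddef>

// Assumed C++ counterparts of the structs in the .inc file;
// the original C++ headers are not shown in the thread.
struct ContributionType {
    int    Left;
    int    Right;
    float* Weights;
};

struct LineContribType {
    ContributionType* ContribRow;
    int               WindowSize;
    int               LineLength;
};

// Byte distance between consecutive ContributionType records, i.e. the
// value pointer arithmetic in assembly has to add to reach the next record.
constexpr std::size_t contribution_stride() {
    return sizeof(ContributionType);
}

// Sanity check of the field offsets the assembly relies on.
// These hold for both 4-byte and 8-byte pointers.
inline void check_layout() {
    assert(offsetof(ContributionType, Left) == 0);
    assert(offsetof(ContributionType, Right) == sizeof(int));
    assert(offsetof(ContributionType, Weights) == 2 * sizeof(int));
}
```

With 8-byte pointers, `contribution_stride()` is 16 rather than 12, so comparing that value with the constants used in the PROC is a cheap check.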

Title: Re: Application Blowing up with _asm to MASM64 Conversion
Post by: tofu-sensei on January 24, 2012, 11:19:43 PM

last time i checked ml64 didn't handle parameters or local variables at all.

Title: Re: Application Blowing up with _asm to MASM64 Conversion
Post by: HooKooDooKu on January 25, 2012, 09:19:13 PM

Quote from: tofu-sensei on January 24, 2012, 11:19:43 PM
last time i checked ml64 didn't handle parameters or local variables at all.

???

The code will assemble without error. So if it is not handling LOCAL variables, then it's not telling me about it... that is, unless your point is that ml64 doesn't handle modifying the stack properly to "handle" them. Otherwise, when I step through the code with the disassembly window, I see things like

mov dword ptr [rbp-2Ch],eax

(negative offsets) for LOCAL variables, and

mov rcx,qword ptr [rbp+48h]

(positive offsets) for Parameters
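
The positive offsets can be cross-checked against the Microsoft x64 calling convention. The sketch below assumes the callee starts with `push rbp` / `mov rbp, rsp`, so that `[rbp]` holds the saved rbp and `[rbp+8]` the return address. The function names are hypothetical. Under that assumption, `[rbp+48h]` is the slot of the eighth argument, which in this PROC would be RGBArray; the thread does not state which parameter that instruction actually loads.

```cpp
#include <cassert>
#include <cstdint>

// Offset from rsp at which the caller places the slot for the n-th
// argument (1-based) just before the call instruction. Slots 1..4 are
// the "home" space for rcx, rdx, r8, r9; slot 5 onward hold the
// stack-passed arguments.
constexpr std::int32_t caller_rsp_offset_of_arg(int n) {
    return 8 * (n - 1);
}

// Offset from rbp of the same slot inside the callee, assuming the
// prologue is `push rbp` / `mov rbp, rsp`:
//   +8 for the return address pushed by call,
//   +8 for the saved rbp.
constexpr std::int32_t rbp_offset_of_arg(int n) {
    return caller_rsp_offset_of_arg(n) + 0x10;
}

static_assert(rbp_offset_of_arg(1) == 0x10, "home slot of the first argument");
static_assert(rbp_offset_of_arg(5) == 0x30, "first stack-passed argument");
```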

Title: Re: Application Blowing up with _asm to MASM64 Conversion
Post by: HooKooDooKu on January 25, 2012, 10:51:40 PM

A little more information:

After executing the 1st primary loop, at some point (I haven't narrowed it down yet) the code seems to be exiting the ASM code earlier than it is supposed to, and when it does, it skips executing the line of code in the calling function that executes some cleanup (delete) code. At least that points to why I've got memory leaks. It also appears that something modifies the code. When I reenter the ASM code in the disassembly window, parts of the code have obviously changed (starting to sound like a bug with a pointer not getting processed within all that ASM code correctly).

Title: Re: Application Blowing up with _asm to MASM64 Conversion
Post by: tofu-sensei on January 26, 2012, 06:03:41 PM

but does it actually set up a stack frame? you're also not saving any nonvolatile registers you're using.

Title: Re: Application Blowing up with _asm to MASM64 Conversion
Post by: HooKooDooKu on January 30, 2012, 04:11:35 PM

Quote from: tofu-sensei on January 26, 2012, 06:03:41 PM
but does it actually set up a stack frame? you're also not saving any nonvolatile registers you're using.

Well, this is only my 2nd foray into MASM x64. The 1st time, I didn't seem to need to set up a stack frame because it looked like the compiler was doing that for me.

As I said, the function is declared extern "C" void SuperScale_asm. Here's what the disassembly window shows about how the C++ compiler is calling the function:
1. Four parameters are loaded onto the stack @ [rsp+20h], [rsp+28h], [rsp+30h], and [rsp+38h].
2. The other four parameters are loaded into r9d, r8, edx, and rcx.
3. call (SuperScale_asm) (which goes to a jmp SuperScale_asm command)
Then I assume this is the setting up of the stack frame
4. push rbp
5. mov rbp,rsp
6. add rsp, 0FFFFFFFFFFFFFF98h
7. mov register parameters to [rbp-8], [rbp-0Ch], [rbp-14h] and [rbp-18h]
Then my asm source code starts executing.
8. Function ends with leave followed by ret.

What else should I be doing?
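
The constant in step 6 is easier to read as a signed value. The sketch below (the helper name `as_signed` is hypothetical) interprets a 64-bit immediate as two's complement, which shows that `add rsp, 0FFFFFFFFFFFFFF98h` subtracts 68h (104) bytes from rsp to reserve space for the LOCAL variables.

```cpp
#include <cassert>
#include <cstdint>

// Interpret a 64-bit pattern as a signed two's-complement value.
// Written arithmetically so that it does not depend on how the
// compiler converts out-of-range unsigned values to signed ones.
constexpr std::int64_t as_signed(std::uint64_t bits) {
    return (bits & 0x8000000000000000ull)
        ? -static_cast<std::int64_t>(~bits) - 1
        : static_cast<std::int64_t>(bits);
}

static_assert(as_signed(0x68) == 104, "positive values pass through unchanged");
```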

Additional update: I found that a part of my problem was where I'm using #ifdef WIN64 to determine whether SuperScale_asm is called or the old _asm block gets called. I had some destructors duplicated in the WIN64 code. Once I got the double destructor call worked out, the Debug version of the logic ran just fine. But when I try it in Release mode, well, things just go wrong. I tried to put in some debug code by inserting message boxes. The subroutine seems to run fine, but when I try to write the results to a file, my message boxes quit appearing the first time I attempt to access the data (and C++ try/catch blocks around the subroutine don't catch anything).

Title: Re: Application Blowing up with _asm to MASM64 Conversion
Post by: tofu-sensei on January 30, 2012, 05:55:48 PM

Quote from: HooKooDooKu on January 30, 2012, 04:11:35 PM
What else should I be doing.

save rbx, rsi, rdi, r12
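
The list above follows from the Microsoft x64 calling convention, in which rbx, rbp, rsi, rdi, rsp and r12 through r15 are nonvolatile and must hold the same values on return as on entry. As a rough sketch, the check can be written out as below. The helper names are hypothetical, and the list of registers written by the PROC was read off the listing in the first post by hand, so treat it as an assumption.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <string>
#include <vector>

// General-purpose registers that are nonvolatile (callee-saved) in the
// Microsoft x64 calling convention. rsp is also nonvolatile but is
// restored by a balanced prologue/epilogue rather than saved explicitly.
const std::array<const char*, 8> kNonvolatile = {
    "rbx", "rbp", "rsi", "rdi", "r12", "r13", "r14", "r15"
};

// Returns, in order of first use and without duplicates, the registers
// from `used` that the routine has to save and restore.
std::vector<std::string> registers_to_save(const std::vector<std::string>& used) {
    std::vector<std::string> result;
    for (const auto& reg : used) {
        const bool nonvolatile =
            std::find_if(kNonvolatile.begin(), kNonvolatile.end(),
                         [&](const char* name) { return reg == name; })
            != kNonvolatile.end();
        const bool seen =
            std::find(result.begin(), result.end(), reg) != result.end();
        if (nonvolatile && !seen) {
            result.push_back(reg);
        }
    }
    return result;
}

// Registers written by the SuperScale_asm body, as full 64-bit names.
// rbp is left out because the generated prologue already saves it.
std::vector<std::string> superscale_written_registers() {
    return {"rax", "rbx", "rcx", "rdx", "rsi", "rdi", "r8", "r10", "r11", "r12"};
}
```

Feeding `superscale_written_registers()` into `registers_to_save` gives rbx, rsi, rdi, r12. In the PROC itself this would typically mean pushing those four registers after the prologue and popping them in reverse order before the return.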

Title: Re: Application Blowing up with _asm to MASM64 Conversion
Post by: HooKooDooKu on January 30, 2012, 08:30:55 PM

Quote from: tofu-sensei on January 30, 2012, 05:55:48 PM
Quote from: HooKooDooKu on January 30, 2012, 04:11:35 PM
What else should I be doing.
save rbx, rsi, rdi, r12

That seems to be working. Thx
Not sure why I hadn't run into this being an issue before.

What about the floating point registers? Do I need to do anything with them?

I've never worked with the floating point registers before. I did some research on them and found it strange that in the 32-bit code, when the 32-bit equivalent function was called, the TAGS register shows FFFF, indicating the floating point stack is empty. But in 64-bit, TAGS shows 0000 when the 64-bit function is called. The floating point tutorial (referenced in the links at the top right of this MASM web page) indicates that this means the floating point registers are loaded with valid non-zero numbers (but ST) is zero).
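
For reference, the full 16-bit x87 tag word uses two bits per physical register: 00 valid, 01 zero, 10 special (NaN, infinity, denormal), 11 empty. The sketch below decodes a value in that encoding; the type and function names are hypothetical. Whether the 64-bit debugger displays TAGS in this same full encoding is not established in the thread, so a reading of 0000 there may not mean the same thing as it would in the 32-bit view.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Two-bit tag values of the full x87 tag word.
enum class X87Tag : std::uint8_t { Valid = 0, Zero = 1, Special = 2, Empty = 3 };

// Decode the 16-bit tag word into one tag per physical register:
// bits 1:0 describe R0, bits 3:2 describe R1, ..., bits 15:14 describe R7.
// Mapping physical registers to ST(i) additionally needs the TOP field
// of the status word, which this sketch does not model.
std::array<X87Tag, 8> decode_tag_word(std::uint16_t tag_word) {
    std::array<X87Tag, 8> tags{};
    for (int i = 0; i < 8; ++i) {
        tags[i] = static_cast<X87Tag>((tag_word >> (2 * i)) & 0x3);
    }
    return tags;
}

// True when every physical register carries the expected tag.
bool all_tags_are(std::uint16_t tag_word, X87Tag expected) {
    for (X87Tag tag : decode_tag_word(tag_word)) {
        if (tag != expected) {
            return false;
        }
    }
    return true;
}
```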

The MASM Forum Archive 2004 to 2012

Project Support Forums => 64 Bit Assembler => Topic started by: HooKooDooKu on January 24, 2012, 10:22:54 PM