I use C++ and MFC for the bulk of my programming (Visual Studio 2010). However, I've got some _asm blocks I'm trying to move to MASM64 as we get our code x64-ready.
I've got an _asm block that uses 'float' values to scale an image. The replacement MASM64 PROC seems to work, since the pixels it produces match the pixels from the _asm block. But the image never displays, and when I terminate the application I get tons of memory leaks. If I replace the MASM64 PROC with a subroutine that hardcodes the resulting image to a bunch of white pixels, I get a white image and no memory leaks.
I've enclosed the MASM64 PROC below. Comments have been removed for space (and I don't expect anyone to attempt to debug the whole PROC searching for some minor flaw). But I was hoping that someone could review the way the PROC is structured and tell me if there is something at a fundamental level that I might be doing wrong.
In the C++ code, the MASM64 PROC is declared with
extern "C" void SuperScale_asm( COLORREF* pSrc, int uSrcWidth, //uSrcHeight : not required
COLORREF* pDst, int uResWidth, int uResHeight,
LineContribType* YContrib, LineContribType* XContrib,
float* RGBArray );
All variables declared outside of the _asm block yet used within the _asm block have simply been passed to the MASM64 PROC. No functions or memory allocations occur within the code (that is all done before the _asm code executes).
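Because the struct pointers cross the C++/assembly boundary, one cheap sanity check is to assert the layout at compile time. The sketch below is only that: the C++ struct definitions are assumed from the MASM structs in the include file that follows, not copied from the real project headers, and kContribStride is a hypothetical name introduced for illustration.

```cpp
#include <cstddef>

// ASSUMPTION: C++ counterparts of the MASM structs in Fast2PassScale.inc.
// The real project headers may declare these differently.
struct ContributionType {
    int    Left;     // first contributing source index
    int    Right;    // last contributing source index
    float* Weights;  // Right - Left + 1 weights
};

struct LineContribType {
    ContributionType* ContribRow;  // one entry per output line
    int WindowSize;
    int LineLength;
};

// Hypothetical helper constant: bytes between consecutive ContribRow entries.
static const std::size_t kContribStride = sizeof(ContributionType);

// On a 64-bit target a pointer is 8 bytes, so Weights sits at offset 8 and
// each ContribRow entry occupies 16 bytes. The checks are skipped on 32-bit.
static_assert(sizeof(void*) != 8 || offsetof(ContributionType, Weights) == 8,
              "Weights is expected at offset 8 on x64");
static_assert(sizeof(void*) != 8 || kContribStride == 16,
              "ContribRow entries are expected to be 16 bytes apart on x64");
static_assert(sizeof(void*) != 8 || offsetof(LineContribType, WindowSize) == 8,
              "WindowSize is expected to follow the 8-byte ContribRow pointer");
```

If the real headers match this layout, any fixed byte stride used to walk ContribRow in the assembly (the listing advances by 12) would have to agree with kContribStride.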
;Content of Fast2PassScale.inc
PDWORD TYPEDEF PTR DWORD
PCOLORREF TYPEDEF PTR DWORD
Cint TYPEDEF SDWORD
Cfloat TYPEDEF REAL4
PCfloat TYPEDEF PTR Cfloat
ContributionType struct
Left Cint ?
Right Cint ?
Weights PCfloat ?
ContributionType ends
PContributionType TYPEDEF PTR ContributionType
LineContribType struct
ContribRow PContributionType ?
WindowSize Cint ?
LineLength Cint ?
LineContribType ends
PLineContribType TYPEDEF PTR LineContribType
;Content of Fast2PassScale.asm
option casemap :none
include Fast2PassScale64.inc
.data
.code
public SuperScale_asm
SuperScale_asm PROC p1:PCOLORREF, p2:Cint, p3:PCOLORREF, p4:Cint, uResHeight:Cint, YContrib: PLineContribType, XContrib: PLineContribType, RGBArray: PCfloat
LOCAL ContribPtrX :PDWORD
LOCAL ContribTempPtr :PDWORD
LOCAL ContribPtrY :PDWORD
LOCAL YWeightPtr :PCfloat
LOCAL RGBArrPtr :PCfloat
LOCAL BVal :DWORD
LOCAL GVal :DWORD
LOCAL RVal :DWORD
LOCAL YDelta :DWORD
LOCAL YCounter :DWORD
LOCAL XCounter :DWORD
LOCAL ColumnCounter :DWORD
mov r10, rcx
mov r11d, edx
shl edx, 2
mov r12d, edx
xor eax, eax
mov BVal, eax
mov GVal, eax
mov RVal, eax
mov rax, YContrib
mov rax, [rax]
sub rax, 12
mov ContribPtrY, rax
mov YCounter, 0
ALIGN 16
VerticalLoop:
mov rbx, XContrib
mov rbx, [rbx]
sub rbx, 12
mov ContribPtrX, rbx
add ContribPtrY, 12
mov rdi, ContribPtrY
mov ecx, Cint ptr [rdi]
mov esi, Cint ptr [rdi + 4]
sub esi, ecx
inc esi
mov YDelta, esi
mov eax, r12d
imul eax, ecx
add rax, r10
sub rax, 4
mov rsi, rax
mov rdi, [rdi + 8]
mov YWeightPtr, rdi
mov eax, r11d
mov rcx, RGBArray
mov ColumnCounter, eax
mov RGBArrPtr, rcx
ALIGN 16
ColumnLoop:
mov ecx, YDelta
mov rdi, YWeightPtr
add rsi, 4
mov rdx, rsi
fldz
fldz
fldz
ALIGN 16
YWeightingLoop:
fld dword ptr[rdi]
movzx eax, byte ptr [rdx]
movzx ebx, byte ptr [rdx + 1]
mov BVal, eax
movzx eax, byte ptr [rdx + 2]
mov GVal, ebx
mov RVal, eax
fild BVal
fmul st(0), st(1)
fxch
add edx, r12d
fild GVal
fmul st(0), st(1)
fxch
add rdi, 4
fild RVal;
fmulp st(1), st(0)
fxch st(2)
faddp st(3), st(0)
faddp st(3), st(0)
faddp st(3), st(0)
dec rcx
jnz short YWeightingLoop
mov rcx, RGBArrPtr
fstp dword ptr [ecx]
fstp dword ptr [ecx + 4]
fstp dword ptr [ecx + 8]
add RGBArrPtr, 12
dec ColumnCounter
jnz short ColumnLoop
mov eax, r9d
mov XCounter, eax
mov rdx, r8
mov rax, ContribPtrX
mov ContribTempPtr, rax
ALIGN 16
RowLoop:
add ContribTempPtr, 12
mov rax, ContribTempPtr
mov rbx, RGBArray
mov rdi, rax
mov rcx, [rax]
mov rsi, [rdi + 4]
sub rsi, rcx
mov rdi, [rdi + 8]
lea rax, [rcx * 8 + rbx]
lea rbx, [rax + rcx * 4]
inc rsi
mov rax, 4
fldz
fldz
fldz
ALIGN 16
XWeightingLoop:
fld dword ptr[rdi]
fld dword ptr [rbx]
fmul st(0), st(1)
fxch
add rdi, rax
fld dword ptr [rbx + rax]
fmul st(0), st(1)
add rbx, 12
fxch
fld dword ptr [rbx + 2 * rax - 12]
fmulp st(1), st(0)
fxch st(2)
dec rsi
faddp st(3), st(0)
faddp st(3), st(0)
faddp st(3), st(0)
jnz short XWeightingLoop
fistp BVal
fistp GVal
fistp RVal
mov ebx, RVal
rol ebx, 8
or ebx, GVal
rol ebx,8
or ebx, BVal
mov dword ptr [rdx], ebx
lea rdx, [rdx + 4]
dec XCounter
jnz RowLoop
mov r8, rdx
inc YCounter
mov eax, YCounter
cmp eax, uResHeight
jb VerticalLoop
ret
SuperScale_asm ENDP
end
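For comparing outputs pixel by pixel, a plain C++ version of the two passes in the listing above can be sketched as follows. This is a reading of the assembly, not the original C++ code: the struct definitions, the Pixel typedef (standing in for COLORREF) and the names SuperScaleReference and RoundToByte are assumptions made for illustration. It also clamps each channel to 0..255, which the listing does not do, and it accumulates in float rather than on the x87 stack, so small rounding differences are possible.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

typedef std::uint32_t Pixel;  // ASSUMPTION: stands in for COLORREF (32-bit)

// ASSUMPTION: layouts mirror the MASM structs in the include file.
struct ContributionType { int Left; int Right; float* Weights; };
struct LineContribType  { ContributionType* ContribRow; int WindowSize; int LineLength; };

// Round to nearest and clamp to one byte (the listing uses fistp, no clamp).
static Pixel RoundToByte(float v)
{
    const float r = std::floor(v + 0.5f);
    if (r < 0.0f)   return 0;
    if (r > 255.0f) return 255;
    return static_cast<Pixel>(r);
}

void SuperScaleReference(const Pixel* pSrc, int uSrcWidth,
                         Pixel* pDst, int uResWidth, int uResHeight,
                         const LineContribType* YContrib,
                         const LineContribType* XContrib,
                         float* RGBArray)  // scratch: 3 floats per source column
{
    assert(pSrc && pDst && YContrib && XContrib && RGBArray);
    for (int y = 0; y < uResHeight; ++y) {
        const ContributionType& yc = YContrib->ContribRow[y];
        // Pass 1: blend source rows Left..Right into one row of float channels.
        for (int x = 0; x < uSrcWidth; ++x) {
            float sum[3] = { 0.0f, 0.0f, 0.0f };
            for (int k = 0; k <= yc.Right - yc.Left; ++k) {
                const Pixel p = pSrc[(yc.Left + k) * uSrcWidth + x];
                const float w = yc.Weights[k];
                for (int c = 0; c < 3; ++c)
                    sum[c] += w * static_cast<float>((p >> (8 * c)) & 0xFF);
            }
            for (int c = 0; c < 3; ++c)
                RGBArray[3 * x + c] = sum[c];
        }
        // Pass 2: blend that float row horizontally, then round and pack.
        for (int x = 0; x < uResWidth; ++x) {
            const ContributionType& xc = XContrib->ContribRow[x];
            float sum[3] = { 0.0f, 0.0f, 0.0f };
            for (int k = 0; k <= xc.Right - xc.Left; ++k) {
                const float w = xc.Weights[k];
                for (int c = 0; c < 3; ++c)
                    sum[c] += w * RGBArray[3 * (xc.Left + k) + c];
            }
            Pixel out = 0;
            for (int c = 0; c < 3; ++c)
                out |= RoundToByte(sum[c]) << (8 * c);  // byte 0 is channel 0
            pDst[y * uResWidth + x] = out;
        }
    }
}
```

Running this and the PROC on the same inputs and diffing pDst gives a check that does not depend on the image actually displaying.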
Last time I checked, ml64 didn't handle parameters or local variables at all.
Quote from: tofu-sensei on January 24, 2012, 11:19:43 PM
Last time I checked, ml64 didn't handle parameters or local variables at all.
???
The code assembles without error. So if ml64 is not handling LOCAL variables, then it's not telling me about it... unless your point is that ml64 doesn't adjust the stack properly to "handle" them. Otherwise, when I step through the code in the disassembly window, I see things like
mov dword ptr [rbp-2Ch],eax (negative offsets) for LOCAL variables, and
mov rcx,qword ptr [rbp+48h] (positive offsets) for parameters
A little more information:
After executing the 1st primary loop, at some point (I haven't narrowed it down yet) the code seems to exit the ASM code earlier than it is supposed to, and when it does, it skips the line in the calling function that runs some cleanup (delete) code. At least that points to why I've got memory leaks. It also appears that something modifies the code. When I reenter the ASM code in the disassembly window, parts of the code have obviously changed (starting to sound like a bug with a pointer not being processed correctly somewhere in all that ASM code).
But does it actually set up a stack frame? You're also not saving any of the nonvolatile registers you're using.
Quote from: tofu-sensei on January 26, 2012, 06:03:41 PM
But does it actually set up a stack frame? You're also not saving any of the nonvolatile registers you're using.
Well, this is only my 2nd foray into MASM x64. The 1st time, I didn't seem to need to set up a stack frame because it looked like the compiler was doing that for me.
As I said, the function is declared as extern "C" void SuperScale_asm. Here's what the disassembly window shows about how the C++ compiler calls the function:
1. Four parameters are loaded onto the stack @ [rsp+20h], [rsp+28h], [rsp+30h], and [rsp+38h].
2. The other four parameters are loaded into r9d, r8, edx, and rcx.
3. call (SuperScale_asm) (which goes to a jmp SuperScale_asm command)
Then I assume this is the stack frame being set up:
4. push rbp
5. mov rbp,rsp
6. add rsp, 0FFFFFFFFFFFFFF98h
7. mov register parameters to [rbp-8], [rbp-0Ch], [rbp-14h] and [rbp-18h]
Then my asm source code starts executing.
8. Function ends with leave followed by ret.
What else should I be doing?
Additional update: I found that part of my problem was where I use #ifdef WIN64 to determine whether SuperScale_asm or the old _asm block gets called. I had some destructors duplicated in the WIN64 code. Once I got the double destructor call worked out, the Debug version of the logic ran just fine. But when I try it in Release mode, things go wrong. I tried to add some debug code by inserting message boxes. The subroutine seems to run fine, but when I try to write the results to a file, my message boxes stop appearing the first time I attempt to access the data (and C++ try/catch blocks around the subroutine don't catch anything).
Quote from: tofu-sensei on January 30, 2012, 05:55:48 PM
Quote from: HooKooDooKu on January 30, 2012, 04:11:35 PM
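What else should I be doing?
Save rbx, rsi, rdi and r12. The Windows x64 calling convention treats them as nonvolatile, so a function that uses them must restore their original values before it returns.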
That seems to be working. Thx
Not sure why I hadn't run into this being an issue before.
What about the floating point registers? Do I need to do anything with them?
I've never worked with the floating point registers before. I did some research on them and found it strange that in the 32-bit code, when the 32-bit equivalent function is called, the TAGS register shows FFFF, indicating the floating point stack is empty. But in 64-bit, TAGS shows 0000 when the 64-bit function is called. The floating point tutorial (referenced in the links at the top right of this MASM web page) indicates that this means the floating point registers are loaded with valid non-zero numbers (but ST0 is zero).