Save/Restore xmm6/xmm7

Started by HooKooDooKu, January 31, 2012, 05:41:11 PM

HooKooDooKu

A few months ago, I wrote a PROC that utilized SSE2 instructions and xmm0 - xmm7 registers.

Apparently I've gotten lucky so far, because I only recently learned about the x64 calling convention and the need to preserve the xmm6 & xmm7 registers.

But I can't find any reference to the typical push/pop equivalent for saving xmm6 & xmm7. What is it?

qWord

movdqu OWORD ptr [rsp+x],...   or
movaps OWORD ptr [rsp+x],...   or
movapd OWORD ptr [rsp+x],...
(movaps/movapd need [rsp+x] to be 16-byte aligned; movdqu does not.)
FPU in a trice: SmplMath
It's that simple!

HooKooDooKu

Quote from: qWord on January 31, 2012, 07:24:34 PM
movdqu OWORD ptr [rsp+x],......

But before I do that, wouldn't I need to make room for them by manually manipulating the stack?

For example, wouldn't I need to do the following at the start of the PROC...

push rbx
push rdi
sub rsp, 32  ;Move Stack Pointer down 32 bytes
movdqu OWORD ptr [rsp], xmm6
movdqu OWORD ptr [rsp], xmm7


Then clean up at the end of the PROC with...

movdqu xmm7, OWORD ptr [rsp+16]
movdqu xmm6, OWORD ptr [rsp]
add rsp, 32
pop rdi
pop rbx
ret


Keep in mind my prior experience with ASM has been using _asm blocks inside C/C++ code.  So all the prolog/epilog logic and stack manipulation has been done for me.  I'm basically just learning exactly how the stack is used.
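The alignment question behind this prolog comes down to a little arithmetic. The Python sketch below traces RSP through the two pushes and the 32-byte allocation. It assumes, as the x64 convention requires, that RSP is 16-byte aligned just before the CALL; the start address 0x1000 and the helper names `trace_prolog` and `aligned16` are purely illustrative.

```python
# Trace RSP through: CALL, push rbx, push rdi, then a 32-byte allocation.
# Assumption: RSP is 16-byte aligned right before the CALL (x64 convention);
# 0x1000 is an arbitrary illustrative address.

def trace_prolog(start_rsp, pushes, sub_bytes):
    """Return RSP after the CALL, `pushes` 8-byte pushes and `sub rsp, sub_bytes`."""
    rsp = start_rsp - 8          # CALL pushes the 8-byte return address
    rsp -= 8 * pushes            # each push of a 64-bit register
    rsp -= sub_bytes             # explicit stack allocation
    return rsp

def aligned16(addr):
    """movdqa/movaps fault unless the address is a multiple of 16."""
    return addr % 16 == 0

rsp = trace_prolog(0x1000, pushes=2, sub_bytes=32)
print(rsp % 16, aligned16(rsp), aligned16(rsp + 16))       # prints: 8 False False

rsp40 = trace_prolog(0x1000, pushes=2, sub_bytes=40)
print(rsp40 % 16, aligned16(rsp40), aligned16(rsp40 + 16)) # prints: 0 True True
```

So with two pushes and a 32-byte allocation, neither [rsp] nor [rsp+16] is 16-byte aligned: movdqu works there, but movdqa/movaps would fault. Allocating 40 bytes (8 bytes of padding) aligns both slots.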

HooKooDooKu

Oops, the prolog was supposed to include:
movdqu xmm6, OWORD ptr [rsp]
movdqu xmm7, OWORD ptr [rsp+16]


HooKooDooKu

Still getting it wrong; the operands are backward.

Please ignore what I typed and read what I MEANT to type.

qWord

Hi,
the required space must be allocated with sub/add. You must also keep track of the stack alignment: RSP is misaligned by 8 at function entry (the CALL pushed the return address). The most important parts for stack-using functions are the entries in the .pdata and .xdata sections; MASM has special directives for this, e.g. .ALLOCSTACK, .SAVEXMM128, ...
Without the p/xdata entries, even a simple push may cause an access violation (AFAIK).
bla proc FRAME
sub rsp,8+16                 ; 8 re-aligns RSP to 16, 16 bytes hold xmm6
.allocstack 8+16
movdqa OWORD ptr [rsp],xmm6  ; RSP is now 16-aligned, so movdqa is safe
.savexmm128 xmm6,0
.endprolog
...
movdqa xmm6,OWORD ptr [rsp]
add rsp,8+16
ret
bla endp

You might be better off switching to JWasm, which can handle the stack frame for you.
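The "8+16" in qWord's example follows from a simple rule: on entry RSP is 8 mod 16, each push adds another 8, and the allocation must bring the total back to a multiple of 16. A hedged Python sketch of that rule; `frame_alloc` is a hypothetical helper, and it ignores the 32-byte home area a function must also reserve if it calls other functions.

```python
def frame_alloc(pushes, needed):
    """Smallest `sub rsp` size >= needed that leaves RSP 16-byte aligned.

    On entry RSP % 16 == 8 (the CALL pushed the return address), and each
    8-byte push made before the allocation shifts it by another 8.
    """
    misalign = (8 + 8 * pushes) % 16        # 0 or 8 once the pushes are done
    # pad `needed` up so that misalign + size is a multiple of 16
    return needed + (-(misalign + needed)) % 16

print(frame_alloc(0, 16))   # prints: 24  (the 8+16 above: no pushes, one xmm slot)
print(frame_alloc(2, 32))   # prints: 40  (push rbx / push rdi, then xmm6 and xmm7)
```

The padding is always 0 or 8 bytes, so the result stays a multiple of 8 as long as the requested size is.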

tofu-sensei

Quote from: qWord on January 31, 2012, 09:28:20 PM
Without the p/xdata entries, even a simple push may cause an access violation (AFAIK).
Nope. They're only required for exception handling in non-leaf functions.

HooKooDooKu

Quote from: qWord on January 31, 2012, 09:28:20 PM
You may better switch to jwasm, which can handle the stack frame for you.

From what I've seen of comments on JWasm here in the forum, it sounds nice. But I'm just a small cog in a bigger machine. Some of the code I'm converting from 32-bit to 64-bit is shared not just with other programmers but with other development teams (big company A buys little companies B and C, and A wants B & C to share).

qWord

Quote from: tofu-sensei on February 01, 2012, 11:16:09 AM
Quote from: qWord on January 31, 2012, 09:28:20 PM
Without the p/xdata entries, even a simple push may cause an access violation (AFAIK).
Nope. They're only required for exception handling in non-leaf functions.
What about an access violation if the next stack page is not mapped? What will Windows' exception handler do if an 'unknown' function tries to use the stack? ... It's still not clear to me.
In fact I have no problems without p/xdata on my current machine (Win7), but this behaviour may change in future versions (?)
Regardless of the above, Microsoft is very clear about all this: do not use the stack without p/xdata.

regards, qWord

tofu-sensei

Quote from: qWord on February 01, 2012, 02:33:46 PM
What about an access violation if the next stack page is not mapped?
Then your program will crash ;)
So yes, of course one should always generate xdata for non-leaf functions (i.e. functions that call other functions or allocate stack space), but it is not strictly necessary.

HooKooDooKu

OK, with a bit more research, I think I'm starting to understand. Can someone take a moment and check whether I'm doing this correctly?

I've written a sample x64 PROC that "does it all".  It has 6 parameters, 5 local variables, and saves both 64-bit and 128-bit non-volatile registers.
The sample utilizes MASM macros found in ksamd64.inc to handle the .pdata & .xdata stuff.

The basic outline: a structure defines the space needed for the non-volatile registers and the local variables. MASM macros allocate the stack and save the non-volatile registers, and a set of EQU statements sets up symbols for easy access to everything on the stack.
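Whether the Filler field really leaves RSP 16-byte aligned depends on how the struct is laid out. A small Python sketch of that layout arithmetic, assuming each field is placed at its natural alignment (capped at 16) with no tail padding. Whether MASM lays Test_Frame out this way depends on the struct's alignment setting, so treat the numbers as something to compare against `sizeof Test_Frame`, not as a guarantee. `TEST_FRAME` and `layout` are hypothetical names.

```python
# Field sizes copied from Test_Frame: OWORD=16, QWORD=8, DWORD=4, WORD=2, BYTE=1.
TEST_FRAME = [
    ("RegXMM6", 16), ("RegXMM7", 16), ("RegRBX", 8), ("RegRSI", 8),
    ("Local1", 1), ("Local2", 2), ("Local3", 4), ("Local4", 8),
    ("Local5", 16), ("Filler", 8),
]

def layout(fields, max_align=16):
    """Place each field at the next multiple of min(size, max_align).

    Returns ({name: offset}, total_size). No tail padding is added; that is
    an assumption about how the assembler computes sizeof.
    """
    offsets, off = {}, 0
    for name, size in fields:
        align = min(size, max_align)
        off = (off + align - 1) // align * align   # round up to a multiple of align
        offsets[name] = off
        off += size
    return offsets, off

offs, size = layout(TEST_FRAME)
for name, _ in TEST_FRAME:
    print(f"{name:8} {offs[name]:3}")
print("sizeof =", size)                          # prints: sizeof = 88
print("(8 + sizeof) % 16 =", (8 + size) % 16)    # prints: (8 + sizeof) % 16 = 0
```

Under that assumption the return address (8 bytes) plus 88 bytes is a multiple of 16, so RSP is aligned after alloc_stack and the OWORD slots can be accessed with aligned moves. With 1-byte packing (`layout(TEST_FRAME, max_align=1)`) the same fields add up to 87 bytes and Local5 lands at offset 63, so it is worth confirming what `sizeof Test_Frame` actually evaluates to.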

option casemap :none
include ksamd64.inc
.data
.code

Test_Frame struct
   ;Storage for non-volatile registers
   RegXMM6   OWORD   ?   
   RegXMM7 OWORD   ?   
   RegRBX   QWORD   ?
   RegRSI  QWORD   ?

   ;Storage for LOCAL variables
   Local1    BYTE   ?   
   Local2    WORD   ?
   Local3   DWORD   ?
   Local4   QWORD   ?
   Local5   OWORD   ?

   Filler  QWORD   ?   ;So that RSP will be ALIGN(16) after alloc_stack
Test_Frame ends

public Test_asm
Test_asm PROC FRAME ;Parm1:BYTE, Parm2:WORD, Parm3:DWORD, Parm4:QWORD, Parm5:DWORD, Parm6:QWORD

   ;Alloc space for LOCAL variables and non-volatile registers
   alloc_stack( sizeof Test_Frame )
   ;RBP is non-volatile and is not saved here, so it is not used as a frame
   ;pointer; RSP stays fixed until the epilog, so everything is addressed off RSP

   ;Save non-volatile registers
   save_reg rbx, Test_Frame.RegRBX
   save_reg rsi, Test_Frame.RegRSI
   save_xmm128 xmm6, Test_Frame.RegXMM6
   save_xmm128 xmm7, Test_Frame.RegXMM7

   ;Saved Registers
   SavedRegRBX    EQU <[rsp].Test_Frame.RegRBX>
   SavedRegRSI    EQU <[rsp].Test_Frame.RegRSI>
   SavedRegXMM6   EQU <[rsp].Test_Frame.RegXMM6>
   SavedRegXMM7   EQU <[rsp].Test_Frame.RegXMM7>

   ;LOCAL variables
   Var1   EQU <[rsp].Test_Frame.Local1>
   Var2   EQU <[rsp].Test_Frame.Local2>
   Var3   EQU <[rsp].Test_Frame.Local3>
   Var4   EQU <[rsp].Test_Frame.Local4>
   Var5   EQU <[rsp].Test_Frame.Local5>

   ;Parameters: [rsp + sizeof Test_Frame] holds the return address; the caller's
   ;8-byte home/stack slots for Parm1..Parm6 follow it
   Parm1   EQU < BYTE PTR [rsp + (sizeof Test_Frame) + 08h]>
   Parm2   EQU < WORD PTR [rsp + (sizeof Test_Frame) + 10h]>
   Parm3   EQU <DWORD PTR [rsp + (sizeof Test_Frame) + 18h]>
   Parm4   EQU <QWORD PTR [rsp + (sizeof Test_Frame) + 20h]>
   Parm5   EQU <DWORD PTR [rsp + (sizeof Test_Frame) + 28h]>
   Parm6   EQU <QWORD PTR [rsp + (sizeof Test_Frame) + 30h]>

   ;(Optional) Save Register Parms
   mov Parm1, cl
   mov Parm2, dx
   mov Parm3, r8d
   mov Parm4, r9

   .ENDPROLOG


   ;Your Code Goes HERE


   ;EPILOG
   movdqa XMM7, SavedRegXMM7
   movdqu XMM6, SavedRegXMM6
   mov rsi, SavedRegRSI
   mov rbx, SavedRegRBX
   add rsp, ( sizeof Test_Frame )

   ret
Test_asm ENDP

end
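The Parm EQUs encode the rule that, once the frame is allocated, the return address sits at [rsp + sizeof Test_Frame] (RBP equals RSP after `set_frame rbp, 0`) and one 8-byte slot per parameter follows it: the caller-allocated home area for RCX, RDX, R8 and R9, then stack parameters 5 and 6. A small Python sketch of that arithmetic, with the hypothetical helper `param_offsets` and assuming sizeof Test_Frame is 88 as in a naturally aligned layout:

```python
def param_offsets(frame_size, nparams, pushes=0):
    """RSP-relative offsets of Parm1..ParmN after the prolog's allocation.

    Above RSP (x64 Windows convention): the frame allocated by alloc_stack,
    then any registers pushed before it (8 bytes each), then the return
    address, then one 8-byte slot per parameter: the caller's home area for
    RCX, RDX, R8, R9, followed by stack parameters 5, 6, ...
    Every slot is 8 bytes, whatever the parameter's declared size.
    """
    ret_addr = frame_size + 8 * pushes
    return [ret_addr + 8 * i for i in range(1, nparams + 1)]

# Assumed sizeof Test_Frame = 88, no pushes before alloc_stack.
for n, off in enumerate(param_offsets(88, 6), start=1):
    print(f"Parm{n}: [rsp + {off:#x}]")
```

Subtracting the frame size gives back the 08h, 10h, ..., 30h displacements used in the EQUs. The `pushes` argument shows why those displacements would each grow by 8 if a register (for example RBP) were pushed before alloc_stack.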