1. Getting started with SSE: mov instructions and data
I assume you have a Pentium III, Athlon XP or later CPU and an OS that saves the xmm registers; otherwise SSE instructions will raise a fault instead of running.
Before anything else, set up your data and constants. They need to be aligned on a 16-byte boundary.
If you want to use the 32-bit general registers as pointers, note that an index register can only be scaled by up to x8, so 16-byte elements cannot be indexed directly with a x16 scale.
.data
ALIGN 16
var1 REAL4 0.0,0.0,0.0,0.0
; when using SSE2 (double precision), this would instead be:
; var1 REAL8 0.0,0.0
var2 REAL4 12.0,12.0,12.0,12.0
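For readers who test from C rather than MASM, here is a minimal sketch of the same data setup. It assumes a C11 compiler that ships the SSE intrinsics header <xmmintrin.h>; the helper names is_aligned16 and copy_var2_to_var1 are made up for this illustration and are not part of any library.

```c
#include <stdint.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* _Alignas(16) plays the role of ALIGN 16 in the MASM listing. */
_Alignas(16) static float var1[4] = {0.0f, 0.0f, 0.0f, 0.0f};
_Alignas(16) static float var2[4] = {12.0f, 12.0f, 12.0f, 12.0f};

/* Hypothetical helper: returns 1 if p sits on a 16-byte boundary. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Hypothetical helper: copies var2 into var1 through an xmm register. */
static void copy_var2_to_var1(void)
{
    __m128 x = _mm_load_ps(var2);   /* aligned 128-bit load  */
    _mm_store_ps(var1, x);          /* aligned 128-bit store */
}
```

Checking is_aligned16 on your buffers before using the aligned moves is a cheap way to catch a missing alignment directive.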
Now for the SSE mnemonic syntax:
MOVAPS ; the first three letters are the same as in other x86 instructions: MOV, MUL
The mnemonic ends with either SS for scalar single-precision FP (bits 0-31 of the xmm register) or PS for four packed single-precision FP values. SSE2 adds SD for scalar double-precision and PD for packed double-precision.
The letters after MOV: A = aligned, U = unaligned, NT = non-temporal (advises the processor to bypass the cache and store directly to memory).
H = high pair, L = low pair
MOVAPS causes a general protection fault if you use any memory address that is not aligned on a 16-byte boundary.
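The aligned, unaligned and non-temporal variants can be tried out from C. This is only a sketch, assuming a compiler with <xmmintrin.h>; the intrinsics _mm_load_ps, _mm_loadu_ps and _mm_stream_ps typically compile to MOVAPS, MOVUPS and MOVNTPS, but the compiler is free to pick other encodings. The function names load4, copy4 and store4_nt are invented for the example.

```c
#include <stdint.h>
#include <xmmintrin.h>

/* Loads four floats from p, using the aligned form only when it is safe. */
static __m128 load4(const float *p)
{
    if (((uintptr_t)p & 15u) == 0)
        return _mm_load_ps(p);    /* aligned form: p must be 16-byte aligned */
    return _mm_loadu_ps(p);       /* unaligned form: any address is accepted */
}

/* Copies four floats from src to dst; neither pointer needs alignment. */
static void copy4(float *dst, const float *src)
{
    _mm_storeu_ps(dst, load4(src));
}

/* Non-temporal store; dst must be 16-byte aligned, like the aligned form. */
static void store4_nt(float *dst, __m128 v)
{
    _mm_stream_ps(dst, v);
    _mm_sfence();   /* order the streamed store ahead of later stores */
}
```

Passing a misaligned pointer straight to _mm_load_ps would be the C equivalent of the MOVAPS fault described above, which is why load4 checks the address first.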
SSE move data instructions
MOVAPS xmm1,xmm2/m128 | MOVLPS xmm,m64    | MOVSS xmm1,xmm2/m32 |
MOVAPS xmm1/m128,xmm2 | MOVLPS m64,xmm    | MOVSS xmm1/m32,xmm2 |
MOVUPS xmm1,xmm2/m128 | MOVHPS xmm,m64    |                     |
MOVUPS xmm1/m128,xmm2 | MOVHPS m64,xmm    |                     |
MOVNTPS m128,xmm2     | MOVLHPS xmm1,xmm2 |                     |
MOVLHPS moves the low pair of FP values in xmm2 to the high pair of xmm1.
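To see what the pair and scalar moves do to the register contents, here is a small C sketch, again assuming <xmmintrin.h> is available. The intrinsics _mm_loadl_pi, _mm_loadh_pi, _mm_movelh_ps and _mm_load_ss correspond to MOVLPS, MOVHPS, MOVLHPS and MOVSS; the wrapper names load_pairs, low_to_high and load_scalar are hypothetical.

```c
#include <xmmintrin.h>

/* Builds {a[0], a[1], b[0], b[1]} from two pairs of floats. */
static __m128 load_pairs(const float *a, const float *b)
{
    __m128 x = _mm_setzero_ps();
    x = _mm_loadl_pi(x, (const __m64 *)a);  /* MOVLPS xmm,m64: fill low pair  */
    x = _mm_loadh_pi(x, (const __m64 *)b);  /* MOVHPS xmm,m64: fill high pair */
    return x;
}

/* Copies the low pair of src into the high pair of dst (MOVLHPS). */
static __m128 low_to_high(__m128 dst, __m128 src)
{
    return _mm_movelh_ps(dst, src);
}

/* Loads one float into bits 0-31 and zeroes the upper 96 bits (MOVSS xmm,m32). */
static __m128 load_scalar(const float *p)
{
    return _mm_load_ss(p);
}
```

With a = {1, 2} and b = {3, 4}, load_pairs gives {1, 2, 3, 4}, and applying low_to_high to that register with itself as the source gives {1, 2, 1, 2}.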
You will need at least a P4 to run SSE2.
SSE2 move data instructions (for P4)
MOVDQA xmm1,xmm2/m128 | MOVDQ2Q mm,xmm      | MOVD xmm,r/m32 |
MOVDQA xmm1/m128,xmm2 | MOVSD xmm1,xmm2/m64 | MOVD r/m32,xmm |
MOVDQU xmm1,xmm2/m128 | MOVSD xmm1/m64,xmm2 |                |
MOVDQU xmm1/m128,xmm2 |                     |                |
MASKMOVDQU xmm1,xmm2* |                     |                |
MOVAPD xmm1,xmm2/m128 | MOVLPD xmm,m64      |                |
MOVAPD xmm1/m128,xmm2 | MOVLPD m64,xmm      |                |
MOVNTDQ m128,xmm      | MOVNTI m32,reg32    |                |
MOVNTPD m128,xmm      |                     |                |
MOVUPD xmm1,xmm2/m128 |                     |                |
MOVUPD xmm1/m128,xmm2 |                     |                |
*MASKMOVDQU xmm1,xmm2 (masked move of a double quadword) stores bytes from xmm1 to the memory pointed to by DS:EDI. Each byte of xmm1 is written or skipped depending on the most significant bit of the corresponding byte in xmm2: 1 = write, 0 = no write.
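The byte mask is easiest to understand by trying it. Below is a minimal C sketch, assuming a compiler with the SSE2 header <emmintrin.h>; _mm_maskmoveu_si128 is the intrinsic for MASKMOVDQU and takes the destination as an ordinary pointer instead of an implicit EDI. The wrapper masked_store16 is a made-up name for this example.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Writes src[i] to dst[i] only where bit 7 of mask[i] is set (16 bytes). */
static void masked_store16(unsigned char *dst,
                           const unsigned char *src,
                           const unsigned char *mask)
{
    __m128i data = _mm_loadu_si128((const __m128i *)src);   /* unaligned load  */
    __m128i sel  = _mm_loadu_si128((const __m128i *)mask);  /* per-byte select */
    _mm_maskmoveu_si128(data, sel, (char *)dst);            /* MASKMOVDQU      */
}
```

A mask byte of 80h selects its byte for writing, while 7Fh leaves the destination byte untouched, because only the top bit of each mask byte is examined.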