Alignment for SSE - MOVAPS

Started by Draakie, January 11, 2008, 05:27:12 AM


Draakie

Hi again,

This one's for the API mnemonic grinder types :P. SSE provides the MOVAPS instruction, which moves data aligned on a 16-byte
boundary to or from an XMM register. If the data is not aligned, an exception is generated. The alternative is MOVUPS. This
I accept and understand. However, the data sets I would like to access are allocated via the API call GlobalAlloc.
Obviously I'm missing something: how do I ensure memory is allocated on a 16-byte boundary?

I would dearly like to make use of the faster MOVAPS instruction - to gain those extra cycles.

Thanks
Draakie


Does this code make me look bloated ? (wink)

Rockoon

I believe GlobalAlloc only gives you a guarantee of 8-byte alignment.

The mission, should you choose to accept it, is to stop using GlobalAlloc, or to use GlobalAlloc to allocate a lot of memory all at once and then manage that memory pool yourself.

(You shouldn't be using GlobalAlloc to allocate only 16 bytes, regardless of your alignment needs.)
When C++ compilers can be coerced to emit rcl and rcr, I *might* consider using one.

Draakie

Thanks Rockoon,

BUT no, I'm not just allocating 16 bytes... more like multiple kilobytes (vertex data with the structure x_REAL4,y_REAL4,z_REAL4,w_REAL4).
So what should I be using to get a guaranteed 16-byte alignment, HeapAlloc? When you say "manage the memory pool yourself",
what do you mean exactly?

Draakie

hutch--

Draakie,

Use the alignment macro in the MASM32 macros for any memory you like. It will handle any power of two you point at it.
Download site for MASM32: https://masm32.com
New MASM Forum: https://masm32.com/board/index.php

Synfire

msvcrt.lib contains _aligned_malloc() and _aligned_free() which can be used for just this purpose. If they are not defined in msvcrt.inc, then the prototypes would be:

_aligned_malloc PROTO C _Size:DWORD, _Alignment:DWORD
_aligned_free PROTO C _Memory:DWORD


Use these just like you would the normal malloc/free, except that _aligned_malloc takes an extra argument which specifies the alignment (it should be self-explanatory really). Hope this helps.

Draakie

OH YES THAT HELPS !

ta Hutch and SynFire.

asmfan

You can use VirtualAlloc - you'll get 4 KB (page) alignment, or 64 KB alignment when reserving a new region.
If you need many small chunks, use HeapAlloc with size = NeededSize + (Alignment - 1) and then adjust the pointer as follows: pointer = pointer + (Alignment - 1); pointer = pointer AND (-Alignment). Alignment must be a power of 2.
In binary, "-Alignment" is a bit mask, e.g. "-2" = 111..110b. Applying that mask to a pointer rounds it down to a multiple of Alignment. That is why we add (Alignment - 1) before applying the mask: the new aligned start address then lies inside the already committed block, above (if needed) or equal to the pointer returned by HeapAlloc, and is aligned at the same time.
I hope that explains the binary basics of alignment.
P.S. Be sure to store the original (unaligned) pointer so you can free the memory correctly.
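
For readers who find the arithmetic easier to follow in C, here is a minimal sketch of the over-allocate-and-round-up recipe described above. It is only an illustration under assumptions: it uses malloc/free in place of HeapAlloc/HeapFree, and the names aligned_block, aligned_block_alloc and aligned_block_free are hypothetical helpers, not part of any library.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper type: keeps the two pointers together. */
typedef struct {
    void *base;     /* pointer returned by the allocator; this is the one to free */
    void *aligned;  /* pointer rounded up to the requested alignment */
} aligned_block;

/* alignment must be a power of two. Returns 0 on success, -1 on failure.
   malloc stands in for HeapAlloc here (an assumption made for portability). */
int aligned_block_alloc(aligned_block *blk, size_t size, size_t alignment)
{
    if (alignment == 0 || (alignment & (alignment - 1)) != 0)
        return -1;                      /* not a power of two */
    if (size > SIZE_MAX - (alignment - 1))
        return -1;                      /* size + padding would overflow */

    blk->base = malloc(size + (alignment - 1));
    if (blk->base == NULL)
        return -1;

    uintptr_t p = (uintptr_t)blk->base;
    /* Add (alignment - 1), then mask. ~(alignment - 1) is the same bit
       pattern as -alignment in two's complement. */
    p = (p + (alignment - 1)) & ~(uintptr_t)(alignment - 1);
    blk->aligned = (void *)p;
    return 0;
}

void aligned_block_free(aligned_block *blk)
{
    free(blk->base);                    /* free the original pointer, not the aligned one */
    blk->base = NULL;
    blk->aligned = NULL;
}
```

With alignment = 16 the aligned pointer ends up at most 15 bytes past the base pointer, so the size bytes starting at it always fit inside the allocation.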
Russia is a weird place

NightWare

Quote from: Draakie on January 11, 2008, 05:27:12 AM
Obviously I'm missing something: how do I ensure memory is allocated on a 16-byte boundary?
Or with something like this:
ALIGN 16
;
; allocate a memory block to a pointer (the block is aligned on 16 bytes)
; note: the _Informations_Memoire_ structure must, of course, have been created and defined beforehand
; finally, since a memory block is allocated here, it must be freed at the end of the program with LibererBlocMemoire
;
; Syntax:
; mov eax,{size of the memory block to create (in bytes)}
; mov esi,{OFFSET (address) of an _Informations_Memoire_ structure}
; mov edi,{OFFSET (address) of a _Bloc_Memoire_ structure}
; call AllouerBlocMemoire
;
; Returns:
; eax = address of the aligned memory block
; and the fields of the _Bloc_Memoire_ structure are set
;
AllouerBlocMemoire PROC
push ecx ;; save ecx
push edx ;; save edx

mov (_Bloc_Memoire_ PTR [edi])._Taille_Du_Bloc_,eax ;; store eax in x._Taille_Du_Bloc_
add eax,000000010h ;; add 16 to the size (room to align the block on 16 bytes)
invoke HeapAlloc,(_Informations_Memoire_ PTR [esi])._Instance_,HEAP_NO_SERIALIZE or HEAP_ZERO_MEMORY,eax ;; eax = size in bytes of the block to create
mov (_Bloc_Memoire_ PTR [edi])._Pointeur_A_Liberer_,eax ;; save eax in x._Pointeur_A_Liberer_ (the pointer to free)
and eax,0FFFFFFF0h ;; ) needed to align the memory block on 16 bytes:
add eax,000000010h ;; ) round down to a multiple of 16, then add 16
mov (_Bloc_Memoire_ PTR [edi])._Pointeur_,eax ;; save eax in x._Pointeur_ (the aligned pointer)

pop edx ;; restore edx
pop ecx ;; restore ecx
ret ;; return (leave the procedure)
AllouerBlocMemoire ENDP

Draakie

My French is non-existent. Is there someone willing to translate the above into English?
Draakie :toothy

PS: NightWare - You have posted a couple of SSE (Zero Mem fill etc.) routines in French.
     Besides ToutenASM and a couple others - please remember all your valuable comments
    are lost on us poor English second language folk.
     

daydreamer

Quote from: Draakie on January 14, 2008, 05:25:46 AM
My French is non-existent. Is there someone willing to translate the above into English?
Draakie :toothy

PS: NightWare - You have posted a couple of SSE (Zero Mem fill etc.) routines in French.
     Besides ToutenASM and a couple others - please remember all your valuable comments
    are lost on us poor English second language folk.
     
I also use that kind of solution.
It's a simple and fast solution: clearing the low four bits with AND eax,0FFFFFFF0h (followed by the add of 16) ensures 16-byte alignment.
AND eax,someFFs (a mask such as 3FFh) can also be useful to get a wraparound effect inside reserved memory, instead of much slower boundary checks or risking a GPF.
For example, I use it when I need code that tiles a 1024x1024 texture.
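
The wraparound trick mentioned here can be sketched in C as follows. This is only an illustration under assumptions not stated in the post: the texture is taken to be stored row by row with one element per texel, and TEX_SIZE, TEX_MASK and tiled_texel_index are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TEX_SIZE 1024u              /* texture width and height; must be a power of two */
#define TEX_MASK (TEX_SIZE - 1u)    /* 3FFh: keeps only the low 10 bits */

/* Map any (x, y), including coordinates past the edge, to an index inside
   the texture. The AND replaces a compare-and-branch bounds check. */
size_t tiled_texel_index(uint32_t x, uint32_t y)
{
    return (size_t)(y & TEX_MASK) * TEX_SIZE + (x & TEX_MASK);
}
```

Because the mask only works when the size is a power of two, a 1024x1024 texture is a natural fit; any coordinate simply wraps back into the 0..1023 range.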

NightWare

Draakie,
::) The instructions are not in English?... OK, I'm going to make an effort this time...

mov eax,Size ; size of the memory to allocate
add eax,000000010h ; add 16 to the size
invoke HeapAlloc,MemInstance,HEAP_NO_SERIALIZE or HEAP_ZERO_MEMORY,eax ; allocate
mov PointerToFree,eax ; the pointer you need to free the memory block
; here is the alignment
and eax,0FFFFFFF0h ; clear the low 4 bits (round down to a multiple of 16)
add eax,000000010h ; add 16 to move back inside the block
mov PointerToUse,eax ; the 16-byte aligned pointer


Note: it's also possible to use
add eax,000000011h ; add 16+1 to the size
when you need to load a text file, to ensure there is a terminating 0 (so you can use your string routines on the text file in memory).

daydreamer,
the natural choice of asm coder...  :U
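
The two alignment sequences posted in this thread differ slightly, and a small C sketch may make the difference easier to see. The function names are hypothetical; the arithmetic is taken from the posts (and/add 16 from NightWare's code, add 15/and from asmfan's recipe with Alignment = 16).

```c
#include <assert.h>
#include <stdint.h>

/* NightWare's sequence: and eax,0FFFFFFF0h then add eax,10h.
   Always moves forward by 1 to 16 bytes, so the block needs size + 16. */
uintptr_t align16_mask_then_add(uintptr_t p)
{
    return (p & ~(uintptr_t)15) + 16;
}

/* asmfan's sequence with Alignment = 16: add 15, then mask.
   Moves forward by 0 to 15 bytes, so the block needs size + 15. */
uintptr_t align16_add_then_mask(uintptr_t p)
{
    return (p + 15) & ~(uintptr_t)15;
}
```

Both give the same result for an unaligned pointer. They only differ when the allocator already returned a 16-byte aligned address: the first skips a full 16 bytes, the second leaves the pointer where it is.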

Draakie

Thanks loads NightWare - the effort is appreciated. Yup, the code is in English (well - your labels were not obvious at all :naughty:)
- but sometimes the comments speak the proverbial thousand words. :wink - Please remember this thread isn't just for me
- but also for those newbies to SSE who may now benefit from your and Daydreamer's infinite wisdom.

Draakie

NightWare

Quote from: Draakie on January 15, 2008, 05:10:22 AM
the effort is appreciated. Yup, the code is in English (well - your labels were not obvious at all :naughty:)
In fact it wasn't a big effort. I did need to translate the algo a bit; I had forgotten that I use my own structures to speed things up... and it's true it was quite unreadable for someone who doesn't speak French.