SSE/SSE2 Intrinsics - Impossible rotations?

Started by bozo, August 11, 2005, 09:24:26 PM


bozo

I am using the Visual C++ Toolkit 2003 and cannot perform bit rotations on SSE/SSE2 registers.
There are some functions in the xmmintrin.h file for add/subtract/xor/or, et cetera, but i need the equivalent of the ROL instruction on an XMM register.

I'm new to this stuff, so any help would be appreciated.

Any ideas?

valy

Hi

Impossible is not French  :P

#define XROL(src,dest,imm) _asm movq xmm0,src _asm movq xmm1,xmm0 _asm psllq xmm0,imm _asm psrlq xmm1,64-imm _asm por xmm0, xmm1 _asm movq dest,xmm0

Put a "\" at the end of each line if you want to split the macro over several lines. NEVER use comments inside an inline asm macro.
To emit raw bytes, use _asm _emit 0x.. (VC++ does NOT understand the "db" directive).
Their doc. is very good.

Best regards
valy

OceanJeff32

I just wanted to tell you that Intel at their web site:

http://www.intel.com

has all the instructions for FREE in .PDF format, with a full (sometimes too much info  :dazzled:) description of how to use them.  I also ordered their latest manuals for FREE: yes, bound volumes of the Intel Pentium 4 instruction set, covering everything from MMX to XMM to SSE3, etc.

I did some inline assembly in Visual C++ a while ago for a DirectX application I was writing, to see whether translating all those multiplies and divides etc. into assembly would help, and ....
it didn't, but it was an eye-opening experience.

I would love to see what you are working on, source code too, if you're so inclined! Upload / attach away my friend!!

Later,

Jeff C
:U
Any good programmer knows, every large and/or small job, is equally large, to the programmer!

bozo

Excuse the late reply, but i was waiting on a copy of Visual C++ 6 from a friend
before i started learning about c++ again (got fed up with the command line version)  :red

normally, i wouldn't give in that easily, but i'm only a beginner ;)

Anyway, i hadn't looked at the idea i was working on until briefly today.

Quote
#define XROL(src,dest,imm) _asm movq xmm0,src _asm movq xmm1,xmm0 _asm psllq xmm0,imm _asm psrlq xmm1,64-imm _asm por xmm0, xmm1 _asm movq dest,xmm0

This is what i have done in assembly, but for c++, i would rather let the compiler decide which registers to use. That macro hard-codes XMM0, which the compiler may already be using for something else.

for MMX, this is the macro

#define M_ROL(m_tmp, m, count) \
m_tmp = m; \
m = _mm_slli_pi32(m,(count)); \
m_tmp = _mm_srli_pi32(m_tmp,(32-(count))); \
m = _mm_or_si64(m,m_tmp);


using functions from mmintrin.h

i declare these variables.

register __m64 a = _m_from_int(0);
register __m64 tmp = _m_from_int(0);


it's not really important what the contents are..i know the following, for example, is pointless.

M_ROL(tmp,a,9);

it's just an example...but it roughly translates to this, like valy's macro (except the compiler chooses the registers).

movq        mm1,mm0
psrld       mm1,17h
pslld       mm0, 9
por         mm0,mm1
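For comparison, the same MMX rotate can be sketched as a small inline function instead of a macro. This is only an illustration: the names mmx_rol32 and mmx_rol32_low are made up here, and it assumes a compiler that still ships the MMX intrinsics (32-bit MSVC as used in this thread, or GCC/Clang). A function needs no caller-supplied temporary and is not tripped up by expressions passed as the count.

```c
#include <assert.h>
#include <mmintrin.h>   /* MMX intrinsics: __m64, _mm_slli_pi32, ... */

/* Rotate each of the two 32-bit lanes of m left by count bits.
   mmx_rol32 is a hypothetical name; count must be in 1..31. */
static __m64 mmx_rol32(__m64 m, int count)
{
    assert(count > 0 && count < 32);
    return _mm_or_si64(_mm_slli_pi32(m, count),
                       _mm_srli_pi32(m, 32 - count));
}

/* Hypothetical wrapper that rotates one plain integer via the low lane.
   _mm_empty() resets the MMX state so x87 floating point is usable again. */
static unsigned int mmx_rol32_low(unsigned int x, int count)
{
    __m64 r = mmx_rol32(_mm_cvtsi32_si64((int)x), count);
    unsigned int out = (unsigned int)_mm_cvtsi64_si32(r);
    _mm_empty();
    return out;
}
```

Calling mmx_rol32_low(0x80000001, 9) should give 0x00000300: the top bit wraps around to bit 8 and the bottom bit moves up to bit 9.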


I just couldn't figure out how to do the same kind of macro above on 128 bit registers.
would PSHUFD work??
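On the PSHUFD question: for rotating each dword separately it is not needed, but it is one way to rotate the register as a single 128-bit value, because the bits leaving one 64-bit half have to arrive in the other half. A rough sketch with SSE2 intrinsics, assuming emmintrin.h is available, a rotate count below 64, and a compiler with unsigned long long; the names rol128 and rol128_u64 are invented for this example.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics: __m128i, _mm_shuffle_epi32, ... */

/* Rotate the whole 128-bit value left by n bits, 0 < n < 64.
   rol128 is a hypothetical name, not something from the thread. */
static __m128i rol128(__m128i v, int n)
{
    __m128i swapped;
    assert(n > 0 && n < 64);
    /* PSHUFD with 0x4E swaps the low and high 64-bit halves. */
    swapped = _mm_shuffle_epi32(v, 0x4E);
    /* Each half shifts left; the bits it lost come in from the other half. */
    return _mm_or_si128(_mm_slli_epi64(v, n),
                        _mm_srli_epi64(swapped, 64 - n));
}

/* Hypothetical helper for checking: rotate {lo, hi} in place. */
static void rol128_u64(unsigned long long *lo, unsigned long long *hi, int n)
{
    unsigned long long buf[2];
    __m128i v;
    buf[0] = *lo;
    buf[1] = *hi;
    v = rol128(_mm_loadu_si128((const __m128i *)buf), n);
    _mm_storeu_si128((__m128i *)buf, v);
    *lo = buf[0];
    *hi = buf[1];
}
```

Counts of 64 and above would need the halves swapped first and then a rotate by n - 64; that case is left out to keep the sketch short.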

Quote
I would love to see what you are working on, source code too, if you're so inclined! Upload / attach away my friend!!

if i get it working, i'll definitely post something.

GregL

I'm not sure if you know this, but:


  • Visual C++ 6.0 does not support SSE/SSE2 without the Visual C++ 6.0 Processor Pack. You need Visual C++ 6.0 Professional to install the Processor Pack.
  • Visual C++ .NET 2003 or the VC++ Toolkit 2003 has SSE/SSE2 support built in.

bozo

i downloaded and installed the processor pack.

see, i know you can do this with assemblers that support SSE

Quote
psrld xmm0, 8

which shifts 4 individual dword values right by 8 bits.
but i couldn't find c++ intrinsic function for __m128 (SSE) data type that does shifts.
strange?


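#include <stdio.h>
#include <stdlib.h>

#include <xmmintrin.h>

int main(void)
{
    __m128 m_testa = _mm_setzero_ps();
    /* view the 16 bytes as four dwords so they can be printed one by one */
    const unsigned int *p = (const unsigned int *)&m_testa;

    __asm{
        mov dword ptr [m_testa], 1
        mov dword ptr [m_testa+4], 1
        mov dword ptr [m_testa+8], 1
        mov dword ptr [m_testa+12], 1

        movdqa xmm0, m_testa
        pslld xmm0, 1 ; multiply each dword by 2
        movdqa m_testa, xmm0
    }

    /* passing a __m128 straight to printf is not portable, so pass the dwords */
    printf("\n%08x%08x%08x%08x", p[3], p[2], p[1], p[0]);
    return 0;
}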


the output of the program is: 00000002000000020000000200000002
which is right.

but i don't want to use assembly for the shifting process, but there doesn't seem
to be intrinsic function that does it for __m128 data type.

vc complains when i try type casting also.
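For what it is worth, the integer shifts are declared for the __m128i type in emmintrin.h (SSE2), not for the floating-point __m128 type in xmmintrin.h, which would explain why nothing turns up there. A minimal sketch of the PSLLD step from the program above using intrinsics only; shift_dwords_left is a hypothetical helper name, and the unaligned load/store is used so the arrays need no special alignment.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2: __m128i and the integer shift intrinsics */

/* Shift each of four 32-bit values left by count bits, like PSLLD.
   shift_dwords_left is a hypothetical helper name. */
static void shift_dwords_left(const unsigned int in[4], unsigned int out[4], int count)
{
    __m128i v;
    assert(count >= 0 && count < 32);
    v = _mm_loadu_si128((const __m128i *)in);   /* unaligned 128-bit load */
    v = _mm_slli_epi32(v, count);               /* intrinsic form of PSLLD */
    _mm_storeu_si128((__m128i *)out, v);        /* unaligned 128-bit store */
}
```

With an input of four 1s and a count of 1, each output element should be 2, matching the inline assembly version. Newer compilers also provide _mm_castps_si128 for reinterpreting a __m128 as a __m128i; whether the VC6 Processor Pack headers include it is not something this thread confirms.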

GregL


OceanJeff32

About intrinsics...

I read the book Visual C++ Optimization with Assembly Code by Yury Magda (excellent book)

and there was an example of using intrinsics to perform SIMD math.  THEN the author performed the same SIMD math using inline assembly code, and showed us the disassembly.

The disassembly of the intrinsics showed pages of instructions generated by the compiler.  The disassembly of the inline assembly (two lines) showed just two lines of instructions, and indeed that was all that was required.

Just watch out using intrinsics, they are full of extras apparently that the compilers need to provide some sort of compatibility.

later,

jeff c

bozo

Quote
It's MMX or SSE2: Shift Operations

hey, thanks Greg, i was including the wrong header file, plus i didn't see those
in the link.

its working now, cheers. :U
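With emmintrin.h included, a 128-bit counterpart of the MMX rotate might look roughly like the sketch below. The working code was not posted, so this is a guess at its shape and not the actual source; sse2_rol32 and sse2_rol32_array are invented names.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 integer intrinsics */

/* Rotate each of the four 32-bit lanes left by count bits (0 < count < 32).
   sse2_rol32 is a hypothetical name. */
static __m128i sse2_rol32(__m128i v, int count)
{
    assert(count > 0 && count < 32);
    return _mm_or_si128(_mm_slli_epi32(v, count),        /* PSLLD */
                        _mm_srli_epi32(v, 32 - count));  /* PSRLD */
}

/* Hypothetical convenience wrapper: rotate four dwords in place. */
static void sse2_rol32_array(unsigned int x[4], int count)
{
    __m128i v = _mm_loadu_si128((const __m128i *)x);
    _mm_storeu_si128((__m128i *)x, sse2_rol32(v, count));
}
```

The compiler is free to pick the XMM registers here, which was the original objection to the inline assembly macro.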

Quote
The disassembly of the intrinsics showed pages of instructions generated by the compiler.  The disassembly of the inline assembly (two lines) showed just two lines of instructions, and indeed that was all that was required.

for a debug build, there is A LOT of code generated, but in the project settings i optimize for speed.
The result is code almost as good as an assembler version.

and the advantage is, with the compiler options, i can tell it to optimize for different processors, without re-writing the code..something that can't be done easily in assembly.

thanks all :)

OceanJeff32

Well, cool, but in this example, the compiler was doing a 128-bit copy, using the 64-bit registers, but the compiler was using the commands to transfer 32-bits at a time!  So instead of using two or maybe three 64-bit transfers, there were about 20+ instructions that danced around this transfer. Crazy!
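As a point of reference, a 128-bit copy written with SSE2 intrinsics is just one load and one store. The sketch below uses an invented name (copy128) and the unaligned variants; an optimized build would typically reduce it to a pair of moves through an XMM register, though what a given compiler and set of flags actually emits is worth checking in the disassembly.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2: _mm_loadu_si128, _mm_storeu_si128 */

/* Copy 16 bytes through one XMM register.
   copy128 is a hypothetical name used only for this illustration. */
static void copy128(void *dst, const void *src)
{
    assert(dst != 0 && src != 0);
    _mm_storeu_si128((__m128i *)dst,
                     _mm_loadu_si128((const __m128i *)src));
}
```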

anyways, I have never used intrinsics, but I did like the idea that Visual c++ could use the larger registers in code, that was cool, i thought.

later,

jeff c