Zeroing a register

Started by sinsi, February 03, 2012, 06:18:51 AM

sinsi

Is there any performance difference between using a 32-bit and a 64-bit register?
    sub r8d,r8d
    sub r8,r8

Both do the same thing, both are encoded as 3 bytes, but is one better?
I used r8 in this example because with eax/rax there is a difference (an extra byte for the REX prefix).
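
To make the size comparison concrete, here is a small sketch for ml64 with the byte encodings in the comments. The procedure name `encodings` is only an illustration. The bytes shown use the 2Bh form of SUB; an assembler may emit the equivalent 29h form instead, with the same lengths.

```asm
; Sketch: encodings of the zeroing forms discussed above (ml64 syntax).
; Opcode 2Bh is SUB r, r/m; the 29h form (SUB r/m, r) has the same length.
.code
encodings PROC
    sub eax,eax         ; 2B C0       2 bytes, no prefix needed
    sub rax,rax         ; 48 2B C0    3 bytes, REX.W for the 64-bit operand size
    sub r8d,r8d         ; 45 2B C0    3 bytes, REX.R+REX.B to reach r8
    sub r8,r8           ; 4D 2B C0    3 bytes, REX.W+REX.R+REX.B
    ret
encodings ENDP
END
```

With r8 a REX prefix is needed either way, so the 32-bit and 64-bit forms come out the same size; with eax/rax only the 64-bit form needs it.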
Light travels faster than sound, that's why some people seem bright until you hear them.

sinsi

Maybe I should qualify my question: it's not so much the performance (1 clock cycle ain't a killer) but more like a gotcha.
I am thinking of stalls, like when using an 8- or 16-bit register in 32-bit mode.

habran

Hi sinsi,

I use XOR reg,reg instead of SUB reg,reg,
even though both instructions take 1 clock cycle on a 486 processor.
XOR looks better and more sophisticated because it looks like you understand binary numbers.

You can check the clock cycles here: http://classes.engr.oregonstate.edu/eecs/summer2008/cs271/Instructions.htm

regards

hutch--

sinsi,

I don't know if 64-bit-capable hardware suffers from the problem that early PIVs had, where a partial register write stalled a read or write of the larger register shortly after it. I personally doubt that a zeroing operation fits that style of problem, as both SUB and XOR tend to live in silicon, not microcode, but probably the only safe way is to make a small test piece and time it. I remember that on a PIII you used to get very bad stalls if you performed a BYTE operation followed shortly afterwards by a DWORD operation on the same register, and it was blatantly obvious that the timing was different.

If you don't get major differences in the timing, then it probably is not a big deal.
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

sinsi

Yeah, I can't really see a problem with zeroing the upper bits, that's built in to all sorts of other instructions.
Interesting, I was wondering about 32/64 and never thought about e.g. r8b and how that affects r8/r8d/r8w. The same, I should think, as al/eax on 32-bit CPUs.
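
A short sketch of the architectural behaviour being discussed: a 32-bit write zero-extends into the full 64-bit register, while 8-bit and 16-bit writes leave the upper bits alone. The procedure name `widths` is only an illustration, and the comments describe register contents, not timing.

```asm
; Sketch: effect of partial writes on the full 64-bit register (ml64 syntax).
.code
widths PROC
    mov r8,-1           ; r8 = FFFFFFFFFFFFFFFFh
    sub r8d,r8d         ; 32-bit write zero-extends: r8 = 0

    mov r8,-1           ; r8 = FFFFFFFFFFFFFFFFh
    mov r8w,0           ; 16-bit write keeps the upper 48 bits: r8 = FFFFFFFFFFFF0000h

    mov r8,-1           ; r8 = FFFFFFFFFFFFFFFFh
    mov r8b,0           ; 8-bit write keeps the upper 56 bits: r8 = FFFFFFFFFFFFFF00h
    ret
widths ENDP
END
```

Because the 8-bit and 16-bit forms have to preserve the old upper bits, they are the ones that can tie a later full-width access to the earlier value; the 32-bit form has no such merge to do.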

All we need is for MichaelW to make timers64...although I am having a go at it on and off.

MichaelW

If Dave would hurry up and win the lottery he could buy me a new system as he promised, and then I could make the move to 64 bits :bg
eschew obfuscation

qWord

Quote from: sinsi on February 04, 2012, 10:13:25 AM
All we need is for MichaelW to make timers64...although I am having a go at it on and off.
I translated them a while ago:
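; x64-Version of MichaelW's macros
counter_begin MACRO loopcount:REQ, priority
    LOCAL label

    IFNDEF tmcb__nLoops                 ;; define the shared data only once
    .data
    align 16
    tmcb__nLoops dd 0                   ;; loop count, kept for counter_end
    tmcb__cntr   dd 0                   ;; running loop counter
    tmcb__qw     dq 2 dup (?)           ;; [0] = loop overhead, [8] = start count
    .code
    ENDIF

    mov tmcb__nLoops,loopcount
    IFNB <priority>                     ;; optionally change the process priority
    call GetCurrentProcess              ;; caller must have reserved shadow space
    mov rdx,priority
    mov rcx,rax
    call SetPriorityClass
    ENDIF
    xor rax,rax                         ;; same CPUID input for every call
    cpuid                               ;; serialize; clobbers RBX, RCX, RDX
    rdtsc

    mov DWORD ptr tmcb__qw[0],eax       ;; save start count, low half
    mov DWORD ptr tmcb__qw[4],edx       ;; save start count, high half
    mov tmcb__cntr, loopcount
    xor rax,rax
    cpuid
    align 16
@@:                                     ;; empty reference loop
    sub tmcb__cntr,1
    jnz @B

    xor rax,rax
    cpuid
    rdtsc
    shl rdx,32                          ;; combine EDX:EAX into RAX
    or rax,rdx
    sub rax,tmcb__qw[0]                 ;; overhead = end count - start count
    mov tmcb__qw[0],rax

    xor rax, rax
    cpuid
    rdtsc
    mov tmcb__cntr,loopcount
    mov DWORD ptr tmcb__qw[8],eax       ;; save start count of the timed loop
    mov DWORD ptr tmcb__qw[12],edx
    xor rax,rax
    cpuid
    align 16
label:                                  ;; start of the timed loop
    tmcb__label equ <label>
ENDM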

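; x64-Version of MichaelW's macros
counter_end MACRO
    sub tmcb__cntr,1                    ;; close the loop opened by counter_begin
    jnz tmcb__label

    xor rax,rax
    cpuid                               ;; serialize so the loop body has finished
    rdtsc
    shl rdx,32                          ;; combine EDX:EAX into RAX
    or rax,rdx
    sub rax,tmcb__qw[0]                 ;; subtract the empty-loop overhead
    sub rax,tmcb__qw[8]                 ;; subtract the start count
    mov tmcb__qw[0],rax                 ;; net cycles for all iterations

    call GetCurrentProcess              ;; restore normal priority
    mov rdx,NORMAL_PRIORITY_CLASS
    mov rcx,rax
    call SetPriorityClass

    IFDEF _EMMS
    EMMS
    ENDIF

    finit
    fild tmcb__qw[0]                    ;; net cycles
    fild tmcb__nLoops                   ;; loop count
    fdiv                                ;; cycles / loops
    fistp tmcb__qw[0]                   ;; store as an integer

    mov rax,tmcb__qw[0]                 ;; result: cycles per iteration in RAX
ENDM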
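
A possible way to use the two macros for the question in this thread, as a sketch only. It assumes the macros above are already defined in the same source file, that `GetCurrentProcess`, `SetPriorityClass`, `HIGH_PRIORITY_CLASS` and `NORMAL_PRIORITY_CLASS` are declared elsewhere, and that the caller does not need RBX preserved across the test. The procedure name `time_zeroing`, the loop count and the repeat count are arbitrary choices for illustration.

```asm
; Sketch: timing the 32-bit and 64-bit zeroing forms with the macros above.
; Assumes counter_begin/counter_end and the kernel32 symbols are in scope.
.code
time_zeroing PROC
    sub rsp,28h                 ; shadow space + alignment for the API calls

    counter_begin 10000000, HIGH_PRIORITY_CLASS
        REPEAT 16
            sub r8d,r8d         ; 32-bit form
        ENDM
    counter_end                 ; RAX = cycles per loop iteration
    mov r10,rax                 ; keep the 32-bit result

    counter_begin 10000000, HIGH_PRIORITY_CLASS
        REPEAT 16
            sub r8,r8           ; 64-bit form
        ENDM
    counter_end                 ; RAX = cycles per loop iteration
    mov r11,rax                 ; keep the 64-bit result

    add rsp,28h
    ret                         ; results left in R10 (32-bit) and R11 (64-bit)
time_zeroing ENDP
```

The first result is held in R10 only because nothing between the two tests is expected to change it under the stated assumptions; storing both results to memory would be the safer choice in a real test piece.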
FPU in a trice: SmplMath
It's that simple!