Zeroing a register

sinsi · February 03, 2012, 06:18:51 AM

Is there any performance difference between using a 32-bit or a 64-bit register when zeroing it?

    sub r8d,r8d
    sub r8,r8

Both do the same thing (writing r8d also clears the upper 32 bits of r8) and both are encoded in 3 bytes, but is one better?
I used r8 in this example because with eax/rax there is a size difference: the rax form needs an extra byte for the REX prefix.
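The byte counts above follow from how the REX prefix works, which a small C sketch can make concrete. It assumes the `SUB r, r/m` form (opcode `2B /r`); an assembler may emit the equivalent `29 /r` form instead, which has the same length. The function name `encode_sub_reg_reg` is hypothetical, and registers are numbered 0 (eax/rax) through 15 (r15d/r15). REX is `0100WRXB`: W selects 64-bit operand size, R and B extend the ModRM register fields, so r8-r15 need REX even at 32-bit size, while eax only needs it for the 64-bit form.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode "sub dst, src" for two general-purpose registers using opcode 2B /r
   (SUB r32/r64, r/m32/r/m64). Returns the number of bytes written to out. */
size_t encode_sub_reg_reg(unsigned dst, unsigned src, int is64, uint8_t out[3]) {
    size_t n = 0;
    uint8_t rex = 0x40;              /* REX base pattern 0100WRXB */
    if (is64)    rex |= 0x08;        /* REX.W: 64-bit operand size */
    if (dst & 8) rex |= 0x04;        /* REX.R: dst (ModRM.reg) is r8..r15 */
    if (src & 8) rex |= 0x01;        /* REX.B: src (ModRM.rm) is r8..r15 */
    if (rex != 0x40)
        out[n++] = rex;              /* a bare 0x40 adds nothing for 32/64-bit operands */
    out[n++] = 0x2B;                 /* opcode */
    /* ModRM: mod = 11 (register direct), reg = low 3 bits of dst, rm = low 3 bits of src */
    out[n++] = (uint8_t)(0xC0 | ((dst & 7) << 3) | (src & 7));
    return n;
}
```

With this encoding, `sub r8d,r8d` comes out as 45 2B C0 and `sub r8,r8` as 4D 2B C0 (3 bytes each), while `sub eax,eax` is 2B C0 (2 bytes) and `sub rax,rax` is 48 2B C0 (3 bytes).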

sinsi · February 03, 2012, 09:59:36 AM

Maybe I should qualify my question: it's not so much the performance (1 clock cycle ain't a killer) as a possible gotcha.
I am thinking of stalls, like the ones you get when using an 8- or 16-bit register in 32-bit mode.

habran · February 03, 2012, 11:09:52 AM

Hi sinsi,

I use XOR reg,reg instead of SUB reg,reg.
Both instructions take 1 clock cycle on a 486 processor, but XOR looks better and more sophisticated, because it shows you understand binary numbers.

You can check the clock cycles here: http://classes.engr.oregonstate.edu/eecs/summer2008/cs271/Instructions.htm

regards

hutch-- · February 03, 2012, 11:21:19 AM

sinsi,

I don't know whether 64-bit capable hardware suffers from the problem early PIVs had, where a partial register write stalls a read or write of the larger register shortly after it. I personally doubt that a zeroing operation fits that kind of problem, as both SUB and XOR tend to live in silicon rather than microcode, but probably the only safe way to find out is to write a small test piece and time it. I remember that on a PIII you used to get very bad stalls if you performed a BYTE operation on a register followed shortly after by a DWORD operation on the same register, and the timing difference was blatantly obvious.

If you don't get major differences in the timing, then it probably is not a big deal.

sinsi · February 04, 2012, 10:13:25 AM

Yeah, I can't really see a problem with zeroing the upper bits; that's built into all sorts of other instructions (any write to a 32-bit register clears the upper 32 bits).
Interesting: I was wondering about 32/64 and never thought about, e.g., r8b and how writing it affects r8/r8d/r8w. I should think it's the same as al/eax on 32-bit CPUs.
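The difference between r8b/r8w and r8d can be modelled in a few lines of C. In 64-bit mode, a write to a 32-bit register zero-extends into the whole 64-bit register, while 8- and 16-bit writes leave the remaining upper bits unchanged (the same merge as al/eax on 32-bit CPUs). That merge makes the new value depend on the old one, which is where partial-register stalls come from. The helper names below (`write8`, `write16`, `write32`, `sub_r32_self`) are hypothetical, and the sketch models only the architectural result, not timing.

```c
#include <stdint.h>

/* Each helper returns the new 64-bit register value after writing v
   through an 8-, 16- or 32-bit alias (e.g. r8b, r8w, r8d). */
uint64_t write8(uint64_t reg, uint8_t v) {
    return (reg & ~UINT64_C(0xFF)) | v;      /* bits 63..8 keep their old value (merge) */
}

uint64_t write16(uint64_t reg, uint16_t v) {
    return (reg & ~UINT64_C(0xFFFF)) | v;    /* bits 63..16 keep their old value (merge) */
}

uint64_t write32(uint64_t reg, uint32_t v) {
    (void)reg;                               /* old value is irrelevant: no merge */
    return v;                                /* zero-extended: bits 63..32 cleared */
}

/* "sub r8d,r8d": the 32-bit result is 0, then zero-extended into all of r8. */
uint64_t sub_r32_self(uint64_t reg) {
    uint32_t low = (uint32_t)reg;
    return write32(reg, low - low);
}
```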

All we need is for MichaelW to make timers64... although I am having a go at it myself on and off.

MichaelW · February 04, 2012, 10:30:57 AM

If Dave would hurry up and win the lottery, he could buy me a new system as he promised, and then I could make the move to 64 bits :bg

qWord · February 04, 2012, 12:07:56 PM

Quote from: sinsi on February 04, 2012, 10:13:25 AM
All we need is for MichaelW to make timers64...although I am having a go at it on and off.

I translated them a while ago:

; x64-Version of MichaelW's macros
counter_begin MACRO loopcount:REQ, priority
LOCAL label

	IFNDEF tmcb__nLoops
		.data
			align 16
			tmcb__nLoops	dd 0
			tmcb__cntr		dd 0
			tmcb__qw		dq 2 dup (?)
		.code
	ENDIF

	mov tmcb__nLoops,loopcount
	IFNB <priority>
		call GetCurrentProcess
		mov rdx,priority
		mov rcx,rax
		call SetPriorityClass
	ENDIF
	; serialize with CPUID (leaf 0), then read the TSC into EDX:EAX: start of the empty reference loop
	xor rax,rax
	cpuid
	rdtsc

	mov DWORD ptr tmcb__qw[0],eax
	mov DWORD ptr tmcb__qw[4],edx
	mov tmcb__cntr, loopcount
	xor rax,rax
	cpuid
	align 16
	; empty reference loop: measures the overhead of the loop counter alone
@@:
	sub tmcb__cntr,1
	jnz @B

	; end of reference loop: overhead = end - start, stored in tmcb__qw[0]
	xor rax,rax
	cpuid
	rdtsc
	shl rdx,32
	or rax,rdx
	sub rax,tmcb__qw[0]
	mov tmcb__qw[0],rax

	; start of the loop under test, stored in tmcb__qw[8]
	xor rax, rax
	cpuid
	rdtsc
	mov tmcb__cntr,loopcount
	mov DWORD ptr tmcb__qw[8],eax
	mov DWORD ptr tmcb__qw[12],edx
	xor rax,rax
	cpuid
	align 16
label:
	tmcb__label equ <label>
ENDM

; x64-Version of MichaelW's macros
counter_end MACRO
	; close the loop opened by counter_begin
	sub tmcb__cntr,1
	jnz tmcb__label

	; rax = (end - start of test loop) - overhead
	xor rax,rax
	cpuid
	rdtsc
	shl rdx,32
	or rax,rdx
	sub rax,tmcb__qw[0]
	sub rax,tmcb__qw[8]
	mov tmcb__qw[0],rax

	; restore normal priority (done whether or not counter_begin raised it)
	call GetCurrentProcess
	mov rdx,NORMAL_PRIORITY_CLASS
	mov rcx,rax
	call SetPriorityClass

	IFDEF _EMMS
		EMMS
	ENDIF

	; average cycles per loop = net cycles / loop count, rounded by FISTP; returned in rax
	finit
	fild tmcb__qw[0]
	fild tmcb__nLoops
	fdiv
	fistp tmcb__qw[0]

	mov rax,tmcb__qw[0]
ENDM
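The arithmetic the two macros perform can be restated in C as a sketch. counter_begin times an empty loop of loopcount iterations (the overhead) and records the start of the test loop; counter_end subtracts both from the end time, divides by loopcount, and rounds with FISTP, which after FINIT rounds to nearest with ties to even. The function names and the t0..t3 parameters are hypothetical; the sketch models only the math, not RDTSC or CPUID, and uses integer division instead of the x87 so the rounding is explicit.

```c
#include <stdint.h>

/* Divide num by den (den > 0), rounding to nearest with ties to even,
   as FISTP does under the default rounding mode set by FINIT. */
static int64_t div_round_even(int64_t num, int64_t den) {
    int neg = num < 0;
    uint64_t a = neg ? (uint64_t)0 - (uint64_t)num : (uint64_t)num; /* |num| without overflow */
    uint64_t d = (uint64_t)den;
    uint64_t q = a / d, r = a % d;
    if (2 * r > d || (2 * r == d && (q & 1)))
        q++;                                /* round up, or break a tie toward the even quotient */
    return neg ? -(int64_t)q : (int64_t)q;
}

/* t0/t1: TSC before/after the empty reference loop (counter_begin)
   t2/t3: TSC before/after the loop under test (counter_begin / counter_end)
   loops: the loopcount argument; must be > 0 */
int64_t cycles_per_loop(uint64_t t0, uint64_t t1, uint64_t t2, uint64_t t3, uint32_t loops) {
    int64_t overhead = (int64_t)(t1 - t0);  /* tmcb__qw[0] after counter_begin */
    int64_t elapsed  = (int64_t)(t3 - t2);  /* end time minus tmcb__qw[8] */
    return div_round_even(elapsed - overhead, (int64_t)loops);
}
```

For example, an overhead of 1000 cycles and a test loop of 5000 cycles over 1000 iterations gives 4 cycles per loop. The result can come out negative when the measured overhead exceeds the test loop, which is worth keeping in mind when timing something as small as a single zeroing instruction.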