ZeroMemory Speed Test!

Started by ecube, January 23, 2007, 03:32:37 AM

frktons

Quote from: zemtex on September 25, 2010, 03:50:48 AM
Quote from: Mark_Larson on February 21, 2008, 11:07:57 PM

align 16
;only call with >= 4096 bytes to clear; the size must be a multiple of 4096 (we can add special code later to
; support any size). edi must be 16-byte aligned, since MOVAPS faults on unaligned addresses.
Mark_zeromem_SSE_TLB proc
;use edi for ptr
;eax for size
;int 3

pxor xmm0,xmm0
shr eax,12 ;divide by 4096, one page size.

align 16
outer:
prefetchnta [edi+4096]
mov edx,4096/16 ;we handle 4096 bytes per inner loop, each MOVAPS handles 16 of those bytes.

align 16
inner:
movaps [edi],xmm0
movaps [edi+16],xmm0
movaps [edi+32],xmm0
movaps [edi+48],xmm0
add edi,16*4
sub edx,1*4
jnz inner

sub eax,1
jnz outer

ret
Mark_zeromem_SSE_TLB endp


This can be written better like this:

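Mark_zeromem_SSE_TLB proc
;use edi for ptr (16-byte aligned)
;eax for size (multiple of 4096)
        push ebx                ;preserve ebx and esi (callee-saved in the Win32 conventions)
        push esi

        pxor xmm0,xmm0
        shr eax,12              ;divide by 4096, one page size.

        mov ecx, 4              ;inner-loop step: 4 MOVAPS per pass
        mov ebx, 1              ;outer-loop step: one page
        mov esi, 16*4           ;bytes cleared per inner pass
        ;ESI replaces ESP as the step register: with ESP = 64, an exception raised in
        ;the loop could not be delivered on the user stack, and parking ESP in MM0
        ;would also need an EMMS before any later x87 code.

align 16
outer:
        prefetchnta [edi+4096]
        mov edx,4096/16         ;we handle 4096 bytes per inner loop, each MOVAPS handles 16 of those bytes.

align 16
inner:
        movaps [edi],xmm0
        movaps [edi+16],xmm0
        movaps [edi+32],xmm0
        movaps [edi+48],xmm0
        add edi, esi            ;register operand instead of the immediate 16*4
        sub edx, ecx            ;register operand instead of the immediate 1*4
        jnz inner

        sub eax, ebx            ;register operand instead of the immediate 1
        jnz outer

        pop esi
        pop ebx
        ret
Mark_zeromem_SSE_TLB endp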
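For readers who want to experiment outside MASM, here is a minimal C sketch of the page-at-a-time technique used in both listings above, written with SSE intrinsics rather than the original assembly. The names zeromem_sse_pages and demo_clear_pages are hypothetical, the sketch assumes an x86 compiler with SSE support and C11 aligned_alloc, and it keeps the routine's preconditions (16-byte aligned pointer, size a non-zero multiple of 4096). The compiler chooses its own registers and encodings, so this illustrates the algorithm, not the instruction bytes discussed in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <xmmintrin.h>  /* SSE intrinsics: _mm_setzero_ps, _mm_store_ps, _mm_prefetch */

#define PAGE_BYTES 4096u

/* Hypothetical C counterpart of Mark_zeromem_SSE_TLB.
   Same contract as the MASM routine: buf is 16-byte aligned and
   size is a non-zero multiple of 4096. */
static void zeromem_sse_pages(void *buf, size_t size)
{
    assert(((uintptr_t)buf & 15u) == 0);          /* MOVAPS needs 16-byte alignment */
    assert(size != 0 && size % PAGE_BYTES == 0);

    unsigned char *p = buf;
    const __m128 zero = _mm_setzero_ps();         /* pxor xmm0,xmm0 */

    for (size_t pages = size / PAGE_BYTES; pages != 0; pages--) {   /* shr eax,12 */
        /* prefetchnta [edi+4096]: only a hint, so it does not fault on the last page */
        _mm_prefetch((const char *)(p + PAGE_BYTES), _MM_HINT_NTA);

        /* 4096/16 = 256 stores per page, 4 per pass -> 64 passes of 64 bytes */
        for (size_t pass = 0; pass < PAGE_BYTES / 64; pass++) {
            _mm_store_ps((float *)(void *)(p + 0), zero);    /* movaps [edi],xmm0 */
            _mm_store_ps((float *)(void *)(p + 16), zero);   /* movaps [edi+16],xmm0 */
            _mm_store_ps((float *)(void *)(p + 32), zero);   /* movaps [edi+32],xmm0 */
            _mm_store_ps((float *)(void *)(p + 48), zero);   /* movaps [edi+48],xmm0 */
            p += 64;                                          /* add edi,16*4 */
        }
    }
}

/* Demo: fill `pages` pages with 0xFF, clear them, and return how many
   bytes are still non-zero afterwards ((size_t)-1 if allocation fails). */
static size_t demo_clear_pages(size_t pages)
{
    size_t size = pages * PAGE_BYTES;
    unsigned char *buf = aligned_alloc(16, size);
    if (buf == NULL)
        return (size_t)-1;
    memset(buf, 0xFF, size);
    zeromem_sse_pages(buf, size);

    size_t nonzero = 0;
    for (size_t i = 0; i < size; i++)
        nonzero += (buf[i] != 0);
    free(buf);
    return nonzero;
}
```

Calling demo_clear_pages(3) fills three pages with 0xFF, clears them and should return 0.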


Did you get any improvement in the performance?
"Better written" implies what in this case?

Frank

Mind is like a parachute. You know what to do in order to use it :-)

zemtex

Quote from: frktons on September 25, 2010, 08:49:32 AM
Did you get any improvement in the performance?
"Better written" implies what in this case?
Frank

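I haven't run the test on it. The gain is in code size: using register operands instead of immediates saves 3 bytes in total. The inner loop shrinks from 23 to 21 bytes, and the outer loop's SUB loses one more byte.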

I have been building with Lego bricks all my life. I know how to do this. When Peter, age 6, competes with me, I find it absolutely necessary to show him that I can build with bricks better than he can, because he is so damn talented that everything rational has gone haywire.
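The code-size arithmetic can be checked by tallying the standard 32-bit encodings of the loop instructions, as in this minimal C sketch. The table layout and the total_bytes helper are hypothetical, and the byte counts assume the assembler picks the short imm8 and rel8 forms, as MASM does for these operands. The tally shows the 3-byte saving splits as 2 bytes in the inner loop and 1 byte in the outer loop's SUB.

```c
#include <assert.h>
#include <stddef.h>

/* Encoded sizes in 32-bit mode, shortest forms:
   movaps [edi],xmm0        0F 29 07      (3 bytes)
   movaps [edi+disp8],xmm0  0F 29 47 xx   (4 bytes)
   add edi,imm8             83 C7 ib      (3 bytes)
   add edi,reg32            01 /r         (2 bytes)
   sub edx,imm8             83 EA ib      (3 bytes)
   sub edx,ecx              29 CA         (2 bytes)
   sub eax,imm8             83 E8 ib      (3 bytes)
   sub eax,ebx              29 D8         (2 bytes)
   jnz rel8                 75 cb         (2 bytes) */
typedef struct {
    const char *insn;
    int bytes;
} encoded;

/* Inner loop with immediate operands (original listing). */
static const encoded inner_imm[] = {
    {"movaps [edi],xmm0", 3},    {"movaps [edi+16],xmm0", 4},
    {"movaps [edi+32],xmm0", 4}, {"movaps [edi+48],xmm0", 4},
    {"add edi,16*4", 3},         {"sub edx,1*4", 3},
    {"jnz inner", 2},
};

/* Inner loop with register operands (rewritten listing). */
static const encoded inner_reg[] = {
    {"movaps [edi],xmm0", 3},    {"movaps [edi+16],xmm0", 4},
    {"movaps [edi+32],xmm0", 4}, {"movaps [edi+48],xmm0", 4},
    {"add edi,reg32", 2},        {"sub edx,ecx", 2},
    {"jnz inner", 2},
};

/* The only outer-loop instruction that changes. */
static const encoded outer_imm[] = { {"sub eax,1", 3} };
static const encoded outer_reg[] = { {"sub eax,ebx", 2} };

#define COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* Sum the encoded sizes of a sequence of instructions. */
static int total_bytes(const encoded *code, size_t n)
{
    int sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += code[i].bytes;
    return sum;
}
```

Note that the three MOV reg,imm32 loads added before the loops take 5 bytes each, so the saving applies only to the loop bodies, not to the routine as a whole.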