CopyMemory API

Mark Jones · March 23, 2006, 10:14:03 PM

Hi, anyone know what library CopyMemory is in? It's not found searching the v9.00 release of \masm32\include. :wink

Here's a general-purpose replacement. I was curious how the two fared in execution speed.

Code:

CopyMem PROC USES ESI EDI dst:DWORD,src:DWORD,leng:DWORD
    mov esi,src
    mov edi,dst
    mov ecx,leng
    cld                     ; clear direction to copy forwards
@@:                         ; copy DWORDs
    cmp ecx,4               ; until less than 4 bytes remain
    jl @F
    sub ecx,4
    movsd                   ; copy DWORD & increment pointers
    jmp @B
@@:                         ; then copy any remaining bytes
    cmp ecx,0
    je @F
    sub ecx,1
    movsb
    jmp @B
@@:
    ret
CopyMem endp

PBrennick · March 23, 2006, 11:06:14 PM

It is found in the PSDK

WinBase.h
#define CopyMemory RtlCopyMemory

winNT.h
#define RtlCopyMemory(Destination,Source,Length)

Paul

Mark Jones · March 23, 2006, 11:20:00 PM

Aaah, thanks Paul. Can't seem to find RtlCopyMemory either, lol. :bg Here's another routine, untested, probably faster than the first:

Code:

CopyMem2 PROC USES ESI EDI dst:DWORD,src:DWORD,leng:DWORD
nop
    mov esi,src
    mov edi,dst
    mov eax,leng
    xor edx,edx             ; divide length into dwords
    mov ecx,4
    div ecx
    mov ecx,eax
    cld                     ; clear direction to copy forwards
    rep movsd               ; copy DWORDs & increment pointers until ecx=0
@@:                         ; then copy any remaining bytes
    test cl,cl
    je @F
    dec ecx
    movsb
    jmp @B
@@:
    ret
CopyMem2 endp

EDIT: Oh yeah, this won't work for sizes that aren't a multiple of 4 bytes: rep movsd leaves ecx at 0, so the trailing-byte loop never runs. Well, maybe I'll plug away at it tomorrow. :bg
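
For reference, a rough C sketch of the split the routine is aiming for: the dword count and the trailing-byte count both have to come from the original length, because the counter used for the dword pass is spent by the time the byte pass starts. This is not code from the thread; the function name copy_dwords_then_bytes is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the thread): copy len bytes as
   len/4 32-bit words followed by len%4 trailing bytes. */
void copy_dwords_then_bytes(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t dwords = len >> 2;   /* the count that rep movsd would consume */
    size_t tail   = len & 3;    /* taken from len, not from the spent counter */

    assert(len == 0 || (d != NULL && s != NULL));

    while (dwords--) {
        uint32_t w;
        memcpy(&w, s, sizeof w);    /* alignment-safe 32-bit load */
        memcpy(d, &w, sizeof w);    /* alignment-safe 32-bit store */
        s += 4;
        d += 4;
    }
    while (tail--)                  /* 0 to 3 leftover bytes */
        *d++ = *s++;
}
```

With a length of 11, this does two dword moves and then three byte moves.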

PBrennick · March 23, 2006, 11:29:17 PM

I saw CopyMem in the PSDK, I think. Seems the includes could include more!
Paul

hutch-- · March 24, 2006, 01:14:34 AM

Mark,

Some of the direct memory functions can be found in ntoskrnl.exe, but they cannot be considered safe across Windows versions when taken from that module. The function is declared in winbase.h, but I don't currently have a reference for which DLL or library it is in. There is a library for ntoskrnl in the Server 2003 SDK set.

Mincho Georgiev · March 24, 2006, 08:42:53 AM

CopyMemory (a.k.a. RtlCopyMemory) is an inline C++ function located in wdm.lib. You can use RtlMoveMemory too; it does the same job, and I always use it with no problems at all. The source operand is never changed on my system when using RtlMoveMemory. Anyway, you can use wdm.lib from the poasm package.

Mincho Georgiev · March 24, 2006, 05:30:18 PM

I almost forgot something. CopyMemory, RtlCopyMemory and memcpy are one and the SAME function, a C runtime one :)
So even if it is not located in wdm.lib from the poasm package (I used it once from wdm.lib, but from VC6 as I remember), this is not an API function:

from winbase.h and winnt.h:
#define CopyMemory RtlCopyMemory
#define RtlCopyMemory(Destination,Source,Length) memcpy((Destination),(Source),(Length))

A little tricky, isn't it? :bg
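
To see the macro chain in action without the Windows headers, here is a small C sketch that reproduces the two defines quoted above and calls through them. The wrapper name copy_via_macro is an assumption for illustration only; after preprocessing, its body is nothing more than a memcpy call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The two macros quoted above, repeated here so the sketch builds
   without the Windows SDK headers. */
#define CopyMemory RtlCopyMemory
#define RtlCopyMemory(Destination,Source,Length) memcpy((Destination),(Source),(Length))

/* Hypothetical wrapper: the preprocessor turns the call below into
   memcpy((dst),(src),(len)), so no separate API function is involved. */
void copy_via_macro(void *dst, const void *src, size_t len)
{
    assert(len == 0 || (dst != NULL && src != NULL));
    CopyMemory(dst, src, len);
}
```

This is why searching the import libraries for a CopyMemory symbol turns up nothing: the name only exists at the preprocessor level.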

NightWare · March 25, 2006, 02:48:25 AM

Quote from: Mark Jones on March 23, 2006, 11:20:00 PM
mov eax,leng
xor edx,edx ; divide length into dwords
mov ecx,4
div ecx
mov ecx,eax

??? is it a joke ? coz shr eax,2 should do the job
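
For unsigned lengths the shift and the divide give the same quotient, and masking with 3 gives the same remainder that div leaves in edx. A quick C check, as a sketch only (the function name shift_matches_div is made up for illustration):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical check: returns 1 when the shift/mask pair agrees with
   the divide/modulo pair for an unsigned 32-bit length. */
int shift_matches_div(uint32_t len)
{
    return (len / 4u == (len >> 2)) && (len % 4u == (len & 3u));
}
```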

Here is some clever code to copy memory, for people interested in non-Intel-dependent code (no rep, movsd, movsb, etc.). It's quite fast.

Code:

ALIGN 16
memcopy PROC _dest_:DWORD,_src_:DWORD,_size_:DWORD
		push ecx
		push edx
		push esi
		push edi

		mov eax,_size_
		mov esi,_src_
		mov edi,_dest_
		mov ecx,eax
		and ecx,11111111111111111111111111110000b
		jz Label1
		add esi,ecx
		add edi,ecx
		neg ecx
Label0:	mov edx,DWORD PTR[esi+ecx]
		mov DWORD PTR[edi+ecx],edx
		mov edx,DWORD PTR[esi+ecx+4]
		mov DWORD PTR[edi+ecx+4],edx
		mov edx,DWORD PTR[esi+ecx+8]
		mov DWORD PTR[edi+ecx+8],edx
		mov edx,DWORD PTR[esi+ecx+12]
		mov DWORD PTR[edi+ecx+12],edx
		add ecx,16
		jnz Label0
Label1:	mov ecx,eax
		and ecx,00000000000000000000000000001100b
		jz Label3
		add esi,ecx
		add edi,ecx
		neg ecx
Label2:	mov edx,DWORD PTR [esi+ecx]
		mov DWORD PTR [edi+ecx],edx
		add ecx,4
		jnz Label2
Label3:	mov ecx,eax
		and ecx,00000000000000000000000000000011b
		jz Label5
		add esi,ecx
		add edi,ecx
		neg ecx
Label4:	mov dl,BYTE PTR [esi+ecx]
		mov BYTE PTR [edi+ecx],dl
		inc ecx
		jnz Label4
Label5:
		pop edi
		pop esi
		pop edx
		pop ecx
	ret
memcopy ENDP

anyway, even if you're not interested in non-Intel-dependent code, it's always interesting to see another approach
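
The trick in the routine above is the negative index: both pointers are moved to the end of the block, and a single counter runs from minus the block size up to zero, so one add serves as both the address step and the loop test. A rough C rendering of that idea follows. It is a sketch under assumptions, not a translation: the name copy_neg_index is made up, a 16-byte memcpy stands in for the four dword moves, and the leftover 0 to 15 bytes are handled in one byte loop instead of the separate dword and byte passes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical C version of the negative-index copy loop. */
void copy_neg_index(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t block = len & ~(size_t)15;   /* part handled 16 bytes at a time */
    size_t tail  = len & 15;            /* leftover bytes */
    ptrdiff_t i;

    assert(len == 0 || (d != NULL && s != NULL));

    /* Point just past the block, then count a negative offset up to 0. */
    s += block;
    d += block;
    for (i = -(ptrdiff_t)block; i != 0; i += 16)
        memcpy(d + i, s + i, 16);       /* stands in for the 4 dword moves */

    /* Same pattern for the leftover bytes. */
    s += tail;
    d += tail;
    for (i = -(ptrdiff_t)tail; i != 0; i++)
        d[i] = s[i];
}
```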

Mark Jones · March 25, 2006, 08:27:47 PM

Quote from: NightWare on March 25, 2006, 02:48:25 AM
Quote from: Mark Jones on March 23, 2006, 11:20:00 PM
mov eax,leng
xor edx,edx ; divide length into dwords
mov ecx,4
div ecx
mov ecx,eax

??? is it a joke ? coz shr eax,2 should do the job

No it was not a joke. It was untested code typed into this forum in a hurry. The interesting part is how the timings do not change:

Code:

    db 32-(($-a) AND 31) dup (0CCh) ; ALIGN 32
CopyMem3 PROC USES ESI EDI dst:DWORD,src:DWORD,leng:DWORD
    mov esi,src
    mov edi,dst
    mov ecx,leng
    shr ecx,2               ; divide length into dwords
    cld                     ; clear direction to copy forwards
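    rep movsd               ; copy DWORDs & increment pointers until ecx=0
    mov ecx,leng            ; rep movsd left ecx=0, so reload the length
    and ecx,3               ; and keep only the 0-3 trailing bytes
@@:                         ; then copy any remaining bytes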
    cmp ecx,0
    je @F
    sub ecx,1
    movsb
    jmp @B
@@:
    ret
CopyMem3 endp

Quote
anyway, even if you're not interested in non-Intel-dependent code, it's always interesting to see another approach

Yes, in fact that's why we have a discussion forum here. :bg

Your routine is quite fast. Here are the results for all of them, in clock cycles, at 32/16/8/4/3/2/1-byte read offsets. All tests pass on 64-byte memory lengths.

Quote from: AMD XP 2500+ / XP SP2
CopyMem1: 113, 113, 113, 113, 113, 113, 113 (esi,edi) movsd
CopyMem2: 89, 89, 89, 89, 99, 98, 99 (esi,edi) rep movs
CopyMem3: 89, 89, 89, 89, 99, 98, 99 (esi,edi) rep movs
CopyMemNW1: 38, 38, 38, 38, 45, 43, 45 (esi,edi) mov dword

Press enter to exit...

Mark Jones · March 25, 2006, 10:20:55 PM

I was able to tweak your routine a little to get even better performance on the AMD. It doesn't preserve the other GPRs, though.

Code:

    ; by NightWare for non-intel-dependent code
    db 32-(($-a) AND 31) dup (0CCh) ; ALIGN 32
CopyMemNW1 PROC dst:DWORD,src:DWORD, siz:DWORD
    mov eax,siz
    mov esi,src
    mov edi,dst
    mov ecx,eax
    and ecx,11111111111111111111111111110000b
    jz checkDWord
    add esi,ecx
    add edi,ecx
    neg ecx
@@: 
    mov edx,DWORD PTR[esi+ecx]
    mov DWORD PTR[edi+ecx],edx
    mov edx,DWORD PTR[esi+ecx+4]
    mov DWORD PTR[edi+ecx+4],edx
    mov edx,DWORD PTR[esi+ecx+8]
    mov DWORD PTR[edi+ecx+8],edx
    mov edx,DWORD PTR[esi+ecx+12]
    mov DWORD PTR[edi+ecx+12],edx
    add ecx,16
    jnz @B
checkDWord:
    mov ecx,eax
    and ecx,00000000000000000000000000001100b
    jz checkByte
    add esi,ecx
    add edi,ecx
    neg ecx
@@: 
    mov edx,DWORD PTR [esi+ecx]
    mov DWORD PTR [edi+ecx],edx
    add ecx,4
    jnz @B
checkByte: 
    mov ecx,eax
    and ecx,00000000000000000000000000000011b
    jz done
    add esi,ecx
    add edi,ecx
    neg ecx
@@: 
    mov dl,BYTE PTR [esi+ecx]
    mov BYTE PTR [edi+ecx],dl
    inc ecx
    jnz @B
done:
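    sub esi,eax             ; rewind esi to src (this is not the caller's original esi)
    sub edi,eax             ; rewind edi to dst (likewise not the caller's original edi)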
    ret
CopyMemNW1 ENDP

Quote from: AMD XP 2500+ / XP SP2
CopyMem1: 113, 113, 113, 113, 113, 113, 113 (esi,edi) movsd
CopyMem2: 89, 89, 89, 89, 99, 98, 99 (esi,edi) rep movs
CopyMem3: 89, 89, 89, 89, 99, 98, 99 (esi,edi) rep movs
CopyMemNW1: 34, 34, 34, 34, 42, 38, 42 (esi,edi) mov dword

Press enter to exit...

EDIT: Corrected bug in branching. Whoops! :bg

Mincho Georgiev · March 25, 2006, 10:29:47 PM

OK, I thought a lot before posting this, but I don't see a reason not to, since I didn't see anything about it in the license.
Mark, this is the original CopyMemory function, straight from Microsoft's VS7. I only changed the name (from memcpy). This is the function that you were looking for at the beginning of this thread.
I didn't have time to run timings, but you can do it if you like; I'd be interested to see the results.

[attachment deleted by admin]

NightWare · March 26, 2006, 08:49:15 PM

hi all,

By using the technique I posted previously, it's possible to produce a lot of memory algos:

ZeroMem (mov), MemFill (mov), MemXchg (mov*2), MemFilter (and/xor), MemFusion (or), MemAdd (add), etc...

It's quite easy to adapt, so I'm not going to post all those algos (I don't want to do all the work for you... and maybe someone will see other possibilities I haven't seen).
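
As one example of such an adaptation, here is a rough C sketch of the MemFusion (or) variant: the loop skeleton stays the same and only the operation in the middle changes from a plain move to an OR into the destination. The name mem_or and the exact layout are assumptions for illustration, not code from this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical MemFusion-style routine: dst[i] |= src[i] for len bytes,
   using the same negative-index loop shape as the copy routine. */
void mem_or(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t block = len & ~(size_t)3;    /* part handled one dword at a time */
    size_t tail  = len & 3;             /* 0 to 3 leftover bytes */
    ptrdiff_t i;

    assert(len == 0 || (d != NULL && s != NULL));

    s += block;
    d += block;
    for (i = -(ptrdiff_t)block; i != 0; i += 4) {
        uint32_t a, b;
        memcpy(&a, d + i, sizeof a);    /* load destination dword */
        memcpy(&b, s + i, sizeof b);    /* load source dword */
        a |= b;                         /* the only line that differs from a copy */
        memcpy(d + i, &a, sizeof a);
    }

    s += tail;
    d += tail;
    for (i = -(ptrdiff_t)tail; i != 0; i++)
        d[i] = (unsigned char)(d[i] | s[i]);
}
```

Swapping the OR for an add, and, or xor gives the other variants in the list.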

shaka has posted a copymem algo (badly named, because it's a memmove/RtlMoveMemory-like algo: it takes care of a possible overlap).
So if you want to run a speed test you need something that does exactly the same job. That's why I'm posting my MemMove variant (in fact, I've never tested it, so don't blame me if it doesn't work correctly). It's just a bit more complicated than the code I posted previously...

Code:

ALIGN 16
MemMove PROC _dest_:DWORD,_src_:DWORD,_size_:DWORD
		push ecx
		push edx
		push esi
		push edi

		mov eax,_size_
		mov esi,_src_
		mov edi,_dest_
		cmp esi,edi
		jb Label07
Label00:	mov ecx,eax
		and ecx,11111111111111111111111111110000b
		jz Label02
		add esi,ecx
		add edi,ecx
		neg ecx
Label01:	mov edx,DWORD PTR[esi+ecx]
		mov DWORD PTR[edi+ecx],edx
		mov edx,DWORD PTR[esi+ecx+4]
		mov DWORD PTR[edi+ecx+4],edx
		mov edx,DWORD PTR[esi+ecx+8]
		mov DWORD PTR[edi+ecx+8],edx
		mov edx,DWORD PTR[esi+ecx+12]
		mov DWORD PTR[edi+ecx+12],edx
		add ecx,16
		jnz Label01
Label02:	mov ecx,eax
		and ecx,00000000000000000000000000001100b
		jz Label04
		add esi,ecx
		add edi,ecx
		neg ecx
Label03:	mov edx,DWORD PTR [esi+ecx]
		mov DWORD PTR [edi+ecx],edx
		add ecx,4
		jnz Label03
Label04:	mov ecx,eax
		and ecx,00000000000000000000000000000011b
		jz Label06
		add esi,ecx
		add edi,ecx
		neg ecx
Label05:	mov dl,BYTE PTR [esi+ecx]
		mov BYTE PTR [edi+ecx],dl
		inc ecx
		jnz Label05
Label06:
		pop edi
		pop esi
		pop edx
		pop ecx
	ret

Label07:	mov ecx,edi
		sub ecx,esi
		cmp eax,ecx
		jbe Label00
		add esi,eax
		add edi,eax
		mov ecx,eax
		and ecx,00000000000000000000000000000011b
		jz Label09
		sub esi,ecx
		sub edi,ecx
Label08:	dec ecx
		mov dl,BYTE PTR [esi+ecx]
		mov BYTE PTR [edi+ecx],dl
		jnz Label08
Label09:	mov ecx,eax
		and ecx,00000000000000000000000000001100b
		jz Label11
		sub esi,ecx
		sub edi,ecx
Label10:	add ecx,-4
		mov edx,DWORD PTR [esi+ecx]
		mov DWORD PTR [edi+ecx],edx
		jnz Label10
Label11:	mov ecx,eax
		and ecx,11111111111111111111111111110000b
		jz Label13
		sub esi,ecx
		sub edi,ecx
Label12:	add ecx,-16
		mov edx,DWORD PTR[esi+ecx+12]
		mov DWORD PTR[edi+ecx+12],edx
		mov edx,DWORD PTR[esi+ecx+8]
		mov DWORD PTR[edi+ecx+8],edx
		mov edx,DWORD PTR[esi+ecx+4]
		mov DWORD PTR[edi+ecx+4],edx
		mov edx,DWORD PTR[esi+ecx]
		mov DWORD PTR[edi+ecx],edx
		jnz Label12
Label13:
		pop edi
		pop esi
		pop edx
		pop ecx
	ret
MemMove ENDP
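
The direction test at the top of MemMove can be summed up in a few lines of C. This is a byte-at-a-time sketch of the decision only, with a made-up name (move_bytes), not a replacement for the routine: copy ascending when the source is at or above the destination, or when the destination starts at or past the end of the source; otherwise copy descending so the overlapping source bytes are read before they are overwritten.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical overlap-safe move, mirroring the cmp/jb and cmp/jbe tests. */
void move_bytes(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    uintptr_t da = (uintptr_t)dst;
    uintptr_t sa = (uintptr_t)src;
    size_t i;

    assert(len == 0 || (d != NULL && s != NULL));

    if (sa >= da || da - sa >= len) {
        /* No harmful overlap: ascending copy. */
        for (i = 0; i < len; i++)
            d[i] = s[i];
    } else {
        /* dst starts inside the source range: descending copy. */
        for (i = len; i != 0; i--)
            d[i - 1] = s[i - 1];
    }
}
```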

Mark, if you want to remove the useless PUSHes and POPs in the algo you changed, you can add:
sub esi,eax
sub edi,eax
at the end of the code and remove USES ESI,EDI... it's just a bit faster...

Mincho Georgiev · March 26, 2006, 09:09:37 PM

It is not badly named. I just cut out the preprocessor directives that make the difference, because they're the only difference between CopyMemory and memmove!

Mark Jones · March 27, 2006, 05:58:39 AM

Thanks everyone.

Execution cycles, 64-byte memory copy, data read-aligned to 32/16/8/4/3/2/1:

Quote from: Athlon XP 2500+ / XP SP2
CopyMem1: 101, 101, 101, 101, 101, 101, 101 (esi,edi) movsd
CopyMem2: 88, 88, 88, 88, 98, 97, 98 (esi,edi) rep movs
CopyMem3: 88, 88, 88, 88, 98, 97, 98 (esi,edi) rep movs
CopyMemory: 53, 53, 53, 53, 63, 62, 63 (From VS7)
MemMoveNW1: 36, 36, 36, 36, 43, 41, 43 (esi,edi) mov dword
CopyMemNW1: 35, 35, 35, 35, 39, 37, 39 (esi,edi) mov dword

Mincho Georgiev · March 27, 2006, 02:55:00 PM

Thanks to you too, Mark, for that useful performance info! :thumbu
