CopyMemory API

Started by Mark Jones, March 23, 2006, 10:14:03 PM

Mark Jones

Hi, does anyone know which library CopyMemory is in? I can't find it anywhere in the v9.00 release of \masm32\include. :wink

Here's a general-purpose replacement. I was curious how the two fared in execution speed.


CopyMem PROC USES ESI EDI dst:DWORD,src:DWORD,leng:DWORD
    mov esi,src
    mov edi,dst
    mov ecx,leng
    cld                     ; clear direction to copy forwards
@@:                         ; copy DWORDs
    cmp ecx,4               ; until less than 4 bytes remain
    jb @F                   ; unsigned compare, since leng is a byte count
    sub ecx,4
    movsd                   ; copy DWORD & increment pointers
    jmp @B
@@:                         ; then copy any remaining bytes
    cmp ecx,0
    je @F
    sub ecx,1
    movsb
    jmp @B
@@:
    ret
CopyMem endp
"To deny our impulses... foolish; to revel in them, chaos." MCJ 2003.08

PBrennick

It is found in the PSDK

WinBase.h
#define CopyMemory RtlCopyMemory

winNT.h
#define RtlCopyMemory(Destination,Source,Length)

Paul

The GeneSys Project is available from:
The Repository or My crappy website

Mark Jones

Aaah, thanks Paul. Can't seem to find RtlCopyMemory either, lol. :bg Here's another routine, untested, probably faster than the first:


CopyMem2 PROC USES ESI EDI dst:DWORD,src:DWORD,leng:DWORD
    mov esi,src
    mov edi,dst
    mov eax,leng
    xor edx,edx             ; divide length into dwords
    mov ecx,4
    div ecx
    mov ecx,eax
    cld                     ; clear direction to copy forwards
    rep movsd               ; copy DWORDs & increment pointers until ecx=0
    mov ecx,edx             ; remainder from div = bytes still to copy (0-3)
@@:                         ; then copy any remaining bytes
    test cl,cl
    je @F
    dec ecx
    movsb
    jmp @B
@@:
    ret
CopyMem2 endp


EDIT: Oh yeah, the first version of this didn't work for sizes that aren't a multiple of 4, because ecx is already zero after rep movsd and the byte loop never ran. The listing now reloads the remainder that div leaves in edx.  :bg

PBrennick

I saw CopyMem in the PSDK, I think.  Seems the includes could include more!
Paul

hutch--

Mark,

Some of the direct memory functions can be found in ntoskrnl.exe, but calling them from there cannot be considered safe across Windows versions. The function is declared in winbase.h, but I don't currently have a reference to which DLL or library it is in. There is a library for ntoskrnl in the Server 2003 SDK set.
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

Mincho Georgiev

CopyMemory (a.k.a. RtlCopyMemory) is an inline C++ function, located in wdm.lib. You can use RtlMoveMemory too; it does the same job, and I always use it with no problems at all. The source operand is never changed on my system when using RtlMoveMemory. Anyway, you can use wdm.lib from the PoAsm package.

Mincho Georgiev

I almost forgot something. CopyMemory, RtlCopyMemory and memcpy are one and the SAME function, a C runtime one :)
So even if it is not located in wdm.lib from the PoAsm package (I used it once from wdm.lib, but from VC6 as I remember), this is not an API function:

From winbase.h and winnt.h:
#define CopyMemory RtlCopyMemory
#define RtlCopyMemory(Destination,Source,Length) memcpy((Destination),(Source),(Length))

A little tricky, isn't it?  :bg
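
To see the macro chain from the two #define lines in action, here is a minimal C sketch. It reproduces the macros locally so that it compiles without the Windows headers; that local copy and the helper name copy_greeting are assumptions made for illustration, not part of any SDK.

```c
#include <string.h>

/* Local copy of the macro chain quoted above, only defined when the
   Windows headers have not already provided it (assumption: we are
   building without them). */
#ifndef CopyMemory
#define RtlCopyMemory(Destination,Source,Length) memcpy((Destination),(Source),(Length))
#define CopyMemory RtlCopyMemory
#endif

/* Hypothetical helper: after preprocessing, this call is a plain
   memcpy(out, "masm32", 7), so no import library is involved. */
void copy_greeting(char *out)
{
    CopyMemory(out, "masm32", 7);   /* 6 characters plus the terminating NUL */
}
```

Because the preprocessor rewrites the name before the compiler ever sees it, there is no exported CopyMemory symbol for an assembler include file to declare.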

NightWare

Quote from: Mark Jones on March 23, 2006, 11:20:00 PM
    mov eax,leng
    xor edx,edx             ; divide length into dwords
    mov ecx,4
    div ecx
    mov ecx,eax

??? Is it a joke? shr eax,2 should do the job.
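
The equivalence behind that remark can be sketched in C: for an unsigned length, dividing by 4 is the same as shifting right by 2, and the remainder that div leaves in edx is the same as masking with 3. The type and function names below are hypothetical, chosen only for this illustration.

```c
#include <stddef.h>

/* Hypothetical result type: how many whole DWORDs and how many
   leftover bytes a copy of len bytes breaks down into. */
typedef struct {
    size_t dwords;
    size_t tail_bytes;
} length_split;

length_split split_length(size_t len)
{
    length_split r;
    r.dwords = len >> 2;      /* same value as len / 4 for unsigned len */
    r.tail_bytes = len & 3;   /* same value as len % 4, i.e. div's remainder */
    return r;
}
```

For example, a length of 11 splits into 2 DWORDs and 3 trailing bytes. Note that the shift alone discards those 3 bytes, so the tail count has to be computed separately.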

Here is some clever code to copy memory, for people interested in code that doesn't depend on the Intel string instructions (rep, movsd, movsb, etc.). It's quite fast.


ALIGN 16
memcopy PROC _dest_:DWORD,_src_:DWORD,_size_:DWORD
    push ecx
    push edx
    push esi
    push edi

    mov eax,_size_
    mov esi,_src_
    mov edi,_dest_
    mov ecx,eax
    and ecx,11111111111111111111111111110000b   ; bytes in whole 16-byte blocks
    jz Label1
    add esi,ecx             ; point past the blocks...
    add edi,ecx
    neg ecx                 ; ...and count a negative index up to zero
Label0:
    mov edx,DWORD PTR[esi+ecx]
    mov DWORD PTR[edi+ecx],edx
    mov edx,DWORD PTR[esi+ecx+4]
    mov DWORD PTR[edi+ecx+4],edx
    mov edx,DWORD PTR[esi+ecx+8]
    mov DWORD PTR[edi+ecx+8],edx
    mov edx,DWORD PTR[esi+ecx+12]
    mov DWORD PTR[edi+ecx+12],edx
    add ecx,16
    jnz Label0
Label1:
    mov ecx,eax
    and ecx,00000000000000000000000000001100b   ; bytes in the remaining DWORDs
    jz Label3
    add esi,ecx
    add edi,ecx
    neg ecx
Label2:
    mov edx,DWORD PTR [esi+ecx]
    mov DWORD PTR [edi+ecx],edx
    add ecx,4
    jnz Label2
Label3:
    mov ecx,eax
    and ecx,00000000000000000000000000000011b   ; trailing bytes
    jz Label5
    add esi,ecx
    add edi,ecx
    neg ecx
Label4:
    mov dl,BYTE PTR [esi+ecx]
    mov BYTE PTR [edi+ecx],dl
    inc ecx
    jnz Label4
Label5:
    pop edi
    pop esi
    pop edx
    pop ecx
    ret
memcopy ENDP


Anyway, even if you're not interested in non-Intel-dependent code, it's always interesting to see another approach.
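
For readers who find the negative-index trick easier to follow in C, here is a rough sketch of the same three-tier structure: whole 16-byte blocks, then leftover DWORDs, then leftover bytes, each walked with an index that counts up from -n to zero. The names memcopy_sketch and copy_dword are hypothetical, and it assumes non-overlapping buffers and a size that fits in ptrdiff_t. It is an illustration of the idea, not a tuned replacement.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: move one 32-bit value. Going through memcpy and a
   temporary keeps the access alignment-safe; compilers typically reduce
   it to a plain load and store. */
static void copy_dword(unsigned char *d, const unsigned char *s)
{
    uint32_t tmp;
    memcpy(&tmp, s, sizeof tmp);
    memcpy(d, &tmp, sizeof tmp);
}

void memcopy_sketch(void *dst, const void *src, size_t size)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    ptrdiff_t i;
    size_t n;

    n = size & ~(size_t)15;             /* bytes in whole 16-byte blocks */
    s += n; d += n;                     /* point just past that region */
    for (i = -(ptrdiff_t)n; i != 0; i += 16) {
        copy_dword(d + i,      s + i);
        copy_dword(d + i + 4,  s + i + 4);
        copy_dword(d + i + 8,  s + i + 8);
        copy_dword(d + i + 12, s + i + 12);
    }

    n = size & 12;                      /* bytes in the remaining DWORDs */
    s += n; d += n;
    for (i = -(ptrdiff_t)n; i != 0; i += 4)
        copy_dword(d + i, s + i);

    n = size & 3;                       /* trailing bytes */
    s += n; d += n;
    for (i = -(ptrdiff_t)n; i != 0; i++)
        d[i] = s[i];
}
```

The point of the negative index is that one register both addresses the data and terminates the loop: the add that advances it also sets the zero flag, so no separate compare is needed.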

Mark Jones

Quote from: NightWare on March 25, 2006, 02:48:25 AM
Quote from: Mark Jones on March 23, 2006, 11:20:00 PM
    mov eax,leng
    xor edx,edx             ; divide length into dwords
    mov ecx,4
    div ecx
    mov ecx,eax

??? Is it a joke? shr eax,2 should do the job.

No it was not a joke. It was untested code typed into this forum in a hurry. The interesting part is how the timings do not change:


    db 32-(($-a) AND 31) dup (0CCh) ; ALIGN 32
CopyMem3 PROC USES ESI EDI dst:DWORD,src:DWORD,leng:DWORD
    mov esi,src
    mov edi,dst
    mov ecx,leng
    shr ecx,2               ; divide length into dwords
    cld                     ; clear direction to copy forwards
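    rep movsd               ; copy DWORD & increment pointers until ecx=0
    mov ecx,leng            ; ecx is 0 after rep movsd, so reload the length
    and ecx,3               ; and keep only the leftover byte count (0-3)
@@:                         ; then copy any remaining bytes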
    cmp ecx,0
    je @F
    sub ecx,1
    movsb
    jmp @B
@@:
    ret
CopyMem3 endp


Quote
Anyway, even if you're not interested in non-Intel-dependent code, it's always interesting to see another approach.

Yes, in fact that's why we have a discussion forum here. :bg

Your routine is quite fast. Here are the results in clock cycles for all four routines, measured at 32/16/8/4/3/2/1-byte read offsets. All tests pass on 64-byte memory lengths.

Quote from: AMD XP 2500+ / XP SP2
CopyMem1:   113, 113, 113, 113, 113, 113, 113 (esi,edi) movsd
CopyMem2:    89,  89,  89,  89,  99,  98,  99 (esi,edi) rep movs
CopyMem3:    89,  89,  89,  89,  99,  98,  99 (esi,edi) rep movs
CopyMemNW1:  38,  38,  38,  38,  45,  43,  45 (esi,edi) mov dword

Press enter to exit...

Mark Jones

I was able to tweak your routine a little to get even better performance on the AMD. It doesn't preserve the other GPRs, though.


    ; by NightWare for non-intel-dependent code
    db 32-(($-a) AND 31) dup (0CCh) ; ALIGN 32
CopyMemNW1 PROC dst:DWORD,src:DWORD, siz:DWORD
    mov eax,siz
    mov esi,src
    mov edi,dst
    mov ecx,eax
    and ecx,11111111111111111111111111110000b
    jz checkDWord
    add esi,ecx
    add edi,ecx
    neg ecx
@@:
    mov edx,DWORD PTR[esi+ecx]
    mov DWORD PTR[edi+ecx],edx
    mov edx,DWORD PTR[esi+ecx+4]
    mov DWORD PTR[edi+ecx+4],edx
    mov edx,DWORD PTR[esi+ecx+8]
    mov DWORD PTR[edi+ecx+8],edx
    mov edx,DWORD PTR[esi+ecx+12]
    mov DWORD PTR[edi+ecx+12],edx
    add ecx,16
    jnz @B
checkDWord:
    mov ecx,eax
    and ecx,00000000000000000000000000001100b
    jz checkByte
    add esi,ecx
    add edi,ecx
    neg ecx
@@:
    mov edx,DWORD PTR [esi+ecx]
    mov DWORD PTR [edi+ecx],edx
    add ecx,4
    jnz @B
checkByte:
    mov ecx,eax
    and ecx,00000000000000000000000000000011b
    jz done
    add esi,ecx
    add edi,ecx
    neg ecx
@@:
    mov dl,BYTE PTR [esi+ecx]
    mov BYTE PTR [edi+ecx],dl
    inc ecx
    jnz @B
done:
    sub esi,eax             ; rewind esi and edi to src and dst; the caller's
    sub edi,eax             ; original esi/edi values are not restored
    ret
CopyMemNW1 ENDP


Quote from: AMD XP 2500+ / XP SP2
CopyMem1:   113, 113, 113, 113, 113, 113, 113 (esi,edi) movsd
CopyMem2:    89,  89,  89,  89,  99,  98,  99 (esi,edi) rep movs
CopyMem3:    89,  89,  89,  89,  99,  98,  99 (esi,edi) rep movs
CopyMemNW1:  34,  34,  34,  34,  42,  38,  42 (esi,edi) mov dword

Press enter to exit...

EDIT: Corrected bug in branching. Whoops! :bg

Mincho Georgiev

OK, I thought a lot before posting this, but I don't see a reason not to, since I didn't see anything about it in the license.
Mark, this is the original CopyMemory function, straight from Microsoft's VS7. I only changed the name (it was memcpy). This is the function you were looking for at the beginning of this thread.
I didn't have time to time it, but you can if you like; it would be interesting for me to see the results.

[attachment deleted by admin]

NightWare

Hi all,

Using the technique I posted previously, it's possible to produce a lot of memory algorithms:

ZeroMem (mov), MemFill (mov), MemXchg (mov*2), MemFilter (and/xor), MemFusion (or), MemAdd (add), etc...

It's quite easy to adapt, so I'm not going to post all those algos (I don't want to do the whole job for you, and maybe someone will see other possibilities I haven't seen).
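
As one hedged illustration of how such an adaptation might look, here is a C sketch of a MemFill-style routine using the same negative-index loops. The name mem_fill_sketch is hypothetical, the 16-byte unrolled tier is left out for brevity, and the size is assumed to fit in ptrdiff_t.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical MemFill variant: store a repeated byte, whole DWORDs
   first and then the trailing bytes. */
void mem_fill_sketch(void *dst, unsigned char value, size_t size)
{
    unsigned char *d = dst;
    uint32_t pattern = value * 0x01010101u;  /* value replicated into all 4 bytes */
    ptrdiff_t i;
    size_t n;

    n = size & ~(size_t)3;                   /* bytes covered by whole DWORDs */
    d += n;
    for (i = -(ptrdiff_t)n; i != 0; i += 4)
        memcpy(d + i, &pattern, sizeof pattern);  /* alignment-safe 4-byte store */

    n = size & 3;                            /* trailing bytes */
    d += n;
    for (i = -(ptrdiff_t)n; i != 0; i++)
        d[i] = value;
}
```

Only the loop body changes between the variants listed above: the load disappears for a fill, or an and/or/add is applied between the load and the store.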

shaka has posted a copymem algo (badly named, because it's a memmove/RtlMoveMemory-like algo: it takes care of a possible overlap between source and destination).
So if you want to run a speed test you need something that does exactly the same job. That's why I'm posting my MemMove variant (in fact, I've never tested it, so don't blame me if it doesn't work correctly). It's just a bit more complicated than the code I posted previously.


ALIGN 16
MemMove PROC _dest_:DWORD,_src_:DWORD,_size_:DWORD
    push ecx
    push edx
    push esi
    push edi

    mov eax,_size_
    mov esi,_src_
    mov edi,_dest_
    cmp esi,edi
    jb Label07              ; src below dest: check for overlap first
Label00:                    ; forward copy, low addresses first
    mov ecx,eax
    and ecx,11111111111111111111111111110000b   ; bytes in whole 16-byte blocks
    jz Label02
    add esi,ecx
    add edi,ecx
    neg ecx
Label01:
    mov edx,DWORD PTR[esi+ecx]
    mov DWORD PTR[edi+ecx],edx
    mov edx,DWORD PTR[esi+ecx+4]
    mov DWORD PTR[edi+ecx+4],edx
    mov edx,DWORD PTR[esi+ecx+8]
    mov DWORD PTR[edi+ecx+8],edx
    mov edx,DWORD PTR[esi+ecx+12]
    mov DWORD PTR[edi+ecx+12],edx
    add ecx,16
    jnz Label01
Label02:
    mov ecx,eax
    and ecx,00000000000000000000000000001100b   ; bytes in the remaining DWORDs
    jz Label04
    add esi,ecx
    add edi,ecx
    neg ecx
Label03:
    mov edx,DWORD PTR [esi+ecx]
    mov DWORD PTR [edi+ecx],edx
    add ecx,4
    jnz Label03
Label04:
    mov ecx,eax
    and ecx,00000000000000000000000000000011b   ; trailing bytes
    jz Label06
    add esi,ecx
    add edi,ecx
    neg ecx
Label05:
    mov dl,BYTE PTR [esi+ecx]
    mov BYTE PTR [edi+ecx],dl
    inc ecx
    jnz Label05
Label06:
    pop edi
    pop esi
    pop edx
    pop ecx
    ret

Label07:
    mov ecx,edi
    sub ecx,esi             ; ecx = dest - src
    cmp eax,ecx
    jbe Label00             ; size <= distance: no overlap, copy forwards
    add esi,eax             ; otherwise start at the end and copy backwards
    add edi,eax
    mov ecx,eax
    and ecx,00000000000000000000000000000011b   ; trailing bytes first
    jz Label09
    sub esi,ecx
    sub edi,ecx
Label08:
    dec ecx
    mov dl,BYTE PTR [esi+ecx]
    mov BYTE PTR [edi+ecx],dl
    jnz Label08             ; flags still come from dec (mov leaves them alone)
Label09:
    mov ecx,eax
    and ecx,00000000000000000000000000001100b   ; then the leftover DWORDs
    jz Label11
    sub esi,ecx
    sub edi,ecx
Label10:
    add ecx,-4
    mov edx,DWORD PTR [esi+ecx]
    mov DWORD PTR [edi+ecx],edx
    jnz Label10
Label11:
    mov ecx,eax
    and ecx,11111111111111111111111111110000b   ; then the 16-byte blocks
    jz Label13
    sub esi,ecx
    sub edi,ecx
Label12:
    add ecx,-16
    mov edx,DWORD PTR[esi+ecx+12]
    mov DWORD PTR[edi+ecx+12],edx
    mov edx,DWORD PTR[esi+ecx+8]
    mov DWORD PTR[edi+ecx+8],edx
    mov edx,DWORD PTR[esi+ecx+4]
    mov DWORD PTR[edi+ecx+4],edx
    mov edx,DWORD PTR[esi+ecx]
    mov DWORD PTR[edi+ecx],edx
    jnz Label12
Label13:
    pop edi
    pop esi
    pop edx
    pop ecx
    ret
MemMove ENDP
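
The direction test at the top of MemMove can be sketched in C as follows. This is a byte-at-a-time illustration of the overlap logic only, with a hypothetical name, and it assumes both pointers refer to the same flat address space so that comparing them as integers is meaningful.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical memmove-style copy: pick the copy direction so that an
   overlapping source is never overwritten before it has been read. */
void mem_move_sketch(void *dst, const void *src, size_t size)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    uintptr_t da = (uintptr_t)d;
    uintptr_t sa = (uintptr_t)s;

    if (sa >= da || size <= da - sa) {
        /* src is at or above dst, or the buffers don't overlap:
           a forward copy is safe */
        for (size_t i = 0; i < size; i++)
            d[i] = s[i];
    } else {
        /* dst starts inside the source range: copy from the end down */
        for (size_t i = size; i-- > 0; )
            d[i] = s[i];
    }
}
```

Moving 8 bytes of "0123456789" two positions up turns the buffer into "0101234567"; a forward copy in that situation would smear the first two bytes across the whole destination.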


Mark, if you want to remove the useless PUSHes and POPs in the algo you changed, you can add:
sub esi,eax
sub edi,eax
at the end of the code and remove USES ESI,EDI. It's just a bit faster.

Mincho Georgiev

It is not badly named. I just cut out the preprocessor directives that make the difference, because they're the only difference between CopyMemory and memmove!

Mark Jones

Thanks everyone.

Execution cycles, 64-byte memory copy, data read-aligned to 32/16/8/4/3/2/1:
Quote from: Athlon XP 2500+ / XP SP2
CopyMem1:   101, 101, 101, 101, 101, 101, 101 (esi,edi) movsd
CopyMem2:   88, 88, 88, 88, 98, 97, 98 (esi,edi) rep movs
CopyMem3:   88, 88, 88, 88, 98, 97, 98 (esi,edi) rep movs
CopyMemory: 53, 53, 53, 53, 63, 62, 63 (From VS7)
MemMoveNW1: 36, 36, 36, 36, 43, 41, 43 (esi,edi) mov dword
CopyMemNW1: 35, 35, 35, 35, 39, 37, 39 (esi,edi) mov dword

Mincho Georgiev

Thanks to you too, Mark, for that useful performance info!  :thumbu