Universal string copy routine

Started by Ficko, June 04, 2009, 02:37:54 PM

Ficko

Thanks to JJ2007's excellent job of timing and exploring SIMD instructions, and to other tricks found on the forum,
I managed to come up with a universal string copy routine. :clap:

I am not claiming it is the fastest possible routine, but it does what a multipurpose string copy should do, and it is best suited to high-level language compilers.

It is "under-read" protected (it never reads before the start of the source string).
It works with unaligned strings.
It returns the destination address in EAX (mostly needed).
It returns a pointer to the zero terminator in ECX (needed for append operations).
It requires 16 extra reserved bytes for strings created by "New"-type allocations, to avoid a memory access violation exception.

Enjoy it!
:bg

.686p
.model flat
.xmm
.code
;Ustrcpy(Dest, Source),EAX = Destination; ECX => EndingZero
; ===========================================================================
Ustrcpy proc near public
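         push esi
         push edi
         mov esi, [esp+16]                       ;ESI = Source
         mov edi, [esp+12]                       ;EDI = Destination
         pxor xmm0, xmm0                         ;XMM0 = 16 zero bytes
         movups xmm2, [esi]                      ;unaligned read of the first 16 source bytes
         movaps xmm1, xmm2                       ;keep a copy for the store below
         pcmpeqb xmm2, xmm0                      ;0FFh in every byte position that holds a zero
         pmovmskb ecx, xmm2                      ;ECX = one mask bit per byte
         bsf ecx, ecx                            ;ECX = index of the first zero byte; ZF set if none
         jne Exit                                ;terminator is inside the first 16 bytes
         mov edx, esi
         and esi, -16                            ;round the source down to a 16-byte boundary
         sub edx, esi                            ;EDX = misalignment of the source (0..15)
         movups [edi], xmm1                      ;store the first 16 bytes
         sub edi, edx                            ;shift the destination by the same amount
         jmp @F
C0:      movlps qword ptr [edi], xmm1            ;movups [edi], xmm1 is shorter but slower
         movhps qword ptr [edi+8], xmm1
@@:      lea esi, [esi+16]
         movaps xmm1, [esi]                      ;aligned read of the next 16 bytes
         lea edi, [edi+16]
         pcmpeqb xmm0, xmm1                      ;XMM0 stays all zero while no terminator is found
         pmovmskb ecx, xmm0
         test ecx, ecx
         jz C0                                   ;no zero byte: store the block and continue
         bsf ecx, ecx                            ;ECX = index of the terminator in this block
Exit:    inc ecx                                 ;include the zero terminator
         mov eax, ecx
         shr ecx, 2
         rep movsd                               ;copy the remaining dwords
         xchg ecx, eax
         and ecx, 3
         rep movsb                               ;copy the remaining bytes
         dec edi
         mov ecx, edi                            ;ECX -> zero terminator in the destination
         mov eax, [esp+12]                       ;EAX = Destination
         pop edi
         pop esi
         retn 8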
Ustrcpy endp
end
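
A minimal calling sketch, assuming the fragment is assembled in the same module as Ustrcpy and that the arguments are pushed by hand (Source first, then Destination). The names Hello, World and Buffer are made up for illustration. It shows the ECX return value being reused as the destination of a second call, which appends without a separate length scan.

```asm
.data
Hello   db "Hello, ", 0
World   db "world", 0
Buffer  db 64 dup (0)           ; hypothetical destination, large enough for both strings

.code
        push offset Hello       ; Source
        push offset Buffer      ; Destination
        call Ustrcpy            ; EAX = offset Buffer, ECX -> zero terminator in Buffer

        push offset World       ; Source
        push ecx                ; Destination = old terminator, so this call appends
        call Ustrcpy            ; EAX = address where "world" starts, ECX -> new terminator
```

Both source strings here are followed by other data in the same section, so the 16-byte read at the start of each source stays inside mapped memory.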

jj2007

Quote from: Ficko on June 04, 2009, 02:37:54 PM
Thanks to JJ2007's excellent job of timing and exploring SIMD instructions, and to other tricks found on the forum
Thanks, I feel honoured.

> It is "under-read" protected (it never reads before the start of the source string).
I don't understand what you mean. Can you give an example?

> It returns the destination address in EAX (mostly needed).
Hmmm... interesting statement. I just checked my fattest source and found that I used lstrcpy 68 times but needed the destination address only once:
invoke lstrcpy, addr SmlBuf, ecx ; copy body.ext
mov ebx, eax
add ebx, len(ebx) ; end of body.ext in ebx
mov al, [ebx-1]
.if al==34

> It returns a pointer to the zero terminator in ECX (needed for append operations).
And it turns out a pointer to the zero terminator in ECX would have been the better choice:
invoke Ustrcpy, addr SmlBuf, ecx ; copy body.ext
mov al, [ecx-1]
.if al==34
:bg

> It requires 16 extra reserved bytes for strings created by "New"-type allocations, to avoid a memory access violation exception.
You might rethink this condition. It hurts the claim of universality.

We need more members with your productive approach :U

Ficko

OK, JJ2007,
I see you want some explanation. :lol

First, you probably missed my emphasis on
"best fit for high level language compilers".

A compiler connects arbitrary user code with fixed subroutines collected in libraries.
In that case the usual approach would be to push destination, source, destination,
call the routine, then pop the destination and print it. For example, if you have code like:

A$ = B$
PRINT  A$,"is great!"

It could look like this:

Push addr A$
Push addr B$
Push addr A$
Call strcpy
Pop eax
Push addr "is great!"
Push eax
Call Print

With my "Ustrcpy" you don't need to push and pop the destination address. :8)
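
For comparison, the same sequence might look like this with Ustrcpy. It is written in the same informal notation as the listing above, not as assemblable code, and it assumes the same Print routine and argument order:

```asm
Push addr B$
Push addr A$
Call Ustrcpy            ; EAX = addr A$ on return
Push addr "is great!"
Push eax
Call Print
```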

As for "Not read before the start of the source string":

I meant that it is important for a compiler's routine not to start reading the source string before it actually starts,
since the sheer variety of user code can cause unpredictable trouble.

But if you use such SSE routines in your own assembler code, you can achieve alignment by starting to read with "movaps" before the actual string starts,
which gives better performance. In that case you have full control over the code and can avoid trouble.
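
A rough sketch of that aligned-start idea, which is not part of Ustrcpy. It assumes ESI holds the source address, and NextBlock is a hypothetical label. The first read starts at the 16-byte boundary at or below the string, and the mask bits for the bytes in front of the string are shifted out before searching for the terminator. Only the scan of the first block is shown, not the copy.

```asm
        mov      ecx, esi
        and      esi, -16       ; round the source down to a 16-byte boundary
        and      ecx, 15        ; ECX = bytes between that boundary and the real start
        pxor     xmm0, xmm0
        pcmpeqb  xmm0, [esi]    ; aligned read, may include bytes before the string
        pmovmskb eax, xmm0      ; one mask bit per zero byte in the block
        shr      eax, cl        ; discard the bits for bytes before the string
        bsf      eax, eax       ; ZF clear: EAX = offset of the terminator from the real start
        jz       NextBlock      ; hypothetical label: no terminator in this block
```

Because the read never leaves its 16-byte aligned block, it cannot cross into another page, but it does touch bytes that belong to whatever lies in front of the string.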

Quote
> It requires 16 extra reserved bytes for strings created by "New"-type allocations, to avoid a memory access violation exception.
You might rethink this condition. It hurts the claim of universality.

It is common for compilers to reserve some extra bytes for safety.

We are reading 16 bytes at once. Suppose you request space with HeapAlloc for a string that is supposed to be 7 bytes long:
you may get 8 bytes back from the system right at a page boundary, and then you have trouble. :bg
(I don't know the chances of this happening, maybe 1:100000, but it can still be a nasty hidden bug.)
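
To make the page-boundary case concrete, here is a small sketch of a test for it. It is not part of Ustrcpy, it assumes 4 KB pages and the source address in ESI, and SlowCopy is a hypothetical label for a byte-wise fallback:

```asm
        mov eax, esi
        and eax, 0FFFh      ; offset of the source inside its 4 KB page
        cmp eax, 0FF0h      ; at 0FF0h the read covers 0FF0h..0FFFh, still one page
        ja  SlowCopy        ; above that, a 16-byte read would cross into the next page
```

With the 16 reserved bytes in place such a test is unnecessary, which is the trade-off the routine makes.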

By "universal" I meant that you can use it in a compiler because it is safe, or in your own assembler program if you keep the 16-byte safety margin in mind.

I hope that clears up a couple of things for you. :wink