The MASM Forum Archive 2004 to 2012

General Forums => The Laboratory => Topic started by: yvansoftware on March 30, 2010, 07:40:20 PM

Title: Compare two strings
Post by: yvansoftware on March 30, 2010, 07:40:20 PM
Hi,

I'm trying to compare two strings, and I get the error "StrCmp not defined".

Here's my code:

.386
.model flat,stdcall
option casemap:none

include \masm32\include\windows.inc
include \masm32\include\masm32.inc
include \masm32\include\gdi32.inc
include \masm32\include\user32.inc
include \masm32\include\kernel32.inc
include \masm32\include\Comctl32.inc
include \masm32\include\comdlg32.inc
include \masm32\include\shell32.inc
include \masm32\include\oleaut32.inc
include \masm32\macros\macros.asm

includelib \masm32\lib\masm32.lib
includelib \masm32\lib\gdi32.lib
includelib \masm32\lib\user32.lib
includelib \masm32\lib\kernel32.lib
includelib \masm32\lib\Comctl32.lib
includelib \masm32\lib\comdlg32.lib
includelib \masm32\lib\shell32.lib
includelib \masm32\lib\oleaut32.lib

.data
YvanSoftware db "(c) YvanSoftware - ALL RIGHTS RESERVED", 13 ,10 ,0
EnterYourName db "Please enter your name: ", 0
CRLF db 13,10,0
TheHolyMan db "Yvan", 0
Seriously db "Seriously? You're the MAN!", 13,10,0
LoserName db "What a loser name.", 13,10

.data?
buffer db 100 dup(?)
.code
start:
invoke StdOut,addr YvanSoftware
invoke StdOut, addr EnterYourName
invoke StdIn, addr buffer, 100
invoke StdOut, addr CRLF

invoke StrCmp,addr buffer, addr TheHolyMan
je HolyMan
IfNotHolyMan:
invoke StdOut, addr LoserName
jmp EndIfHolyMan
HolyMan:
invoke StdOut, addr Seriously
jmp EndIfHolyMan
EndIfHolyMan:

invoke ExitProcess,0
END start


What do I need to do?
(yeah, I know - it's a bullsh*t app, but it's for learning ASM ;-) )
Title: Re: Compare two strings
Post by: Astro on March 30, 2010, 07:54:55 PM
Hi,

; preserve these registers
push esi
push edi

cld ; count downwards or from the end of the string

mov ecx,40 ; length of the string to compare
mov esi,offset FirstStr
mov edi,offset SecondStr
rep cmpsb
je @F

; not equal processing here

jmp _EXIT

@@:

; equal processing here

_EXIT:
pop edi
pop esi
ret


Best regards,
Robin.
Title: Re: Compare two strings
Post by: qWord on March 30, 2010, 08:09:44 PM
'rep cmpsb' isn't possible :wink
.data
sz1 db "hello",0
sz2 db "hello",0
.code

cld
lea edi,sz1
mov ecx,-1
xor eax,eax
mov edx,edi
repnz scasb
sub edi,edx
mov ecx,edi
mov edi,edx

lea esi,sz2
repz cmpsb
.if ZERO?
; equal
.else
; not equal
.endif
Title: Re: Compare two strings
Post by: Astro on March 30, 2010, 08:15:44 PM
Huh.......???

From opcodes.chm:

QuoteCMPS - Compare String (Byte, Word or Doubleword)


Usage:  CMPS    dest,src
        CMPSB
        CMPSW
        CMPSD   (386+)

Modifies flags: AF CF OF PF SF ZF


CMPSB inc/decrements the index registers by 1

The REP prefixes can be used to process entire data items.

Best regards,
Robin.
Title: Re: Compare two strings
Post by: Astro on March 30, 2010, 08:20:41 PM
This works, if I use repz. Maybe that is what you meant?

include \masm32\include\masm32rt.inc

.486
.model flat,stdcall

.data
    string1 BYTE "Test!",0
    string2 BYTE "Test!",0

.code

start:

    ; preserve these registers
    push esi
    push edi

    cld ; count downwards or from the end of the string

    mov ecx,6 ; length of the string to compare
    lea esi,string1
    lea edi,string2
    repz cmpsb
    je @F

    ; not equal processing here
    print "Not equal!"

    jmp _EXIT

@@:
    ; equal processing here
    print "Equal!"

_EXIT:

    pop edi
    pop esi

    ret

end start


Best regards,
Robin.
Title: Re: Compare two strings
Post by: qWord on March 30, 2010, 08:22:39 PM
REP is not supported by cmpsX! -> see Intel's or AMD's developer guides.
Title: Re: Compare two strings
Post by: Astro on March 30, 2010, 08:26:37 PM
OK - but repz definitely works. Try the code!

I see why rep doesn't work - wrong op code.

REP:

QuoteF3 6C REP INS r/m8, DX Input (E)CX bytes from port DX into ES:[(E)DI]
F3 6D REP INS r/m16,DX Input (E)CX words from port DX into ES:[(E)DI]
F3 6D REP INS r/m32,DX Input (E)CX doublewords from port DX into ES:[(E)DI]
F3 A4 REP MOVS m8,m8 Move (E)CX bytes from DS:[(E)SI] to ES:[(E)DI]
F3 A5 REP MOVS m16,m16 Move (E)CX words from DS:[(E)SI] to ES:[(E)DI]
F3 A5 REP MOVS m32,m32 Move (E)CX doublewords from DS:[(E)SI] to ES:[(E)DI]
F3 6E REP OUTS DX, r/m8 Output (E)CX bytes from DS:[(E)SI] to port DX
F3 6F REP OUTS DX, r/m16 Output (E)CX words from DS:[(E)SI] to port DX
F3 6F REP OUTS DX, r/m32 Output (E)CX doublewords from DS:[(E)SI] to port DX
F3 AC REP LODS AL Load (E)CX bytes from DS:[(E)SI] to AL
F3 AD REP LODS AX Load (E)CX words from DS:[(E)SI] to AX
F3 AD REP LODS EAX Load (E)CX doublewords from DS:[(E)SI] to EAX
F3 AA REP STOS m8 Fill (E)CX bytes at ES:[(E)DI] with AL
F3 AB REP STOS m16 Fill (E)CX words at ES:[(E)DI] with AX
F3 AB REP STOS m32 Fill (E)CX doublewords at ES:[(E)DI] with EAX

REPZ:

QuoteF3 A6 REPE CMPS m8,m8 Find nonmatching bytes in ES:[(E)DI] and DS:[(E)SI]
F3 A7 REPE CMPS m16,m16 Find nonmatching words in ES:[(E)DI] and DS:[(E)SI]
F3 A7 REPE CMPS m32,m32 Find nonmatching doublewords in ES:[(E)DI] and DS:[(E)SI]
F3 AE REPE SCAS m8 Find non-AL byte starting at ES:[(E)DI]
F3 AF REPE SCAS m16 Find non-AX word starting at ES:[(E)DI]
F3 AF REPE SCAS m32 Find non-EAX doubleword starting at ES:[(E)DI]

Best regards,
Robin.
Title: Re: Compare two strings
Post by: qWord on March 30, 2010, 08:31:04 PM
Quote from: Astro on March 30, 2010, 08:26:37 PMI see why rep doesn't work - incompatible op code.
cmpsX with a REP-prefix wouldn't make much sense - think about it  :bg

but repz definitely works. Try the code!
I've never disbelieved this - see my code

qWord
Title: Re: Compare two strings
Post by: Astro on March 30, 2010, 08:36:26 PM
I see!  :bg

I know I need to study the opcodes again.  :red

Best regards,
Robin.
Title: Re: Compare two strings
Post by: clive on March 30, 2010, 09:23:15 PM
Quote from: Astro
I know I need to study the opcodes again.

Well REP and REPZ are the same opcode, so it frankly wouldn't matter at the machine code level. The inline assembler in Microsoft C (tested 12.00) is tolerant of either encoding as is MASM 5.1

-Clive
Title: Re: Compare two strings
Post by: hutch-- on March 30, 2010, 09:33:49 PM
I confess to not having used CMPS? since the DOS days, you can do it faster and from memory with less registers by manually incrementing the count yourself.
Title: Re: Compare two strings
Post by: jj2007 on March 30, 2010, 10:35:31 PM
Quote from: qWord on March 30, 2010, 08:31:04 PM
Quote from: Astro on March 30, 2010, 08:26:37 PMI see why rep doesn't work - incompatible op code.
cmpsX with a REP-prefix wouldn't make much sense - think about it  :bg

Where is the problem? You need to know the length of the strings, but for example, comparing two files with identical size works just fine.

include \masm32\include\masm32rt.inc

.code
TextA db "Masm32?", 0
TextB db "Masm32!", 0

start:
mov esi, offset TextA
mov edi, offset TextB
mov ecx, SIZEOF TextA
repe cmpsb
MsgBox 0, str$(ecx), addr TextA, MB_OK
exit
end start


Quoteinclude \masm32\MasmBasic\MasmBasic.inc   ; library is here (http://www.masm32.com/board/index.php?topic=12460)

.code
start:   xor ebx, ebx
   push Timer
   Let esi=FileRead$("\masm32\include\Windows.inc")
;   1 bytes longer, Warning Duplicate instead of WARNING Duplicate
   Let edi=FileRead$("\masm32\include\Copy_of_Windows.inc")
;   invoke lstrcmp, esi, edi            ; returns 1
;   invoke crt_strcmp, esi, edi         ; returns -1
   mov ecx, Lof("\masm32\include\Copy_of_Windows.inc")
   .if ecx>Lof("\masm32\include\Windows.inc")
      xchg ecx, eax
   .endif
   mov ebx, eax
   sub ebx, ecx
   repe cmpsb
   sub Timer, [esp]
   pop edx
   Inkey Str$("Time=%i ms", eax), Str$(", mismatch at byte EOF-%i", ecx), Str$(", diff in size=%i", ebx)
   Exit
end start

Output: Time=0 ms, mismatch at byte EOF-98, diff in size=1
Title: Re: Compare two strings
Post by: qWord on March 30, 2010, 10:45:49 PM
jj,
rep and repz/e have the same opcode (I've not notice this before), but different termination conditions. Also in masm you can't write 'rep cmpsb'.
Title: Re: Compare two strings
Post by: jj2007 on March 30, 2010, 11:06:06 PM
Quote from: qWord on March 30, 2010, 10:45:49 PM
jj,
rep and repz/e have the same opcode (I've not notice this before), but different termination conditions. Also in masm you can't write 'rep cmpsb'.

See attached exe from my (modified) example. It works.
Title: Re: Compare two strings
Post by: qWord on March 30, 2010, 11:17:52 PM
when saying rep makes no sense with cmpsb, I was talking about the mnemonic 'rep', which terminates if ecx=0 - in context to cmpsX the mnemonic is named 'repz', which terminates if ecx=0 OR ZF=0. I've never said that you can't use it for comparing files or what ever...
Title: Re: Compare two strings
Post by: jj2007 on March 30, 2010, 11:21:51 PM
Ok, misunderstanding :thumbu
Title: Re: Compare two strings
Post by: qWord on March 30, 2010, 11:27:37 PM
Quote from: jj2007 on March 30, 2010, 11:21:51 PM
Ok, misunderstanding :thumbu
yep ..  - maybe caused by my bad english  :bg
Title: Re: Compare two strings
Post by: jj2007 on March 30, 2010, 11:29:55 PM
For Yvan TheHolyMan: These are the correct invokes for the standard Windows and CRT functions:

   invoke lstrcmp, esi, edi
   invoke crt_strcmp, esi, edi
Title: Re: Compare two strings
Post by: qWord on March 30, 2010, 11:44:27 PM
just for addition: an version without string instructions (as hutch suggest):
    lea esi,sz1
    lea edi,sz2
    movzx eax,BYTE ptr [esi]
    movzx ebx,BYTE ptr [edi]
    .while eax == ebx
@@:     or eax,ebx
        jz @F
        inc esi
        inc edi
        movzx eax,BYTE ptr [esi]
        movzx ebx,BYTE ptr [edi]
    .endw
    .if 0
@@:     ; equal
    .else
        ; not equal     
    .endif
Title: Re: Compare two strings
Post by: jj2007 on March 31, 2010, 08:56:02 AM
just for fun:
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
4842    cycles for qWord, 1000 bytes
3746    cycles for movzx, 1000 bytes
6685    cycles for movzx check null, 1000 bytes
4192    cycles for repe cmpsb, 1000 bytes
31225   cycles for lstrcmp, 1000 bytes
3119    cycles for crt_strcmp, 1000 bytes

43      cycles for qWord, 10 bytes
32      cycles for movzx, 10 bytes
141     cycles for movzx check null, 10 bytes
123     cycles for repe cmpsb, 10 bytes
831     cycles for lstrcmp, 10 bytes
40      cycles for crt_strcmp, 10 bytes


As so often, the CRT function is damn fast, also taking into account that they check for nullbytes why "our" algos don't (with the exception of the movzx check null version).
Title: Re: Compare two strings
Post by: lingo on March 31, 2010, 02:25:06 PM

Intel(R) Core(TM)2 Duo CPU     E8500  @ 3.16GHz (SSE4)
2542    cycles for qWord, 1000 bytes
2049    cycles for movzx, 1000 bytes
2032    cycles for LingoCMP, 1000 bytes
4036    cycles for movzx check null, 1000 bytes
4075    cycles for repe cmpsb, 1000 bytes
13431   cycles for lstrcmp, 1000 bytes
2807    cycles for crt_strcmp, 1000 bytes

15      cycles for LingoCMP, 10 bytes
25      cycles for qWord, 10 bytes
18      cycles for movzx, 10 bytes
33      cycles for movzx check null, 10 bytes
93      cycles for repe cmpsb, 10 bytes
836     cycles for lstrcmp, 10 bytes
27      cycles for crt_strcmp, 10 bytes

2524    cycles for qWord, 1000 bytes
2047    cycles for movzx, 1000 bytes
2026    cycles for LingoCMP, 1000 bytes
4039    cycles for movzx check null, 1000 bytes
4089    cycles for repe cmpsb, 1000 bytes
13416   cycles for lstrcmp, 1000 bytes
2789    cycles for crt_strcmp, 1000 bytes

15      cycles for LingoCMP, 10 bytes
25      cycles for qWord, 10 bytes
17      cycles for movzx, 10 bytes
34      cycles for movzx check null, 10 bytes
92      cycles for repe cmpsb, 10 bytes
846     cycles for lstrcmp, 10 bytes
27      cycles for crt_strcmp, 10 bytes

2530    cycles for qWord, 1000 bytes
2038    cycles for movzx, 1000 bytes
2032    cycles for LingoCMP, 1000 bytes
4039    cycles for movzx check null, 1000 bytes
4081    cycles for repe cmpsb, 1000 bytes
13491   cycles for lstrcmp, 1000 bytes
2798    cycles for crt_strcmp, 1000 bytes

15      cycles for LingoCMP, 10 bytes
25      cycles for qWord, 10 bytes
17      cycles for movzx, 10 bytes
33      cycles for movzx check null, 10 bytes
94      cycles for repe cmpsb, 10 bytes
843     cycles for lstrcmp, 10 bytes
28      cycles for crt_strcmp, 10 bytes


--- ok ---

Title: Re: Compare two strings
Post by: hutch-- on March 31, 2010, 02:40:35 PM
My only comment so far is why are these tests being done on such small samples, 10 bytes is "who cares" and 1k is very small, tests should also be run only much longer compares, a meg or larger. All you are testing with such short samples is the attack rate, not the sustain rate.
Title: Re: Compare two strings
Post by: hutch-- on March 31, 2010, 03:34:58 PM
Here is a quick dabble with an unroll of movzx.


Intel(R) Core(TM)2 Quad CPU    Q9650  @ 3.00GHz (SSE4)
2516    cycles for qWord, 1000 bytes
2023    cycles for movzx, 1000 bytes
2022    cycles for LingoCMP, 1000 bytes
2019    cycles for movzx unrolled, 1000 bytes xxxxxxxx
4059    cycles for repe cmpsb, 1000 bytes
13379   cycles for lstrcmp, 1000 bytes
2773    cycles for crt_strcmp, 1000 bytes

15      cycles for LingoCMP, 10 bytes
22      cycles for qWord, 10 bytes
26      cycles for movzx, 10 bytes
15      cycles for movzx unrolled, 10 bytes  zzzzzzzzzzzz
92      cycles for repe cmpsb, 10 bytes
393     cycles for lstrcmp, 10 bytes
27      cycles for crt_strcmp, 10 bytes

2359    cycles for qWord, 1000 bytes
2023    cycles for movzx, 1000 bytes
2019    cycles for LingoCMP, 1000 bytes
2018    cycles for movzx unrolled, 1000 bytes xxxxxxxx
4059    cycles for repe cmpsb, 1000 bytes
13401   cycles for lstrcmp, 1000 bytes
2776    cycles for crt_strcmp, 1000 bytes

15      cycles for LingoCMP, 10 bytes
25      cycles for qWord, 10 bytes
17      cycles for movzx, 10 bytes
15      cycles for movzx unrolled, 10 bytes  zzzzzzzzzzzz
92      cycles for repe cmpsb, 10 bytes
393     cycles for lstrcmp, 10 bytes
27      cycles for crt_strcmp, 10 bytes

2523    cycles for qWord, 1000 bytes
2026    cycles for movzx, 1000 bytes
2024    cycles for LingoCMP, 1000 bytes
2019    cycles for movzx unrolled, 1000 bytes xxxxxxxx
4053    cycles for repe cmpsb, 1000 bytes
13295   cycles for lstrcmp, 1000 bytes
2773    cycles for crt_strcmp, 1000 bytes

15      cycles for LingoCMP, 10 bytes
22      cycles for qWord, 10 bytes
26      cycles for movzx, 10 bytes
15      cycles for movzx unrolled, 10 bytes  zzzzzzzzzzzz
92      cycles for repe cmpsb, 10 bytes
393     cycles for lstrcmp, 10 bytes
27      cycles for crt_strcmp, 10 bytes


--- ok ---
Title: Re: Compare two strings
Post by: lingo on March 31, 2010, 04:57:32 PM
 :wink
Intel(R) Core(TM)2 Duo CPU     E8500  @ 3.16GHz (SSE4)
2514    cycles for qWord, 1000 bytes
2033    cycles for movzx, 1000 bytes
1265    cycles for LingoCMP, 1000 bytes
2022    cycles for LingoCMP with check null, 1000 bytes
2019    cycles for movzx unrolled, 1000 bytes xxxxxxxx
4059    cycles for repe cmpsb, 1000 bytes
13251   cycles for lstrcmp, 1000 bytes
2776    cycles for crt_strcmp, 1000 bytes

8       cycles for LingoCMP, 10 bytes
15      cycles for LingoCMP with check null, 10 bytes
22      cycles for qWord, 10 bytes
26      cycles for movzx, 10 bytes
15      cycles for movzx unrolled, 10 bytes  zzzzzzzzzzzz
93      cycles for repe cmpsb, 10 bytes
825     cycles for lstrcmp, 10 bytes
27      cycles for crt_strcmp, 10 bytes

2358    cycles for qWord, 1000 bytes
2023    cycles for movzx, 1000 bytes
1266    cycles for LingoCMP, 1000 bytes
2021    cycles for LingoCMP with check null, 1000 bytes
2018    cycles for movzx unrolled, 1000 bytes xxxxxxxx
4058    cycles for repe cmpsb, 1000 bytes
13228   cycles for lstrcmp, 1000 bytes
2775    cycles for crt_strcmp, 1000 bytes

8       cycles for LingoCMP, 10 bytes
15      cycles for LingoCMP with check null, 10 bytes
25      cycles for qWord, 10 bytes
17      cycles for movzx, 10 bytes
15      cycles for movzx unrolled, 10 bytes  zzzzzzzzzzzz
92      cycles for repe cmpsb, 10 bytes
850     cycles for lstrcmp, 10 bytes
27      cycles for crt_strcmp, 10 bytes

2519    cycles for qWord, 1000 bytes
2026    cycles for movzx, 1000 bytes
1267    cycles for LingoCMP, 1000 bytes
2022    cycles for LingoCMP with check null, 1000 bytes
2019    cycles for movzx unrolled, 1000 bytes xxxxxxxx
4059    cycles for repe cmpsb, 1000 bytes
13223   cycles for lstrcmp, 1000 bytes
2785    cycles for crt_strcmp, 1000 bytes

8       cycles for LingoCMP, 10 bytes
15      cycles for LingoCMP with check null, 10 bytes
22      cycles for qWord, 10 bytes
26      cycles for movzx, 10 bytes
15      cycles for movzx unrolled, 10 bytes  zzzzzzzzzzzz
92      cycles for repe cmpsb, 10 bytes
843     cycles for lstrcmp, 10 bytes
27      cycles for crt_strcmp, 10 bytes


--- ok ---
Title: Re: Compare two strings
Post by: jj2007 on March 31, 2010, 06:42:39 PM
Lingo, what about your old algo here (http://www.masm32.com/board/index.php?topic=2508.msg20153#msg20153)?
Title: Re: Compare two strings
Post by: jj2007 on March 31, 2010, 06:43:57 PM
Quote from: hutch-- on March 31, 2010, 02:40:35 PM
why are these tests being done on such small samples, 10 bytes is "who cares"

Pretty relevant for a string sort, though, don't you think?
Title: Re: Compare two strings
Post by: clive on March 31, 2010, 07:55:09 PM
Quote from: jj2007
Pretty relevant for a string sort, though, don't you think?

Well if you're doing sorts then perhaps we should move beyond a simple inequity test, and consider whether the string is actually greater,equal,smaller which is what strcmp() does.

-Clive
Title: Re: Compare two strings
Post by: clive on March 31, 2010, 08:13:44 PM
Quote from: jj2007
As so often, the CRT function is damn fast, also taking into account that they check for nullbytes why "our" algos don't (with the exception of the movzx check null version).

I'll note that the CRT algorithm has code to get string1 to be DWORD aligned during the bulk of the comparison. The key branch targets are also DWORD aligned, with the primary loop fitting within a single cache line.

-Clive
Title: Re: Compare two strings
Post by: jj2007 on March 31, 2010, 11:08:13 PM
Quote from: clive on March 31, 2010, 07:55:09 PM
Well if you're doing sorts then perhaps we should move beyond a simple inequity test, and consider whether the string is actually greater,equal,smaller which is what strcmp() does.
Good idea. Here is a new version, the StrCmpSSE proc returns in edx the position of the first mismatch, and in eax the difference of the mismatching bytes. It still needs some polishing (still no check for end of string as yet), but the timings look promising.

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
String comparison: short string 10 bytes, long string 4000
1070    cycles for SSE, long string
13702   cycles for qWord, long string
3266    cycles for Lingo, long string
12030   cycles for movzx, long string
24007   cycles for movzx check null, long string
16089   cycles for repe cmpsb, long string
72240   cycles for lstrcmp, long string
11139   cycles for crt_strcmp, long string

15      cycles for SSE, 10 bytes
43      cycles for qWord, 10 bytes
27      cycles for movzx, 10 bytes
51      cycles for movzx check null, 10 bytes
88      cycles for repe cmpsb, 10 bytes
456     cycles for lstrcmp, 10 bytes
47      cycles for crt_strcmp, 10 bytes

Codesizes:
Lingo:  365
SSE:    164
Title: Re: Compare two strings
Post by: jj2007 on April 01, 2010, 08:55:05 AM
Here is the version with check for end of string. The timings suffer a bit, of course, but it is still a bit faster than Lingo's old algo (http://www.masm32.com/board/index.php?topic=2508.msg20153#msg20153), and a factor 5 faster than the CRT version.

Note that lstrcmp is slow because it performs additional tasks (MSDN, remarks section (http://msdn.microsoft.com/en-us/library/ms647488%28VS.85%29.aspx)):
QuoteThe lstrcmp function uses a word sort, rather than a string sort. A word sort treats hyphens and apostrophes differently than it treats other symbols that are not alphanumeric, in order to ensure that words such as "coop" and "co-op" stay together within a sorted list.
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
String comparison: short string 10 bytes, long string 5050
3061    cycles for SSE with null check, long string
4977    cycles for Lingo, long string
15422   cycles for crt_strcmp, long string
34880   cycles for movzx check null, long string
20329   cycles for repe cmpsb, long string
174637  cycles for lstrcmp, long string

37      cycles for SSE with null check, 10 bytes
23      cycles for Lingo, 10 bytes
41      cycles for crt_strcmp, 10 bytes
98      cycles for movzx check null, 10 bytes
123     cycles for repe cmpsb, 10 bytes
679     cycles for lstrcmp, 10 bytes

Codesizes:
Lingo:  365
SSE:    190
Title: Re: Compare two strings
Post by: dedndave on April 01, 2010, 11:08:43 AM
Quote"coop" and "co-op" stay together within a sorted list

one is a place for chickens - the other is a group of guys that raise them   :bg
Title: Re: Compare two strings
Post by: jj2007 on April 01, 2010, 11:53:34 AM
Hi Dave,
You tested the old version. Here is a new one, including a case-insensitive algo.

Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
String comparison: short string 10 bytes, long string 5050
3533    cycles for SSE with null check, long string
4507    cycles for SSE with null check, long string, case-insensitive
6926    cycles for Lingo, long string
14194   cycles for crt_strcmp, long string
51312   cycles for crt__stricmp, long string, case-insensitive
33241   cycles for movzx check null, long string
20460   cycles for repe cmpsb, long string
154405  cycles for lstrcmp, long string

40      cycles for SSE with null check, 10 bytes
50      cycles for SSE with null check, 10 bytes, case-insensitive
23      cycles for Lingo, 10 bytes
41      cycles for crt_strcmp, 10 bytes
242     cycles for crt__stricmp, 10 bytes, case-insensitive


Edit: Correct version is StrCompSSE_ci3.zip - removed, it did not always return correct results. MasmBasic (http://www.masm32.com/board/index.php?topic=12460.0) StringsDiffer and FilesDiffer are more stable.
Title: Re: Compare two strings
Post by: dedndave on April 01, 2010, 12:35:11 PM
Intel Pentium 4 Prescott CPU 3.00GHz (SSE3)
String comparison: short string 10 bytes, long string 5050
3564    cycles for SSE with null check, long string
4509    cycles for SSE with null check, long string, case-insensitive
72008   cycles for Lingo, long string
14371   cycles for crt_strcmp, long string
50830   cycles for crt__stricmp, long string, case-insensitive
32967   cycles for movzx check null, long string
-44643  cycles for repe cmpsb, long string
808120  cycles for lstrcmp, long string

40      cycles for SSE with null check, 10 bytes
52      cycles for SSE with null check, 10 bytes, case-insensitive
23      cycles for Lingo, 10 bytes
41      cycles for crt_strcmp, 10 bytes
244     cycles for crt__stricmp, 10 bytes, case-insensitive
100     cycles for movzx check null, 10 bytes
123     cycles for repe cmpsb, 10 bytes
740     cycles for lstrcmp, 10 bytes

3542    cycles for SSE with null check, long string
4568    cycles for SSE with null check, long string, case-insensitive
6953    cycles for Lingo, long string
14017   cycles for crt_strcmp, long string
115829  cycles for crt__stricmp, long string, case-insensitive
98018   cycles for movzx check null, long string
85415   cycles for repe cmpsb, long string
809285  cycles for lstrcmp, long string

43      cycles for SSE with null check, 10 bytes
55      cycles for SSE with null check, 10 bytes, case-insensitive
26      cycles for Lingo, 10 bytes
47      cycles for crt_strcmp, 10 bytes
246     cycles for crt__stricmp, 10 bytes, case-insensitive
101     cycles for movzx check null, 10 bytes
125     cycles for repe cmpsb, 10 bytes
745     cycles for lstrcmp, 10 bytes
Title: Re: Compare two strings
Post by: dedndave on April 01, 2010, 12:49:50 PM
i get crazy results, Jochen
ok - closed all other applications and ran it 10 times - something is amiss...
Title: Re: Compare two strings
Post by: dedndave on April 01, 2010, 01:22:29 PM
i am not sure i understand the Dummy1 var
doesn't that misalign the data strings ?
Title: Re: Compare two strings
Post by: jj2007 on April 01, 2010, 01:28:07 PM
Yep, the dummies are for misaligning the strings.

Your results are really crazy. You are not by accident running a P4?  :wink
Try changing the invoke Sleep, 100 to invoke Sleep, 200 - on my P4 it helps. And take the updated version with 285 bytes size - you were the only downloader yet, so I exchanged it silently.

[Off topic: Nice comics here (http://xkcd.com/). Say bye to your mouse :green]
Title: Re: Compare two strings
Post by: dedndave on April 01, 2010, 01:38:17 PM
i have played with that Sleep value a little bit after i saw MichaelW use it
it doesn't seem to make much difference until you get up to at least Sleep,500
Michael was using Sleep,1500 if i recall
let me try the updated virgin...

oh - running a P4 .... on purpose   :P
Title: Re: Compare two strings
Post by: dedndave on April 01, 2010, 01:42:54 PM
here is my result for the new one - lol
30 diff at pos 4999 (zero-based)
looks like you fixed it  :bg  at least i get the same result every time

ohhhh - i get it - you got me   :P
i was gonna take up a collection to buy Jochen a cuppa cappuccino
Title: Re: Compare two strings
Post by: jj2007 on April 01, 2010, 02:41:08 PM
Oops, it seems I posted the one with the test for correctness activated :red
New one attached above, see StrCompSSE_ci3.zip
Title: Re: Compare two strings
Post by: lingo on April 01, 2010, 02:51:14 PM
"i was gonna take up a collection to buy Jochen a cuppa cappuccino"

Will be better to advise him to not adjust his Chlorpromazine dose (from 50 mg three times daily) unless his doctor specifically instructs him to do so... :wink 
 
Title: Re: Compare two strings
Post by: dedndave on April 01, 2010, 02:53:09 PM
is that like Librium ?

i still get crazy data, JJ
Title: Re: Compare two strings
Post by: jj2007 on April 01, 2010, 03:50:05 PM
Quote from: dedndave on April 01, 2010, 02:53:09 PM
i still get crazy data, JJ

Typical behaviour for a P4 - many outliers. The lowest values are supposed to be the correct ones; for me they are 3600, 4500, 6900 cycles for the first three. Later I will test it on my Celeron.
Title: Re: Compare two strings
Post by: dedndave on April 01, 2010, 04:17:15 PM
i dunno - i have never seen numbers like "-69000" before
Title: Re: Compare two strings
Post by: jj2007 on April 01, 2010, 05:14:24 PM
Celeron is more stable, as usual:

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
String comparison: short string 10 bytes, long string 5050
2902    cycles for SSE with null check, long string
3672    cycles for SSE with null check, long string, case-insensitive
5834    cycles for Lingo, long string
13933   cycles for crt_strcmp, long string
16508   cycles for crt__stricmp, long string, case-insensitive
30042   cycles for movzx check null, long string
20108   cycles for repe cmpsb, long string
89881   cycles for lstrcmp, long string

18      cycles for SSE with null check, 10 bytes
25      cycles for SSE with null check, 10 bytes, case-insensitive
16      cycles for Lingo, 10 bytes
32      cycles for crt_strcmp, 10 bytes
108     cycles for crt__stricmp, 10 bytes, case-insensitive
51      cycles for movzx check null, 10 bytes
88      cycles for repe cmpsb, 10 bytes
453     cycles for lstrcmp, 10 bytes
Title: Re: Compare two strings
Post by: frktons on June 16, 2010, 10:34:11 PM

Intel(R) Core(TM)2 CPU          6600  @ 2.40GHz (SSE4)
1405    cycles for LingoCMP, 1000 bytes
519     cycles for frktonsCMP , 1000 bytes

--- ok ---



I have the strange idea that this method could be quite fast:


;--------------------------------------------------------------------------------------------------------------
; compare_4bytes
; -------------------------------------------------------------------------------------------------------------
; comparing 2 4 bytes strings the fast way
;--------------------------------------------------------------------------------------------------------------
;
; Method proposed by frktons
;--------------------------------------------------------------------------------------------------------------
; 16 june 2010 - masmforum
;--------------------------------------------------------------------------------------------------------------


include \masm32\include\masm32rt.inc

.686


.data

Src1 db "This", 0
Src2 db "This", 0


.code
start:

mov ecx, offset Src1
mov eax, offset Src2
@@:
        mov edx, [ecx] ; Comparing two strings in edx and esi
        mov  esi,  [eax]
        xor    edx,  esi
        jnz    end_check
print   "The strings are identical",13,10

        jmp end_of_game
   


     
end_check:     

        print "The strings are different",13,10

end_of_game:


       inkey chr$(13, 10, "--- ok ---", 13)
exit


end start


If you put this method inside a loop, it could be
nice to see how it compares with the other methods.
To be optimal, the cycle has to be 125 times, and the code should
be modified like:

...............

mov ecx, offset Src1
mov eax, offset Src2
        xor ebx, ebx         
        mov ebx, 125
@@:
mov edx, [ecx] ; Comparing two strings in edx and esi
        mov          esi, [eax]
        xor            edx, esi
        jnz             @f
add ecx, 4
        add          eax,4
        mov         edx, [ecx]
        mov         esi, [eax]
        xor           edx, esi
        jnz            @f
        add           ecx, 4
        add           eax, 4
        dec            ebx
        jnz            @b 


@@:
.........


If anyone is kind enough to try it and let me know, I'm quite curious.  :P

Being an Assembly beginner, it could be that I've messed up things without
realizing it.  :lol

Frank

Title: Re: Compare two strings
Post by: frktons on June 17, 2010, 05:09:07 AM
In case you wish to see how I got the results in the previous post,
here it is the new_strcmp comparing Lingo's code with mine.

Let me know if I screwed up anything.  :U

Enjoy

Frank
Title: Re: Compare two strings
Post by: jj2007 on June 17, 2010, 08:39:34 AM
Frank,
Your algo is pretty fast but if it was to return the position and/or the value of the first mismatch, it would require some extra code. There is also no check for nullbytes; which is often not a problem, but what if you run into two short identical strings followed by some megabytes allocated with VirtualAlloc? It will crash right away...

P4 results:
4735    cycles for SSE with null check, long string (MasmBasic)
7014    cycles for Lingo, long string
6084    cycles for Frank, long string
14697   cycles for crt_strcmp, long string
158055  cycles for lstrcmp, long string


Here is your slightly modified routine, full testbed attached:
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE

Frank proc s1, s2 ; granularity is 4 bytes; some extra code required to get bytes, not dwords
mov ecx, [esp+4]
mov eax, [esp+8]
; xor ebx, ebx
; mov ebx, 125
@@:
mov edx, [ecx] ; Comparing two strings in edx and esi
mov esi, [eax]
xor edx, esi
jnz @f
add ecx, 4
add eax, 4
mov edx, [ecx] ; unrolled 1
mov esi, [eax]
xor edx, esi
jnz @f
add ecx, 4
add eax, 4
jmp @B
; dec ebx
; jnz @b
@@: sub eax, [esp+8] ; return the relative position in eax
ret 8
Frank endp
Title: Re: Compare two strings
Post by: frktons on June 17, 2010, 08:54:44 AM
Quote from: jj2007 on June 17, 2010, 08:39:34 AM
Frank,
Your algo is pretty fast but if it was to return the position and/or the value of the first mismatch, it would require some extra code. There is also no check for nullbytes; which is often not a problem, but what if you run into two short identical strings followed by some megabytes allocated VirtualAlloc? It will crash right away...

Of course Jochen, some extra code is required if the count of bytes is not a multiple of 4 as well.
It could be managed quite easily I think.

This is just an example of how we can do a faster search-compare using 4 bytes at a time,
instead of 1 byte at a time.

Thanks anyway, I'll figure it out what you changed and see your point.  :U

Title: Re: Compare two strings
Post by: frktons on June 17, 2010, 09:10:07 AM
Well it'll take quite a while to grasp your methods, so I'd like to ask just
a simple question:

Did you get the best result using xmm registers? I mean did you managed 8
bytes at a time thanks to xmm registers?

P.S.
On my pc I get these results from your routine:

Intel(R) Core(TM)2 CPU          6600  @ 2.40GHz (SSE4)
String comparison: short string 10 bytes, long string 5050
3597    cycles for SSE with null check, long string
4344    cycles for SSE with null check, long string, case-insensitive
6163    cycles for Lingo, long string
5223    cycles for Frank, long string
14665   cycles for crt_strcmp, long string
10229   cycles for crt__stricmp, long string, case-insensitive
20480   cycles for movzx check null, long string
26398   cycles for repe cmpsb, long string
57011   cycles for lstrcmp, long string

12      cycles for SSE with null check, 10 bytes
18      cycles for SSE with null check, 10 bytes, case-insensitive
11      cycles for Lingo, 10 bytes
49      cycles for Frank, 10 bytes
27      cycles for crt_strcmp, 10 bytes
47      cycles for crt__stricmp, 10 bytes, case-insensitive
33      cycles for movzx check null, 10 bytes
100     cycles for repe cmpsb, 10 bytes
800     cycles for lstrcmp, 10 bytes

3866    cycles for SSE with null check, long string
4362    cycles for SSE with null check, long string, case-insensitive
5865    cycles for Lingo, long string
5166    cycles for Frank, long string
14179   cycles for crt_strcmp, long string
10710   cycles for crt__stricmp, long string, case-insensitive
20041   cycles for movzx check null, long string
25723   cycles for repe cmpsb, long string
58023   cycles for lstrcmp, long string

12      cycles for SSE with null check, 10 bytes
20      cycles for SSE with null check, 10 bytes, case-insensitive
11      cycles for Lingo, 10 bytes
6       cycles for Frank, 10 bytes
27      cycles for crt_strcmp, 10 bytes
45      cycles for crt__stricmp, 10 bytes, case-insensitive
33      cycles for movzx check null, 10 bytes
100     cycles for repe cmpsb, 10 bytes
850     cycles for lstrcmp, 10 bytes

Codesizes:
Lingo:  365
SSE:    277     (MasmBasic)
--- ok ---


How strange that I have :

6       cycles for Frank, 10 bytes


and


49      cycles for Frank, 10 bytes


in two different tests. What they mean? Why such a difference?

Title: Re: Compare two strings
Post by: jj2007 on June 17, 2010, 10:36:46 AM
Quote from: frktons on June 17, 2010, 09:10:07 AM
Well it'll take quite a while to grasp your methods, so I'd like to ask just
a simple question:

Did you get the best result using xmm registers? I mean did you managed 8
bytes at a time thanks to xmm registers?
Yes, the MasmBasic StringsDiffer(str1, str2) algo is SSE2, i.e. it uses xmm registers

Quote
in two different tests. What they mean? Why such a difference?
Try the test with LOOP_COUNT = 100000 instead of 10000. Such differences are quite frequent, especially with the Pentium 4. To get a better picture, run the test several times. The lowest value should be the most valid value. For very short strings, your algo is definitely very fast.
Title: Re: Compare two strings
Post by: frktons on June 17, 2010, 12:32:23 PM
OK  :U This is the general idea I was talking about.
In new processors it is usually faster and better manipulating
groups of bytes than single ones.
I wandered why in this topic the examples dealed with single
bytes. You can always test for remaining bytes in many ways, but
the overall process should be done with multiple bytes processing.
This is what I consider better design  :lol

By the way I'm glad I contributed a bit (I mean a multibyte  :P) to this
discussion.

Enjoy

Frank
Title: Re: Compare two strings
Post by: lingo on June 17, 2010, 01:27:40 PM
"I have the strange idea that this method could be quite fast:"

Why do you think xor is faster than cmp instruction? :wink

"In new processors it is usually faster and better manipulating
groups of bytes than single ones"


It is true for 'old' processors too... :wink

"You can always test for remaining bytes in many way"
Just try to do it in your testbed... :wink
Title: Re: Compare two strings
Post by: hutch-- on June 17, 2010, 01:41:36 PM
Frank,

Just keep this in mind, to get the full advantage of multibyte compares, the data needs to be aligned by at least 4 otherwise the 4 byte reads take more than one access by the processor. If you bother to use this method you need to align one of the two strings being compared as the other may not be possible due to minor differences in alignment.
Title: Re: Compare two strings
Post by: frktons on June 17, 2010, 03:02:43 PM
Quote from: lingo on June 17, 2010, 01:27:40 PM
"I have the strange idea that this method could be quite fast:"

Why do you think xor is faster than cmp instruction? :wink

"In new processors it is usually faster and better manipulating
groups of bytes than single ones"


It is true for 'old' processors too... :wink

"You can always test for remaining bytes in many way"
Just try to do it in your testbed... :wink


:lol :lol :lol So why didn't you use that in your code?  ::)
Of course I'll try to do it, the Algo is already in my mind
and it shouldn't be difficult to code  :P even for a n00b like me.  :P

I mean if I have:

x = bytes of the string
y = result of x divided by 4 or 8 [it depends what I need]
r = remainder of the previous division

after I do the loop from y to 0
I could subtract the r from 4 or 8[...]
subtract the result from the two registers holding the addresses of the two strings
do another round of 4 or eight bytes
almost done  :lol I should check if everything is fine, but this is the overall idea.



Quote from: hutch-- on June 17, 2010, 01:41:36 PM
Frank,

Just keep this in mind, to get the full advantage of multibyte compares, the data needs to be aligned by at least 4 otherwise the 4 byte reads take more than one access by the processor. If you bother to use this method you need to align one of the two strings being compared as the other may not be possible due to minor differences in alignment.

ALIGN, ALIGN, ALIGN ...  :lol

I was going to open a new discussion on this matter.
so:

1- What is ALIGNment?
2 - Why should I ALIGN stuffs and What stuff should I ALIGN?
3 - When should I ALIGN and at what degree [4 - 8 - 16....]?
4 - Please show me how can I use the ALIGN stuff in the code I posted

Please feel free to ALIGN your answers after this post  :lol
Title: Re: Compare two strings
Post by: jj2007 on June 17, 2010, 03:14:40 PM
The SSE2/MasmBasic version is "multibyte", of course.
Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
String comparison: short string 10 bytes, long string 5050
2921    cycles for SSE with null check, long string
3687    cycles for SSE with null check, long string, case-insensitive
5826    cycles for Lingo, long string
4915    cycles for Frank, long string
13945   cycles for crt_strcmp, long string
16489   cycles for crt__stricmp, long string, case-insensitive
30043   cycles for movzx check null, long string
20105   cycles for repe cmpsb, long string
90040   cycles for lstrcmp, long string

18      cycles for SSE with null check, 10 bytes
25      cycles for SSE with null check, 10 bytes, case-insensitive
16      cycles for Lingo, 10 bytes
9       cycles for Frank, 10 bytes
32      cycles for crt_strcmp, 10 bytes
108     cycles for crt__stricmp, 10 bytes, case-insensitive
51      cycles for movzx check null, 10 bytes
88      cycles for repe cmpsb, 10 bytes
451     cycles for lstrcmp, 10 bytes


Same but aligned:
Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
String comparison: short string 10 bytes, long string 5050
2133    cycles for SSE with null check, long string
3103    cycles for SSE with null check, long string, case-insensitive
4058    cycles for Lingo, long string
3170    cycles for Frank, long string
13859   cycles for crt_strcmp, long string
16480   cycles for crt__stricmp, long string, case-insensitive
30060   cycles for movzx check null, long string
20101   cycles for repe cmpsb, long string
89970   cycles for lstrcmp, long string
Title: Re: Compare two strings
Post by: frktons on June 17, 2010, 03:26:59 PM
Hi Jochen, ALIGN your explanation please, what does your algos do?
Could you post the source as well? Thanks
If you feel like, please use a private msg in your native language.

Frank
Title: Re: Compare two strings
Post by: jj2007 on June 17, 2010, 04:02:34 PM
Quote from: frktons on June 17, 2010, 03:26:59 PM
Could you post the source as well? Thanks

The source was always included. Just open the attachment...
Title: Re: Compare two strings
Post by: clive on June 17, 2010, 04:33:25 PM
Quote from: frktons

x = bytes of the string
y = result of x divided by 4 or 8 [it depends what I need]
r = remainder of the previous division


Problem is, with strcmp() is that you usually don't know what the length of the string is, unless you actually check it first (at additional cost) or haul the metadata around with the string. Further the real strcmp() actually provides an equal, greater than, less than answer.

Perhaps memcmp() would be a better model if you know the length, and don't want to check for a terminating NUL.
Title: Re: Compare two strings
Post by: cman on June 17, 2010, 04:48:43 PM
What software are you using to the cycle analysis for each algorithm?

Title: Re: Compare two strings
Post by: clive on June 17, 2010, 04:57:21 PM
Quote from: cman on June 17, 2010, 04:48:43 PM
What software are you using to the cycle analysis for each algorithm?

MichaelW has a timing library using RDTSC, look at some of the attachments.
Title: Re: Compare two strings
Post by: jj2007 on June 17, 2010, 05:24:25 PM
Quote from: clive on June 17, 2010, 04:33:25 PM
Problem is, with strcmp() is that you usually don't know what the length of the string is, unless you actually check it first (at additional cost) or haul the metadata around with the string. Further the real strcmp() actually provides an equal, greater than, less than answer.

The two SSE2 variants, which have since been included in MasmBasic, do the null check; they return in eax the equal, greater than, less than info, and in edx the position where the first mismatch occurred.

Re algos used for timings:
Quoteinclude \masm32\macros\timers.asm         ; get them from the Masm32 Laboratory (http://www.masm32.com/board/index.php?topic=770.0)
Title: Re: Compare two strings
Post by: clive on June 17, 2010, 05:37:00 PM
Quote from: jj2007
The two SSE2 variants, which have since been included in MasmBasic, do the null check; they return in eax the equal, greater than, less than info, and in edx the position where the first mismatch occurred.

Indeed, but the routines that Frank wrote do not, and in the context of the quote they further assume the length of the string, and that the strings are of equal length. In the general solution you can't make such assumptions, or you would be using memcmp().
Title: Re: Compare two strings
Post by: jj2007 on June 17, 2010, 06:02:47 PM
Quote from: clive on June 17, 2010, 05:37:00 PM
Quote from: jj2007
The two SSE2 variants, which have since been included in MasmBasic, do the null check; they return in eax the equal, greater than, less than info, and in edx the position where the first mismatch occurred.

Indeed, but the routines that Frank wrote do not, and in the context of the quote they further assume the length of the string, and that the strings are of equal length. In the general solution you can't make such assumptions, or you would be using memcmp().

For the sake of fairness, I should add that Lingo's algo does the null check, although it's not mentioned in the output.

Re assuming equal length: The shorter string would trigger a mismatch, since the other one is non-zero; the real danger is that in 99% of all cases that will give the false impression that the algo works fine. Until you run into that rare case where you load two equal short files into a VirtualAlloc'ed buffer, and bang, access violation.
Title: Re: Compare two strings
Post by: frktons on June 17, 2010, 06:42:42 PM
Very well indeed.  :U
When I'll be able to understand all the stuff you are talking about
I'll be probably able to solve these problems as well, I hope at least.  :P

So far I'll be happy just to understand why, what and when to ALIGN data
and how to do it, I mean how and where I declare  the ALIGN directive?

Thanks

Frank
Title: Re: Compare two strings
Post by: clive on June 17, 2010, 07:14:54 PM
Quote from: jj2007
Re assuming equal length: The shorter string would trigger a mismatch, since the other one is non-zero; the real danger is that in 99% of all cases that will give the false impression that the algo works fine. Until you run into that rare case where you load two equal short files into a VirtualAlloc'ed buffer, and bang, access violation.

Plus the potential for byte(s) beyond the NUL to have any value, which would could potentially mean a pair of 2 character string which are otherwise identical to be miscompared if they were examined four bytes (or 8, whatever) at a time.


str1  db 'it',0,'x'
str2  db 'it',0,'y'

Title: Re: Compare two strings
Post by: qWord on June 17, 2010, 07:32:39 PM
Quote from: clive on June 17, 2010, 07:14:54 PMPlus the potential for byte(s) beyond the NUL to have any value, which would could potentially mean a pair of 2 character string which are otherwise identical to be miscompared if they were examined four bytes (or 8, whatever) at a time.


str1  db 'it',0,'x'
str2  db 'it',0,'y'


jj is using bsf for finding the zero byte. Bytes following the termination zero are ignored (AFAIKS)
Title: Re: Compare two strings
Post by: clive on June 17, 2010, 07:32:52 PM
Quote from: frktons
So far I'll be happy just to understand why, what and when to ALIGN data
and how to do it, I mean how and where I declare  the ALIGN directive?

You use the ALIGN directive immediately before the data/code you want to fall on a specific alignment boundary. MASM will insert padding to achieve this based on the current address ($).

If you use .486, the alignment for sections will be 16 bytes, so you can't use ALIGN 32 for example. You could explicitly define the segments and give them page alignment. The linker and the order of the objects linked, will impact where the code actually ends up, MASM passes down this alignment information, but the first object to be linked is the one that the linker uses.

The alignment relates to cache line boundaries (the width of data fetched into cache in a single operation, and other operations internal to the CPU), depends on CPU, but often 32 bytes wide.

Aligining branch targets on 16-byte boundaries helps with the branch prediction, and how the CPU tracks recent branches taken/not-taken.

Data should be aligned on at least 4-byte boundaries, or the data bus width of the processor. If a whole string fits in a single cache line, all the better. There is a penalty if they cross a cache line, Intel tends to be worse than AMD.

Other gains can be had by performing the bulk of 4-byte accesses on 4-byte boundaries, so library routines will often check for this and handle unaligned bytes first.
Title: Re: Compare two strings
Post by: clive on June 17, 2010, 07:34:56 PM
Quote from: qWord
he use bsf for finding the zero byte. Bytes following the termination zero are ignored (AFAIKS)

Not suggesting this is a flaw in all implementations, but should be part of a valid test suite.
Title: Re: Compare two strings
Post by: jj2007 on June 17, 2010, 08:42:36 PM
Quote from: qWord on June 17, 2010, 07:32:39 PM
jj is using bsf for finding the zero byte. Bytes following the termination zero are ignored (AFAIKS)
Correct. It works as it should.
Title: Re: Compare two strings
Post by: frktons on June 17, 2010, 09:30:41 PM
I've tried with 2600 bytes strings, and the proportion of performance
on my pc, with my original routine vs Lingo's one is still 2.5:1 more
or less.


Intel(R) Core(TM)2 CPU          6600  @ 2.40GHz (SSE4)
3268    cycles for LingoCMP, 2600 bytes
1331    cycles for frktonsCMP , 2600 bytes

--- ok ---


I wander why with the modification of Jochen the proportion changes
so much.  ::)
New version included,  with the ALIGN suggested.  :P
Title: Re: Compare two strings
Post by: Ghandi on June 17, 2010, 09:45:54 PM
Frank, can i ask a question mate?

You set EBX to 325 and then decrement by one on each iteration, right? With a multiple of 4, this gives us 1300 bytes scanned, not the 2600 which you've logged, please correct me if i'm wrong?

Also, you are using a fixed length compare, to make it a fair comparison you would need to remove the zero check from Lingo's code and make it so it knew exactly how long the array was to compare against. Keep working on this though, its always good to see people trying new ideas.

HR,
Ghandi
Title: Re: Compare two strings
Post by: clive on June 17, 2010, 09:51:11 PM
Quote from: frktons
I wander why with the modification of Jochen the proportion changes

I think the word you are looking for is wonder, or perhaps not, but I always imagine someone walking around.<G>

http://dictionary.reference.com/browse/wander
"to  think  or  speak  confusedly  or  incoherently."

http://dictionary.reference.com/browse/wonder
"to  think  or  speculate  curiously"
Title: Re: Compare two strings
Post by: clive on June 17, 2010, 09:59:35 PM
Quote from: Ghandi
You set EBX to 325 and then decrement by one on each iteration, right? With a multiple of 4, this gives us 1300 bytes scanned, not the 2600 which you've logged, please correct me if i'm wrong?

8 bytes per iteration
Title: Re: Compare two strings
Post by: lingo on June 17, 2010, 10:53:23 PM
Take a look that cmp is faster than xor instruction... :wink
Intel(R) Core(TM)2 Duo CPU     E8500  @ 3.16GHz (SSE4)
3274    cycles for LingoCMP, 2600 bytes
1318    cycles for NewfrktonsWithoutXOR  , 2600 bytes
1334    cycles for frktonsCMP , 2600 bytes

--- ok ---
Title: Re: Compare two strings
Post by: frktons on June 17, 2010, 11:38:05 PM
Quote from: clive on June 17, 2010, 09:51:11 PM
Quote from: frktons
I wander why with the modification of Jochen the proportion changes

I think the word you are looking for is wonder, or perhaps not, but I always imaging someone walking around.<G>

http://dictionary.reference.com/browse/wander
"to  think  or  speak  confusedly  or  incoherently."

http://dictionary.reference.com/browse/wonder
"to  think  or  speculate  curiously"

Yes Clive, I probably should use wonder; bear with me, this is
my first semester in Assembly, C language and English language as well  :P

wander is a bit more lunatic than I usually am  :lol

I use wander in a figurative way, like my mind going around
without knowing what to do or how to interpret things.
But I really don't know when or where I got that meaning.  ::)
Maybe this translation:
to  ramble  without  a  definite  purpose  or  objective;  roam,  rove,  or  stray:  to  wander  over  the  earth.  is close to the meaning I intend.

Thanks for your corrections in all the fields and the explanations and
examples you give me.  :U
Title: Re: Compare two strings
Post by: frktons on June 18, 2010, 12:04:27 AM
Quote from: lingo on June 17, 2010, 10:53:23 PM
Take a look that cmp is faster than xor instruction... :wink
Intel(R) Core(TM)2 Duo CPU     E8500  @ 3.16GHz (SSE4)
3274    cycles for LingoCMP, 2600 bytes
1318    cycles for NewfrktonsWithoutXOR  , 2600 bytes
1334    cycles for frktonsCMP , 2600 bytes

--- ok ---


Not really that fast 1318 vs 1334. You have surely spared some
cycles skipping a passage, maybe one MOV or ADD less somewhere.  ::)

I think that an experienced ASM programmer like you can cut down
the cycles by 30-40% wisely using  xmm instructions and registers I'll
be aware of in my 4th or 5th semester.  :eek

Title: Re: Compare two strings
Post by: theunknownguy on June 18, 2010, 12:12:43 AM
Quote from: lingo on June 17, 2010, 10:53:23 PM
Take a look that cmp is faster than xor instruction... :wink
Intel(R) Core(TM)2 Duo CPU     E8500  @ 3.16GHz (SSE4)
3274    cycles for LingoCMP, 2600 bytes
1318    cycles for NewfrktonsWithoutXOR  , 2600 bytes
1334    cycles for frktonsCMP , 2600 bytes

--- ok ---


1330 withouth xor
1331 CMP


Indeed is CMP more faster but it was needed to do such a test for this? (Not to be rude) But i think it was clear that CMP was faster. lol...

Funny second test and:


1330 withouth xor
1330 CMP
Title: Re: Compare two strings
Post by: frktons on June 18, 2010, 12:27:55 AM
Well I tried to use CMP instead of XOR and there is
a very slight difference in performance:

Intel(R) Core(TM)2 CPU          6600  @ 2.40GHz (SSE4)
3738    cycles for LingoCMP, 2600 bytes
1319    cycles for NewfrktonsWithout XOR  , 2600 bytes
1329    cycles for frktons XOR , 2600 bytes
1325    cycles for frktons with CMP , 2600 bytes

--- ok ---


So the extra 5-6 cycles that Lingo spared depends on something else.
The new routine I tried is:

counter_begin LOOP_COUNT, HIGH_PRIORITY_CLASS ; LONG SOURCE

mov ecx, offset Src1
mov eax, offset Src2
         
        mov ebx, 325
@@:
mov edx, [ecx] ; Comparing two strings in edx and esi
        mov    esi, [eax]
        cmp    edx, esi
        jne    @f
add     ecx, 4
        add    eax,4
        mov    edx, [ecx]
        mov    esi, [eax]
        cmp    edx, esi
        jne    @f
        add    ecx, 4
        add    eax, 4
        dec    ebx
        jnz    @b 
   


@@:
counter_end
print str$(eax), 9, "cycles for frktons with CMP , 2600 bytes", 13, 10


Quote from: theunknownguy on June 18, 2010, 12:12:43 AM

Indeed is CMP more faster but it was needed to do such a test for this? (Not to be rude) But i think it was clear that CMP was faster. lol...


Probably it is clear for you, but the same I can't say about me  :P
Lingo was so kind to post an example to clear this point and I am
grateful he did.  :U
Title: Re: Compare two strings
Post by: lingo on June 18, 2010, 12:29:50 AM
"So why didn't you use that in your code?"
and
"...you can cut down the cycles by 30-40% wisely using  xmm instructions and registers

Yes, I have similar algo named SrchBytes with times here.. (http://www.masm32.com/board/index.php?topic=9370.90) :wink
Title: Re: Compare two strings
Post by: frktons on June 18, 2010, 12:47:27 AM
Quote from: lingo on June 18, 2010, 12:29:50 AM
"So why didn't you use that in your code?"
and
"...you can cut down the cycles by 30-40% wisely using  xmm instructions and registers

Yes, I have similar algo named SrchBytes with times here.. (http://www.masm32.com/board/index.php?topic=9370.90) :wink


My, I just need to be 30 years younger and a lot of free time and I can
have a nice time between bytes and mnemonics  :lol

Great, that is really good to know that we have people still using
their brain today.  :P
Title: Re: Compare two strings
Post by: theunknownguy on June 18, 2010, 12:58:27 AM
Quote

Probably it is clear for you, but the same I can't say about me  :P
Lingo was so kind to post an example to clear this point and I am
grateful he did.  :U

Quote

I think you can get it from logic itself:

"CMP subtracts the second operand from the first but, unlike the SUB instruction, does not store the result; only the flags are changed"

XOR computes the exclusive OR of the two operands. Each bit of the result is 1 if the corresponding bits of the operands are different; each bit is 0 if the corresponding bits are the same. The answer replaces the first operand.

But the example is quiet good. Still i am waiting for some PowerPC for run test like this...  :dazzled:
Title: Re: Compare two strings
Post by: Ghandi on June 18, 2010, 01:17:06 AM
Quote
You set EBX to 325 and then decrement by one on each iteration, right? With a multiple of 4, this gives us 1300 bytes scanned, not the 2600 which you've logged, please correct me if i'm wrong?


8 bytes per iteration

So i saw after re-reading the source and noticing that there are 2x INC EBX per iteration, making it 8 bytes per iteration. My bad, o0

HR,
Ghandi
Title: Re: Compare two strings
Post by: frktons on June 18, 2010, 01:31:06 AM
Quote from: theunknownguy on June 18, 2010, 12:58:27 AM

I think you can get it from logic itself:


"CMP subtracts the second operand from the first but,
unlike the SUB instruction, does not store the result;
only the flags are changed"



XOR computes the exclusive OR of the two operands. Each bit of the result is 1
if the corresponding bits of the operands are different; each bit is 0 if the
corresponding bits are the same. The answer replaces the first operand.



:U well, I am in the path of learning Assembly, so these definitions
are studying matter. Where did you get them? I'll surely need them in many
future occasions.
Title: Re: Compare two strings
Post by: frktons on June 18, 2010, 01:33:45 AM
Quote from: Ghandi on June 18, 2010, 01:17:06 AM
Quote
You set EBX to 325 and then decrement by one on each iteration, right? With a multiple of 4, this gives us 1300 bytes scanned, not the 2600 which you've logged, please correct me if i'm wrong?


8 bytes per iteration

So i saw after re-reading the source and noticing that there are 2x INC EBX per iteration, making it 8 bytes per iteration. My bad, o0

HR,
Ghandi

Yes, Clive already said that, so I didn't feel to add anything else.  :U
Title: Re: Compare two strings
Post by: theunknownguy on June 18, 2010, 01:45:42 AM
Quote from: frktons on June 18, 2010, 01:31:06 AM
Quote from: theunknownguy on June 18, 2010, 12:58:27 AM

I think you can get it from logic itself:


"CMP subtracts the second operand from the first but,
unlike the SUB instruction, does not store the result;
only the flags are changed"



XOR computes the exclusive OR of the two operands. Each bit of the result is 1
if the corresponding bits of the operands are different; each bit is 0 if the
corresponding bits are the same. The answer replaces the first operand.



:U well, I am in the path of learning Assembly, so these definitions
are studying matter. Where did you get them? I'll surely need them in many
future occasions.

http://pdos.csail.mit.edu/6.828/2004/readings/i386/

Or IBM manual
Title: Re: Compare two strings
Post by: jj2007 on June 18, 2010, 06:37:01 AM
Quote from: frktons on June 18, 2010, 12:04:27 AM

Not really that fast 1318 vs 1334. You have surely spared some
cycles skipping a passage, maybe one MOV or ADD less somewhere.  ::)


If you are not sure, why don't you simple open his attachment and look at your code? You would have found out that butcher lingo has done an impressive job - see below, original copy & paste....
By the way: I had tested that question by simply replacing xor with cmp, and found out there is absolutely no difference in timings.
counter_begin LOOP_COUNT, HIGH_PRIORITY_CLASS ; LONG SOURCE

mov ecx, offset Src1
        mov ebx, 325
lea eax,  Src2
       
@@:
  mov edx, [ecx] ; Comparing two strings in ecx and eax
  add ecx, 8
                                      cmp          edx, [eax]
                                      mov          edx, [ecx-4]
                                      lea            eax, [eax+8]
                                      jnz            @f
  cmp          edx, [eax-4]
                                      jnz            @f
                                      sub           ebx, 1
                                      jnz            @b 
@@:
counter_end
print str$(eax), 9, "cycles for NewfrktonsWithoutXOR  , 2600 bytes", 13, 10

counter_begin LOOP_COUNT, HIGH_PRIORITY_CLASS ; LONG SOURCE

mov ecx, offset Src1
mov eax, offset Src2
         
        mov ebx, 325
@@:
  mov edx, [ecx] ; Comparing two strings in edx and esi
                                      mov          esi, [eax]
                                      xor            edx, esi
                                      jnz            @f
add ecx, 4
                                      add            eax,4
                                      mov            edx, [ecx]
                                      mov            esi, [eax]
                                      xor            edx, esi
                                      jnz            @f
                                      add            ecx, 4
                                      add            eax, 4
                                      dec            ebx
                                      jnz            @b 
   


@@:
counter_end
print str$(eax), 9, "cycles for frktonsCMP , 2600 bytes", 13, 10
Title: Re: Compare two strings
Post by: frktons on June 18, 2010, 08:18:52 AM
Yes, Jochen, of course I looked inside the code Lingo posted, and
I noticed That he changed many things to get those 5-6 cycles, but
I am not able, so far, to understand which of those instructions got
those cycles.   ::)

By the way it's good to see people getting better results with a little
bit crunching, I hoped this could be done in our complex and sometime
difficult daily life.

Enjoy

Title: Re: Compare two strings
Post by: frktons on June 18, 2010, 08:22:30 AM
Quote from: theunknownguy on June 18, 2010, 01:45:42 AM

http://pdos.csail.mit.edu/6.828/2004/readings/i386/

Or IBM manual

Thanks unkn, it would be useful for reference.  :U
Title: Re: Compare two strings
Post by: theunknownguy on June 18, 2010, 09:10:39 AM
An straight cmp comparation runned on the same test of 2600 bytes:

1338 clocks

mov ecx, offset Src1
mov eax, offset Src2
xor ebx, ebx


mov  edx, [ecx+ebx]
cmp [eax+ebx], edx
jnz @2
add ebx, 4
jnz @1

1365 ~ 1379?:

mov  edx, [ecx+ebx*4]
cmp [eax+ebx*4], edx
jnz @2
add ebx, 1
jnz @1

Against:

1370 clocks

mov  edx, [ecx+ebx]
xor edx, [eax+ebx]
jnz @2
add ebx, 4
jnz @1

1378 clocks

mov  edx, [ecx+ebx]
mov  edi, [eax+ebx]
xor edx, edi
jnz @2
add ebx, 4
jnz @1
1369 clocks
mov  edx, [ecx+ebx*4]
mov  edi, [eax+ebx*4]
xor edx, edi
jnz @2
add ebx, 1
jnz @1
1365 clocks

mov  edx, [ecx+ebx*4]
xor edx, [eax+ebx*4]
jnz @2
add ebx, 1
jnz @1


Surprise me the CMP SCALAR... meaby something wrong with the code i dont know... 5:14 AM here my eyes could be cheating on me.
Title: Re: Compare two strings
Post by: frktons on June 18, 2010, 12:25:42 PM
A slighter faster version of string_compare:

I skipped a couple of MOV and I got a slight improvement in the overall
performance of the routine.


Intel(R) Core(TM)2 CPU          6600  @ 2.40GHz (SSE4)
3369    cycles for LingoCMP, 2600 bytes
1330    cycles for frktons with XOR , 2600 bytes
1325    cycles for frktons with CMP , 2600 bytes
1316    cycles for frktons with CMP II , 2600 bytes

--- ok ---


Source and EXE attached.

Enjoy

Frank
Title: Re: Compare two strings
Post by: jj2007 on June 18, 2010, 12:54:10 PM
Try this...
mov ecx, offset Src1
mov eax, offset Src2
mov ebx, 325
@@:
        mov edx, [ecx] ; Comparing two strings in edx and [eax]
        cmp edx, [eax]
        jne @f
;        add ecx, 4
;        add eax,4
        mov edx, [ecx+4]
        cmp edx, [eax+4]
        jne @f
        add ecx, 4+4
        add eax, 4+4
        dec ebx
        jnz @b 
@@:
Title: Re: Compare two strings
Post by: frktons on June 18, 2010, 01:13:49 PM
Quote from: jj2007 on June 18, 2010, 12:54:10 PM
Try this...
mov ecx, offset Src1
mov eax, offset Src2
mov ebx, 325
@@:
        mov edx, [ecx] ; Comparing two strings in edx and [eax]
        cmp edx, [eax]
        jne @f
;        add ecx, 4
;        add eax,4
        mov edx, [ecx+4]
        cmp edx, [eax+4]
        jne @f
        add ecx, 4+4
        add eax, 4+4
        dec ebx
        jnz @b 
@@:


A very tiny improvement 1 or 2 cycles, I expected more to be honest.


Intel(R) Core(TM)2 CPU          6600  @ 2.40GHz (SSE4)
3269    cycles for LingoCMP, 2600 bytes
1329    cycles for frktons with XOR , 2600 bytes
1326    cycles for frktons with CMP , 2600 bytes
1316    cycles for frktons with CMP II , 2600 bytes
1314    cycles for frktons with CMP III , 2600 bytes

--- ok ---


I only changed:

        add ecx, 4+4
        add eax, 4+4

with:

        add ecx, 8
        add eax, 8


But it is somewhat smaller and smarter.  :U
Title: Re: Compare two strings
Post by: frktons on June 18, 2010, 04:20:23 PM
I tried to unroll the loop, 10 times, but no misurable gain.
Probably this method has been exploited enough  :P

Enjoy
Title: Re: Compare two strings
Post by: Rockoon on June 18, 2010, 05:00:42 PM
Overclocked to 3.46ghz:


AMD Phenom(tm) II X6 1055T Processor (SSE3)
3268    cycles for LingoCMP, 2600 bytes
1642    cycles for frktons with XOR , 2600 bytes
1642    cycles for frktons with CMP , 2600 bytes
1966    cycles for frktons with CMP II , 2600 bytes
Title: Re: Compare two strings
Post by: frktons on June 18, 2010, 05:40:13 PM
Quote from: Rockoon on June 18, 2010, 05:00:42 PM
Overclocked to 3.46ghz:


AMD Phenom(tm) II X6 1055T Processor (SSE3)
3268    cycles for LingoCMP, 2600 bytes
1642    cycles for frktons with XOR , 2600 bytes
1642    cycles for frktons with CMP , 2600 bytes
1966    cycles for frktons with CMP II , 2600 bytes


:lol :lol :lol :lol :lol :lol :lol :lol :lol :lol :lol :lol :lol

AMD has a different philsophy about optimization.

:lol :lol :lol :lol :lol :lol :lol :lol :lol :lol :lol :lol :lol
Title: Re: Compare two strings
Post by: Rockoon on June 18, 2010, 06:35:45 PM
Quote from: frktons on June 18, 2010, 05:40:13 PM
Quote from: Rockoon on June 18, 2010, 05:00:42 PM
Overclocked to 3.46ghz:


AMD Phenom(tm) II X6 1055T Processor (SSE3)
3268    cycles for LingoCMP, 2600 bytes
1642    cycles for frktons with XOR , 2600 bytes
1642    cycles for frktons with CMP , 2600 bytes
1966    cycles for frktons with CMP II , 2600 bytes


:lol :lol :lol :lol :lol :lol :lol :lol :lol :lol :lol :lol :lol

AMD has a different philsophy about optimization.

:lol :lol :lol :lol :lol :lol :lol :lol :lol :lol :lol :lol :lol

Me too. It can be faster, even on Intel :)
Title: Re: Compare two strings
Post by: Rockoon on June 18, 2010, 07:03:01 PM
Added CMP III


AMD Phenom(tm) II X6 1055T Processor (SSE3)
3265    cycles for LingoCMP, 2600 bytes
1640    cycles for frktons with XOR , 2600 bytes
1640    cycles for frktons with CMP , 2600 bytes
1965    cycles for frktons with CMP II , 2600 bytes
1310    cycles for frktons with CMP III, 2600 bytes

Title: Re: Compare two strings
Post by: theunknownguy on June 18, 2010, 07:03:26 PM
Quote from: jj2007 on June 18, 2010, 12:54:10 PM
Try this...
mov ecx, offset Src1
mov eax, offset Src2
mov ebx, 325
@@:
        mov edx, [ecx] ; Comparing two strings in edx and [eax]
        cmp edx, [eax]
        jne @f
;        add ecx, 4
;        add eax,4
        mov edx, [ecx+4]
        cmp edx, [eax+4]
        jne @f
        add ecx, 4+4
        add eax, 4+4
        dec ebx
        jnz @b 
@@:


Dec EBX? what happen with dec being slower than sub?.

It shouldnt been sub ebx, 1. At least for what i remember from agner fog...

Also using 1 reg could avoid 2 adds.

mov ecx, offset Src1
mov eax, offset Src2
mov ebx, 325
xor edi, edi
@@:
       mov edx, [ecx+edi] ; Comparing two strings in edx and [eax]
       cmp edx, [eax+edi]
       jne @f
       mov edx, [ecx+edi+4]
       cmp edx, [eax+edi+4]
       jne @f
       add edi, 4+4
       sub ebx, 1
       jnz @b  
@@:


Testing the CMP REG32*4 SCALAR give worst results, but on XOR give betters... odd

Anyway this should run faster some body test it?



Title: Re: Compare two strings
Post by: frktons on June 18, 2010, 07:11:29 PM
Quote from: Rockoon on June 18, 2010, 07:03:01 PM
Added CMP III


AMD Phenom(tm) II X6 1055T Processor (SSE3)
3265    cycles for LingoCMP, 2600 bytes
1640    cycles for frktons with XOR , 2600 bytes
1640    cycles for frktons with CMP , 2600 bytes
1965    cycles for frktons with CMP II , 2600 bytes
1310    cycles for frktons with CMP III, 2600 bytes



Where is the code? Does it work on Intel as well?

Let me know.

Frank
Title: Re: Compare two strings
Post by: frktons on June 18, 2010, 07:18:11 PM
Added method proposed by theunknownguy:

Intel(R) Core(TM)2 CPU          6600  @ 2.40GHz (SSE4)
3372    cycles for LingoCMP, 2600 bytes
1331    cycles for frktons with XOR , 2600 bytes
1326    cycles for frktons with CMP , 2600 bytes
1318    cycles for frktons with CMP II , 2600 bytes
1316    cycles for frktons with CMP III , 2600 bytes
1317    cycles for frktons with CMP unrolled , 2600 bytes
1316    cycles for theunknown with CMP , 2600 bytes

--- ok ---


No gain.  ::)
Title: Re: Compare two strings
Post by: Rockoon on June 18, 2010, 07:23:57 PM
aligning the top of all the inner loops by 16, and introducing my own method:


AMD Phenom(tm) II X6 1055T Processor (SSE3)
2834    cycles for LingoCMP, 2600 bytes
1553    cycles for frktons with XOR , 2600 bytes
1542    cycles for frktons with CMP , 2600 bytes
1319    cycles for frktons with CMP II , 2600 bytes
1146    cycles for frktons with CMP III, 2600 bytes
989     cycles for RockoonCMP, 2600 bytes


and my method (please check for errors?)

counter_begin LOOP_COUNT, HIGH_PRIORITY_CLASS ; LONG SOURCE

mov ecx, offset Src1
mov eax, offset Src2
mov ebx, 325
        sub eax, ecx
align 16
@@:
        mov edx, [ecx]
        cmp edx, [ecx + eax]
        jne @f
        mov edx, [ecx + 4]
        cmp edx, [ecx + eax + 4]
        jne @f
        add ecx, 8
        dec ebx
        jnz @b 
@@:
counter_end

print str$(eax), 9, "cycles for RockoonCMP, 2600 bytes", 13, 10

Title: Re: Compare two strings
Post by: dedndave on June 18, 2010, 07:46:47 PM
i doubt that ALIGN 16 makes much difference   :P
i find that, for smaller loops (where the branch at the end is SHORT), alignment can actually slow you down a bit
the added padding takes time to execute, too

if the branch at the end of the loop is NEAR, 16-alignment definately helps
Title: Re: Compare two strings
Post by: Rockoon on June 18, 2010, 08:07:02 PM
Well, it does make a difference on AMD.

On Intels with a trace cache, then probably not. 1 byte instructions might be stored as 32-bits, for instance. I'm sure Intel made some good choices with it, so that all the uops in it fall on boundaries that make the most sense for the decoder.

AMD hasnt really changed its architecture in quite awhile. No trace caches or anything like that. Alignment still matters there. In a lot of ways, its like programming for the old Pentium III architecture. The same can be said for the Core2's tho, with the P4's netburst behind us I think that we are all happy.
Title: Re: Compare two strings
Post by: jj2007 on June 18, 2010, 08:29:38 PM
Quote from: Rockoon on June 18, 2010, 07:23:57 PM
aligning the top of all the inner loops by 16, and introducing my own method:

...
        sub eax, ecx
align 16
@@:
        mov edx, [ecx]
        cmp edx, [ecx + eax]
        jne @f
        mov edx, [ecx + 4]
        cmp edx, [ecx + eax + 4]
        jne @f
        add ecx, 8
        dec ebx
        jnz @b 
@@:

Cute idea :U

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
String comparison: short string 10 bytes, long string 5050
2911    cycles for SSE with null check, long string
3684    cycles for SSE with null check, long string, case-insensitive
5854    cycles for Lingo, long string
4878    cycles for Frank, long string
3803    cycles for Rockoon, long string
3811    cycles for RockoonJ, long string
3770    cycles for RockoonJ2, long string
13906   cycles for crt_strcmp, long string
16493   cycles for crt__stricmp, long string, case-insensitive
30072   cycles for movzx check null, long string
20088   cycles for repe cmpsb, long string
89965   cycles for lstrcmp, long string

18      cycles for SSE with null check, 10 bytes
24      cycles for SSE with null check, 10 bytes, case-insensitive
16      cycles for Lingo, 10 bytes
9       cycles for Frank, 10 bytes
48      cycles for Rockoon, 10 bytes
7       cycles for RockoonJ, 10 bytes
7       cycles for RockoonJ2, 10 bytes
32      cycles for crt_strcmp, 10 bytes
108     cycles for crt__stricmp, 10 bytes, case-insensitive
51      cycles for movzx check null, 10 bytes
88      cycles for repe cmpsb, 10 bytes
453     cycles for lstrcmp, 10 bytes

Codesizes:
Lingo:  365
Frank:  45
Rockoon:        44
RockoonJ:       42
RockoonJ2:      33
SSE:    277     (MasmBasic)


It can compete with the SSE algo, but bear in mind there is no null check, and granularity is 8 bytes.
2911    cycles for SSE with null check, long string
3803    cycles for Rockoon, long string
Title: Re: Compare two strings
Post by: frktons on June 18, 2010, 08:32:18 PM
Well on my pc I got these results:

Intel(R) Core(TM)2 CPU          6600  @ 2.40GHz (SSE4)
3299    cycles for LingoCMP, 2600 bytes
1331    cycles for frktons with XOR , 2600 bytes
1370    cycles for frktons with CMP , 2600 bytes
1317    cycles for frktons with CMP II , 2600 bytes
1317    cycles for frktons with CMP III , 2600 bytes
1314    cycles for frktons with CMP unrolled , 2600 bytes
1334    cycles for theunknown with CMP , 2600 bytes
1316    cycles for RockoonCMP, 2600 bytes

--- ok ---


Doesn't seem different from others method regarding performance.  ::)
Title: Re: Compare two strings
Post by: jj2007 on June 18, 2010, 08:46:36 PM
Just replaced the attachment above - I could shave off around 30 cycles and a few bytes. Here is the 33 bytes version.

align 16
RockoonJ2 proc
pop edx
pop eax
pop ecx
push edx
sub ecx, eax
align 8
@@: mov edx, [eax+ecx]
cmp edx, [eax]
jne @F
lea eax, [eax+4]
mov edx, [eax+ecx]
cmp edx, [eax]
lea eax, [eax+4]
je @B
@@: sub eax, [esp-4] ; return the relative position in ecx
ret
RockoonJ2 endp
RockoonJ2_endp:
Title: Re: Compare two strings
Post by: qWord on June 18, 2010, 08:55:22 PM
1 byte shorter and remove one dependency:

mov edx, [eax+ecx+4]
cmp edx, [eax+4]
lea eax, [eax+8]

instead of:
lea eax, [eax+4]
mov edx, [eax+ecx]
cmp edx, [eax]
lea eax, [eax+4]

:bg
Title: Re: Compare two strings
Post by: jj2007 on June 18, 2010, 09:04:44 PM
Quote from: qWord on June 18, 2010, 08:55:22 PM
1 byte shorter and remove one dependency:

Thanks, qWord. I had tested that one, it's a whopping 80 cycles slower on my Celeron - but of course, it might be faster on other CPU's. Did you test it?
Title: Re: Compare two strings
Post by: qWord on June 18, 2010, 09:07:12 PM
no - can you post your current test bed?
Title: Re: Compare two strings
Post by: jj2007 on June 18, 2010, 09:11:17 PM
Here it is. Search for "qWord" :bg
Title: Re: Compare two strings
Post by: qWord on June 18, 2010, 09:15:10 PM
sorry jj,
I've found your previous test bed in the mean time  ::)
Title: Re: Compare two strings
Post by: Rockoon on June 18, 2010, 09:22:09 PM
Quote from: jj2007 on June 18, 2010, 08:29:38 PM
Cute idea :U

'twas nothing. Just one of those things in my bag of tricks. I've got many :)

Now, thinking about this further...

...operations on eax are usually encoded shorter, and that is true for both MOV EAX, [...] and ADD EAX, constant

so maybe make the comparator register EAX or conversely make the incrementing register EAX. I doubt it will have a performance impact (so doubtful that I wont bother) but it should shave a byte or so (??) off the loop.

as far as detecting the null terminator, a SWAR method would probably be most efficient for the general purpose regs. Might become register constrained tho.
Title: Re: Compare two strings
Post by: qWord on June 18, 2010, 09:24:02 PM
jj,
for my core2duo it seems a bit faster for long strings, but no difference for the short ones.
Title: Re: Compare two strings
Post by: Rockoon on June 18, 2010, 09:33:54 PM

Am I the only guy with AMD's?


AMD Phenom(tm) II X6 1055T Processor (SSE3)
String comparison: short string 10 bytes, long string 5050
1609    cycles for SSE with null check, long string
1921    cycles for SSE with null check, long string, case-insensitive
3776    cycles for Lingo, long string
2732    cycles for Frank, long string
2522    cycles for Rockoon, long string
2018    cycles for RockoonJ, long string
2211    cycles for RockoonJ2, long string
10046   cycles for crt_strcmp, long string
20065   cycles for crt__stricmp, long string, case-insensitive
25050   cycles for movzx check null, long string
11026   cycles for repe cmpsb, long string
78248   cycles for lstrcmp, long string

23      cycles for SSE with null check, 10 bytes
34      cycles for SSE with null check, 10 bytes, case-insensitive
17      cycles for Lingo, 10 bytes
8       cycles for Frank, 10 bytes
9       cycles for Rockoon, 10 bytes
10      cycles for RockoonJ, 10 bytes
10      cycles for RockoonJ2, 10 bytes
23      cycles for crt_strcmp, 10 bytes
60      cycles for crt__stricmp, 10 bytes, case-insensitive
41      cycles for movzx check null, 10 bytes
34      cycles for repe cmpsb, 10 bytes
777     cycles for lstrcmp, 10 bytes

1608    cycles for SSE with null check, long string
1920    cycles for SSE with null check, long string, case-insensitive
3776    cycles for Lingo, long string
2732    cycles for Frank, long string
2522    cycles for Rockoon, long string
2018    cycles for RockoonJ, long string
2210    cycles for RockoonJ2, long string
10037   cycles for crt_strcmp, long string
20072   cycles for crt__stricmp, long string, case-insensitive
25048   cycles for movzx check null, long string
11031   cycles for repe cmpsb, long string
77537   cycles for lstrcmp, long string

23      cycles for SSE with null check, 10 bytes
34      cycles for SSE with null check, 10 bytes, case-insensitive
17      cycles for Lingo, 10 bytes
8       cycles for Frank, 10 bytes
9       cycles for Rockoon, 10 bytes
9       cycles for RockoonJ, 10 bytes
10      cycles for RockoonJ2, 10 bytes
20      cycles for crt_strcmp, 10 bytes
60      cycles for crt__stricmp, 10 bytes, case-insensitive
41      cycles for movzx check null, 10 bytes
34      cycles for repe cmpsb, 10 bytes
755     cycles for lstrcmp, 10 bytes

Codesizes:
Lingo:  365
Frank:  45
Rockoon:        40
RockoonJ:       42
RockoonJ2:      33
SSE:    277     (MasmBasic)

Title: Re: Compare two strings
Post by: jj2007 on June 18, 2010, 09:42:37 PM
Quote from: Rockoon on June 18, 2010, 09:22:09 PM
...operations on eax are usually encoded shorter, and that is true for both MOV EAX, [...] and ADD EAX, constant
so maybe make the comparator register EAX or conversely make the incrementing register EAX.

I'm afraid there is no difference - sizes are identical. This variant seems a tick faster, though:

align 16
RockoonJ3 proc
pop edx
pop eax
pop ecx
sub ecx, eax
push edx
align 8 ; aka mov edi, edi
@@:
REPEAT 1 ; further unrolling slows it down
mov edx, [eax+ecx]
cmp edx, [eax]
lea eax, [eax+4]
jne @F
ENDM
mov edx, [eax+ecx]
cmp edx, [eax]
lea eax, [eax+4]
je @B
@@: sub eax, [esp-4] ; return the relative position in eax
ret
RockoonJ3 endp


Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
2909    cycles for SSE with null check, long string
3826    cycles for Rockoon, long string
3836    cycles for RockoonJ, long string
3788    cycles for RockoonJ2, long string
3723    cycles for RockoonJ3, long string
Title: Re: Compare two strings
Post by: clive on June 18, 2010, 09:52:40 PM
Quote from: Rockoon
Am I the only guy with AMD's?

I have a couple of Venice and Newcastle boxes, but unfortunately nothing newer. Will also try the code on an Atom later, because that is more like the P3.
Title: Re: Compare two strings
Post by: Rockoon on June 18, 2010, 09:55:33 PM
SWAR method for detecting NULL bytes...

assuming eax contains 4 bytes to be checked

mov esi, eax
sub eax, 1010101h
not esi
and eax, 80808080h
and eax, esi

ZF is now set if EAX contained no null's

can be extended in the obvious manner for 64-bit registers

In C, this is

bool hasZeroByte = (v - 0x01010101UL) & ~v & 0x80808080UL;

from http://graphics.stanford.edu/~seander/bithacks.html#ZeroInWord
Title: Re: Compare two strings
Post by: jj2007 on June 18, 2010, 10:21:06 PM
Quote from: Rockoon on June 18, 2010, 09:55:33 PM
SWAR method for detecting NULL bytes...

It makes a difference :bg

For these timings, I have been mean - I added 50,000 nullbytes to the long test strings.
Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
String comparison: short string 10 bytes, long string 5050 (+50000)
2932    cycles for SSE with null check, long string
3665    cycles for SSE with null check, long string, case-insensitive
5545    cycles for Lingo, long string with null check
59617   cycles for Frank, long string
44534   cycles for Rockoon, long string
44263   cycles for RockoonJ, long string
44044   cycles for RockoonJ2, long string
44192   cycles for RockoonJ3, long string
7875    cycles for RockoonCNB, long string, check nullbyte
13922   cycles for crt_strcmp, long string
16129   cycles for crt__stricmp, long string, case-insensitive
90185   cycles for lstrcmp, long string
Title: Re: Compare two strings
Post by: jj2007 on June 18, 2010, 10:26:57 PM
Quote from: Rockoon on June 18, 2010, 09:55:33 PM
SWAR method for detecting NULL bytes...
...
bool hasZeroByte = (v - 0x01010101UL) & ~v & 0x80808080UL;

from http://graphics.stanford.edu/~seander/bithacks.html#ZeroInWord

"was written in a newsgroup post on April 27, 1987 by Alan Mycroft" - I had just abandoned my Sinclair ZX at that time ::)
Title: Re: Compare two strings
Post by: frktons on June 18, 2010, 11:04:34 PM
So what? Are you checking if one or more bytes inside the register
contain a NULL [zero binary?].

And then you stop confronting the two strings?

What about a binary file that contains some zero binary bytes and
those bytes are part of the string we are comparing?

If all the strings are NULL terminated like in the C Language, let's
assume we are comparing just two memory areas with that content.

I hope I understood what you are meaning.  ::)
Title: Re: Compare two strings
Post by: clive on June 18, 2010, 11:26:43 PM
What about a binary file that contains some zero binary bytes and
those bytes are part of the string we are comparing?


They are NOT strings (PASCAL, C, '$' terminated), they are simply collections of bytes.

Yes, you want to do a memcmp(), not a strcmp().
Title: Re: Compare two strings
Post by: jj2007 on June 18, 2010, 11:29:51 PM
Strings are null-terminated, unless the contrary is explicitly stated. But that's not the problem.
Imagine you are comparing files. You have allocated two fat memory buffers with VirtualAlloc. You are loading two files into this buffer, and run the routine without checking the nullbytes. The following scenarios are possible:

- you know the string length, and stop comparing when the end is reached
- you do not know the string length, and stop comparing when the strings differ
- you do not know the string length, and the files are equal

In the latter case, you stop comparing with an access violation at the end of the buffer.
Title: Re: Compare two strings
Post by: clive on June 18, 2010, 11:54:59 PM
http://www.masm32.com/board/index.php?topic=13701.msg113195#msg113195
StrCompCheckNull.zip

Venice

AMD Athlon(tm) 64 Processor 3400+ (SSE3)
String comparison: short string 10 bytes, long string 5050
2883    cycles for SSE with null check, long string
3520    cycles for SSE with null check, long string, case-insensitive
3964    cycles for Lingo, long string with null check
56656   cycles for Frank, long string
54710   cycles for Rockoon, long string
61931   cycles for RockoonJ, long string
54668   cycles for RockoonJ2, long string
54667   cycles for RockoonJ3, long string
5061    cycles for RockoonCNB, long string, check nullbyte
10149   cycles for crt_strcmp, long string
20218   cycles for crt__stricmp, long string, case-insensitive
106631  cycles for lstrcmp, long string

29      cycles for SSE with null check, 10 bytes
46      cycles for SSE with null check, 10 bytes, case-insensitive
13      cycles for Lingo, 10 bytes with null check
7       cycles for Frank, 10 bytes
7       cycles for Rockoon, 10 bytes
8       cycles for RockoonJ, 10 bytes
8       cycles for RockoonJ2, 10 bytes
10      cycles for RockoonJ3, 10 bytes
13      cycles for RockoonCNB, 10 bytes, check nullbyte
524     cycles for lstrcmp, 10 bytes

2888    cycles for SSE with null check, long string
3519    cycles for SSE with null check, long string, case-insensitive
3975    cycles for Lingo, long string with null check
56400   cycles for Frank, long string
55030   cycles for Rockoon, long string
61608   cycles for RockoonJ, long string
54774   cycles for RockoonJ2, long string
55538   cycles for RockoonJ3, long string
5169    cycles for RockoonCNB, long string, check nullbyte
10234   cycles for crt_strcmp, long string
20625   cycles for crt__stricmp, long string, case-insensitive
108163  cycles for lstrcmp, long string

29      cycles for SSE with null check, 10 bytes
43      cycles for SSE with null check, 10 bytes, case-insensitive
8       cycles for Lingo, 10 bytes with null check
-12     cycles for Frank, 10 bytes
7       cycles for Rockoon, 10 bytes
2       cycles for RockoonJ, 10 bytes
14      cycles for RockoonJ2, 10 bytes
10      cycles for RockoonJ3, 10 bytes
13      cycles for RockoonCNB, 10 bytes, check nullbyte
558     cycles for lstrcmp, 10 bytes

Codesizes:
Lingo:  365
Frank:  45
Rockoon:        40
RockoonJ:       42
RockoonJ2:      33
RockoonJ3:      33
RockoonCNB:     84
SSE:    277     (MasmBasic)
--- ok ---


NewCastle

AMD Athlon(tm) 64 Processor 3200+ (SSE2)
String comparison: short string 10 bytes, long string 5050
2920    cycles for SSE with null check, long string
3541    cycles for SSE with null check, long string, case-insensitive
3985    cycles for Lingo, long string with null check
91701   cycles for Frank, long string
57463   cycles for Rockoon, long string
61822   cycles for RockoonJ, long string
57433   cycles for RockoonJ2, long string
57502   cycles for RockoonJ3, long string
5104    cycles for RockoonCNB, long string, check nullbyte
10184   cycles for crt_strcmp, long string
20326   cycles for crt__stricmp, long string, case-insensitive
107065  cycles for lstrcmp, long string

29      cycles for SSE with null check, 10 bytes
46      cycles for SSE with null check, 10 bytes, case-insensitive
19      cycles for Lingo, 10 bytes with null check
6       cycles for Frank, 10 bytes
7       cycles for Rockoon, 10 bytes
8       cycles for RockoonJ, 10 bytes
8       cycles for RockoonJ2, 10 bytes
9       cycles for RockoonJ3, 10 bytes
15      cycles for RockoonCNB, 10 bytes, check nullbyte
528     cycles for lstrcmp, 10 bytes

2925    cycles for SSE with null check, long string
3542    cycles for SSE with null check, long string, case-insensitive
4111    cycles for Lingo, long string with null check
57906   cycles for Frank, long string
57502   cycles for Rockoon, long string
61841   cycles for RockoonJ, long string
57448   cycles for RockoonJ2, long string
57577   cycles for RockoonJ3, long string
5143    cycles for RockoonCNB, long string, check nullbyte
10188   cycles for crt_strcmp, long string
20411   cycles for crt__stricmp, long string, case-insensitive
107784  cycles for lstrcmp, long string

29      cycles for SSE with null check, 10 bytes
47      cycles for SSE with null check, 10 bytes, case-insensitive
14      cycles for Lingo, 10 bytes with null check
6       cycles for Frank, 10 bytes
7       cycles for Rockoon, 10 bytes
8       cycles for RockoonJ, 10 bytes
8       cycles for RockoonJ2, 10 bytes
13      cycles for RockoonJ3, 10 bytes
15      cycles for RockoonCNB, 10 bytes, check nullbyte
528     cycles for lstrcmp, 10 bytes

Codesizes:
Lingo:  365
Frank:  45
Rockoon:        40
RockoonJ:       42
RockoonJ2:      33
RockoonJ3:      33
RockoonCNB:     84
SSE:    277     (MasmBasic)
--- ok ---


Atom

Intel(R) Atom(TM) CPU N270   @ 1.60GHz (SSE4)
String comparison: short string 10 bytes, long string 5050
6787    cycles for SSE with null check, long string
7194    cycles for SSE with null check, long string, case-insensitive
13367   cycles for Lingo, long string with null check
131992  cycles for Frank, long string
95902   cycles for Rockoon, long string
104444  cycles for RockoonJ, long string
106118  cycles for RockoonJ2, long string
70027   cycles for RockoonJ3, long string
10839   cycles for RockoonCNB, long string, check nullbyte
18223   cycles for crt_strcmp, long string
23639   cycles for crt__stricmp, long string, case-insensitive
222617  cycles for lstrcmp, long string

73      cycles for SSE with null check, 10 bytes
70      cycles for SSE with null check, 10 bytes, case-insensitive
33      cycles for Lingo, 10 bytes with null check
23      cycles for Frank, 10 bytes
22      cycles for Rockoon, 10 bytes
20      cycles for RockoonJ, 10 bytes
30      cycles for RockoonJ2, 10 bytes
26      cycles for RockoonJ3, 10 bytes
29      cycles for RockoonCNB, 10 bytes, check nullbyte
913     cycles for lstrcmp, 10 bytes

4723    cycles for SSE with null check, long string
5298    cycles for SSE with null check, long string, case-insensitive
12200   cycles for Lingo, long string with null check
130570  cycles for Frank, long string
96277   cycles for Rockoon, long string
97053   cycles for RockoonJ, long string
98298   cycles for RockoonJ2, long string
68774   cycles for RockoonJ3, long string
9803    cycles for RockoonCNB, long string, check nullbyte
17967   cycles for crt_strcmp, long string
24884   cycles for crt__stricmp, long string, case-insensitive
216194  cycles for lstrcmp, long string

57      cycles for SSE with null check, 10 bytes
71      cycles for SSE with null check, 10 bytes, case-insensitive
38      cycles for Lingo, 10 bytes with null check
27      cycles for Frank, 10 bytes
29      cycles for Rockoon, 10 bytes
24      cycles for RockoonJ, 10 bytes
34      cycles for RockoonJ2, 10 bytes
30      cycles for RockoonJ3, 10 bytes
35      cycles for RockoonCNB, 10 bytes, check nullbyte
930     cycles for lstrcmp, 10 bytes

Codesizes:
Lingo:  365
Frank:  45
Rockoon:        40
RockoonJ:       42
RockoonJ2:      33
RockoonJ3:      33
RockoonCNB:     84
SSE:    277     (MasmBasic)
--- ok ---
Title: Re: Compare two strings
Post by: KeepingRealBusy on June 19, 2010, 12:31:42 AM
Clive,

For this following test, which ZiP did you use?

Dave.

Quote from: clive on June 18, 2010, 11:54:59 PM
AMD Athlon(tm) 64 Processor 3400+ (SSE3)
String comparison: short string 10 bytes, long string 5050
2883    cycles for SSE with null check, long string
3520    cycles for SSE with null check, long string, case-insensitive
3964    cycles for Lingo, long string with null check
56656   cycles for Frank, long string
54710   cycles for Rockoon, long string
61931   cycles for RockoonJ, long string
54668   cycles for RockoonJ2, long string
54667   cycles for RockoonJ3, long string
5061    cycles for RockoonCNB, long string, check nullbyte
10149   cycles for crt_strcmp, long string
20218   cycles for crt__stricmp, long string, case-insensitive
106631  cycles for lstrcmp, long string

29      cycles for SSE with null check, 10 bytes
46      cycles for SSE with null check, 10 bytes, case-insensitive
13      cycles for Lingo, 10 bytes with null check
7       cycles for Frank, 10 bytes
7       cycles for Rockoon, 10 bytes
8       cycles for RockoonJ, 10 bytes
8       cycles for RockoonJ2, 10 bytes
10      cycles for RockoonJ3, 10 bytes
13      cycles for RockoonCNB, 10 bytes, check nullbyte
524     cycles for lstrcmp, 10 bytes

2888    cycles for SSE with null check, long string
3519    cycles for SSE with null check, long string, case-insensitive
3975    cycles for Lingo, long string with null check
56400   cycles for Frank, long string
55030   cycles for Rockoon, long string
61608   cycles for RockoonJ, long string
54774   cycles for RockoonJ2, long string
55538   cycles for RockoonJ3, long string
5169    cycles for RockoonCNB, long string, check nullbyte
10234   cycles for crt_strcmp, long string
20625   cycles for crt__stricmp, long string, case-insensitive
108163  cycles for lstrcmp, long string

29      cycles for SSE with null check, 10 bytes
43      cycles for SSE with null check, 10 bytes, case-insensitive
8       cycles for Lingo, 10 bytes with null check
-12     cycles for Frank, 10 bytes
7       cycles for Rockoon, 10 bytes
2       cycles for RockoonJ, 10 bytes
14      cycles for RockoonJ2, 10 bytes
10      cycles for RockoonJ3, 10 bytes
13      cycles for RockoonCNB, 10 bytes, check nullbyte
558     cycles for lstrcmp, 10 bytes

Codesizes:
Lingo:  365
Frank:  45
Rockoon:        40
RockoonJ:       42
RockoonJ2:      33
RockoonJ3:      33
RockoonCNB:     84
SSE:    277     (MasmBasic)
--- ok ---

Title: Re: Compare two strings
Post by: clive on June 19, 2010, 12:51:04 AM
Quote from: KeepingRealBusy on June 19, 2010, 12:31:42 AM
For this following test, which ZiP did you use?

The last one JJ posted.

http://www.masm32.com/board/index.php?topic=13701.msg113195#msg113195
Title: Re: Compare two strings
Post by: KeepingRealBusy on June 19, 2010, 01:04:17 AM
Clive,

Thank you.
Title: Re: Compare two strings
Post by: frktons on June 19, 2010, 02:28:51 AM
Quote from: jj2007 on June 18, 2010, 11:29:51 PM
Strings are null-terminated, unless the contrary is explicitly stated. But that's not the problem.
Imagine you are comparing files. You have allocated two fat memory buffers with VirtualAlloc. You are loading two files into this buffer, and run the routine without checking the nullbytes. The following scenarios are possible:

- you know the string length, and stop comparing when the end is reached
- you do not know the string length, and stop comparing when the strings differ
- you do not know the string length, and the files are equal

In the latter case, you stop comparing with an access violation at the end of the buffer.

When I read files, I have some chances to know the number of bytes I'm reading into
memory, so I can compare just that number of bytes or create by myself the sequence
of bytes that tell me I'm on the EOF.

As Clive pointed out, I'm going to do a memcmp() in that case, and it could be one
possible scenario for the use of this kind of comparing.

If they are strings, the NULL check is mandatory, being it the de facto EOF.
By the way, I assume you are getting those results with the methods tested
just because this rule is not explicit, so there are big differences due to this
simple fact, not because the algos are that much smarter.  :P
Title: Re: Compare two strings
Post by: KeepingRealBusy on June 19, 2010, 03:17:36 AM
Here is my time for my dekstop (development) system:


AMD Athlon(tm) 64 X2 Dual Core Processor 4200+ (SSE3)
String comparison: short string 10 bytes, long string 5050
2863    cycles for SSE with null check, long string
6233    cycles for SSE with null check, long string, case-insensitive
3948    cycles for Lingo, long string with null check
60190   cycles for Frank, long string
56336   cycles for Rockoon, long string
63878   cycles for RockoonJ, long string
55673   cycles for RockoonJ2, long string
54345   cycles for RockoonJ3, long string
5029    cycles for RockoonCNB, long string, check nullbyte
12533   cycles for crt_strcmp, long string
20673   cycles for crt__stricmp, long string, case-insensitive
101896  cycles for lstrcmp, long string

29      cycles for SSE with null check, 10 bytes
55      cycles for SSE with null check, 10 bytes, case-insensitive
13      cycles for Lingo, 10 bytes with null check
6       cycles for Frank, 10 bytes
7       cycles for Rockoon, 10 bytes
8       cycles for RockoonJ, 10 bytes
-6      cycles for RockoonJ2, 10 bytes
10      cycles for RockoonJ3, 10 bytes
13      cycles for RockoonCNB, 10 bytes, check nullbyte
591     cycles for lstrcmp, 10 bytes

2864    cycles for SSE with null check, long string
3495    cycles for SSE with null check, long string, case-insensitive
6455    cycles for Lingo, long string with null check
56777   cycles for Frank, long string
60430   cycles for Rockoon, long string
65005   cycles for RockoonJ, long string
54752   cycles for RockoonJ2, long string
60138   cycles for RockoonJ3, long string
5272    cycles for RockoonCNB, long string, check nullbyte
10029   cycles for crt_strcmp, long string
20091   cycles for crt__stricmp, long string, case-insensitive
95502   cycles for lstrcmp, long string

29      cycles for SSE with null check, 10 bytes
46      cycles for SSE with null check, 10 bytes, case-insensitive
13      cycles for Lingo, 10 bytes with null check
6       cycles for Frank, 10 bytes
7       cycles for Rockoon, 10 bytes
8       cycles for RockoonJ, 10 bytes
8       cycles for RockoonJ2, 10 bytes
10      cycles for RockoonJ3, 10 bytes
13      cycles for RockoonCNB, 10 bytes, check nullbyte
504     cycles for lstrcmp, 10 bytes

Codesizes:
Lingo:  365
Frank:  45
Rockoon:        40
RockoonJ:       42
RockoonJ2:      33
RockoonJ3:      33
RockoonCNB:     84
SSE:    277     (MasmBasic)
--- ok ---


And here is my timing for my laptop (Internet):


Intel(R) Pentium(R) 4 CPU 3.20GHz (SSE2)
String comparison: short string 10 bytes, long string 5050
5463    cycles for SSE with null check, long string
8446    cycles for SSE with null check, long string, case-insensitive
9214    cycles for Lingo, long string with null check
77152   cycles for Frank, long string
67687   cycles for Rockoon, long string
66298   cycles for RockoonJ, long string
69855   cycles for RockoonJ2, long string
66412   cycles for RockoonJ3, long string
9748    cycles for RockoonCNB, long string, check nullbyte
14225   cycles for crt_strcmp, long string
33251   cycles for crt__stricmp, long string, case-insensitive
135879  cycles for lstrcmp, long string

16      cycles for SSE with null check, 10 bytes
23      cycles for SSE with null check, 10 bytes, case-insensitive
37      cycles for Lingo, 10 bytes with null check
43      cycles for Frank, 10 bytes
6       cycles for Rockoon, 10 bytes
16      cycles for RockoonJ, 10 bytes
5       cycles for RockoonJ2, 10 bytes
16      cycles for RockoonJ3, 10 bytes
11      cycles for RockoonCNB, 10 bytes, check nullbyte
695     cycles for lstrcmp, 10 bytes

5105    cycles for SSE with null check, long string
9337    cycles for SSE with null check, long string, case-insensitive
8719    cycles for Lingo, long string with null check
80502   cycles for Frank, long string
67870   cycles for Rockoon, long string
65171   cycles for RockoonJ, long string
66735   cycles for RockoonJ2, long string
66174   cycles for RockoonJ3, long string
9802    cycles for RockoonCNB, long string, check nullbyte
14903   cycles for crt_strcmp, long string
34773   cycles for crt__stricmp, long string, case-insensitive
132146  cycles for lstrcmp, long string

22      cycles for SSE with null check, 10 bytes
32      cycles for SSE with null check, 10 bytes, case-insensitive
18      cycles for Lingo, 10 bytes with null check
16      cycles for Frank, 10 bytes
20      cycles for Rockoon, 10 bytes
13      cycles for RockoonJ, 10 bytes
10      cycles for RockoonJ2, 10 bytes
10      cycles for RockoonJ3, 10 bytes
18      cycles for RockoonCNB, 10 bytes, check nullbyte
648     cycles for lstrcmp, 10 bytes

Codesizes:
Lingo:  365
Frank:  45
Rockoon:        40
RockoonJ:       42
RockoonJ2:      33
RockoonJ3:      33
RockoonCNB:     84
SSE:    277     (MasmBasic)
--- ok ---

Title: Re: Compare two strings
Post by: frktons on June 20, 2010, 02:42:45 PM
Here it is a new approach to string comparing.

First difference: a parallel forward and backward scan of the strings.
This should be faster because if there is a difference toward the end
or near the beginning of the string it'll be detected without scanning
the whole.
Second difference: it checks all the spare bytes the previous version
did not.
I don't know if anyone has already used this approach, but I think it could
be useful in various occasions.

The routine doesn't return the position of the different bytes, nor anything else.
It is just a quick code, without any optimization in mind.

This is just to show the method.


Intel(R) Core(TM)2 CPU          6600  @ 2.40GHz (SSE4)
1172    cycles for Parallel scan, 2600 bytes

--- ok ---


If anyone thinks it is worth the effort, optimize it, customize it, and enjoy.

Frank
Title: Re: Compare two strings
Post by: dedndave on June 20, 2010, 02:49:54 PM
to do it that way, seems like you have to know the string lengths, first   :P
Title: Re: Compare two strings
Post by: frktons on June 20, 2010, 02:52:41 PM
Quote from: dedndave on June 20, 2010, 02:49:54 PM
to do it that way, seems like you have to know the string lengths, first   :P

Compliments  :clap: :clap:

This is one the three assumptions the program does.  :P

Guess the next, please.  :lol
Title: Re: Compare two strings
Post by: KeepingRealBusy on June 21, 2010, 09:14:09 PM
I have added some timing for my version of an SSE compare as KRBxxxx (4 tests)
(KRB for KeepingRealBusy). I use this procedure in a Quicksort Indirect for a
buffer (2 GB) full of variable length strings. These have been read from a 6+ GB
file and converted (2 GB block by block) to variable length structures
(WORD_WITH_BYTE_COUNT,BYTE_ARRAY) in-place in the input buffer. Then a separate
buffer is created pointing to these structures, and the pointers are
Quicksorted. Next, the pointers are written to an intermediate file followed by
the records in-order (data only, not the string length - the pointers are
adjusted to point to the moved data in the intermediate buffer, thus the string
length is defined by the difference between any pointer and it's following
pointer). I did reserve the last 16 bytes of the buffer to insure that an XMM
access of the last record in the buffer would not encounter an error. My sorting
requirements did not need the mismatch location (as has been supplied in some of
these other tests) but did require the type of the mismatch (below/equal/above,
and not just equal/not_equal). Thus - I made 4 tests with different optional
return choices: returning both values, one value, the other value, and neither.
My choice would be KRBNP (return compare result but no position return) for use
with the sort. I implemented some test code to verify the results of my first
long string test and my first short string test (returning both results), and
then removed the verification code with "if 0" "endif".

Of course, since my data needed a particular format (a leading size WORD), I
created that format with 4 new data structures (2 for long and 2 for short).

I had originally created fixed length copies of the input strings for direct
sorting, but this took too much time with data handling. The average string
length was 48 bytes but the longest strings were 160 bytes. This caused the
fixed length file to balloon to 20+ GB. The new method allows blocks of sorted
records to be created with only 2 extra bytes per string so that the merging of
the blocks is much faster (only 1/3 of the blocks). The two different formats
for the string lengths (a WORD preceeding the string during sort, and the
difference between two pointers during merging) require two different entries to
the compare, but the rest of the compare logic remains the same following the
entry.

I was a bit disappointed with some of the timing code. No attempt was made to
actually return a true string comparison indicator (a negative value, a zero
value, or a positive value in eax) (such as would be returned from lstrcmp)
except for the last three methods (crt_strcmp, crt__stricmp, and lstrcmp) and
Lingo. In fact, the returned flags for all methods other than those last three
returned the flags from subtracting the starting value from the mismatch point.
For my tests which return comparison results, the results are returned in the
flags. For my tests which return mismatch position, the result is returned in
eax without affecting the comparison flags.

Frank made no check to detect the end of a string and ran off the end of the
first string into the null and then the 50000 byte zero pad and then the string
of "xxx" and then the second string, but at the same time ran off the end of the
second string into the null and then the 50000 byte zero pad and then the string
of "xxx" and then the short strings which finally stopped the compare with a
mismatch (the second string data as part of the first string did not match the
short string data as part of the second string). Essentially, he compared 55036
bytes (5000+1+50000+34+1) before the mismatch. If he had made a null check, then
his counts would have been in the 5000 range instead of the 55000 range. Rockoon
(below) had the same problem until the last attempt where he added the null
check and his times show the huge difference. Neither Frank's method nor
Rockoon's method with DWORD compares will substitute for a lstrcmp even if it
returned the comparison result. What needs to be done to return a true lstrcmp
result (but using DWORD compares) is something like the following (but it
requires that the string be padded with zeros to a mod 4 bound and not just a
single null with trailing garbage in the last DWORD and this is not the normal
case unless the strings are edited):


    @@:
        mov eax[esi]
        cmp eax[edi]
        jnz @f
    ;
    ;   null check or length check here
    ;
         .
         .
         .
        lea esi,[esi+4]
        lea edi,[edi+4]
        jmp @b
    @@:
        mov ebx,[edi]
        bswap eax
        bswap ebx
        cmp eax,ebx


Rockoon depended on the two strings being adjacent such that the string length
of the first was the difference between the two pointers (and the second string
length was not checked) or maybe I am missing something. He was comparing the
strings, the null, the 50000 byte zero pad and then the string of "xxx". His
final test did make the null check and the counts show it:


    56620   cycles for RockoonJ3, long string
    5028    cycles for RockoonCNB, long string, check nullbyte



I have not completely analyzed the StrCmpSSE method, but at least it returns both
a mismatch position and the mismatched byte difference (it swaps the regs,
trying to get one string that is aligned, and I did not thoroughly analyze if
this difference was corrected for such swaps - probably the simplest test would
be to create test strings that differ in alignment and where the first string is
always less than the second string in the lstrcmp sense and verify that the
returned comparison result is always correct whether or not the strings might
have been swapped for alignment).

I have not analyzed Lingo's method completely, but what can I say. His code is
good, and fast, and good and fast. He even returns the result of the comparison
(I think it does, I did not spend too much time verifying all of the possible
conditions).

What can I say about crt_strcmp, crt__stricmp, and lstrcmp. The counts tell the
story of what you can expect with a general purpose solution:


    9670    cycles for crt_strcmp, long string
    20589   cycles for crt__stricmp, long string, case-insensitive
    95656   cycles for lstrcmp, long string


I was dismayed to see that the repe cmpsb test was conditionally not assembled.
I wanted to see how bad it could really get since they didn't include it, so I
re-enabled the tests. Actually, I created 4 tests with 4 sets of data. The first
set of two tests used unaligned data, the second set of two tests used aligned
data. The first test of each set was a repe cmpsb, the second test of each set
was a repe cmpsd (by DWORDS until mismatch), then backup the indices and retry
the last DWORD with a cmpsb to get an accurate compare result. Amazingly, the
counts are NOT ALL THAT BAD, especially for aligned data!

In all of tests (except mine) which make null byte checks, the null byte check
is done for each compare. Checking for the end of the string for each compare
seems to be a case of time wasted, at least for usage with a sort. This is
necessary for a generic strcmp, but you do not want to use a generic process for
a specific task that is repeated a billion times or more. What I have done is to
do this once (to begin with when I read the data and convert it to structures)
and then use the calculated lengths for the compares. It is hard to quantify how
much extra time my tests would take (considering that I really do a sort which
compares the strings more than once, more like a billion times, and the length
calculation is only done once and saved with the strings). A 2 GB buffer of
average length strings (48 byte) gives about 44,739,242 records per buffer. This
can be converted in 6 seconds per buffer, so that is about 7,456,540 records per
second, or .134 microseconds per record, or about 134 nanoseconds per record.
Does anyone have a clue of how to convert that to cycles, and then how to
quantify the sort condition where the strings get compared 44,739,242 *
lg(44,739,242) or 44,739,242 * 26 = 1,163,220,292 times. I think you need to
divide 134 nano by 1,163,220,292 * 2 and I do not think in numbers that small. I
know that if I did the strlen for each string during the compare, it would add
134 nano * 1,163,220,292 compares * 2 (both strings) or 311,743,038,256 nano or
312 seconds to the sort time - that is 5 extra minutes, and that is per2 GB buffer,
and there are 3+ buffers to sort and merge.

Case insensitive compares are also time consuming, especially for sorts where
the same string is accessed many times. I find it highly beneficial to do a
lowercase conversion once (table lookup if ASCII - only 3 instructions per
character) and preface the string with the lowercase data, sort the
lowercase/uppercase data, then strip off the lowercase data when writing the
result strings.

Without any further ado, here are my timings.


AMD Athlon(tm) 64 X2 Dual Core Processor 4200+ (SSE3)
String comparison: short string 10 bytes, long string 5050
3835    cycles for SSE with null check, long string
2738    cycles for SSE with null check, long string, case-insensitive
4062    cycles for Lingo, long string with null check
59170   cycles for Frank, long string
3189    cycles for KRB, long string
3171    cycles for KRBNF, long string no flags returned
2855    cycles for KRBNR, long string no return position
2542    cycles for KRBNFNR, long string no flags returned no return position
11172   cycles for repe cmpsb, unaligned long string
3849    cycles for repe cmpsd, unaligned long string
12140   cycles for repe cmpsb, aligned long string
1868    cycles for repe cmpsd, aligned long string
60221   cycles for Rockoon, long string
61258   cycles for RockoonJ, long string
55305   cycles for RockoonJ2, long string
57307   cycles for RockoonJ3, long string
5294    cycles for RockoonCNB, long string, check nullbyte
9670    cycles for crt_strcmp, long string
20589   cycles for crt__stricmp, long string, case-insensitive
95656   cycles for lstrcmp, long string

29      cycles for SSE with null check, 10 bytes
46      cycles for SSE with null check, 10 bytes, case-insensitive
13      cycles for Lingo, 10 bytes with null check
6       cycles for Frank, 10 bytes
56      cycles for KRB, 10 bytes
42      cycles for KRBNF, 10 bytes no flags returned
38      cycles for KRBNR, 10 bytes no return position
38      cycles for KRBNFNR, 10 bytes no flags returned no return position
36      cycles for repe cmpsb, unaligned 10 bytes
50      cycles for repe cmpsd, unaligned 10 bytes
36      cycles for repe cmpsb, aligned 10 bytes
46      cycles for repe cmpsd, aligned 10 bytes
7       cycles for Rockoon, aligned 10 bytes
9       cycles for RockoonJ, 10 bytes
8       cycles for RockoonJ2, 10 bytes
10      cycles for RockoonJ3, 10 bytes
13      cycles for RockoonCNB, 10 bytes, check nullbyte
504     cycles for lstrcmp, 10 bytes

2866    cycles for SSE with null check, long string
3497    cycles for SSE with null check, long string, case-insensitive
3938    cycles for Lingo, long string with null check
58561   cycles for Frank, long string
3362    cycles for KRB, long string
3340    cycles for KRBNF, long string no flags returned
2873    cycles for KRBNR, long string no return position
2694    cycles for KRBNFNR, long string no flags returned no return position
12600   cycles for repe cmpsb, unaligned long string
3835    cycles for repe cmpsd, unaligned long string
11129   cycles for repe cmpsb, aligned long string
2825    cycles for repe cmpsd, aligned long string
57165   cycles for Rockoon, long string
62959   cycles for RockoonJ, long string
56479   cycles for RockoonJ2, long string
55263   cycles for RockoonJ3, long string
4118    cycles for RockoonCNB, long string, check nullbyte
11211   cycles for crt_strcmp, long string
21257   cycles for crt__stricmp, long string, case-insensitive
91616   cycles for lstrcmp, long string

29      cycles for SSE with null check, 10 bytes
29      cycles for SSE with null check, 10 bytes, case-insensitive
13      cycles for Lingo, 10 bytes with null check
6       cycles for Frank, 10 bytes
72      cycles for KRB, 10 bytes
42      cycles for KRBNF, 10 bytes no flags returned
38      cycles for KRBNR, 10 bytes no return position
47      cycles for KRBNFNR, 10 bytes no flags returned no return position
36      cycles for repe cmpsb, unaligned 10 bytes
50      cycles for repe cmpsd, unaligned 10 bytes
36      cycles for repe cmpsb, aligned 10 bytes
46      cycles for repe cmpsd, aligned 10 bytes
7       cycles for Rockoon, aligned 10 bytes
9       cycles for RockoonJ, 10 bytes
8       cycles for RockoonJ2, 10 bytes
10      cycles for RockoonJ3, 10 bytes
26      cycles for RockoonCNB, 10 bytes, check nullbyte
541     cycles for lstrcmp, 10 bytes

Codesizes:
Lingo:  365
Frank:  45
KRB:    115
KRBNF:  99
KRBNR:  105
KRBNFNR:        91
Rockoon:        40
RockoonJ:       42
RockoonJ2:      33
RockoonJ3:      33
RockoonCNB:     84
SSE:    277     (MasmBasic)
--- ok ---

Title: Re: Compare two strings
Post by: jj2007 on June 21, 2010, 10:21:31 PM
Good job :U

QuoteChecking for the end of the string for each compare seems to be a case of time wasted, at least for usage with a sort.
The leading word is a solution for this particular case, where you compare over and over the same strings. However, in a different setting checking the len first might imply that the cache has to be reloaded, i.e. once for Len() and again for the actual comparison.

QuoteStrCmpSSE method, ... the simplest test would be to create test strings that differ in alignment
The two dummy strings are meant for doing this test.

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
String comparison: short string 10 bytes, long string 5050
2930    cycles for SSE with null check, long string
3662    cycles for SSE with null check, long string, case-insensitive
5533    cycles for Lingo, long string with null check
55931   cycles for Frank, long string
14716   cycles for KRB, long string
3192    cycles for KRBNF, long string no flags returned
3210    cycles for KRBNR, long string no return position
3141    cycles for KRBNFNR, long string no flags returned no return position
20961   cycles for repe cmpsb, unaligned long string
6982    cycles for repe cmpsd, unaligned long string
26079   cycles for repe cmpsb, aligned long string
5135    cycles for repe cmpsd, aligned long string
44509   cycles for Rockoon, long string
44180   cycles for RockoonJ, long string
44074   cycles for RockoonJ2, long string
44162   cycles for RockoonJ3, long string
7864    cycles for RockoonCNB, long string, check nullbyte
13922   cycles for crt_strcmp, long string
16095   cycles for crt__stricmp, long string, case-insensitive
89770   cycles for lstrcmp, long string

18      cycles for SSE with null check, 10 bytes
25      cycles for SSE with null check, 10 bytes, case-insensitive
16      cycles for Lingo, 10 bytes with null check
9       cycles for Frank, 10 bytes
53      cycles for KRB, 10 bytes
21      cycles for KRBNF, 10 bytes no flags returned
20      cycles for KRBNR, 10 bytes no return position
20      cycles for KRBNFNR, 10 bytes no flags returned no return position
88      cycles for repe cmpsb, unaligned 10 bytes
125     cycles for repe cmpsd, unaligned 10 bytes
88      cycles for repe cmpsb, aligned 10 bytes
121     cycles for repe cmpsd, aligned 10 bytes
8       cycles for Rockoon, aligned 10 bytes
7       cycles for RockoonJ, 10 bytes
7       cycles for RockoonJ2, 10 bytes
8       cycles for RockoonJ3, 10 bytes
16      cycles for RockoonCNB, 10 bytes, check nullbyte
453     cycles for lstrcmp, 10 bytes
Title: Re: Compare two strings
Post by: ecube on June 21, 2010, 10:29:36 PM
KeepingRealBusy i can't compile your example I get


strcompkrb.asm(757) : error A2085: instruction or register not accepted in curre
nt CPU mode
strcompkrb.asm(758) : error A2085: instruction or register not accepted in curre
nt CPU mode
strcompkrb.asm(759) : error A2085: instruction or register not accepted in curre
nt CPU mode
etc...
strcompkrb.asm(1143) : error A2006: undefined symbol : aE2
strcompkrb.asm(1144) : error A2006: undefined symbol : uE2
strcompkrb.asm(1148) : error A2006: undefined symbol : ciE2
strcompkrb.asm(1172) : error A2006: undefined symbol : aL1
strcompkrb.asm(1203) : error A2006: undefined symbol : aL1


but according http://www.xbitlabs.com/images/cpu/athlon64-3000/cpuz.png my cpu supports mmx :\
Title: Re: Compare two strings
Post by: frktons on June 21, 2010, 11:02:00 PM
I hope I'll have time enough to modify the routine I proposed for a different context,
and let it suite the context you are using, so in a couple of days the new string compare
routine should be out.  :P

If I have understood the rules of the game, the routine has to be built with the assumptions
that:
1 - the strings are NULL terminated
2 - we don't know the length of the strings

And the next point that I would like to understand is:
3 - the strings are of the same length or it doesn't matter?
If the latter is true what do we check?

Let me know  so I can full enjoy the game for the Beginners Section  :P
Title: Re: Compare two strings
Post by: dedndave on June 21, 2010, 11:53:53 PM
you may have to enable the MXX instructions set, Cube
        include \masm32\include\masm32rt.inc
        .686
        .MXX
        .XMM

that should turn everything on   :P
masm32rt.inc sets the processor to .486 i think
if you want to change it, do so after the include
i think a minimum of .586 has to be on before .MXX or .XMM, too
Title: Re: Compare two strings
Post by: Rockoon on June 22, 2010, 12:14:25 AM

AMD Phenom(tm) II X6 1055T Processor (SSE3)
String comparison: short string 10 bytes, long string 5050
2110    cycles for SSE with null check, long string
2400    cycles for SSE with null check, long string, case-insensitive
3852    cycles for Lingo, long string with null check
34198   cycles for Frank, long string
2028    cycles for KRB, long string
1742    cycles for KRBNF, long string no flags returned
1435    cycles for KRBNR, long string no return position
1274    cycles for KRBNFNR, long string no flags returned no return position
11137   cycles for repe cmpsb, unaligned long string
3017    cycles for repe cmpsd, unaligned long string
11136   cycles for repe cmpsb, aligned long string
2824    cycles for repe cmpsd, aligned long string
33713   cycles for Rockoon, long string
32009   cycles for RockoonJ, long string
33575   cycles for RockoonJ2, long string
33840   cycles for RockoonJ3, long string
5196    cycles for RockoonCNB, long string, check nullbyte
10196   cycles for crt_strcmp, long string
20172   cycles for crt__stricmp, long string, case-insensitive
89104   cycles for lstrcmp, long string

24      cycles for SSE with null check, 10 bytes
34      cycles for SSE with null check, 10 bytes, case-insensitive
17      cycles for Lingo, 10 bytes with null check
1       cycles for Frank, 10 bytes
37      cycles for KRB, 10 bytes
27      cycles for KRBNF, 10 bytes no flags returned
26      cycles for KRBNR, 10 bytes no return position
40      cycles for KRBNFNR, 10 bytes no flags returned no return position
34      cycles for repe cmpsb, unaligned 10 bytes
43      cycles for repe cmpsd, unaligned 10 bytes
34      cycles for repe cmpsb, aligned 10 bytes
43      cycles for repe cmpsd, aligned 10 bytes
8       cycles for Rockoon, aligned 10 bytes
8       cycles for RockoonJ, 10 bytes
10      cycles for RockoonJ2, 10 bytes
21      cycles for RockoonJ3, 10 bytes
14      cycles for RockoonCNB, 10 bytes, check nullbyte
786     cycles for lstrcmp, 10 bytes

2092    cycles for SSE with null check, long string
2340    cycles for SSE with null check, long string, case-insensitive
3894    cycles for Lingo, long string with null check
34232   cycles for Frank, long string
2018    cycles for KRB, long string
1864    cycles for KRBNF, long string no flags returned
1463    cycles for KRBNR, long string no return position
1290    cycles for KRBNFNR, long string no flags returned no return position
11240   cycles for repe cmpsb, unaligned long string
3068    cycles for repe cmpsd, unaligned long string
11272   cycles for repe cmpsb, aligned long string
2868    cycles for repe cmpsd, aligned long string
33779   cycles for Rockoon, long string
31979   cycles for RockoonJ, long string
33669   cycles for RockoonJ2, long string
33805   cycles for RockoonJ3, long string
5194    cycles for RockoonCNB, long string, check nullbyte
10047   cycles for crt_strcmp, long string
20273   cycles for crt__stricmp, long string, case-insensitive
79376   cycles for lstrcmp, long string

24      cycles for SSE with null check, 10 bytes
34      cycles for SSE with null check, 10 bytes, case-insensitive
17      cycles for Lingo, 10 bytes with null check
8       cycles for Frank, 10 bytes
37      cycles for KRB, 10 bytes
42      cycles for KRBNF, 10 bytes no flags returned
26      cycles for KRBNR, 10 bytes no return position
26      cycles for KRBNFNR, 10 bytes no flags returned no return position
34      cycles for repe cmpsb, unaligned 10 bytes
43      cycles for repe cmpsd, unaligned 10 bytes
34      cycles for repe cmpsb, aligned 10 bytes
43      cycles for repe cmpsd, aligned 10 bytes
8       cycles for Rockoon, aligned 10 bytes
8       cycles for RockoonJ, 10 bytes
10      cycles for RockoonJ2, 10 bytes
7       cycles for RockoonJ3, 10 bytes
14      cycles for RockoonCNB, 10 bytes, check nullbyte
793     cycles for lstrcmp, 10 bytes

Codesizes:
Lingo:  365
Frank:  45
KRB:    115
KRBNF:  99
KRBNR:  105
KRBNFNR:        91
Rockoon:        40
RockoonJ:       42
RockoonJ2:      33
RockoonJ3:      33
RockoonCNB:     84
SSE:    277     (MasmBasic)
--- ok ---


It looks like on AMD, the KRB* solutions are best so far. Remember tho that TANSTATFC
Title: Re: Compare two strings
Post by: ecube on June 22, 2010, 12:51:23 AM
thanks I had that, and .MXX isn't necessary, turns out I was using too old version of masm. Also while you guys did some really interesting stuff with this, I wish more case insensitive versions were made, as a regular strcmp is kinda useless imo. Lingo what is that db stuff you put infront of your functions?
Title: Re: Compare two strings
Post by: frktons on June 22, 2010, 03:03:53 AM
;------------------------------------------------------------------------------------------
; Comparing 2 strings - 5000 bytes long, using a parallel approach.
; One scan goes from the beginning of the string toward the end.
; A second scan goes from the end of the string backward to the start. 
;------------------------------------------------------------------------------------------
; Assumptions:
;      1. Strings are of the same length and NULL terminated
;      2. We don't know the lenght of the strings
;      3. We need to know if the strings are equal or different
;------------------------------------------------------------------------------------------

This modification of the routine is a first approach to the context problem.
Not very fast, but it is a beginning for the Beginners Section  :lol


Intel(R) Core(TM)2 CPU          6600  @ 2.40GHz (SSE4)
7570    cycles for Parallel scan, 5000 bytes

--- ok ---


Please use this routine inside the testbed instead of the previous one.
If I find better solutions, I'll be working on this version.

Frank

Title: Re: Compare two strings
Post by: KeepingRealBusy on June 22, 2010, 03:14:25 AM
Quote from: E^cube on June 21, 2010, 10:29:36 PM
KeepingRealBusy i can't compile your example I get


strcompkrb.asm(757) : error A2085: instruction or register not accepted in curre
nt CPU mode
strcompkrb.asm(758) : error A2085: instruction or register not accepted in curre
nt CPU mode
strcompkrb.asm(759) : error A2085: instruction or register not accepted in curre
nt CPU mode
etc...
strcompkrb.asm(1143) : error A2006: undefined symbol : aE2
strcompkrb.asm(1144) : error A2006: undefined symbol : uE2
strcompkrb.asm(1148) : error A2006: undefined symbol : ciE2
strcompkrb.asm(1172) : error A2006: undefined symbol : aL1
strcompkrb.asm(1203) : error A2006: undefined symbol : aL1


but according http://www.xbitlabs.com/images/cpu/athlon64-3000/cpuz.png my cpu supports mmx :\

E^cube,

Sorry  I haven't responded sooner. I see that you already found the problem from a later post. I was using, I think, the ML from Visual Studio 2008.

Dave.
Title: Re: Compare two strings
Post by: jj2007 on June 22, 2010, 06:35:13 AM
Quote from: KeepingRealBusy on June 22, 2010, 03:14:25 AM
I was using, I think, the ML from Visual Studio 2008.

ML 6.14 will fail, but 6.15 and higher are ok. Same for JWasm.