
Compare two strings

Started by yvansoftware, March 30, 2010, 07:40:20 PM


dedndave

you may have to enable the MMX instruction set, Cube
        include \masm32\include\masm32rt.inc
        .686
        .MMX
        .XMM

that should turn everything on   :P
masm32rt.inc sets the processor to .486, I think
if you want to change it, do so after the include
I think a minimum of .586 has to be on before .MMX or .XMM, too

Rockoon


AMD Phenom(tm) II X6 1055T Processor (SSE3)
String comparison: short string 10 bytes, long string 5050
2110    cycles for SSE with null check, long string
2400    cycles for SSE with null check, long string, case-insensitive
3852    cycles for Lingo, long string with null check
34198   cycles for Frank, long string
2028    cycles for KRB, long string
1742    cycles for KRBNF, long string no flags returned
1435    cycles for KRBNR, long string no return position
1274    cycles for KRBNFNR, long string no flags returned no return position
11137   cycles for repe cmpsb, unaligned long string
3017    cycles for repe cmpsd, unaligned long string
11136   cycles for repe cmpsb, aligned long string
2824    cycles for repe cmpsd, aligned long string
33713   cycles for Rockoon, long string
32009   cycles for RockoonJ, long string
33575   cycles for RockoonJ2, long string
33840   cycles for RockoonJ3, long string
5196    cycles for RockoonCNB, long string, check nullbyte
10196   cycles for crt_strcmp, long string
20172   cycles for crt__stricmp, long string, case-insensitive
89104   cycles for lstrcmp, long string

24      cycles for SSE with null check, 10 bytes
34      cycles for SSE with null check, 10 bytes, case-insensitive
17      cycles for Lingo, 10 bytes with null check
1       cycles for Frank, 10 bytes
37      cycles for KRB, 10 bytes
27      cycles for KRBNF, 10 bytes no flags returned
26      cycles for KRBNR, 10 bytes no return position
40      cycles for KRBNFNR, 10 bytes no flags returned no return position
34      cycles for repe cmpsb, unaligned 10 bytes
43      cycles for repe cmpsd, unaligned 10 bytes
34      cycles for repe cmpsb, aligned 10 bytes
43      cycles for repe cmpsd, aligned 10 bytes
8       cycles for Rockoon, aligned 10 bytes
8       cycles for RockoonJ, 10 bytes
10      cycles for RockoonJ2, 10 bytes
21      cycles for RockoonJ3, 10 bytes
14      cycles for RockoonCNB, 10 bytes, check nullbyte
786     cycles for lstrcmp, 10 bytes

2092    cycles for SSE with null check, long string
2340    cycles for SSE with null check, long string, case-insensitive
3894    cycles for Lingo, long string with null check
34232   cycles for Frank, long string
2018    cycles for KRB, long string
1864    cycles for KRBNF, long string no flags returned
1463    cycles for KRBNR, long string no return position
1290    cycles for KRBNFNR, long string no flags returned no return position
11240   cycles for repe cmpsb, unaligned long string
3068    cycles for repe cmpsd, unaligned long string
11272   cycles for repe cmpsb, aligned long string
2868    cycles for repe cmpsd, aligned long string
33779   cycles for Rockoon, long string
31979   cycles for RockoonJ, long string
33669   cycles for RockoonJ2, long string
33805   cycles for RockoonJ3, long string
5194    cycles for RockoonCNB, long string, check nullbyte
10047   cycles for crt_strcmp, long string
20273   cycles for crt__stricmp, long string, case-insensitive
79376   cycles for lstrcmp, long string

24      cycles for SSE with null check, 10 bytes
34      cycles for SSE with null check, 10 bytes, case-insensitive
17      cycles for Lingo, 10 bytes with null check
8       cycles for Frank, 10 bytes
37      cycles for KRB, 10 bytes
42      cycles for KRBNF, 10 bytes no flags returned
26      cycles for KRBNR, 10 bytes no return position
26      cycles for KRBNFNR, 10 bytes no flags returned no return position
34      cycles for repe cmpsb, unaligned 10 bytes
43      cycles for repe cmpsd, unaligned 10 bytes
34      cycles for repe cmpsb, aligned 10 bytes
43      cycles for repe cmpsd, aligned 10 bytes
8       cycles for Rockoon, aligned 10 bytes
8       cycles for RockoonJ, 10 bytes
10      cycles for RockoonJ2, 10 bytes
7       cycles for RockoonJ3, 10 bytes
14      cycles for RockoonCNB, 10 bytes, check nullbyte
793     cycles for lstrcmp, 10 bytes

Codesizes:
Lingo:  365
Frank:  45
KRB:    115
KRBNF:  99
KRBNR:  105
KRBNFNR:        91
Rockoon:        40
RockoonJ:       42
RockoonJ2:      33
RockoonJ3:      33
RockoonCNB:     84
SSE:    277     (MasmBasic)
--- ok ---


It looks like the KRB* solutions are the best so far on AMD. Remember, though, TANSTATFC (there ain't no such thing as the fastest code).
When C++ compilers can be coerced to emit rcl and rcr, I *might* consider using one.

ecube

Thanks, I had that, and .MMX isn't necessary; it turns out I was using too old a version of MASM. Also, while you guys did some really interesting stuff with this, I wish more case-insensitive versions had been made, as a regular strcmp is kind of useless IMO. Lingo, what is that db stuff you put in front of your functions?
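For reference, a case-insensitive compare only needs each byte folded to one case before the comparison. Below is a minimal C sketch, assuming plain ASCII text with no locale handling; `ascii_stricmp` is a hypothetical name, not one of the routines benchmarked above. It returns a negative, zero or positive value like strcmp.

```c
/* Fold an ASCII uppercase letter to lowercase; every other byte passes through.
   Assumption: plain ASCII, no locale or code-page handling. */
static unsigned char fold(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? (unsigned char)(c + ('a' - 'A')) : c;
}

/* Hypothetical helper: compares two NUL-terminated strings ignoring ASCII case.
   Returns <0, 0 or >0 in the same sense as strcmp. */
int ascii_stricmp(const char *a, const char *b)
{
    const unsigned char *p = (const unsigned char *)a;
    const unsigned char *q = (const unsigned char *)b;

    for (;;) {
        unsigned char x = fold(*p++);
        unsigned char y = fold(*q++);
        if (x != y)
            return (int)x - (int)y;   /* first differing byte decides the order */
        if (x == 0)
            return 0;                 /* both strings ended at the same point */
    }
}
```

With this, `ascii_stricmp("Hello", "hELLO")` returns 0. An assembly version would typically do the same fold with a range check, or an OR with 20h once the byte is known to be a letter.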

frktons

;------------------------------------------------------------------------------------------
; Comparing 2 strings - 5000 bytes long, using a parallel approach.
; One scan goes from the beginning of the string toward the end.
; A second scan goes from the end of the string backward to the start. 
;------------------------------------------------------------------------------------------
; Assumptions:
;      1. Strings are of the same length and NULL terminated
;      2. We don't know the length of the strings
;      3. We need to know if the strings are equal or different
;------------------------------------------------------------------------------------------

This modification of the routine is a first approach to the context problem.
Not very fast, but it is a beginning for the Beginners Section  :lol
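The two-ended idea in the header comment can be sketched in C roughly like this. It is an illustration under the same assumptions as the post (equal lengths, NUL-terminated), not frktons' assembly routine; `equal_two_ended` is a hypothetical name. Because the length is unknown, this sketch first has to find the end of one string, which already costs a full pass over it. The two "parallel" scans are simply interleaved in one loop.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: returns 1 if a and b are equal, 0 otherwise.
   Assumes (as in the post) both strings are NUL-terminated and of the same
   length; only a's length is measured, so a shorter b would be read past its end. */
int equal_two_ended(const char *a, const char *b)
{
    size_t n = strlen(a);   /* length is unknown up front: find the end first */
    size_t i = 0;           /* forward scan index */
    size_t j = n;           /* one past the backward scan index */

    while (i < j) {
        if (a[i] != b[i])
            return 0;       /* mismatch found by the forward scan */
        --j;
        if (a[j] != b[j])
            return 0;       /* mismatch found by the backward scan */
        ++i;
    }
    return 1;               /* the two scans met without finding a difference */
}
```

This approach pays off when differences tend to sit near either end of the strings. For equal strings, both scans still touch every byte, plus the extra pass spent locating the end.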


Intel(R) Core(TM)2 CPU          6600  @ 2.40GHz (SSE4)
7570    cycles for Parallel scan, 5000 bytes

--- ok ---


Please use this routine in the testbed instead of the previous one.
If I find better solutions, I'll build them on this version.

Frank

Mind is like a parachute. You know what to do in order to use it :-)

KeepingRealBusy

Quote from: E^cube on June 21, 2010, 10:29:36 PM
KeepingRealBusy i can't compile your example I get


strcompkrb.asm(757) : error A2085: instruction or register not accepted in current CPU mode
strcompkrb.asm(758) : error A2085: instruction or register not accepted in current CPU mode
strcompkrb.asm(759) : error A2085: instruction or register not accepted in current CPU mode
etc...
strcompkrb.asm(1143) : error A2006: undefined symbol : aE2
strcompkrb.asm(1144) : error A2006: undefined symbol : uE2
strcompkrb.asm(1148) : error A2006: undefined symbol : ciE2
strcompkrb.asm(1172) : error A2006: undefined symbol : aL1
strcompkrb.asm(1203) : error A2006: undefined symbol : aL1


but according to http://www.xbitlabs.com/images/cpu/athlon64-3000/cpuz.png my CPU supports MMX :\

E^cube,

Sorry I haven't responded sooner. I see from a later post that you already found the problem. I was using, I think, the ML from Visual Studio 2008.

Dave.

jj2007

Quote from: KeepingRealBusy on June 22, 2010, 03:14:25 AM
I was using, I think, the ML from Visual Studio 2008.

ML 6.14 will fail, but 6.15 and higher are OK. The same goes for JWasm.