The MASM Forum Archive 2004 to 2012
Topic: StrLen timings needed (Read 55490 times)
jj2007
Member
Gender: Male
Posts: 6011



StrLen timings needed
« on: August 15, 2010, 09:32:10 PM »

Hi folks,
Could I please have some timings on non-Celerons?
Thanks, jj

Code:
Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
58       bytes for MbStrLen1
84       bytes for MbStrLen2
73       bytes for MbStrLen3
80       bytes for MbStrLen4a
71       bytes for MbStrLen4b
78       bytes for MbStrLen5

29      cycles for MbStrLen1
34      cycles for MbStrLen2
34      cycles for MbStrLen3
31      cycles for MbStrLen4a
35      cycles for MbStrLen4b
38      cycles for MbStrLen5

* StrLenSaveXmm.zip (5.26 KB - downloaded 334 times.)

ecube
Guest


Re: StrLen timings needed
« Reply #1 on: August 15, 2010, 09:37:20 PM »

Code:
AMD Athlon(tm) 64 Processor 3000+ (SSE3)
58       bytes for MbStrLen1
84       bytes for MbStrLen2
73       bytes for MbStrLen3
80       bytes for MbStrLen4a
71       bytes for MbStrLen4b
78       bytes for MbStrLen5

47      cycles for MbStrLen1
48      cycles for MbStrLen2
51      cycles for MbStrLen3
47      cycles for MbStrLen4a
51      cycles for MbStrLen4b
55      cycles for MbStrLen5

47      cycles for MbStrLen1
54      cycles for MbStrLen2
54      cycles for MbStrLen3
52      cycles for MbStrLen4a
58      cycles for MbStrLen4b
53      cycles for MbStrLen5

47      cycles for MbStrLen1
48      cycles for MbStrLen2
52      cycles for MbStrLen3
47      cycles for MbStrLen4a
50      cycles for MbStrLen4b
54      cycles for MbStrLen5

48      cycles for MbStrLen1
54      cycles for MbStrLen2
54      cycles for MbStrLen3
52      cycles for MbStrLen4a
57      cycles for MbStrLen4b
53      cycles for MbStrLen5

48      cycles for MbStrLen1
48      cycles for MbStrLen2
50      cycles for MbStrLen3
47      cycles for MbStrLen4a
52      cycles for MbStrLen4b
55      cycles for MbStrLen5


--- ok ---
MichaelW
Global Moderator
Member
Gender: Male
Posts: 5161


Re: StrLen timings needed
« Reply #2 on: August 15, 2010, 09:50:42 PM »

P3:
Code:
pre-P4 (SSE1)
58       bytes for MbStrLen1
84       bytes for MbStrLen2
73       bytes for MbStrLen3
80       bytes for MbStrLen4a
71       bytes for MbStrLen4b
78       bytes for MbStrLen5

45      cycles for MbStrLen1
51      cycles for MbStrLen2
46      cycles for MbStrLen3
51      cycles for MbStrLen4a
47      cycles for MbStrLen4b
59      cycles for MbStrLen5

45      cycles for MbStrLen1
62      cycles for MbStrLen2
46      cycles for MbStrLen3
50      cycles for MbStrLen4a
47      cycles for MbStrLen4b
56      cycles for MbStrLen5

46      cycles for MbStrLen1
51      cycles for MbStrLen2
46      cycles for MbStrLen3
50      cycles for MbStrLen4a
47      cycles for MbStrLen4b
56      cycles for MbStrLen5

45      cycles for MbStrLen1
51      cycles for MbStrLen2
46      cycles for MbStrLen3
51      cycles for MbStrLen4a
47      cycles for MbStrLen4b
55      cycles for MbStrLen5

45      cycles for MbStrLen1
51      cycles for MbStrLen2
46      cycles for MbStrLen3
50      cycles for MbStrLen4a
47      cycles for MbStrLen4b
55      cycles for MbStrLen5

eschew obfuscation
jj2007
Member
Gender: Male
Posts: 6011



Re: StrLen timings needed
« Reply #3 on: August 15, 2010, 09:51:29 PM »

Thanks. For the curious: I am testing the Intel recommendation for movxxx xmm, mem:
Quote
Intel, generic optimization of memcpy(): movdqu is suitable for fetching byte-aligned groups of 16 bytes from memory, but not useful for storing them. The Barcelona architecture prefers movaps for stores.  movaps, movdqa, and movapd are functionally equivalent, with movaps having shorter encoding

Code:
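if 1  ; 4a: save xmm0 as two 8-byte halves
movlps qword ptr [esp], xmm0      ; low qword  -> [esp]
movhps qword ptr [esp+8], xmm0    ; high qword -> [esp+8]
else  ; 4b: save xmm0 with one unaligned 16-byte store
movdqu [esp], xmm0
endif
...
if 1  ; 4a: restore xmm0 from the two halves
movlps xmm0, qword ptr [esp]      ; [esp]   -> low qword
movhps xmm0, qword ptr [esp+8]    ; [esp+8] -> high qword
else  ; 4b: restore with one unaligned 16-byte load
movups xmm0, [esp]
endif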

At least on the Celeron and E^cube's AMD, this does not seem to hold: the partial movlps/movhps moves are faster.

(Obviously the code does other things too; the purpose is to preserve the xmm0 register efficiently in a bread-and-butter StrLen algorithm.)
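The two variants above save and restore xmm0 either as two 8-byte halves (4a: movlps/movhps) or as one unaligned 16-byte move (4b: movdqu/movups). Below is a rough C counterpart using SSE intrinsics. It is only a sketch and runs on x86/x86-64 only. The helper names are hypothetical, and it spills to a caller-supplied buffer rather than to [esp]. The compiler, not the programmer, picks the final encoding, so check the disassembly if the exact instruction matters. Both patterns should leave the same 16 bytes in memory and restore a bit-identical register, and `spill_patterns_agree` checks exactly that.

```c
#include <assert.h>
#include <string.h>
#include <xmmintrin.h>  /* SSE1 intrinsics: movlps/movhps/movups forms */

/* Hypothetical helpers mirroring the MASM variants above. */

/* "4a"-style spill: two 8-byte partial stores (movlps + movhps). */
static void spill_halves(float buf[4], __m128 v)
{
    _mm_storel_pi((__m64 *)buf, v);        /* low  8 bytes -> buf[0..1] */
    _mm_storeh_pi((__m64 *)(buf + 2), v);  /* high 8 bytes -> buf[2..3] */
}

/* "4a"-style reload: two 8-byte partial loads into one register. */
static __m128 reload_halves(const float buf[4])
{
    __m128 v = _mm_setzero_ps();
    v = _mm_loadl_pi(v, (const __m64 *)buf);        /* replaces low half  */
    v = _mm_loadh_pi(v, (const __m64 *)(buf + 2));  /* replaces high half */
    return v;
}

/* "4b"-style spill/reload: one unaligned 16-byte store/load (movups). */
static void spill_whole(float buf[4], __m128 v) { _mm_storeu_ps(buf, v); }
static __m128 reload_whole(const float buf[4]) { return _mm_loadu_ps(buf); }

/* Returns 1 if both patterns write identical bytes and restore a
   bit-identical value; memcmp avoids float comparison (NaN, -0.0). */
static int spill_patterns_agree(__m128 v)
{
    float a[4], b[4], out_a[4], out_b[4];
    spill_halves(a, v);
    spill_whole(b, v);
    if (memcmp(a, b, sizeof a) != 0)
        return 0;
    _mm_storeu_ps(out_a, reload_halves(a));
    _mm_storeu_ps(out_b, reload_whole(b));
    return memcmp(out_a, out_b, sizeof out_a) == 0 &&
           memcmp(out_a, a, sizeof a) == 0;
}

/* Small self-check on a known value. */
static void spill_self_check(void)
{
    __m128 v = _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f);  /* lanes 0..3 = 1,2,3,4 */
    assert(spill_patterns_agree(v));
}
```

Correctness checks like these say nothing about speed. The cycle differences discussed in this thread only show up with a timing harness like the one in the attachment.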

ecube
Guest


Re: StrLen timings needed
« Reply #4 on: August 15, 2010, 09:54:53 PM »

Off topic, but MichaelW, my CPU is 10+ years old now, I believe, so yours must be ancient. I'm just curious: is that your main one? Also, jj2007, I'm not sure what your plans are, but feel free to take notes on the optimization techniques you discover. While a lot of stuff is floating around this board, I know people enjoy a single place to read up on such things.
MichaelW
Global Moderator
Member
Gender: Male
Posts: 5161


Re: StrLen timings needed
« Reply #5 on: August 15, 2010, 10:04:47 PM »

I built my P3 system in 98 or 99, and it's currently my primary system at home. It's still very reliable, but sooner or later...

eschew obfuscation
ecube
Guest


Re: StrLen timings needed
« Reply #6 on: August 15, 2010, 10:07:30 PM »

Quote from: MichaelW
I built my P3 system in 98 or 99, and it's currently my primary system at home. It's still very reliable, but sooner or later...


Heh, wow, what OS? I can't imagine that thing being able to handle Vista; it's a resource pig. I'd be surprised if you said Windows 2K. I myself wanted to stick with it, but I was forced to upgrade because so much software is XP+ only.
KeepingRealBusy
Member
Gender: Male
Posts: 395


Re: StrLen timings needed
« Reply #7 on: August 15, 2010, 10:13:05 PM »

JJ,

Here is my P4:

Code:
Intel(R) Pentium(R) 4 CPU 3.20GHz (SSE2)
58       bytes for MbStrLen1
84       bytes for MbStrLen2
73       bytes for MbStrLen3
80       bytes for MbStrLen4a
71       bytes for MbStrLen4b
78       bytes for MbStrLen5

38      cycles for MbStrLen1
41      cycles for MbStrLen2
46      cycles for MbStrLen3
37      cycles for MbStrLen4a
45      cycles for MbStrLen4b
41      cycles for MbStrLen5

35      cycles for MbStrLen1
40      cycles for MbStrLen2
41      cycles for MbStrLen3
37      cycles for MbStrLen4a
51      cycles for MbStrLen4b
40      cycles for MbStrLen5

34      cycles for MbStrLen1
45      cycles for MbStrLen2
49      cycles for MbStrLen3
36      cycles for MbStrLen4a
51      cycles for MbStrLen4b
40      cycles for MbStrLen5

33      cycles for MbStrLen1
39      cycles for MbStrLen2
40      cycles for MbStrLen3
39      cycles for MbStrLen4a
43      cycles for MbStrLen4b
47      cycles for MbStrLen5

34      cycles for MbStrLen1
40      cycles for MbStrLen2
65      cycles for MbStrLen3
37      cycles for MbStrLen4a
40      cycles for MbStrLen4b
40      cycles for MbStrLen5


--- ok ---
KeepingRealBusy
Member
Gender: Male
Posts: 395


Re: StrLen timings needed
« Reply #8 on: August 15, 2010, 10:20:00 PM »

JJ,

Here are my AMD timings:

Code:
AMD Athlon(tm) 64 X2 Dual Core Processor 4200+ (SSE3)
58       bytes for MbStrLen1
84       bytes for MbStrLen2
73       bytes for MbStrLen3
80       bytes for MbStrLen4a
71       bytes for MbStrLen4b
78       bytes for MbStrLen5

68      cycles for MbStrLen1
47      cycles for MbStrLen2
66      cycles for MbStrLen3
47      cycles for MbStrLen4a
58      cycles for MbStrLen4b
56      cycles for MbStrLen5

47      cycles for MbStrLen1
37      cycles for MbStrLen2
56      cycles for MbStrLen3
57      cycles for MbStrLen4a
43      cycles for MbStrLen4b
84      cycles for MbStrLen5

47      cycles for MbStrLen1
53      cycles for MbStrLen2
73      cycles for MbStrLen3
47      cycles for MbStrLen4a
52      cycles for MbStrLen4b
70      cycles for MbStrLen5

36      cycles for MbStrLen1
57      cycles for MbStrLen2
54      cycles for MbStrLen3
51      cycles for MbStrLen4a
58      cycles for MbStrLen4b
58      cycles for MbStrLen5

63      cycles for MbStrLen1
47      cycles for MbStrLen2
51      cycles for MbStrLen3
51      cycles for MbStrLen4a
56      cycles for MbStrLen4b
55      cycles for MbStrLen5


--- ok ---
Rockoon
Member
Gender: Male
Posts: 612


Re: StrLen timings needed
« Reply #9 on: August 15, 2010, 10:28:06 PM »

AMD Phenom(tm) II X6 1055T Processo
58       bytes for MbStrLen1
84       bytes for MbStrLen2
73       bytes for MbStrLen3
80       bytes for MbStrLen4a
71       bytes for MbStrLen4b
78       bytes for MbStrLen5

31      cycles for MbStrLen1
34      cycles for MbStrLen2
33      cycles for MbStrLen3
35      cycles for MbStrLen4a
35      cycles for MbStrLen4b
40      cycles for MbStrLen5

31      cycles for MbStrLen1
34      cycles for MbStrLen2
36      cycles for MbStrLen3
35      cycles for MbStrLen4a
35      cycles for MbStrLen4b
40      cycles for MbStrLen5

31      cycles for MbStrLen1
34      cycles for MbStrLen2
33      cycles for MbStrLen3
35      cycles for MbStrLen4a
35      cycles for MbStrLen4b
39      cycles for MbStrLen5

31      cycles for MbStrLen1
37      cycles for MbStrLen2
33      cycles for MbStrLen3
35      cycles for MbStrLen4a
35      cycles for MbStrLen4b
39      cycles for MbStrLen5

31      cycles for MbStrLen1
34      cycles for MbStrLen2
33      cycles for MbStrLen3
35      cycles for MbStrLen4a
35      cycles for MbStrLen4b
40      cycles for MbStrLen5

When C++ compilers can be coerced to emit rcl and rcr, I *might* consider using one.
hutch--
Administrator
Member
Posts: 12013


Mnemonic Driven API Grinder


Re: StrLen timings needed
« Reply #10 on: August 16, 2010, 12:15:13 AM »

Code:
Intel(R) Core(TM)2 Quad CPU    Q9650  @ 3.00GHz (SSE4)
58       bytes for MbStrLen1
84       bytes for MbStrLen2
73       bytes for MbStrLen3
80       bytes for MbStrLen4a
71       bytes for MbStrLen4b
78       bytes for MbStrLen5

16      cycles for MbStrLen1
21      cycles for MbStrLen2
23      cycles for MbStrLen3
23      cycles for MbStrLen4a
23      cycles for MbStrLen4b
23      cycles for MbStrLen5

17      cycles for MbStrLen1
26      cycles for MbStrLen2
29      cycles for MbStrLen3
23      cycles for MbStrLen4a
32      cycles for MbStrLen4b
24      cycles for MbStrLen5

16      cycles for MbStrLen1
23      cycles for MbStrLen2
23      cycles for MbStrLen3
23      cycles for MbStrLen4a
23      cycles for MbStrLen4b
23      cycles for MbStrLen5

17      cycles for MbStrLen1
26      cycles for MbStrLen2
29      cycles for MbStrLen3
23      cycles for MbStrLen4a
32      cycles for MbStrLen4b
24      cycles for MbStrLen5

16      cycles for MbStrLen1
21      cycles for MbStrLen2
23      cycles for MbStrLen3
23      cycles for MbStrLen4a
23      cycles for MbStrLen4b
23      cycles for MbStrLen5


--- ok ---

Regards,



Download site for MASM32
http://www.masm32.com
mineiro
Member
Posts: 253



Re: StrLen timings needed
« Reply #11 on: August 16, 2010, 01:30:07 AM »

Intel(R) Pentium(R) Dual  CPU  E2160  @ 1.80GHz (SSE4)
58       bytes for MbStrLen1
84       bytes for MbStrLen2
73       bytes for MbStrLen3
80       bytes for MbStrLen4a
71       bytes for MbStrLen4b
78       bytes for MbStrLen5

16      cycles for MbStrLen1
20      cycles for MbStrLen2
23      cycles for MbStrLen3
23      cycles for MbStrLen4a
23      cycles for MbStrLen4b
23      cycles for MbStrLen5

16      cycles for MbStrLen1
25      cycles for MbStrLen2
26      cycles for MbStrLen3
23      cycles for MbStrLen4a
26      cycles for MbStrLen4b
23      cycles for MbStrLen5

16      cycles for MbStrLen1
20      cycles for MbStrLen2
23      cycles for MbStrLen3
23      cycles for MbStrLen4a
23      cycles for MbStrLen4b
23      cycles for MbStrLen5

16      cycles for MbStrLen1
25      cycles for MbStrLen2
26      cycles for MbStrLen3
23      cycles for MbStrLen4a
26      cycles for MbStrLen4b
23      cycles for MbStrLen5

16      cycles for MbStrLen1
20      cycles for MbStrLen2
23      cycles for MbStrLen3
23      cycles for MbStrLen4a
23      cycles for MbStrLen4b
23      cycles for MbStrLen5


--- ok ---
dancho
Member
Posts: 86


Re: StrLen timings needed
« Reply #12 on: August 16, 2010, 07:51:21 AM »

Code:
Intel(R) Core(TM)2 CPU          6600  @ 2.40GHz (SSE4)
58       bytes for MbStrLen1
84       bytes for MbStrLen2
73       bytes for MbStrLen3
80       bytes for MbStrLen4a
71       bytes for MbStrLen4b
78       bytes for MbStrLen5

16      cycles for MbStrLen1
20      cycles for MbStrLen2
23      cycles for MbStrLen3
23      cycles for MbStrLen4a
23      cycles for MbStrLen4b
23      cycles for MbStrLen5

16      cycles for MbStrLen1
25      cycles for MbStrLen2
26      cycles for MbStrLen3
23      cycles for MbStrLen4a
26      cycles for MbStrLen4b
23      cycles for MbStrLen5

16      cycles for MbStrLen1
20      cycles for MbStrLen2
23      cycles for MbStrLen3
23      cycles for MbStrLen4a
23      cycles for MbStrLen4b
23      cycles for MbStrLen5

16      cycles for MbStrLen1
26      cycles for MbStrLen2
26      cycles for MbStrLen3
23      cycles for MbStrLen4a
26      cycles for MbStrLen4b
23      cycles for MbStrLen5

16      cycles for MbStrLen1
20      cycles for MbStrLen2
23      cycles for MbStrLen3
23      cycles for MbStrLen4a
23      cycles for MbStrLen4b
23      cycles for MbStrLen5


--- ok ---
Vortex
Raider of the lost code
Member
Gender: Male
Posts: 3460



Re: StrLen timings needed
« Reply #13 on: August 16, 2010, 07:58:43 AM »

Code:
Intel(R) Pentium(R) 4 CPU 3.20GHz (SSE3)
58       bytes for MbStrLen1
84       bytes for MbStrLen2
73       bytes for MbStrLen3
80       bytes for MbStrLen4a
71       bytes for MbStrLen4b
78       bytes for MbStrLen5

65      cycles for MbStrLen1
66      cycles for MbStrLen2
83      cycles for MbStrLen3
67      cycles for MbStrLen4a
66      cycles for MbStrLen4b
77      cycles for MbStrLen5

64      cycles for MbStrLen1
68      cycles for MbStrLen2
71      cycles for MbStrLen3
66      cycles for MbStrLen4a
66      cycles for MbStrLen4b
73      cycles for MbStrLen5

66      cycles for MbStrLen1
66      cycles for MbStrLen2
72      cycles for MbStrLen3
67      cycles for MbStrLen4a
66      cycles for MbStrLen4b
74      cycles for MbStrLen5

72      cycles for MbStrLen1
66      cycles for MbStrLen2
81      cycles for MbStrLen3
66      cycles for MbStrLen4a
74      cycles for MbStrLen4b
86      cycles for MbStrLen5

64      cycles for MbStrLen1
66      cycles for MbStrLen2
74      cycles for MbStrLen3
66      cycles for MbStrLen4a
66      cycles for MbStrLen4b
79      cycles for MbStrLen5

jj2007
Member
*****
Gender: Male
Posts: 6011



Re: StrLen timings needed
« Reply #14 on: August 16, 2010, 09:59:20 AM »

Thanks to all of you; that should be enough info.
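The observation that the movlps/movhps pair (4a) beats movdqu/movups (4b) can be cross-checked against every CPU posted in this thread. One way is to copy the MbStrLen4a and MbStrLen4b cycle counts into a table and compare their per-CPU medians. This is a sketch added for illustration, not part of the original posts. The choice of the median and the helper names `median` and `report` are assumptions. The Celeron post has only one run, and the counts are taken at face value without any noise analysis.

```c
#include <assert.h>
#include <stdio.h>

/* MbStrLen4a / MbStrLen4b cycle counts copied from the posts above.
   Five runs per CPU, except the Celeron post, which has one run. */
typedef struct {
    const char *cpu;
    int n;      /* number of runs posted */
    int a[5];   /* MbStrLen4a: movlps/movhps */
    int b[5];   /* MbStrLen4b: movdqu/movups */
} Timing;

static const Timing timings[] = {
    {"Celeron M 420",               1, {31},                 {35}},
    {"Athlon 64 3000+",             5, {47, 52, 47, 52, 47}, {51, 58, 50, 57, 52}},
    {"Pentium III",                 5, {51, 50, 50, 51, 50}, {47, 47, 47, 47, 47}},
    {"Pentium 4 (KeepingRealBusy)", 5, {37, 37, 36, 39, 37}, {45, 51, 51, 43, 40}},
    {"Athlon 64 X2 4200+",          5, {47, 57, 47, 51, 51}, {58, 43, 52, 58, 56}},
    {"Phenom II X6 1055T",          5, {35, 35, 35, 35, 35}, {35, 35, 35, 35, 35}},
    {"Core 2 Quad Q9650",           5, {23, 23, 23, 23, 23}, {23, 32, 23, 32, 23}},
    {"Pentium Dual E2160",          5, {23, 23, 23, 23, 23}, {23, 26, 23, 26, 23}},
    {"Core 2 6600",                 5, {23, 23, 23, 23, 23}, {23, 26, 23, 26, 23}},
    {"Pentium 4 (Vortex)",          5, {67, 66, 67, 66, 66}, {66, 66, 66, 74, 66}},
};

/* Median of n values (n is odd here): insertion-sorts a local copy. */
static int median(const int *v, int n)
{
    int tmp[5];
    assert(n >= 1 && n <= 5);
    for (int i = 0; i < n; i++) {
        int x = v[i], j = i;
        while (j > 0 && tmp[j - 1] > x) {
            tmp[j] = tmp[j - 1];
            j--;
        }
        tmp[j] = x;
    }
    return tmp[n / 2];
}

/* Prints one line per CPU; returns how many CPUs have a lower 4a median. */
static int report(void)
{
    int wins_a = 0;
    for (size_t i = 0; i < sizeof timings / sizeof timings[0]; i++) {
        const Timing *t = &timings[i];
        int ma = median(t->a, t->n), mb = median(t->b, t->n);
        printf("%-28s 4a=%2d 4b=%2d  %s\n", t->cpu, ma, mb,
               ma < mb ? "4a faster" : ma > mb ? "4b faster" : "tie");
        wins_a += ma < mb;
    }
    return wins_a;
}
```

On these numbers, `report()` finds 4a ahead on four CPUs: the Celeron, both Athlon 64s, and KeepingRealBusy's Pentium 4. It finds 4b ahead on the Pentium III (median 50 vs 47 cycles) and ties on the other five. The advantage of the split moves therefore looks CPU-dependent rather than universal.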

