
Suggestions and improvements for SSE2 code are welcome

Started by Gunther, August 26, 2010, 05:20:06 PM


dioxin

AMD Phenom(tm) II X4 945 Processor (SSE3)
2214    cycles for DotXMM1Acc4E
2153    cycles for DotXMM1Acc4EJ1
2164    cycles for DotXMM1Acc4EJ2
913     cycles for AxDotXMM1
1211    cycles for DotXMM2Acc16ELingo
1194    cycles for DotXMM2Acc32ELingo
783     cycles for DotXMM2Acc16EPaul

2195    cycles for DotXMM1Acc4E
2108    cycles for DotXMM1Acc4EJ1
2177    cycles for DotXMM1Acc4EJ2
914     cycles for AxDotXMM1
1209    cycles for DotXMM2Acc16ELingo
1196    cycles for DotXMM2Acc32ELingo
815     cycles for DotXMM2Acc16EPaul

2200    cycles for DotXMM1Acc4E
2159    cycles for DotXMM1Acc4EJ1
2154    cycles for DotXMM1Acc4EJ2
922     cycles for AxDotXMM1
1197    cycles for DotXMM2Acc16ELingo
1189    cycles for DotXMM2Acc32ELingo
805     cycles for DotXMM2Acc16EPaul


The result: 2867507200
The result: 2867507200
The result: 2867507200
The result: 2867507200
The result: 2867507200
The result: 2867507200
The result: 2867507200
--- done ---

redskull

Intel(R) Core(TM)2 Duo CPU     E4500  @ 2.20GHz (SSE4)
3080    cycles for DotXMM1Acc4E
2867    cycles for DotXMM1Acc4EJ1
2874    cycles for DotXMM1Acc4EJ2
1930    cycles for AxDotXMM1
1925    cycles for DotXMM2Acc16ELingo
1914    cycles for DotXMM2Acc32ELingo
1363    cycles for DotXMM2Acc16EPaul

1575    cycles for DotXMM1Acc4E
1557    cycles for DotXMM1Acc4EJ1
1569    cycles for DotXMM1Acc4EJ2
1055    cycles for AxDotXMM1
1057    cycles for DotXMM2Acc16ELingo
1049    cycles for DotXMM2Acc32ELingo
1063    cycles for DotXMM2Acc16EPaul

1583    cycles for DotXMM1Acc4E
1560    cycles for DotXMM1Acc4EJ1
1556    cycles for DotXMM1Acc4EJ2
1062    cycles for AxDotXMM1
1054    cycles for DotXMM2Acc16ELingo
1038    cycles for DotXMM2Acc32ELingo
1055    cycles for DotXMM2Acc16EPaul


The result: 2867507200
The result: 2867507200
The result: 2867507200
The result: 2867507200
The result: 2867507200
The result: 2867507200
The result: 2867507200
--- done ---


-r
Strange women, lying in ponds, distributing swords, is no basis for a system of government

frktons


Intel(R) Core(TM)2 CPU          6600  @ 2.40GHz (SSE4)
1603    cycles for DotXMM1Acc4E
1628    cycles for DotXMM1Acc4EJ1
1588    cycles for DotXMM1Acc4EJ2
1077    cycles for AxDotXMM1
1083    cycles for DotXMM2Acc16ELingo
1059    cycles for DotXMM2Acc32ELingo
1076    cycles for DotXMM2Acc16EPaul

1599    cycles for DotXMM1Acc4E
1593    cycles for DotXMM1Acc4EJ1
1592    cycles for DotXMM1Acc4EJ2
1072    cycles for AxDotXMM1
1071    cycles for DotXMM2Acc16ELingo
1063    cycles for DotXMM2Acc32ELingo
1083    cycles for DotXMM2Acc16EPaul

1598    cycles for DotXMM1Acc4E
1589    cycles for DotXMM1Acc4EJ1
1558    cycles for DotXMM1Acc4EJ2
1077    cycles for AxDotXMM1
1071    cycles for DotXMM2Acc16ELingo
1060    cycles for DotXMM2Acc32ELingo
1054    cycles for DotXMM2Acc16EPaul


The result: 1328212656
The result: 1328212656
The result: 1328212656
The result: 1328212656
The result: 1328212656
The result: 1328212656
The result: 1328212656
--- done ---
Mind is like a parachute. You know what to do in order to use it :-)

redskull

Quote from: frktons on September 29, 2010, 12:55:17 AM

Intel(R) Core(TM)2 CPU          6600  @ 2.40GHz (SSE4)
1603    cycles for DotXMM1Acc4E


This one is twice as fast on a CPU that is almost exactly the same as mine; the only difference being twice as much cache and fsb speed (4 vs 2, 1066 vs 800).  Can cache really have that much of an effect on such small, isolated code?  Yikes.

-r
Strange women, lying in ponds, distributing swords, is no basis for a system of government

Antariy

Quote from: redskull on September 29, 2010, 01:59:28 AM
This one is twice as fast on a CPU that is almost exactly the same as mine; the only difference being twice as much cache and fsb speed (4 vs 2, 1066 vs 800).  Can cache really have that much of an effect on such small, isolated code?  Yikes.

Yes. Because the code tests a very small piece of data, the cache parameters have a drastic effect: if the data is very small compared with the cache size, it stays in the cache. If the cache is bigger, a much bigger piece of data/code can be put into it.
If the test is run as Hutch suggests, then cache size would matter less, but system bus speed would matter more.
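
To make the cache argument concrete, here is a minimal C sketch. It is not the testbed code from this thread; the helper names (working_set_bytes, fits_in_cache, dot, seconds_per_element) are hypothetical, and the arrays are assumed to hold floats. It computes how many bytes one dot product touches, checks that against a cache size supplied by the caller, and times a plain scalar dot product so the per-element cost can be compared between a small and a large array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Bytes touched by one dot product over two float arrays of n elements. */
size_t working_set_bytes(size_t n)
{
    return 2 * n * sizeof(float);
}

/* Nonzero if both arrays fit in a cache of cache_bytes.
   The cache size is supplied by the caller; nothing is detected here. */
int fits_in_cache(size_t n, size_t cache_bytes)
{
    return working_set_bytes(n) <= cache_bytes;
}

/* Plain scalar dot product, a stand-in for the SSE2 routines being timed. */
double dot(const float *a, const float *b, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += (double)a[i] * (double)b[i];
    return sum;
}

/* Average CPU seconds per element over reps passes, or -1.0 if allocation fails.
   clock() is coarse, so use enough reps for the total to be measurable. */
double seconds_per_element(size_t n, int reps)
{
    assert(n > 0 && reps > 0);

    float *a = malloc(n * sizeof *a);
    float *b = malloc(n * sizeof *b);
    if (a == NULL || b == NULL) {
        free(a);
        free(b);
        return -1.0;
    }
    for (size_t i = 0; i < n; i++) {
        a[i] = 1.0f;
        b[i] = 2.0f;
    }

    volatile double sink = 0.0;   /* keeps the compiler from dropping the loop */
    clock_t t0 = clock();
    for (int r = 0; r < reps; r++)
        sink += dot(a, b, n);
    clock_t t1 = clock();
    (void)sink;

    free(a);
    free(b);
    return (double)(t1 - t0) / CLOCKS_PER_SEC / ((double)n * (double)reps);
}
```

Calling seconds_per_element(1024, 100000) and then seconds_per_element(16 * 1024 * 1024, 10) should show the effect on most machines: the first working set (8 KB with 4-byte floats) stays cached, while the second (128 MB) has to come over the memory bus on every pass. If the "4 vs 2" above means megabytes of cache (an assumption), fits_in_cache(n, 2u * 1024 * 1024) tells whether a given n still fits on the smaller CPU.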

What does the word "Yikes" mean? I don't know it. Is it slang? Can you tell me what it means? My English is not very good :)



Alex

oex

"Yikes"
"Informal an expression of surprise, fear, or alarm"
http://www.thefreedictionary.com/yikes

Used in popular culture such as Scooby Doo :bg
We are all of us insane, just to varying degrees and intelligently balanced through networking

http://www.hereford.tv

Antariy

Quote from: oex on September 29, 2010, 11:05:29 PM
"Yikes"
"Informal an expression of surprise, fear, or alarm"
http://www.thefreedictionary.com/yikes

Used in popular culture such as Scooby Doo :bg

Thanks for the link and the explanation!  :bg



Alex

Gunther

Alex,

Here are the timings from my machine.


AMD Athlon(tm) 64 X2 Dual-Core Processor TK-57 (SSE3)
2297 cycles for DotXMM1Acc4E
2277 cycles for DotXMM1Acc4EJ1
2240 cycles for DotXMM1Acc4EJ2
1502 cycles for AxDotXMM1
1425 cycles for DotXMM2Acc16ELingo
1362 cycles for DotXMM2Acc32ELingo
1633 cycles for DotXMM2Acc16EPaul

2289 cycles for DotXMM1Acc4E
2277 cycles for DotXMM1Acc4EJ1
2240 cycles for DotXMM1Acc4EJ2
1499 cycles for AxDotXMM1
1437 cycles for DotXMM2Acc16ELingo
1360 cycles for DotXMM2Acc32ELingo
1637 cycles for DotXMM2Acc16EPaul

2288 cycles for DotXMM1Acc4E
2277 cycles for DotXMM1Acc4EJ1
2242 cycles for DotXMM1Acc4EJ2
1507 cycles for AxDotXMM1
1423 cycles for DotXMM2Acc16ELingo
1359 cycles for DotXMM2Acc32ELingo
1640 cycles for DotXMM2Acc16EPaul


The result: 2867507200
The result: 2867507200
The result: 2867507200
The result: 2867507200
The result: 2867507200
The result: 2867507200
The result: 2867507200
--- done ---


Gunther
Forgive your enemies, but never forget their names.

jj2007

Good ol' P4:
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
2930    cycles for DotXMM1Acc4E
2741    cycles for DotXMM1Acc4EJ1
2902    cycles for DotXMM1Acc4EJ2
1788    cycles for AxDotXMM1
2180    cycles for DotXMM2Acc16ELingo
2035    cycles for DotXMM2Acc32ELingo
1806    cycles for DotXMM2Acc16EPaul

2861    cycles for DotXMM1Acc4E
2715    cycles for DotXMM1Acc4EJ1
2756    cycles for DotXMM1Acc4EJ2
3149    cycles for AxDotXMM1
2150    cycles for DotXMM2Acc16ELingo
2024    cycles for DotXMM2Acc32ELingo
1826    cycles for DotXMM2Acc16EPaul

3286    cycles for DotXMM1Acc4E
2934    cycles for DotXMM1Acc4EJ1
2962    cycles for DotXMM1Acc4EJ2
2020    cycles for AxDotXMM1
2156    cycles for DotXMM2Acc16ELingo
1968    cycles for DotXMM2Acc32ELingo
1762    cycles for DotXMM2Acc16EPaul