The MASM Forum Archive 2004 to 2012
Welcome, Guest. Please login or register.
September 30, 2022, 12:26:39 PM

Login with username, password and session length
Search:     Advanced search
128553 Posts in 15254 Topics by 684 Members
Latest Member: mottt
* Home Help Search Login Register
+  The MASM Forum Archive 2004 to 2012
|-+  General Forums
| |-+  The Laboratory (Moderator: Mark_Larson)
| | |-+  Sorting a REAL8 array
« previous next »
Pages: [1] Print
Author Topic: Sorting a REAL8 array  (Read 8935 times)
jj2007
Member
*****
Gender: Male
Posts: 6011



Sorting a REAL8 array
« on: April 27, 2012, 12:33:12 AM »

Hi,

I am working on a new ArraySort() macro and stumbled over a problem. This is the standard CRT qsort invoke:
Code:
invoke crt_qsort, addr TiR8(0), MaxIndex+1, REAL8, qsc_proReal

... where TiR8(0) is a MasmBasic array, and qsc_proReal is the callback where the comparison happens:
Code:
qsc_proReal proc ; C arg1:DWORD,arg2:DWORD
mov eax, [esp+4] ; arg1
mov edx, [esp+8] ; arg2
if 1
ffree st(7) ; A
push eax
fld REAL8 PTR [eax]
fsub REAL8 PTR [edx]
fmul FP8(1000.0) ; works but not a "durable" solution
fistp stack
pop eax
elseif 1
ffree st(7) ; B - fails miserably
ffree st(7)
fldz
fld REAL8 PTR [eax]
fsub REAL8 PTR [edx]
xor eax, eax
fcomip ST, ST(1) ; should work but doesn't ;-)
sbb eax, -1
fstp st
else
movlps xmm0, REAL8 PTR [eax] ; C - fails miserably
xor eax, eax
comisd xmm0,  REAL8 PTR [edx] ; sets carry
sbb eax, -1
endif
retn
qsc_proReal endp

Both the ArraySort and the first CRT versions work fine, but the two lower options fail to perform, or at least, partly fail...
Code:
130 ms for ArraySort+, Dword
141 ms for nrQsortA, Dword

288 ms for ArraySort+, Real8
0       -2222.221
1       -2222.217
2       -2222.209
999998  2222.212
999999  2222.217
1000000 2222.221

614 ms for CRT qsort,  Real8
0       -2222.220
1       -2222.209
2       -2222.209
999998  2222.212
999999  2222.212
1000000 2222.218

52 ms for CRT qsort,  Real8
0       -1760.732
1       -1847.541
2       -1968.239
999998  2222.212
999999  2222.212
1000000 2222.218

27 ms for CRT qsortB, Real8
0       -2076.107
1       76.35786
2       377.8849
999998  2222.213
999999  2222.215
1000000 2222.222

The DWORD sorts are faster, of course (the new MB sort is even a tick faster than the Masm32 QuickSort), but the REAL8 sort with the C callback works only for the "lower half", apparently. It probably has to do with the way comisd sets the carry flag, but I am too tired to see the error...

The low timings seem to indicate that qsort ends prematurely, but I haven't seen any problems with the stack etc.

Any ideas appreciated BigGrin

Full source is attached, together wirh a night build of the library. Timings are for one Million elements, with
Code:
RealRangeLo REAL8 -2222.2222
RealRangeHi REAL8 2222.2222
DwRange= 200000

* ArraySortTimingsNoLib.zip (16.66 KB - downloaded 443 times.)
« Last Edit: May 01, 2012, 09:35:02 PM by jj2007 » Logged

hutch--
Administrator
Member
*****
Posts: 12013


Mnemonic Driven API Grinder


Re: Sorting a REAL8 array
« Reply #1 on: April 27, 2012, 02:47:35 AM »

JJ,

I would be inclined to set up an FCOM FCOMI test piece and make sure you are getting the comparisons to work.
Logged

Regards,



Download site for MASM32
http://www.masm32.com
jj2007
Member
*****
Gender: Male
Posts: 6011



Re: Sorting a REAL8 array
« Reply #2 on: April 27, 2012, 05:46:19 AM »

I got it!
Code:
AMD Athlon(tm) Dual Core Processor 4450B (SSE3)
Sorting 5000000 items:
625 ms for ArraySort+, Dword
803 ms for nrQsortA, Dword

630 ms for ArraySort+, Dword
802 ms for nrQsortA, Dword


1195 ms for ArraySort+, Real8
0       -2222.221
1       -2222.221
4999999 2222.219
5000000 2222.221

2175 ms for CRT qsort Comisd, Real8
0       -2222.219
1       -2222.218
4999999 2222.221
5000000 2222.222

2259 ms for CRT qsort Fcomip, Real8
0       -2222.221
1       -2222.220
4999999 2222.220
5000000 2222.221

Slow but it works now. Search attached *.asm for qsc_proRealFcomip proc to see the solution BigGrin

P.S.: To avoid trouble with conflicting versions, I removed the library night build. See http://www.masm32.com/board/index.php?topic=12460 for the current version.

* ArraySortTimings27AprilNoLib.zip (17.19 KB - downloaded 428 times.)
« Last Edit: May 01, 2012, 09:39:29 PM by jj2007 » Logged

hutch--
Administrator
Member
*****
Posts: 12013


Mnemonic Driven API Grinder


Re: Sorting a REAL8 array
« Reply #3 on: April 29, 2012, 01:36:21 PM »

Code:

Intel(R) Core(TM)2 Quad CPU    Q9650  @ 3.00GHz (SSE4)
Sorting 5000000 items:
473 ms for ArraySort+, Dword
484 ms for nrQsortA, Dword

469 ms for ArraySort+, Dword
486 ms for nrQsortA, Dword


789 ms for ArraySort+, Real8
0       -2222.221
1       -2222.221
4999999 2222.219
5000000 2222.221

1446 ms for CRT qsort Comisd, Real8
0       -2222.219
1       -2222.218
4999999 2222.221
5000000 2222.222

1464 ms for CRT qsort Fcomip, Real8
0       -2222.221
1       -2222.220
4999999 2222.220
5000000 2222.221

ok
Logged

Regards,



Download site for MASM32
http://www.masm32.com
dedndave
Member
*****
Posts: 12523


Re: Sorting a REAL8 array
« Reply #4 on: April 29, 2012, 06:35:21 PM »

prescott w/htt:
Code:
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
Sorting 5000000 items:
908 ms for ArraySort+, Dword
1022 ms for nrQsortA, Dword

914 ms for ArraySort+, Dword
1026 ms for nrQsortA, Dword


1541 ms for ArraySort+, Real8
0       -2222.221
1       -2222.221
4999999 2222.219
5000000 2222.221

3366 ms for CRT qsort Comisd, Real8
0       -2222.219
1       -2222.218
4999999 2222.221
5000000 2222.222

3485 ms for CRT qsort Fcomip, Real8
0       -2222.221
1       -2222.220
4999999 2222.220
5000000 2222.221
Logged
jj2007
Member
*****
Gender: Male
Posts: 6011



Re: Sorting a REAL8 array
« Reply #5 on: April 29, 2012, 08:36:37 PM »

Thanks a lot, Hutch and Dave ThumbsUp
It seems my algo is Prescott-friendly BigGrin

Re design, I tried also to use sbb & adc but it's actually 5% slower than the branches. Below are the FPU and SSE2 algos. The MasmBasic algo is integer-based with some SSE2 but too long to be posted here Roll Eyes

Code:
qsc_proRealComisd proc ; C arg1:DWORD,arg2:DWORD
mov eax, [esp+4] ; arg1
mov edx, [esp+8] ; arg2
movlps xmm0, REAL8 PTR [eax]
xor eax, eax
comisd xmm0,  REAL8 PTR [edx] ; sets carry
je @F
.if Carry?
dec eax
.else
inc eax
.endif
@@:
retn
qsc_proRealComisd endp

qsc_proRealFcomip proc ; C arg1:DWORD,arg2:DWORD
mov eax, [esp+4] ; arg1
mov edx, [esp+8] ; arg2
ffree st(7) ; should be done once, elsewhere
ffree st(7) ; although it seems not to slow down the algo
fldz
fld REAL8 PTR [eax]
fsub REAL8 PTR [edx]
xor eax, eax
fcomip ST, ST(1) ; works!
if 0 ; first branch is 5% slower
je bye
sbb eax, 0 ; we tested equality already, so if it is not signed after the sbb, ...
jns @F
bye: fstp st
retn
@@: inc eax ; ... then it must be positive, right?
fstp st
retn
else
je @F
.if Carry?
dec eax
.else
inc eax
.endif
@@:
endif
fstp st
retn
qsc_proRealFcomip endp

Before Lingo shows up, here a version that is 9% faster, 1002 ms instead of 1093:
Code:
qsc_proRealComisd proc ; C arg1:DWORD,arg2:DWORD
pop ecx ; ret address
pop eax ; arg1
pop edx ; arg2
movlps xmm0, REAL8 PTR [eax]
xor eax, eax
comisd xmm0,  REAL8 PTR [edx] ; sets carry and zero flags
je @F
.if Carry?
dec eax
.else
inc eax
.endif
@@:
sub esp, 8 ; pretty stupid but C is waiting behind the door ;-)
jmp ecx
qsc_proRealComisd endp
Logged

Pages: [1] Print 
« previous next »
Jump to:  

Powered by MySQL Powered by PHP The MASM Forum Archive 2004 to 2012 | Powered by SMF 1.0.12.
© 2001-2005, Lewis Media. All Rights Reserved.
Valid XHTML 1.0! Valid CSS!