News:

MASM32 SDK Description, downloads and other helpful links
MASM32.com New Forum Link
masmforum WebSite

Sorting a REAL8 array

Started by jj2007, April 27, 2012, 12:33:12 AM

Previous topic - Next topic

jj2007

Hi,

I am working on a new ArraySort() macro and stumbled over a problem. This is the standard CRT qsort invoke:
invoke crt_qsort, addr TiR8(0), MaxIndex+1, REAL8, qsc_proReal

... where TiR8(0) is a MasmBasic array, and qsc_proReal is the callback where the comparison happens:
qsc_proReal proc ; C arg1:DWORD,arg2:DWORD
mov eax, [esp+4] ; arg1
mov edx, [esp+8] ; arg2
if 1
ffree st(7) ; A
push eax
fld REAL8 PTR [eax]
fsub REAL8 PTR [edx]
fmul FP8(1000.0) ; works but not a "durable" solution
fistp stack
pop eax
elseif 1
ffree st(7) ; B - fails miserably
ffree st(7)
fldz
fld REAL8 PTR [eax]
fsub REAL8 PTR [edx]
xor eax, eax
fcomip ST, ST(1) ; should work but doesn't ;-)
sbb eax, -1
fstp st
else
movlps xmm0, REAL8 PTR [eax] ; C - fails miserably
xor eax, eax
comisd xmm0,  REAL8 PTR [edx] ; sets carry
sbb eax, -1
endif
retn
qsc_proReal endp


Both the ArraySort and the first CRT versions work fine, but the two lower options fail to perform, or at least, partly fail...
130 ms for ArraySort+, Dword
141 ms for nrQsortA, Dword

288 ms for ArraySort+, Real8
0       -2222.221
1       -2222.217
2       -2222.209
999998  2222.212
999999  2222.217
1000000 2222.221

614 ms for CRT qsort,  Real8
0       -2222.220
1       -2222.209
2       -2222.209
999998  2222.212
999999  2222.212
1000000 2222.218

52 ms for CRT qsort,  Real8
0       -1760.732
1       -1847.541
2       -1968.239
999998  2222.212
999999  2222.212
1000000 2222.218

27 ms for CRT qsortB, Real8
0       -2076.107
1       76.35786
2       377.8849
999998  2222.213
999999  2222.215
1000000 2222.222


The DWORD sorts are faster, of course (the new MB sort is even a tick faster than the Masm32 QuickSort), but the REAL8 sort with the C callback works only for the "lower half", apparently. It probably has to do with the way comisd sets the carry flag, but I am too tired to see the error...

The low timings seem to indicate that qsort ends prematurely, but I haven't seen any problems with the stack etc.

Any ideas appreciated :bg

Full source is attached, together wirh a night build of the library. Timings are for one Million elements, with
RealRangeLo REAL8 -2222.2222
RealRangeHi REAL8 2222.2222
DwRange= 200000

hutch--

JJ,

I would be inclined to set up an FCOM FCOMI test piece and make sure you are getting the comparisons to work.
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

jj2007

#2
I got it!
AMD Athlon(tm) Dual Core Processor 4450B (SSE3)
Sorting 5000000 items:
625 ms for ArraySort+, Dword
803 ms for nrQsortA, Dword

630 ms for ArraySort+, Dword
802 ms for nrQsortA, Dword


1195 ms for ArraySort+, Real8
0       -2222.221
1       -2222.221
4999999 2222.219
5000000 2222.221

2175 ms for CRT qsort Comisd, Real8
0       -2222.219
1       -2222.218
4999999 2222.221
5000000 2222.222

2259 ms for CRT qsort Fcomip, Real8
0       -2222.221
1       -2222.220
4999999 2222.220
5000000 2222.221


Slow but it works now. Search attached *.asm for qsc_proRealFcomip proc to see the solution :bg

P.S.: To avoid trouble with conflicting versions, I removed the library night build. See http://www.masm32.com/board/index.php?topic=12460 for the current version.

hutch--



Intel(R) Core(TM)2 Quad CPU    Q9650  @ 3.00GHz (SSE4)
Sorting 5000000 items:
473 ms for ArraySort+, Dword
484 ms for nrQsortA, Dword

469 ms for ArraySort+, Dword
486 ms for nrQsortA, Dword


789 ms for ArraySort+, Real8
0       -2222.221
1       -2222.221
4999999 2222.219
5000000 2222.221

1446 ms for CRT qsort Comisd, Real8
0       -2222.219
1       -2222.218
4999999 2222.221
5000000 2222.222

1464 ms for CRT qsort Fcomip, Real8
0       -2222.221
1       -2222.220
4999999 2222.220
5000000 2222.221

ok
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

dedndave

prescott w/htt:
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)
Sorting 5000000 items:
908 ms for ArraySort+, Dword
1022 ms for nrQsortA, Dword

914 ms for ArraySort+, Dword
1026 ms for nrQsortA, Dword


1541 ms for ArraySort+, Real8
0       -2222.221
1       -2222.221
4999999 2222.219
5000000 2222.221

3366 ms for CRT qsort Comisd, Real8
0       -2222.219
1       -2222.218
4999999 2222.221
5000000 2222.222

3485 ms for CRT qsort Fcomip, Real8
0       -2222.221
1       -2222.220
4999999 2222.220
5000000 2222.221

jj2007

Thanks a lot, Hutch and Dave :thumbu
It seems my algo is Prescott-friendly :bg

Re design, I tried also to use sbb & adc but it's actually 5% slower than the branches. Below are the FPU and SSE2 algos. The MasmBasic algo is integer-based with some SSE2 but too long to be posted here ::)

qsc_proRealComisd proc ; C arg1:DWORD,arg2:DWORD
mov eax, [esp+4] ; arg1
mov edx, [esp+8] ; arg2
movlps xmm0, REAL8 PTR [eax]
xor eax, eax
comisd xmm0,  REAL8 PTR [edx] ; sets carry
je @F
.if Carry?
dec eax
.else
inc eax
.endif
@@:
retn
qsc_proRealComisd endp

qsc_proRealFcomip proc ; C arg1:DWORD,arg2:DWORD
mov eax, [esp+4] ; arg1
mov edx, [esp+8] ; arg2
ffree st(7) ; should be done once, elsewhere
ffree st(7) ; although it seems not to slow down the algo
fldz
fld REAL8 PTR [eax]
fsub REAL8 PTR [edx]
xor eax, eax
fcomip ST, ST(1) ; works!
if 0 ; first branch is 5% slower
je bye
sbb eax, 0 ; we tested equality already, so if it is not signed after the sbb, ...
jns @F
bye: fstp st
retn
@@: inc eax ; ... then it must be positive, right?
fstp st
retn
else
je @F
.if Carry?
dec eax
.else
inc eax
.endif
@@:
endif
fstp st
retn
qsc_proRealFcomip endp


Before Lingo shows up, here a version that is 9% faster, 1002 ms instead of 1093:
qsc_proRealComisd proc ; C arg1:DWORD,arg2:DWORD
pop ecx ; ret address
pop eax ; arg1
pop edx ; arg2
movlps xmm0, REAL8 PTR [eax]
xor eax, eax
comisd xmm0,  REAL8 PTR [edx] ; sets carry and zero flags
je @F
.if Carry?
dec eax
.else
inc eax
.endif
@@:
sub esp, 8 ; pretty stupid but C is waiting behind the door ;-)
jmp ecx
qsc_proRealComisd endp