Industrial strength text sorting application

Started by hutch--, February 02, 2008, 02:11:19 AM


hutch--

PowerBASIC has a very flexible sort that can be used for many tasks, and for non-critical work it is fine, but for big sorting tasks it is too far off the pace to compete with programming languages like Microsoft C and MASM. Fortunately PB has a reasonable built-in assembler, and with a little massaging the mixed C/ASM code can be ported to PB and run at very competitive speeds.

PB is cursed in not having a true low level procedure available: if you use either a SUB or a FUNCTION you get a pile of useless BASIC overhead that is not needed in low level code. The ported code therefore has to be put inside a BASIC FUNCTION, and the address of its starting label must be obtained using the CodePtr() operator. To call the sort, a macro was used to encapsulate the direct assembler push/call notation, which makes the sort more or less like a high level function to use, but in benchmark terms on highly random input data it is over 6 times faster than the PB array sort.

The application has been benchmarked against an old PB example, PBSORT, which has been slightly modified to add timing. Its sort was changed to ASCEND because COLLATE UCASE was about 400 ms slower. By my reckoning the highest time for the application is 485 ms versus over 3.1 seconds for PBSORT, which makes it over 6 times faster.
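For anyone who wants to reproduce the kind of "N lines sorted in X ms" figure quoted here without PB or MASM at hand, the following is a minimal sketch in Python. It is not the ported assembler sort and not PBSORT; the function name `time_sort_ms` and the generated test words are assumptions made purely for illustration. It times only the sort itself, the way the benchmark separates load, sort and output.

```python
import random
import time


def time_sort_ms(lines):
    """Return (sorted_lines, elapsed_ms) for one plain ascending sort.

    Hypothetical helper: only the sort is timed, not loading or output.
    """
    start = time.perf_counter()
    result = sorted(lines)  # code-point order, no case folding (like ASCEND, not COLLATE UCASE)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms


if __name__ == "__main__":
    # Illustrative input only: 100,000 random 8-letter lowercase words.
    rng = random.Random(1)
    letters = "abcdefghijklmnopqrstuvwxyz"
    words = ["".join(rng.choice(letters) for _ in range(8)) for _ in range(100_000)]
    ordered, ms = time_sort_ms(words)
    print(f"{len(ordered)} lines sorted in {ms:.0f} ms")
```

Timings from a sketch like this are not comparable with the figures in this thread; it only shows where the stopwatch starts and stops.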

To save space in this zip file the test sort data is created by the batch file runme.bat, which in turn calls an executable called RANWORDS.EXE.
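Since the attachment may not be available, here is a rough stand-in for generating similar test data. How RANWORDS.EXE actually builds its words is not described in this thread, so the word lengths, the lowercase alphabet and the name `make_random_words` are all assumptions.

```python
import random
import string


def make_random_words(count, min_len=3, max_len=12, seed=None):
    """Return `count` random lowercase words (hypothetical stand-in for RANWORDS.EXE)."""
    rng = random.Random(seed)  # a fixed seed gives repeatable test data
    words = []
    for _ in range(count):
        length = rng.randint(min_len, max_len)  # inclusive on both ends
        words.append("".join(rng.choice(string.ascii_lowercase) for _ in range(length)))
    return words


if __name__ == "__main__":
    # Print a handful of words; pass 1_000_000 to match the size used in the thread.
    for word in make_random_words(5, seed=2008):
        print(word)
```

Writing the list to a text file, one word per line, would give an input of the same general shape as the one-million-line file used for the tests.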

Here are the test results.


This is a slow operation, it creates 1 million random words.
1000000 lines sorted in 485 ms written to file rslt.txt
---------------------------------------------
Load string array = 984 ms
PB Array Sort     = 3141 ms
String Output     = 7828 ms
---------------------------------------------
1000000 lines sorted in 469 ms written to file rslt.txt
---------------------------------------------
Load string array = 984 ms
PB Array Sort     = 3141 ms
String Output     = 7828 ms
---------------------------------------------
1000000 lines sorted in 485 ms written to file rslt.txt
---------------------------------------------
Load string array = 984 ms
PB Array Sort     = 3125 ms
String Output     = 8172 ms
---------------------------------------------
1000000 lines sorted in 469 ms written to file rslt.txt
---------------------------------------------
Load string array = 969 ms
PB Array Sort     = 3140 ms
String Output     = 7875 ms
---------------------------------------------
Press any key to continue . . .

[attachment deleted by admin]
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

Mark Jones

Hmm interesting timing differences when run on my box:

Quote from: AMD Athlon x2 dual-core x64 4000+ (Win XP x32)
This is a slow operation, it creates 1 million random words.
1000000 lines sorted in 1047 ms written to file rslt.txt
---------------------------------------------
Load string array = 812 ms
PB Array Sort     = 2766 ms
String Output     = 11031 ms
---------------------------------------------
1000000 lines sorted in 1000 ms written to file rslt.txt
---------------------------------------------
Load string array = 812 ms
PB Array Sort     = 2688 ms
String Output     = 11078 ms
---------------------------------------------
1000000 lines sorted in 1063 ms written to file rslt.txt
---------------------------------------------
Load string array = 812 ms
PB Array Sort     = 2813 ms
String Output     = 11078 ms
---------------------------------------------
1000000 lines sorted in 1000 ms written to file rslt.txt
---------------------------------------------
Load string array = 828 ms
PB Array Sort     = 2828 ms
String Output     = 11109 ms
---------------------------------------------
Press any key to continue . . .
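
To put the two sets of results side by side, the sketch below works out the speed ratio for each run. The timings are copied from the two listings in this thread; pairing each "lines sorted in" figure with the "PB Array Sort" figure that follows it is an assumption about how the runs line up.

```python
# Timings in ms copied from the result listings posted in this thread.
RUNS = {
    "hutch--": {
        "asm_sort": [485, 469, 485, 469],
        "pb_sort": [3141, 3141, 3125, 3140],
    },
    "Mark Jones (Athlon x2 4000+)": {
        "asm_sort": [1047, 1000, 1063, 1000],
        "pb_sort": [2766, 2688, 2813, 2828],
    },
}


def speedups(asm_ms, pb_ms):
    """Ratio of PB array sort time to application sort time, run by run."""
    return [pb / asm for asm, pb in zip(asm_ms, pb_ms)]


if __name__ == "__main__":
    for machine, t in RUNS.items():
        ratios = speedups(t["asm_sort"], t["pb_sort"])
        print(f"{machine}: {min(ratios):.2f}x to {max(ratios):.2f}x")
```

On these numbers the ratio is roughly 6.4 to 6.7 on the first machine and roughly 2.6 to 2.8 on the Athlon.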
"To deny our impulses... foolish; to revel in them, chaos." MCJ 2003.08

jj2007

Just for fun, I sorted the million words file with the misused listbox described here. Wow... the LB_ADDSTRING loop took almost exactly half an hour.

Loading the array was done in 47 ms, storing took around 200 ms, but adding the one million strings to the listbox took roughly 1,800,000 ms...
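
That works out to about 1.8 ms per string on average. One plausible reason, assuming the listbox keeps its items in sorted order as each one is added, is that every add has to find its slot and make room for the new item. The sketch below is only an analogy in Python and says nothing about how the Windows listbox is implemented; both function names are hypothetical.

```python
import bisect


def sort_by_incremental_insert(items):
    """Keep a list sorted while adding one item at a time."""
    ordered = []
    for item in items:
        # Binary search for the slot, then shift later items to make room.
        bisect.insort(ordered, item)
    return ordered


def sort_once(items):
    """Load everything first, then sort in a single pass."""
    return sorted(items)


if __name__ == "__main__":
    sample = ["pear", "fig", "apple", "kiwi", "date"]
    print(sort_by_incremental_insert(sample))
    print(sort_once(sample))
```

Both routes give the same order; the difference is that the first pays an insertion cost on every add, which grows as the list fills up.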

oex

We are all of us insane, just to varying degrees and intelligently balanced through networking

http://www.hereford.tv