News:

MASM32 SDK Description, downloads and other helpful links
MASM32.com New Forum Link
masmforum WebSite

some ideas to speed up bubblesort

Started by ecube, August 30, 2010, 02:17:47 PM

Previous topic - Next topic

ecube

I want to order strings that consist of a-z and 0-9 faster. The only idea i've had to speed up bubblesort, is once the list has been sorted once, keep an index where of where the last entry of each group is, for instance where the last A string ends, last B string, etc... so essentially can skip entire groups of entries and just bubble sort in cat into the C group.

anymore ideas are welcome.

so if the string is cat I can jump to entry 11 and bubble sort up to 8

ab
ac
ad
ae  <--last a entry is 4
bb
bc
bd
be <-last b entry is 8
ca
cb
cc <-last c entry is 11

Gunther

Quote from: E^cube Today at 03:17:47 PMThe only idea i've had to speed up bubblesort,

If it must be bubble sort, you could try bidirectional bubble sort: http://www.itl.nist.gov/div897/sqg/dads/HTML/bidirectionalBubbleSort.html.

Gunther
Forgive your enemies, but never forget their names.

ecube

I'll check it out, but it doesn't "need" to be a bubble sort just somethin to quickly alphabetically sort entries

hutch--

Cube,

have a look at an insertion sort for near ordered data. I found that it is faster than either a bubble sort or a double ended bubblesort (cocktail shaker sort).

How big is the word list ?
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

Gunther

Quote from: E^cube Today at 04:49:27 PMbut it doesn't "need" to be a bubble sort just somethin to quickly alphabetically sort entries

In that case, there are a lot of other sorting algorithms, which outperform bubble sort: shell sort, merge sort, heap sort, and quick sort, to name a few. These are very well described in: Donald Knuth: The Art of Computer Programming, Volume 3: Sorting and Searching. Addison Wesley, Reading (MA) 1997, ISBN 0-201-89685-0. That three volumes are the "algorithm bible". Knuth uses a kind of a pseudo assembly language notation to illustrate the different techniques. It's worth reading. Another starting point could be: Robert Sedgewick: Algorithms in C++, Parts 1-4: Fundamentals, Data Structure, Sorting, Searching: Fundamentals, Data Structures, Sorting, Searching Pts. 1-4, Second Edition. Addison-Wesley Longman, 1998. ISBN 0201350882. Sedgewick is one of Knuth's scholars.

I've forgotten insertion sort, but hutch-- did remember. Thank you.

Gunther
Forgive your enemies, but never forget their names.

ecube

Quote from: hutch-- on August 30, 2010, 05:00:13 PM
Cube,

have a look at an insertion sort for near ordered data. I found that it is faster than either a bubble sort or a double ended bubblesort (cocktail shaker sort).

How big is the word list ?

could be thousands but the thing the data is gonna constantly come in, so isn't just sort 1 time and finished.

FORTRANS

Hi,

   If they come in one at a time, an insertion sort is the
way to go.  You just use the old sorted data from the
previous run and insert the new data.  A Shell sort is a
variation on an insertion sort with better performance
when you need to sort a moderate amount of unordered
data..

   A heapsort is fairly easy to program.  My first heapsort
had a couple of problems and it still sorted correctly.  I
posted some graphics programs that showed some sorts
in action.

Regards,

Steve N.

lingo

Fastest but not easy way...link :U

"In the attached example I read all names of files and folders from local disk c:
and use red-black trees to sort them when the user presses the column's header.
Please, pay attention of the number of the files and folders and the sorting speed."

ecube

according to http://en.wikipedia.org/wiki/Insertion_sort it's horrible on large arrays, and if I understand it correctly rightly so. Lingo that looks interesting but seems like not much is known on it and as you said isn't easy to implement. I think i'm gonna play around with quicksort to see how slow/fast it is.

redskull

If you aren't sure how many there are, and they come in one at a time, I would vote for B-tree.

-r
Strange women, lying in ponds, distributing swords, is no basis for a system of government

ecube

Quote from: redskull on August 30, 2010, 07:48:17 PM
If you aren't sure how many there are, and they come in one at a time, I would vote for B-tree.

-r

yeah the point of keeping them sorted is to be able to quickly search with a binary search tree. Thanks to larry I got great code to do that, not sure how to apply it to sorting really, but anyway I just tested a millio random string entries using donkeys array code+quicksort and it only took about a second or two so is faster than I thought. maybe i'll make a new array for new entries that come in 1 at a time, and just sort each time. But I really would like to not have to sort the entire array everytime to find where the entry lies, that's extremely inefficent.

also the sorting of a million random strings(in terms of upper/lower case and len being 10-70) is it made my cpu climb to 60% which isn't good.

FORTRANS

Quote from: E^cube on August 30, 2010, 07:45:07 PM
according to http://en.wikipedia.org/wiki/Insertion_sort it's horrible on large arrays, and if I understand it correctly rightly so.

Hi,

   Note that it is bad only with a large amount of unsorted data.
If the data is (mostly) sorted, it can be very good.  Hand sorted
data to a naive/simple quicksort and the insertion sort will beat it
hands down.  (Not that there are that many naive quicksorts
still running around outside of a classroom.)  So if your data set
needs to have small additions to be added to an already sorted
set, an insertion sort may be all you really need.

   As a general purpose sort it is horrible, except for small data
sets.

Regards,

Steve N.

mineiro

Hello Sr, I'm novice and don't have much to say, I remember see one sort test in my pc, and after I find it in net.
http://www.codingcrew.de/marty/win32asm.php
regards.

Rockoon

A hybrid insertion-merge sort is fairly efficient for sorting things in one-at-a-time.

The hybrid maintains two lists, the primary list and the temporary list. Pick a size (S) that represents your largest allowable temporary list. Just for argument, lets let S=64

As items come in, the algorithm sorts them into this temporary list via standard insertion. When the size of the temporary list reaches S (64), this list is merged into the main list.

Lookups are accomplished by binary-searching both lists, so is O(log N).
If a single list is needed for whatever reason at arbitrary times, the O(N) merge can be done, and is then no worse than a non-hybridized insertion sort.

Its very cache friendly, and if the items are mostly sorted to begin with, is also branch-prediction friendly.

This of course assumes that you really do need one-at-a-time sorting (there is a large class of problems that do require that: sort, many searches, add an item, repeat)
When C++ compilers can be coerced to emit rcl and rcr, I *might* consider using one.

hutch--

Cube,

The data layout is a major consideration with repeat sorts. With highly random data a quick sort eats the rest alive and the fastest I have seen is a hybrid radix/quick sort but when you mention a near ordered list the rules change. A buble sort performs better as the list is near sorted but from Practice with the same near sorted list an insertion sort is simply faster.

Its common to use either a bubble sort or an insertion sort behind a quick sort to limit recursion depth and in this context an insertion sort is also faster.
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php