News:

MASM32 SDK Description, downloads and other helpful links
MASM32.com New Forum Link
masmforum WebSite

Multiprecision square root

Started by bruce1948, August 17, 2009, 12:21:03 AM

Previous topic - Next topic

Rockoon

Quote from: Eddy on August 18, 2009, 09:08:18 AM
Bruce,

(Referring to your first post)
Newtons method is an iterative method that always converges for floating point numbers, but for integer numbers, caution is necessary.
You must start with an 'initial value' that is guaranteed smaller than the root.
The following new calculated value will be LARGER than the root and will decrease with each iteration.
Iteration must stop when the new value starts to INCREASE (becomes larger than the previous value).
For maximum speed, the initial value is very important. The closer to the root, the faster.
Worst case, a non-optimal intial value could cause the algorithm to 'oscillate' and never converge to a solution, atleast for integers.

Kind regards

Yes and that leads to a good first approximation by simply taking half as many bits as the input value. If the input value is a 100 bit number, than the sqrt is a 50 bit number.
When C++ compilers can be coerced to emit rcl and rcr, I *might* consider using one.

Eddy

Quote from: Rockoon on August 18, 2009, 10:06:25 AM
... a good first approximation by simply taking half as many bits as the input value. If the input value is a 100 bit number, than the sqrt is a 50 bit number.
Correct!
And in your first approximation (50 bits in your example), if you simply set the most significant bit to one (or more accurate: the 50th bit !) and zeroes for all other bits, this will make sure that the first approximation is always smaller or equal than the exact root.
Eddy
www.devotechs.com -- HIME : Huge Integer Math and Encryption library--

dedndave

#17
you can take the upper bits of a bignum and use the fpu to get very close
if you have a 512-bit value, use the upper 64 bits to create an extended real (removing the low-order 448 bits)
use 3FFFh for the sign and exponent
take the squareroot and pad 224 low order bits (0's) to the result (replacing the 448 bits you removed)

of course, if bit 511 is a zero, you must left-justify the value into the 64 bit mantissa
if you remove an odd number of bits from the bottom, bump the exponent/sign to 4000h
then pad the result with half of one less than as many zeros
you get the idea
the basis is that the squareroot of 4 is 2, of 16 is 4, of 64 is 8, and so on

in writing such a library, i would think you could use the fpu to find many such "partials"
certainly, it could easily be used for partial multiplication
it should be possible to use it for partial logs, as well
not sure about exponentiation - i'd have to give that one some thought - lol
for division, i think you are stuck with long division if you have a large divisor - i may be wrong on that one
if the divisor is 64 bits or less, you can cascade divisions
Ray may know some tricks to using the fpu for extending precision

FORTRANS

Hi Raymond,

   Thanks for the explanation.  I'll look at your code again and
puzzle it out.  In the end we must be using variants of the
same algorithm.  I used BCD math in the fixed point conversion
program I posted in a different thread.  But that was not a
way I wanted to go with these longer fractions.  I was using
look-up tables, and that gets ridiculous.

Regards,

Steve N.

bruce1948

Quote
Correct!
And in your first approximation (50 bits in your example), if you simply set the most significant bit to one (or more accurate: the 50th bit !) and zeroes for all other bits, this will make sure that the first approximation is always smaller or equal than the exact root.

Thanks Eddy and Rockoon, I've come across a lot on how important the first estimation is, but nothing puts it that clearly :U


Bruce

dsouza123

I freshly recoded an unsigned integer square root routine from
http://www.masm32.com/board/index.php?topic=3334.msg26094
now with more documentation but the number conversion routines removed.
It has a fixed number of iterations, it is half the count of bits of the number.
It currently is in the form of a program with fixed 32 dword allocations for the number, square root, remainder.
At the end it displays a MessageBox with the number and square root in hex.

; Loop for each bit pair of the number from highest to lowest
;    root * 2
;    remainder * 4 + current pair
;    if root < remainder
;      remainder - ( root + 1)
;      root + 2
;    endif
; endLoop
; root \ 2
; if remainder == 0
;   root is from a perfect square  ie root*root == number
; else
;   the number doesn't have an integer square root

It worked with the test value but extensive testing with other values is yet to be done.

dsouza123

Bug squashed.
After testing with a few more numbers, the bug came to light.
It affected the high dword of the result, because it also,
along with all the other dwords in the result, need to be shr 1 bit.


past2rtdiv:
  ; mov [brt + edi*4 - 4], 0

past2rtdiv:
  shr [brt + edi*4 - 4], 1

dsouza123

Added hex string to binary routine (stores binary value in an array of dwords),
along with a 1024 bit unsigned integer number as a hex string to test.

Added routine to convert the 3 binary values to hex strings
that is displayed in the MessageBox.

The output agrees with the values from the old code, referenced earlier.

dsouza123

Found out the code doesn't handle a number only 1 dword in length.

Changing the code that works well for 2+ dwords, for the special case of 1 dword
wasn't a good tradeoff.

So put in code to check if the number is 1 dword,
if so it runs a small 32 bit square root routine instead.

drizz

This algo from wikipedia is FAST
http://en.wikipedia.org/wiki/Methods_of_computing_square_roots#Binary_.28base_2.29

use bsr, and, bts to construct the first approximation.

rearange the loop from:
while (one != 0) {
            if (op >= res + one) {
                op -= res + one;
                res = (res >> 1) + one;
            }
            else {
                res >>= 1;
            }
            one >>= 2;
}


to:while (one != 0) {
            tmp = res + one;           
            res >>= 1;
            if (op >= tmp) {
                op -= tmp
                res += one;
            }
            one >>= 2;
}

The truth cannot be learned ... it can only be recognized.