The MASM Forum Archive 2004 to 2012

General Forums => The Laboratory => Topic started by: johnsa on April 26, 2010, 11:01:18 AM

Title: SQRT idea
Post by: johnsa on April 26, 2010, 11:01:18 AM
So I was extremely bored this morning, and although this is a rather pointless excercise it might be kind of interesting.
So I was thinking, how would I calculate SQRTs without using lookups, built-in SQRT or Newton Raphson. I came up with this little idea which is working perfectly for integer only operation. For floats it would be best to replace the linear interpolation with exponential for precision. It's based on the simple observation that the perfect squares go up according to 4+(2n+1) : n=0 or n>=2 IE: 4,9,16,25 etc.. and as such map back down to sqr roots that increase sequentially as 2,3,4,5,6.....


        public static int fastSQRT(int x)
        {
            uint i = 65536, n = 4, d = 5;
            uint first, second;
            while (n < x)
            {
                n += d;
                d += 2;
                i += 65536;
            }
            second = n;
            first = (n - (d - 2));
            //i = (uint)(65536 * (((x - 5) >> 1) + 2));
            //first = (uint)(4 + ((x - 5) >> 1) );
            //second = first + (uint)(4 + (((x - 5) >> 1) + 1));
            return ( (int)( i + (((x-first)<<16)/(second-first)) >> 16) );
        }


It's not exactly fast... although if we could take out the while loop to identify the perfect squares that are to the left/right of the number in question it would be roughly the same as the built in sqrt in C#.  In any event it was a random brain-exploration :)
Title: Re: SQRT idea
Post by: dedndave on April 26, 2010, 12:38:52 PM
QuoteIt's based on the simple observation that the perfect squares go up according to 4+(2n+1) : n=0 or n>=2 IE: 4,9,16,25 etc.. and as such map back down to sqr roots that increase sequentially as 2,3,4,5,6.....

that is an interesting observation, Johnsa

the squarerooter loops i have seen tend to work on bit pairs:
bit 0 of the squareroot is somewhat derived from input bits 0/1
bit 1 of the squareroot is somewhat derived from input bits 2/3
bit 2 of the squareroot is somewhat derived from input bits 4/5, and so on
Title: Re: SQRT idea
Post by: MichaelW on April 26, 2010, 02:15:21 PM
Interesting, I wonder if the algorithm could be implemented as a macro that calculates a SQRT at assembly time.


Title: Re: SQRT idea
Post by: qWord on April 26, 2010, 03:13:44 PM
Quote from: MichaelW on April 26, 2010, 02:15:21 PM
I wonder if the algorithm could be implemented as a macro that calculates a SQRT at assembly time.
fastSQRT macro x:=<1>
fsqrt_i = 65536
fsqrt_n = 4
fsqrt_d = 5
WHILE fsqrt_n LT x
fsqrt_n = fsqrt_n + fsqrt_d
fsqrt_d = fsqrt_d + 2
fsqrt_i = fsqrt_i + 65536
ENDM
fsqrt_first = fsqrt_n-(fsqrt_d-2)
EXITM %((fsqrt_i + (((x-fsqrt_first) SHL 16)/(fsqrt_n-fsqrt_first))) SHR 16)
endm
Title: Re: SQRT idea
Post by: dedndave on April 26, 2010, 03:39:41 PM
already putting the "%" thing to use, eh qWord   :U
Title: Re: SQRT idea
Post by: johnsa on April 26, 2010, 07:15:17 PM
The interpolation will obviously lead to less precision the larger the number, but sofar I've tested 100,000,000 numbers in the range of 0 - 500,000,000 and it's been spot on using integers only.
The next trick would be to use non-linear interpolation between the sqrt results.

IE: sqrt(6.5) falls between 4 and 9 which yield perfect roots of 2 and 3... so the sqrt would be calculated with linear interpolation to be 2.5 (which is close.. its actually 2.51....).
This should be fairly easy to add, just need an exponential function in there. The second is how to get rid of the while loop... I was trying my luck with the commented out lines to see if i could replace it by using some sort of
inequality like 4+(2n+1) >= x... but I didn't get it right yet.
Title: Re: SQRT idea
Post by: jj2007 on April 26, 2010, 07:40:51 PM
Quote from: qWord on April 26, 2010, 03:13:44 PM
fastSQRT macro x:=<1>

include \masm32\MasmBasic\MasmBasic.inc

fastSQRT macro x:=<1>
  fsqrt_i = 65536
  fsqrt_n = 4
  fsqrt_d = 5
  WHILE fsqrt_n LT x
fsqrt_n = fsqrt_n + fsqrt_d
fsqrt_d = fsqrt_d + 2
fsqrt_i = fsqrt_i + 65536
  ENDM
  fsqrt_first = fsqrt_n-(fsqrt_d-2)
  EXITM %((fsqrt_i + (((x-fsqrt_first) SHL 16)/(fsqrt_n-fsqrt_first))) SHR 16)
endm

Sqrt macro x:REQ
  ffree st(7)
  push x
  fild dword ptr [esp]
  fsqrt
  fstp MbDebugR8
  add esp, 4
  EXITM <MbDebugR8>
endm
Init
Print Str$("The fast square root of 30 is %9f\n", fastSQRT(30))
Inkey Str$("The slow square root of 30 is %9f", Sqrt(30))
Exit
end start


The fast square root of 30 is 5.00000000
The slow square root of 30 is 5.47722558

:green
Title: Re: SQRT idea
Post by: dedndave on April 26, 2010, 07:47:44 PM
let's not start naming our procs starting with "Slow"   :lol
(or "Slower" or "Slowest" or "SnailSpeed")
Title: Re: SQRT idea
Post by: qWord on April 26, 2010, 08:03:10 PM
 :bg
sqrti macro rm32:req
cvtsi2ss xmm0,rm32
sqrtss xmm0,xmm0
movd rm32,xmm0
endm

.data?
r4 REAL4 ?
.code
sqrti(ASM(mov r4,9))
print real4$(r4)
Title: Re: SQRT idea
Post by: johnsa on April 26, 2010, 08:10:30 PM
I think we should drop the fast prefix off my idea (as it's not very fast at all.. .lol especially with the while in there). :)
Maybe call it... oddWayToGetaSqrt
Title: Re: SQRT idea
Post by: jj2007 on April 26, 2010, 08:20:31 PM
Quote from: qWord on April 26, 2010, 08:03:10 PM
:bg
sqrti macro rm32:req
cvtsi2ss xmm0,rm32
sqrtss xmm0,xmm0
movd rm32,xmm0
endm

.data?
r4 REAL4 ?
.code
sqrti(ASM(mov r4,9))
print real4$(r4)


sqrtss is probably a tiny bit faster than fsqrt. Although I like the idea to have full Real10 precision...

What does ASM() do, by the way?
:bg
Title: Re: SQRT idea
Post by: qWord on April 26, 2010, 08:43:24 PM
Quote from: jj2007 on April 26, 2010, 08:20:31 PMWhat does ASM() do, by the way?
from macros.inc:
    ; --------------------------------------------------
    ; Macro for placing an assembler instruction either
    ; within another or within a procedure call
    ; --------------------------------------------------

work only with two operands instructions. The first operand is returned.
ASM(mov eax,9) ==> returns eax
Title: Re: SQRT idea
Post by: clive on April 26, 2010, 08:51:38 PM
Quote from: jj2007
sqrtss is probably a tiny bit faster than fsqrt. Although I like the idea to have full Real10 precision...

RSQRTSS Is super fast (1 cycle on P3), low precision, table driven out of a ROM. Useful if you are dividing something by the square root.
Title: Re: SQRT idea
Post by: clive on April 26, 2010, 08:58:00 PM
Probably in here, but I ran across it in one of his magazine articles.
http://www.amazon.com/dp/1929629095

// Jack Crenshaw's Integer Square Root

unsigned long integer_sqrt(unsigned long a)
{
  unsigned long rem = 0;
  unsigned long root = 0;
  unsigned long divisor = 0;
  int i;

  for(i=0; i<16; i++) // 32-bit
  {
    root <<= 1;

    rem = ((rem << 2) + (a >> 30));

    a <<= 2;

    divisor = (root << 1) + 1;

    if(divisor <= rem)
    {
      rem -= divisor;
      root++;
    }
  }

  return(root);
}
Title: Re: SQRT idea
Post by: dioxin on April 26, 2010, 10:44:17 PM
Another square root, based on the method used to calculate square roots on hand calculating machines of the early 20th century
As written here it'll work for numbers up to about 1,000,000,000
!mov ecx,10000          ;Get SQRT of this number

!mov eax,40000000h      ;stepsize
!mov edx,40000000h      ;decrement
!mov esi,16             ;bits to do
     
lp1:
!sub ecx,edx
!jb x1
!add edx,eax
         
!shr eax,1
!shl ecx,1
!add edx,eax

!dec esi                ;next bit
!jne lp1

!jmp xit

x1:
!add ecx,edx
!sub edx,eax

!shr eax,1
!shl ecx,1
!add edx,eax

!dec esi                ;next bit
!jne lp1


XIT:
!shr edx,16             ;edx contains the answer 
Title: Re: SQRT idea
Post by: raymond on April 27, 2010, 03:07:10 AM
QuoteAnother square root, based on the method used to calculate square roots on hand calculating machines of the early 20th century
As written here it'll work for numbers up to about 1,000,000,000

A similar procedure has been published some 10 years ago with the Mixlib library. It was also expanded to cover 64-bit integers. Using the latter, the square root of a 32-bit integer would yield an accuracy equivalent to 5 decimal places. The Mixlib package can be downloaded from:

http://www.ray.masmcode.com/fixmath.html

P.S. I can still remember using such a calculating machine still in use in the early 1960's!!! :8)
Title: Re: SQRT idea
Post by: dedndave on April 27, 2010, 03:29:11 AM
and i thought i was old   :P
i remember learning to do squareroots long-hand - kind of similar to division
it was more or less a game of successive approximation

never saw a squareroot machine
the earliest machine i ever got to play with (besides an abacus) was a TI calculator
it would add, subtract, multiply, and divide - no squareroots - forget how many digits - not a lot
if i remember, it cost something like $10,000 and was roughly the size of a small typewriter

at least mine was electric, Ray   :bg
Title: Re: SQRT idea
Post by: raymond on April 27, 2010, 03:58:29 AM
Quoteat least mine was electric, Ray :bg   

The one I used was also electric. :clap:

That's where I learned about that procedure to extract square roots (which I coined the abacus procedure since I suspect it was developed eons ago by people using an abacus).

So, when my son challenged me to write a computer program to extract square roots with more than 20 decimal digits (which a friend of his was attempting without much success), I told him it would probably take me some 4 hours to accomplish that on my TRS-80. He did not know I had an ace up my sleeve. :eek

BTW, the initial procedure I wrote was good enough to provide 10000 decimal places from any input, total memory available on that TRS-80 being 64kb. :dazzled:
Title: Re: SQRT idea
Post by: dedndave on April 27, 2010, 04:04:51 AM
then, you're not that old   :bg

i vaguely remember how to do it long-hand
but i wrote a 16-bit integer squareroot routine based on it long ago
it would return 24 bits   :P
could have done more, but i wanted to keep it all in-register

10,000 digits on a trs80 is very impressive   :U
Title: Re: SQRT idea
Post by: jj2007 on April 27, 2010, 10:06:31 AM
It's not 10,000 digits but for everyday needs it should do the job: I added Sqrt() to the MasmBasic (http://www.masm32.com/board/index.php?topic=12460) library.

include \masm32\MasmBasic\MasmBasic.inc

.data
V32R4 REAL4 32.0
V32R8 REAL8 32.0
V32R10 REAL10 32.0
V32DW dd 32
V32QW dq 32

Init
Print "All tests for a value of 32", CrLf$
mov edx, 32
Print Str$("Square root reg32 \t%If\n", Sqrt(edx))
MovVal xmm0, "32" ; xmm0 is integer (even)
MovVal xmm1, "32" ; xmm1 is Real8 (odd)
Print Str$("Square root xmm0 \t%If\n", Sqrt(xmm0))
Print Str$("Square root xmm1 \t%If\n", Sqrt(xmm1))
Print Str$("Square root DWord \t%If\n", Sqrt(V32DW))
Print Str$("Square root QWord \t%If\n", Sqrt(q:V32QW))    ; qword and single need a little
Print Str$("Square root Real4 \t%If\n", Sqrt(s:V32R4))     ; prefix to distinguish them from Real8 and Dword
Print Str$("Square root Real8 \t%If\n", Sqrt(V32R8))
Print Str$("Square root Real10  \t%If\n", Sqrt(V32R10))
Inkey Str$("Reversed, the result is %If\n", 5.65685424949238019*5.65685424949238019)
Exit
end start


All tests for a value of 32
Square root reg32       5.65685424949238019
Square root xmm0        5.65685424949238019
Square root xmm1        5.65685424949238019
Square root DWord       5.65685424949238019
Square root QWord       5.65685424949238019
Square root Real4       5.65685424949238019
Square root Real8       5.65685424949238019
Square root Real10      5.65685424949238019
Reversed, the result is 32.0000000000000044
Title: Re: SQRT idea
Post by: dioxin on April 27, 2010, 11:21:25 AM
Raymond,
QuoteA similar procedure has been published some 10 years ago with the Mixlib library. It was also expanded to cover 64-bit integers. Using the latter, the square root of a 32-bit integer would yield an accuracy equivalent to 5 decimal places
That's probably not the same method. The one I posted gives total accuracy within the bits available so on a 64 bit register, there would be 32 bits after the point, all of them correct. That's 9 decimal digits after the point. Maybe it's the 32 bit version gave about 5 digits.


I originally learned the method at school in the 1970s on an adding machine that I assumed was from the 1950s.
I programmed a PET computer with a 256 digit version in about 1979. It took 2 seconds to calculate all 256 digits while showing all the digits in all of the registers on the screen as it calculated (character based memory mapped screens were good for that). That's with a 1MHz 6502 CPU.


It would appear from the replies here that calculating long square roots and programming in assembler might be related!

Paul.
Title: Re: SQRT idea
Post by: dedndave on April 27, 2010, 11:44:59 AM
QuoteIt would appear from the replies here that calculating long square roots and programming in assembler might be related!

lol Paul
it is a fun exercise, for some reason - even though the FPU or SSE do it faster
Title: Re: SQRT idea
Post by: raymond on April 28, 2010, 04:15:41 AM
Quote from: dioxin on April 27, 2010, 11:21:25 AM
Raymond,
QuoteA similar procedure has been published some 10 years ago with the Mixlib library. It was also expanded to cover 64-bit integers. Using the latter, the square root of a 32-bit integer would yield an accuracy equivalent to 5 decimal places
That's probably not the same method. The one I posted gives total accuracy within the bits available so on a 64 bit register, there would be 32 bits after the point, all of them correct. That's 9 decimal digits after the point. Maybe it's the 32 bit version gave about 5 digits.

Your posted procedure ends with:

QuoteXIT:
!shr edx,16             ;edx contains the answer

You would thus agree that the square root of a 32-bit integer cannot be any larger than a 16-bit integer, i.e. at most 5 significant digits for the integer part.

Similarly, the square root of a 64-bit integer cannot be any larger than a 32-bit integer, i.e. at most 10 significant digits. If you use the upper dword of a "64-bit register" for a 32-bit integer, the lower 16-bits of the resulting square root could then be converted into a decimal fraction yielding an additional 5 significant digits after the decimal point.

If you use the entire "64-bit register" for the integer, you simply won't get any fractional digit, whether binary or decimal. Obviously, with the newer 64-bit computers, accuracy can be doubled by adapting the procedure.
Title: Re: SQRT idea
Post by: dioxin on April 28, 2010, 10:44:09 AM
Raymond,
I stopped the calculation short to save time. Change the "bits to do" in esi to 32 and the code continues to produce the fractional bits of the result. Then at the end, instead if discarding the 16 bits shifted out of edx, use them.
The simple way to show the result is to print out the content of edx/65536.

Paul.


Title: Re: SQRT idea
Post by: KeepingRealBusy on May 02, 2010, 08:28:11 PM
dedndave,

Here is one way to calculate the square root of any BigNum to any precision that you may desire.

First calculate the value of sqrt(2) = 1.414... Calculate this to any huge precision that you may think that you will ever need, maybe 65537 bits, but express it as a BigNum integer 1414....

Divide this BigNum by 2 to get the 65536 bit BigNum constant sqrt(2)/2 = 707... and save this constant for future sqrt calculations. Note: This is just a SHR 1 of sqrt(2).

To calculate sqrt(2) to any huge precision is an iterative process, but you only need to do this once in your lifetime. If you need this calculation algorithm using powers of 2, I can supply it. The arithmetic is simple in ASM, I would hate to try this in C.

Now to solve the problem of BigNum square roots.

Consider an isosceles right triangle with sides of sqrt(2) and hypotenuse of 2. Consider a similar triangle with sides of sqrt(n) and hypotenuse of n. Since they are similar triangles, the following relationship exists:

    sqrt(2):2 :: sqrt(n):n

or algebraicaly (where (n != 0)):

    sqrt(2)/2 = sqrt(n)/n

or

    (sqrt(2)/2)*n = sqrt(n)

Now the following is also true (consider calculating to 256 bits of precision):


   ((((sqrt(2)/2)*2^256))*(n)      /(2^256))           = sqrt(n)
     integer              * integer   set binary point


The value of (((sqrt(2)/2)*2^256)) is the most significant 256 bits of the BigNum sqrt(2)/2. Multiply this by the integer n. Insert a binary point (split the resulting BigNum product into an integer and a fraction with the fraction having the least significant 256 bits). The result is the sqrt(n).

Note: This calculation only takes one BigNum multiply and setting the binary point.

If the number you have for which you are trying to extract the root is fractional (42/17) (which is (2.4705882352941176470588235294118)), then calculate (42*2^precision)/17 and the resulting integer is (42/17)*2^precision = n*2^precision. Now:


    ((((sqrt(2)/2)*2^256))*(n*2^precision)/(2^256*2^precision)) = sqrt(n)
     integer              * integer        set binary point and truncate


Just truncate the result leaving the most significant precision bits of the fraction (256 fractional bits). The calculation (42*2^precision) is just a shift, and the calculation ((42*2^precision)/17) is a shift and a BigNum Divide, so you have overall a shift, a BigNum divide, and a BigNum multiply, and setting the binary point.

If the number you have for which you are trying to extract the root is already expressed as 2.470588 (to 6 decimal_places), then divide 2470588*2^precision by 10^decimal_places (1,000,000) and the resulting integer is 2.470588*2^precision = n*2^precision. Now:


    ((((sqrt(2)/2)*2^256))*(n*2^precision)/(2^256*2^precision)) = sqrt(n)
     integer              * integer        set binary point and truncate


This is the same single BigNum multiply, it just has a larger fractional part. Just truncate the result leaving the most significant precision bits of the fraction (256 fractional bits).

Dave.
Title: Re: SQRT idea
Post by: dioxin on May 02, 2010, 08:50:57 PM
KeepingRealBusy,
are you sure about all that?

If it worked then I could caclulate sqrt(3) like this:
From your equation: (sqrt(2)/2)*n = sqrt(n)

Substitute 3 for n:
(sqrt(2)/2)*3 = sqrt(3)
therefore 2.1213 = sqrt(3)

Hmm, that should be 1.732. Something isn't right.

Could you show me in simple stages how to calculate sqrt(3) to 3 decimal places using your method and given that sqrt(2)=1.14142 ?

Paul.
Title: Re: SQRT idea
Post by: dedndave on May 02, 2010, 09:54:26 PM
you might be able to make something like that work if you continually shifted the (squareroot of 2/2) to the left
i have used ling long kai fang to calculate BigNum exponentials
not sure i can make it work for fractional exponents, though - let me ponder it for a while
Title: Re: SQRT idea
Post by: KeepingRealBusy on May 02, 2010, 10:01:57 PM
Paul and dedndave,

What was I thinking! You are correct. Let me ponder this also.

Dave.
Title: Re: SQRT idea
Post by: KeepingRealBusy on May 03, 2010, 12:05:59 AM
All,

Please scratch my idea for easy square roots. Bad day at Black Rock.

dedndave,

My algo for square roots for BigNums is still valid, just no easy way to get there. The powers of 2 idea works, but it is slow. The old paper and pencil method converted to computers also works on DWORD digits and is much faster.

Dave.
Title: Re: SQRT idea
Post by: dedndave on May 03, 2010, 01:26:24 AM
you could use logarithms, but i seriously doubt there would be any speed advantage
it involves taking the log, dividing it by 2 (or simply interpreting it with 1-bit shift), then taking the anti-log
it seems like it would take at least twice as long to do it that way