News:

MASM32 SDK Description, downloads and other helpful links
MASM32.com New Forum Link
masmforum WebSite

SQRT idea

Started by johnsa, April 26, 2010, 11:01:18 AM

Previous topic - Next topic

johnsa

So I was extremely bored this morning, and although this is a rather pointless excercise it might be kind of interesting.
So I was thinking, how would I calculate SQRTs without using lookups, built-in SQRT or Newton Raphson. I came up with this little idea which is working perfectly for integer only operation. For floats it would be best to replace the linear interpolation with exponential for precision. It's based on the simple observation that the perfect squares go up according to 4+(2n+1) : n=0 or n>=2 IE: 4,9,16,25 etc.. and as such map back down to sqr roots that increase sequentially as 2,3,4,5,6.....


        public static int fastSQRT(int x)
        {
            uint i = 65536, n = 4, d = 5;
            uint first, second;
            while (n < x)
            {
                n += d;
                d += 2;
                i += 65536;
            }
            second = n;
            first = (n - (d - 2));
            //i = (uint)(65536 * (((x - 5) >> 1) + 2));
            //first = (uint)(4 + ((x - 5) >> 1) );
            //second = first + (uint)(4 + (((x - 5) >> 1) + 1));
            return ( (int)( i + (((x-first)<<16)/(second-first)) >> 16) );
        }


It's not exactly fast... although if we could take out the while loop to identify the perfect squares that are to the left/right of the number in question it would be roughly the same as the built in sqrt in C#.  In any event it was a random brain-exploration :)

dedndave

QuoteIt's based on the simple observation that the perfect squares go up according to 4+(2n+1) : n=0 or n>=2 IE: 4,9,16,25 etc.. and as such map back down to sqr roots that increase sequentially as 2,3,4,5,6.....

that is an interesting observation, Johnsa

the squarerooter loops i have seen tend to work on bit pairs:
bit 0 of the squareroot is somewhat derived from input bits 0/1
bit 1 of the squareroot is somewhat derived from input bits 2/3
bit 2 of the squareroot is somewhat derived from input bits 4/5, and so on

MichaelW

Interesting, I wonder if the algorithm could be implemented as a macro that calculates a SQRT at assembly time.


eschew obfuscation

qWord

Quote from: MichaelW on April 26, 2010, 02:15:21 PM
I wonder if the algorithm could be implemented as a macro that calculates a SQRT at assembly time.
fastSQRT macro x:=<1>
fsqrt_i = 65536
fsqrt_n = 4
fsqrt_d = 5
WHILE fsqrt_n LT x
fsqrt_n = fsqrt_n + fsqrt_d
fsqrt_d = fsqrt_d + 2
fsqrt_i = fsqrt_i + 65536
ENDM
fsqrt_first = fsqrt_n-(fsqrt_d-2)
EXITM %((fsqrt_i + (((x-fsqrt_first) SHL 16)/(fsqrt_n-fsqrt_first))) SHR 16)
endm
FPU in a trice: SmplMath
It's that simple!

dedndave

already putting the "%" thing to use, eh qWord   :U

johnsa

The interpolation will obviously lead to less precision the larger the number, but sofar I've tested 100,000,000 numbers in the range of 0 - 500,000,000 and it's been spot on using integers only.
The next trick would be to use non-linear interpolation between the sqrt results.

IE: sqrt(6.5) falls between 4 and 9 which yield perfect roots of 2 and 3... so the sqrt would be calculated with linear interpolation to be 2.5 (which is close.. its actually 2.51....).
This should be fairly easy to add, just need an exponential function in there. The second is how to get rid of the while loop... I was trying my luck with the commented out lines to see if i could replace it by using some sort of
inequality like 4+(2n+1) >= x... but I didn't get it right yet.

jj2007

Quote from: qWord on April 26, 2010, 03:13:44 PM
fastSQRT macro x:=<1>

include \masm32\MasmBasic\MasmBasic.inc

fastSQRT macro x:=<1>
  fsqrt_i = 65536
  fsqrt_n = 4
  fsqrt_d = 5
  WHILE fsqrt_n LT x
fsqrt_n = fsqrt_n + fsqrt_d
fsqrt_d = fsqrt_d + 2
fsqrt_i = fsqrt_i + 65536
  ENDM
  fsqrt_first = fsqrt_n-(fsqrt_d-2)
  EXITM %((fsqrt_i + (((x-fsqrt_first) SHL 16)/(fsqrt_n-fsqrt_first))) SHR 16)
endm

Sqrt macro x:REQ
  ffree st(7)
  push x
  fild dword ptr [esp]
  fsqrt
  fstp MbDebugR8
  add esp, 4
  EXITM <MbDebugR8>
endm
Init
Print Str$("The fast square root of 30 is %9f\n", fastSQRT(30))
Inkey Str$("The slow square root of 30 is %9f", Sqrt(30))
Exit
end start


The fast square root of 30 is 5.00000000
The slow square root of 30 is 5.47722558

:green

dedndave

let's not start naming our procs starting with "Slow"   :lol
(or "Slower" or "Slowest" or "SnailSpeed")

qWord

 :bg
sqrti macro rm32:req
cvtsi2ss xmm0,rm32
sqrtss xmm0,xmm0
movd rm32,xmm0
endm

.data?
r4 REAL4 ?
.code
sqrti(ASM(mov r4,9))
print real4$(r4)
FPU in a trice: SmplMath
It's that simple!

johnsa

I think we should drop the fast prefix off my idea (as it's not very fast at all.. .lol especially with the while in there). :)
Maybe call it... oddWayToGetaSqrt

jj2007

Quote from: qWord on April 26, 2010, 08:03:10 PM
:bg
sqrti macro rm32:req
cvtsi2ss xmm0,rm32
sqrtss xmm0,xmm0
movd rm32,xmm0
endm

.data?
r4 REAL4 ?
.code
sqrti(ASM(mov r4,9))
print real4$(r4)


sqrtss is probably a tiny bit faster than fsqrt. Although I like the idea to have full Real10 precision...

What does ASM() do, by the way?
:bg

qWord

Quote from: jj2007 on April 26, 2010, 08:20:31 PMWhat does ASM() do, by the way?
from macros.inc:
    ; --------------------------------------------------
    ; Macro for placing an assembler instruction either
    ; within another or within a procedure call
    ; --------------------------------------------------

work only with two operands instructions. The first operand is returned.
ASM(mov eax,9) ==> returns eax
FPU in a trice: SmplMath
It's that simple!

clive

Quote from: jj2007
sqrtss is probably a tiny bit faster than fsqrt. Although I like the idea to have full Real10 precision...

RSQRTSS Is super fast (1 cycle on P3), low precision, table driven out of a ROM. Useful if you are dividing something by the square root.
It could be a random act of randomness. Those happen a lot as well.

clive

Probably in here, but I ran across it in one of his magazine articles.
http://www.amazon.com/dp/1929629095

// Jack Crenshaw's Integer Square Root

unsigned long integer_sqrt(unsigned long a)
{
  unsigned long rem = 0;
  unsigned long root = 0;
  unsigned long divisor = 0;
  int i;

  for(i=0; i<16; i++) // 32-bit
  {
    root <<= 1;

    rem = ((rem << 2) + (a >> 30));

    a <<= 2;

    divisor = (root << 1) + 1;

    if(divisor <= rem)
    {
      rem -= divisor;
      root++;
    }
  }

  return(root);
}
It could be a random act of randomness. Those happen a lot as well.

dioxin

Another square root, based on the method used to calculate square roots on hand calculating machines of the early 20th century
As written here it'll work for numbers up to about 1,000,000,000
!mov ecx,10000          ;Get SQRT of this number

!mov eax,40000000h      ;stepsize
!mov edx,40000000h      ;decrement
!mov esi,16             ;bits to do
     
lp1:
!sub ecx,edx
!jb x1
!add edx,eax
         
!shr eax,1
!shl ecx,1
!add edx,eax

!dec esi                ;next bit
!jne lp1

!jmp xit

x1:
!add ecx,edx
!sub edx,eax

!shr eax,1
!shl ecx,1
!add edx,eax

!dec esi                ;next bit
!jne lp1


XIT:
!shr edx,16             ;edx contains the answer