Pages: [1] 2



Author

Topic: SQRT idea (Read 8628 times)

johnsa


SQRT idea
« on: April 26, 2010, 11:01:18 AM » 

So I was extremely bored this morning, and although this is a rather pointless excercise it might be kind of interesting. So I was thinking, how would I calculate SQRTs without using lookups, builtin SQRT or Newton Raphson. I came up with this little idea which is working perfectly for integer only operation. For floats it would be best to replace the linear interpolation with exponential for precision. It's based on the simple observation that the perfect squares go up according to 4+(2n+1) : n=0 or n>=2 IE: 4,9,16,25 etc.. and as such map back down to sqr roots that increase sequentially as 2,3,4,5,6..... public static int fastSQRT(int x) { uint i = 65536, n = 4, d = 5; uint first, second; while (n < x) { n += d; d += 2; i += 65536; } second = n; first = (n  (d  2)); //i = (uint)(65536 * (((x  5) >> 1) + 2)); //first = (uint)(4 + ((x  5) >> 1) ); //second = first + (uint)(4 + (((x  5) >> 1) + 1)); return ( (int)( i + (((xfirst)<<16)/(secondfirst)) >> 16) ); }
It's not exactly fast... although if we could take out the while loop to identify the perfect squares that are to the left/right of the number in question it would be roughly the same as the built in sqrt in C#. In any event it was a random brainexploration :)



Logged




dedndave

It's based on the simple observation that the perfect squares go up according to 4+(2n+1) : n=0 or n>=2 IE: 4,9,16,25 etc.. and as such map back down to sqr roots that increase sequentially as 2,3,4,5,6..... that is an interesting observation, Johnsa the squarerooter loops i have seen tend to work on bit pairs: bit 0 of the squareroot is somewhat derived from input bits 0/1 bit 1 of the squareroot is somewhat derived from input bits 2/3 bit 2 of the squareroot is somewhat derived from input bits 4/5, and so on



Logged




MichaelW
Global Moderator
Member
Gender:
Posts: 5161

Interesting, I wonder if the algorithm could be implemented as a macro that calculates a SQRT at assembly time.



Logged

eschew obfuscation



qWord

I wonder if the algorithm could be implemented as a macro that calculates a SQRT at assembly time.
fastSQRT macro x:=<1> fsqrt_i = 65536 fsqrt_n = 4 fsqrt_d = 5 WHILE fsqrt_n LT x fsqrt_n = fsqrt_n + fsqrt_d fsqrt_d = fsqrt_d + 2 fsqrt_i = fsqrt_i + 65536 ENDM fsqrt_first = fsqrt_n(fsqrt_d2) EXITM %((fsqrt_i + (((xfsqrt_first) SHL 16)/(fsqrt_nfsqrt_first))) SHR 16) endm



Logged

FPU in a trice: SmplMathIt's that simple!



dedndave

already putting the "%" thing to use, eh qWord



Logged




johnsa

The interpolation will obviously lead to less precision the larger the number, but sofar I've tested 100,000,000 numbers in the range of 0  500,000,000 and it's been spot on using integers only. The next trick would be to use nonlinear interpolation between the sqrt results.
IE: sqrt(6.5) falls between 4 and 9 which yield perfect roots of 2 and 3... so the sqrt would be calculated with linear interpolation to be 2.5 (which is close.. its actually 2.51....). This should be fairly easy to add, just need an exponential function in there. The second is how to get rid of the while loop... I was trying my luck with the commented out lines to see if i could replace it by using some sort of inequality like 4+(2n+1) >= x... but I didn't get it right yet.



Logged




jj2007

include \masm32\MasmBasic\MasmBasic.inc
fastSQRT macro x:=<1> fsqrt_i = 65536 fsqrt_n = 4 fsqrt_d = 5 WHILE fsqrt_n LT x fsqrt_n = fsqrt_n + fsqrt_d fsqrt_d = fsqrt_d + 2 fsqrt_i = fsqrt_i + 65536 ENDM fsqrt_first = fsqrt_n(fsqrt_d2) EXITM %((fsqrt_i + (((xfsqrt_first) SHL 16)/(fsqrt_nfsqrt_first))) SHR 16) endm
Sqrt macro x:REQ ffree st(7) push x fild dword ptr [esp] fsqrt fstp MbDebugR8 add esp, 4 EXITM <MbDebugR8> endm Init Print Str$("The fast square root of 30 is %9f\n", fastSQRT(30)) Inkey Str$("The slow square root of 30 is %9f", Sqrt(30)) Exit end start
The fast square root of 30 is 5.00000000 The slow square root of 30 is 5.47722558



Logged




dedndave

let's not start naming our procs starting with "Slow" (or "Slower" or "Slowest" or "SnailSpeed")



Logged




qWord

sqrti macro rm32:req cvtsi2ss xmm0,rm32 sqrtss xmm0,xmm0 movd rm32,xmm0 endm .data? r4 REAL4 ? .code sqrti(ASM(mov r4,9)) print real4$(r4)



Logged

FPU in a trice: SmplMathIt's that simple!



johnsa

I think we should drop the fast prefix off my idea (as it's not very fast at all.. .lol especially with the while in there). :) Maybe call it... oddWayToGetaSqrt



Logged




jj2007

sqrti macro rm32:req cvtsi2ss xmm0,rm32 sqrtss xmm0,xmm0 movd rm32,xmm0 endm .data? r4 REAL4 ? .code sqrti(ASM(mov r4,9)) print real4$(r4) sqrtss is probably a tiny bit faster than fsqrt. Although I like the idea to have full Real10 precision... What does ASM() do, by the way?



Logged




qWord

What does ASM() do, by the way? from macros.inc: ;  ; Macro for placing an assembler instruction either ; within another or within a procedure call ;  work only with two operands instructions. The first operand is returned. ASM(mov eax,9) ==> returns eax



Logged

FPU in a trice: SmplMathIt's that simple!



clive
Hardcore x86, ARM and MIPS toolsmith
Member
Posts: 1230
Project Looking Glass. Following the white rabbit.

sqrtss is probably a tiny bit faster than fsqrt. Although I like the idea to have full Real10 precision...
RSQRTSS Is super fast (1 cycle on P3), low precision, table driven out of a ROM. Useful if you are dividing something by the square root.



Logged

It could be a random act of randomness. Those happen a lot as well.



clive
Hardcore x86, ARM and MIPS toolsmith
Member
Posts: 1230
Project Looking Glass. Following the white rabbit.

Probably in here, but I ran across it in one of his magazine articles. http://www.amazon.com/dp/1929629095// Jack Crenshaw's Integer Square Root
unsigned long integer_sqrt(unsigned long a) { unsigned long rem = 0; unsigned long root = 0; unsigned long divisor = 0; int i;
for(i=0; i<16; i++) // 32bit { root <<= 1;
rem = ((rem << 2) + (a >> 30));
a <<= 2;
divisor = (root << 1) + 1;
if(divisor <= rem) { rem = divisor; root++; } }
return(root); }



Logged

It could be a random act of randomness. Those happen a lot as well.



dioxin

Another square root, based on the method used to calculate square roots on hand calculating machines of the early 20th century As written here it'll work for numbers up to about 1,000,000,000 !mov ecx,10000 ;Get SQRT of this number
!mov eax,40000000h ;stepsize !mov edx,40000000h ;decrement !mov esi,16 ;bits to do lp1: !sub ecx,edx !jb x1 !add edx,eax !shr eax,1 !shl ecx,1 !add edx,eax
!dec esi ;next bit !jne lp1
!jmp xit
x1: !add ecx,edx !sub edx,eax
!shr eax,1 !shl ecx,1 !add edx,eax
!dec esi ;next bit !jne lp1
XIT: !shr edx,16 ;edx contains the answer



Logged





Pages: [1] 2



