News:

MASM32 SDK Description, downloads and other helpful links
MASM32.com New Forum Link
masmforum WebSite

GNU Scientific Library in Assembler

Started by jj2007, May 21, 2011, 09:19:45 PM

Previous topic - Next topic

jj2007

The GNU Scientific Library contains about 1,000 mathematical and statistical routines. Since I needed one of them, I wrote a little macro for using them in assembler. Basically (pun intended), you need
gsl  <the definition of the function copied exactly from the gsl reference manual>
...
gsl_INIT
gsl_whatever(x, y, z)
gsl_EXIT

Note that C leaves double aka Real8 output on the FPU. Use e.g. ...
gsl_sf_bessel_J0(x)
... and get the output with fstp MyReal8.
If the output is an integer, e.g. ...
gsl int gsl_multifit_test_delta (const gsl_vector * dx, const gsl_vector * x, double epsabs, double epsrel)
... then use
.data
epsabs REAL8 1.23
epsrel REAL8 4.56
.code
mov eax, gsl_multifit_test_delta (eax, ecx, epsabs, epsrel)


Here is a demo how to use the macro, taken 1:1 from the gsl example program:

Quoteinclude \masm32\MasmBasic\MasmBasic.inc   ; download

; libgsl.dll download : open Binaries zip, and extract
; libgsl.dll and libgslcblas.dll to \masm32\MasmBasic\GnuScLib\DLL

; define the gsl function(s) you need using the syntax of the
GNU Scientific Library Reference
gsl   double gsl_sf_bessel_J0 (double x)
[/b]
.data
x   REAL8 5.0   ; in C, this would be double x = 5.0;

   
Init[/b]
[/color]   
gsl_INIT
[/b]
   [/color]
gsl_sf_bessel_J0(x)
[/b]
   PrintLine "GNU expected:", Tb$, "-0.1775967713143382920"
   PrintLine "On FPU:    ", Tb$, "-0.1775967713143383198"
   Inkey Str$("MB Str$()  \t%If", ST(0))
   
gsl_EXIT
[/b]   
Exit[/b]
end start

QuoteGNU expected:   -0.1775967713143382920 (from gsl example program)
On FPU:         -0.1775967713143383198
MB Str$()       -0.177596771314338320
[/tt]

Note the difference in the last digits. The GNU site claims 2920 is right, but their own routine throws 3198 on the FPU.

P.S.: Gsl is powerful stuff, but don't forget to read the GNU license - see e.g. this thread.

hutch--

JJ,

Why not write yourself a REAL10 replacement ? Be FREE of any GPL pollution.  :bg
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

jj2007

Quote from: hutch-- on May 22, 2011, 02:41:11 AM
JJ,

Why not write yourself a REAL10 replacement ? Be FREE of any GPL pollution.  :bg

Brilliant idea, Hutch. I can do the coding if you volunteer for the mathematical stuff - you surely know what "Complex Hermitian Matrices" are, right?
That'l keep us busy for the next 20 years :green2

# Mathematical Functions
# Complex Numbers
# Polynomials
# Special Functions
# Vectors and Matrices
# Permutations
# Combinations
# Multisets
# Sorting
# BLAS Support
# Linear Algebra
# Eigensystems
    * Real Symmetric Matrices
    * Complex Hermitian Matrices
15.2 Complex Hermitian Matrices

For hermitian matrices, the library uses the complex form of the symmetric bidiagonalization and QR reduction method.

— Function: gsl_eigen_herm_workspace * gsl_eigen_herm_alloc (const size_t n)

    This function allocates a workspace for computing eigenvalues of n-by-n complex hermitian matrices. The size of the workspace is O(3n).

— Function: void gsl_eigen_herm_free (gsl_eigen_herm_workspace * w)

    This function frees the memory associated with the workspace w.

— Function: int gsl_eigen_herm (gsl_matrix_complex * A, gsl_vector * eval, gsl_eigen_herm_workspace * w)

    This function computes the eigenvalues of the complex hermitian matrix A. Additional workspace of the appropriate size must be provided in w. The diagonal and lower triangular part of A are destroyed during the computation, but the strict upper triangular part is not referenced. The imaginary parts of the diagonal are assumed to be zero and are not referenced. The eigenvalues are stored in the vector eval and are unordered.

— Function: gsl_eigen_hermv_workspace * gsl_eigen_hermv_alloc (const size_t n)

    This function allocates a workspace for computing eigenvalues and eigenvectors of n-by-n complex hermitian matrices. The size of the workspace is O(5n).
....
    * Real Nonsymmetric Matrices
    * Real Generalized Symmetric-Definite Eigensystems
    * Complex Generalized Hermitian-Definite Eigensystems
    * Real Generalized Nonsymmetric Eigensystems
    * Sorting Eigenvalues and Eigenvectors
    * Eigenvalue and Eigenvector Examples
    * Eigenvalue and Eigenvector References
# Fast Fourier Transforms
# Numerical Integration
# Random Number Generation
# Quasi-Random Sequences
# Random Number Distributions
# Statistics
# Histograms
# N-tuples
# Monte Carlo Integration
# Simulated Annealing
# Ordinary Differential Equations
# Interpolation
# Numerical Differentiation
# Chebyshev Approximations
# Series Acceleration
# Wavelet Transforms
# Discrete Hankel Transforms
# One dimensional Root-Finding
# One dimensional Minimization
# Multidimensional Root-Finding
# Multidimensional Minimization
# Least-Squares Fitting
# Nonlinear Least-Squares Fitting
# Basis Splines
# Physical Constants
# IEEE floating-point arithmetic

MichaelW

AFAIK all of the GNU Scientific Library functions take and return doubles, and 19 digits for a double is wishful thinking. But ignoring that reality, if I use the Cygwin j0 function, store the result as a REAL10, and display it with the Cygwin printf function, which is not the MSVCRT function and actually supports long doubles, then I get:

-0.1775967713143382920
eschew obfuscation

jj2007

Quote from: MichaelW on May 22, 2011, 07:53:07 AM
AFAIK all of the GNU Scientific Library functions take and return doubles, and 19 digits for a double is wishful thinking. But ignoring that reality, if I use the Cygwin j0 function, store the result as a REAL10, and display it with the Cygwin printf function, which is not the MSVCRT function and actually supports long doubles, then I get:

-0.1775967713143382920


Michael,

The gsl leaves a result on the FPU, so it is by definition REAL10 (consider that the input is REAL8 but exactly 5.0, in REAL10 terms, and that the bessel routine uses REAL10 internally).

MasmBasic Str$() displays the full REAL10 precision correctly - which simply means that the result returned by the gsl bessel routine is different from what they say it should return.

But I agree fully that (quote gsl site)...
QuoteThe output is shown below, and should be correct to double-precision accuracy,2

     J0(5) = -1.775967713143382920e-01
... is incorrect. Even when I use printf:
invoke crt_printf, Chr$("J0(%g) = ", 9, " %.18e", 13, 10), x, MyRes
... the returned result is ...3200:
GNU expected:   -0.1775967713143382920
J0(5) =          -1.775967713143383200e-001
On FPU:         -0.1775967713143383198
MB Str$()       -0.177596771314338320
MB Str$()       -0.177596771314338320

Which implies that the gsl people used something else to generate the 2920 ending quoted on their site - maybe the cygwin routine you used.

In short: It is not a display problem - cygwin and libgsl.dll return different results.

MichaelW

Jochen,

Returning a floating-point result in ST(0) is part of the calling convention and it does not imply that the result has the full REAL10 precision.

http://www.gnu.org/software/gsl/manual/gsl-ref.html#Long-double

"In general, the algorithms in the library are written for double precision only. The long double type is not supported for actual computation."

Considering that the library is written in ANSI C, and intended to be portable, I doubt that the computations were done without storing intermediate results and I doubt that the intermediate results were stored as long doubles.

eschew obfuscation

hutch--

 :bg

JJ,

> Brilliant idea, Hutch. I can do the coding if you volunteer for the mathematical stuff - you surely know what "Complex Hermitian Matrices" are, right?

I may not be much use to you here, apart from cheating and using a computer my arithmetic skills are of the class of,

Eenie, meanie, miney and Larry and Curly's mate MOE.  :P
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

jj2007

Quote from: MichaelW on May 22, 2011, 09:56:08 AM
I doubt that the computations were done without storing intermediate results and I doubt that the intermediate results were stored as long doubles.

Michael,

At least for the Bessel routine, Olly says it's all FPU and no intermediate results stored to REAL8 vars. Now the interesting question remains which of the two results is more "exact"... :wink

Note also that the MasmBasic gsl macro is an interface to the gsl dll; using the gsl source with C might e.g. use SSE2 instead of the FPU. On the other hand, the dll comes from the fairly "official" GSL for Windows site, so I guess this is the best binary they could produce...

Just for fun, here another example:

Quoteinclude \masm32\MasmBasic\MasmBasic.inc   ; download

; libgsl.dll download : open Binaries zip, and extract
; libgsl.dll and libgslcblas.dll to \masm32\MasmBasic\GnuScLib\DLL

; define the gsl function(s) you want to use with the syntax of the
GNU Scientific Library Reference
gsl   double gsl_stats_mean (const double data[], size_t stride, size_t n)
gsl   double gsl_stats_variance (const double data[], size_t stride, size_t n)
gsl   double gsl_stats_sd (const double data[], size_t stride, size_t n)

.data
MyData   REAL8 5.0, 6.0, 3.2, 1.8, 9.0      ; define a double precision array

   Init
   gsl_INIT
   gsl_stats_
mean (offset MyData, 1, 5)      ; stride 1, 5 elements
   PrintLine Str$("Mean    \t%7f", ST(0))      ; print ST directly from the FPU with 7 digits precision
   gsl_stats_
variance (offset MyData, 1, 5)
   PrintLine Str$("Variance\t%7f", ST(0))
   gsl_stats_
sd (offset MyData, 1, 5)
   PrintLine Str$("Standard Dev\t%7f", ST(0))
   gsl_EXIT
   
Inkey "ok"
   Exit
end start

Mean            5.000000
Variance        7.620000
Standard Dev    2.760435

MichaelW

As to which one is more exact I don't know, but they both appear to produce REAL8 results (but not the same results).

;====================================================================
    .486                                ; create 32 bit code
    .model flat, stdcall                ; 32 bit memory model
    option casemap :none                ; case sensitive

    include \masm32\include\windows.inc
    include \masm32\include\masm32.inc
    include \masm32\include\gdi32.inc
    include \masm32\include\user32.inc
    include \masm32\include\kernel32.inc
    include \masm32\include\Comctl32.inc
    include \masm32\include\comdlg32.inc
    include \masm32\include\shell32.inc
    include \masm32\include\oleaut32.inc
    include \masm32\include\msvcrt.inc

    include \masm32\macros\macros.asm

    includelib \masm32\lib\masm32.lib
    includelib \masm32\lib\gdi32.lib
    includelib \masm32\lib\user32.lib
    includelib \masm32\lib\kernel32.lib
    includelib \masm32\lib\Comctl32.lib
    includelib \masm32\lib\comdlg32.lib
    includelib \masm32\lib\shell32.lib
    includelib \masm32\lib\oleaut32.lib

    includelib cygwin1.lib
    includelib libgsl.lib

    includelib \masm32\lib\msvcrt.lib
;====================================================================

cygwin_dll_init PROTO C
dll_dllcrt0 PROTO C
printf PROTO C :VARARG

j0               PROTO C :REAL8
gsl_sf_bessel_J0 PROTO C :REAL8

;====================================================================
    .data
        r85 REAL8 5.0
        r8  REAL8 ?
        r10 REAL10 ?
    .code
;====================================================================
start:
;====================================================================
    invoke cygwin_dll_init
    invoke dll_dllcrt0

    invoke j0, r85
    fstp r10
    invoke printf, cfm$("cygwin j0 to r10       : %.19Lf\n\n"), r10

    invoke j0, r85
    fstp r8
    invoke printf, cfm$("cygwin j0 to r8        : %.19f\n\n"), r8

    invoke gsl_sf_bessel_J0, r85
    fstp r10
    invoke printf, cfm$("gsl_sf_bessel_J0 to r10: %.19Lf\n\n"), r10

    invoke gsl_sf_bessel_J0, r85
    fstp r8
    invoke printf, cfm$("gsl_sf_bessel_J0 to r8 : %.19f\n\n"), r8

    inkey "Press any key to exit..."
    exit
;====================================================================
end start


cygwin j0 to r10       : -0.1775967713143382920

cygwin j0 to r8        : -0.1775967713143382920

gsl_sf_bessel_J0 to r10: -0.1775967713143382642

gsl_sf_bessel_J0 to r8 : -0.1775967713143382642


FWIW, the module definition file that I created for libgsl.dll, that I then used to create an import library, included 3695 function names.
eschew obfuscation

jj2007

Michael,

How did you get libgsl.lib? Can you post the executable, so that we can have a look at differences in Olly?

MichaelW

All of the necessary components except the DLLs are in the attachment. It turned out that for this I didn't need the libgslcblas components, although as you know I did need the DLL.

And I checked the header files to verify that both functions take a REAL8 argument.
eschew obfuscation

jj2007

Thanks a lot, Michael. I had to do a complete CygWin install to get it to work, but it is pretty obvious: There are two different versions around that return slightly different figures... using the same dll and the same inputs!
::)

Here is what Olly sees:
a) my version, result on FPU: -0.1775967713143383198
gsl_sf_bessel_J0_e   55                        push ebp
100B5611             89E5                      mov ebp, esp
100B5613             56                        push esi
100B5614             53                        push ebx
100B5615             83C4 80                   add esp, -80
100B5618             8B75 10                   mov esi, [ebp+10]
100B561B             DD45 08                   fld qword ptr [ebp+8]
100B561E             D905 207C1310             fld dword ptr [10137C20]         ; float 2.980232e-08 (const 1/2**25)
100B5624             D9C9                      fxch st(1)
100B5626             D9E1                      fabs
100B5628             D9C9                      fxch st(1)
100B562A             DDE9                      fucomp st(1)
100B562C             DFE0                      fstsw ax
100B562E             9E                        sahf
100B562F            76 12                     jbe short 100B5643
100B5631             D8C8                      fmul st, st
100B5633             D9E8                      fld1
100B5635             DD1E                      fstp qword ptr [esi]
100B5637             DD5E 08                   fstp qword ptr [esi+8]
100B563A             83EC 80                   sub esp, -80
100B563D             31C0                      xor eax, eax
100B563F             5B                        pop ebx
100B5640             5E                        pop esi
100B5641             5D                        pop ebp
100B5642             C3                        retn

b) the CygWin version, result on FPU: -0.1775967713143382642
j0                   55                        push ebp
610FE651             89E5                      mov ebp, esp
610FE653             83EC 48                   sub esp, 48
610FE656             DD45 08                   fld qword ptr [ebp+8]
610FE659             DD55 C0                   fst qword ptr [ebp-40]
610FE65C             DD1C24                    fstp qword ptr [esp]
610FE65F             E8 DC810000               call 61106840 <<<<<<<<<<<<<<<<<<<<<<<
610FE664             833D E0201661 FF          cmp dword ptr [611620E0], -1
610FE66B             DD5D C8                   fstp qword ptr [ebp-38]
610FE66E            74 0F                     je short 610FE67F
610FE670             DD45 C0                   fld qword ptr [ebp-40]
610FE673             DD1C24                    fstp qword ptr [esp]
610FE676             E8 45F3FFFF               call __fpclassifyd
610FE67B             85C0                      test eax, eax
610FE67D            75 05                     jne short 610FE684
610FE67F             DD45 C8                   fld qword ptr [ebp-38]
610FE682             C9                        leave
610FE683             C3                        retn


61106840             55                        push ebp
61106841             89E5                      mov ebp, esp
61106843             53                        push ebx
61106844             83EC 64                   sub esp, 64
61106847             DD45 08                   fld qword ptr [ebp+8]
6110684A             DD55 C8                   fst qword ptr [ebp-38]
6110684D             8B5D CC                   mov ebx, [ebp-34]
61106850             81E3 FFFFFF7F             and ebx, 7FFFFFFF
61106856             81FB FFFFEF7F             cmp ebx, 7FEFFFFF
6110685C            7E 12                     jle short 61106870
6110685E             D9C0                      fld st
61106860             DEC9                      fmulp st(1), st
61106862             D83D 70E51C61             fdivr dword ptr [611CE570]       ; float 1.000000
61106868             83C4 64                   add esp, 64
6110686B             5B                        pop ebx
6110686C             5D                        pop ebp
6110686D             C3                        retn

vanjast

Don't forget to init the FPU to Real10 and rounding to the nearest...
Maybe there's an FPU intialisation problem with that......eughh!! library.. whatyou callit!!
:bg

jj2007

Quote from: vanjast on May 23, 2011, 09:37:26 AM
Don't forget to init the FPU to Real10 and rounding to the nearest...
Maybe there's an FPU intialisation problem with that......eughh!! library.. whatyou callit!!
:bg

Correct, thanks.
Quote   finit
   gsl_sf_bessel_J0(x)

GNU expected:   -0.1775967713143382920
MB Str$()       -0.1775967713143382920

Mystery solved. Needless to say that the string fpu does not appear in the 422 pages of the gsl reference manual :'(

dedndave

that suggests that there is a library init routine someplace
perhaps there are other items that require initialization, as well