The MASM Forum Archive 2004 to 2012

General Forums => The Campus => Topic started by: SteveAsm on February 02, 2012, 01:18:30 AM

Title: FPU status word
Post by: SteveAsm on February 02, 2012, 01:18:30 AM
Hey guys,
I've been doing some experimenting, trying to determine which jump instructions work (successfully) after storing the FPU status word in the AX register.
As in:

  FLD avalue
  FLD bvalue
  FCOM
  FSTSW AX
  FWAIT
  SAHF
;     <------- some jump instruction here  ----->  jne, jnl, jng, jnle, jz, je, jl, jg, jle, jge, jnz, ...etc.



Testing leads to a lot of hair pulling: some things appear to work under some circumstances, but then don't under others.
I've searched the forum archives and done some googling, but, I'm not finding too much information.
Any ideas or pointers would be appreciated.
Thanks,
steve
Title: Re: FPU status word
Post by: dedndave on February 02, 2012, 02:03:05 AM
2 things will come in handy

1) Ray's FPU tutorial
http://www.ray.masmcode.com/fpu.html
specifically...
http://www.ray.masmcode.com/tutorial/fpuchap7.htm

2) a table that shows which flags the conditional branch instructions test
http://www.arl.wustl.edu/~lockwood/class/cs306/books/artofasm/Chapter_6/CH06-5.html#HEADING5-226
Title: Re: FPU status word
Post by: SteveAsm on February 02, 2012, 03:31:39 AM
Quote from: dedndave on February 02, 2012, 02:03:05 AM
2 things will come in handy

Thanks Dave
Title: Re: FPU status word
Post by: qWord on February 02, 2012, 08:23:12 AM
You may be better off using fcomi[p], which sets the flags directly -> JA/JB/...
Title: Re: FPU status word
Post by: SteveAsm on February 02, 2012, 05:24:03 PM
Quote from: qWord on February 02, 2012, 08:23:12 AM
You may be better off using fcomi[p], which sets the flags directly -> JA/JB/...

Hey qWord,
thanks for pointing that (FCOMI) out.
I didn't catch that in my searches the first time around and my reference book does not include it.

It looks like that could solve the problem, IF most processors in use include it.
I'm not so concerned with the newer ones.
It's the older desktops, laptops and notebooks I wonder about.
Title: Re: FPU status word
Post by: qWord on February 02, 2012, 05:27:49 PM
FCOMI was introduced with Intel's P6 (1995) - don't worry about availability  :bg
Title: Re: FPU status word
Post by: SteveAsm on February 02, 2012, 10:09:13 PM
Okay..., this is really odd.
I get these similar errors from JWasm and Masm:

JWasm  ::  Test2.asm(111) : Error A2049: Invalid instruction operands.

Masm :: Test2.asm(111) : error A2085: instruction or register not accepted in current CPU mode.

from this code segment:

  FILD aa
  FILD cc
  FCOMI
  jng  Label



Line #111 is the FCOMI instruction.
I have tried:
FCOMI ST(1)
FCOMI ST(0), ST(1)

Any ideas what this is all about ?
Title: Re: FPU status word
Post by: jj2007 on February 02, 2012, 10:12:59 PM
Quote from: SteveAsm on February 02, 2012, 10:09:13 PM
Masm :: Test2.asm(111) : error A2085: instruction or register not accepted in current CPU mode.

include \masm32\include\masm32rt.inc
.686
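
A minimal sketch of where the directive goes, assuming the MASM32 SDK and two hypothetical DWORD variables aa and cc (the values 5 and 7 are made up for illustration). It also swaps jng for jbe, since FCOMI reports its result in CF/ZF/PF rather than in SF/OF:

```asm
include \masm32\include\masm32rt.inc
.686                            ; enables P6 instructions such as FCOMI/FCOMIP

.data
aa  dd 5                        ; sample integers, illustration only
cc  dd 7

.code
start:
    fild  aa                    ; ST(0) = aa
    fild  cc                    ; ST(0) = cc, ST(1) = aa
    fcomip st, st(1)            ; compare cc with aa, set ZF/PF/CF, pop cc
    fstp  st                    ; pop aa (x87 stores do not touch EFLAGS)
    jbe   cc_not_greater        ; CF or ZF set: cc <= aa
    print "cc is greater than aa", 13, 10
    jmp   finished
cc_not_greater:
    print "cc is not greater than aa", 13, 10
finished:
    exit
end start
```

The explicit two-operand form `fcomip st, st(1)` is used here because it is the form that appears elsewhere in this thread.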
Title: Re: FPU status word
Post by: dedndave on February 02, 2012, 10:21:33 PM
also, FILD will only work with words, dwords, or qwords
you may need to use a size override operator
SomeStuff db 0,0,0,0,0,0,0,80h
;
;
       FILD qword ptr SomeStuff
Title: Re: FPU status word
Post by: jj2007 on February 02, 2012, 10:36:28 PM
Quote from: dedndave on February 02, 2012, 10:21:33 PM
also, FILD will only work with words, dwords, or qwords

FILD should (not "will") only work with words, dwords, or qwords

include \masm32\include\masm32rt.inc

.code
SomeStuff dq 1234567

start:
       FLD qword ptr SomeStuff  ; try FILD
       push eax
       fistp dword ptr [esp]
       pop eax
       print str$(eax)
       exit

end start


Of course, the result is gibberish :bg
Title: Re: FPU status word
Post by: SteveAsm on February 02, 2012, 11:41:01 PM
Quote from: jj2007 on February 02, 2012, 10:12:59 PM
include \masm32\include\masm32rt.inc
.686

Aha!
.686
this is the key.
Nowhere in my search results was that illustrated.

I had tried everything but .686 .

Thanks for that JJ,
Thanks once again guys.
Title: Re: FPU status word
Post by: raymond on February 03, 2012, 01:35:18 AM
Quote
.686
this is the key.
Nowhere in my search results was that illustrated.

If you had looked at the first link given to you by dedndave, you would have found this:

Quote
Note: This instruction is valid only for the Pentium Pro and subsequent processors. It may not be supported by some assemblers (for MASM, the .686 directive must be used).
Title: Re: FPU status word
Post by: SteveAsm on February 03, 2012, 04:37:19 PM
Quote from: raymond on February 03, 2012, 01:35:18 AM
If you had looked at the first link given to you by dedndave, you would have found this:

Quote
Note: This instruction is valid only for the Pentium Pro and subsequent processors. It may not be supported by some assemblers (for MASM, the .686 directive must be used).

I don't want to appear antagonistic, Ray, but I did look at Dave's first link.
There is a lot of information there, and I simply missed the Note.
Title: Re: FPU status word
Post by: SteveAsm on February 03, 2012, 04:44:04 PM
After spending the entire afternoon and evening playing with FCOMI, it doesn't appear that it solves any problems.
It eliminates a few steps, but, the result is still the same.
You are still limited to only a few branching options, as most of the x86 branch instructions are unusable.
Title: Re: FPU status word
Post by: qWord on February 03, 2012, 04:56:52 PM
Quote from: SteveAsm on February 03, 2012, 04:44:04 PM
You are still limited to only a few branching options, as most of the x86 branch instructions are unusable.

Which instructions are unusable?
Title: Re: FPU status word
Post by: jj2007 on February 03, 2012, 05:27:24 PM
Quote from: SteveAsm on February 03, 2012, 04:44:04 PM
You are still limited to only a few branching options, as most of the x86 branch instructions are unusable.

There should only be a few, actually. The FPU holds by default REAL10 values, and when you compare two of them, there are three options:
- bigger
- equal
- smaller
All the rest makes sense only in a reg32 context (carry, unsigned etc). Correct me if I am wrong.
By the way, you can always use
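num1 REAL8 123.456
num2 REAL8 123.455
fld num1
fld num2
push eax                ; make room on the stack
fistp dword ptr [esp]   ; store ST(0) = num2 as a rounded integer
pop eax                 ; eax = num2
push edx
fistp dword ptr [esp]   ; store the remaining value, num1
pop edx                 ; edx = num1
cmp eax, edx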

to do your reg32-style conditional jumps...
Title: Re: FPU status word
Post by: dedndave on February 03, 2012, 05:28:43 PM
you must understand that the CPU has several instructions that are context-dependent
for the FPU, comparisons are always signed - e.g., there is only one context
Title: Re: FPU status word
Post by: jj2007 on February 03, 2012, 05:42:58 PM
Complete example, since we are in The Campus :bg

include \masm32\include\masm32rt.inc
.686
.data
num5 REAL8 123.5
num4 REAL8 123.4
num4e REAL8 123.4
num3 REAL8 123.3

.code
start: fld num4
fld num5
print "num5 is "
fcomi st, st(1)
.if Zero?
print "equal to num4", 13, 10
.elseif Carry?
print "lower than num4", 13, 10
.else
print "higher than num4", 13, 10
.endif
fstp st
fstp st

fld num4
fld num3
print "num3 is "
fcomi st, st(1)
.if Zero?
print "equal to num4", 13, 10
.elseif Carry?
print "lower than num4", 13, 10
.else
print "higher than num4", 13, 10
.endif
fstp st
fstp st

fld num4e
fld num4
print "num4 is "
fcomi st, st(1)
.if Zero?
print "equal to num4e", 13, 10
.elseif Carry?
print "lower than num4e", 13, 10
.else
print "higher than num4e", 13, 10
.endif
fstp st
fstp st

inkey " ", 13, 10
exit
end start


Watch out for precision problems - "equal" means that all 80 bits are equal. MasmBasic users may use the low, medium, high and top precision flag.

include \masm32\MasmBasic\MasmBasic.inc   ; download (http://www.masm32.com/board/index.php?topic=12460)
.data
num5   REAL8 123.5
num4   REAL8 123.4

   Init

   Fcmp num4, num5, low
   .if Carry?
      Print Str$("num4 at %f is lower than num5\n", num4)
   .elseif Zero?
      Print Str$("num4 at %f is equal to num5\n", num4)
   .else
      Print Str$("num4 at %f is higher than num5\n", num4)
   .endif

   Fcmp num4, num5, medium
   .if Carry?
      Print Str$("num4 at %f is lower than num5\n", num4)
   .elseif Zero?
      Print Str$("num4 at %f is equal to num5\n", num4)
   .else
      Print Str$("num4 at %f is higher than num5\n", num4)
   .endif

   Inkey
   Exit
end start

num4 at 123.4000 is equal to num5
num4 at 123.4000 is lower than num5
Title: Re: FPU status word
Post by: dedndave on February 03, 2012, 06:36:03 PM
Jochen's example brings something to mind....

comparing floating point values is really not a simple subject   :P
things like precision, rounding, and epsilon can come into play
it depends entirely on the application
if you are comparing values of currency, you will use different code than if you are calculating pixels to fill a circle  :P

on that note, i did a little googling and came across this article that you may find helpful

http://www.cprogramming.com/tutorial/floating_point/understanding_floating_point_representation.html
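
As a rough sketch of the epsilon idea, assuming the MASM32 SDK: two values are treated as equal when the absolute difference is no larger than a tolerance. The names val1, val2 and eps and their values are hypothetical, and the right tolerance depends entirely on the application:

```asm
include \masm32\include\masm32rt.inc
.686                            ; needed for FCOMIP

.data
val1 REAL8 123.4                ; sample values, illustration only
val2 REAL8 123.4000001
eps  REAL8 0.001                ; hypothetical tolerance

.code
start:
    fld   val1
    fsub  val2                  ; ST(0) = val1 - val2
    fabs                        ; ST(0) = |val1 - val2|
    fld   eps                   ; ST(0) = eps, ST(1) = |val1 - val2|
    fcomip st, st(1)            ; compare eps with the difference, pop eps
    fstp  st                    ; pop the difference (EFLAGS untouched)
    jae   close_enough          ; CF = 0: eps >= difference
    print "values differ", 13, 10
    jmp   finished
close_enough:
    print "values are equal within eps", 13, 10
finished:
    exit
end start
```

If either operand were a NaN, FCOMIP would set CF, so the jae branch is not taken and the values are reported as different.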
Title: Re: FPU status word
Post by: SteveAsm on February 03, 2012, 06:58:43 PM
Quote
There should only be a few, actually.

This is what I mean:
- bigger
- equal
- smaller

Not all the normal jump instructions work based on those three conditions.
My application, based on the conditions, uses all of these:
  je, jne,... jl, jnl,... jg, jng,... jle, jnle,... jge, jnge,... jz, jnz


Quote
comparing floating point values is really not a simple subject.
things like precision, rounding, and epsilon can come into play.
it depends entirely on the application.

Yes, as JJ has pointed out with his example, it is complex and in no way simple.
Title: Re: FPU status word
Post by: raymond on February 03, 2012, 07:58:36 PM
Quote
I don't want to appear antagonistic, Ray, but I did look at Dave's first link.

My apology if I sounded the wrong way. It was certainly not intended. I may just have assumed too much from your "Nowhere in my search results was that illustrated".

As for conditional jumps, the FPU always does signed comparisons but never modifies the SF sign flag; only the ZF zero flag and CF carry flag get modified. (The PF parity flag may also be modified but for a totally different reason than by the CPU.)

The jl and jg mnemonics (and their variants) which rely on the SF should therefore not be used after FPU comparisons. This still leaves a majority of the jxxx mnemonics relying only on the CF and ZF flags (or combinations) which can be used with FPU comparisons.
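
As a rough sketch of the above for the classic FCOM / FSTSW AX / SAHF route, assuming the MASM32 SDK and two made-up REAL8 values: SAHF copies C0 into CF, C2 into PF and C3 into ZF, so the unsigned-style jumps (ja, jae, jb, jbe, je, jne) plus jp are the ones that see the comparison result.

```asm
include \masm32\include\masm32rt.inc

.data
avalue REAL8 1.5                ; sample values, illustration only
bvalue REAL8 2.5

.code
start:
    fld   avalue                ; ST(0) = avalue
    fcomp bvalue                ; compare ST(0) with bvalue, then pop
    fstsw ax                    ; AH bit 0 = C0, bit 2 = C2, bit 6 = C3
    sahf                        ; CF = C0, PF = C2, ZF = C3
    jp    unordered             ; C2 set: at least one operand is a NaN
    ja    a_greater             ; CF = 0 and ZF = 0
    jb    a_lower               ; CF = 1
    print "avalue equals bvalue", 13, 10    ; only ZF was set
    jmp   finished
a_greater:
    print "avalue is greater than bvalue", 13, 10
    jmp   finished
a_lower:
    print "avalue is lower than bvalue", 13, 10
    jmp   finished
unordered:
    print "comparison was indeterminate", 13, 10
finished:
    exit
end start
```

FSTSW already includes the wait, so a separate FWAIT after it is not needed.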

Title: Re: FPU status word
Post by: SteveAsm on February 03, 2012, 09:47:00 PM
Quote from: raymond on February 03, 2012, 07:58:36 PM
My apology if I sounded the wrong way. It was certainly not intended.

No, please accept my apologies.
Sometimes when I get frustrated, I tend to google search with blinders on.
I should have explained myself better.

Quote
The jl and jg mnemonics (and their variants) which rely on the SF should therefore not be used after FPU comparisons.
This still leaves a majority of the jxxx mnemonics relying only on the CF and ZF flags (or combinations) which can be used with FPU comparisons.

Okay, I was quite focused on using the jl, jg and variants.
Now I see which ones to stay away from.
The reference materials I have don't explain which groups of jxxx instructions can and can't be used with FPU comparisons.
Thanks
Title: Re: FPU status word
Post by: qWord on February 03, 2012, 09:56:48 PM
Quote from: SteveAsm on February 03, 2012, 09:47:00 PM
The reference materials I have don't explain which groups of jxxx instructions can and can't be used with FPU comparisons.
you should use Intel's and AMD's documentation as reference:
Intel® 64 and IA-32 Architectures Software Developer Manuals (http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html)
AMD: Developer Guides & Manuals  (http://developer.amd.com/documentation/guides/Pages/default.aspx#manuals)
Title: Re: FPU status word
Post by: dedndave on February 04, 2012, 01:59:04 AM
from Ray's tutorial, ch 7...
The following example hard-codes the instruction for comparing ST(0) to ST(2).

db   0dbh, 0f0h+2  ;encoding for fcomi st,st(2)
                    ;when not supported by the assembler
fwait              ;ensure the instruction is completed
jpe  error_handler ;the comparison was indeterminate
                    ;this condition should be verified first
                    ;then only two of the next three conditional jumps
                    ;should become necessary, in whatever order is preferred,
                    ;the third one being replaced by code to handle that case
ja  st0_greater    ;when all flags are 0
jb  st0_lower      ;only the CF flag would be set if no error
jz  both_equal     ;only the ZF flag would be set if no error

Title: Re: FPU status word
Post by: jj2007 on February 07, 2012, 08:58:31 PM
Currently playing with a new Fcmp routine with top, high, medium and low precision. It seems to work but more tests needed :bg

Fcmp sets Sign and Zero flags. Approximate precision (Real10/8/4/x):
top=19 digits, high=15, medium=7, low=4; default = medium, 7 digits

The table below stands for a comparison of 996 ... 1004 against 1000. # means equal. Source attached, requires the MasmBasic library (http://www.masm32.com/board/index.php?topic=12460.0).

Ref     1234.56789012345678      tp 19 hi 15 me 7  lo 4  default
28      996.000000000000000      <  <  <  <  <  <  <  <  <
27      999.000000000000000      <  <  <  <  <  <  #  #  <
26      999.900000000000000      <  <  <  <  <  <  #  #  <
25      999.990000000000000      <  <  <  <  <  <  #  #  <
24      999.999000000000000      <  <  <  <  #  #  #  #  #
23      999.999900000000000      <  <  <  <  #  #  #  #  #
22      999.999999000000000      <  <  <  <  #  #  #  #  #
21      999.999999990000000      <  <  <  <  #  #  #  #  #
20      999.999999999900000      <  <  <  <  #  #  #  #  #
19      999.999999999990000      <  <  #  #  #  #  #  #  #
18      999.999999999999000      <  <  #  #  #  #  #  #  #
17      999.999999999999900      <  <  #  #  #  #  #  #  #
16      999.999999999999990      <  <  #  #  #  #  #  #  #
15      999.999999999999999      <  <  #  #  #  #  #  #  #
14      1000.000000000000000     #  #  #  #  #  #  #  #  #
13      1000.00000000000000      >  >  #  #  #  #  #  #  #
12      1000.00000000000001      >  >  #  #  #  #  #  #  #
11      1000.00000000000010      >  >  #  #  #  #  #  #  #
10      1000.00000000000100      >  >  #  #  #  #  #  #  #
9       1000.00000000001000      >  >  #  #  #  #  #  #  #
8       1000.00000000010000      >  >  >  >  #  #  #  #  #
7       1000.00000001000000      >  >  >  >  #  #  #  #  #
6       1000.00000100000000      >  >  >  >  #  #  #  #  #
5       1000.00010000000000      >  >  >  >  #  #  #  #  #
4       1000.00100000000000      >  >  >  >  #  #  #  #  #
3       1000.01000000000000      >  >  >  >  >  >  #  #  >
2       1000.10000000000000      >  >  >  >  >  >  #  #  >
1       1001.00000000000000      >  >  >  >  >  >  #  #  >
0       1004.00000000000000      >  >  >  >  >  >  >  >  >
Ref     1234.56789012345678      tp 19 hi 15 me 7  lo 4  default

Comparing PI, high precision:
MyPI_hi         at 3.14159265358980000 is exact
MyPIexact       at 3.14159265358979324 is exact
MyPI_low        at 3.14159265358978000 is exact

Comparing PI, top precision:
MyPI_hi         at 3.14159265358980000 is higher than the real PI
MyPIexact       at 3.14159265358979324 is exact
MyPI_low        at 3.14159265358978000 is lower than the real PI
Title: Floating point comparison: timings
Post by: jj2007 on February 09, 2012, 08:56:16 AM
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
5       cycles for 10*cmp       5 ms    for 10000000 comparisons
1307    cycles for 10*Fcmp      389 ms  for 10000000 comparisons
5       cycles for 10*cmp       5 ms    for 10000000 comparisons
1317    cycles for 10*Fcmp      392 ms  for 10000000 comparisons


      REPEAT 2
         Fcmp v1, v2   ; Real4 vs Real8
         nop
         Fcmp eax, v3   ; reg32 vs Real10
         nop
         Fcmp v4, ecx   ; QWord vs reg32
         nop
         Fcmp xmm0, v5   ; xmm vs REAL4
         nop
         Fcmp eax, xmm1   ; reg32 vs xmm
         nop
      ENDM
Title: Re: FPU status word
Post by: raymond on February 09, 2012, 08:09:03 PM
Quote
The FPU holds by default REAL10 values

Just noticed this in this thread. Let's clarify this a bit to prevent newbies from interpreting this wrongly.

FPU data registers are designed to hold REAL10 values, just as the CPU's general purpose registers are designed to hold 32-bit values. The actual value in any of the FPU's data registers depends on what is loaded into them and/or under what conditions they have been modified.

At least under Windows, the FPU's precision control is set to REAL8 at the opening of any program. If the program needs REAL10, it must change the precision control before performing any operation.

To be more precise, the statement should thus have been:

The FPU by default holds values in the REAL10 format, but those values are not necessarily in the REAL10 precision.
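
A minimal sketch of changing the precision control, assuming the MASM32 SDK. The precision control field sits in bits 8-9 of the FPU control word (00b = 24-bit, 10b = 53-bit, 11b = 64-bit significand); the variable names oldcw and newcw are hypothetical:

```asm
include \masm32\include\masm32rt.inc

.data?
oldcw dw ?                      ; control word found at entry
newcw dw ?                      ; modified copy

.code
start:
    fstcw oldcw                 ; save the current control word
    mov   ax, oldcw
    or    ax, 0300h             ; PC field = 11b: 64-bit significand (REAL10)
    mov   newcw, ax
    fldcw newcw                 ; switch to REAL10 precision

    ; ... computations that need REAL10 precision go here ...

    fldcw oldcw                 ; restore the original setting
    exit
end start
```

Saving and restoring the original control word keeps the change local, so other code that relies on the existing setting is not affected.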
Title: Re: FPU status word
Post by: jj2007 on February 09, 2012, 09:39:32 PM
That is an interesting point, Raymond. So accordingly, if the FPU is set to REAL4 accuracy, and you use fldpi, the FPU holds a REAL4 crippled value of PI instead of 3.1415926535897932380 aka 4000C90FDAA22168C235h?

If that is the case, then Olly seems to have a bug, because it claims that the FPU holds always the same REAL10 value, irrespective of the precision at the time of loading ::)

Of course, if you use the Fcmp macro to compare a REAL4 with a REAL8 (e.g. xmm0) variable, the accuracy of the comparison depends on the weaker partner.
Title: Re: FPU status word
Post by: dedndave on February 10, 2012, 01:25:42 AM
don't forget - FINIT sets it to real10   :P
Title: Re: FPU status word
Post by: raymond on February 10, 2012, 01:58:17 AM
That is not what I stated. If you load one of the hard coded constants (such as pi) from the FPU, it will get loaded with its full REAL10 value. If you immediately save that value without modification as a REAL10, it will be saved with its full REAL10 precision, regardless of the precision control. You are saving an image of the data register without any conversion.

However, if you compute the value of 1/3 with the precision control set to REAL8, the data register will contain a truncated value in REAL10 format. And, even if you save it as a REAL10, the saved value will still be that truncated value in REAL10 format.

Thus, if you compute something (apart from a multiple of 1/2) with the precision control set to REAL8 and save it as a REAL10,
then you compute the identical something with the precision control set to REAL10 and also save it as a REAL10,
then compare those two values with REAL10 precision control, they will NOT be identical. They will not even be identical with the precision control set to REAL8 if you load the saved REAL10 values for comparison.
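
The 1/3 case can be sketched as follows, assuming the MASM32 SDK. The control words 027Fh (53-bit precision) and 037Fh (64-bit precision, the value FINIT sets) both mask all exceptions and round to nearest; the variable names are hypothetical:

```asm
include \masm32\include\masm32rt.inc
.686                            ; needed for FCOMIP

.data
cw53  dw 027Fh                  ; precision control = REAL8 (53 bits)
cw64  dw 037Fh                  ; precision control = REAL10 (64 bits)
three dd 3

.data?
third53 REAL10 ?                ; 1/3 computed at REAL8 precision
third64 REAL10 ?                ; 1/3 computed at REAL10 precision

.code
start:
    fldcw cw53
    fld1
    fidiv three                 ; 1/3, rounded to 53 bits
    fstp  third53               ; saved in REAL10 format, value unchanged

    fldcw cw64
    fld1
    fidiv three                 ; 1/3, rounded to 64 bits
    fstp  third64

    fld   third53
    fld   third64               ; ST(0) = third64, ST(1) = third53
    fcomip st, st(1)            ; compare and pop third64
    fstp  st                    ; pop third53
    jne   differ                ; ZF = 0: the two results are not equal
    print "identical", 13, 10
    jmp   finished
differ:
    print "not identical", 13, 10
finished:
    exit
end start
```

Since 1/3 has no finite binary representation, rounding it to 53 bits and to 64 bits gives two different REAL10 images.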
Title: Re: FPU status word
Post by: jj2007 on February 10, 2012, 06:08:13 AM
So can we agree that if you load any two values with fld (or fild), the FPU holds them as REAL10, and if you save them to two locations in memory as REAL10, they can be compared correctly, showing eventually that 12345.6789 is not equal if one is REAL4 and the other is REAL8...?
Title: Re: FPU status word
Post by: raymond on February 10, 2012, 05:11:27 PM
The FPU loads and holds them in REAL10 format, but not necessarily in REAL10 precision. This may seem as playing on words but it is an important difference.

I do agree that if 12345.6789 was stored as a REAL4 value in memory (regardless of the precision at which it was generated) that it would be different from a 12345.6789 value generated as a REAL8 (or REAL10) before being stored in memory as a REAL8 (or REAL10).

If you load both of them, you can see with Ollydbg that they are different. And, if you now save both of them as REAL10 and look at the memory locations where they are stored (or print them with 15 significant digits), you would observe that they are slightly different.

HOWEVER, if you load the REAL4 value of 12345.6789 and store it as a REAL8 or REAL10 (thus not having been generated in such precision), it should not be any different than the REAL4 value when reloaded onto the FPU in REAL10 format.

Although the difference would be considerably smaller, values generated in REAL10 precision but stored as REAL8 and REAL10 would be different when reloaded to the FPU. The REAL8 would have lost 8 bits of precision (out of 64) and rounded up or down based on the most significant bit lost.
Title: Re: FPU status word
Post by: MichaelW on February 10, 2012, 05:34:54 PM
Quote from: raymond on February 10, 2012, 05:11:27 PM
Although the difference would be considerably smaller, values generated in REAL 10 precision but stored as REAL8 and REAL10 would be different when reloaded to the FPU. The REAL8 would have lost 8 bits of precision (out of 64) and rounded up or down based on the most significant bit lost.

Isn't that 64 - 53 bits of precision lost?



Title: Re: FPU status word
Post by: raymond on February 10, 2012, 05:58:03 PM
My bad. :red :eek :red You lose 11 bits of precision (out of 64).
That is indeed 64-53: REAL10 stores all 64 significand bits explicitly, while REAL8 has 53 significant bits (52 stored plus one implied).
Title: Re: FPU status word
Post by: jj2007 on February 10, 2012, 07:35:29 PM
Quote from: raymond on February 10, 2012, 05:11:27 PM
HOWEVER, if you load the REAL4 value of 12345.6789 and store it as a REAL8 or REAL10 (thus not having been generated in such precision), it should not be any different than the REAL4 value when reloaded onto the FPU in REAL10 format.

Raymond,
Thank you for your efforts to bring clarity in this tricky business. What I am claiming is that the FPU does not change what it gets when loading and storing a value, regardless of the precision set by the FPU control word. Below is a practical example - source attached, and sorry that it needs the MasmBasic version of today (I added the SetFpu macro).

Of course, if you perform operations (fadd, fmul, ...) with the loaded values, the result would depend on the precision. But that is not the case for the comparison algo.

Digits: 1234.567890123456789
MyR4=   1000.000000000000000
MyR8=   1000.00000000000011
MyR10=  999.999999999999999

64 bit precision, Fcmp 'top' :
MyR4 is lower than MyR8
MyR4 is higher than MyR10

53 bit precision:
MyR4 is lower than MyR8
MyR4 is higher than MyR10

24 bit precision:
MyR4 is lower than MyR8
MyR4 is higher than MyR10

64 bit precision, Fcmp 'high' :
MyR4 and MyR8 are equal
MyR4 and MyR10 are equal

53 bit precision:
MyR4 and MyR8 are equal
MyR4 and MyR10 are equal

24 bit precision:
MyR4 and MyR8 are equal
MyR4 and MyR10 are equal