FPU status word

Started by SteveAsm, February 02, 2012, 01:18:30 AM

jj2007

Quote from: SteveAsm on February 03, 2012, 04:44:04 PM
You are still limited to only a few branching options, as most of the x86 branch instructions are unusable.

There should only be a few, actually. The FPU holds by default REAL10 values, and when you compare two of them, there are three options:
- bigger
- equal
- smaller
All the rest makes sense only in a reg32 context (carry, unsigned etc). Correct me if I am wrong.
By the way, you can always use
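num1 REAL8 123.456
num2 REAL8 123.455
fld num1                ; ST(0)=num1
fld num2                ; ST(0)=num2, ST(1)=num1
push eax                ; reserve a dword on the stack
fistp dword ptr [esp]   ; store ST(0) there as a rounded integer, pop the FPU
pop eax                 ; eax = num2 as integer
push edx
fistp dword ptr [esp]
pop edx                 ; edx = num1 as integer
cmp eax, edx            ; ordinary signed integer compare

to do your reg32-style conditional jumps, as long as integer precision is enough: both values above round to 123 and compare as equal.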

dedndave

you must understand that the CPU has several instructions that are context-dependent
for the FPU, comparisons are always signed - e.g., there is only one context

jj2007

Complete example, since we are in The Campus :bg

include \masm32\include\masm32rt.inc
.686
.data
num5 REAL8 123.5
num4 REAL8 123.4
num4e REAL8 123.4
num3 REAL8 123.3

.code
start: fld num4
fld num5
print "num5 is "
fcomi st, st(1)
.if Zero?
print "equal to num4", 13, 10
.elseif Carry?
print "lower than num4", 13, 10
.else
print "higher than num4", 13, 10
.endif
fstp st
fstp st

fld num4
fld num3
print "num3 is "
fcomi st, st(1)
.if Zero?
print "equal to num4", 13, 10
.elseif Carry?
print "lower than num4", 13, 10
.else
print "higher than num4", 13, 10
.endif
fstp st
fstp st

fld num4e
fld num4
print "num4 is "
fcomi st, st(1)
.if Zero?
print "equal to num4e", 13, 10
.elseif Carry?
print "lower than num4e", 13, 10
.else
print "higher than num4e", 13, 10
.endif
fstp st
fstp st

inkey " ", 13, 10
exit
end start


Watch out for precision problems - "equal" means that all 80 bits are equal. MasmBasic users may use the low, medium, high and top precision flag.

include \masm32\MasmBasic\MasmBasic.inc   ; download
.data
num5   REAL8 123.5
num4   REAL8 123.4

   Init

   Fcmp num4, num5, low
   .if Carry?
      Print Str$("num4 at %f is lower than num5\n", num4)
   .elseif Zero?
      Print Str$("num4 at %f is equal to num5\n", num4)
   .else
      Print Str$("num4 at %f is higher than num5\n", num4)
   .endif

   Fcmp num4, num5, medium
   .if Carry?
      Print Str$("num4 at %f is lower than num5\n", num4)
   .elseif Zero?
      Print Str$("num4 at %f is equal to num5\n", num4)
   .else
      Print Str$("num4 at %f is higher than num5\n", num4)
   .endif

   Inkey
   Exit
end start

num4 at 123.4000 is equal to num5
num4 at 123.4000 is lower than num5

dedndave

Jochen's example brings something to mind....

comparing floating point values is really not a simple subject   :P
things like precision, rounding, and epsilon can come into play
it depends entirely on the application
if you are comparing values of currency, you will use different code than if you are calculating pixels to fill a circle  :P

on that note, i did a little google'ing and came across this article that you may find helpful

http://www.cprogramming.com/tutorial/floating_point/understanding_floating_point_representation.html
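
As a rough illustration of the epsilon idea mentioned above, here is a minimal C sketch. It is not taken from the linked article; the name `nearly_equal` and the choice of a relative tolerance are assumptions made for this example, and the right tolerance depends entirely on the application.

```c
#include <stdbool.h>

/* Hypothetical helper: absolute value without pulling in libm. */
double abs_d(double x) { return x < 0.0 ? -x : x; }

/* Returns true when a and b differ by no more than rel_tol times the
   larger of their magnitudes. NaN operands never compare as equal. */
bool nearly_equal(double a, double b, double rel_tol)
{
    double diff, scale;

    if (a == b)                 /* exact match, also covers equal infinities */
        return true;

    diff  = abs_d(a - b);
    scale = abs_d(a) > abs_d(b) ? abs_d(a) : abs_d(b);
    return diff <= rel_tol * scale;
}
```

For instance, `nearly_equal(0.1 + 0.2, 0.3, 1e-9)` is true even though `0.1 + 0.2 == 0.3` is false in double arithmetic.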

SteveAsm

Quote
There should only be a few, actually.

This is what I mean:
- bigger
- equal
- smaller

Not all the normal jump instructions work based on those three conditions.
My application, based on the conditions, uses all of these:
  je, jne,... jl, jnl,... jg, jng,... jle, jnle,... jge, jnge,... jz, jnz


Quote
comparing floating point values is really not a simple subject.
things like precision, rounding, and epsilon can come into play.
it depends entirely on the application.

Yes, as JJ has pointed out with his example, it is complex and in no way simple.

raymond

Quote
I don't want to appear antagonistic, Ray, but I did look at Dave's first link.

My apology if I sounded the wrong way. It was certainly not intended. I may just have assumed too much from your "Nowhere in my search results was that illustrated".

As for conditional jumps, the FPU always does signed comparisons but never modifies the SF sign flag; only the ZF zero flag and CF carry flag get modified. (The PF parity flag may also be modified but for a totally different reason than by the CPU.)

The jl and jg mnemonics (and their variants) which rely on the SF should therefore not be used after FPU comparisons. This still leaves a majority of the jxxx mnemonics relying only on the CF and ZF flags (or combinations) which can be used with FPU comparisons.
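
To make the point about SF concrete, here is a small C model of the flags. It is a simulation, not real FPU code: `sim_fcomi` and the `takes_*` helpers are names invented for this sketch, operands are assumed to be ordered (no NaN), and SF and OF are assumed to hold whatever earlier integer code left behind, as described above.

```c
#include <stdbool.h>

/* The EFLAGS bits of interest. SF and OF are present only to show that
   they carry no information about the FPU comparison. */
struct flags { bool zf, cf, sf, of; };

/* Models ZF and CF after comparing st0 with sti (ordered operands only).
   SF and OF are copied unchanged from `stale`. */
struct flags sim_fcomi(double st0, double sti, struct flags stale)
{
    struct flags f = stale;
    f.zf = (st0 == sti);
    f.cf = (st0 <  sti);
    return f;
}

/* Conditions tested by the jump mnemonics. */
bool takes_je (struct flags f) { return f.zf; }
bool takes_jb (struct flags f) { return f.cf; }
bool takes_ja (struct flags f) { return !f.cf && !f.zf; }
bool takes_jae(struct flags f) { return !f.cf; }
bool takes_jbe(struct flags f) { return f.cf || f.zf; }
bool takes_jl (struct flags f) { return f.sf != f.of; }          /* ignores CF */
bool takes_jg (struct flags f) { return !f.zf && f.sf == f.of; } /* ignores CF */
```

In this model `takes_jl` returns the same answer for 123.3 vs 123.4 as for 123.5 vs 123.4, because it only looks at the stale SF and OF. After an FPU comparison the unsigned-style mnemonics take the place of the signed ones: jb for jl, ja for jg, jbe for jle, jae for jge.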

When you assume something, you risk being wrong half the time
http://www.ray.masmcode.com

SteveAsm

Quote from: raymond on February 03, 2012, 07:58:36 PM
My apology if I sounded the wrong way. It was certainly not intended.

No, please accept my apologies.
Sometimes when I get frustrated, I tend to google search with blinders on.
I should have explained myself better.

Quote
The jl and jg mnemonics (and their variants) which rely on the SF should therefore not be used after FPU comparisons.
This still leaves a majority of the jxxx mnemonics relying only on the CF and ZF flags (or combinations) which can be used with FPU comparisons.

Okay, I was quite focused on using the jl, jg and variants.
Now I see which ones to stay away from.
The reference materials I have don't explain which groups of jxxx instructions can and can't be used with FPU comparisons.
Thanks

qWord

Quote from: SteveAsm on February 03, 2012, 09:47:00 PM
The reference materials I have don't explain which groups of jxxx instructions can and can't be used with FPU comparisons.

you should use Intel's and AMD's documentation as reference:
Intel® 64 and IA-32 Architectures Software Developer Manuals
AMD: Developer Guides & Manuals
FPU in a trice: SmplMath
It's that simple!

dedndave

from Ray's tutorial, ch 7...
The following example hard-codes the instruction for comparing ST(0) to ST(2).

db   0dbh, 0f0h+2  ;encoding for fcomi st,st(2)
                   ;when not supported by the assembler
fwait              ;ensure the instruction is completed
jpe  error_handler ;the comparison was indeterminate
                   ;this condition should be verified first
                   ;then only two of the next three conditional jumps
                   ;should become necessary, in whatever order is preferred,
                   ;the third one being replaced by code to handle that case
ja   st0_greater   ;when all flags are 0
jb   st0_lower     ;only the CF flag would be set if no error
jz   both_equal    ;only the ZF flag would be set if no error
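
Why the parity test has to come first can be shown with a small C model of the jump ladder. This is a sketch, not FPU code: `model_fcomi`, `dispatch` and `dispatch_without_parity` are hypothetical names, and the model relies on the documented FCOMI results (ZF, PF and CF all set when either operand is a NaN).

```c
#include <stdbool.h>

enum outcome { UNORDERED, ST0_GREATER, ST0_LOWER, BOTH_EQUAL };

struct fcomi_flags { bool zf, pf, cf; };

/* Models the ZF/PF/CF result of comparing st0 with sti. */
struct fcomi_flags model_fcomi(double st0, double sti)
{
    struct fcomi_flags f;
    if (st0 != st0 || sti != sti) {       /* only a NaN is unequal to itself */
        f.zf = true;  f.pf = true;  f.cf = true;
    } else {
        f.zf = (st0 == sti);
        f.pf = false;
        f.cf = (st0 < sti);
    }
    return f;
}

/* Same order as the ladder above: parity first, then ja, jb, jz. */
enum outcome dispatch(struct fcomi_flags f)
{
    if (f.pf)            return UNORDERED;    /* jpe error_handler */
    if (!f.cf && !f.zf)  return ST0_GREATER;  /* ja  st0_greater   */
    if (f.cf)            return ST0_LOWER;    /* jb  st0_lower     */
    return BOTH_EQUAL;                        /* jz  both_equal    */
}

/* The same ladder with the parity test left out, for contrast. */
enum outcome dispatch_without_parity(struct fcomi_flags f)
{
    if (!f.cf && !f.zf)  return ST0_GREATER;
    if (f.cf)            return ST0_LOWER;    /* also taken when unordered */
    return BOTH_EQUAL;
}
```

With a NaN operand, `dispatch` reports UNORDERED, whereas `dispatch_without_parity` silently reports ST0_LOWER because CF is set in the unordered case too.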


jj2007

Currently playing with a new Fcmp routine with top, high, medium and low precision. It seems to work, but more tests are needed :bg

Fcmp sets Sign and Zero flags. Approximate precision (Real10/8/4/x):
top=19 digits, high=15, medium=7, low=4; default = medium, 7 digits

The table below stands for a comparison of 996 ... 1004 against 1000. # means equal. Source attached, requires the MasmBasic library.

Ref     1234.56789012345678      tp 19 hi 15 me 7  lo 4  default
28      996.000000000000000      <  <  <  <  <  <  <  <  <
27      999.000000000000000      <  <  <  <  <  <  #  #  <
26      999.900000000000000      <  <  <  <  <  <  #  #  <
25      999.990000000000000      <  <  <  <  <  <  #  #  <
24      999.999000000000000      <  <  <  <  #  #  #  #  #
23      999.999900000000000      <  <  <  <  #  #  #  #  #
22      999.999999000000000      <  <  <  <  #  #  #  #  #
21      999.999999990000000      <  <  <  <  #  #  #  #  #
20      999.999999999900000      <  <  <  <  #  #  #  #  #
19      999.999999999990000      <  <  #  #  #  #  #  #  #
18      999.999999999999000      <  <  #  #  #  #  #  #  #
17      999.999999999999900      <  <  #  #  #  #  #  #  #
16      999.999999999999990      <  <  #  #  #  #  #  #  #
15      999.999999999999999      <  <  #  #  #  #  #  #  #
14      1000.000000000000000     #  #  #  #  #  #  #  #  #
13      1000.00000000000000      >  >  #  #  #  #  #  #  #
12      1000.00000000000001      >  >  #  #  #  #  #  #  #
11      1000.00000000000010      >  >  #  #  #  #  #  #  #
10      1000.00000000000100      >  >  #  #  #  #  #  #  #
9       1000.00000000001000      >  >  #  #  #  #  #  #  #
8       1000.00000000010000      >  >  >  >  #  #  #  #  #
7       1000.00000001000000      >  >  >  >  #  #  #  #  #
6       1000.00000100000000      >  >  >  >  #  #  #  #  #
5       1000.00010000000000      >  >  >  >  #  #  #  #  #
4       1000.00100000000000      >  >  >  >  #  #  #  #  #
3       1000.01000000000000      >  >  >  >  >  >  #  #  >
2       1000.10000000000000      >  >  >  >  >  >  #  #  >
1       1001.00000000000000      >  >  >  >  >  >  #  #  >
0       1004.00000000000000      >  >  >  >  >  >  >  >  >
Ref     1234.56789012345678      tp 19 hi 15 me 7  lo 4  default

Comparing PI, high precision:
MyPI_hi         at 3.14159265358980000 is exact
MyPIexact       at 3.14159265358979324 is exact
MyPI_low        at 3.14159265358978000 is exact

Comparing PI, top precision:
MyPI_hi         at 3.14159265358980000 is higher than the real PI
MyPIexact       at 3.14159265358979324 is exact
MyPI_low        at 3.14159265358978000 is lower than the real PI
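
The general idea of a three-way comparison with a selectable number of digits can be sketched in C as follows. This is NOT MasmBasic's Fcmp algorithm and will not reproduce the table above digit for digit; `compare_digits` is a hypothetical function that treats two values as equal when their relative difference is at most 10^-digits.

```c
/* Hypothetical three-way comparison at a given decimal precision.
   Returns -1 if a is lower, 0 if equal within tolerance, 1 if higher.
   NaN operands are not handled in this sketch. */
int compare_digits(double a, double b, int digits)
{
    double tol   = 1.0;
    double mag_a = a < 0.0 ? -a : a;
    double mag_b = b < 0.0 ? -b : b;
    double scale = mag_a > mag_b ? mag_a : mag_b;
    double diff  = a - b;
    int i;

    for (i = 0; i < digits; i++)
        tol /= 10.0;                  /* tol ends up as 10^-digits */

    if (diff < 0.0)
        diff = -diff;
    if (diff <= tol * scale)
        return 0;                     /* equal at this precision */
    return a < b ? -1 : 1;
}
```

With this sketch, `compare_digits(999.999, 1000.0, 4)` returns 0 (equal), while `compare_digits(999.999, 1000.0, 7)` returns -1 (lower).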

jj2007

Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
5       cycles for 10*cmp       5 ms    for 10000000 comparisons
1307    cycles for 10*Fcmp      389 ms  for 10000000 comparisons
5       cycles for 10*cmp       5 ms    for 10000000 comparisons
1317    cycles for 10*Fcmp      392 ms  for 10000000 comparisons


      REPEAT 2
         Fcmp v1, v2   ; Real4 vs Real8
         nop
         Fcmp eax, v3   ; reg32 vs Real10
         nop
         Fcmp v4, ecx   ; QWord vs reg32
         nop
         Fcmp xmm0, v5   ; xmm vs REAL4
         nop
         Fcmp eax, xmm1   ; reg32 vs xmm
         nop
      ENDM

raymond

Quote
The FPU holds by default REAL10 values

Just noticed this in this thread. Let's clarify this a bit to prevent newbies from interpreting this wrongly.

FPU data registers are designed to hold REAL10 values, just as the CPU's general-purpose registers are designed to hold 32-bit values. The actual value in any of the FPU's data registers depends on what is loaded into them and/or under what conditions they have been modified.

At least under Windows, the FPU's precision control is set to REAL8 at the opening of any program. If the program needs REAL10, it must change the precision control before performing any operation.

To be more precise, the statement should thus have been:

The FPU by default holds values in the REAL10 format, but those values are not necessarily in the REAL10 precision.

jj2007

That is an interesting point, Raymond. So accordingly, if the FPU is set to REAL4 accuracy, and you use fldpi, the FPU holds a REAL4 crippled value of PI instead of 3.1415926535897932380 aka 4000C90FDAA22168C235h?

If that is the case, then Olly seems to have a bug, because it claims that the FPU always holds the same REAL10 value, irrespective of the precision at the time of loading ::)

Of course, if you use the Fcmp macro to compare a REAL4 with a REAL8 (e.g. xmm0) variable, the accuracy of the comparison depends on the weaker partner.

dedndave

don't forget - FINIT sets it to real10   :P

raymond

That is not what I stated. If you load one of the hard coded constants (such as pi) from the FPU, it will get loaded with its full REAL10 value. If you immediately save that value without modification as a REAL10, it will be saved with its full REAL10 precision, regardless of the precision control. You are saving an image of the data register without any conversion.

However, if you compute the value of 1/3 with the precision control set to REAL8, the data register will contain a truncated value in REAL10 format. And, even if you save it as a REAL10, the saved value will still be that truncated value in REAL10 format.

Thus, if you compute something (apart from a multiple of 1/2) with the precision control set to REAL8 and save it as a REAL10,
then you compute the identical something with the precision control set to REAL10 and also save it as a REAL10,
then compare those two values with REAL10 precision control, they will NOT be identical. They will not even be identical with the precision control set to REAL8 if you load the saved REAL10 values for comparison.
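
The effect described above can be imitated in portable C by letting float play the role of the reduced precision setting and double the role of the full-width format. This is only an analogy under that assumption (it does not touch the x87 precision control), and the function names are made up for the example.

```c
/* Analogy only: float stands in for "precision control set to REAL8",
   double for "REAL10". The quotient is computed at float precision and
   then stored in the wider format. */
double divide_low_precision(float num, float den)
{
    volatile float q = num / den;   /* volatile forces rounding to float */
    return (double)q;               /* wider format, same truncated value */
}

/* The same quotient computed at full (double) precision. */
double divide_full_precision(double num, double den)
{
    return num / den;
}
```

Here `divide_low_precision(1.0f, 3.0f)` and `divide_full_precision(1.0, 3.0)` return different doubles, while dividing 1 by 4 gives exactly 0.25 both ways, which matches the exception for multiples of 1/2.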