FPU status word

jj2007 · February 03, 2012, 05:27:24 PM

Quote from: SteveAsm on February 03, 2012, 04:44:04 PM
You are still limited to only a few branching options, as most of the x86 branch instructions are unusable.

There should only be a few, actually. The FPU holds by default REAL10 values, and when you compare two of them, there are three options:
- bigger
- equal
- smaller
All the rest makes sense only in a reg32 context (carry, unsigned etc). Correct me if I am wrong.
By the way, you can always use

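num1	REAL8 123.456
num2	REAL8 123.455
	fld num1
	fld num2
	push eax			; reserve a DWORD slot on the stack
	fistp dword ptr [esp]		; store ST(0) (num2) there as a rounded integer
	pop eax				; eax = num2 as integer
	push edx
	fistp dword ptr [esp]		; ST(0) is now num1
	pop edx				; edx = num1 as integer
	cmp eax, edx			; integers only: the fractional part is lost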

to do your reg32-style conditional jumps...

dedndave · February 03, 2012, 05:28:43 PM

you must understand that the CPU has several instructions that are context-dependent
for the FPU, comparisons are always signed - e.g., there is only one context

jj2007 · February 03, 2012, 05:42:58 PM

Complete example, since we are in The Campus :bg

include \masm32\include\masm32rt.inc
.686
.data
num5	REAL8 123.5
num4	REAL8 123.4
num4e	REAL8 123.4
num3	REAL8 123.3

.code
start:	fld num4
	fld num5
	print "num5 is "
	fcomi st, st(1)
	.if Zero?
		print "equal to num4", 13, 10
	.elseif Carry?
		print "lower than num4", 13, 10
	.else
		print "higher than num4", 13, 10
	.endif
	fstp st
	fstp st

	fld num4
	fld num3
	print "num3 is "
	fcomi st, st(1)
	.if Zero?
		print "equal to num4", 13, 10
	.elseif Carry?
		print "lower than num4", 13, 10
	.else
		print "higher than num4", 13, 10
	.endif
	fstp st
	fstp st

	fld num4e
	fld num4
	print "num4 is "
	fcomi st, st(1)
	.if Zero?
		print "equal to num4e", 13, 10
	.elseif Carry?
		print "lower than num4e", 13, 10
	.else
		print "higher than num4e", 13, 10
	.endif
	fstp st
	fstp st

	inkey " ", 13, 10
	exit
end start

Watch out for precision problems - "equal" means that all 80 bits are equal. MasmBasic users may use the low, medium, high and top precision flag.

include \masm32\MasmBasic\MasmBasic.inc   ; download
.data
num5   REAL8 123.5
num4   REAL8 123.4

   Init

   Fcmp num4, num5, low
   .if Carry?
      Print Str$("num4 at %f is lower than num5\n", num4)
   .elseif Zero?
      Print Str$("num4 at %f is equal to num5\n", num4)
   .else
      Print Str$("num4 at %f is higher than num5\n", num4)
   .endif

   Fcmp num4, num5, medium
   .if Carry?
      Print Str$("num4 at %f is lower than num5\n", num4)
   .elseif Zero?
      Print Str$("num4 at %f is equal to num5\n", num4)
   .else
      Print Str$("num4 at %f is higher than num5\n", num4)
   .endif

   Inkey
   Exit
end start

Output:

num4 at 123.4000 is equal to num5
num4 at 123.4000 is lower than num5

dedndave · February 03, 2012, 06:36:03 PM

Jochen's example brings something to mind....

comparing floating point values is really not a simple subject :P
things like precision, rounding, and epsilon can come into play
it depends entirely on the application
if you are comparing values of currency, you will use different code than if you are calculating pixels to fill a circle :P

on that note, i did a little google'ing and came across this article that you may find helpful

http://www.cprogramming.com/tutorial/floating_point/understanding_floating_point_representation.html
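
As a rough illustration of the epsilon idea mentioned above, here is a minimal sketch of a complete MASM32 program that treats two REAL8 values as equal when they differ by less than a tolerance. The names val1, val2, three and eps are made up for this example, and the tolerance of 1.0e-9 is an arbitrary assumption; a real application would have to pick its own.

```asm
include \masm32\include\masm32rt.inc
.686
.data
val1	REAL8 0.3
val2	REAL8 0.1
three	REAL8 3.0
eps	REAL8 1.0e-9		; hypothetical tolerance, application-specific

.code
start:	fld val2
	fmul three		; ST(0) = 0.1 * 3, not exactly 0.3 in binary
	fsub val1		; ST(0) = 0.1 * 3 - 0.3
	fabs			; ST(0) = |difference|
	fld eps			; ST(0) = eps, ST(1) = |difference|
	fcomip st, st(1)	; compare eps with |difference|, pop eps
	fstp st			; pop |difference|; EFLAGS stay untouched
	jb differ		; CF=1: eps is below |difference| (or a NaN was involved)
	print "equal within eps", 13, 10
	jmp done
differ:	print "different", 13, 10
done:	inkey " ", 13, 10
	exit
end start
```

With these values the difference is tiny but not zero, so an exact fcomi test would report "not equal" while the tolerance test reports "equal within eps". For currency-style data a scaled integer comparison may be the better choice.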

SteveAsm · February 03, 2012, 06:58:43 PM

Quote
There should only be a few, actually.

This is what I mean:
- bigger
- equal
- smaller

Not all the normal jump instructions work based on those three conditions.
My application, based on the conditions, uses all of these:
je, jne,... jl, jnl,... jg, jng,... jle, jnle,... jge, jnge,... jz, jnz

Quote
comparing floating point values is really not a simple subject.
things like precision, rounding, and epsilon can come into play.
it depends entirely on the application.

Yes, as JJ has pointed out with his example, it is complex and in no way simple.

raymond · February 03, 2012, 07:58:36 PM

Quote
I don't want to appear antagonistic, Ray, but I did look at Dave's first link.

My apology if I sounded the wrong way. It was certainly not intended. I may just have assumed too much from your "Nowhere in my search results was that illustrated".

As for conditional jumps, the FPU always does signed comparisons but never modifies the SF sign flag; only the ZF zero flag and CF carry flag get modified. (The PF parity flag may also be modified but for a totally different reason than by the CPU.)

The jl and jg mnemonics (and their variants) which rely on the SF should therefore not be used after FPU comparisons. This still leaves a majority of the jxxx mnemonics relying only on the CF and ZF flags (or combinations) which can be used with FPU comparisons.
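
To make the point above concrete, here is a minimal sketch (not taken from the tutorial) of a complete MASM32 program that branches after fcomip using only jp, jb and ja. The variable names a8 and b8 and all labels are made up for illustration. One operand is negative on purpose, to show that the comparison itself is signed even though the jumps are the ones usually described as "unsigned".

```asm
include \masm32\include\masm32rt.inc
.686
.data
a8	REAL8 -2.5
b8	REAL8 1.5

.code
start:	fld b8			; ST(0) = b8
	fld a8			; ST(0) = a8, ST(1) = b8
	fcomip st, st(1)	; compare a8 with b8, set ZF/PF/CF, pop a8
	fstp st			; pop b8; the FPU stack is empty, EFLAGS untouched
	jp unordered		; PF=1: at least one operand was a NaN
	jb a_lower		; where integer code would use jl / jnge
	ja a_higher		; where integer code would use jg / jnle
	print "a8 is equal to b8", 13, 10	; only ZF=1 is left
	jmp done
a_lower:
	print "a8 is lower than b8", 13, 10
	jmp done
a_higher:
	print "a8 is higher than b8", 13, 10
	jmp done
unordered:
	print "comparison was indeterminate", 13, 10
done:	inkey " ", 13, 10
	exit
end start
```

This should print "a8 is lower than b8". As a rule of thumb, jl/jnge become jb/jnae, jle/jng become jbe/jna, jg/jnle become ja/jnbe and jge/jnl become jae/jnb, while je/jne/jz/jnz stay as they are. Test jp first: an indeterminate comparison sets ZF, PF and CF together, so jb, jbe and je would all be taken for a NaN.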

SteveAsm · February 03, 2012, 09:47:00 PM

Quote from: raymond on February 03, 2012, 07:58:36 PM
My apology if I sounded the wrong way. It was certainly not intended.

No, please accept my apologies.
Sometimes when I get frustrated, I tend to google search with blinders on.
I should have explained myself better.

Quote
The jl and jg mnemonics (and their variants) which rely on the SF should therefore not be used after FPU comparisons.
This still leaves a majority of the jxxx mnemonics relying only on the CF and ZF flags (or combinations) which can be used with FPU comparisons.

Okay, I was quite focused on using the jl, jg and variants.
Now I see which ones to stay away from.
The reference materials I have don't explain which groups of jxxx instructions can and can't be used with FPU comparisons.
Thanks

qWord · February 03, 2012, 09:56:48 PM

Quote from: SteveAsm on February 03, 2012, 09:47:00 PM
The reference materials I have don't explain which groups of jxxx instructions can and can't be used with FPU comparisons.

you should use Intel's and AMD's documentation as reference:
Intel® 64 and IA-32 Architectures Software Developer Manuals
AMD: Developer Guides & Manuals

dedndave · February 04, 2012, 01:59:04 AM

from Ray's tutorial, ch 7...

The following example hard-codes the instruction for comparing ST(0) to ST(2). 

 db   0dbh, 0f0h+2  ;encoding for fcomi st,st(2)
                    ;when not supported by the assembler
 fwait              ;insure the instruction is completed
 jpe  error_handler ;the comparison was indeterminate
                    ;this condition should be verified first
                    ;then only two of the next three conditional jumps
                    ;should become necessary, in whatever order is preferred,
                    ;the third one being replaced by code to handle that case
 ja  st0_greater    ;when all flags are 0
 jb  st0_lower      ;only the CF flag would be set if no error
 jz  both_equal     ;only the ZF flag would be set if no error

jj2007 · February 07, 2012, 08:58:31 PM

Currently playing with a new Fcmp routine with top, high, medium and low precision. It seems to work ~~but more tests needed~~ :bg

Fcmp sets Sign and Zero flags. Approximate precision (Real10/8/4/x):
top=19 digits, high=15, medium=7, low=4; default = medium, 7 digits

The table below stands for a comparison of 996 ... 1004 against 1000. # means equal. Source attached, requires the MasmBasic library.

Ref     1234.56789012345678      tp 19 hi 15 me 7  lo 4  default
28      996.000000000000000      <  <  <  <  <  <  <  <  <
27      999.000000000000000      <  <  <  <  <  <  #  #  <
26      999.900000000000000      <  <  <  <  <  <  #  #  <
25      999.990000000000000      <  <  <  <  <  <  #  #  <
24      999.999000000000000      <  <  <  <  #  #  #  #  #
23      999.999900000000000      <  <  <  <  #  #  #  #  #
22      999.999999000000000      <  <  <  <  #  #  #  #  #
21      999.999999990000000      <  <  <  <  #  #  #  #  #
20      999.999999999900000      <  <  <  <  #  #  #  #  #
19      999.999999999990000      <  <  #  #  #  #  #  #  #
18      999.999999999999000      <  <  #  #  #  #  #  #  #
17      999.999999999999900      <  <  #  #  #  #  #  #  #
16      999.999999999999990      <  <  #  #  #  #  #  #  #
15      999.999999999999999      <  <  #  #  #  #  #  #  #
14      1000.000000000000000     #  #  #  #  #  #  #  #  #
13      1000.00000000000000      >  >  #  #  #  #  #  #  #
12      1000.00000000000001      >  >  #  #  #  #  #  #  #
11      1000.00000000000010      >  >  #  #  #  #  #  #  #
10      1000.00000000000100      >  >  #  #  #  #  #  #  #
9       1000.00000000001000      >  >  #  #  #  #  #  #  #
8       1000.00000000010000      >  >  >  >  #  #  #  #  #
7       1000.00000001000000      >  >  >  >  #  #  #  #  #
6       1000.00000100000000      >  >  >  >  #  #  #  #  #
5       1000.00010000000000      >  >  >  >  #  #  #  #  #
4       1000.00100000000000      >  >  >  >  #  #  #  #  #
3       1000.01000000000000      >  >  >  >  >  >  #  #  >
2       1000.10000000000000      >  >  >  >  >  >  #  #  >
1       1001.00000000000000      >  >  >  >  >  >  #  #  >
0       1004.00000000000000      >  >  >  >  >  >  >  >  >
Ref     1234.56789012345678      tp 19 hi 15 me 7  lo 4  default

Comparing PI, high precision:
MyPI_hi         at 3.14159265358980000 is exact
MyPIexact       at 3.14159265358979324 is exact
MyPI_low        at 3.14159265358978000 is exact

Comparing PI, top precision:
MyPI_hi         at 3.14159265358980000 is higher than the real PI
MyPIexact       at 3.14159265358979324 is exact
MyPI_low        at 3.14159265358978000 is lower than the real PI

jj2007 · February 09, 2012, 08:56:16 AM

Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
5       cycles for 10*cmp       5 ms    for 10000000 comparisons
1307    cycles for 10*Fcmp      389 ms  for 10000000 comparisons
5       cycles for 10*cmp       5 ms    for 10000000 comparisons
1317    cycles for 10*Fcmp      392 ms  for 10000000 comparisons

      REPEAT 2
         Fcmp v1, v2   ; Real4 vs Real8
         nop
         Fcmp eax, v3   ; reg32 vs Real10
         nop
         Fcmp v4, ecx   ; QWord vs reg32
         nop
         Fcmp xmm0, v5   ; xmm vs REAL4
         nop
         Fcmp eax, xmm1   ; reg32 vs xmm
         nop
      ENDM

raymond · February 09, 2012, 08:09:03 PM

Quote
The FPU holds by default REAL10 values

Just noticed this in this thread. Let's clarify this a bit to prevent newbies from interpreting this wrongly.

FPU data registers are designed to hold REAL10 values, just as the CPU's general-purpose registers are designed to hold 32-bit values. The actual value in any of the FPU's data registers depends on what is loaded into them and/or under what conditions they have been modified.

At least under Windows, the FPU's precision control is set to REAL8 at the opening of any program. If the program needs REAL10, it must change the precision control before performing any operation.
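
Rather than take the startup setting on trust, it can be checked. The following is a minimal sketch of a MASM32 program that reads the precision control field (bits 8 and 9 of the FPU control word), reports it, and then switches to REAL10 precision. The variable name cw is made up for this example; what the first part prints depends on the OS and on any startup code, so no particular result is claimed here.

```asm
include \masm32\include\masm32rt.inc
.data?
cw	dw ?

.code
start:	fstcw cw		; store the current FPU control word
	movzx eax, cw
	shr eax, 8
	and eax, 3		; precision control field = bits 8-9
	.if eax == 3
		print "precision control: REAL10 (64-bit mantissa)", 13, 10
	.elseif eax == 2
		print "precision control: REAL8 (53-bit mantissa)", 13, 10
	.elseif eax == 0
		print "precision control: REAL4 (24-bit mantissa)", 13, 10
	.else
		print "precision control: reserved value", 13, 10
	.endif

	or cw, 0300h		; set both bits: request REAL10 precision
	fldcw cw		; load the modified control word
	print "precision control is now REAL10", 13, 10

	inkey " ", 13, 10
	exit
end start
```

Only the two precision bits are changed, so rounding mode and exception masks stay as they were.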

To be more precise, the statement should thus have been:

The FPU by default holds values in the REAL10 format, but those values are not necessarily in the REAL10 precision.

jj2007 · February 09, 2012, 09:39:32 PM

That is an interesting point, Raymond. So accordingly, if the FPU is set to REAL4 accuracy, and you use fldpi, the FPU holds a REAL4 crippled value of PI instead of 3.1415926535897932380 aka 4000C90FDAA22168C235h?

If that is the case, then Olly seems to have a bug, because it claims that the FPU always holds the same REAL10 value, irrespective of the precision at the time of loading ::)

Of course, if you use the Fcmp macro to compare a REAL4 with a REAL8 (e.g. xmm0) variable, the accuracy of the comparison depends on the weaker partner.

dedndave · February 10, 2012, 01:25:42 AM

don't forget - FINIT sets it to real10 :P

raymond · February 10, 2012, 01:58:17 AM

That is not what I stated. If you load one of the hard coded constants (such as pi) from the FPU, it will get loaded with its full REAL10 value. If you immediately save that value without modification as a REAL10, it will be saved with its full REAL10 precision, regardless of the precision control. You are saving an image of the data register without any conversion.

However, if you compute the value of 1/3 with the precision control set to REAL8, the data register will contain a truncated value in REAL10 format. And, even if you save it as a REAL10, the saved value will still be that truncated value in REAL10 format.

Thus, if you compute something (apart from a multiple of 1/2) with the precision control set to REAL8 and save it as a REAL10,
then you compute the identical something with the precision control set to REAL10 and also save it as a REAL10,
then compare those two values with REAL10 precision control, they will NOT be identical. They will not even be identical with the precision control set to REAL8 if you load the saved REAL10 values for comparison.
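
The experiment described above can be sketched as a small MASM32 program: compute 1/3 once with the precision control set to REAL8 and once set to REAL10, save both results as REAL10, then reload and compare them. The names cw53, cw64, third53 and third64 are made up for this illustration.

```asm
include \masm32\include\masm32rt.inc
.686
.data?
cw53	dw ?
cw64	dw ?
third53	REAL10 ?
third64	REAL10 ?

.code
start:	fstcw cw64		; start from the current control word
	or cw64, 0300h		; PC = 11b: REAL10 precision
	mov ax, cw64
	and ax, 0FEFFh		; clear bit 8, PC = 10b: REAL8 precision
	mov cw53, ax

	push 3			; integer divisor on the stack
	fldcw cw53
	fld1
	fidiv dword ptr [esp]	; 1/3 rounded to a 53-bit mantissa
	fstp third53		; saved as REAL10, but still truncated

	fldcw cw64
	fld1
	fidiv dword ptr [esp]	; 1/3 rounded to a 64-bit mantissa
	fstp third64
	pop eax			; release the divisor

	fld third53		; loading a REAL10 is exact
	fld third64
	fcomip st, st(1)	; compare the two saved results
	fstp st
	.if Zero?
		print "the two results are identical", 13, 10
	.else
		print "the two results are NOT identical", 13, 10
	.endif

	inkey " ", 13, 10
	exit
end start
```

This should report that the results are not identical: the value computed under REAL8 precision has its lowest mantissa bits cleared, and saving it as a REAL10 does not bring them back.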
