Hey guys,
I've been doing some experimenting, trying to determine which jump instructions work (successfully) after storing the FPU status word in the AX register.
As in:
FLD avalue
FLD bvalue
FCOM
FSTSW AX
FWAIT
SAHF
; <------- some jump instruction here -----> jne, jnl, jng, jnle, jz, je, jl, jg, jle, jge, jnz, ...etc.
Testing leads to a lot of hair pulling: some things appear to work under some circumstances, but then don't under others.
I've searched the forum archives and done some googling, but I'm not finding much information.
Any ideas or pointers would be appreciated.
Thanks,
steve
2 things will come in handy
1) Ray's FPU tutorial
http://www.ray.masmcode.com/fpu.html
specifically...
http://www.ray.masmcode.com/tutorial/fpuchap7.htm
2) a table that shows which flags the conditional branch instructions test
http://www.arl.wustl.edu/~lockwood/class/cs306/books/artofasm/Chapter_6/CH06-5.html#HEADING5-226
you may better use fcomi[p], which directly set the flags -> JA/JB/...
Quote from: qWord on February 02, 2012, 08:23:12 AM
you may better use fcomi[p], which directly set the flags -> JA/JB/...
Hey qWord,
thanks for pointing that (FCOMI) out.
I didn't catch that in my searches the first time around and my reference book does not include it.
It looks like that could solve the problem,
IF most processors in use include it.
I'm not so concerned with the newer ones.
It's the older desktops, laptops and notebooks I wonder about.
FCOMI was introduced with Intel's P6 (1995) - don't worry about availability :bg
Okay..., this is really odd.
I get these similar errors from JWasm and Masm:
JWasm :: Test2.asm(111) : Error A2049: Invalid instruction operands.
Masm :: Test2.asm(111) : error A2085: instruction or register not accepted in current CPU mode.
from this code segment:
FILD aa
FILD cc
FCOMI
jng Label
Line #111 is the FCOMI instruction.
I have tried:
FCOMI ST(1)
FCOMI ST(0), ST(1)
Any ideas what this is all about ?
Quote from: SteveAsm on February 02, 2012, 10:09:13 PM
Masm :: Test2.asm(111) : error A2085: instruction or register not accepted in current CPU mode.
include \masm32\include\masm32rt.inc
.686
also, FILD will only work with words, dwords, or qwords
you may need to use a size override operator
SomeStuff db 0,0,0,0,0,0,0,80h
;
;
FILD qword ptr SomeStuff
Quote from: dedndave on February 02, 2012, 10:21:33 PM
also, FILD will only work with words, dwords, or qwords
FILD should only work with words, dwords, or qwords
include \masm32\include\masm32rt.inc
.code
SomeStuff dq 1234567
start:
FLD qword ptr SomeStuff ; try FILD
push eax
fistp dword ptr [esp]
pop eax
print str$(eax)
exit
end start
Of course, the result is gibberish :bg
Quote from: jj2007 on February 02, 2012, 10:12:59 PM
include \masm32\include\masm32rt.inc
.686
Aha!
.686 - this is the key.
Nowhere in my search results was that illustrated.
I had tried everything but .686 .
Thanks for that JJ,
Thanks once again guys.
Quote: .686 - this is the key.
Nowhere in my search results was that illustrated.
If you had looked at the first link given to you by dedndave, you would have found this:
Quote: Note: This instruction is valid only for the Pentium Pro and subsequent processors. It may not be supported by some assemblers (for MASM, the .686 directive must be used).
Quote from: raymond on February 03, 2012, 01:35:18 AM
If you had looked at the first link given to you by dedndave, you would have found this:
Quote: Note: This instruction is valid only for the Pentium Pro and subsequent processors. It may not be supported by some assemblers (for MASM, the .686 directive must be used).
I don't want to appear antagonistic, Ray, but I did look at Dave's first link.
There is a lot of information there, and I simply missed the Note.
After spending the entire afternoon and evening playing with FCOMI, it doesn't appear that it solves any problems.
It eliminates a few steps, but, the result is still the same.
You are still limited to only a few branching options, as most of the x86 branch instructions are unusable.
Quote from: SteveAsm on February 03, 2012, 04:44:04 PM
You are still limited to only a few branching options, as most of the x86 branch instructions are unusable.
Which instructions are unusable?
Quote from: SteveAsm on February 03, 2012, 04:44:04 PM
You are still limited to only a few branching options, as most of the x86 branch instructions are unusable.
There should only be a few, actually. The FPU holds by default REAL10 values, and when you compare two of them, there are three options:
- bigger
- equal
- smaller
All the rest makes sense only in a reg32 context (carry, unsigned etc). Correct me if I am wrong.
By the way, you can always use
num1 REAL8 123.456
num2 REAL8 123.455
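fld num1
fld num2 ; ST(0) = num2, ST(1) = num1
push eax ; reserve a dword on the stack
fistp dword ptr [esp] ; store num2 as a rounded integer and pop it
pop eax ; eax = int(num2)
push edx
fistp dword ptr [esp] ; store num1 as a rounded integer and pop it
pop edx ; edx = int(num1)
cmp eax, edx ; integer compare, so jl, jg etc. are valid here
to do your reg32-style conditional jumps...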
you must understand that the CPU has several instructions that are context-dependent
for the FPU, comparisons are always signed - i.e., there is only one context
Complete example, since we are in The Campus :bg
include \masm32\include\masm32rt.inc
.686
.data
num5 REAL8 123.5
num4 REAL8 123.4
num4e REAL8 123.4
num3 REAL8 123.3
.code
start: fld num4
fld num5
print "num5 is "
fcomi st, st(1)
.if Zero?
print "equal to num4", 13, 10
.elseif Carry?
print "lower than num4", 13, 10
.else
print "higher than num4", 13, 10
.endif
fstp st
fstp st
fld num4
fld num3
print "num3 is "
fcomi st, st(1)
.if Zero?
print "equal to num4", 13, 10
.elseif Carry?
print "lower than num4", 13, 10
.else
print "higher than num4", 13, 10
.endif
fstp st
fstp st
fld num4e
fld num4
print "num4 is "
fcomi st, st(1)
.if Zero?
print "equal to num4e", 13, 10
.elseif Carry?
print "lower than num4e", 13, 10
.else
print "higher than num4e", 13, 10
.endif
fstp st
fstp st
inkey " ", 13, 10
exit
end start
Watch out for precision problems - "equal" means that all 80 bits are equal. MasmBasic users may use the low, medium, high and top precision flag.
include \masm32\MasmBasic\MasmBasic.inc ; download (http://www.masm32.com/board/index.php?topic=12460)
.data
num5 REAL8 123.5
num4 REAL8 123.4
Init
Fcmp num4, num5, low
.if Carry?
Print Str$("num4 at %f is lower than num5\n", num4)
.elseif Zero?
Print Str$("num4 at %f is equal to num5\n", num4)
.else
Print Str$("num4 at %f is higher than num5\n", num4)
.endif
Fcmp num4, num5, medium
.if Carry?
Print Str$("num4 at %f is lower than num5\n", num4)
.elseif Zero?
Print Str$("num4 at %f is equal to num5\n", num4)
.else
Print Str$("num4 at %f is higher than num5\n", num4)
.endif
Inkey
Exit
end start
num4 at 123.4000 is equal to num5
num4 at 123.4000 is lower than num5
Jochen's example brings something to mind....
comparing floating point values is really not a simple subject :P
things like precision, rounding, and epsilon can come into play
it depends entirely on the application
if you are comparing values of currency, you will use different code than if you are calculating pixels to fill a circle :P
on that note, i did a little google'ing and came across this article that you may find helpful
http://www.cprogramming.com/tutorial/floating_point/understanding_floating_point_representation.html
Quote: There should only be a few, actually.
This is what I mean:
- bigger
- equal
- smaller
Not all the normal jump instructions work based on those three conditions.
My application, based on the conditions, uses all of these:
je, jne,... jl, jnl,... jg, jng,... jle, jnle,... jge, jnge,... jz, jnz
Quote: comparing floating point values is really not a simple subject.
things like precision, rounding, and epsilon can come into play.
it depends entirely on the application.
Yes, as JJ has pointed out with his example, it is complex and in no way simple.
Quote: I don't want to appear antagonistic, Ray, but I did look at Dave's first link.
My apology if I sounded the wrong way. It was certainly not intended. I may just have assumed too much from your "Nowhere in my search results was that illustrated".
As for conditional jumps, the FPU always does signed comparisons but never modifies the SF sign flag; only the ZF zero flag and CF carry flag get modified. (The PF parity flag may also be modified but for a totally different reason than by the CPU.)
The jl and jg mnemonics (and their variants) which rely on the SF should therefore not be used after FPU comparisons. This still leaves a majority of the jxxx mnemonics relying only on the CF and ZF flags (or combinations) which can be used with FPU comparisons.
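A minimal sketch of the rule above, assuming the MASM32 SDK (masm32rt.inc with its print and exit macros) as used elsewhere in this thread. The names valA and valB and all labels are made up for illustration. Because FCOMI[P] only sets ZF, PF and CF, the "unsigned" jumps (ja, jb, je) take the place of jg and jl, and jp is checked first to catch the unordered (NaN) case:

```asm
include \masm32\include\masm32rt.inc
.686                            ; needed for FCOMI/FCOMIP

.data
valA REAL8 1.5                  ; hypothetical test values
valB REAL8 2.5

.code
start:
    fld valA                    ; ST(0) = valA
    fld valB                    ; ST(0) = valB, ST(1) = valA
    fcomip st, st(1)            ; compare ST(0) with ST(1), set ZF/PF/CF, pop valB
    fstp st                     ; pop valA (FSTP leaves EFLAGS untouched)
    jp  is_unordered            ; PF=1: at least one operand was a NaN
    ja  is_higher               ; CF=0 and ZF=0: valB > valA
    jb  is_lower                ; CF=1: valB < valA
    print "valB is equal to valA", 13, 10
    jmp all_done
is_higher:
    print "valB is higher than valA", 13, 10
    jmp all_done
is_lower:
    print "valB is lower than valA", 13, 10
    jmp all_done
is_unordered:
    print "comparison was unordered", 13, 10
all_done:
    exit
end start
```

With the values shown, the ja branch is the one taken. Swapping ja for jg would test SF and OF, which the comparison does not set, and that is why results looked inconsistent.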
Quote from: raymond on February 03, 2012, 07:58:36 PM
My apology if I sounded the wrong way. It was certainly not intended.
No, please accept my apologies.
Sometimes when I get frustrated, I tend to google search with blinders on.
I should have explained myself better.
Quote
The jl and jg mnemonics (and their variants) which rely on the SF should therefore not be used after FPU comparisons.
This still leaves a majority of the jxxx mnemonics relying only on the CF and ZF flags (or combinations) which can be used with FPU comparisons.
Okay, I was quite focused on using the jl, jg and variants.
Now I see which ones to stay away from.
The reference materials I have don't explain which groups of jxxx instructions can and can't be used with FPU comparisons.
Thanks
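For the FSTSW AX / SAHF route from the first post, the same grouping applies. The sketch below again assumes the MASM32 SDK, reuses avalue and bvalue from the first post, and uses invented labels. SAHF copies C0, C2 and C3 from AH into CF, PF and ZF, so each signed jump has an unsigned counterpart that reads the right flags:

```asm
include \masm32\include\masm32rt.inc

; After FCOM.. / FSTSW AX / SAHF:   C3 -> ZF,  C2 -> PF,  C0 -> CF
;   instead of   use          meaning for ST(0) versus the operand
;   jl  / jnge   jb  / jnae   lower
;   jg  / jnle   ja  / jnbe   higher
;   jle / jng    jbe / jna    lower or equal
;   jge / jnl    jae / jnb    higher or equal
;   je  / jz     unchanged    equal
;   jne / jnz    unchanged    not equal
;   jp                        unordered (NaN), test it first

.data
avalue REAL8 1.5                ; hypothetical test values
bvalue REAL8 2.5

.code
start:
    fld avalue
    fld bvalue                  ; ST(0) = bvalue, ST(1) = avalue
    fcompp                      ; compare ST(0) with ST(1), pop both
    fstsw ax                    ; status word -> AX (FSTSW already waits)
    sahf                        ; AH -> ZF, PF, CF
    jp  is_unordered
    jae not_lower               ; takes the place of jge / jnl
    print "bvalue is lower than avalue", 13, 10
    jmp all_done
not_lower:
    print "bvalue is higher than or equal to avalue", 13, 10
    jmp all_done
is_unordered:
    print "comparison was unordered", 13, 10
all_done:
    exit
end start
```

The separate FWAIT in the original sequence should not be needed, since FSTSW (unlike FNSTSW) includes the wait.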
Quote from: SteveAsm on February 03, 2012, 09:47:00 PM
The reference materials I have don't explain which groups of jxxx instructions can and can't be used with FPU comparisons.
you should use Intel's and AMD's documentation as reference:
Intel® 64 and IA-32 Architectures Software Developer Manuals (http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html)
AMD: Developer Guides & Manuals (http://developer.amd.com/documentation/guides/Pages/default.aspx#manuals)
from Ray's tutorial, ch 7...
The following example hard-codes the instruction for comparing ST(0) to ST(2).
db 0dbh, 0f0h+2 ;encoding for fcomi st,st(2)
;when not supported by the assembler
fwait ;insure the instruction is completed
jpe error_handler ;the comparison was indeterminate
;this condition should be verified first
;then only two of the next three conditional jumps
;should become necessary, in whatever order is preferred,
;the third one being replaced by code to handle that case
ja st0_greater ;when all flags are 0
jb st0_lower ;only the CF flag would be set if no error
jz both_equal ;only the ZF flag would be set if no error
Currently playing with a new Fcmp routine with top, high, medium and low precision. It seems to work but more tests needed :bg
Fcmp sets Sign and Zero flags. Approximate precision (Real10/8/4/x):
top=19 digits, high=15, medium=7, low=4; default = medium, 7 digits
The table below stands for a comparison of 996 ... 1004 against 1000. # means equal. Source attached, requires the MasmBasic library (http://www.masm32.com/board/index.php?topic=12460.0).
Ref 1234.56789012345678 tp 19 hi 15 me 7 lo 4 default
28 996.000000000000000 < < < < < < < < <
27 999.000000000000000 < < < < < < # # <
26 999.900000000000000 < < < < < < # # <
25 999.990000000000000 < < < < < < # # <
24 999.999000000000000 < < < < # # # # #
23 999.999900000000000 < < < < # # # # #
22 999.999999000000000 < < < < # # # # #
21 999.999999990000000 < < < < # # # # #
20 999.999999999900000 < < < < # # # # #
19 999.999999999990000 < < # # # # # # #
18 999.999999999999000 < < # # # # # # #
17 999.999999999999900 < < # # # # # # #
16 999.999999999999990 < < # # # # # # #
15 999.999999999999999 < < # # # # # # #
14 1000.000000000000000 # # # # # # # # #
13 1000.00000000000000 > > # # # # # # #
12 1000.00000000000001 > > # # # # # # #
11 1000.00000000000010 > > # # # # # # #
10 1000.00000000000100 > > # # # # # # #
9 1000.00000000001000 > > # # # # # # #
8 1000.00000000010000 > > > > # # # # #
7 1000.00000001000000 > > > > # # # # #
6 1000.00000100000000 > > > > # # # # #
5 1000.00010000000000 > > > > # # # # #
4 1000.00100000000000 > > > > # # # # #
3 1000.01000000000000 > > > > > > # # >
2 1000.10000000000000 > > > > > > # # >
1 1001.00000000000000 > > > > > > # # >
0 1004.00000000000000 > > > > > > > > >
Ref 1234.56789012345678 tp 19 hi 15 me 7 lo 4 default
Comparing PI, high precision:
MyPI_hi at 3.14159265358980000 is exact
MyPIexact at 3.14159265358979324 is exact
MyPI_low at 3.14159265358978000 is exact
Comparing PI, top precision:
MyPI_hi at 3.14159265358980000 is higher than the real PI
MyPIexact at 3.14159265358979324 is exact
MyPI_low at 3.14159265358978000 is lower than the real PI
Intel(R) Pentium(R) 4 CPU 3.40GHz (SSE3)
5 cycles for 10*cmp 5 ms for 10000000 comparisons
1307 cycles for 10*Fcmp 389 ms for 10000000 comparisons
5 cycles for 10*cmp 5 ms for 10000000 comparisons
1317 cycles for 10*Fcmp 392 ms for 10000000 comparisons
REPEAT 2
Fcmp v1, v2 ; Real4 vs Real8
nop
Fcmp eax, v3 ; reg32 vs Real10
nop
Fcmp v4, ecx ; QWord vs reg32
nop
Fcmp xmm0, v5 ; xmm vs REAL4
nop
Fcmp eax, xmm1 ; reg32 vs xmm
nop
ENDM
Quote: The FPU holds by default REAL10 values
Just noticed this in this thread. Let's clarify this a bit to prevent newbies from interpreting this wrongly.
FPU data registers are designed to hold REAL10 values, just as the CPU's general purpose registers are designed to hold 32-bit values. The actual value in any of the FPU's data registers depends on what is loaded into them and/or under what conditions they have been modified.
At least under Windows, the FPU's precision control is set to REAL8 at the opening of any program. If the program needs REAL10, it must change the precision control before performing any operation.
To be more precise, the statement should thus have been:
The FPU by default holds values in the REAL10 format, but those values are not necessarily in the REAL10 precision.
That is an interesting point, Raymond. So accordingly, if the FPU is set to REAL4 accuracy, and you use fldpi, the FPU holds a REAL4 crippled value of PI instead of 3.1415926535897932380 aka 4000C90FDAA22168C235h?
If that is the case, then Olly seems to have a bug, because it claims that the FPU holds always the same REAL10 value, irrespective of the precision at the time of loading ::)
Of course, if you use the Fcmp macro to compare a REAL4 with a REAL8 (e.g. xmm0) variable, the accuracy of the comparison depends on the weaker partner.
don't forget - FINIT sets it to real10 :P
That is not what I stated. If you load one of the hard coded constants (such as pi) from the FPU, it will get loaded with its full REAL10 value. If you immediately save that value without modification as a REAL10, it will be saved with its full REAL10 precision, regardless of the precision control. You are saving an image of the data register without any conversion.
However, if you compute the value of 1/3 with the precision control set to REAL8, the data register will contain a truncated value in REAL10 format. And, even if you save it as a REAL10, the saved value will still be that truncated value in REAL10 format.
Thus, if you compute something (apart from a multiple of 1/2) with the precision control set to REAL8 and save it as a REAL10,
then you compute the identical something with the precision control set to REAL10 and also save it as a REAL10,
then compare those two values with REAL10 precision control, they will NOT be identical. They will not even be identical with the precision control set to REAL8 if you load the saved REAL10 values for comparison.
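The experiment described above can be sketched as follows, assuming the MASM32 SDK. The variable names are invented, and the control word values 027Fh (53-bit precision) and 037Fh (64-bit precision) keep all exceptions masked and rounding set to nearest:

```asm
include \masm32\include\masm32rt.inc
.686

.data
cw53  dw 027Fh                  ; precision control = 53 bits (REAL8)
cw64  dw 037Fh                  ; precision control = 64 bits (REAL10)
three REAL8 3.0
r53   REAL10 ?                  ; 1/3 computed with 53-bit precision
r64   REAL10 ?                  ; 1/3 computed with 64-bit precision

.code
start:
    fldcw cw53
    fld1
    fdiv three                  ; result rounded to 53 bits
    fstp r53                    ; saved as REAL10, still the rounded value
    fldcw cw64
    fld1
    fdiv three                  ; result rounded to 64 bits
    fstp r64
    fld r53
    fld r64                     ; ST(0) = r64, ST(1) = r53
    fcomip st, st(1)
    fstp st
    .if Zero?
        print "the two saved values are identical", 13, 10
    .else
        print "the two saved values are different", 13, 10
    .endif
    exit
end start
```

If the description above holds, the second message is the one printed, because saving as REAL10 does not restore bits that were already rounded away during the division.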
So can we agree that if you load two whatever values with fld (or fild), the FPU holds them as REAL10, and if you save them to two locations in memory as REAL10, they can be compared correctly, showing eventually that 12345.6789 is not equal if one is REAL4 and the other is REAL8...?
The FPU loads and holds them in REAL10 format, but not necessarily in REAL10 precision. This may seem as playing on words but it is an important difference.
I do agree that if 12345.6789 was stored as a REAL4 value in memory (regardless of the precision at which it was generated) that it would be different from a 12345.6789 value generated as a REAL8 (or REAL10) before being stored in memory as a REAL8 (or REAL10).
If you load both of them, you can see with Ollydbg that they are different. And, if you now save both of them as REAL10 and look at the memory locations where they are stored (or print them with 15 significant digits), you would observe that they are slightly different.
HOWEVER, if you load the REAL4 value of 12345.6789 and store it as a REAL8 or REAL10 (thus not having been generated in such precision), it should not be any different than the REAL4 value when reloaded onto the FPU in REAL10 format.
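A short sketch of this last case, again assuming the MASM32 SDK and with invented variable names. The REAL4 value is loaded and stored as a REAL8 without any arithmetic, then both copies are reloaded and compared:

```asm
include \masm32\include\masm32rt.inc
.686

.data
r4 REAL4 12345.6789             ; rounded to REAL4 by the assembler
r8 REAL8 ?                      ; will receive the widened copy

.code
start:
    fld r4
    fstp r8                     ; widening REAL4 -> REAL8 loses nothing
    fld r4
    fld r8                      ; ST(0) = r8, ST(1) = r4
    fcomip st, st(1)
    fstp st
    .if Zero?
        print "r4 and its REAL8 copy are equal", 13, 10
    .else
        print "r4 and its REAL8 copy differ", 13, 10
    .endif
    exit
end start
```

Every REAL4 value is exactly representable as a REAL8, so the copy should compare equal. Assembling 12345.6789 directly as a REAL8 would give a different value.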
Although the difference would be considerably smaller, values generated in REAL10 precision but stored as REAL8 and REAL10 would be different when reloaded to the FPU. The REAL8 would have lost 8 bits of precision (out of 64) and rounded up or down based on the most significant bit lost.
Quote from: raymond on February 10, 2012, 05:11:27 PM
Although the difference would be considerably smaller, values generated in REAL10 precision but stored as REAL8 and REAL10 would be different when reloaded to the FPU. The REAL8 would have lost 8 bits of precision (out of 64) and rounded up or down based on the most significant bit lost.
Isn't that 64 - 53 bits of precision lost?
My bad. :red :eek :red You lose 11 bits of precision (out of 64).
That is indeed 64 - 53: the 53 significant bits of a REAL8 already include the implied bit, whereas all 64 significand bits of a REAL10 are stored explicitly.
Quote from: raymond on February 10, 2012, 05:11:27 PM
HOWEVER, if you load the REAL4 value of 12345.6789 and store it as a REAL8 or REAL10 (thus not having been generated in such precision), it should not be any different than the REAL4 value when reloaded onto the FPU in REAL10 format.
Raymond,
Thank you for your efforts to bring clarity in this tricky business. What I am claiming is that the FPU does not change what it gets when loading and storing a value, regardless of the precision set by the FPU control word. Below is a practical example - source attached, and sorry that it needs the MasmBasic version of today (I added the SetFpu macro).
Of course, if you perform operations (fadd, fmul, ...) with the loaded values, the result would depend on the precision. But that is not the case for the comparison algo.
Digits: 1234.567890123456789
MyR4= 1000.000000000000000
MyR8= 1000.00000000000011
MyR10= 999.999999999999999
64 bit precision, Fcmp 'top' :
MyR4 is lower than MyR8
MyR4 is higher than MyR10
53 bit precision:
MyR4 is lower than MyR8
MyR4 is higher than MyR10
24 bit precision:
MyR4 is lower than MyR8
MyR4 is higher than MyR10
64 bit precision, Fcmp 'high' :
MyR4 and MyR8 are equal
MyR4 and MyR10 are equal
53 bit precision:
MyR4 and MyR8 are equal
MyR4 and MyR10 are equal
24 bit precision:
MyR4 and MyR8 are equal
MyR4 and MyR10 are equal