Fast float to str algo

Started by jj2007, May 11, 2012, 07:18:18 PM

jj2007

Just for fun - see line 183 for a really cute hack  :bg
floatx uses a fixed (but configurable) number of digits after the decimal point, so it is somewhat limited...

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
96       bytes for floatx
12.3456         floatx
12.3456         FloatToStr
12.3456         MB Str$()
12.345600       crt sprintf

162     cycles for floatx
604     cycles for FloatToStr
402     cycles for Str$()
4161    cycles for crt_sprintf

162     cycles for floatx
603     cycles for FloatToStr
403     cycles for Str$()
4203    cycles for crt_sprintf

162     cycles for floatx
603     cycles for FloatToStr
402     cycles for Str$()
4231    cycles for crt_sprintf
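
For readers without the attachment, the fixed-digit idea can be sketched in C. This is not the floatx source: the names `DIGIT_FACTOR`, `FRAC_DIGITS`, `put_u32` and `float_to_str_fixed` are made up for illustration, and the sketch only assumes that the value is split into an integer dword and a scaled fractional dword, each of which is then converted to decimal digits.

```c
#include <stdint.h>

/* Hypothetical constants: 10^4 gives four digits after the decimal point. */
#define DIGIT_FACTOR 10000u
#define FRAC_DIGITS  4

/* Writes v in decimal, zero-padded on the left to at least min_digits. */
static char *put_u32(char *p, uint32_t v, int min_digits) {
    char tmp[10];                       /* a 32-bit value has at most 10 digits */
    int n = 0;
    do { tmp[n++] = (char)('0' + v % 10); v /= 10; } while (v != 0);
    while (n < min_digits) tmp[n++] = '0';
    while (n > 0) *p++ = tmp[--n];
    return p;
}

/* Returns the string length, or -1 if the integer part does not fit a dword.
   out must hold at least 17 bytes (sign, 10 digits, point, 4 digits, NUL). */
static int float_to_str_fixed(double x, char *out) {
    char *p = out;
    int neg = x < 0.0;
    if (neg) x = -x;
    if (!(x < 4294967295.0)) return -1; /* also rejects NaN */

    uint32_t ipart = (uint32_t)x;       /* truncates toward zero */
    uint32_t fpart = (uint32_t)((x - ipart) * DIGIT_FACTOR + 0.5);
    if (fpart >= DIGIT_FACTOR) {        /* rounding carried into the integer part */
        fpart -= DIGIT_FACTOR;
        ipart++;
    }
    if (neg) *p++ = '-';
    p = put_u32(p, ipart, 0);
    *p++ = '.';
    p = put_u32(p, fpart, FRAC_DIGITS);
    *p = '\0';
    return (int)(p - out);
}
```

The limitation mentioned in the post shows up directly: the number of fractional digits is baked into the constant, and anything whose integer part exceeds 32 bits is rejected.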

dedndave

 :bg

no - THIS is down and dirty - lol
;zoom string table

szZoomTable      db '5',0,0,0,0,0,'4.842',0,'4.688',0,'4.54',0,0,'4.396',0
                 db '4.257',0,'4.122',0,'3.991',0,'3.865',0,'3.742',0
                 db '3.624',0,'3.509',0,'3.398',0,'3.29',0,0,'3.186',0
                 db '3.085',0,'2.987',0,'2.893',0,'2.801',0,'2.712',0
                 db '2.627',0,'2.543',0,'2.463',0,'2.385',0,'2.309',0
                 db '2.236',0,'2.165',0,'2.097',0,'2.03',0,0,'1.966',0
                 db '1.904',0,'1.843',0,'1.785',0,'1.728',0,'1.674',0
                 db '1.62',0,0,'1.569',0,'1.52',0,0,'1.471',0,'1.425',0
                 db '1.38',0,0,'1.336',0,'1.294',0,'1.253',0,'1.213',0
                 db '1.175',0,'1.137',0,'1.101',0,'1.066',0,'1.033',0
szZoomTableLast  db '1',0


it's only 302 bytes - and i must say - it is pretty fast
on the bright side, i use all but one of the strings for 2 different zoom ratios   :P
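
In the table above every entry is padded with zero bytes to a fixed 6-byte stride, so a string can be found by multiplying the zoom step by 6, and 50 entries of 6 bytes plus the 2-byte last entry give the 302 bytes mentioned. A minimal C sketch of the same layout follows; only the first five entries are reproduced, and the names `ZOOM_STRIDE`, `zoom_table` and `zoom_string` are made up for illustration.

```c
#include <stddef.h>

/* The longest entry, e.g. "4.842", is 5 characters plus the terminating NUL. */
#define ZOOM_STRIDE 6

/* Each row occupies exactly ZOOM_STRIDE bytes; shorter strings are
   zero-padded by the compiler, like the extra 0 bytes in the db lines. */
static const char zoom_table[][ZOOM_STRIDE] = {
    "5", "4.842", "4.688", "4.54", "4.396"
};

/* Hypothetical accessor: returns the string for a zoom step,
   or NULL if the step is outside the (shortened) table. */
static const char *zoom_string(size_t step) {
    size_t count = sizeof zoom_table / sizeof zoom_table[0];
    return step < count ? zoom_table[step] : NULL;
}
```

No conversion happens at run time at all, which is why the lookup is hard to beat on speed as long as the set of values is small and known in advance.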

but - your simple little technique has me re-thinking my code
i may adapt something similar

dedndave

prescott w/htt
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)

660     cycles for floatx
2170    cycles for FloatToStr
799     cycles for Str$()
7474    cycles for crt_sprintf

656     cycles for floatx
2176    cycles for FloatToStr
807     cycles for Str$()
7610    cycles for crt_sprintf

653     cycles for floatx
2208    cycles for FloatToStr
1138    cycles for Str$()
7650    cycles for crt_sprintf

jj2007

Quote from: dedndave on May 12, 2012, 12:23:23 AM
:bg

no - THIS is down and dirty - lol
...
but - your simple little technique has me re-thinking my code
i may adapt something similar

So what does it zoom, Dave?

dedndave

oh - i have been playing with this thing...
BaldurIIMap.zip
i just got the status bar info to toggle between cursor/center with grab
next is to add a few simple menu options and mousewheel
then - i want to make it load JPG/PNG/GIF files - so that will be a little GDI+ experience   :P

jj2007

Looks nice, David R. Sheldon :U

Of course, a 3MB file must be launched via Olly, for security reasons, but what I saw looked very dedndave-like :bg

dedndave

lol
well - if you examine the resources, you'll see why it's 3 Mb   :bg
i don't have to open a file every time i run it
it's about 400 Kb as a JPEG
although, i am not too sure how well the JPEG will stretch

Antariy

Quote from: jj2007 on May 11, 2012, 07:18:18 PM
floatx uses a fixed (but configurable) number of digits after the decimal point, so it is somewhat limited...

Jochen, the approach you used in the code works very well in many circumstances :bg For example, I used the same kind of floating-point-to-string conversion in MemInfoMicro, with the fractional part fixed at 3 digits, so there was no need for the (MSV)CRT libraries and other bloated things :bg

Here is the FPU2Str test code with my piece added.

But the proc Axprint_float differs from the float-to-two-dwords conversion method. It uses FBSTP to convert the number from floating point into BCD format, and then processes the resulting packed BCD into the output string. The precision of the fractional part, though, is still passed to the proc simply as a power-of-ten number (hm, I need to change this part into a simple multiplication by an entry from a table of powers of ten).

This proc supports up to 18 digits in the integer part of the number passed (the float-to-two-dwords version does not even cover the full range of 10 digits) and up to 9 digits in its fractional part.
A user-defined decimal point is also supported; it is passed as a parameter.
NaN is checked and processed correctly as well.
Note that the reported size of the proc includes everything it needs: the number-to-string conversion is already inside it, so it does not need dwtoa (dwtoa alone is 101 bytes).
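
The packed-BCD route can be illustrated in C, with the caveat that C has no FBSTP, so the sketch emulates the 10-byte layout that instruction stores (18 digits, two per byte, least significant pair first, sign in the top bit of the last byte). This is not the Axprint_float source: `to_packed_bcd` and `print_float_bcd` are made-up names, the sketch does a single scaled conversion (so integer and fractional digits together are limited to 18), it rounds half up instead of following the FPU rounding mode, and printing "NaN" is an assumption about how NaN might be reported.

```c
#include <stdint.h>

/* Emulates the memory image written by FBSTP for a value below 10^18. */
static void to_packed_bcd(double x, uint8_t bcd[10]) {
    int neg = x < 0.0;
    if (neg) x = -x;
    uint64_t m = (uint64_t)(x + 0.5);          /* round half up (simplification) */
    for (int i = 0; i < 9; i++) {
        uint8_t lo = (uint8_t)(m % 10); m /= 10;
        uint8_t hi = (uint8_t)(m % 10); m /= 10;
        bcd[i] = (uint8_t)((hi << 4) | lo);    /* two decimal digits per byte */
    }
    bcd[9] = neg ? 0x80 : 0x00;                /* sign byte */
}

/* frac_digits is 0..9, point is the caller's decimal point character.
   out must hold at least 21 bytes. Returns the length, or -1 on range error. */
static int print_float_bcd(double x, int frac_digits, char point, char *out) {
    static const double pow10_tab[10] =
        {1, 10, 100, 1e3, 1e4, 1e5, 1e6, 1e7, 1e8, 1e9};
    uint8_t bcd[10];
    char *p = out;
    int started = 0;

    if (frac_digits < 0 || frac_digits > 9) return -1;
    if (x != x) {                              /* NaN (assumed output text) */
        out[0] = 'N'; out[1] = 'a'; out[2] = 'N'; out[3] = '\0';
        return 3;
    }
    double scaled = x * pow10_tab[frac_digits];
    if (scaled >= 1e18 || scaled <= -1e18) return -1;   /* more than 18 digits */

    to_packed_bcd(scaled, bcd);
    if (bcd[9] & 0x80) *p++ = '-';
    for (int idx = 17; idx >= 0; idx--) {      /* most significant digit first */
        uint8_t byte = bcd[idx / 2];
        uint8_t d = (idx & 1) ? (uint8_t)(byte >> 4) : (uint8_t)(byte & 0x0F);
        if (!started && d == 0 && idx > frac_digits) continue;  /* leading zeros */
        started = 1;
        *p++ = (char)('0' + d);
        if (idx == frac_digits && frac_digits > 0) *p++ = point;
    }
    *p = '\0';
    return (int)(p - out);
}
```

Unpacking the nibbles is cheap; on real hardware the cost sits in FBSTP itself, which the emulation above replaces with 64-bit divisions.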


The test numbers can be changed via the selectTest macro.

Results for the values digitfactor=10000, r8a=123.4567 (selectTest=3)

Intel(R) Celeron(R) CPU 2.13GHz (SSE3)
96       bytes for floatx
189      bytes for Axprint_float
123.4567        floatx
123.4567        FloatToStr
123.456700      crt sprintf
123.4567        Axprint_float

880     cycles for floatx
2177    cycles for FloatToStr
7399    cycles for crt_sprintf
1176    cycles for Axprint_float

690     cycles for floatx
2173    cycles for FloatToStr
7397    cycles for crt_sprintf
1191    cycles for Axprint_float

688     cycles for floatx
2173    cycles for FloatToStr
7400    cycles for crt_sprintf
1181    cycles for Axprint_float



Results for the values digitfactor=1000000000, r8a=123456789012345678.901234567 (18 digits in the integer part; note that a double does not have enough precision left to hold the specified fractional part) (selectTest=1):

Intel(R) Celeron(R) CPU 2.13GHz (SSE3)
96       bytes for floatx
189      bytes for Axprint_float
--2147483648.2147483648         floatx
1.234568e+017           FloatToStr
123456789012345680.000000       crt sprintf
123456789012345680.0    Axprint_float

4576    cycles for floatx
2133    cycles for FloatToStr
9329    cycles for crt_sprintf
741     cycles for Axprint_float

4367    cycles for floatx
2135    cycles for FloatToStr
9301    cycles for crt_sprintf
750     cycles for Axprint_float

4366    cycles for floatx
2134    cycles for FloatToStr
9381    cycles for crt_sprintf
752     cycles for Axprint_float


The timing of Axprint_float is lower (!) than it was with the shorter number because of the shortcut it takes for zero. The floatx result is an overflow caused by its dwtoa usage.
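
The digits 2147483648 in both halves of the floatx output are consistent with each dword holding 0x80000000. Assuming (this is not confirmed by the posted material) that floatx stores its two dwords through a 32-bit x87 or SSE2 integer conversion, that is the "integer indefinite" value those instructions produce when the source does not fit and the exception is masked. A C sketch that emulates this behaviour, with the made-up name `store_as_dword`:

```c
#include <stdint.h>

/* Emulates a masked 32-bit float-to-integer store: values that do not fit
   (and NaN) yield the integer indefinite 0x80000000, i.e. INT32_MIN.
   In plain C the out-of-range cast would be undefined, hence the guard. */
static int32_t store_as_dword(double x) {
    if (x != x || x >= 2147483648.0 || x < -2147483648.0)
        return INT32_MIN;
    return (int32_t)x;   /* truncates; the real instructions round per mode */
}
```

A range check of this kind before the conversion is enough to detect the case and fall back to a routine with a wider range.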


Results for the values digitfactor=1000000000, r8a=12345678.901234567 (selectTest=2; archive attached contains this EXE):

Intel(R) Celeron(R) CPU 2.13GHz (SSE3)
96       bytes for floatx
189      bytes for Axprint_float
12345678.901234567      floatx
1.234568e+007           FloatToStr
12345678.901235 crt sprintf
12345678.901234567      Axprint_float

1064    cycles for floatx
2141    cycles for FloatToStr
7164    cycles for crt_sprintf
1222    cycles for Axprint_float

863     cycles for floatx
2133    cycles for FloatToStr
7153    cycles for crt_sprintf
1239    cycles for Axprint_float

866     cycles for floatx
2142    cycles for FloatToStr
7168    cycles for crt_sprintf
1227    cycles for Axprint_float



It would be interesting to see the timings on different machines/CPUs, because the code relies on the hardware implementation of its slowest part, the FBSTP instruction (it takes over 800 cycles, i.e. two executions at over 400 each, out of a total of ~1200 cycles on my machine).

jj2007

Hi Alex,

Good to see you here :thumbu

Here are timings including Str$():

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
96       bytes for floatx
12345678.9012           floatx
1.234568e+007           FloatToStr
12345678.9012           MB Str$()
12345678.9012           Axprint_float
12345678.901235         crt sprintf

281     cycles for floatx
630     cycles for FloatToStr
507     cycles for Str$()
546     cycles for Axprint_float
4280    cycles for crt_sprintf

273     cycles for floatx
630     cycles for FloatToStr
506     cycles for Str$()
530     cycles for Axprint_float
4278    cycles for crt_sprintf

286     cycles for floatx
630     cycles for FloatToStr
506     cycles for Str$()
534     cycles for Axprint_float
4273    cycles for crt_sprintf


Your algo looks competitive :bg

dedndave

prescott w/htt
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)

788     cycles for floatx
2149    cycles for FloatToStr
1021    cycles for Str$()
1184    cycles for Axprint_float
7201    cycles for crt_sprintf

790     cycles for floatx
2149    cycles for FloatToStr
1020    cycles for Str$()
1193    cycles for Axprint_float
7277    cycles for crt_sprintf

781     cycles for floatx
2156    cycles for FloatToStr
1007    cycles for Str$()
1190    cycles for Axprint_float
7295    cycles for crt_sprintf

Antariy

Quote from: jj2007 on May 13, 2012, 03:08:46 PM
Good to see you here :thumbu

:U

Quote from: jj2007 on May 13, 2012, 03:08:46 PM
Here are timings including Str$():

Intel(R) Celeron(R) CPU 2.13GHz (SSE3)
96       bytes for floatx
12345678.9012           floatx
1.234568e+007           FloatToStr
12345678.9012           MB Str$()
12345678.9012           Axprint_float
12345678.901235         crt sprintf

811     cycles for floatx
2251    cycles for FloatToStr
1077    cycles for Str$()
1253    cycles for Axprint_float
7604    cycles for crt_sprintf

823     cycles for floatx
2243    cycles for FloatToStr
1072    cycles for Str$()
1250    cycles for Axprint_float
7532    cycles for crt_sprintf

820     cycles for floatx
2245    cycles for FloatToStr
1076    cycles for Str$()
1249    cycles for Axprint_float
7578    cycles for crt_sprintf


Quote from: jj2007 on May 13, 2012, 03:08:46 PM
Your algo looks competitive :bg

If I squeeze it (a lot!) here and there, then... maaaybe :bg

jj2007

Ok, I know the practical relevance is nil, but a factor 24 on a CRT algo is always worth a little effort :green2

Intel(R) Celeron(R) M CPU        420  @ 1.60GHz (SSE3)
96       bytes for floatx
136      bytes for floatx4
12345678.9012           floatx4
12345678.9012           floatx
1.234568e+007           FloatToStr
12345678.9012           MB Str$()
12345678.9012           Axprint_float
12345678.901235         crt sprintf

281     cycles for floatx
176     cycles for floatx4
630     cycles for FloatToStr
509     cycles for Str$()
549     cycles for Axprint_float
4271    cycles for crt_sprintf

285     cycles for floatx
176     cycles for floatx4
630     cycles for FloatToStr
520     cycles for Str$()
535     cycles for Axprint_float
4278    cycles for crt_sprintf

287     cycles for floatx
176     cycles for floatx4
630     cycles for FloatToStr
516     cycles for Str$()
542     cycles for Axprint_float
4270    cycles for crt_sprintf


I had to bloat floatx4 to a whopping 136 bytes, but at least the dependency on dwtoa disappeared in the process :wink

Antariy

Quote from: jj2007 on May 13, 2012, 10:42:31 PM
Ok, I know the practical relevance is nil, but a factor 24 on a CRT algo is always worth a little effort :green2

Well, these are usable and useful code samples - it is better to use your own small and relatively fast piece than to rely on some huge third-party CRT etc. if this is the only functionality required (and also, for instance, I have rarely seen scientific notation used in "general programming") :bg


Intel(R) Celeron(R) CPU 2.13GHz (SSE3)
96       bytes for floatx
136      bytes for floatx4
12345678.9012           floatx4
-0.001234567            floatx4
12345678.9012           floatx
1.234568e+007           FloatToStr
12345678.9012           MB Str$()
12345678.9012           Axprint_float
12345678.901235         crt sprintf

818     cycles for floatx
358     cycles for floatx4
2267    cycles for FloatToStr
1087    cycles for Str$()
1269    cycles for Axprint_float
7531    cycles for crt_sprintf

835     cycles for floatx
371     cycles for floatx4
2267    cycles for FloatToStr
1089    cycles for Str$()
1246    cycles for Axprint_float
7669    cycles for crt_sprintf

835     cycles for floatx
357     cycles for floatx4
2264    cycles for FloatToStr
1091    cycles for Str$()
1261    cycles for Axprint_float
7525    cycles for crt_sprintf


Quote from: jj2007 on May 13, 2012, 10:42:31 PM
I had to bloat floatx4 to a whopping 136 bytes, but at least the dependency on dwtoa disappeared in the process :wink

But the integer part of the number is still limited to the DWORD range. On the other hand, because of the 18 digits of precision in Axprint_float there is "almost" nothing left to squeeze out of it: FBSTP is very slow, and an SSE2 BCD2ASCII will not improve the timings much.

dedndave

prescott w/htt
Intel(R) Pentium(R) 4 CPU 3.00GHz (SSE3)

783     cycles for floatx
345     cycles for floatx4
2154    cycles for FloatToStr
1008    cycles for Str$()
1176    cycles for Axprint_float
7248    cycles for crt_sprintf

791     cycles for floatx
338     cycles for floatx4
2129    cycles for FloatToStr
1000    cycles for Str$()
1180    cycles for Axprint_float
7197    cycles for crt_sprintf

771     cycles for floatx
337     cycles for floatx4
2157    cycles for FloatToStr
1008    cycles for Str$()
1170    cycles for Axprint_float
7185    cycles for crt_sprintf

hutch--

This is the timing for the last zip file.


Intel(R) Core(TM)2 Quad CPU    Q9650  @ 3.00GHz (SSE4)
96       bytes for floatx
136      bytes for floatx4
12345678.9012           floatx4
-0.001234567            floatx4
12345678.9012           floatx
1.234568e+007           FloatToStr
12345678.9012           MB Str$()
12345678.9012           Axprint_float
12345678.901235         crt sprintf

264     cycles for floatx
146     cycles for floatx4
606     cycles for FloatToStr
398     cycles for Str$()
493     cycles for Axprint_float
3997    cycles for crt_sprintf

260     cycles for floatx
146     cycles for floatx4
606     cycles for FloatToStr
394     cycles for Str$()
494     cycles for Axprint_float
4002    cycles for crt_sprintf

274     cycles for floatx
147     cycles for floatx4
606     cycles for FloatToStr
394     cycles for Str$()
500     cycles for Axprint_float
4020    cycles for crt_sprintf