Fast file time

Started by donkey, January 05, 2012, 07:19:07 AM

sinsi

Don't they use the APIC timers nowadays?
One problem with the old ISA timer was that binary doesn't map well onto decimal: my real-mode millisecond timer actually runs at 1090 or something.

>which doesn't change anything
Yes, I got the same results either way. Makes a nice hex (binary?) progression though  :bdg
Light travels faster than sound, that's why some people seem bright until you hear them.

qWord

this works perfectly on my machine (Win7-x64)  :U
include \masm32\include\masm32rt.inc

KSYSTEM_TIME struct
LowPart    ULONG ?
High1Time   LONG ?
High2Time   LONG ?
KSYSTEM_TIME ends

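; gns: read the KSYSTEM_TIME at 7FFE0008h (InterruptTime) into the QWORD t
gns macro t:req
mov eax,7FFE0000h+8
.while 1
mov edx,[eax].KSYSTEM_TIME.High1Time
mov ecx,[eax].KSYSTEM_TIME.LowPart
.break .if edx == [eax].KSYSTEM_TIME.High2Time ; both high parts match: the read was consistent
.endw
mov DWORD ptr t,ecx
mov DWORD ptr t+4,edx
endm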

.code
main proc
LOCAL t1:QWORD
LOCAL t2:QWORD

gns t1
fild t1
fmul FP4(100.0E-9)
fstp t1
xor ebx,ebx
.while ebx < 100
gns t2
fild t2
fmul FP4(100.0E-9)
fsub t1
fstp t2
print real8$(t2),13,10
invoke Sleep,250
lea ebx,[ebx+1]
.endw

inkey
exit
main endp
end main
FPU in a trice: SmplMath
It's that simple!

MichaelW

Quote: Don't they use the APIC timers nowadays?

Even though I recall reading about the APIC years ago, I had forgotten all about it. I now have a system that was built in December 2003, and on it QueryPerformanceFrequency returns 2992580000.
eschew obfuscation

jj2007

Quote from: qWord on January 05, 2012, 02:17:56 PM
this works perfectly on my machine (Win7-x64)  :U

You got it :U

Actually, my macros worked, too. The tricky part causing the irregular sequences (6.0, 6.5, 8.0) is the delay loop - probably it cuts through time slices :'(

donkey

#19
Quote from: jj2007 on January 05, 2012, 10:29:59 AM
I got it, Edgar. Open this page and search for consistent.
The structure there doesn't match the one MS uses; there are a number of discrepancies. It is a bit of a complicated structure to translate. This is how far I've gotten, and it checks out against the offsets they identify in hex in the structure. It is complete up to Win8, but I still have to check a few fields to verify it.

Nice work qWord. works here too.

#define MAX_WOW64_SHARED_ENTRIES 16
#define PROCESSOR_FEATURE_MAX 64
#define MM_SHARED_USER_DATA_VA 0x7FFE0000

// enum NT_PRODUCT_TYPE
NtProductWinNt = 1
NtProductLanManNt = 2
NtProductServer = 3

// enum ALTERNATIVE_ARCHITECTURE_TYPE
StandardDesign = 0
NEC98x86 = 1
EndAlternatives = 2

#define MAXIMUM_XSTATE_FEATURES 64
#define XSTATE_LEGACY_FLOATING_POINT        0
#define XSTATE_LEGACY_SSE                   1
#define XSTATE_GSSE                         2

#define XSTATE_MASK_LEGACY_FLOATING_POINT   (1 << (XSTATE_LEGACY_FLOATING_POINT))
#define XSTATE_MASK_LEGACY_SSE              (1 << (XSTATE_LEGACY_SSE))
#define XSTATE_MASK_LEGACY                  (XSTATE_MASK_LEGACY_FLOATING_POINT | XSTATE_MASK_LEGACY_SSE)
#define XSTATE_MASK_GSSE                    (1 << (XSTATE_GSSE))

#define NX_SUPPORT_POLICY_ALWAYSOFF 0
#define NX_SUPPORT_POLICY_ALWAYSON 1
#define NX_SUPPORT_POLICY_OPTIN 2
#define NX_SUPPORT_POLICY_OPTOUT 3

// Processor features
#define PF_FLOATING_POINT_PRECISION_ERRATA  0   
#define PF_FLOATING_POINT_EMULATED          1   
#define PF_COMPARE_EXCHANGE_DOUBLE          2   
#define PF_MMX_INSTRUCTIONS_AVAILABLE       3   
#define PF_PPC_MOVEMEM_64BIT_OK             4   
#define PF_ALPHA_BYTE_INSTRUCTIONS          5   
#define PF_XMMI_INSTRUCTIONS_AVAILABLE      6   
#define PF_3DNOW_INSTRUCTIONS_AVAILABLE     7   
#define PF_RDTSC_INSTRUCTION_AVAILABLE      8   
#define PF_PAE_ENABLED                      9   
#define PF_XMMI64_INSTRUCTIONS_AVAILABLE   10   
#define PF_SSE_DAZ_MODE_AVAILABLE          11   
#define PF_NX_ENABLED                      12   
#define PF_SSE3_INSTRUCTIONS_AVAILABLE     13   
#define PF_COMPARE_EXCHANGE128             14   
#define PF_COMPARE64_EXCHANGE128           15   
#define PF_CHANNELS_ENABLED                16   
#define PF_XSAVE_ENABLED                   17

XSTATE_FEATURE STRUCT
Offset LONG
Size LONG
ENDS

XSTATE_CONFIGURATION  STRUCT
EnabledFeatures LONG64
Size LONG
OptimizedSave LONG
Features XSTATE_FEATURE MAXIMUM_XSTATE_FEATURES DUP
ENDS

KSYSTEM_TIME STRUCT
LowPart LONG
High1Time LONG
High2Time LONG
ENDS

KUSER_SHARED_DATA STRUCT
//
// WARNING: This structure must have exactly the same layout for 32- and
//    64-bit systems. The layout of this structure cannot change and new
//    fields can only be added at the end of the structure (unless a gap
//    can be exploited). Deprecated fields cannot be deleted. Platform
//    specific fields are included on all systems.
//
//    Layout exactness is required for Wow64 support of 32-bit applications
//    on Win64 systems.
//
//    The layout itself cannot change since this structure has been exported
//    in ntddk, ntifs.h, and nthal.h for some time.

TickCountLowDeprecated LONG
TickCountMultiplier LONG
InterruptTime KSYSTEM_TIME
SystemTime KSYSTEM_TIME
TimeZoneBias KSYSTEM_TIME
ImageNumberLow SHORT
ImageNumberHigh SHORT
NtSystemRoot SHORT 260 DUP
MaxStackTraceDepth LONG
CryptoExponent LONG
TimeZoneId LONG
LargePageMinimum LONG
Reserved2  LONG 7 DUP
NtProductType ENUM // NT_PRODUCT_TYPE
ProductTypeIsValid BOOLEAN
Padding0 CHAR 3 DUP
NtMajorVersion LONG
NtMinorVersion LONG
ProcessorFeatures BOOLEAN PROCESSOR_FEATURE_MAX DUP
Reserved1 LONG
Reserved3 LONG
TimeSlip LONG
AlternativeArchitecture ENUM // ALTERNATIVE_ARCHITECTURE_TYPE
AltArchitecturePad LONG
SystemExpirationDate LARGE_INTEGER
SuiteMask LONG
KdDebuggerEnabled BOOLEAN
NXSupportPolicy CHAR
Padding CHAR 2 DUP
ActiveConsoleId LONG
DismountCount LONG
ComPlusPackage LONG
LastSystemRITEventTickCount LONG
NumberOfPhysicalPages LONG
SafeBootMode BOOLEAN

TscQpcData CHAR
TscQpcPad CHAR 2 DUP

// > Vista only
TraceLogging LONG

; SharedDataFlags LONG
DataFlagsPad LONG

TestRetInstruction LONGLONG
SystemCall LONG
SystemCallReturn LONG
SystemCallPad LONGLONG 3 DUP

UNION
TickCount KSYSTEM_TIME
TickCountQuad LONG64
ENDUNION

// The following padding is documented in the above union
// it is added separately to bypass a bug in GoAsm - Do not change !
TickCountPad DD

Cookie LONG
CookiePad LONG
ConsoleSessionForegroundProcessId LONGLONG
Wow64SharedInformation LONG MAX_WOW64_SHARED_ENTRIES DUP
UserModeGlobalLogger SHORT 16 DUP
ImageFileExecutionOptions LONG

// Pre vista 4 bytes padding instead of LangGenerationCount
LangGenerationCount LONG
Reserved5 LONGLONG
InterruptTimeBias LONG64
TscQpcBias LONG64
ActiveProcessorCount LONG
ActiveGroupCount SHORT
Reserved4 SHORT
AitSamplingValue LONG
AppCompatFlag LONG
SystemDllNativeRelocation LONGLONG
SystemDllWowRelocation LONG
XStatePad LONG
XState XSTATE_CONFIGURATION
ENDS


EDIT:

There were a couple of offset issues with the version of the structure I had in this post previously. They have been fixed, and all offsets now match the ASSERTs in ntddk.h, so the structure is definitely correct. I have left in the comment for this structure to assure everyone that it will not change and is the same for x32 and x64.

Beginning with Windows Vista, SharedDataFlags has changed to TraceLogging. This is just a name change; the offsets remain the same, so SharedDataFlags has been commented out.
"Ahhh, what an awful dream. Ones and zeroes everywhere...[shudder] and I thought I saw a two." -- Bender
"It was just a dream, Bender. There's no such thing as two". -- Fry
-- Futurama

Donkey's Stable

donkey

Nice as a replacement for OSVERSIONINFO. I always hated the way you got the OS version and never liked the hacks I have seen using FS segment overrides:

mov ecx,MM_SHARED_USER_DATA_VA
mov eax,[ecx+KUSER_SHARED_DATA.NtMajorVersion]
mov edx,[ecx+KUSER_SHARED_DATA.NtMinorVersion]

jj2007

#21
I got some unexpected insights into Sleep and its granularity... very nice :bg

Loop 1          0.0156250000000000000
Loop 8          0.125000000000000000
Loop 32         0.500000000000000000
Loop 64         1.00000000000000000
Sleep 16 - one more? Esc=exit


Executable attached.

include \masm32\MasmBasic\MasmBasic.inc
Ticks2FPU MACRO
   mov edx, 7FFE0008h   ; address of KUSER_SHARED_DATA.InterruptTime (LowPart; High1Time at +4, High2Time at +8)
@@:   mov eax, [edx+4]
   cmp eax, [edx+8]   ; See Windows Research Kernel @ HPI and look for consistent
   jne @B
   fild qword ptr [edx]
ENDM

OneEm9   REAL10 100.0E-9

   Init
   mov ecx, 16
   .Repeat
      Ticks2FPU
      For_ n=1 To 65
         push ecx
         dec ecx
         invoke Sleep, ecx
         pop ecx
         fld st
         Ticks2FPU
         fsub
         fchs
         fld OneEm9
         fmul
         Print Str$("Loop %i  \t", n), Str$("%If\n", ST(0))
         fstp st
      Next
      Inkey Str$("Sleep %i - one more? Esc=exit\n", ecx-1)
      add ecx, ecx   ; double Sleep delay
   .Until eax==VK_ESCAPE
   Exit
end start

EDIT: Don't trust the code posted above!

With a Sleep 1000 (i.e. mov ecx, 1001 after Init) I get:
Loop 40         40.0000000000000000
Loop 41         41.0000000000000000  << great precision until here
Loop 42         42.0019531000000000  << starts misbehaving
Loop 43         43.0039062000000000
Loop 44         44.0058594000000000


Sometimes it keeps 19-digits precision until the end (Loop 65), sometimes it starts misbehaving earlier, i.e. invoke Sleep, n produces delays slightly longer than n ms. It's on a single core CPU.

Greetings to Redmond ::)

donkey

This thing just has a lot of information. Processor features are determined as follows (or appear to be):

// replacement for IsProcessorFeaturePresent function
mov ecx,MM_SHARED_USER_DATA_VA
mov edx,[ecx+KUSER_SHARED_DATA.ProcessorFeatures+PF_XXXXXXXXXX]
and edx,1


EDX will be TRUE if the feature is present, FALSE otherwise.

KUSER_SHARED_DATA.ActiveProcessorCount will give the number of threads the processor supports (not the number of cores)

There is quite a lot of stuff to play with here and the best part is it is pretty much guaranteed never to be altered or deprecated, only expanded so it can be used without much worry.
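
A minimal MASM32 sketch of the reads described in this post and the OS version read earlier, offered as an illustration rather than a tested replacement for the APIs. The numeric offsets (26Ch, 270h, 274h, 3C0h) are hand-computed from the structure listing earlier in the thread, and the KUSD_* and PF_XMMI64 names are made up for this example. Fields near the end of the listing, such as ActiveProcessorCount, may not be present on older Windows versions.

```asm
include \masm32\include\masm32rt.inc

; Offsets hand-computed from the KUSER_SHARED_DATA listing in this thread.
; Assumption: they hold on the Windows version this runs on.
KUSD                 equ 7FFE0000h  ; MM_SHARED_USER_DATA_VA
KUSD_NtMajorVersion  equ 26Ch
KUSD_NtMinorVersion  equ 270h
KUSD_ProcFeatures    equ 274h       ; BOOLEAN array, one byte per PF_ index
KUSD_ActiveProcCount equ 3C0h       ; assumption: only meaningful where the field exists
PF_XMMI64            equ 10         ; PF_XMMI64_INSTRUCTIONS_AVAILABLE (SSE2)

.code
main proc
    mov esi, KUSD                   ; esi survives the API calls made by print

    mov eax, [esi+KUSD_NtMajorVersion]
    print str$(eax), "."
    mov eax, [esi+KUSD_NtMinorVersion]
    print str$(eax), " = NT version", 13, 10

    ; each feature flag is a single byte, so zero-extend it
    ; instead of loading a dword and masking
    movzx eax, byte ptr [esi+KUSD_ProcFeatures+PF_XMMI64]
    print str$(eax), " = SSE2 flag", 13, 10

    mov eax, [esi+KUSD_ActiveProcCount]
    print str$(eax), " = active logical processors", 13, 10

    inkey
    exit
main endp
end main
```

Using movzx on the feature byte avoids pulling in the three neighbouring flags that a dword load would read along with it.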

dedndave

correct me if i'm wrong, here - lol

but, i would think the API code looks much the same as your code

donkey

Quote from: dedndave on January 05, 2012, 07:19:36 PM
correct me if i'm wrong, here - lol

but, i would think the API code looks much the same as your code

Probably, but this has no PUSH/CALL/RET overhead so you can save a few cycles (not to mention cache hits), and it also does very well at obfuscating code if you don't appreciate people disassembling your program. Any time I can replace an API reliably is a good thing in my book, one more tool in the box.

For example the KUSER_SHARED_DATA.ActiveProcessorCount can pretty much replace this whole DLL and program with a fairly reliable method using only 2 opcodes. Well, the Logical Processors part anyway.

jj2007

Warning - see update of #22!

qWord

Quote from: jj2007 on January 05, 2012, 06:16:44 PM
Sometimes it keeps 19-digits precision until the end (Loop 65), sometimes it starts misbehaving earlier, i.e. invoke Sleep, n produces delays slightly longer than n ms. It's on a single core CPU.

This is unsurprising - Windows is not a real-time OS.
You may get better results by restarting a waitable timer each iteration (Create/SetWaitableTimer)
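
A minimal MASM32 sketch of the waitable-timer idea, assuming a 100 ms interval and ten iterations (both arbitrary choices for illustration). It uses only CreateWaitableTimer, SetWaitableTimer, WaitForSingleObject and CloseHandle from kernel32; whether it actually behaves better than Sleep on a given machine is something to measure, not something this sketch shows.

```asm
include \masm32\include\masm32rt.inc

.data
; relative due time of 100 ms, in 100 ns units: -1,000,000 as a 64-bit value,
; written as low dword, high dword (a negative value means "relative to now")
dueTime dd -1000000, -1

.code
main proc
    LOCAL hTimer:DWORD

    invoke CreateWaitableTimer, NULL, TRUE, NULL    ; unnamed, manual-reset
    mov hTimer, eax
    .if eax == 0
        print "CreateWaitableTimer failed", 13, 10
        exit
    .endif

    xor ebx, ebx
    .while ebx < 10
        ; restart the timer each iteration; setting it also resets it to non-signaled
        invoke SetWaitableTimer, hTimer, addr dueTime, 0, NULL, NULL, FALSE
        invoke WaitForSingleObject, hTimer, INFINITE
        print "tick", 13, 10
        inc ebx
    .endw

    invoke CloseHandle, hTimer
    inkey
    exit
main endp
end main
```

The due time is spelled as two dwords so the example does not rely on how a particular assembler version handles a negative dq constant.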

donkey

I've kind of given up on it as a timer at least for now, the reason I got into the structure was that I'm translating the DDK for GoAsm and ran across it. There are plenty of cycle saving tidbits in there to keep me interested though. According to the article Jochen linked in an earlier post the interrupt timer might be one to try for timing functions.

Quote from: The article Jochen posted
The interrupt time is the only Windows clock that guarantees to be monotonous that is, its value only increases over timer. Its value represents the time in units of 100 ns since the system was booted. The interrupt time is the base clock for all timers in Windows (see my recent article A Bug in Windows Timer Management). It is updated every clock interrupt.

The following is a dump of the InterruptTime field with Sleep,100 between each...

Line 228: [ebx+KUSER_SHARED_DATA.InterruptTime] = 1366442026 (0x5172402A)
Line 230: [ebx+KUSER_SHARED_DATA.InterruptTime] = 1367534028 (0x5182E9CC)
Line 232: [ebx+KUSER_SHARED_DATA.InterruptTime] = 1368626030 (0x5193936E)
Line 234: [ebx+KUSER_SHARED_DATA.InterruptTime] = 1369718032 (0x51A43D10)
Line 236: [ebx+KUSER_SHARED_DATA.InterruptTime] = 1370810034 (0x51B4E6B2)
Line 238: [ebx+KUSER_SHARED_DATA.InterruptTime] = 1371902036 (0x51C59054)
Line 240: [ebx+KUSER_SHARED_DATA.InterruptTime] = 1372994038 (0x51D639F6)
Line 242: [ebx+KUSER_SHARED_DATA.InterruptTime] = 1374086040 (0x51E6E398)
Line 244: [ebx+KUSER_SHARED_DATA.InterruptTime] = 1375178042 (0x51F78D3A)
Line 246: [ebx+KUSER_SHARED_DATA.InterruptTime] = 1376270044 (0x520836DC)
Line 248: [ebx+KUSER_SHARED_DATA.InterruptTime] = 1377362046 (0x5218E07E)
Line 250: [ebx+KUSER_SHARED_DATA.InterruptTime] = 1378454048 (0x52298A20)


Haven't checked them all, but each step is about 1,092,000 ticks (roughly 109 ms) for a 100 ms Sleep; seems pretty consistent.
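
As a worked check of the dump: the first two values differ by 1367534028 - 1366442026 = 1,092,002 ticks, and at 100 ns per tick that is about 109.2 ms for one Sleep,100. Below is a minimal MASM32 sketch of the same measurement using integer arithmetic, assuming InterruptTime sits at 7FFE0008h as in the structure posted earlier. The ReadIntTime macro name is made up for illustration.

```asm
include \masm32\include\masm32rt.inc

INTERRUPT_TIME equ 7FFE0008h    ; KUSER_SHARED_DATA.InterruptTime (LowPart, High1Time, High2Time)

; Read the 64-bit interrupt time into edx:eax (clobbers ecx).
; Retries if the high part changed between the two reads.
ReadIntTime macro
    LOCAL again
    mov ecx, INTERRUPT_TIME
again:
    mov edx, [ecx+4]            ; High1Time
    mov eax, [ecx]              ; LowPart
    cmp edx, [ecx+8]            ; High2Time
    jne again
endm

.code
main proc
    LOCAL t1lo:DWORD
    LOCAL t1hi:DWORD

    ReadIntTime
    mov t1lo, eax
    mov t1hi, edx

    invoke Sleep, 100

    ReadIntTime
    sub eax, t1lo               ; edx:eax = elapsed ticks of 100 ns
    sbb edx, t1hi
    mov ecx, 10000              ; 10,000 ticks = 1 ms
    div ecx                     ; eax = whole ms, edx = leftover ticks
    mov ebx, edx                ; keep the remainder; str$ and print clobber edx

    print str$(eax), " ms + "
    print str$(ebx), " ticks of 100 ns", 13, 10

    inkey
    exit
main endp
end main
```

The 64-bit by 32-bit div is safe here because the elapsed tick count for a short Sleep is far below the range where the quotient would overflow eax.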

jj2007

Quote from: donkey on January 06, 2012, 01:49:20 AM
The following is a dump of the InterruptTime field with Sleep,100 between each...
...
Haven't checked them all, but each step is about 1,092,000 ticks (roughly 109 ms) for a 100 ms Sleep; seems pretty consistent.

That's the whole point, Edgar - it looks consistent, with 19-digits precision. Until loop 42, when Sleep decides to relax a little bit, around 0.2% :bg
The timer macro is ok, and it's the interrupt one, so the culprit is obviously good ol' Sleep.

dedndave

Sleep isn't intended to be a timing device, strictly speaking
because it does not return until the beginning of the next time slice

not really sure how long a time slice is under windows - different versions of windows, different CPU's, etc  :P
but it could account for small variations