I am not a spammer!

Started by nixeagle, April 30, 2012, 10:30:16 PM


nixeagle

The instructions during signup told me to post here. I signed up because I want to discuss some CPU timing work I have been doing: I have figured out a way to take measurements with a standard deviation of less than 1 tick.

jj2007

We are curious to see your approach - could be fun to compare  :wink

Here is a test candidate.

Welcome to the Forum :thumbu

dedndave

welcome on board   :U
i'd be interested in seeing some comparisons, as well
i have a method in the back of my head (or someplace) that might also be viable

most of us use MichaelW's timing macros...
http://www.masm32.com/board/index.php?topic=770.0

here's a simple example of their use...
http://www.masm32.com/board/index.php?topic=18720.msg158456#msg158456

nixeagle

Wow, thanks guys!

Well, my work isn't finished yet, but I have some really interesting results and graphs just from experimenting with the "empty" loop, that is, one that does only the minimal work between two RDTSC reads.

The really interesting bit of this problem on a Core i7, to me, is that I can get timings ranging from around 22 clocks up to a very consistent 154 clocks, with a standard deviation of 0.6 and 99.8% of sample points taking exactly 154 ticks.

What I want to work on is establishing a method to tell how "good" a set of timing results is. How much can we rely on them? How much variance is there between samples? Ideally, each run of an algorithm would take the same number of clocks, giving a standard deviation of 0. We can't quite get that on modern processors, though!

Anyway, which sub-forum should I post graphs and info about my work in? I have (so far) 4 interesting graph plots (done using Mathematica), along with interesting code samples and whatnot.

P.S. (edit): I'll check the macros out tonight too :)

Edit2: In the second paragraph, where I spoke of a range of results between 22 and 154 clocks, those figures come from different "sets" of timing code. Right now I can get "fast" but statistically meaningless results, or "slow" but consistent ones. I'd like to work on getting the best of both worlds ;).

dedndave

the Laboratory sub-forum is the place you'll enjoy
many examples of code timing in there - and run on different CPU's to see how code acts over a variety of platforms

timing code can be tricky business, in a way
you can get basic results easily enough
but - some algorithms work on a range of data sets that make it hard to compare
things like cache hits/misses can make it hard to create a "real-world" test

you have to keep in mind, "how is this code going to be used", in order to time it

nixeagle

Alright, I'll start working up a post for there then!

And yes I'm very aware of cache misses, power saving modes, operating system interrupts/context switches and whatnot. My goal is to figure out a way to establish when timing results are meaningful. Taking just the average without measuring the quality is not enough.

For example, I have one set of code that consistently clocks in at 154 ticks on a Core i7. Another takes 77, 88, or 99 ticks, fairly evenly distributed across those values. I much prefer the former to the latter, but hopefully the latter can be improved ;).

nixeagle

I really hope long posts are ok with you guys, I'm still working on this and I'm at 8 paragraphs and 2 images already!  :eek

P.S. Struggling with the BBCode! My usual is LaTeX :/.

hutch--

Hi nixeagle,

Welcome on board. Timing will always be PHUN running Windows on later hardware. The OS privilege level tends to make a mess of most timing techniques, and it is not all that predictable in what it interferes with. Over time I have seen various techniques used: Michael Webster's timing method samples the test loop overhead, then compares it with the timings of the loop code running the test algorithm; Agner Fog designed a technique that ran in ring0 in real mode to avoid the privilege level problem; and old timers like myself run a sample long enough, then calculate the data transfer rate.

They all have their virtues and vices and, apart from Agner Fog's technique, they are subject to variation due to task switching. The problem in using Agner's technique is that you don't know how the test algo will perform under task switching.

Vortex

Hi nixeagle,

Welcome to the forum.