The MASM Forum Archive 2004 to 2012
Author Topic: Loop unrolling proper usage  (Read 5348 times)
zemtex
Member
*****
Posts: 537



Loop unrolling proper usage
« on: April 22, 2012, 03:06:19 PM »

I do loop unrolls from time to time, but I am looking for some deeper understanding of it. I am also interested in tricks related to register conservation. Examples are welcome, but I prefer theoretical explanations.
Post a few examples, simple ones and explain each line with a comment why you do it like this and like that. Before and After examples are preferred.

I have been puzzling with lego bricks all my life. I know how to do this. When Peter, at age 6, is competing with me, I find it extremely necessary to show him that I can puzzle bricks better than him, because he is so damn talented that all that is called rational has gone haywire.
shlomok
Member
**
Posts: 47



Re: Loop unrolling proper usage
« Reply #1 on: April 22, 2012, 03:17:57 PM »

Hi,
There are some examples here: http://www.mark.masmcode.com/

Too advanced for me, but they might make sense to you.
FORTRANS
Member
*****
Gender: Male
Posts: 1147


Imagine


Re: Loop unrolling proper usage
« Reply #2 on: April 22, 2012, 03:39:01 PM »

Hi,

   Well I unroll loops when the loop code is taking an appreciable
amount of time compared to the time taken by the code inside
the loop and the code overall is limited in performance.  Off the
top of my head, a line drawing routine and a graphics program
took most of my efforts in this area.

   The line drawing routine started as a generic Bresenham and
ended up as a horrible mess of unrolled, tangled, specialized,
spaghetti code.  I eventually just went with a simplified algorithm
to get the performance I wanted.

   The graphics program got every optimization I could think of.
The loop unrolling, per se, was probably silly as filling a screen with
pixels takes quite a bit of time compared to the loop code.  But it
was a part of restructuring the program more than for looping
performance.  Unrolling by two allowed for a copy from/to buffer
one and then a copy from/to buffer two rather than one buffer
with two copies to set things up for the next iteration.  So about
six copies per iteration went to five (or such).
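
A minimal sketch of the unroll-by-two idea described above, in C rather than assembly so it can be compiled anywhere. It is not the actual graphics program: `step`, `run_rolled`, `run_unrolled2` and the buffer size `N` are hypothetical names made up for illustration. The rolled version has to copy its result back each pass so the same buffer is the input again; the version unrolled by two lets the two buffers swap roles inside the loop body, so the copy back disappears.

```c
#include <stddef.h>
#include <string.h>

enum { N = 8 };   /* hypothetical buffer length */

/* Hypothetical per-iteration work: each output cell is the sum of the
   input cell and its left neighbour (wrapping around at the start). */
static void step(const int *src, int *dst)
{
    for (size_t i = 0; i < N; i++)
        dst[i] = src[i] + src[(i + N - 1) % N];
}

/* Before: one step per loop iteration, then a copy back so that `buf`
   is the input again for the next iteration. */
void run_rolled(int *buf, size_t iterations)
{
    int tmp[N];
    for (size_t k = 0; k < iterations; k++) {
        step(buf, tmp);
        memcpy(buf, tmp, sizeof tmp);   /* extra copy on every pass */
    }
}

/* After: unrolled by two. The buffers swap roles inside the body, so
   the result lands back in `buf` without any copy. */
void run_unrolled2(int *buf, size_t iterations)
{
    int other[N];
    size_t k = 0;
    for (; k + 2 <= iterations; k += 2) {
        step(buf, other);   /* buffer one -> buffer two */
        step(other, buf);   /* buffer two -> buffer one */
    }
    if (k < iterations) {   /* odd count: one leftover step */
        step(buf, other);
        memcpy(buf, other, sizeof other);
    }
}
```

Both functions leave the same contents in `buf`. The unrolled one only pays for a copy when the iteration count is odd.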

Regards,

Steve N.
hutch--
Administrator
Member
*****
Posts: 12013


Mnemonic Driven API Grinder


Re: Loop unrolling proper usage
« Reply #3 on: April 22, 2012, 03:45:11 PM »

It matters more on some hardware than others, mainly older stuff. It worked OK in some algos on PIV hardware but has almost no effect on the Core series and i7 series hardware.

The theory is simple enough: unroll an algo to reduce the loop overhead. Many factors work outside the theory, though. If the loop content is heavily memory dependent, unrolling it will not matter, because the time taken by the memory operand operations will be the main factor. Where the loop code works mainly on data held directly in registers, there is some potential to get a timing reduction.
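
A before/after sketch of that theory, written in C rather than MASM so it can be compiled anywhere. The function names `sum_rolled` and `sum_unrolled4` are made up for illustration, and whether the second one is actually faster depends on the hardware and on what the compiler already does by itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Before: one add per iteration, so the counter update, compare and
   branch are paid once for every element. */
uint32_t sum_rolled(const uint32_t *a, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* After: unrolled by four. The loop bookkeeping is paid once per four
   elements, and four separate accumulators keep the adds independent
   of each other. */
uint32_t sum_unrolled4(const uint32_t *a, size_t n)
{
    uint32_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    uint32_t sum = s0 + s1 + s2 + s3;
    for (; i < n; i++)      /* remainder: 0 to 3 leftover elements */
        sum += a[i];
    return sum;
}
```

The price of the unrolled version is register pressure: four accumulators instead of one. On 32-bit x86, with only eight general-purpose registers, that can force spills to memory and cancel the gain, which is one way an unroll ends up slower.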

Finally, set up a timing mechanism and see if unrolling an algo alters its timing. If it's faster then use it; if there is no difference then don't bother. Be aware that sometimes an unroll makes an algo slower.
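
A rough sketch of such a timing mechanism in C, using only the standard `clock()` call. The helper names `time_per_call` and `worth_keeping` and the idea of a percentage margin are assumptions made for illustration. It is much coarser than a cycle-count macro, so the repeat count has to be large enough for the total run to last well beyond the clock resolution.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: runs `fn` `reps` times (reps must be > 0) and
   returns the CPU time used per call, in seconds. */
double time_per_call(void (*fn)(void), size_t reps)
{
    clock_t start = clock();
    for (size_t r = 0; r < reps; r++)
        fn();
    clock_t stop = clock();
    return (double)(stop - start) / CLOCKS_PER_SEC / (double)reps;
}

/* Returns 1 if the candidate beats the baseline by more than `margin`
   (for example 0.05 for 5%), otherwise 0. A candidate that is equal
   or slower is rejected. */
int worth_keeping(double baseline_s, double candidate_s, double margin)
{
    return candidate_s < baseline_s * (1.0 - margin);
}
```

Time the rolled and the unrolled routine with the same repeat count, then keep the unrolled one only if `worth_keeping` says so. The margin is there because small differences are usually measurement noise.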

Regards,



Download site for MASM32
http://www.masm32.com
dedndave
Member
*****
Posts: 12523


Re: Loop unrolling proper usage
« Reply #4 on: April 22, 2012, 04:55:26 PM »

what Hutch said   :P

...i might add...
when you time code and modify it, then time it again,
you are optimizing it for your platform
to truly know what is "good", you have to stick it in the laboratory sub-forum
see how it performs on a range of processors and operating systems
FORTRANS
Member
*****
Gender: Male
Posts: 1147


Imagine


Re: Loop unrolling proper usage
« Reply #5 on: April 22, 2012, 07:13:47 PM »

Quote: see how it performs on a range of processors and operating systems

Hi,

   Just as a joke mind you, here are some results from that
graphics program I mentioned.  The P-III Win2k and AMD
systems produced erratic results.

   The target was the 200LX, and some things that sped up
the development machines slowed it down.  Many things that
sped up the others had little or no effect on the 200LX as well,
but usually were left in.

Code:
200LX, 80186 16MHz, MS-DOS 5.0, ~2.2x
+1.96816912E+000 Iterations per second.  PSEUDO5T, mains, A.O.T. battery
+4.35200000E+000 Iterations per second.  PSEUDO7

Pentium 90, WD 90C33, DOS, ~5.0x
+2.49228395E+001 Iterations per second.  PSEUDO5T
+1.24106907E+002 Iterations per second.  PSEUDO7

Pentium III 800, Matrox G400, OS/2 VDM, ~17.2x
+3.87951807E+001 Iterations per second.  PSEUDO5T
+6.68239356E+002 Iterations per second.  PSEUDO7

Pentium III 800, Matrox G400, W2k
+2.34104046E+001 Iterations per second.  PSEUDO5T
+3.79800853E+001 Iterations per second.  PSEUDO5T
+4.43276284E+002 Iterations per second.  PSEUDO7
+6.39156627E+002 Iterations per second.  PSEUDO7

AMD 64 @ 2 GHz, WinXP, ~31.5x
+4.21271764E+001 Iterations per second.  PSEUDO5T
+1.32561728E+003 Iterations per second.  PSEUDO7
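
The "~2.2x" style figures above appear to be the PSEUDO7 rate divided by the PSEUDO5T rate on the same machine. That reading is an assumption, but it matches the numbers. A small C sketch of the arithmetic, with `speedup` as a made-up helper name:

```c
/* Speedup of a new build over an old one, given both throughputs in
   iterations per second. */
double speedup(double old_ips, double new_ips)
{
    return new_ips / old_ips;
}
```

Fed with the figures from the listing, this gives about 2.21 for the 200LX, 4.98 for the Pentium 90, 17.22 for the Pentium III under OS/2 and 31.47 for the AMD 64.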

Cheers,

Steve N.
jj2007
Member
*****
Gender: Male
Posts: 6011



Re: Loop unrolling proper usage
« Reply #6 on: April 22, 2012, 08:21:09 PM »

It is very difficult to achieve reliable timings on the P4. Attached is a set of macros, based on MichaelW's Timer.asm, designed for that purpose.

Code:
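.nolist                                 ; keep the include files out of the listing
include \masm32\include\masm32rt.inc    ; MASM32 runtime: standard includes, libraries and macros

.686                                    ; allow Pentium Pro and later instructions
.xmm                                    ; allow SSE instructions

; ###### these macros improve drastically the consistency of timings on the P4 #######
include \masm32\MasmBasic\Cyct_Macros.inc

.code
start:
REPEAT 10                               ; assemble the timed block ten times in a row
    cyct_begin                          ; start of the timed section
    invoke GetTickCount                 ; the code under test
    cyct_end <GetTickCount>             ; end of the timed section; the argument looks like a label for the result
ENDM
exit                                    ; MASM32 macro that ends the process

end start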

* Cyct_Macros.zip (3.83 KB - downloaded 279 times.)
