The MASM Forum Archive 2004 to 2012
Welcome, Guest. Please login or register.
August 03, 2020, 09:15:08 PM

Login with username, password and session length
Search:     Advanced search
128553 Posts in 15254 Topics by 684 Members
Latest Member: mottt
* Home Help Search Login Register
+  The MASM Forum Archive 2004 to 2012
|-+  General Forums
| |-+  The Campus
| | |-+  for loop with constants
« previous next »
Pages: [1] Print
Author Topic: for loop with constants  (Read 3176 times)
loki_dre
Guest


Email
for loop with constants
« on: April 20, 2008, 08:49:40 AM »

is it possible to write code as a small for loop based on a constant that won't consume any registers or variables, but simply copy and paste the given instructions over X times?
eg.
for(i=0;i<3;i++)
mov [ebx],al
inc ebx

Which would translate to:
mov [ebx],al
inc ebx
mov [ebx],al
inc ebx
mov [ebx],al
inc ebx
Logged
MichaelW
Global Moderator
Member
*****
Gender: Male
Posts: 5161


Re: for loop with constants
« Reply #1 on: April 20, 2008, 09:16:58 AM »

You can create an "unrolled" loop with the REPEAT directive.
Code:
    REPEAT 3
      mov [ebx],al
      inc ebx
    ENDM
Code:
00401000 8803                   mov     [ebx],al
00401002 43                     inc     ebx
00401003 8803                   mov     [ebx],al
00401005 43                     inc     ebx
00401006 8803                   mov     [ebx],al
00401008 43                     inc     ebx

And many variations are possible, for example:
Code:
    N=0
    REPEAT 3
      mov [ebx+N*4],eax
      N=N+1
    ENDM
Code:
00401009 8903                   mov     [ebx],eax
0040100B 894304                 mov     [ebx+4],eax
0040100E 894308                 mov     [ebx+8],eax
Logged

eschew obfuscation
loki_dre
Guest


Email
Re: for loop with constants
« Reply #2 on: April 21, 2008, 03:47:08 PM »

are unrolled loops always faster? any recommendations on a limit?
Logged
donkey
Member
*****
Posts: 3789


ASS-embler


Re: for loop with constants
« Reply #3 on: April 21, 2008, 03:55:32 PM »

are unrolled loops always faster? any recommendations on a limit?


Yes, they are always faster though in some cases the gain would be offset by other considerations for example with loops with a large number of iterations the extra coding would be cumbersome and if the branching is predictable within the scope of the branch prediction hardware the actual gains in speed might be offset by code size and instruction caching issues. For myself I will unroll loops but none bigger than around 16 iterations but that's a personal preference and has no basis in any set rules.
Logged

"Ahhh, what an awful dream. Ones and zeroes everywhere...[shudder] and I thought I saw a two." -- Bender
"It was just a dream, Bender. There's no such thing as two". -- Fry
-- Futurama

Donkey's Stable
loki_dre
Guest


Email
Re: for loop with constants
« Reply #4 on: April 21, 2008, 05:08:21 PM »

any examples of a nested for loop??
Thanks in advance
Logged
loki_dre
Guest


Email
Re: for loop with constants
« Reply #5 on: April 21, 2008, 05:09:14 PM »

examples of a nested unrolled loop I mean
Logged
hutch--
Administrator
Member
*****
Posts: 12013


Mnemonic Driven API Grinder


Re: for loop with constants
« Reply #6 on: April 23, 2008, 06:44:12 AM »

loki,

There does not seem to be any set rule, even on the same processor. I would always advocate timing different unroll sizes as I have found optimal ranging from 0 to 16 depending on the code to be unrolled. Almost exclusively the limiting factor in algo speed is memory access speed and once you have hit that wall, nothing makes it go faster.

You can try reading in larger sizes, DWORD over BYTE etc ... but you make yourself more work handling uneven ends.
Logged

Regards,



Download site for MASM32
http://www.masm32.com
MichaelW
Global Moderator
Member
*****
Gender: Male
Posts: 5161


Re: for loop with constants
« Reply #7 on: April 23, 2008, 08:19:44 AM »

This code displays the cycle counts for a memory fill loop at unroll counts of 1 to 256. The optimal unroll count will depend on the processor, I think primarily on the size and/or characteristics of the processor’s L1 cache.
Code:
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    include \masm32\include\masm32rt.inc
    .686
    include \masm32\macros\timers.asm
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    .data
    .code
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
start:
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
    mov esi, halloc(1024)

    invoke Sleep, 3000

    FOR UR,<1,2,4,8,16,32,64,128,256>

      counter_begin 1000, HIGH_PRIORITY_CLASS
          xor eax, eax
          xor edx, edx
          mov ecx, 1024/4/UR
        @@:
          N=0
          REPEAT UR
            mov [esi+edx+N], eax
            N=N+1
          ENDM
          add edx, 4*UR
          dec ecx
          jnz @B
      counter_end
      print ustr$(eax)," cycles, unrolled by "
      print ustr$(UR),13,10

    ENDM

    print chr$(13,10)
    counter_begin 1000, HIGH_PRIORITY_CLASS
        xor eax, eax
        xor edx, edx
        mov ecx, 1024/4-1
      @@:
        mov [esi+ecx*4], eax
        dec ecx
        jns @B
    counter_end
    print ustr$(eax)," cycles, optimized for no unroll",13,10,13,10

    hfree ebx

    inkey "Press any key to exit..."
    exit
; «««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««««
end start

Results on my P3:
Code:
774 cycles, unrolled by 1
392 cycles, unrolled by 2
352 cycles, unrolled by 4
444 cycles, unrolled by 8
668 cycles, unrolled by 16
580 cycles, unrolled by 32
571 cycles, unrolled by 64
577 cycles, unrolled by 128
580 cycles, unrolled by 256

518 cycles, optimized for no unroll

Edit: The unroll by one cycle count is somewhat misleading, because the loop includes an instruction that is not necessary for a loop that is not unrolled, so for reference I added a loop that eliminates this instruction.

And corrected two errors related to using ecx as a loop counter and an address component, instead of just as a loop counter.
« Last Edit: April 23, 2008, 09:44:57 AM by MichaelW » Logged

eschew obfuscation
hutch--
Administrator
Member
*****
Posts: 12013


Mnemonic Driven API Grinder


Re: for loop with constants
« Reply #8 on: April 23, 2008, 08:58:32 AM »

Here is the result on my old PIV.

Code:
548 cycles, unrolled by 1
541 cycles, unrolled by 2
519 cycles, unrolled by 4
519 cycles, unrolled by 8
520 cycles, unrolled by 16
2391 cycles, unrolled by 32
1395 cycles, unrolled by 64
1397 cycles, unrolled by 128
1396 cycles, unrolled by 256

527 cycles, optimized for no unroll

Press any key to exit...
Logged

Regards,



Download site for MASM32
http://www.masm32.com
Tedd
Procrastinator Extraordinaire
Member
*****
Posts: 2210


Reality Bytes


Re: for loop with constants
« Reply #9 on: April 23, 2008, 11:51:28 AM »

In general, you can unroll a small loop a small number of times, for immediate benefit. As the loop gets larger, you can unroll it less without causing it to fill a line of cache - if it all fits in cache then it's faster. The same goes for nested loops - unroll the inner if it doesn't make it too big, but unrolling both is most likely to make it too big.
The processor does actually optimise some loops (mainly, it can skip the decoding step), so it's not always better to unroll (but for small loops it generally is.) And it all depends on the size of the processor's cache Roll Eyes
Logged

No snowflake in an avalanche feels responsible.
Pages: [1] Print 
« previous next »
Jump to:  

Powered by MySQL Powered by PHP The MASM Forum Archive 2004 to 2012 | Powered by SMF 1.0.12.
© 2001-2005, Lewis Media. All Rights Reserved.
Valid XHTML 1.0! Valid CSS!