why is "add" faster than "inc"

Started by thomas_remkus, April 28, 2006, 08:17:49 PM

thomas_remkus

I'm in a tight loop, doing basically nothing but adding numbers. When I use "inc eax" it's much slower than "add eax, 1". Why is this? The difference is significant.

arafel

It's faster only on PIV and above processors, because there is some penalty due to a partial flags register stall when using the inc instruction. On PIII and below, "add reg, 1" is much slower.

(At least for Intel CPUs; I don't know if there is a difference for AMD.)

jdoe

Quote from: arafel on April 28, 2006, 08:33:46 PM
(At least for Intel CPUs; I don't know if there is a difference for AMD.)

add/sub are faster than inc/dec even on AMD processors.  :thumbu



QvasiModo

It's because add changes all of the arithmetic registers, while inc changes only some of them - so the processor may have to wait before another arithmetic operation completes just to set the flags correctly, even when the calculations are completely unrelated.

For example, if I have this:

cmp eax,10h
add edx,1

the processor doesn't have to wait for the cmp instruction to complete to be able to execute the add instruction. But if I have this:

cmp eax,10h
inc edx

then the processor has to wait for cmp to know how the flags have to be set after executing inc.
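To make the flag dependency concrete, here is a simplified C model (not an emulator) of which flags CMP, ADD and INC write. It tracks only CF, ZF and SF, and the names `model_cmp`, `model_add` and `model_inc` are hypothetical. What it shows: ADD computes every modeled flag from its operands alone, while INC leaves CF exactly as the previous instruction set it, so the flags value after INC has the old CF as an extra input.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of three x86 flags (OF, AF, PF are not modeled). */
typedef struct {
    bool cf; /* carry */
    bool zf; /* zero  */
    bool sf; /* sign  */
} Flags;

/* CMP a, b: computes a - b for the flags only; CF is the borrow. */
void model_cmp(uint32_t a, uint32_t b, Flags *f) {
    uint32_t r = a - b;
    f->cf = a < b;
    f->zf = r == 0;
    f->sf = (r >> 31) & 1u;
}

/* ADD dst, src: writes CF, ZF and SF from the operands only,
   so the new flags never depend on the old flags. */
uint32_t model_add(uint32_t dst, uint32_t src, Flags *f) {
    uint32_t r = dst + src;
    f->cf = r < dst;              /* unsigned wrap-around means carry out */
    f->zf = r == 0;
    f->sf = (r >> 31) & 1u;
    return r;
}

/* INC dst: writes ZF and SF like ADD dst,1 but leaves CF unchanged.
   The resulting flags therefore still contain the old CF, which is the
   extra input a CPU has to merge in (the partial flags update). */
uint32_t model_inc(uint32_t dst, Flags *f) {
    uint32_t r = dst + 1u;
    /* f->cf is intentionally not touched */
    f->zf = r == 0;
    f->sf = (r >> 31) & 1u;
    return r;
}
```

Usage: after `model_cmp(5, 0x10, &f)` sets CF (5 < 16 borrows), `model_inc(7, &f)` returns 8 with CF still set, while `model_add(7, 1, &f)` returns 8 with CF cleared.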

Ratch

 jdoe,

Quote
add/sub are faster than inc/dec even on AMD processor

     Both ADD and INC are DirectPath (not VectorPath) instructions according to the AMD Optimization Manual (http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/22007.pdf).  Also, many optimization examples in the manual use INC under the right circumstances, i.e. no reading or writing of the register immediately after modifying it.  I can't find any statement in the manual saying that an ADD is preferable to an INC on the AMD.

QvasiModo,

Quote
It's because add changes all of the arithmetic registers, while inc changes only some of them

     Both ADD and INC change only the register that they are coded to change.

Quote
For example, if I have this:

Code:
cmp eax,10h
add edx,1
the processor doesn't have to wait for the cmp instruction to complete to be able to execute the add instruction. But if I have this:

Code:
cmp eax,10h
inc edx
then the processor has to wait for cmp to know how the flags have to be set after executing inc

     Why?  In both cases the ADD in the first snippet and the INC in the second snippet are going to wipe away any flag settings of the CMP instruction.  Ratch

tenkey

I think QvasiModo is referring to the differences in flag settings.

Because ADD and CMP change the same set of flags, and INC and CMP don't, there may be a stall for creating the correct flag setting in the latter case.

The difference is CF. In multiprecision arithmetic, you would use INC/DEC for counting and updating addresses. You would need to save and restore CF if there were no increment/decrement instructions that left CF alone.
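The multiprecision case can be sketched in C, assuming 32-bit limbs stored least significant first (`mp_add` is a hypothetical name). C has no carry flag, so the carry is an explicit variable here. In an x86 ADC loop the same carry lives in CF between iterations, which is why the counter and pointer updates in such a loop would use INC/DEC rather than ADD/SUB.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* r = a + b over n 32-bit limbs, least significant limb first.
   Returns the final carry out (0 or 1). */
uint32_t mp_add(uint32_t *r, const uint32_t *a, const uint32_t *b, size_t n) {
    uint32_t carry = 0;                  /* plays the role of CF */
    for (size_t i = 0; i < n; i++) {     /* in asm: the INC/DEC counter */
        uint64_t s = (uint64_t)a[i] + b[i] + carry; /* like ADC */
        r[i] = (uint32_t)s;              /* low 32 bits of the sum */
        carry = (uint32_t)(s >> 32);     /* carry into the next limb */
    }
    return carry;
}
```

For example, adding 1 to the two-limb value {0xFFFFFFFF, 0} yields {0, 1}: the carry out of the low limb propagates into the high limb.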
A programming language is low level when its programs require attention to the irrelevant.
Alan Perlis, Epigram #8

jdoe

Quote from: Ratch on April 29, 2006, 12:04:18 AM
jdoe,

Quote
add/sub are faster than inc/dec even on AMD processor

     Both ADD and INC are DirectPath (not VectorPath) instructions according to the AMD Optimization Manual (http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/22007.pdf).  Also, many optimization examples in the manual use INC under the right circumstances, i.e. no reading or writing of the register immediately after modifying it.  I can't find any statement in the manual saying that an ADD is preferable to an INC on the AMD.

I don't care what was written or not. From the tests I did, add/sub is, in the worst case, as fast as or faster than inc/dec. That was on my AMD Athlon 1800+, though.

If you hear on the radio that the sky is green today, would you believe it without going outside to see for yourself?

Ratch

jdoe,

Quote
If you hear on the radio that the sky is green today, would you believe it without going outside to see for yourself?

     From the radio?  No I certainly would not.  But if the one who made the sky said so, then I would believe it until I saw otherwise.  Check your timings again.  They can be tricky with insidious pitfalls.  Ratch
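One common way to make timings less fragile is to run an untimed warm-up, then repeat the measurement and keep the fastest run, since interrupts and context switches only ever add time. The C sketch below assumes a POSIX system with `clock_gettime(CLOCK_MONOTONIC)`; `min_elapsed_ns` and `tight_add_loop` are hypothetical names. It only illustrates the method: a C compiler chooses INC or ADD on its own, so comparing the two instructions still means timing hand-written assembly loops like the ones in this thread.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime on strict compilers */
#endif
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical workload: a tight loop of additions. Because sink is
   volatile, every load/add/store is actually performed. */
static volatile uint32_t sink;

void tight_add_loop(void) {
    sink = 0;
    for (int i = 0; i < 100000; i++) {
        sink += 1;
    }
}

static long long now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Time fn `repeats` times after one warm-up call; return the fastest run. */
long long min_elapsed_ns(void (*fn)(void), int repeats) {
    assert(repeats > 0);
    fn();                          /* warm-up: caches, page faults; not timed */
    long long best = -1;
    for (int r = 0; r < repeats; r++) {
        long long t0 = now_ns();
        fn();
        long long dt = now_ns() - t0;
        if (best < 0 || dt < best) {
            best = dt;
        }
    }
    return best;
}
```

Usage: `min_elapsed_ns(tight_add_loop, 5)` returns the best of five runs in nanoseconds; comparing two variants is only meaningful when both go through the same harness on the same machine.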

Ratch

tenkey,

Quote
Because ADD and CMP change the same set of flags, and INC and CMP don't, there may be a stall for creating the correct flag setting in the latter case.

     Again I point out, the CMPs in his example code are effectively NOPs.  The flags the CMPs set or clear are wiped out by the following ADD and INC instructions.  Ratch

hutch--

In the words of Intel from PIV manual 4,

Quote
The inc and dec instructions should always be avoided. Using add and sub instructions instead of inc and dec instructions avoid data dependence and improve performance.

This probably has something to do with why ADD SUB are faster on later Intel hardware.  :bg
Download site for MASM32      New MASM Forum
https://masm32.com          https://masm32.com/board/index.php

tenkey

Quote from: Ratch on April 29, 2006, 01:10:44 AM
tenkey,

Quote
Because ADD and CMP change the same set of flags, and INC and CMP don't, there may be a stall for creating the correct flag setting in the latter case.

     Again I point out, the CMPs in his example code are effectively NOPs.  The flags the CMPs set or clear are wiped out by the following ADD and INC instructions.  Ratch

Here is a demonstration of the INC instruction (DEC is similar). Predict what the following code will produce, then run it. Replace the INC with the equivalent ADD and see if there's a difference.

.386
.model flat, stdcall   ; memory model first, then calling convention
option casemap :none   ; case sensitive

include c:\masm32\include\windows.inc
include \masm32\include\user32.inc
include \masm32\include\kernel32.inc

includelib c:\masm32\lib\kernel32.lib
includelib c:\masm32\lib\user32.lib

.data
caseclr db "CF is cleared by CMP, not set by INC.",0
case0   db "CF is not set by STC.",0
case1   db "CF is not cleared by CMP.",0
case2   db "CF is set by INC.",0
.code
_start:

stc        ; set CF
jnc carryclear_case0   ; error if CF is clear
mov ecx,-1
mov eax,7
cmp eax,3  ; 7 - 3 = 4, no carry (borrow)
jc carryset_case1   ; error if CF is set
; CF status is "clear"
inc ecx    ; FFFFFFFF + 1 = 0 w/carry - is CF set?
jc carryset_case2   ; find out!
carryclear:
invoke MessageBox,NULL,addr caseclr,addr caseclr,MB_OK
jmp quit
carryclear_case0:
invoke MessageBox,NULL,addr case0,addr case0,MB_OK
jmp quit
carryset_case1:
invoke MessageBox,NULL,addr case1,addr case1,MB_OK
jmp quit
carryset_case2:
invoke MessageBox,NULL,addr case2,addr case2,MB_OK
quit:
invoke ExitProcess,0

end _start
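For anyone who wants to check the prediction without assembling the program, this C sketch mirrors its control flow with a single modeled carry flag (`run_carry_demo` and the `DemoResult` values are hypothetical names). With `inc ecx` the program ends on the "CF is cleared by CMP, not set by INC." box, because INC leaves the CF that CMP cleared. Swapping in `add ecx,1` makes the wrap-around from FFFFFFFF to 0 set CF, so the program jumps to the "CF is set by INC." box instead (the message text then refers to the instruction that was replaced).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Which MessageBox the demo program would show. */
typedef enum { MSG_CASE0, MSG_CASE1, MSG_CASE2, MSG_CLEAR } DemoResult;

/* use_add == false: the original "inc ecx"; true: "add ecx,1" instead. */
DemoResult run_carry_demo(bool use_add) {
    bool cf;
    uint32_t ecx, eax;

    cf = true;                      /* stc */
    if (!cf) return MSG_CASE0;      /* jnc carryclear_case0 */
    ecx = 0xFFFFFFFFu;              /* mov ecx,-1 */
    eax = 7;                        /* mov eax,7 */
    cf = eax < 3u;                  /* cmp eax,3: CF = borrow, here clear */
    if (cf) return MSG_CASE1;       /* jc carryset_case1 */
    if (use_add) {
        cf = (ecx + 1u) < ecx;      /* add ecx,1: CF = carry out, here set */
        ecx += 1u;
    } else {
        ecx += 1u;                  /* inc ecx: CF keeps the value from CMP */
    }
    if (cf) return MSG_CASE2;       /* jc carryset_case2 */
    return MSG_CLEAR;               /* falls through to carryclear */
}
```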

EduardoS

Quote from: jdoe on April 28, 2006, 09:51:16 PM
Quote from: arafel on April 28, 2006, 08:33:46 PM
(At least for Intel CPUs; I don't know if there is a difference for AMD.)

add/sub are faster than inc/dec even on AMD processors.  :thumbu




Maybe under certain conditions, but generally not:

Press any key to start...
add 1 : 1019 clocks
add 2 : 1020 clocks
add 3 : 1020 clocks
add 4 : 1363 clocks
inc 1 : 1020 clocks
inc 2 : 1020 clocks
inc 3 : 1021 clocks
inc 4 : 1361 clocks
add/cmp : 1019 clocks
inc/cmp : 1019 clocks
Press any key to exit...

dsouza123

Athlon 1.2 Ghz @ 1190 Mhz
Windows XP SP2  512MB

Press any key to start...
add 1 : 1026 clocks
add 2 : 1027 clocks
add 3 : 1351 clocks
add 4 : 1802 clocks
inc 1 : 1027 clocks
inc 2 : 1025 clocks
inc 3 : 1028 clocks
inc 4 : 1373 clocks
add/cmp : 1026 clocks
inc/cmp : 1027 clocks
Press any key to exit...

Mark Jones

Quote from: thomas_remkus on April 28, 2006, 08:17:49 PM
when I use "inc eax" it's much slower than "add eax, 1". why is this?

Code optimization is like working on a Sudoku puzzle or decrypting an encrypto-gram. Can be lots of fun, and also maddeningly annoying at the same time. :bg   

See Agner Fog's optimization guide: http://www.agner.org/assem/
"To deny our impulses... foolish; to revel in them, chaos." MCJ 2003.08

jdoe

Quote from: Ratch on April 29, 2006, 01:02:43 AM
But if the one who made the sky said so, then I would believe it until I saw otherwise.

You know how to answer.

OK, I'll give you a point about a small gain with INC/DEC on AMD in some circumstances, but I still say that, generally, using ADD/SUB is as fast or faster. In other words, when writing optimized code, trying both is a good idea.