Topic: why is "add" faster than "inc" (Read 39999 times)
|
thomas_remkus
Guest
|
I'm in a tight loop, doing basically nothing but adding numbers. When I use "inc eax" it's much slower than "add eax, 1". Why is this? The difference is significant.
|
arafel
Guest
|
It's faster only on PIV and later processors, because there is some penalty due to a partial flag stall when using the inc instruction. On PIII and below "add reg, 1" is much slower.
(at least for Intel cpus, don't know if there is difference for AMD)
|
jdoe
Guest
|
Quote from arafel: (at least for Intel cpus, don't know if there is difference for AMD)
add/sub are faster than inc/dec even on AMD processors.
|
QvasiModo
Guest
|
It's because add changes all of the arithmetic registers, while inc changes only some of them - so the processor may have to wait for another arithmetic operation to complete just to set the flags correctly, even when the calculations are completely unrelated. For example, if I have this:
Code:
cmp eax,10h
add edx,1
the processor doesn't have to wait for the cmp instruction to complete to be able to execute the add instruction. But if I have this:
Code:
cmp eax,10h
inc edx
then the processor has to wait for cmp to know how the flags have to be set after executing inc.
|
Ratch
Guest
|
jdoe,
Quote: add/sub are faster than inc/dec even on AMD processor
Both ADD and INC are DirectPath (as opposed to VectorPath) instructions according to the AMD Optimization Manual http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/22007.pdf . Also, many optimization examples in the manual use INC under the right circumstances, i.e. no reading or writing the register immediately after modifying it. I can't find any statement in the manual saying that an ADD is preferable to an INC on the AMD.
QvasiModo,
Quote: It's because add changes all of the arithmetic registers, while inc changes only some of them
Both ADD and INC change only the register that they are coded to change.
Quote: For example, if I have this:
Code:
cmp eax,10h
add edx,1
the processor doesn't have to wait for the cmp instruction to complete to be able to execute the add instruction. But if I have this:
Code:
cmp eax,10h
inc edx
then the processor has to wait for cmp to know how the flags have to be set after executing inc
Why? In both cases the ADD in the first snippet and the INC in the second snippet are going to wipe away any flag settings of the CMP instruction.
Ratch
|
tenkey
|
I think QvasiModo is referring to the differences in flag settings.
Because ADD and CMP change the same set of flags, and INC and CMP don't, there may be a stall for creating the correct flag setting in the latter case.
The difference is CF, which INC and DEC leave unchanged. In multiprecision arithmetic, you would use INC/DEC for counting and updating addresses. You would need to save and restore CF if there were no increment/decrement instructions that left CF alone.
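To make the multiprecision point concrete, here is a minimal Python sketch. It is only a model of the architectural flag rules (ADD and ADC write CF, INC passes it through), not code from this thread, and it says nothing about timing. The names `adc32`, `inc32`, `add32` and `multiword_add` are made up for the illustration.

```python
MASK32 = 0xFFFFFFFF

def adc32(a, b, cf):
    """Model ADC: add with carry-in, return (32-bit result, new CF)."""
    total = a + b + cf
    return total & MASK32, total >> 32

def inc32(value, cf):
    """Model INC: add 1, but hand the incoming CF back unchanged."""
    return (value + 1) & MASK32, cf

def add32(a, b):
    """Model ADD: CF is overwritten with the carry-out of this addition."""
    total = a + b
    return total & MASK32, total >> 32

def step_with_inc(i, cf):
    # Index update done with INC: the carry chain survives.
    return inc32(i, cf)

def step_with_add(i, cf):
    # Index update done with ADD: the incoming carry is lost.
    return add32(i, 1)

def multiword_add(x, y, step):
    """Add two little-endian lists of 32-bit limbs with an ADC chain.

    `step` plays the role of the loop-index update between two ADCs:
    it takes (index, CF) and returns (new index, new CF).
    """
    out, cf, i = [], 0, 0
    while i < len(x):
        limb, cf = adc32(x[i], y[i], cf)
        out.append(limb)
        i, cf = step(i, cf)
    return out, cf

# 0x1_FFFFFFFF + 0x0_00000001, low limb first
print(multiword_add([0xFFFFFFFF, 1], [1, 0], step_with_inc))
print(multiword_add([0xFFFFFFFF, 1], [1, 0], step_with_add))
```

With the INC-style step the carry out of the low limb reaches the high limb. With the ADD-style step it is overwritten by the index update, which is why real code would have to save and restore CF around it.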
|
A programming language is low level when its programs require attention to the irrelevant. Alan Perlis, Epigram #8
|
jdoe
Guest
|
Quote from Ratch: jdoe, "add/sub are faster than inc/dec even on AMD processor" Both ADD and INC are DirectPath (as opposed to VectorPath) instructions according to the AMD Optimization Manual http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/22007.pdf . Also, many optimization examples in the manual use INC under the right circumstances, i.e. no reading or writing the register immediately after modifying it. I can't find any statement in the manual saying that an ADD is preferable to an INC on the AMD.
I don't mind what was written or not. From the tests I did, add/sub is in the worst case as fast as inc/dec, and otherwise faster. That is on my AMD Athlon 1800+, though. If you hear on the radio that the sky is green today, would you believe it without going outside to see it for yourself?
|
Ratch
Guest
|
jdoe,
Quote: If you hear on the radio that the sky is green today, would you believe it without going outside to see it for yourself?
From the radio? No, I certainly would not. But if the one who made the sky said so, then I would believe it until I saw otherwise. Check your timings again. They can be tricky, with insidious pitfalls.
Ratch
|
Ratch
Guest
|
tenkey,
Quote: Because ADD and CMP change the same set of flags, and INC and CMP don't, there may be a stall for creating the correct flag setting in the latter case.
Again I point out, the CMPs in his example code are effectively NOPs. The flags the CMPs set or clear are wiped out by the following ADD and INC instructions.
Ratch
|
hutch--
Administrator
Member
Posts: 12013
Mnemonic Driven API Grinder
|
In the words of Intel from PIV manual 4: "The inc and dec instructions should always be avoided. Using add and sub instructions instead of inc and dec instructions avoid data dependence and improve performance."
This probably has something to do with why ADD and SUB are faster on later Intel hardware.
|
|
tenkey
|
Quote from Ratch: tenkey, "Because ADD and CMP change the same set of flags, and INC and CMP don't, there may be a stall for creating the correct flag setting in the latter case." Again I point out, the CMPs in his example code are effectively NOPs. The flags the CMPs set or clear are wiped out by the following ADD and INC instructions. Ratch
Here is a demonstration of the INC instruction (DEC is similar). Predict what the following code will produce, then run it. Replace the INC with the equivalent ADD and see if there's a difference.
Code:
.386
.model flat, stdcall
option casemap :none            ; case sensitive

include c:\masm32\include\windows.inc
include \masm32\include\user32.inc
include \masm32\include\kernel32.inc

includelib c:\masm32\lib\kernel32.lib
includelib c:\masm32\lib\user32.lib

.data
caseclr db "CF is cleared by CMP, not set by INC.",0
case0   db "CF is not set by STC.",0
case1   db "CF is not cleared by CMP.",0
case2   db "CF is set by INC.",0

.code
_start:
    stc                         ; set CF
    jnc carryclear_case0        ; error if CF is clear
    mov ecx,-1
    mov eax,7
    cmp eax,3                   ; 7 - 3 = 4, no carry (borrow)
    jc carryset_case1           ; error if CF is set
                                ; CF status is "clear"
    inc ecx                     ; FFFFFFFF + 1 = 0 w/carry - is CF set?
    jc carryset_case2           ; find out!
carryclear:
    invoke MessageBox,NULL,addr caseclr,addr caseclr,MB_OK
    jmp quit
carryclear_case0:
    invoke MessageBox,NULL,addr case0,addr case0,MB_OK
    jmp quit
carryset_case1:
    invoke MessageBox,NULL,addr case1,addr case1,MB_OK
    jmp quit
carryset_case2:
    invoke MessageBox,NULL,addr case2,addr case2,MB_OK
quit:
    invoke ExitProcess,0
end _start
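For readers without a MASM32 setup, the flag logic of the listing can be traced by hand. The Python sketch below is an assumption-labelled model of that trace, not the listing itself: `run_demo` is a hypothetical helper, it follows the documented rule that INC does not write CF while ADD does, and it returns the message text instead of opening a message box.

```python
MASK32 = 0xFFFFFFFF

def run_demo(use_add=False):
    """Trace CF through the listing; return the message it would show."""
    cf = 1                          # stc
    if cf == 0:                     # jnc carryclear_case0
        return "CF is not set by STC."
    ecx, eax = MASK32, 7            # mov ecx,-1 / mov eax,7
    cf = 1 if eax < 3 else 0        # cmp eax,3: CF is the unsigned borrow
    if cf == 1:                     # jc carryset_case1
        return "CF is not cleared by CMP."
    total = ecx + 1
    ecx = total & MASK32            # both variants leave ecx = 0
    if use_add:
        cf = total >> 32            # add ecx,1 overwrites CF with the carry-out
    # inc ecx: CF keeps the value CMP left behind
    if cf == 1:                     # jc carryset_case2
        return "CF is set by INC."
    return "CF is cleared by CMP, not set by INC."

print(run_demo())                   # INC variant
print(run_demo(use_add=True))       # ADD variant
```

In this model the two variants take different branches, so the CMP before an INC is not a NOP as far as CF is concerned. The wording of the second message comes from the listing's strings and still says INC even when ADD is substituted.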
|
EduardoS
Guest
|
Quote from jdoe: "(at least for Intel cpus, don't know if there is difference for AMD)" add/sub are faster than inc/dec even on AMD processor.
Maybe under certain conditions, generally not:
Press any key to start...
add 1 : 1019 clocks
add 2 : 1020 clocks
add 3 : 1020 clocks
add 4 : 1363 clocks
inc 1 : 1020 clocks
inc 2 : 1020 clocks
inc 3 : 1021 clocks
inc 4 : 1361 clocks
add/cmp : 1019 clocks
inc/cmp : 1019 clocks
Press any key to exit...
[attachment deleted by admin]
|
dsouza123
Guest
|
Athlon 1.2 GHz @ 1190 MHz, Windows XP SP2, 512 MB
Press any key to start...
add 1 : 1026 clocks
add 2 : 1027 clocks
add 3 : 1351 clocks
add 4 : 1802 clocks
inc 1 : 1027 clocks
inc 2 : 1025 clocks
inc 3 : 1028 clocks
inc 4 : 1373 clocks
add/cmp : 1026 clocks
inc/cmp : 1027 clocks
Press any key to exit...
|
Mark Jones
Drifting in the Abstract
Member
Posts: 2302
=- Stargate Atlantis -=
|
Quote from thomas_remkus: when I use "inc eax" it's much slower than "add eax, 1". why is this?
Code optimization is like working on a Sudoku puzzle or decrypting a cryptogram. It can be lots of fun, and also maddeningly annoying at the same time. See Agner Fog's optimization guide: http://www.agner.org/assem/
|
"To deny our impulses... foolish; to revel in them, chaos." MCJ 2003.08
|
jdoe
Guest
|
Quote from Ratch: But if the one who made the sky said so, then I would believe it until I saw otherwise.
You know how to answer. OK, I'll give you one point for a little gain with INC/DEC on AMD in some circumstances, but I keep saying that, generally, using ADD/SUB is as fast or faster. In other words, when writing optimized code, trying both is a good idea.