I will post a complete file after this, but I don't have time tonight, I'm at work, and it's super busy.
I will also TEST this, because I haven't done that yet either. This is based on Bresenham's Algorithm for line drawing, as described in Michael Abrash's Book: Zen of Graphics Programming. (VGA)
Take a look, let me know what you all think, and if it can be faster. (or if it will even work...i will log on again during this weekend.)
LineDrawing
mov edi, bitmap ; edi points to 0,0 on bitmap
mov eax, maxx ; load length of row
imul maxy ; times by length of column
shl eax, 2 ; multiply by 4 (32-bits per pixel)
push eax ; store bitmap size at ESP+4 (see next push)
mov eax, maxx ; load length of row
shl eax, 2 ; multiply by 4 (32-bits per pixel)
push eax ; store ROW size at ESP
here:
mov eax, 0 ; x0
add edi, eax ; edi points to (x0,0)
mov ebx, 0 ; y0
pop eax ; restore
push eax ; and save row length
imul ebx ; row length * row of beginning point (ebx = y0)
add edi, eax ; edi points to (x0,y0)
mov eax, 0 ; restore x0
mov ecx, maxx ; x1
mov edx, maxy ; y1
; note that x0,y0,x1,y1 are normally parameters passed in
; or set to variables and then whatever...
cmp ebx, edx ; compare y0 & y1
jle noxchg ; if ebx <= edx jump to avoid xchgs
xchg eax, ecx ; swap x0 & x1
xchg ebx, edx ; swap y0 & y1
noxchg:
sub edx, ebx ; dy = y1-y0
sub ecx, eax ; dx = x1-x0
; note: here eax,ebx are not needed...
mov ebx, 1 ; XDIR set to one
cmp ecx, 0 ; if dx is <= 0 jumps to avoid change
jle xdirfound ; jump here
neg ebx ; XDIR to negative one
xdirfound:
; Variables at this point:
; eax = free ; ebx = XDIR (+or-1) ; ecx = DX ; edx = DY
; EDI points to (x0,y0) ; ESP = ROW SIZE in bytes
; ESP+4 = BITMAP SIZE in bytes
cmp ecx, edx
jle xloopx
; yloop (if we needed a label)...
shl edx, 1 ; EDX = DY*2 (DY2)
movd MM3, edx ; MM3 = DY2
sub edx, ecx ; EDX = DY2 - DX
movd MM2, edx ; MM2 is set to DY2 - DX (Error of line)
sub edx, ecx ; EDX = DY2 - (DX*2) DY2mDX2
movd MM0, edx ; MM0 = DY2mDX2
mov eax, 00FF00FFh ; purple hex color
movd MM1, eax ; MM1 = color of pixel
stosd ; since it's already in EAX, draw eax color to es:edi (stosd)
beginwhiley:
sub ecx, 1 ; while (deltaX--)
jz afterlinexy ; if zero jump to end of while loop
movd eax, MM2 ; set eax to ERROR
cmp eax, 0 ; compare ERROR to ZERO
jl thiselsey ; if less than 0 jump to thiselsey
; otherwise continue
add edi, [esp] ;OR ;pop eax ;push eax ;add edi, eax
paddd MM2, MM0 ; ADD DY2mDX2 to ERROR or line
jmp afterelsey ; skip next step
thiselsey:
paddd MM2, MM3 ; ADD DY2 to ERROR of line.
afterelsey:
add edi, ebx ; ADD XDIR to Pixel point
movd eax, MM1 ; set color in EAX for stosd
stosd ; copy eax to es:edi
jmp beginwhiley ; LOOP UP
xloopx:
shl ecx, 1 ; ECX = DX*2 (DX2)
movd MM3, ecx ; MM3 = DX2
sub ecx, edx ; ECX = DX2 - DY
movd MM2, ecx ; MM2 = DX2-DY (Error)
sub ecx, edx ; ECX = DX2 - (DY*2)
movd MM0, ecx ; MM0 = DX2mDY2
mov eax, 00FF00FFh ; purple hex color
movd MM1, eax ; MM1 = color of pixel
stosd ; since it's already in EAX, draw eax color to es:edi (stosd)
beginwhilex:
sub edx, 1 ; while (deltaY--)
jz afterlinexy ; if zero jump to end of line drawing
movd eax, MM2
cmp eax, 0 ; if Error >= 0
jl thiselsex ; otherwise jump to thiselsex
add edi, ebx ; add ebx (xdir to edi)
paddd MM2, MM0 ; add DX2mDY2 to ERROR
jmp afterelsex
thiselsex:
paddd MM2, MM3 ; just add to error
afterelsex:
add edi, [esp] ; ADD ROW to Pixel point
movd eax, MM1 ; set color in eax for stosd
stosd ; copy eax to es:edi
jmp beginwhilex ; LOOP UP
afterlinexy: ; LINE COMPLETE
Later guys,
Jeff C
:8)
Hey, ! Yes, but have you read Andre LaMothe's book? Advanced 3D Graphics Rasterization.
Not all computers have graphics cards.
Mine didn't until I opened the case and put one in...boom $99
Plus, line-drawing is pretty basic, if you can learn how to do that, game graphics are a lot easier! After all, that's just simple bitmap manipulation and arrangement.
ALSO, this isn't 3d either...that's next. Wish me luck!
Later,
Jeff C
:bdg
P.S. Don't get me wrong, I wish they all did! That would be awesome! They should all have sound cards, video capture cards, wireless high-speed internet, 2.0 gigs RAM, telephone and cable on-board is next!!! Just wait!!!
:8)
I think having labels in your code with the word "SEX" in them is going to offend our one female forum reader. Capitalization for emphasis.
add edi, ebx ; add ebx (xdir to edi)
paddd MM2, MM0 ; add DX2mDY2 to ERROR
jmp afterelsex
thiselSEX:
paddd MM2, MM3 ; just add to error
afterelSEX:
add edi, [esp] ; ADD ROW to Pixel point
movd eax, MM1 ; set color in eax for stosd
stosd ; copy eax to es:edi
jmp beginwhilex ; LOOP UP
I think OceanJeff32 means "this else x" and "after else x".
Thanks AERO, yeah, that's what I meant...whoops! whoopie! oh well.
Hey hitchhiker, my main point was the average consumer isn't going to pay for a separate video card, they are going to wait for the integrated stuff...guys like us aren't average, we like to spend extra money on our computers.
This code listing is not the skeleton, I have that file available, but didn't bring it to work tonight, so I will upload it tomorrow, for all to assemble and run. There is a problem with the line drawing, that I'm not sure what's happening...I get a line, and it starts drawing at the correct location, but the error and other term does not point the end to the correct location. The next step is to optimize for run-slicing, that is: calculating the amount of horizontal pixel runs, and optimizing the runs for each segment of the line. I may just move onto that and hope the problems fix themselves. We'll see. I don't care if this works fast, so much as I want to understand what the code is doing. I should look into the debugger soon...oh boy
Any suggestions on debugging this one guys?
Eventually all will be assimilated,
Jeff C
:U
Quote from: Mark_Larson on February 25, 2005, 04:34:33 PM
I think having labels in your code with the word "SEX" in them is going to offend our one female forum reader. Capitalization for emphasis.
add edi, ebx ; add ebx (xdir to edi)
paddd MM2, MM0 ; add DX2mDY2 to ERROR
jmp afterelsex
thiselSEX:
paddd MM2, MM3 ; just add to error
afterelSEX:
add edi, [esp] ; ADD ROW to Pixel point
movd eax, MM1 ; set color in eax for stosd
stosd ; copy eax to es:edi
jmp beginwhilex ; LOOP UP
I dont know am I made mistake or not, but
mov eax,[esi]
mov edx,[esi+4]
is more fast than mmx instrustion
movq mm0,[esi]
Quote from: AeroASM on February 25, 2005, 06:48:16 PM
I think OceanJeff32 means "this else x" and "after else x".
I know. I am teasing him ;)
Isn't there a manual that lists the timings of each instruction? I saw one a long time ago, but the new intel manuals don't have each instruction timed, maybe because of branch prediction, etc.
Later,
Jeff C
:bdg
Quote from: OceanJeff32 on February 26, 2005, 06:04:38 AM
Isn't there a manual that lists the timings of each instruction? I saw one a long time ago, but the new intel manuals don't have each instruction timed, maybe because of branch prediction, etc.
Later,
Jeff C
:bdg
If you were looking in the Instruction Manual ( book 2), it isn't there. You have to download the optimization manual. It's a seperate book. Let me know if you have trouble finding it on Intel's site. They have timings for all the instructions. Also Agner Fog has timings on his webpage www.agner.org. There is a PDF in the assembler section.
Hitchhikr: Thanks, eliminating that cmp that does what SUB already does, is a good idea! I need to start thinking more like an ASM programmer.
Mark: Yeah, I brought the optimization reference manual to work today, it's actually pretty good reading...I must be a nerd...lol
I'll get back to this project soon too. I'm going to explore the Direct X Skeleton...
Later,
Jeff C
:U
Good luck with it :) I always enjoyed doing graphics programming. I prefer OpenGL over DirectX.
Quote from: OceanJeff32 on February 25, 2005, 06:14:40 AM
P.S. Don't get me wrong, I wish they all did! That would be awesome! They should all have sound cards, video capture cards, wireless high-speed internet, 2.0 gigs RAM, telephone and cable on-board is next!!! Just wait!!!
You forgot the dual 20GHz Sexium processors...
and the five 200petabyte hard drives
Hi Jeff, what about the parameter? Can you made it this function without MMX and with parameter? My line algo is about 50xslower compared to the Microsoft line algo.
Yes, I could have done that, I guess if you PUSH all the variables instead of storing them in MMX registers that saves the registers for other things...including other threads...
Then, you would just reference the stack pointer [esp]
I am working on a particle system first though, I will post code here in a few days or so, I'm getting started with Spring Quarter at the local college.
Later guys,
Jeff C
:bdg
Hi!!
First I have to admit, I didn't read all the posts here, so maybe this post is not "uptodate". I was just wondering about this:
As far as I know Bresenham's method for drawing lines is a beloved one, because it can be implemented fully with integer arithmetic. So what I don't really see, why the MMX and 64bit arithmetic?
Greets, Gábor
Quote from: gabor on April 06, 2005, 10:19:31 AM
Hi!!
First I have to admit, I didn't read all the posts here, so maybe this post is not "uptodate". I was just wondering about this:
As far as I know Bresenham's method for drawing lines is a beloved one, because it can be implemented fully with integer arithmetic. So what I don't really see, why the MMX and 64bit arithmetic?
Greets, Gábor
Optimisation for speed. MMX and SSE technology allows code (integer arithmetic) to be run in parallel and this speeds up the code. For instance using mmx technology you can add 2 pairs of dwords together in one instruction as opposed to requiring 2 adds to add up the 2 pairs of dwords.
Thx!
This is a good thing to keep in mind! Well, I need to extend my knowledge about MMX and SSE...
Greets, Gábor
when investigating in fastest generating spheremeshes on the fly I modified bresenhams circle algo to be recursive for each extra dimension, so it can draw spheres, even superspheres etc
LoL 8D graphics engine using Cayley's octonions! :bg :bg :wink
recursive bresenhams circle algo, well it only makes a quarter of a circle
circle PROC
LOCAL @@rq :DWORD
LOCAL @@yq :DWORD
LOCAL @@xq :DWORD
LOCAL @@temp :DWORD
LOCAL @@color :DWORD
pushad
mov edx,[zzz]
mov [@@temp],edx
lea edi,[buffer]
xor cx,cx
mov dx,ax ;eax=radius
mov bx,3
sub bx,dx
.WHILE cx < dx
;call symplot
pushad
.IF [recurf]==0 ;render a slice of a sphere
inc [recurf]
;mov [recurf],eax
mov eax,ecx ;recursive call to create a sphere
;change z offset from cx
mov [zcentref],edx
call circle
;xor eax,eax
;mov [recurf],eax
dec [recurf]
.ELSE ;render a pixel
xor ebx,ebx
xor eax,eax
mov ax,cx
mov [@@xq],eax
mov ax,dx
mov [@@yq],eax
fild @@xq
fimul @@temp
fidiv zcentref
fistp @@xq
fild @@yq
fimul @@temp
fidiv zcentref
fistp @@yq
mov ecx,[@@xq]
mov edx,[@@yq]
.IF edx>255
popad
popad
ret
.ENDIF
.IF ecx>255
popad
popad
ret
.ENDIF
mov bh,cl ;y*256
mov bl,dl ;x
;add ebx,8080h
and ebx,0ffffh
;mov eax,0ffffffffh
mov eax,[zcentref]
mov [edi+ebx*4],al
inc ebx
mov [edi+ebx*4],al
inc ebx
mov [edi+ebx*4],al
inc ebx
mov [edi+ebx*4],al
mov bh,dl
mov bl,cl
;add ebx,8080h
and ebx,0ffffh
mov eax,[zcentref]
mov [edi+ebx*4],al
inc ebx
mov [edi+ebx*4],al
inc ebx
mov [edi+ebx*4],al
inc ebx
mov [edi+ebx*4],al
.ENDIF
popad
.IF bx<zero
mov ax,cx
shl ax,2
add ax,6
.ELSE
mov ax,cx
sub ax,dx
shl ax,2
add ax,10
dec dx
.ENDIF
inc cx
add bx,ax
.ENDW
.IF cx==dx
;call symplot
.ENDIF
popad
ret
circle ENDP
Recursive algo? IS it a 3D circle? Any screen shot?
Quote from: Farabi on June 23, 2005, 12:13:55 AM
Recursive algo? IS it a 3D circle? Any screen shot?
it fast generate sphere, but it looks like crap, because I dont wanna add costly two atans/pixel to texturemap it
I have a slower algo for that
but intention was for generate spheremeshes on the fly
line algorithm without circle algorithm? :wink ...let me paste it in :U
<head>
<script language="javascript">
function circle(r)
{
blackcell = "<td bgcolor=#000000><table><td></table>";
redcell = "<td bgcolor=#FF0000><table><td></table>";
rr = r+r;
v = r*r;
dyy = 1-rr;
dy = 2;
str = "<table cellspacing=0 cellpadding=0>";
for(c2=0; c2<rr; c2++)
{
dxx = 1-rr;
dx = 2;
v += dyy;
dyy += dy;
str += "<tr>";
for(c=0; c<rr; c++)
{
v += dxx;
dxx += dx;
str += v<0? redcell: blackcell;
}
}
str += "</table>";
document.write(str);
};
circle(30);
</script>
</head>
Howdy,
Just for interests sake, a DDA routine is faster than Bresenhams these days... well pretty much since the early pentiums.
It's still a good programming excercise though IMO :)
It also makes doing anti-aliased lines a bit more straight forward.
It is completly imposible for DDA to be in any way faster than Bresenham Algorithm for line drawing.
A variation of Bresenham that exploits the warap arround to drop any comparations is presumably the fastest ever...
but honestly any decent Bresenham will eat alive DDA at breakfast...
The original Bresenhams line algorithm was superseeded by his run length slice version. This was considerably faster. However maintaining needless accumulators with error margin is unnecessary. Using a DDA which is reversed to operate with the minor axis as the independant one and draw slices with good ol fixed point is by far the fastest algorithm. It would give perfect accuracy for lines up to about 16,000 pixels in length... which is far longer than any line you're ever going to draw to screen.
The only thing that needs to be taken into account is to ensure that the run-length is divide between the first and last run to ensure the lines is A) balanced around it's center and B) does lead to incorrect lines with a single pixel end point.
I did an implementation of this a while ago with a fairly fast clipping implementation.. I think it had a few mmx optimisations in it as well.
An additional note, there are a huge number of optimisations that you can apply to this and other line algorithms with special cases for longer runs, treating steep and shallow lines seperately so you do vertical or horizontal slices, removing the divides for gradient calculations by pre-clipping and using a table (which has as many entries as your screen height dx / dy etc), special cases for "runs of runs" etc.. etc..
I havent gone through and understood your code yet, but I would like to explain a syystem I devised many years ago (PDP11).The system I called 'Linear digital Interpolator'. With the PDP11 multiply's had to be avoided (to slow)
Imagine six registers
X . PartialX Y . Partial y
DeltaX DeltaY
Lets suppose now you wish to move forward 7 in X and 5 in Y (we will do it in decimal so its easier to understand. Set DeltaX to 7 and DeltaY to 5. If we add Delta x to PartialX ten times carrying from PartialX to X(and the same with Y) the X will INCREMENT to 7 while Y will increment to 5. Each time we increment X or Y we plot a new pixel. Note there is an assumed decimal point between X & Partial X (also Y). This will plot a straight line, however we need to set initial 'Partials' to 0.5 otherwise 4.99 will plot as pixel 4.
If you were doing this on a 16 bit machine you could argue that you woulld have to do 64k additions. However if you shift both X & Y left till one of the MSD's is one and shift the multiplier (counter) right by the same amount, then you will have an increment every addition
Note that this is 2 dimensions and could be expanded to 3. Also by cross coupling you can make the interpolator circular instead of linear.