CPU Graphic Fill rate

Started by Farabi, April 04, 2012, 10:38:49 AM

Farabi


f3DTo2DF proc uses esi edi X:real4,Y:real4,Z:real4,F:real4,intScreenW:dword,intScreenH:dword

shr intScreenW,1
shr intScreenH,1
;X:= ((X/Z) * F) + intScreenW/2
;Y:= ((Y/Z) * F) + intScreenH/2

fld X
fdiv Z
fmul F
fiadd intScreenW

fld Y
fdiv Z
fmul F
fiadd intScreenH


ret
f3DTo2DF endp


I can plot only 1 million pixels to the screen each second. Do you think a GPU would outperform this? Does anyone know how many pixels an average GPU can fill on the screen per second?
Those who had universe knowledges can control the world by a micro processor.
http://www.wix.com/farabio/firstpage

"Etos siperi elegi"

dedndave

1) make it a MACRO instead of a PROC
2) W and H need only be SHR'ed once, FILD to the FPU stack
3) FLD Z and F, then divide and multiply, and add from the FPU stack instead of memory operands
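
A rough C sketch of points 2 and 3, written in C rather than MASM purely for illustration. The names `Point2D` and `project` are made up here and do not come from the thread. The half width and half height are passed in already halved, so the caller computes them once, and a single divide is shared by both coordinates:

```c
/* Hypothetical result type, not from the thread. */
typedef struct {
    float x;
    float y;
} Point2D;

/* Perspective projection:
 *   x' = x * f / z + half_w
 *   y' = y * f / z + half_h
 * half_w and half_h are expected to be precomputed by the caller
 * (screen width / 2 and screen height / 2), so the halving is not
 * repeated for every point. z must be non-zero. */
Point2D project(float x, float y, float z, float f,
                float half_w, float half_h)
{
    Point2D p;
    float scale = f / z;          /* one divide shared by both axes */
    p.x = x * scale + half_w;
    p.y = y * scale + half_h;
    return p;
}
```

For a 640x480 screen the caller would compute `half_w = 320` and `half_h = 240` once, outside the loop, and then call `project` for each point.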

oex

I guess it depends on which GPU you are using, but I expect many are massively faster.... Also consider that you are using 1 core, and that very often pixels don't need to be updated....

Also consider that if you don't use the GPU you are also going to have to write shader functions etc. yourself, and you won't be using more than half of the processing power at your disposal....

I have not delved deep into CPU-only 3D.... I think maybe it has applications, but in reality a computer wouldn't have a GPU if it didn't need one, and you should use all the processing power at your disposal.... I have considered looking at this as a learning exercise in 3D only (once I have fully figured out shaders etc. on the GPU)

Buy a better graphics card, Onan.... It will cost you maybe $20-$30 to be able to start with shaders and GPU processing, and it will save you a lot of wasted time....

Maybe some members more experienced than I can suggest a cheap card....
We are all of us insane, just to varying degrees and intelligently balanced through networking

http://www.hereford.tv

dedndave

4) it will run a lot faster if there are no FPU exceptions   :P
you overload the FPU stack
5) it would be a good idea to start with a nice FINIT

qWord

Quote from: dedndave on April 04, 2012, 11:54:57 AM
4) it will run a lot faster if there are no FPU exceptions   :P
you overload the FPU stack
he had the same problem some time ago - it seems he still hasn't read any FPU tutorial or the documentation  :'(

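fld F
fld Z
fdivp st(1),st      ; st(0) = F/Z
fld st              ; duplicate F/Z for the second coordinate
fmul X              ; X*F/Z
fiadd intScreenW    ; intScreenW must already be halved (shr intScreenW,1)
fstp X
fmul Y              ; Y*F/Z
fiadd intScreenH    ; intScreenH must already be halved (shr intScreenH,1)
fstp Y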
FPU in a trice: SmplMath
It's that simple!

dedndave

also...
i think i would multiply, then divide, then add - seems like it would yield better precision
this looks like a nice task for SSE   :P
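
Whether multiplying first really helps depends on the values involved. One way to check is to compare both evaluation orders against a double-precision reference. The C sketch below does that; `OrderError` and `compare_orders` are hypothetical names invented for this illustration, not code from the thread:

```c
/* Hypothetical helper: absolute error of each single-precision
 * evaluation order against a double-precision reference of x * f / z. */
typedef struct {
    double divide_first;    /* |(x / z) * f - reference| */
    double multiply_first;  /* |(x * f) / z - reference| */
} OrderError;

static double abs_diff(double a, double b)
{
    double d = a - b;
    return d < 0.0 ? -d : d;
}

/* z must be non-zero. */
OrderError compare_orders(float x, float z, float f)
{
    OrderError e;
    double reference = (double)x * (double)f / (double)z;
    float a = (x / z) * f;   /* order used in the original proc */
    float b = (x * f) / z;   /* order suggested in the post above */
    e.divide_first   = abs_diff((double)a, reference);
    e.multiply_first = abs_diff((double)b, reference);
    return e;
}
```

Running it over the coordinate ranges an application actually uses would show whether the difference matters in practice.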

dedndave

it would be nice to FINIT once, then run the timing tests
but, it looks like Michael's timer code uses the FPU and does not free all the registers   :'(

but - we can FINIT inside the timer code, then perform many iterations of the test in a loop and get decent results

zemtex

Quote from: Farabi on April 04, 2012, 10:38:49 AM

I can plot only 1 million pixels to the screen each second. Do you think a GPU would outperform this? Does anyone know how many pixels an average GPU can fill on the screen per second?

I can plot 460 million per second without using floating point, if I recall correctly. Maybe I remember wrong, but I think it was something like that. I don't remember whether I used a random algorithm; if that's the case, I need to remove it to get the true speed.

The problems with using the CPU for graphics are:

1: You can do basic things well, but advanced things will probably slow things down tremendously.
2: Your graphics application will perform only as well as the next application allows it to. Basically, "Peter" and his application called "Duck" can make your graphics perform just as he wishes simply by consuming CPU resources. You cannot rely on performance with CPU graphics; you can rely on GPU performance, as it affects the CPU minimally.
3: Using CPU graphics is good for compatibility; you know it will just work for anyone.

Use CPU graphics for graphics applications that are not too advanced, and use the GPU for games and advanced graphics applications. A modern graphics card has extreme capabilities that far outperform a CPU in many fields, way way way way out of reach of any modern CPU. The question is whether you need these capabilities in your app.
I have been puzzling with lego bricks all my life. I know how to do this. When Peter, at age 6 is competing with me, I find it extremely neccessary to show him that I can puzzle bricks better than him, because he is so damn talented that all that is called rational has gone haywire.

qWord

using the graphics card only makes sense for huge data sets - otherwise most of the time will be eaten by transferring data to and from the graphics card.

dedndave

i get about 93 clock cycles per pixel using the FPU - and i am probably doing something wrong - lol
on a 3 GHz machine, that's over 32 million per second
of course, i am only plotting them - that time does not include rendering them
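        counter_begin LOOP_COUNT,HIGH_PRIORITY_CLASS

        finit                           ;reset the FPU on every timed pass so the two
                                        ;constants loaded below do not pile up

;Xres = X * F / Z + intScreenW / 2
;Yres = Y * F / Z + intScreenH / 2

        mov     edx,480
        mov     eax,640
        shr     edx,1                   ;intScreenH / 2
        shr     eax,1                   ;intScreenW / 2
        push    edx
        fild dword ptr [esp]            ;st(0) = 240
        pop     edx
        push    eax                     ;stays on the stack as scratch space
        fild dword ptr [esp]            ;st(0) = 320, st(1) = 240
        mov     ecx,1000000

loop01: fld     Zval
        fld     Fval                    ;st(0) = F, st(1) = Z, st(2) = 320, st(3) = 240

        fld     Xval
        fmul    st(0),st(1)             ;X * F
        fdiv    st(0),st(2)             ;X * F / Z
        fadd    st(0),st(3)             ;+ intScreenW / 2
        fstp    Xres

        fld     Yval
        fmul    st(0),st(1)             ;Y * F
        fdiv    st(0),st(2)             ;Y * F / Z
        fadd    st(0),st(4)             ;+ intScreenH / 2
        fstp    Yres

        dec     ecx
        fstp real4 ptr [esp]            ;discard F (fstp leaves the flags from dec intact)
        fstp real4 ptr [esp]            ;discard Z
        jnz     loop01

        pop     eax

        counter_end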
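
The throughput figure quoted in this post is the clock rate divided by the cycle count per point. A small C sketch of that arithmetic follows; the function names are invented for this illustration:

```c
/* Points processed per second for a given clock rate (Hz) and a
 * measured cost in clock cycles per point. */
double points_per_second(double clock_hz, double cycles_per_point)
{
    return clock_hz / cycles_per_point;
}

/* The inverse view: cycles spent on each point for a given clock
 * rate (Hz) and an observed rate in points per second. */
double cycles_per_point(double clock_hz, double points_per_sec)
{
    return clock_hz / points_per_sec;
}
```

`points_per_second(3.0e9, 93.0)` comes out at roughly 32.26 million, which matches the "over 32 million per second" estimate above.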

vanjast

The GPU is good for 3D, texture, lighting and other operations... and it does them massively in parallel.
Most of this stuff you load onto the GPU card before you 'play' with it..
What you're doing is manual labour, and using the FPU to do it as well ... not a good idea.

If you read the GPU SDKs and other related goodies.. you'll get the general idea
:wink

Farabi

Quote from: dedndave on April 04, 2012, 11:54:57 AM
4) it will run a lot faster if there are no FPU exceptions   :P
you overload the FPU stack
5) it would be a good idea to start with a nice FINIT

Oh yeah, I forgot to pop the values off the FPU stack  :cheekygreen:
I guess I'll try letting the user get the return values through a structure they pass to the function.

Farabi

Quote from: qWord on April 04, 2012, 01:42:27 PM
using the graphics card only makes sense for huge data sets - otherwise most of the time will be eaten by transferring data to and from the graphics card.

Yeah, you're right, I forgot to balance the stack. I get 100 million per second right now.

Farabi

Quote
f3DTo2DF proc uses esi edi X:real4,Y:real4,Z:real4,F:real4,intScreenW:dword,intScreenH:dword
   LOCAL _2DX,_2DY:real4
   
   shr intScreenW,1
   shr intScreenH,1
   ;X:= ((X/Z) * F) + intScreenW/2
   ;Y:= ((Y/Z) * F) + intScreenH/2
   
   fld X
   fdiv Z
   fmul F
   fiadd intScreenW
   fstp _2DX
   
   fld Y
   fdiv Z
   fmul F
   fiadd intScreenH
   fstp _2DY
   
   mov eax,_2DX
   mov edx,_2DY
   
   ret
f3DTo2DF endp


I get about 100 million per second now. What about a GPU? Can it achieve the same result?
Don't worry about me wasting my time on this, I'm just curious about CPU and GPU performance.


Anyway, the sticker on my laptop says 2 cores @ 1.6 GHz. What I want to know is: does each core run at 1.6 GHz, or is it 800 MHz per core? I guess if each core runs at 1.6 GHz, my laptop should be decent for real-time work.

oex

Both are 1.6 GHz, but are those logical cores?

Also read the above posts.... Other applications will be able to massively influence your performance.... Also, what about other calculations you may want to do? Lighting, shading, other 3D application effects or gameplay

You *Need* to buy a better graphics card to answer these questions yourself.... I think the lowest advisable spec is NVidia 8400 but check this with other members

Look at all the statistics on the wiki also and try to understand what a GPU does

Note: (G/T/P)FLOPS

"the Nvidia GTX 480 reaches 672 GFLOPS[18] with one GPU on board."

Once I found a website that listed GFLOPS for many many CPUs but I have forgotten where

Also note the difference between MIPS and FLOPS