first raytracing tutorial

Started by Mark_Larson, September 25, 2008, 12:49:37 PM

Mark_Larson

  I am going to do this in parts.  First we are going to start with learning how it works. 

If you read just the FIRST page of this webpage, it goes into a lot of detail on how raytracing works.  It also has pictures.  We won't be using his code for our first tutorial because it's pretty complicated for beginners, so don't download the code.

http://www.flipcode.com/archives/Raytracing_Topics_Techniques-Part_1_Introduction.shtml

  So let's take a look at the code we will be trying.  It's really small and thus makes a good starting point.  It simply draws a sphere and has ambient light.


  They are using OpenGL for the tutorial.  I am using SDL.  So I am converting it to use SDL without hardware acceleration.  If we need higher frame rates I will change it from software rendering to hardware rendering using OpenGL, which will run under Linux and Windows.

http://www.dawnofthegeeks.com/software_rendering/index.php?section=c&page=lesson24

Here are the links to the code.  One file does the graphics, the other does the raytracing.

http://www.dawnofthegeeks.com/software_rendering/resources/lesson24.cpp
http://www.dawnofthegeeks.com/software_rendering/resources/core.cpp
BIOS programmers do it fastest, hehe.  ;)

My Optimization webpage
http://www.website.masmforum.com/mark/index.htm

Mark_Larson

  I decided to only do ONE sphere, with no lights.  Meaning the whole sphere will be the same color.  It is a good starting point, and then we will expand from there.  So to determine if a ray hits a sphere, you have to run a formula.  "hits" = intersect.  So we would write a routine to check for the intersection of a ray and a sphere.  The ray starts at the camera, and moves toward the object.  In general the camera won't move.

    I already converted lesson 24's ray-sphere intersect routine to scalar SSE.  It is in its own thread on this subforum, which is why I wanted people to take another look at it.

   So let's look at some basic math.  In this case the "Origin" is defined as where the camera is, which is where the ray starts.  The "Direction" has to be normalized, and it's the direction that the ray is shooting in.  A 'Ray' is defined as
Origin + t * Direction
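
As a small illustration of that formula, the sketch below evaluates a point on the ray for a given t. Point_On_Ray and the struct layouts are names invented for this example, not part of the lesson code.

```c
typedef struct { float x, y, z; } Vector;
typedef struct { Vector Origin; Vector Direction; } Ray;

/* Point reached after moving t units along the ray: Origin + t * Direction. */
Vector Point_On_Ray(Ray r, float t) {
    Vector p;
    p.x = r.Origin.x + t * r.Direction.x;
    p.y = r.Origin.y + t * r.Direction.y;
    p.z = r.Origin.z + t * r.Direction.z;
    return p;
}
```

With Origin = (1, 2, 3), Direction = (0, 0, 1) and t = 2.5, this gives the point (1, 2, 5.5). When Direction is normalized, t is simply the distance travelled from the Origin.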

Both the "Origin" and the "Direction" are vectors in 3D.  We need to normalize the "Direction" vector in the "Ray" formula.  So how do we do that?  Pretty easy: divide each component by the length of the vector.  Here is the formula

x of Direction vector / LENGTH of Direction vector
y of Direction vector / LENGTH of Direction vector
z of Direction vector / LENGTH of Direction vector


So how do we calculate the length of a vector?  Pretty easy as well.  The 'x', 'y', and 'z' are the 3D coordinates of the vector whose length you are calculating.

  vector length = sqrt( x*x + y*y + z*z)


  We'll be using the reciprocal sqrt, which is a lot faster than sqrt, for normalization.  On my Core 2 Duo, sqrtss (scalar) runs in 6 to 29 cycles.  The reciprocal one (rsqrtss) runs in 3 cycles (no range).  Taking the midpoint of that range, the square root costs about (6 + 29) / 2 = 17.5 cycles, which means the reciprocal square root is 17.5 / 3 = 5.83 times faster.  Keep in mind that rsqrtss only returns an approximation, so we are trading some precision for that speed.

  Fun fact: the scalar and packed (vector) versions of sqrt and reciprocal square root take the exact same cycle count.  So at some point we will be converting to vector SSE instructions.

  Usually normalizing is really slow due to the sqrt, so one of the tricks is to pre-calculate all the normalized Direction vectors for the whole screen.  However, we might not have to do that, since we will be using the reciprocal square root, which is really fast.

  I will post the math to do the intersection with the sphere, next.


dr_eck

I know this is a MASM forum, but it would be helpful for those of us more interested in ray tracing than ASM to see code in either inlined assembler or intrinsics.  To be very specific, here are three functions:

inline real Len2() const {return x*x+y*y+z*z;}
inline real Len() const {return sqrt(x*x+y*y+z*z);}
inline Vec3& operator!() {real L = this->Len(); if(L>0) *this/=L; return *this;} // normalize this

What would the normalization operator!()  look like using a reciprocal square root intrinsic?  Is the L=0 case properly handled?

I appreciate your efforts!

Mark_Larson

Quote from: dr_eck on September 26, 2008, 05:44:31 PM
I know this is a MASM forum, but it would be helpful for those of us more interested in ray tracing than ASM to see code in either inlined assembler or intrinsics.  To be very specific, here are three functions:

inline real Len2() const {return x*x+y*y+z*z;}
inline real Len() const {return sqrt(x*x+y*y+z*z);}
inline Vec3& operator!() {real L = this->Len(); if(L>0) *this/=L; return *this;} // normalize this

What would the normalization operator!()  look like using a reciprocal square root intrinsic?  Is the L=0 case properly handled?

I appreciate your efforts!

The tutorial I am writing is in C, not C++.  There is extra overhead in C++ from having to handle classes.  The "this" pointer gets passed to every non-static member function of a class, which adds overhead.  Constructors and destructors can also cause big slowdowns if you don't write them correctly.

The intrinsics in ICC to do the reciprocal square root are as follows:

scalar version
RSQRTSS __m128 _mm_rsqrt_ss(__m128 a)

vector version
RSQRTPS __m128 _mm_rsqrt_ps(__m128 a)


Does that help?  As for L, you would still have to make sure it is non-zero, or check the return value from rsqrtss.  When you pass 0.0f to it, it returns plus or minus infinity, depending on the sign of the zero.  So you can check for that, or check if L > 0.
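
Putting those pieces together, a normalize built on the scalar intrinsic might look like the sketch below. It is one possible shape, not a tuned version. It assumes an x86 compiler with SSE enabled and the <xmmintrin.h> header. Fast_RSqrt and Normalize_Vector_Fast are hypothetical names. Because RSQRTSS is only an approximation (Intel documents a maximum relative error of 1.5 * 2^-12), the sketch adds one Newton-Raphson step, which can be dropped if that precision is enough.

```c
#include <float.h>       /* FLT_MIN */
#include <xmmintrin.h>   /* _mm_set_ss, _mm_rsqrt_ss, _mm_cvtss_f32 */

typedef struct { float x, y, z; } Vector;

/* Approximate 1/sqrt(v) with RSQRTSS, refined by one Newton-Raphson step. */
float Fast_RSqrt(float v) {
    float r = _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(v)));
    return r * (1.5f - 0.5f * v * r * r);
}

/* Normalizes *P in place. Zero-length vectors are left untouched. */
void Normalize_Vector_Fast(Vector *P) {
    float len2 = P->x * P->x + P->y * P->y + P->z * P->z;
    /* Comparing against FLT_MIN instead of 0 also skips denormal values,
       which RSQRTSS treats as zero and would turn into infinity. */
    if (len2 > FLT_MIN) {
        float inv = Fast_RSqrt(len2);   /* 1 / length, no divide needed */
        P->x *= inv;
        P->y *= inv;
        P->z *= inv;
    }
}
```

Working on the squared length means the sqrt and the divide both disappear: the reciprocal square root of x*x + y*y + z*z is already 1 / length.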

How is your C?  I will eventually convert the C code to ASM, but not right away.  Here is my C code for your stuff.  I am also including the definitions of the different data types.

typedef struct {
float x,y,z;
// float w; //we will add 'w' later when we start doing vector SIMD
}Vector;

typedef struct {
Vector Center;
float Radius;
// float one_over_radius;
}Sphere;

#define MAX_SPHERES 1

Sphere Spheres[MAX_SPHERES];


typedef struct {
Vector Origin;
Vector Direction;
}Ray;


#include <math.h>   // sqrtf

// FINLINE is not defined in this post; it is assumed to expand to a
// force-inline specifier.  This fallback is only a placeholder.
#ifndef FINLINE
#define FINLINE static inline
#endif

//forcing it inline makes it act like a macro (no call overhead).
FINLINE void Set_Vector(Vector *P, float x, float y, float z) {
    P->x = x;
    P->y = y;
    P->z = z;
}

// P1 -= P2
FINLINE void Sub_Vector(Vector *P1, Vector P2) {
    P1->x -= P2.x;
    P1->y -= P2.y;
    P1->z -= P2.z;
}

// P1 += P2
FINLINE void Add_Vector(Vector *P1, Vector P2) {
    P1->x += P2.x;
    P1->y += P2.y;
    P1->z += P2.z;
}

// float Length() { return (float)sqrt( x * x + y * y + z * z ); }
FINLINE float Length_Vector(Vector P) {
    return sqrtf( P.x * P.x + P.y * P.y + P.z * P.z );
}

// TODO: replace the sqrt and the divide in Normalize_Vector with a
// reciprocal square root intrinsic.

// void Normalize() { float l = 1.0f / Length(); x *= l; y *= l; z *= l; }

//precalc this code.
//  Normalize( Direction - Origin )
// Note: no check for a zero-length vector here.
FINLINE void Normalize_Vector(Vector *P) {
    const float l = 1.0f / Length_Vector(*P);
    P->x *= l;
    P->y *= l;
    P->z *= l;
}

// float SqrLength() { return x * x + y * y + z * z; }
FINLINE float SqrLength_Vector(Vector P) {
    return P.x * P.x + P.y * P.y + P.z * P.z;
}

// float Dot( vector3 a_V ) { return x * a_V.x + y * a_V.y + z * a_V.z; }
FINLINE float Dot_Vector(Vector P1, Vector P2) {
    return P1.x * P2.x + P1.y * P2.y + P1.z * P2.z;
}

Mark_Larson


  I am going to do more posting later today.  I had a really busy week last week.

Mark

Mark Jones

You and me both; classes are crazy difficult right now, may not have any time to tinker with raytracers :-(
"To deny our impulses... foolish; to revel in them, chaos." MCJ 2003.08

Mark_Larson

OK, now let's look at how we actually write code to intersect a ray from the camera with a sphere.  Go re-read the second message in this thread, where it starts with the math.  Origin in this code is the position of the camera, and the point that the ray shoots from.  Direction is where it shoots to.  You loop over all the coordinates on the screen, which correspond to the pixels on the screen.  Keep in mind this is hardly the fastest way to do this; I am just using it to make it easier to understand.  This code simply sets up the environment to do the raytracing.  It doesn't do any raytracing.  That will be in the next post.

We also normalize the direction; I will explain that in the next post.  I am trying to do a little bit of knowledge per post, to make it easier to understand.


FINLINE float Length_Vector(Vector P) {
    return sqrtf( P.x * P.x + P.y * P.y + P.z * P.z );
}

// void Normalize() { float l = 1.0f / Length(); x *= l; y *= l; z *= l; }
#define SCREEN_HEIGHT    480
#define SCREEN_WIDTH   640

// pixel, origin and direction are declared elsewhere.
   for (pixel.y = 0; pixel.y < SCREEN_HEIGHT; pixel.y++) {
      for (pixel.x = 0; pixel.x < SCREEN_WIDTH; pixel.x++) {
//set the camera x,y location
           origin.x = -(SCREEN_WIDTH / 2);
           origin.y = -(SCREEN_HEIGHT / 2);
//set the camera z location to -256
           origin.z = -256;
//the direction needs all three components set before it is normalized
           direction.x = pixel.x;
           direction.y = pixel.y;
           direction.z = 256;
           Normalize_Vector(&direction);
      }
   }
       

Mark_Larson

  I am installing 32-bit Ubuntu, so the assembly language will be the same.  If I use 64-bit, then I have to use 64-bit pointers (rdi), and the topic is already complex enough.

  So there are two choices: I can use JWasm (MASM compatible) or Yasm.  Preferences?  I can use either equally well.  Everyone please provide input :)

  I will still be using ICC for the compiler, since it runs under Windows and Linux.

EDIT: Got it installed last night, and installed SDL today, and I am up and running again.  I copied my code from my 64-bit Linux machine.

EDIT2: I tried OpenGL with SDL in 32-bit, and I am getting an additional 60 FPS.  It's not that much harder to code, so I will probably post the framework soon.

EDIT3: This guy has a tutorial on how to use SDL and OpenGL together.

http://lazyfoo.net/SDL_tutorials/lesson36/index.php