Friday, March 27, 2009

Raytracing: what's coming up!

OK, today is Intel's GDC 2009 presentation of the Larrabee ISA. Here's a good summary:


1. Larrabee is a discrete GPU aimed at competing with the NVIDIA GeForce 2XX/3XX and the ATI Radeon 4XXX/5XXX. Although it's oriented to rasterization (it's going to be DirectX 10/11 and OpenGL 3 compatible), it has a very interesting GPGPU architecture which makes it unique... and one very good thing: if DirectX 12 appears, Larrabee will be able to support it, because internally it uses a software renderer! So it could be improved just by updating its drivers or firmware.


2. Larrabee is based on Intel's x86 SSE and many-core architecture. Well, it's slightly different... it has 512-bit registers (16 single-precision floats or 8 double-precision ones). In comparison, a Core 2/i7 uses just 128-bit registers. Larrabee's vector instruction set also has more advanced features, like mask registers, improved conditionals/branching and a recursive function call stack, and the chip has 16/24/32/48 cores. It also executes many more SSE instructions per cycle than a Core i7 (I think 160 vs 8). It's rumored to have 2 teraflops of raw power (like 2 or 3 Radeon 4870s).
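To see how the rumored 2 teraflops relates to the other numbers in this post, here is a rough back-of-the-envelope sketch in C. It assumes that every core retires one 16-wide multiply-add per cycle (counted as 2 floating-point operations per lane) at the 2.5 GHz clock mentioned in point 5. Neither the function names nor the "2 flops per lane" figure come from Intel; they are only assumptions for illustration.

```c
#include <stdio.h>

/* Theoretical peak single-precision throughput in GFLOPS.
 * flops_per_lane is an assumption: 2 means one multiply-add
 * (a multiply plus an add) per lane per cycle. */
static double peak_gflops(int cores, int lanes, int flops_per_lane, double clock_ghz)
{
    return (double)cores * lanes * flops_per_lane * clock_ghz;
}

/* Prints the estimate for each core count mentioned in the post,
 * using 16 lanes, 2 flops per lane and a 2.5 GHz clock. */
static void print_peak_table(void)
{
    const int core_counts[] = { 16, 24, 32, 48 };
    for (int i = 0; i < 4; ++i) {
        printf("%2d cores: %.0f GFLOPS\n", core_counts[i],
               peak_gflops(core_counts[i], 16, 2, 2.5));
    }
}
```

Under these assumptions a 32-core part would peak at 2560 GFLOPS, which is in the same ballpark as the rumor. Real throughput would of course depend on how well the code keeps all 16 lanes busy.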

You can see more info about the LRBni SSE instructions here:
http://software.intel.com/en-us/articles/prototype-primitives-guide/

typedef struct { float v[16]; } _M512;
typedef struct { double v[8]; } _M512D;
typedef struct { int v[16]; } _M512I;
typedef unsigned short __mmask;

MADDN132_{PS,PD} – Multiply, Add and Negate Vectors
Performs an element-by-element multiplication between vector v1 and vector v3, adds the result to vector v2, and negates the sum.
_M512 _mm512_madd132_ps(_M512 v1, _M512 v2, _M512 v3)
_M512 _mm512_mask_madd132_ps(_M512 v1, __mmask k1, _M512 v2, _M512 v3)
_M512D _mm512_madd132_pd(_M512D v1, _M512D v2, _M512D v3)
_M512D _mm512_mask_madd132_pd(_M512D v1, __mmask k1, _M512D v2, _M512D v3)
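To make the description above more concrete, here is a plain scalar C sketch of what the single-precision version computes, one lane at a time. The type and function names (vec16f, mask16, ref_maddn132_ps, ref_mask_maddn132_ps) are hypothetical stand-ins, not part of Intel's prototype library. Note that the heading says MADDN132 while the prototypes are spelled madd132; the sketch follows the written description (multiply, add, then negate). The behavior for masked-off lanes (keeping the value of v1) is an assumption as well.

```c
/* Hypothetical stand-ins for the post's _M512 and __mmask types. */
typedef struct { float v[16]; } vec16f;
typedef unsigned short mask16;

/* Unmasked form: result[i] = -(v1[i] * v3[i] + v2[i]) for all 16 lanes. */
static vec16f ref_maddn132_ps(vec16f v1, vec16f v2, vec16f v3)
{
    vec16f r;
    for (int i = 0; i < 16; ++i)
        r.v[i] = -(v1.v[i] * v3.v[i] + v2.v[i]);
    return r;
}

/* Masked form: only lanes whose bit is set in k1 are computed.
 * Assumption: lanes whose mask bit is 0 keep the value of v1. */
static vec16f ref_mask_maddn132_ps(vec16f v1, mask16 k1, vec16f v2, vec16f v3)
{
    vec16f r = v1;
    for (int i = 0; i < 16; ++i)
        if (k1 & (1u << i))
            r.v[i] = -(v1.v[i] * v3.v[i] + v2.v[i]);
    return r;
}
```

The mask is what makes per-lane conditionals cheap: a ray packet can compute a result for only the rays that passed a test, without branching per ray.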

3. Larrabee has a good cache architecture: each core has 256 KB of cache, connected by a 1024-bit (512 x 2) bi-directional ring bus. The cache helps hide the video memory latency and also reduces the programming complexity. It must be big so that the ray tracing structures can be traversed fast!


4. Larrabee has a complete virtual memory system (like a CPU). This is good for managing a scene with zillions of polygons without getting a nice "out of memory" error.

5. Larrabee is made in 32 nm with a base frequency of 2.5 GHz... In comparison, some Radeons have severe problems getting past 1 GHz at 40 nm. The estimated TDP is 300 W... so Larrabee could use a very efficient and innovative cooling system, like this ionic wind one :p



Or gallium metal ones, like they did in this ATI X850 experiment:


Or the ones explained here:
http://www.electronics-cooling.com/articles/2005/2005_nov_article2.php

6. Larrabee could use low-voltage 7 GHz GDDR5 memory.


7. Larrabee is going to be very easy to program via the Intel Compiler, Threading Building Blocks, OpenMP and VTune/GPA. Here is a screenshot of their Graphics Performance Analyzer profiler being adapted for Larrabee:


8. Intel bought the company behind Project Offset to make the first game using Larrabee. Here is a preview :p




Sooooooooo.... I have only good words for Larrabee (on paper)... I see lots of possibilities for ray tracing using it! I can't wait to see a photo of the PCB!

On the other hand, NVIDIA is busy these days preparing a new thing called NViRT (NVIDIA Ray Tracing API). It's the API used to run this Siggraph 2008 scene using CUDA ray tracing:

It's almost ready. You can find a preview here:
http://realtimerendering.com/downloads/NVIRT-Overview.pdf

It seems very good, specialized and fast. I can't wait to use it in xNormal.

Meanwhile, a new ray tracing "actor" has appeared on the scene: Caustic Graphics ( www.caustic.com ). This is their hardware-accelerated ray tracing card, called CausticOne:

It's just a prototype for developers... which could explain the strange SO-DIMMs and JTAG connectors.

Here's a video showing the real PCB:



Its API is very versatile and it can be used for both realtime applications and offline renderers.
I'm pretty sure I could improve xNormal's speed a lot with it!
Well... we need to wait to see more.

I must also mention an incredible discovery made this week... a graphene oscillator.
Graphene is just a form of carbon discovered recently (I think in 2004) which has very interesting electrical and thermal properties:

It's just a layer of atoms of carbon as you can see :p


The famous "carbon nanotubes" are made of graphene :


But it's the strongest material ever discovered (a thin film of it can stop a bullet), it's harder than diamond, it's cheap, it can be used to protect against heat and cold (much better than ceramic materials), it can be a superconductor or a semiconductor... and now it can oscillate like quartz, but at up to 1 THz!

Here's a picture of the prototype:

The authors claim that this technology could be implemented in two years for desktop computers! Imagine a 1 terahertz (1000 GHz) CPU consuming 2 watts!

You can read more about this amazing discovery here:
http://web.mit.edu/newsoffice/2009/graphene-palacios-0319.html

Of course, with a 1 THz CPU I would be able to accelerate the ray tracing for xNormal a lot :p

Soooooooooo... all this news is good news for the future of ray tracing!

4 comments:

Anonymous said...

Do you have any plans for xNormal to be used as an offline renderer for animation production?

santyhammer said...

[quote]
Do you have any plans for XNormal to use it as offline renderer for animation production?
[/quote]
Well, xNormal has a very concrete and specialized purpose: normal map and AO (+derivatives) texture baking for games.

It's not really a generic renderer like VRay, Mental Ray, etc...

You can use normal maps with animated characters, but the pre-baked AO has several limitations as you can imagine.

I could convert xNormal into a complete renderer... but I think it should stay as is.

smoluck said...

hey .. Interesting article about graphene technology. great to discover that.

keep going on Xnormal !

I will test the next release

muzz said...

Very cool writeup on future tech. It's great to hear the opinions of people actually in the know!

Yeah keep xnormal specialized. The reason i use it is because it is good at what it is specialized for. (also did you notice that on the project offset site they recommend xnormal? HA.)