Monday, August 18, 2014

xNormal 3.18.9 and ratGPU 0.7.3

For xNormal, I fixed a bug when position-offsetting the cages.
I also recompiled it with the latest JRE and OptiX 3.6.2, which should support Maxwell cards.

For ratGPU, I tweaked the bias parameter a bit, so the self-occlusion problems on Mac should be fixed now. I also recompiled it with the latest Qt, VS and TBB libs.

Now I'm 100% focused on xN4!

Tuesday, June 03, 2014

xNormal 3.18.8 and ratGPU 0.7.2

Added 3dsmax/Maya 2015 support and fixed some bugs for xN 3.18.8.

ratGPU 0.7.2 is now compiled with the latest libraries and compilers ( gcc 4.8.2 / VS2013-u2 ) for maximum performance. Also, now it renders two tiles per OpenCL device in order to maximize the speed ( that's the "HT" signature on the device list ... aka GPU 'Hyperthreading' :D )
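To illustrate the "two tiles per device" idea, here is a rough Python sketch. It is not ratGPU's actual scheduler: the names `render_tile`, `render_frame` and `TILES_IN_FLIGHT_PER_DEVICE` are made up for the example, and the OpenCL work is replaced by a stand-in that just returns a label. The only point is that each device gets two workers pulling tiles from a shared queue, so a second tile is always ready while the first one finishes.

```python
import queue
import threading

# Assumption for this sketch: two tiles in flight per device (the "HT" idea).
TILES_IN_FLIGHT_PER_DEVICE = 2


def render_tile(device, tile):
    # Stand-in for the real OpenCL work; returns a label instead of pixels.
    x, y = tile
    return f"{device}:{x},{y}"


def render_frame(devices, tiles):
    # Shared queue of tiles still waiting to be rendered.
    pending = queue.Queue()
    for tile in tiles:
        pending.put(tile)

    done = {}
    lock = threading.Lock()

    def worker(device):
        # Keep pulling tiles until the queue is empty.
        while True:
            try:
                tile = pending.get_nowait()
            except queue.Empty:
                return
            result = render_tile(device, tile)
            with lock:
                done[tile] = result

    # Two worker threads per device, all feeding from the same queue.
    threads = [
        threading.Thread(target=worker, args=(device,))
        for device in devices
        for _ in range(TILES_IN_FLIGHT_PER_DEVICE)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done


tiles = [(x, y) for y in range(4) for x in range(4)]
frame = render_frame(["gpu0", "gpu1"], tiles)
```

Which device ends up rendering which tile is not deterministic here; the sketch only guarantees that every tile is rendered exactly once.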

Thursday, January 02, 2014

xNormal 3.18.6


xNormal 3.18.6 released:

- Now the result from the base texture bake is linearly filtered, resulting in better quality.

- Modified the dilation filter's algorithm to consume much less memory. Also, modified the internal image structs to be more SIMD-friendly.

- Increased the maximum render size to 32k x 32k.

- The 3dsmax SBM exporter now saves data from the current frame instead of using the first frame.

- Now you can render a new map : the translucency map, which can be used to simulate semitransparent objects and SSS.

- Recompiled using the latest libraries ( FBXSDK 2014.2, JRE 1.7u45, libpng-1.6.7, lua 5.2.3, OpenEXR 2.1.0 ).
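For readers who have not met a dilation (edge padding) filter before, here is a minimal Python sketch of the general technique. It is not xNormal's implementation: the function name `dilate`, the single-channel layout and the 8-neighbour average are assumptions made for the example. It is also the naive version, copying the whole image on every pass, which is exactly the kind of memory use a smarter algorithm tries to avoid.

```python
def dilate(pixels, mask, passes=1):
    """Grow filled texels outward by `passes` texels.

    pixels: 2D list of floats (one channel, for brevity)
    mask:   2D list of bools, True where the bake wrote a value
    """
    height, width = len(pixels), len(pixels[0])
    for _ in range(passes):
        # Naive approach: work on full copies so a pass only reads
        # texels that were already filled before the pass started.
        new_pixels = [row[:] for row in pixels]
        new_mask = [row[:] for row in mask]
        for y in range(height):
            for x in range(width):
                if mask[y][x]:
                    continue
                # Average the filled 8-neighbours of this empty texel.
                total, count = 0.0, 0
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (dy or dx) and 0 <= ny < height and 0 <= nx < width:
                            if mask[ny][nx]:
                                total += pixels[ny][nx]
                                count += 1
                if count:
                    new_pixels[y][x] = total / count
                    new_mask[y][x] = True
        pixels, mask = new_pixels, new_mask
    return pixels, mask


# A 1x5 strip where only the first texel was baked.
strip, strip_mask = dilate(
    [[2.0, 0.0, 0.0, 0.0, 0.0]],
    [[True, False, False, False, False]],
    passes=2,
)
```

After two passes the baked value has spread two texels to the right, and the last two texels are still empty.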

Thursday, October 10, 2013

ratGPU 0.6.0 and xN 3.18.4 released


For ratGPU 0.6.0, I've optimized it a bit more for Radeon cards and recompiled it using the latest libraries. The Radeon 7990 is the fastest card I've ever had my hands on :D



I've also released xNormal 3.18.4 which corrects some bugs.

Sunday, September 01, 2013

Some things I will never understand



1. Apple, OpenCL's founder, does not support OpenCL on the iPad ...
It would not only help to compute faster but would also be more power-efficient...
And, for the love of God, Google, kill that obsolete Frankenstein called Renderscript and embrace OpenCL ...



2. Dear Apple, have you noticed that the current OpenGL version is 4.4? Then... why the heck do your new computers barely support 3.2? And why did you set the max vertex count to 150k and the index count to only 1M? It's almost impossible to render dense meshes efficiently with those ridiculous limits!

Also, I will never understand why the OpenGL group decided to use source-code shaders. It's much better to use precompiled ones, as Direct3D or OpenCL's SPIR do, separating compiling from linking and allowing virtual functions/interfaces so the user can inject closed-source functions there.



3. Why is C++'s ABI not standardized yet? If you write a plug-in or component system it's a fucking nightmare to make it compatible across different compilers. Seriously, is it so difficult to get the major ISVs together ( Oracle, IBM, Microsoft, GNU, Apple LLVM, Intel, etc... ) and agree on a standard C++ ABI based on Itanium64 or LLVM ?



4. Why hasn't Android been pushed massively onto desktop computers yet?
There is a small project at http://www.android-x86.org , but it lacks a lot of hardware support. I simply cannot understand why Google does not back it !


5. Windows Vista / 8 are total fiascos. Why doesn't Microsoft listen to people?
We like the start button and XP's style! Is that so hard to understand? A desktop PC is **NOT** a touch screen, it's not a mobile phone and it's not a tablet either. We use mouse and keyboard omg ! In fact, the first two things people do after installing Windows 8 are:

1. Install any of the available start button hacks ( Pokki, IObit, etc... )

2. Remove all the damn Metro apps with these PowerShell commands:

Get-AppxPackage -AllUsers | Remove-AppxPackage
Get-AppxProvisionedPackage -Online | Remove-AppxProvisionedPackage -Online

Friday, August 02, 2013

Tegra 5 is impressive

NVIDIA's Tegra 5 "Logan" SoC is impressive.

The Kepler-based GPU has 192 CUDA cores and supports the latest technologies available today for desktop PCs : OpenGL 4.4, DX11, CUDA 5.5 and OpenCL(1.2?).




It can render Battlefield 3's scenes at almost max settings without problems...


It would deliver more performance than a GeForce 8800GTX, but with all the new features and consuming only ... 2 Watts ( !!! )



The CPU is (supposed to be) a 32-bit, 20nm quad-core ARM Cortex-A15 MPCore running at 1.8GHz.


A monster omg ! :D

Monday, July 22, 2013

OpenCL 2.0 spec released !



Khronos just released the OpenCL 2.0 / SPIR 1.2 (provisional) spec !

https://www.khronos.org/news/press/khronos-releases-opencl-2.0

  • Shared Virtual Memory
    Host and device kernels can directly share complex, pointer-containing data structures such as trees and linked lists, providing significant programming flexibility and eliminating costly data transfers between host and devices.
  • Dynamic Parallelism
    Device kernels can enqueue kernels to the same device with no host interaction, enabling flexible work scheduling paradigms and avoiding the need to transfer execution control and data between the device and host, often significantly offloading host processor bottlenecks.
  • Generic Address Space
    Functions can be written without specifying a named address space for arguments, especially useful for those arguments that are declared to be a pointer to a type, eliminating the need for multiple functions to be written for each named address space used in an application.
  • Images
    Improved image support including sRGB images and 3D image writes, the ability for kernels to read from and write to the same image, and the creation of OpenCL images from a mip-mapped or a multi-sampled OpenGL texture for improved OpenGL interop.
  • C11 Atomics
    A subset of C11 atomics and synchronization operations to enable assignments in one work-item to be visible to other work-items in a work-group, across work-groups executing on a device or for sharing data between the OpenCL device and host.
  • Pipes
    Pipes are memory objects that store data organized as a FIFO and OpenCL 2.0 provides built-in functions for kernels to read from or write to a pipe, providing straightforward programming of pipe data structures that can be highly optimized by OpenCL implementers.
  • Android Installable Client Driver Extension
    Enables OpenCL implementations to be discovered and loaded as a shared object on Android systems. 

I like it... more or less :D

  • The shared virtual memory will be very useful for dealing with super-large resources. It will also save lots of memory on APUs/SoCs because the data won't need to be replicated.
  • Dynamic parallelism is also good, since kernels can be fired without host CPU intervention.
  • Images can now be written and read at the same time and also created from a 2D buffer.
  • Atomics ( especially floating point ones ) are always welcome !
  • Pipes can be interesting. I like the stream approach, it may be useful.
  • The Android ICD is also very welcome, but I highly doubt Google would permit that anyways because they're Renderscript-ninjas and haters :p


But I think some important things are missing for my taste:
  • A flag to indicate the task could take a lot of time to complete, so the implementation could disable the f$%@ing Windows watchdog.
  • Multi-sized image arrays, so an image array could contain several images of different sizes. 
  • Support for C++ templates and simple virtual/abstract methods.
  • Compressed textures support.

I think SPIR is also very critical. The IHVs should adopt it as soon as possible because:

  1.  Most enterprises aren't using OpenCL because they don't want to distribute their kernels' source code with the app.
  2.  On-the-fly compilation of kernel source can take a lot of memory and time. It's much better to pre-compile the kernels offline as DirectX or CUDA do.

And, yes, of course, xN4 and ratGPUv2 are going to take advantage of this ... very soon :D

Saturday, July 13, 2013

xN4 delayed due to HDD


Really sorry, but our code repository's hard disk suffered an accident and we had to recover the data from a 4-month-old backup ... so we're delaying the xN4 beta a bit, to 2014 H2.
Accidents happen 8(

Tuesday, June 18, 2013

Hello, ARM servers !


ARM servers are becoming a reality!

HP presented several ARM servers at Computex 2013. I like this one with 4x Calxeda EnergyCore quad cores running at 1.4GHz:




And here is the world's first 64-bit ARMv8 server ! The AppliedMicro X-C1, based on the X-Gene CPU running at 3GHz:



Also, AMD announced today the roadmap for its 64-bit ARM A57 28nm SoC, codenamed Seattle ( 8-16 cores at 2GHz ) and available in 2H 2014:




And NVIDIA also announced CUDA 5.5 with complete ARM support:


and presented the NVIDIA Kayla platform some time ago:


Wednesday, May 29, 2013

HSA spec out


The members of the HSA Foundation ( AMD, ARM, Imagination, Samsung, etc... ) just released the HSA spec v0.95.

That document standardizes an ISA for GPGPU computations which can be used by several APIs ( like OpenCL ) to pre-compile their kernels into an intermediate language called HSAIL ( and an output binary format called BRIG ).


It also includes external linkage support as well as a unified 64-bit virtual memory space, pretty cool :D

I wonder if Apple / Microsoft / NVIDIA will join in the future ... :p