Update your bookmarks! This blog is now hosted on http://xoofx.com/blog

Thursday, November 24, 2011

Advanced HLSL using closures and function pointers

Shader languages like HLSL, Cg or GLSL are nowadays driving the most powerful processors in the world, but if you are developing with them, you may have been already a little bit frustrated by one of their expressiveness limitations: the common problem of abstraction and code reuse. In order to overcome this problem, solutions so far were mostly using a glue combination of #define/#include preprocessors directives in order to generate combinations of code, permutation of shaders, so called UberShaders. Recently, this problem has been addressed, for HLSL (new in Direct3D11), by providing the concept of Dynamic Linking, and for GLSL, the concept of SubRoutines, For Direct3D11, the new mechanism has been only available for Shader Model 5.0, meaning that even if this could greatly simplified the problem of abstraction, It is unfortunately only available for Direct3D11 class graphics card, which is of course a huge limitation...

But, here is the good news: While the classic usage of dynamic linking is not really possible from earlier version (like SM4.0 or SM3.0), I have found an interesting hack to bring some kind of closures and functions pointers to HLSL(!). This solution doesn't involve any kind of preprocessing directive and is able to work with SM3.0 and SM4.0, so It might be interesting for folks like me that like to abstract and reuse the code as often as possible! But let's see how It can be achieved...

Friday, November 18, 2011

Direct3D11 multithreading micro-benchmark in C# with SharpDX

Multi-threading is an important feature added to Direct3D11 two years ago and has been increasingly used on recent game engine in order to achieve better performance on PC. You can have a look at "DirectX 11 Rendering in Battlefield 3" from Johan Anderson/DICE which gives a great insight about how it was effectively used in practice in their game engine. Usage of the Direct3D11 multithreading API is pretty straightforward, and while we are also using it successfully at our work in our R&D 3D Engine, I didn't take the time to sit down with this feature and check how to get the best of it.

I recently came across a question on the gamedev forum about "[DX11] Command Lists on a Single Threaded Renderer": If command lists are an efficient way to store replayable drawing commands, would it be efficient to use them even in a single threaded scenario where lots of drawing commands are repeatable?

In order to verify this, among other things, I did a simple micro-benchmark using C#/SharpDX, but while the results are somehow expectable, there are a couple of gotchas that deserve a more in-depth look...

Sunday, October 2, 2011

Managed DirectX with Win8 Metro Style App

Since my last post is quite old, it's time to give more insight about the latest features included in the recent release of SharpDX 2.0 beta. This release is a major step for SharpDX as it includes lots of new APIs but also provides a preview of developing Managed DirectX from a Windows 8 Metro style application. In the meantime, I have been pretty busy as I got a job in a gamedev company located in Japan that is developing a new rendering engine in C#... which is using SharpDX for its windows rendering, and that's extremely exciting to work on a project like this (and of course rewarding for all the investement done in SharpDX!).

While SharpDX is starting to cover almost the whole DirectX API as well as some new Windows Multimedia API (like WIC), I put some effort, just after the Microsoft Build conference, on providing a preview of using SharpDX from a Win8 Metro style application, which is a fantastic opportunity for a C#/.Net developer to use DirectX from both a Metro style application and Desktop application using the same API.

Finally SharpDX is also going to get some new APIs and some interesting features in the following months, especially in the performance domain, to reduce as much as possible the difference of performance between a native C++ DirectX application and a managed C# DirectX application, that will deserve a post on its own. But lets start with what has been achieved in the latest 2.0 version!

Tuesday, March 15, 2011

Benchmarking C#/.Net Direct3D 11 APIs vs native C++

[Update 2012/05/15: Note that the original code was fine tuned to a particular config and may not give you the same results. I have rewritten this sample to give more accurate and predictible results. Comparison with XNA was also not fair and inaccurate, you should have something like x4 slower instead of x9. SharpDX latest version 2.1.0 is also x1.35 slower than C++ now. An update of this article will follow on new sharpdx.org website]
[Update 2014/06/17: Remove XNA comparison, as it is not fair and relevant]

If you are working with a managed language like C# and you are concerned by performance, you probably know that, even if the Microsoft JIT CLR is quite efficient, It has a significant cost over a pure C++ implementation. If you don't know much about this cost, you have probably heard about a mean cost for managed languages around 15-20%. If you are really concern by this, and depending on the cases, you know that the reality of a calculation-intensive managed application is more often around x2 or even x3 slower than its C++ counterpart. In this post, I'm going to present a micro-benchmark that measure the cost of calling a native Direct3D 11 API from a C# application, using various API, ranging from SharpDX, SlimDX, WindowsCodecPack.

Why this benchmark is important? Well, if you intend like me to build some serious 3D games with a C# managed API (don't troll me on this! ;) ), you need to know exactly what is the cost of calling intensively a native Direct3D API (mainly, the cost of the interop between a managed language and a native API) from a managed language. If your game is GPU bounded, you are unlikely to see any differences here. But if you want to apply lots of effects, with various models, particles, materials, playing with several rendering targets and a heavy deferred rendering technique, you are likely to perform lots of draw calls to the Direct3D API. For a AAA game, those calls could be as high as 3000-7000 draw submissions in instancing scenarios (look at latest great DICE publications in "DirectX 11 Rendering in Battlefield 3" from Johan Andersson). If you are running at 60fps (or lower 30fps), you just have 17ms (or 34ms) per frame to perform your whole rendering. In this short time range, drawing calls can take a significant amount of time, and this is a main reason why multi-threading batching command were introduced in DirectX11. We won't use such a technique here, as we want to evaluate raw calls.

As you are going to see, results are pretty interesting for someone that is concerned by performance and writing C# games (or even efficient tools for a 3D Middleware)