Modern graphics cards offer enormous computing power that can be used for more than rendering. How about updating over a million particles in real time while the environment or the emitter position keeps changing?

One million particles running on the GPU

There are many ways to compute data on the GPU. This example shows how to do it with Transform Feedback. This feature, introduced in OpenGL 3.0, captures the output of the vertex shader and writes it back into a vertex buffer.

The image below shows the flow of updating and drawing particles.

The diagram shows an example of the particle flow.

In this example one particle contains:


This gives us 12 GLfloats per particle.
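The per-particle layout can be sketched as a plain C++ struct. Only the four attribute names (Position, Color, Velocity, Other) and the total of 12 floats come from this article; the even split of three floats per attribute and the name `Particle` are assumptions made for illustration.

```cpp
#include <cstddef>

// Hypothetical layout: the 3-float split per attribute is an assumption.
// Plain `float` stands in for GLfloat (a 32-bit float on common platforms).
struct Particle {
    float position[3];
    float color[3];
    float velocity[3];
    float other[3];
};

constexpr std::size_t kFloatsPerParticle = sizeof(Particle) / sizeof(float);

static_assert(kFloatsPerParticle == 12, "a particle should hold 12 floats");
static_assert(sizeof(Particle) == 48, "no padding expected in an all-float struct");

// Byte offsets of each attribute, as one would pass to glVertexAttribPointer.
constexpr std::size_t kPositionOffset = offsetof(Particle, position);
constexpr std::size_t kColorOffset    = offsetof(Particle, color);
constexpr std::size_t kVelocityOffset = offsetof(Particle, velocity);
constexpr std::size_t kOtherOffset    = offsetof(Particle, other);
```

With an interleaved layout like this, the stride for every attribute pointer would be `sizeof(Particle)`.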

Assuming that we want to update one million particles, two Vertex Buffer Objects (VBOs) are needed, each 48 million bytes (~46 MB) in size. Two VBOs are needed because the vertex shader output can't be stored in the same VBO that the input comes from.
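The buffer size follows directly from the particle layout. A small sketch of the arithmetic, with hypothetical helper names:

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t kParticleFloats = 12;  // floats per particle, from the article

// Size in bytes of one VBO holding `particleCount` particles.
constexpr std::uint64_t bufferBytes(std::uint64_t particleCount) {
    return particleCount * kParticleFloats * sizeof(float);
}

// Converts bytes to mebibytes (1 MiB = 1024 * 1024 bytes).
constexpr double toMiB(std::uint64_t bytes) {
    return static_cast<double>(bytes) / (1024.0 * 1024.0);
}

// One million particles: 48,000,000 bytes per buffer, roughly 45.8 MiB.
constexpr std::uint64_t kOneBuffer   = bufferBytes(1000000);
constexpr std::uint64_t kBothBuffers = 2 * kOneBuffer;
```

Since both buffers live on the GPU at once, the total video memory used for particle data is twice the single-buffer figure.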

The program that updates the data is unusual because it consists of a single shader: the vertex shader.

Right before linking, the program has to be told which shader variables are its outputs. In this case there are four inputs (Position, Color, Velocity and Other) and four corresponding outputs.

The program is informed about the outputs before linking.

It is very important to keep the inputs, the outputs and the names passed to glTransformFeedbackVaryings in the same order. Otherwise the data will get mixed up.
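One way to keep the ordering consistent is to drive everything from a single table. The sketch below is not the code from the original application: the variable names (`inPosition`, `outPosition` and so on) are hypothetical, and the interleaved buffer mode is an assumption. The OpenGL calls appear only in a comment because they need a live context.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical shader variable names; the real ones are not shown in the article.
struct AttributePair {
    const char* input;   // vertex shader `in` variable
    const char* output;  // vertex shader `out` variable captured by transform feedback
};

// A single table defines the order for both the inputs and the outputs.
constexpr std::array<AttributePair, 4> kAttributes = {{
    {"inPosition", "outPosition"},
    {"inColor",    "outColor"},
    {"inVelocity", "outVelocity"},
    {"inOther",    "outOther"},
}};

// Builds the list of output names in table order.
std::array<const char*, kAttributes.size()> feedbackVaryings() {
    std::array<const char*, kAttributes.size()> names{};
    for (std::size_t i = 0; i < kAttributes.size(); ++i) {
        assert(kAttributes[i].output != nullptr);  // every input needs an output
        names[i] = kAttributes[i].output;
    }
    return names;
}

// With a live OpenGL context, the names would be registered before linking:
//   auto names = feedbackVaryings();
//   glTransformFeedbackVaryings(program, static_cast<GLsizei>(names.size()),
//                               names.data(), GL_INTERLEAVED_ATTRIBS);
//   glLinkProgram(program);
```

Setting up the vertex attribute pointers by iterating over the same table means the inputs cannot drift out of step with the outputs.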

Once the update shader is in use and the vertex attribute pointers are set, the computation is run by issuing a draw call with transform feedback active.

The last step is to swap the VBOs so that the output buffer becomes the input buffer for the next update.
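The update pass and the buffer swap can be sketched as follows. `PingPongBuffers` is a hypothetical helper, not a type from the original application. The OpenGL calls are listed in a comment because they require a live context, and the sequence shown is the usual transform feedback pattern rather than the author's exact code.

```cpp
#include <cassert>
#include <utility>

// Holds the two VBO handles; `unsigned int` stands in for GLuint.
struct PingPongBuffers {
    unsigned int input;   // VBO read by the vertex shader during this update
    unsigned int output;  // VBO written by transform feedback during this update

    // After an update pass, the freshly written buffer becomes the next input.
    void swap() {
        assert(input != output);  // feedback cannot write to the buffer it reads
        std::swap(input, output);
    }
};

// One update step with a live OpenGL context would look roughly like this:
//   glEnable(GL_RASTERIZER_DISCARD);               // nothing is drawn while updating
//   glBindBuffer(GL_ARRAY_BUFFER, buffers.input);  // then set the attribute pointers
//   glBindBufferBase(GL_TRANSFORM_FEEDBACK_BUFFER, 0, buffers.output);
//   glBeginTransformFeedback(GL_POINTS);
//   glDrawArrays(GL_POINTS, 0, particleCount);
//   glEndTransformFeedback();
//   glDisable(GL_RASTERIZER_DISCARD);
//   buffers.swap();
```

Swapping only the handles keeps the step cheap: no particle data is copied between the two buffers.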

Because the VBO always holds up-to-date data, rendering the particles is very easy. Just use the render shader with the vertex attribute pointers set to where the positions and colors are stored in the VBO, and issue an OpenGL draw call.

For this example I made an emitter that emits from four different places, whose positions change over time. The computation shader emits each particle at the proper time and then updates its velocity and visibility.

But how does the performance compare to the CPU?

I wrote a C++ equivalent of the computation shader and tested the application on a computer with an Intel Core i7-3770 3.9 GHz processor and a GeForce GTX 760 graphics card. The frame rate was measured with Fraps.
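To give an idea of what a CPU-side update loop of this kind looks like, here is a minimal sketch. It is not the code that was benchmarked: the struct fields, the constant-gravity force model and the lifetime-based visibility are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU-side particle; the field meanings are assumptions.
struct CpuParticle {
    float position[3];
    float velocity[3];
    float life;  // seconds remaining; treated as invisible when <= 0
};

// Advances every live particle by dt seconds using semi-implicit Euler
// integration with constant gravity along -Y (an assumed force model).
void updateParticles(std::vector<CpuParticle>& particles, float dt) {
    assert(dt >= 0.0f);
    const float gravity[3] = {0.0f, -9.81f, 0.0f};
    for (CpuParticle& p : particles) {
        if (p.life <= 0.0f) continue;  // dead particles are left untouched
        for (std::size_t axis = 0; axis < 3; ++axis) {
            p.velocity[axis] += gravity[axis] * dt;      // velocity first
            p.position[axis] += p.velocity[axis] * dt;   // then position
        }
        p.life -= dt;
    }
}
```

Every particle is processed independently, which is what makes the same computation map so well onto one vertex shader invocation per particle.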

As shown below, the GPU handles this kind of computation much better than the CPU. With the GPU, even 16 million particles could be rendered at 30 fps, whereas the CPU had problems with 2 million.

It is unlikely that any game will need this many particles, but this might be a good technique for relieving the CPU of some computation.


Performance comparison between the CPU and the GPU when computing particles.

You can check out the working application on GitHub (Visual Studio 2015/2017 solution).

The parameters can be set in the config.ini file.