Until this chapter, we have rendered models by binding their material uniforms, their textures, and their vertex and index buffers, and by submitting one draw command for each of the meshes they are composed of. In this chapter, we will start moving towards a more efficient way of rendering by beginning the implementation of a bindless renderer (or at least an almost bindless one). In this type of rendering we do not invoke a series of draw commands to draw the scene. Instead, we populate a buffer with the instructions that allow the GPU to render it. This is called indirect rendering, and it is a more efficient way of drawing because:
We remove the need to perform several bind operations before drawing each mesh.
We just need to invoke a single draw call.
We can perform in-GPU operations, such as frustum culling, reducing the load on the CPU side.
As you can see, the ultimate goal is to maximize the utilization of the GPU while removing potential bottlenecks that may occur on the CPU side, as well as latencies due to CPU to GPU communication. In this chapter we will transform our render to use indirect drawing, starting with just static models. Animated models will be handled in the next chapter.
You can find the complete source code for this chapter here.
Concepts
Prior to explaining the code, let's explain the concepts behind indirect drawing. In essence, we need to create a buffer which stores the drawing parameters that will be used to render the vertices. You can think of these as instruction blocks, or draw commands, that the GPU reads and that instruct it to perform the drawing. Once the buffer is populated, we invoke glMultiDrawElementsIndirect to trigger that process. Each draw command stored in the buffer is defined by the following parameters (if you are using C, this is modelled by the DrawElementsIndirectCommand structure):
count: The number of indices to be drawn, that is, how many elements are read from the indices buffer to assemble vertices (understanding a vertex as the structure which groups the position, normal information, texture coordinates, etc.). This should contain the same value as the count that we used when invoking glDrawElements when rendering meshes.
instanceCount: The number of instances to be drawn. We may have several entities that share the same model. Instead of storing a drawing instruction for each entity, we can submit a single draw instruction and set the number of entities that we want to draw. This is called instanced rendering, and it saves a lot of computing time. Without indirect drawing you can achieve the same result by setting specific attributes per VAO. I think that it is even simpler with this technique.
firstIndex: An offset into the buffer that holds the index values used for this draw instruction (the offset is measured in number of indices, not in bytes).
baseVertex: An offset into the buffer that holds the vertex data (the offset is measured in number of vertices, not in bytes).
baseInstance: We can use this parameter to set a value that will be shared by all the instances to be drawn. Combining this value with the number of the instance to be drawn we will be able to access per instance data (we will see this later on).
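The five parameters above map to five consecutive 32-bit integers. The following is a minimal sketch of that layout using only java.nio; the class and method names are hypothetical and are not part of the chapter's code.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class IndirectCommandLayout {
    // Five 32-bit fields: count, instanceCount, firstIndex, baseVertex, baseInstance
    static final int COMMAND_SIZE = 5 * Integer.BYTES;

    // Writes one draw command at the current position of the buffer
    static void putCommand(ByteBuffer buffer, int count, int instanceCount,
                           int firstIndex, int baseVertex, int baseInstance) {
        buffer.putInt(count);
        buffer.putInt(instanceCount);
        buffer.putInt(firstIndex);
        buffer.putInt(baseVertex);
        buffer.putInt(baseInstance);
    }

    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.allocate(COMMAND_SIZE).order(ByteOrder.nativeOrder());
        // Example values: a mesh with 36 indices drawn for two entities
        putCommand(buffer, 36, 2, 0, 0, 0);
        buffer.flip();
        System.out.println("Command size in bytes: " + buffer.remaining()); // 20
    }
}
```

The order of the fields matters, since the GPU reads them positionally from the buffer.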
Although it has already been mentioned when describing the parameters, indirect drawing needs a buffer that holds the vertex data and another one for the indices. The difference is that we need to combine all the data from the multiple meshes that make up the models of our scene into a single buffer. The way we access per-mesh data is by playing with the offset values of the drawing parameters.
Another aspect to solve is how we pass material information or per-entity data (such as model matrices). In previous chapters we used uniforms for that, setting the proper value whenever we changed the mesh or the entity to be drawn. With indirect drawing we cannot do that: we cannot modify data during the render process, since we submit a bulk set of drawing instructions at once. The solution is to use additional buffers. We can store per-entity data in a buffer and use the baseInstance parameter (combined with the instance id) to access the proper per-entity data inside that buffer (we will see later on that instead of a buffer we will use an array of uniforms, but you could also use a plain buffer for that). Inside that buffer we will hold indices to access two additional buffers:
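As an illustration of how those offsets can be derived when meshes are appended one after another into shared buffers, here is a small sketch. The MeshSize and MeshOffset records are hypothetical helpers introduced only for this example.

```java
import java.util.ArrayList;
import java.util.List;

final class MeshOffsets {
    // Hypothetical description of a mesh: how many vertices and indices it has
    record MeshSize(int numVertices, int numIndices) {}

    // Where a mesh starts in the shared buffers (element counts, not bytes)
    record MeshOffset(int baseVertex, int firstIndex) {}

    static List<MeshOffset> computeOffsets(List<MeshSize> meshes) {
        List<MeshOffset> offsets = new ArrayList<>();
        int baseVertex = 0;
        int firstIndex = 0;
        for (MeshSize mesh : meshes) {
            // A mesh starts where the previous ones ended
            offsets.add(new MeshOffset(baseVertex, firstIndex));
            baseVertex += mesh.numVertices();
            firstIndex += mesh.numIndices();
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Example values: a cube (24 vertices, 36 indices) followed by a quad (4 vertices, 6 indices)
        List<MeshOffset> offsets = computeOffsets(List.of(new MeshSize(24, 36), new MeshSize(4, 6)));
        offsets.forEach(System.out::println);
    }
}
```

Because baseVertex is added to every index of a mesh, each mesh can keep its indices starting at zero even though its vertices live somewhere in the middle of the shared buffer.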
One that will hold model matrices data.
One that will hold material data (albedo color, etc.).
For textures we will use an array of textures, which should not be confused with an array texture. An array texture is a single texture that contains multiple images of the same size. An array of textures is a list of samplers which map to regular textures, so they can have different sizes. Arrays of textures have a limitation: their length cannot be arbitrary. They have a limit, which in the examples we will set to 16 textures (although you may want to check the capabilities of your GPU prior to setting that limit). 16 textures is not a high value if you are using multiple models. To circumvent this limitation you have two options:
Use a texture atlas (a giant texture file which combines individual textures). Even if you are not using indirect drawing you should try to use texture atlases as much as possible, since they limit the number of binding calls.
Use bindless textures. This approach basically allows us to pass handles (64-bit integer values) that identify a texture and to use those handles to get a sampler within the shader program. This should definitely be the way to go with indirect rendering if you can use it (this is not a core feature but an extension starting with version 4.4). We will not use this approach because RenderDoc does not currently support it (losing the ability to debug with RenderDoc is a showstopper for me).
The following picture depicts the buffers and structures involved in indirect drawing (keep in mind that this is only valid while rendering static models. We will see the new structures that we need to use when rendering animated models in the next chapter).
Please keep in mind that we will use arrays of uniforms for per-entity data, materials and model matrices (in the end an array is a buffer, but we will be able to access the data in a handy way by using uniforms).
Implementation
In order to use indirect drawing we will need to use at least OpenGL version 4.6. Therefore, the first step is to update the major and minor versions we use as window hints for window creation:
The next step is to modify the code to load all the meshes into a single buffer but, prior to that, we will modify the class hierarchy that stores models, materials and meshes. Up to now, models have had a set of associated materials, each of which has a set of meshes. This class hierarchy was designed to optimize the draw calls, where we first iterated over models, then over materials and finally over meshes. We will change this structure so that meshes are no longer stored under materials. Instead, meshes will be stored directly under the models. We will store materials in a sort of cache, and meshes will hold a reference to a key in that cache. In addition to that, we previously created a Mesh instance for each of the model meshes, which in essence contained a VAO and the associated VBOs for the mesh data. Since we will be using a single buffer for all the meshes, we will just need a single VAO, and its associated VBOs, for the whole set of meshes of the scene. Therefore, instead of storing a list of Mesh instances under the Model class, we will store the data that will be used to construct the draw parameters, such as the offset in the vertices buffer, the offset in the indices buffer, etc. Let's examine the changes one by one.
We will start with the MaterialCache class, which is defined like this:
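As a rough illustration of the idea (this is a sketch, not the chapter's listing), a list-backed cache in which the position in the list acts as the key could look like the following. MaterialStub is a hypothetical stand-in for the Material class, and the method names are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

final class MaterialCacheSketch {
    // Hypothetical stand-in for the Material class
    record MaterialStub(String name) {}

    private final List<MaterialStub> materials = new ArrayList<>();

    // Stores the material and returns its index, which is the key used to find it later
    int add(MaterialStub material) {
        materials.add(material);
        return materials.size() - 1;
    }

    MaterialStub get(int idx) {
        return materials.get(idx);
    }

    public static void main(String[] args) {
        MaterialCacheSketch cache = new MaterialCacheSketch();
        int idx = cache.add(new MaterialStub("cube"));
        System.out.println(idx + " -> " + cache.get(idx).name());
    }
}
```

An index is convenient here because it can be written directly into an integer uniform and used to address an array in the shader.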
As you can see, we just store the Material instances in a List. Therefore, in order to identify a Material, we just need the index of that instance in the list. (This approach may make it more difficult to add new materials dynamically, but it is simple enough for the purpose of this sample. You may want to change that and provide robust support for adding new models, materials, etc. in your code.) We will need to modify the Material class to remove the list of Mesh instances and to store the index of the material in the materials cache:
As explained before, we need to change the Model class to remove the references to materials. Instead, we will hold two main references:
A list of MeshData instances (a new class), which will hold the mesh data read using Assimp.
A list of RenderBuffers.MeshDrawData instances (also a new class), which will contain the information needed for indirect drawing (mainly offset information associated to the data buffers explained above).
We will first populate the list of MeshData instances when loading the models with Assimp, and after that we will construct the global buffers that will hold the data, populating the RenderBuffers.MeshDrawData instances. After that, we can remove the references to the MeshData instances. This is not a very elegant solution, but it is simple enough to explain the concepts without introducing more complexity using pre and post loading hierarchies. The changes in the Model class are as follows:
Changes in the ModelLoader class are also quite simple: we need to use the materials cache and store the data read in the new MeshData class (instead of the previous Mesh class). Also, materials will not have references to mesh data, but mesh data will have a reference to the index of the material in the cache:
The Scene class will be the one that holds the materials cache (also, the cleanup method is no longer needed, because the VAOs and VBOs will no longer be linked to the model map):
It is now the turn of one of the new key classes that we will create for indirect drawing, the RenderBuffers class. This class will create a single VAO which will hold the VBOs that contain the data for all the meshes. In this case, we will just be supporting static models, so we will need a single VAO. The RenderBuffers class starts like this:
We start by creating a VAO (which will be used for static models), and then iterate over the meshes of the models. We will use a single buffer to hold all the data, so we just iterate over those elements to get the final buffer size. We will calculate the number of position elements, normals, etc. We also use that first loop to populate the offset information that we will store in a list of RenderBuffers.MeshDrawData instances. After that we will create a single VBO. You will find a major difference with the code in the Mesh class that did a similar task, creating the VAO and VBOs. In this case, we use a single VBO for positions, normals, etc. We just load all that data row by row instead of using separate VBOs. This is done in the populateMeshBuffer method (which we will see after this). After that, we create the index VBO which will contain the indices for all the meshes of all the models.
Each RenderBuffers.MeshDrawData instance basically stores the size of the mesh in bytes (sizeInBytes), the index of the material with which it is associated, the offset in the buffer that holds the vertex information and vertices, the number of indices for this mesh. The offset is measured in "rows". You can think of the portion of the buffer that holds the positions, normals and texture coordinates of a vertex as a single "row". This "row" holds all the information associated to a single vertex and will be processed in the vertex shader. This is why we just divide the number of position elements by three: each "row" has three position elements, and the number of "rows" in the positions data matches the number of "rows" in the normals data and so on.
As you can see, we just iterate over the "rows" of data and pack positions, normals and texture coordinates into the buffer. The defineVertexAttribs method is defined like this:
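The "row" layout can be sketched with plain arrays. This example only covers the three attributes named in the text (positions, normals and texture coordinates); the class and method names are hypothetical, and a real buffer may carry additional per-vertex attributes.

```java
final class RowPacking {
    // Floats per row: 3 for the position, 3 for the normal, 2 for the texture coordinates
    static final int ROW_SIZE = 3 + 3 + 2;

    static float[] interleave(float[] positions, float[] normals, float[] textCoords) {
        // Each row has three position elements, so this is the number of vertices
        int rows = positions.length / 3;
        float[] packed = new float[rows * ROW_SIZE];
        int dst = 0;
        for (int row = 0; row < rows; row++) {
            int start3 = row * 3; // start of this row in the three-component arrays
            int start2 = row * 2; // start of this row in the two-component array
            packed[dst++] = positions[start3];
            packed[dst++] = positions[start3 + 1];
            packed[dst++] = positions[start3 + 2];
            packed[dst++] = normals[start3];
            packed[dst++] = normals[start3 + 1];
            packed[dst++] = normals[start3 + 2];
            packed[dst++] = textCoords[start2];
            packed[dst++] = textCoords[start2 + 1];
        }
        return packed;
    }

    public static void main(String[] args) {
        float[] packed = interleave(
                new float[]{1, 2, 3, 4, 5, 6},
                new float[]{0, 0, 1, 0, 1, 0},
                new float[]{0.5f, 0.25f, 1f, 0f});
        System.out.println("Rows: " + packed.length / ROW_SIZE); // Rows: 2
    }
}
```

With this layout, a mesh whose first row sits at position n in the packed data has an offset of n rows, which is the unit used by baseVertex.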
In the scene vertex shader, the first thing that you will notice is that we have increased the version to 460. We have also removed the constants associated with animations (MAX_WEIGHTS and MAX_BONES), the attributes for bone indices and the uniform for bone matrices. You will see in the next chapter that we will not need this information here for animations. We have created two new constants to define the size of the drawElements and modelMatrices uniforms. The drawElements uniform will hold DrawElement instances. It will have one item per mesh and associated entity. If you remember, we will record a single instruction to draw all the items associated to a mesh, setting the number of instances to be drawn. We will, however, need specific per-entity data, such as the model matrix. This will be held in the drawElements array, which will also point to the material index to be used. The modelMatrices array will just hold the model matrices for each of the entities. Material information will be used in the fragment shader, so we pass it using the outMaterialIdx output variable.
The main function, since we do not have to deal with animations, has been simplified a lot:
The key here is to get the proper index to access the drawElements array. We use the gl_BaseInstance and gl_InstanceID built-in variables. When recording the instructions for indirect drawing we will use the baseInstance attribute. The value of that attribute is the one exposed by the gl_BaseInstance built-in variable. The gl_InstanceID variable starts at 0 whenever we change from one mesh to another, and is increased for each of the instances of the entities associated to the model. Therefore, by combining these two variables we will be able to access the per-entity information in the drawElements array. Once we have the proper index, we just transform position and normal information as in previous versions of the shader.
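To make the index arithmetic concrete, the following sketch emulates on the CPU which drawElements positions are visited when a list of commands is executed. It only models the sum of the base instance and the instance id; the Command record and the class name are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class InstanceIndexing {
    // Only the two command fields that matter for the lookup
    record Command(int instanceCount, int baseInstance) {}

    // Returns the drawElements indices visited, in order, by the given commands
    static List<Integer> visitedIndices(List<Command> commands) {
        List<Integer> result = new ArrayList<>();
        for (Command cmd : commands) {
            // The instance id restarts at 0 for every command
            for (int instanceId = 0; instanceId < cmd.instanceCount(); instanceId++) {
                result.add(cmd.baseInstance() + instanceId);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // First mesh: two instances starting at 0. Second mesh: three instances starting at 2
        List<Command> commands = List.of(new Command(2, 0), new Command(3, 2));
        System.out.println(visitedIndices(commands)); // [0, 1, 2, 3, 4]
    }
}
```

If each command's baseInstance is the sum of the instance counts of the commands before it, every instance of every mesh lands on its own slot of the array.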
The scene fragment shader (scene.frag) is defined like this:
The main changes are related to the way we access material information and textures. We now have an array of material information, which is accessed using the index we calculated in the vertex shader, now available in the outMaterialIdx input variable (which has the flat modifier, stating that this value should not be interpolated from the vertex to the fragment stage). We will be using an array of textures to access either regular textures or normal maps. The indices of those textures are now stored in the Material struct. Since we will be accessing the array of samplers using non-constant expressions, we need to upgrade the GLSL version to 400 (that feature is only available since OpenGL 4.0).
Now it is time to examine the changes in the SceneRender class. We will start by defining a set of constants that will be used in the code, one handle for the buffer that will hold the indirect drawing instructions (staticRenderBufferHandle) and the number of drawing commands (staticDrawCount). We will also need to modify the createUniforms method according to the changes in the shaders shown before:
public class SceneRender {
    ...
    public static final int MAX_DRAW_ELEMENTS = 100;
    public static final int MAX_ENTITIES = 50;
    // Five attributes per draw command, 4 bytes each
    private static final int COMMAND_SIZE = 5 * 4;
    private static final int MAX_MATERIALS = 20;
    private static final int MAX_TEXTURES = 16;
    ...
    private Map<String, Integer> entitiesIdxMap;
    ...
    private int staticDrawCount;
    private int staticRenderBufferHandle;
    ...

    public SceneRender() {
        ...
        entitiesIdxMap = new HashMap<>();
    }

    private void createUniforms() {
        uniformsMap = new UniformsMap(shaderProgram.getProgramId());
        uniformsMap.createUniform("projectionMatrix");
        uniformsMap.createUniform("viewMatrix");

        // One sampler uniform per slot of the array of textures
        for (int i = 0; i < MAX_TEXTURES; i++) {
            uniformsMap.createUniform("txtSampler[" + i + "]");
        }

        for (int i = 0; i < MAX_MATERIALS; i++) {
            String name = "materials[" + i + "]";
            uniformsMap.createUniform(name + ".diffuse");
            uniformsMap.createUniform(name + ".specular");
            uniformsMap.createUniform(name + ".reflectance");
            uniformsMap.createUniform(name + ".normalMapIdx");
            uniformsMap.createUniform(name + ".textureIdx");
        }

        for (int i = 0; i < MAX_DRAW_ELEMENTS; i++) {
            String name = "drawElements[" + i + "]";
            uniformsMap.createUniform(name + ".modelMatrixIdx");
            uniformsMap.createUniform(name + ".materialIdx");
        }

        for (int i = 0; i < MAX_ENTITIES; i++) {
            uniformsMap.createUniform("modelMatrices[" + i + "]");
        }
    }
    ...
}
The entitiesIdxMap stores, for each entity, the position at which it is located among the entities associated to the models. We store that information in a Map using the entity identifier as the key. We will need this information later on, since the indirect drawing commands will be recorded by iterating over the meshes associated to each model. The main changes are in the render method, which is defined like this:
You can see that we now have to bind the array of texture samplers and activate all the texture units. In addition to that, we iterate over the entities and set up the uniform values for the model matrices. The next step is to set up the drawElements array uniform with the proper values for each of the entities, which point to the index of the model matrix and to the material index. After that, we call the glMultiDrawElementsIndirect function to perform the indirect drawing. Prior to that, we need to bind the buffer that holds the drawing instructions (drawing commands) and the VAO that holds the mesh and index data. But when do we populate the buffer for indirect drawing? The answer is that this does not need to be done in each render call. If there are no changes in the number of entities, you can record that buffer once and use it in every render call. In this specific example, we will just populate that buffer at start-up. This means that, if you want to change the number of entities, you need to re-create that buffer (you should do that for your own engine).
The method that actually builds the indirect draw buffer is called setupStaticCommandBuffer, which is defined like this:
We first calculate the total number of meshes. After that, we create the buffer that will hold the indirect drawing instructions and populate it. As you can see, we first allocate a ByteBuffer. This buffer will hold as many instruction sets as meshes. Each set of draw instructions is composed of five attributes, each of them with a length of 4 bytes (the total length of each set of parameters is what the COMMAND_SIZE constant defines). We cannot allocate this buffer using MemoryStack since we would run out of space quickly (the stack that LWJGL uses for this is limited in size). Therefore, we need to allocate it using MemoryUtil and remember to manually de-allocate it once we are done. Once we have the buffer, we start iterating over the meshes associated to each model. You may have a look at the beginning of this chapter to check the struct that indirect drawing requires. In addition to that, we also populate the drawElements uniform using the Map we calculated previously, to properly get the model matrix index for each entity. Finally, we just create a GPU buffer and dump the data into it.
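The population step can be sketched without any OpenGL calls. This example uses ByteBuffer.allocateDirect as a stand-in for the MemoryUtil allocation so that it runs on its own, and the MeshEntry record is a hypothetical summary of the per-mesh values. It assumes one command per mesh and that baseInstance advances by the number of entities drawn by each command.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.List;

final class StaticCommandBufferSketch {
    static final int COMMAND_SIZE = 5 * 4;

    // Hypothetical per-mesh values: index count, offsets in the shared buffers, entities using the mesh
    record MeshEntry(int numIndices, int firstIndex, int baseVertex, int numEntities) {}

    static ByteBuffer build(List<MeshEntry> meshes) {
        // One command per mesh; a direct buffer stands in for a MemoryUtil allocation
        ByteBuffer buffer = ByteBuffer.allocateDirect(meshes.size() * COMMAND_SIZE)
                .order(ByteOrder.nativeOrder());
        int baseInstance = 0;
        for (MeshEntry mesh : meshes) {
            buffer.putInt(mesh.numIndices());  // count
            buffer.putInt(mesh.numEntities()); // instanceCount
            buffer.putInt(mesh.firstIndex());
            buffer.putInt(mesh.baseVertex());
            buffer.putInt(baseInstance);
            // The next mesh starts after all the instances recorded so far
            baseInstance += mesh.numEntities();
        }
        buffer.flip();
        return buffer;
    }

    public static void main(String[] args) {
        ByteBuffer commands = build(List.of(
                new MeshEntry(36, 0, 0, 2),
                new MeshEntry(6, 36, 24, 3)));
        System.out.println("Buffer size in bytes: " + commands.remaining()); // 40
    }
}
```

In the real renderer the resulting bytes would then be uploaded to a GPU buffer, and an off-heap allocation made with MemoryUtil would have to be freed explicitly afterwards.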
We will need to update the cleanup method to free the indirect drawing buffer:
We will need a new method to set up the values for the materials uniform:
public class SceneRender {
    ...
    private void setupMaterialsUniform(TextureCache textureCache, MaterialCache materialCache) {
        List<Texture> textures = textureCache.getAll().stream().toList();
        int numTextures = textures.size();
        if (numTextures > MAX_TEXTURES) {
            Logger.warn("Only " + MAX_TEXTURES + " textures can be used");
        }
        // Map each texture path to its position in the array of samplers
        Map<String, Integer> texturePosMap = new HashMap<>();
        for (int i = 0; i < Math.min(MAX_TEXTURES, numTextures); i++) {
            texturePosMap.put(textures.get(i).getTexturePath(), i);
        }

        shaderProgram.bind();
        List<Material> materialList = materialCache.getMaterialsList();
        int numMaterials = materialList.size();
        for (int i = 0; i < numMaterials; i++) {
            Material material = materialCache.getMaterial(i);
            String name = "materials[" + i + "]";
            uniformsMap.setUniform(name + ".diffuse", material.getDiffuseColor());
            uniformsMap.setUniform(name + ".specular", material.getSpecularColor());
            uniformsMap.setUniform(name + ".reflectance", material.getReflectance());
            String normalMapPath = material.getNormalMapPath();
            int idx = 0;
            if (normalMapPath != null) {
                // Paths that are not in the map fall back to index 0
                idx = texturePosMap.computeIfAbsent(normalMapPath, k -> 0);
            }
            uniformsMap.setUniform(name + ".normalMapIdx", idx);
            Texture texture = textureCache.getTexture(material.getTexturePath());
            idx = texturePosMap.computeIfAbsent(texture.getTexturePath(), k -> 0);
            uniformsMap.setUniform(name + ".textureIdx", idx);
        }
        shaderProgram.unbind();
    }
    ...
}
We just check that we are not surpassing the maximum number of supported textures (MAX_TEXTURES) and just create an array of materials information with the information we used in the previous chapters. The only change is that we will need to store the index of the associated texture and normal maps in the material information.
We need another method to update the entities indices map:
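A minimal sketch of such a map is shown below. It assumes that the index is a running counter across the entities of all models, which is consistent with having a single modelMatrices array, and it represents each model simply as the list of identifiers of its entities. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class EntityIndexing {
    // entityIdsPerModel: for each model, the identifiers of the entities that use it
    static Map<String, Integer> buildIndexMap(List<List<String>> entityIdsPerModel) {
        Map<String, Integer> entitiesIdxMap = new HashMap<>();
        int entityIdx = 0;
        for (List<String> entityIds : entityIdsPerModel) {
            for (String id : entityIds) {
                // The counter keeps growing across models, so every entity gets a unique slot
                entitiesIdxMap.put(id, entityIdx);
                entityIdx++;
            }
        }
        return entitiesIdxMap;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = buildIndexMap(List.of(
                List.of("cube-1", "cube-2"),
                List.of("quad-1")));
        System.out.println("quad-1 is at index " + map.get("quad-1")); // 2
    }
}
```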
We will also change the shadow render process to use indirect drawing. The changes in the vertex shader (shadow.vert) are quite similar: we will not be using animation information, and we need to access the proper model matrices using the combination of the gl_BaseInstance and gl_InstanceID built-in variables. In this case, we do not need material information, so the fragment shader (shadow.frag) is not changed.
Changes in ShadowRender are also quite similar to the ones in the SceneRender class:
public class ShadowRender {

    private static final int COMMAND_SIZE = 5 * 4;
    ...
    private Map<String, Integer> entitiesIdxMap;
    ...
    private int staticRenderBufferHandle;
    ...

    public ShadowRender() {
        ...
        entitiesIdxMap = new HashMap<>();
    }

    public void cleanup() {
        shaderProgram.cleanup();
        shadowBuffer.cleanup();
        // Free the buffer that holds the indirect drawing commands
        glDeleteBuffers(staticRenderBufferHandle);
    }

    private void createUniforms() {
        ...
        for (int i = 0; i < SceneRender.MAX_DRAW_ELEMENTS; i++) {
            String name = "drawElements[" + i + "]";
            uniformsMap.createUniform(name + ".modelMatrixIdx");
        }

        for (int i = 0; i < SceneRender.MAX_ENTITIES; i++) {
            uniformsMap.createUniform("modelMatrices[" + i + "]");
        }
    }
    ...
}
The createUniforms method needs to be updated to use the new uniforms, and the cleanup one needs to free the indirect draw buffer. The render method will now use glMultiDrawElementsIndirect instead of submitting individual draw commands for meshes and entities:
In the Render class, we just need to instantiate the RenderBuffers class and provide a new method setupData which can be called when every model and entity has been created to create the indirect drawing buffers and associated data.
Since we have modified the class hierarchy that deals with models and materials, we need to update the SkyBox class (loading individual models now requires additional steps):
These changes also affect the SkyBoxRender class. For the sky box render we will not use indirect drawing (it is not worth it since we will be rendering just one mesh):
Finally, in the Main class, we will load two entities associated to a cube model. We will rotate them independently to check that the code works correctly. The most important part is to call the setupData method of the Render class when everything is loaded.