Until now we have only loaded static 3D models, but in this chapter we will learn how to animate them. When thinking about animations, the first approach that comes to mind is to create a different mesh for each model position, load them up into the GPU and draw them sequentially to create the illusion of movement. Although this approach is perfect for some games, it's not very efficient in terms of memory consumption. This is where skeletal animation comes into play. We will learn how to load these models using assimp.
You can find the complete source code for this chapter here.
Anti-aliasing support
In this chapter we will also add support for anti-aliasing. Up to this moment you may have seen saw-like edges in the models. In order to remove those effects, we will apply anti-aliasing, which basically uses the values of several samples to construct the final value for each pixel. In our case, we will use four sampled values. We need to set this up as a window hint prior to window creation (and add a new window option to control that):
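A minimal sketch of how this could look in the Window class (the constructor shape and the antiAliasing option name are assumptions; glfwWindowHint and GLFW_SAMPLES are the relevant GLFW calls):

```java
public class Window {
    ...
    public Window(String title, WindowOptions opts, Callable<Void> resizeFunc) {
        ...
        if (opts.antiAliasing) {
            // Request a multisampled default framebuffer: 4 samples per pixel.
            // Must be set before the window (and its OpenGL context) is created.
            glfwWindowHint(GLFW_SAMPLES, 4);
        }
        ... // glfwCreateWindow(...) and the rest of the window setup
    }

    public static class WindowOptions {
        ...
        public boolean antiAliasing; // assumed name for the new option
        ...
    }
}
```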
In skeletal animation the way a model animates is defined by its underlying skeleton. A skeleton is defined by a hierarchy of special elements called bones. These bones are defined by their position and rotation. We have also said that it's a hierarchy, which means that the final position of each bone is affected by the position of its parents. For instance, think of a wrist: the position of a wrist is modified if a character moves the elbow and also if they move the shoulder.
Bones do not need to represent a physical bone or articulation: they are artifacts that allow artists to model an animation. In addition to bones we still have vertices, the points that define the triangles that compose a 3D model. But in skeletal animation, vertices are drawn based on the position of the bones they relate to.
In this chapter I’ve consulted many different sources, but I have found two that provide a very good explanation about how to create an animated model. These sources can be consulted at:
If you load a model which contains animations with the current code, you will get what is called the binding pose. You can try that (with the code from the previous chapter) and you will be able to see the 3D model perfectly. The binding pose defines the positions, normals and texture coordinates of the model without being affected by animation at all. An animated model defines, in essence, the following additional information:
A tree-like structure, composed of bones, which defines a hierarchy where we can compose transformations.
Each mesh, besides containing information about vertex positions, normals, etc., will include information about which bones each vertex relates to (by using a bone index) and how much it is affected by them (that is, modulating the effect by using a weight factor).
A set of animation key frames which define the specific transformations that should be applied to each bone and, by extension, will modify the associated vertices. A model can define several animations and each of them may be composed of several animation key frames. When animating, we iterate over those key frames (which define a duration) and we can even interpolate between them. In essence, for a specific instant of time we apply to each vertex the transformations associated to the related bones.
Let’s review first the structures handled by assimp that contain animation information. We will start with the bones and weights information. For each AIMesh, we can access the vertex positions, texture coordinates and indices. Meshes also store a list of bones. Each bone is defined by the following attributes:
A name.
An offset matrix: this will be used later to compute the final transformations that should be used by each bone.
Bones also point to a list of weights. Each weight is defined by the following attributes:
A weight factor, that is, the number that will be used to modulate the influence of the bone’s transformation over each vertex.
A vertex identifier, that is, the vertex associated to the current bone.
The following picture shows the relationships between all these elements.
Therefore, each vertex, besides containing position, normals and texture coordinates, will now have a set of indices (typically four values) of the bones that affect it (jointIndices) and a set of weights that will modulate that effect. Each vertex will be modified according to the transformation matrices associated to each joint in order to calculate its final position. Therefore, we will need to augment the VAO associated to each mesh to hold that information, as shown in the next figure.
The assimp scene object defines a Node hierarchy. Each Node is defined by a name and a list of children nodes. Animations use these nodes to define the transformations that should be applied to them. This hierarchy is in fact the bones’ hierarchy. Every bone is a node and has a parent (except the root node) and possibly a set of children. There are special nodes that are not bones; they are used to group transformations and should be handled when calculating the final transformations. Another issue is that this node hierarchy is defined for the whole model; we do not have separate hierarchies for each mesh.
A scene also defines a set of animations. A single model can have more than one animation, to model how a character walks, runs, etc. Each of these animations defines different transformations. An animation has the following attributes:
A name.
A duration, that is, the time span of the animation. The attribute name may seem confusing, since an animation is the list of transformations that should be applied to each node for each different frame.
A list of animation channels. An animation channel contains, for a specific instant in time, the translation, rotation and scaling information that should be applied to each node. The class that models the data contained in the animation channels is AINodeAnim. Animation channels can be thought of as the key frames.
The following figure shows the relationships between all the elements described above.
For a specific instant of time, for a frame, the transformation to be applied to a bone is the transformation defined in the animation channel for that instant, multiplied by the transformations of all the parent nodes up to the root node. Hence, we need to extract the information stored in the scene. The process is as follows:
Construct the node hierarchy.
For each animation, iterate over each animation channel (for each animation node) and construct the transformation matrices for each of the bones for all the potential animation frames. Those transformation matrices are a combination of the transformation matrix of the node associated to the bone and the bone transformation matrices.
We start at the root node, and for each frame build the transformation matrix for that node, which is the transformation matrix of the node multiplied by the composition of the translation, rotation and scale matrices of that specific frame for that node.
We then get the bones associated to that node and complement that transformation by multiplying the offset matrices of the bones. The result will be a transformation matrix associated to the related bones for that specific frame, which will be used in the shaders.
After that, we iterate over the children nodes, passing the transformation matrix of the parent node to be used also in combination with the children node transformations.
Implementation
Let's start by analyzing the changes in the ModelLoader class:
We need an extra argument (named animation) in the loadModel method to indicate if we are loading a model with animations or not. If so, we cannot use the aiProcess_PreTransformVertices flag. This flag performs some transformations over the loaded data so the model is placed at the origin and the coordinates are corrected to match the OpenGL coordinate system. We cannot use this flag for animated models because it removes the animation information.
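A possible sketch of the flag selection (the exact set of post-processing flags besides aiProcess_PreTransformVertices is an assumption):

```java
public class ModelLoader {
    ...
    public static Model loadModel(String modelId, String modelPath, TextureCache textureCache, boolean animation) {
        return loadModel(modelId, modelPath, textureCache,
                aiProcess_GenSmoothNormals | aiProcess_JoinIdenticalVertices | aiProcess_Triangulate |
                aiProcess_FixInfacingNormals | aiProcess_CalcTangentSpace | aiProcess_LimitBoneWeights |
                // Pre-transforming vertices would bake the binding pose and drop the animation data,
                // so it is only applied to static models
                (animation ? 0 : aiProcess_PreTransformVertices));
    }
    ...
}
```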
While processing the meshes we will also process the associated bones and weights for each vertex. While we are processing them, we will store the list of bones so we can later on build the required transformations:
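A minimal sketch of such a processBones method (the toMatrix helper, which converts an AIMatrix4x4 to a JOML Matrix4f, and the MAX_WEIGHTS constant, with value 4, are assumptions):

```java
public class ModelLoader {
    ...
    private static void processBones(AIMesh aiMesh, List<Bone> boneList, List<Integer> boneIds, List<Float> weights) {
        Map<Integer, List<VertexWeight>> weightMap = new HashMap<>();
        int numBones = aiMesh.mNumBones();
        PointerBuffer aiBones = aiMesh.mBones();
        for (int i = 0; i < numBones; i++) {
            AIBone aiBone = AIBone.create(aiBones.get(i));
            // Bone ids are global to the model: they match the bone position in boneList
            int id = boneList.size();
            Bone bone = new Bone(id, aiBone.mName().dataString(), toMatrix(aiBone.mOffsetMatrix()));
            boneList.add(bone);
            AIVertexWeight.Buffer aiWeights = aiBone.mWeights();
            for (int j = 0; j < aiBone.mNumWeights(); j++) {
                AIVertexWeight aiWeight = aiWeights.get(j);
                VertexWeight vw = new VertexWeight(bone.boneId(), aiWeight.mVertexId(), aiWeight.mWeight());
                weightMap.computeIfAbsent(vw.vertexId(), v -> new ArrayList<>()).add(vw);
            }
        }
        // For each vertex, store up to MAX_WEIGHTS bone ids and weights, padding with zeros
        int numVertices = aiMesh.mNumVertices();
        for (int i = 0; i < numVertices; i++) {
            List<VertexWeight> vertexWeights = weightMap.getOrDefault(i, Collections.emptyList());
            for (int j = 0; j < MAX_WEIGHTS; j++) {
                if (j < vertexWeights.size()) {
                    VertexWeight vw = vertexWeights.get(j);
                    weights.add(vw.weight());
                    boneIds.add(vw.boneId());
                } else {
                    weights.add(0.0f);
                    boneIds.add(0);
                }
            }
        }
    }
    ...
}
```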
This method traverses the bone definitions for a specific mesh, getting their weights and filling up three lists:
boneList: It contains a list of bones, with their offset matrices. It will be used later on to calculate the final bone transformations. A new class named Bone has been created to hold that information. This list will contain the bones for all the meshes.
boneIds: It contains just the identifiers of the bones for each vertex of the Mesh. Bones are identified by their position when rendering. This list only contains the bones for a specific Mesh.
weights: It contains the weights for each vertex of the Mesh to be applied for the associated bones.
The information retrieved in this method is encapsulated in the AnimMeshData record (defined inside the ModelLoader class). The new Bone and VertexWeight classes are also records. They are defined like this:
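A sketch of how these records could look (field names are assumptions based on the description above):

```java
public record Bone(int boneId, String boneName, Matrix4f offsetMatrix) {
}

public record VertexWeight(int boneId, int vertexId, float weight) {
}

public class ModelLoader {
    ...
    private record AnimMeshData(float[] weights, int[] boneIds) {
    }
    ...
}
```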
Going back to the loadModel method, when we have processed the meshes and the materials, we will process the animation data (that is, the different animation key frames associated to each animation and their transformations). All that information is also stored in the Model class:
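A sketch of what this processAnimations method could look like (the Node helper class, which mirrors the assimp node hierarchy, and the MAX_BONES constant are assumptions):

```java
public class ModelLoader {
    ...
    private static List<Model.Animation> processAnimations(AIScene aiScene, List<Bone> boneList,
                                                           Node rootNode, Matrix4f globalInverseTransformation) {
        List<Model.Animation> animations = new ArrayList<>();
        int numAnimations = aiScene.mNumAnimations();
        PointerBuffer aiAnimations = aiScene.mAnimations();
        for (int i = 0; i < numAnimations; i++) {
            AIAnimation aiAnimation = AIAnimation.create(aiAnimations.get(i));
            int maxFrames = calcAnimationMaxFrames(aiAnimation);

            List<Model.AnimatedFrame> frames = new ArrayList<>();
            for (int j = 0; j < maxFrames; j++) {
                // One transformation matrix per bone; bones not affected by this node keep the identity
                Matrix4f[] boneMatrices = new Matrix4f[MAX_BONES];
                Arrays.fill(boneMatrices, new Matrix4f());
                Model.AnimatedFrame animatedFrame = new Model.AnimatedFrame(boneMatrices);
                buildFrameMatrices(aiAnimation, boneList, animatedFrame, j, rootNode,
                        rootNode.getNodeTransformation(), globalInverseTransformation);
                frames.add(animatedFrame);
            }
            animations.add(new Model.Animation(aiAnimation.mName().dataString(), aiAnimation.mDuration(), frames));
        }
        return animations;
    }
    ...
}
```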
This method returns a List of Model.Animation instances. Remember that a model can have more than one animation, so they are stored by their index. For each of these animations we construct a list of animation frames (Model.AnimatedFrame instances), which are essentially a list of the transformation matrices to be applied to each of the bones that compose the model. For each of the animations, we calculate the maximum number of frames by calling the method calcAnimationMaxFrames, which is defined like this:
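A minimal sketch of calcAnimationMaxFrames: it just takes the largest number of keys over all the animation channels:

```java
public class ModelLoader {
    ...
    private static int calcAnimationMaxFrames(AIAnimation aiAnimation) {
        int maxFrames = 0;
        int numNodeAnims = aiAnimation.mNumChannels();
        PointerBuffer aiChannels = aiAnimation.mChannels();
        for (int i = 0; i < numNodeAnims; i++) {
            AINodeAnim aiNodeAnim = AINodeAnim.create(aiChannels.get(i));
            // A channel may define a different number of position, rotation and scaling keys
            int numFrames = Math.max(Math.max(aiNodeAnim.mNumPositionKeys(), aiNodeAnim.mNumScalingKeys()),
                    aiNodeAnim.mNumRotationKeys());
            maxFrames = Math.max(maxFrames, numFrames);
        }
        return maxFrames;
    }
    ...
}
```

The Model class would store the animations through records along these lines (names are assumptions):

```java
public class Model {
    ...
    public record Animation(String name, double duration, List<AnimatedFrame> frames) {
    }

    public record AnimatedFrame(Matrix4f[] boneMatrices) {
    }
    ...
}
```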
As you can see, we store the list of animations associated to the model, each of them defined by a name, a duration and a list of animation frames, which in essence just store the transformation matrices to be applied to each bone.
Back in the ModelLoader class, each AINodeAnim instance defines the transformations to be applied to a node in the model for a specific frame. These transformations are defined in the form of position translations, rotations and scaling values. The trick here is that, for a specific node, translation values can stop at a specific frame, while rotation and scaling values continue for the next frames. In this case, we will have fewer translation values than rotation or scaling ones. Therefore, a good approximation to calculate the maximum number of frames is to use the maximum value. The problem gets more complex, because this is defined per node. A node can define just some transformations for the first frames and not apply any more modifications for the rest. In this case, we should always use the last defined values. Therefore, we take the maximum number of keys over all the animation channels of the nodes.
Going back to the processAnimations method, with that information we are ready to iterate over the different frames and build the transformation matrices for the bones by calling the buildFrameMatrices method. For each frame we start with the root node, and apply the transformations recursively from the top to the bottom of the node hierarchy. The buildFrameMatrices method is defined like this:
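A sketch under the assumption that Node exposes getName, getChildren and getNodeTransformation accessors:

```java
public class ModelLoader {
    ...
    private static void buildFrameMatrices(AIAnimation aiAnimation, List<Bone> boneList, Model.AnimatedFrame animatedFrame,
                                           int frame, Node node, Matrix4f parentTransformation, Matrix4f globalInverseTransform) {
        String nodeName = node.getName();
        AINodeAnim aiNodeAnim = findAIAnimNode(aiAnimation, nodeName);
        Matrix4f nodeTransform = node.getNodeTransformation();
        if (aiNodeAnim != null) {
            // The node is animated: replace its static transformation with the one for this frame
            nodeTransform = buildNodeTransformationMatrix(aiNodeAnim, frame);
        }
        Matrix4f nodeGlobalTransform = new Matrix4f(parentTransformation).mul(nodeTransform);

        // Update the matrices of the bones associated to this node for this frame
        List<Bone> affectedBones = boneList.stream().filter(b -> b.boneName().equals(nodeName)).toList();
        for (Bone bone : affectedBones) {
            Matrix4f boneTransform = new Matrix4f(globalInverseTransform).mul(nodeGlobalTransform).mul(bone.offsetMatrix());
            animatedFrame.boneMatrices()[bone.boneId()] = boneTransform;
        }

        // Recurse over the children, passing the accumulated node transformation
        for (Node childNode : node.getChildren()) {
            buildFrameMatrices(aiAnimation, boneList, animatedFrame, frame, childNode,
                    nodeGlobalTransform, globalInverseTransform);
        }
    }
    ...
}
```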
We get the transformation associated to the node. Then we check if this node has an animation node associated to it. If so, we need to get the proper translation, rotation and scaling transformations that apply to the frame that we are handling. With that information, we get the bones associated to that node and update the transformation matrix for each of those bones, for that specific frame by multiplying:
The model inverse global transformation matrix (the inverse of the root node transformation matrix).
The transformation matrix for the node.
The bone offset matrix.
After that, we iterate over the children nodes, using the node transformation matrix as the parent matrix for those child nodes.
The AINodeAnim instance defines a set of keys that contain translation, rotation and scaling information. These keys refer to specific instants in time. We assume that the information is ordered in time, and construct a list of matrices that contain the transformation to be applied for each frame. As said before, some of those transformations may "stop" at a specific frame; in that case we should reuse the last defined values for the remaining frames.
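A sketch of such a method (called buildNodeTransformationMatrix in the buildFrameMatrices sketch above):

```java
public class ModelLoader {
    ...
    private static Matrix4f buildNodeTransformationMatrix(AINodeAnim aiNodeAnim, int frame) {
        AIVectorKey.Buffer positionKeys = aiNodeAnim.mPositionKeys();
        AIVectorKey.Buffer scalingKeys = aiNodeAnim.mScalingKeys();
        AIQuatKey.Buffer rotationKeys = aiNodeAnim.mRotationKeys();

        Matrix4f nodeTransform = new Matrix4f();
        int numPositions = aiNodeAnim.mNumPositionKeys();
        if (numPositions > 0) {
            // If this channel "stops" before the requested frame, reuse its last key
            AIVectorKey aiVecKey = positionKeys.get(Math.min(numPositions - 1, frame));
            AIVector3D vec = aiVecKey.mValue();
            nodeTransform.translate(vec.x(), vec.y(), vec.z());
        }
        int numRotations = aiNodeAnim.mNumRotationKeys();
        if (numRotations > 0) {
            AIQuatKey quatKey = rotationKeys.get(Math.min(numRotations - 1, frame));
            AIQuaternion aiQuat = quatKey.mValue();
            Quaternionf quat = new Quaternionf(aiQuat.x(), aiQuat.y(), aiQuat.z(), aiQuat.w());
            nodeTransform.rotate(quat);
        }
        int numScalingKeys = aiNodeAnim.mNumScalingKeys();
        if (numScalingKeys > 0) {
            AIVectorKey aiVecKey = scalingKeys.get(Math.min(numScalingKeys - 1, frame));
            AIVector3D vec = aiVecKey.mValue();
            nodeTransform.scale(vec.x(), vec.y(), vec.z());
        }
        return nodeTransform;
    }
    ...
}
```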
The findAIAnimNode method is defined like this:
```java
public class ModelLoader {
    ...
    private static AINodeAnim findAIAnimNode(AIAnimation aiAnimation, String nodeName) {
        AINodeAnim result = null;
        int numAnimNodes = aiAnimation.mNumChannels();
        PointerBuffer aiChannels = aiAnimation.mChannels();
        for (int i = 0; i < numAnimNodes; i++) {
            AINodeAnim aiNodeAnim = AINodeAnim.create(aiChannels.get(i));
            if (nodeName.equals(aiNodeAnim.mNodeName().dataString())) {
                result = aiNodeAnim;
                break;
            }
        }
        return result;
    }
    ...
}
```
The Mesh class needs to be updated to allocate the new VBOs for bone indices and bone weights. You will see that we use a maximum of four weights (and the associated bone indices) per vertex:
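A sketch of the new VBOs (the attribute locations 5 and 6, and the existing constructor parameters, are assumptions):

```java
public class Mesh {
    ...
    public Mesh(float[] positions, float[] normals, float[] tangents, float[] bitangents,
                float[] textCoords, int[] indices, int[] boneIndices, float[] weights) {
        ...
        // Bone weights VBO: four floats per vertex
        vboId = glGenBuffers();
        vboIdList.add(vboId);
        FloatBuffer weightsBuffer = MemoryUtil.memCallocFloat(weights.length);
        weightsBuffer.put(weights).flip();
        glBindBuffer(GL_ARRAY_BUFFER, vboId);
        glBufferData(GL_ARRAY_BUFFER, weightsBuffer, GL_STATIC_DRAW);
        glEnableVertexAttribArray(5);
        glVertexAttribPointer(5, 4, GL_FLOAT, false, 0, 0);
        MemoryUtil.memFree(weightsBuffer);

        // Bone indices VBO: four ints per vertex; an integer pointer matches the ivec4 attribute
        vboId = glGenBuffers();
        vboIdList.add(vboId);
        IntBuffer boneIndicesBuffer = MemoryUtil.memCallocInt(boneIndices.length);
        boneIndicesBuffer.put(boneIndices).flip();
        glBindBuffer(GL_ARRAY_BUFFER, vboId);
        glBufferData(GL_ARRAY_BUFFER, boneIndicesBuffer, GL_STATIC_DRAW);
        glEnableVertexAttribArray(6);
        glVertexAttribIPointer(6, 4, GL_INT, 0, 0);
        MemoryUtil.memFree(boneIndicesBuffer);
        ...
    }
    ...
}
```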
Now we can see how we render animated models and how they can coexist with static ones. Let's start with the SceneRender class. In this class we just need to set up a new uniform to pass the bone matrices (those assigned to the current animation frame) so they can be used in the shader. Besides that, rendering static and animated entities has no additional impact on this class.
For static models, we will pass an array of matrices set to null. We also need to modify the UniformsMap class to add a new method to set up the values of an array of matrices:
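A sketch of that method (getUniformLocation is an assumed helper that resolves a uniform name to its location; the uniform name used from SceneRender, for example "bonesMatrices", is also an assumption):

```java
public class UniformsMap {
    ...
    public void setUniform(String uniformName, Matrix4f[] matrices) {
        try (MemoryStack stack = MemoryStack.stackPush()) {
            // A null array (static models) uploads an empty buffer
            int length = matrices != null ? matrices.length : 0;
            FloatBuffer fb = stack.mallocFloat(16 * length);
            for (int i = 0; i < length; i++) {
                matrices[i].get(16 * i, fb);
            }
            glUniformMatrix4fv(getUniformLocation(uniformName), false, fb);
        }
    }
    ...
}
```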
We need to modify the scene vertex shader (scene.vert) to bring the animation data into play. We start by defining some constants and the new input attributes for bone weights and indices (we are using four elements per vertex, so we use vec4 and ivec4). We also pass the bone matrices associated to the current animation as a uniform.
In the main function we will iterate over the bone weights and modify the position and normals using the matrices designated by the associated bone indices, modulated by the associated weights. You can think of it as if each bone contributed to the position (and normal) modification, modulated by its weight. If we are using static models, the weights will be zero, so we will stick to the original position and normal values.
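Putting both changes together, the relevant parts of scene.vert could look like this (the uniform name bonesMatrices, the MAX_BONES value, the attribute locations and the names of the existing position and normal attributes are assumptions):

```glsl
const int MAX_WEIGHTS = 4;
const int MAX_BONES = 150;

layout (location=0) in vec3 position;   // existing attribute (assumed name)
layout (location=1) in vec3 normal;     // existing attribute (assumed name)
...
layout (location=5) in vec4 boneWeights;
layout (location=6) in ivec4 boneIndices;

uniform mat4 bonesMatrices[MAX_BONES];

void main()
{
    vec4 initPos = vec4(0, 0, 0, 0);
    vec4 initNormal = vec4(0, 0, 0, 0);
    int count = 0;
    for (int i = 0; i < MAX_WEIGHTS; i++) {
        float weight = boneWeights[i];
        if (weight > 0) {
            count++;
            int boneIndex = boneIndices[i];
            initPos += weight * (bonesMatrices[boneIndex] * vec4(position, 1.0));
            initNormal += weight * (bonesMatrices[boneIndex] * vec4(normal, 0.0));
        }
    }
    // Static models have all weights set to zero: keep the original values
    if (count == 0) {
        initPos = vec4(position, 1.0);
        initNormal = vec4(normal, 0.0);
    }
    ... // use initPos / initNormal instead of position / normal in the rest of main
}
```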