Texture Mapping and CUDA

06 Apr 2010

To get more realistic pictures I added texture mapping to my application MNRT. My motivation was the Sponza scene from Marko Dabrovic, which I encountered while preparing my photon mapping implementation. The scene can be downloaded here. A more complex version can be found on Crytek’s website. The following screenshot shows the result of my work:


Sponza scene (2x2 super sampling)

During the implementation I ran into the following problem: CUDA supports textures as read-only memory, but it doesn’t support arrays of texture references (see CUDA FAQ). Since the number of textures in a scene is not known in advance, this is a real limitation. According to the CUDA FAQ there are (at least) two solutions:

  1. Move the data (here: 2D textures) into the slices of a 3D texture.
  2. One variable per 2D texture; switch statements to select textures.

Both variants have drawbacks, which I explain below:

3D-Texture: At the beginning I found it pretty interesting to pack the 2D textures into the slices of a 3D texture, so I tried this alternative first. I created a cudaArray using the maximum width (height) of the 2D textures as the width (height). For the depth I used the number of 2D textures to store. This is illustrated in the next picture:


Packing 2D textures into the slices of a 3D texture

As you can see in the picture, this method only works well when the texture dimensions do not vary too much. The following list summarizes the disadvantages of this method:

  • You are forced to use the cudaFilterModePoint filter mode to avoid fetching from different textures.
  • Normalized texture coordinates are problematic because they refer to the extent of the whole 3D array, not to the size of the individual 2D texture.
  • The 2D texture dimension is limited to 2048×2048 (cudaArray limits for 3D textures).
  • Lots of memory is wasted when texture dimensions vary greatly.

However, a big advantage of this method is that the number of 2D textures is flexible: any number up to 2048 can be supported.
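The packing scheme and its memory cost can be sketched on the host side. The following is only a minimal illustration in plain C++, not the code used in MNRT: the types Image and PackedVolume and the functions packIntoSlices and wastedTexels are made-up names, and RGBA8 texels stored as 32-bit words are an assumption. The resulting buffer could then be uploaded into a 3D cudaArray (for example with cudaMemcpy3D), which is not shown here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side image: RGBA8 texels packed into 32-bit words.
struct Image {
    int width, height;
    std::vector<std::uint32_t> texels;  // row-major, width * height entries
};

// Result of the packing: one padded slice per image.
struct PackedVolume {
    int width, height, depth;           // extent of the 3D array
    std::vector<std::uint32_t> texels;  // width * height * depth entries
};

// Copies every image into the top-left corner of its own slice.
PackedVolume packIntoSlices(const std::vector<Image>& images)
{
    assert(!images.empty());
    PackedVolume vol{0, 0, static_cast<int>(images.size()), {}};

    // Slice size = maximum width and maximum height over all images.
    for (const Image& img : images) {
        vol.width = std::max(vol.width, img.width);
        vol.height = std::max(vol.height, img.height);
    }

    // Padding texels stay zero; they are the wasted memory.
    vol.texels.assign(static_cast<std::size_t>(vol.width) * vol.height * vol.depth, 0u);

    for (int z = 0; z < vol.depth; ++z) {
        const Image& img = images[z];
        for (int y = 0; y < img.height; ++y) {
            // Start of row y inside slice z of the padded volume.
            std::size_t dst = (static_cast<std::size_t>(z) * vol.height + y) * vol.width;
            std::copy(img.texels.begin() + y * img.width,
                      img.texels.begin() + (y + 1) * img.width,
                      vol.texels.begin() + dst);
        }
    }
    return vol;
}

// Number of padding texels that hold no image data.
std::size_t wastedTexels(const PackedVolume& vol, const std::vector<Image>& images)
{
    std::size_t used = 0;
    for (const Image& img : images)
        used += static_cast<std::size_t>(img.width) * img.height;
    return vol.texels.size() - used;
}
```

With one 2×2 and one 1×1 image, for example, the volume has 8 texels of which 3 are padding. The ratio gets much worse as soon as a single large texture is mixed with many small ones.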

One variable per texture: At first I wanted to avoid this alternative since it degrades code quality a lot. The main reason for my concerns was that the Sponza scene already has 14 textures.

But after I realized the problems of the 3D texture variant, I chose to implement this method. Currently I allow up to 20 textures. I used macros to avoid some redundant code. To fetch data from the textures I use a switch statement as proposed in the CUDA FAQ. This is shown in the following code fragment:

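// Texture references have to be named at compile time, so the texture
// index is mapped to the matching reference with a switch.
__device__ float4 dev_fetchDiffTex(int idxTex, float u, float v)
{
    switch(idxTex)
    {
        case 0: return tex2D(tex_diffTex0, u, v);
        case 1: return tex2D(tex_diffTex1, u, v);
        case 2: return tex2D(tex_diffTex2, u, v);
        case 3: return tex2D(tex_diffTex3, u, v);
        ...
    }
    // Fallback for an index without a texture.
    return make_float4(0.f, 0.f, 0.f, 0.f);
}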
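The macros themselves are not shown in this post. One way such macros could look is an X-macro list that generates both the per-texture variables and the switch cases. The sketch below is plain C++ so that it compiles without CUDA, and every name in it (DIFF_TEX_LIST, DECLARE_TEX, FETCH_CASE, FakeTex, fakeTex2D, fetchDiffTex) is made up for illustration. In the real device code, FakeTex would be a texture reference and fakeTex2D would be tex2D.

```cpp
// Stand-in for a texture reference; it just returns a constant value.
struct FakeTex { float value; };
inline float fakeTex2D(const FakeTex& t, float, float) { return t.value; }

// Single place that lists all texture indices.
#define DIFF_TEX_LIST(X) X(0) X(1) X(2) X(3)

// One global variable per texture: tex_diffTex0, tex_diffTex1, ...
// Each stand-in returns its own index so the dispatch is observable.
#define DECLARE_TEX(i) FakeTex tex_diffTex##i = { static_cast<float>(i) };
DIFF_TEX_LIST(DECLARE_TEX)
#undef DECLARE_TEX

float fetchDiffTex(int idxTex, float u, float v)
{
    switch (idxTex)
    {
// Expands to one "case i: return ...;" line per listed texture.
#define FETCH_CASE(i) case i: return fakeTex2D(tex_diffTex##i, u, v);
        DIFF_TEX_LIST(FETCH_CASE)
#undef FETCH_CASE
    }
    // Fallback for an index without a texture.
    return 0.f;
}
```

Supporting more textures then only requires extending DIFF_TEX_LIST, although the upper bound still has to be fixed at compile time.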

The main drawbacks of this alternative are:

  • High code redundancy, and therefore bad code quality.
  • The number of supported 2D textures is relatively small. It can be extended, but only at the cost of even more redundant code (see the first point).

Now to the advantages: Both normalized texture coordinates and the cudaFilterModeLinear filter mode work. Furthermore, differing texture sizes don’t have any impact on the required memory.

Concluding remarks: I’m not really happy with my current solution. It’s just too inflexible. An improvement might be the use of multiple 3D textures of different dimensions. One could move bigger textures into a larger 3D texture and smaller textures into a smaller 3D texture. This should reduce the memory requirements. However, the texture filtering would still have to be done manually.
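To give an idea of what manual filtering means, here is a minimal sketch of bilinear interpolation built from four point fetches. It is written as plain C++ on a CPU stand-in for one slice, not as the actual device code. The Slice type and the functions fetchPoint and fetchBilinear are hypothetical names, single-channel float texels are assumed for brevity, and clamping at the texture border is an assumption as well.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical CPU stand-in for the valid region of one slice.
struct Slice {
    int width, height;          // real size of the 2D texture in this slice
    std::vector<float> texels;  // row-major, width * height values
};

// Point fetch, clamped to the valid region so that padding texels of
// the slice never contribute to the result.
float fetchPoint(const Slice& s, int x, int y)
{
    x = std::min(std::max(x, 0), s.width - 1);
    y = std::min(std::max(y, 0), s.height - 1);
    return s.texels[y * s.width + x];
}

// Bilinear filter from four point fetches. (u, v) are normalized to the
// texture's own size, not to the extent of the 3D array.
float fetchBilinear(const Slice& s, float u, float v)
{
    assert(s.width > 0 && s.height > 0);

    // Texel centers sit at half-integer positions, hence the -0.5.
    float x = u * s.width - 0.5f;
    float y = v * s.height - 0.5f;
    int x0 = static_cast<int>(std::floor(x));
    int y0 = static_cast<int>(std::floor(y));
    float fx = x - x0;  // horizontal interpolation weight
    float fy = y - y0;  // vertical interpolation weight

    // Interpolate along x in both rows, then along y between the rows.
    float row0 = (1.f - fx) * fetchPoint(s, x0, y0)     + fx * fetchPoint(s, x0 + 1, y0);
    float row1 = (1.f - fx) * fetchPoint(s, x0, y0 + 1) + fx * fetchPoint(s, x0 + 1, y0 + 1);
    return (1.f - fy) * row0 + fy * row1;
}
```

Because the coordinates are scaled by the size of the individual texture, this also sidesteps the problem with normalized coordinates. The price is four fetches and some arithmetic per lookup instead of a single hardware-filtered fetch.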