In the meantime I added the last of the final gathering acceleration techniques of Wang et al. to my implementation. It is called illumination cuts, and it computes a cut through the kd-tree of the photon map in which we want to gather. Here, a cut is a set of nodes that contains exactly one node from each path between the tree root and a leaf. We then estimate the irradiance at each cut node.
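To make the cut definition concrete, here is a minimal sketch that checks whether a node set is a valid cut. The `Node` class, the node names and `example_tree` are hypothetical and only model the tree topology; this is not the data structure of my GPU implementation.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Node:
    """Hypothetical binary kd-tree node; only the topology matters here."""
    name: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def is_valid_cut(root: Node, cut: Set[str]) -> bool:
    """Return True if every root-to-leaf path contains exactly one cut node."""
    def walk(node: Node, hits: int) -> bool:
        hits += node.name in cut  # count cut nodes seen so far on this path
        if node.is_leaf():
            return hits == 1
        children = [c for c in (node.left, node.right) if c is not None]
        return all(walk(c, hits) for c in children)

    return walk(root, 0)


def example_tree() -> Node:
    """Small illustrative tree: root -> (a -> (c, d), b)."""
    return Node("root", Node("a", Node("c"), Node("d")), Node("b"))
```

For this tree, `{"root"}`, `{"a", "b"}` and `{"c", "d", "b"}` are all valid cuts, whereas `{"a"}` misses the path to `b` and `{"a", "c", "b"}` hits the path to `c` twice.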
For the final gather step we can now interpolate irradiance values from the reduced set of cut nodes. So instead of searching among all photons of the photon map (usually several hundred thousand), we only have to look for the closest cut nodes (a few thousand).
The computation times currently obtained for a scene containing the dragon model from the Stanford 3D Scanning Repository (approx. 202k triangles) are listed in the following table.
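The lookup can be sketched roughly as follows. This is only an illustration under assumptions: the inverse-distance weighting, the parameter `k` and the brute-force nearest-neighbour search are my choices for the example, not necessarily the interpolation scheme of Wang et al., and a real implementation would use a spatial search structure and probably also take surface normals into account.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]
CutNode = Tuple[Vec3, float]  # (position, estimated irradiance), hypothetical layout


def interpolate_irradiance(x: Vec3, cut_nodes: Sequence[CutNode],
                           k: int = 3, eps: float = 1e-6) -> float:
    """Inverse-distance-weighted mean of the k cut nodes closest to x."""
    if not cut_nodes:
        raise ValueError("need at least one cut node")
    # Brute-force search for the k nearest cut nodes (assumption: small example).
    nearest = sorted(cut_nodes, key=lambda n: math.dist(x, n[0]))[:k]
    # eps avoids a division by zero when x coincides with a cut node.
    weights = [1.0 / (math.dist(x, pos) + eps) for pos, _ in nearest]
    weighted_sum = sum(w * e for w, (_, e) in zip(weights, nearest))
    return weighted_sum / sum(weights)
```

The point of the reduction is that `cut_nodes` holds a few thousand entries, while the photon map it replaces holds several hundred thousand.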
| Technique | Time / s | Speedup |
|---|---|---|
| Full final gathering | 168.4 | N/A |
| Illumination cut (17k) | 13.8 | 12.2 |
| Adapt. sample seeding (4k) + illum. cut (17k) | 2.3 | 73.2 |
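
The speedup column is simply the full final gathering time divided by the time of the respective technique. As a quick check of the arithmetic (the dictionary keys are just labels taken from the table):

```python
# Measured times in seconds, copied from the table.
timings_s = {
    "Full final gathering": 168.4,
    "Illumination cut (17k)": 13.8,
    "Adapt. sample seeding (4k) + illum. cut (17k)": 2.3,
}

baseline = timings_s["Full final gathering"]

# Speedup relative to full final gathering, rounded to one decimal place.
speedups = {name: round(baseline / t, 1) for name, t in timings_s.items()}

for name, s in speedups.items():
    print(f"{name}: {s}x")
```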
Without the acceleration techniques of Wang et al., the GPU-based generation of one image takes about 168.4 s with my current implementation. I reach 35-45 MRays/s for primary rays and approx. 20 MRays/s for final gather rays. For all subsequent images I used 64 shadow rays (area light source) and 256 final gather rays per sample.
Adding an illumination cut with around 17k cut nodes reduces the computation time by a factor of about 12. Perceptually the resulting image (see below) looks just like the one above, although a closer look reveals some differences. The high number of cut nodes is the result of a simplification I currently use: all leaf nodes serve as cut nodes. This is necessary because my method for approximating the normal vector at the nodes' center points is not yet very sophisticated, so images obtained with smaller cuts are less accurate.
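The "all leaves as cut nodes" simplification can be sketched like this. The median split along a cycling axis and the `leaf_size` parameter are assumptions made for the example; the kd-tree construction in my GPU implementation may differ.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def leaf_cut(photons: List[Point], leaf_size: int = 2,
             depth: int = 0) -> List[List[Point]]:
    """Median-split the photons along a cycling axis and return the leaf buckets.

    Every photon ends up in exactly one leaf, so the leaves form a valid cut.
    """
    if leaf_size < 1:
        raise ValueError("leaf_size must be at least 1")
    if len(photons) <= leaf_size:
        return [photons]
    axis = depth % 3
    ordered = sorted(photons, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return (leaf_cut(ordered[:mid], leaf_size, depth + 1)
            + leaf_cut(ordered[mid:], leaf_size, depth + 1))


def centroid(bucket: List[Point]) -> Point:
    """Center point of a leaf, usable as the position of its cut node."""
    n = len(bucket)
    return (sum(p[0] for p in bucket) / n,
            sum(p[1] for p in bucket) / n,
            sum(p[2] for p in bucket) / n)
```

With this choice the cut size is fixed by the leaf size: 16 photons with `leaf_size=2` give 8 cut nodes. A smaller cut would have to pick inner nodes instead, which is where a good normal estimate for the node center becomes important.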
Finally, when I additionally activate adaptive sample seeding, the computation time drops to about 2.3 seconds using 4k adaptive samples. The adjacent pictures show both the resulting image and an error image. The error image shows the 4x-scaled absolute error compared to the full final gathering image.
Scaling makes the errors in the error image look significant, but perceptually the error is quite small. At the same time, the last image could be generated about 73 times faster than the first image. It should be noted that the first image was generated using the same GPU-based photon map and ray tracer implementation, just without the final gathering acceleration techniques.
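Computing such an error image amounts to the following sketch. Representing an image as nested lists of intensities in [0, 1] and clamping the result to 1.0 for display are assumptions of this example.

```python
from typing import List

Image = List[List[float]]  # rows of intensities in [0, 1], hypothetical layout


def error_image(reference: Image, test: Image, scale: float = 4.0) -> Image:
    """Per-pixel absolute difference, scaled and clamped to 1.0 for display."""
    return [
        [min(1.0, scale * abs(r - t)) for r, t in zip(row_ref, row_test)]
        for row_ref, row_test in zip(reference, test)
    ]
```

Because of the 4x scale, an intensity difference of 0.125 already shows up as 0.5 in the error image.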