Category Archives: GPU

Path Tracing 3D Fractals

January 18, 2015Fragmentarium, GPU, Mandelbulb, RaytracingMikael Hvidtfeldt Christensen

In some ways path tracing is one of the simplest and most intuitive ways to do ray tracing.

Imagine you want to simulate how the photons from one or more light sources bounce around a scene before reaching a camera. Each time a photon hits a surface, we choose a new randomly reflected direction and continue, adjusting the intensity according to how likely the chosen reflection is. Though this approach works, only a very tiny fraction of paths would terminate at the camera.

So instead, we might start from the camera and trace the ray from here and until we hit a light source. And, if the light source is large and slowly varying (for instance when using Image Based Lighting), this may provide good results.

But if the light source is small, e.g. like the sun, we have the same problem: the chance that we hit a light source using a path of random reflections is very low, and our image will be very noisy and slowly converging. There are ways around this: one way is to trace rays starting from both the camera and the lights, and connect them (bidirectional path tracing), another is to test for possible direct lighting at each surface intersection (this is sometimes called ‘next event estimation’).

Even though the concept of path tracing might be simple, introductions to path tracing often get very mathematical. This blog post is an attempt to introduce path tracing as an operational tool without going through too many formal definitions. The examples are built around Fragmentarium (and thus GLSL) snippets, but the discussion should be quite general.

Let us start by considering how light behaves when hitting a very simple material: a perfect diffuse material.

Diffuse reflections

A Lambertian material is an ideal diffuse material, which has the same radiance when viewed from any angle.

Imagine that a Lambertian surface is hit by a light source. Consider the image above, showing some photons hitting a patch of a surface. By pure geometrical reasoning, we can see that the amount of light that hits this patch of the surface will be proportional to the cosine of the angle between the surface normal and the light ray:

\(
cos(\theta)=\vec{n} \cdot \vec{l}
\)

By definition of a Lambertian material this amount of incoming light will then be reflected with the same probability in all directions.

Now, to find the total light intensity in a given (outgoing) direction, we need to integrate over all possible incoming directions in the hemisphere:

\(
L_{out}(\vec\omega_o) = \int K*L_{in}(\vec\omega_i)cos(\theta)d\vec\omega_i
\)

where K is a constant that determines how much of the incoming light is absorbed in the material, and how much is reflected. Notice, that there must be an upper bound to the value of K – too high a value would mean we emitted more light than we received. This is referred to as the ‘conservation of energy’ constraint, which puts the following bound on K:

\(
\int Kcos(\theta)d\vec\omega_i \leq 1
\)

Since K is a constant, this integral is easy to solve (see e.g. equation 30 here):

\(
K \leq 1/\pi
\)

Instead of using the constant K, when talking about a diffuse materials reflectivity, it is common to use the Albedo, defined as \( Albedo = K\pi \). The Albedo is thus always between 0 and 1 for a physical diffuse materials. Using the Albedo definition, we have:

\(
L_{out}(\vec\omega_o) = \int (Albedo/\pi)*L_{in}(\vec\omega_i)cos(\theta)d\vec\omega_i
\)

The above is the Rendering Equation for a diffuse material. It describes how light scatters at a single point. Our diffuse material is a special case of the more general formula:

\(
L_{out}(\vec\omega_o) = \int BRDF(\vec\omega_i,\vec\omega_o)*L_{in}(\vec\omega_i)cos(\theta)d\vec\omega_i
\)

Where the BRDF (Bidirectional Reflectance Distribution Function) is a function that describes the reflection properties of the given material: i.e. do we have a shiny, metallic surface or a diffuse material.

Completely diffuse material (click for large version)

How to solve the rendering equation

An integral is a continuous quantity, which we must turn into something discrete before we can handle it on the computer.

To evaluate the integral, we will use Monte Carlo sampling, which is a very simple: to provide an estimate for an integral, we will take a number of samples and use the average values of these samples multiplied by the integration interval length.

\(
\int_a^b f(x)dx \approx \frac{b-a}{N}\sum _{i=1}^N f(X_i)
\)

If we apply this to our diffuse rendering equation above, we get the following discrete summation:

\(
\begin{align}
L_{out}(\vec\omega_o) &= \int (Albedo/\pi)*L_{in}(\vec\omega_i)cos(\theta)d\vec\omega_i \\
& = \frac{2\pi}{N}\sum_{\vec\omega_i} (\frac{Albedo}{\pi}) L_{in}(\vec\omega_i) (\vec{n} \cdot \vec\omega_i) \\
& = \frac{2 Albedo}{N}\sum_{\vec\omega_i} L_{in}(\vec\omega_i) (\vec{n} \cdot \vec\omega_i)
\end{align}
\)

Test render (click for large version)

Building a path tracer (in GLSL)

Now we are able to build a simple path tracer for diffuse materials. All we need to do is to shoot rays starting from the camera, and when a ray hits a surface, we will choose a random direction in the hemisphere defined by the surface normal. We will continue with this until we hit a light source. Each time the ray changes direction, we will modulate the light intensity by the factor found above:

\(
2*Color*Albedo*L_{in}(\vec\omega_i) (\vec{n} \cdot \vec\omega_i)
\)

The idea is to repeat this many times for each pixel, and then average the samples. This is why the sum and the division by N is no longer present in the formula. Also notice, that we have added a (material specific) color. Until now we have assumed that our materials handled all wavelengths the same way, but of course some materials absorb some wavelengths, while reflecting others. We will describe this using a three-component material color, which will modulate the light ray at each surface intersection.

All of this boils down to very few lines of codes:

vec3 color(vec3 from, vec3 dir)
{
  vec3 hit = vec3(0.0);
  vec3 hitNormal = vec3(0.0);
	
  vec3 luminance = vec3(1.0);
  for (int i=0; i < RayDepth; i++) {
    if (trace(from,dir,hit,hitNormal)) {
       dir = getSample(hitNormal); // new direction (towards light)
       luminance *= getColor()*2.0*Albedo*dot(dir,hitNormal);
       from = hit + hitNormal*minDist*2.0; // new start point
    } else {
       return luminance * getBackground( dir );
    }
  }
  return vec3(0.0); // Ray never reached a light source
}

The getBackground() method simulates the light sources in a given direction (i.e. infinitely far away). As we will see below, this fits nicely together with using Image Based Lighting.

But even when implementing getBackground() as a simple function returning a constant white color, we can get very nice images:

and

The above images were lightened only a constant white dome light, which gives the pure ambient occlusion like renders seen above.

Sampling the hemisphere in GLSL

The code above calls a 'getSample' function to sample the hemisphere.

       dir = getSample(hitNormal); // new direction (towards light)

This can be a bit tricky. There is a nice formula for \(cos^n\) sampling of a hemisphere in the GI compendium (equation 36), but you still need to align the hemisphere with the surface normal. And you need to be able to draw uniform random numbers in GLSL, which is not easy.

Below I use the standard approach of putting a seed into a noisy function. The seed should depend on the pixel coordinate and the sample number. Here is some example code:

vec2 seed = viewCoord*(float(subframe)+1.0);

vec2 rand2n() {
    seed+=vec2(-1,1);
	// implementation based on: lumina.sourceforge.net/Tutorials/Noise.html
    return vec2(fract(sin(dot(seed.xy ,vec2(12.9898,78.233))) * 43758.5453),
		fract(cos(dot(seed.xy ,vec2(4.898,7.23))) * 23421.631));
};

vec3 ortho(vec3 v) {
    //  See : http://lolengine.net/blog/2013/09/21/picking-orthogonal-vector-combing-coconuts
    return abs(v.x) > abs(v.z) ? vec3(-v.y, v.x, 0.0)  : vec3(0.0, -v.z, v.y);
}

vec3 getSampleBiased(vec3  dir, float power) {
	dir = normalize(dir);
	vec3 o1 = normalize(ortho(dir));
	vec3 o2 = normalize(cross(dir, o1));
	vec2 r = rand2n();
	r.x=r.x*2.*PI;
	r.y=pow(r.y,1.0/(power+1.0));
	float oneminus = sqrt(1.0-r.y*r.y);
	return cos(r.x)*oneminus*o1+sin(r.x)*oneminus*o2+r.y*dir;
}

vec3 getSample(vec3 dir) {
	return getSampleBiased(dir,0.0); // <- unbiased!
}

vec3 getCosineWeightedSample(vec3 dir) {
	return getSampleBiased(dir,1.0);
}

Importance Sampling

Now there are some tricks to improve the rendering a bit: Looking at the formulas above, it is clear that light sources in the surface normal direction will contribute the most to the final intensity (because of the \( \vec{n} \cdot \vec\omega_i \) term).

This means we might want sample more in the surface normal directions, since these contributions will have a bigger impact on the final average. But wait: we are estimating an integral using Monte Carlo sampling. If we bias the samples towards the higher values, surely our estimate will be too large. It turns out there is a way around that: it is okay to sample using a non-uniform distribution, as long as we divide the sample value by the probability density function (PDF).

Since we know the diffuse term is modulated by the \( \vec{n} \cdot \vec\omega_i = cos(\theta) \), it makes sense to sample from a non-uniform cosine weighted distribution. According to GI compendium (equation 35), this distribution has a PDF of \( cos(\theta) / \pi \), which we must divide by, when using cosine weighted sampling. In comparison, the uniform sampling on the hemisphere we used above, can be thought of either to be multiplied by the integral interval length (\( 2\pi \)), or diving by a constant PDF of \( 1 / 2\pi \).

If we insert this, we end up with a simpler expression for the cosine weighted sampling, since the cosine terms cancel out:

vec3 color(vec3 from, vec3 dir)
{
  vec3 hit = vec3(0.0);
  vec3 hitNormal = vec3(0.0);
	
  vec3 luminance = vec3(1.0);
  for (int i=0; i < RayDepth; i++) {
    if (trace(from,dir,hit,hitNormal)) {
       dir =getCosineWeightedSample(hitNormal);
       luminance *= getColor()*Albedo;
       from = hit + hitNormal*minDist*2.0; // new start point
    } else {
       return luminance * getBackground( dir );
    }
  }
  return vec3(0.0); // Ray never reached a light source
}

Image Based Lighting

It is now trivial to replace the constant dome light, with Image Based Lighting: just lookup the lighting from a panoramic HDR image in the 'getBackground(dir)' function.

This works nicely, at least if the environment map is not varying too much in light intensity. Here is an example:

Stereographic 4D Quaternion system (click for large version)

If, however, the environment has small, strong light sources (such as a sun), the path tracing will converge very slowly, since we are not likely to hit these by chance. But for some IBL images this works nicely - I usually use a filtered (blurred) image for lighting, since this will reduce noise a lot (though the result is not physically correct). The sIBL archive has many great free HDR images (the ones named '*_env.hdr' are prefiltered and useful for lighting).

Direct Lighting / Next Event Estimation

But without strong, localized light sources, there will be no cast shadows - only ambient occlusion like contact shadows. So how do we handle strong lights?

Test scene with IBL lighting

Let us consider the sun for a moment.

The sun has an angular diameter of 32 arc minutes, or roughly 0.5 degrees. How much of the hemisphere is this? The solid angle (which corresponds to the area covered of a unit sphere) is given by:

\(
\Omega = 2\pi (1 - \cos {\theta} )
\)

where \( \theta \) is half the angular diameter. Using this we get that the sun covers roughly \( 6*10^{-5} \) steradians or around 1/100000 of the hemisphere surface. You would actually need around 70000 samples, before there is even a 50% chance of a pixel actually catching some sun light (using \( 1-(1-10^{-5})^{70000} \approx 50\% \)).

Test scene: naive path tracing of a sun like light source (10000 samples per pixel!)

Obviously, we need to bias the sampling towards the important light sources in the scene - similar to what we did earlier, when we biased the sampling to follow the BRDF distribution.

One way to do this, is Direct Lighting or Next Event Estimation sampling. This is a simple extension: instead of tracing the light ray until we hit a light source, we send out a test ray in the direction of the sun light source at each surface intersection.

Test scene with direct lighting (100 samples per pixel)

Here is some example code:

vec3 getConeSample(vec3 dir, float extent) {
        // Formula 34 in GI Compendium
	dir = normalize(dir);
	vec3 o1 = normalize(ortho(dir));
	vec3 o2 = normalize(cross(dir, o1));
	vec2 r =  rand2n();
	r.x=r.x*2.*PI;
	r.y=1.0-r.y*extent;
	float oneminus = sqrt(1.0-r.y*r.y);
	return cos(r.x)*oneminus*o1+sin(r.x)*oneminus*o2+r.y*dir;
}

vec3 color(vec3 from, vec3 dir)
{
  vec3 hit = vec3(0.0);
  vec3 direct = vec3(0.0);
  vec3 hitNormal = vec3(0.0);
	
  vec3 luminance = vec3(1.0);
  for (int i=0; i < RayDepth; i++) {
    if (trace(from,dir,hit,hitNormal)) {
       dir =getCosineWeightedSample(hitNormal);
       luminance *= getColor()*Albedo;
       from = hit + hitNormal*minDist*2.0; // new start point

       // Direct lighting
       vec3 sunSampleDir = getConeSample(sunDirection,1E-5);
       float sunLight = dot(hitNormal, sunSampleDir);
       if (sunLight>0.0 && !trace(hit + hitNormal*2.0*minDist,sunSampleDir)) {
           direct += luminance*sunLight*1E-5;
       }
    } else {
       return direct + luminance*getBackground( dir );
    }
  }
  return vec3(0.0); // Ray never reached a light source
}

The 1E-5 factor is the hemisphere area covered by the sun. Notice, that you might run into precision errors with the single-precision floats used in GLSL when doing these calculations. For instance, on my graphics card, cos(0.4753 degrees) is exactly equal to 1.0, which means a physically sized sun can easily introduce large numerical errors (remember the sun is roughly 0.5 degrees).

Sky model

To provide somewhat more natural lighting, an easy improvement is to combine the sun light with a blue sky dome.

A slightly more complex model is the Preetham sky model, which is a physically based model, taking different kinds of scattering into account. Based on the code from Simon Wallner I implemented a Preetham model in Fragmentarium.

Here is an animated example, showing how the color of the sun light changes during the day:

Path tracing test from Syntopia (Mikael H. Christensen) on Vimeo.

Fractals

Now finally, we are ready to apply path tracing to fractals. Technically, there is not much new to this - I have previously covered how to do the ray-fractal intersection in this series of blog posts: Distance Estimated 3D fractals.

So the big question is whether it makes sense to apply path tracing to fractals, or whether the subtle details of multiple light bounces are lost on the complex fractal surfaces. Here is the Mandelbulb, rendered with the sky model:

Path traced Mandelbulb (click for larger version)

Here path tracing provides a very natural and pleasant lighting, which improves the 3D perceptions.

Here are some more comparisons of complex geometry:

Default ray tracer in Fragmentarium

Path traced in Fragmentarium

And another one:

Default ray tracer in Fragmentarium

Path traced in Fragmentarium

What's the catch?

The main concern with path tracing is of course the rendering speed, which I have not talked much about, mainly because it depends on a lot of factors, making it difficult to give a simple answer.

First of all, the images above are distance estimated fractals, which means they are a lot slower to render than polygons (at least of you have a decent spatial acceleration structure for the polygons, which is surprisingly difficult to implement on a GPU). But let me give some numbers anyway.

In general, the rendering speed will be (roughly) proportional to the number of pixels, the FLOPS of the GPU, and the number of samples per pixel.

On my laptop (a mobile mid-range NVIDIA 850M GPU) the Mandelbulb image above took 5 minutes to render at 2442x1917 resolution (with 100 samples per pixel). The simple test scene above took 30 seconds at the same resolution (with 100 samples per pixel). But remember, that since we can show the render progressively, it is still possible to use this at interactive speeds.

What about the ray lengths (the number of light bounces)?

Here is a comparison as an animated GIF, showing direct light only (the darkest), followed by one internal light bounce, and finally two internal light bounces:

In terms of speed one internal bounce made the render 2.2x slower, while two bounces made it 3.5x slower. It should be noted that the visual effect of adding additional light bounces is normally relatively small - I usually use only a single internal light bounce.

Even though the images above suggests that path tracing is a superior technique, it is also possible to create good looking images in Fragmentarium with the existing ray tracers. For instance, take a look at this image:

(taken from the Knots and Polyhedra series)

It was ray traced using the 'Soft-Raytracer.frag', and I was not able to improve the render using the Path tracer. Having said that, the Soft-Raytracer is also a multi-sample ray tracer which has to use lots of samples to produce the nice noise-free soft shadows.

References

The Fragmentarium path tracers are still Work-In-Progress, but they can be downloaded here:

Sky-Pathtracer.frag (which needs the Preetham model: Sunsky.frag).

and the image based lighting one:

IBL-Pathtracer.frag

The path tracers can be used by replacing an existing ray tracer '#include' in any Fragmentarium .frag file.

External resources

GI Total Compendium - very valuable collection of all formulas needed for ray tracing.

Vilém Otte's Bachelor Thesis on GPU Path Tracing is a good introduction.

Disney's BRDF explorer - Interactive display of different BRDF models - many examples included. The BRDF definitions are short GLSL snippets making them easy to use in Fragmentarium!

Inigo Quilez's path tracer was the first example I saw of using GPU path tracing of fractals.

Evan Wallace - the first WebGL Path tracer I am aware of.

Brigade is probably the most interesting real time path tracer: Vimeo video and paper.

I would have liked to talk a bit about unbiased and consistent rendering, but I don't understand these issues properly yet. It should be said, however, that since the examples I have given terminate after a fixed number of ray bounces, they will not converge to a true solution of the rendering equation (and, are thus both biased and inconsistent). For consistency, a better termination criterion, such as russian roulette termination, is needed.

Rendering 3D fractals without a distance estimator

September 30, 2012Distance Estimation, Fractals, Fragmentarium, GPUMikael Hvidtfeldt Christensen

I have written a lot about distance estimated 3D fractals, and while Distance Estimation is a fast and elegant technique, it is not always possible to derive a distance estimate for a particular system.

So, how do you render a fractal, if the only knowledge you have is whether a given point belongs to the set or not? Or, in other words, how much information can you extract if the only information you have is a black-box function of the form:

bool inside(vec3 point);

I decided to try out some simple brute-force methods to see how they would compare to the DE methods. Contrary to my expectations, it turned out that you can actually get reasonable results without a DE.

First a couple of disclaimers: brute-force methods can not compete with distance estimators in terms of speed. They will typically be a magnitude slower. And if you do have more information available, you should always use it: for instance, even if you can’t find a distance estimator for a given escape time fractal, the escape length contains information that can be used to speed up the rendering or create a surface normal.

The method I used is not novel nor profound: I simply sample random points along the camera ray for each pixel. Whenever a hit is found on the camera ray, the sampling will proceed on only the interval between the camera and the hit point (since we are only interested in finding the closest pixels), e.g. something like this:

for (int i=0; i
(The Near and Far distances are used to restrict the sample space, and speed up rendering)
There are different ways to choose the samples. The simplest is to just sample uniformly (as in the example above), but I found that a stratified approach, where the camera ray segment is divided into equal pieces and a sample is choosen from each part works better. I think the sampling scheme could be improved: in particular once you found a hit, you should probably bias the sampling towards the hit to make convergence faster. Since I use a progressive (double buffered) approach in Fragmentarium, it is also possible to read the pixel depths of adjacent pixels, which probably also could be used.
Now, after sampling the camera rays you end up with a depth map, like this:

(Be sure to render to a texture with 32-bit floats - a 8-bit buffer will cause quantization).
For distance estimated rendering, you can use the gradient of the distance estimator to obtain the surface normal. Unfurtunately this is not an option here. We can, however, calculate a screen space surface normal, based on the depths of adjacent pixels, and transform this normal back into world space:
// Hit position in world space.
vec3 worldPos = Eye + (Near+tex.w*(Far-Near)) * rayDir;

vec3 n = normalize(cross(dFdx(worldPos), dFdy(worldPos)));	

(Update: I found out that GLSL supports finite difference derivatives through the dFdx statement, which made the code above much simpler).
Now we can use a standard lighting scheme, like Phong shading. This really brings a lot of detail to the image:

In order to improve the depth perception, it is possible to apply a screen space ambient occlusion scheme. Recently, there was a very nice tutorial on SSAO on devmaster, but I was to lazy to try it out. Instead I opted for the simplest method I could think of: simply sample some pixels in a neighborhood, and count how many of them that are closer to the camera than the center pixel. 
float occ = 0.;
float samples = 0.;
for (float x = -5.; x<=5.; x++) {
  for (float y = -5.; y<=5.; y++) {
    if (x*x+y*y>25.) continue;
    vec3 jitteredPos = pos+vec2(dx*(x+rand(vec2(x,y)+pos)),dy*(y+rand(vec2(x,y)+pos);
    float depth = texture2D(frontbuffer,jitteredPos).w;
    if (depth>=centerDepth) occ+=1.;
    samples++;
  }
}
occ /= samples;

This is how this naive ambient occlusion scheme works:

(Notice that for pixels with no hits, I've choosen to lighten, rather than darken them. This creates an outer glow effect.)  
Now combined with the Phong shading we get:

I think it is quite striking how much detail you can infer simply from a depth map! In this case I didn't color the fractal, but nothing prevents you from assigning a calculated color. The depth buffer information only uses the alpha channel.
Here is another example (Aexion's MandelDodecahedron):

While brute-force rendering is much slower than distance estimation, it is possible to render these systems at interactive frame rates in Fragmentarium, especially since responsiveness can be improved by using progressive rendering: do a number of samples, then storing the best found solution (closest pixel) in a depth buffer (I use the alpha channel), render the frame and repeat.
There are a couple of downsides to brute force rendering:

It is slower than distance estimation
You have to rely on screen space methods for ambient occlusion, surface normals, and depth-of-field
Anti-aliasing is more tricky since you cannot accumulate and average. You may render at higher resolution and downsample, or use tiled rendering, but beware that screen space ambient occlusion introduce artifacts which may be visible on tile edges.

On the other hand, there are also advantages:

Much simpler to construct
Interior renderings are trivial - just reverse the 'inside' function
Progressive quality rendering: just keep adding samples, and the image will converge.

To use the Fragmentarium script, just implement an 'inside' function:
#define providesInside
#include "Brute-Raytracer.frag"

bool inside(vec3 pos) {
    // fractal definition here 
}

It is also possible to use the raytracer on existing DE's - here a point is assumed to be inside a fractal if the DE returns a negative number, and outside if the DE returns a positive one.
#include "Brute-Raytracer.frag"

float DE(vec3 pos) {
    // fractal definition here
    // notice, that only the sign of the return value is checked
}

The script can be downloaded as part of Fragmentarium source distribution (it is not yet in the binary distributions). The following files are needed:
Tutorials/3D fractals without a DE.frag
Include/Brute-Raytracer.frag
Include/DepthBufferShader.frag
Include/Brute3D.frag

Fragmentarium Version 0.9.12 (“Prague”) Released

August 21, 2012Fractals, Fragmentarium, GPUMikael Hvidtfeldt Christensen

I’ve released a new build of Fragmentarium, version 0.9.12 (“Prague”). It can be downloaded at Github. (Binaries for Windows, source for Windows/Linux/Mac)

The (now standard) caveat apply: Fragmentarium is very much work in progress, and is best suited for people who like to experiment with code.

Version 0.9.12 continues to move Fragmentarium in the direction of progressive HDR rendering. The default raytracers now use accumulated rendering for anti-alias, shadowing, and DOF. To start the progressive rendering, Fragmentarium must be set to ‘Continuous’ mode. It is possible to set a maximum number of rendered frames. All 2D and 3D system now also come with tone mapping, gamma correction, and color control (see the ‘Post’ tab).

IBL Raytracing, using an HDR panorama from Blotchi at HDRLabs.

There is a new raytracer, ‘IBL-raytracer.frag’ which can be used for DE’s instead of the default raytracer. It uses Image Based Lighting from HDR panorama maps. For an example of the new IBL raytracer, see the tutorial: “25 – Image Based Lighting.frag”.

If you need to do stuff like animation, it is still possible to use the old raytracers. They can be included as: “#include “DE-Raytracer-v0.9.1.frag” or “#include “DE-Raytracer-v0.9.10.frag”

Other than that there is now better support for buffer-swap systems (e.g. reaction-diffusion and game-of-life) and better control of texture look-ups.

There are also some interesting new fragments, including the absolutely amazing LivingKIFS.frag script from Kali.

New features

Added maximum subframe counter (for progressive rendering).
Added support for HDR textures (.hdr RGBE format).
Tonemapping, color control, and Gamma correction in buffershader.
Added support for widget for changing bound textures.

More host defines:

#define SubframeMax 0
#define DontClearOnChange   <- when sliders/camera changes, the backbuffer is not cleared.
#define IterationsBetweenRedraws 20  <- makes it possible to do several steps without updating screen.

Added texture parameters preprocessor defines:

#TexParameter texture GL_TEXTURE_MIN_FILTER GL_LINEAR
#TexParameter texture GL_TEXTURE_MAG_FILTER GL_NEAREST
#TexParameter texture GL_TEXTURE_WRAP_S GL_CLAMP
#TexParameter texture GL_TEXTURE_WRAP_T GL_REPEAT

Change of syntax: when using "#define providesColor", now implement a 'vec3 baseColor(vec3)' function.
DE-Raytracer.frag now uses a 'Blinn-Phong with Schickl term and physical normalization'. (Which is something I found in Naty Hoffman's Course Notes).
DE-Raytracer.frag and Soft-Raytracer now uses new '3D.frag' base class.
Added a texture manager (should reuse and discard textures in memory automatically)
Added option to set OpenGL refresh rate in preferences.
Progressive2D.frag now supports custom filtering (using '#define providesFiltering')
Added support for choosing '#include' through editor context menu.
Using arrow keys now work when sliders have focus.
Now does a 'reset all' when loading new system (otherwise too confusing).

New fragments

Added 'Kali's Creations': KaliBox, LivingKIFS, TreeBroccoli, Xray_KIFS. [Kali]
Added Doyle-Spirals.frag [Knighty]
Added: Droste.frag (Escher Droste effect)
Added: Reaction-Diffusion.frag (Gray-Scott example)
Added 'Convolution.frag' example (For precalculating specular and diffuse lighting from HDR panoramas)
Added examples of working with double precision floats and emulated double precision floats: "Include/EmulatedDouble.frag", "Theory/Mandelbrot - Emulated Doubles.frag"
Added 'IBL-Raytracer.frag' (Image Based Lighting raytracer)
Added tutorials: 'progressive2D.frag' and 'pure3D.frag'
Added experimental: 'testScene.frag' and 'triplanarTexturing.frag'
Added 'Thorn.frag'

Bug fixes

Reflection is now working again in 'DE-Raytracer.frag'
Fixed filename case sensitivity error when doing reverse lookup of line numbers.

Mac users

Some Mac users has reported problems with the latest versions of Fragmentarium. Again, I don't own a Mac, so I cannot solve these issues without help.

For examples of images generated with the new version, take a look at the Flickr Fragmentarium stream.

Finally, please read the FAQ, before asking questions.

Reaction-Diffusion Systems

August 8, 2012Fragmentarium, GPU, Reaction DiffusionMikael Hvidtfeldt Christensen

Reaction-diffusion systems model the spatial dynamics of chemicals. An interesting early application was Alan Turing’s theory of Morphogenesis (Turing’s 1951 paper). Here, he suggested, that the pattern formation in animal skin could be explained by a two component reaction-diffusion system.

Reaction-diffusion systems are interesting, because they display a wide range of self-organizing patterns, and they have been used by several digital artists, both for 2D pattern generation and 3D structure generation.

The reaction-diffusion model is a great example of how complex large-scale structure may emerge from simple, local rules.

Modelling Reaction-Diffusion on a GPU

As the name suggests, these systems have two driving components: diffusion, which tends to spread out or smoothen concentrations, and reactions, which describe how chemical species may transform into each other.

For each chemical species, it is possible to describe the evolution using a differential equation on the form:

\(\frac {dA}{dt} = K \nabla^2 A + P(A,B)\)

Where A and B are fields describing the concentration of a chemical species at each point in space. The \(K\) coefficient determines how quickly the concentration spreads out, and \(P(A,B)\) is a polynomial in the different species concentrations in the system. There will be a similar equation for the B field.

To model these, we can represent the concentrations on a discrete grid, which fits nicely on a 2D texture on a GPU. The time derivative can solved in discrete time steps using forward Euler integration (or something more powerful). On a GPU, we need two buffers to do this: we render the next time step into the front buffer using values from the back buffer, and then swap the buffers.

Buffer swapping is a standard technique, and in Fragmentarium the only thing you need to do, is to declare a ‘uniform sampler2D backbuffer;’ and Fragmentarium will take care of creation and swapping of buffers. We also use the Fragmentarium host define ‘#buffer RGBA32F’ to ask for four-component 32-bit float buffers, instead of the normal 8-bit integer buffers.

The Laplacian may be calculated using a finite differencing scheme, for instance using a five-point stencil:

vec3 P = vec3(pixelSize, 0.0); 

// Five point stencil Laplacian
vec4  laplacian5() {
	return 
	+  texture2D( backbuffer, position - P.zy)
	+  texture2D( backbuffer, position - P.xz) 
	-  4.0 * texture2D( backbuffer,  position )
	+ texture2D( backbuffer,  position + P.xz )
	+ texture2D( backbuffer,  position +  P.zy );
}

(see the Fragmentarium source for a nine-point stencil).

A simple two-component Gray-Scott system may then be modelled simply as:

// time step for Gray-Scott system:
vec4 v = texture2D(backbuffer, position);
vec2 lv = laplacian5().xy; // laplacian
float xyy = v.x*v.y*v.y;   // utility term
vec2 dV = vec2( Diffusion.x * lv.x - xyy + f*(1.-v.x), Diffusion.y * lv.y + xyy - (f+k)*v.y);
v.xy += timeStep*dV;

(Robert Munafo has a great page with more information on Gray-Scott systems).

Here is an example of a typical system created using the above system, though many other patterns are possible:

It is also possible to enforce some structure by changing the concentrations in certain regions:

You can even use a picture to modify the concentrations:

A template implementation can be found as part of the Fragmentarium source at GitHub: Reaction-Diffusion.frag. Notice, that this fragment requires a recent source build from the GitHub repository to run.

Reaction-Diffusion systems used by artists

Several artist have used Reaction Diffusion systems in different ways, but the most impressive examples of 2D images I have seen, are the works of Jonathan McCabe. For instance his Bone Music series: or his Turing Flow series:

McCabe’s images are created using a more complex multi-scale model. Softology’s blog entry and W:Blut’s post dissect McCabe’s approach (there is even a reference implementation in Processing). Notice, that Nervous System sells some of McCabe’s works as jigsaw puzzles.

Reaction-Diffusion systems in WebGL

Felix Woitzel (@Flexi23) has created some beautiful WebGL-based reaction-diffusion demos, such as this Fluid simulation with Turing patterns:

He also has created several other RD based variants over at WebGL Playground.

Fabricated 3D Objects

Jessica Rosenkrantz and Jesse Louis-Rosenberg at Nervous System create and sell objects designed and inspired by generative processes. Several of their objects, including these cups, plates, and lamps are based on reaction-diffusion systems, and can be bought from their webshop.

Be sure to read their blog entries about reaction-diffusion. And don’t forget to take a look at their Cell Cycle WebGL design app, while visiting.

Reaction-Diffusion Software

An easy way to explore reaction-diffusion systems with doing any coding is by using Ready, which uses OpenCL to explore RD systems. It has several interesting features, including the ability to run systems on 3D meshes and directly interact and ‘paint’ on the surfaces.

It also lets you run Game-of-Life on exotic geometries, such as a torus or even something as exotic as a Penrose tiling.

Double Precision in OpenGL and WebGL

July 10, 2012Fragmentarium, GPU, WebGLMikael Hvidtfeldt Christensen

This post talks about double precision numbers in OpenGL and WebGL, and how to emulate them if there is no native hardware support.

In GLSL 4.00.9 (which is OpenGL 4.0) and higher, there is a native double precision floating point type. And if your graphics card is able to run OpenGL 4.0, it most likely has native hardware support for doubles (except for a few ATI/AMD cards). There are some caveats, though:

Not all functions are supported with double precision arguments. For instance, there are no trigonometric and exponential functions. (The available functions may be found here).
You can not pass double precision ‘varying’ parameters from the vertex shader to the fragment shader, and have the GPU automatically interpolate them. Double precision varying variables must be flat.
Double precision performance may be artificially limited by the hardware manufacturers. This is the case for Nvidia’s Fermi architecture, where the scientific computing brand, the Tesla series, can execute double precision arithmetics at half the speed of single precision, while the consumer brand, the GeForce series, only can execute double precision arithmetics at 1/8 the speed of single precision. For Nvidia’s brand new Kepler architecture used in the GeForce 600 series, things change again: here the difference between single and double precision will be a whopping factor 24! Notice, that this will also be the case for some cards in the Kepler Tesla branch, such as the Tesla K10.
In Fragmentarium (and in general, in Qt’s OpenGL wrapper classes) it is not possible to set double precision uniforms. This should be easy to circumvent by using the OpenGL API directly, though.

(Non-related Fragmentarium image)

In order to use double precision, you must either specify a GLSL version 4.00 (or higher) or use the extension:

#extension GL_ARB_gpu_shader_fp64 : enable

Older cards, like the GeForce 310M in my laptop, does not support double precision in hardware. Here it is possible to use emulated double precision instead.

I used the functions by Henry Thasler described here in his posts, to emulate a double precision number stored in two single precision floats. The worst part about doing emulated doubles in GLSL, is that GLSL does not support operator overloading. This means the syntax gets ugly for simple arithmetics, e.g. ‘z = add(mul(z1,z2),z3)’ instead of ‘z = z1*z2+z3’.

On Nvidia cards, it is necessary to turn off optimization to use Thasler’s code – this can be done using the following pragmas:

#pragma optionNV(fastmath off)
#pragma optionNV(fastprecision off)

(Non-related Fragmentarium image)

Performance

To test performance, I used a Mandelbrot test scene, rendered at 1000×500 with 1000 iterations in Fragmentarium. The numbers show the performance in frames per second. The zoom factor was determined visually, by noticing when pixelation occurred.

	Geforce 570GTX	Tesla 2075	Max Zoom
	(~300USD)	(~2200USD)
Single	140	100	10^5
Double	41	70	10^14
Emulated Double	16	11	10^13

Some observations:

Emulated double precision is slightly less accurate then true hardware doubles, but not much in this particular scenario.
Emulated doubles are roughly 1/9th the speed of single precision. Amazingly, this suggest that on the Kepler architecture it might make more sense to use emulated double precision than the built-in hardware support!
Hardware doubles on the 570GTX performs better than expected (they should perform at roughly 1/8 the speed). This is probably because double precision arithmetics isn’t the only bottleneck in the shader.

Notice that the Tesla card was running on Windows in WDDM mode, not TCC mode (since you cannot use GLSL shaders in TCC mode). Not that I think performance would change.

WebGL and double precision

WebGL does not support double precision in its current incarnation. This might change in the future, but currently the only choice is to emulate them. This, however, is problematic since WebGL seems to strip away pragmas! Henry Thasler’s emulation code doesn’t work under the ANGLE layer either. In fact, the only configuration I could get to work, was on a Intel HD 3000 GPU with ANGLE disabled. I did create a sample application to test this which can be tried out here:
Click to run WebGL app. Left side is single-precision, right side is emulated double precision. Here shown on Firefox without ANGLE on a Intel HD 3000 card.

It is not clear why the WebGL version does not work on Nvidia cards. Floating points may run at lower resolution in WebGL, but I’m using the ‘precision highp’ qualifiers. I also tried querying the resolution using glContext.getShaderPrecisionFormat(…), but had no luck – it is only available on Firefox, and on my GPU’s it just returns precision=0.

The most likely explanation is that Nvidia drivers perform some optimizations which spoils the emulation code. This is also the case for desktop OpenGL, but here the pragma’s solve the problem.

The emulation code uses constructs like:
z = a - (a - b);
which I suspect the well-meaning compiler might translate to ‘z=b’, since the rounding errors normally would be insignificant. Judging from some comments on Thasler’s original posts, it might be possible to prevent this using constructs such as: ‘z = a – float(a-b)’, but I have not pursued this.

Fragmentarium and Double Precision

Except that there are no double-precision sliders (uniforms), it is straight-forward to use double precision code in Fragmentarium. The only thing to remember is that you cannot pass doubles from the vertex shader to the fragment shader, which is the standard way of passing camera information to the shader in Fragmentarium.

I’ve also included a small port of Thaslers GLSL code in the distribution (see “Include/EmulatedDouble.frag”). It is quite easy to use (for an example, try the included “Theory/Mandelbrot – Emulated Doubles.frag”).

WebGL for Shader Graphics

June 15, 2012Fractals, GPU, WebGLMikael Hvidtfeldt Christensen

Web applications are becoming popular, not at least because of Google’s massive effort to push everything through the browser (with Chrome OS being the most extreme example, where everything is running through a browser interface).

Before WebGL, the only way to create efficient graphics was through plug-ins, such as Adobe’s Flash, Microsoft’s Silverlight, Unity, or Google’s O3D and Native Client. But WebGL is a vendor independent technology, directly integrated with the browser’s JavaScript language and DOM model.

Unfortunately, WebGL browser support is limited. WebGL is not available in Internet Explorer on Windows, and is not enabled by default in Safari on Mac OS X. This means that roughly 50% of all internet users won’t have access to WebGL content. WebGL is not supported on iOS devices either (even though it is accessible for iAds, and can be enabled on jail-broken devices).

What is worse, is that Microsoft do not even plan to support WebGL, since they consider it a security threat. Their concerns are reasonable, but their solution is not: it would be much better if they simply showed a dialog box message, warning the user that executing WebGL provides a security risk, and giving a choice to continue or not – the same way they warn about plugins and downloaded executables.

Some very impressive stuff has been done using WebGL, though: for instance ro.me, Path Tracing (Evan Wallace) , Cars (Altered Qualia), Terrain Editor (Rob Chadwick), Traveling Wavefronts (Felix Woitzel), Hartverdrahtet.

Using WebGL for Fractals

There are already some great tools available for experimenting with WebGL: ShaderToy, GLSLSandbox, WebGL Playground. Their main weakness is that it is difficult to store state information (for instance, if you want a movable camera), since this cannot be done in the shader itself, without using weird hacks. So, I decided to start out from scratch to get a feeling for WebGL.

WebGL (specification) is a JavaScript API based on OpenGL ES 2.0, a subset of the desktop OpenGL version designed for embedded devices such as cell phones.

Being a ‘modern’ OpenGL implementation, there is no support for fixed pipeline rendering: there is no matrix stack, no default shaders, no immediate mode rendering (you cannot use glBegin(…) – instead you must use vertex buffers). WebGL also misses some of more advanced features of the desktop OpenGL version, such as 3D textures, multiple render targets, and double precision support. And float texture support is an optional extension.

The first example I made was this Mandelbrot viewer: It demonstrates how to initialise WebGL and compile shaders, render a full-canvas quad, and process keyboard and mouse events and pass them through uniforms to the fragment shader.
Click the image to try out the WebGL demo.

A few programming comments. First JavaScript: I’m not very fond of JavaScript’s type system. The loose typing means that you risk finding bugs later, at run-time, instead of when compiling. It also means that it can be hard to read third-party code (which kind of parameters are you supposed to provide to a function like ‘update(ev, ui)’?). As for numerical types, JavaScript only has the Number type: an IEEE 754 double precision type – no integers!. Some browsers also silently ignore errors during run-time, which makes it even harder to find bugs. On the positive side is the quick iteration time, and the Firebug Firefox plugin, which is an extremely powerful tool for debugging web and JavaScript code.

As for the HTML, I still find it difficult to do table-less layout using floating div’s and css. I’m missing the flexible layout managers that many desktop UI kits provide, which makes it easy to align components and control how they scale when resized (but I may be biased towards desktop UI’s). Also, as HTML was not designed with UI widgets in mind, you have to use a third-party library to display a simple slider: I chose jQuery UI, which was easy to setup and use.

Finally the WebGL: The WebGL GLSL shader code is very similar to the desktop GLSL dialect. The biggest difference is the way loops are handled. Only ‘for’ loops are available, and with a very restricted syntax. It seems the iteration count must be determinable at compilation time (probably because some implementations unroll all loops), which means you no longer can use uniforms to control the loops (you can, however, ‘break’ out of loops dynamically based on run-time variables). This means, that in order to pass the iteration count and number of samples to the Mandelbrot shader, I have to do direct text substitutions in the shader code and recompile.

But my biggest frustation was caused by the ANGLE translation layer. Even for this very simple example, I had several issues with ANGLE – see the notes below.

Feel free to use the example as a starting point for further experiments – it is quite simple to modify the 2D shader code.

Notes about ANGLE

A problem with WebGL is poor graphics driver support for OpenGL. Chrome and Firefox have chosen a very radical approach to solve this: on Windows, they convert all WebGL GLSL shader code into DirectX 9 HLSL code through a converter called ANGLE. Their rationale for doing this, is that OpenGL 2.0 drivers are not available on all computers. However, several shaders won’t run due to the ANGLE translation, and the compilation time can be extremely slow. Wrt drivers, older machines with integrated graphics might be affected, but anything with a less than five year old Nvidia, AMD, or Intel HD graphics card should work with OpenGL 2.0.

In my experiments above, I ran into a bug that in some cases make loops with more than 255 iterations fail (I’ve submitted a bug report).

When debugging ANGLE problems, a good first step is to disable ANGLE and test the shaders. In Chrome, this can be done by starting the executable with the comand line argument –use-gl=desktop. You can check your ANGLE version with the URL chrome://gpu-internals/. In Firefox use the about:config URL, and webgl.force-enabled=true and webgl.prefer-native-gl=true to disable ANGLE.

It is also possible to get the translated HLSL code using the WEBGL_debug_shaders extension. However, this extension is only available for privileged code, which means Chrome must be started with the command line parameter –enable-privileged-webgl-extensions. After that the HLSL source can be obtained by calling:

var hlsl = gl.getExtension("WEBGL_debug_shaders").getTranslatedShaderSource(fragmentShader)

I still haven’t found an workaround for this earlier Mandelbulb experiment (using GLSLSandbox), which fails with another ANGLE bug:
Click the image to try out the WebGL demo (fails on ANGLE systems).

But, I’ll try implementing it from scratch to see if I can find the bug.

Fragmentarium FAQ

December 30, 2011Fragmentarium, GPUMikael Hvidtfeldt Christensen

Here is a small collection of questions, I’ve have been asked from time to time about Fragmentarium.

Why does Fragmentarium crash?

The most common cause for this is, that a drawing operation takes too long to complete. On Windows, there is a two seconds maximum time limit on jobs executing on the GPU. After that, a GPU watchdog timer will kill the graphics driver and restore it (resulting in unresponsive, possible black, display for 5-10 seconds). The host process (here Fragmentarium) will also be shut down by Windows.

If this happens, you will get errors like this:

"The NVIDIA OpenGL driver lost connection with the display driver
and is unable to continue. 
The application must close.... Error code: 8"

"Display driver stopped responding and has recovered"

Workarounds:
Try lowering the number of ray steps, the number of iterations or use the preview feature. Notice that for high-resolution renders, the time limit is for each tile, so it is still possible to do high resolution images.

Another solution is to change the watchdog timer behaviour via the TDR Registry Keys:

The TDR registry keys were not defined in my registry, but I added a

DWORD TdrDelay = 30

and a

DWORD TdrDdiDelay = 30

key to

HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\GraphicsDrivers

which sets a 30 second window for GPU calculations. You have to restart Windows to apply these settings. Be advised that changing these settings may render your system completely unresponsive if the GPU crashes.

Why does Fragmentarium not work on my GPU?

Even though GLSL is a well-defined standard, the are differences between different vendor implementations and different drivers. The computers I have use have Nvidia cards, so Fragmentarium is most tested on Nvidia’s platform.

Typical problems:

The ATI compiler is more strict about casting. For instance, adding an integer to a float results in an error, and not in a warning:

float a = b + 3;

The ATI compiler also seems to suffer from some loop optimization problems: some of the fractals in Fragmentarium does not work on my ATI card, if the number of iterations is not locked (or hard-coded in the fragment code). Inserting a condition into the loop also solves the problem.

The Intel GPU compiler also has some issues: for instance, some operations on literal constants results in errors:

vec3 up = normalize(vec3(1.0,0.0,0.0)); // fails on Intel

I get weird errors about disabled widgets?

If you see warnings like:

Unable to find 'FloorNormal' in shader program. Disabling widget.

it does not indicate a problem. For instance, the warning above appears when the EnableFloor checkbox is locked and disabled. In this case, the GLSL will optimize the floor code away, and the FloorNormal uniform variable will no longer be part of the compiled program – hence the warning. These warnings can be safely ignored.

Why does your Fragmentarium images look much nicer than the ones I get in Fragmentarium?

The images I post are always rendered at a higher resolution using the High Resolution Render option. I then downscale the images to a lower resolution. This reduces alias and rendering artifacts. Use a painting program with a proper downscaling filter – I use Paint.NET which seems to work okay.

Before rendering in hi-res, use the Tile Preview to zoom in, and adjust the details level and number of ray steps, so the image looks okay in the Tile Preview resolution.

Why is Fragmentarium slower than BoxPlorer/Fractals.io/…?

The default ray tracer in Fragmentarium has grown somewhat complex. It is possible to gain speed by locking variables, but this is somewhat tedious. Another solution is to change to another raytracer, e.g. change

#include "DE-Raytracer.frag"

to either

#include "Fast-Raytracer.frag"

(a faster version of the ones above, which will remember all settings)

#include "Subblue-Raytracer.frag"

(Tom Beddards raytracer, which uses a set of different parameters)

Can I use double precision in my shaders?

Most modern graphics cards support double precision numbers in hardware, so in principle, yes, if your card supports it. In practice, it is much more difficult:

First, the Fragmentarium presets and sliders (including camera settings), will only transfer data (uniforms) to the GPU as single precision floats. This is not the biggest problem, since you might only need double precision for the numbers that accumulate errors. The Qt OpenGL wrappers I use doesn’t support double types, but it would be possible to work around this if needed.

Second, while newer GLSL versions do support double precision numbers (through types such as double, dvec3, dmat3), not all of the built-in functions support them. In particular, there are no trigonometric or exponential functions. So no cos(double), exp(double), etc. The available functions are described here.

Third, it might be slow: When Nvidia designed their Fermi architecture, used in their recent graphic cards, they built it, so that is should be able to process double precision operations with half the speed of single precision operations (which is optimal given the double size of the numbers). However, they decided that their consumer branch (the Nvidia cards) should be artificially limited to run double precision calculations at 1/8 the speed of single precision numbers. Their Tesla line of graphics card (which shares architecture with the Geforce branch), is not artifically throttled and will run double precision at half the speed of single precision. As for the AMD/ATI cards, I do not think they have similar limitations, but I’m not sure about this.

If you really still want to try, you must insert this command at the top of the script:

#extension GL_ARB_gpu_shader_fp64 : enable

(Or use a #version command, but this will like cause problems with most of the existing examples).

Finally, what about emulating double precision numbers, instead of using the hardware versions? While this sounds very slow, it is probably not much slower than the throttled implementation. The downside is, the GLSL does not support operator overloading: there is no syntactically nice way to implement such functionality: instead of just changing your data types, you must convert all code from e.g.

A = B*C+D;

A = dplus(dmul(B,C),D);

If you are still interested, here is a great introduction to emulating doubles in GLSL.

How do I report errors, so that it is easiest for you to correct them?

(Well, actually nobody asks this questions)

If you find an error, please report the following:
– Operating system, and the graphics card you are using
– The version of Fragmentarium, and whether you built it yourself
– A reproducible description of the steps that caused the error (if possible).

You may mail errors to me at mikael (at) hvidtfeldts.net.

Optimizing GLSL Code

July 30, 2011Fragmentarium, GPU, ProgrammingMikael Hvidtfeldt Christensen

By making selected variables constant at compile time, some 3D fractals render more than four times faster. Support for easily locking variables has been added to Fragmentarium.

Some time ago, I became aware that the raytracer in Fragmentarium was somewhat slower than both Fractal Labs and Boxplorer for similar systems – this was somewhat puzzling since the DE raycasting technique is pretty much the same. After a bit of investigation, I realized that my standard raytracer had grown slower and slower, as new features had been added (e.g. reflections, hard shadows, and floor planes) – even if the features were turned off!

One way to speed up GLSL code, is by marking some variables constant at compile-time. This way the compiler may optimize code (e.g. unroll loops) and remove unused code (e.g. if hard shadows are disabled). The drawback is that changing these constant variables requires that the GLSL code is compiled again.

It turned out that this does have a great impact on some systems. For instance for the ‘Dodecahedron.frag’, take a look at the following render times:

No constants: 1.4 fps (1.0x)
Constant rotation matrices : 3.4 fps (2.4x)
Constant rotation matrices + Anti-alias + DetailAO: 5.6 fps (4.0x)
All 38 parameters (except camera): 6.1 fps (4.4x)

The fractal rotation matrices are the matrices used inside the DE-loop. Without the constant declarations, they must be calculated from scratch for each pixel, even though they are identical for all pixels. Doing the calculation at compile-time gives a notable speedup of 2.4x (notice that another approach would be to calculate such frame constants in the vertex shader and pass them to the pixel shader as ‘varying’ variables. But according to this post this is – surprisingly – not very effective).

The next speedup – from the ‘Anti-alias’ and ‘DetailAO’ variables – is more subtle. It is difficult to see from the code why these two variables should have such impact. And in fact, it turns out that combinations of other variables will amount in the same speedup. But these speedups are not additive! Even if you make all variables constants, the framerate only increases slightly above 5.6 fps. It is not clear why this happens, but I have a guess: it seems that when the complexity is lowered between a certain treshold, the shader code execution speed increases sharply. My guess is that for complex code, the shader runs out of free registers and needs to perform calculations using a slower kind of memory storage.

Interestingly, the ‘iterations’ variable offers no speedup – even though the compiler must be able to unroll the principal DE loop, there is no measurable improvement by doing it.

Finally, the compile time is also greatly reduced when making variables constant. For the ‘Dodecahedron.frag’ code, the compile time is ~2000ms with no constants. By making most variables constant, the compile time is lowered to around ~335ms on my system.

Locking in Fragmentarium.

In Fragmentarium variables can be locked (made compile-time constant) by clicking the padlock next to them. Locked variables appear with a yellow padlock next to them. When a variable is locked, any changes to it will first be executed when the system is compiled (by pressing ‘build’). Locked variables, which have been changes, will appear with a yellow background until the system is compiled, and the changes are executed.

Notice, that whole parameter groups may be locked, by using the buttons at the bottom.

The locking interface – click to enlarge.

The ‘AntiAlias’ and ‘DetailAO’ variables are locked. The ‘DetailAO’ has been changed, but the changes are not executed yet (the yellow background). The ‘BoundingSphere’ variable has a grey background, because it has keyboard focus: its value can be finetuned using the arrow keys (up/down controls step size, left/right changes value).

In a fragment, a user variable can be marked as locked by default, by adding a ‘locked’ keyword to it:
uniform float Scale; slider[-5.00,2.0,4.00] Locked

Some variables can not be locked – e.g. the camera settings. It is possible to mark such variables by the ‘NotLockable’ keyword:
uniform vec3 Eye; slider[(-50,-50,-50),(0,0,-10),(50,50,50)] NotLockable

The same goes for presets. Here the locking mode can be stated, if it is different from the default locking mode:
#preset SomeName AntiAlias = 1 NotLocked Detail = -2.81064 Locked Offset = 1,1,1 ... #endpreset

Locking will be part of Fragmentarium v0.9, which will be released soon.

Plotting High-frequency Functions Using a GPU.

July 2, 2011Fragmentarium, GPUMikael Hvidtfeldt Christensen

A slight digression from the world of fractals and generative art: This post is about drawing high-quality graphs of high-frequency functions.

Yesterday, I needed to draw a few graphs of some simple functions. I started out by using Iñigo Quilez’s nice little GraphToy, but my functions could not be expressed in a single line. So I decided to implement a graph plotter example in Fragmentarium instead.

Plotting a graph using a GLSL shader is not an obvious task – you have to frame the problem in a way, such that each pixel can be processed individually. This is in contrast to the standard way of drawing graphs – where you choose a uniform set of values for the x-axis, and draw the lines connecting the points in the (x,f(x)) set.

So how do you do it for each pixel individually?

The first thing to realize is, that it is easy to determine whether a pixel is above or below the graph – this can be done by checking whether y<f(x) or y>f(x). The tricky part is, that we only want to draw the boundary – the curve that separates the regions above and below the graph.

So how do we determine the boundary? After having tried a few different approaches, I came up with the following simple edge detection procedure: for each pixel, choose a number of samples, in a region around the pixel center. Then count how many samples are above, and how many samples are below the curve.

If all samples are above, or all samples are below, the pixel is not on the boundary. However, if there are samples both above and below, the boundary must be passing through the pixel.

The whole idea can be expressed in a few lines of code:

for (float i = 0.0; i < samples; i++) {
	for (float  j = 0.0;j < samples; j++) {
		float f = function(pos.x+ i*step.x)-(pos.y+ j*step.y);
		count += (f>0.) ? 1 : -1;
	}
}
// base color on abs(count)/(samples*samples)

It should be noted, that the sampling can be improved by adding a small amount of jittering (random offsets) to the positions – this reduces the aliasing at the cost of adding a small amount of noise.

Highfrequency functions and aliasing

So why it this better than the common ‘connecting line’ approach?

Because this approach deals with the high-frequency information much better.

Consider the function f(x)=sin(x*x*x)*sin(x).
Here is a plot from GraphToy:

Notice how the graph near the red arrows seem to be slowly varying. This is not the true behavior of function, but an artifact of the way we sample our function. Our limited resolution cannot capture the high frequency components, which results in aliasing.

Whenever you do anything media-related on a computer, you will at some point run into problems with aliasing: whether you are doing sound synthesis, image manipulation, 3D rendering or even drawing a straight line.

However, using the pixel shader approach, aliasing is much easier to avoid. Here is a Fragmentarium plot of the same function:

Even though it may seem backwards to evaluate the function for all pixels on the screen, it makes it possible to tame the aliasing, and even on a modest GPU, the procedure is fast enough for realtime interactions.

The example is included in GitHub under Examples/2D Systems/GraphPlotter.frag.

Distance Estimated 3D Fractals (Part I)

June 3, 2011Fractals, Fragmentarium, GPUMikael Hvidtfeldt Christensen

During the last two years, the 3D fractal field has undergone a small revolution: the Mandelbulb (2009), the Mandelbox (2010), The Kaleidoscopic IFS’s (2010), and a myriad of equally or even more interesting hybrid systems, such as Spudsville (2010) or the Kleinian systems (2011).

All of these systems were made possible using a technique known as Distance Estimation and they all originate from the Fractal Forums community.

Overview of the posts

Part I briefly introduces the history of distance estimated fractals, and discuss how a distance estimator can be used for ray marching.

Part II discuss how to find surface normals, and how to light and color fractals.

Part III discuss how to actually create a distance estimator, starting with distance fields for simple geometric objects, and talking about instancing, combining fields (union, intersections, and differences), and finally talks about folding and conformal transformation, ending up with a simple fractal distance estimator.

Part IV discuss the holy grail: the search for generalization of the 2D (complex) Mandelbrot set, including Quaternions and other hypercomplex numbers. A running derivative for quadratic systems is introduced.

Part V continues the discussion about the Mandelbulb. Different approaches for constructing a running derivative is discussed: scalar derivatives, Jacobian derivatives, analytical solutions, and the use of different potentials to estimate the distance.

Part VI is about the Mandelbox fractal. A more detailed discussion about conformal transformations, and how a scalar running derivative is sufficient, when working with these kind of systems.

Part VII discuss how dual numbers and automatic differentation may used to construct a distance estimator.

Part VIII is about hybrid fractals, geometric orbit traps, various other systems, and links to relevant software and resources.

The background

The first paper to introduce Distance Estimated 3D fractals was written by Hart and others in 1989:
Ray tracing deterministic 3-D fractals

In this paper Hart describe how Distance Estimation may be used to render a Quaternion Julia 3D fractal. The paper is very well written and definitely worth spending some hours on (be sure to take a look at John Hart’s other papers as well). Given the age of Hart’s paper, it is striking that is not until the last couple of years that the field of distance estimated 3D fractals has exploded. There has been some important milestones, such as Keenan Crane’s GPU implementation (2004), and Iñigo Quilez 4K demoscene implementation (2007), but I’m not aware of other fractal systems being explored using Distance Estimation, before the advent of the Mandelbulb.

Raymarching

Classic raytracing shoots one (or more) rays per pixel and calculate where the rays intersect the geometry in the scene. Normally the geometry is described by a set of primitives, like triangles or spheres, and some kind of spatial acceleration structure is used to quickly identify which primitives intersect the rays.

Distance Estimation, on the other hand, is a ray marching technique.

Instead of calculating the exact intersection between the camera ray and the geometry, you proceed in small steps along the ray and check how close you are to the object you are rendering. When you are closer than a certain threshold, you stop. In order to do this, you must have a function that tells you how close you are to the object: a Distance Estimator. The value of the distance estimator tells you how large a step you are allowed to march along the ray, since you are guaranteed not to hit anything within this radius.

Schematics of ray marching using a distance estimator.

The code below shows how to raymarch a system with a distance estimator:

float trace(vec3 from, vec3 direction) {
	float totalDistance = 0.0;
	int steps;
	for (steps=0; steps < MaximumRaySteps; steps++) {
		vec3 p = from + totalDistance * direction;
		float distance = DistanceEstimator(p);
		totalDistance += distance;
		if (distance < MinimumDistance) break;
	}
	return 1.0-float(steps)/float(MaxRaySteps);
}

Here we simply march the ray according to the distance estimator and return a greyscale value based on the number of steps before hitting something. This will produce images like this one (where I used a distance estimator for a Mandelbulb):

It is interesting that even though we have not specified any coloring or lighting models, coloring by the number of steps emphasizes the detail of the 3D structure - in fact, this is an simple and very cheap form of the Ambient Occlusion soft lighting often used in 3D renders.

Parallelization

Another interesting observation is that these raymarchers are trivial to parallelise, since each pixel can be calculated independently and there is no need to access complex shared memory structures like the acceleration structure used in classic raytracing. This means that these kinds of systems are ideal candidates for implementing on a GPU. In fact the only issue is that most GPU's still only supports single precision floating points numbers, which leads to numerical inaccuracies faster than for the CPU implementations. However, the newest generation of GPU's support double precision, and some API's (such as OpenCL and Pixel Bender) are heterogeneous, meaning the same code can be executed on both CPU and GPU - making it possible to create interactive previews on the GPU and render final images in double precision on the CPU.

Estimating the distance

Now, I still haven't talked about how we obtain these Distance Estimators, and it is by no means obvious that such functions should exist at all. But it is possible to intuitively understand them, by noting that systems such as the Mandelbulb and Mandelbox are escape-time fractals: we iterate a function for each point in space, and follow the orbit to see whether the sequence of points diverge for a maximum number of iterations, or whether the sequence stays inside a fixed escape radius. Now, by comparing the escape-time length (r), to its spatial derivative (dr), we might get an estimate of how far we can move along the ray before the escape-time length is below the escape radius, that is:

\(DE = \frac{r-EscapeRadius }{dr}\)

This is a hand-waving estimate - the derivative might fluctuate wildly and get larger than our initial value, so a more rigid approach is needed to find a proper distance estimator. I'll a lot more to say about distance estimators inthe later posts, so for now we will just accept that these function exists and can be obtained for quite a diverse class of systems, and that they are often constructed by comparing the escape-time length with some approximation of its derivative.

It should also be noticed that this ray marching approach can be used for any kinds of systems, where you can find a lower bound for the closest geometry for all points in space. Iñigo Quilez has used this in his impressive procedural SliseSix demo, and has written an excellent introduction, which covers many topics also relevant for Distance Estimation of 3D fractals.

This concludes the first part of this series of blog entries. Part II discusses lighting and coloring of fractals.