Distance Estimated 3D Fractals (III): Folding Space

The previous posts (part I, part II) introduced the basics of rendering DE (Distance Estimated) systems, but left out one important question: how do we create the distance estimator function?

Drawing spheres

Remember that a distance estimator is nothing more than a function, that for all points in space returns a length smaller than (or equal to) the distance to the closest object. This means we are safe to march at least this step length without hitting anything – and we use this information to speed up the ray marching.

It is fairly easy to come up with distance estimators for most simple geometric shapes. For instance, let us start by a sphere. Here are three different ways to calculate the distance from a point in space, p, to a sphere with radius R:

(1) DE(p) = max(0.0, length(p)-R)  // solid sphere, zero interior
(2) DE(p) = length(p)-R // solid sphere, negative interior
(3) DE(p) = abs(length(p)-R)    // hollow sphere shell

From the outside all of these look similar. But (3) is hollow – we would be able to position the camera inside it, and it would look different if intersected with other objects.

What about the first two? There is actually a subtle difference: the common way to find the surface normal, is to sample the DE function close to the camera ray/surface intersection. But if the intersection point is located very close to the surface (for instance exactly on it), we might sample the DE inside the sphere. And this will lead to artifacts in the normal vector calculation for (1) and (3). So, if possible use signed distance functions. Another way to avoid this, is to backstep along the camera ray a bit before calculating the surface normal (or to add a ray step multiplier less than 1.0).


From left to right: Sphere (1), with normal artifacts because the normal was not backstepped. Sphere (2) with perfect normals. Sphere (3) drawn with normal backstepping, and thus perfect normals. The last row shows how the spheres look when cut open.

Notice that distance estimation only tells the distance from a point to an object. This is in contrast to classic ray tracing, which always is about finding the distance from a point to a given object along a line. The formulas for ray-object intersection in classic ray tracing are thus more complex, for instance the ray-sphere intersection involves solving a quadratic equation. The drawback of distance estimators is that multiple ray steps are needed, even for simple objects like spheres.

Combining objects

Distance fields have some nice properties. For instance, it is possible to combine two distance fields using a simple minimum(a,b) operator. As an example we could draw the union of two spheres the following way:

DE(p) = min( length(p)-1.0 , length(p-vec3(2.0,0.0,0.0))-1.0 );

This would give us two spheres with unit radius, one centered at origo, and another at (2,0,0). The same way it is possible to calculate the intersection of two objects, by taking the maximum value of the fields. Finally, if you are using signed distance functions, it is possible to subtract one shape from another by inverting one of the fields, and calculating the intersection (i.e. taking max(A, -B)).

So now we have a way to combine objects. And it is also possible to apply local transformations, to get interesting effects:


This image was created by combining the DE’s of a ground plane and two tori while applying a twisting deformation to the tori.

Rendering of (non-fractal) distance fields is described in depth in this paper by Hart: Sphere Tracing: A Geometric Method for the Antialiased Ray Tracing of Implicit Surfaces. This paper also describes distance estimators for various geometric objects, such as tori and cones, and discuss deformations in detail. Distance field techniques have also been adopted by the demoscene, and Iñigo Quilez’s introduction contains a lot of information. (Update: Quilez has created a visual reference page for distance field primitives and transformations)

Building Complexity

This is all nice, but even if you can create interesting structures, there are some limitations. The above method works fine, but scales very badly when the number of distance fields to be combined increases. Creating a scene with 1000 spheres by finding the minimum of the 1000 fields would already become too slow for real-time purposes. In fact ordinary ray tracing scales much better – the use of spatial acceleration structures makes it possible for ordinary ray tracers to draw scenes with millions of objects, something that is far from possible using the “find minimum of all object fields” distance field approach sketched above.

But fractals are all about detail, and endless complexity, so how do we proceed?

It turns out that there are some tricks, that makes it possible to add complexity in ways that scales much better.

First, it is possible to reuse (or instance) objects using e.g. the modulo-operator. Take a look at the following DE:

float DE(vec3 z)
{
  z.xy = mod((z.xy),1.0)-vec3(0.5); // instance on xy-plane
  return length(z)-0.3;             // sphere DE
}

Which generates this image:

Now we are getting somewhere. Tons of detail, at almost no computational cost. Now we only need to make it more interesting!

A Real Fractal

Let us continue with the first example of a real fractal: the recursive tetrahedron.

A tetrahedron may be described as a polyhedron with vertices (1,1,1),(-1,-1,1),(1,-1,-1),(-1,1,-1). Now, for each point in space, lets us take the vertex closest to it, and scale the system by a factor of 2.0 using this vertex as center, and then finally return the distance to the point where we end, after having repeated this operation. Here is the code:

float DE(vec3 z)
{
	vec3 a1 = vec3(1,1,1);
	vec3 a2 = vec3(-1,-1,1);
	vec3 a3 = vec3(1,-1,-1);
	vec3 a4 = vec3(-1,1,-1);
	vec3 c;
	int n = 0;
	float dist, d;
	while (n < Iterations) {
		 c = a1; dist = length(z-a1);
	        d = length(z-a2); if (d < dist) { c = a2; dist=d; }
		 d = length(z-a3); if (d < dist) { c = a3; dist=d; }
		 d = length(z-a4); if (d < dist) { c = a4; dist=d; }
		z = Scale*z-c*(Scale-1.0);
		n++;
	}

	return length(z) * pow(Scale, float(-n));
}

Which results in the following image:

Our first fractal! Even though we do not have the infinite number of objects, like the mod-example above, the number of objects grow exponentially as we crank up the number of iterations. In fact, the number of objects is equal to 4^Iterations. Just ten iterations will result in more than a million objects - something that is easily doable on a GPU in realtime! Now we are getting ahead of the standard ray tracers.

Folding Space

But it turns out that we can do even better, using a clever trick by utilizing the symmetries of the tetrahedron.

Now, instead of scaling about the nearest vertex, we could use the mirror points in the symmetry planes of the tetrahedron, to make sure that we arrive at the same "octant" of the tetrahedron - and then always scale from the vertex it contains.

The following illustration tries to visualize this:

The red point at the top vertex is the scaling center at (1,1,1). Three symmetry planes of the tetrahedron have been drawn in red, green, and blue. By mirroring points if they are on the wrong side (the non-white points) of plane, we will ensure they get mapped to the white "octant". The operation of mirroring a point, if it is on one side of a plane, is called a 'folding operation' or just a fold.

Here is the code:

float DE(vec3 z)
{
    float r;
    int n = 0;
    while (n < Iterations) {
       if(z.x+z.y<0) z.xy = -z.yx; // fold 1
       if(z.x+z.z<0) z.xz = -z.zx; // fold 2
       if(z.y+z.z<0) z.zy = -z.yz; // fold 3
       z = z*Scale - Offset*(Scale-1.0);
       r = dot(z, z);
       n++;
    }
    return (length(z) ) * pow(Scale, -float(n));
}

These folding operations shows up in several fractals. A fold in a general plane with normal n can be expressed as:

float t = dot(z,n1); if (t<0.0) { z-=2.0*t*n1; }

or in a optimized version (due to AndyAlias):

z-=2.0 * min(0.0, dot(z, n1)) * n1;

Also notice that folds in the xy, xz, or yz planes may be expressed using the 'abs' operator.

That was a lot about folding operations, but the really interesting stuff happens when we throw rotations into the system. This was first introduced by Knighty in the Fractal Forum's thread Kaleidoscopic (escape time) IFS. The thread shows recursive versions of all the Platonic Solids and the Menger Sponge - including the spectacular forms that arise when inserting rotations and translations into the system.

The Kaleidoscopic IFS fractals are in my opinion some of the most interesting 3D fractals ever discovered (or created if you are not a mathematical platonist). Here are some examples of forms that may arise from a system with icosahedral symmetry:

Here the icosahedral origin might be evident, but it possible to tweak these structures beyond any recognition of their origin. Here are a few more examples:




Knighty's fractals are composed using a small set of transformations: scalings, translations, plane reflections (the conditional folds), and rotations. The folds are of course not limited to the symmetry planes of the Platonic Solids, all planes are possible.

The transformations mentioned above all belong to the group of conformal (angle preserving) transformations. It is sometimes said (on Fractal Forums) that for 'true' fractals the transformations must be conformal, since non-conformal transformations tend to stretch out detail and create a 'whipped cream' look, which does not allow for deep zooms. Interestingly, according to Liouville's theorem there are not very many possible conformal transformations in 3D. In fact, if I read the theorem correctly, the only possible conformal 3D transformations are the ones above and the sphere inversions.

Part IV discusses how to arrive at Distance Estimators for the fractals such as the Mandelbulb, which originates in attempts to generalize the Mandelbrot formula to three dimensions: the so-called search for the holy grail in 3D fractals.

Distance Estimated 3D Fractals (II): Lighting and Coloring

The first post discussed how to find the intersection between a camera ray and a fractal, but did not talk about how to color the object. There are two steps involved here: setting up a coloring scheme for the fractal object itself, and the shading (lighting) of the object.

Lights and shading

Since we are raymarching our objects, we can use the standard lighting techniques from ray tracing. The most common form of lightning is to use something like Blinn-Phong, and calculate approximated ambient, diffuse, and specular light based on the position of the light source and the normal of the fractal object.

Surface Normal

So how do we obtain a normal of a fractal surface?

A common method is to probe the Distance Estimator function in small steps along the coordinate system axis and use the numerical gradient obtained from this as the normal (since the normal must point in the direction where the distance field increase most rapidly). This is an example of the finite difference method for numerical differentiation. The following snippet shows how the normal may be calculated:

vec3 n = normalize(vec3(DE(pos+xDir)-DE(pos-xDir),
DE(pos+yDir)-DE(pos-yDir),
DE(pos+zDir)-DE(pos-zDir)));

The original Hart paper also suggested that alternatively, the screen space depth buffer could be used to determine the normal – but this seems to be both more difficult and less accurate.

Finally, as fpsunflower noted in this thread it is possible to use Automatic Differentiation with dual numbers, to obtain a gradient without having to introduce an arbitrary epsilon sampling distance.

Ambient Occlusion

Besides the ambient, diffuse, and specular light from Phong-shading, one thing that really improves the quality and depth illusion of a 3D model is ambient occlusion. In my first post, I gave an example of how the number of ray steps could be used as a very rough measure of how occluded the geometry is (I first saw this at Subblue’s site – his Quaternion Julia page has some nice illustrations of this effect). This ‘ray step AO‘ approach has its shortcomings though: for instance, if the camera ray is nearly parallel to a surface (a grazing incidence) a lot of steps will be used, and the surface will be darkened, even if it is not occluded at all.

Another approach is to sample the Distance Estimator at points along the normal of the surface and use this information to put together a measure for the Ambient Occlusion. This is a more intuitive method, but comes with some other shortcomings – i.e. new parameters are needed to control the distance between the samplings and their relative weights with no obvious default settings. A description of this ‘normal sampling AO‘ approach can be found in Iñigo Quilez’s introduction to distance field rendering.

In Fragmentarium, I’ve implemented both methods: The ‘DetailAO’ parameter controls the distance at which the normal is sampled for the ‘normal sampling AO’ method. If ‘DetailAO’ is set to zero, the ‘ray step AO’ method is used.

Other lighting effects

Besides Phong shading and ambient occlusion, all the usual tips and tricks in ray tracing may be applied:

Glow – can be added simply by mixing in a color based on the number of ray steps taken (points close to the fractal will use more ray steps, even if they miss the fractal, so pixels close to the object will glow).

Fog - is also great for adding to the depth perception. Simply blend in the background color based on the distance from the camera.

Hard shadows are also straight forward – check if the ray from the surface point to the light source is occluded.

Soft shadows: Iñigo Quilez has a good description of doing softened shadows.

Reflections are pretty much the same – reflect the camera ray in the surface normal, and mix in the color of whatever the reflected ray hits.

The effects above are all implemented in Fragmentarium as well. Numerous other extensions could be added to the raytracer: for example, environment mapping using HDRI panoramic maps provides very natural lighting and is easy to apply for the user, simulated depth-of-field also adds great depth illusion to an image, and can be calculated in reasonable time and quality using screen space buffers, and more complex materials could also be added.

Coloring

Fractal objects with a uniform base color and simple colored light sources can produce great images. But algorithmic coloring is a powerful tool for bringing the fractals to life.

Algorithmic color use one or more quantities, determined by looking at the orbit or the escape point or time.

Orbit traps is a popular way to color fractals. This method keeps track of how close the orbit comes to a chosen geometric object. Typical traps include keeping track of the minimum distance to the coordinate system center, or to simple geometric shapes like planes, lines, or spheres. In Fragmentarium, many of the systems use a 4-component vector to keep track of the minimum distance to the three x=0, y=0, and z=0 planes and to the distance from origo. These are mapped to color through the X,Y,Z, and R parameters in the ‘Coloring’ tab.

The iteration count is the number of iterations it takes before the orbit diverges (becomes larger than the escape radius). Since this is an integer number it is prone to banding, which is discussed later in this post. One way to avoid this is by using a smooth fractional iteration count:

float smooth = float(iteration)
+ log(log(EscapeRadiusSquared))/log(Scale)
- log(log(dot(z,z)))/log(Scale);

(For a derivation of this quantity, see for instance here)

Here ‘iteration’ is the number of iterations, and dot(z,z) is the square of the escape time length. There are a couple of things to notice. First, the formula involves a characteristic scale, referring to the scaling factor in the problem (e.g. 2 for a standard Mandelbrot, 3 for a Menger). It is not always possible to obtain such a number (e.g. for Mandelboxes or hybrid systems). Secondly, if the smooth iteration count is used to lookup a color in a palette, offset may be ignored, which means the second term can be dropped. Finally, which ‘log’ functions should be used? This does not matter if only they are used consistently: since all different log functions are proportional, the ratio of two logs does not depend on the base used. For the inner logs (e.g. log(dot(,z))), changing the log will result in a constant offset to the overall term, so again this will just result in an offset in the palette lookup.


The lower half of this image use a smooth iteration count.

Conditional Path Coloring

(I made this name up – I’m not sure there is an official name, but I’ve seen the technique used several times in Fractal Forums posts.)

Some fractals may have conditional branches inside their iteration loop (sometimes disguised as an ‘abs’ operator). The Mandelbox is a good example: the sphere fold performs different actions depending on whether the length of the iterated point is smaller or larger than a set threshold. This makes it possible to keep track of a color variable, which is updated depending on the path taken.

Many other types of coloring are also possible, for example based on the normal of the surface, spherical angles of the escape time points, and so on. Many of the 2D fractal coloring types can also be applied to 3D fractals. UltraFractal has a nice list of 2D coloring types.

Improving Quality

Some visual effects and colorings, are based on integer quantities – for example glow is based on on the number of ray steps. This will result in visible boundaries between the discrete steps, an artifact called banding.

The smooth iteration count introduced above is one way to get rid of banding, but it is not generally applicable. A more generic approach is to add some kind of noise into the system. For instance, by scaling the length of the first ray step for each pixel by a random number, the banding will disappear – at the cost of introducing some noise.

Personally, I much prefer noise to banding – in fact I like the noisy, grainy look, but that is a matter of preference.

Another important issue is aliasing: if only one ray is traced per pixel, the image is prone to aliasing and artifacts. Using more than one sample will remove aliasing and reduce noise. There are many ways to oversample the image – different strategies exist for choosing the samples in a way that optimizes the image quality and there are different ways of weighting (filtering) the samples for each pixel. Physical Based Rendering has a very good chapter on sampling and filtering for ray tracing, and this particular chapter is freely available here:

In Fragmentarium there is some simple oversampling built-in – by setting the ‘AntiAlias’ variable, a number of samples are chosen (on a uniform grid). They are given the same weight (box filtered). I usually only use this for 2D fractals – because they render faster, which allows for a high number of samples. For 3D renders, I normally render a high resolution image, and downscale it in a image editing program – this seems to create better quality images for the same number of samples.

Part III discusses how to derive and work with Distance Estimator functions.

Optimizing GLSL Code

By making selected variables constant at compile time, some 3D fractals render more than four times faster. Support for easily locking variables has been added to Fragmentarium.

Some time ago, I became aware that the raytracer in Fragmentarium was somewhat slower than both Fractal Labs and Boxplorer for similar systems – this was somewhat puzzling since the DE raycasting technique is pretty much the same. After a bit of investigation, I realized that my standard raytracer had grown slower and slower, as new features had been added (e.g. reflections, hard shadows, and floor planes) – even if the features were turned off!

One way to speed up GLSL code, is by marking some variables constant at compile-time. This way the compiler may optimize code (e.g. unroll loops) and remove unused code (e.g. if hard shadows are disabled). The drawback is that changing these constant variables requires that the GLSL code is compiled again.

It turned out that this does have a great impact on some systems. For instance for the ‘Dodecahedron.frag’, take a look at the following render times:

No constants: 1.4 fps (1.0x)
Constant rotation matrices : 3.4 fps (2.4x)
Constant rotation matrices + Anti-alias + DetailAO: 5.6 fps (4.0x)
All 38 parameters (except camera): 6.1 fps (4.4x)

The fractal rotation matrices are the matrices used inside the DE-loop. Without the constant declarations, they must be calculated from scratch for each pixel, even though they are identical for all pixels. Doing the calculation at compile-time gives a notable speedup of 2.4x (notice that another approach would be to calculate such frame constants in the vertex shader and pass them to the pixel shader as ‘varying’ variables. But according to this post this is – surprisingly – not very effective).

The next speedup – from the ‘Anti-alias’ and ‘DetailAO’ variables – is more subtle. It is difficult to see from the code why these two variables should have such impact. And in fact, it turns out that combinations of other variables will amount in the same speedup. But these speedups are not additive! Even if you make all variables constants, the framerate only increases slightly above 5.6 fps. It is not clear why this happens, but I have a guess: it seems that when the complexity is lowered between a certain treshold, the shader code execution speed increases sharply. My guess is that for complex code, the shader runs out of free registers and needs to perform calculations using a slower kind of memory storage.

Interestingly, the ‘iterations’ variable offers no speedup – even though the compiler must be able to unroll the principal DE loop, there is no measurable improvement by doing it.

Finally, the compile time is also greatly reduced when making variables constant. For the ‘Dodecahedron.frag’ code, the compile time is ~2000ms with no constants. By making most variables constant, the compile time is lowered to around ~335ms on my system.

Locking in Fragmentarium.

In Fragmentarium variables can be locked (made compile-time constant) by clicking the padlock next to them. Locked variables appear with a yellow padlock next to them. When a variable is locked, any changes to it will first be executed when the system is compiled (by pressing ‘build’). Locked variables, which have been changes, will appear with a yellow background until the system is compiled, and the changes are executed.

Notice, that whole parameter groups may be locked, by using the buttons at the bottom.

Screenshot

The locking interface - click to enlarge.


The ‘AntiAlias’ and ‘DetailAO’ variables are locked. The ‘DetailAO’ has been changed, but the changes are not executed yet (the yellow background). The ‘BoundingSphere’ variable has a grey background, because it has keyboard focus: its value can be finetuned using the arrow keys (up/down controls step size, left/right changes value).

In a fragment, a user variable can be marked as locked by default, by adding a ‘locked’ keyword to it:

uniform float Scale; slider[-5.00,2.0,4.00] Locked

Some variables can not be locked – e.g. the camera settings. It is possible to mark such variables by the ‘NotLockable’ keyword:

uniform vec3 Eye; slider[(-50,-50,-50),(0,0,-10),(50,50,50)] NotLockable

The same goes for presets. Here the locking mode can be stated, if it is different from the default locking mode:

#preset SomeName
AntiAlias = 1 NotLocked
Detail = -2.81064 Locked
Offset = 1,1,1
...
#endpreset

Locking will be part of Fragmentarium v0.9, which will be released soon.

Syntopia Blog Update

It has not been possible to post comments at my blog for some months. Apparently, my reCAPTCHA plugin was broken (amazingly, spam comments still made their way into the moderation queue).

This should be fixed now.

I’m also on twitter now: @SyntopiaDK, where I’ll post links and news releated to generative systems, 3D fractals, or whatever pops up.

Finally, if you are near Stockholm, some of my images are on display at a small gallery (from July 9th to September 11th): Kungstensgatan 27.

Plotting High-frequency Functions Using a GPU.

A slight digression from the world of fractals and generative art: This post is about drawing high-quality graphs of high-frequency functions.

Yesterday, I needed to draw a few graphs of some simple functions. I started out by using Iñigo Quilez’s nice little GraphToy, but my functions could not be expressed in a single line. So I decided to implement a graph plotter example in Fragmentarium instead.

Plotting a graph using a GLSL shader is not an obvious task – you have to frame the problem in a way, such that each pixel can be processed individually. This is in contrast to the standard way of drawing graphs – where you choose a uniform set of values for the x-axis, and draw the lines connecting the points in the (x,f(x)) set.

So how do you do it for each pixel individually?

The first thing to realize is, that it is easy to determine whether a pixel is above or below the graph – this can be done by checking whether y<f(x) or y>f(x). The tricky part is, that we only want to draw the boundary – the curve that separates the regions above and below the graph.

So how do we determine the boundary? After having tried a few different approaches, I came up with the following simple edge detection procedure: for each pixel, choose a number of samples, in a region around the pixel center. Then count how many samples are above, and how many samples are below the curve.

If all samples are above, or all samples are below, the pixel is not on the boundary. However, if there are samples both above and below, the boundary must be passing through the pixel.

The whole idea can be expressed in a few lines of code:

for (float i = 0.0; i < samples; i++) {
	for (float  j = 0.0;j < samples; j++) {
		float f = function(pos.x+ i*step.x)-(pos.y+ j*step.y);
		count += (f>0.) ? 1 : -1;
	}
}
// base color on abs(count)/(samples*samples)

It should be noted, that the sampling can be improved by adding a small amount of jittering (random offsets) to the positions – this reduces the aliasing at the cost of adding a small amount of noise.

Highfrequency functions and aliasing

So why it this better than the common ‘connecting line’ approach?

Because this approach deals with the high-frequency information much better.

Consider the function f(x)=sin(x*x*x)*sin(x).
Here is a plot from GraphToy:

Notice how the graph near the red arrows seem to be slowly varying. This is not the true behavior of function, but an artifact of the way we sample our function. Our limited resolution cannot capture the high frequency components, which results in aliasing.

Whenever you do anything media-related on a computer, you will at some point run into problems with aliasing: whether you are doing sound synthesis, image manipulation, 3D rendering or even drawing a straight line.

However, using the pixel shader approach, aliasing is much easier to avoid. Here is a Fragmentarium plot of the same function:

Even though it may seem backwards to evaluate the function for all pixels on the screen, it makes it possible to tame the aliasing, and even on a modest GPU, the procedure is fast enough for realtime interactions.

The example is included in GitHub under Examples/2D Systems/GraphPlotter.frag.

Distance Estimated 3D Fractals (Part I)

During the last two years, the 3D fractal field has undergone a small revolution: the Mandelbulb (2009), the Mandelbox (2010), The Kaleidoscopic IFS’s (2010), and a myriad of equally or even more interesting hybrid systems, such as Spudsville (2010) or the Kleinian systems (2011).

All of these systems were made possible using a technique known as Distance Estimation and they all originate from the Fractal Forums community.

The background

The first paper to introduce Distance Estimated 3D fractals was written by Hart and others in 1989:
Ray tracing deterministic 3-D fractals

In this paper Hart describe how Distance Estimation may be used to render a Quaternion Julia 3D fractal. The paper is very well written and definitely worth spending some hours on (be sure to take a look at John Hart’s other papers as well). Given the age of Hart’s paper, it is striking that is not until the last couple of years that the field of distance estimated 3D fractals has exploded. There has been some important milestones, such as Keenan Crane’s GPU implementation (2004), and Iñigo Quilez 4K demoscene implementation (2007), but I’m not aware of other fractal systems being explored using Distance Estimation, before the advent of the Mandelbulb.

Raymarching

Classic raytracing shoots one (or more) rays per pixel and calculate where the rays intersect the geometry in the scene. Normally the geometry is described by a set of primitives, like triangles or spheres, and some kind of spatial acceleration structure is used to quickly identify which primitives intersect the rays.

Distance Estimation, on the other hand, is a ray marching technique.

Instead of calculating the exact intersection between the camera ray and the geometry, you proceed in small steps along the ray and check how close you are to the object you are rendering. When you are closer than a certain threshold, you stop. In order to do this, you must have a function that tells you how close you are to the object: a Distance Estimator. The value of the distance estimator tells you how large a step you are allowed to march along the ray, since you are guaranteed not to hit anything within this radius.

Schematics of ray marching using a distance estimator.

The code below shows how to raymarch a system with a distance estimator:

float trace(vec3 from, vec3 direction) {
	float totalDistance = 0.0;
	int steps;
	for (steps=0; steps < MaximumRaySteps; steps++) {
		vec3 p = from + totalDistance * direction;
		float distance = DistanceEstimator(p);
		totalDistance += distance;
		if (distance < MinimumDistance) break;
	}
	return 1.0-float(steps)/float(MaxRaySteps);
}

Here we simply march the ray according to the distance estimator and return a greyscale value based on the number of steps before hitting something. This will produce images like this one (where I used a distance estimator for a Mandelbulb):

It is interesting that even though we have not specified any coloring or lighting models, coloring by the number of steps emphasizes the detail of the 3D structure - in fact, this is an simple and very cheap form of the Ambient Occlusion soft lighting often used in 3D renders.

Parallelization

Another interesting observation is that these raymarchers are trivial to parallelise, since each pixel can be calculated independently and there is no need to access complex shared memory structures like the acceleration structure used in classic raytracing. This means that these kinds of systems are ideal candidates for implementing on a GPU. In fact the only issue is that most GPU's still only supports single precision floating points numbers, which leads to numerical inaccuracies faster than for the CPU implementations. However, the newest generation of GPU's support double precision, and some API's (such as OpenCL and Pixel Bender) are heterogeneous, meaning the same code can be executed on both CPU and GPU - making it possible to create interactive previews on the GPU and render final images in double precision on the CPU.

Estimating the distance

Now, I still haven't talked about how we obtain these Distance Estimators, and it is by no means obvious that such functions should exist at all. But it is possible to intuitively understand them, by noting that systems such as the Mandelbulb and Mandelbox are escape-time fractals: we iterate a function for each point in space, and follow the orbit to see whether the sequence of points diverge for a maximum number of iterations, or whether the sequence stays inside a fixed escape radius. Now, by comparing the escape-time length (r), to its spatial derivative (dr), we might get an estimate of how far we can move along the ray before the escape-time length is below the escape radius, that is:

distance=(r-EscapeRadius)/dr.

This is a hand-waving estimate - the derivative might fluctuate wildly and get larger than our initial value, so a more rigid approach is needed to find a proper distance estimator. I'll a lot more to say about distance estimators in part III, so for now we will just accept that these function exists and can be obtained for quite a diverse class of systems, and that they are often constructed by comparing the escape-time length with some approximation of its derivative.

It should also be noticed that this ray marching approach can be used for any kinds of systems, where you can find a lower bound for the closest geometry for all points in space. Iñigo Quilez has used this in his impressive procedural SliseSix demo, and has written an excellent introduction, which covers many topics also relevant for Distance Estimation of 3D fractals.

This concludes the first part of this series of blog entries. Part II discusses lighting and coloring of fractals and Part III discuss the DE-function and how to arrive at it.

Hybrid 3D Fractals

A lot of great images have been made of the Mandelbulb, the Mandelbox, and the various kaleidoscopic IFS’s (the non-platonic non-solids). And it turns out that by combining these formulas (and stirring a few assorted functions into the mix), a variety of new, amazing, and surprising forms emerge.

I’m currently working on making it easier to combine different formulas in Fragmentarium – but until I get something released, here is a collection of images and movies created by Mandelbulb 3D (Windows, free) and Mandelbulber (Windows, free, open-source), that illustrates the beauty and diversity of these hybrid systems. Be sure to view the large versions by following the links. The images were all found at Fractal Forums.

Videos


Buddhi – Mandelbox and Flying Lights


Jérémie Brunet (Bib) – Weird Planet II


Jérémie Brunet (Bib) – Like in a dream II

Images


Lenord – Pray your Gods


Tomot – It’s a jungle out there


Lenord – J.A.R.


MarkJayBee – Security Mechanisms


Fractal00 – Alien Stones


Kr0mat1k – Restructuration


BrutalToad – Jülchen

Fragmentarium v0.8

I’ve released a new build of Fragmentarium with some much needed updates, including better camera control, high resolution renders, and animation.

New features in version 0.8:

  • The 3D camera has been rewritten: it is now a “first-person”, pinhole camera (like Boxplorer and Fractal Lab), and is controllable using mouse and keyboard. Camera view can now be saved together with other settings.
  • Arbitrary resolution renderings (using tile based rendering – the GPU won’t time out).
  • Preview modes (renders to FBO with lower resolution and rescales).
  • ‘Tile preview’ for previewing part of high-resolution renders.
  • Animation controller (experimental: no keyframes yet, you must animate using the system supplied ‘time’ variable. Animation is output as a sequence of still images).
  • Presets (group parameters settings and load them into a dropbox)
  • New fractals: QuaternionMandelbrot4D, Ducks, NewMenger.
  • Improved raytracer: dithering, fog, new coloring schemes.

Download it here: http://syntopia.github.com/Fragmentarium/get.html


High-resolution render of a 4D Quaternion Mandelbrot (click for large)


Samuel Monnier’s ‘Ducks’ Fractal has been added.


Mandelbrot/Julia type system now with embedded Mandelbrot map.

Fragmentarium test animation – click here for higher resolution.

GPU versus CPU for pixel graphics

After having gained a bit of experience with GPU shader programming during my Fragmentarium development, a natural question to ask is: how fast are these GPU’s?

This is not an easy question to answer, and it depends on the specific application. But I will try to give an answer for the kind of systems that I’m interested in: pixel graphics systems, where each pixel can be calculated independently of the others, such as raytraced 3D fractals.

Lets take my desktop computer, a fairly standard desktop machine, as an example. It is equipped with Nvidia Geforce 9800GT GPU @ 1.5 GHz, and a Intel Core 2 Quad Q8200 @ 2.33GHz.

How many processing unit are there?

Number of processing units (CPU): 4 CPU cores
Number of processing units (GPU): 112 Shader units

Based on these numbers, we might expect the GPU to be a factor of 28x times faster than the CPU. Of course, this totally ignores the efficiency and operating speed of the processing units. Lets try looking at the processing power in terms of maximum number of floating-point operations per second instead:

Theoretical Peak Performance

Both Intel and Nvidia list the GFLOPS (billion floating point operations per second) rating for their products. Intel’s list can be found here, and Nvidia’s here. For my system, I found the following numbers:

Performance (CPU): 37.3 GFLOPS
Performance (GPU): 504 GFLOPS

Based on these numbers, we might expect the GPU to be a factor of 14x times faster than the CPU. But what do these numbers really mean, and can they be compared? It turns out that these number are obtained by multiplying the processor frequency by the maximum number of instructions per clock cycle.

For the CPU, we have four cores. Now, when Intel calculate their numbers, they do it based on the special 128-bit SSE registers on every modern Pentium derived CPU. These extensions make it possible to handle two double precision floating point, or four single precision floating point numbers per clock cycle. And in fact there exists a special instruction – the MAD, or Multiply-Add, instruction – which allows for two arithmetic operations per clock cycle on each element in the SSE registers. These means Intel assume 4 (cores) x 2 (double precision floats) x 2 (MAD instructions) = 16 instructions per clock cycle. This gives the theoretical peak performance stated above:

Performance (CPU): 2.33 GHz * 4 * 2 * 2 = 37.3 GFLOPS (double precision floats)

What about the GPU? Here we have 112 independent processing units. On the GPU architecture an even more benchmarking-friendly instruction exists: the MAD+MUL which combines two multiplies and one addition in a single clock cycle. This means Nvidia assumes 112 (cores) * 3 (MAD+MUL instructions) = 336 instructions per clock cycle. Combining this with a stated processing frequency of 1.5 GHz, we arrive at the number stated earlier:

Performance (GPU): 1.5 GHz * 112 * 3 = 504 GFLOPS (single precision floats)

But wait… Nvidia’s number are for single precision floats – the Geforce 8800GT does not even support double precision floats. So for a fair comparison we should double Intel’s number, since the SSE extensions allows four simultaneous single precision numbers to be processed instead of two double precision floats. This way we get:

Performance (CPU): 2.33 GHz * 4 * 4 * 2 = 74.6 GFLOPS (single precision floats)

Now, using this as a guideline, we would expect my GPU to be a factor of 6.8x faster than my CPU. But we have some pretty big assumptions here: for instance, not many CPU programmers would write SSE-optimized code – and is a modern C++ compiler powerful enough to automatically take advantage of them anyway? And how often is the GPU able to use the three operation MUL+MAD instruction?

A real-world experiment

To find out I wrote a simple 2D Mandelbrot system and benchmarked it on the CPU and GPU. This is really the kind of computational tasks that I’m interested in: it is trivial to parallelize and is not memory-intensive, and the majority of executed code will be floating point arithmetics. I did not try to optimize the C++ code, because I wanted to see if the compiler was able to perform some SSE optimization for me. Here are the execution times:

13941 ms – CPU single precision (x87)
13941 ms – CPU double precision (x87)
10535 ms – CPU single precision (SSE)
11367 ms – CPU double precision (SSE)
424 ms – GPU single precision

(These numbers have some caveats – I did perform the tests multiple times and discarded the first few runs, but the CPU code was only single-threaded – so I assumed the numbers would scale perfectly and divided the execution times by four. Also, I verified by checking the generated assembly code, that SSE instructions indeed were used for the core Mandelbrot loop, when they were enabled.).

There are a couple of things to notice here: first, there is no difference between single and double precision on the CPU. This is as could be expected for the x87 compiled code (since the x87 defaults to 80-bit precision anyway), but for the SSE version, we would expect a double up in speed. As can be seen, the SSE code is really not very much more efficient the the x87 code – which strongly suggests that the compiler (here Visual Studio C++ 2008) is not very good at optimizing for SSE.

So for this example we got a factor of 25x speedup by using the GPU instead of the CPU.

“Measured” GFLOPS

Another questions is how this example compares to the theoretical peak performance. By using Nvidia’s Cg SDK I was able to get the GPU assembly code. Since I now could count the number of instruction in the main loop, and I knew how many iterations were performed, I was able to calculate the actual number of floating point operations per second:

GPU: 211 (Mandel)GFLOPS
CPU: 8.4 (Mandel)GFLOPS*

(*The CPU number was obtained by assuming the number of instructions in the core loop was the same as for the GPU: in reality, the CPU disassembly showed that the simple loop was partially unrolled to more than 200 lines of very complex assembly code.)

Compared to the theoretical maximum numbers of 504 GFLOPS and 74.6 GFLOPS respectively, this shows the GPU is much closer to its theoretical limit than the CPU.

GPU Caps Viewer – OpenCL

A second test was performed using the GPU Caps Viewer. This particular application includes a 4D Quaternion Julia fractal demo in OpenCL. This is interesting since OpenCL is a heterogeneous platform – it can be compiled to both CPU and GPU. And since Intel just released an alpha version of their OpenCL SDK, I could compare it to Nvidia’s SDK.

The results were interesting:

Intel OpenCL: ~5 fps
Nvidia OpenCL: ~35 fps

(The FPS did vary through the animation, so these numbers are not very accurate. There were no dedicated benchmark mode.)

This suggest that Intel’s OpenCL compiler is actually able to take advantage of the SSE instructions and provides a highly optimized output. Either that, or Nvidia’s OpenCL implementation is not very efficient (which is not likely).

The OpenCL based benchmark showed my GPU to be approximately 7x times faster than my CPU. Which is exactly the same as predicted by comparing the theoretical GFLOPS values (for single precision).

Conclusion

For normal code written in a high-level language like C or GLSL (multithreaded, single precision, and without explicit SSE instructions) the computational power is roughly equivalent to the number of cores or shader units. For my system this makes the GPU a factor of 25x faster.

Even though the CPU cores have higher operating frequency and in principle could execute more instructions via their SSE registers, this does not seem be fully utilized (and in fact, compiling with and without SSE optimization did not make a significant difference, even for this very simple example).

The OpenCL example tells another story: here the measured performance was proportional to the theoretical GFLOPS ratings. This is interesting since this indicate, that OpenCL could also be interesting for CPU-applications.

One thing to bear in mind is, that the examples tested here (the Mandelbrot and 4D Quaternion Julia) are very well-suited for GPU execution. For more complex code, with conditional branching, double precision floating point operations, and non-coalesced memory access, the CPU is much more efficient than the GPU. So for a desktop computer such as mine, a factor of 25x is probably the best you can hope for (and it is indeed a very impressive speedup for any kind of code).

It is also important to remember that GPU’s are not magical devices. They perform operations with a theoretical peak performance typically 5-15 times larger than a CPU. So whenever you see these 1000x speed up claims (e.g. some of the CUDA showcases), it is probably just an indication of a poor CPU implementation.

But even though the performance of GPU’s may be somewhat exaggerated you can still get a tremendous speedup. And GPU interfaces such as GLSL shaders are really simple to use: you do not need to deal explicitly with threads, you have built-in vectors and matrices, and you can compile GLSL code dynamically, during run-time. All features which makes GPU programming nearly ideal for exploring pixel graphic systems.

Fragmentarium – an IDE for exploring fractals and generative systems on the GPU.

As I mentioned in my previous post, I started experimenting with GLSL raytracing a couple of months ago, and I’m now ready to release the first version of Fragmentarium, an open source, cross-platform IDE for exploring pixel based graphics on the GPU.

It was mainly created for exploring Distance Estimated systems, such as Mandelbulbs or Kaleidoscopic IFS, but it can also be used for 2D systems.

Fragmentarium is inspired by Adobe’s Pixel Bender, but uses pure GLSL, and is specifically created with fractals and generative systems in mind. Besides Pixel Bender, there are also other, more specialized, GPU fractal applications out there, such as Boxplorer and Tacitus, but I wanted something code-centric, where I quickly can modify code and use code in a more modular manner.

Features:

  • Multi-tabbed IDE, with GLSL syntax highlighting

  • Modular GLSL programming – include other fragments
  • User widgets to manipulate parameter settings.
  • Different ‘mouse to GLSL’ mapping schemes (2D and 3D)
  • Includes raytracer for distance estimated systems
  • Many examples including Mandelbulb, Mandelbox, Kaleidoscopic IFS, and Julia Quaternion

Fragmentarium can be downloaded from:
http://syntopia.github.com/Fragmentarium/

There are binaries for Windows, but for now you’ll have to build it yourself for Mac and Linux. You will need a graphics card capable of running GLSL (any reasonably moderne discrete card will do).

Here is a screenshot:

Fragmentarium Screenshot

Fragmentarium screenshot (click to enlarge).

There’s also a gallery at Flickr: Fragmentarium Group

Fragmentarium is not a mature application yet. Especially the camera handling needs some work in the next versions – camera settings are not saved as part of the parameters, no field-of-view control and you often have to compensate for clipping. For future versions I also plan arbitrary resolution renders (tile based rendering) and animations.

There are probably also many small quirks and bugs – I’ve had several problems with ATI drivers, which seems to be much more strict than Nvidias.