Fragmentarium FAQ

Here is a small collection of questions, I’ve have been asked from time to time about Fragmentarium.

Why does Fragmentarium crash?

The most common cause for this is, that a drawing operation takes too long to complete. On Windows, there is a two seconds maximum time limit on jobs executing on the GPU. After that, a GPU watchdog timer will kill the graphics driver and restore it (resulting in unresponsive, possible black, display for 5-10 seconds). The host process (here Fragmentarium) will also be shut down by Windows.

If this happens, you will get errors like this:

"The NVIDIA OpenGL driver lost connection with the display driver
and is unable to continue.
The application must close.... Error code: 8"
"Display driver stopped responding and has recovered"

Workarounds:
Try lowering the number of ray steps, the number of iterations or use the preview feature. Notice that for high-resolution renders, the time limit is for each tile, so it is still possible to do high resolution images.

Another solution is to change the watchdog timer behaviour via the TDR Registry Keys:

The TDR registry keys were not defined in my registry, but I added a

DWORD TdrDelay = 30

and a

DWORD TdrDdiDelay = 30

key to

HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\GraphicsDrivers

which sets a 30 second window for GPU calculations. You have to restart Windows to apply these settings. Be advised that changing these settings may render your system completely unresponsive if the GPU crashes.

Why does Fragmentarium not work on my GPU?

Even though GLSL is a well-defined standard, the are differences between different vendor implementations and different drivers. The computers I have use have Nvidia cards, so Fragmentarium is most tested on Nvidia’s platform.

Typical problems:

The ATI compiler is more strict about casting. For instance, adding an integer to a float results in an error, and not in a warning:

float a = b + 3;

The ATI compiler also seems to suffer from some loop optimization problems: some of the fractals in Fragmentarium does not work on my ATI card, if the number of iterations is not locked (or hard-coded in the fragment code). Inserting a condition into the loop also solves the problem.

The Intel GPU compiler also has some issues: for instance, some operations on literal constants results in errors:

vec3 up = normalize(vec3(1.0,0.0,0.0)); // fails on Intel

I get weird errors about disabled widgets?

If you see warnings like:

Unable to find 'FloorNormal' in shader program. Disabling widget.

it does not indicate a problem. For instance, the warning above appears when the EnableFloor checkbox is locked and disabled. In this case, the GLSL will optimize the floor code away, and the FloorNormal uniform variable will no longer be part of the compiled program – hence the warning. These warnings can be safely ignored.

Why does your Fragmentarium images look much nicer than the ones I get in Fragmentarium?

The images I post are always rendered at a higher resolution using the High Resolution Render option. I then downscale the images to a lower resolution. This reduces alias and rendering artifacts. Use a painting program with a proper downscaling filter – I use Paint.NET which seems to work okay.

Before rendering in hi-res, use the Tile Preview to zoom in, and adjust the details level and number of ray steps, so the image looks okay in the Tile Preview resolution.

Why is Fragmentarium slower than BoxPlorer/Fractals.io/…?

The default ray tracer in Fragmentarium has grown somewhat complex. It is possible to gain speed by locking variables, but this is somewhat tedious. Another solution is to change to another raytracer, e.g. change

#include "DE-Raytracer.frag"

to either

#include "Fast-Raytracer.frag"

(a faster version of the ones above, which will remember all settings)

or

#include "Subblue-Raytracer.frag"

(Tom Beddards raytracer, which uses a set of different parameters)

Can I use double precision in my shaders?

Most modern graphics cards support double precision numbers in hardware, so in principle, yes, if your card supports it. In practice, it is much more difficult:

First, the Fragmentarium presets and sliders (including camera settings), will only transfer data (uniforms) to the GPU as single precision floats. This is not the biggest problem, since you might only need double precision for the numbers that accumulate errors. The Qt OpenGL wrappers I use doesn’t support double types, but it would be possible to work around this if needed.

Second, while newer GLSL versions do support double precision numbers (through types such as double, dvec3, dmat3), not all of the built-in functions support them. In particular, there are no trigonometric or exponential functions. So no cos(double), exp(double), etc. The available functions are described here.

Third, it might be slow: When Nvidia designed their Fermi architecture, used in their recent graphic cards, they built it, so that is should be able to process double precision operations with half the speed of single precision operations (which is optimal given the double size of the numbers). However, they decided that their consumer branch (the Nvidia cards) should be artificially limited to run double precision calculations at 1/8 the speed of single precision numbers. Their Tesla line of graphics card (which shares architecture with the Geforce branch), is not artifically throttled and will run double precision at half the speed of single precision. As for the AMD/ATI cards, I do not think they have similar limitations, but I’m not sure about this.

If you really still want to try, you must insert this command at the top of the script:

#extension GL_ARB_gpu_shader_fp64 : enable

(Or use a #version command, but this will like cause problems with most of the existing examples).

Finally, what about emulating double precision numbers, instead of using the hardware versions? While this sounds very slow, it is probably not much slower than the throttled implementation. The downside is, the GLSL does not support operator overloading: there is no syntactically nice way to implement such functionality: instead of just changing your data types, you must convert all code from e.g.

A = B*C+D;

to

A = dplus(dmul(B,C),D);

If you are still interested, here is a great introduction to emulating doubles in GLSL.

How do I report errors, so that it is easiest for you to correct them?

(Well, actually nobody asks this questions)

If you find an error, please report the following:
- Operating system, and the graphics card you are using
- The version of Fragmentarium, and whether you built it yourself
- A reproducible description of the steps that caused the error (if possible).

You may mail errors to me at mikael (at) hvidtfeldts.net.

Optimizing GLSL Code

By making selected variables constant at compile time, some 3D fractals render more than four times faster. Support for easily locking variables has been added to Fragmentarium.

Some time ago, I became aware that the raytracer in Fragmentarium was somewhat slower than both Fractal Labs and Boxplorer for similar systems – this was somewhat puzzling since the DE raycasting technique is pretty much the same. After a bit of investigation, I realized that my standard raytracer had grown slower and slower, as new features had been added (e.g. reflections, hard shadows, and floor planes) – even if the features were turned off!

One way to speed up GLSL code, is by marking some variables constant at compile-time. This way the compiler may optimize code (e.g. unroll loops) and remove unused code (e.g. if hard shadows are disabled). The drawback is that changing these constant variables requires that the GLSL code is compiled again.

It turned out that this does have a great impact on some systems. For instance for the ‘Dodecahedron.frag’, take a look at the following render times:

No constants: 1.4 fps (1.0x)
Constant rotation matrices : 3.4 fps (2.4x)
Constant rotation matrices + Anti-alias + DetailAO: 5.6 fps (4.0x)
All 38 parameters (except camera): 6.1 fps (4.4x)

The fractal rotation matrices are the matrices used inside the DE-loop. Without the constant declarations, they must be calculated from scratch for each pixel, even though they are identical for all pixels. Doing the calculation at compile-time gives a notable speedup of 2.4x (notice that another approach would be to calculate such frame constants in the vertex shader and pass them to the pixel shader as ‘varying’ variables. But according to this post this is – surprisingly – not very effective).

The next speedup – from the ‘Anti-alias’ and ‘DetailAO’ variables – is more subtle. It is difficult to see from the code why these two variables should have such impact. And in fact, it turns out that combinations of other variables will amount in the same speedup. But these speedups are not additive! Even if you make all variables constants, the framerate only increases slightly above 5.6 fps. It is not clear why this happens, but I have a guess: it seems that when the complexity is lowered between a certain treshold, the shader code execution speed increases sharply. My guess is that for complex code, the shader runs out of free registers and needs to perform calculations using a slower kind of memory storage.

Interestingly, the ‘iterations’ variable offers no speedup – even though the compiler must be able to unroll the principal DE loop, there is no measurable improvement by doing it.

Finally, the compile time is also greatly reduced when making variables constant. For the ‘Dodecahedron.frag’ code, the compile time is ~2000ms with no constants. By making most variables constant, the compile time is lowered to around ~335ms on my system.

Locking in Fragmentarium.

In Fragmentarium variables can be locked (made compile-time constant) by clicking the padlock next to them. Locked variables appear with a yellow padlock next to them. When a variable is locked, any changes to it will first be executed when the system is compiled (by pressing ‘build’). Locked variables, which have been changes, will appear with a yellow background until the system is compiled, and the changes are executed.

Notice, that whole parameter groups may be locked, by using the buttons at the bottom.

Screenshot

The locking interface - click to enlarge.


The ‘AntiAlias’ and ‘DetailAO’ variables are locked. The ‘DetailAO’ has been changed, but the changes are not executed yet (the yellow background). The ‘BoundingSphere’ variable has a grey background, because it has keyboard focus: its value can be finetuned using the arrow keys (up/down controls step size, left/right changes value).

In a fragment, a user variable can be marked as locked by default, by adding a ‘locked’ keyword to it:

uniform float Scale; slider[-5.00,2.0,4.00] Locked

Some variables can not be locked – e.g. the camera settings. It is possible to mark such variables by the ‘NotLockable’ keyword:

uniform vec3 Eye; slider[(-50,-50,-50),(0,0,-10),(50,50,50)] NotLockable

The same goes for presets. Here the locking mode can be stated, if it is different from the default locking mode:

#preset SomeName
AntiAlias = 1 NotLocked
Detail = -2.81064 Locked
Offset = 1,1,1
...
#endpreset

Locking will be part of Fragmentarium v0.9, which will be released soon.

Plotting High-frequency Functions Using a GPU.

A slight digression from the world of fractals and generative art: This post is about drawing high-quality graphs of high-frequency functions.

Yesterday, I needed to draw a few graphs of some simple functions. I started out by using Iñigo Quilez’s nice little GraphToy, but my functions could not be expressed in a single line. So I decided to implement a graph plotter example in Fragmentarium instead.

Plotting a graph using a GLSL shader is not an obvious task – you have to frame the problem in a way, such that each pixel can be processed individually. This is in contrast to the standard way of drawing graphs – where you choose a uniform set of values for the x-axis, and draw the lines connecting the points in the (x,f(x)) set.

So how do you do it for each pixel individually?

The first thing to realize is, that it is easy to determine whether a pixel is above or below the graph – this can be done by checking whether y<f(x) or y>f(x). The tricky part is, that we only want to draw the boundary – the curve that separates the regions above and below the graph.

So how do we determine the boundary? After having tried a few different approaches, I came up with the following simple edge detection procedure: for each pixel, choose a number of samples, in a region around the pixel center. Then count how many samples are above, and how many samples are below the curve.

If all samples are above, or all samples are below, the pixel is not on the boundary. However, if there are samples both above and below, the boundary must be passing through the pixel.

The whole idea can be expressed in a few lines of code:

for (float i = 0.0; i < samples; i++) {
	for (float  j = 0.0;j < samples; j++) {
		float f = function(pos.x+ i*step.x)-(pos.y+ j*step.y);
		count += (f>0.) ? 1 : -1;
	}
}
// base color on abs(count)/(samples*samples)

It should be noted, that the sampling can be improved by adding a small amount of jittering (random offsets) to the positions – this reduces the aliasing at the cost of adding a small amount of noise.

Highfrequency functions and aliasing

So why it this better than the common ‘connecting line’ approach?

Because this approach deals with the high-frequency information much better.

Consider the function f(x)=sin(x*x*x)*sin(x).
Here is a plot from GraphToy:

Notice how the graph near the red arrows seem to be slowly varying. This is not the true behavior of function, but an artifact of the way we sample our function. Our limited resolution cannot capture the high frequency components, which results in aliasing.

Whenever you do anything media-related on a computer, you will at some point run into problems with aliasing: whether you are doing sound synthesis, image manipulation, 3D rendering or even drawing a straight line.

However, using the pixel shader approach, aliasing is much easier to avoid. Here is a Fragmentarium plot of the same function:

Even though it may seem backwards to evaluate the function for all pixels on the screen, it makes it possible to tame the aliasing, and even on a modest GPU, the procedure is fast enough for realtime interactions.

The example is included in GitHub under Examples/2D Systems/GraphPlotter.frag.

Distance Estimated 3D Fractals (Part I)

During the last two years, the 3D fractal field has undergone a small revolution: the Mandelbulb (2009), the Mandelbox (2010), The Kaleidoscopic IFS’s (2010), and a myriad of equally or even more interesting hybrid systems, such as Spudsville (2010) or the Kleinian systems (2011).

All of these systems were made possible using a technique known as Distance Estimation and they all originate from the Fractal Forums community.

The background

The first paper to introduce Distance Estimated 3D fractals was written by Hart and others in 1989:
Ray tracing deterministic 3-D fractals

In this paper Hart describe how Distance Estimation may be used to render a Quaternion Julia 3D fractal. The paper is very well written and definitely worth spending some hours on (be sure to take a look at John Hart’s other papers as well). Given the age of Hart’s paper, it is striking that is not until the last couple of years that the field of distance estimated 3D fractals has exploded. There has been some important milestones, such as Keenan Crane’s GPU implementation (2004), and Iñigo Quilez 4K demoscene implementation (2007), but I’m not aware of other fractal systems being explored using Distance Estimation, before the advent of the Mandelbulb.

Raymarching

Classic raytracing shoots one (or more) rays per pixel and calculate where the rays intersect the geometry in the scene. Normally the geometry is described by a set of primitives, like triangles or spheres, and some kind of spatial acceleration structure is used to quickly identify which primitives intersect the rays.

Distance Estimation, on the other hand, is a ray marching technique.

Instead of calculating the exact intersection between the camera ray and the geometry, you proceed in small steps along the ray and check how close you are to the object you are rendering. When you are closer than a certain threshold, you stop. In order to do this, you must have a function that tells you how close you are to the object: a Distance Estimator. The value of the distance estimator tells you how large a step you are allowed to march along the ray, since you are guaranteed not to hit anything within this radius.

Schematics of ray marching using a distance estimator.

The code below shows how to raymarch a system with a distance estimator:

float trace(vec3 from, vec3 direction) {
	float totalDistance = 0.0;
	int steps;
	for (steps=0; steps < MaximumRaySteps; steps++) {
		vec3 p = from + totalDistance * direction;
		float distance = DistanceEstimator(p);
		totalDistance += distance;
		if (distance < MinimumDistance) break;
	}
	return 1.0-float(steps)/float(MaxRaySteps);
}

Here we simply march the ray according to the distance estimator and return a greyscale value based on the number of steps before hitting something. This will produce images like this one (where I used a distance estimator for a Mandelbulb):

It is interesting that even though we have not specified any coloring or lighting models, coloring by the number of steps emphasizes the detail of the 3D structure - in fact, this is an simple and very cheap form of the Ambient Occlusion soft lighting often used in 3D renders.

Parallelization

Another interesting observation is that these raymarchers are trivial to parallelise, since each pixel can be calculated independently and there is no need to access complex shared memory structures like the acceleration structure used in classic raytracing. This means that these kinds of systems are ideal candidates for implementing on a GPU. In fact the only issue is that most GPU's still only supports single precision floating points numbers, which leads to numerical inaccuracies faster than for the CPU implementations. However, the newest generation of GPU's support double precision, and some API's (such as OpenCL and Pixel Bender) are heterogeneous, meaning the same code can be executed on both CPU and GPU - making it possible to create interactive previews on the GPU and render final images in double precision on the CPU.

Estimating the distance

Now, I still haven't talked about how we obtain these Distance Estimators, and it is by no means obvious that such functions should exist at all. But it is possible to intuitively understand them, by noting that systems such as the Mandelbulb and Mandelbox are escape-time fractals: we iterate a function for each point in space, and follow the orbit to see whether the sequence of points diverge for a maximum number of iterations, or whether the sequence stays inside a fixed escape radius. Now, by comparing the escape-time length (r), to its spatial derivative (dr), we might get an estimate of how far we can move along the ray before the escape-time length is below the escape radius, that is:

distance=(r-EscapeRadius)/dr.

This is a hand-waving estimate - the derivative might fluctuate wildly and get larger than our initial value, so a more rigid approach is needed to find a proper distance estimator. I'll a lot more to say about distance estimators in part III, so for now we will just accept that these function exists and can be obtained for quite a diverse class of systems, and that they are often constructed by comparing the escape-time length with some approximation of its derivative.

It should also be noticed that this ray marching approach can be used for any kinds of systems, where you can find a lower bound for the closest geometry for all points in space. Iñigo Quilez has used this in his impressive procedural SliseSix demo, and has written an excellent introduction, which covers many topics also relevant for Distance Estimation of 3D fractals.

This concludes the first part of this series of blog entries. Part II discusses lighting and coloring of fractals and Part III discuss the DE-function and how to arrive at it.

GPU versus CPU for pixel graphics

After having gained a bit of experience with GPU shader programming during my Fragmentarium development, a natural question to ask is: how fast are these GPU’s?

This is not an easy question to answer, and it depends on the specific application. But I will try to give an answer for the kind of systems that I’m interested in: pixel graphics systems, where each pixel can be calculated independently of the others, such as raytraced 3D fractals.

Lets take my desktop computer, a fairly standard desktop machine, as an example. It is equipped with Nvidia Geforce 9800GT GPU @ 1.5 GHz, and a Intel Core 2 Quad Q8200 @ 2.33GHz.

How many processing unit are there?

Number of processing units (CPU): 4 CPU cores
Number of processing units (GPU): 112 Shader units

Based on these numbers, we might expect the GPU to be a factor of 28x times faster than the CPU. Of course, this totally ignores the efficiency and operating speed of the processing units. Lets try looking at the processing power in terms of maximum number of floating-point operations per second instead:

Theoretical Peak Performance

Both Intel and Nvidia list the GFLOPS (billion floating point operations per second) rating for their products. Intel’s list can be found here, and Nvidia’s here. For my system, I found the following numbers:

Performance (CPU): 37.3 GFLOPS
Performance (GPU): 504 GFLOPS

Based on these numbers, we might expect the GPU to be a factor of 14x times faster than the CPU. But what do these numbers really mean, and can they be compared? It turns out that these number are obtained by multiplying the processor frequency by the maximum number of instructions per clock cycle.

For the CPU, we have four cores. Now, when Intel calculate their numbers, they do it based on the special 128-bit SSE registers on every modern Pentium derived CPU. These extensions make it possible to handle two double precision floating point, or four single precision floating point numbers per clock cycle. And in fact there exists a special instruction – the MAD, or Multiply-Add, instruction – which allows for two arithmetic operations per clock cycle on each element in the SSE registers. These means Intel assume 4 (cores) x 2 (double precision floats) x 2 (MAD instructions) = 16 instructions per clock cycle. This gives the theoretical peak performance stated above:

Performance (CPU): 2.33 GHz * 4 * 2 * 2 = 37.3 GFLOPS (double precision floats)

What about the GPU? Here we have 112 independent processing units. On the GPU architecture an even more benchmarking-friendly instruction exists: the MAD+MUL which combines two multiplies and one addition in a single clock cycle. This means Nvidia assumes 112 (cores) * 3 (MAD+MUL instructions) = 336 instructions per clock cycle. Combining this with a stated processing frequency of 1.5 GHz, we arrive at the number stated earlier:

Performance (GPU): 1.5 GHz * 112 * 3 = 504 GFLOPS (single precision floats)

But wait… Nvidia’s number are for single precision floats – the Geforce 8800GT does not even support double precision floats. So for a fair comparison we should double Intel’s number, since the SSE extensions allows four simultaneous single precision numbers to be processed instead of two double precision floats. This way we get:

Performance (CPU): 2.33 GHz * 4 * 4 * 2 = 74.6 GFLOPS (single precision floats)

Now, using this as a guideline, we would expect my GPU to be a factor of 6.8x faster than my CPU. But we have some pretty big assumptions here: for instance, not many CPU programmers would write SSE-optimized code – and is a modern C++ compiler powerful enough to automatically take advantage of them anyway? And how often is the GPU able to use the three operation MUL+MAD instruction?

A real-world experiment

To find out I wrote a simple 2D Mandelbrot system and benchmarked it on the CPU and GPU. This is really the kind of computational tasks that I’m interested in: it is trivial to parallelize and is not memory-intensive, and the majority of executed code will be floating point arithmetics. I did not try to optimize the C++ code, because I wanted to see if the compiler was able to perform some SSE optimization for me. Here are the execution times:

13941 ms – CPU single precision (x87)
13941 ms – CPU double precision (x87)
10535 ms – CPU single precision (SSE)
11367 ms – CPU double precision (SSE)
424 ms – GPU single precision

(These numbers have some caveats – I did perform the tests multiple times and discarded the first few runs, but the CPU code was only single-threaded – so I assumed the numbers would scale perfectly and divided the execution times by four. Also, I verified by checking the generated assembly code, that SSE instructions indeed were used for the core Mandelbrot loop, when they were enabled.).

There are a couple of things to notice here: first, there is no difference between single and double precision on the CPU. This is as could be expected for the x87 compiled code (since the x87 defaults to 80-bit precision anyway), but for the SSE version, we would expect a double up in speed. As can be seen, the SSE code is really not very much more efficient the the x87 code – which strongly suggests that the compiler (here Visual Studio C++ 2008) is not very good at optimizing for SSE.

So for this example we got a factor of 25x speedup by using the GPU instead of the CPU.

“Measured” GFLOPS

Another questions is how this example compares to the theoretical peak performance. By using Nvidia’s Cg SDK I was able to get the GPU assembly code. Since I now could count the number of instruction in the main loop, and I knew how many iterations were performed, I was able to calculate the actual number of floating point operations per second:

GPU: 211 (Mandel)GFLOPS
CPU: 8.4 (Mandel)GFLOPS*

(*The CPU number was obtained by assuming the number of instructions in the core loop was the same as for the GPU: in reality, the CPU disassembly showed that the simple loop was partially unrolled to more than 200 lines of very complex assembly code.)

Compared to the theoretical maximum numbers of 504 GFLOPS and 74.6 GFLOPS respectively, this shows the GPU is much closer to its theoretical limit than the CPU.

GPU Caps Viewer – OpenCL

A second test was performed using the GPU Caps Viewer. This particular application includes a 4D Quaternion Julia fractal demo in OpenCL. This is interesting since OpenCL is a heterogeneous platform – it can be compiled to both CPU and GPU. And since Intel just released an alpha version of their OpenCL SDK, I could compare it to Nvidia’s SDK.

The results were interesting:

Intel OpenCL: ~5 fps
Nvidia OpenCL: ~35 fps

(The FPS did vary through the animation, so these numbers are not very accurate. There were no dedicated benchmark mode.)

This suggest that Intel’s OpenCL compiler is actually able to take advantage of the SSE instructions and provides a highly optimized output. Either that, or Nvidia’s OpenCL implementation is not very efficient (which is not likely).

The OpenCL based benchmark showed my GPU to be approximately 7x times faster than my CPU. Which is exactly the same as predicted by comparing the theoretical GFLOPS values (for single precision).

Conclusion

For normal code written in a high-level language like C or GLSL (multithreaded, single precision, and without explicit SSE instructions) the computational power is roughly equivalent to the number of cores or shader units. For my system this makes the GPU a factor of 25x faster.

Even though the CPU cores have higher operating frequency and in principle could execute more instructions via their SSE registers, this does not seem be fully utilized (and in fact, compiling with and without SSE optimization did not make a significant difference, even for this very simple example).

The OpenCL example tells another story: here the measured performance was proportional to the theoretical GFLOPS ratings. This is interesting since this indicate, that OpenCL could also be interesting for CPU-applications.

One thing to bear in mind is, that the examples tested here (the Mandelbrot and 4D Quaternion Julia) are very well-suited for GPU execution. For more complex code, with conditional branching, double precision floating point operations, and non-coalesced memory access, the CPU is much more efficient than the GPU. So for a desktop computer such as mine, a factor of 25x is probably the best you can hope for (and it is indeed a very impressive speedup for any kind of code).

It is also important to remember that GPU’s are not magical devices. They perform operations with a theoretical peak performance typically 5-15 times larger than a CPU. So whenever you see these 1000x speed up claims (e.g. some of the CUDA showcases), it is probably just an indication of a poor CPU implementation.

But even though the performance of GPU’s may be somewhat exaggerated you can still get a tremendous speedup. And GPU interfaces such as GLSL shaders are really simple to use: you do not need to deal explicitly with threads, you have built-in vectors and matrices, and you can compile GLSL code dynamically, during run-time. All features which makes GPU programming nearly ideal for exploring pixel graphic systems.

Fragmentarium – an IDE for exploring fractals and generative systems on the GPU.

As I mentioned in my previous post, I started experimenting with GLSL raytracing a couple of months ago, and I’m now ready to release the first version of Fragmentarium, an open source, cross-platform IDE for exploring pixel based graphics on the GPU.

It was mainly created for exploring Distance Estimated systems, such as Mandelbulbs or Kaleidoscopic IFS, but it can also be used for 2D systems.

Fragmentarium is inspired by Adobe’s Pixel Bender, but uses pure GLSL, and is specifically created with fractals and generative systems in mind. Besides Pixel Bender, there are also other, more specialized, GPU fractal applications out there, such as Boxplorer and Tacitus, but I wanted something code-centric, where I quickly can modify code and use code in a more modular manner.

Features:

  • Multi-tabbed IDE, with GLSL syntax highlighting

  • Modular GLSL programming – include other fragments
  • User widgets to manipulate parameter settings.
  • Different ‘mouse to GLSL’ mapping schemes (2D and 3D)
  • Includes raytracer for distance estimated systems
  • Many examples including Mandelbulb, Mandelbox, Kaleidoscopic IFS, and Julia Quaternion

Fragmentarium can be downloaded from:
http://syntopia.github.com/Fragmentarium/

There are binaries for Windows, but for now you’ll have to build it yourself for Mac and Linux. You will need a graphics card capable of running GLSL (any reasonably moderne discrete card will do).

Here is a screenshot:

Fragmentarium Screenshot

Fragmentarium screenshot (click to enlarge).

There’s also a gallery at Flickr: Fragmentarium Group

Fragmentarium is not a mature application yet. Especially the camera handling needs some work in the next versions – camera settings are not saved as part of the parameters, no field-of-view control and you often have to compensate for clipping. For future versions I also plan arbitrary resolution renders (tile based rendering) and animations.

There are probably also many small quirks and bugs – I’ve had several problems with ATI drivers, which seems to be much more strict than Nvidias.

Folding Space II: Kaleidoscopic Fractals

Another type of interesting 3D fractal has appeared over at fractalforums.com: the Kaleidoscopic 3D fractals, introduced in this thread, by Knighty.

Once again these fractals are defined by investing the convergence properties of a simple function. And similar to the Mandelbox, the function is built around the concept of folds. Geometrically, a fold is simply a conditional reflection: you reflect a point in a plane, if it is located on the wrong side of the plane.

It turns out that just by using plane-folds and scaling, it is possible to create classic 3D fractals, such as the Menger cube and the Sierpinsky tetrahedron, and even recursive versions of the rest of the Platonic solids: the octahedron, the dodecahedron, and the icosahedron.


Example of a recursive dodecahedron

The kaleidoscopic fractals introduce an additional 3D rotation before and after the folds. It turns out that these perturbations introduce a rich variety of interesting and complex structures.

I’ve followed the thread and implemented most of the proposed systems by modifying Subblue’s Pixel Bender scripts.

Below are some of my images:

The Menger Sponge

My first attempts. Pixel Bender kept crashing on me, until I realized that there is a GPU timeout in Windows Vista (read this for a solution).


The Sierpinsky

Then I moved on to the Sierpinsky. The sequence below shows something characteristic for these fractals: the first slightly perturbed variations look artificial and synthetic, but when the system is distorted, it becomes organic and alive.



The Icosahedron

I also tried the octahedron and dodecahedron, but my favorite is the icosahedron. Especially knighty’s hollow variant.




Arbitrary Planes

One nice thing about these systems is, that you do not necessarily need to derive a complex distance estimator – you can also just modify the distance estimator code, and see what happens. These last two images were constructed by modifying existing distance estimators.

It will be interesting to see where this is going.

Many fascinating 3D fractals have appeared at fractalforums.com over the last few weeks. And GPU processing now makes it is possible to explore these systems in real-time.

Shader Toy

For some time I’ve been wanting to play around with pixel (fragment) shaders, but I couldn’t find a proper playground.

Then I stumbled upon Shader Toy, by Inigo Quilez (whom I’ve mentioned several times on this blog). A couple of things make Shader Toy stand out:

It runs inside your browser. It uses the emerging WebGL standard, which is JavaScript bindings for OpenGL (ES) 2.0. OpenGL can be used directly inside a Canvas HTML element, including support for custom shaders. As Shader Toy demonstrates, this makes it possible to do some very impressive stuff, such as real-time GPU-accelerated raytracing inside an element on a web page.

The examples are great. While Shader Toy itself is mostly a thin wrapper around the WebGL functionality, the great thing about it is the example shaders: 2D fractals and Demo Scene effects, but also complex examples like the Slisesix 4K demo, and examples of raytracing, and complex fractals, like the Quaternion Julia set, and the Mandelbulb.

The only problem with WebGL is, that it is not supported by the current generation of browsers.

The good news is that the nightly builds of Firefox, Safari (WebKit), and Chromium (Google Chrome) all support it, and are quite easy to install: this is a good place for more information. If you use the Chromium builds, you don’t have to worry about messing up your existing browser configuration – the nightly builds are standalone versions and can be run without installation.

There are lots of complex shader tools out there: for instance, NVIDIAs FX Composer, AMDs Rendermonkey, TyphoonLabs OpenGL Shader Designer, and Lumina, but Shader Toy makes it very easy to get started with shaders. And it provides a rare insight into how those amazing 4K demos were made.

Mandelbulb Implementations

Several implementations have appeared since the Mandelbulb surfaced a couple of months ago.

The first public GPU implementation I know of was created by ‘cbuchner1′. It is based on a sample from NVIDIAs OptiX SDK, and features anaglyphic 3D, ambient occlusion, phong shading, reflection, and environment maps. It can be downloaded here (Windows only and requires a forum signup).


Example made with cbuchner1′s implementation

Very interestingly this binary runs on my laptops modest GeForce 8400M. I am a bit puzzled about this – NVIDIA state that the OptiX SDK requires a Quadro or a Tesla card, and I am not able to run the Julia OptiX demo, that cbuchner1s app is derived from.

Subblue has also created a Mandelbulb implementation, released as a Pixel Bender script and a Quartz composer plugin. A number of interesting customizations makes this my favorite choice: it is possible to explore negative and fractional powers, switch to Julia sets, and the lightning options can be fine-tuned. The only drawback is that Pixel Bender does not make it possible to directly rotate, zoom, and translate the camera – you have to rely on sliders for that.


Example created by Subblue.

Iñigo Quílez has also created a GPU implementation, but unfortunately he has not released any code yet. A couple of videos are available on Youtube, though: Part 1, Part 2, Part 3.


Quilez also discovered this intimate connection between the Shroud of Turin and the Mandelbulb.

The MathFuncRenderer also has a Mandelbulb implementation. I had a few quirks with this one – I had to install OpenAL, and the UI was quite non-responsive, but this may be due to my graphics card.

Another very interesting implementation is the GigaVoxels Mandelbulb: Whereas most implementations cast rays and use a distance estimator to speed up the ray marching, GigaVoxels use voxels stored into an Octree, which is populated on-the-fly.

For other implementations keep an eye on Fractal Forums Mandelbulb Implementation category.

Mandelbulb

A lot of sites have reported that a new, interesting 3D version of the Mandelbrot set has been discovered. The Mandelbulb has aesthetic qualities similar to Quaternion-Julia sets, but seems more diverse and suited for exploration.


“Cave of Lost Secrets” from Skytopia.

Skytopia has a great overview complete with many stunning images.

A good way to view the basic structure is this 56 Megapixel render from Skytopia (using the Seadragon viewer – requires Silverlight):


As of now, I do not know of any released software capable of generating Mandelbulbs, but it probably won’t be long:

Recent posts by Iñigo Quílez (who produced the Kindernoiser Quaternion-Julia set GPU renderer) indicate that he is very to close to completing a fast GPU implementation. These posts also include the basic source-code, which I believe should make it possible to port to other targets, for instance Pixel Bender. Apparently Quílez has cooked up a distance estimator, and a fake ambient occlusion scheme (based on orbit traps) for these Mandelbulbs, which sounds very promising.