View on GitHub

clij

Blog post: OpenCL/GPU based image processing from ImageJ macro

ClearCLIJ - OpenCL based GPU image processing from ImageJ macro

Image processing in modern GPUs allows for accelerating processing speeds massively. This page introduces how to do image processing in the graphics processing unit (GPU) using OpenCL from ImageJ macro inside Fiji using the ClearCLIJ library. It is not necessary to learn OpenCL itself. Preprogrammed routines are supposed to do GPU image processing for you with given ImageJ macro programming experience. The list of preprogrammed routines might be extended depending on the communities needs.

This is how your code might look like if you do GPU based image processing in ImageJ macro:

Image

Before we start

The presented software is in early developmental stage. Yet it is unclear where this project is heading towards. In order to know what might be necessary to develop ClearCLIJ up to a degree where people can just use it, I need your help:

Thanks for reading. Now, let’s get started.

Installation

Download and install Fiji from its website. Add the update site http://sites.imagej.net/clij to your Fiji installation. If needed read more about how to activate update sites. Restart Fiji. ClearCLIJ is successfully installed, if you find a menu entry Plugins > CLIJ.

Please note: This is experimental software, it may have side effects on your Fiji installation and may break other plugins. It is strongly recommended to not install ClearCLIJ in a production environment. Use a freshly downloaded Fiji installation for testing experimental plugins like this. Read the BSD license file for more details.

A first macro

The first macro is help.ijm. It will assist us to get an overview which methods are supported by ClearCLIJ to process images. It looks like this:

run("CLIJ Macro Extensions", "cl_device=");
Ext.CLIJ_help("add");

By executing it, you will find a list of commands containing the term add in their names:

Found 3 method(s) containing the pattern "add":
Ext.CLIJ_addPixelwise(Image summand1, Image summand2, Image destination);
Ext.CLIJ_addScalar(Image source, Image destination, Number scalar);
Ext.CLIJ_addWeightedPixelwise(Image summand1, Image summand2, Image destination, Number factor1, Number factor2);

Keep this example program, you might need it later again if you want to search for help on ClearCLIJ methods. It also tells you which parameters the methods need in order to run.

Note: The first line of this macro may contain the specific name for a GPU. You can - but you don’t have to - specify one. If none is specified, the system will take the first one found. If you don’t have the named GPU in your computer, another one will be chosen. You don’t have to enter the full name, you can also just specify a part of the name. In order to run on any HD named GPU, change the macro like this:

run("CLIJ Macro Extensions", "cl_device=HD");
Ext.CLIJ_help("add");

Transferring images between ImageJ and the GPU

In order to allow images to be processed by the GPU, you need to transfer them into the memory of the GPU. In order to view images which were processed by the GPU, you need to transfer them back to ImageJ. The two methods for doing this are called push(image) and pull(image). You can remove a single image from the GPUs memory by using the release(image) method. Finally, you can remove all images from the GPU with the clear() method. In case the destination image doesn’t exist, it will be created automatically in the GPU. Just push an image A to the GPU, process it with destination B and afterwards, you can pull B back from the GPU to ImageJ in order to show it.

Let’s have a look at an example which loads an image and blurs it using the push-pull mechanism.

// Get test data
run("T1 Head (2.4M, 16-bits)");
input = getTitle();
getDimensions(width, height, channels, slices, frames);

// define under which name the result should be saved
blurred = "Blurred";

// Init GPU
run("CLIJ Macro Extensions", "cl_device=");
Ext.CLIJ_clear();

// push images to GPU
Ext.CLIJ_push(input);

// cleanup ImageJ
run("Close All");

// Blur in GPU
Ext.CLIJ_blur3d(input, blurred, 10, 10, 1);

// Get results back from GPU
Ext.CLIJ_pull(blurred);

// Cleanup by the end
Ext.CLIJ_clear();

To find out, which images are currently stored in the GPU, run the Ext.CLIJ_reportMemory(); method.

Sparing time with GPU based image processing

The overall goal for processing images in the GPU is sparing time. GPUs can process images faster because they can calculate pixel values of many pixels in parallel. Furthermore, images in memory of modern GPUs can be accessed faster than in ImageJ. However, there is a drawback: pushing/pulling the images to/from the GPU takes time. Thus, overall efficiency can only be achieved if whole pipelines are processed in the GPU. Furthermore, repeatedly using the same operations on a GPU pays off because operations are cached. Reusing them is faster than using other methods.

Let’s compare the Mean 3D filter of ImageJ with it’s counterpart in ClearCLIJ. The example macro is benchmarking.ijm. It executes both operations ten times and measures the time each operation takes. This is just an excerpt of the macro:

// Local mean filter in CPU
for (i = 1; i <= 10; i++) {
	time = getTime();
	run("Mean 3D...", "x=3 y=3 z=3");
	print("CPU mean filter no " + i + " took " + (getTime() - time));
}
// push images to GPU
time = getTime();
Ext.CLIJ_push(input);
Ext.CLIJ_push(blurred);
print("Pushing two images to the GPU took " + (getTime() - time) + " msec");

// Local mean filter in GPU
for (i = 1; i <= 10; i++) {
	time = getTime();
	Ext.CLIJ_mean3d(input, blurred, 3, 3, 3);
	print("GPU mean filter no " + i + " took " + (getTime() - time));
}

// Get results back from GPU
time = getTime();
Ext.CLIJ_pull(blurred);
print("Pulining one image from the GPU took " + (getTime() - time) + " msec");

When executing the macro on an Intel Core i7-8650U CPU with a built-in Intel UHD Graphics 620 GPU (Windows 10, 64 bit), the output is:

CPU mean filter no 1 took 3043 msec
CPU mean filter no 2 took 4350 msec
CPU mean filter no 3 took 4467 msec
CPU mean filter no 4 took 4611 msec
CPU mean filter no 5 took 4509 msec
CPU mean filter no 6 took 4384 msec
CPU mean filter no 7 took 4422 msec
CPU mean filter no 8 took 4379 msec
CPU mean filter no 9 took 4365 msec
CPU mean filter no 10 took 4372 msec
Pushing two images to the GPU took 819 msec
GPU mean filter no 1 took 872 msec
GPU mean filter no 2 took 183 msec
GPU mean filter no 3 took 192 msec
GPU mean filter no 4 took 195 msec
GPU mean filter no 5 took 212 msec
GPU mean filter no 6 took 251 msec
GPU mean filter no 7 took 305 msec
GPU mean filter no 8 took 300 msec
GPU mean filter no 9 took 260 msec
GPU mean filter no 10 took 263 msec
Pulling one image from the GPU took 1123 msec

Thus, on the CPU it takes 43 seconds, while using the GPU it just takes 5 seconds. Let’s execute it again.

CPU mean filter no 1 took 2596 msec
CPU mean filter no 2 took 2739 msec
CPU mean filter no 3 took 2913 msec
CPU mean filter no 4 took 3336 msec
CPU mean filter no 5 took 4643 msec
CPU mean filter no 6 took 4505 msec
CPU mean filter no 7 took 4473 msec
CPU mean filter no 8 took 4524 msec
CPU mean filter no 9 took 4414 msec
CPU mean filter no 10 took 4435 msec
Pushing two images to the GPU took 249 msec
GPU mean filter no 1 took 216 msec
GPU mean filter no 2 took 222 msec
GPU mean filter no 3 took 237 msec
GPU mean filter no 4 took 245 msec
GPU mean filter no 5 took 243 msec
GPU mean filter no 6 took 244 msec
GPU mean filter no 7 took 275 msec
GPU mean filter no 8 took 251 msec
GPU mean filter no 9 took 267 msec
GPU mean filter no 10 took 253 msec
Pulling one image from the GPU took 433 msec

On the CPU it still takes 39 seconds, while using the GPU it goes down to three seconds. The additional speedup comes from the caching mechanism mentioned above.

Heureka, we can spare 90% of the time by executing the operation on the GPU! And this works on a small laptop without dedicated GPU. I tested some more systems, just to see if and how fast CLIJ performs. The following numbers are time measurements from the second run:

These are just rough benchmarks. When ClearCLIJ matures, I might do a more detailed benchmarking of several methods. This example here should just motivate you to test your workflow on a GPU and guide you how to evaluate its performance.

Side note: ImageJs mean filter runs inplace. That means the result is stored in the same memory as the input image. With every iteration in the for loop, the image becomes more and more blurry. The OpenCL operation in the GPU always starts from the input image and puts its result in the blurred image. Thus, the resulting images will look different. Be a sceptical scietist when processing images in the GPU. Check that the workflow is indeed doing the right thing. This is especially important when working with experimental software.

This is the view on results from the mean filter on CPU and GPU together with the difference image of both:

Image

In presented case, have a look at mean.ijm to see how different the results from CPU and GPU actually are. In some of the filters, I observed small differences between ImageJ and OpenCL especially at the borders of the images. I am aware of these issues. There is a large number of unit tests in the library, ensuring these differences are small and in case they appear, they only influence the borders.

Limitations

An often criticised issue when working with OpenCL is limited compatibility with graphics cards, operating systems and environments. To my best knowledge, ClearCLIJ runs on recent Intels integrated HD graphics cards and NVidia graphics cards independent of the operating system. I also tested on an AMD Ryzen 3 / Vega GPU under Windows 10 and was happy to see it run. However, I experienced some issues on not as recent AMD GPUs. If you run into any trouble with a GPU I haven’t tested: A helpful workaround is converting all images to 32 bit using run("32-bit"); before sending them to the GPU. The issue behind is known and I’m working on it. Check the project page for a full list of tested systems. Let me know, if you experience issues on systems which were not reported in that list.

Again, please let me know what you think about ClearCLIJ, create github issues to guide its further development and visit the project page to stay up-to-date.

Happy coding!

Cheers,

Robert @haesleinhuepf Haase