GPU-accelerated image processing using CUPY and CUCIM#

Processing large images with python can take time. In order to accelerate processing, graphics processing units (GPUs) can be exploited, for example using NVidia CUDA. For processing images with CUDA, there are a couple of libraries available. We will take a closer look at cupy, which brings more general computing capabilities for CUDA compatible GPUs, and cucim, a library of image processing specific operations using CUDA. Both together can serve as GPU-surrogate for scikit-image.

See also

Before we start, we need to install CUDA and CUCIM it properly. The following commands make this notebook run in Google Colab.

!curl https://colab.chainer.org/install | sh -

!pip install cucim
!pip install scipy scikit-image cupy-cuda100
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1580  100  1580    0     0   5851      0 --:--:-- --:--:-- --:--:--  5851
+ apt -y -q install cuda-libraries-dev-10-0
Reading package lists...
Building dependency tree...
Reading state information...
cuda-libraries-dev-10-0 is already the newest version (10.0.130-1).
0 upgraded, 0 newly installed, 0 to remove and 39 not upgraded.
+ pip install -q cupy-cuda100  chainer 
+ set +ex
Installation succeeded!
Requirement already satisfied: cucim in /usr/local/lib/python3.7/dist-packages (0.19.0)
Requirement already satisfied: click in /usr/local/lib/python3.7/dist-packages (from cucim) (7.1.2)
Requirement already satisfied: numpy in /usr/local/lib/python3.7/dist-packages (from cucim) (1.19.5)
Requirement already satisfied: scipy in /usr/local/lib/python3.7/dist-packages (1.4.1)
Requirement already satisfied: scikit-image in /usr/local/lib/python3.7/dist-packages (0.16.2)
Requirement already satisfied: cupy-cuda100 in /usr/local/lib/python3.7/dist-packages (9.1.0)
Requirement already satisfied: numpy>=1.13.3 in /usr/local/lib/python3.7/dist-packages (from scipy) (1.19.5)
Requirement already satisfied: pillow>=4.3.0 in /usr/local/lib/python3.7/dist-packages (from scikit-image) (7.1.2)
Requirement already satisfied: PyWavelets>=0.4.0 in /usr/local/lib/python3.7/dist-packages (from scikit-image) (1.1.1)
Requirement already satisfied: networkx>=2.0 in /usr/local/lib/python3.7/dist-packages (from scikit-image) (2.5.1)
Requirement already satisfied: matplotlib!=3.0.0,>=2.0.0 in /usr/local/lib/python3.7/dist-packages (from scikit-image) (3.2.2)
Requirement already satisfied: imageio>=2.3.0 in /usr/local/lib/python3.7/dist-packages (from scikit-image) (2.4.1)
Requirement already satisfied: fastrlock>=0.5 in /usr/local/lib/python3.7/dist-packages (from cupy-cuda100) (0.6)
Requirement already satisfied: decorator<5,>=4.3 in /usr/local/lib/python3.7/dist-packages (from networkx>=2.0->scikit-image) (4.4.2)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib!=3.0.0,>=2.0.0->scikit-image) (2.4.7)
Requirement already satisfied: python-dateutil>=2.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib!=3.0.0,>=2.0.0->scikit-image) (2.8.1)
Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.7/dist-packages (from matplotlib!=3.0.0,>=2.0.0->scikit-image) (0.10.0)
Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib!=3.0.0,>=2.0.0->scikit-image) (1.3.1)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.7/dist-packages (from python-dateutil>=2.1->matplotlib!=3.0.0,>=2.0.0->scikit-image) (1.15.0)
import numpy as np
import cupy as cp
import cucim
from skimage.io import imread, imshow
import pandas as pd

In the following, we are using image data from Paci et al shared under the CC BY 4.0 license. See also: https://doi.org/10.17867/10000140

image = imread('https://idr.openmicroscopy.org/webclient/render_image_download/9844418/?format=tif')

imshow(image)
<matplotlib.image.AxesImage at 0x7f5901fd2d50>
../_images/fc365ecf718b90387e6474f4c449fadef90d85e9f69bdf33e2c0558882dbddb7.png

In order to process an image using CUDA on the GPU, we need to convert it. Under the hood of this conversion, the image data is sent from computer random access memory (RAM) to the GPUs memory.

image_gpu = cp.asarray(image)

image_gpu.shape
(640, 640, 3)

Extracting a single channel out of the three-channel image works like if we were working with numpy. Showing the image does not work, because the CUDA image is not available in memory. In order to get it back from GPU memory, we need to convert it to a numpy array.

single_channel_gpu = image_gpu[:,:,1]

# the following line would fail
# imshow(single_channel_gpu)

# get single channel image back from GPU memory and show it
single_channel = cp.asnumpy(single_channel_gpu)
imshow(single_channel)

# we can also do this with a convenience function
def gpu_imshow(image_gpu):
  image = np.asarray(image_gpu)
  imshow(image)
../_images/80ace2c94a217658b23f98a7bb4be4ca6d19aa66fe053b6b2c06eafaf18aa627.png

Image filtering and segmentation#

The cucim developers have re-implemented many functions from sckit image, e.g. the Gaussian blur filter, Otsu Thresholding after Otsu et al. 1979, binary erosion and connected component labeling.

 from cucim.skimage.filters import gaussian

 blurred_gpu = gaussian(single_channel_gpu, sigma=5)

 gpu_imshow(blurred_gpu)
../_images/1ba7e4349064ed31600c4c08ed1469709e984d4b24bf38c21b1e976f364b8379.png
from cucim.skimage.filters import threshold_otsu

# determine threshold
threshold = threshold_otsu(blurred_gpu)

# binarize image by apply the threshold
binary_gpu = blurred_gpu > threshold

gpu_imshow(binary_gpu)
../_images/27309930ccf21e0602eb7257fd04c23450257794119d6457f3432dc4e10a184f.png
from cucim.skimage.morphology import binary_erosion, disk

eroded_gpu = binary_erosion(binary_gpu, selem=disk(2))

gpu_imshow(eroded_gpu)
../_images/d418045ad805885d72122fd9d8b5b7f3837905c202107e3babbb656a6d909613.png
from cucim.skimage.measure import label

labels_gpu = label(eroded_gpu)

gpu_imshow(labels_gpu)
/usr/local/lib/python3.7/dist-packages/skimage/io/_plugins/matplotlib_plugin.py:150: UserWarning: Low image data range; displaying image with stretched contrast.
  lo, hi, cmap = _get_display_range(image)
../_images/379e5e748565dad865e73002aa5179d3a155381d449dfe87a5950cbb7d767d05.png

For visualization purposes, it is recommended to turn the label image into an RGB image, especially if you want to save it to disk.

from cucim.skimage.color import label2rgb

labels_rgb_gpu = label2rgb(labels_gpu)

gpu_imshow(labels_rgb_gpu)
/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:3: FutureWarning: The new recommended value for bg_label is 0. Until version 0.19, the default bg_label value is -1. From version 0.19, the bg_label default value will be 0. To avoid this warning, please explicitly set bg_label value.
  This is separate from the ipykernel package so we can avoid doing imports until
../_images/7b7376dc35a8510cb5e9a9e81e322f425483770fe3686b8016a5c5fdcb42789d.png

Quantitative measurements#

Also quantitative measurments using regionprops_table have been implemented in cucim. A major difference is that you need to convert its result back to numpy if you want to continue processing on the CPU, e.g. using pandas.

from cucim.skimage.measure import regionprops_table 

table_gpu = regionprops_table(labels_gpu, intensity_image=single_channel_gpu, properties=('mean_intensity', 'area', 'solidity'))

table_gpu
{'area': array([ 238, 5833, 6006, 5679, 2534, 4200, 4655, 2662, 3846, 3350, 5005,
        2200, 3952, 3837, 4298, 4111,  139]),
 'mean_intensity': array([68.93277311, 54.14537974, 68.23676324, 59.95175207, 87.22178374,
        71.32666667, 74.37529538, 64.63561232, 58.93213729, 66.88656716,
        62.15684316, 50.51363636, 62.50177126, 92.24863174, 71.68776175,
        51.80345415, 72.3381295 ]),
 'solidity': array([0.97942387, 0.95842918, 0.97785738, 0.97964464, 0.98868513,
        0.98522167, 0.98727466, 0.99143389, 0.87171351, 0.98355843,
        0.96864718, 0.59945504, 0.98676654, 0.98510911, 0.98397436,
        0.97997616, 0.97887324])}
# The following line would fail.
# pd.DataFrame(table_gpu)

# We need to convert that table to numpy before we can pass it to pandas.
table = {item[0] : cp.asnumpy(item[1]) for item in table_gpu.items()}
pd.DataFrame(table)
mean_intensity area solidity
0 68.932773 238 0.979424
1 54.145380 5833 0.958429
2 68.236763 6006 0.977857
3 59.951752 5679 0.979645
4 87.221784 2534 0.988685
5 71.326667 4200 0.985222
6 74.375295 4655 0.987275
7 64.635612 2662 0.991434
8 58.932137 3846 0.871714
9 66.886567 3350 0.983558
10 62.156843 5005 0.968647
11 50.513636 2200 0.599455
12 62.501771 3952 0.986767
13 92.248632 3837 0.985109
14 71.687762 4298 0.983974
15 51.803454 4111 0.979976
16 72.338129 139 0.978873