Tiled image file formats: zarr

When working with big image data, special file formats such as the zarr format are commonly used. Zarr stores image data in chunks, also called tiles. Instead of loading a huge image data set from disk and then tiling it, it is possible to load individual zarr tiles, process them and save the result back to disk. That way one can process big images without ever loading the whole image into memory.

Using these formats brings additional challenges. For example, re-saving the big image as small zarr-based tiles must happen on a computer that is capable of opening the big image to begin with. This notebook shows how to do this in a slightly unrealistic scenario: we load the whole dataset first to resave it as tiles, and at the end we load these tiles from disk and visualize them. In a realistic big-data scenario, neither of these two steps would be possible as shown here, and both would have to be adapted to the situation at hand.

import zarr
import dask.array as da
import numpy as np
from skimage.io import imread, imshow
from numcodecs import Blosc

For demonstration purposes, we use a dataset that is provided by Theresa Suckert, OncoRay, University Hospital Carl Gustav Carus, TU Dresden. The dataset is licensed under CC-BY 4.0. We are using a cropped version here that was resaved as an 8-bit image so that it can be provided with the notebook. The full-size 16-bit image is available online in CZI file format.

image = imread('../../data/P1_H_C3H_M004_17-cropped.tif')[1]

# for testing purposes, we crop the image even more.
# comment out the following line to run on the whole 5000x2000 pixels
image = image[1000:1500, 1000:1500]

image.shape
(500, 500)
imshow(image)
<matplotlib.image.AxesImage at 0x2ca194abf70>
[Figure: the cropped 500 x 500 pixel image displayed with imshow]

Saving as zarr

We will now resave our big image in the zarr file format.

# set up compression, then convert the numpy array into a chunked zarr array
compressor = Blosc(cname='zstd', clevel=3, shuffle=Blosc.BITSHUFFLE)

chunk_size = (100, 100)

zarray = zarr.array(image, chunks=chunk_size, compressor=compressor)
zarr_filename = '../../data/P1_H_C3H_M004_17-cropped.zarr'
zarr.convenience.save(zarr_filename, zarray)

After running this, a folder with the given name has been created. It contains many files, each of which corresponds to one image tile.
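How many tiles to expect follows from the image shape and the chunk size: along each axis, the number of chunks is the axis length divided by the chunk length, rounded up. The helper below is a small sketch of that arithmetic; `count_tiles` is a hypothetical name introduced here, not a zarr function. The folder may also hold metadata in addition to the tile files.

```python
import math


def count_tiles(shape, chunks):
    """Number of chunks needed to cover an array of the given shape."""
    return math.prod(math.ceil(s / c) for s, c in zip(shape, chunks))


# The cropped 500 x 500 image with 100 x 100 chunks
print(count_tiles((500, 500), (100, 100)))    # 25

# The whole 5000 x 2000 image with the same chunk size
print(count_tiles((5000, 2000), (100, 100)))  # 1000
```

If an axis length is not a multiple of the chunk length, the last tile along that axis is only partly filled, which is why the division is rounded up.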

Loading zarr

Just for demonstration purposes, we will load the zarr-backed tiled image and visualize it. When working with big data, this step might not be possible because the whole image has to fit into memory.

zarr_result = da.from_zarr(zarr_filename)
zarr_result
              Array         Chunk
Bytes         244.14 kiB    9.77 kiB
Shape         (500, 500)    (100, 100)
Dask graph    25 chunks in 2 graph layers
Data type     uint8 numpy.ndarray
result = zarr_result.compute()

imshow(result)
<matplotlib.image.AxesImage at 0x2ca19586b80>
[Figure: the image reloaded from the zarr folder, displayed with imshow]