Tiled image file formats: zarr#

When working with big image data, special file formats such as the zarr format are commonly used. Zarr stores image data in chunks. Instead of loading a huge image data set from disk and then tiling it, it is possible to load individual zarr tiles, process them and save the result back to disc. In that way one can process big images without ever loading the big image into memory.

Using these formats brings additional challenges, for example re-saving the big image into small zarr-based tiles must happen on a computer that is capable of opening the big image to begin with. This notebook shows how to do this in a slightly unrealistic scenario: We’re loading the dataset first to resave it as tiles and by the end, we load these tiles from disk and visualize them. In a realistic scenario, these two steps would not be possible. Depending on the scenario, those two steps must be improvised. Napari for example can be used to visualize large datasets.

See also

import zarr
import dask.array as da
import numpy as np
from skimage.io import imread
import pyclesperanto_prototype as cle
from pyclesperanto_prototype import imshow
from numcodecs import Blosc

For demonstration purposes, we use a dataset that is provided by Theresa Suckert, OncoRay, University Hospital Carl Gustav Carus, TU Dresden. The dataset is licensed License: CC-BY 4.0. We are using a cropped version here that was resaved a 8-bit image to be able to provide it with the notebook. You find the full size 16-bit image in CZI file format online.

image = imread('../../data/P1_H_C3H_M004_17-cropped.tif')[1]

# for testing purposes, we crop the image even more.
# comment out the following line to run on the whole 5000x2000 pixels
image = image[1000:1500, 1000:1500]

(500, 500)

Saving as zarr#

We will now resaved our big image to the zarr file format.

#compress AND change the numpy array into a zarr array
compressor = Blosc(cname='zstd', clevel=3, shuffle=Blosc.BITSHUFFLE)

chunk_size = (100, 100)

zarray = zarr.array(image, chunks=chunk_size, compressor=compressor)
zarr_filename = '../../data/P1_H_C3H_M004_17-cropped.zarr'
zarr.convenience.save(zarr_filename, zarray)

You will then see that a folder is created with the given name. In that folder many files will be located. Each of these files correspond to an image tile.

Loading zarr#

Just for demonstration purposes, we will load the zarr backed tiled image and visualize it. When working with big data, this step might not be possible.

zarr_result = da.from_zarr(zarr_filename)
Array Chunk
Bytes 244.14 kiB 9.77 kiB
Shape (500, 500) (100, 100)
Count 26 Tasks 25 Chunks
Type uint8 numpy.ndarray
500 500
result = zarr_result.compute()