Exploring the BioImage Archive#

In this notebook we use the bia-explorer project to explore the S-BIAD634 dataset in the Bio-image Archive. We will download some images and store them in a local directory.

from bia_explorer import io, biostudies
from skimage.io import imread, imsave
from IPython.display import display, Markdown
import stackview

Accessing meta-data#

First we access the meta data of the dataets. Here we can for example see what the data can be used for and under which license it can be used.

accession = 'S-BIAD634'
submission = biostudies.load_submission(accession)

for a in submission.section.attributes:
    name = a.name
    short_value = str(a.value)
    
    print(f"{name} : {short_value}")
Title : An annotated fluorescence image dataset for training nuclear segmentation methods
Description : This dataset contains annotated fluorescent nuclear images of normal or cancer cells from different tissue origins and sample preparation types, and can be used to train machine-learning based nuclear image segmentation algorithms. It consists of 79 expert-annotated fluorescence images of immuno and DAPI stained samples containing 7813 nuclei in total. In addition, the dataset is heterogenous in aspects such as type of preparation, imaging modality, magnification, signal-to-noise ratio and other technical aspects. Relevant parameters, e.g. diagnosis, magnification, signal-to-noise ratio and modality with respect to the type of preparation are provided in the file list. The images are derived from one Schwann cell stroma-rich tissue (from a ganglioneuroblastoma) cryosection (10 images/2773 nuclei), seven neuroblastoma (NB) patients (19 images/931 nuclei), one Wilms patient (1 image/102 nuclei), two NB cell lines (CLB-Ma, STA-NB10) (8 images/1785 nuclei) and a human keratinocyte cell line (HaCaT) (41 images/2222 nuclei).
Keywords : AI
Keywords : segmentation
Keywords : nucleus
Keywords : fluorescence
License : CC0
Funding statement : This work was facilitated by an EraSME grant (project TisQuant) under the grant no. 844198 and by a COIN grant (project VISIOMICS) under the grant no. 861750, both grants kindly provided by the Austrian Research Promotion Agency (FFG), and the St. Anna Kinderkrebsforschung. Partial funding was further provided by BMK, BMDW, and the Province of Upper Austria in the frame of the COMET Programme managed by FFG.

We can also see how many images are in the dataset.

study = io.load_bia_study(accession)

len(study.images)
388

Visualizing images#

A single image can be loaded and shown like this (See also).

image = study.images[0]
image
BIAImage(uri='https://www.ebi.ac.uk/biostudies/files/S-BIAD634/dataset\\groundtruth\\Ganglioneuroblastoma_0.tif', size=2239668, fpath=WindowsPath('dataset/groundtruth/Ganglioneuroblastoma_0.tif'))
uri = image.uri.replace("\\", "/")
image_data = imread(uri)
stackview.insight(image_data)
shape(914, 1225)
dtypeuint16
size2.1 MB
min0
max300

To get an idea about the folder structure within the datasets, we can print out paths on the server.

# print out filenames of some images
for image in study.images[:5] + study.images[-5:]:
    print(str(image.fpath))
dataset\groundtruth\Ganglioneuroblastoma_0.tif
dataset\groundtruth\Ganglioneuroblastoma_1.tif
dataset\groundtruth\Ganglioneuroblastoma_10.tif
dataset\groundtruth\Ganglioneuroblastoma_2.tif
dataset\groundtruth\Ganglioneuroblastoma_3.tif
dataset\rawimages\otherspecimen_5.tif
dataset\rawimages\otherspecimen_6.tif
dataset\rawimages\otherspecimen_7.tif
dataset\rawimages\otherspecimen_8.tif
dataset\rawimages\otherspecimen_9.tif

Downloading data#

Before we download selected images, we need to ensure that the folder exists where we want to store the data.

import os

def ensure_folder_exists(folder_path):
    if not os.path.exists(folder_path):
        os.makedirs(folder_path)

base_folder = f"../../data/{accession}"
raw_folder = f"../../data/{accession}/images"
groundtruth_folder = f"../../data/{accession}/groundtruth"

ensure_folder_exists(base_folder)
ensure_folder_exists(raw_folder)
ensure_folder_exists(groundtruth_folder)

Next we download all raw images, and ground-truth annotations of all datasets containing “Ganglioneuroblastoma” in their name. We also only download the files in case they are not downloaded yet.

for image in study.images:
    if "Ganglioneuroblastoma" in str(image.fpath):
        uri = image.uri.replace("\\", "/")
        filename = uri.split("/")[-1]
        if "\\rawimages\\" in str(image.fpath):
            target_file = raw_folder + "/" + filename
            if not os.path.exists(target_file):
                image_data = imread(uri)
                imsave(target_file, image_data)
        if "\\groundtruth\\" in str(image.fpath):
            target_file = groundtruth_folder + "/" + filename
            if not os.path.exists(target_file):
                image_data = imread(uri)
                imsave(target_file, image_data)
C:\Users\haase\AppData\Local\Temp\ipykernel_26092\2091960599.py:14: UserWarning: ../../data/S-BIAD634/groundtruth/Ganglioneuroblastoma_0.tif is a low contrast image
  imsave(target_file, image_data)
C:\Users\haase\AppData\Local\Temp\ipykernel_26092\2091960599.py:14: UserWarning: ../../data/S-BIAD634/groundtruth/Ganglioneuroblastoma_1.tif is a low contrast image
  imsave(target_file, image_data)
C:\Users\haase\AppData\Local\Temp\ipykernel_26092\2091960599.py:14: UserWarning: ../../data/S-BIAD634/groundtruth/Ganglioneuroblastoma_10.tif is a low contrast image
  imsave(target_file, image_data)
C:\Users\haase\AppData\Local\Temp\ipykernel_26092\2091960599.py:14: UserWarning: ../../data/S-BIAD634/groundtruth/Ganglioneuroblastoma_2.tif is a low contrast image
  imsave(target_file, image_data)
C:\Users\haase\AppData\Local\Temp\ipykernel_26092\2091960599.py:14: UserWarning: ../../data/S-BIAD634/groundtruth/Ganglioneuroblastoma_3.tif is a low contrast image
  imsave(target_file, image_data)
C:\Users\haase\AppData\Local\Temp\ipykernel_26092\2091960599.py:14: UserWarning: ../../data/S-BIAD634/groundtruth/Ganglioneuroblastoma_4.tif is a low contrast image
  imsave(target_file, image_data)
C:\Users\haase\AppData\Local\Temp\ipykernel_26092\2091960599.py:14: UserWarning: ../../data/S-BIAD634/groundtruth/Ganglioneuroblastoma_6.tif is a low contrast image
  imsave(target_file, image_data)
C:\Users\haase\AppData\Local\Temp\ipykernel_26092\2091960599.py:14: UserWarning: ../../data/S-BIAD634/groundtruth/Ganglioneuroblastoma_7.tif is a low contrast image
  imsave(target_file, image_data)
C:\Users\haase\AppData\Local\Temp\ipykernel_26092\2091960599.py:14: UserWarning: ../../data/S-BIAD634/groundtruth/Ganglioneuroblastoma_8.tif is a low contrast image
  imsave(target_file, image_data)
C:\Users\haase\AppData\Local\Temp\ipykernel_26092\2091960599.py:14: UserWarning: ../../data/S-BIAD634/groundtruth/Ganglioneuroblastoma_9.tif is a low contrast image
  imsave(target_file, image_data)

We can then check which files arrived.

for f in os.listdir(raw_folder):
    print(f)
Ganglioneuroblastoma_0.tif
Ganglioneuroblastoma_1.tif
Ganglioneuroblastoma_10.tif
Ganglioneuroblastoma_2.tif
Ganglioneuroblastoma_3.tif
Ganglioneuroblastoma_4.tif
Ganglioneuroblastoma_6.tif
Ganglioneuroblastoma_7.tif
Ganglioneuroblastoma_8.tif
Ganglioneuroblastoma_9.tif
for f in os.listdir(groundtruth_folder):
    print(f)
Ganglioneuroblastoma_0.tif
Ganglioneuroblastoma_1.tif
Ganglioneuroblastoma_10.tif
Ganglioneuroblastoma_2.tif
Ganglioneuroblastoma_3.tif
Ganglioneuroblastoma_4.tif
Ganglioneuroblastoma_6.tif
Ganglioneuroblastoma_7.tif
Ganglioneuroblastoma_8.tif
Ganglioneuroblastoma_9.tif

Exercise#

Download all images with “Neuroblastoma” in their name and upload them to a folder in the owncloud. Do not download and upload files which already exist.