How to process files in a folder

In this notebook we will program a loop that walks over a folder of images and calls a Python function to analyse the images one by one. This way, all images in that folder are processed using the same procedure.

import os
from skimage.io import imread
from matplotlib.pyplot import imshow, show
import numpy as np

For demonstration purposes, we reuse a folder of images showing banana slices acquired using magnetic resonance imaging (courtesy of Nasreddin Abolmaali, OncoRay, TU Dresden).

# define the location of the folder to go through
directory = '../../data/banana/'

# get a list of files in that folder
file_list = os.listdir(directory)

file_list
['banana0002.tif',
 'banana0003.tif',
 'banana0004.tif',
 'banana0005.tif',
 'banana0006.tif',
 'banana0007.tif',
 'banana0008.tif',
 'banana0009.tif',
 'banana0010.tif',
 'banana0011.tif',
 'banana0012.tif',
 'banana0013.tif',
 'banana0014.tif',
 'banana0015.tif',
 'banana0016.tif',
 'banana0017.tif',
 'banana0018.tif',
 'banana0019.tif',
 'banana0020.tif',
 'banana0021.tif',
 'banana0022.tif',
 'banana0023.tif',
 'banana0024.tif',
 'banana0025.tif',
 'banana0026.tif',
 'image_source.txt']
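One caveat: os.listdir returns entries in arbitrary order, so the tidy ordering shown here is not guaranteed on every system. The following is a minimal sketch, using a temporary folder and made-up file names (both are assumptions for illustration only), of how sorted makes the order reproducible:

```python
import os
import tempfile

# create a temporary folder with a few empty files (hypothetical names)
with tempfile.TemporaryDirectory() as demo_directory:
    for name in ["slice0003.tif", "slice0001.tif", "notes.txt", "slice0002.tif"]:
        open(os.path.join(demo_directory, name), "w").close()

    # os.listdir makes no promise about order; sorted() makes it alphabetical
    sorted_file_list = sorted(os.listdir(demo_directory))

print(sorted_file_list)
```

Because the slice numbers in the file names are zero-padded, alphabetical order and numerical order coincide.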

As the list shows, the folder does not contain only images. We can filter the list with a list comprehension, a compact form of a for-loop:

image_file_list = [file for file in file_list if file.endswith(".tif")]

image_file_list
['banana0002.tif',
 'banana0003.tif',
 'banana0004.tif',
 'banana0005.tif',
 'banana0006.tif',
 'banana0007.tif',
 'banana0008.tif',
 'banana0009.tif',
 'banana0010.tif',
 'banana0011.tif',
 'banana0012.tif',
 'banana0013.tif',
 'banana0014.tif',
 'banana0015.tif',
 'banana0016.tif',
 'banana0017.tif',
 'banana0018.tif',
 'banana0019.tif',
 'banana0020.tif',
 'banana0021.tif',
 'banana0022.tif',
 'banana0023.tif',
 'banana0024.tif',
 'banana0025.tif',
 'banana0026.tif']
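Note that endswith is case-sensitive, so files named with ".TIF" or ".tiff" would be skipped by the filter above. A possible variant, sketched here with hypothetical file names that are not part of the banana dataset, lower-cases the name and checks several extensions at once:

```python
# hypothetical file names, for illustration only
example_file_list = ["a.tif", "b.TIF", "c.tiff", "readme.txt", "d.png"]

# endswith accepts a tuple of suffixes; lower() makes the check case-insensitive
image_extensions = (".tif", ".tiff")
example_image_file_list = [
    file for file in example_file_list
    if file.lower().endswith(image_extensions)
]

print(example_image_file_list)
```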

Alternatively, we can write a longer for-loop and check for each file whether it is an image. This loop selects exactly the same files; it prints them instead of collecting them in a list.

# go through all files in the folder
for file in file_list:
    # if the filename is of a tif-image, print it out
    if file.endswith(".tif"):
        print(file)
banana0002.tif
banana0003.tif
banana0004.tif
banana0005.tif
banana0006.tif
banana0007.tif
banana0008.tif
banana0009.tif
banana0010.tif
banana0011.tif
banana0012.tif
banana0013.tif
banana0014.tif
banana0015.tif
banana0016.tif
banana0017.tif
banana0018.tif
banana0019.tif
banana0020.tif
banana0021.tif
banana0022.tif
banana0023.tif
banana0024.tif
banana0025.tif
banana0026.tif

As you can see above, image_file_list is a list of strings. Storing the file names in a list takes far less memory than storing the images themselves. It therefore makes sense to imread each image at the latest possible point in time, here inside the for-loop below. If you are interested in folder structures and how to specify directories, two related Jupyter notebooks cover these topics.
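The idea of loading as late as possible can also be sketched with a generator. In this sketch, iterate_loaded and fake_load are hypothetical names; fake_load stands in for imread so that the example runs without any image files:

```python
def iterate_loaded(file_names, load):
    """Yield (file name, loaded data) pairs one at a time."""
    for file_name in file_names:
        # the data is loaded only when the loop asks for the next item
        yield file_name, load(file_name)

# stand-in for imread that records which files were actually loaded
loaded_names = []

def fake_load(file_name):
    loaded_names.append(file_name)
    return "pixels of " + file_name

example_names = ["banana0002.tif", "banana0003.tif", "banana0004.tif"]
loader = iterate_loaded(example_names, fake_load)

# creating the generator has not loaded anything yet
print(loaded_names)

# asking for the first item loads exactly one file
first_name, first_data = next(loader)
print(loaded_names)
```

Only one image needs to be held in memory at a time, no matter how many files the folder contains.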

In order to show all images, we need to open them from the correct directory:

# go through all files in the folder
for image_file in image_file_list:
    image = imread(directory + image_file)
    imshow(image)
    show()
(Output: 25 figures, one for each image in the folder.)
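Joining the folder and the file name with + only works because directory ends with a slash. A more forgiving option is os.path.join, which inserts the separator where it is missing. The sketch below compares both approaches; the variable names are chosen for illustration only:

```python
import os

# the same folder, written with and without a trailing slash
directory_with_slash = "../../data/banana/"
directory_without_slash = "../../data/banana"
image_file = "banana0010.tif"

# plain concatenation silently produces a wrong path without the slash
broken_path = directory_without_slash + image_file
print(broken_path)

# os.path.join adds the separator only where it is missing
path_a = os.path.join(directory_with_slash, image_file)
path_b = os.path.join(directory_without_slash, image_file)
print(path_a)
print(path_b)
```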

Custom functions help us to keep code organized. For example, we can put image-analysis code in a function and then just call it:

def load_and_measure(filename):
    """
    This function opens an image and returns its mean intensity.
    """
    image = imread(filename)
    
    # return mean intensity in the image
    return np.mean(image)

# for testing
load_and_measure(directory + "banana0010.tif")
69.15106201171875

With such a custom function, we can also use a list comprehension to measure all images in a single line:

mean_intensities_of_all_images = [load_and_measure(directory + file) for file in image_file_list]
mean_intensities_of_all_images
[12.94198947482639,
 25.04678683810764,
 39.627543131510414,
 49.71319580078125,
 56.322109646267364,
 60.08679877387153,
 63.94538031684028,
 66.04618326822917,
 69.15106201171875,
 70.85603162977431,
 74.40909152560764,
 77.48423936631944,
 81.77360026041667,
 85.44072129991319,
 91.22532823350694,
 94.36199951171875,
 98.47229682074652,
 99.3980712890625,
 102.34300401475694,
 101.50947401258681,
 97.14067247178819,
 80.13118489583333,
 49.77497694227431,
 28.36090766059028,
 18.806070963541668]
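A list of numbers is easier to interpret when each value is stored next to the file it came from. As a minimal sketch, the first three file names and mean intensities from the outputs above can be paired with zip and written to a CSV file using Python's csv module. The file name "mean_intensities.csv", the column names, and the temporary output folder are assumptions made for this example:

```python
import csv
import os
import tempfile

# first three file names and mean intensities, copied from the outputs above
file_names = ["banana0002.tif", "banana0003.tif", "banana0004.tif"]
mean_intensities = [12.94198947482639, 25.04678683810764, 39.627543131510414]

# write to a temporary folder so the sketch does not touch the data folder
output_directory = tempfile.mkdtemp()
csv_path = os.path.join(output_directory, "mean_intensities.csv")

with open(csv_path, "w", newline="") as csv_file:
    writer = csv.writer(csv_file)
    writer.writerow(["file_name", "mean_intensity"])
    # zip pairs each file name with the measurement at the same position
    writer.writerows(zip(file_names, mean_intensities))

# read the file back to check what was written
with open(csv_path, newline="") as csv_file:
    rows = list(csv.reader(csv_file))
print(rows[:2])
```

Note that values read back from a CSV file are strings and need to be converted with float before further calculations.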

Exercise

Open all images of the banana dataset, segment them, and measure the centroid of the banana slice in each image. Collect the measurements in a table and write them to “banana.csv”.

Hint: Instead of the imshow command in the last example, execute your image processing workflow. Set up the image processing workflow first, e.g. in a custom function. Program the iteration over the files in the folder last, once the image processing works.