Bio-image Analysis Notebooks

Scaling

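When using machine learning algorithms to process data, the value range of each parameter (feature) is crucial: algorithms that rely on distances between data points, such as k-means clustering, give more weight to parameters with a larger range. To bring different parameters into a comparable range, scaling might be necessary.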
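To illustrate why the range matters, here is a minimal sketch with made-up measurements (the numbers and the feature names "area" and "ratio" are assumptions for illustration only). It compares how much each parameter contributes to the Euclidean distance between two objects.

```python
import numpy as np

# Hypothetical objects described by two parameters:
# an area in pixels (large range) and a ratio between 0 and 1 (small range).
object_a = np.array([1000.0, 0.1])
object_b = np.array([1010.0, 0.9])

# Squared difference per parameter: the Euclidean distance is the
# square root of the sum of these contributions.
contributions = (object_a - object_b) ** 2
distance = np.sqrt(contributions.sum())

print("Contribution of area: ", contributions[0])
print("Contribution of ratio:", contributions[1])
print("Euclidean distance:   ", distance)
```

The area differs by only 1 %, while the ratio spans almost its whole range, yet the distance is determined almost entirely by the area.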

See also

Standardization using scikit-learn

import matplotlib.pyplot as plt
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# local import; this library is located in the same folder as the notebook
from data_generator import generate_biomodal_2d_data

data1 = generate_biomodal_2d_data()

plt.scatter(data1[:, 0], data1[:, 1], c='grey')

<matplotlib.collections.PathCollection at 0x7f79e40aeca0>

../_images/f745af080311ea1d1d7171bfd879e36b2be9b194c18ea5cdd7943a5220597f72.png

data2 = generate_biomodal_2d_data()
data2[:, 1] = data2[:, 1] * 0.1

plt.scatter(data2[:, 0], data2[:, 1], c='grey')

<matplotlib.collections.PathCollection at 0x7f7980026b80>

../_images/b4de18f1594eeba63421e15c266c64fea436be6116e0e162adb120748dac4b30.png

Clustering data in different ranges

We now cluster the two apparently similar data sets using k-means clustering. To make sure we apply the same algorithm with the same configuration to both data sets, we encapsulate it in a function and reuse it. The effect shown below can also be observed with other algorithms.

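def classify_and_plot(data):
    # cluster the data points into two groups using k-means
    number_of_classes = 2
    classifier = KMeans(n_clusters=number_of_classes)
    classifier.fit(data)
    prediction = classifier.predict(data)

    # translate the cluster index of each data point into a color
    colors = ['orange', 'blue']
    predicted_colors = [colors[i] for i in prediction]

    # show the data points colored by the cluster they were assigned to
    plt.scatter(data[:, 0], data[:, 1], c=predicted_colors)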

When applying the same method to both data sets, we can observe that the data points in the center are assigned to different clusters. The only difference between the data sets is their data range: in the second data set, the values along the second axis were multiplied by 0.1.

classify_and_plot(data1)

../_images/cf8b67538674d91521d77f07004b6fc6594d5a4410f6377e6e35906319ba278b.png

classify_and_plot(data2)

../_images/2034f0d6e320695ba2d091dbea92ad3c9f71e2d06c14a9e2dc0421994753beed.png
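The difference between the two plots can also be expressed as a number. The following sketch does not use the notebook's `data_generator`; it assumes a synthetic stand-in data set generated with NumPy and compares the two clusterings using scikit-learn's `adjusted_rand_score`.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

# Synthetic stand-in data (an assumption, not the notebook's data set):
# two groups separated along y, each spread out along x.
rng = np.random.default_rng(0)
n = 200
group_a = np.column_stack([rng.normal(0, 1.5, n), rng.normal(-2, 0.5, n)])
group_b = np.column_stack([rng.normal(0, 1.5, n), rng.normal(2, 0.5, n)])
original = np.vstack([group_a, group_b])

# Same points, but with the second axis compressed by a factor of 10.
rescaled = original.copy()
rescaled[:, 1] = rescaled[:, 1] * 0.1

def cluster(data):
    # Fixed seed and n_init so that both runs use the same configuration.
    return KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(data)

labels_original = cluster(original)
labels_rescaled = cluster(rescaled)

# 1.0 means identical groupings; values near 0 mean unrelated groupings.
agreement = adjusted_rand_score(labels_original, labels_rescaled)
print(f"Agreement between the two clusterings: {agreement:.2f}")
```

In this sketch, compressing one axis makes k-means split the points along the other axis, so the agreement is expected to be low even though the underlying points are the same.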

Standard Scaling

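Standard scaling (standardization) transforms each parameter so that it has a mean of 0 and a standard deviation of 1. Note that the result is not confined to a fixed interval such as [0, 1]; that would be min-max scaling. Standard scaling makes it possible to obtain identical results for data sets that differ only in their ranges.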
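As a small check of what `StandardScaler` computes, the sketch below compares its output with the manual formula z = (x - mean) / standard deviation, applied per column. The random test data is an assumption made for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Random test data with two columns in very different ranges (illustrative only).
rng = np.random.default_rng(0)
data = rng.normal(loc=[5.0, 50.0], scale=[2.0, 20.0], size=(100, 2))

# Scaling with scikit-learn
scaled = StandardScaler().fit_transform(data)

# The same computation by hand: subtract the column mean and
# divide by the column standard deviation (NumPy's default, ddof=0).
manual = (data - data.mean(axis=0)) / data.std(axis=0)

print("Same result:", np.allclose(scaled, manual))
print("Column means:", scaled.mean(axis=0).round(6))
print("Column standard deviations:", scaled.std(axis=0).round(6))
print("Value range:", scaled.min().round(2), "to", scaled.max().round(2))
```

The printed value range shows that standardized values can be negative and larger than 1.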

def scale(data):
    scaler = StandardScaler().fit(data)
    return scaler.transform(data)

scaled_data1 = scale(data1)

classify_and_plot(scaled_data1)

../_images/ed826aad3000b07e84f342802452aa85391d31e09967961e5efc8dc2ae2a4869.png

scaled_data2 = scale(data2)

classify_and_plot(scaled_data2)

../_images/ed826aad3000b07e84f342802452aa85391d31e09967961e5efc8dc2ae2a4869.png
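That both plots look the same can be verified numerically. The sketch below again assumes synthetic data instead of the notebook's `data_generator`: if two data sets differ only by a constant positive factor along one axis, their standardized versions should be equal up to floating-point precision.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic data (illustrative assumption) and a copy with a compressed second axis.
rng = np.random.default_rng(42)
original = rng.normal(size=(200, 2))
rescaled = original.copy()
rescaled[:, 1] = rescaled[:, 1] * 0.1

# Standardize each data set independently, as the scale() function does.
scaled_original = StandardScaler().fit_transform(original)
scaled_rescaled = StandardScaler().fit_transform(rescaled)

print("Identical after scaling:", np.allclose(scaled_original, scaled_rescaled))
```

Because the standardized data sets are the same, any algorithm applied with the same configuration and random seed receives the same input in both cases.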

By Robert Haase, Guillaume Witz, Miguel Fernandes, Marcelo Leomil Zoccoler, Shannon Taylor, Mara Lampert, Till Korten, Markus Ankenbrand & add-your-name-here-by-sending-a-pull-request-containing-a-notebook

Copyright: Licensed CC-BY 4.0 and BSD3 unless mentioned otherwise. Contribution and feedback welcome.