Plotting Distributions with Seaborn
Seaborn also makes it easy to plot data distributions as boxplots, bar graphs, histograms and kernel density estimation plots.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
We start by loading a table of measurements into a pandas DataFrame.
df = pd.read_csv("../../data/BBBC007_analysis.csv")
df
| | area | intensity_mean | major_axis_length | minor_axis_length | aspect_ratio | file_name |
---|---|---|---|---|---|---|
0 | 139 | 96.546763 | 17.504104 | 10.292770 | 1.700621 | 20P1_POS0010_D_1UL |
1 | 360 | 86.613889 | 35.746808 | 14.983124 | 2.385805 | 20P1_POS0010_D_1UL |
2 | 43 | 91.488372 | 12.967884 | 4.351573 | 2.980045 | 20P1_POS0010_D_1UL |
3 | 140 | 73.742857 | 18.940508 | 10.314404 | 1.836316 | 20P1_POS0010_D_1UL |
4 | 144 | 89.375000 | 13.639308 | 13.458532 | 1.013432 | 20P1_POS0010_D_1UL |
... | ... | ... | ... | ... | ... | ... |
106 | 305 | 88.252459 | 20.226532 | 19.244210 | 1.051045 | 20P1_POS0007_D_1UL |
107 | 593 | 89.905565 | 36.508370 | 21.365394 | 1.708762 | 20P1_POS0007_D_1UL |
108 | 289 | 106.851211 | 20.427809 | 18.221452 | 1.121086 | 20P1_POS0007_D_1UL |
109 | 277 | 100.664260 | 20.307965 | 17.432920 | 1.164920 | 20P1_POS0007_D_1UL |
110 | 46 | 70.869565 | 11.648895 | 5.298003 | 2.198733 | 20P1_POS0007_D_1UL |
111 rows × 6 columns
Boxplots
The axes-level function for plotting boxplots is boxplot.
Seaborn automatically identifies file_name as a categorical variable and intensity_mean as a numerical variable, so it draws one boxplot of the intensity values per file. If we swap x and y, we get the same graph, but with vertical boxplots.
sns.boxplot(data=df, x="intensity_mean", y="file_name")
<AxesSubplot: xlabel='intensity_mean', ylabel='file_name'>
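To illustrate the swapped orientation mentioned above, here is a minimal sketch. It uses a small made-up table instead of the CSV file (the names file_A and file_B and all values are hypothetical), so it can run on its own.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up stand-in for the measurement table: two files, 20 objects each.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "file_name": ["file_A"] * 20 + ["file_B"] * 20,
    "intensity_mean": np.concatenate(
        [rng.normal(90, 8, 20), rng.normal(100, 8, 20)]
    ),
})

# Categorical variable on x, numerical variable on y: vertical boxplots.
ax = sns.boxplot(data=demo, x="file_name", y="intensity_mean")
```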
The figure-level, more general version of this kind of plot is catplot. We just have to pass kind="box".
sns.catplot(data=df, x="intensity_mean", y="file_name", kind="box")
<seaborn.axisgrid.FacetGrid at 0x27775d754f0>
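The practical difference between the two levels is the object you get back. The sketch below, again on a made-up table with hypothetical file names, draws the axes-level plot into an existing matplotlib Axes and then customizes the figure-level plot through the returned FacetGrid.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up stand-in for the measurement table (hypothetical values).
rng = np.random.default_rng(1)
demo = pd.DataFrame({
    "file_name": ["file_A"] * 20 + ["file_B"] * 20,
    "intensity_mean": np.concatenate(
        [rng.normal(90, 8, 20), rng.normal(100, 8, 20)]
    ),
})

# Axes-level: draws into an Axes that we create and control ourselves.
fig, ax = plt.subplots()
sns.boxplot(data=demo, x="intensity_mean", y="file_name", ax=ax)
ax.set_title("Axes-level boxplot")

# Figure-level: creates its own figure and returns a FacetGrid.
g = sns.catplot(data=demo, x="intensity_mean", y="file_name", kind="box")
g.set_axis_labels("Mean intensity", "File")
```

Axes-level functions fit well into figures you lay out yourself, while figure-level functions manage the whole figure for you.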
There are other kinds available, like a bar graph, which by default shows the mean of each group with an error bar.
sns.catplot(data=df, x="file_name", y="intensity_mean", kind="bar")
<seaborn.axisgrid.FacetGrid at 0x2777b1abb80>
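To see what the bars represent, one can compare their heights with group means computed in pandas. This is a sketch on made-up data (file names and values are hypothetical). It relies on seaborn's default estimator for bar plots, which is the mean.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up stand-in for the measurement table (hypothetical values).
rng = np.random.default_rng(2)
demo = pd.DataFrame({
    "file_name": ["file_A"] * 20 + ["file_B"] * 20,
    "intensity_mean": np.concatenate(
        [rng.normal(90, 8, 20), rng.normal(100, 8, 20)]
    ),
})

# Mean intensity per file, computed directly with pandas.
means = demo.groupby("file_name")["intensity_mean"].mean()

# The bar plot aggregates each group with the same statistic by default.
g = sns.catplot(data=demo, x="file_name", y="intensity_mean", kind="bar")
bar_heights = [patch.get_height() for patch in g.ax.patches]

print(means)
print(bar_heights)
```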
Histograms and Distribution Plots
The axes-level function for plotting histograms is histplot.
sns.histplot(data=df, x="intensity_mean", hue="file_name")
<AxesSubplot: xlabel='intensity_mean', ylabel='Count'>
We can instead plot the kernel density estimate (kde) with the kdeplot function. Just be careful when interpreting these plots (the seaborn distributions tutorial discusses some pitfalls of kernel density estimation).
sns.kdeplot(data=df, x="intensity_mean", hue="file_name")
<AxesSubplot: xlabel='intensity_mean', ylabel='Density'>
The figure-level function for distributions is displot (not to be confused with the older, deprecated distplot). With it, you can have histograms and kde in the same plot, or other kinds of plots, like the empirical cumulative distribution function (ecdf).
sns.displot(data=df, x="intensity_mean", hue="file_name", kde=True)
<seaborn.axisgrid.FacetGrid at 0x2777b77c910>
Exercise
Plot two empirical cumulative distribution functions for ‘area’ from different files on the same graph with different colors.
Repeat this for the property ‘intensity_mean’ in a second figure. Infer whether you would expect these properties to differ between the files.
Hint: look for the kind parameter of displot.