{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "85b63768-dc58-4f62-8297-1a90c26f5c36",
   "metadata": {},
   "source": [
    "## Summarizing measurements\n",
    "\n",
    "From an SQL-accessible database containing information about experiments, images acquired during the experiments and corresponding measurements, we would like to extract summary statistics."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "f9f790d9-22b3-4160-9785-75aa4c661c1d",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "from pandasql import sqldf"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "805125bc-d88c-4a4b-9a92-001e2ff098e6",
   "metadata": {},
   "outputs": [],
   "source": [
    "experiment = pd.read_csv(\"../../data/experiments.csv\")\n",
    "image = pd.read_csv(\"../../data/images.csv\")\n",
    "overview = pd.read_csv(\"../../data/overview.csv\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "8580ae00-7a1d-4239-b9cc-946bfc9bf705",
   "metadata": {},
   "outputs": [],
   "source": [
    "query_database = lambda q: sqldf(q, globals())"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ba23795e-9f8e-4e08-a719-4e708514443b",
   "metadata": {},
   "source": [
    "## Example data\n",
    "\n",
    "Just a reminder, our overview table contains the key information retrieved from our database about the number of cells counted in images which were acquired during experiments."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "8db92849-9040-4b7b-8b9d-5c19c0dc08b3",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Unnamed: 0</th>\n",
       "      <th>experiment_name</th>\n",
       "      <th>Filename</th>\n",
       "      <th>time_after_exposure_in_s</th>\n",
       "      <th>number_of_cells</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>High dose</td>\n",
       "      <td>image0.tif</td>\n",
       "      <td>0</td>\n",
       "      <td>96</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>High dose</td>\n",
       "      <td>image0.tif</td>\n",
       "      <td>1</td>\n",
       "      <td>47</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2</td>\n",
       "      <td>High dose</td>\n",
       "      <td>image0.tif</td>\n",
       "      <td>2</td>\n",
       "      <td>32</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>3</td>\n",
       "      <td>High dose</td>\n",
       "      <td>image0.tif</td>\n",
       "      <td>3</td>\n",
       "      <td>24</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>4</td>\n",
       "      <td>High dose</td>\n",
       "      <td>image0.tif</td>\n",
       "      <td>4</td>\n",
       "      <td>19</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>95</th>\n",
       "      <td>95</td>\n",
       "      <td>Control</td>\n",
       "      <td>image9.tif</td>\n",
       "      <td>5</td>\n",
       "      <td>142</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>96</th>\n",
       "      <td>96</td>\n",
       "      <td>Control</td>\n",
       "      <td>image9.tif</td>\n",
       "      <td>6</td>\n",
       "      <td>148</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>97</th>\n",
       "      <td>97</td>\n",
       "      <td>Control</td>\n",
       "      <td>image9.tif</td>\n",
       "      <td>7</td>\n",
       "      <td>144</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>98</th>\n",
       "      <td>98</td>\n",
       "      <td>Control</td>\n",
       "      <td>image9.tif</td>\n",
       "      <td>8</td>\n",
       "      <td>141</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>99</th>\n",
       "      <td>99</td>\n",
       "      <td>Control</td>\n",
       "      <td>image9.tif</td>\n",
       "      <td>9</td>\n",
       "      <td>140</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>100 rows × 5 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "    Unnamed: 0 experiment_name    Filename  time_after_exposure_in_s  \\\n",
       "0            0       High dose  image0.tif                         0   \n",
       "1            1       High dose  image0.tif                         1   \n",
       "2            2       High dose  image0.tif                         2   \n",
       "3            3       High dose  image0.tif                         3   \n",
       "4            4       High dose  image0.tif                         4   \n",
       "..         ...             ...         ...                       ...   \n",
       "95          95         Control  image9.tif                         5   \n",
       "96          96         Control  image9.tif                         6   \n",
       "97          97         Control  image9.tif                         7   \n",
       "98          98         Control  image9.tif                         8   \n",
       "99          99         Control  image9.tif                         9   \n",
       "\n",
       "    number_of_cells  \n",
       "0                96  \n",
       "1                47  \n",
       "2                32  \n",
       "3                24  \n",
       "4                19  \n",
       "..              ...  \n",
       "95              142  \n",
       "96              148  \n",
       "97              144  \n",
       "98              141  \n",
       "99              140  \n",
       "\n",
       "[100 rows x 5 columns]"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "query = \"\"\"\n",
    "SELECT *\n",
    "FROM overview\n",
    "\"\"\"\n",
    "\n",
    "query_database(query)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3c3e5fa2-5aa8-4d59-b137-5317e9054bc5",
   "metadata": {},
   "source": [
    "## Grouping rows\n",
    "The `GROUP BY` statement allows to group long tables to shorter ones and it allows us to do descriptive statistics. We can for example determine the smallest and largest number of cells observed during an experiment using the so called aggregators `MIN()` and `MAX()`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "57f5aa4c-31c0-4a5d-8605-b23b0c62679e",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>experiment_name</th>\n",
       "      <th>MIN(number_of_cells)</th>\n",
       "      <th>MAX(number_of_cells)</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Control</td>\n",
       "      <td>113</td>\n",
       "      <td>159</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>High dose</td>\n",
       "      <td>9</td>\n",
       "      <td>151</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Medium dose</td>\n",
       "      <td>7</td>\n",
       "      <td>158</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  experiment_name  MIN(number_of_cells)  MAX(number_of_cells)\n",
       "0         Control                   113                   159\n",
       "1       High dose                     9                   151\n",
       "2     Medium dose                     7                   158"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "query = \"\"\"\n",
    "SELECT experiment_name, MIN(number_of_cells), MAX(number_of_cells)\n",
    "FROM overview\n",
    "GROUP BY experiment_name\n",
    "\"\"\"\n",
    "\n",
    "query_database(query)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0949d398-13a4-4835-b326-5ecd4ba80a6b",
   "metadata": {},
   "source": [
    "Groups can also be formed using multiple columns. Here we also see that it is common to specify columns used for grouping twice, once after `GROUP BY` and once after `SELECT` because we want to see the grouping in the table."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "7e856deb-b824-4a8f-b0cd-4daa1a4a9824",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>experiment_name</th>\n",
       "      <th>Filename</th>\n",
       "      <th>MIN(number_of_cells)</th>\n",
       "      <th>MAX(number_of_cells)</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Control</td>\n",
       "      <td>image6.tif</td>\n",
       "      <td>147</td>\n",
       "      <td>156</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Control</td>\n",
       "      <td>image7.tif</td>\n",
       "      <td>113</td>\n",
       "      <td>122</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Control</td>\n",
       "      <td>image8.tif</td>\n",
       "      <td>145</td>\n",
       "      <td>159</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Control</td>\n",
       "      <td>image9.tif</td>\n",
       "      <td>140</td>\n",
       "      <td>149</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>High dose</td>\n",
       "      <td>image0.tif</td>\n",
       "      <td>9</td>\n",
       "      <td>96</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>High dose</td>\n",
       "      <td>image1.tif</td>\n",
       "      <td>15</td>\n",
       "      <td>151</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>High dose</td>\n",
       "      <td>image2.tif</td>\n",
       "      <td>9</td>\n",
       "      <td>97</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Medium dose</td>\n",
       "      <td>image3.tif</td>\n",
       "      <td>15</td>\n",
       "      <td>158</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>Medium dose</td>\n",
       "      <td>image4.tif</td>\n",
       "      <td>7</td>\n",
       "      <td>76</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>Medium dose</td>\n",
       "      <td>image5.tif</td>\n",
       "      <td>14</td>\n",
       "      <td>147</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  experiment_name    Filename  MIN(number_of_cells)  MAX(number_of_cells)\n",
       "0         Control  image6.tif                   147                   156\n",
       "1         Control  image7.tif                   113                   122\n",
       "2         Control  image8.tif                   145                   159\n",
       "3         Control  image9.tif                   140                   149\n",
       "4       High dose  image0.tif                     9                    96\n",
       "5       High dose  image1.tif                    15                   151\n",
       "6       High dose  image2.tif                     9                    97\n",
       "7     Medium dose  image3.tif                    15                   158\n",
       "8     Medium dose  image4.tif                     7                    76\n",
       "9     Medium dose  image5.tif                    14                   147"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "query = \"\"\"\n",
    "SELECT experiment_name, Filename, MIN(number_of_cells), MAX(number_of_cells)\n",
    "FROM overview\n",
    "GROUP BY experiment_name, Filename\n",
    "\"\"\"\n",
    "\n",
    "query_database(query)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b07b09a7-0f7e-4198-a971-6dd8c43d8d5b",
   "metadata": {},
   "source": [
    "A similar strategy uses the `COUNT` aggregator to count the number of images acquired per experiment."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "c592c42b-e90a-4318-9407-128d6f4d9055",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>experiment_name</th>\n",
       "      <th>COUNT(Filename)</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Control</td>\n",
       "      <td>40</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>High dose</td>\n",
       "      <td>30</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Medium dose</td>\n",
       "      <td>30</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  experiment_name  COUNT(Filename)\n",
       "0         Control               40\n",
       "1       High dose               30\n",
       "2     Medium dose               30"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "query = \"\"\"\n",
    "SELECT experiment_name, COUNT(Filename)\n",
    "FROM overview\n",
    "GROUP BY experiment_name\n",
    "\"\"\"\n",
    "\n",
    "query_database(query)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b4c24928-ffee-4c62-898c-a913041fe0f2",
   "metadata": {},
   "source": [
    "## Combining statistics\n",
    "\n",
    "Assume the measurements in our database result from a timelapse analysis. A quick look into a single image file and corresponding measurements shows that the number of cells is decreasing over time."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "7a017ff6-1cf3-45ef-9b66-09c5a07679da",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Unnamed: 0</th>\n",
       "      <th>experiment_name</th>\n",
       "      <th>Filename</th>\n",
       "      <th>time_after_exposure_in_s</th>\n",
       "      <th>number_of_cells</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>High dose</td>\n",
       "      <td>image0.tif</td>\n",
       "      <td>0</td>\n",
       "      <td>96</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>High dose</td>\n",
       "      <td>image0.tif</td>\n",
       "      <td>1</td>\n",
       "      <td>47</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2</td>\n",
       "      <td>High dose</td>\n",
       "      <td>image0.tif</td>\n",
       "      <td>2</td>\n",
       "      <td>32</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>3</td>\n",
       "      <td>High dose</td>\n",
       "      <td>image0.tif</td>\n",
       "      <td>3</td>\n",
       "      <td>24</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>4</td>\n",
       "      <td>High dose</td>\n",
       "      <td>image0.tif</td>\n",
       "      <td>4</td>\n",
       "      <td>19</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>5</td>\n",
       "      <td>High dose</td>\n",
       "      <td>image0.tif</td>\n",
       "      <td>5</td>\n",
       "      <td>16</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>6</td>\n",
       "      <td>High dose</td>\n",
       "      <td>image0.tif</td>\n",
       "      <td>6</td>\n",
       "      <td>13</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>7</td>\n",
       "      <td>High dose</td>\n",
       "      <td>image0.tif</td>\n",
       "      <td>7</td>\n",
       "      <td>11</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>8</td>\n",
       "      <td>High dose</td>\n",
       "      <td>image0.tif</td>\n",
       "      <td>8</td>\n",
       "      <td>10</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>9</td>\n",
       "      <td>High dose</td>\n",
       "      <td>image0.tif</td>\n",
       "      <td>9</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Unnamed: 0 experiment_name    Filename  time_after_exposure_in_s  \\\n",
       "0           0       High dose  image0.tif                         0   \n",
       "1           1       High dose  image0.tif                         1   \n",
       "2           2       High dose  image0.tif                         2   \n",
       "3           3       High dose  image0.tif                         3   \n",
       "4           4       High dose  image0.tif                         4   \n",
       "5           5       High dose  image0.tif                         5   \n",
       "6           6       High dose  image0.tif                         6   \n",
       "7           7       High dose  image0.tif                         7   \n",
       "8           8       High dose  image0.tif                         8   \n",
       "9           9       High dose  image0.tif                         9   \n",
       "\n",
       "   number_of_cells  \n",
       "0               96  \n",
       "1               47  \n",
       "2               32  \n",
       "3               24  \n",
       "4               19  \n",
       "5               16  \n",
       "6               13  \n",
       "7               11  \n",
       "8               10  \n",
       "9                9  "
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "query = \"\"\"\n",
    "SELECT *\n",
    "FROM overview\n",
    "WHERE Filename = \"image0.tif\"\n",
    "\"\"\"\n",
    "\n",
    "query_database(query)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "beb4fa47-1bd5-498e-b69d-832a143fb77d",
   "metadata": {},
   "source": [
    "We can now determine the average number of cells at the beginning of the experiment by averaging the first 3 timepoints."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "c8193b5c-485c-4a8b-a4ca-6a572c860237",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Filename</th>\n",
       "      <th>number_of_cells</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>image0.tif</td>\n",
       "      <td>58.333333</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>image1.tif</td>\n",
       "      <td>93.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>image2.tif</td>\n",
       "      <td>58.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>image3.tif</td>\n",
       "      <td>97.333333</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>image4.tif</td>\n",
       "      <td>46.333333</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>image5.tif</td>\n",
       "      <td>89.666667</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>image6.tif</td>\n",
       "      <td>150.666667</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>image7.tif</td>\n",
       "      <td>113.666667</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>image8.tif</td>\n",
       "      <td>151.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>image9.tif</td>\n",
       "      <td>147.666667</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     Filename  number_of_cells\n",
       "0  image0.tif        58.333333\n",
       "1  image1.tif        93.000000\n",
       "2  image2.tif        58.000000\n",
       "3  image3.tif        97.333333\n",
       "4  image4.tif        46.333333\n",
       "5  image5.tif        89.666667\n",
       "6  image6.tif       150.666667\n",
       "7  image7.tif       113.666667\n",
       "8  image8.tif       151.000000\n",
       "9  image9.tif       147.666667"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "query = \"\"\"\n",
    "SELECT Filename, AVG(number_of_cells) as number_of_cells\n",
    "FROM overview\n",
    "WHERE time_after_exposure_in_s < 3\n",
    "GROUP BY Filename\n",
    "\"\"\"\n",
    "\n",
    "query_database(query)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "2805d88b-7b2b-48bf-9d79-3f4896199990",
   "metadata": {},
   "outputs": [],
   "source": [
    "query = \"\"\"\n",
    "SELECT Filename, AVG(number_of_cells) as number_of_cells\n",
    "FROM overview\n",
    "WHERE time_after_exposure_in_s < 3\n",
    "GROUP BY Filename\n",
    "\"\"\"\n",
    "cell_count_early = query_database(query)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "86b4e49f-a9ec-4fb7-9ce1-0a43cc40feb2",
   "metadata": {},
   "outputs": [],
   "source": [
    "query = \"\"\" \n",
    "SELECT Filename, AVG(number_of_cells) as number_of_cells\n",
    "FROM overview\n",
    "WHERE time_after_exposure_in_s > 6\n",
    "GROUP BY Filename;\n",
    "\"\"\"\n",
    "cell_count_late = query_database(query)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "ae17ddb7-4fa8-4eef-a79a-ff3562e989c2",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Filename</th>\n",
       "      <th>cell_count_ratio</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>image0.tif</td>\n",
       "      <td>0.171429</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>image1.tif</td>\n",
       "      <td>0.182796</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>image2.tif</td>\n",
       "      <td>0.172414</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>image3.tif</td>\n",
       "      <td>0.171233</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>image4.tif</td>\n",
       "      <td>0.172662</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>image5.tif</td>\n",
       "      <td>0.174721</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>image6.tif</td>\n",
       "      <td>0.986726</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>image7.tif</td>\n",
       "      <td>1.055718</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>image8.tif</td>\n",
       "      <td>0.991170</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>image9.tif</td>\n",
       "      <td>0.959368</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     Filename  cell_count_ratio\n",
       "0  image0.tif          0.171429\n",
       "1  image1.tif          0.182796\n",
       "2  image2.tif          0.172414\n",
       "3  image3.tif          0.171233\n",
       "4  image4.tif          0.172662\n",
       "5  image5.tif          0.174721\n",
       "6  image6.tif          0.986726\n",
       "7  image7.tif          1.055718\n",
       "8  image8.tif          0.991170\n",
       "9  image9.tif          0.959368"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "query = \"\"\"\n",
    "SELECT cell_count_early.Filename, (cell_count_late.number_of_cells / cell_count_early.number_of_cells) AS cell_count_ratio\n",
    "FROM cell_count_early\n",
    "INNER JOIN cell_count_late ON cell_count_early.Filename = cell_count_late.Filename\n",
    "\"\"\"\n",
    "\n",
    "cell_count_ratio_overview = query_database(query)\n",
    "cell_count_ratio_overview"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2701323c-8ef0-459a-b10f-6a15b133eccb",
   "metadata": {},
   "source": [
    "We know now the fraction of cells which survived in which file. We need to use the `JOIN` statement again to know in which experiment these survival ratios were observed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "bf78f4fc-0783-4f39-b0fd-d2376a2f0a8f",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Comment</th>\n",
       "      <th>cell_count_ratio</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>High dose</td>\n",
       "      <td>0.171429</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>High dose</td>\n",
       "      <td>0.182796</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>High dose</td>\n",
       "      <td>0.172414</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Medium dose</td>\n",
       "      <td>0.171233</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Medium dose</td>\n",
       "      <td>0.172662</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Medium dose</td>\n",
       "      <td>0.174721</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Control</td>\n",
       "      <td>0.986726</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Control</td>\n",
       "      <td>1.055718</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>Control</td>\n",
       "      <td>0.991170</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>Control</td>\n",
       "      <td>0.959368</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       Comment  cell_count_ratio\n",
       "0    High dose          0.171429\n",
       "1    High dose          0.182796\n",
       "2    High dose          0.172414\n",
       "3  Medium dose          0.171233\n",
       "4  Medium dose          0.172662\n",
       "5  Medium dose          0.174721\n",
       "6      Control          0.986726\n",
       "7      Control          1.055718\n",
       "8      Control          0.991170\n",
       "9      Control          0.959368"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "query = \"\"\"\n",
    "SELECT experiment.Comment, cell_count_ratio_overview.cell_count_ratio\n",
    "FROM cell_count_ratio_overview\n",
    "INNER JOIN image ON cell_count_ratio_overview.Filename = image.Filename\n",
    "INNER JOIN experiment ON image.experiment_ID = experiment.ID\n",
    "\"\"\"\n",
    "\n",
    "query_database(query)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "aa20382d-11f4-4d74-948d-dd3397eaa0c0",
   "metadata": {},
   "source": [
    "Also these can be summarized using the `GROUP BY` statement. Good scientists do not just report the average of these measurements but also the number of images which have been analyzed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "3294fd6a-14f2-46f5-8436-6d5079fed30c",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>experiment_name</th>\n",
       "      <th>mean_cell_count_ratio</th>\n",
       "      <th>num_images</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Control</td>\n",
       "      <td>0.998246</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>High dose</td>\n",
       "      <td>0.175546</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Medium dose</td>\n",
       "      <td>0.172872</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  experiment_name  mean_cell_count_ratio  num_images\n",
       "0         Control               0.998246           4\n",
       "1       High dose               0.175546           3\n",
       "2     Medium dose               0.172872           3"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "query = \"\"\"\n",
    "SELECT \n",
    "    experiment.Comment as experiment_name, \n",
    "    AVG(cell_count_ratio_overview.cell_count_ratio) as mean_cell_count_ratio, \n",
    "    COUNT(cell_count_ratio_overview.cell_count_ratio) as num_images\n",
    "FROM cell_count_ratio_overview\n",
    "INNER JOIN image ON cell_count_ratio_overview.Filename = image.Filename\n",
    "INNER JOIN experiment ON image.experiment_ID = experiment.ID\n",
    "GROUP BY experiment.Comment\n",
    "\"\"\"\n",
    "\n",
    "query_database(query)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "94d78d2f-444c-4dc4-b968-e1c960d3a40d",
   "metadata": {},
   "source": [
    "## Exercise\n",
    "\n",
    "Determine the average cell count over the entire timelapse for all images individually."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cdebfca7-6763-472e-a7f7-0e210327d02f",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "8a87efed-deeb-4e50-a983-f2f2f98bbae7",
   "metadata": {},
   "source": [
    "Summarize these measurements for individual images to present the average for each experiment."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6e04c721-174c-480f-a146-cf6426b717fb",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.15"
  },
  "widgets": {
   "application/vnd.jupyter.widget-state+json": {
    "state": {},
    "version_major": 2,
    "version_minor": 0
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}