{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "273def9d-da84-49b7-83be-11695c64c3d9",
   "metadata": {},
   "source": [
    "# Appending tables\n",
    "When processing multiple images, potentially using multiple image processing libraries, a common task is to combine tables.\n",
    "\n",
    "We start with two small tables of measurements that could have been obtained from different functions or different libraries."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "1477c227",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "156d6b7c",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>label</th>\n",
       "      <th>circularity</th>\n",
       "      <th>elongation</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>0.3</td>\n",
       "      <td>2.3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>0.5</td>\n",
       "      <td>3.4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>0.7</td>\n",
       "      <td>1.2</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   label  circularity  elongation\n",
       "0      1          0.3         2.3\n",
       "1      2          0.5         3.4\n",
       "2      3          0.7         1.2"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "table1 = pd.DataFrame({\n",
    "    \"label\":       [1,   2,   3],\n",
    "    \"circularity\": [0.3, 0.5, 0.7],\n",
    "    \"elongation\":  [2.3, 3.4, 1.2],\n",
    "    })\n",
    "table1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "1877c769",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>label</th>\n",
       "      <th>area</th>\n",
       "      <th>skewness</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>3</td>\n",
       "      <td>22</td>\n",
       "      <td>0.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>32</td>\n",
       "      <td>0.6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td>25</td>\n",
       "      <td>0.3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>18</td>\n",
       "      <td>0.3</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   label  area  skewness\n",
       "0      3    22       0.5\n",
       "1      2    32       0.6\n",
       "2      1    25       0.3\n",
       "3      4    18       0.3"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "table2 = pd.DataFrame({\n",
    "    \"label\":    [3,   2,   1,   4],\n",
    "    \"area\":     [22,  32,  25,  18],\n",
    "    \"skewness\": [0.5, 0.6, 0.3, 0.3],\n",
    "    })\n",
    "table2"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "49df41b0-b63a-44d0-8b6e-ae6ac7cfa263",
   "metadata": {},
   "source": [
    "## Combining columns of tables\n",
    "According to the [pandas documentation](https://pandas.pydata.org/docs/user_guide/merging.html), there are multiple ways of combining tables. We first use a _wrong_ example to highlight pitfalls when combining tables.\n",
    "\n",
    "In the following example, rows are matched by their row index (0, 1, 2, ...) and not by label, so the measurements of labels 1 and 3 are mixed up. Furthermore, `table1` did not contain measurements for label 4, which leads to missing values (`NaN`) in its columns."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "6f255657",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>label</th>\n",
       "      <th>circularity</th>\n",
       "      <th>elongation</th>\n",
       "      <th>label</th>\n",
       "      <th>area</th>\n",
       "      <th>skewness</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1.0</td>\n",
       "      <td>0.3</td>\n",
       "      <td>2.3</td>\n",
       "      <td>3</td>\n",
       "      <td>22</td>\n",
       "      <td>0.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2.0</td>\n",
       "      <td>0.5</td>\n",
       "      <td>3.4</td>\n",
       "      <td>2</td>\n",
       "      <td>32</td>\n",
       "      <td>0.6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3.0</td>\n",
       "      <td>0.7</td>\n",
       "      <td>1.2</td>\n",
       "      <td>1</td>\n",
       "      <td>25</td>\n",
       "      <td>0.3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>4</td>\n",
       "      <td>18</td>\n",
       "      <td>0.3</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   label  circularity  elongation  label  area  skewness\n",
       "0    1.0          0.3         2.3      3    22       0.5\n",
       "1    2.0          0.5         3.4      2    32       0.6\n",
       "2    3.0          0.7         1.2      1    25       0.3\n",
       "3    NaN          NaN         NaN      4    18       0.3"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "wrongly_combined_tables = pd.concat([table1, table2], axis=1)\n",
    "wrongly_combined_tables"
   ]
  },
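  {
   "cell_type": "markdown",
   "id": "b7e1c2a0-concat-on-label-note",
   "metadata": {},
   "source": [
    "One possible repair, given here only as a minimal sketch, relies on the fact that `pd.concat` with `axis=1` aligns rows on the row index. If we assume that every label occurs only once per table, we can move `label` into the index of both tables before concatenating. The name `concat_on_label` is chosen just for this illustration."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b7e1c2a1-concat-on-label-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sketch: use `label` as the row index so that concat aligns rows by label.\n",
    "# Assumption: labels are unique within each table.\n",
    "concat_on_label = pd.concat(\n",
    "    [table1.set_index(\"label\"), table2.set_index(\"label\")],\n",
    "    axis=1,\n",
    "    )\n",
    "concat_on_label"
   ]
  },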
  {
   "cell_type": "markdown",
   "id": "16a4e200-8964-43d1-b43c-eb5977ed195a",
   "metadata": {},
   "source": [
    "A better way of combining tables is the `merge` command. It allows us to specify explicitly `on` which column the tables should be combined. Data scientists speak of the 'index' or 'identifier' of rows in the tables. With `how='inner'`, only labels that are present in both tables are kept."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "29ccfdb1-2b09-46d5-90c2-c4374cb73d02",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>label</th>\n",
       "      <th>circularity</th>\n",
       "      <th>elongation</th>\n",
       "      <th>area</th>\n",
       "      <th>skewness</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>0.3</td>\n",
       "      <td>2.3</td>\n",
       "      <td>25</td>\n",
       "      <td>0.3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>0.5</td>\n",
       "      <td>3.4</td>\n",
       "      <td>32</td>\n",
       "      <td>0.6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>0.7</td>\n",
       "      <td>1.2</td>\n",
       "      <td>22</td>\n",
       "      <td>0.5</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   label  circularity  elongation  area  skewness\n",
       "0      1          0.3         2.3    25       0.3\n",
       "1      2          0.5         3.4    32       0.6\n",
       "2      3          0.7         1.2    22       0.5"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "correctly_combined_tables1 = pd.merge(table1, table2, how='inner', on='label')\n",
    "correctly_combined_tables1"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "eebbf929-6cb4-48e4-bff4-60d444c95d49",
   "metadata": {},
   "source": [
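    "You may have noticed that label 4 is missing in the above example. We can also include it in the output table by performing an outer join (`how='outer'`). A right join (`how='right'`) keeps label 4 as well, because it keeps every row of the right table, `table2`, in its original order."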
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "84d160b6-e577-478a-a14d-f4bba371afee",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>label</th>\n",
       "      <th>circularity</th>\n",
       "      <th>elongation</th>\n",
       "      <th>area</th>\n",
       "      <th>skewness</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>0.3</td>\n",
       "      <td>2.3</td>\n",
       "      <td>25</td>\n",
       "      <td>0.3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>0.5</td>\n",
       "      <td>3.4</td>\n",
       "      <td>32</td>\n",
       "      <td>0.6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>0.7</td>\n",
       "      <td>1.2</td>\n",
       "      <td>22</td>\n",
       "      <td>0.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>18</td>\n",
       "      <td>0.3</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   label  circularity  elongation  area  skewness\n",
       "0      1          0.3         2.3    25       0.3\n",
       "1      2          0.5         3.4    32       0.6\n",
       "2      3          0.7         1.2    22       0.5\n",
       "3      4          NaN         NaN    18       0.3"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "correctly_combined_tables2 = pd.merge(table1, table2, how='outer', on='label')\n",
    "correctly_combined_tables2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "e885d30a-3be4-4af7-9a91-8a27dfed6276",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>label</th>\n",
       "      <th>circularity</th>\n",
       "      <th>elongation</th>\n",
       "      <th>area</th>\n",
       "      <th>skewness</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>3</td>\n",
       "      <td>0.7</td>\n",
       "      <td>1.2</td>\n",
       "      <td>22</td>\n",
       "      <td>0.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>0.5</td>\n",
       "      <td>3.4</td>\n",
       "      <td>32</td>\n",
       "      <td>0.6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td>0.3</td>\n",
       "      <td>2.3</td>\n",
       "      <td>25</td>\n",
       "      <td>0.3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>18</td>\n",
       "      <td>0.3</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   label  circularity  elongation  area  skewness\n",
       "0      3          0.7         1.2    22       0.5\n",
       "1      2          0.5         3.4    32       0.6\n",
       "2      1          0.3         2.3    25       0.3\n",
       "3      4          NaN         NaN    18       0.3"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "correctly_combined_tables2 = pd.merge(table1, table2, how='right', on='label')\n",
    "correctly_combined_tables2"
   ]
  },
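  {
   "cell_type": "markdown",
   "id": "c4d9e3f0-merge-indicator-note",
   "metadata": {},
   "source": [
    "To check where each row of a merged table came from, `pd.merge` accepts `indicator=True`. As a small sketch (the variable name `merged_with_origin` is made up for this illustration), this adds a `_merge` column that reads `both`, `left_only` or `right_only` for every row."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c4d9e3f1-merge-indicator-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sketch: mark the origin of each row in an additional `_merge` column.\n",
    "merged_with_origin = pd.merge(table1, table2, how='outer', on='label', indicator=True)\n",
    "merged_with_origin"
   ]
  },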
  {
   "cell_type": "markdown",
   "id": "ce62adcc-f797-469a-865c-f29532561d46",
   "metadata": {},
   "source": [
    "Suppose two tables share a measurement name. For example, `table3` below also contains an \"elongation\" column, just like `table1`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "567ea501-70cc-47e5-80f2-9b6810fd854e",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>label</th>\n",
       "      <th>area</th>\n",
       "      <th>skewness</th>\n",
       "      <th>elongation</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>3</td>\n",
       "      <td>22</td>\n",
       "      <td>0.5</td>\n",
       "      <td>2.3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>32</td>\n",
       "      <td>0.6</td>\n",
       "      <td>3.4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td>25</td>\n",
       "      <td>0.3</td>\n",
       "      <td>1.2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>18</td>\n",
       "      <td>0.3</td>\n",
       "      <td>1.1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   label  area  skewness  elongation\n",
       "0      3    22       0.5         2.3\n",
       "1      2    32       0.6         3.4\n",
       "2      1    25       0.3         1.2\n",
       "3      4    18       0.3         1.1"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "table3 = pd.DataFrame({\n",
    "    \"label\":    [3,   2,   1,   4],\n",
    "    \"area\":     [22,  32,  25,  18],\n",
    "    \"skewness\": [0.5, 0.6, 0.3, 0.3],\n",
    "    \"elongation\":  [2.3, 3.4, 1.2, 1.1]\n",
    "    })\n",
    "table3"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fea092cb-0e12-4056-a819-bf5051c94a25",
   "metadata": {},
   "source": [
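    "Merging still preserves both measurements: pandas stores them in separate columns and appends the suffixes `_x` (left table) and `_y` (right table) to the shared column name."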
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "625b71eb-72ec-4680-83f5-d0a318c25c83",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>label</th>\n",
       "      <th>circularity</th>\n",
       "      <th>elongation_x</th>\n",
       "      <th>area</th>\n",
       "      <th>skewness</th>\n",
       "      <th>elongation_y</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>0.3</td>\n",
       "      <td>2.3</td>\n",
       "      <td>25</td>\n",
       "      <td>0.3</td>\n",
       "      <td>1.2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>0.5</td>\n",
       "      <td>3.4</td>\n",
       "      <td>32</td>\n",
       "      <td>0.6</td>\n",
       "      <td>3.4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>0.7</td>\n",
       "      <td>1.2</td>\n",
       "      <td>22</td>\n",
       "      <td>0.5</td>\n",
       "      <td>2.3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>18</td>\n",
       "      <td>0.3</td>\n",
       "      <td>1.1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   label  circularity  elongation_x  area  skewness  elongation_y\n",
       "0      1          0.3           2.3    25       0.3           1.2\n",
       "1      2          0.5           3.4    32       0.6           3.4\n",
       "2      3          0.7           1.2    22       0.5           2.3\n",
       "3      4          NaN           NaN    18       0.3           1.1"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "correctly_combined_tables3 = pd.merge(table1, table3, how='outer', on='label')\n",
    "correctly_combined_tables3"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7bccacef-45e3-4a16-afc8-73f1dd8ad631",
   "metadata": {},
   "source": [
    "We can replace the default `_x` and `_y` by passing our own `suffixes`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "43990561-34ca-4099-917b-bdb6895f149d",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>label</th>\n",
       "      <th>circularity</th>\n",
       "      <th>elongation_method1</th>\n",
       "      <th>area</th>\n",
       "      <th>skewness</th>\n",
       "      <th>elongation_method2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>0.3</td>\n",
       "      <td>2.3</td>\n",
       "      <td>25</td>\n",
       "      <td>0.3</td>\n",
       "      <td>1.2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>0.5</td>\n",
       "      <td>3.4</td>\n",
       "      <td>32</td>\n",
       "      <td>0.6</td>\n",
       "      <td>3.4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>0.7</td>\n",
       "      <td>1.2</td>\n",
       "      <td>22</td>\n",
       "      <td>0.5</td>\n",
       "      <td>2.3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>18</td>\n",
       "      <td>0.3</td>\n",
       "      <td>1.1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   label  circularity  elongation_method1  area  skewness  elongation_method2\n",
       "0      1          0.3                 2.3    25       0.3                 1.2\n",
       "1      2          0.5                 3.4    32       0.6                 3.4\n",
       "2      3          0.7                 1.2    22       0.5                 2.3\n",
       "3      4          NaN                 NaN    18       0.3                 1.1"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "correctly_combined_tables3 = pd.merge(table1, table3, how='outer', on='label', suffixes=('_method1', '_method2'))\n",
    "correctly_combined_tables3"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9682caa7-ea47-452a-a773-952639b53de8",
   "metadata": {
    "tags": []
   },
   "source": [
    "## Combining measurements from multiple image files"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3e37e64c-0185-439f-9e52-f1e23e732865",
   "metadata": {},
   "source": [
    "When applying a workflow to many images, you get tables with the same column names, but with a variable number of rows.\n",
    "To calculate statistics for whole folders or to perform machine learning, we usually need to concatenate those tables, but it is important to keep track of the source files.\n",
    "\n",
    "Let's open two tables generated by applying the same workflow to different files."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "23b7b945-b874-49ba-b48f-d494adf7e8c3",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>area</th>\n",
       "      <th>intensity_mean</th>\n",
       "      <th>major_axis_length</th>\n",
       "      <th>minor_axis_length</th>\n",
       "      <th>aspect_ratio</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>256</td>\n",
       "      <td>93.250000</td>\n",
       "      <td>19.995017</td>\n",
       "      <td>17.021559</td>\n",
       "      <td>1.174688</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>90</td>\n",
       "      <td>82.488889</td>\n",
       "      <td>15.939969</td>\n",
       "      <td>7.516326</td>\n",
       "      <td>2.120713</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>577</td>\n",
       "      <td>90.637782</td>\n",
       "      <td>35.324458</td>\n",
       "      <td>21.759434</td>\n",
       "      <td>1.623409</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>270</td>\n",
       "      <td>95.640741</td>\n",
       "      <td>20.229431</td>\n",
       "      <td>17.669052</td>\n",
       "      <td>1.144908</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>153</td>\n",
       "      <td>84.908497</td>\n",
       "      <td>15.683703</td>\n",
       "      <td>12.420475</td>\n",
       "      <td>1.262730</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   area  intensity_mean  major_axis_length  minor_axis_length  aspect_ratio\n",
       "0   256       93.250000          19.995017          17.021559      1.174688\n",
       "1    90       82.488889          15.939969           7.516326      2.120713\n",
       "2   577       90.637782          35.324458          21.759434      1.623409\n",
       "3   270       95.640741          20.229431          17.669052      1.144908\n",
       "4   153       84.908497          15.683703          12.420475      1.262730"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df1 = pd.read_csv('../../data/BBBC007_20P1_POS0007_D_1UL.csv')\n",
    "df1.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "f0bf9a4b-925d-4227-8aff-4348fa2afca3",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>area</th>\n",
       "      <th>intensity_mean</th>\n",
       "      <th>major_axis_length</th>\n",
       "      <th>minor_axis_length</th>\n",
       "      <th>aspect_ratio</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>139</td>\n",
       "      <td>96.546763</td>\n",
       "      <td>17.504104</td>\n",
       "      <td>10.292770</td>\n",
       "      <td>1.700621</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>360</td>\n",
       "      <td>86.613889</td>\n",
       "      <td>35.746808</td>\n",
       "      <td>14.983124</td>\n",
       "      <td>2.385805</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>43</td>\n",
       "      <td>91.488372</td>\n",
       "      <td>12.967884</td>\n",
       "      <td>4.351573</td>\n",
       "      <td>2.980045</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>140</td>\n",
       "      <td>73.742857</td>\n",
       "      <td>18.940508</td>\n",
       "      <td>10.314404</td>\n",
       "      <td>1.836316</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>144</td>\n",
       "      <td>89.375000</td>\n",
       "      <td>13.639308</td>\n",
       "      <td>13.458532</td>\n",
       "      <td>1.013432</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   area  intensity_mean  major_axis_length  minor_axis_length  aspect_ratio\n",
       "0   139       96.546763          17.504104          10.292770      1.700621\n",
       "1   360       86.613889          35.746808          14.983124      2.385805\n",
       "2    43       91.488372          12.967884           4.351573      2.980045\n",
       "3   140       73.742857          18.940508          10.314404      1.836316\n",
       "4   144       89.375000          13.639308          13.458532      1.013432"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2 = pd.read_csv('../../data/BBBC007_20P1_POS0010_D_1UL.csv')\n",
    "df2.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6121c288-6584-4c30-a584-43cc58871599",
   "metadata": {},
   "source": [
    "In this particular case, where we know that both tables have the same columns, we can concatenate them into a single big table."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "0cf88538-6bea-4a47-ba1e-09a26fba2602",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>area</th>\n",
       "      <th>intensity_mean</th>\n",
       "      <th>major_axis_length</th>\n",
       "      <th>minor_axis_length</th>\n",
       "      <th>aspect_ratio</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>256</td>\n",
       "      <td>93.250000</td>\n",
       "      <td>19.995017</td>\n",
       "      <td>17.021559</td>\n",
       "      <td>1.174688</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>90</td>\n",
       "      <td>82.488889</td>\n",
       "      <td>15.939969</td>\n",
       "      <td>7.516326</td>\n",
       "      <td>2.120713</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>577</td>\n",
       "      <td>90.637782</td>\n",
       "      <td>35.324458</td>\n",
       "      <td>21.759434</td>\n",
       "      <td>1.623409</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>270</td>\n",
       "      <td>95.640741</td>\n",
       "      <td>20.229431</td>\n",
       "      <td>17.669052</td>\n",
       "      <td>1.144908</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>153</td>\n",
       "      <td>84.908497</td>\n",
       "      <td>15.683703</td>\n",
       "      <td>12.420475</td>\n",
       "      <td>1.262730</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>42</th>\n",
       "      <td>315</td>\n",
       "      <td>91.133333</td>\n",
       "      <td>20.927095</td>\n",
       "      <td>19.209283</td>\n",
       "      <td>1.089426</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>43</th>\n",
       "      <td>206</td>\n",
       "      <td>94.262136</td>\n",
       "      <td>23.381879</td>\n",
       "      <td>11.669668</td>\n",
       "      <td>2.003646</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>44</th>\n",
       "      <td>45</td>\n",
       "      <td>68.377778</td>\n",
       "      <td>9.406371</td>\n",
       "      <td>6.276445</td>\n",
       "      <td>1.498678</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>45</th>\n",
       "      <td>33</td>\n",
       "      <td>76.727273</td>\n",
       "      <td>10.724275</td>\n",
       "      <td>4.174568</td>\n",
       "      <td>2.568955</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>46</th>\n",
       "      <td>16</td>\n",
       "      <td>76.750000</td>\n",
       "      <td>7.745967</td>\n",
       "      <td>2.783882</td>\n",
       "      <td>2.782433</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>111 rows × 5 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "    area  intensity_mean  major_axis_length  minor_axis_length  aspect_ratio\n",
       "0    256       93.250000          19.995017          17.021559      1.174688\n",
       "1     90       82.488889          15.939969           7.516326      2.120713\n",
       "2    577       90.637782          35.324458          21.759434      1.623409\n",
       "3    270       95.640741          20.229431          17.669052      1.144908\n",
       "4    153       84.908497          15.683703          12.420475      1.262730\n",
       "..   ...             ...                ...                ...           ...\n",
       "42   315       91.133333          20.927095          19.209283      1.089426\n",
       "43   206       94.262136          23.381879          11.669668      2.003646\n",
       "44    45       68.377778           9.406371           6.276445      1.498678\n",
       "45    33       76.727273          10.724275           4.174568      2.568955\n",
       "46    16       76.750000           7.745967           2.783882      2.782433\n",
       "\n",
       "[111 rows x 5 columns]"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "big_df = pd.concat([df1, df2], axis=0)\n",
    "big_df"
   ]
  },
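  {
   "cell_type": "markdown",
   "id": "e3b7a2c4-5d61-4f0a-9c1e-ignore-index",
   "metadata": {},
   "source": [
    "Note that the row labels in the output above repeat: the last row is labeled 46 although the table has 111 rows, because `pd.concat` keeps each table's original index by default. A minimal sketch of renumbering the rows with `ignore_index=True`; the two tiny tables and their values are made up for illustration and only stand in for `df1` and `df2`:\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical stand-ins for df1 and df2 (values are made up)\n",
    "table_a = pd.DataFrame({'area': [256, 90, 577], 'aspect_ratio': [1.17, 2.12, 1.62]})\n",
    "table_b = pd.DataFrame({'area': [139, 360], 'aspect_ratio': [1.70, 2.39]})\n",
    "\n",
    "# Default behaviour: each table keeps its own row labels, so labels repeat\n",
    "stacked = pd.concat([table_a, table_b], axis=0)\n",
    "\n",
    "# ignore_index=True discards the old labels and numbers the rows 0..n-1\n",
    "renumbered = pd.concat([table_a, table_b], axis=0, ignore_index=True)\n",
    "\n",
    "print(list(stacked.index))\n",
    "print(list(renumbered.index))\n",
    "```\n",
    "\n",
    "With repeated labels, `stacked.loc[0]` returns one row from each table, which is easy to overlook."
   ]
  },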
  {
   "cell_type": "markdown",
   "id": "12290fe6-86aa-408e-b9f7-1c2b2900a64a",
   "metadata": {},
   "source": [
    "A drawback of plain concatenation is that the rows lose their source identity. An easy fix is to add a new column holding the file name to each table before concatenating. This makes it easy to split the tables apart again or to group by file when plotting later on.\n",
    "\n",
    "When we assign a single value to a new column, pandas copies it to every row."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "a9a4745e-69a6-4bef-95e0-a449d8ec2dea",
   "metadata": {},
   "outputs": [],
   "source": [
    "df1['file_name'] = 'BBBC007_20P1_POS0007_D_1UL'\n",
    "\n",
    "df2['file_name'] = 'BBBC007_20P1_POS0010_D_1UL'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "4c87f225-eb3c-4107-b5d6-454c2598c084",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>area</th>\n",
       "      <th>intensity_mean</th>\n",
       "      <th>major_axis_length</th>\n",
       "      <th>minor_axis_length</th>\n",
       "      <th>aspect_ratio</th>\n",
       "      <th>file_name</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>256</td>\n",
       "      <td>93.250000</td>\n",
       "      <td>19.995017</td>\n",
       "      <td>17.021559</td>\n",
       "      <td>1.174688</td>\n",
       "      <td>BBBC007_20P1_POS0007_D_1UL</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>90</td>\n",
       "      <td>82.488889</td>\n",
       "      <td>15.939969</td>\n",
       "      <td>7.516326</td>\n",
       "      <td>2.120713</td>\n",
       "      <td>BBBC007_20P1_POS0007_D_1UL</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>577</td>\n",
       "      <td>90.637782</td>\n",
       "      <td>35.324458</td>\n",
       "      <td>21.759434</td>\n",
       "      <td>1.623409</td>\n",
       "      <td>BBBC007_20P1_POS0007_D_1UL</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>270</td>\n",
       "      <td>95.640741</td>\n",
       "      <td>20.229431</td>\n",
       "      <td>17.669052</td>\n",
       "      <td>1.144908</td>\n",
       "      <td>BBBC007_20P1_POS0007_D_1UL</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>153</td>\n",
       "      <td>84.908497</td>\n",
       "      <td>15.683703</td>\n",
       "      <td>12.420475</td>\n",
       "      <td>1.262730</td>\n",
       "      <td>BBBC007_20P1_POS0007_D_1UL</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>42</th>\n",
       "      <td>315</td>\n",
       "      <td>91.133333</td>\n",
       "      <td>20.927095</td>\n",
       "      <td>19.209283</td>\n",
       "      <td>1.089426</td>\n",
       "      <td>BBBC007_20P1_POS0010_D_1UL</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>43</th>\n",
       "      <td>206</td>\n",
       "      <td>94.262136</td>\n",
       "      <td>23.381879</td>\n",
       "      <td>11.669668</td>\n",
       "      <td>2.003646</td>\n",
       "      <td>BBBC007_20P1_POS0010_D_1UL</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>44</th>\n",
       "      <td>45</td>\n",
       "      <td>68.377778</td>\n",
       "      <td>9.406371</td>\n",
       "      <td>6.276445</td>\n",
       "      <td>1.498678</td>\n",
       "      <td>BBBC007_20P1_POS0010_D_1UL</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>45</th>\n",
       "      <td>33</td>\n",
       "      <td>76.727273</td>\n",
       "      <td>10.724275</td>\n",
       "      <td>4.174568</td>\n",
       "      <td>2.568955</td>\n",
       "      <td>BBBC007_20P1_POS0010_D_1UL</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>46</th>\n",
       "      <td>16</td>\n",
       "      <td>76.750000</td>\n",
       "      <td>7.745967</td>\n",
       "      <td>2.783882</td>\n",
       "      <td>2.782433</td>\n",
       "      <td>BBBC007_20P1_POS0010_D_1UL</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>111 rows × 6 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "    area  intensity_mean  major_axis_length  minor_axis_length  aspect_ratio  \\\n",
       "0    256       93.250000          19.995017          17.021559      1.174688   \n",
       "1     90       82.488889          15.939969           7.516326      2.120713   \n",
       "2    577       90.637782          35.324458          21.759434      1.623409   \n",
       "3    270       95.640741          20.229431          17.669052      1.144908   \n",
       "4    153       84.908497          15.683703          12.420475      1.262730   \n",
       "..   ...             ...                ...                ...           ...   \n",
       "42   315       91.133333          20.927095          19.209283      1.089426   \n",
       "43   206       94.262136          23.381879          11.669668      2.003646   \n",
       "44    45       68.377778           9.406371           6.276445      1.498678   \n",
       "45    33       76.727273          10.724275           4.174568      2.568955   \n",
       "46    16       76.750000           7.745967           2.783882      2.782433   \n",
       "\n",
       "                     file_name  \n",
       "0   BBBC007_20P1_POS0007_D_1UL  \n",
       "1   BBBC007_20P1_POS0007_D_1UL  \n",
       "2   BBBC007_20P1_POS0007_D_1UL  \n",
       "3   BBBC007_20P1_POS0007_D_1UL  \n",
       "4   BBBC007_20P1_POS0007_D_1UL  \n",
       "..                         ...  \n",
       "42  BBBC007_20P1_POS0010_D_1UL  \n",
       "43  BBBC007_20P1_POS0010_D_1UL  \n",
       "44  BBBC007_20P1_POS0010_D_1UL  \n",
       "45  BBBC007_20P1_POS0010_D_1UL  \n",
       "46  BBBC007_20P1_POS0010_D_1UL  \n",
       "\n",
       "[111 rows x 6 columns]"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "big_df = pd.concat([df1, df2], axis=0)\n",
    "big_df"
   ]
  },
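  {
   "cell_type": "markdown",
   "id": "7c9d1e52-3a4b-4e8f-b6a0-groupby-split",
   "metadata": {},
   "source": [
    "Because every row carries a `file_name`, the combined table can be split back or summarized per file. A minimal sketch using `groupby`; the small tables and the file names `image_A` and `image_B` are hypothetical stand-ins for the tables above:\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical stand-ins for the two measurement tables (values are made up)\n",
    "table_a = pd.DataFrame({'area': [256, 90, 577]})\n",
    "table_b = pd.DataFrame({'area': [139, 360]})\n",
    "\n",
    "# Tag each table with its (hypothetical) source name, then combine\n",
    "table_a['file_name'] = 'image_A'\n",
    "table_b['file_name'] = 'image_B'\n",
    "combined = pd.concat([table_a, table_b], axis=0)\n",
    "\n",
    "# Split the combined table back into one table per source file\n",
    "per_file = {name: group for name, group in combined.groupby('file_name')}\n",
    "\n",
    "# Or summarize a measurement per source file\n",
    "mean_area = combined.groupby('file_name')['area'].mean()\n",
    "\n",
    "print(sorted(per_file))\n",
    "print(mean_area)\n",
    "```"
   ]
  },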
  {
   "cell_type": "markdown",
   "id": "bf553c51-dd29-4d5b-8e72-dcf3a262c418",
   "metadata": {},
   "source": [
    "Now we can tell which file each row came from."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.15"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}