{
"cells": [
{
"cell_type": "markdown",
"id": "273def9d-da84-49b7-83be-11695c64c3d9",
"metadata": {},
"source": [
"# Appending tables\n",
"When processing multiple images, potentially using multiple image processing libraries, a common task is to combine tables.\n",
"\n",
"We start with two small tables of measurements that could have been obtained from different functions or different libraries."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "1477c227",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "156d6b7c",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" label | \n",
" circularity | \n",
" elongation | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 1 | \n",
" 0.3 | \n",
" 2.3 | \n",
"
\n",
" \n",
" 1 | \n",
" 2 | \n",
" 0.5 | \n",
" 3.4 | \n",
"
\n",
" \n",
" 2 | \n",
" 3 | \n",
" 0.7 | \n",
" 1.2 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" label circularity elongation\n",
"0 1 0.3 2.3\n",
"1 2 0.5 3.4\n",
"2 3 0.7 1.2"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"table1 = pd.DataFrame({\n",
" \"label\": [1, 2, 3],\n",
" \"circularity\": [0.3, 0.5, 0.7],\n",
" \"elongation\": [2.3, 3.4, 1.2],\n",
" })\n",
"table1"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "1877c769",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" label | \n",
" area | \n",
" skewness | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 3 | \n",
" 22 | \n",
" 0.5 | \n",
"
\n",
" \n",
" 1 | \n",
" 2 | \n",
" 32 | \n",
" 0.6 | \n",
"
\n",
" \n",
" 2 | \n",
" 1 | \n",
" 25 | \n",
" 0.3 | \n",
"
\n",
" \n",
" 3 | \n",
" 4 | \n",
" 18 | \n",
" 0.3 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" label area skewness\n",
"0 3 22 0.5\n",
"1 2 32 0.6\n",
"2 1 25 0.3\n",
"3 4 18 0.3"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"table2 = pd.DataFrame({\n",
" \"label\": [3, 2, 1, 4],\n",
" \"area\": [22, 32, 25, 18],\n",
" \"skewness\": [0.5, 0.6, 0.3, 0.3],\n",
" })\n",
"table2"
]
},
{
"cell_type": "markdown",
"id": "49df41b0-b63a-44d0-8b6e-ae6ac7cfa263",
"metadata": {},
"source": [
"## Combining columns of tables\n",
"According to the [pandas documentation](https://pandas.pydata.org/docs/user_guide/merging.html) there are multiple ways for combining tables. We first use a _wrong_ example to highlight pitfalls when combining tables.\n",
"\n",
"In the following example, measurements of label 1 and 3 are mixed. Furthermore, one of our tables did not contain measurements for label 4."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "6f255657",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" label | \n",
" circularity | \n",
" elongation | \n",
" label | \n",
" area | \n",
" skewness | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 1.0 | \n",
" 0.3 | \n",
" 2.3 | \n",
" 3 | \n",
" 22 | \n",
" 0.5 | \n",
"
\n",
" \n",
" 1 | \n",
" 2.0 | \n",
" 0.5 | \n",
" 3.4 | \n",
" 2 | \n",
" 32 | \n",
" 0.6 | \n",
"
\n",
" \n",
" 2 | \n",
" 3.0 | \n",
" 0.7 | \n",
" 1.2 | \n",
" 1 | \n",
" 25 | \n",
" 0.3 | \n",
"
\n",
" \n",
" 3 | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" 4 | \n",
" 18 | \n",
" 0.3 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" label circularity elongation label area skewness\n",
"0 1.0 0.3 2.3 3 22 0.5\n",
"1 2.0 0.5 3.4 2 32 0.6\n",
"2 3.0 0.7 1.2 1 25 0.3\n",
"3 NaN NaN NaN 4 18 0.3"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"wrongly_combined_tables = pd.concat([table1, table2], axis=1)\n",
"wrongly_combined_tables"
]
},
{
"cell_type": "markdown",
"id": "16a4e200-8964-43d1-b43c-eb5977ed195a",
"metadata": {},
"source": [
"A better way for combining tables is the `merge` command. It allows to explicitly specify `on` which column the tables should be combined. Data scientists speak of the 'index' or 'identifier' of rows in the tables."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "29ccfdb1-2b09-46d5-90c2-c4374cb73d02",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" label | \n",
" circularity | \n",
" elongation | \n",
" area | \n",
" skewness | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 1 | \n",
" 0.3 | \n",
" 2.3 | \n",
" 25 | \n",
" 0.3 | \n",
"
\n",
" \n",
" 1 | \n",
" 2 | \n",
" 0.5 | \n",
" 3.4 | \n",
" 32 | \n",
" 0.6 | \n",
"
\n",
" \n",
" 2 | \n",
" 3 | \n",
" 0.7 | \n",
" 1.2 | \n",
" 22 | \n",
" 0.5 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" label circularity elongation area skewness\n",
"0 1 0.3 2.3 25 0.3\n",
"1 2 0.5 3.4 32 0.6\n",
"2 3 0.7 1.2 22 0.5"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"correctly_combined_tables1 = pd.merge(table1, table2, how='inner', on='label')\n",
"correctly_combined_tables1"
]
},
{
"cell_type": "markdown",
"id": "eebbf929-6cb4-48e4-bff4-60d444c95d49",
"metadata": {},
"source": [
"You may note that in the above example, label 4 is missing. We can also get it by out table by performing an `outer join`. "
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "84d160b6-e577-478a-a14d-f4bba371afee",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" label | \n",
" circularity | \n",
" elongation | \n",
" area | \n",
" skewness | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 1 | \n",
" 0.3 | \n",
" 2.3 | \n",
" 25 | \n",
" 0.3 | \n",
"
\n",
" \n",
" 1 | \n",
" 2 | \n",
" 0.5 | \n",
" 3.4 | \n",
" 32 | \n",
" 0.6 | \n",
"
\n",
" \n",
" 2 | \n",
" 3 | \n",
" 0.7 | \n",
" 1.2 | \n",
" 22 | \n",
" 0.5 | \n",
"
\n",
" \n",
" 3 | \n",
" 4 | \n",
" NaN | \n",
" NaN | \n",
" 18 | \n",
" 0.3 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" label circularity elongation area skewness\n",
"0 1 0.3 2.3 25 0.3\n",
"1 2 0.5 3.4 32 0.6\n",
"2 3 0.7 1.2 22 0.5\n",
"3 4 NaN NaN 18 0.3"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"correctly_combined_tables2 = pd.merge(table1, table2, how='outer', on='label')\n",
"correctly_combined_tables2"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "e885d30a-3be4-4af7-9a91-8a27dfed6276",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" label | \n",
" circularity | \n",
" elongation | \n",
" area | \n",
" skewness | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 3 | \n",
" 0.7 | \n",
" 1.2 | \n",
" 22 | \n",
" 0.5 | \n",
"
\n",
" \n",
" 1 | \n",
" 2 | \n",
" 0.5 | \n",
" 3.4 | \n",
" 32 | \n",
" 0.6 | \n",
"
\n",
" \n",
" 2 | \n",
" 1 | \n",
" 0.3 | \n",
" 2.3 | \n",
" 25 | \n",
" 0.3 | \n",
"
\n",
" \n",
" 3 | \n",
" 4 | \n",
" NaN | \n",
" NaN | \n",
" 18 | \n",
" 0.3 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" label circularity elongation area skewness\n",
"0 3 0.7 1.2 22 0.5\n",
"1 2 0.5 3.4 32 0.6\n",
"2 1 0.3 2.3 25 0.3\n",
"3 4 NaN NaN 18 0.3"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"correctly_combined_tables2 = pd.merge(table1, table2, how='right', on='label')\n",
"correctly_combined_tables2"
]
},
{
"cell_type": "markdown",
"id": "ce62adcc-f797-469a-865c-f29532561d46",
"metadata": {},
"source": [
"Suppose there is a common measurement name from different tables. For example, table3 below also contain \"elongation\"."
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "567ea501-70cc-47e5-80f2-9b6810fd854e",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" label | \n",
" area | \n",
" skewness | \n",
" elongation | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 3 | \n",
" 22 | \n",
" 0.5 | \n",
" 2.3 | \n",
"
\n",
" \n",
" 1 | \n",
" 2 | \n",
" 32 | \n",
" 0.6 | \n",
" 3.4 | \n",
"
\n",
" \n",
" 2 | \n",
" 1 | \n",
" 25 | \n",
" 0.3 | \n",
" 1.2 | \n",
"
\n",
" \n",
" 3 | \n",
" 4 | \n",
" 18 | \n",
" 0.3 | \n",
" 1.1 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" label area skewness elongation\n",
"0 3 22 0.5 2.3\n",
"1 2 32 0.6 3.4\n",
"2 1 25 0.3 1.2\n",
"3 4 18 0.3 1.1"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"table3 = pd.DataFrame({\n",
" \"label\": [3, 2, 1, 4],\n",
" \"area\": [22, 32, 25, 18],\n",
" \"skewness\": [0.5, 0.6, 0.3, 0.3],\n",
" \"elongation\": [2.3, 3.4, 1.2, 1.1]\n",
" })\n",
"table3"
]
},
{
"cell_type": "markdown",
"id": "fea092cb-0e12-4056-a819-bf5051c94a25",
"metadata": {},
"source": [
"Applying merge still preserves both measurements in different columns."
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "625b71eb-72ec-4680-83f5-d0a318c25c83",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" label | \n",
" circularity | \n",
" elongation_x | \n",
" area | \n",
" skewness | \n",
" elongation_y | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 1 | \n",
" 0.3 | \n",
" 2.3 | \n",
" 25 | \n",
" 0.3 | \n",
" 1.2 | \n",
"
\n",
" \n",
" 1 | \n",
" 2 | \n",
" 0.5 | \n",
" 3.4 | \n",
" 32 | \n",
" 0.6 | \n",
" 3.4 | \n",
"
\n",
" \n",
" 2 | \n",
" 3 | \n",
" 0.7 | \n",
" 1.2 | \n",
" 22 | \n",
" 0.5 | \n",
" 2.3 | \n",
"
\n",
" \n",
" 3 | \n",
" 4 | \n",
" NaN | \n",
" NaN | \n",
" 18 | \n",
" 0.3 | \n",
" 1.1 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" label circularity elongation_x area skewness elongation_y\n",
"0 1 0.3 2.3 25 0.3 1.2\n",
"1 2 0.5 3.4 32 0.6 3.4\n",
"2 3 0.7 1.2 22 0.5 2.3\n",
"3 4 NaN NaN 18 0.3 1.1"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"correctly_combined_tables3 = pd.merge(table1, table3, how='outer', on='label')\n",
"correctly_combined_tables3 "
]
},
{
"cell_type": "markdown",
"id": "7bccacef-45e3-4a16-afc8-73f1dd8ad631",
"metadata": {},
"source": [
"We can change 'x' and 'y' by passing other suffixes."
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "43990561-34ca-4099-917b-bdb6895f149d",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" label | \n",
" circularity | \n",
" elongation_method1 | \n",
" area | \n",
" skewness | \n",
" elongation_method2 | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 1 | \n",
" 0.3 | \n",
" 2.3 | \n",
" 25 | \n",
" 0.3 | \n",
" 1.2 | \n",
"
\n",
" \n",
" 1 | \n",
" 2 | \n",
" 0.5 | \n",
" 3.4 | \n",
" 32 | \n",
" 0.6 | \n",
" 3.4 | \n",
"
\n",
" \n",
" 2 | \n",
" 3 | \n",
" 0.7 | \n",
" 1.2 | \n",
" 22 | \n",
" 0.5 | \n",
" 2.3 | \n",
"
\n",
" \n",
" 3 | \n",
" 4 | \n",
" NaN | \n",
" NaN | \n",
" 18 | \n",
" 0.3 | \n",
" 1.1 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" label circularity elongation_method1 area skewness elongation_method2\n",
"0 1 0.3 2.3 25 0.3 1.2\n",
"1 2 0.5 3.4 32 0.6 3.4\n",
"2 3 0.7 1.2 22 0.5 2.3\n",
"3 4 NaN NaN 18 0.3 1.1"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"correctly_combined_tables3 = pd.merge(table1, table3, how='outer', on='label', suffixes=('_method1', '_method2'))\n",
"correctly_combined_tables3"
]
},
{
"cell_type": "markdown",
"id": "9682caa7-ea47-452a-a773-952639b53de8",
"metadata": {
"tags": []
},
"source": [
"## Combining measurements from multiple image files"
]
},
{
"cell_type": "markdown",
"id": "3e37e64c-0185-439f-9e52-f1e23e732865",
"metadata": {},
"source": [
"When applying a workflow to many images, you would get tables with the same column names, but with a variable number of rows.\n",
"To calculate statistics for whole folders or to perform machine learning, we usually need to concatenate those tables, but it is important to keep track of the source files.\n",
"\n",
"Let's open two tables generated from applying the same worflow to different files."
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "23b7b945-b874-49ba-b48f-d494adf7e8c3",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" area | \n",
" intensity_mean | \n",
" major_axis_length | \n",
" minor_axis_length | \n",
" aspect_ratio | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 256 | \n",
" 93.250000 | \n",
" 19.995017 | \n",
" 17.021559 | \n",
" 1.174688 | \n",
"
\n",
" \n",
" 1 | \n",
" 90 | \n",
" 82.488889 | \n",
" 15.939969 | \n",
" 7.516326 | \n",
" 2.120713 | \n",
"
\n",
" \n",
" 2 | \n",
" 577 | \n",
" 90.637782 | \n",
" 35.324458 | \n",
" 21.759434 | \n",
" 1.623409 | \n",
"
\n",
" \n",
" 3 | \n",
" 270 | \n",
" 95.640741 | \n",
" 20.229431 | \n",
" 17.669052 | \n",
" 1.144908 | \n",
"
\n",
" \n",
" 4 | \n",
" 153 | \n",
" 84.908497 | \n",
" 15.683703 | \n",
" 12.420475 | \n",
" 1.262730 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" area intensity_mean major_axis_length minor_axis_length aspect_ratio\n",
"0 256 93.250000 19.995017 17.021559 1.174688\n",
"1 90 82.488889 15.939969 7.516326 2.120713\n",
"2 577 90.637782 35.324458 21.759434 1.623409\n",
"3 270 95.640741 20.229431 17.669052 1.144908\n",
"4 153 84.908497 15.683703 12.420475 1.262730"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df1 = pd.read_csv('../../data/BBBC007_20P1_POS0007_D_1UL.csv')\n",
"df1.head()"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "f0bf9a4b-925d-4227-8aff-4348fa2afca3",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" area | \n",
" intensity_mean | \n",
" major_axis_length | \n",
" minor_axis_length | \n",
" aspect_ratio | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 139 | \n",
" 96.546763 | \n",
" 17.504104 | \n",
" 10.292770 | \n",
" 1.700621 | \n",
"
\n",
" \n",
" 1 | \n",
" 360 | \n",
" 86.613889 | \n",
" 35.746808 | \n",
" 14.983124 | \n",
" 2.385805 | \n",
"
\n",
" \n",
" 2 | \n",
" 43 | \n",
" 91.488372 | \n",
" 12.967884 | \n",
" 4.351573 | \n",
" 2.980045 | \n",
"
\n",
" \n",
" 3 | \n",
" 140 | \n",
" 73.742857 | \n",
" 18.940508 | \n",
" 10.314404 | \n",
" 1.836316 | \n",
"
\n",
" \n",
" 4 | \n",
" 144 | \n",
" 89.375000 | \n",
" 13.639308 | \n",
" 13.458532 | \n",
" 1.013432 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" area intensity_mean major_axis_length minor_axis_length aspect_ratio\n",
"0 139 96.546763 17.504104 10.292770 1.700621\n",
"1 360 86.613889 35.746808 14.983124 2.385805\n",
"2 43 91.488372 12.967884 4.351573 2.980045\n",
"3 140 73.742857 18.940508 10.314404 1.836316\n",
"4 144 89.375000 13.639308 13.458532 1.013432"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df2 = pd.read_csv('../../data/BBBC007_20P1_POS0010_D_1UL.csv')\n",
"df2.head()"
]
},
{
"cell_type": "markdown",
"id": "6121c288-6584-4c30-a584-43cc58871599",
"metadata": {},
"source": [
"In this particular case where we know we have the same columns, we could concatenate them into a single big table."
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "0cf88538-6bea-4a47-ba1e-09a26fba2602",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" area | \n",
" intensity_mean | \n",
" major_axis_length | \n",
" minor_axis_length | \n",
" aspect_ratio | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 256 | \n",
" 93.250000 | \n",
" 19.995017 | \n",
" 17.021559 | \n",
" 1.174688 | \n",
"
\n",
" \n",
" 1 | \n",
" 90 | \n",
" 82.488889 | \n",
" 15.939969 | \n",
" 7.516326 | \n",
" 2.120713 | \n",
"
\n",
" \n",
" 2 | \n",
" 577 | \n",
" 90.637782 | \n",
" 35.324458 | \n",
" 21.759434 | \n",
" 1.623409 | \n",
"
\n",
" \n",
" 3 | \n",
" 270 | \n",
" 95.640741 | \n",
" 20.229431 | \n",
" 17.669052 | \n",
" 1.144908 | \n",
"
\n",
" \n",
" 4 | \n",
" 153 | \n",
" 84.908497 | \n",
" 15.683703 | \n",
" 12.420475 | \n",
" 1.262730 | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 42 | \n",
" 315 | \n",
" 91.133333 | \n",
" 20.927095 | \n",
" 19.209283 | \n",
" 1.089426 | \n",
"
\n",
" \n",
" 43 | \n",
" 206 | \n",
" 94.262136 | \n",
" 23.381879 | \n",
" 11.669668 | \n",
" 2.003646 | \n",
"
\n",
" \n",
" 44 | \n",
" 45 | \n",
" 68.377778 | \n",
" 9.406371 | \n",
" 6.276445 | \n",
" 1.498678 | \n",
"
\n",
" \n",
" 45 | \n",
" 33 | \n",
" 76.727273 | \n",
" 10.724275 | \n",
" 4.174568 | \n",
" 2.568955 | \n",
"
\n",
" \n",
" 46 | \n",
" 16 | \n",
" 76.750000 | \n",
" 7.745967 | \n",
" 2.783882 | \n",
" 2.782433 | \n",
"
\n",
" \n",
"
\n",
"
111 rows × 5 columns
\n",
"
"
],
"text/plain": [
" area intensity_mean major_axis_length minor_axis_length aspect_ratio\n",
"0 256 93.250000 19.995017 17.021559 1.174688\n",
"1 90 82.488889 15.939969 7.516326 2.120713\n",
"2 577 90.637782 35.324458 21.759434 1.623409\n",
"3 270 95.640741 20.229431 17.669052 1.144908\n",
"4 153 84.908497 15.683703 12.420475 1.262730\n",
".. ... ... ... ... ...\n",
"42 315 91.133333 20.927095 19.209283 1.089426\n",
"43 206 94.262136 23.381879 11.669668 2.003646\n",
"44 45 68.377778 9.406371 6.276445 1.498678\n",
"45 33 76.727273 10.724275 4.174568 2.568955\n",
"46 16 76.750000 7.745967 2.783882 2.782433\n",
"\n",
"[111 rows x 5 columns]"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"big_df = pd.concat([df1, df2], axis=0)\n",
"big_df"
]
},
{
"cell_type": "markdown",
"id": "12290fe6-86aa-408e-b9f7-1c2b2900a64a",
"metadata": {},
"source": [
"The problem is that we lose their source identity. An easy fix for that is to add a new column with the file name before concatenating them. This will facilitating splitting them back and plotting later on.\n",
"\n",
"When we give a single value to a new column, it is assigned to all rows."
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "a9a4745e-69a6-4bef-95e0-a449d8ec2dea",
"metadata": {},
"outputs": [],
"source": [
"df1['file_name'] = 'BBBC007_20P1_POS0007_D_1UL'\n",
"\n",
"df2['file_name'] = 'BBBC007_20P1_POS0010_D_1UL'"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "4c87f225-eb3c-4107-b5d6-454c2598c084",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" area | \n",
" intensity_mean | \n",
" major_axis_length | \n",
" minor_axis_length | \n",
" aspect_ratio | \n",
" file_name | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 256 | \n",
" 93.250000 | \n",
" 19.995017 | \n",
" 17.021559 | \n",
" 1.174688 | \n",
" BBBC007_20P1_POS0007_D_1UL | \n",
"
\n",
" \n",
" 1 | \n",
" 90 | \n",
" 82.488889 | \n",
" 15.939969 | \n",
" 7.516326 | \n",
" 2.120713 | \n",
" BBBC007_20P1_POS0007_D_1UL | \n",
"
\n",
" \n",
" 2 | \n",
" 577 | \n",
" 90.637782 | \n",
" 35.324458 | \n",
" 21.759434 | \n",
" 1.623409 | \n",
" BBBC007_20P1_POS0007_D_1UL | \n",
"
\n",
" \n",
" 3 | \n",
" 270 | \n",
" 95.640741 | \n",
" 20.229431 | \n",
" 17.669052 | \n",
" 1.144908 | \n",
" BBBC007_20P1_POS0007_D_1UL | \n",
"
\n",
" \n",
" 4 | \n",
" 153 | \n",
" 84.908497 | \n",
" 15.683703 | \n",
" 12.420475 | \n",
" 1.262730 | \n",
" BBBC007_20P1_POS0007_D_1UL | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 42 | \n",
" 315 | \n",
" 91.133333 | \n",
" 20.927095 | \n",
" 19.209283 | \n",
" 1.089426 | \n",
" BBBC007_20P1_POS0010_D_1UL | \n",
"
\n",
" \n",
" 43 | \n",
" 206 | \n",
" 94.262136 | \n",
" 23.381879 | \n",
" 11.669668 | \n",
" 2.003646 | \n",
" BBBC007_20P1_POS0010_D_1UL | \n",
"
\n",
" \n",
" 44 | \n",
" 45 | \n",
" 68.377778 | \n",
" 9.406371 | \n",
" 6.276445 | \n",
" 1.498678 | \n",
" BBBC007_20P1_POS0010_D_1UL | \n",
"
\n",
" \n",
" 45 | \n",
" 33 | \n",
" 76.727273 | \n",
" 10.724275 | \n",
" 4.174568 | \n",
" 2.568955 | \n",
" BBBC007_20P1_POS0010_D_1UL | \n",
"
\n",
" \n",
" 46 | \n",
" 16 | \n",
" 76.750000 | \n",
" 7.745967 | \n",
" 2.783882 | \n",
" 2.782433 | \n",
" BBBC007_20P1_POS0010_D_1UL | \n",
"
\n",
" \n",
"
\n",
"
111 rows × 6 columns
\n",
"
"
],
"text/plain": [
" area intensity_mean major_axis_length minor_axis_length aspect_ratio \\\n",
"0 256 93.250000 19.995017 17.021559 1.174688 \n",
"1 90 82.488889 15.939969 7.516326 2.120713 \n",
"2 577 90.637782 35.324458 21.759434 1.623409 \n",
"3 270 95.640741 20.229431 17.669052 1.144908 \n",
"4 153 84.908497 15.683703 12.420475 1.262730 \n",
".. ... ... ... ... ... \n",
"42 315 91.133333 20.927095 19.209283 1.089426 \n",
"43 206 94.262136 23.381879 11.669668 2.003646 \n",
"44 45 68.377778 9.406371 6.276445 1.498678 \n",
"45 33 76.727273 10.724275 4.174568 2.568955 \n",
"46 16 76.750000 7.745967 2.783882 2.782433 \n",
"\n",
" file_name \n",
"0 BBBC007_20P1_POS0007_D_1UL \n",
"1 BBBC007_20P1_POS0007_D_1UL \n",
"2 BBBC007_20P1_POS0007_D_1UL \n",
"3 BBBC007_20P1_POS0007_D_1UL \n",
"4 BBBC007_20P1_POS0007_D_1UL \n",
".. ... \n",
"42 BBBC007_20P1_POS0010_D_1UL \n",
"43 BBBC007_20P1_POS0010_D_1UL \n",
"44 BBBC007_20P1_POS0010_D_1UL \n",
"45 BBBC007_20P1_POS0010_D_1UL \n",
"46 BBBC007_20P1_POS0010_D_1UL \n",
"\n",
"[111 rows x 6 columns]"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"big_df = pd.concat([df1, df2], axis=0)\n",
"big_df"
]
},
{
"cell_type": "markdown",
"id": "bf553c51-dd29-4d5b-8e72-dcf3a262c418",
"metadata": {},
"source": [
"Now, when we can safely distinguish the source of each row."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.15"
}
},
"nbformat": 4,
"nbformat_minor": 5
}