{ "cells": [ { "cell_type": "markdown", "id": "273def9d-da84-49b7-83be-11695c64c3d9", "metadata": {}, "source": [ "# Appending tables\n", "When processing multiple images, potentially using multiple image processing libraries, a common task is to combine tables.\n", "\n", "We start with two small tables of measurements that could have been obtained from different functions or different libraries." ] }, { "cell_type": "code", "execution_count": 1, "id": "1477c227", "metadata": {}, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 2, "id": "156d6b7c", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
labelcircularityelongation
010.32.3
120.53.4
230.71.2
\n", "
" ], "text/plain": [ " label circularity elongation\n", "0 1 0.3 2.3\n", "1 2 0.5 3.4\n", "2 3 0.7 1.2" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "table1 = pd.DataFrame({\n", " \"label\": [1, 2, 3],\n", " \"circularity\": [0.3, 0.5, 0.7],\n", " \"elongation\": [2.3, 3.4, 1.2],\n", " })\n", "table1" ] }, { "cell_type": "code", "execution_count": 3, "id": "1877c769", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
labelareaskewness
03220.5
12320.6
21250.3
34180.3
\n", "
" ], "text/plain": [ " label area skewness\n", "0 3 22 0.5\n", "1 2 32 0.6\n", "2 1 25 0.3\n", "3 4 18 0.3" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "table2 = pd.DataFrame({\n", " \"label\": [3, 2, 1, 4],\n", " \"area\": [22, 32, 25, 18],\n", " \"skewness\": [0.5, 0.6, 0.3, 0.3],\n", " })\n", "table2" ] }, { "cell_type": "markdown", "id": "49df41b0-b63a-44d0-8b6e-ae6ac7cfa263", "metadata": {}, "source": [ "## Combining columns of tables\n", "According to the [pandas documentation](https://pandas.pydata.org/docs/user_guide/merging.html) there are multiple ways for combining tables. We first use a _wrong_ example to highlight pitfalls when combining tables.\n", "\n", "In the following example, measurements of label 1 and 3 are mixed. Furthermore, one of our tables did not contain measurements for label 4." ] }, { "cell_type": "code", "execution_count": 4, "id": "6f255657", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
labelcircularityelongationlabelareaskewness
01.00.32.33220.5
12.00.53.42320.6
23.00.71.21250.3
3NaNNaNNaN4180.3
\n", "
" ], "text/plain": [ " label circularity elongation label area skewness\n", "0 1.0 0.3 2.3 3 22 0.5\n", "1 2.0 0.5 3.4 2 32 0.6\n", "2 3.0 0.7 1.2 1 25 0.3\n", "3 NaN NaN NaN 4 18 0.3" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "wrongly_combined_tables = pd.concat([table1, table2], axis=1)\n", "wrongly_combined_tables" ] }, { "cell_type": "markdown", "id": "16a4e200-8964-43d1-b43c-eb5977ed195a", "metadata": {}, "source": [ "A better way for combining tables is the `merge` command. It allows to explicitly specify `on` which column the tables should be combined. Data scientists speak of the 'index' or 'identifier' of rows in the tables." ] }, { "cell_type": "code", "execution_count": 5, "id": "29ccfdb1-2b09-46d5-90c2-c4374cb73d02", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
labelcircularityelongationareaskewness
010.32.3250.3
120.53.4320.6
230.71.2220.5
\n", "
" ], "text/plain": [ " label circularity elongation area skewness\n", "0 1 0.3 2.3 25 0.3\n", "1 2 0.5 3.4 32 0.6\n", "2 3 0.7 1.2 22 0.5" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "correctly_combined_tables1 = pd.merge(table1, table2, how='inner', on='label')\n", "correctly_combined_tables1" ] }, { "cell_type": "markdown", "id": "eebbf929-6cb4-48e4-bff4-60d444c95d49", "metadata": {}, "source": [ "You may note that in the above example, label 4 is missing. We can also get it by out table by performing an `outer join`. " ] }, { "cell_type": "code", "execution_count": 6, "id": "84d160b6-e577-478a-a14d-f4bba371afee", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
labelcircularityelongationareaskewness
010.32.3250.3
120.53.4320.6
230.71.2220.5
34NaNNaN180.3
\n", "
" ], "text/plain": [ " label circularity elongation area skewness\n", "0 1 0.3 2.3 25 0.3\n", "1 2 0.5 3.4 32 0.6\n", "2 3 0.7 1.2 22 0.5\n", "3 4 NaN NaN 18 0.3" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "correctly_combined_tables2 = pd.merge(table1, table2, how='outer', on='label')\n", "correctly_combined_tables2" ] }, { "cell_type": "code", "execution_count": 7, "id": "e885d30a-3be4-4af7-9a91-8a27dfed6276", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
labelcircularityelongationareaskewness
030.71.2220.5
120.53.4320.6
210.32.3250.3
34NaNNaN180.3
\n", "
" ], "text/plain": [ " label circularity elongation area skewness\n", "0 3 0.7 1.2 22 0.5\n", "1 2 0.5 3.4 32 0.6\n", "2 1 0.3 2.3 25 0.3\n", "3 4 NaN NaN 18 0.3" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "correctly_combined_tables2 = pd.merge(table1, table2, how='right', on='label')\n", "correctly_combined_tables2" ] }, { "cell_type": "markdown", "id": "ce62adcc-f797-469a-865c-f29532561d46", "metadata": {}, "source": [ "Suppose there is a common measurement name from different tables. For example, table3 below also contain \"elongation\"." ] }, { "cell_type": "code", "execution_count": 8, "id": "567ea501-70cc-47e5-80f2-9b6810fd854e", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
labelareaskewnesselongation
03220.52.3
12320.63.4
21250.31.2
34180.31.1
\n", "
" ], "text/plain": [ " label area skewness elongation\n", "0 3 22 0.5 2.3\n", "1 2 32 0.6 3.4\n", "2 1 25 0.3 1.2\n", "3 4 18 0.3 1.1" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "table3 = pd.DataFrame({\n", " \"label\": [3, 2, 1, 4],\n", " \"area\": [22, 32, 25, 18],\n", " \"skewness\": [0.5, 0.6, 0.3, 0.3],\n", " \"elongation\": [2.3, 3.4, 1.2, 1.1]\n", " })\n", "table3" ] }, { "cell_type": "markdown", "id": "fea092cb-0e12-4056-a819-bf5051c94a25", "metadata": {}, "source": [ "Applying merge still preserves both measurements in different columns." ] }, { "cell_type": "code", "execution_count": 9, "id": "625b71eb-72ec-4680-83f5-d0a318c25c83", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
labelcircularityelongation_xareaskewnesselongation_y
010.32.3250.31.2
120.53.4320.63.4
230.71.2220.52.3
34NaNNaN180.31.1
\n", "
" ], "text/plain": [ " label circularity elongation_x area skewness elongation_y\n", "0 1 0.3 2.3 25 0.3 1.2\n", "1 2 0.5 3.4 32 0.6 3.4\n", "2 3 0.7 1.2 22 0.5 2.3\n", "3 4 NaN NaN 18 0.3 1.1" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "correctly_combined_tables3 = pd.merge(table1, table3, how='outer', on='label')\n", "correctly_combined_tables3 " ] }, { "cell_type": "markdown", "id": "7bccacef-45e3-4a16-afc8-73f1dd8ad631", "metadata": {}, "source": [ "We can change 'x' and 'y' by passing other suffixes." ] }, { "cell_type": "code", "execution_count": 10, "id": "43990561-34ca-4099-917b-bdb6895f149d", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
labelcircularityelongation_method1areaskewnesselongation_method2
010.32.3250.31.2
120.53.4320.63.4
230.71.2220.52.3
34NaNNaN180.31.1
\n", "
" ], "text/plain": [ " label circularity elongation_method1 area skewness elongation_method2\n", "0 1 0.3 2.3 25 0.3 1.2\n", "1 2 0.5 3.4 32 0.6 3.4\n", "2 3 0.7 1.2 22 0.5 2.3\n", "3 4 NaN NaN 18 0.3 1.1" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "correctly_combined_tables3 = pd.merge(table1, table3, how='outer', on='label', suffixes=('_method1', '_method2'))\n", "correctly_combined_tables3" ] }, { "cell_type": "markdown", "id": "9682caa7-ea47-452a-a773-952639b53de8", "metadata": { "tags": [] }, "source": [ "## Combining measurements from multiple image files" ] }, { "cell_type": "markdown", "id": "3e37e64c-0185-439f-9e52-f1e23e732865", "metadata": {}, "source": [ "When applying a workflow to many images, you would get tables with the same column names, but with a variable number of rows.\n", "To calculate statistics for whole folders or to perform machine learning, we usually need to concatenate those tables, but it is important to keep track of the source files.\n", "\n", "Let's open two tables generated from applying the same worflow to different files." ] }, { "cell_type": "code", "execution_count": 11, "id": "23b7b945-b874-49ba-b48f-d494adf7e8c3", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
areaintensity_meanmajor_axis_lengthminor_axis_lengthaspect_ratio
025693.25000019.99501717.0215591.174688
19082.48888915.9399697.5163262.120713
257790.63778235.32445821.7594341.623409
327095.64074120.22943117.6690521.144908
415384.90849715.68370312.4204751.262730
\n", "
" ], "text/plain": [ " area intensity_mean major_axis_length minor_axis_length aspect_ratio\n", "0 256 93.250000 19.995017 17.021559 1.174688\n", "1 90 82.488889 15.939969 7.516326 2.120713\n", "2 577 90.637782 35.324458 21.759434 1.623409\n", "3 270 95.640741 20.229431 17.669052 1.144908\n", "4 153 84.908497 15.683703 12.420475 1.262730" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1 = pd.read_csv('../../data/BBBC007_20P1_POS0007_D_1UL.csv')\n", "df1.head()" ] }, { "cell_type": "code", "execution_count": 12, "id": "f0bf9a4b-925d-4227-8aff-4348fa2afca3", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
areaintensity_meanmajor_axis_lengthminor_axis_lengthaspect_ratio
013996.54676317.50410410.2927701.700621
136086.61388935.74680814.9831242.385805
24391.48837212.9678844.3515732.980045
314073.74285718.94050810.3144041.836316
414489.37500013.63930813.4585321.013432
\n", "
" ], "text/plain": [ " area intensity_mean major_axis_length minor_axis_length aspect_ratio\n", "0 139 96.546763 17.504104 10.292770 1.700621\n", "1 360 86.613889 35.746808 14.983124 2.385805\n", "2 43 91.488372 12.967884 4.351573 2.980045\n", "3 140 73.742857 18.940508 10.314404 1.836316\n", "4 144 89.375000 13.639308 13.458532 1.013432" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2 = pd.read_csv('../../data/BBBC007_20P1_POS0010_D_1UL.csv')\n", "df2.head()" ] }, { "cell_type": "markdown", "id": "6121c288-6584-4c30-a584-43cc58871599", "metadata": {}, "source": [ "In this particular case where we know we have the same columns, we could concatenate them into a single big table." ] }, { "cell_type": "code", "execution_count": 13, "id": "0cf88538-6bea-4a47-ba1e-09a26fba2602", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
areaintensity_meanmajor_axis_lengthminor_axis_lengthaspect_ratio
025693.25000019.99501717.0215591.174688
19082.48888915.9399697.5163262.120713
257790.63778235.32445821.7594341.623409
327095.64074120.22943117.6690521.144908
415384.90849715.68370312.4204751.262730
..................
4231591.13333320.92709519.2092831.089426
4320694.26213623.38187911.6696682.003646
444568.3777789.4063716.2764451.498678
453376.72727310.7242754.1745682.568955
461676.7500007.7459672.7838822.782433
\n", "

111 rows × 5 columns

\n", "
" ], "text/plain": [ " area intensity_mean major_axis_length minor_axis_length aspect_ratio\n", "0 256 93.250000 19.995017 17.021559 1.174688\n", "1 90 82.488889 15.939969 7.516326 2.120713\n", "2 577 90.637782 35.324458 21.759434 1.623409\n", "3 270 95.640741 20.229431 17.669052 1.144908\n", "4 153 84.908497 15.683703 12.420475 1.262730\n", ".. ... ... ... ... ...\n", "42 315 91.133333 20.927095 19.209283 1.089426\n", "43 206 94.262136 23.381879 11.669668 2.003646\n", "44 45 68.377778 9.406371 6.276445 1.498678\n", "45 33 76.727273 10.724275 4.174568 2.568955\n", "46 16 76.750000 7.745967 2.783882 2.782433\n", "\n", "[111 rows x 5 columns]" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "big_df = pd.concat([df1, df2], axis=0)\n", "big_df" ] }, { "cell_type": "markdown", "id": "12290fe6-86aa-408e-b9f7-1c2b2900a64a", "metadata": {}, "source": [ "The problem is that we lose their source identity. An easy fix for that is to add a new column with the file name before concatenating them. This will facilitating splitting them back and plotting later on.\n", "\n", "When we give a single value to a new column, it is assigned to all rows." ] }, { "cell_type": "code", "execution_count": 14, "id": "a9a4745e-69a6-4bef-95e0-a449d8ec2dea", "metadata": {}, "outputs": [], "source": [ "df1['file_name'] = 'BBBC007_20P1_POS0007_D_1UL'\n", "\n", "df2['file_name'] = 'BBBC007_20P1_POS0010_D_1UL'" ] }, { "cell_type": "code", "execution_count": 15, "id": "4c87f225-eb3c-4107-b5d6-454c2598c084", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
areaintensity_meanmajor_axis_lengthminor_axis_lengthaspect_ratiofile_name
025693.25000019.99501717.0215591.174688BBBC007_20P1_POS0007_D_1UL
19082.48888915.9399697.5163262.120713BBBC007_20P1_POS0007_D_1UL
257790.63778235.32445821.7594341.623409BBBC007_20P1_POS0007_D_1UL
327095.64074120.22943117.6690521.144908BBBC007_20P1_POS0007_D_1UL
415384.90849715.68370312.4204751.262730BBBC007_20P1_POS0007_D_1UL
.....................
4231591.13333320.92709519.2092831.089426BBBC007_20P1_POS0010_D_1UL
4320694.26213623.38187911.6696682.003646BBBC007_20P1_POS0010_D_1UL
444568.3777789.4063716.2764451.498678BBBC007_20P1_POS0010_D_1UL
453376.72727310.7242754.1745682.568955BBBC007_20P1_POS0010_D_1UL
461676.7500007.7459672.7838822.782433BBBC007_20P1_POS0010_D_1UL
\n", "

111 rows × 6 columns

\n", "
" ], "text/plain": [ " area intensity_mean major_axis_length minor_axis_length aspect_ratio \\\n", "0 256 93.250000 19.995017 17.021559 1.174688 \n", "1 90 82.488889 15.939969 7.516326 2.120713 \n", "2 577 90.637782 35.324458 21.759434 1.623409 \n", "3 270 95.640741 20.229431 17.669052 1.144908 \n", "4 153 84.908497 15.683703 12.420475 1.262730 \n", ".. ... ... ... ... ... \n", "42 315 91.133333 20.927095 19.209283 1.089426 \n", "43 206 94.262136 23.381879 11.669668 2.003646 \n", "44 45 68.377778 9.406371 6.276445 1.498678 \n", "45 33 76.727273 10.724275 4.174568 2.568955 \n", "46 16 76.750000 7.745967 2.783882 2.782433 \n", "\n", " file_name \n", "0 BBBC007_20P1_POS0007_D_1UL \n", "1 BBBC007_20P1_POS0007_D_1UL \n", "2 BBBC007_20P1_POS0007_D_1UL \n", "3 BBBC007_20P1_POS0007_D_1UL \n", "4 BBBC007_20P1_POS0007_D_1UL \n", ".. ... \n", "42 BBBC007_20P1_POS0010_D_1UL \n", "43 BBBC007_20P1_POS0010_D_1UL \n", "44 BBBC007_20P1_POS0010_D_1UL \n", "45 BBBC007_20P1_POS0010_D_1UL \n", "46 BBBC007_20P1_POS0010_D_1UL \n", "\n", "[111 rows x 6 columns]" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "big_df = pd.concat([df1, df2], axis=0)\n", "big_df" ] }, { "cell_type": "markdown", "id": "bf553c51-dd29-4d5b-8e72-dcf3a262c418", "metadata": {}, "source": [ "Now, when we can safely distinguish the source of each row." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.15" } }, "nbformat": 4, "nbformat_minor": 5 }