Exploring tabular data

Exploring tabular data#

When working with data in tables, the ability of quickly getting an overview about the data is key.

import pandas as pd 

Loading CSV files from disk#

To ensure compatility beween different software for processing tabular data the CSV file format is commonly used. We can open those files using pandas.read_csv.

data = pd.read_csv('../../data/Results.csv', index_col=0, delimiter=';')
data

	Area	Mean	StdDev	Min	Max	X	Y	XM	YM	Major	Minor	Angle	%Area	Type

1	18.0	730.389	103.354	592.0	948.0	435.000	4.722	434.962	4.697	5.987	3.828	168.425	100	A
2	126.0	718.333	90.367	556.0	1046.0	388.087	8.683	388.183	8.687	16.559	9.688	175.471	100	A
3	NaN	NaN	NaN	608.0	964.0	NaN	NaN	NaN	7.665	7.359	NaN	101.121	100	A
4	68.0	686.985	61.169	571.0	880.0	126.147	8.809	126.192	8.811	15.136	5.720	168.133	100	A
5	NaN	NaN	69.438	566.0	792.0	348.500	7.500	NaN	7.508	NaN	3.088	NaN	100	A
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
387	152.0	801.599	111.328	582.0	1263.0	348.487	497.632	348.451	497.675	17.773	10.889	11.829	100	A
388	17.0	742.706	69.624	620.0	884.0	420.500	496.382	420.513	NaN	NaN	3.663	49.457	100	A
389	60.0	758.033	77.309	601.0	947.0	259.000	499.300	258.990	499.289	9.476	8.062	90.000	100	A
390	12.0	714.833	67.294	551.0	785.0	240.167	498.167	240.179	498.148	4.606	3.317	168.690	100	A
391	23.0	695.043	67.356	611.0	846.0	49.891	503.022	49.882	502.979	6.454	4.537	73.243	100	A

391 rows × 14 columns

Viewing the data#

Viewing data can be tricky, especially when working with large tables.

data.head(10) # top 10 rows

	Area	Mean	StdDev	Min	Max	X	Y	XM	YM	Major	Minor	Angle	%Area	Type

1	18.0	730.389	103.354	592.0	948.0	435.000	4.722	434.962	4.697	5.987	3.828	168.425	100	A
2	126.0	718.333	90.367	556.0	1046.0	388.087	8.683	388.183	8.687	16.559	9.688	175.471	100	A
3	NaN	NaN	NaN	608.0	964.0	NaN	NaN	NaN	7.665	7.359	NaN	101.121	100	A
4	68.0	686.985	61.169	571.0	880.0	126.147	8.809	126.192	8.811	15.136	5.720	168.133	100	A
5	NaN	NaN	69.438	566.0	792.0	348.500	7.500	NaN	7.508	NaN	3.088	NaN	100	A
6	669.0	697.164	72.863	539.0	957.0	471.696	26.253	471.694	26.197	36.656	23.237	124.340	100	A
7	5.0	658.600	49.161	607.0	710.0	28.300	8.100	28.284	8.103	3.144	2.025	161.565	100	A
8	7.0	677.571	49.899	596.0	768.0	415.357	8.786	415.360	8.804	4.110	2.168	112.500	100	A
9	14.0	691.071	63.873	586.0	808.0	493.286	9.000	493.295	9.016	5.120	3.481	38.802	100	C
10	39.0	763.615	88.786	623.0	1016.0	157.526	12.731	157.592	12.757	8.815	5.633	46.437	100	C

data.tail(10) # bottom 10 rows

	Area	Mean	StdDev	Min	Max	X	Y	XM	YM	Major	Minor	Angle	%Area	Type

382	45.0	734.356	68.637	575.0	867.0	171.500	494.789	171.492	494.739	14.630	3.916	95.698	100	B
383	94.0	746.617	85.198	550.0	1021.0	194.032	498.223	194.014	498.239	17.295	6.920	52.720	100	B
384	35.0	776.257	74.746	611.0	961.0	268.957	493.586	268.977	NaN	NaN	5.990	111.193	100	A
385	35.0	739.286	NaN	593.0	928.0	291.871	493.843	291.871	493.806	NaN	5.352	79.368	100	A
386	14.0	736.143	81.533	646.0	902.0	315.000	493.000	314.989	493.003	NaN	3.676	45.000	100	A
387	152.0	801.599	111.328	582.0	1263.0	348.487	497.632	348.451	497.675	17.773	10.889	11.829	100	A
388	17.0	742.706	69.624	620.0	884.0	420.500	496.382	420.513	NaN	NaN	3.663	49.457	100	A
389	60.0	758.033	77.309	601.0	947.0	259.000	499.300	258.990	499.289	9.476	8.062	90.000	100	A
390	12.0	714.833	67.294	551.0	785.0	240.167	498.167	240.179	498.148	4.606	3.317	168.690	100	A
391	23.0	695.043	67.356	611.0	846.0	49.891	503.022	49.882	502.979	6.454	4.537	73.243	100	A

Overview descriptive statistics#

To get a glimpse of the range of values which exist in the given table, we can ask the DateFrame to describe itself using DataFrame.describe(). It will display count, mean, standard deviation and other descriptive statistics for each column in our table.

data.describe()

	Area	Mean	StdDev	Min	Max	X	Y	XM	YM	Major	Minor	Angle	%Area
count	389.000000	386.000000	388.000000	388.000000	388.000000	389.000000	388.000000	388.000000	386.000000	383.000000	388.000000	390.000000	391.0
mean	107.164524	743.455565	76.575309	610.414948	962.922680	256.419859	254.384088	256.183338	253.353005	12.481016	9.500662	86.598441	100.0
std	241.037082	42.252140	31.844864	57.156709	244.897224	152.261694	155.080074	152.380388	154.426250	11.979176	49.714280	60.593686	0.0
min	1.000000	587.000000	0.000000	516.000000	587.000000	3.978000	4.722000	4.012000	4.697000	1.128000	1.128000	0.000000	100.0
25%	15.000000	717.060750	63.861000	570.750000	847.750000	127.142000	102.875250	126.923250	103.813750	5.098000	3.637250	34.517250	100.0
50%	44.000000	741.077500	74.727000	599.000000	917.500000	243.300000	271.490000	242.288000	271.272000	9.374000	5.886000	89.703500	100.0
75%	116.000000	767.260750	86.826500	633.250000	1014.500000	400.167000	395.058250	400.363500	393.800750	16.283000	9.017250	134.617250	100.0
max	2755.000000	912.938000	377.767000	877.000000	3880.000000	508.214000	503.022000	508.169000	502.979000	144.475000	981.000000	568.000000	100.0

Sorting in tables#

In many cases, we are interested in table rows that contain the maximum value, e.g. in the area column we can find the largest object:

data.sort_values(by = "Area", ascending=False)

	Area	Mean	StdDev	Min	Max	X	Y	XM	YM	Major	Minor	Angle	%Area	Type

190	2755.0	859.928	235.458	539.0	3880.0	108.710	302.158	110.999	300.247	144.475	24.280	39.318	100	C
81	2295.0	765.239	96.545	558.0	1431.0	375.003	134.888	374.982	135.359	65.769	44.429	127.247	100	B
209	1821.0	847.761	122.074	600.0	1510.0	287.795	321.115	288.074	321.824	55.879	41.492	112.124	100	A
252	1528.0	763.777	83.183	572.0	1172.0	191.969	385.944	192.487	385.697	63.150	30.808	34.424	100	B
265	1252.0	793.371	117.139	579.0	1668.0	262.071	394.497	262.268	394.326	60.154	26.500	50.147	100	A
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
113	1.0	587.000	0.000	587.0	587.0	399.500	117.500	399.500	117.500	1.128	1.128	0.000	100	A
310	1.0	866.000	0.000	866.0	866.0	343.500	408.500	343.500	408.500	1.128	1.128	0.000	100	A
219	1.0	763.000	0.000	763.0	763.0	411.500	296.500	411.500	296.500	1.128	1.128	0.000	100	A
3	NaN	NaN	NaN	608.0	964.0	NaN	NaN	NaN	7.665	7.359	NaN	101.121	100	A
5	NaN	NaN	69.438	566.0	792.0	348.500	7.500	NaN	7.508	NaN	3.088	NaN	100	A