Exploring tabular data

When working with data in tables, the ability to quickly get an overview of the data is key.

import pandas as pd 

Loading CSV files from disk

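To ensure compatibility between different software for processing tabular data, the CSV file format is commonly used. We can open such files using pandas.read_csv. Here, index_col=0 uses the first column as the row index, and delimiter=';' tells pandas that the values are separated by semicolons.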

data = pd.read_csv('../../data/Results.csv', index_col=0, delimiter=';')
data
Area Mean StdDev Min Max X Y XM YM Major Minor Angle %Area Type
1 18.0 730.389 103.354 592.0 948.0 435.000 4.722 434.962 4.697 5.987 3.828 168.425 100 A
2 126.0 718.333 90.367 556.0 1046.0 388.087 8.683 388.183 8.687 16.559 9.688 175.471 100 A
3 NaN NaN NaN 608.0 964.0 NaN NaN NaN 7.665 7.359 NaN 101.121 100 A
4 68.0 686.985 61.169 571.0 880.0 126.147 8.809 126.192 8.811 15.136 5.720 168.133 100 A
5 NaN NaN 69.438 566.0 792.0 348.500 7.500 NaN 7.508 NaN 3.088 NaN 100 A
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
387 152.0 801.599 111.328 582.0 1263.0 348.487 497.632 348.451 497.675 17.773 10.889 11.829 100 A
388 17.0 742.706 69.624 620.0 884.0 420.500 496.382 420.513 NaN NaN 3.663 49.457 100 A
389 60.0 758.033 77.309 601.0 947.0 259.000 499.300 258.990 499.289 9.476 8.062 90.000 100 A
390 12.0 714.833 67.294 551.0 785.0 240.167 498.167 240.179 498.148 4.606 3.317 168.690 100 A
391 23.0 695.043 67.356 611.0 846.0 49.891 503.022 49.882 502.979 6.454 4.537 73.243 100 A

391 rows × 14 columns
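
The effect of the index_col and delimiter arguments can be tried out without a file on disk. The following is a minimal sketch that parses a small, made-up semicolon-separated text through io.StringIO; the values and the names csv_text and example are illustrative assumptions, not part of Results.csv.

```python
import io
import pandas as pd

# Hypothetical semicolon-separated content; the first column holds the row labels.
csv_text = """;Area;Mean;Type
1;18.0;730.389;A
2;126.0;718.333;A
3;;;A
"""

# index_col=0 uses the first column as the index; delimiter=';' sets the separator.
example = pd.read_csv(io.StringIO(csv_text), index_col=0, delimiter=';')

# Empty fields are read as NaN (missing values).
print(example)
```

Empty fields in the text end up as NaN, which is how missing measurements also appear in the table above.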

Viewing the data

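Viewing data can be tricky, especially when working with large tables: pandas abbreviates long tables, as in the output above. To inspect only the first or last rows of a table, we can use head() and tail().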

data.head(10) # top 10 rows
Area Mean StdDev Min Max X Y XM YM Major Minor Angle %Area Type
1 18.0 730.389 103.354 592.0 948.0 435.000 4.722 434.962 4.697 5.987 3.828 168.425 100 A
2 126.0 718.333 90.367 556.0 1046.0 388.087 8.683 388.183 8.687 16.559 9.688 175.471 100 A
3 NaN NaN NaN 608.0 964.0 NaN NaN NaN 7.665 7.359 NaN 101.121 100 A
4 68.0 686.985 61.169 571.0 880.0 126.147 8.809 126.192 8.811 15.136 5.720 168.133 100 A
5 NaN NaN 69.438 566.0 792.0 348.500 7.500 NaN 7.508 NaN 3.088 NaN 100 A
6 669.0 697.164 72.863 539.0 957.0 471.696 26.253 471.694 26.197 36.656 23.237 124.340 100 A
7 5.0 658.600 49.161 607.0 710.0 28.300 8.100 28.284 8.103 3.144 2.025 161.565 100 A
8 7.0 677.571 49.899 596.0 768.0 415.357 8.786 415.360 8.804 4.110 2.168 112.500 100 A
9 14.0 691.071 63.873 586.0 808.0 493.286 9.000 493.295 9.016 5.120 3.481 38.802 100 C
10 39.0 763.615 88.786 623.0 1016.0 157.526 12.731 157.592 12.757 8.815 5.633 46.437 100 C
data.tail(10) # bottom 10 rows
Area Mean StdDev Min Max X Y XM YM Major Minor Angle %Area Type
382 45.0 734.356 68.637 575.0 867.0 171.500 494.789 171.492 494.739 14.630 3.916 95.698 100 B
383 94.0 746.617 85.198 550.0 1021.0 194.032 498.223 194.014 498.239 17.295 6.920 52.720 100 B
384 35.0 776.257 74.746 611.0 961.0 268.957 493.586 268.977 NaN NaN 5.990 111.193 100 A
385 35.0 739.286 NaN 593.0 928.0 291.871 493.843 291.871 493.806 NaN 5.352 79.368 100 A
386 14.0 736.143 81.533 646.0 902.0 315.000 493.000 314.989 493.003 NaN 3.676 45.000 100 A
387 152.0 801.599 111.328 582.0 1263.0 348.487 497.632 348.451 497.675 17.773 10.889 11.829 100 A
388 17.0 742.706 69.624 620.0 884.0 420.500 496.382 420.513 NaN NaN 3.663 49.457 100 A
389 60.0 758.033 77.309 601.0 947.0 259.000 499.300 258.990 499.289 9.476 8.062 90.000 100 A
390 12.0 714.833 67.294 551.0 785.0 240.167 498.167 240.179 498.148 4.606 3.317 168.690 100 A
391 23.0 695.043 67.356 611.0 846.0 49.891 503.022 49.882 502.979 6.454 4.537 73.243 100 A
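
Besides head() and tail(), the size and the column names of a table are often enough for a first orientation. The sketch below uses a made-up table (the name table and its contents are assumptions for illustration) and also shows how pd.option_context can temporarily limit how many rows are printed.

```python
import numpy as np
import pandas as pd

# Hypothetical table with 100 rows, standing in for a large measurement table.
table = pd.DataFrame({"Area": np.arange(100, dtype=float), "Type": ["A"] * 100})

print(table.shape)          # (number of rows, number of columns)
print(list(table.columns))  # column names

# Temporarily limit the printout to at most 6 rows; the table itself is unchanged.
with pd.option_context("display.max_rows", 6):
    print(table)
```

The display option only affects how the table is shown; all rows remain available for computation.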

Overview descriptive statistics

To get a glimpse of the range of values in the given table, we can ask the DataFrame to describe itself using DataFrame.describe(). It displays count, mean, standard deviation and other descriptive statistics for each numeric column in our table. The count row shows the number of non-missing values, so columns containing NaN entries have a count below the total number of rows.

data.describe()
Area Mean StdDev Min Max X Y XM YM Major Minor Angle %Area
count 389.000000 386.000000 388.000000 388.000000 388.000000 389.000000 388.000000 388.000000 386.000000 383.000000 388.000000 390.000000 391.0
mean 107.164524 743.455565 76.575309 610.414948 962.922680 256.419859 254.384088 256.183338 253.353005 12.481016 9.500662 86.598441 100.0
std 241.037082 42.252140 31.844864 57.156709 244.897224 152.261694 155.080074 152.380388 154.426250 11.979176 49.714280 60.593686 0.0
min 1.000000 587.000000 0.000000 516.000000 587.000000 3.978000 4.722000 4.012000 4.697000 1.128000 1.128000 0.000000 100.0
25% 15.000000 717.060750 63.861000 570.750000 847.750000 127.142000 102.875250 126.923250 103.813750 5.098000 3.637250 34.517250 100.0
50% 44.000000 741.077500 74.727000 599.000000 917.500000 243.300000 271.490000 242.288000 271.272000 9.374000 5.886000 89.703500 100.0
75% 116.000000 767.260750 86.826500 633.250000 1014.500000 400.167000 395.058250 400.363500 393.800750 16.283000 9.017250 134.617250 100.0
max 2755.000000 912.938000 377.767000 877.000000 3880.000000 508.214000 503.022000 508.169000 502.979000 144.475000 981.000000 568.000000 100.0
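
By default, describe() skips non-numeric columns such as Type, and missing values only show up indirectly in the count row. The following sketch, on a small made-up table (summary_demo and its values are assumptions for illustration), counts missing values per column and tallies the categories of a text column.

```python
import numpy as np
import pandas as pd

# Hypothetical mini table with one missing Area value.
summary_demo = pd.DataFrame({
    "Area": [18.0, 126.0, np.nan, 68.0],
    "Type": ["A", "A", "B", "C"],
})

# Number of missing (NaN) entries in each column.
missing_per_column = summary_demo.isna().sum()
print(missing_per_column)

# How often each category occurs in the non-numeric column.
type_counts = summary_demo["Type"].value_counts()
print(type_counts)

# describe() summarizes only the numeric column here.
print(summary_demo.describe())
```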

Sorting in tables

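In many cases, we are interested in the table rows that contain the maximum value of a column. For example, sorting by the Area column in descending order puts the largest object in the first row. Rows with a missing Area value (NaN) are placed at the end: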

data.sort_values(by="Area", ascending=False)
Area Mean StdDev Min Max X Y XM YM Major Minor Angle %Area Type
190 2755.0 859.928 235.458 539.0 3880.0 108.710 302.158 110.999 300.247 144.475 24.280 39.318 100 C
81 2295.0 765.239 96.545 558.0 1431.0 375.003 134.888 374.982 135.359 65.769 44.429 127.247 100 B
209 1821.0 847.761 122.074 600.0 1510.0 287.795 321.115 288.074 321.824 55.879 41.492 112.124 100 A
252 1528.0 763.777 83.183 572.0 1172.0 191.969 385.944 192.487 385.697 63.150 30.808 34.424 100 B
265 1252.0 793.371 117.139 579.0 1668.0 262.071 394.497 262.268 394.326 60.154 26.500 50.147 100 A
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
113 1.0 587.000 0.000 587.0 587.0 399.500 117.500 399.500 117.500 1.128 1.128 0.000 100 A
310 1.0 866.000 0.000 866.0 866.0 343.500 408.500 343.500 408.500 1.128 1.128 0.000 100 A
219 1.0 763.000 0.000 763.0 763.0 411.500 296.500 411.500 296.500 1.128 1.128 0.000 100 A
3 NaN NaN NaN 608.0 964.0 NaN NaN NaN 7.665 7.359 NaN 101.121 100 A
5 NaN NaN 69.438 566.0 792.0 348.500 7.500 NaN 7.508 NaN 3.088 NaN 100 A

391 rows × 14 columns
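
If only the largest entries are needed, a full sort can be avoided. The sketch below uses a small made-up table (sort_demo, its values and its index labels are assumptions for illustration) to show idxmax() and nlargest() next to sort_values().

```python
import numpy as np
import pandas as pd

# Hypothetical mini table; the index plays the role of the object label.
sort_demo = pd.DataFrame(
    {"Area": [18.0, np.nan, 2755.0, 68.0]},
    index=[1, 3, 190, 4],
)

# Label of the row with the largest Area; NaN values are skipped.
largest_label = sort_demo["Area"].idxmax()
print(sort_demo.loc[largest_label])

# The two rows with the largest Area, without sorting the whole table.
top_two = sort_demo.nlargest(2, "Area")
print(top_two)

# Full descending sort; rows with NaN are placed last by default.
sorted_desc = sort_demo.sort_values(by="Area", ascending=False)
print(sorted_desc)
```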