Exploring tabular data#
When working with data in tables, the ability of quickly getting an overview about the data is key.
import pandas as pd
Loading CSV files from disk#
To ensure compatility beween different software for processing tabular data the CSV file format is commonly used. We can open those files using pandas.read_csv.
data = pd.read_csv('../../data/Results.csv', index_col=0, delimiter=';')
data
| Area | Mean | StdDev | Min | Max | X | Y | XM | YM | Major | Minor | Angle | %Area | Type | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 18.0 | 730.389 | 103.354 | 592.0 | 948.0 | 435.000 | 4.722 | 434.962 | 4.697 | 5.987 | 3.828 | 168.425 | 100 | A |
| 2 | 126.0 | 718.333 | 90.367 | 556.0 | 1046.0 | 388.087 | 8.683 | 388.183 | 8.687 | 16.559 | 9.688 | 175.471 | 100 | A |
| 3 | NaN | NaN | NaN | 608.0 | 964.0 | NaN | NaN | NaN | 7.665 | 7.359 | NaN | 101.121 | 100 | A |
| 4 | 68.0 | 686.985 | 61.169 | 571.0 | 880.0 | 126.147 | 8.809 | 126.192 | 8.811 | 15.136 | 5.720 | 168.133 | 100 | A |
| 5 | NaN | NaN | 69.438 | 566.0 | 792.0 | 348.500 | 7.500 | NaN | 7.508 | NaN | 3.088 | NaN | 100 | A |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 387 | 152.0 | 801.599 | 111.328 | 582.0 | 1263.0 | 348.487 | 497.632 | 348.451 | 497.675 | 17.773 | 10.889 | 11.829 | 100 | A |
| 388 | 17.0 | 742.706 | 69.624 | 620.0 | 884.0 | 420.500 | 496.382 | 420.513 | NaN | NaN | 3.663 | 49.457 | 100 | A |
| 389 | 60.0 | 758.033 | 77.309 | 601.0 | 947.0 | 259.000 | 499.300 | 258.990 | 499.289 | 9.476 | 8.062 | 90.000 | 100 | A |
| 390 | 12.0 | 714.833 | 67.294 | 551.0 | 785.0 | 240.167 | 498.167 | 240.179 | 498.148 | 4.606 | 3.317 | 168.690 | 100 | A |
| 391 | 23.0 | 695.043 | 67.356 | 611.0 | 846.0 | 49.891 | 503.022 | 49.882 | 502.979 | 6.454 | 4.537 | 73.243 | 100 | A |
391 rows × 14 columns
Viewing the data#
Viewing data can be tricky, especially when working with large tables.
data.head(10) # top 10 rows
| Area | Mean | StdDev | Min | Max | X | Y | XM | YM | Major | Minor | Angle | %Area | Type | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 18.0 | 730.389 | 103.354 | 592.0 | 948.0 | 435.000 | 4.722 | 434.962 | 4.697 | 5.987 | 3.828 | 168.425 | 100 | A |
| 2 | 126.0 | 718.333 | 90.367 | 556.0 | 1046.0 | 388.087 | 8.683 | 388.183 | 8.687 | 16.559 | 9.688 | 175.471 | 100 | A |
| 3 | NaN | NaN | NaN | 608.0 | 964.0 | NaN | NaN | NaN | 7.665 | 7.359 | NaN | 101.121 | 100 | A |
| 4 | 68.0 | 686.985 | 61.169 | 571.0 | 880.0 | 126.147 | 8.809 | 126.192 | 8.811 | 15.136 | 5.720 | 168.133 | 100 | A |
| 5 | NaN | NaN | 69.438 | 566.0 | 792.0 | 348.500 | 7.500 | NaN | 7.508 | NaN | 3.088 | NaN | 100 | A |
| 6 | 669.0 | 697.164 | 72.863 | 539.0 | 957.0 | 471.696 | 26.253 | 471.694 | 26.197 | 36.656 | 23.237 | 124.340 | 100 | A |
| 7 | 5.0 | 658.600 | 49.161 | 607.0 | 710.0 | 28.300 | 8.100 | 28.284 | 8.103 | 3.144 | 2.025 | 161.565 | 100 | A |
| 8 | 7.0 | 677.571 | 49.899 | 596.0 | 768.0 | 415.357 | 8.786 | 415.360 | 8.804 | 4.110 | 2.168 | 112.500 | 100 | A |
| 9 | 14.0 | 691.071 | 63.873 | 586.0 | 808.0 | 493.286 | 9.000 | 493.295 | 9.016 | 5.120 | 3.481 | 38.802 | 100 | C |
| 10 | 39.0 | 763.615 | 88.786 | 623.0 | 1016.0 | 157.526 | 12.731 | 157.592 | 12.757 | 8.815 | 5.633 | 46.437 | 100 | C |
data.tail(10) # bottom 10 rows
| Area | Mean | StdDev | Min | Max | X | Y | XM | YM | Major | Minor | Angle | %Area | Type | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 382 | 45.0 | 734.356 | 68.637 | 575.0 | 867.0 | 171.500 | 494.789 | 171.492 | 494.739 | 14.630 | 3.916 | 95.698 | 100 | B |
| 383 | 94.0 | 746.617 | 85.198 | 550.0 | 1021.0 | 194.032 | 498.223 | 194.014 | 498.239 | 17.295 | 6.920 | 52.720 | 100 | B |
| 384 | 35.0 | 776.257 | 74.746 | 611.0 | 961.0 | 268.957 | 493.586 | 268.977 | NaN | NaN | 5.990 | 111.193 | 100 | A |
| 385 | 35.0 | 739.286 | NaN | 593.0 | 928.0 | 291.871 | 493.843 | 291.871 | 493.806 | NaN | 5.352 | 79.368 | 100 | A |
| 386 | 14.0 | 736.143 | 81.533 | 646.0 | 902.0 | 315.000 | 493.000 | 314.989 | 493.003 | NaN | 3.676 | 45.000 | 100 | A |
| 387 | 152.0 | 801.599 | 111.328 | 582.0 | 1263.0 | 348.487 | 497.632 | 348.451 | 497.675 | 17.773 | 10.889 | 11.829 | 100 | A |
| 388 | 17.0 | 742.706 | 69.624 | 620.0 | 884.0 | 420.500 | 496.382 | 420.513 | NaN | NaN | 3.663 | 49.457 | 100 | A |
| 389 | 60.0 | 758.033 | 77.309 | 601.0 | 947.0 | 259.000 | 499.300 | 258.990 | 499.289 | 9.476 | 8.062 | 90.000 | 100 | A |
| 390 | 12.0 | 714.833 | 67.294 | 551.0 | 785.0 | 240.167 | 498.167 | 240.179 | 498.148 | 4.606 | 3.317 | 168.690 | 100 | A |
| 391 | 23.0 | 695.043 | 67.356 | 611.0 | 846.0 | 49.891 | 503.022 | 49.882 | 502.979 | 6.454 | 4.537 | 73.243 | 100 | A |
Overview descriptive statistics#
To get a glimpse of the range of values which exist in the given table, we can ask the DateFrame to describe itself using DataFrame.describe(). It will display count, mean, standard deviation and other descriptive statistics for each column in our table.
data.describe()
| Area | Mean | StdDev | Min | Max | X | Y | XM | YM | Major | Minor | Angle | %Area | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 389.000000 | 386.000000 | 388.000000 | 388.000000 | 388.000000 | 389.000000 | 388.000000 | 388.000000 | 386.000000 | 383.000000 | 388.000000 | 390.000000 | 391.0 |
| mean | 107.164524 | 743.455565 | 76.575309 | 610.414948 | 962.922680 | 256.419859 | 254.384088 | 256.183338 | 253.353005 | 12.481016 | 9.500662 | 86.598441 | 100.0 |
| std | 241.037082 | 42.252140 | 31.844864 | 57.156709 | 244.897224 | 152.261694 | 155.080074 | 152.380388 | 154.426250 | 11.979176 | 49.714280 | 60.593686 | 0.0 |
| min | 1.000000 | 587.000000 | 0.000000 | 516.000000 | 587.000000 | 3.978000 | 4.722000 | 4.012000 | 4.697000 | 1.128000 | 1.128000 | 0.000000 | 100.0 |
| 25% | 15.000000 | 717.060750 | 63.861000 | 570.750000 | 847.750000 | 127.142000 | 102.875250 | 126.923250 | 103.813750 | 5.098000 | 3.637250 | 34.517250 | 100.0 |
| 50% | 44.000000 | 741.077500 | 74.727000 | 599.000000 | 917.500000 | 243.300000 | 271.490000 | 242.288000 | 271.272000 | 9.374000 | 5.886000 | 89.703500 | 100.0 |
| 75% | 116.000000 | 767.260750 | 86.826500 | 633.250000 | 1014.500000 | 400.167000 | 395.058250 | 400.363500 | 393.800750 | 16.283000 | 9.017250 | 134.617250 | 100.0 |
| max | 2755.000000 | 912.938000 | 377.767000 | 877.000000 | 3880.000000 | 508.214000 | 503.022000 | 508.169000 | 502.979000 | 144.475000 | 981.000000 | 568.000000 | 100.0 |
Sorting in tables#
In many cases, we are interested in table rows that contain the maximum value, e.g. in the area column we can find the largest object:
data.sort_values(by = "Area", ascending=False)
| Area | Mean | StdDev | Min | Max | X | Y | XM | YM | Major | Minor | Angle | %Area | Type | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 190 | 2755.0 | 859.928 | 235.458 | 539.0 | 3880.0 | 108.710 | 302.158 | 110.999 | 300.247 | 144.475 | 24.280 | 39.318 | 100 | C |
| 81 | 2295.0 | 765.239 | 96.545 | 558.0 | 1431.0 | 375.003 | 134.888 | 374.982 | 135.359 | 65.769 | 44.429 | 127.247 | 100 | B |
| 209 | 1821.0 | 847.761 | 122.074 | 600.0 | 1510.0 | 287.795 | 321.115 | 288.074 | 321.824 | 55.879 | 41.492 | 112.124 | 100 | A |
| 252 | 1528.0 | 763.777 | 83.183 | 572.0 | 1172.0 | 191.969 | 385.944 | 192.487 | 385.697 | 63.150 | 30.808 | 34.424 | 100 | B |
| 265 | 1252.0 | 793.371 | 117.139 | 579.0 | 1668.0 | 262.071 | 394.497 | 262.268 | 394.326 | 60.154 | 26.500 | 50.147 | 100 | A |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 113 | 1.0 | 587.000 | 0.000 | 587.0 | 587.0 | 399.500 | 117.500 | 399.500 | 117.500 | 1.128 | 1.128 | 0.000 | 100 | A |
| 310 | 1.0 | 866.000 | 0.000 | 866.0 | 866.0 | 343.500 | 408.500 | 343.500 | 408.500 | 1.128 | 1.128 | 0.000 | 100 | A |
| 219 | 1.0 | 763.000 | 0.000 | 763.0 | 763.0 | 411.500 | 296.500 | 411.500 | 296.500 | 1.128 | 1.128 | 0.000 | 100 | A |
| 3 | NaN | NaN | NaN | 608.0 | 964.0 | NaN | NaN | NaN | 7.665 | 7.359 | NaN | 101.121 | 100 | A |
| 5 | NaN | NaN | 69.438 | 566.0 | 792.0 | 348.500 | 7.500 | NaN | 7.508 | NaN | 3.088 | NaN | 100 | A |
391 rows × 14 columns