Prompting tasks for pandas#

In this notebook we use bob to explore and plot some tabular data using pandas.

from bia_bob import bob

In this example we use a videogame sales dataset by Gregory Smith which is licensed CC-BY 4.0.

%%bob
write python code that
* opens the file vgsales.csv using pandas
* if the file doesn't exist, open it from https://zenodo.org/records/5898311/files/vgsales.csv?download=1 
  and save it as vgsales.csv

Plan:

  1. Import the pandas library.

  2. Use the try-except block to check if the file vgsales.csv exists.

  3. If the file exists, open it using pandas.

  4. If the file doesn’t exist, download it from the provided URL and save it as vgsales.csv.

  5. Open the file using pandas.

Code:

import pandas as pd

try:
    # Try to open the file
    df = pd.read_csv('vgsales.csv')
except FileNotFoundError:
    # If the file doesn't exist, download it and save as 'vgsales.csv'
    url = 'https://zenodo.org/records/5898311/files/vgsales.csv?download=1'
    df = pd.read_csv(url)
    df.to_csv('vgsales.csv')

# Print the first few rows of the dataframe
print(df.head())
   Rank                      Name Platform    Year         Genre Publisher  \
0     1                Wii Sports      Wii  2006.0        Sports  Nintendo   
1     2         Super Mario Bros.      NES  1985.0      Platform  Nintendo   
2     3            Mario Kart Wii      Wii  2008.0        Racing  Nintendo   
3     4         Wii Sports Resort      Wii  2009.0        Sports  Nintendo   
4     5  Pokemon Red/Pokemon Blue       GB  1996.0  Role-Playing  Nintendo   

   NA_Sales  EU_Sales  JP_Sales  Other_Sales  Global_Sales  
0     41.49     29.02      3.77         8.46         82.74  
1     29.08      3.58      6.81         0.77         40.24  
2     15.85     12.88      3.79         3.31         35.82  
3     15.75     11.01      3.28         2.96         33.00  
4     11.27      8.89     10.22         1.00         31.37  
%%bob 
group and sort the data by Year and 
draw a bar plot of Global_Sales

Plan:

  1. Group the data by the ‘Year’ column.

  2. Calculate the sum of ‘Global_Sales’ for each year.

  3. Sort the data in ascending order by year.

  4. Plot a bar plot of the ‘Global_Sales’ values.

Code:

import pandas as pd
import matplotlib.pyplot as plt

# Group the data by 'Year' and calculate the sum of 'Global_Sales'
grouped_data = df.groupby('Year')['Global_Sales'].sum()

# Sort the data in ascending order by year
sorted_data = grouped_data.sort_values(ascending=True)

# Plot a bar plot of the 'Global_Sales' values
plt.bar(sorted_data.index, sorted_data.values)
plt.xlabel('Year')
plt.ylabel('Global Sales')
plt.title('Global Sales per Year')

# Adjust the x-axis labels for better visibility
plt.xticks(rotation=45)

# Show the plot
plt.show()
../_images/ed880cfb45348ecb76a79cc717989da467767009c1cc031d6bb439ee29575157.png
%%bob
Select the 10 datasets with the highest Global_Sales

Plan:

  1. Sort the dataset based on the ‘Global_Sales’ column in descending order.

  2. Select the first 10 datasets from the sorted dataset.

Code:

# Sort the dataset based on 'Global_Sales' in descending order
sorted_data = df.sort_values(by='Global_Sales', ascending=False)

# Select the first 10 datasets with highest 'Global_Sales'
top_10_datasets = sorted_data.head(10)

# Display the selected datasets
print(top_10_datasets)
   Rank                       Name Platform    Year         Genre Publisher  \
0     1                 Wii Sports      Wii  2006.0        Sports  Nintendo   
1     2          Super Mario Bros.      NES  1985.0      Platform  Nintendo   
2     3             Mario Kart Wii      Wii  2008.0        Racing  Nintendo   
3     4          Wii Sports Resort      Wii  2009.0        Sports  Nintendo   
4     5   Pokemon Red/Pokemon Blue       GB  1996.0  Role-Playing  Nintendo   
5     6                     Tetris       GB  1989.0        Puzzle  Nintendo   
6     7      New Super Mario Bros.       DS  2006.0      Platform  Nintendo   
7     8                   Wii Play      Wii  2006.0          Misc  Nintendo   
8     9  New Super Mario Bros. Wii      Wii  2009.0      Platform  Nintendo   
9    10                  Duck Hunt      NES  1984.0       Shooter  Nintendo   

   NA_Sales  EU_Sales  JP_Sales  Other_Sales  Global_Sales  
0     41.49     29.02      3.77         8.46         82.74  
1     29.08      3.58      6.81         0.77         40.24  
2     15.85     12.88      3.79         3.31         35.82  
3     15.75     11.01      3.28         2.96         33.00  
4     11.27      8.89     10.22         1.00         31.37  
5     23.20      2.26      4.22         0.58         30.26  
6     11.38      9.23      6.50         2.90         30.01  
7     14.03      9.20      2.93         2.85         29.02  
8     14.59      7.06      4.70         2.26         28.62  
9     26.93      0.63      0.28         0.47         28.31  

Exercise#

Use %%bob to determine the most sold game.