Week 10
Friday
Activity: Reading from a file - Videogame Dataset 🎮
You will use Python to uncover some information about a dataset. Download the file video_games.tsv and save it to your cs100/ch5 folder. Be sure to keep the same file name. This is a tab-delimited data file (each field is separated by a tab character). You can open it in Excel, Numbers, or another spreadsheet application to browse the file and see what kind of data it contains.
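To see what "tab-delimited" means in practice, here is a minimal sketch that splits one line the same way the sample code below does. The line itself is made up for illustration and is not copied from video_games.tsv; it only assumes the column order given by the index constants in the sample code.

```python
# A made-up line in the same column order as the index constants:
# title, publisher, review score, used price, console, ESRB rating, year.
# These values are hypothetical, not taken from video_games.tsv.
sample_line = "Example Quest\tNintendo\t85\t19.99\tNintendo DS\tE\t2008\n"

# Remove the trailing newline, then split on the tab character "\t".
fields = sample_line.rstrip("\n").split("\t")
print(fields)
# ['Example Quest', 'Nintendo', '85', '19.99', 'Nintendo DS', 'E', '2008']

# Every field comes back as a string, so convert before doing math.
used_price = float(fields[3])   # column 3 is the used price
release_year = int(fields[6])   # column 6 is the release year
```

Because every field is a string, comparisons like "is this score over 80?" only work after converting with float() or int().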
Copy the following code into an activity-24.py file saved in the same cs100/ch5 folder. It includes a sample function that shows how to read the data and calculate the average used price of games published by Nintendo.
# DATA INDEX VALUES
# Use to access certain data fields in the list
GAME_TITLE = 0
PUBLISHER = 1
REVIEW_SCORE = 2
USED_PRICE = 3
CONSOLE_NAME = 4
ESRB_RATING = 5
RELEASE_YEAR = 6

# return game data as a list of records
# where each record is a list of string fields
def get_data():
    data_records = []
    game_data = open('video_games.tsv', 'r')
    # dispose of TSV header
    game_data.readline()
    # collect all records
    for line in game_data:
        allFields = line.split('\t')  # separated by tabs
        allFields[-1] = allFields[-1].replace("\n", "")  # get rid of newline
        data_records.append(allFields)  # each record contains list of all fields
    game_data.close()
    return data_records

# return the average used price of a Nintendo game
def avg_nintendo_game_price():
    count = 0  # count of games
    running_total = 0  # running total of prices used to calculate average
    # Go through each record from the file.
    # Each record is a list of fields.
    for record in get_data():
        # Use the PUBLISHER index to get the name of the publisher
        if record[PUBLISHER] == 'Nintendo':
            # Use the USED_PRICE index to get the used cost of the game
            running_total += float(record[USED_PRICE])
            count += 1
    # Avoid dividing by zero if no Nintendo games were found
    if count == 0:
        return 0.0
    # Use the count and the sum of all prices to compute an average
    # Then, round to 2 decimal places
    avg = running_total / count
    return round(avg, 2)

# TODO: Define your functions here (all function definitions MUST
# return data and NOT use the print function)

# TODO: Call the functions you defined here and print the data they return neatly
# For example, this prints the result of the function call to find the average Nintendo price
# (":.2f" always shows two decimal places, e.g. $15.00 instead of $15.0)
print("Average used price of Nintendo games: ${0:.2f}".format(avg_nintendo_game_price()))
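The sample function hard-codes 'Nintendo'. Below is a minimal sketch of the same running-total pattern with the publisher passed in as a parameter. It also takes the list of records as an argument so it can be tried on a small hand-made list; in your program you would pass it get_data(). The function name, the sample records, and returning None for an unknown publisher are illustrative choices, not part of the original activity.

```python
# Index values, matching the constants in the sample code
PUBLISHER = 1
USED_PRICE = 3

def avg_price_by_publisher(records, publisher):
    """Return the average used price of games by `publisher`, rounded to
    2 decimal places, or None if the publisher has no games in `records`."""
    count = 0
    running_total = 0.0
    for record in records:
        if record[PUBLISHER] == publisher:
            running_total += float(record[USED_PRICE])
            count += 1
    if count == 0:
        return None  # no matching games, so there is no average to compute
    return round(running_total / count, 2)

# Hypothetical records in the same column order as video_games.tsv
sample = [
    ["Game A", "Nintendo", "80", "10.00", "Wii", "E", "2007"],
    ["Game B", "Nintendo", "90", "20.00", "Wii", "E", "2008"],
    ["Game C", "Sega", "70", "5.50", "Wii", "T", "2008"],
]

print(avg_price_by_publisher(sample, "Nintendo"))  # 15.0
print(avg_price_by_publisher(sample, "Capcom"))    # None
```

With the real data, a call would look like avg_price_by_publisher(get_data(), "Nintendo"). Passing the records in, rather than calling get_data() inside the function, means the file is read once and the same list can be reused by several functions.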
Take some time to explore the data. Come up with three additional functions that present some information about the dataset. Some ideas might be:
- How many game titles were released by a certain publisher? (e.g., num_titles_released_by(publisher))
- How many games are rated over a certain score? (e.g., num_games_over(score))
- Which game has the highest used price?
- How many games were released in a specific year?
- What year has the most games released?
- What is the average score of all games made by a certain publisher?
- Which game has the longest title?
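Two of the ideas above can be sketched as follows: counting the records that pass a test, and tracking the largest value seen so far. Unlike the num_games_over(score) suggestion, this version also takes the records as a parameter (pass it get_data() in your program). It assumes the review score column holds values that float() can parse, which you should check against the actual file; most_expensive_used_game is a hypothetical name, and the sample records are made up.

```python
# Index values, matching the constants in the sample code
GAME_TITLE = 0
REVIEW_SCORE = 2
USED_PRICE = 3

def num_games_over(records, score):
    """Return how many games have a review score greater than `score`."""
    count = 0
    for record in records:
        if float(record[REVIEW_SCORE]) > score:
            count += 1
    return count

def most_expensive_used_game(records):
    """Return the title of the game with the highest used price
    (the first one found if there is a tie), or None if `records` is empty."""
    best_title = None
    best_price = None
    for record in records:
        price = float(record[USED_PRICE])
        if best_price is None or price > best_price:
            best_price = price
            best_title = record[GAME_TITLE]
    return best_title

# Hypothetical records in the same column order as video_games.tsv
sample = [
    ["Game A", "Nintendo", "80", "10.00", "Wii", "E", "2007"],
    ["Game B", "Nintendo", "90", "20.00", "Wii", "E", "2008"],
    ["Game C", "Sega", "70", "5.50", "Wii", "T", "2008"],
]

print(num_games_over(sample, 75))        # 2
print(most_expensive_used_game(sample))  # Game B
```

The same "best so far" loop works for the longest-title idea: compare len(record[GAME_TITLE]) instead of the price.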
None of your function definitions should print. Each one should use a return statement to provide its data. When you call your functions, print the returned values to the screen neatly, with a clear explanation of what each value represents. Use the existing code as a guide.
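A minimal sketch of this return-then-print pattern, using the first idea from the list. As an assumption that makes it easy to test, the function also takes the list of records as a parameter; in your program you would pass it get_data(). The sample records are hypothetical.

```python
PUBLISHER = 1  # index value, matching the constant in the sample code

def num_titles_released_by(records, publisher):
    """Return (do not print) how many records list `publisher`."""
    count = 0
    for record in records:
        if record[PUBLISHER] == publisher:
            count += 1
    return count

# Hypothetical records in the same column order as video_games.tsv
sample = [
    ["Game A", "Nintendo", "80", "10.00", "Wii", "E", "2007"],
    ["Game B", "Nintendo", "90", "20.00", "Wii", "E", "2008"],
    ["Game C", "Sega", "70", "5.50", "Wii", "T", "2008"],
]

# The caller does the printing, with a label explaining the value.
count = num_titles_released_by(sample, "Nintendo")
print("Number of games published by Nintendo: {0}".format(count))

# Because the value is returned, it can be reused in further calculations.
percent = count / len(sample) * 100
print("That is {0:.1f}% of the listed games.".format(percent))
```

This prints "Number of games published by Nintendo: 2" and "That is 66.7% of the listed games." If the function printed instead of returning, the second calculation would not be possible.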
If you finish early
Write more than three additional functions! What other information would you like to know about this dataset? Explore a bit!
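One way to explore further is to group the data by a column and count each group. The sketch below counts games per console with collections.Counter from the Python standard library; games_per_console is a hypothetical name and the sample records are made up. Swapping CONSOLE_NAME for RELEASE_YEAR (index 6) would give games per year instead.

```python
from collections import Counter

CONSOLE_NAME = 4  # index value, matching the constant in the sample code

def games_per_console(records):
    """Return a Counter mapping each console name to its number of games."""
    return Counter(record[CONSOLE_NAME] for record in records)

# Hypothetical records in the same column order as video_games.tsv
sample = [
    ["Game A", "Nintendo", "80", "10.00", "Wii", "E", "2007"],
    ["Game B", "Nintendo", "90", "20.00", "Wii", "E", "2008"],
    ["Game C", "Sega", "70", "5.50", "PlayStation 3", "T", "2008"],
]

counts = games_per_console(sample)
# most_common() lists (console, count) pairs from largest count to smallest
for console, n in counts.most_common():
    print("{0}: {1} game(s)".format(console, n))
```

This prints "Wii: 2 game(s)" followed by "PlayStation 3: 1 game(s)". A plain dictionary with a running count per key would work the same way if you have not used Counter before.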
How to submit
Submit your working python file AND the video_games.tsv file to Moodle.