Activity 24 - Videogame Dataset 🎮

You will use Python to uncover some information about a dataset. Download the file video_games.tsv and save it to your cs100/ch5 folder. Keep the file name unchanged, because the starter code opens the file by that name. This is a tab-delimited data file: each field is separated by a tab character. You can open it with Excel, Numbers, or another spreadsheet application to browse the file and see what kind of data it contains.
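To make the tab-delimited layout concrete, here is a minimal sketch of how one line of such a file breaks into fields. The sample line is invented for illustration and is not taken from video_games.tsv, and the variable names are assumptions as well.

```python
# A made-up line in a tab-delimited shape (NOT real data from the file)
sample_line = "Example Game\tExample Publisher\t85\t17.95\tExample Console\tE\t2005\n"

# Remove the trailing newline, then split on the tab character
fields = sample_line.rstrip("\n").split("\t")

print(len(fields))  # number of fields in the line
print(fields)       # every field is a string, even the numeric ones
```

Because every field comes back as a string, numeric fields need a conversion such as float() before you do arithmetic with them.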

Copy the following code into your activity-24.py file, saved to the same cs100/ch5 folder. It has a sample function which shows how to get data and calculate the average used price of a game published by Nintendo.

# DATA INDEX VALUES
# Use to access certain data fields in the list
GAME_TITLE = 0
PUBLISHER = 1
REVIEW_SCORE = 2
USED_PRICE = 3
CONSOLE_NAME = 4
ESRB_RATING = 5
RELEASE_YEAR = 6


# return game data as a list of records
# where each record is a list of string fields
def get_data():
    data_records = []

    # the with statement closes the file automatically when the block ends
    with open('video_games.tsv', 'r') as game_data:
        # skip the TSV header line
        game_data.readline()

        # collect all records
        for line in game_data:
            # remove the trailing newline, then split on tabs
            all_fields = line.rstrip('\n').split('\t')
            data_records.append(all_fields)  # each record is a list of all fields

    return data_records


# return the average used price of a game published by Nintendo
def avg_nintendo_game_price():
    count = 0          # count of games
    running_total = 0  # running total of prices used to calculate average

    # Go through each record from the file.
    # Each record is a list of fields.
    for record in get_data():

        # Use the PUBLISHER index to get the name of the publisher
        if record[PUBLISHER] == 'Nintendo':
            # Use the USED_PRICE index to get the used cost of the game
            running_total += float(record[USED_PRICE])
            count += 1
    
    # Use the count and the sum of all prices to compute an average.
    # If no Nintendo games were found, return 0.0 instead of dividing by zero.
    if count == 0:
        return 0.0
    # Then, round to 2 decimal places
    avg = running_total / count
    return round(avg, 2)

# TODO: Define your functions here (all function definitions MUST
#    return data and NOT use the print function)


# TODO: Call the functions you defined here and print the data they return neatly
# For example, this prints the result of the function call to find the average Nintendo price
# (the .2f format always shows two decimal places, e.g. 12.50 rather than 12.5)
print("Average used price of Nintendo games: ${0:.2f}".format(avg_nintendo_game_price()))

Take some time to explore the data. Come up with three additional functions that present some information about the dataset. Some ideas might be:

None of your function definitions should print. They should all use the return statement to provide data. When you call your function, you will print the output to the screen neatly with a clear explanation of what the value represents (this will use the print function). Use the existing code as a guide.
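As a rough sketch of the return-then-print pattern described above, the following hypothetical function counts the records that carry a given ESRB rating. The function name, its parameters, and the three sample records are invented for illustration. In your activity file you would pass in the list returned by get_data() instead. ESRB_RATING repeats the index value from the starter code.

```python
ESRB_RATING = 5  # same index value as in the starter code


# return the number of records whose ESRB rating matches the given rating
# (hypothetical example function, not part of the starter code)
def count_games_with_rating(records, rating):
    count = 0
    for record in records:
        if record[ESRB_RATING] == rating:
            count += 1
    return count  # return the value; do not print inside the function


# Made-up records for illustration only (NOT real data from the file)
sample_records = [
    ["Game A", "Publisher X", "80", "10.00", "Console 1", "E", "2005"],
    ["Game B", "Publisher Y", "70", "12.50", "Console 2", "T", "2006"],
    ["Game C", "Publisher X", "90", "15.00", "Console 1", "E", "2007"],
]

# Printing happens where the function is called, with a clear label
print("Number of sample games rated E: {0}".format(
    count_games_with_rating(sample_records, "E")))
```

Taking the records as a parameter is a design choice made here so the sketch runs without the data file. The sample function in the starter code calls get_data() inside the function instead, and either approach satisfies the requirement that functions return rather than print.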

If you finish early

Write more than three additional functions! What other information would you like to know about this dataset? Explore a bit!

How to submit

Submit your working Python file AND the video_games.tsv file to Moodle.