Exercise 05: Histogram
You are going to create a simple program to display the distribution of letters in a string.
This exercise is designed to give you practice with:
- for loops
- lists
- math operations
- string functions
Setup
- Download the zip file containing the starter code.
- Extract the ex05 folder and place it into your cs102 folder
- Open the file histogram.py with Thonny
Assignment
The provided code contains an exceptionally long string that contains only the lower-case letters a through z and nothing else. You will need to count the occurrences of each letter in the string and store the letter counts in a list. The ultimate goal will be to display a normalized text-based histogram of the letter distribution.
The code contains a few constant values to help you out:
- ASCII_OFFSET = ord("a")
  - This holds the numeric value used to represent ‘a’
- HISTOGRAM_SYMBOL = ">"
  - This is the symbol that will be used for each “tick” of the histogram bar
- MAX_HISTOGRAM_LENGTH = 70
  - This is the maximum length of any bar in the histogram
- BIG_STRING = …
  - This will hold the massive string you will need to process
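As a quick illustration of what ASCII_OFFSET is for, the sketch below converts letters to positions 0 through 25 and back again. The helper names letter_to_index and index_to_letter are hypothetical and are not part of the starter code; they only show the arithmetic.

```python
ASCII_OFFSET = ord("a")  # 97, the numeric value used to represent 'a'


def letter_to_index(letter):
    """Map 'a'..'z' to 0..25 (hypothetical helper, not in the starter code)."""
    return ord(letter) - ASCII_OFFSET


def index_to_letter(index):
    """Map 0..25 back to 'a'..'z' (hypothetical helper, not in the starter code)."""
    return chr(index + ASCII_OFFSET)


print(letter_to_index("a"))  # 0
print(letter_to_index("z"))  # 25
print(index_to_letter(3))    # d
```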
Let’s walk through a complete example.
Assume we have the following (much shorter) string:
abcccaddd
We need to know three things in order to produce our histogram:
- How many times does each letter appear?
- What is the largest count of any single letter (a tie doesn’t matter)?
- What is the ratio of each letter’s count to the count of the letter that occurs most often?
For our example, the counts are:
a: 2
b: 1
c: 3
d: 3
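One possible way to arrive at these counts is sketched below, using the short example string rather than BIG_STRING. The variable names sample and counts are assumptions made for illustration; your own program may be organized differently.

```python
ASCII_OFFSET = ord("a")
sample = "abcccaddd"  # the short example string, standing in for BIG_STRING

# A single list with one slot per letter, every count starting at zero.
counts = [0] * 26

for letter in sample:
    # ord(letter) - ASCII_OFFSET gives 0 for 'a', 1 for 'b', and so on.
    counts[ord(letter) - ASCII_OFFSET] += 1

print(counts[:4])   # [2, 1, 3, 3]
print(max(counts))  # 3
```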
The letters that appear the most are c and d, both with a maximum occurrence of 3.
Since we will have a very large string in our project, we want to normalize the number of HISTOGRAM_SYMBOL (‘>’) characters we print for each bar of our histogram. In our case, we are normalizing to MAX_HISTOGRAM_LENGTH (70). For each letter, we calculate the ratio of its count to the largest count. So if we were to calculate the ratio for ‘a’ it would be:
ratio = 2 / 3
We then take that ratio and multiply it by MAX_HISTOGRAM_LENGTH, the maximum length a histogram bar can be, so we can display the appropriate number of HISTOGRAM_SYMBOL characters.
display_symbol_count = ratio * MAX_HISTOGRAM_LENGTH
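The arithmetic for ‘a’ can be checked with a few lines of Python. This is only a sketch: the names count, largest, and whole_symbol_count are made up for the example, and dropping the fractional part with int() is an assumption about how to reach a whole number.

```python
MAX_HISTOGRAM_LENGTH = 70

count = 2    # how many times 'a' appears in the example string
largest = 3  # the count of the most frequent letter (c or d)

ratio = count / largest                              # roughly 0.667
display_symbol_count = ratio * MAX_HISTOGRAM_LENGTH  # roughly 46.67

# Assumption: discard the fractional part, since only whole symbols can be printed.
whole_symbol_count = int(display_symbol_count)

print(whole_symbol_count)  # 46
```

With the same steps, a count of 1 gives 23 symbols and a count of 3 gives the full 70.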
An example of the expected output is:
a >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2
b >>>>>>>>>>>>>>>>>>>>>>> 1
c >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 3
d >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 3
e 0
f 0
g 0
...
z 0
Each line of the histogram output displays the letter, a space, the histogram bar, another space, and finally the count for that letter. The ellipsis (…) is used here only to shorten the example. Your program will always output the results for all letters a through z regardless of their counts.
Notice how the bars for c and d have exactly 70 HISTOGRAM_SYMBOL characters, the bar for b is roughly one-third of that length, and the bar for a is roughly two-thirds of that length. This is due to the normalization described above.
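The line format described above (letter, space, bar, space, count) could be assembled as in the following sketch. The function format_line is a hypothetical helper introduced only for this illustration; it is not part of the starter code.

```python
HISTOGRAM_SYMBOL = ">"


def format_line(letter, symbol_count, count):
    """Build one histogram line (hypothetical helper, not in the starter code)."""
    # Repeating a string with * produces the bar of the requested length.
    bar = HISTOGRAM_SYMBOL * symbol_count
    return letter + " " + bar + " " + str(count)


# The line for 'b' from the example: 23 symbols and a count of 1.
print(format_line("b", 23, 1))
```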
Hints
- We will need to maintain a running total of all letter counts. How could we use a list with enough space to hold the count for each letter (think about how many letters there are…)?
- You will not be able to display a fraction of a HISTOGRAM_SYMBOL. Only whole numbers will be possible.
HINT: There are some useful functions that will help you with your tasks.
Submission
Right click your ex05 assignment folder and choose Compress on macOS or Compress to ZIP file on Windows. Upload the zip file to the matching Moodle assignment to submit your work.
Grading
You will earn up to 10 points for this exercise, broken down as follows:
- 1 point - the program does not crash
- 3 points - the program only uses one list to hold the letter counts
- 3 points - each histogram line matches the described format
- 3 points - the program outputs the correct result