Exercise 04: Histogram

You are going to create a simple program to display the distribution of letters in a string.

This exercise is designed to exercise:

Setup

  1. Download the zip file containing the starter code.
  2. Extract the ex05 folder and place it into your cs102 folder
  3. Open the file histogram.py with Thonny

Assignment

The provided code contains an exceptionally long string that contains only the lower-case letters a through z and nothing else. You will need to count the occurrences of each letter in the string and store the letter counts in a list. The ultimate goal is to display a normalized text-based histogram of the letter distribution.

The code contains a few constant values to help you out:

Let’s walk through a complete example.

Assume we have the following (much shorter) string:

abcccaddd

We need to know three things in order to produce our histogram:

  1. How many times does each letter appear?
  2. What is the largest count for any single letter (a tie doesn’t matter)?
  3. What is the ratio of each letter’s count to the count of the letter that occurs most often?

For our example, the counts are:

a: 2
b: 1
c: 3
d: 3

The letters that appear the most are c and d, both with a maximum occurrence of 3.
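
One way the first two questions could be answered is sketched below. This is only an illustration under stated assumptions: the function name count_letters is hypothetical and not part of the starter code, and the input is assumed to contain only the lower-case letters a through z.

```python
# Hypothetical helper; the name count_letters is not from the starter code.
def count_letters(text):
    """Return a list of 26 counts: index 0 is 'a', index 25 is 'z'."""
    counts = [0] * 26
    for letter in text:
        # ord() gives the character's code point, so 'a' maps to index 0.
        # Assumes text contains only the lower-case letters a through z.
        counts[ord(letter) - ord("a")] += 1
    return counts


counts = count_letters("abcccaddd")
largest = max(counts)  # a tie between c and d does not matter here

print(counts[:4])  # [2, 1, 3, 3]
print(largest)     # 3
```

Keeping the counts in a list indexed by letter position means the letters that never appear still have an entry of 0.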

Since we will have a very large string in our project, we want to normalize the number of HISTOGRAM_SYMBOL (‘>’) characters we print to represent each bar of our histogram. In our case, we are normalizing to MAX_HISTOGRAM_LENGTH (70). For each letter, we calculate the ratio of its count to the largest count. So if we were to calculate the ratio for ‘a’, it would be:

ratio = 2 / 3

We then multiply that ratio by MAX_HISTOGRAM_LENGTH, the maximum length a histogram bar can be, so we can display the appropriate number of HISTOGRAM_SYMBOL characters.

display_symbol_count = ratio * MAX_HISTOGRAM_LENGTH
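
The two calculations above might be combined as in the following sketch. The function name bar_length is hypothetical. The exercise text does not say how to turn the result into a whole number of symbols, so truncating with int() is an assumption here, as is the guard for an empty string.

```python
MAX_HISTOGRAM_LENGTH = 70  # value stated in the exercise text


# Hypothetical helper; the name bar_length is not from the starter code.
def bar_length(count, largest):
    """Scale count so that the largest count maps to MAX_HISTOGRAM_LENGTH."""
    if largest == 0:
        # Assumption: no letters were counted, so every bar is empty.
        return 0
    ratio = count / largest
    # Assumption: drop the fractional part, since a partial symbol
    # cannot be printed.
    return int(ratio * MAX_HISTOGRAM_LENGTH)


print(bar_length(2, 3))  # 46
print(bar_length(1, 3))  # 23
print(bar_length(3, 3))  # 70
```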

An example of the expected output is:

a >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2
b >>>>>>>>>>>>>>>>>>>>>>> 1
c >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 3
d >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 3
e  0
f  0
g  0
...
z  0

Each line of the histogram output displays the letter, a space, the histogram bar, another space, and finally the count for that letter. The ellipsis (…) is used only to shorten the example. Your program must always output results for all the letters a through z, regardless of their counts.
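
The line format described above could be produced along these lines. This is a sketch rather than the required solution: the function name format_line is hypothetical, and the bar lengths passed in are taken as already computed.

```python
HISTOGRAM_SYMBOL = ">"  # symbol stated in the exercise text


# Hypothetical helper; the name format_line is not from the starter code.
def format_line(letter, symbol_count, count):
    """Build one line: letter, space, bar, space, count."""
    bar = HISTOGRAM_SYMBOL * symbol_count  # repeating a string builds the bar
    return letter + " " + bar + " " + str(count)


print(format_line("b", 5, 1))  # b >>>>> 1
print(format_line("e", 0, 0))  # e  0

# chr() converts an index back to its letter, which helps when looping
# over all 26 positions of a counts list.
for index in range(3):
    print(chr(ord("a") + index))  # prints a, b, c on separate lines
```

An empty bar leaves the two spaces next to each other, which matches the lines for e through z in the example output.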

Notice how the bars for c and d have exactly 70 HISTOGRAM_SYMBOL characters, the bar for b is roughly one-third of that length, and the bar for a is roughly two-thirds of that length. This is due to the normalization described above.

Hints

HINT: There are some useful functions that will help you with your tasks.

Submission

Right-click your ex05 assignment folder and choose Compress on macOS or Compress to ZIP file on Windows. Upload the zip file to the matching Moodle assignment to submit your work.

Grading

You will earn up to 5 points for this exercise, broken down as follows: