Around the World

Take some time to read Colin Morris’s exercise in language compression, which considers whether pop lyrics are getting more repetitive.

  1. (2 points.) In no more than two sentences, what is it about the lyrics for “Around the World” that make them so compressible?

  2. (2 points.) In no more than two sentences, why isn’t it worthwhile to compress short substrings, even if they appear repeatedly in a song?


Consider a childhood song like “Row Row Row Your Boat”, whose lyrics are the below:

Row, row, row your boat
Gently down the stream
Merrily merrily, merrily, merrily
Life is but a dream
Row, row, row your boat
Gently down the stream
Merrily merrily, merrily, merrily
Life is but a dream
Row, row, row your boat
Gently down the stream
Merrily merrily, merrily, merrily
Life is but a dream
Row, row, row your boat
Gently down the stream
Merrily merrily, merrily, merrily
Life is but a dream

Were you to store those lyrics in a text file, the file would be 404 bytes, as the lyrics comprise 404 characters, including line endings (\n).

  1. (3 points.) Propose how you could store those same lyrics in a text file using significantly fewer than 404 bytes, in such a way that someone else, familiar with your format, could recover the original lyrics.
  2. (2 points.) Compress those lyrics per your proposal, storing the result in a text file called lyrics.txt containing only ASCII characters.
  3. (1 point.) How many bytes is your lyrics.txt? You might find Linux’s wc command helpful for such:
    wc -m lyrics.txt
    

Consider the flag of Italy, below:

flag of Italy

Suppose that an image of that flag, in RGB format, is \(300\) pixels wide by \(200\) pixels tall. If each pixel is represented with three bytes (one for red, one for green, one for blue), the image itself would be \(300 \times 200 \times 3 = 180,000\) bytes.

  1. (3 points.) Propose how you could store that same image in a binary file using significantly fewer than \(180,000\) bytes, in such a way that someone else, familiar with your format, could recover the original image.
  2. (3 points.) Among the flags of the world, which country’s flag would compress even more than Italy’s, if you applied your same proposal to it? In no more than two sentences, why?
  3. (3 points.) Among the flags of the world, which country’s flag would compress significantly less than Italy’s, if you applied your same proposal to it? In no more than two sentences, why?