Around the World
Take some time to read Colin Morris’s exercise in language compression, which considers whether pop lyrics are getting more repetitive.
- (2 points.) In no more than two sentences, what is it about the lyrics for “Around the World” that makes them so compressible?
- (2 points.) In no more than two sentences, why isn’t it worthwhile to compress short substrings, even if they appear repeatedly in a song?
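To make that trade-off concrete, here is a sketch under a hypothetical cost model (not necessarily the one Morris uses): the first occurrence of a substring stays literal, and each later occurrence is replaced by a reference costing a fixed ref_cost bytes. The function name net_savings and the default of 2 bytes per reference are assumptions chosen for illustration.

```python
def net_savings(substring_len: int, occurrences: int, ref_cost: int = 2) -> int:
    """Bytes saved under a hypothetical scheme: keep the first occurrence of a
    substring literally and replace each later occurrence with a reference
    costing ref_cost bytes (an assumed, fixed size)."""
    if occurrences < 1:
        return 0
    # Each of the (occurrences - 1) repeats shrinks from substring_len bytes
    # to ref_cost bytes; a negative result means the "compression" grew the file.
    return (occurrences - 1) * (substring_len - ref_cost)


print(net_savings(2, 50))    # 2-byte substring repeated 50 times: 0
print(net_savings(1, 100))   # 1-byte substring repeated 100 times: -99
print(net_savings(101, 4))   # 101-byte block repeated 4 times: 297
```

Under this model, no number of repeats helps a substring that is no longer than a reference, whereas a long repeated block pays off after a single repeat.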
Consider a childhood song like “Row Row Row Your Boat”, whose lyrics are below:
Row, row, row your boat
Gently down the stream
Merrily merrily, merrily, merrily
Life is but a dream
Row, row, row your boat
Gently down the stream
Merrily merrily, merrily, merrily
Life is but a dream
Row, row, row your boat
Gently down the stream
Merrily merrily, merrily, merrily
Life is but a dream
Row, row, row your boat
Gently down the stream
Merrily merrily, merrily, merrily
Life is but a dream
Were you to store those lyrics in a text file, the file would be 404 bytes, since the lyrics comprise 404 ASCII characters (one byte each), including line endings (\n).
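One quick way to confirm the 404-byte figure is to rebuild the lyrics in Python and count the bytes of their ASCII encoding. This assumes the lyrics are exactly as printed above (four identical stanzas) and that the last line also ends in \n.

```python
# One stanza, exactly as printed above, with a \n after every line.
stanza = (
    "Row, row, row your boat\n"
    "Gently down the stream\n"
    "Merrily merrily, merrily, merrily\n"
    "Life is but a dream\n"
)
lyrics = stanza * 4  # the song repeats the same stanza four times

print(len(stanza))                  # 101
print(len(lyrics.encode("ascii")))  # 404
```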
- (3 points.) Propose how you could store those same lyrics in a text file using significantly fewer than 404 bytes, in such a way that someone else, familiar with your format, could recover the original lyrics.
- (2 points.) Compress those lyrics per your proposal, storing the result in a text file called lyrics.txt containing only ASCII characters.
- (1 point.) How many bytes is your lyrics.txt? You might find Linux’s wc command helpful for this: wc -m lyrics.txt counts the file’s characters, which, for a file containing only ASCII characters, equals its size in bytes.
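For illustration only, here is a sketch of one possible format, certainly not the only acceptable proposal: a hypothetical layout whose first line is a repeat count and whose remainder is the text to repeat. The names encode and decode are made up for this example; the encoder looks for the shortest unit that, repeated, reproduces the whole text.

```python
def encode(text: str) -> str:
    """Hypothetical format: '<count>\\n<unit>', where unit * count == text."""
    n = len(text)
    # Try the shortest units first, so the repeat count is as large as possible.
    for unit_len in range(1, n + 1):
        if n % unit_len == 0:
            unit = text[:unit_len]
            if unit * (n // unit_len) == text:
                return f"{n // unit_len}\n{unit}"
    return f"1\n{text}"  # only reached for the empty string


def decode(data: str) -> str:
    """Invert encode: split off the count line, then repeat the rest."""
    count, unit = data.split("\n", 1)
    return unit * int(count)


stanza = (
    "Row, row, row your boat\n"
    "Gently down the stream\n"
    "Merrily merrily, merrily, merrily\n"
    "Life is but a dream\n"
)
lyrics = stanza * 4

packed = encode(lyrics)
assert decode(packed) == lyrics  # lossless round trip
print(len(packed))               # 103: "4\n" plus one 101-byte stanza
```

A scheme this simple only helps when the whole text is one block repeated verbatim; it would not exploit the "row, row, row" or "merrily" repetition within a stanza.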
Consider the flag of Italy, which consists of three vertical bands of green, white, and red.
Suppose that an image of that flag, in RGB format, is
- (3 points.) Propose how you could store that same image in a binary file using significantly fewer than
bytes, in such a way that someone else, familiar with your format, could recover the original image.
- (3 points.) Among the flags of the world, which country’s flag would compress even more than Italy’s, if you applied your same proposal to it? In no more than two sentences, why?
- (3 points.) Among the flags of the world, which country’s flag would compress significantly less than Italy’s, if you applied your same proposal to it? In no more than two sentences, why?
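One common technique for images like this is run-length encoding, sketched below under stated assumptions: the image is a list of rows of (R, G, B) tuples, each row is stored as (count, color) runs, and the 300 × 200 size, the RGB values, and the helper vertical_bands are hypothetical placeholders, not the dimensions or colors this exercise has in mind.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]
Run = Tuple[int, RGB]  # (number of consecutive pixels, their shared color)


def rle_row(row: List[RGB]) -> List[Run]:
    """Collapse consecutive identical pixels in one row into (count, color) runs."""
    runs: List[Run] = []
    for pixel in row:
        if runs and runs[-1][1] == pixel:
            runs[-1] = (runs[-1][0] + 1, pixel)  # extend the current run
        else:
            runs.append((1, pixel))              # start a new run
    return runs


def rle_decode(encoded: List[List[Run]]) -> List[List[RGB]]:
    """Expand every run back into individual pixels."""
    return [[color for count, color in runs for _ in range(count)] for runs in encoded]


def vertical_bands(width: int, height: int, colors: List[RGB]) -> List[List[RGB]]:
    """Build a flag of equal vertical bands (hypothetical helper for the demo)."""
    band = width // len(colors)
    row: List[RGB] = []
    for i, color in enumerate(colors):
        # The last band absorbs any leftover columns if width isn't divisible.
        size = band if i < len(colors) - 1 else width - band * (len(colors) - 1)
        row.extend([color] * size)
    return [list(row) for _ in range(height)]


# Placeholder size and colors, chosen only for illustration.
GREEN, WHITE, RED = (0, 146, 70), (255, 255, 255), (206, 43, 55)
flag = vertical_bands(300, 200, [GREEN, WHITE, RED])

encoded = [rle_row(row) for row in flag]
assert rle_decode(encoded) == flag         # lossless
print(sum(len(runs) for runs in encoded))  # 600 runs (3 per row) for 60,000 pixels
```

If each run were stored as a 2-byte count plus 3 color bytes, those 600 runs would take 600 × 5 = 3,000 bytes, versus 300 × 200 × 3 = 180,000 bytes of raw RGB for the same hypothetical image.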