Less is More
Problem 1
T3A4C2G1A3
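A minimal Python sketch of the run-length encoding this answer appears to use, where each run is written as the character followed by its length. The original input sequence is not stated here. TTTAAAACCGAAA is reconstructed by decoding the answer, and the function names `rle_encode` and `rle_decode` are hypothetical.

```python
import re

def rle_encode(seq: str) -> str:
    """Run-length encode a string as character followed by run length."""
    if not seq:
        return ""
    out = []
    current, count = seq[0], 1
    for ch in seq[1:]:
        if ch == current:
            count += 1          # still inside the current run
        else:
            out.append(f"{current}{count}")  # close the finished run
            current, count = ch, 1
    out.append(f"{current}{count}")          # close the final run
    return "".join(out)

def rle_decode(encoded: str) -> str:
    """Expand pairs like 'T3' back into 'TTT'; counts may have several digits."""
    return "".join(ch * int(n) for ch, n in re.findall(r"([A-Za-z])(\d+)", encoded))

original = rle_decode("T3A4C2G1A3")
print(original)              # TTTAAAACCGAAA
print(rle_encode(original))  # T3A4C2G1A3
```

Decoding and re-encoding round-trips to the same string, which is a quick way to check an answer like this by hand.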
Problem 2
Sequences with few repeated characters (that is, made up mostly of runs of length 1) are likely to increase the file size rather than decrease it. For example, ATGC would be represented as A1T1G1C1, which requires 8 characters instead of the original 4.
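To make the size comparison concrete, here is a small sketch that run-length encodes ATGC alongside a contrasting sequence of one long run. The function name `rle_encode` and the sequence AAAAAAAA are illustrative assumptions, not part of the original problem.

```python
from itertools import groupby

def rle_encode(seq: str) -> str:
    # groupby yields one (character, run) pair per maximal run of equal characters
    return "".join(f"{ch}{sum(1 for _ in run)}" for ch, run in groupby(seq))

for seq in ["ATGC", "AAAAAAAA"]:
    encoded = rle_encode(seq)
    print(seq, "->", encoded, f"({len(seq)} -> {len(encoded)} characters)")
# ATGC -> A1T1G1C1 (4 -> 8 characters)
# AAAAAAAA -> A8 (8 -> 2 characters)
```

Every run costs at least two characters, so run-length encoding only saves space when runs are longer than two characters on average.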
Problem 3
Germany’s flag would be compressed more. Assuming the image is encoded row by row, each row of pixels in Germany’s flag (horizontal stripes) contains only a single color, whereas each row of Romania’s flag (vertical stripes) contains three. The runs of pixels in Germany’s flag are therefore longer and fewer, so it can be compressed more.
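A rough sketch of this argument: model each flag as a small grid of color letters, run-length encode it row by row, and count the runs. The 6 × 9 grid size and the single-letter color codes are hypothetical simplifications; since each run is stored as one color-and-count pair, fewer runs means a smaller encoding.

```python
from itertools import groupby

ROWS, COLS = 6, 9  # hypothetical image size in pixels

def germany(rows, cols):
    # Horizontal stripes, top to bottom: black (K), red (R), gold (G).
    colors = "KRG"
    return [[colors[r * 3 // rows]] * cols for r in range(rows)]

def romania(rows, cols):
    # Vertical stripes, left to right: blue (B), yellow (Y), red (R).
    colors = "BYR"
    return [[colors[c * 3 // cols] for c in range(cols)] for _ in range(rows)]

def count_runs(image):
    # Run-length encode each row separately and count the runs in total.
    return sum(sum(1 for _ in groupby(row)) for row in image)

print("Germany:", count_runs(germany(ROWS, COLS)))  # Germany: 6
print("Romania:", count_runs(romania(ROWS, COLS)))  # Romania: 18
```

With one run per row instead of three, Germany’s flag needs a third as many color-and-count pairs in this row-major model.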
Problem 4
11101011110
Problem 5
11101011110
Problem 6
The problem with the encoding is that it’s ambiguous: two distinct DNA sequences can have the same encoding. Another encoding would be to use two bits for each nucleotide: 00 for A, 01 for C, 10 for G, and 11 for T. Because every nucleotide takes exactly two bits, a bit string can be decoded unambiguously by reading it two bits at a time.
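A minimal sketch of the two-bit fixed-length encoding, using the bit assignments given above. The dictionary and function names are hypothetical.

```python
ENCODE = {"A": "00", "C": "01", "G": "10", "T": "11"}
DECODE = {bits: base for base, bits in ENCODE.items()}

def encode(seq: str) -> str:
    # Replace each nucleotide with its fixed two-bit code.
    return "".join(ENCODE[base] for base in seq)

def decode(bits: str) -> str:
    if len(bits) % 2:
        raise ValueError("bit string length must be even")
    # Every nucleotide is exactly two bits, so read fixed-size chunks.
    return "".join(DECODE[bits[i:i + 2]] for i in range(0, len(bits), 2))

print(encode("ACGT"))      # 00011011
print(decode("00011011"))  # ACGT
```

Because no chunk boundary is ever in doubt, each bit string maps back to exactly one sequence. The trade-off is that this code never uses fewer than two bits per nucleotide.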