
Recall that in lectures we presented a Huffman tree constructed from the frequency of characters on Wikipedia:

{{/Labs/01/Images/fig3.svg}}

The corresponding Huffman coding for this tree is:

{{/Labs/01/Images/HuffmanBySymbol.svg}}

Using this coding, we can decode the sequence

`1100010100010100110100001100111001111011100111`

as:

11000 = C (remaining bits: 10100010100110100001100111001111011100111)

1010 = O (remaining bits: 0010100110100001100111001111011100111)

001010 = M (remaining bits: 0110100001100111001111011100111)

011010 = P (remaining bits: 0001100111001111011100111)

000 = [space] (remaining bits: 1100111001111011100111)

1100111 = 1 (remaining bits: 001111011100111)

00111101 = 0 (remaining bits: 1100111)

1100111 = 1 (no remaining bits, end of sequence)

This results in the string 'COMP 101'.
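The walk-through above can be sketched in Python. This is a minimal sketch rather than part of the lab materials: `PARTIAL_CODES` and `decode` are hypothetical names, and the table holds only the seven codes that appear in the worked example, not the full coding shown in the figure.

```python
# Partial code table (assumption: only the codes used in the worked example;
# the full table is in the figure and is not reproduced here).
PARTIAL_CODES = {
    "11000": "C",
    "1010": "O",
    "001010": "M",
    "011010": "P",
    "000": " ",
    "1100111": "1",
    "00111101": "0",
}


def decode(bits, codes):
    """Decode a bit string by matching one code at a time, left to right."""
    symbols = []
    current = ""
    for bit in bits:
        current += bit
        # As soon as the accumulated bits equal a known code, emit its symbol.
        if current in codes:
            symbols.append(codes[current])
            current = ""
    if current:
        # Bits were left over that do not match any code in the table.
        raise ValueError(f"leftover bits that match no code: {current}")
    return "".join(symbols)


decoded = decode("1100010100010100110100001100111001111011100111", PARTIAL_CODES)
print(decoded)
```

Emitting a symbol as soon as the accumulated bits match a code is safe because no code in a Huffman coding is a prefix of another code, so a match can never be the start of a longer code.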

### Exercise

Use this Huffman coding to decode the following bits:

`11010011001001001100110101001`

Encoding information with a Huffman coding works in the opposite direction: look up each symbol and emit its corresponding code. This is straightforward when the table describing the Huffman coding is sorted by symbol:

{{/Labs/01/Images/HuffmanByCharacter.svg}}

Now, we can quickly confirm that the sequence ‘QWERTY’ would be encoded as:

1000100001(Q) 100011(W) 111(E) 1011(R) 0101(T) 0010000(Y)

Concatenated, this is 1000100001100011111101101010010000 (34 bits).
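The lookup-and-emit procedure can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: `PARTIAL_TABLE` and `encode` are hypothetical names, and the table contains only the six codes quoted in the 'QWERTY' example, not the full coding from the figure.

```python
# Partial symbol-to-code table (assumption: only the six codes quoted in the
# 'QWERTY' example; the full table is in the figure).
PARTIAL_TABLE = {
    "Q": "1000100001",
    "W": "100011",
    "E": "111",
    "R": "1011",
    "T": "0101",
    "Y": "0010000",
}


def encode(text, table):
    """Look up each symbol in turn and concatenate the corresponding codes."""
    # A symbol missing from the table raises KeyError.
    return "".join(table[symbol] for symbol in text)


encoded = encode("QWERTY", PARTIAL_TABLE)
print(encoded, len(encoded))
```

Keying the table by symbol mirrors the table sorted by symbol: each lookup goes straight from the character to its code, with no searching through the codes.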