Also mentioned in lectures was the concept of a Huffman code, which is a method of lossless data compression (reducing the amount of data needed to store or transmit information). Recall that a Huffman code can be extracted from a Huffman tree – one basic algorithm for building a Huffman tree is:
1. Start with an empty set T
2. Add all the symbols to set T
3. While there are multiple "trees" in the set:
a. Remove the lowest probability tree from set T (break ties in terms of smaller tree size), and call this tree A
b. Remove the lowest probability tree from set T (break ties in terms of smaller tree size), and call this tree B
c. Make a new tree C by joining A and B, and set the probability of C to p(A) + p(B)
d. Add the new tree C to set T
4. Return the only tree in set T as the Huffman tree
(note: we will cover algorithm concepts is more detail in later classes, for now you only need to know the algorithm itself)
As with entropy, a worked example of building a Huffman tree is discussed in lectures. For example, one possible Huffman tree corresponding to the system shown earlier (with entropy of 2.171 bits) would be:
{{/Labs/01/Images/fig1.svg}}
The Huffman coding can then be extract from this tree by walking the path along each symbol:
Symbol | P(Symbol) | Code | |Code| |

A | 0.2 | 11 | 2 |

B | 0.1 | 100 | 3 |

C | 0.3 | 00 | 2 |

D | 0.3 | 01 | 2 |

E | 0.1 | 101 | 3 |

On average, the code length in this system is 0.2 * 2 + 0.1 * 3 + 0.3 * 2 + 0.3 * 2 + 0.1 * 3 = 2.2 bits. Recall that the entropy of this system is 2.171 bits, so the efficiency of this Huffman coding is 2.171/2.200 = 98.7%. Contrast this with a naïve encoding of our symbols, with three bits per symbol:
Symbol | P(Symbol) | Code | |Code| |

A | 0.2 | 000 | 2 |

B | 0.1 | 001 | 3 |

C | 0.3 | 010 | 2 |

D | 0.3 | 011 | 2 |

E | 0.1 | 100 | 3 |

The code length here is 3 bits, so the efficiency of this system is 2.171/3.000 = 72.4%. In other words, the Huffman coding is over 25% more efficient than a naïve encoding.
**Exercise**
Compute the Huffman tree of the second example from the previous section (the one with entropy of 1.875 bits). Use this tree to extract the Huffman coding, and then compute its efficiency. In doing so, what do you note about the lengths of the codes, both overall and relative to the probabilities of the symbols that they represent?
Draw your tree now to fill in the table below.
Symbol | P(Symbol) | Code | |Code| |

A | 0.625 | | |

B | 0.25 | | |

C | 0.5 | | |

D | 0.0625 | | |

E | 0.125 | | |