Information quantifies uncertainty. The Shannon information content of an outcome \(x\) is \(h(x) = -\log_{2} P(x)\): a rare event carries more information than a common one. The unit of information is the bit (binary digit). A code is a mapping from outcomes of an ensemble to binary strings \(\{0,1\}^+\). A symbol code encodes a single outcome at a time; a block code encodes a sequence of \(N\) outcomes. The set of length-\(N\) sequences contains a typical subset of cardinality roughly \(2^{NH(X)}\), where \(H(X)\) is the entropy of the ensemble. By assigning codewords only to the typical set, we can compress \(N\) outcomes into about \(NH(X)\) bits (the source coding theorem). A prefix code, in which no codeword is a prefix of another, can be decoded instantaneously, and an optimal symbol code can always be realized as a prefix code. The Kraft inequality \(\sum_{i} 2^{-l_{i}} \le 1\) is the condition on the codeword lengths \(l_i\) for a prefix code to exist.
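A minimal sketch of these quantities in Python; the four-outcome ensemble and the codeword lengths are illustrative choices, not from the source:

```python
import math

def information_content(p: float) -> float:
    """Shannon information content h(x) = -log2 P(x), in bits."""
    return -math.log2(p)

def entropy(probs: list[float]) -> float:
    """Entropy H(X) = sum over x of P(x) * h(x), in bits."""
    return sum(p * information_content(p) for p in probs if p > 0)

def kraft_satisfied(lengths: list[int]) -> bool:
    """Kraft inequality: a prefix code with these codeword
    lengths exists iff sum_i 2^(-l_i) <= 1."""
    return sum(2 ** -l for l in lengths) <= 1

# Example ensemble (an assumption for illustration): P = {1/2, 1/4, 1/8, 1/8}.
probs = [0.5, 0.25, 0.125, 0.125]
H = entropy(probs)
print(f"H(X) = {H} bits")                       # 1.75 bits

# Typical-set size for sequences of length N grows as ~2^(N*H(X)).
N = 100
print(f"typical set size for N={N}: ~2^{N * H:.0f}")

# Codeword lengths {1, 2, 3, 3} equal -log2 P(x) here,
# so the Kraft sum is exactly 1 and a prefix code exists.
print(kraft_satisfied([1, 2, 3, 3]))            # True
```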
The noisy-channel coding theorem relates the achievable rate to the block length \(N\). If \(N\) is large enough, the channel behaves like the noisy typewriter: an arbitrarily small block error probability can be achieved at any rate below the capacity \(C\) of the channel, which is the maximum achievable rate. Intuitively, if the rate is small enough, the typical output sets corresponding to distinct input codewords do not overlap, so the decoder can identify the transmitted codeword from the received output.
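As a concrete sketch, the capacity of a binary symmetric channel with flip probability \(f\) is \(C = 1 - H_2(f)\); the channel model and the numbers below are assumptions chosen for illustration, not from the source:

```python
import math

def binary_entropy(f: float) -> float:
    """Binary entropy H2(f) = -f*log2(f) - (1-f)*log2(1-f), in bits."""
    if f in (0.0, 1.0):
        return 0.0
    return -f * math.log2(f) - (1 - f) * math.log2(1 - f)

def bsc_capacity(f: float) -> float:
    """Capacity of a binary symmetric channel: C = 1 - H2(f)."""
    return 1 - binary_entropy(f)

f = 0.1
C = bsc_capacity(f)
print(f"C = {C:.3f} bits per channel use")      # ~0.531

# A rate-R block code sends K message bits in N channel uses, R = K/N.
# Example: K = 500, N = 1000 gives R = 0.5 < C, so by the theorem the
# block error probability can be made arbitrarily small as N grows.
```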