# Shannon Information & Entropy

We live in the so-called ‘information age’, but try asking someone ‘what is information?’, ‘how much is there?’, or ‘how much can I expect to receive?’.

This can be fruitless – unless you happen to be standing next to a student of Claude Shannon…

Claude Shannon’s theory of information and entropy provides a robust way of quantifying how much information is in a message and how much information we can expect to receive in the future from an information source.

How much information is there within a message?

Shannon’s measure of information relies on one clever yet simple intuition:

• Someone telling you what you already know is less informative than someone telling you something unexpected.

Mathematically this can be stated as follows:

• I = log ( 1 / p ) = -log ( p )

where I is the quantity of information and p is the probability of receiving a message containing that information. Taking the logarithm to base 2 gives I in bits, which is the unit used in the examples below.

To illustrate, suppose that you buy a lottery ticket with a one-in-a-million chance of winning. On being told that you have won, you would receive -log ( 1 / 1,000,000 ) ≈ 20 bits of information. On the other hand, if you were told, unsurprisingly, that you had lost, you would receive -log ( 999,999 / 1,000,000 ) ≈ 0 bits of information.
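The lottery figures can be checked with a few lines of Python. This is a minimal sketch that assumes base-2 logarithms (so the result is in bits); the function name `information_bits` is made up for illustration and is not part of any library.

```python
import math


def information_bits(p: float) -> float:
    """Information, in bits, of a message that had probability p of arriving."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be greater than 0 and at most 1")
    # I = -log2(p): the less likely the message, the more bits it carries.
    return -math.log2(p)


win = information_bits(1 / 1_000_000)         # told that you have won
lose = information_bits(999_999 / 1_000_000)  # told that you have lost

print(f"win:  {win:.2f} bits")
print(f"lose: {lose:.7f} bits")
```

Winning comes out at roughly 19.93 bits, while losing carries only about a millionth of a bit, which is why it rounds to 0.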

It also turns out that information has nice mathematical properties. For example, the information in independent messages is additive.

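If two independent messages with probabilities p1 and p2 are received, then the joint probability of receiving both messages is p1 * p2. Because the logarithm of a product is the sum of the logarithms, the information in the pair is the sum of the information in each message: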

• I1 + I2 = -log ( p1 ) - log ( p2 ) = -log ( p1 * p2 )
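Additivity is easy to confirm numerically. The sketch below uses two made-up probabilities (1/2 and 1/8, chosen only because they give whole numbers of bits) and again assumes base-2 logarithms; `information_bits` is an illustrative helper name.

```python
import math


def information_bits(p: float) -> float:
    """Information, in bits, of a message with probability p."""
    return -math.log2(p)


# Hypothetical probabilities of two independent messages.
p1, p2 = 0.5, 0.125

separate = information_bits(p1) + information_bits(p2)  # I1 + I2
joint = information_bits(p1 * p2)                       # -log2(p1 * p2)

print(f"I1 + I2 = {separate} bits, joint = {joint} bits")
```

Both quantities come out at 4 bits: 1 bit for the first message plus 3 bits for the second.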

How much information can we expect to receive in the future?

Entropy is a measure of the information one can expect to receive in the future from an information source. It is calculated by taking all possible events into account and averaging the information each one carries, weighted by how likely it is to occur.

Take, for example, the probability of you receiving an email in the next minute. There is a probability p that you will receive an email and a probability 1-p that you won’t. To calculate the entropy you need to work out the information of each outcome and then weight it by the probability of that outcome. This can be represented mathematically as follows:

• H = p*I1 + (1-p)*I2
• H = -p*log( p ) + -(1-p)*log( 1-p )

where H is the entropy, p is the probability of receiving an email in the next minute, and I1 and I2 are the quantities of information received by receiving / not receiving an email in the next minute.

To illustrate, suppose that there is a 1/10 chance of receiving an email in the next minute. Then I can calculate the average information that I expect to receive in the next minute (from the presence / absence of an email):
H = -1/10 * log ( 1/10 ) - 9/10 * log ( 9/10 ) ≈ 0.47 bits of information.
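The email calculation can be reproduced in code. This is a rough sketch, assuming base-2 logarithms; the function name `entropy_bits` is invented here, and it is written for any list of outcome probabilities, which goes slightly beyond the two-outcome formula above. Skipping zero-probability outcomes is a common convention, not something the formula above spells out.

```python
import math


def entropy_bits(probabilities: list[float]) -> float:
    """Probability-weighted average information of a source, in bits."""
    if not math.isclose(sum(probabilities), 1.0):
        raise ValueError("probabilities must sum to 1")
    # Each outcome contributes p * I = -p * log2(p).
    # Outcomes with p == 0 are skipped (assumed to contribute nothing).
    return sum(-p * math.log2(p) for p in probabilities if p > 0)


p_email = 1 / 10  # chance of an email arriving in the next minute
h = entropy_bits([p_email, 1 - p_email])

print(f"H = {h:.3f} bits")  # prints "H = 0.469 bits"
```

For comparison, `entropy_bits([0.5, 0.5])` gives exactly 1 bit: a source is least predictable, and so has the highest entropy, when its outcomes are equally likely.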

Do Shannon’s information and entropy measures tell us about the usefulness of information?

Shannon’s information and entropy measures do not say much about whether you’ll find a piece of information useful: they simply measure how unlikely a message is and how unpredictable an information source is. Shannon’s measures are very important for data encoding schemes and data compression, where data is treated as a set of symbols to be transmitted. However, they say nothing about a message’s semantic content. They are also interesting because they get you to think differently about what counts as a message, e.g. the presence / absence of an email gives you information about the state of the world.

In another post I’ll describe how to measure the economic value of the semantic content of information, thereby providing a measure of the usefulness of a piece of information.