Physics 380, 2011: Lecture 9
From Ilya Nemenman
Back to Physics 380, 2011: Information Processing in Biology.
In these lectures, we cover some background on information theory. A good physics-style introduction to this subject can be found in the upcoming book by Bialek (Bialek, 2010). A very nice, and probably still the best, introduction to information theory as a theory of communication is (Shannon and Weaver, 1949). A standard and very good textbook on information theory is (Cover and Thomas, 2006).
Warmup questions
- Does noise in signal transduction pathways affect information transmission?
- We would like to characterize how much information is transmitted by a cellular signaling pathway, say the NF-κB pathway depicted on the right (Cheong et al., 2011), or in E. coli transcription (Guet et al., 2002; Ziv et al., 2007), as shown on the left. Which characteristics of the system should we measure in order to quantify this? Specifically, do we need:
- $\langle r \rangle$, $\langle r | s \rangle$ only?
- $\langle r \rangle$, $\langle r | s \rangle$, and $\sigma_r^2$, $\sigma_{r|s}^2$ only?
- P(r | s) for all s only?
- P(r | s) for all s and P(s), that is, the entire P(r,s)?
Main lecture
- Setting up the problem: How do we measure information transmitted by a biological signaling system?
- Shannon's axioms and the derivation of entropy: if a variable x is observed from a distribution P(x), then the amount of information we gain from this observation must obey the following properties:
- If the cardinality of the distribution grows and the distribution is uniform, then the measure of information grows as well.
- The measure of information must be a continuous function of the distribution P(x).
- The measure of information is additive. That is, for a fine graining of x into ξ, we should have $S[\Xi] = S[X] + \sum_x P(x)\, S[\Xi | x]$.
Up to a multiplicative constant, the measure of information is then $S[X] = -\sum_x P(x) \log P(x)$, which is also called the Boltzmann-Shannon entropy. We fix the constant by defining the entropy of a uniform binary distribution to be 1. Then $S[X] = -\sum_x P(x) \log_2 P(x)$. The entropy is then measured in bits.
- Meaning of entropy: Entropy of 1 bit means that we have gained enough information to answer one yes or no (binary) question about the variable x.
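As a quick numerical illustration of the definition and of the "one bit answers one yes/no question" reading, here is a minimal Python sketch. The function name `entropy_bits` and the example distributions are choices made for this illustration, not part of the lecture.

```python
import math

def entropy_bits(p):
    """Shannon entropy, in bits, of a discrete distribution given as a list of probabilities."""
    if abs(sum(p) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Outcomes with P(x) = 0 are skipped, since p*log(p) -> 0 as p -> 0.
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

fair_coin = entropy_bits([0.5, 0.5])      # uniform binary distribution
certain = entropy_bits([1.0, 0.0])        # deterministic variable
eight_way = entropy_bits([1 / 8] * 8)     # uniform over 8 outcomes
biased_coin = entropy_bits([0.9, 0.1])    # non-uniform binary distribution

print(fair_coin, certain, eight_way, biased_coin)
```

The fair coin gives exactly one bit, the uniform 8-outcome variable gives three bits (three binary questions identify the outcome), and the biased coin gives less than one bit.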
- Properties of entropy (non-negative, bounded, concave):
- $0 \le S[X] \le \log_2 k$, where k is the cardinality of the distribution. Moreover, the first inequality becomes an equality iff the variable is deterministic (that is, one event has a probability of 1), and the second inequality is an equality iff the distribution is uniform.
- Entropy is a concave function of the distribution.
- Entropies of independent variables add.
- Entropy is an extensive quantity: for a joint distribution $P(x_1, \dots, x_N)$, we can define an entropy rate $\lim_{N \to \infty} S[X_1, \dots, X_N] / N$.
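The properties listed above can be checked numerically on small examples. The sketch below is only an illustration: the distributions `p_x` and `p_y`, the helper `joint_iid`, and the choice of 5 i.i.d. copies are assumptions of this example. It checks the bounds 0 ≤ S ≤ log2 k, the additivity of entropies of independent variables, and that the entropy per variable of i.i.d. copies equals the single-variable entropy.

```python
import math

def entropy_bits(p):
    """Shannon entropy in bits of a list of probabilities."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

p_x = [0.5, 0.25, 0.25]   # hypothetical distribution of x
p_y = [0.7, 0.3]          # hypothetical distribution of y, independent of x

# Joint distribution of two independent variables: P(x, y) = P(x) P(y).
p_xy = [px * py for px in p_x for py in p_y]

s_x = entropy_bits(p_x)
s_y = entropy_bits(p_y)
s_xy = entropy_bits(p_xy)

# Bounds: 0 <= S[X] <= log2(k), with k the number of outcomes.
within_bounds = 0.0 <= s_x <= math.log2(len(p_x))

def joint_iid(p, n):
    """Joint distribution of n independent copies of a variable with distribution p."""
    joint = [1.0]
    for _ in range(n):
        joint = [q * pi for q in joint for pi in p]
    return joint

# Entropy per variable for 5 i.i.d. copies; for i.i.d. variables this equals S[X].
rate = entropy_bits(joint_iid(p_x, 5)) / 5

print(s_x, s_y, s_xy, within_bounds, rate)
```

For correlated variables the joint entropy is smaller than the sum of the individual entropies, so the entropy rate is in general below the single-variable entropy.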
- Differential entropy: a continuous variable x can be discretized with a step Δx, and then the entropy is $S[X] = -\sum_i P(x_i)\Delta x \log_2\left(P(x_i)\Delta x\right) \approx -\int dx\, P(x) \log_2 P(x) - \log_2 \Delta x$. This formally diverges at fine discretization: we need infinitely many bits to fully specify a continuous variable. The integral in the above expression is called the differential entropy, and whenever we write S[X] for continuous variables, we mean the differential entropy.
- Entropy of a normal distribution with variance $\sigma^2$ is $S = \frac{1}{2}\log_2 \sigma^2 + \mathrm{const}$, where $\mathrm{const} = \frac{1}{2}\log_2(2\pi e)$.
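The relation between the discretized entropy and the differential entropy can be illustrated numerically for a Gaussian. In this sketch the bin probability is approximated by the density at the bin center times Δx, the range is truncated at ±10σ, and the values of `sigma` and `dx` are arbitrary; all of these are assumptions of the example. It compares the discretized entropy with the differential entropy ½ log2(2πeσ²) minus log2 Δx.

```python
import math

def discretized_gaussian_entropy(sigma, dx, half_width=10.0):
    """Entropy in bits of a zero-mean Gaussian binned with step dx.

    Each bin probability is approximated as density(bin center) * dx,
    and bins cover the range [-half_width*sigma, +half_width*sigma].
    """
    n_bins = int(round(2 * half_width * sigma / dx))
    total = 0.0
    for i in range(n_bins):
        x = -half_width * sigma + (i + 0.5) * dx
        density = math.exp(-x * x / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)
        mass = density * dx
        if mass > 0:
            total -= mass * math.log2(mass)
    return total

def gaussian_differential_entropy(sigma):
    """Differential entropy in bits of a Gaussian with standard deviation sigma."""
    return 0.5 * math.log2(2 * math.pi * math.e * sigma ** 2)

sigma = 2.0
results = []
for dx in (0.5, 0.1, 0.02):
    discrete = discretized_gaussian_entropy(sigma, dx)
    predicted = gaussian_differential_entropy(sigma) - math.log2(dx)
    results.append((dx, discrete, predicted))
    print(dx, discrete, predicted)
```

As Δx shrinks, the discretized entropy keeps growing like −log2 Δx, while the differential entropy part stays fixed.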
- Multivariate entropy is defined with summation/integration of log-probability over multiple variables, cf. entropy rate above.
- Conditional entropy is defined as the averaged log-probability of a conditional distribution, $S[X|Y] = -\sum_{x,y} P(x,y) \log_2 P(x|y)$.
- Mutual information: what if we want to know about a variable x, but are instead measuring a variable y? How much do we then learn about x? This is given by the difference of entropies of x before and after the measurement: $I[X;Y] = S[X] - S[X|Y]$.
- Meaning of mutual information: mutual information of 1 bit between two variables means that by querying one of them as much as possible, we can get one bit of information about the other.
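A small numerical example may help here. The sketch below computes the mutual information from a joint table using the equivalent form I = Σ P(x,y) log2[P(x,y) / (P(x)P(y))]. The binary channel `channel` with a 10% error rate and the two signal distributions are hypothetical values chosen for illustration.

```python
import math

def mutual_information_bits(joint):
    """Mutual information in bits from a joint table joint[i][j] = P(x_i, y_j)."""
    p_x = [sum(row) for row in joint]
    p_y = [sum(col) for col in zip(*joint)]
    info = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:
                info += p * math.log2(p / (p_x[i] * p_y[j]))
    return info

def joint_from_channel(p_s, p_r_given_s):
    """Build P(s, r) = P(s) P(r | s) from a signal distribution and a response table."""
    return [[ps * prs for prs in row] for ps, row in zip(p_s, p_r_given_s)]

# Hypothetical noisy binary channel: the response matches the signal 90% of the time.
channel = [[0.9, 0.1], [0.1, 0.9]]

i_uniform = mutual_information_bits(joint_from_channel([0.5, 0.5], channel))
i_skewed = mutual_information_bits(joint_from_channel([0.95, 0.05], channel))
i_noiseless = mutual_information_bits(joint_from_channel([0.5, 0.5], [[1.0, 0.0], [0.0, 1.0]]))
i_independent = mutual_information_bits(joint_from_channel([0.5, 0.5], [[0.5, 0.5], [0.5, 0.5]]))

print(i_uniform, i_skewed, i_noiseless, i_independent)
```

The same P(r | s) yields different information for different P(s), the noiseless channel with a uniform binary signal transmits exactly one bit, and a response independent of the signal transmits nothing.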
- Properties of mutual information
- Limits: $0 \le I[X;Y] \le \min(S[X], S[Y])$. Note that the first inequality becomes an equality iff the two variables are completely statistically independent.
- Mutual information is well-defined for continuous variables.
- Reparameterization invariance: for any invertible $\xi = \xi(x)$, $\eta = \eta(y)$, the following is true: I[X;Y] = I[Ξ;Η].
- Data processing inequality: For P(x,y,z) = P(x)P(y | x)P(z | y), $I[X;Z] \le \min(I[X;Y], I[Y;Z])$. That is, information cannot be created in a transformation of a variable, whether deterministic or probabilistic.
- Information rate: Information is also an extensive quantity, so that it makes sense to define an information rate $\lim_{N \to \infty} I[X_1, \dots, X_N; Y_1, \dots, Y_N] / N$.
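The data processing inequality from the list above can be checked on a toy Markov chain x → y → z. In this sketch the two binary channels (10% and 20% error rates) and the uniform input are hypothetical values chosen for illustration; the code builds the pairwise joint tables from P(x)P(y | x)P(z | y) and compares the three mutual informations.

```python
import math

def mutual_information_bits(joint):
    """Mutual information in bits from a joint table joint[i][j] = P(a_i, b_j)."""
    p_a = [sum(row) for row in joint]
    p_b = [sum(col) for col in zip(*joint)]
    return sum(
        p * math.log2(p / (p_a[i] * p_b[j]))
        for i, row in enumerate(joint)
        for j, p in enumerate(row)
        if p > 0
    )

# Hypothetical Markov chain x -> y -> z built from two noisy binary channels.
p_x = [0.5, 0.5]
p_y_given_x = [[0.9, 0.1], [0.1, 0.9]]
p_z_given_y = [[0.8, 0.2], [0.2, 0.8]]

# P(x, y) = P(x) P(y | x)
p_xy = [[p_x[x] * p_y_given_x[x][y] for y in range(2)] for x in range(2)]
# P(y) is the marginal of P(x, y); P(y, z) = P(y) P(z | y)
p_y = [sum(p_xy[x][y] for x in range(2)) for y in range(2)]
p_yz = [[p_y[y] * p_z_given_y[y][z] for z in range(2)] for y in range(2)]
# P(x, z) = sum over y of P(x, y) P(z | y)
p_xz = [
    [sum(p_xy[x][y] * p_z_given_y[y][z] for y in range(2)) for z in range(2)]
    for x in range(2)
]

i_xy = mutual_information_bits(p_xy)
i_yz = mutual_information_bits(p_yz)
i_xz = mutual_information_bits(p_xz)

print(i_xy, i_yz, i_xz)
```

The information between the ends of the chain is smaller than the information across either single step, as the inequality requires.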
- Limits:
- Mutual information of a bivariate normal with a correlation coefficient ρ is $I = -\frac{1}{2}\log_2(1 - \rho^2)$.
- For Gaussian variables y = g(x + η), where x is the signal, y is the response, and η is the noise related to the input, $I[X;Y] = \frac{1}{2}\log_2\left(1 + \frac{\sigma_x^2}{\sigma_\eta^2}\right)$ (see the homework problem).
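The two Gaussian results can be cross-checked against each other. The sketch below assumes that x and η are independent zero-mean Gaussians, so that the correlation coefficient between x and y = g(x + η) is ρ = sqrt(σ_x² / (σ_x² + σ_η²)), independent of the gain g. It then compares the bivariate-normal formula −½ log2(1 − ρ²) with the signal-to-noise form ½ log2(1 + σ_x²/σ_η²). The function names and variance values are choices made for this illustration.

```python
import math

def gaussian_channel_information_bits(var_signal, var_noise):
    """I[X;Y] in bits for y = g(x + eta), with independent Gaussian x and eta."""
    return 0.5 * math.log2(1 + var_signal / var_noise)

def bivariate_normal_information_bits(rho):
    """Mutual information in bits of a bivariate normal with correlation coefficient rho."""
    return -0.5 * math.log2(1 - rho ** 2)

def signal_response_correlation(var_signal, var_noise):
    """Correlation of x and y = g(x + eta).

    cov(x, y) = g * var_signal and var(y) = g**2 * (var_signal + var_noise),
    so the gain g cancels (for g > 0).
    """
    return math.sqrt(var_signal / (var_signal + var_noise))

var_signal, var_noise = 3.0, 1.0   # hypothetical variances, signal-to-noise ratio of 3
rho = signal_response_correlation(var_signal, var_noise)
i_snr = gaussian_channel_information_bits(var_signal, var_noise)
i_rho = bivariate_normal_information_bits(rho)

print(rho, i_snr, i_rho)
```

With a signal-to-noise ratio of 3, both expressions give one bit, and the gain g drops out, consistent with reparameterization invariance.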