Entropy Estimation
From Ilya Nemenman
This is one of the Projects of Ilya Nemenman's lab. Email Ilya for details.
Background
Information theoretic (IT) methods are now routinely used for natural sciences data analysis. For example, in neuroscience one uses IT tools to understand mechanisms by which neurons encode sensory information, while in molecular systems biology the same techniques help to uncover statistical associations among activities of various chemical species in the cell and, ultimately, to reconstruct the network of its biochemical interactions.
The strength of IT methods comes from the very limited assumptions that go into them. It is always very appealing, for example, to quantify all statistical dependencies among variables, but not just linear ones; or, similarly, to measure the structure of the neural code without assuming spike-spike independence. However, the widespread use of the IT methods has been hindered by a conceptual difficulty: it is very difficult to estimate entropy (and hence other IT quantities, which are linear combinations of various entropies) from experimental datasets of realistic sizes. It turns out that the bias, rather that the variance, of the estimates is usually the problem. For example, for estimation of entropies of discrete data, most common methods result in a bias of
, where S is the (unknown) entropy and N is the data set size (see this review for details). In the recent years, many methods (some described here) have been developed to solve the problem, but the majority only works when
(for discrete variables), or requires strong assumptions about the smoothness of the underlying probability density (for continuous variables).
You may want to visit my collection of references on entropy estimation.
NSB entropy estimator for discrete data
As I argued in this review, many discrete probability distributions should allow for (nearly) unbiased entropy estimation for datasets that are square-root smaller than those required by common methods, namely when
. To achieve this tremendous imporvement, we developed the NSB entropy estimator, which reliably estimates entropies in the regime
for distributions with not too long rank ordered tails. It is possible to diagnose when the method fails and, even in these cases, it performs not worse than most traditional estimators when
. As described in our original publication, Nemenman et al., 2002, the method builds on many techniques, such as Bayesian model selection and coincidence counting. Alternatively, one can view the method as imposing a uniform (least knowledge) prior on the entropy itself. I believe this to be a generally correct strategy: when estimating a certain quantity, the least bias will be achieved when assuming an a priori uniform distribution over this quantity, even if it results in very non-uniform assumptions about other variables, such as the distribution itself.
If you are interested in the NSB method, the following link will be pf use:
- SourceForge-hosted project that implements the NSB method in C++ and Octave/MatLab, http://nsb-entropy.sf.net .
- Original paper that introduced the method, Nemenman et al., 2002.
- Nemenman-IEEEIT elucidates the coincidence-counting nature of the estimator and analyzes a lot of its technical properties.
- The following papers have applies the estimator for different natural datasets
Differences of IT quantities
In many cases, in particular when IT quantities are used as a measure of statistical interactions, the precise values of these quantities are less important than knowing their rankings (i.e., the sign of their differences). In the case of mutual information, we showed in Margolin et al., 2006a that the latter task is much easier than the former, and it can be completed with almost any estimator with very little prior tuning.
The following summarizes the relevant work:
- The original work noting the fact, Margolin et al., 2006a.
- Papers that used the technique:
