The concept of Concentration of Measure (CoM) quantifies the tendency of an $n$-dimensional probability distribution to “lump”, or concentrate, around a lower-dimensional submanifold. The phenomenon is that this tendency is especially strong in high dimensions. Viewed another way, this is about the interaction between probability and distance in high dimensions.
This is (hopefully) the start of a series of posts that will explain this concept. I would like to review a few concepts and definitions first though.
We denote by $(X, d, \mu)$ a metric measure space with metric $d$ and probability measure $\mu$. A probability measure is in effect a normalised measure, such that the measure of the whole space is 1. In such a space, the $\varepsilon$-extension of a set $A$ is denoted by $A_\varepsilon$ and is defined as $A_\varepsilon = \{x \in X : d(x, A) \le \varepsilon\}$. Of course, the $\varepsilon$-extension of $A$ includes all of $A$. What it does beyond that is fatten $A$ by a width of $\varepsilon$. Among other things, this means that $A_\varepsilon \setminus A$ is a shell enveloping $A$, and that for small values of $\varepsilon$, its volume (measure, if you want to be pedantic) is approximately the surface area (surface measure) of the set multiplied by $\varepsilon$.
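As a toy illustration of the definition (the helper names are mine, and the “set” here is just a finite point set on the real line with the usual distance), the $\varepsilon$-extension membership test can be sketched as:

```python
def dist_to_set(x, A):
    """Distance d(x, A) from a point x to a finite set A on the real line."""
    return min(abs(x - a) for a in A)

def in_extension(x, A, eps):
    """Membership test for the eps-extension A_eps = {x : d(x, A) <= eps}."""
    return dist_to_set(x, A) <= eps

A = [0.0, 1.0]
print(in_extension(0.05, A, 0.1))  # True: within 0.1 of the point 0.0
print(in_extension(0.5, A, 0.1))   # False: 0.5 away from both points
```

Note that `in_extension(a, A, eps)` is true for every `a` in `A`, matching the remark that the extension always contains the original set.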
Now we can define the concentration rate of a space. The basic idea is: of all the sets that span half of the total volume of the space, find the one whose $\varepsilon$-extension spans the smallest volume. We then declare the concentration rate of the space to be the volume outside that extension. In math terms: $\alpha(\varepsilon) = \sup\{\, 1 - \mu(A_\varepsilon) : A \subseteq X,\ \mu(A) \ge 1/2 \,\}$.
A few points can be made here. First, one would expect the set whose extension spans the smallest volume to be the one with the smallest surface area, by virtue of the relation between them. And in fact, one would be right, and this is the subject of the isoperimetric inequalities. Second, the concentration rate is the maximum possible volume outside an extended half-volume set, so using this property amounts to utilizing certain inequalities to bound, say, the volume of error. This is useful, for example, in the analysis of randomized algorithms. Third, the current definition yields one-sided inequalities, but it is very easy to see that two-sided inequalities can be derived which bound the volume outside a shell of width $\varepsilon$ around the surface of any set of half volume.
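To make this concrete, here is a small Monte Carlo sketch (numpy; the function name is mine). On the unit sphere with geodesic distance, the isoperimetric extremiser among half-volume sets is a hemisphere, say $A = \{x : x_1 \le 0\}$, whose $\varepsilon$-extension is $\{x : x_1 \le \sin\varepsilon\}$; the estimated mass outside the extension collapses as the dimension grows:

```python
import numpy as np

rng = np.random.default_rng(0)

def mass_outside_extension(n, eps, samples=100_000):
    """Monte Carlo estimate of 1 - mu(A_eps) on the unit sphere S^(n-1),
    where A is the half-volume hemisphere {x : x_1 <= 0} and, in geodesic
    distance, A_eps = {x : x_1 <= sin(eps)}."""
    x = rng.standard_normal((samples, n))
    x /= np.linalg.norm(x, axis=1, keepdims=True)  # uniform points on the sphere
    return float(np.mean(x[:, 0] > np.sin(eps)))

rates = {n: mass_outside_extension(n, eps=0.3) for n in (3, 30, 300)}
print(rates)  # the mass outside the extension shrinks rapidly with n
```

Normalising a standard Gaussian vector is a standard trick for sampling uniformly on the sphere, since the Gaussian is rotation-invariant.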
The more the measure is concentrated around a half-volume set, the smaller the concentration rate is. What we really want, and what CoM is about, is a very quick decay of the concentration rate as $\varepsilon$ increases. This will be useful for us to decide, in a learning algorithm, which value of $\varepsilon$ to cut off learning at with small probable error. But that is for another day.
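For concreteness, a standard instance of this quick decay (Lévy's inequality for the sphere; the exact constant varies across references) says that on the unit sphere $S^{n}$ with geodesic distance and normalised surface measure,

```latex
\alpha(S^{n}, \varepsilon) \;\le\; C \, e^{-(n-1)\varepsilon^{2}/2}
```

for a small universal constant $C$: Gaussian in $\varepsilon$ and exponential in the dimension, which is exactly the kind of decay we are after.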
And yes, it would be good if Posterous supported LaTeX...
“Everybody firmly believes in it, because the mathematicians imagine it is a fact of observation, and observers that it is a theory of mathematics.” So goes the famous remark, quoted by Poincaré, about the normal law of errors.
So, which is it? To apportion the blame, I suggest that the fact of observation be that “errors” are independent and identically distributed with finite variances. Observers may as well vote for Lyapunov’s set of conditions. I am sure that mathematicians are more than happy to shoulder the responsibility for the resulting Central Limit Theorem.
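As a quick numerical sanity check (a sketch; the choice of Uniform(−1, 1) for the iid finite-variance “errors” is mine), standardized sums of such errors already look very normal at modest $n$:

```python
import numpy as np

rng = np.random.default_rng(1)

def standardized_sums(n, reps=200_000):
    """Sums of n iid Uniform(-1, 1) 'errors', standardized to unit variance
    (each term has variance 1/3, so the sum has standard deviation sqrt(n/3))."""
    s = rng.uniform(-1.0, 1.0, size=(reps, n)).sum(axis=1)
    return s / np.sqrt(n / 3.0)

z = standardized_sums(30)
coverage = float(np.mean(np.abs(z) < 1.96))
print(coverage)  # close to 0.95, the standard-normal mass in (-1.96, 1.96)
```

This is exactly the Central Limit Theorem at work: the “fact of observation” (iid errors with finite variance) plus the mathematicians' theorem yields the observed bell curve.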