In my last post I discovered a surprising result about the distribution of |x| for x \in [0,1]^d, namely that as the dimension of the cube grew, the distribution actually seemed to get thinner, concentrating into almost a spike at \sqrt{d/3}. I am going to attempt to generalise that result a little here, and draw some (mainly conceptual and heuristic) conclusions to inform future intuition about such problems.

So we now look at a set of independent random variables X_1, X_2, \ldots, X_d which all have the same distribution (say, that of X, which we assume to be nonconstant), and consider the distribution of S = X_1 + X_2 + \cdots + X_d.

Following a similar method to our investigation in the previous post, a straightforward calculation using the linearity of expectation and the mutual independence of the variables gives:

  • \mu = \mathbb{E}(S) = d\mathbb{E}(X);
  • \sigma^2 = \text{Var}(S) = d\mathbb{E}(X^2) - d\mathbb{E}(X)^2 = d\text{Var}(X).

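These two identities are easy to check empirically. Here is a quick Monte Carlo sketch (my own illustration, not from the post), taking X uniform on [0,1] so that \mathbb{E}(X) = 1/2 and \text{Var}(X) = 1/12, and hence \mathbb{E}(S) = d/2 and \text{Var}(S) = d/12:

```python
import random

# Sanity check: for X ~ Uniform[0,1], E(X) = 1/2 and Var(X) = 1/12,
# so a sum S of d independent copies should have E(S) = d/2, Var(S) = d/12.
def sum_stats(d, trials=50_000, seed=0):
    rng = random.Random(seed)
    sums = [sum(rng.random() for _ in range(d)) for _ in range(trials)]
    mean = sum(sums) / trials
    var = sum((s - mean) ** 2 for s in sums) / trials
    return mean, var

mean, var = sum_stats(d=50)
print(mean, 50 / 2)   # empirical E(S) vs d * E(X) = 25
print(var, 50 / 12)   # empirical Var(S) vs d * Var(X) ~ 4.17
```
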
Therefore, as d increases, the standard deviation grows like \sqrt{d}, much more slowly than the mean (which, provided \mathbb{E}(X) \neq 0, grows linearly in d). In particular, we can apply Chebyshev’s inequality and read off two quick results.

  1. For any k, \epsilon>0, \mathbb{P}(|S-\mu| \geq kd^{1/2+\epsilon}) \leq \frac{\text{Var}(X)}{k^2d^{2\epsilon}} \rightarrow 0 as d \rightarrow \infty.
  2. For k=\lambda\text{Var}^{1/2}(X) with \lambda >1, \mathbb{P}(|S-\mu| \geq kd^{1/2}) \leq \frac{1}{\lambda^2}.

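The second bound is easy to see in simulation. The sketch below (again my own illustration, with X uniform on [0,1] so that \sigma^2 = d/12) compares the observed tail mass beyond \lambda\sigma with the Chebyshev bound 1/\lambda^2; as usual, Chebyshev is far from tight, but it is the right shape.

```python
import random

# Empirical check of result 2, with X ~ Uniform[0,1]: mu = d/2 and
# sigma = sqrt(d/12).  Chebyshev promises
# P(|S - mu| >= lambda * sigma) <= 1 / lambda^2 for any lambda > 1.
def tail_fraction(d, lam, trials=50_000, seed=1):
    rng = random.Random(seed)
    mu = d / 2                 # d * E(X)
    sigma = (d / 12) ** 0.5    # sqrt(d * Var(X))
    hits = sum(
        1 for _ in range(trials)
        if abs(sum(rng.random() for _ in range(d)) - mu) >= lam * sigma
    )
    return hits / trials

lam = 2.0
frac = tail_fraction(d=30, lam=lam)
print(frac, "<=", 1 / lam ** 2)  # observed tail mass vs the Chebyshev bound
```
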
These can be used to give quantitative estimates of the very quick drop in density of the distribution away from the mean. We shall conclude by briefly revisiting the result of the past post, and then drawing some general intuitive pointers.

In the previous problem, we took X to be the square of a variable distributed uniformly on [0,1], and the result we require (with \alpha = 1/2) is essentially (ignoring some lower-order terms) equivalent to finding c>0 such that for all sufficiently large d

\mathbb{P}(|S-\mu| \leq 3^{-1/2}d^{1/2}) > c.

But this is immediate from the second result above and the fact that 3^{-1/2}>\text{Var}^{1/2}(U^2) = \sqrt{4/45}, where U is uniformly distributed on [0,1].
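
The constants are quick to verify, and so is the concentration itself. In the sketch below (my own check of the post's setup): for U uniform on [0,1] and X = U^2 we have \mathbb{E}(X) = 1/3 and \mathbb{E}(X^2) = \mathbb{E}(U^4) = 1/5, so \text{Var}(X) = 1/5 - 1/9 = 4/45, and indeed 3^{-1/2} \approx 0.577 > \sqrt{4/45} \approx 0.298. The simulation then confirms that |x| sits within O(1) of \sqrt{d/3} almost all the time.

```python
import random

# Constants: sqrt(1/3) should exceed sqrt(Var(U^2)) = sqrt(4/45).
print((1 / 3) ** 0.5, (4 / 45) ** 0.5)

# Empirical concentration: |x| for x uniform in [0,1]^d hugs sqrt(d/3).
def norm_sample(d, rng):
    return sum(rng.random() ** 2 for _ in range(d)) ** 0.5

rng = random.Random(2)
d, trials = 500, 5_000
close = sum(
    1 for _ in range(trials)
    if abs(norm_sample(d, rng) - (d / 3) ** 0.5) < 1.0
)
print(close / trials)  # nearly all the mass lies within 1 of sqrt(d/3)
```
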

Reflection on this general phenomenon

The intuition I can use to make sense of this result derives from the heuristic “if you take more samples, you are more likely to get an average”. Although in our case we are just taking a sum, this is essentially equivalent to taking an average, and it is perhaps unsurprising that the distribution in the limit is therefore actually very dense around the expected value and almost zero everywhere else. It might actually be sensible to turn this intuition into mathematics.

So set T = S/d; multiplying the results above by suitable constants gives us that

  • \mathbb{E}(T) = \mathbb{E}(X)
  • \text{Var}(T) = d^{-1}\text{Var}(X)

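A short sketch makes the 1/d decay of \text{Var}(T) visible (once more with X uniform on [0,1], my choice, so \text{Var}(X) = 1/12):

```python
import random

# The empirical variance of the average T = S/d shrinks like 1/d:
# for X ~ Uniform[0,1], we expect Var(T) = Var(X)/d = 1/(12 d).
def avg_var(d, trials=10_000, seed=3):
    rng = random.Random(seed)
    avgs = [sum(rng.random() for _ in range(d)) / d for _ in range(trials)]
    m = sum(avgs) / trials
    return sum((a - m) ** 2 for a in avgs) / trials

for d in (10, 40, 160):
    print(d, avg_var(d), 1 / (12 * d))  # Var(T) tracks Var(X)/d
```

Quadrupling d cuts the printed variance by roughly a factor of four, which is exactly the "more samples, better average" heuristic made quantitative.
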
In other words, the variance is inversely proportional to the number of samples we take before averaging, which is roughly what we would intuitively expect. So this result, however surprising it seemed at first, is really just the primary school idea that increasing sample size increases the accuracy of a result, with everything scaled back up by a factor of d.