This has been the (to me very surprising) result of my enquiry into the following beautiful problem, posed to the author by Prof. Ben Green. I am recording my findings primarily for my own reference, and would urge any reader not familiar with the problem to at least try to attack it themselves before reading on.

**Question:** If points (if you like, vectors) $x = (x_1, \ldots, x_n)$ are selected randomly and independently with uniform probability from $[0,1]^n$, show there exists a constant $C$ such that $\|x\|$ lies in some interval of length $C$ with probability at least $\frac{1}{2}$, for all $n$.

My initial attempts to solve this problem were not particularly successful, since trying to relate the individual coordinates $x_i$ and the norm $\|x\|$ as they *both* varied over their distributions seemed tricky. My initial heuristic was along the lines of: *"The random variable $\|x\|$ ranges over the interval $[0, \sqrt{n}]$, so as $n$ grows the distribution must spread itself across a greater distance, hence be less at any given point under a suitable continuous transform to keep the distribution 'appropriate' (with mean at the correct, also growing, place). Thus while the distribution does peak near the middle, it does not seem likely that a given length neighbourhood can have $\Omega(1)$ probability as $n \to \infty$."*

With this heuristic in mind, the only method of approach which seemed likely to work was to take some kind of sum or integral over various disjoint intervals on the distribution (a number of them growing with $n$) to get a spread-out enough estimate to make up for the lack of an $\Omega(1)$ probability on any given interval. However, actually going ahead and doing something like this proved to be very tricky, especially for one such as myself with little formal probabilistic training.

So I decided to leave the above heuristic aside and just see how far from $\Omega(1)$ the probability looked around a natural-looking modal point. I achieved the following surprising result.

**Theorem 1:** For a fixed constant $C > 0$,

$$\mathbb{P}\left(\left|\,\|x\| - \sqrt{n/3}\,\right| \le C\right) \ge 1 - \frac{4}{15C^2}.$$

In particular, the above question is a trivial consequence of the case $C = 1$, which places $\|x\|$ in an interval of length $2$ with probability at least $\frac{11}{15} > \frac{1}{2}$.

Owing to the nature of the Euclidean metric, it seemed natural to set up variables as follows:

Let $x = (x_1, \ldots, x_n) \in [0,1]^n$ be selected randomly and uniformly, and set $X = \|x\|^2 = \sum_{i=1}^n x_i^2$. A simple application of the linearity of expectation showed that $\mathbb{E}[X] = n\,\mathbb{E}[x_1^2] = n \int_0^1 t^2 \, dt = \frac{n}{3}$, and as $X$ seemed a much easier object to work with than $\|x\|$ I decided to take a look at $\mathbb{P}\left(\left|\,\|x\| - \sqrt{n/3}\,\right| \le C\right)$ (the natural centre being $\sqrt{\mathbb{E}[X]} = \sqrt{n/3}$) for some constant $C$ to be played with later.
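As a quick sanity check (my own addition; the function name and parameters are hypothetical, not from the post), a Monte Carlo estimate agrees with $\mathbb{E}[X] = n/3$:

```python
import random

# Sketch: estimate E[X] = E[||x||^2] for x uniform in [0,1]^n
# and compare against the claimed value n/3.
def mean_squared_norm(n, trials=20000, seed=0):
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += sum(rng.random() ** 2 for _ in range(n))
    return total / trials

n = 30
estimate = mean_squared_norm(n)
print(estimate, n / 3)  # the two values should be close
```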

This is equal to

$$\mathbb{P}\left(\sqrt{\tfrac{n}{3}} - C \le \sqrt{X} \le \sqrt{\tfrac{n}{3}} + C\right),$$

which is then clearly at least $\mathbb{P}\left(\left|X - \frac{n}{3}\right| \le C\sqrt{n/3}\right)$, since $\left|\sqrt{X} - \sqrt{n/3}\right| = \frac{|X - n/3|}{\sqrt{X} + \sqrt{n/3}} \le \frac{|X - n/3|}{\sqrt{n/3}}$.

Which looks like the kind of quantity one can bound below by applying Chebyshev's inequality to its complement, gaining a lower bound of around $1 - \frac{\mathrm{Var}(X)}{C^2 n/3}$. The initial heuristic relied on the 'near uniformity' assumption that this variance would be of magnitude just a little less than $n^2$ (which would make such a bound useless), but here I would be very pleasantly surprised.

But a direct computation (the $x_i^2$ are independent, so $\mathrm{Var}(X) = \sum_{i=1}^n \mathrm{Var}(x_i^2) = n\left(\mathbb{E}[x_1^4] - \mathbb{E}[x_1^2]^2\right) = n\left(\frac{1}{5} - \frac{1}{9}\right)$) gives the much lower than expected $\mathrm{Var}(X) = \frac{4n}{45}$. In other words, the heuristic based on assuming $X$ is roughly smoothly distributed was fallacious. Instead, the distribution of $X$ has a 'spike' at its expected value and is almost zero away from it.

So $\mathbb{P}\left(\left|X - \frac{n}{3}\right| \ge \lambda\right) \le \frac{\mathrm{Var}(X)}{\lambda^2} = \frac{4n}{45\lambda^2}$.
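At this point a numerical check is reassuring (again my own sketch, with made-up parameter choices): the empirical variance of $X$ matches $\frac{4n}{45}$, and the empirical tail probability sits comfortably under the Chebyshev bound.

```python
import random

# Sketch: empirically check Var(X) = 4n/45 and the Chebyshev tail
# bound P(|X - n/3| >= lambda) <= 4n/(45 lambda^2).
def simulate_X(n, trials=20000, seed=1):
    rng = random.Random(seed)
    return [sum(rng.random() ** 2 for _ in range(n)) for _ in range(trials)]

n = 30
xs = simulate_X(n)
mean = sum(xs) / len(xs)
var = sum((x - mean) ** 2 for x in xs) / len(xs)
print(var, 4 * n / 45)  # empirical variance vs. 4n/45

lam = 3.0
tail = sum(1 for x in xs if abs(x - n / 3) >= lam) / len(xs)
print(tail, 4 * n / (45 * lam * lam))  # empirical tail vs. Chebyshev bound
```

The empirical tail is in fact far smaller than the Chebyshev bound, which is only an upper estimate.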

Writing $\lambda = C\sqrt{n/3}$ we can apply Chebyshev's inequality to gain the lower bound $\mathbb{P}\left(\left|\,\|x\| - \sqrt{n/3}\,\right| \le C\right) \ge 1 - \frac{4n}{45\lambda^2} = 1 - \frac{4}{15C^2}$, which is precisely Theorem 1.

So my heuristic has been torn to pieces, and, instead, as $n$ gets large, the 'amount of space' in the $n$-dimensional unit cube whose distance from the origin is within a certain (small) distance of $\sqrt{n/3}$ actually must be rather a lot. For example, take $C = 1$. Then $1 - \frac{4}{15} = \frac{11}{15}$, so for every $n$ the probability is at least $\frac{11}{15}$, so more than two thirds of points in a unit cube of high dimension are (with maximum error $1$) roughly distance $\sqrt{n/3}$ from the origin (or, by symmetry, any other vertex).
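This concentration is easy to observe directly; the following sketch (mine, with hypothetical names and parameters) estimates the fraction of random points in $[0,1]^n$ whose distance from the origin is within $C = 1$ of $\sqrt{n/3}$.

```python
import math
import random

# Sketch: fraction of uniform random points in [0,1]^n whose distance
# from the origin lies within C of sqrt(n/3). Theorem 1 guarantees at
# least 11/15 for C = 1; in high dimension the observed fraction is
# much higher still.
def concentrated_fraction(n, C=1.0, trials=2000, seed=2):
    rng = random.Random(seed)
    target = math.sqrt(n / 3)
    hits = 0
    for _ in range(trials):
        r = math.sqrt(sum(rng.random() ** 2 for _ in range(n)))
        if abs(r - target) <= C:
            hits += 1
    return hits / trials

frac = concentrated_fraction(1000)
print(frac)  # far above the guaranteed 11/15
```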

I found this result initially very surprising (and sent Prof. Green a semi-apologetic email in which I acknowledged that I could just be having an evening of insanity and the next morning discover my estimates were based on false assumptions). The basic reason the smooth expected distribution ended up being a spike was that in the variance calculation the quadratic terms cancelled out, leaving only smaller order terms. In another post in the next few days I shall try to investigate whether this phenomenon (where the quadratic parts of $\mathbb{E}[X^2]$ and $(\mathbb{E}[X])^2$ are equal) can be found in any interesting families of general situations, as this now seems likely and should be investigated to inform my intuition in future.

## 1 comment


February 15, 2011 at 6:15 am

Sean: I have some ideas about the Walsh-Hadamard transform and the Normal distribution that I discovered here: http://code.google.com/p/lemontree/downloads/list

The Hadamard transform uses multiplication by 1 or -1, but that ends up the same as doing only addition and subtraction, hence the central limit theorem applies to any random data so transformed.
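The commenter's point is easy to illustrate with a minimal sketch (implementation and parameters mine, not from the linked download): an unnormalised fast Walsh-Hadamard transform only ever adds and subtracts entries, so each non-constant output coefficient of a length-$N$ transform of i.i.d. uniform data is a balanced $\pm$ sum of $N$ terms, which by the central limit theorem is roughly Gaussian with standard deviation $\sqrt{N/12}$.

```python
import math
import random

# In-place fast Walsh-Hadamard transform (unnormalised, length a power of 2):
# each butterfly replaces a pair (u, v) by (u + v, u - v).
def walsh_hadamard(a):
    a = list(a)
    h = 1
    while h < len(a):
        for i in range(0, len(a), 2 * h):
            for j in range(i, i + h):
                a[j], a[j + h] = a[j] + a[j + h], a[j] - a[j + h]
        h *= 2
    return a

rng = random.Random(3)
data = [rng.random() for _ in range(1024)]
out = walsh_hadamard(data)

# Skip the first coefficient (the plain sum, with non-zero mean); the rest
# are balanced +/- sums of uniforms, so roughly N(0, 1024/12) by the CLT.
rest = out[1:]
sd = math.sqrt(sum(v * v for v in rest) / len(rest))
print(sd, math.sqrt(1024 / 12))  # empirical spread vs. CLT prediction
```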