
While revising IB Metric and Topological Spaces, I came across the following cool alternative to the topologist's sine curve as an example of a space that is connected but not pathwise connected.

The idea is to build a really, really long line (in fact we shall build a ray, but constructing a line can be achieved by an obvious gluing job). It will be connected for the same reasons as the real line, but not pathwise connected simply because any path will be too short. To realise our goal, we pick the smallest uncountable ordinal \omega_1 and take X=(\omega_1 \times [0,1)) \cup \{\infty\} with the obvious order topology, generated by interpreting this as placing uncountably many intervals end-to-end (and declaring \infty > (\alpha,x) for all (\alpha,x) \in \omega_1 \times [0,1)).

Consequently we get a really, really long ray, which is far longer than any continuous image of a closed interval of real numbers, so it isn't pathwise connected: any path from (0,0) to \infty would have to cross uncountably many of the intervals, and the inverse images of these intervals are disjoint nonempty open subsets of the path's domain, so each contains its own rational number, which is impossible since there are only countably many rationals.

One of the recent questions from an example sheet in Metric and Topological Spaces gave rise to the following more interesting general question:

What is the greatest number N_d(\theta) of unit vectors in \mathbb{R}^d such that the angle between any two of them is at least \theta?

I don’t want to ruin this question by posting a solution, or even a sketch solution on this blog, but there is some very interesting behaviour revealed once you look into it. Anyone who wishes to investigate fully independently should now stop reading. I am going to post a few interesting remarks that break down the kind of behaviour the problem exhibits (without giving any proofs).

Firstly, it is almost obvious that if \theta \geq \frac{\pi}{2} then N_d(\theta) \leq 2d. Unfortunately, the 'obvious' proof of this is a proof by 'worst case scenario' (an orthonormal set and its negation is clearly the best example), which doesn't really hold water. However, the bound does turn out to be true, and my process of finding a proof gave me an interesting perspective on what 'really spread out' vectors look like.
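Just to make that example concrete, here is a tiny numerical check in Python (the dimension d = 5 is an arbitrary choice of mine, and of course this verifies only the construction, not the bound): the 2d vectors \pm e_1, \ldots, \pm e_d are unit vectors and every pair of them has non-positive inner product, i.e. makes an angle of at least \frac{\pi}{2}.

    import numpy as np

    d = 5
    vs = np.vstack([np.eye(d), -np.eye(d)])    # the 2d unit vectors +/- e_1, ..., +/- e_d
    dots = vs @ vs.T                           # all pairwise inner products
    off_diag = dots[~np.eye(2 * d, dtype=bool)]
    print(np.all(off_diag <= 0))               # True: cos(angle) <= 0, so every pairwise angle is >= pi/2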

Now we get a bizarre phenomenon: acute angles seem to behave frighteningly differently to right and obtuse angles. Any sort of inductive bound one tries to find seems to fail, yet coming up with reasonable explicit constructions is tricky. In fact it turns out that a sudden jump is made from there being linearly many unit vectors to there being exponentially many unit vectors. Getting the upper bound for this is quite simple. A lower bound is harder.

Cauchy's functional equation is a great one for pulling out of the bag when giving talks at IMO training camps on functional equations, partly because it is so simple and pops up reasonably often, but also because it teaches a valuable lesson to schoolchildren about functions: they aren't always nice. However, we always wave our hands and say that there exist horrible solutions, and circumstantial evidence for this is that it seems we can't get any more information out of the equation than its values on the rationals (or a translation thereof). We are always a little wary of actually doing the construction, partly because it would be useless for the IMO, and partly because it in some sense isn't really a construction at all: it requires Zorn's lemma, and always will. That said, I think IMO trainees have a right to know what these solutions look like if they're curious, so I thought I'd try putting an elementary explanation up here.

Firstly, recall that Cauchy’s equation is a property of functions from the reals to the reals, telling us that

f(x+y) = f(x) + f(y) for all x,y \in \mathbb{R}.

An obvious guess is f(x)=kx, which clearly works. We cannot deduce that every solution has this form, but we can deduce, by a fairly routine induction argument, that f(qx)=qf(x) for all rational q and all real x.
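For completeness, here is the skeleton of that induction, using nothing beyond the equation itself:

    f(2x) = f(x+x) = 2f(x), \text{ and by induction } f(nx) = nf(x) \text{ for all } n \in \mathbb{N};
    f(0) = f(0+0) = 2f(0) \Rightarrow f(0) = 0, \text{ and } 0 = f(x+(-x)) = f(x) + f(-x) \Rightarrow f(-x) = -f(x);
    f(x) = f\left(q \cdot \tfrac{x}{q}\right) = q\, f\!\left(\tfrac{x}{q}\right), \text{ so } f\!\left(\tfrac{p}{q}x\right) = p\, f\!\left(\tfrac{x}{q}\right) = \tfrac{p}{q} f(x).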

These two properties taken together tell us that for every x,y \in \mathbb{R} and a,b \in \mathbb{Q} we have f(ax+by) = af(x) + bf(y). If we consider the real numbers as a vector space over the rationals, this says precisely that f is a linear map. Conversely, any such linear map satisfies both of these conditions and so satisfies the Cauchy equation.

Just to elaborate on what we mean by considering the real numbers to be vectors over the rationals, the standard vectors from A-level maths are usually vectors over the reals. We could represent a 3-dimensional vector by v=xi + yj+zk where x,y,z are real numbers, and i,j,k are fixed vectors which should be somehow considered as abstract objects with the properties that:

  • The sum of any two vectors is a vector (and we can subtract vectors, there is a zero vector, and so on).
  • If we multiply a vector by a scalar (in this case a real number) we get another vector.

These two operations interact with one another in the obvious ways. However, there's no reason vectors have to represent arrows in space: anything that obeys the above rules, regardless of its interpretation, can be considered a 'vector'. To see that the reals can be considered as vectors over the rationals, consider the set of numbers of the form a+b\sqrt 2 + c\sqrt 3 with a,b,c \in \mathbb{Q}. It is not tricky to prove that this behaves, for our purposes, just like the 3D real vector space from A-level maths. Sure, we can multiply real numbers together in a way we can't multiply geometrical vectors, but this is a property we will ignore for now (except for multiplication by rational numbers, which plays the role of scalar multiplication).

Bearing in mind the above example, it therefore isn't a giant leap of intuition to consider the entire set of real numbers as a set of vectors over the rational numbers. There is one important potential area of confusion still unresolved though. In all the above examples, we were able to expand any vector out and write it uniquely as a (finite) linear combination of other vectors (e.g. v = x_1 i + x_2 j + x_3 k). A collection with this property, like \{ i,j,k\}, is called a basis for the vector space. If you spend a while trying to build a basis for the reals over the rationals, it soon becomes clear that it's going to have to be pretty big. In fact, you don't have to think too hard before you find an infinitely large collection of real numbers none of which can be written in terms of the others (the logarithms of the primes form one such collection, by unique factorisation), so the basis is going to have to be infinite. It isn't at all obvious that a basis even exists.

However, fortunately, one can prove using Zorn’s Lemma that a basis always exists (in fact, even better, we can extend any set of vectors that aren’t expressible in terms of each other to form a basis). A statement and sketch proof of Zorn can be found elsewhere on my blog, as can the proof that every vector space has a basis. A basis for the reals over the rationals is often called a Hamel basis.

Now we have a basis, it is easy to construct linear maps. In general, a linear map can be thought of as a 'square matrix' (which in the case of the reals over the rationals would have to be infinite). A more useful definition is the one given above: a map satisfying f(ax+by) = af(x)+bf(y) for rational a,b. Given our basis \mathcal{B} = \{ b_i \}_{i\in I} (where I is some indexing set and could be larger than \mathbb{N}), it is clear that any permutation of the basis extends uniquely to a linear map (and will hence satisfy Cauchy's equation).

To make this concrete, let’s consider a simple example. Suppose, for simplicity, we only care for now about numbers of the form x+y\sqrt{2} (where x,y are rationals). We define f(x+y\sqrt{2}) = x\sqrt{2}+y (a linear map obtained by a permutation of the basis).

On the one hand it is simple to check that if z_i=x_i+y_i\sqrt{2} for i=1,2 then f(z_1+z_2) = (x_1+x_2)\sqrt{2} + (y_1+y_2) = f(z_1)+f(z_2), i.e. this function does satisfy the Cauchy equation. But on the other hand consider z=14142135-10000000\sqrt{2}, which is a very small number: f(z) = 14142135 \sqrt{2} - 10000000, which is very large (and it is clear that by taking better and better rational approximations to \sqrt{2} one can get arbitrarily large values of f arbitrarily close to zero). So f is a decidedly bizarre function, but it does satisfy the Cauchy equation (and it is easy to see that, given a Hamel basis containing 1 and \sqrt{2}, it can be extended to the entire set of real numbers by just fixing every other basis element).
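If you want to play with this, here is a short sketch in Python (the continued-fraction recurrence for \sqrt{2} is my own choice of how to generate good rational approximations; everything else is exactly the map above):

    from math import sqrt

    # An element x + y*sqrt(2) (x, y rational) is stored exactly as the pair (x, y).
    # The map of the post sends x + y*sqrt(2) to y + x*sqrt(2), i.e. it swaps the two
    # coordinates, so additivity f(z1 + z2) = f(z1) + f(z2) is clear componentwise.
    def f(x, y):
        return (y, x)

    def value(x, y):                      # numerical value, for display only
        return x + y * sqrt(2)

    # Wild behaviour near 0: take the continued-fraction convergents p/q of sqrt(2)
    # and look at z = p - q*sqrt(2), which is tiny, while f(z) = p*sqrt(2) - q is huge.
    p, q = 1, 1
    for _ in range(10):
        p, q = p + 2 * q, p + q           # next convergent: 3/2, 7/5, 17/12, ...
        z = (p, -q)                       # z = p - q*sqrt(2)
        print(f"z = {value(*z): .3e}   f(z) = {value(*f(*z)): .3e}")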

In projective geometry we know that 'well-behaved' curves such as conics can be viewed in two ways: either as a locus of points or as an envelope of lines (the curve being determined by being tangent to every line in the envelope). One of the little gems of the superficially dry new course on 'Variational Principles' is a method for switching between these two views algebraically (provided the situation is suitably nice).

Firstly, we suppose that y=f(x) describes a locus of points. How might we parameterise the tangents in a way that doesn't involve points at all? Tangents are lines of the form y=mx-c, so in some sense there are two co-ordinates (m,c) that describe the tangents uniquely. If we are to obtain an equation in a similar form c=f^*(m) we will clearly require that f is strictly convex (else f^* will fail to be well-defined), so assume this is the case.

We then seek the c such that y=mx-c is the unique tangent to the curve y=f(x) with gradient m. Equating the two values for y we get that c = mx_0 - f(x_0) where x_0 is the unique real number such that f'(x_0) = m. Nothing mind-blowing so far.

But this becomes rather more interesting once we remember the supporting-line inequality for convex functions (a convex function always lies above any of its tangent lines, with equality at the point of tangency). Writing this down at our point x_0 we get that f(x) \geq mx - mx_0 + f(x_0). Rearranging, we get

mx - f(x) \leq mx_0 - f(x_0) for all x.

So writing this in a slightly different way, we get the cute formula that f^*(m) = \max_x (mx-f(x)).
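As a quick worked example, take f(x) = x^2 (which is strictly convex):

    f^*(m) = \max_x (mx - x^2) = m \cdot \tfrac{m}{2} - \tfrac{m^2}{4} = \tfrac{m^2}{4}, \text{ the maximum being attained at } x_0 = \tfrac{m}{2}, \text{ where } f'(x_0) = m.

Applying the transform a second time, \max_m (mx - \tfrac{m^2}{4}) is attained at m = 2x and equals x^2, recovering f.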

And for an encore, it’s now obvious (by projective duality) that if we do this transformation twice we get back to where we started (the curve stays fixed, and our function goes from describing the locus to describing the envelope to describing the locus again), so we get the nice relation that f^{**} = f for all strictly convex f.
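Here is a small numerical sanity check of both statements (the choice f(x) = x^2 and the grids are mine, and this is only a discretised approximation of the \max):

    import numpy as np

    f = lambda x: x ** 2                        # a strictly convex test function

    xs = np.linspace(-10, 10, 4001)             # grid over which we take the max
    ms = np.linspace(-10, 10, 4001)

    def transform(values, grid_in, grid_out):
        # grid approximation of g(s) = max_t (s*t - values(t))
        return np.array([np.max(s * grid_in - values) for s in grid_out])

    f_star = transform(f(xs), xs, ms)           # should be close to m^2 / 4
    f_star_star = transform(f_star, ms, xs)     # should recover x^2

    print(np.max(np.abs(f_star - ms ** 2 / 4)))                  # tiny
    inner = np.abs(xs) <= 3                                      # stay away from grid edges
    print(np.max(np.abs(f_star_star[inner] - f(xs[inner]))))     # tiny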

When first asked whether V = \mathbb{R}^{\mathbb{N}} has a countable basis as a vector space over \mathbb{R}, almost everyone will think for a second: “of course! (1,0,0,…),(0,1,0,…),… will do it”. Then most people realise this is total rubbish because (1,1,1,…) isn't a finite linear combination of these. This then usually raises the suspicion that any basis must be uncountable.

A quick way to check this is to consider the set of sequences of the form a_n = e^{\alpha \log n} = n^{\alpha} (for n \geq 1), where \alpha is any positive real. This is an uncountable linearly independent set: in any finite non-trivial linear combination of such sequences, the term with the largest exponent appearing with a non-zero coefficient dominates as n \to \infty, so the combination cannot be the zero sequence.
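As a toy illustration (not a proof, and the three exponents are an arbitrary choice of mine): if three of these sequences were linearly dependent, then in particular their first three terms would satisfy the same dependency, so a non-zero 3 \times 3 determinant already rules that out.

    import numpy as np

    alphas = [1.0, 1.5, 2.0]                       # three sample exponents
    ns = [1.0, 2.0, 3.0]                           # first three values of n
    M = np.array([[n ** a for a in alphas] for n in ns])
    print(np.linalg.det(M))                        # non-zero, so these three sequences are independent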

However, actually constructing a basis seems to be impossible (though the Zorn argument I posted on this blog about 6 months ago shows it has to exist). If anyone knows how to do it, I’d love to hear it.

EDIT: It has been pointed out to me that it's not entirely obvious that finding an uncountable linearly independent set constitutes a proof that there is no countable basis. I have assumed the dimension theorem: with every vector space V there is associated a fixed cardinal \text{dim}V which gives the cardinality of any basis. Equivalently, any two bases for V have the same cardinality, and any linearly independent set has cardinality at most \text{dim}V (since it can be extended to a basis), so any set of greater cardinality must have linear dependencies. To prove the dimension theorem, take two bases of different sizes and write out each element of the smaller one in terms of the larger one. Because only finitely many elements of the larger basis are used for each element of the smaller one, the union of all the elements used in this process can't possibly be the entire large basis; but any unused element of the large basis is a linear combination of elements of the small basis, and hence of the used elements, contradicting the linear independence of the large basis.

Imagine you are in one of those computer games where there is a wall of spikes coming from the left of the screen, so you are forced to proceed indefinitely, always in the same direction, until the level ends. Suppose the level never ends: when can you escape? Zorn's lemma asks, and answers, this question. Perhaps a little surprisingly, it reveals that, however massively infinite and complicated your set is, the only possible means of escape is to dart down a hole that looks like the natural numbers, one in which, wherever the wall of spikes is, there is always some amount of space to the right. If there are no such holes, then however stupidly huge the space in which you have to run, you eventually have no chance.

Sometimes Zorn is viewed as being a rather scary complicated theorem because it is equivalent to the axiom of choice. This is totally false: the axiom of choice (that arbitrary cartesian products of nonempty sets are nonempty) is a really nonscary fact, so being equivalent to it is no cause for alarm. What seems to make Zorn scary is that it gives really specific information about the structure of monstrously large sets, and there's no way to prove it without first working out what it means to have a pursuit lasting many, many infinities and then ending.

Let's set up a little notation. Suppose we have a set X on which there is some partial order relation < (so for any pair of distinct elements a,b \in X it may be that a<b, that b<a, or that neither holds). Since we have to keep running to the right, we will be interested in subsets of X that are well ordered: of the form x_1<x_2<\ldots (where the subscripts could be members of any set, possibly much larger than the integers). We (perhaps slightly abusing standard usage) call such a set a chain.

We say that a chain is an escape tunnel if there is no m \in X such that for every element x in the chain, m \geq x. If there is an escape tunnel, we can trivially escape down it forever. The only interesting case is therefore when X doesn’t have any escape tunnels. It could perhaps be viewed as being a ‘closed’ set in some sense, but bear in mind that it could be absolutely huge, far bigger than anything we could ever visualise. Can we exploit this sheer hugeness to escape, without needing to cheat and run down an escape tunnel?

Well, let's give it a shot. Suppose we find ourselves at some point x. If we are going to keep running, all we need is some x' such that x'>x. We then just keep going forever, and because our set is really big we can probably keep going. We try going for a while, and locally it looks like we are darting down an escape tunnel; but because there aren't any escape tunnels, there is some m ahead of us and our planned route, which the wall of spikes will eventually reach (possibly infinitely far in the future), so we must get there too. So we keep going 'forever and a day' in this way.

Ah, but wait a sec. Can a set X of any size support this kind of running forever? After all, a set does have a fixed cardinality. If we run for ages and ages and ages (starting at some point), we might be able to build an arbitrarily large well-ordered subset of X. But the size of such a subset is bounded: it can never be bigger than X.

And yet if we consider the set of well-orderings of subsets of X, and consider the set S of order-isomorphism classes represented, this is itself a well-ordered set, but it cannot possibly be order-isomorphic to any well-ordering of a subset of X (if it were, it would be order-isomorphic to a proper initial segment of itself, which is impossible). In other words, S is a well-ordering bigger than any possible on X.

Yet the set of well-ordering types up to and including S is itself well-ordered (1<2<3 is in some obvious sense smaller than 1<2<3<4<5), and if we keep running forever in our space X, our route is at some point bound to be long enough to be a copy of S. But this is impossible, because S is just too big. So eventually we get stuck at some point from which we cannot escape, and die, even if X is really massive.

More formally, X has at least one maximal element (‘a point from which we cannot escape’).

There are bits of rigour missing from the argument above, and without developing the duller parts of the theory of ordinals it's impossible to fill in the gaps, but the intention of this post was to highlight that Zorn is ultimately a result about really long pursuits in a large space, and that the proof requires not much more, idea-wise, than you need in the finite case (once you've had the idea of blocking off the escape tunnels).


I'll begin by mentioning that this is an idea completely stolen from a talk by Prof. Imre Leader on games of pursuit, most recently delivered at the 'Bath' IMO Squad Nursery camp in Oxford last week.

Suppose we have a 'lawn' on which there is, somewhere, a student. Being in Cambridge, the student is not supposed to be on the lawn, and a group of porters is charged with apprehending the student. The head porter may deploy some number of porters (whose maximum speed is the same as that of the student) to points on the boundary of the lawn and issue them with a strategy to bring the student in (the student must escape in finite time by reaching a point on the boundary at which there is not a waiting porter). Of course, the porters may never step on the grass. The question is, for various lawns, how many porters do you need? The two lawns we will consider are the square-shaped lawn in 'Great Court' and the circular lawn in 'New Court'. As always, I'd advocate the reader sitting down, drawing pictures of the lawns and thinking about (even proving?) how many porters they think they need before reading on.

For the square lawn, it is clear that 4 porters can do the job, one per side, by each tracking the component of the student's position parallel to the side they are guarding. A boring check shows that 3 cannot do it.
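Here is a minimal simulation sketch of that tracking strategy, assuming a unit-square lawn and equal maximum speeds (the random student, the step size and all names are my own illustrative choices): each porter only ever needs to change one coordinate, and that change is never longer than the student's move, so equal speed suffices, and when the student touches a side the porter guarding it is already there.

    import random

    STEP, EPS = 0.01, 1e-12                      # max distance moved per time step

    def clamp(v, lo, hi):
        return max(lo, min(hi, v))

    x, y = 0.5, 0.5                              # student starts in the middle of [0,1]^2
    px, py = x, y                                # bottom/top porters sit at (px, 0), (px, 1);
                                                 # left/right porters at (0, py), (1, py)
    for _ in range(100000):
        # the student makes an arbitrary (here random) move of length at most STEP
        dx, dy = random.uniform(-1, 1), random.uniform(-1, 1)
        norm = (dx * dx + dy * dy) ** 0.5 or 1.0
        dx, dy = dx / norm * STEP, dy / norm * STEP
        x, y = clamp(x + dx, 0.0, 1.0), clamp(y + dy, 0.0, 1.0)

        # each porter's required move is at most the length of the student's move
        assert abs(x - px) <= STEP + EPS and abs(y - py) <= STEP + EPS
        px, py = x, y

        if x in (0.0, 1.0) or y in (0.0, 1.0):   # student touches a side
            print("student reached the boundary and met a porter")
            break
    else:
        print("student never reached the boundary")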

For the circular lawn, it seems difficult to prove that any number of porters can do it, even though it seems that a trillion porters should probably manage. However, the head porter has a cunning plan. There are infinitely many points on the boundary of the lawn, so if he dispatches infinitely many porters he can just guard every point, and so he sends out porters P_1,P_2,\ldots.

The student, however, realises that he can deal with the porters one at a time. It suffices to get to some position really close to the boundary from which he is certain to escape P_1 whatever happens next, provided he leaves quickly; then, spending time at most \frac{1}{1000}, to reach a position from which he will also always escape P_2 provided he escapes by the same deadline he had for P_1; and so on, spending time at most 1000^{-(n-1)} to reach a position from which he will always escape P_n under the same deadline. Since 1000^{-1}+1000^{-2}+\cdots converges to something tiny, all of these adjustments together take only a small finite amount of time, so the deadline set when dealing with P_1 is never in danger. It's then clear that, provided his path tends to the boundary, when he eventually escapes none of the porters will be waiting for him.

Having had this idea, the problem becomes very simple, owing to the arc length of the circle between two points being greater than the length of the corresponding chord (so if the student is sufficiently near the edge, he can always get well away from any given porter by heading along some straight line). Hence any infinite set of porters labelled with the natural numbers fails to catch the student.

What has gone wrong with the head porter's plan? Clearly if we just permanently put a porter on every point of the boundary they are bound to catch the student. The only problem that could possibly have arisen is that he labelled his infinite set of porters with the natural numbers, and the above escape strategy shows that such a set of porters wasn't actually big enough to implement the obvious strategy and guard every point of the circle. In other words, the head porter has, to his misfortune, discovered that the set of points on the boundary of the circle is uncountable.