The Cantor-Bernstein Theorem is the intuitively ‘obvious’ result that, given two sets A,B, if we can inject A into B and B into A (say via injections f : A \rightarrow B and g : B \rightarrow A) then there exists a bijection between A and B. In the notation of cardinalities this states that |A|\leq|B|\leq|A| \Rightarrow |A|=|B|.

Given how intuitive this result is, the author was surprised to find a really quick and intuitive proof impossible to come by. The temptation is to pursue some sort of ‘tweaking’ method to make f into a bijection by altering it a little, but the only obvious ways to do this involve the axiom of choice, which seems far more powerful than this result should demand.

The author has come across the following two choice-free proofs, the first from Wikipedia, and the second from a Part IA lecture course. Both start with the idea of constructing a bijection by partitioning the domain A into two sets A_f, A_g such that the function h: A \rightarrow B given by

h(a) = \begin{cases} f(a) & \text{if } a \in A_f \\ g^{-1}(a) & \text{if } a \in A_g\\ \end{cases}

is a bijection.

So this idea reduces the problem to finding a partition A_f,A_g such that:

  1. A_g \subseteq \text{Im}(g) (to ensure h is well defined)
  2. f(A_f) and g^{-1}(A_g) partition B, i.e. they are disjoint and their union is all of B (to ensure h is a bijection).
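To spell out why these two conditions suffice (a routine check, recorded here for completeness): condition 1 makes g^{-1}(a) meaningful for every a \in A_g, so h is well defined, and f, g^{-1} are injective on A_f, A_g respectively, so h can only fail to be injective if the two images overlap. Since

h(A) = f(A_f) \cup g^{-1}(A_g),

condition 2 says precisely that h is injective (the two images are disjoint) and surjective (their union is B).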

Solution 1 (Wikipedia, modified): Let C_0 = A\backslash g(B) and define recursively for i \geq 1, C_i = g(f(C_{i-1})). We now set A_f = \bigcup_{i=0}^\infty C_i, a countable union.

This ensures the first condition above since A_f \supseteq C_0 = A\backslash\text{Im}(g), so A_g = A \backslash A_f \subseteq \text{Im}(g). It ensures the second condition because for any b \in B we can either get to it by applying f \circ g to some element of f(C_0) a finite number of times (in which case manifestly b \in f(A_f), as f(C_i) = (f \circ g)^i(f(C_0))), or we cannot (in which case g(b) \in A_g, so b \in g^{-1}(A_g): indeed g(b) lies in no C_i, since it is in \text{Im}(g) and so not in C_0, and if it were in C_i = g(f(C_{i-1})) for some i \geq 1 then injectivity of g would force b \in f(C_{i-1})). The same observation shows f(A_f) and g^{-1}(A_g) are disjoint: if b \in f(C_i) then g(b) \in C_{i+1} \subseteq A_f, so b \notin g^{-1}(A_g).
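As a concrete illustration (my own example, not from either source): take A = [0,1], B = [0,1), f(x) = x/2 and g(x) = x. Then

C_0 = A \backslash g(B) = \{1\}, \qquad C_i = g(f(C_{i-1})) = \{2^{-i}\}, \qquad A_f = \{1, 1/2, 1/4, \dots\},

so h halves each point of A_f and fixes every other point of [0,1], which is the familiar bijection [0,1] \rightarrow [0,1).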

Solution 2 (P.T. Johnstone lecturing Numbers and Sets in Part IA of the Cambridge Maths Tripos):

Consider the function \Phi defined on subsets of A by \Phi(X) = A\backslash g(B \backslash f(X)), which we have basically designed so that if \Phi(X) = X we can take A_f = X and it will satisfy the required two properties (as spelled out below). We are now simply required to find such an X.
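Spelling that out (again a routine check): if \Phi(X) = X then taking complements in A gives

A_g = A \backslash X = g(B \backslash f(X)) \subseteq \text{Im}(g),

which is the first condition, and since g is injective, g^{-1}(A_g) = B \backslash f(X), so f(A_f) = f(X) and g^{-1}(A_g) are disjoint with union B, which is the second.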

The nontrivial idea is then to let S = \{ A' \subseteq A | A' \subseteq \Phi(A')\} and consider X = \bigcup_{A' \in S} A'.

First, X \subseteq \Phi(X): since \Phi is inclusion preserving (checked below), every A' \in S satisfies A' \subseteq \Phi(A') \subseteq \Phi(X), and taking the union over S gives X \subseteq \Phi(X). Applying \Phi again, \Phi(X) \subseteq \Phi(\Phi(X)), which by definition of S means \Phi(X) \in S, which implies \Phi(X) \subseteq X. So \Phi(X) = X and we can take A_f = X to give a valid bijection.
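For completeness, the inclusion-preserving property used twice above follows because each of the two complementations reverses inclusions:

X \subseteq X' \Rightarrow f(X) \subseteq f(X') \Rightarrow B \backslash f(X) \supseteq B \backslash f(X') \Rightarrow g(B \backslash f(X)) \supseteq g(B \backslash f(X')) \Rightarrow \Phi(X) \subseteq \Phi(X').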

Reflection on both proofs

The idea used in the Wikipedia proof is that g^{-1}: g(B) \rightarrow B is a bijection, so the only thing to fix is A \backslash g(B) and any inconveniences arising from this ‘taking up space’ once we map it into B. So the set A_f we eventually reach consists of the things not in g(B), together with the images under g of things which we must throw out of B as a result of taking f as our bijection on A_f. Obviously, the more things we throw out of B, the larger A_f becomes, and so the more elements of B are disturbed, and so on. The proof then works on the basis that this process of ‘throwing out’ is a countable one, so taking the union of a countable sequence of sets will suffice. Though this apparently works, I am not completely comfortable with it as a really intuitive proof of why the theorem is true. The picture of the two injections interacting, whereby the domain of f erodes the codomain of g until at the end of all time the functions have agreed which elements each is to biject, is quite memorable and intuitive; but it is not necessarily obvious why the functions must come to an agreement at the end of all time.

I found the second, apparently quite different, proof very intriguing. The preliminary (natural, but initially odd-looking) idea is setting up this function \Phi which takes a subset X of A and follows it along f to a subset of B, whose complement it then follows back along g to a subset Y of A (which it complements again). It is then clear that if X and Y partition A then X = A_f, Y = A_g is immediately a required partition. So a solution to \Phi(X) = X will solve the problem.
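In the [0,1] and [0,1) illustration above (again my own example), \Phi can be computed explicitly: writing X/2 for \{x/2 : x \in X\}, we have f(X) = X/2 \subseteq [0,1/2] and g is the identity, so

\Phi(X) = [0,1] \backslash \left([0,1) \backslash (X/2)\right) = \{1\} \cup (X/2),

and the set A_f = \{1, 1/2, 1/4, \dots\} found earlier is the smallest solution of the resulting fixed-point equation X = \{1\} \cup (X/2).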

In other words, we need to find X such that both \Phi(X) \subseteq X and X \subseteq \Phi(X). We now try almost exactly the same line of thought as in the Wikipedia proof. Taking X=\emptyset we can trivially achieve the second relation, and since \Phi is inclusion preserving, \Phi^k(\emptyset) \subseteq \Phi^{k+1}(\emptyset) for any k\in \mathbb{N}. The Wikipedia proof then essentially finished by taking A_f = \bigcup_{k=0}^\infty \Phi^k(\emptyset) (one can check that \Phi^{k+1}(\emptyset) = C_0 \cup \dots \cup C_k, so this is the same set as before), and here \Phi(A_f) = A_f, as checked below (in fact, this argument presented using the \Phi notation is, to me, much more convincing).
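The check that \Phi(A_f) = A_f (my own, not in either source): images under f commute with unions, and, g being injective, images under g commute with intersections, so

\Phi\left(\bigcup_k \Phi^k(\emptyset)\right) = A \backslash g\left(\bigcap_k \left(B \backslash f(\Phi^k(\emptyset))\right)\right) = \bigcup_k \left(A \backslash g\left(B \backslash f(\Phi^k(\emptyset))\right)\right) = \bigcup_k \Phi^{k+1}(\emptyset) = \bigcup_k \Phi^k(\emptyset).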

Why then does Johnstone opt for the more unusual, albeit similar, approach of taking the larger collection S = \{A' \subseteq A | A' \subseteq \Phi(A')\}, which certainly contains the sets \Phi^k(\emptyset) defined above, and then defining A_f = \bigcup_{A' \in S} A' instead (this works for the reasons given in the proof, and the resulting set will certainly contain the A_f defined in the previous paragraph)? For one, this method of proof is more primitive, making no use of the notion of natural numbers. However, this author at present cannot see any other benefits to the approach. It certainly seems on the surface more obscure, more difficult to motivate, and more difficult to visualise or draw insight from. If the reader can see a good way to motivate this choice of A_f, or perhaps an interesting way in which this proof generalises while the one relying on a countable union fails, I should be very glad to read it.

So in summary, the most intuitive proof of this fundamental theorem I could find was probably the argument from Solution 1 written in the language of Solution 2, but I still don’t feel completely satisfied that I know everything there is to know about why this theorem is true.
