The Cantor-Bernstein Theorem is the intuitively ‘obvious’ result that given two sets $A$ and $B$, if we can inject $A$ into $B$ and $B$ into $A$ then there exists a bijection between $A$ and $B$. In the notation of cardinalities this states that if $|A| \leq |B|$ and $|B| \leq |A|$ then $|A| = |B|$.

Given how intuitive this result is, the author was surprised to find a really quick and intuitive proof impossible to come by. The temptation is to pursue some sort of ‘tweaking’ method to make one of the given injections into a bijection by altering it a little, but the only obvious ways to do this involved using the axiom of choice, which seemed to be far more powerful than this result should demand.

The author has come across the following two choice-free proofs, the first from Wikipedia, and the second from a Part IA lecture course. Both start with the idea of constructing a bijection from injections $f : A \to B$ and $g : B \to A$ by partitioning the domain $A$ into two sets $A = A_f \cup A_g$ such that the function $h : A \to B$ given by

$$h(x) = \begin{cases} f(x) & \text{if } x \in A_f \\ g^{-1}(x) & \text{if } x \in A_g \end{cases}$$

is a bijection.

So this idea reduces the problem to finding a partition $A = A_f \cup A_g$ such that:

- $A_g \subseteq g(B)$ (to ensure $h$ is well defined)
- $f(A_f) \cup g^{-1}(A_g)$ is a partition of $B$ (to ensure $h$ is a bijection).

**Solution 1 (Wikipedia, modified):** Let $C_0 = A \setminus g(B)$ and define recursively $C_{n+1} = g(f(C_n))$ for $n \geq 0$. We now set $A_f = \bigcup_{n=0}^{\infty} C_n$, a *countable* union, and take $A_g = A \setminus A_f$.

This ensures the first condition above since $A \setminus g(B) = C_0 \subseteq A_f$, so that $A_g = A \setminus A_f \subseteq g(B)$; and it ensures the second condition because for any $b \in B$, either we can get to $g(b)$ by applying $g \circ f$ to some element of $C_0$ a finite number of times (in which case $g(b) \in C_{n+1} = g(f(C_n))$ for some $n$, so manifestly $b \in f(C_n) \subseteq f(A_f)$) or we cannot (in which case $g(b) \in A_g$, so $b \in g^{-1}(A_g)$). These two cases cannot overlap, since $b \in f(C_n)$ forces $g(b) \in C_{n+1} \subseteq A_f$, so the union is disjoint as required.
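
As a purely editorial illustration (not part of the original argument), Solution 1 can be run on a window of the classic example $A = \mathbb{N}$, $B = \mathbb{N} \setminus \{0\}$ with the injections $f(n) = n + 1$ and $g(m) = m$; here $C_0 = \{0\}$ and each $C_{n+1} = \{n+1\}$, so every element of $A$ lands in $A_f$ and $h = f$. All names below are invented for the sketch:

```python
# A minimal sketch of Solution 1 on a toy example (assumed, not from the post):
# A = N, B = {1, 2, 3, ...}, with injections f(n) = n + 1 and g(m) = m.
# Then C_0 = A \ g(B) = {0} and C_{n+1} = g(f(C_n)) = {n + 1}, so A_f = A.

def f(n): return n + 1            # injection A -> B
def g(m): return m                # injection B -> A

N = 50                            # we only compute inside the window {0, ..., N}
window = set(range(N + 1))

C = {0}                           # C_0 = A \ g(B)
A_f = set()
while C and C <= window:          # iterate until C_n leaves the window
    A_f |= C
    C = {g(f(a)) for a in C}      # C_{n+1} = g(f(C_n))

# Inside the window every element lands in A_f, so h = f there,
# mapping {0, ..., N} bijectively onto {1, ..., N + 1}.
h = {a: f(a) for a in A_f}
assert A_f == window
assert sorted(h.values()) == list(range(1, N + 2))
```

The window is only a finite stand-in for the countable union: each pass of the loop adds one more $C_n$.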

**Solution 2 (P.T. Johnstone lecturing Numbers and Sets in Part IA of the Cambridge Maths Tripos):**

Consider the function $\Phi : \mathcal{P}(A) \to \mathcal{P}(A)$ given by $\Phi(S) = A \setminus g(B \setminus f(S))$, which we have basically designed to be such that if $\Phi(S) = S$ we can take $A_f = S$ and $A_g = A \setminus S$ and it will obviously satisfy the required two properties. We are now simply required to find such an $S$.

The nontrivial idea is to then let $\mathcal{C} = \{S \subseteq A : S \subseteq \Phi(S)\}$ and consider $T = \bigcup_{S \in \mathcal{C}} S$.

Vacuously $S \subseteq T$ for every $S \in \mathcal{C}$ by the definition of $T$. But since $\Phi$ is inclusion preserving, this implies $S \subseteq \Phi(S) \subseteq \Phi(T)$ for every $S \in \mathcal{C}$, so $T \subseteq \Phi(T)$ and hence $T \in \mathcal{C}$. Applying $\Phi$ once more, $\Phi(T) \subseteq \Phi(\Phi(T))$, which by definition of $\mathcal{C}$ implies $\Phi(T) \in \mathcal{C}$, which implies $\Phi(T) \subseteq T$. So $\Phi(T) = T$ and we can take $A_f = T$ to give a valid bijection.
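
On a small finite example the set $T = \bigcup \mathcal{C}$ can be computed by brute force over all subsets. This is an editorial sketch with invented toy data; note that two finite sets injecting into each other necessarily have the same size, so $f$ and $g$ below are in fact bijections, but the mechanics of the construction are still visible:

```python
from itertools import combinations

def phi(S, A, B, f, g):
    """Phi(S) = A \\ g(B \\ f(S)), with the injections f, g given as dicts."""
    return A - {g[b] for b in B - {f[a] for a in S}}

def johnstone_T(A, B, f, g):
    """T = union of all S subseteq A with S subseteq Phi(S)."""
    T = set()
    for r in range(len(A) + 1):
        for S in map(set, combinations(sorted(A), r)):
            if S <= phi(S, A, B, f, g):
                T |= S
    return T

# Hypothetical toy data for the sketch.
A = {0, 1, 2}
B = {'x', 'y', 'z'}
f = {0: 'x', 1: 'y', 2: 'z'}
g = {'x': 1, 'y': 2, 'z': 0}

T = johnstone_T(A, B, f, g)
assert phi(T, A, B, f, g) == T        # T really is a fixed point of Phi

# The induced h (f on T, g^{-1} off T) is then a bijection A -> B.
g_inv = {g[b]: b for b in B}
h = {a: f[a] if a in T else g_inv[a] for a in A}
assert set(h.values()) == B and len(h) == len(A)
```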

**Reflection on both proofs**

The idea used in the Wikipedia proof is that $g^{-1}$ restricted to $g(B)$ is a bijection onto $B$, so the only thing to fix is $C_0 = A \setminus g(B)$ and any inconveniences arising from this ‘taking up space’ once we map it into $B$ via $f$. So the set $A_f$ we eventually reach is the set of things either not in $g(B)$, or in the image under $g \circ f$ of things which we must throw out of $A_g$ as a result of taking $f$ as our bijection on $A_f$. Obviously, the more things we throw out of $A_g$, the larger $f(A_f)$ becomes, and so the more elements of $g(B)$ are disturbed, and so on. The proof then works on the basis that this process of ‘throwing out’ is a countable one, and so taking a countable union of a sequence of sets will suffice. Though this apparently works, I am not completely comfortable with it as a really *intuitive* proof of why the theorem is true. While the picture of the two injections interacting, whereby the domain of $f$ *erodes* the codomain of $g$ until at the end of all time the functions have agreed which elements each is to biject, is quite memorable and intuitive, it is not necessarily obvious why the functions must come to an agreement at the end of all time.

I found the second, apparently quite different, proof very intriguing. The preliminary (natural, but initially odd-looking) idea is setting up this function $\Phi$, which takes a subset $S$ of $A$ and follows it along $f$ to a subset of $B$, whose complement it then follows back along $g$ to a subset of $A$ (which it complements again). It is then clear that if $S \cup g(B \setminus f(S))$ is a partition of $A$ then it is immediately a required partition $A_f \cup A_g$. So a solution to $\Phi(S) = S$ will solve the problem.

In other words, we need to find $S$ such that both $\Phi(S) \subseteq S$ and $S \subseteq \Phi(S)$. We now try almost exactly the same line of thought as in the Wikipedia proof. Taking $S_0 = \emptyset$ we can trivially achieve the second relation, and since $\Phi$ is clearly inclusion preserving, $\Phi^n(\emptyset) \subseteq \Phi^{n+1}(\emptyset)$ for any $n$. The Wikipedia proof then finished by taking $S = \bigcup_{n \geq 0} \Phi^n(\emptyset)$, and here of course $\Phi(S) = S$ (in fact, this argument presented using the notation $\Phi$ is, to me, much more convincing).
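
The iteration $S_{n+1} = \Phi(S_n)$ can be sketched directly (an editorial toy example on invented data; on a finite set the increasing chain stabilises after finitely many steps, which plays the role of the countable union):

```python
def phi(S, A, B, f, g):
    """Phi(S) = A \\ g(B \\ f(S)) for injections f, g given as dicts."""
    return A - {g[b] for b in B - {f[a] for a in S}}

# Hypothetical toy data for the sketch.
A = {0, 1, 2}
B = {'x', 'y', 'z'}
f = {0: 'x', 1: 'y', 2: 'z'}
g = {'x': 1, 'y': 2, 'z': 0}

# Iterate S_0 = empty set, S_{n+1} = Phi(S_n) until the chain stabilises.
S = set()
while phi(S, A, B, f, g) != S:
    S = phi(S, A, B, f, g)
assert phi(S, A, B, f, g) == S        # the limit is a fixed point

# In this toy case g is surjective, so Phi(empty) is already empty: the
# iteration stops at the empty set and the induced bijection is just g^{-1}.
g_inv = {g[b]: b for b in B}
h = {a: f[a] if a in S else g_inv[a] for a in A}
assert set(h.values()) == B
```

Note that this iteration reaches the *smallest* fixed point of $\Phi$ (here $\emptyset$), while the union over $\mathcal{C}$ reaches the largest (here all of $A$); either one yields a valid bijection.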

Why then does Johnstone opt for the more unusual, albeit similar, approach of taking the larger collection $\mathcal{C}$, a collection which ostensibly contains the sets $\Phi^n(\emptyset)$ defined above, and then defining $T = \bigcup_{S \in \mathcal{C}} S$ instead (this works for the same reasons, and $T$ will certainly contain the $S$ defined in the above paragraph)? For one, this method of proof is more primitive, making no use of the notion of natural numbers. However, this author at present cannot see any other benefits to this approach. It certainly seems on the surface more obscure, difficult to motivate, and difficult to visualise or draw insight from. If the reader can see a good way to motivate this choice of $T$, or perhaps an interesting way in which this proof generalises while that relying on taking a countable union fails, I should be very glad to read it.

So in summary, the most intuitive proof of this fundamental theorem I could find was probably the argument from proof 1 in the language of proof 2, but I still don’t feel completely satisfied I know everything there is to know about why this theorem is true.

## 7 comments


January 1, 2009 at 1:59 pm

**zerosharp:** First of all, happy new year. There are two other proofs I know of this result: one that you'll find in Jech's "Set Theory" and the other in Kunen's "Set Theory: An Introduction to Independence Proofs". I'll mention the one from Kunen's. Sorry, I don't know how to use LaTeX.

First of all assume f: A --> B and g: B --> A are 1-1. Let A_0 = A and B_0 = B, and A_{n+1} = g(B_n) and B_{n+1} = f(A_n). Then you get infinite descending sequences of the kind ... included in A_n included in A_{n-1} included in A_{n-2} ... included in A_1 included in A_0 = A (same thing for the subsets of B). Let A_inf = the intersection of all the A_n and B_inf = the intersection of all the B_n.

Finally let h(x) = f(x) if x belongs to A_inf union the union over n of (A_{2n} without A_{2n+1}). Otherwise let h(x) = g^{-1}(x). Then h is 1-1 and onto.
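
[Editorial note: the construction zerosharp describes can be sketched in Python on invented toy data; on finite sets the descending chains stabilise quickly and the sketch reduces to h = f, but the bookkeeping is the same.]

```python
def kunen_h(A, B, f, g):
    """Bijection built as in the comment above: A_0 = A, B_0 = B,
    A_{n+1} = g(B_n), B_{n+1} = f(A_n); h = f on A_inf and on the
    differences A_{2n} \\ A_{2n+1}, and h = g^{-1} elsewhere."""
    As, Bs = [set(A)], [set(B)]
    while True:
        A_next = {g[b] for b in Bs[-1]}   # A_{n+1} = g(B_n)
        B_next = {f[a] for a in As[-1]}   # B_{n+1} = f(A_n)
        if A_next == As[-1] and B_next == Bs[-1]:
            break                          # finite sets: the chains stabilise
        As.append(A_next)
        Bs.append(B_next)
    A_inf = set.intersection(*As)
    use_f = set(A_inf)
    for n in range(0, len(As) - 1, 2):     # add A_{2n} \ A_{2n+1}
        use_f |= As[n] - As[n + 1]
    g_inv = {g[b]: b for b in B}
    return {x: f[x] if x in use_f else g_inv[x] for x in A}

# Hypothetical toy data for the sketch.
A = {0, 1, 2}
B = {'x', 'y', 'z'}
f = {0: 'x', 1: 'y', 2: 'z'}
g = {'x': 1, 'y': 2, 'z': 0}
h = kunen_h(A, B, f, g)
assert set(h.values()) == B and len(h) == len(A)   # h is a bijection
```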

The way I understand this proof is that the functions f and g act simultaneously until their respective domains become as small as possible. At each step they give each other those newly created smaller and smaller subsets so they can inject them into smaller and smaller codomains. But you might understand it better than I do. Nice post by the way. Could you post more set theory in the future (things like cardinal arithmetic, ordinal arithmetic, models of set theory)? Thanks.

January 2, 2009 at 5:57 pm

**Martin Orr:** Once you have chosen $\Phi$, Solution 2 reduces to the (Knaster-)Tarski fixed point theorem:

Theorem: If $L$ is a complete lattice and $f : L \to L$ is order-preserving, then $f$ has a fixed point.

(A complete lattice is a partially ordered set in which every subset has a least upper bound. The powerset of $A$ is an example, with least upper bound given by union.)
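
[Editorial sketch: the theorem can be checked by brute force on a finite powerset lattice, using the same recipe as Solution 2, the union of all $S$ with $S \subseteq F(S)$. The monotone map $F$ below is invented purely for illustration.]

```python
from itertools import combinations

def tarski_fixed_point(X, F):
    """Union of all S subseteq X with S subseteq F(S): a fixed point of any
    inclusion-preserving F on the powerset of the finite set X (in fact the
    largest such fixed point)."""
    T = set()
    for r in range(len(X) + 1):
        for S in map(set, combinations(sorted(X), r)):
            if S <= F(S):
                T |= S
    return T

X = {1, 2, 3}
F = lambda S: (S - {3}) | {1}     # an inclusion-preserving map on subsets of X
T = tarski_fixed_point(X, F)
assert F(T) == T
assert T == {1, 2}                # 3 never sits inside a set S with S <= F(S)
```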

Prof Johnstone’s proof can be easily converted to a proof of this theorem.

If you try to apply the argument about $\bigcup_n \Phi^n(\emptyset)$ to the general case, you will find that you need a stronger condition: $f$ must be continuous, i.e. it must take the l.u.b. of $\{x_n\}$ to the l.u.b. of $\{f(x_n)\}$ for each increasing sequence $x_0 \leq x_1 \leq \cdots$.

Checking that the $\Phi$ used in the proof of Schröder-Bernstein is continuous is not quite immediate – this is passed over with “of course” in the original article, but I think there is something to do there.

This “iterate $f$ on the least element of $L$” method does lead to a proof of the Knaster-Tarski theorem even without continuity, but you have to do it a transfinite number of times – define $x_\alpha$ for all ordinals $\alpha$ and use Hartogs' lemma to say the sequence stops growing somewhere.

So Prof Johnstone’s proof is good because it generalises to Knaster-Tarski without using ordinals, and it is occasionally useful to have an explicit formula ($\bigcup \mathcal{C}$) for a fixed point. It also always gives you the greatest fixed point (and so proves that the greatest fixed point exists).

I agree that the other proof is more intuitive though (even for the general case, as long as you accept transfinite constructions as intuitive). Knowing that this proof terminates in at most $\omega$ steps for continuous $f$ is a different sort of “formula” for a fixed point, which can be important for applications to computation. When you have to go into transfinite ordinals, I don’t think that this proof will always give the least fixed point.

As for sticking with proof 1 in its own language, I think it feels more concrete (and you can draw a picture showing, at least in some sense, what f, g, h do) and probably many IA students (and others) prefer it for that reason.

January 2, 2009 at 11:09 pm

**Tom Lovering:** Happy new year to you guys also, and thanks for the responses.

Ah. Thanks very much Martin. I had suspected there must be some more general result lurking somewhere which PTJ was trying to ‘set us up for’ given his reputation and the lack of another explanation for his choosing such an abstract proof (he only lectured proof 2 from above).

That’s also a cool theorem worth knowing :).

September 17, 2009 at 4:07 am

**chuck loyola:** 1) nice post!

2) this is nitpicking over definitions, but shouldn't “$f(A_f) \cup g^{-1}(A_g)$ is a partition of $B$” read “$f(A_f) \cup g^{-1}(A_g) = B$”?

September 17, 2009 at 4:13 pm

**tlovering:** 1) Thanks.

2) No; though $f(A_f) \cup g^{-1}(A_g) = B$ is true, I do mean a partition, which is a bit of a stronger statement (not only does the union equal $B$, but the sets whose union is $B$ are also disjoint).

September 18, 2009 at 3:11 am

**chuck loyola:** oh oops, of course; I just meant the notational issue of $f(A_f) \cup g^{-1}(A_g)$ corresponding to the dry definition of “partition” (my suggestion was wrong in being insufficient, of course). I should've suggested:

“$\{f(A_f), g^{-1}(A_g)\}$ is a partition of $B$” (simply because the union, technically, isn't a “partition”). But again, this is cosmetic, trifling nitpicking! (Sorry…)

September 18, 2009 at 10:44 am

**tlovering:** No, thanks a lot for posting. Other people might be confused about the same issue, and now they (hopefully) aren't. I personally quite like writing partitions as unions, because it puts the right vision of what is going on in the mind of the reader, but I do understand why it is tempting to object (technically I'm saying that a single set is a partition, but I am relying on people reading between the lines a bit).

Tom.