Archive for the ‘maths’ Category

Tutte meets Homfly

Graphs are pretty important objects in mathematics, and in the world — what with every network of any kind being represented by one, from social connections, to road and rail systems, to chemical molecules, to abstract symmetries. They are a fundamental concept in understanding a lot about the world, from particle physics to sociology.

Knots are also pretty important. How can a loop of string get knotted in space? That’s a fairly basic question and you would think we might know the answer to it. But it turns out to be quite hard, and there is a whole field of mathematics devoted to understanding it. Lord Kelvin thought that atoms were knots in the ether. As it turns out, they are not, and there is no ether. Nonetheless the idea was an interesting one and it inspired mathematicians to investigate knots. Knot theory is now a major field of mathematics, part of the subject of topology. It turns out to be deeply connected to many other areas of science, because many things can be knotted:  umbilical cords, polymers, quantum observables, DNA… and earphone cords. (Oh yes, earphone cords.) Indeed, knots arise crucially in some of the deepest questions about the nature of space and time, whether as Wilson loops in topological field theories, or as crucial examples in the theory of 3-dimensional spaces, or 3-manifolds, and as basic objects of quantum topology.

Being able to tell graphs apart, and knots apart, is therefore a pretty basic problem. If you’re given two graphs, can you tell if they’re the same? Or if you’re given two knots, can you tell if they’re the same? This question is harder than it looks. For instance, the two graphs below look quite different, but they are the “same” (or, to use a technical term, isomorphic): each vertex in the first graph corresponds to a vertex in the second graph, in such a way that the connectedness of vertices is preserved.

(Source: Chris-martin on wikipedia)

Telling whether two graphs are the same, or isomorphic, is known as the graph isomorphism problem. It’s a hard problem; when you have very large graphs, it may take a long time to tell whether they are the same or not. As to precisely how hard it is, we don’t yet quite know.
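For small graphs, you can check isomorphism directly by brute force, trying every possible relabelling of the vertices. Here is a minimal Python sketch (the function name and the edge-list representation are just illustrative choices, not anything canonical); the factorial blow-up in the number of relabellings is exactly why this approach is hopeless for large graphs.

```python
from itertools import permutations

def isomorphic(edges_a, edges_b, n):
    # brute force over all n! relabellings of vertices 0..n-1 -- tiny graphs only
    target = {frozenset(e) for e in edges_a}
    for perm in permutations(range(n)):
        relabelled = {frozenset((perm[u], perm[v])) for u, v in edges_b}
        if relabelled == target:
            return True
    return False

path = [(0, 1), (1, 2)]             # a path on three vertices
relabelled_path = [(1, 0), (0, 2)]  # the same path, with vertices renamed
triangle = [(0, 1), (1, 2), (2, 0)]
print(isomorphic(path, relabelled_path, 3))  # True
print(isomorphic(path, triangle, 3))         # False
```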

Similarly, given two knots as diagrams drawn on a page, it is difficult to tell whether they are the “same” or not. Two knots are the “same”, or equivalent (or ambient isotopic), if there is a way to move one around in space to arrive at the other. For instance, all three knots shown below are equivalent: they are all, like the knot on the left, unknotted. This is pretty clear for the middle knot; for the knot on the right, have fun trying to see why!

(Source: Stannered, C_S, Prboks13 on wikipedia)

Telling whether two knots are equivalent is also very hard. Indeed, it’s hard enough to tell if a given knot is knotted or knot — which is known as the unknot recognition or unknotting problem. We also don’t quite know precisely how hard it is.

The fact that we don’t know the answers to some of the most basic questions about graphs and knots is part of the reason why graph theory and knot theory are very active areas of current research!

However, there are some extremely clever methods that can be used to tell graphs and knots apart. Many such methods exist. Some are easier to understand than others; some are easier to implement than others; some tell more knots apart than others. I’m going to tell you about two particular methods, one for graphs, and one for knots. Both methods involve polynomials. Both methods are able to tell a lot of graphs/knots apart, but not all of them.

The idea is that, given a graph, you can apply a certain procedure to write down a polynomial. Even if the same graph is presented to you in a different way, you will still obtain the same polynomial. So if you have two graphs, and they give you different polynomials, then they must be different graphs!

Similarly, given a knot, you can apply another procedure to write down a polynomial. Even if the knot is drawn in a very different way (like the very different unknots above), you still obtain the same polynomial. So if you have two knots, and they give you different polynomials, then they must be different knots!

Bill Tutte and his polynomial

Bill Tutte was an interesting character: a Second World War British cryptanalyst and mathematician, who helped crack the Lorenz cipher used by the Nazi high command, he also made major contributions to graph theory, and developed the field of matroid theory.

He also introduced a way to obtain a polynomial from a graph, which now bears his name: the Tutte polynomial.

Each graph G has a Tutte polynomial T_G . It’s a polynomial in two variables, which we will call x and y. So we will write the Tutte polynomial of G as T_G (x,y).

For instance, the graph G below, which forms a triangle, has Tutte polynomial given by T_G (x,y) = x^2 + x + y.

So how do you calculate the Tutte polynomial? There are a few different ways. Probably the easiest is to simplify the graph step by step: we successively collapse or remove edges in various ways, and as we do so, we do some algebra.

There are two operations we perform to simplify the graph. Each of these two operations “removes” an edge, but does so in a different way. They are called deletion and contraction. You can choose any edge of a graph, and delete it, or contract it, and you’ll end up with a simpler graph.

First, deletion. To delete an edge, you simply rub it out. Everything else stays as it was. The vertices are unchanged: there are still just as many vertices. There is just one fewer edge. So, for instance, the result of deleting an edge of the triangle graph above is shown below.

The graph obtained by deleting edge e from graph G is denoted G-e.

Note that in the triangle case above, the triangle graph is connected, and after deleting an edge, the result is still connected. This isn’t always the case: it is possible that you have a connected graph, but after removing an edge e, it becomes disconnected. In this case the edge e is called a bridge. (You can then think of it as a “bridge” between two separate “islands” of the graph; removing the edge, there is no bridge between the islands, so they become disconnected.)

Second, contraction. To contract an edge, you imagine shrinking it down, so that it’s shorter and shorter, until it has no length at all. The edge has vertices at its endpoints, and so these two vertices come together, and combine into a single vertex. So if edge e has vertices v_1, v_2 at its endpoints, then after contracting e, the vertices v_1, v_2 are combined into a single vertex v. Thus, if we contract an edge of the triangle graph, we obtain a result something like the graph shown below.

The graph obtained by contracting edge e from graph G is denoted G.e.

Contracting an edge always produces a graph with one fewer edge. Each edge which previously ended at v_1 or v_2 now ends at v. And contracting an edge usually produces a graph with one fewer vertex: the vertices v_1, v_2 are collapsed into v.

However, this is not always the case. If the edge e had both its endpoints at the same vertex, then the number of vertices does not decrease at all! The endpoints v_1 and v_2 of e are then the same point, i.e. v_1 = v_2, and so they are already collapsed into the same vertex! In this case, the edge e is called a loop. Contracting a loop is the same as just deleting a loop.
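In code, both operations are one-liners on an edge list. Here is a small Python sketch (representing a multigraph simply as a list of vertex pairs — an assumed representation, chosen for convenience):

```python
def delete(edges, e):
    # deletion: remove one copy of edge e; all vertices stay
    edges = list(edges)
    edges.remove(e)
    return edges

def contract(edges, e):
    # contraction: remove e and merge its endpoint v into u
    # (this can create loops and parallel edges)
    u, v = e
    rest = delete(edges, e)
    return [(u if a == v else a, u if b == v else b) for a, b in rest]

triangle = [(0, 1), (1, 2), (2, 0)]
print(delete(triangle, (0, 1)))    # [(1, 2), (2, 0)] -- two edges, three vertices
print(contract(triangle, (0, 1)))  # [(0, 2), (2, 0)] -- two parallel edges
```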

So, that’s deletion and contraction. We can use deletion and contraction to calculate the Tutte polynomial using the following rules:

  1. Let’s start with some really simple graphs.
    • If a graph G has no edges, then it’s just a collection of disconnected vertices. In this case the Tutte polynomial is given by T_G (x,y) = 1.
    • If a graph G has precisely one edge e, then it consists of a bunch of vertices, with precisely one edge e joining two of them. If e connects two distinct vertices, then it is a bridge, and T_G (x,y) = x.
    • On the other hand, if G has precisely one edge e which connects a vertex to itself, then it is a loop, and T_G (x,y) = y.
  2. When you take a graph G and consider deleting an edge e to obtain G - e, or contracting it to obtain G.e, these three graphs G, G-e, G.e have Tutte polynomials which are closely related:

    \displaystyle T_G (x,y) = T_{G-e} (x,y) + T_{G.e} (x,y).

    So in fact the Tutte polynomial you are looking for is just the sum of the Tutte polynomials of the two simpler graphs G-e and G.e.
    However! This rule only works if e is not a bridge or a loop. If e is a bridge or a loop, then we have two more rules to cover that situation.

  3. When edge e is a bridge, then T_G (x,y) = x T_{G.e} (x,y).
  4. When edge e is a loop, then T_G (x,y) = y T_{G-e} (x,y).

These may just look like a random set of rules. And indeed they are: I haven’t tried to explain them, where they come from, or given any motivation for them. And I’m afraid I’m not going to. Tutte was very clever to come up with these rules!

Nonetheless, using the above rules, we can calculate the Tutte polynomial of a graph.
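In fact, the four rules translate directly into a short recursive program. Below is a Python sketch (the representation is my own choice: a multigraph is a list of vertex pairs, and a polynomial is a dictionary mapping an exponent pair (i, j) to the coefficient of x^i y^j):

```python
from collections import Counter

def poly_add(p, q):
    # add two polynomials stored as {(i, j): coeff}
    r = Counter(p)
    r.update(q)
    return dict(r)

def poly_shift(p, dx, dy):
    # multiply a polynomial by x^dx * y^dy
    return {(i + dx, j + dy): c for (i, j), c in p.items()}

def connected(edges, a, b):
    # graph search: are vertices a and b joined by the given edges?
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    seen, stack = {a}, [a]
    while stack:
        u = stack.pop()
        if u == b:
            return True
        for w in adj.get(u, ()):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return False

def tutte(edges):
    # Tutte polynomial of a multigraph, as {(i, j): coeff} meaning coeff * x^i * y^j
    if not edges:
        return {(0, 0): 1}                       # rule 1: no edges
    (u, v), rest = edges[0], edges[1:]
    if u == v:                                   # rule 4: e is a loop
        return poly_shift(tutte(rest), 0, 1)     # multiply by y
    merged = [(u if a == v else a, u if b == v else b) for a, b in rest]
    if not connected(rest, u, v):                # rule 3: e is a bridge
        return poly_shift(tutte(merged), 1, 0)   # multiply by x
    return poly_add(tutte(rest), tutte(merged))  # rule 2: delete + contract

triangle = [(0, 1), (1, 2), (2, 0)]
print(tutte(triangle))  # {(2, 0): 1, (1, 0): 1, (0, 1): 1}, i.e. x^2 + x + y
```

Note that isolated vertices never need to be tracked: rule 1 gives 1 for any edgeless graph, which is why the graphs A and C below end up with the same Tutte polynomial.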

Let’s do some examples. We’ll work out a few, starting with simpler graphs, and we’ll work our way up to calculating the Tutte polynomial of the triangle, which we’ll denote G.

First, consider the graph A consisting of a single edge between two vertices.

This graph contains precisely one edge, which is a bridge, so T_A (x,y) = x.

Second, consider the graph B consisting of a single loop on a single vertex. This graph also contains precisely one edge, but it’s a loop, so T_B (x,y) = y.

Third, consider the graph C which is just like the graph A, consisting of a single edge between two vertices, but with another disconnected vertex.

This graph also contains precisely one edge, which is also a bridge, so it actually has the same Tutte polynomial as A! So we have T_C (x,y) = x.

Fourth, consider the graph D which consists of a loop and another edge as shown. A graph like this is sometimes called a lollipop.

Now let e be the loop. As it’s a loop, rule 4 applies. If we remove e, then we just obtain the single edge graph A from before. That is, D-e=A. Applying rule 4, then, we obtain T_D (x,y) = y T_{D-e} (x,y) = y T_A (x,y) = xy.

Fifth, consider the graph E which consists of two edges joining three vertices. We saw this before when we deleted an edge from the triangle.

Pick one of the edges and call it e. (It doesn’t matter which one — can you see why?) If we delete e, the graph becomes disconnected, so e is a bridge. Consequently rule 3, for bridges, applies. Contracting the edge e, we obtain the single-edge graph A from before. That is, E.e = A. So, applying rule 3, we obtain T_E (x,y) = x T_{E.e} (x,y) = x T_A (x,y) = x^2 .

Sixth, let’s consider the graph F consisting of two “parallel” edges between two vertices. We saw this graph before when we contracted an edge of the triangle.

Pick one of the edges and call it e. (Again, it doesn’t matter which one.) This edge is neither a bridge nor a loop, so rule 2 applies. Deleting e gives the single-edge graph A from before, which has Tutte polynomial x. Contracting e gives the graph B from before, a single vertex with a loop, which has Tutte polynomial y. So, by rule 2, the Tutte polynomial of this graph F is given by \displaystyle T_F (x,y) = x + y .

Finally, consider the triangle graph G. Take an edge e; it’s neither a bridge nor a loop, so rule 2 applies. Removing e results in the graph E from above, which has Tutte polynomial x^2. Contracting e results in the graph F from above with two parallel edges; and we’ve seen it has Tutte polynomial x+y. So, putting it all together, we obtain the Tutte polynomial of the triangle as

\displaystyle T_G (x,y) = T_{G-e} (x,y) + T_{G.e} (x,y) = T_E (x,y) + T_F (x,y) = x^2 + x + y.

Having seen these examples, hopefully the process starts to make some sense.

However, as we mentioned before, we’ve given no motivation for why this works. And, it’s not even clear that it works at all! If you take a graph, you can delete and contract different edges in different orders and get all sorts of different polynomials along the way. It’s not at all clear that you’ll obtain the same result regardless of how you remove the edges.

Nonetheless, it is true, and was proved by Tutte, that no matter how you simplify the graph at each stage, you’ll obtain the same result. In other words, the Tutte polynomial of a graph is actually well defined.


H, O, M, F, L, Y, P and T

Tutte invented his polynomial in the 1940s — it was part of his PhD thesis. So the Tutte polynomial has been around for a long time. The knot polynomial that we’re going to consider, however, is considerably younger.

In the 1980s, there was a revolution in knot theory. In 1984 the excellent mathematician Vaughan Jones discovered a polynomial which can be associated to a knot. It has become known as the Jones polynomial. It was not the first polynomial that anyone had defined from a knot, but it sparked a great deal of interest in knots, and led to the resolution of many previously open questions in knot theory.

Once certain ideas are in the air, other ideas follow. Several mathematicians started trying to find improved versions of the Jones polynomial, and at least 8 mathematicians came up with similar ways to improve the Jones polynomial. In 1985, Jim Hoste, Adrian Ocneanu, Kenneth Millett, Peter J. Freyd, W. B. R. Lickorish, and David N. Yetter published a paper defining a new polynomial invariant. Making an acronym of their initials, it’s often called the HOMFLY polynomial. Two more mathematicians, Józef H. Przytycki and Paweł Traczyk, did independent work on the subject, and so it’s often called the HOMFLY-PT polynomial.

Like the Tutte polynomial, the HOMFLY polynomial is a polynomial in two variables. (The Jones polynomial, however, is just in one variable.) It can also be written as a homogeneous polynomial in three variables. We’ll take the 3-variable homogeneous version.

Strictly speaking, to get a HOMFLY polynomial, your knot must be oriented: it must have a direction. This is usually represented by an arrow along the knot.

HOMFLY polynomials also exist for links — a link is like a knot, but possibly with several loops involved. So even if there are several loops knotted up, they still have a HOMFLY polynomial. (Each loop needs to be oriented though.)

So, if you’re given an oriented knot or link K, it has a HOMFLY polynomial. We’ll denote it by P_K (x,y,z). So how do you compute it? By following some rules which successively simplify the knot.

  1. If the knot K is the unknot, then P_K (x,y,z) = 1.
  2. If you take one of the crossings in the diagram and alter it in the various ways shown below — but leave the rest of the knot unchanged — then you obtain three links L^+, L^-, L^0. Their HOMFLY polynomials are related by

    \displaystyle x P_{L^+} (x,y,z) + y P_{L^-} (x,y,z) + z P_{L^0} (x,y,z) = 0.

    Source: C_S, wikimedia

    A relationship like this, between three knots or links which differ only at a single crossing, is called a skein relation.

  3. If you can move the link L around in 3-dimensional space to another link L', then this doesn’t change the HOMFLY polynomial: P_L (x,y,z) = P_{L'} (x,y,z).
  4. If the oriented link L is split, i.e. separates into two disjoint (untangled) sub-links L_1, L_2, then you can take the HOMFLY polynomials of the two pieces separately, and multiply them, with an extra factor:

    \displaystyle P_L (x,y,z) = \frac{-(x+y)}{z} P_{L_1} (x,y,z) P_{L_2} (x,y,z).

  5. If L is a connect sum of two links L_1, L_2, then you can take the HOMFLY polynomials of the two pieces, and multiply them:

    \displaystyle P_L (x,y,z) = P_{L_1} (x,y,z) P_{L_2} (x,y,z).

    What is a connect sum? It’s when two knots or links are joined together by a pair of strands, as shown below.

    Source: Maksim, wikimedia

And there you go.
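Rules 4 and 5 are purely multiplicative, so they are easy to express as operations on polynomial-valued functions. A small Python sketch (the function names are mine, and a polynomial is represented simply as a numeric function of x, y, z):

```python
def split_union(P1, P2):
    # rule 4: a split link multiplies the pieces' polynomials, times -(x+y)/z
    return lambda x, y, z: (-(x + y) / z) * P1(x, y, z) * P2(x, y, z)

def connect_sum(P1, P2):
    # rule 5: a connect sum just multiplies the pieces' polynomials
    return lambda x, y, z: P1(x, y, z) * P2(x, y, z)

unknot = lambda x, y, z: 1.0       # rule 1

two_split_unknots = split_union(unknot, unknot)
print(two_split_unknots(1.0, 2.0, 4.0))   # -(1 + 2)/4 = -0.75
```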

Again, it’s not at all clear where these rules come from, or that they will always give the same result. There might be many ways to change the crossings and simplify the knot, but H and O and M and F and L and Y and P and T showed that in fact you do always obtain the same result for the polynomial.

Let’s see how to do this in a couple of examples.

First of all, for the unknot U, by rule 1, its HOMFLY polynomial is P_U (x,y,z) = 1.

Second, let’s consider two linked unknots as shown below. This is known as the Hopf link. Let’s call it H.

Source: Jim.belk, wikimedia

Let’s orient both the loops so that they are anticlockwise. Pick one of the crossings and consider the three possibilities obtained by replacing it according to the skein relation described above, H^+, H^-, H^0. You should find that H^+ corresponds to the crossing as it is shown, so H^+ = H. Changing the crossing results in two unlinked rings, that is, H^- = two split unknots. By rule 4 above then, P_{H^-} (x,y,z) = \frac{-(x+y)}{z} P_U (x,y,z) P_U (x,y,z); and as each unknot has HOMFLY polynomial 1, we obtain P_{H^-} (x,y,z) = \frac{-(x+y)}{z}. On the other hand, smoothing the crossing into H^0 gives an unknot, so P_{H^0} (x,y,z) = P_{U} (x,y,z) = 1.

Putting this together with the skein relation (rule 2), we obtain the equation

\displaystyle x P_{H^+} (x,y,z) + y P_{H^-} (x,y,z) + z P_{H^0} (x,y,z) = 0,

which gives

\displaystyle x P_H (x,y,z) + y \frac{-(x+y)}{z} + z = 0

and hence the HOMFLY polynomial of the Hopf link is found to be

\displaystyle P_H (x,y,z) = \frac{ y(x+y)}{xz} - \frac{z^2}{xz} = \frac{xy + y^2 - z^2}{xz}.
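As a sanity check, we can verify numerically that this expression for P_H really does satisfy the skein relation, with H^- the two split unknots and H^0 the unknot. A quick Python sketch (the test values are arbitrary nonzero numbers of my choosing):

```python
def P_hopf(x, y, z):
    # the HOMFLY polynomial of the Hopf link computed above
    return (x*y + y**2 - z**2) / (x*z)

def P_two_split_unknots(x, y, z):
    # rule 4 applied to two split unknots (each with polynomial 1)
    return -(x + y) / z

x, y, z = 1.3, -0.7, 2.1   # arbitrary nonzero test values
lhs = x * P_hopf(x, y, z) + y * P_two_split_unknots(x, y, z) + z * 1.0
print(abs(lhs) < 1e-12)    # True: the skein relation is satisfied
```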


When the Tutte is the HOMFLY

In 1988, François Jaeger showed that the Tutte and HOMFLY polynomials are closely related.

Given a graph G drawn in the plane, it has a Tutte polynomial T_G (x,y), as we’ve seen.

But from such a G, Jaeger considered a way to build an oriented link D(G). And moreover, he showed that the HOMFLY polynomial of D(G) is closely related to the Tutte polynomial of G. In other words, T_G (x,y) and P_{D(G)} (x,y,z) are closely related.

But first, let’s see how to build a link from a graph. It’s called the median construction. Here’s what you do. Starting from your graph G, which is drawn in the plane, you do the following.

  • Thicken G up. You can then think of it as a disc around each vertex, together with a band along each edge.
  • Along each edge of G, there is now a band. Take each band, and put a full right-handed twist in it. You’ve now got a surface which is twisted up in 3-dimensional space.
  • Take the boundary of this surface. It’s a link. And this link is precisely D(G). (As it turns out, there’s also a natural way to put a direction on D(G), i.e. make it an oriented link.)

It’s easier to understand with a picture. Below we have a graph G, and the link D(G) obtained from it.

A graph (source: wikimedia) and the link (source: Jaeger) obtained via the median construction.

Jaeger was able to show that in general, the Tutte polynomial T_G (x,y) and the HOMFLY polynomial P_{D(G)} (x,y,z) are related by the equation

\displaystyle P_{D(G)} (x,y,z) = \left( \frac{y}{z} \right)^{V(G)-1} \left( - \frac{z}{x} \right)^{E(G)} T_G \left( -\frac{x}{y}, \frac{-(xy+y^2)}{z^2} \right),

where V(G) denotes the number of vertices of G, and E(G) denotes the number of edges of G. Essentially, Jaeger showed that the process you can use to simplify the link D(G) to calculate the HOMFLY polynomial, corresponds in a precise way to the process you can use to simplify the graph G to calculate the Tutte polynomial.
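To get a feel for the formula, here is a small numeric sketch evaluating its right-hand side for the triangle graph (V = 3, E = 3, with the Tutte polynomial x^2 + x + y computed earlier). The helper names and sample values are my own; this just plugs numbers into the displayed formula.

```python
def T_triangle(x, y):
    # Tutte polynomial of the triangle, computed earlier
    return x**2 + x + y

def P_DG(x, y, z, V, E, T):
    # right-hand side of Jaeger's formula, for a plane graph with
    # V vertices, E edges, and Tutte polynomial T
    return (y / z)**(V - 1) * (-z / x)**E * T(-x / y, -(x*y + y**2) / z**2)

value = P_DG(1.0, 2.0, 3.0, V=3, E=3, T=T_triangle)
print(value)   # (4/9) * (-27) * (1/4 - 1/2 - 2/3) = 11, up to rounding
```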

In addition to this excellent correspondence — Tutte meeting HOMFLY — Jaeger was able to deduce some further consequences.

He showed that the four colour theorem is equivalent to a fact about HOMFLY polynomials: for every loopless connected plane graph G, P_{D(G)} (3,1,2) \neq 0.

Moreover, since colouring problems for plane graphs are known to be very hard in terms of computational complexity — NP-hard — it follows that the computation of the HOMFLY polynomial is also NP-hard.

Said another way: if you could find a way to compute the HOMFLY polynomial of a link in polynomial time, you would prove that P = NP and claim yourself a Millennium prize!


Written by dan

September 8th, 2017 at 4:18 pm

Holy-principle, Batman!

(With apologies and tribute to the late Adam West)



Here’s a situation known to any beginning skier.

You are at the top of a mountain slope. You want to go down to the bottom of the slope. But you are on skis, and you are not very good at skiing.

Despite your lack of skill, you have been deposited by a ski lift at the top of the slope. Slightly terrified, you know that the only honourable way out of your predicament is down — down the slope, on your skis.

Pointing your skis down the slope, with a rush of exhilaration you find yourself accelerating downwards. Unfortunately, you know from bitter experience that the hardest thing about learning to ski is learning to control your speed.

If you are bold or stupid, your incompetent attempt to conquer the mountain likely ends in a spectacular crash. If you are cowardly and have mastered the snowplough, your plodding descent ends with a whimper and worn out thighs from holding the skis in an awkward triangle all the way down.

But you know that the more your skis point down the mountain, the faster you go. And the more your skis point across the slope, the slower you go.

When your skis point down the slope, they are pointing steeply downwards; they are pointing in quite a steep direction. But when your skis point across the slope, they are not pointing very downwards at all; they are pointing in quite a flat direction.

As an incompetent skier, the best way to get down the slope without injury and without embarrassment is to go take a path which criss-crosses the slope as much as possible. You want your skis to point in a flat direction as much as possible, and in a steep direction as little as possible.

The problem is that each time you change direction, you temporarily point downwards, and risk runaway acceleration.

Is it possible to get down the mountain while always pointing in a flat direction?

Of course the answer is no. But there is an area of mathematics which says that the answer is yes, almost, more or less.

This, very roughly, is the beginning of the idea of Gromov’s homotopy principle — often abbreviated to h-principle.


* * *


The ideas of the h-principle were developed by Mikhail Gromov in his work in the 1970s, including work with my PhD advisor Yakov Eliashberg. The term “h-principle” first appeared, and was systematically elaborated, in Gromov’s (notoriously difficult) book Partial differential relations. Gromov, who grew up in Soviet Russia, was not a friend of the authorities and in 1970, despite being invited to speak at the International Congress of Mathematicians in France, was prohibited from leaving the USSR. Later he finally made his “gromomorphism” to the US, and now he works at IHES just near Paris.

The ideas of the h-principle are about a “soft” kind of geometry. If you are prepared to treat your geometry like playdough, and morph various things around, within certain constraints, then the h-principle tells you how to morph your playdough-geometry to get the nice kind of geometry you want. Technically, morphing playdough has a fancy name: it’s called homotopy.

An example of the h-principle (or, more precisely, of “holonomic approximation”) in the skiing context would be as follows.

Your ideal skiing path down the slope would be to go straight down, but always have your skis pointing flat. That is, your skis should always point horizontally, but you should go straight down the mountain. That is the ideal.

That, of course, is a ridiculous ideal path. But it’s an ideal path nonetheless. You want to go straight down the mountain, and the ideal path does this; and you want to have your skis pointing safely horizontally, and the ideal path does this too. This is the ideal of the incompetent skier: go down the mountain as directly as possible, with your skis always being completely flat.

Now the ideal path cannot be achieved in practice. In practice, if your skis are pointing horizontally, you go horizontally. (We ignore skidding for the purposes of our mathematical idealisation.) In practice, you go in the direction your skis are pointing.

The “ideal path” is a path down the mountain, which also tells you which way your skis should point at each stage (i.e. horizontally), but which doesn’t satisfy the practical principle that you should go in the direction you’re pointing. As such, it’s a generalisation of the real sort of path you could actually take down the mountain. (The technical name for this type of path is a “section of a jet bundle”.)

A realistic path down the mountain — one where you go in the direction your skis are pointing — is also known as a holonomic path.

One of the first results towards the h-principle says that if you are prepared to make a few tiny tiny adjustments to your path, then you can take an actual, holonomic path down the mountain, where you are very very close to the ideal path — both in terms of where you go, and in the direction your skis point. You stay very very close — in fact, arbitrarily close — to the path straight down the mountain. And your skis stay very very close — again, arbitrarily close — to horizontal at every instant on the way down.

How is this possible? Well, you have to make some adjustments.

First, you make some adjustments to the path. You might have to make a wiggly path, rather than going in a straight line. Actually, it will have to be very wiggly — arbitrarily wiggly.

And, second, you’ll have to make some adjustments to the mountain too. You’ll have to adjust the shape of the mountain slope — but only by a very very small, arbitrarily small amount.

Well, perhaps these types of alterations are rather drastic. But without moving the mountain, you won’t be able to go down the mountain and stay very close to horizontal. You must alter the ski slope, and you must alter your path. But these movements are very very small, and you can make them as small as you like.

How do you alter the mountain? Roughly, you can make tiny ripples in the slope — and roughly, you turn it into a terraced slope. Just like terraces for rice farming in Vietnam, or for growing crops in the Andes.


Terraced farmland in Peru. By Alexson Scheppa Peisino(AlexSP).

As you go along a terrace, you remain horizontal! We don’t want our terraces to be completely horizontal though — we want them to have a very gentle downwards slope, so that we can stay very close to horizontal, and yet eventually get to the bottom of the mountain.

And also, we’ll need to be able to go smoothly from one terrace down to the next, so each terrace should turn into the next. So perhaps it’s more like Lombard Street, the famously windy street in San Francisco. (Which is not, however, the most crooked street in the US — that’s Wall Street, of course. Got you there.)


Lombard Street. By Gaurav1146

Perhaps a more accurate depiction of what we want is the figure below, from Eliashberg and Mishachev’s book Introduction to the h-principle. We want very fine, very gently sloping terraces, and we want them extremely small, so that the mountain is altered by a tiny tiny amount. And to go down the slope we need to take a very windy path — with many many wiggles in it. To go down the slope is almost like a maze — although it’s a very simple, repetitive maze.

A modified, lightly terraced, very windy, ski slope.

Thus, with an astronomical number of microscopic terraces on the mountain, each nanoscopically sloped downwards, and an astronomical number of microscopic wiggles in your path down the terraces, you can go down the mountain, staying very close to your idealised path. You go very very close to straight down the mountain, and your skis stay very very close to horizontal all the way down.

And then, you’ve done it.

This is the flavour of the h-principle. More precisely this result is called holonomic approximation. Holonomic approximation says that even an incompetent skier can get down a mountain with an arbitrarily small amount of trouble — provided that they can do an arbitrarily large amount of work in advance to terrace the mountain and prepare themselves an arbitrarily flat arbitrarily wiggly path.


* * *


The h-principle has applications beyond idealised incompetent skiing down microscopically terraced mountains. Two of the most spectacular applications are sphere eversion, and isometric embedding. In fact they both preceded the h-principle — and Gromov’s attempt to understand and formalise them directly inspired the development of the h-principle.

Sphere eversion is a statement about spheres in three-dimensional space. Take a standard unit sphere in 3-dimensional space, but again regard it as made of playdough, and we will consider morphing (erm, homotoping) it in space. We allow the sphere to pass through itself, but never to crease, pinch or rip. While the sphere can intersect itself, each point of the sphere must remain smooth enough to have a tangent plane. (The technical name for this is an immersion of the sphere into 3-dimensional space.)

Smale’s sphere eversion says that it’s possible to turn the sphere inside out by these rules — that is, by a homotopy through immersions. This amazing theorem is all the more amazing because Smale’s original 1957 proof was an existence proof: he proved that there existed a way to turn the sphere inside out, but did not say how to do it! Many explicit descriptions have now been given for sphere eversions, and there are many excellent videos about it, including Outside In made in the 1990s. My colleague Burkard Polster, aka the Mathologer, also has an excellent video about it.

Smale has an interesting and admirable biography. He was actively involved in the anti-Vietnam War movement, even to the extent of being subpoenaed by the House Un-American Activities Committee. His understanding of the relationship between creativity, leisure and mathematical research was epitomised in his statement that his best work was done “on the beaches of Rio”. (He discovered his horseshoe map on the beach at Leme.)

Isometric embedding is more abstract — see my article in The Conversation for another description — but it is even more amazing. It is a theorem about abstract spaces. For instance, you could take a surface — but then put on it an abstractly-defined metric, unrelated to how it sits in space. Isometric embedding attempts to map the surface back into 3-dimensional space in a way that preserves distances, so that the abstract metric on the surface corresponds to the familiar notion of distance we know in three dimensions.

Isometric embedding is largely associated with John Nash, who passed away a couple of years ago and is better known for his work on game theory, and from the book and movie A Beautiful Mind. The proof is incredible. Gromov describes how he came to terms with this proof in some recollections. He originally found Nash’s proof “as convincing as lifting oneself by the hair”, but after eventually finding understanding, he found that Nash’s proof “miraculously, did lift you in the air by the hair”!

The Nash-Kuiper theorem says that if you can map your abstract surface into 3-dimensional space in such a way that it decreases distances, then you can morph it — homotope it — to make it preserve distances. (Actually, it need not be a surface but a space of any dimension; and it need not be 3-dimensional space, but space of any dimension.) And, just like on the ski slope, this alteration of the surface in 3-dimensional space can be made very very small — arbitrarily small.

The h-principle is another mathematical superpower, and it comes up in many places where geometry is “soft”, and you can slightly “morph” or “adjust” your geometrical situation to find the situation you want.


Written by dan

June 12th, 2017 at 3:10 pm

The Impact of Impact

An interesting scholarly article appeared in the journal Studies in Higher Education in February of this year, by Jennifer Chubb and Richard Watermeyer. It investigates some aspects of the research funding system in the UK and Australia.

Give any research academic in Australia today (or the UK, or well, anywhere) a few minutes to vent about their job and you will most likely hear a tirade about grants — whether the writing of research grant applications, the application process, the chances of success, who and what tends to succeed, the pressures universities exert on researchers to obtain them, or any aspect of the related culture.

Well, come to think of it, there might be tirades against many possible things. The university universe is not short on tirades or things to tirade against.

While anyone in the academic world will be very familiar with the standard grievances — and it would take far too long to attempt to make a list — they are grievances usually only aired in private.

What is good about this article is that it uses the medium of a research article to air the views of academics, suitably anonymised, in public. The focus is on a particularly problematic aspect of the process of research funding in Australia and the UK: impact statements.

To quote the article,

In both UK and Australian funding contexts… the perceived merit of a research funding application is now linked to the capacity of the applicant to prescribe convincing (pathways to) research impacts, or more specifically, credible statements of how they will ensure economic and/or societal returns from their research… ‘Impact Statements’ … demand that academics demonstrate an awareness of their external communities and how they will benefit from the proposed research… [and] require that academics demonstrate methodological competency in engaging with their research users, showing how research will be translated and appropriated in ways that most effectively service users’ needs.

On its face, it looks like a good idea: any research asking for public money must make some attempt to justify its effect on society. And that doesn’t just look like a good idea, it is a good idea.

However, most research — including especially most important and worthy research — has zero-to-infinitesimal direct impact on society — or at least very little that can be explained in the few sentences of the word limit to create an “impact”. There are certainly areas that do have direct impact: most medical research; some (but not all) climate research; some renewable energy research; some biotechnology and nanotechnology research, and so on. But of course, the research with the most immediate direct economic or commercial impact is already funded by private capital and does not need public funding. Most research is slower and more uncertain, working methodically towards a long-term scientific or scholarly goal — with occasional surprises and breakthroughs.

But what impact statements, and the associated culture, demand are not accurate stories with all the complexity of scientific understanding, research programmes, educated guesswork and careful methodology that sensible research requires. That would take too long. Boring! We want impact. In a few words. Major impact. High velocity. Boom. That’s what we’re looking for. And that’s just not how research works.

Alas, simply saying that your research makes the world a better place by improving its store of important scientific and scholarly knowledge, and making society better because by supporting this research the society becomes the kind of society that supports this kind of research, is much too subtle for the politics of the situation to allow. Rather, the politics of the situation make the impact statement into a crude sales pitch.

Thus, we have a situation where, in the principal public statements made to support scientific and scholarly research, the predominant, sufficient and principal good reason for the public to support scientific and scholarly research is out of the question — it is inexpressible. It is also, in effectively preventing full justifications from being aired (at least where it counts), a scientific version of the censorship by concision so familiar in mainstream media.

How would, say, Euler have written an impact statement for his research into, say, analysis? The impact of theorems which gradually improve mathematical understanding, over decades and centuries, to the point where they enable breakthroughs in other sciences, engineering, or technology, is impossible to quantify. Even for those parts of Euler’s research which have had major, definite and decisive impact, like Euler’s theorem in number theory, central to RSA encryption, the idea that Euler could have had any inkling of this application, over 200 years later, is laughable. Even in 1940, Hardy’s A Mathematician’s Apology sang the praises of number theory precisely because of its uselessness.

So, justification based on “impact” would have been an impossible task for Euler. And Euler was the most prolific mathematician of all time, and one of the greatest. God help any lesser mortal.

To be fair, pure mathematics is in some sense too easy a case. The very inapplicability of pure mathematics is so clear that any statement about “impact” in this context can only seriously be understood as a source of amusement. A three-year project to think hard and prove some theorems about some interesting and important field of mathematics — but which may have some practical applications, one day, but this is impossible to predict, and in all likelihood not — is so far from the average person’s concept of “impact” that we can only feel that the poor mathematician has been dragged by a faceless bureaucracy into a system designed for someone else, in some other time and place.

Or, perhaps slightly more accurately, and disturbingly, a pure mathematician made to justify their research based on “impact” is a lamb about to be fed to the lions. But thankfully, mathematicians will not be fed to the lions — or at least, not all of them — because the emperor has taken their side. A society without mathematicians produces none of the STEM-literate graduates that the emperor, capital, demands. The survival of the planet, as it turns out, also demands STEM-literate graduates, but as the perilous state of the planet so clearly attests, it is capital, not the planet, which is a much stronger determinant of social outcomes, at least under present social arrangements.

Mathematics aside, the point remains. Requiring 30-second written advertisements called “impact statements” leads to exaggeration, over-speculation, and, at best, twisting of the truth.

But don’t take it from me — it’s much more interesting to requote the senior academics at Australian and UK universities quoted in the article:

It’s virtually impossible to write one of these grants and be fully frank and honest in what it is you’re writing about. (Australia, Professor)

‘illusions’ (UK, Professor); ‘virtually meaningless’, or ‘made up stories’ (Australia, Professor); ‘…taking away from the absolute truth about what should be done’ (UK, Professor). Words such as lying, lies, stories, disguise, hoodwink, game-playing, distorting, fear, distrust, over-engineering, flower-up, bull-dust, disconnected, narrowing and the recurrence of the word ‘problem’

Would I believe it? No, would it help me get the money – yes. (UK, Professor)

I will write my proposals which will have in the middle of them all this work, yeah but on the fringes will tell some untruths about what it might do because that’s the only way it’s going to get funded and you know I’ve got a job to do, and that’s the way I’ve got to do it. It’s a shame isn’t it? (UK, Professor)

If you can find me a single academic who hasn’t had to bullshit or bluff or lie or embellish in order to get grants, then I will find you an academic who is in trouble with his [sic] Head of Department. If you don’t play the game, you don’t do well by your university. So anyone that’s so ethical that they won’t bend the rules in order to play the game is going to be in trouble, which is deplorable. (Australia, Professor)

It’s about survival. It’s not sincere all the way through…that’s when it gets disheartening. It puts people on the back foot and fuels a climate of distrust. (UK, Professor)

It is impossible to predict the outcome of a scientific piece of work, and no matter what framework it is that you want to apply it will be artificial and come out with the wrong answer because if you try to predict things you are on a hiding to nothing. (UK, Professor)

The idea therefore that impact could be factored in in advance was viewed as a dumb question put in there by someone who doesn’t know what research is. I don’t know what you’re supposed to say, something like ‘I’m Columbus, I’m going to discover the West Indies?!’ (Australia, Professor)

It’s disingenuous, no scientist really begins the true process of scientific discovery with the belief it is going to follow this very smooth path to impact because he or she knows full well that that just doesn’t occur and so there’s a real problem with the impact agenda- and that is it’s not true it’s wrong – it flies in the face of scientific practice. (UK, Professor)

It’s really virtually impossible to write an (Australian Research Council) ARC grant now without lying and this is the kind of issue that they should be looking at. (Australia, Professor)

It becomes increasingly difficult – one would be very hard pressed to write a successful grant application that’s fully truthful…you’re going to get phony answers, they’re setting themselves up for lies…[they go on]…it’s absurd to expect every grant proposal to have an impact story. (Australia, Professor)

Trying to force people to tell a causal story is really tight, it’s going to restrict impact to narrow immediate stuff, rather than the big stuff, and force people to be dishonest. (UK, Professor)

They’re just playing games – I mean, I think it’s a whole load of nonsense, you’re looking for short term impact and reward so you’re playing a game…it’s over inflated stuff. (Professor, Australia)


Written by dan

April 29th, 2016 at 12:57 pm

The Lost Art of Integration Impossibility

Integration is hard.

When we learn calculus, we learn to differentiate before we can integrate. This is despite the fact that, arguably, integration is an “easier” concept. To my mind at least, when I am given a curve in the plane, the notion of an area bounded by this curve is a very straightforward, intuitive thing; while the notion of its “gradient” or “slope” at a point is a much more subtle, or at least less intuitive idea.

But whether these ideas are natural or not, one is certainly mathematically and technically more difficult than the other. Integration is much more subtle and difficult.

These difficulties highlight the extent to which integration is less a science and more an art form. And in my experience, those difficulties are seen very rarely in high school or undergraduate mathematics, even as students take course after course about calculus and integration. So it is high time we shed some light on this lost art.

Really existing differentiation

In order to see just how hard integration is, let’s first consider how we learn, and apply, the ideas of differentiation.

When we learn differentiation, we first learn a definition that involves limits and difference quotients — the old chestnut \lim_{h \rightarrow 0} \frac{ f(x+h) - f(x) }{ h } . We pass through a discussion of chords and tangents — perhaps even supplemented with some physical intuition about average and instantaneous velocity. From this we have a “first principles” approach to calculus, using the formula f'(x) = \lim_{h \rightarrow 0} \frac{ f(x+h) - f(x) }{ h } .

This formula, and the whole “first principles” approach, is then promptly forgotten. After we learn the “first principles” of calculus, we then learn a series of rules, techniques and tricks, such as the product rule, quotient rule and chain rule. Using these, combined with a few other “basic” derivatives, most students will never need the “first principles” again.

More specifically, once we know how to differentiate basic functions like polynomials, trig functions and exponentials,

\displaystyle   \text{e.g.} \quad \frac{d}{dx} (x^n) = nx^{n-1}, \quad \frac{d}{dx} e^x = e^x, \quad \frac{d}{dx} \sin x = \cos x

and we know the rules for how to differentiate their products, quotients, and compositions

\displaystyle   \text{e.g.} \quad   \frac{d}{dx} f(x)g(x) = f'(x) g(x) + f(x) g'(x), \quad  \frac{d}{dx} \frac{f(x)}{g(x)} = \frac{ f'(x) g(x) - f(x) g'(x) }{ g(x)^2 }, \quad  \frac{d}{dx} f(g(x)) = f'(g(x)) \; g'(x)

we can forget all about “first principles” and mechanically apply these formulae. With the basics down, and armed with the trident of product, quotient, and chain rules, we can differentiate most functions we’re likely to come up against.

It turns out, then, that in a certain sense, differentiation is “easy”. You don’t need to know the theory so much as a few basic rules and techniques. And although these rules can be a bit technically demanding, you can use them in a fairly straightforward way. In fact, their use is algorithmic. If you’ve got the technique sufficiently down, then you can mechanically differentiate most functions we’re likely to come across.
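To illustrate just how algorithmic this is, here is a toy symbolic differentiator, a sketch invented for this discussion (the tuple-based expression format is made up, not any standard library): one recursive function applies the sum, product and chain rules, which is all a computer, or a well-drilled student, needs.

```python
import math

# Toy expression format (invented for illustration): an expression is
# 'x', a number, or a tuple like ('+', a, b), ('*', a, b), ('sin', a),
# ('cos', a), ('exp', a).

def diff(e):
    """Differentiate e with respect to x, purely by applying rules."""
    if e == 'x':
        return 1
    if isinstance(e, (int, float)):
        return 0
    op = e[0]
    if op == '+':    # sum rule
        return ('+', diff(e[1]), diff(e[2]))
    if op == '*':    # product rule
        return ('+', ('*', diff(e[1]), e[2]), ('*', e[1], diff(e[2])))
    if op == 'sin':  # chain rule: (sin g)' = (cos g) g'
        return ('*', ('cos', e[1]), diff(e[1]))
    if op == 'cos':
        return ('*', ('*', -1, ('sin', e[1])), diff(e[1]))
    if op == 'exp':
        return ('*', ('exp', e[1]), diff(e[1]))
    raise ValueError(f'unknown operator {op!r}')

def evaluate(e, x):
    """Numerically evaluate an expression at the point x."""
    if e == 'x':
        return x
    if isinstance(e, (int, float)):
        return e
    op = e[0]
    if op == '+':
        return evaluate(e[1], x) + evaluate(e[2], x)
    if op == '*':
        return evaluate(e[1], x) * evaluate(e[2], x)
    return getattr(math, op)(evaluate(e[1], x))  # sin, cos, exp

# d/dx [ x * sin(e^x) ]: no cleverness required, the rules do the work.
expr = ('*', 'x', ('sin', ('exp', 'x')))
d = diff(expr)
```

A central finite difference confirms that the rule-produced derivative agrees with the “first principles” limit at a sample point.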

Let’s make this a little more precise. What do we mean by “most functions we’re likely to come across”? What are these functions? We mean the elementary functions. We can define these as follows. We start from some “basic” functions: polynomials, rational functions, trigonometric functions and their inverses, exponential functions and logarithms.

\displaystyle   \text{E.g.} \quad 31 x^4 - 159 x^2 + 65, \quad \frac{ 2x^5 - 3x + 1 }{ x^3 + 8x^2 - 1 }, \quad \sin x, \quad \cos x, \quad \tan x, \quad \arcsin x, \quad \arccos x, \quad \arctan x, \quad a^x, \quad \log_a x.

We then think of all the functions that you can get by repeatedly adding, subtracting, multiplying, dividing, taking n‘th roots (i.e. square roots, cube roots, etc) and composing these functions. These functions are the elementary ones. They include functions like the following:

\displaystyle   \log_2 \left( \frac{ \sqrt[4]{3x^4 - 1} + 2\sin e^x }{ \arcsin (x^{\tan \log_3 x} + \sqrt[7]{ \pi^x - \cos (x^2) } ) - x^x } \right).

(Aside: There’s actually a technicality here. Instead of saying that we can take n‘th roots of a function, we should actually say that we can take any function which is a solution of a polynomial equation whose coefficients are existing functions. The n‘th root of a function f(x) , i.e. \sqrt[n]{f(x)} , is a solution y of the polynomial equation y^n - f(x) = 0 . That is, you can take an algebraic extension of the function field. Having done this, you can find the derivative of the new function using implicit differentiation. But we will not worry too much about these technicalities.)

Actually, the above definition is not really a very efficient one. If you start from just the constant real functions and the function x, then you can build a lot just from them! By repeatedly adding and multiplying xs and constants, you can build any polynomial; and then by dividing polynomials you can build any rational function. If you throw in e^x and \ln x = \log_e x , then you also have all the other exponential and logarithmic functions, because for any (positive real) constant a,

\displaystyle   a^x = e^{x \log_e a}   \quad \text{and} \quad  \log_a x = \frac{ \log_e x }{ \log_e a},

and \log_e a is a constant! If you allow yourself to also use complex number constant functions, then you can build the trig functions out of exponentials,

\displaystyle   \sin x = \frac{ e^{ix} - e^{-ix} }{ 2i }, \quad  \cos x = \frac{ e^{ix} + e^{-ix} }{ 2 },

and then you have \tan x = \frac{\sin x }{ \cos x } . You can also build hyperbolic trigonometric functions if you wish, since \sinh x = \frac{ e^x - e^{-x} }{2} , \cosh x = \frac{ e^x + e^{-x} }{2} , and \tanh x = \frac{ \sinh x }{ \cosh x } .

The formulas above for \sin x and \cos x are relatively well known if you’ve studied complex numbers; a little less well-known are the formulas that allow us to express inverse trigonometric functions in terms of complex numbers, together with logarithms and square roots:

\displaystyle   \arcsin x = - i \; \ln \left( ix + \sqrt{1-x^2} \right), \quad  \arccos x = i \; \ln \left( x - i \sqrt{1-x^2} \right), \quad  \arctan x = \frac{i}{2} \; \left( \ln (1-ix) - \ln (1+ix) \right).

(If you haven’t seen these before, try to prove them! There are also logarithmic functions for inverse hyperbolic trigonometric functions, which are probably slightly more well known as they don’t have complex numbers in them.)
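Short of proving them, here is a numerical spot check of all three formulas using Python’s cmath (one sample point, chosen arbitrarily):

```python
import cmath
import math

x = 0.4  # sample point in (-1, 1)

arcsin_from_log = -1j * cmath.log(1j * x + math.sqrt(1 - x**2))
arccos_from_log = 1j * cmath.log(x - 1j * math.sqrt(1 - x**2))
arctan_from_log = (1j / 2) * (cmath.log(1 - 1j * x) - cmath.log(1 + 1j * x))

# Each complex expression should land (up to rounding) on the real value
# of the corresponding inverse trig function.
assert abs(arcsin_from_log - math.asin(x)) < 1e-12
assert abs(arccos_from_log - math.acos(x)) < 1e-12
assert abs(arctan_from_log - math.atan(x)) < 1e-12
```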

Thus, we can define an elementary function as a function which can be built from the functions \{ \text{complex constants}, x, e^x, \ln x \} using a finite number of additions, subtractions, multiplications, divisions, compositions, and n‘th roots (or really, solving polynomial equations in existing functions but don’t worry about this bit in parentheses).

The point is, that if you are good enough at the product, chain, and quotient rules, you can differentiate any elementary function. You don’t need any more tricks, though you might need to apply the rules very carefully and many times over! A further point is that when you find the answer, you find that the derivative of an elementary function is another elementary function.

Not so elementary, my dear Watson

When we come to integration, though, everything becomes much more difficult. I’m only going to discuss indefinite integration, i.e. antidifferentiation. Definite integration with terminals just ends up giving you a number, but indefinite integration is essentially the inverse problem to differentiation. If we’re asked to find the indefinite integral \int f(x) \; dx , we’re asked to find a function g(x) whose derivative is f(x), i.e. such that g'(x) = f(x). There are many such functions: if you have one such function g(x), then you can add any constant c to it, and the resulting function g(x)+c also has derivative f(x); that is why we tend to write +c at the end of the answer to any indefinite integration question. But it will suffice for us, here, to be able to find one — for the sake of simplicity, I will not write +c in the answers to indefinite integrals. In doing so I lose 1 mark for every integral I solve, but I don’t care!

We start with some basic functions like polynomials, trigonometric functions, exponentials and logarithms; some integrals are standard.

\displaystyle   \int x^n \; dx = \frac{1}{n+1} x^{n+1}, \quad  \int \sin x \; dx = - \cos x, \quad  \int \cos x \; dx = \sin x, \quad  \int e^x \; dx = e^x.

Some are slightly less standard:

\displaystyle   \int \tan x \; dx = - \ln \cos x, \quad  \int \ln x \; dx = x \ln x - x.

(You might complain that the integral of \tan x should actually be \ln | \cos x | . You’d be right, and I am totally sweeping that technicality under the carpet!)

Some inverse trigonometric integrals, perhaps, are less standard again:

\displaystyle   \int \arcsin x \; dx = x \arcsin x + \sqrt{1-x^2}, \quad  \int \arccos x \; dx = x \arccos x - \sqrt{1-x^2}, \quad  \int \arctan x \; dx = x \arctan x - \frac{1}{2} \ln (1+x^2).
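Any of these can be checked by differentiating the right-hand side. Here is a quick numerical check of the arcsin and arctan formulas, comparing a central finite difference of the claimed antiderivative against the integrand:

```python
import math

# Claimed: ∫ arcsin x dx = x·arcsin x + √(1 − x²)
#          ∫ arctan x dx = x·arctan x − (1/2)·ln(1 + x²)
F_arcsin = lambda x: x * math.asin(x) + math.sqrt(1 - x**2)
F_arctan = lambda x: x * math.atan(x) - 0.5 * math.log(1 + x**2)

# Differentiating each right-hand side should recover the integrand.
h, x0 = 1e-6, 0.5
assert abs((F_arcsin(x0 + h) - F_arcsin(x0 - h)) / (2 * h) - math.asin(x0)) < 1e-6
assert abs((F_arctan(x0 + h) - F_arctan(x0 - h)) / (2 * h) - math.atan(x0)) < 1e-6
```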

So far, so good — although perhaps not always obvious! But now, in general, what if we start to combine these functions? The problem is that if you know how to integrate f(x) and you know how to integrate g(x), it does not follow that you know how to integrate their product f(x) g(x) . This is in contrast to differentiation: if you know how to differentiate f(x) and g(x), then you can use the product rule to differentiate f(x)g(x). There is no product rule for integration!

The product rule for differentiation, rather, translates into the integration by parts formula for integration:

\displaystyle   \int f(x) g'(x) \; dx = f(x) g(x) - \int f'(x) g(x) \; dx.

This is not a formula for \int f(x) g(x) \; dx ! A product rule for integration would say to you “if you can integrate both of my factors, you can integrate me!” But this integration by parts formula says something more along the lines of “if you can integrate one of my factors and differentiate the other, then you can express me in terms of the integral obtained by integrating and differentiating those two factors”. That is a much more subtle statement. A product rule would be a hammer you could use to crack integrals; but the integration formula is a much more subtle card up your sleeve.

Essentially, integration by parts supplies you with a trick which, if you are clever enough, and the integral is conducive to it, you can use to rewrite the integral in terms of a different integral which is hopefully easier. Hopefully. While the product rule for differentiation is an all-purpose tool of the trade — a machine used to calculate derivatives — integration by parts is a subtle trick which, when wielded with enough sophistication and skill, can simplify (rather than calculate) integrals.
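For a concrete instance of this card up the sleeve (my example, not one from the discussion above): to integrate x e^x , take f(x) = x and g'(x) = e^x , so that f'(x) = 1 and g(x) = e^x , and the formula gives \int x e^x \; dx = x e^x - \int e^x \; dx = (x - 1) e^x . A numerical sanity check in Python:

```python
import math

# By parts with f(x) = x, g'(x) = e^x:
#   ∫ x e^x dx = x e^x − ∫ 1·e^x dx = (x − 1) e^x
F = lambda x: (x - 1) * math.exp(x)

# Differentiating the result recovers the integrand x e^x.
h, x0 = 1e-6, 0.3
assert abs((F(x0 + h) - F(x0 - h)) / (2 * h) - x0 * math.exp(x0)) < 1e-6

# As a bonus, the definite integral from 0 to 1 comes out to exactly 1.
assert abs((F(1) - F(0)) - 1) < 1e-12
```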

Similarly, there is no chain rule for integration. The chain rule for differentiation translates into the integration by substitution formula for integration:

\displaystyle   \int f'(g(x)) \; g'(x) \; dx = f(g(x)).

A chain rule for integration would say to you “if I am a composition of two functions, and you can integrate both of them, then you can integrate me”. But integration by substitution says, instead, “if I am a composition of two functions, multiplied by the derivative of the inner function, then you can integrate me”. In a certain sense it’s easier than integration by parts, because it calculates the integral and gives you an answer, rather than merely reducing to a different (hopefully simpler) integral. But still, it remains an art form: it requires the skill to see how to regard the integrand as an expression of the form f'(g(x)) \; g'(x) . Finally, there is no quotient rule for integration either.
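For instance (an illustrative example of my own): the integrand 2x \cos(x^2) has exactly the shape f'(g(x)) \; g'(x) , with g(x) = x^2 and f' = \cos , so substitution gives \sin(x^2) at once. The art is in seeing that shape; once seen, the answer is immediate:

```python
import math

# Integrand 2x·cos(x²) = f'(g(x))·g'(x) with g(x) = x², f(u) = sin(u).
integrand = lambda x: 2 * x * math.cos(x**2)
F = lambda x: math.sin(x**2)   # the answer produced by substitution

# Check: differentiating F recovers the integrand at a sample point.
h, x0 = 1e-6, 0.9
assert abs((F(x0 + h) - F(x0 - h)) / (2 * h) - integrand(x0)) < 1e-6
```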

So, while differentiation is a skill which can be learned and applied, integration is an art form for which we learn tricks and strategies, and develop our skills and intuition in applying them. Now, actually there are tables of standard integrals, far far beyond the small examples above. There are theorems about how functions of certain types can be integrated. There are algorithms which can be used to integrate certain, often very complicated, families of functions.

But the question remains: how far can we go? If we see an integral which we can’t immediately solve, do we just need to think a little harder, and apply something from our bag of tricks in a clever new way? Do we just need more skill, or is the integral impossible? How would we tell the difference between a “hard” and an “impossible” integral — and what does that even mean?

In a certain sense, no integrals are “impossible”: the integral of a continuous function always exists. If you’ve got a continuous function f  : \mathbb{R} \rightarrow \mathbb{R}, then its integral is certainly defined as a function, using the definition with Riemann sums — this is a theorem. Even if f is not continuous, it’s possible that the Riemann sum approach can give a well-defined function as the integral. For more exotic functions f, there is the more advanced method of Lebesgue integration.

But this is not what we have in mind when we say an “integral is impossible”. What we really mean is that we can’t write a nice formula for the integral. This would happen if the result were not an elementary function.

As we discussed above, if you take an elementary function and differentiate it, you can always calculate the derivative with a sufficiently careful application of product/chain/quotient rules, and the result is another elementary function.

So, we might ask: given an elementary function, even though there might not be any straightforward way to calculate its integral, is the result always another elementary function?

Indomitable impossible integrals

It turns out, the answer is no. There are elementary functions such that, when you take their integral, it is not an elementary function. When you try to integrate such a function, although the integral exists, you can’t write a nice formula for it. And it’s not because you’re not skillful enough. It’s not because you’re not smart enough. The reason you can’t write a nice formula for the integral is because no such formula exists: the integral is not an elementary function.

What is an example of such a function? The simplest example is one that high school students come up against all the time: the Gaussian function

\displaystyle   e^{-x^2}.

It’s clearly an elementary function, constructed by composition of a polynomial -x^2 and the exponential function. But its integral is not elementary.

You might recall that the graph of y = e^{-x^2} is a bell curve. Suitably dilated (normalised), it is the probability density function for a normal distribution. When you calculate probabilities involving normally distributed random variables, you often integrate this function.

You may recall painful time spent in high school looking up a table to find out probabilities for the normal distribution. That table is essentially a table of (definite) integrals for the function e^{-x^2} (or a closely related function). And the reason that it’s a table you have to look up, rather than a formula, is because there is no formula for the integral \int e^{-x^2} \; dx . You need a table because the integral of the elementary function e^{-x^2} is not elementary.
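Indeed, the integral of e^{-x^2} is important enough to have its own name: up to scaling it is the error function \text{erf} , which Python’s standard library exposes as math.erf. A Simpson’s-rule estimate of the definite integral \int_0^1 e^{-x^2} \; dx agrees with \frac{\sqrt{\pi}}{2} \, \text{erf}(1) , even though neither side has an elementary formula:

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    s += 4 * sum(f(a + i * h) for i in range(1, n, 2))
    s += 2 * sum(f(a + i * h) for i in range(2, n, 2))
    return s * h / 3

# Numerically integrate the Gaussian over [0, 1]...
numeric = simpson(lambda x: math.exp(-x**2), 0, 1)

# ...and compare with the named, non-elementary answer: (√π/2)·erf(1).
exact = math.sqrt(math.pi) / 2 * math.erf(1)
```

This is, in effect, what the high-school table of normal probabilities is: numerical values of an integral that has a name but no elementary formula.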

There’s no formula for normal distribution probabilities because integration is an art form, rather than algorithmic. And so we are sometimes reduced to the quite non-artistic process of looking up a table to find the integral.

Now, when I say that \int e^{-x^2} \; dx is not elementary, I mean that it’s known as a theorem. That is, it has been proved mathematically that \int e^{-x^2} \; dx is not elementary, and so doesn’t have a nice formula. But what could this mean? How could you prove that an integral doesn’t have a nice formula, isn’t an elementary function, can’t be written in a nice way? The proof is a bit complicated, too complicated to recall in complete detail here. But there are some nice ideas involved, and it’s worth recounting some of them here.

Proving the impossible

The fact that \int e^{-x^2} \; dx is not elementary was proved by the French mathematician Joseph Liouville in the mid-19th century. In fact, he proved a good deal more. Suppose you have an elementary function f(x) , and you are trying to find its integral g(x) = \int f(x) \; dx . Now as the integrand f(x) is continuous, the integral g(x) certainly exists as a continuous function; the question is whether g(x) is elementary or not, i.e. whether there is a formula for g(x) involving only complex numbers, powers of x, rational functions, \exp and \log, and n‘th roots (and their generalisations).

Liouville’s theorem, amazingly, tells you that if the function g(x) you’re looking for is elementary, then it must have a very specific form. Very roughly, Liouville says, g(x) can have more logarithms than f(x), but no more exponentials. You can see the germ of this idea in some of the integrals above:

\displaystyle   \int \tan x \; dx = - \ln \cos x, \quad  \int \arctan x \; dx = x \arctan x - \frac{1}{2} \ln (1+x^2).

In these integrals, a new logarithm appears, that did not appear in the integrand. Never does a new exponential appear. If an exponential appears in the integral, then it appeared in the integrand, as in examples like

\displaystyle   \int e^x \; dx = e^x.

To state Liouville’s theorem more precisely, we need the idea of a field of functions. For our purposes, we can think of a field of functions as a collection of functions f(x) which is closed under addition, subtraction, multiplication, and division. The polynomials in x do not form a field of functions, because when you divide two polynomials you do not always get a polynomial! However, the rational functions in x do form a field of functions. A rational function in x is the quotient of two polynomials (with complex coefficients) in x, i.e. a function like

\displaystyle   \frac{ 3x^2 - 7 }{ x^{10} - x^9 + x^3 + 1}   \quad \text{ or } \quad  \frac{ 4x+1}{2x-3}   \quad \text{or} \quad  x^2 - 3x + \pi  \quad \text{or} \quad  3.

The first example is the quotient of a quadratic by a 10’th degree polynomial; the second example is the quotient of two linear polynomials. The third example illustrates the notion that any polynomial is also a rational function, because you can think of it as itself divided by 1, and 1 is a polynomial: x^2 - 3x + \pi = \frac{ x^2 - 3x + \pi }{ 1 } . The final example illustrates the notion that any constant is also a rational function.

The field of rational functions (with complex coefficients) in x is denoted \mathbb{C}(x) . You can make bigger fields of rational functions by including new elements! For instance, you could throw in the exponential function e^x , and then you can obtain the larger field of functions \mathbb{C}(x, e^x) . The functions in this field are those made up of adding, subtracting, multiplying, and dividing powers of x and the function e^x. So this includes functions like

\displaystyle   \frac{ x^2 + x e^x - e^x }{ x + 1 }  \quad \text{ or } \quad  (e^x) \cdot (e^x) = e^{2x}  \quad \text{ or } \quad  x^7 e^{4x} - 3 x^2 e^x + \pi.

Note however that a function like e^{e^x} does not lie in \mathbb{C}(x, e^x) . This function field is made up out of adding, subtracting, multiplying and dividing x and e^x , but not by composing these functions.

We can see, then, that this second function field is bigger than the first one: \mathbb{C}(x) \subset \mathbb{C}(x, e^x) . In technical language, we say that \mathbb{C}(x) \subset \mathbb{C}(x, e^x) is a field extension. Moreover, both these fields have the nice property that they are closed under differentiation. That is, if you take a rational function and differentiate it, you get another rational function. And if you take a function in \mathbb{C}(x, e^x) , involving x‘s and e^x‘s, and differentiate it, you get another function in \mathbb{C}(x, e^x) . In technical language, we say that \mathbb{C}(x) and \mathbb{C}(x, e^x) are differential fields of functions.

A differential field obtained in this way, by starting from rational functions and then throwing in an exponential, is an example of a field of elementary functions. In general, a field of elementary functions is obtained from the field of rational functions \mathbb{C}(x) by successively throwing in extra functions, some finite number of times. Each time you add a function, it must be either

  • an exponential of a function already in the field, or
  • a logarithm of a function already in the field, or
  • an n‘th root of a function already in the field (or more generally the root of a polynomial equation with coefficients in the field but as I keep saying don’t worry too much about this!).

Note that, by definition, any function in a field of elementary functions is made up by adding, subtracting, multiplying, and dividing x‘s, and exponentials, and logarithms, and n‘th roots (or generalisations thereof). That is, a function in a field of elementary functions is an elementary function! So our definitions of “elementary function” and “field of elementary functions” agree — it would be bad if we used the word “elementary” to mean two different things!

We can now state Liouville’s theorem precisely.

Liouville’s theorem: Let K be a field of elementary functions, and let f(x) be a function in K. (Hence f(x) is an elementary function.) If the integral \int f(x) \; dx is elementary, then

\displaystyle   \int f(x) \; dx = h(x) + \sum_{j=1}^n c_j \log g_j (x),

where n is a non-negative integer, each of c_1, c_2, \ldots, c_n is a constant, and the functions h(x), g_1(x), g_2(x), \ldots, g_n(x) all lie in K.
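To see the theorem’s shape in action, here are two integrals that do come out elementary, checked with SymPy (the examples are mine, not from the theorem itself). The first has h = 0 and two logarithmic terms; the second has no logarithms at all, just a function h(x) in the same field as f:

```python
import sympy as sp

x = sp.symbols('x')

# Example 1: f = 1/(x^2 - 1) lies in K = C(x).
# Liouville form: h = 0, with logs of functions g_j in C(x).
F1 = sp.integrate(1 / (x**2 - 1), x)
print(F1)   # a combination of log(x - 1) and log(x + 1)

# Example 2: f = x e^x lies in K = C(x, e^x).
# Liouville form: h(x) = (x - 1) e^x, and no logarithms needed.
F2 = sp.integrate(x * sp.exp(x), x)
print(F2)
```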

That is, Liouville’s theorem says that the integral of an elementary function f(x) must be a sum of a function h(x) that lies in the same field as f, and a constant linear combination of some logarithms of functions g_j(x) in the same field as f. The fact that h(x) and each g_j(x) lie in the same field K as f(x) means that they cannot be much more complicated than f(x): they must be made up by adding, subtracting, multiplying and dividing the same bunch of functions that you can use to define f(x).

So Liouville’s theorem says, in a precise way, that when you integrate an elementary function f(x), if the result is elementary, then it can’t be much more complicated than f(x), and the only way in which it can be more complicated is that it can have some logarithms in it. This is what we meant when we gave the very rough description “Liouville says g(x) can have more logarithms than f(x), but no more exponentials“.

Let’s now return to our specific example of the Gaussian function f(x) = e^{-x^2}. What does Liouville’s theorem mean for this function? Well, this function lies in the field of elementary functions K where we start from rational functions and then throw in, not e^x, but e^{-x^2}. That is, we can take K = \mathbb{C}(x, e^{-x^2}).

The theorem says that if the integral

\displaystyle   \int e^{-x^2} \; dx

is elementary, then it is given by

\displaystyle   \int e^{-x^2} \; dx = h(x) + \sum_{j=1}^n c_j \log g_j (x),

where n is a non-negative integer, each of c_1, c_2, \ldots, c_n is a constant, and the functions h(x), g_1(x), g_2(x), \ldots, g_n(x) all lie in \mathbb{C}(x, e^{-x^2}). That is, h(x), g_1(x), \ldots, g_n(x) are “no more complicated” than e^{-x^2}; they are all made by adding, subtracting, multiplying and dividing x‘s and e^{-x^2}‘s.

If we differentiate the above equation, we obtain

\displaystyle   e^{-x^2} = h'(x) + \sum_{j=1}^n c_j \frac{ g'_j (x) }{ g_j (x) }.

On the left hand side is the function we started with, e^{-x^2}. On the right hand side is an expression involving several functions. However, all the functions g_j(x) and h(x) lie in K; they are “no more complicated” than e^{-x^2}. Now, as K is a differential field, their derivatives g'_j(x) and h'(x) also lie in K; they too are “no more complicated”. So in fact the right hand side is an expression involving functions no more complicated than e^{-x^2}: they are all just rational functions, with e^{-x^2}‘s thrown in. And if you think about what you will get for each g'_j(x) / g_j (x), you might find it hard to avoid having a big denominator, one which you won’t be able to cancel. So you might find that no choice of the g_j(x) can make this equality work, and that all the logarithmic terms have to vanish; in other words, e^{-x^2} = h'(x). But h(x) is, like everything else here, made up by adding, subtracting, multiplying and dividing x‘s and e^{-x^2}‘s. You might find, when you differentiate such a function, that it’s very hard to get a lone e^{-x^2}. Every time you differentiate an e^{-x^2} you get a -2xe^{-x^2}, which has a pesky extra factor of -2x. And even if it appears together with other terms, as something like x^3 e^{-x^2}, when you differentiate it you get something like -2x^4 e^{-x^2} + 3x^2 e^{-x^2}, which still has no isolated e^{-x^2} term. And so, in conclusion, you might find it very difficult to find any functions that make the right hand side equal to e^{-x^2}.
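The derivative computations in that paragraph are easy to verify symbolically. A quick SymPy check of both claims:

```python
import sympy as sp

x = sp.symbols('x')

# Differentiating e^{-x^2} produces the pesky extra factor -2x.
g = sp.exp(-x**2)
print(sp.diff(g, x))             # -2*x*exp(-x**2)

# Differentiating x^3 e^{-x^2} likewise leaves no lone e^{-x^2} term:
# every term carries a power of x.
h = x**3 * sp.exp(-x**2)
print(sp.expand(sp.diff(h, x)))
```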

Of course, this is not a proof at all; it’s a mere plausibility argument. To prove the integral is not elementary does take a bit more work. But it has been done, and can be found in standard references.
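Indeed, computer algebra systems know this fact. If you ask SymPy for the antiderivative, it answers not with an elementary formula but in terms of the error function erf, a new function defined precisely as (a scaled version of) this integral:

```python
import sympy as sp

x = sp.symbols('x')

# The antiderivative of the Gaussian is not elementary;
# SymPy expresses it via the error function erf.
F = sp.integrate(sp.exp(-x**2), x)
print(F)   # sqrt(pi)*erf(x)/2
```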

Hopefully, though, this should at least give you some idea why it might be true, and how you might prove, that an integral is “impossible”, and can’t be written with any nice formula.

Mathematics is an amazing place.


Brian Conrad, Impossibility theorems for elementary integration.

Keith O. Geddes, Stephen R. Czapor, George Labahn, Algorithms for Computer Algebra, Kluwer (1992).

Andy R. Magid, Lectures on Differential Galois Theory, AMS (1994).

(Update 2/3/15: Typo fixed.)


Written by dan

March 1st, 2015 at 6:18 am