Archive for the ‘maths’ Category

Holy-principle, Batman!

(With apologies and tribute to the late Adam West)



Here’s a situation known to any beginning skier.

You are at the top of a mountain slope. You want to go down to the bottom of the slope. But you are on skis, and you are not very good at skiing.

Despite your lack of skill, you have been deposited by a ski lift at the top of the slope. Slightly terrified, you know that the only honourable way out of your predicament is down — down the slope, on your skis.

Pointing your skis down the slope, with a rush of exhilaration you find yourself accelerating downwards. Unfortunately, you know from bitter experience that the hardest thing about learning to ski is learning to control your speed.

If you are bold or stupid, your incompetent attempt to conquer the mountain likely ends in a spectacular crash. If you are cowardly and have mastered the snowplough, your plodding descent ends with a whimper and worn out thighs from holding the skis in an awkward triangle all the way down.

But you know that the more your skis point down the mountain, the faster you go. And the more your skis point across the slope, the slower you go.

When your skis point down the slope, they are pointing steeply downwards; they are pointing in quite a steep direction. But when your skis point across the slope, they are not pointing very downwards at all; they are pointing in quite a flat direction.

As an incompetent skier, the best way to get down the slope without injury and without embarrassment is to take a path which criss-crosses the slope as much as possible. You want your skis to point in a flat direction as much as possible, and in a steep direction as little as possible.

The problem is that each time you change direction, you temporarily point downwards, and risk runaway acceleration.

Is it possible to get down the mountain while always pointing in a flat direction?

Of course the answer is no. But there is an area of mathematics which says that the answer is yes, almost, more or less.

This, very roughly, is the beginning of the idea of Gromov’s homotopy principle — often abbreviated to h-principle.


* * *


The ideas of the h-principle were developed by Mikhail Gromov in his work in the 1970s, including work with my PhD advisor Yakov Eliashberg. The term “h-principle” first appeared in, and was systematically elaborated in, Gromov’s (notoriously difficult) book Partial differential relations. Gromov, who grew up in Soviet Russia, was not a friend of the authorities, and in 1970, despite being invited to speak at the International Congress of Mathematicians in France, he was prohibited from leaving the USSR. He later made his “gromomorphism” to the US, and now works at IHES just near Paris.

The ideas of the h-principle are about a “soft” kind of geometry. If you are prepared to treat your geometry like playdough, and morph various things around, within certain constraints, then the h-principle tells you how to morph your playdough-geometry to get the nice kind of geometry you want. Technically, morphing playdough has a fancy name: it’s called homotopy.

An example of the h-principle (or, more precisely, of “holonomic approximation”) in the skiing context would be as follows.

Your ideal skiing path down the slope would be to go straight down, but always have your skis pointing flat. That is, your skis should always point horizontally, but you should go straight down the mountain. That is the ideal.

That, of course, is a ridiculous ideal path. But it’s an ideal path nonetheless. You want to go straight down the mountain, and the ideal path does this; and you want to have your skis pointing safely horizontally, and the ideal path does this too. This is the ideal of the incompetent skier: go down the mountain as directly as possible, with your skis always being completely flat.

Now the ideal path cannot be achieved in practice. In practice, if your skis are pointing horizontally, you go horizontally. (We ignore skidding for the purposes of our mathematical idealisation.) In practice, you go in the direction your skis are pointing.

The “ideal path” is a path down the mountain, which also tells you which way your skis should point at each stage (i.e. horizontally), but which doesn’t satisfy the practical principle that you should go in the direction you’re pointing. As such, it’s a generalisation of the real sort of path you could actually take down the mountain. (The technical name for this type of path is a “section of a jet bundle”.)

A realistic path down the mountain — one where you go in the direction your skis are pointing — is also known as a holonomic path.

One of the first results towards the h-principle says that if you are prepared to make a few tiny tiny adjustments to your path, then you can take an actual, holonomic path down the mountain, where you are very very close to the ideal path — both in terms of where you go, and in the direction your skis point. You stay very very close — in fact, arbitrarily close — to the path straight down the mountain. And your skis stay very very close — again, arbitrarily close — to horizontal at every instant on the way down.

How is this possible? Well, you have to make some adjustments.

First, you make some adjustments to the path. You might have to make a wiggly path, rather than going in a straight line. Actually, it will have to be very wiggly — arbitrarily wiggly.

And, second, you’ll have to make some adjustments to the mountain too. You’ll have to adjust the shape of the mountain slope — but only by a very very small, arbitrarily small amount.

Well, perhaps these types of alterations are rather drastic. But without moving the mountain, you won’t be able to go down the mountain and stay very close to horizontal. You must alter the ski slope, and you must alter your path. But these movements are very very small, and you can make them as small as you like.

How do you alter the mountain? Roughly, you make tiny ripples in the slope — roughly, you turn it into a terraced slope, just like the terraces used for rice farming in Vietnam, or for growing crops in the Andes.


Terraced farmland in Peru. By Alexson Scheppa Peisino(AlexSP).

As you go along a terrace, you remain horizontal! We don’t want our terraces to be completely horizontal though — we want them to have a very gentle downwards slope, so that we can stay very close to horizontal, and yet eventually get to the bottom of the mountain.

And also, we’ll need to be able to go smoothly from one terrace down to the next, so each terrace should turn into the next. So perhaps it’s more like Lombard Street, the famously winding street in San Francisco. (Which is not, however, the most crooked street in the US — that’s Wall Street, of course. Got you there.)


Lombard Street. By Gaurav1146

Perhaps a more accurate depiction of what we want is the figure below, from Eliashberg and Mishachev’s book Introduction to the h-principle. We want very fine, very gently sloping terraces, and we want them extremely small, so that the mountain is altered by a tiny tiny amount. And to go down the slope we need to take a very windy path — with many many wiggles in it. Going down the slope is almost like navigating a maze — although it’s a very simple, repetitive maze.

A modified, lightly terraced, very windy, ski slope.

Thus, with an astronomical number of microscopic terraces on the mountain, each nanoscopically sloped downwards, and an astronomical number of microscopic wiggles in your path down the terraces, you can go down the mountain, staying very close to your idealised path. You go very very close to straight down the mountain, and your skis stay very very close to horizontal all the way down.

And then, you’ve done it.

This is the flavour of the h-principle. More precisely this result is called holonomic approximation. Holonomic approximation says that even an incompetent skier can get down a mountain with an arbitrarily small amount of trouble — provided that they can do an arbitrarily large amount of work in advance to terrace the mountain and prepare themselves an arbitrarily flat arbitrarily wiggly path.


* * *


The h-principle has applications beyond idealised incompetent skiing down microscopically terraced mountains. Two of the most spectacular applications are sphere eversion, and isometric embedding. In fact they both preceded the h-principle — and Gromov’s attempt to understand and formalise them directly inspired the development of the h-principle.

Sphere eversion is a statement about spheres in three-dimensional space. Take a standard unit sphere in 3-dimensional space, but again regard it as made of playdough, and we will consider morphing (erm, homotoping) it in space. We allow the sphere to pass through itself, but never to crease, kink or rip. Although the sphere can intersect itself, each point of the sphere must remain smooth enough to have a tangent plane. (The technical name for this is an immersion of the sphere into 3-dimensional space.)

Smale’s sphere eversion says that it’s possible to turn the sphere inside out by these rules — that is, by a homotopy through immersions. This amazing theorem is all the more amazing because Smale’s original 1957 proof was an existence proof: he proved that there existed a way to turn the sphere inside out, but did not say how to do it! Many explicit descriptions have now been given for sphere eversions, and there are many excellent videos about it, including Outside In made in the 1990s. My colleague Burkard Polster, aka the Mathologer, also has an excellent video about it.

Smale has an interesting and admirable biography. He was actively involved in the anti-Vietnam War movement, even to the extent of being subpoenaed by the House Un-American Activities Committee. His understanding of the relationship between creativity, leisure and mathematical research was epitomised in his statement that his best work was done “on the beaches of Rio”. (He discovered his horseshoe map on the beach at Leme.)

Isometric embedding is more abstract, but even more amazing; see my article in The Conversation for another description. It is a theorem about abstract spaces. For instance, you could take a surface — but then put on it an abstractly-defined metric, unrelated to how it sits in space. Isometric embedding attempts to map the surface back into 3-dimensional space in a way that preserves distances, so that the abstract metric on the surface corresponds to the familiar notion of distance we know in three dimensions.

Isometric embedding is largely associated with John Nash, who passed away a couple of years ago and is better known for his work on game theory, and from the book and movie A Beautiful Mind. The proof is incredible. Gromov describes how he came to terms with this proof in some recollections. He originally found Nash’s proof “as convincing as lifting oneself by the hair”, but after eventually coming to understand it, he found that Nash’s proof “miraculously, did lift you in the air by the hair”!

The Nash-Kuiper theorem says that if you can map your abstract surface into 3-dimensional space in such a way that it decreases distances, then you can morph it — homotope it — to make it preserve distances. (Actually, it need not be a surface but a space of any dimension; and it need not be 3-dimensional space, but space of any dimension.) And, just like on the ski slope, this alteration of the surface in 3-dimensional space can be made very very small — arbitrarily small.

The h-principle is another mathematical superpower, and it comes up in many places where geometry is “soft”, and you can slightly “morph” or “adjust” your geometrical situation to find the situation you want.


Written by dan

June 12th, 2017 at 3:10 pm

The Impact of Impact

An interesting scholarly article appeared in the journal Studies in Higher Education in February of this year, by Jennifer Chubb and Richard Watermeyer. It investigates some aspects of the research funding system in the UK and Australia.

Give any research academic in Australia today (or the UK, or well, anywhere) a few minutes to vent about their job and you will most likely hear a tirade about grants — whether the writing of research grant applications, the application process, the chances of success, who and what tends to succeed, the pressures universities exert on researchers to obtain them, or any aspect of the related culture.

Well, come to think of it, there might be tirades against many possible things. The university universe is not short on tirades or things to tirade against.

While anyone in the academic world will be very familiar with the standard grievances — and it would take far too long to attempt to make a list — they are grievances usually only aired in private.

What is good about this article is that it uses the medium of a research article to air the views of academics, suitably anonymised, in public. The focus is on a particularly problematic aspect of the process of research funding in Australia and the UK: impact statements.

To quote the article,

In both UK and Australian funding contexts… the perceived merit of a research funding application is now linked to the capacity of the applicant to prescribe convincing (pathways to) research impacts, or more specifically, credible statements of how they will ensure economic and/or societal returns from their research… ‘Impact Statements’ … demand that academics demonstrate an awareness of their external communities and how they will benefit from the proposed research… [and] require that academics demonstrate methodological competency in engaging with their research users, showing how research will be translated and appropriated in ways that most effectively service users’ needs.

On its face, it looks like a good idea: any research asking for public money must make some attempt to justify its effect on society. And that doesn’t just look like a good idea, it is a good idea.

However, most research — including especially most important and worthy research — has zero-to-infinitesimal direct impact on society — or at least very little that can be explained in the few sentences of the word limit to create an “impact”. There are certainly areas that do have direct impact: most medical research; some (but not all) climate research; some renewable energy research; some biotechnology and nanotechnology research, and so on. But of course, the research with the most immediate direct economic or commercial impact is already funded by private capital and does not need public funding. Most research is slower and more uncertain, working methodically towards a long-term scientific or scholarly goal — with occasional surprises and breakthroughs.

But what impact statements, and the associated culture, demand are not accurate stories with all the complexity of scientific understanding, research programmes, educated guesswork and careful methodology that sensible research requires. That would take too long. Boring! We want impact. In a few words. Major impact. High velocity. Boom. That’s what we’re looking for. And that’s just not how research works.

Alas, simply saying that your research makes the world a better place by improving its store of important scientific and scholarly knowledge, and making society better because by supporting this research the society becomes the kind of society that supports this kind of research, is much too subtle for the politics of the situation to allow. Rather, the politics of the situation make the impact statement into a crude sales pitch.

Thus, we have a situation where, in the principal public statements made to support scientific and scholarly research, the predominant, sufficient and principal good reason for the public to support scientific and scholarly research is out of the question — it is inexpressible. It is also, in effectively preventing full justifications from being aired (at least where it counts), a scientific version of the censorship by concision so familiar in mainstream media.

How would, say, Euler have written an impact statement for his research into, say, analysis? The impact of theorems which gradually improve mathematical understanding, over decades and centuries, to the point where they enable breakthroughs in other sciences, engineering, or technology, is impossible to quantify. Even for those parts of Euler’s research which have had major, definite and decisive impact, like Euler’s theorem in number theory central to RSA encryption, the idea that Euler could have had any inkling of this application, over 200 years later, is laughable. Even in 1940, Hardy’s A Mathematician’s Apology sang the praises of number theory precisely because of its uselessness.

So, justification based on “impact” would have been an impossible task for Euler. And Euler is the most prolific mathematician of all time, one of the greatest mathematicians of all time. God help any lesser mortal.

To be fair, pure mathematics is in some sense too easy a case. The very inapplicability of pure mathematics is so clear that any statement about “impact” in this context can only seriously be understood as a source of amusement. A three-year project to think hard and prove some theorems about some interesting and important field of mathematics — but which may have some practical applications, one day, but this is impossible to predict, and in all likelihood not — is so far from the average person’s concept of “impact” that we can only feel that the poor mathematician has been dragged by a faceless bureaucracy into a system designed for someone else, in some other time and place.

Or, perhaps slightly more accurately, and disturbingly, a pure mathematician made to justify their research based on “impact” is a lamb about to be fed to the lions. But thankfully, mathematicians will not be fed to the lions — or at least, not all of them — because the emperor has taken their side. A society without mathematicians produces none of the STEM-literate graduates that the emperor, capital, demands. The survival of the planet, as it turns out, also demands STEM-literate graduates, but as the perilous state of the planet so clearly attests, it is capital, not the planet, which is a much stronger determinant of social outcomes, at least under present social arrangements.

Mathematics aside, the point remains. Requiring 30-second written advertisements called “impact statements” leads to exaggeration, over-speculation, and, at best, twisting of the truth.

But don’t take it from me — it’s much more interesting to requote the senior academics at Australian/UK universities quoted in the article:

It’s virtually impossible to write one of these grants and be fully frank and honest in what it is you’re writing about. (Australia, Professor)

‘illusions’ (UK, Professor); ‘virtually meaningless’, or ‘made up stories’ (Australia, Professor); ‘…taking away from the absolute truth about what should be done’ (UK, Professor). Words such as lying, lies, stories, disguise, hoodwink, game-playing, distorting, fear, distrust, over-engineering, flower-up, bull-dust, disconnected, narrowing, and the recurrence of the word ‘problem’…

Would I believe it? No, would it help me get the money – yes. (UK, Professor)

I will write my proposals which will have in the middle of them all this work, yeah but on the fringes will tell some untruths about what it might do because that’s the only way it’s going to get funded and you know I’ve got a job to do, and that’s the way I’ve got to do it. It’s a shame isn’t it? (UK, Professor)

If you can find me a single academic who hasn’t had to bullshit or bluff or lie or embellish in order to get grants, then I will find you an academic who is in trouble with his [sic] Head of Department. If you don’t play the game, you don’t do well by your university. So anyone that’s so ethical that they won’t bend the rules in order to play the game is going to be in trouble, which is deplorable. (Australia, Professor)

It’s about survival. It’s not sincere all the way through…that’s when it gets disheartening. It puts people on the back foot and fuels a climate of distrust. (UK, Professor)

It is impossible to predict the outcome of a scientific piece of work, and no matter what framework it is that you want to apply it will be artificial and come out with the wrong answer because if you try to predict things you are on a hiding to nothing. (UK, Professor)

The idea therefore that impact could be factored in in advance was viewed as a dumb question put in there by someone who doesn’t know what research is. I don’t know what you’re supposed to say, something like ‘I’m Columbus, I’m going to discover the West Indies?!’ (Australia, Professor)

It’s disingenuous, no scientist really begins the true process of scientific discovery with the belief it is going to follow this very smooth path to impact because he or she knows full well that that just doesn’t occur and so there’s a real problem with the impact agenda- and that is it’s not true it’s wrong – it flies in the face of scientific practice. (UK, Professor)

It’s really virtually impossible to write an (Australian Research Council) ARC grant now without lying and this is the kind of issue that they should be looking at. (Australia, Professor)

It becomes increasingly difficult – one would be very hard pressed to write a successful grant application that’s fully truthful…you’re going to get phony answers, they’re setting themselves up for lies…[they go on]…it’s absurd to expect every grant proposal to have an impact story. (Australia, Professor)

Trying to force people to tell a causal story is really tight, it’s going to restrict impact to narrow immediate stuff, rather than the big stuff, and force people to be dishonest. (UK, Professor)

They’re just playing games – I mean, I think it’s a whole load of nonsense, you’re looking for short term impact and reward so you’re playing a game…it’s over inflated stuff. (Professor, Australia)

Written by dan

April 29th, 2016 at 12:57 pm

The Lost Art of Integration Impossibility

Integration is hard.

When we learn calculus, we learn to differentiate before we can integrate. This is despite the fact that, arguably, integration is an “easier” concept. To my mind at least, when I am given a curve in the plane, the notion of an area bounded by this curve is a very straightforward, intuitive thing; while the notion of its “gradient” or “slope” at a point is a much more subtle, or at least less intuitive idea.

But whether these ideas are natural or not, one is certainly mathematically and technically more difficult than the other. Integration is much more subtle and difficult.

These difficulties highlight the extent to which integration is less a science and more an art form. And in my experience, those difficulties are seen very rarely in high school or undergraduate mathematics, even as students take course after course about calculus and integration. So it is high time we shed some light on this lost art.

Really existing differentiation

In order to see just how hard integration is, let’s first consider how we learn, and apply, the ideas of differentiation.

When we learn differentiation, we first learn a definition that involves limits and difference quotients — the old chestnut \lim_{h \rightarrow 0} \frac{ f(x+h) - f(x) }{ h } . We pass through a discussion of chords and tangents — perhaps even supplemented with some physical intuition about average and instantaneous velocity. From this we have a “first principles” approach to calculus, using the formula f'(x) = \lim_{h \rightarrow 0} \frac{ f(x+h) - f(x) }{ h } .

This formula, and the whole “first principles” approach, is then promptly forgotten. After we learn the “first principles” of calculus, we then learn a series of rules, techniques and tricks, such as the product rule, quotient rule and chain rule. Using these, combined with a few other “basic” derivatives, most students will never need the “first principles” again.
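To see the “first principles” formula in action before we forget it, here is a quick numerical sketch in Python (my own illustration — the function and numbers are arbitrary): the difference quotient approaches the true derivative as h shrinks.

```python
# First-principles derivative: approximate f'(x) by the difference
# quotient (f(x + h) - f(x)) / h for smaller and smaller h.
def difference_quotient(f, x, h):
    return (f(x + h) - f(x)) / h

f = lambda t: t ** 2  # f(x) = x^2, so f'(x) = 2x

for h in [0.1, 0.01, 0.001]:
    print(h, difference_quotient(f, 3.0, h))
# The quotients are approximately 6.1, 6.01, 6.001, ...
# approaching f'(3) = 6 as h shrinks.
```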

More specifically, once we know how to differentiate basic functions like polynomials, trig functions and exponentials,

\displaystyle   \text{e.g.} \quad \frac{d}{dx} (x^n) = nx^{n-1}, \quad \frac{d}{dx} e^x = e^x, \quad \frac{d}{dx} \sin x = \cos x

and we know the rules for how to differentiate their products, quotients, and compositions

\displaystyle   \text{e.g.} \quad   \frac{d}{dx} f(x)g(x) = f'(x) g(x) + f(x) g'(x), \quad  \frac{d}{dx} \frac{f(x)}{g(x)} = \frac{ f'(x) g(x) - f(x) g'(x) }{ g(x)^2 }, \quad  \frac{d}{dx} f(g(x)) = f'(g(x)) \; g'(x)

we can forget all about “first principles” and mechanically apply these formulae. With some basics down, and armed with the trident of product, quotient, and chain rules, we can differentiate most functions we’re likely to come up against.

It turns out, then, that in a certain sense, differentiation is “easy”. You don’t need to know the theory so much as a few basic rules and techniques. And although these rules can be a bit technically demanding, you can use them in a fairly straightforward way. In fact, their use is algorithmic. If you’ve got the technique sufficiently down, then you can mechanically differentiate most functions we’re likely to come across.

Let’s make this a little more precise. What do we mean by “most functions we’re likely to come across”? What are these functions? We mean the elementary functions. We can define these as follows. We start from some “basic” functions: polynomials, rational functions, trigonometric functions and their inverses, exponential functions and logarithms.

\displaystyle   \text{E.g.} \quad 31 x^4 - 159 x^2 + 65, \quad \frac{ 2x^5 - 3x + 1 }{ x^3 + 8x^2 - 1 }, \quad \sin x, \quad \cos x, \quad \tan x, \quad \arcsin x, \quad \arccos x, \quad \arctan x, \quad a^x, \quad \log_a x.

We then think of all the functions that you can get by repeatedly adding, subtracting, multiplying, dividing, taking n-th roots (i.e. square roots, cube roots, etc.) and composing these functions. These functions are the elementary ones. They include functions like the following:

\displaystyle   \log_2 \left( \frac{ \sqrt[4]{3x^4 - 1} + 2\sin e^x }{ \arcsin (x^{\tan \log_3 x} + \sqrt[7]{ \pi^x - \cos (x^2) } ) - x^x } \right).

(Aside: There’s actually a technicality here. Instead of saying that we can take n-th roots of a function, we should actually say that we can take any function which is a solution of a polynomial equation in existing functions. The n-th root of a function f(x) , i.e. g(x) = \sqrt[n]{f(x)} , is a solution of the polynomial equation g(x)^n - f(x) = 0 . That is, you can take an algebraic extension of the function field. Having done this, you can find the derivative of the new function using implicit differentiation. But we will not worry too much about these technicalities.)

Actually, the above definition is not really a very efficient one. If you start from just the constant real functions and the function x, then you can build a lot just from them! By repeatedly adding and multiplying xs and constants, you can build any polynomial; and then by dividing polynomials you can build any rational function. If you throw in e^x and \ln x = \log_e x , then you also have all the other exponential and logarithmic functions, because for any (positive real) constant a,

\displaystyle   a^x = e^{x \log_e a}   \quad \text{and} \quad  \log_a x = \frac{ \log_e x }{ \log_e a},

and \log_e a is a constant! If you allow yourself to also use complex number constant functions, then you can build the trig functions out of exponentials,

\displaystyle   \sin x = \frac{ e^{ix} - e^{-ix} }{ 2i }, \quad  \cos x = \frac{ e^{ix} + e^{-ix} }{ 2 },

and then you have \tan x = \frac{\sin x }{ \cos x } . You can also build hyperbolic trigonometric functions if you wish, since \sinh x = \frac{ e^x - e^{-x} }{2} , \cosh x = \frac{ e^x + e^{-x} }{2} , and \tanh x = \frac{ \sinh x }{ \cosh x } .
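These identities are easy to check numerically. Here is a small Python sketch (my own illustration, with arbitrary test values) verifying that a^x = e^{x \log_e a} , that \log_a x = \frac{\log_e x}{\log_e a} , and that the complex-exponential formulas really do recover \sin x and \cos x :

```python
import cmath
import math

a, x = 5.0, 0.7  # arbitrary test values

# a^x = e^{x ln a}, and log_a x = ln x / ln a
assert math.isclose(a ** x, math.exp(x * math.log(a)))
assert math.isclose(math.log(x, a), math.log(x) / math.log(a))

# sin and cos rebuilt from complex exponentials
sin_x = (cmath.exp(1j * x) - cmath.exp(-1j * x)) / (2 * 1j)
cos_x = (cmath.exp(1j * x) + cmath.exp(-1j * x)) / 2

assert math.isclose(sin_x.real, math.sin(x))
assert math.isclose(cos_x.real, math.cos(x))
```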

The formulas above for \sin x and \cos x are relatively well known if you’ve studied complex numbers; a little less well known are the formulas that allow us to express inverse trigonometric functions in terms of complex numbers, together with logarithms and square roots:

\displaystyle   \arcsin x = - i \; \ln \left( ix + \sqrt{1-x^2} \right), \quad  \arccos x = i \; \ln \left( x - i \sqrt{1-x^2} \right), \quad  \arctan x = \frac{i}{2} \; \left( \ln (1-ix) - \ln (1+ix) \right).

(If you haven’t seen these before, try to prove them! There are also logarithmic functions for inverse hyperbolic trigonometric functions, which are probably slightly more well known as they don’t have complex numbers in them.)
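If you’d rather check than prove, here is a quick numerical sanity check in Python (the test value x = 0.6 is arbitrary; the real parts of the complex expressions should match the familiar inverse trig functions):

```python
import cmath
import math

x = 0.6  # arbitrary point in (-1, 1)

arcsin_f = -1j * cmath.log(1j * x + cmath.sqrt(1 - x * x))
arccos_f = 1j * cmath.log(x - 1j * cmath.sqrt(1 - x * x))
arctan_f = (1j / 2) * (cmath.log(1 - 1j * x) - cmath.log(1 + 1j * x))

assert math.isclose(arcsin_f.real, math.asin(x))
assert math.isclose(arccos_f.real, math.acos(x))
assert math.isclose(arctan_f.real, math.atan(x))
```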

Thus, we can define an elementary function as a function which can be built from the functions \{ \text{complex constants}, x, e^x, \ln x \} using a finite number of additions, subtractions, multiplications, divisions, compositions, and n-th roots (or really, solving polynomial equations in existing functions — but don’t worry about this bit in parentheses).

The point is, that if you are good enough at the product, chain, and quotient rules, you can differentiate any elementary function. You don’t need any more tricks, though you might need to apply the rules very carefully and many times over! A further point is that when you find the answer, you find that the derivative of an elementary function is another elementary function.

Not so elementary, my dear Watson

When we come to integration, though, everything becomes much more difficult. I’m only going to discuss indefinite integration, i.e. antidifferentiation. Definite integration with terminals just ends up giving you a number, but indefinite integration is essentially the inverse problem to differentiation. If we’re asked to find the indefinite integral \int f(x) \; dx , we’re asked to find a function g(x) whose derivative is f(x), i.e. such that g'(x) = f(x). There are many such functions: if you have one such function g(x), then you can add any constant c to it, and the resulting function g(x)+c also has derivative f(x); that is why we tend to write +c at the end of the answer to any indefinite integration question. But it will suffice for us, here, to be able to find one — for the sake of simplicity, I will not write +c in the answers to indefinite integrals. In doing so I lose 1 mark for every integral I solve, but I don’t care!

We start with some basic functions like polynomials and trigonometric functions, exponentials and logarithms. Some integrals are standard:

\displaystyle   \int x^n \; dx = \frac{1}{n+1} x^{n+1} \quad (n \neq -1), \quad  \int \sin x \; dx = - \cos x, \quad  \int \cos x \; dx = \sin x, \quad  \int e^x \; dx = e^x.

Some are slightly less standard:

\displaystyle   \int \tan x \; dx = - \ln \cos x, \quad  \int \ln x \; dx = x \ln x - x.

(You might complain that the integral of \tan x should actually be \ln | \cos x | . You’d be right, and I am totally sweeping that technicality under the carpet!)

Some inverse trigonometric integrals, perhaps, are less standard again:

\displaystyle   \int \arcsin x \; dx = x \arcsin x + \sqrt{1-x^2}, \quad  \int \arccos x \; dx = x \arccos x - \sqrt{1-x^2}, \quad  \int \arctan x \; dx = x \arctan x - \frac{1}{2} \ln (1+x^2).
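One way to check any claimed antiderivative is to differentiate it — numerically, if need be. A Python sketch (my own illustration, using the arcsin integral above and an arbitrary test point):

```python
import math

# The claimed antiderivative of arcsin x:
F = lambda x: x * math.asin(x) + math.sqrt(1 - x * x)

# Differentiate F numerically with a central difference; the result
# should match arcsin x.
x, h = 0.3, 1e-6
central_diff = (F(x + h) - F(x - h)) / (2 * h)
assert math.isclose(central_diff, math.asin(x), rel_tol=1e-6)
```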

So far, so good — although perhaps not always obvious! But now, in general, what if we start to combine these functions? The problem is that if you know how to integrate f(x) and you know how to integrate g(x), it does not follow that you know how to integrate their product f(x) g(x) . This is in contrast to differentiation: if you know how to differentiate f(x) and g(x), then you can use the product rule to differentiate f(x)g(x). There is no product rule for integration!

The product rule for differentiation, rather, translates into the integration by parts formula for integration:

\displaystyle   \int f(x) g'(x) \; dx = f(x) g(x) - \int f'(x) g(x) \; dx.

This is not a formula for \int f(x) g(x) \; dx ! A product rule for integration would say to you “if you can integrate both of my factors, you can integrate me!” But this integration by parts formula says something more along the lines of “if you can integrate one of my factors and differentiate the other, then you can express me in terms of the integral obtained by integrating and differentiating those two factors”. That is a much more subtle statement. A product rule would be a hammer you could use to crack integrals; but the integration formula is a much more subtle card up your sleeve.

Essentially, integration by parts supplies you with a trick which, if you are clever enough, and the integral is conducive to it, you can use to rewrite the integral in terms of a different integral which is hopefully easier. Hopefully. While the product rule for differentiation is an all-purpose tool of the trade — a machine used to calculate derivatives — integration by parts is a subtle trick which, when wielded with enough sophistication and skill, can simplify (rather than calculate) integrals.
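As a small illustration, consider \int x e^x \; dx : taking f(x) = x (to be differentiated) and g'(x) = e^x (to be integrated), the formula trades the original integral for the much easier \int e^x \; dx . A sketch in Python with SymPy (the example and tool are my own choices):

```python
import sympy as sp

x = sp.symbols('x')

# integrate x*e^x by parts: differentiate f(x) = x, integrate g'(x) = e^x
f = x
g = sp.exp(x)                    # an antiderivative of g'(x) = e^x
by_parts = f*g - sp.integrate(sp.diff(f, x)*g, x)

# check against the direct answer x*e^x - e^x
assert sp.simplify(by_parts - (x*sp.exp(x) - sp.exp(x))) == 0
```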

Similarly, there is no chain rule for integration. The chain rule for differentiation translates into the integration by substitution formula for integration:

\displaystyle   \int f'(g(x)) \; g'(x) \; dx = f(g(x)).

A chain rule for integration would say to you “if I am a composition of two functions, and you can integrate both of them, then you can integrate me”. But integration by substitution says, instead, “if I am a composition of two functions, multiplied by the derivative of the inner function, then you can integrate me”. In a certain sense it’s easier than integration by parts, because it calculates the integral and gives you an answer, rather than merely reducing to a different (hopefully simpler) integral. But still, it remains an art form: it requires the skill to see how to regard the integrand as an expression of the form f'(g(x)) \; g'(x) . Finally, there is no quotient rule for integration either.
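For example, the integrand 2x \cos (x^2) has exactly this shape, with f' = \cos and g(x) = x^2 , so its integral is \sin (x^2) . A quick check in Python with SymPy (example and tool my own choices):

```python
import sympy as sp

x = sp.symbols('x')

# the integrand 2x*cos(x^2) has the shape f'(g(x)) * g'(x)
# with f' = cos and g(x) = x^2, so the integral is f(g(x)) = sin(x^2)
result = sp.integrate(2*x*sp.cos(x**2), x)
assert sp.simplify(result - sp.sin(x**2)) == 0
```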

So, while differentiation is a skill which can be learned and applied, integration is an art form for which we learn tricks and strategies, and develop our skills and intuition in applying them. Now, actually there are tables of standard integrals, far far beyond the small examples above. There are theorems about how functions of certain types can be integrated. There are algorithms which can be used to integrate certain, often very complicated, families of functions.

But the question remains: how far can we go? If we see an integral which we can’t immediately solve, do we just need to think a little harder, and apply something from our bag of tricks in a clever new way? Do we just need more skill, or is the integral impossible? How would we tell the difference between a “hard” and an “impossible” integral — and what does that even mean?

In a certain sense, no integrals are “impossible”: the integral of a continuous function always exists. If you’ve got a continuous function f  : \mathbb{R} \rightarrow \mathbb{R}, then its integral is certainly defined as a function, using the definition with Riemann sums — this is a theorem. Even if f is not continuous, it’s possible that the Riemann sum approach can give a well-defined function as the integral. For more exotic functions f, there is the more advanced method of Lebesgue integration.

But this is not what we have in mind when we say an “integral is impossible”. What we really mean is that we can’t write a nice formula for the integral. This would happen if the result were not an elementary function.

As we discussed above, if you take an elementary function and differentiate it, you can always calculate the derivative with a sufficiently careful application of product/chain/quotient rules, and the result is another elementary function.

So, we might ask: given an elementary function, even though there might not be any straightforward way to calculate its integral, is the result always another elementary function?

Indomitable impossible integrals

It turns out, the answer is no. There are elementary functions such that, when you take their integral, it is not an elementary function. When you try to integrate such a function, although the integral exists, you can’t write a nice formula for it. And it’s not because you’re not skillful enough. It’s not because you’re not smart enough. The reason you can’t write a nice formula for the integral is because no such formula exists: the integral is not an elementary function.

What is an example of such a function? The simplest example is one that high school students come up against all the time: the Gaussian function

\displaystyle   e^{-x^2}.

It’s clearly an elementary function, constructed by composition of a polynomial -x^2 and the exponential function. But its integral is not elementary.

You might recall that the graph of y = e^{-x^2} is a bell curve. Suitably dilated (normalised), it is the probability density function for a normal distribution. When you calculate probabilities involving normally distributed random variables, you often integrate this function.

You may recall painful time spent in high school looking up a table to find out probabilities for the normal distribution. That table is essentially a table of (definite) integrals for the function e^{-x^2} (or a closely related function). And the reason that it’s a table you have to look up, rather than a formula, is because there is no formula for the integral \int e^{-x^2} \; dx . You need a table because the integral of the elementary function e^{-x^2} is not elementary.
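Indeed, if you ask a computer algebra system for this antiderivative, it answers in terms of the special “error function” \mathrm{erf}, which is essentially defined by this very integral, since there is no elementary formula to give. A sketch in Python with SymPy (my choice of tool):

```python
import sympy as sp

x = sp.symbols('x')

# SymPy cannot express this antiderivative in elementary terms either;
# it answers with the special function erf, defined by this very integral
result = sp.integrate(sp.exp(-x**2), x)
assert sp.simplify(result - sp.sqrt(sp.pi)*sp.erf(x)/2) == 0
```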

There’s no formula for normal distribution probabilities because integration is an art form, rather than algorithmic. And so we are sometimes reduced to the quite non-artistic process of looking up a table to find the integral.

Now, when I say that \int e^{-x^2} \; dx is not elementary, I mean that this is known as a theorem. That is, it has been proved mathematically that \int e^{-x^2} \; dx is not elementary, and so doesn’t have a nice formula. But what could this mean? How could you prove that an integral doesn’t have a nice formula, isn’t an elementary function, can’t be written in a nice way? The proof is a bit complicated, too complicated to present in complete detail here. But there are some nice ideas involved, and it’s worth recounting some of them.

Proving the impossible

The fact that \int e^{-x^2} \; dx is not elementary was proved by the French mathematician Joseph Liouville in the mid-19th century. In fact, he proved quite a lot more. Suppose you have an elementary function f(x), and you are trying to find its integral g(x) = \int f(x) \; dx . Now as the integrand f(x) is continuous, the integral g(x) certainly exists as a continuous function; the question is whether g(x) is elementary or not, i.e. whether there is a formula for g(x) involving only complex numbers, powers of x, rational functions, \exp and \log, and n‘th roots (and their generalisations).

Liouville’s theorem, amazingly, tells you that if the function g(x) you’re looking for is elementary, then it must have a very specific form. Very roughly, Liouville says, g(x) can have more logarithms than f(x), but no more exponentials. You can see the germ of this idea in some of the integrals above:

\displaystyle   \int \tan x \; dx = - \ln \cos x, \quad  \int \arctan x \; dx = x \arctan x - \frac{1}{2} \ln (1+x^2).

In these integrals, a new logarithm appears, that did not appear in the integrand. Never does a new exponential appear. If an exponential appears in the integral, then it appeared in the integrand, as in examples like

\displaystyle   \int e^x \; dx = e^x.

To state Liouville’s theorem more precisely, we need the idea of a field of functions. For our purposes, we can think of a field of functions as a collection of functions f(x) which is closed under addition, subtraction, multiplication, and division. The polynomials in x do not form a field of functions, because when you divide two polynomials you do not always get a polynomial! However, the rational functions in x do form a field of functions. A rational function in x is the quotient of two polynomials (with complex coefficients) in x, i.e. a function like

\displaystyle   \frac{ 3x^2 - 7 }{ x^{10} - x^9 + x^3 + 1}   \quad \text{ or } \quad  \frac{ 4x+1}{2x-3}   \quad \text{or} \quad  x^2 - 3x + \pi  \quad \text{or} \quad  3.

The first example is the quotient of a quadratic by a 10’th degree polynomial; the second example is the quotient of two linear polynomials. The third example illustrates the notion that any polynomial is also a rational function, because you can think of it as itself divided by 1, and 1 is a polynomial: x^2 - 3x + \pi = \frac{ x^2 - 3x + \pi }{ 1 } . The final example illustrates the notion that any constant is also a rational function.

The field of rational functions (with complex coefficients) in x is denoted \mathbb{C}(x) . You can make bigger fields of rational functions by including new elements! For instance, you could throw in the exponential function e^x , and then you can obtain the larger field of functions \mathbb{C}(x, e^x) . The functions in this field are those made up of adding, subtracting, multiplying, and dividing powers of x and the function e^x. So this includes functions like

\displaystyle   \frac{ x^2 + x e^x - e^x }{ x + 1 }  \quad \text{ or } \quad  (e^x) \cdot (e^x) = e^{2x}  \quad \text{ or } \quad  x^7 e^{4x} - 3 x^2 e^x + \pi.

Note however that a function like e^{e^x} does not lie in \mathbb{C}(x, e^x) . This function field is made up by adding, subtracting, multiplying and dividing x and e^x , but not by composing these functions.

We can see, then, that this second function field is bigger than the first one: \mathbb{C}(x) \subset \mathbb{C}(x, e^x) . In technical language, we say that \mathbb{C}(x) \subset \mathbb{C}(x, e^x) is a field extension. Moreover, both these fields have the nice property that they are closed under differentiation. That is, if you take a rational function and differentiate it, you get another rational function. And if you take a function in \mathbb{C}(x, e^x) , involving x‘s and e^x‘s, and differentiate it, you get another function in \mathbb{C}(x, e^x) . In technical language, we say that \mathbb{C}(x) and \mathbb{C}(x, e^x) are differential fields of functions.
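This closure property can be spot-checked: if we differentiate an element of \mathbb{C}(x, e^x) and then replace every e^x by a formal symbol t, what remains should be a rational function of x and t. A sketch in Python with SymPy (tooling and example my own):

```python
import sympy as sp

x, t = sp.symbols('x t')

# a typical element of C(x, e^x)
f = (x**2 + x*sp.exp(x) - sp.exp(x)) / (x + 1)
df = sp.diff(f, x)

# substitute a formal symbol t for e^x: if what remains is a rational
# function of x and t, the derivative still lies in C(x, e^x)
assert df.subs(sp.exp(x), t).is_rational_function(x, t)
```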

A differential field obtained in this way, by starting from rational functions and then throwing in an exponential, is an example of a field of elementary functions. In general, a field of elementary functions is obtained from the field of rational functions \mathbb{C}(x) by successively throwing in extra functions, some finite number of times. Each time you add a function, it must be either

  • an exponential of a function already in the field, or
  • a logarithm of a function already in the field, or
  • an n‘th root of a function already in the field (or, more generally, the root of a polynomial equation with coefficients in the field, but as I keep saying, don’t worry too much about this!).

Note that, by definition, any function in a field of elementary functions is made up by adding, subtracting, multiplying, and dividing x‘s, and exponentials, and logarithms, and n‘th roots (or generalisations thereof). That is, a function in a field of elementary functions is an elementary function! So our definitions of “elementary function” and “field of elementary functions” agree — it would be bad if we used the word “elementary” to mean two different things!

We can now state Liouville’s theorem precisely.

Liouville’s theorem: Let K be a field of elementary functions, and let f(x) be a function in K. (Hence f(x) is an elementary function.) If the integral \int f(x) \; dx is elementary, then

\displaystyle   \int f(x) \; dx = h(x) + \sum_{j=1}^n c_j \log g_j (x),

where n is a non-negative integer, each of c_1, c_2, \ldots, c_n is a constant, and the functions h(x), g_1(x), g_2(x), \ldots, g_n(x) all lie in K.

That is, Liouville’s theorem says that the integral of an elementary function f(x) must be the sum of a function h(x) that lies in the same field as f, and a constant linear combination of logarithms of functions g_j(x) in the same field as f. The fact that h(x) and each g_j(x) lie in the same field K as f(x) means that they cannot be much more complicated than f(x): they must be made up by adding, subtracting, multiplying and dividing the same bunch of functions that you can use to define f(x).

So Liouville’s theorem says, in a precise way, that when you integrate an elementary function f(x), if the result is elementary, then it can’t be much more complicated than f(x), and the only way in which it can be more complicated is that it can have some logarithms in it. This is what we meant when we gave the very rough description “Liouville says g(x) can have more logarithms than f(x), but no more exponentials“.
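To see the shape of the theorem in a familiar case (an example of my own choosing, not from Liouville), partial fractions give

\displaystyle   \int \frac{dx}{x^2 - 1} = \frac{1}{2} \ln (x-1) - \frac{1}{2} \ln (x+1),

which is exactly the form above, with h(x) = 0, constants c_1 = \frac{1}{2}, c_2 = -\frac{1}{2}, and functions g_1(x) = x - 1, g_2(x) = x + 1, all lying in the field \mathbb{C}(x).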

Let’s now return to our specific example of the Gaussian function f(x) = e^{-x^2}. What does Liouville’s theorem mean for this function? Well, this function lies in the field of elementary functions K where we start from rational functions and then throw in, not e^x, but e^{-x^2}. That is, we can take K = \mathbb{C}(x, e^{-x^2}).

The theorem says that if the integral

\displaystyle   \int e^{-x^2} \; dx

is elementary, then it is given by

\displaystyle   \int e^{-x^2} \; dx = h(x) + \sum_{j=1}^n c_j \log g_j (x),

where n is a non-negative integer, each of c_1, c_2, \ldots, c_n is a constant, and the functions h(x), g_1(x), g_2(x), \ldots, g_n(x) all lie in \mathbb{C}(x, e^{-x^2}). That is, h(x), g_1(x), \ldots, g_n(x) are “no more complicated” than e^{-x^2}; they are all made by adding, subtracting, multiplying and dividing x‘s and e^{-x^2}‘s.

If we differentiate the above equation, we obtain

\displaystyle   e^{-x^2} = h'(x) + \sum_{j=1}^n c_j \frac{ g'_j (x) }{ g_j (x) }.

On the left hand side is the function we started with, e^{-x^2}. On the right hand side is an expression involving several functions. However, all the functions g_j(x) and h(x) lie in K; they are “no more complicated” than e^{-x^2}. Now as K is a differential field, their derivatives g'_j(x) and h'(x) also lie in K; they are also “no more complicated”. So in fact the right hand side is an expression involving functions no more complicated than e^{-x^2}: they are all just rational functions, with e^{-x^2}‘s thrown in. And if you think about what you will get for each g'_j(x) / g_j (x), you might find it hard to avoid having a big denominator; you likely won’t be able to cancel the fraction. So you might find that none of the logarithmic terms can make this equality work, and all the constants c_j have to be zero; or in other words, e^{-x^2} = h'(x).

But h(x) is, like everything else here, made up by adding, subtracting, multiplying and dividing x‘s and e^{-x^2}‘s. You might find, when you differentiate such a function, that it’s very hard to get a lone e^{-x^2}. Every time you differentiate an e^{-x^2} you get a -2xe^{-x^2}, which has a pesky extra factor of -2x. And even if it appears together with other terms, as something like x^3 e^{-x^2}, when you differentiate it you get something like -2x^4 e^{-x^2} + 3x^2 e^{-x^2}, which still has no isolated e^{-x^2} term. And so, in conclusion, you might find it very difficult to find any functions that make the right hand side equal to e^{-x^2}.
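The heuristic about differentiating x^3 e^{-x^2} can be checked directly. A sketch in Python with SymPy (my choice of tool):

```python
import sympy as sp

x = sp.symbols('x')
g = sp.exp(-x**2)

# differentiating x^3 * e^{-x^2}: every resulting term still carries
# a power of x alongside e^{-x^2} -- no lone e^{-x^2} appears
d = sp.expand(sp.diff(x**3 * g, x))
assert sp.simplify(d - (3*x**2*g - 2*x**4*g)) == 0

# the coefficient of e^{-x^2} is a polynomial with no constant term,
# so no isolated e^{-x^2} survives
assert d.coeff(g).subs(x, 0) == 0
```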

Of course, this is not a proof at all; it’s a mere plausibility argument. To prove the integral is not elementary does take a bit more work. But it has been done, and can be found in standard references.

Hopefully, though, this should at least give you some idea why it might be true, and how you might prove, that an integral is “impossible”, and can’t be written with any nice formula.

Mathematics is an amazing place.


Brian Conrad, Impossibility theorems for elementary integration.

Keith O. Geddes, Stephen R. Czapor, George Labahn, Algorithms for Computer Algebra, Kluwer (1992).

Andy R. Magid, Lectures on Differential Galois Theory, AMS (1994).

(Update 2/3/15: Typo fixed.)

Written by dan

March 1st, 2015 at 6:18 am