In this article, we're proving the triangle inequality, which can be understood as follows:
The shortest distance between two points is a straight line.
I like this theorem because it feels intuitive ... like it needs to be true.
So let's state this theorem in mathematical language. For any two vectors $u, v \in \mathbb{R}^n$, the length of their sum (the direct path) is never greater than the sum of their individual lengths (the detour):
$$\|u + v\| \le \|u\| + \|v\|$$
Here, $\|\cdot\| : \mathbb{R}^n \to \mathbb{R}$ denotes a vector's norm, defined by $\|x\| = \sqrt{x \cdot x}$,
where $x \cdot y$ denotes the dot product $\mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ of two vectors.
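To make the statement concrete, here is a minimal Python sketch of the definitions above. The helper names `dot`, `norm`, and `add`, and the sample vectors, are illustrative choices rather than anything fixed by the theorem.

```python
import math

def dot(x, y):
    """Dot product of two equal-length sequences of numbers."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension")
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    """Euclidean norm: the square root of x . x."""
    return math.sqrt(dot(x, x))

def add(x, y):
    """Componentwise sum of two vectors."""
    return [a + b for a, b in zip(x, y)]

# Sample vectors in R^2 (arbitrary illustrative values).
u = [3.0, 4.0]
v = [-1.0, 2.0]

direct = norm(add(u, v))    # length of the direct path u + v
detour = norm(u) + norm(v)  # length of u followed by v

print(direct <= detour)  # prints True
```

A single example proves nothing, of course; it only shows what the two sides of the inequality measure.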
First we'll prove that the absolute value of a dot product is at most the product of the two vectors' norms, which is known as the Cauchy-Schwarz inequality:
$$|u \cdot v| \le \|u\| \cdot \|v\| \quad (\forall u, v \in \mathbb{R}^n)$$
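Before proving the inequality, it can be reassuring to test it numerically. The sketch below samples random vectors and records the gap between the two sides; the function name `cauchy_schwarz_gap`, the seed, and the sampling ranges are assumptions made for illustration. Floating-point rounding can make the gap very slightly negative, so the check allows a small tolerance.

```python
import math
import random

def dot(x, y):
    """Dot product of two equal-length sequences of numbers."""
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    """Euclidean norm: the square root of x . x."""
    return math.sqrt(dot(x, x))

def cauchy_schwarz_gap(u, v):
    """Return ||u|| * ||v|| - |u . v|, which should be nonnegative up to rounding."""
    return norm(u) * norm(v) - abs(dot(u, v))

rng = random.Random(0)  # fixed seed so the run is reproducible
gaps = []
for _ in range(1000):
    n = rng.randint(1, 6)  # random dimension between 1 and 6
    u = [rng.uniform(-10, 10) for _ in range(n)]
    v = [rng.uniform(-10, 10) for _ in range(n)]
    gaps.append(cauchy_schwarz_gap(u, v))

smallest_gap = min(gaps)

# When v is a scalar multiple of u, the two sides coincide.
parallel_gap = cauchy_schwarz_gap([1.0, 2.0, 2.0], [2.0, 4.0, 4.0])
```

The parallel case hints at when equality occurs: the gap closes exactly when one vector is a multiple of the other.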
Given any two vectors $u, v \in \mathbb{R}^n$, let us define $w = u - tv$, for some arbitrary scalar $t \in \mathbb{R}$.
By distributivity and scalar multiplication of the dot product, we have
$$\|w\|^2 = |w \cdot w| = |(u - tv) \cdot (u - tv)| = |u \cdot u - 2t\,(u \cdot v) + t^2\,(v \cdot v)|$$
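The expansion above can be checked numerically by computing $w \cdot w$ directly and comparing it with the three-term expression. This is a small sketch with hypothetical helper names (`norm_sq_of_w`, `expanded`) and arbitrary sample values.

```python
import math

def dot(x, y):
    """Dot product of two equal-length sequences of numbers."""
    return sum(a * b for a, b in zip(x, y))

def norm_sq_of_w(u, v, t):
    """Compute ||w||^2 directly, where w = u - t v."""
    w = [a - t * b for a, b in zip(u, v)]
    return dot(w, w)

def expanded(u, v, t):
    """Compute u.u - 2t (u.v) + t^2 (v.v), the expanded form."""
    return dot(u, u) - 2 * t * dot(u, v) + t ** 2 * dot(v, v)

u = [1.0, 2.0, 3.0]
v = [4.0, -5.0, 6.0]

# The two expressions should agree for every t; a few values are tried here.
for t in (-2.0, 0.0, 0.5, 3.0):
    direct_value = norm_sq_of_w(u, v, t)
    expanded_value = expanded(u, v, t)
    assert math.isclose(direct_value, expanded_value, rel_tol=1e-12, abs_tol=1e-12)
```

Both expressions are nonnegative here, since each equals a sum of squares.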
We're about to use the fact that ∣a+b∣≤∣a∣+∣b∣ for any a,b∈R.
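Why is this true? Note that $|a| \cdot |b| = |a \cdot b|$,
which implies that $|a| \cdot |b| \ge a \cdot b$.
It follows that $a^2 + 2|a||b| + b^2 \ge a^2 + 2ab + b^2$, implying that $(|a| + |b|)^2 \ge (a + b)^2$,
from which it follows, by taking square roots of both nonnegative sides, that $|a| + |b| \ge |a + b|$.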
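The three steps of this argument can be traced for concrete numbers. In the sketch below, `triangle_steps` is a hypothetical helper that evaluates each comparison in order; a pair with mixed signs is the interesting case, because that is where the inequality is strict.

```python
def triangle_steps(a, b):
    """Evaluate the three comparisons used in the argument, in order."""
    product_step = abs(a) * abs(b) >= a * b                # |a||b| >= ab
    square_step = (abs(a) + abs(b)) ** 2 >= (a + b) ** 2   # compare the squares
    root_step = abs(a) + abs(b) >= abs(a + b)              # take square roots
    return product_step, square_step, root_step

# Mixed signs: 15 >= -15, then 64 >= 4, then 8 >= 2.
print(triangle_steps(3, -5))  # prints (True, True, True)
```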
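Because $\|w\|^2 = w \cdot w \ge 0$ for every choice of $t$, the expansion above yields $t^2\,\|v\|^2 - 2t\,(u \cdot v) + \|u\|^2 \ge 0$ for all $t \in \mathbb{R}$. (If $v = 0$, both sides of the Cauchy-Schwarz inequality are zero, so we may assume $v \ne 0$.) The key observation here is that this is a quadratic inequality in the variable $t$.
Geometrically speaking, this inequality states that a parabola in the variable $t$ never dips below the horizontal axis, so it can touch that axis at most once.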
Since the well-known quadratic formula tells us that a function $f(x) = ax^2 + bx + c$ intersects the x-axis at coordinates
$$x_{1,2} = \frac{1}{2a}\left(-b \pm \sqrt{b^2 - 4ac}\right)$$
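it follows that, for the above inequality to hold, we must have $b^2 - 4ac \le 0$ (a positive discriminant would produce two distinct crossings), i.e.:
$$(2\,|u \cdot v|)^2 - 4\,\|u\|^2\,\|v\|^2 \le 0$$
By rearranging some terms, we find that $(u \cdot v)^2 \le (\|u\| \cdot \|v\|)^2$,
from which the Cauchy-Schwarz inequality follows by taking square roots of both sides.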
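The discriminant argument can also be traced numerically. The sketch below assumes the quadratic in $t$ is $\|u - tv\|^2 = at^2 + bt + c$ with $a = v \cdot v$, $b = -2\,(u \cdot v)$, and $c = u \cdot u$; the helper name `quadratic_coefficients` and the sample vectors are illustrative.

```python
def dot(x, y):
    """Dot product of two equal-length sequences of numbers."""
    return sum(p * q for p, q in zip(x, y))

def quadratic_coefficients(u, v):
    """Coefficients (a, b, c) of ||u - t v||^2 = a t^2 + b t + c."""
    return dot(v, v), -2 * dot(u, v), dot(u, u)

u = [1.0, 2.0, 3.0]
v = [4.0, -5.0, 6.0]

a, b, c = quadratic_coefficients(u, v)   # a = 77, b = -24, c = 14
discriminant = b * b - 4 * a * c         # 576 - 4312 = -3736

# The parabola attains its minimum at the vertex t* = -b / (2a).
t_star = -b / (2 * a)
minimum = a * t_star ** 2 + b * t_star + c

print(discriminant <= 0)  # prints True
print(minimum >= 0)       # prints True
```

A non-positive discriminant and a nonnegative minimum are two views of the same fact: the parabola never drops below the axis, which is exactly what $(u \cdot v)^2 \le \|u\|^2\,\|v\|^2$ expresses.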