Calculus

I would like to start this section with the proof of a very useful formula in calculus: the formula for the cosine of the angle between two vectors. Once the cosine of an angle is known, the angle itself can be computed using the Math.acos() function of JavaScript. The Math.acos() function can be executed from the “Web Console” of the Firefox web browser, which can be invoked by pressing “F12” or “Ctrl+Shift+K”.

_images/twoVectors.JPG (figure: two vectors \(\mathbf{a}\) and \(\mathbf{b}\) making the angles \(\theta_a\) and \(\theta_b\) with the horizontal axis)

In the above figure, the cosines and sines of the angles \(\theta_a\), \(\theta_b\) and of the angle between the vectors can be expressed as follows:

\[\cos{\theta_a}=\frac{a_1}{\Vert \mathbf{a} \Vert},\quad \cos{\theta_b}=\frac{b_1}{\Vert \mathbf{b} \Vert},\quad \sin{\theta_a}=\frac{a_2}{\Vert \mathbf{a} \Vert},\quad \sin{\theta_b}=\frac{b_2}{\Vert \mathbf{b} \Vert}\]
\[\cos(\theta_a-\theta_b)=\cos(\theta_a)\cos(\theta_b)+\sin(\theta_a)\sin(\theta_b)=\frac{a_1}{\Vert \mathbf{a} \Vert}\frac{b_1}{\Vert \mathbf{b} \Vert}+\frac{a_2}{\Vert \mathbf{a} \Vert}\frac{b_2}{\Vert \mathbf{b} \Vert}=\frac{\langle \mathbf{a} { , } \mathbf{b} \rangle}{\Vert\mathbf{a}\Vert\Vert\mathbf{b}\Vert}\]

where \(\Vert\cdot \Vert\) denotes the Euclidean norm or the magnitude of a vector and \(\langle { \cdot { , } \cdot } \rangle\) denotes the scalar product or inner product of two vectors.
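As a quick sanity check, this formula can be evaluated directly in the Web Console mentioned above. The following minimal sketch uses two arbitrarily chosen example vectors in \(\mathbb{R}^2\); the variable names are not part of the formula:

    // Angle between two example vectors a and b in R^2 (arbitrary values).
    const a = [3, 4];
    const b = [5, 0];
    // Inner product <a, b> = a1*b1 + a2*b2.
    const dotAB = a[0] * b[0] + a[1] * b[1];
    // Euclidean norms of a and b.
    const normA = Math.hypot(a[0], a[1]);
    const normB = Math.hypot(b[0], b[1]);
    // Cosine of the angle, then the angle itself via Math.acos().
    const cosTheta = dotAB / (normA * normB);
    console.log(Math.acos(cosTheta)); // ~0.9273 rad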

Vector norm and inner product

All vectors are denoted with bold letters. The inner product of two vectors in the Euclidean n-space \(\mathbb{R}^n\) is defined by \(\langle { \mathbf{x} { , } \mathbf{y} } \rangle=\sum_{i=1}^{n}x_iy_i\). One of the most important properties of the inner product is the Cauchy-Schwarz inequality [6]:

\[\lvert\langle { \mathbf{x} { , } \mathbf{y} } \rangle\rvert\leq \Vert\mathbf{x}\Vert\cdot \Vert\mathbf{y}\Vert\]

This can be proven using the concept of linear independence.

Linear independence

Let’s say we have a set of \(k\) vectors \(\lbrace \mathbf{v}_1, ... ,\mathbf{v}_k \rbrace\) in the Euclidean n-space \(\mathbb{R}^n\). These vectors are either linearly dependent or linearly independent. If there exists a set of \(k\) coefficients \(\lbrace\alpha_1, ... , \alpha_k \rbrace\) such that not all of these coefficients are zero and \(\alpha_1\mathbf{v}_1 + ... +\alpha_k\mathbf{v}_k=0\), then the vectors are linearly dependent, because we could express one of these vectors as a linear combination of the rest of the vectors in the set. As an example, suppose that \(\alpha_1\neq 0\). Then we could write \(\mathbf{v}_1=-(\alpha_2/\alpha_1)\mathbf{v}_2-(\alpha_3/\alpha_1)\mathbf{v}_3- ... -(\alpha_k/\alpha_1)\mathbf{v}_k\). On the other hand, if the only way to express the zero vector as a linear combination of these vectors is with \(\alpha_i=0\quad\forall i\in\lbrace 1,...,k\rbrace\), then the vectors are linearly independent. If the vectors \(\mathbf{x}\) and \(\mathbf{y}\) are linearly dependent, then one of them can be expressed in terms of the other such that \(\mathbf{x}=\alpha \mathbf{y}\) for some \(\alpha \in\mathbb{R}\). Then we obtain:

\[|\langle \mathbf{x},\mathbf{y} \rangle |=|\langle \alpha \mathbf{y},\mathbf{y} \rangle|=|\alpha\langle \mathbf{y},\mathbf{y}\rangle |=|\alpha|\Vert \mathbf{y}\Vert^2=\Vert\alpha \mathbf{y}\Vert\Vert\mathbf{y}\Vert=\Vert\mathbf{x}\Vert\Vert\mathbf{y}\Vert\]

On the other hand, if \(\mathbf{x}\) and \(\mathbf{y}\) are linearly independent, then \(\Vert\mathbf{x}-\alpha\mathbf{y}\Vert\neq 0\) for any \(\alpha \in\mathbb{R}\) and we obtain:

\[0<\Vert\mathbf{x}-\alpha\mathbf{y}\Vert^2=\sum_{i=1}^{n}(x_i-\alpha y_i)^2=\sum_{i=1}^{n}\big({x_i}^2+{\alpha}^2{y_i}^2-2\alpha x_iy_i\big)\]

which is a quadratic expression in \(\alpha\) of the form \(a{\alpha}^2+b\alpha + c\). Since this expression is always greater than zero, there are no real values of \(\alpha\) which would make it equal to zero. As a result, the discriminant of the quadratic, \(b^2-4ac\), must be less than zero: if it were greater than or equal to zero, then \({(-b \pm\sqrt{b^2-4ac})}/{2a}\) would give us real values of \(\alpha\) that make the quadratic expression equal to zero. Therefore:

\[\Big(-2\sum_{i=1}^{n}x_iy_i\Big)^2-4\Big(\sum_{i=1}^n{y_i}^2\Big)\Big(\sum_{i=1}^{n}{x_i}^2\Big) <0\]
\[|\langle \mathbf{x},\mathbf{y} \rangle|^2<\Vert\mathbf{y}\Vert^2\Vert\mathbf{x}\Vert^2\]

This property leads to another one which is called the triangle inequality:

\[\Vert \mathbf{x}+\mathbf{y}\Vert\leq\Vert\mathbf{x}\Vert + \Vert \mathbf{y}\Vert\]

To prove this we can proceed as follows:

\[\begin{split}\Vert\mathbf{x}+\mathbf{y}\Vert^2&=\sum_{i=1}^n(x_i+y_i)^2=\sum_{i=1}^n{x_i}^2+{y_i}^2+2x_iy_i=\Vert\mathbf{x}\Vert^2+\Vert\mathbf{y}\Vert^2+2\langle\mathbf{x},\mathbf{y}\rangle \\ &\leq \Vert\mathbf{x}\Vert^2+\Vert\mathbf{y}\Vert^2+2\Vert\mathbf{x}\Vert\Vert\mathbf{y}\Vert=(\Vert\mathbf{x}\Vert+\Vert\mathbf{y}\Vert)^2\end{split}\]
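Both the Cauchy-Schwarz inequality and the triangle inequality are easy to spot-check in the console. The sketch below uses two arbitrary vectors in \(\mathbb{R}^3\); it is an illustration, not a proof:

    // Cauchy-Schwarz and triangle inequality for two example vectors.
    const u1 = [1, -2, 3];
    const u2 = [4, 0, -1];
    const inner = (p, q) => p.reduce((s, pi, i) => s + pi * q[i], 0);
    const norm = (p) => Math.sqrt(inner(p, p));
    console.log(Math.abs(inner(u1, u2)) <= norm(u1) * norm(u2)); // true
    const u3 = u1.map((v, i) => v + u2[i]);                      // u1 + u2
    console.log(norm(u3) <= norm(u1) + norm(u2));                // true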

Now let’s turn back to the vectors \(\mathbf{a}\), \(\mathbf{b}\) in \(\mathbb{R}^2\) and the angle between them. The formula for the cosine of the difference between two angles (or the angle between two vectors) can be derived as follows [1]:

Let \(f(\theta)=\cos(\theta-\beta)+\alpha_1\cos(\theta)+\alpha_2\sin(\theta)\) where \(f:\mathbb{R}\to\mathbb{R}\) and \(\beta,\alpha_1, \alpha_2 \in \mathbb{R}\) are arbitrary. The first and second derivatives of \(f(\theta)\) are:

\[\begin{split}&f^{'}(\theta)=-\sin(\theta-\beta)-\alpha_1\sin(\theta)+\alpha_2\cos(\theta)\\ &f^{''}(\theta)=-\cos(\theta-\beta)-\alpha_1\cos(\theta)-\alpha_2\sin(\theta)\end{split}\]

from which

\[f(\theta)+f^{''}(\theta)=0\]

follows. If we choose \(\alpha_1\) and \(\alpha_2\) as

\[\alpha_1=-\cos(\beta), \alpha_2=-\sin(\beta)\]

we obtain

\[f(\theta)=\cos(\theta-\beta)-\cos(\theta)\cos(\beta)-\sin(\theta)\sin(\beta)\]
\[f(0)=f^{'}(0)=0\]

Let’s define \(g:\mathbb{R}\to\mathbb{R}\) as \(g(\theta)=(f(\theta))^2+(f^{'}(\theta))^2\). Then

\[g^{'}(\theta)=2f(\theta)f^{'}(\theta)+2f^{'}(\theta)f^{''}(\theta)=2f^{'}(\theta)\Big(f(\theta)+f^{''}(\theta)\Big)=0\]

Since \(g^{'}(\theta)=0\) for all \(\theta\in\mathbb{R}\), \(g(\theta)\) is a constant function and equal to \(g(0)=(f(0))^2+(f^{'}(0))^2=0\) for all \(\theta\in\mathbb{R}\). Assume that \(f(\theta_0)\neq 0\) for some \(\theta_0 \in\mathbb{R}\). Then \(g(\theta_0)=(f(\theta_0))^2+(f^{'}(\theta_0))^2>0\). This contradiction proves that \(f(\theta)=0\) everywhere on \(\mathbb{R}\) and therefore \(\boxed{\cos(\theta-\beta)=\cos(\theta)\cos(\beta)+\sin(\theta)\sin(\beta)}\).
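The boxed identity can also be spot-checked numerically for arbitrarily chosen angles (again a check, not a proof):

    // cos(t - b) = cos(t)cos(b) + sin(t)sin(b) for two arbitrary angles.
    const theta = 1.2, beta = 0.4;
    const lhs = Math.cos(theta - beta);
    const rhs = Math.cos(theta) * Math.cos(beta) + Math.sin(theta) * Math.sin(beta);
    console.log(lhs, rhs, Math.abs(lhs - rhs) < 1e-15); // equal up to rounding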

In the above proof we used the fact that if the derivative of a function is zero everywhere, then this function has a constant value. This can be proven using the mean value theorem as follows:

Mean Value Theorem and Rolle’s Theorem

Let \([a,b]\subset\mathbb{R}\) with \(a<b\). Since \(g\) is differentiable on all of \(\mathbb{R}\), it is in particular differentiable on \([a,b]\). According to the mean value theorem, there exists \(\xi \in (a,b)\) such that

\[g^{'}(\xi)=\frac{g(b)-g(a)}{b-a}=0 \Rightarrow g(b)=g(a), \forall a,b \in \mathbb{R}, \quad\therefore \boxed{g(\theta)=const}\]

In order to prove the mean value theorem, it is possible to define another function \(G:\mathbb{R}\to\mathbb{R}\) as \(G(\theta)=g(\theta)+\alpha\theta\) for some \(\alpha\in\mathbb{R}\). Then for any interval \([a,b]\subset\mathbb{R}\), \(G(\theta)\) is differentiable on \([a,b]\). Also, \(\alpha\) can be chosen in such a way that \(G(a)=G(b)\). Since \(G(a)=g(a)+\alpha a\) and \(G(b)=g(b)+\alpha b\), choosing \(\alpha=(g(b)-g(a))/(a-b)\) implies that \(G(a)=G(b)\). Since \(G(\theta)\) is differentiable on \([a,b]\), according to Rolle’s theorem there exists \(\xi \in (a,b)\) such that

\[G^{'}(\xi)=0=g^{'}(\xi)+\frac{g(b)-g(a)}{a-b}\Rightarrow \boxed{g^{'}(\xi)=\displaystyle\frac{g(b)-g(a)}{b-a}}\]

Once it is known that \(G(a)=G(b)\), there are only three possibilities for the behaviour of \(G(\theta)\) at a point \(\theta_0 \in (a,b)\). The first possibility is that \(G(a)=G(\theta_0)=G(b)\). If this is true for every \(\theta_0 \in (a,b)\) then \(G(\theta)\) is constant on \([a,b]\) and its derivative is zero at any \(\xi\in(a,b)\) by the definition of the derivative:

\[G^{'}(\xi)=\underset{\theta \to \xi}{\lim}\frac{G(\theta)-G(\xi)}{\theta -\xi}=\underset{\theta \to \xi}{\lim}\frac{0}{\theta -\xi}=0\]

The second possibility is that for some \(\theta_0 \in (a,b)\), \(G(\theta_0)>G(a)=G(b)\). In this case the Weierstrass’ maximum-minimum theorem guarantees the existence of some \(\theta_{max}\in (a,b)\) such that \(G(\theta_{max})\geq G(\theta_0)>G(a)=G(b)\) and for any \(\theta \in (a,b)\), \(G(\theta)\leq G(\theta_{max})\). We also know that \(G^{'}(\theta_{max})\) exists and is equal to the right-hand and left-hand derivatives of \(G\) at \(\theta_{max}\).

\[0\leq\underset{\theta \to {\theta _{max}} ^{-}}{\lim}\frac{G(\theta)-G(\theta _{max})}{\theta -\theta _{max}}=G^{'}(\theta _{max})=\underset{\theta \to {\theta _{max}}^{+}}{\lim}\frac{G(\theta)-G(\theta _{max})}{\theta -\theta _{max}}\leq 0\]

From the above inequalities it is clear that \(\boxed{G^{'}(\theta _{max})=0}\). This completes the proof of Rolle’s theorem since the only remaining possibility is that for some \(\theta_0 \in (a,b)\), \(G(\theta_0)<G(a)=G(b)\) and the proof of this case is identical to the previous case.

Taylor’s theorem

A generalization of the mean value theorem to \(n\) times differentiable functions is Taylor’s theorem. According to Taylor’s theorem, if \(f^{(n-1)}(x)\) exists on \([a,b]\) and \(f^{(n)}(x)\) exists on \((a,b)\), then there exists \(\xi \in (a,b)\) such that

\[f(b)=\sum_{k=0}^{n-1}\frac{f^{(k)}(a)}{k!}(b-a)^k + \frac{f^{(n)}(\xi)}{n!}(b-a)^n\]

In order to prove this, we define the following function \(\phi(x)\) [2]:

\[\phi(x)=\sum_{k=0}^{n-1}\frac{f^{(k)}(x)}{k!}(b-x)^k + M(b-x)^n\]

Clearly \(\phi\) is continuous on \([a,b]\) and differentiable on \((a,b)\). Therefore, if we choose a value for \(M\) such that \(\phi(a)=\phi(b)=f(b)\), then from Rolle’s theorem it follows that there exists \(\xi\in (a,b)\) such that \(\phi'(\xi)=0\).

\[\begin{split}\phi'(x)&=f'(x)+\sum_{k=1}^{n-1}\Big[\frac{f^{(k+1)}(x)}{k!}(b-x)^k - \frac{f^{(k)}(x)}{(k-1)!}(b-x)^{k-1}\Big] - Mn(b-x)^{n-1} \\ &=f'(x)+\sum_{k=2}^{n}\frac{f^{(k)}(x)}{(k-1)!}(b-x)^{k-1}-\sum_{k=1}^{n-1}\frac{f^{(k)}(x)}{(k-1)!}(b-x)^{k-1}-Mn(b-x)^{n-1}\\ &=f'(x)-f'(x)+\frac{f^{(n)}(x)}{(n-1)!}(b-x)^{n-1}-Mn(b-x)^{n-1}\\\end{split}\]
\[\phi'(\xi)=0\Rightarrow \frac{f^{(n)}(\xi)}{(n-1)!}(b-\xi)^{n-1}=Mn(b-\xi)^{n-1}\Rightarrow M=\frac{f^{(n)}(\xi)}{n!}\]

Substituting this value of \(M\) into the condition \(\phi(a)=\phi(b)=f(b)\) completes the proof of Taylor’s theorem.
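To see the theorem at work, here is a small sketch that sums the Taylor polynomial of \(\cos\) about \(a=0\) and compares it with Math.cos. Since every derivative of \(\cos\) is bounded by \(1\), the remainder is at most \(|b|^n/n!\); the function names and the chosen values of \(b\) and \(n\) are arbitrary:

    // Taylor polynomial of cos about a = 0 with n terms, evaluated at b.
    function factorial(k) { return k <= 1 ? 1 : k * factorial(k - 1); }
    function cosTaylor(b, n) {
      let sum = 0;
      for (let k = 0; k < n; k++) {
        // The k-th derivative of cos at 0 cycles through 1, 0, -1, 0, ...
        sum += [1, 0, -1, 0][k % 4] * Math.pow(b, k) / factorial(k);
      }
      return sum;
    }
    // Remainder bound: |0.5|^8 / 8! is about 1e-7.
    console.log(cosTaylor(0.5, 8), Math.cos(0.5));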

Taylor’s theorem can also be expressed in integral form using the fundamental theorem of calculus which says that if a function \(f(x)\) is differentiable on \([a,b]\) and \(\int_a^b f'(x)dx\) exists, then \(f(b)-f(a)=\int_a^b f'(x)dx\). This expression can be reformulated as

\[f(b)=\frac{1}{0!}f(a)(b-a)^0+\frac{1}{0!}\int_a^bf'(x)dx=p_0+r_0\]

Using integration by parts, the \(r_0\) part of the above equation can be expanded as follows:

\[\begin{split}r_0&=-\frac{1}{1!}\int_a^bf'(x)d(b-x)\\ &u=f'(x), du=f''(x)dx,\quad dv=d(b-x), v=b-x \\ &=-\frac{1}{1!}\Big[f'(x)(b-x)\Big|_a^b-\int_a^bf''(x)(b-x)dx\Big]\\ &=-\frac{1}{1!}\Big[-f'(a)(b-a)-\int_a^bf''(x)(b-x)dx\Big]\\ &=\frac{1}{1!}f'(a)(b-a)^1+\frac{1}{1!}\int_a^bf''(x)(b-x)dx\end{split}\]

which gives us

\[p_1=\frac{1}{0!}f^{(0)}(a)(b-a)^0+\frac{1}{1!}f^{(1)}(a)(b-a)^1,\quad r_1=\frac{1}{1!}\int_a^bf^{(2)}(x)(b-x)^1dx\]

Continuing this way, if \(f^{(n+1)}(x)\) is continuous on \([a,b]\), then we would obtain

\[p_n=\sum_{k=0}^{n}\frac{f^{(k)}(a)}{k!}(b-a)^k,\quad r_n=\frac{1}{n!}\int_a^bf^{(n+1)}(x)(b-x)^ndx\]

In order to show this inductively, we can expand \(r_n\) as follows

\[\begin{split}r_n&=-\frac{1}{(n+1)!}\int_a^bf^{(n+1)}(x)d(b-x)^{(n+1)}\\ &=-\frac{1}{(n+1)!}\Big[f^{(n+1)}(x)(b-x)^{(n+1)}\Big|_a^b-\int_a^bf^{(n+2)}(x)(b-x)^{(n+1)}dx\Big]\\ &=\frac{1}{(n+1)!}f^{(n+1)}(a)(b-a)^{(n+1)}+\frac{1}{(n+1)!}\int_a^bf^{(n+2)}(x)(b-x)^{(n+1)}dx\end{split}\]

which gives us

\[p_{n+1}=\sum_{k=0}^{n+1}\frac{f^{(k)}(a)}{k!}(b-a)^k,\quad r_{n+1}=\frac{1}{(n+1)!}\int_a^bf^{(n+2)}(x)(b-x)^{(n+1)}dx\]

Therefore, if \(f^{(n)}(x)\) is continuous on \([a,b]\), then the integral form of Taylor’s theorem is

\[f(b)=\sum_{k=0}^{n-1}\frac{f^{(k)}(a)}{k!}(b-a)^k+\frac{1}{(n-1)!}\int_a^bf^{(n)}(x)(b-x)^{(n-1)}dx\]

Integration by Parts

We used this integration rule while deriving the integral form of Taylor’s theorem. The rule is based on the fundamental theorem of calculus which says that if \(f,g\) are differentiable functions and \(f',g'\) are integrable on \([a,b]\) then \(\int_a^b(f(x)g(x))'dx=f(x)g(x)|_a^b\).

Using the product rule for differentiation we obtain:

\[\begin{split}&\int_a^b(f(x)g(x))'dx=\int_a^b\Big[f'(x)g(x)+f(x)g'(x)\Big]dx=f(x)g(x)|_a^b\\ &\Rightarrow \int_a^bf(x)g'(x)dx=f(x)g(x)|_a^b-\int_a^bg(x)f'(x)dx\end{split}\]

If we let \(f(x)=u\), \(g(x)=v\), this rule can also be expressed as \(\int_a^b udv=uv|_a^b-\int_a^bvdu\).
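The rule can be checked numerically by approximating both sides with a simple trapezoidal rule; \(f(x)=x\) and \(g(x)=e^x\) on \([0,1]\) are arbitrary choices for this sketch:

    // Integral of f*g' dx = f*g|_a^b - integral of g*f' dx, with f(x) = x, g(x) = e^x.
    const trapz = (h, lo, hi, n = 1e5) => {  // trapezoidal rule on [lo, hi]
      const dx = (hi - lo) / n;
      let s = (h(lo) + h(hi)) / 2;
      for (let i = 1; i < n; i++) s += h(lo + i * dx);
      return s * dx;
    };
    const left = trapz(x => x * Math.exp(x), 0, 1);               // integral of f*g'
    const right = 1 * Math.E - 0 - trapz(x => Math.exp(x), 0, 1); // f*g|_0^1 - integral of g*f'
    console.log(left, right); // both close to 1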

Power Series

Series of the form \(\sum_{k=n_0}^\infty c_k(x-a)^k\), such as the Taylor expansion of a function \(f:[a,b]\to\mathbb{R}\) at \(b\) about \(a\), are called power series. For every power series there is a certain set of values of \(x\) such that if \(x\) is in that set then the series absolutely converges, and if it is not then the series diverges. This set is described by the concept of the radius of convergence. Before proving that every power series has a radius of convergence, first let’s clarify the concept of absolute convergence and show that absolutely convergent series are a subset of convergent series.

A series of the form \(\sum_{k=n_0}^\infty a_k\) is absolutely convergent if the series \(\sum_{k=n_0}^\infty |a_k|\) is convergent. To show that absolute convergence implies convergence, we use a property of the absolute value operator which states that if \(x,c\in\mathbb{R}\) and \(c\geq 0\) then \(|x|\leq c\) if and only if \(-c\leq x\leq c\). Using this we obtain \(-|a_k|\leq a_k \leq |a_k|\) and \(0\leq a_k+|a_k|\leq 2|a_k|\). According to the direct comparison test for the convergence of series, if \(\sum_{k=n_0}^\infty |a_k|\) converges then \(\sum_{k=n_0}^\infty 2|a_k|\) converges and \(\sum_{k=n_0}^\infty (a_k+|a_k|)\) converges. We know that \(\sum_{k=n_0}^\infty a_k=\sum_{k=n_0}^\infty (a_k+|a_k|-|a_k|)=\sum_{k=n_0}^\infty (a_k+|a_k|)-\sum_{k=n_0}^\infty |a_k|\). This means that \(\sum_{k=n_0}^\infty a_k\) is the difference of two convergent series and therefore is itself convergent.
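The distinction matters because a series can converge without converging absolutely. A quick console experiment with the alternating harmonic series illustrates this; the cutoff of \(10^6\) terms is arbitrary:

    // Sum of (-1)^k/k converges (to -log 2, about -0.6931) although sum of 1/k diverges.
    let s1 = 0, s2 = 0;
    for (let k = 1; k <= 1e6; k++) {
      s1 += ((k % 2 === 0) ? 1 : -1) / k; // conditionally convergent
      s2 += 1 / k;                        // grows like log k
    }
    console.log(s1, s2); // about -0.6931 and about 14.39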

Direct Comparison Test

This test is used in order to determine the convergence behaviour of a series \(\sum_{k=n_0}^\infty a_k\) based on the behaviour of another series \(\sum_{k=n_0}^\infty b_k\). If there exists \(N\in \mathbb{N}\) such that \(\forall k\geq N\), \(0\leq a_k\leq b_k\), then the convergence of \(\sum_{k=n_0}^\infty b_k\) implies the convergence of \(\sum_{k=n_0}^\infty a_k\), and the divergence of \(\sum_{k=n_0}^\infty a_k\) implies the divergence of \(\sum_{k=n_0}^\infty b_k\). Let \(M_a=\sum_{k=n_0}^N a_k\), \(M_b=\sum_{k=n_0}^N b_k\). Then \(\sum_{k=n_0}^\infty a_k=M_a+\sum_{k=N+1}^\infty a_k\) and \(\sum_{k=n_0}^\infty b_k=M_b+\sum_{k=N+1}^\infty b_k\). For \(n>N\) let \(S_n=\sum_{k=N+1}^n a_k\) and \(T_n=\sum_{k=N+1}^n b_k\). If \(\sum_{k=n_0}^\infty b_k\) is a convergent series, then \(\lbrace T_n \rbrace\) must be a convergent and therefore bounded sequence. As a result \(\lbrace S_n \rbrace\) is bounded. Since \(a_k\) is non-negative for \(k\geq N\), \(\lbrace S_n \rbrace\) is also a monotonically increasing sequence. Therefore \(\lbrace S_n \rbrace\) is convergent and \(\sum_{k=n_0}^\infty a_k=M_a+\lim_{n\to\infty}S_n\).

Assume that \(\sum_{k=n_0}^\infty a_k\) is divergent but \(\sum_{k=n_0}^\infty b_k\) is convergent. Then \(\lbrace T_n\rbrace\) must be convergent and bounded which implies the boundedness and convergence of \(\lbrace S_n \rbrace\) and \(\sum_{k=n_0}^\infty a_k\). This contradiction proves the divergence of \(\sum_{k=n_0}^\infty b_k\).

Every convergent sequence is bounded

While proving why the direct comparison test works we used the fact that convergent sequences must be bounded. Let \(a_n\to L\). Then \(\exists N\in\mathbb{N}:n\geq N \Rightarrow |a_n-L|<1\Rightarrow|a_n|<1+|L|\), where we are using another property of the absolute value operator: \(\Big||x|-|y|\Big|\leq |x-y|,\forall x,y\in\mathbb{R}\). Let \(M=\max\lbrace|a_1|, |a_2|, ... , |a_{N-1}|,1+|L|\rbrace\). Then \(\forall n,|a_n|\leq M\) and \(\lbrace a_n\rbrace\) is bounded.

Limit Comparison Test

[4] Let \(\lim_{n\to\infty}\frac{a_n}{b_n}=L\) and \(0< a_n, b_n\) for \(n\) greater than or equal to some \(N\in\mathbb{N}\). If \(0<L<\infty\), then either both \(\sum_{n=0}^\infty a_n\) and \(\sum_{n=0}^\infty b_n\) converge or both diverge.

For large enough \(n\), \(a_n,b_n>0\) and \(\displaystyle\frac{L}{2}<\frac{a_n}{b_n}<\frac{3L}{2}\Rightarrow \frac{L}{2}b_n<a_n<\frac{3L}{2}b_n\). Therefore by direct comparison test either both \(\sum_{n=0}^\infty a_n\) and \(\sum_{n=0}^\infty b_n\) converge or both diverge.

If \(L=0\) then for large enough \(n\), \(a_n,b_n>0\) and \(\frac{a_n}{b_n}<1\Rightarrow 0<a_n<b_n\). It follows that if \(\sum_{n=0}^\infty b_n\) is convergent then \(\sum_{n=0}^\infty a_n\) is convergent.

If \(L=\infty\) then for large enough \(n\), \(a_n,b_n>0\) and \(1<\frac{a_n}{b_n}\Rightarrow 0<b_n<a_n\). It follows that if \(\sum_{n=0}^\infty b_n\) is divergent then \(\sum_{n=0}^\infty a_n\) is divergent.

Ratio Test

[4] Let \(a_n>0\) for all \(n\) and \(\lim_{n\to\infty}\frac{a_{n+1}}{a_n}=\rho\). If \(\rho<1\) then the series \(\sum_{n=n_0}^\infty a_n\) converges, if \(\rho>1\) then the series diverges and if \(\rho=1\) then the test is inconclusive. First, let us investigate the case when \(\rho<1\). Let \(\rho<r<1\). Then there exists \(N\in\mathbb{N}\) such that if \(n\geq N\) then \(\Big|\frac{a_{n+1}}{a_n}-\rho\Big| <r-\rho\Rightarrow \frac{a_{n+1}}{a_n} <r\). It follows that

\[a _{N+1}<ra_N\]
\[a _{N+2}<ra _{N+1}<r^2 a_N\]
\[\Rightarrow a _{N+k}<r^k a_N\]

Since \(|r|<1\), \(a_N\sum_{k=1}^\infty r^k\) is a convergent series and by the direct comparison test \(\sum_{k=N+1}^\infty a_k\) is a convergent series. Considering that \(\sum_{k=0}^N a_k\) is a finite value, we can conclude that \(\sum_{k=0}^\infty a_k\) is convergent when \(\rho<1\).

If \(\rho>1\) then for all large \(n\), \(0<a_n<a_{n+1}\) and \(\lbrace a_n \rbrace\) does not converge to zero which implies that in this case \(\sum_{n=0}^\infty a_n\) is divergent.

For both \(a_n=1/n\) and \(a_n=1/n^2\), \(\displaystyle\frac{a_{n+1}}{a_n}\to 1\), but the first of these series is divergent and the second one is convergent. Therefore, the test is inconclusive if \(\rho=1\).

Root Test

[4] Let \(\sqrt[n]{a_n}\to \rho\) and \(a_n\geq 0, \forall n\geq N\). If \(\rho<1\) then \(\sum_{n=0}^\infty a_n\) is convergent and if \(\rho>1\) then the series is divergent. In case of \(\rho=1\) the test is inconclusive. Suppose \(\rho<r<1\). For large enough \(n\), \(\sqrt[n]{a_n}-\rho<r-\rho\Rightarrow a_n<r^n\). Suppose \(a_n<r^n\) for \(n\geq K>N\) and compare the series \(\sum_{n=K}^\infty a_n\) and \(\sum_{n=K}^\infty r^n\). Since \(|r|<1\), the geometric series \(\sum_{n=K}^\infty r^n\) converges to \(r^K/(1-r)\), and by the direct comparison test \(\sum_{n=K}^\infty a_n\), and therefore \(\sum_{n=0}^\infty a_n\) (which differs from it only by the finite sum \(\sum_{n=0}^{K-1}a_n\)), is convergent. On the other hand if \(\rho>1\) then for all large \(n\), \((a_n)^{1/n}>1\Rightarrow a_n>1\), which means that \(a_n\) does not converge to 0 and therefore the series diverges. In order to prove the inconclusiveness of the test when \(\rho=1\), consider the series \(\sum_{n=1}^{\infty}1/n\) and \(\sum_{n=1}^{\infty}1/n^2\). In both cases \(\sqrt[n]{a_n}\to 1\) but the first series is divergent whereas the second is convergent.

Dirichlet Test

[2] Let \(a_k\to 0\) and let \(S_n=\sum_{k=0}^n b_k\) be a bounded sequence such that for every \(n\), \(|S_n|\leq B\). Furthermore, suppose the sequence \(\lbrace a_k\rbrace\) is of bounded variation, which means that \(\displaystyle\sum_{k=1}^\infty |a_{k+1}-a_k|\) is convergent. Then \(\displaystyle\sum_{k=1}^\infty a_kb_k\) is convergent.

Let \(\varepsilon>0\). By the Cauchy convergence criterion there exists \(N\) such that whenever \(m\geq n\geq N\), \(\sum_{k=n}^m|a_{k+1}-a_k|<\frac{\varepsilon}{3B}\). Also, whenever \(k\geq N\), \(|a_k|<\frac{\varepsilon}{3B}\).

Let \(m\geq n \geq N\). Using Abel’s lemma:

\[\begin{split}\Big|\sum_{k=n}^m a_kb_k \Big|&=\Big|\sum_{k=n}^ma_k(S_k-S_{k-1})\Big|\\ &=\Big| a_{m+1}S_m-a_nS_{n-1}-\sum_{k=n}^m (a_{k+1}-a_k)S_k\Big|\\ &\leq\Big| a_{m+1}S_m\Big|+\Big|a_nS_{n-1}\Big|+\Big|\sum_{k=n}^m (a_{k+1}-a_k)S_k\Big|\\ &\leq\Big| a_{m+1}\Big|\Big|S_m\Big|+\Big|a_n\Big|\Big|S_{n-1}\Big|+B\sum_{k=n}^m \Big|a_{k+1}-a_k\Big|\\ &<\frac{\varepsilon}{3B}B+\frac{\varepsilon}{3B}B+\frac{\varepsilon}{3B}B=\varepsilon\end{split}\]

Therefore \(\sum_{k=1}^\infty a_kb_k\) is convergent according to the Cauchy convergence criterion.

Cauchy convergence criterion

[1] This criterion says that a sequence is convergent if and only if it is a Cauchy sequence. A sequence is called a Cauchy sequence if for every \(\varepsilon>0\), there exists \(N\in\mathbb{N}\) such that whenever \(n,m\geq N\), \(|a_n-a_m|<\varepsilon\).

Suppose \(a_n\to L\) and let \(\varepsilon>0\). There exists \(N\in\mathbb{N}\) such that \(n,m\geq N\Rightarrow |a_n-L|<\varepsilon/2\) and \(|a_m-L|<\varepsilon/2\). Therefore \(|a_n-a_m|=|a_n-L+L-a_m|\leq |a_n-L|+|a_m-L|<\varepsilon/2+\varepsilon/2=\varepsilon\).

Conversely, if \(\lbrace a_n\rbrace\) is a Cauchy sequence, then first of all it is a bounded sequence. We know that there exists \(N\in\mathbb{N}\) such that \(n,m\geq N\Rightarrow|a_n-a_m|<1\Rightarrow |a_n|<1+|a_N|\). Let \(M=\max\lbrace|a_1|,|a_2|, ... ,|a_{N-1}|, 1+|a_N|\rbrace\). Then \(\lbrace a_n \rbrace\) is bounded by \(M\). Since \(\lbrace a_n \rbrace\) is bounded, it has a convergent subsequence \(a_{n_k}\to c\). Let \(\varepsilon >0\). For some \(N\), \(|a_n-a_m|\) is always less than \(\varepsilon/2\) if \(n,m\geq N\). Also there exists \(K>N\) such that if \(k\geq K\), then \(|a_{n_k}-c|<\varepsilon/2\). Let \(n\geq N\) and \(k\geq K\). Considering that \(n_k\geq k\), we obtain \(|a_n-c|=|a_n-a_{n_k}+a_{n_k}-c|\leq|a_n-a_{n_k}|+|a_{n_k}-c|<\varepsilon/2+\varepsilon/2=\varepsilon\). This proves that every Cauchy sequence is a convergent sequence.

Abel’s Lemma

[2] This lemma states that \(\sum_n^ma_k(b_{k+1}-b_k)=a_{m+1}b_{m+1}-a_nb_n-\sum_n^m(a_{k+1}-a_k)b_{k+1}\). This can be proven as follows:

\[\begin{split}\sum_n^m a_k (b _{k+1}-b_k)&=\sum_n^ma_k b _{k+1}-\sum_n^m a_k b_k \\ &=\sum_n^ma_k b _{k+1}-\sum _{n+1}^{m+1} a_k b_k +a _{m+1}b _{m+1}-a_nb_n \\ &=\sum_n^m \Big[a _k b _{k+1}-a _{k+1}b _{k+1}\Big]+a _{m+1}b _{m+1}-a_nb_n \\ &=\sum_n^m b _{k+1}\Big[ a_k-a _{k+1} \Big]+ a _{m+1}b _{m+1}-a_nb_n \\ &=a _{m+1}b _{m+1}-a_nb_n-\sum_n^m(a _{k+1}-a_k)b _{k+1}\end{split}\]

Radius of convergence

[2] For every power series there exists a value \(R\) called the radius of convergence such that \(0\leq R\leq \infty\). If \(|x-a|<R\) then the series \(\sum_{k=n_0}^\infty c_k(x-a)^k\) absolutely converges and if \(|x-a|>R\) then the series diverges.

Consider a convergent series \(\sum_{k=n_0}^\infty c_k(x_0-a)^k\) and let \(|x-a|<|x_0-a|\). For the sake of convenience let \(y=x-a\) and \(y_0=x_0-a\). Since \(\sum_{k=n_0}^\infty c_k{y_0}^k\) is convergent, there exists a real number \(M\) such that \(|c_k{y_0}^k|\leq M\) for all \(k\). Then \(|c_ky^k|=|c_k{y_0}^k|\displaystyle\frac{|c_ky^k|}{|c_k{y_0}^k|}\leq M \displaystyle\frac{|y|^k}{|y_0|^k}\). Since \(\Big|\displaystyle\frac{y}{y_0}\Big|<1\), the right hand side of the inequality is the general term of a convergent geometric series, and using the direct comparison test we obtain that \(\sum_{k=n_0}^\infty c_ky^k\) absolutely converges.

Let \(S=\lbrace r\geq 0:\sum_{k=n_0}^\infty c_kr^k \text{ is convergent}\rbrace\). If \(S\) is unbounded, then for every \(y\in \mathbb{R}\) there exists \(r\in S\) such that \(|y|<|r|\) and \(\sum_{k=n_0}^\infty c_ky^k\) is absolutely convergent. This means that the series is absolutely convergent for \(|y|<\infty\) or \(|x|<\infty\). If \(S\) is bounded then using the completeness axiom of the set of real numbers we know that it has a supremum. Let \(R=\sup S\) and \(|y|<R\). Then, there exists \(r\in S\) such that \(|y|<r\); otherwise \(|y|\) would be an upper bound of \(S\) smaller than \(R\). It follows that \(\sum_{k=n_0}^\infty c_ky^k\) is absolutely convergent when \(|y|<R\). This means that \(\sum_{k=n_0}^\infty c_k(x-a)^k\) is absolutely convergent when \(x\in(-R+a,R+a)\). As another possibility, suppose that \(R<|y|\). Then there exists some \(r\) such that \(R<r<|y|\). Assume that \(\sum_{k=n_0}^\infty c_ky^k\) is convergent. Then \(\sum_{k=n_0}^\infty c_k r^k\) must be absolutely convergent and therefore convergent, which means that \(r\) is in \(S\) and at the same time greater than the supremum of \(S\). This is a contradiction, therefore if \(|y|>R\) then \(\sum_{k=n_0}^\infty c_ky^k\) is divergent.

As an example we can analyze the series \(\displaystyle\sum_{k=2}^\infty\frac{x^k}{\log k}\). Using the ratio test:

\[\lim_{k\to\infty}\Big|\frac{a_{k+1}}{a_k}\Big|=\lim_{k\to\infty}\Big|\frac{x^{k+1}\log(k+1)}{x^k\log(k)}\Big|=|x|\lim_{k\to\infty}\frac{\log(k+1)}{\log(k)}=|x|\lim_{k\to\infty}\frac{1/(k+1)}{1/k}=|x|\]

Therefore the series absolutely converges when \(|x|<1\) and the radius of convergence is \(1\). When computing the limit in the above example which includes the logarithm function we resorted to L’Hospital’s rule.
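The behaviour on the two sides of the radius of convergence can be observed numerically. In the sketch below the evaluation points and cutoffs are arbitrary; it only illustrates the dichotomy:

    // Partial sums of x^k / log k (k from 2) inside and outside |x| = 1.
    function partialSum(x, N) {
      let s = 0;
      for (let k = 2; k <= N; k++) s += Math.pow(x, k) / Math.log(k);
      return s;
    }
    console.log(partialSum(0.9, 100), partialSum(0.9, 200)); // nearly equal: converging
    console.log(partialSum(1.1, 100), partialSum(1.1, 200)); // growing fast: diverging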

L’Hospital’s Rule

[7] Let \(f:(a,b)\to \mathbb{R}\) and \(g:(a,b)\to \mathbb{R}\), and let both functions be differentiable on \((a,b)\). Let \(\displaystyle\lim_{x\to a^+}\frac{f'(x)}{g'(x)}=A\in\mathbb{R}\). Choose \(p<A<q\) and \(\varepsilon>0\) such that \(A\in(p+\varepsilon,q-\varepsilon)\). Since \(f\) and \(g\) are differentiable on \((a,b)\), according to the Cauchy mean value theorem for any \(x,y\in (a,b)\) there exists \(\xi\in(x,y)\) such that \(\displaystyle \frac{f'(\xi)}{g'(\xi)}=\frac{f(x)-f(y)}{g(x)-g(y)}\).

Suppose that \(\lim_{x\to a^+}f(x)=\lim_{x\to a^+}g(x)=0\). Since \(f'/g'\) converges to \(A\) as \(x\) converges to \(a\), there exists a neighbourhood of \(a\) such that the intersection of that neighbourhood with \((a,b)\) is non-empty and for every \(x_0\) in this intersection \(f'(x_0)/g'(x_0)\in (p+\varepsilon,q-\varepsilon)\). Let’s call this intersection \((a,c)\) for some \(c\in(a,b)\). Let \(x,y\in(a,c)\). Then \(\displaystyle\frac{f(x)-f(y)}{g(x)-g(y)}\in(p+\varepsilon,q-\varepsilon)\). Furthermore, \(\displaystyle\lim_{x\to a^+}\frac{f(x)-f(y)}{g(x)-g(y)}=\frac{f(y)}{g(y)}\in[p+\varepsilon,q-\varepsilon]\), which means that for any neighbourhood \((p,q)\) of \(A\), there exists a neighbourhood of \(a\) whose intersection with \((a,b)\) is a non-empty set \((a,c)\) such that for every \(y\in(a,c)\), \(f(y)/g(y)\in (p,q)\). Therefore, \(\displaystyle\lim_{x\to a^+}\frac{f(x)}{g(x)}=A\).

Another case where L’Hospital’s rule can be applied is when \(g(x)\to\infty\) as \(x\to a^+\). Fix \(y\in(a,c)\). Since \(g(x)\to\infty\) as \(x\to a^+\), there exists \(c_1\in(a,c)\) such that for every \(x\in(a,c_1)\), \(g(x)>0\) and \(g(x)>g(y)\). Let \(x\in(a,c_1)\). Then, starting from

\[p+\varepsilon <\frac{f(x)-f(y)}{g(x)-g(y)}<q-\varepsilon\]
\[\Rightarrow (p+\varepsilon)\Big(1-\frac{g(y)}{g(x)}\Big)<\frac{f(x)}{g(x)}-\frac{f(y)}{g(x)}<(q-\varepsilon)\Big(1-\frac{g(y)}{g(x)}\Big)\]
\[\Rightarrow p+\varepsilon+\frac{1}{g(x)}(f(y)-(p+\varepsilon)g(y))<\frac{f(x)}{g(x)}<q-\varepsilon+\frac{1}{g(x)}(f(y)-(q-\varepsilon)g(y))\]

Since \(g(x)\to\infty\) as \(x\to a^+\), it is possible to choose \(x\) close enough to \(a\), and therefore \(g(x)\) large enough, such that \(\Big|\frac{1}{g(x)}(f(y)-(p+\varepsilon)g(y))\Big|<\varepsilon\) and \(\Big|\frac{1}{g(x)}(f(y)-(q-\varepsilon)g(y))\Big|<\varepsilon\). Let \(c_2\in(a,c_1)\) be such that every \(x\in(a,c_2)\) satisfies these conditions. It follows that \(x\in(a,c_2)\Rightarrow f(x)/g(x)\in (p,q)\) and \(\displaystyle\lim_{x\to a^+} f(x)/g(x)=A\).

Cauchy Mean Value Theorem

Let \(f\) and \(g\) be continuous on \([a,b]\) and differentiable on \((a,b)\). Then there exists \(\xi\in(a,b)\) such that \(\displaystyle\frac{f'(\xi)}{g'(\xi)}=\frac{f(b)-f(a)}{g(b)-g(a)}\). In order to prove this, we can define a function \(\phi\) as follows:

\[\phi(x)=(f(x)-f(a))(g(b)-g(a))-(g(x)-g(a))(f(b)-f(a))\]

Clearly, \(\phi(a)=\phi(b)=0\) and from Rolle’s theorem there exists \(\xi\in(a,b)\) such that \(\phi'(\xi)=f'(\xi)(g(b)-g(a))-g'(\xi)(f(b)-f(a))=0\Rightarrow \displaystyle\frac{f'(\xi)}{g'(\xi)}=\frac{f(b)-f(a)}{g(b)-g(a)}\).

Logarithm

The logarithm function is defined as

\[\log(x)=\int_1^x\frac{1}{t}dt\]

Using the fundamental theorem of calculus we can derive the following equality:

\[\log(xy)=\log(x)+\log(y),\quad x,y>0\]

Let \(xy=u\) for \(x,y>0\). Then \(\log(xy)=\log(u)=\int_1^u\frac{1}{t}dt\Rightarrow \frac{d}{dx}\log(xy)=\frac{d}{du}\log(u)\frac{du}{dx}\) by the chain rule. Since \(1/t\) is continuous at \(t=u\) we obtain \(\displaystyle\frac{d}{dx}\log(xy)=\frac{1}{xy}y=\frac{1}{x}\). The derivative of \(\log(x)\) with respect to \(x\) is also equal to \(\displaystyle\frac{1}{x}\). Therefore \(\log(xy)=\log(x)+C\) where \(C\) is a constant. Using \(\log(1)=0\) we obtain \(\log(1\cdot y)=\log(1)+C=C\Rightarrow C=\log(y)\Rightarrow \log(xy)=\log(x)+\log(y)\).
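Both the integral definition and the functional equation can be checked numerically in the console. The midpoint rule and the step count below are arbitrary choices for this sketch:

    // log(x) as the integral of 1/t from 1 to x, via the midpoint rule.
    function logIntegral(x, n = 1e5) {
      const dt = (x - 1) / n;
      let s = 0;
      for (let i = 0; i < n; i++) s += 1 / (1 + (i + 0.5) * dt);
      return s * dt;
    }
    console.log(logIntegral(2), Math.log(2));                     // both about 0.6931
    console.log(logIntegral(6), logIntegral(2) + logIntegral(3)); // log(2*3) = log 2 + log 3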

Using the above equality we obtain \(0=\log(1)=\log(x\cdot x^{-1})=\log(x)+\log(x^{-1})\Rightarrow \log(x^{-1})=-\log(x)\).

Clearly \(\log(x^1)=1\cdot \log(x)\). Let \(n\in\mathbb{N}\). If \(\log(x^{n})=n\cdot \log(x)\), then \(\log(x^{n+1})=\log(x^n)+\log(x)=(n+1)\log(x)\therefore\forall n\in\mathbb{N},\forall x>0, \log(x^n)=n\log(x)\) by induction.

\(\log(x^0)=0\cdot\log(x)\) and \(\log(x^{-n})=\log((x^{-1})^n)\). Since \(x^{-1}>0\), \(\log((x^{-1})^n)=n\cdot\log(x^{-1})=-n\log(x)\). Therefore for every integer \(m\in\mathbb{Z}\), \(\log(x^m)=m\log(x)\).

Let \(b^n=x\Rightarrow b=x^{1/n}>0\Rightarrow\log(x)=\log(b^n)=n\log(b)=n\log(x^{1/n})\)

\(\Rightarrow\log(x^{1/n})=\frac{1}{n}\log(x)\).

Let \(q\in\mathbb{Q}\) be any rational number. Then there exist an integer \(m\) and a positive integer \(n\) such that \(q=m/n\), and \(\log(x^q)=\log((x^{1/n})^m)=m\log(x^{1/n})=\frac{m}{n}\log(x)=q\log(x)\). Therefore for every rational number \(q\) and for every positive real number \(x\), \(\log(x^q)=q\log(x)\).

While showing that the series \(\displaystyle\sum_{k=2}^\infty\frac{x^k}{\log k}\) has the radius of convergence \(1\), we made use of \(\displaystyle\lim_{k\to\infty}\log k=\infty\). The limits of the \(\log\) function as its argument tends to \(+\infty\) and to \(0^+\) can be obtained as follows: Let \((n\log(x),+\infty)\) be any neighbourhood of \(+\infty\) where \(n\in\mathbb{N}\) and \(x>1\). There exists another neighbourhood \((x^n,+\infty)\) of \(+\infty\) such that \(\forall x_0\in(x^n,+\infty),\quad \log(x^n)=n\log(x)<\log(x_0)\) since \(\log(x)\) is a strictly increasing function. Therefore \(\log(x_0)\in (n\log(x),+\infty) \text{ and }\displaystyle\lim_{x\to+\infty} \log(x)=+\infty\).

Similarly, let \((-\infty,-n\log(x))\) be any neighbourhood of \(-\infty\) where \(n\in\mathbb{N}\) and \(x>1\). There exists a neighbourhood \((0,x^{-n})\) of \(0\) such that \(\forall x_0\in(0,x^{-n}),\quad \log(x_0)<\log(x^{-n})=-n\log(x)\) since \(\log(x)\) is a strictly increasing function. Therefore \(\log(x_0)\in (-\infty,-n\log(x)) \text{ and }\displaystyle\lim_{x\to 0^+} \log(x)=-\infty\).

In order to prove that the logarithm is a strictly increasing function we use the fact that its derivative is always positive. Let \((u,v)\subset(0,\infty)\). Since \(\log(x)\) is differentiable on \((u,v)\) and continuous on \([u,v]\), according to the mean value theorem there exists \(c\in(u,v)\) such that \(\displaystyle\frac{d}{dx}\log(x)\Big|_{x=c}=\frac{1}{c}=\frac{\log(v)-\log(u)}{v-u}>0\Rightarrow \log(u)<\log(v)\quad\forall (u,v)\subset (0,\infty)\).

It can also be proven that the range of the \(\log\) function is all of \(\mathbb{R}\). Since \(\log(x)\to+\infty\) as \(x\to+\infty\) and \(\log(x)\to-\infty\) as \(x\to 0^+\), \(\forall r\in\mathbb{R},\exists p,q\in(0,+\infty):\log(p)<r<\log(q)\). Therefore according to the Bolzano intermediate value theorem \(\exists x\in(p,q):\log(x)=r\).

Absolute value

[1]Some of the most significant properties of the absolute value can be proven as follows:

Let \(x,y\in\mathbb{R}\). Then \(-|x|\leq x\leq |x|\) and \(-|y|\leq y\leq |y| \Rightarrow -(|x|+|y|)\leq x+y\leq(|x|+|y|)\). Also, using \(|-y|=|y|\) we obtain \(|x\pm y|\leq |x|+|y|\).

\(|x|=|x+y-y|\leq|x+y|+|y|\Rightarrow |x|-|y|\leq|x+y|\). \(|y|=|y+x-x|\leq|x+y|+|x|\Rightarrow |y|-|x|\leq |x+y|\Rightarrow \Big||x|-|y|\Big|\leq|x\pm y|\leq|x|+|y|\).

Another property of the absolute value operator that we used in the section about the radius of convergence is that for any \(x,y\in \mathbb{R}\), \(|x|^y=|x^y|\). Using the representation of real numbers as complex numbers without imaginary part we obtain \(x=re^{i\theta}=|x|e^{i\theta}\) and \(|x^y|=||x|^ye^{iy\theta}|=||x|^y||\cos(y\theta)+i\sin(y\theta)|=||x|^y|\cdot 1=|x|^y\).

The Fundamental Theorem of Calculus

Let \(\int_a^bf(x)dx\) exist and let \(F:[a,b]\to\mathbb{R}\) be an antiderivative of \(f(x)\), which means that \(F'(x)=f(x), \forall x\in[a,b]\). Then the fundamental theorem of calculus states that \(F(b)-F(a)=\int_a^bf(x)dx\). In order to prove this, let \(P\) be any partition of \([a,b]\) so that \(P=\lbrace x_0=a,x_1,x_2,...,x_{n-1},x_n=b\rbrace\). Then \(F(b)-F(a)=\sum_{i=1}^n\big(F(x_i)-F(x_{i-1})\big)\). Since \(F(x)\) is differentiable on every subinterval \([x_{i-1},x_i]\), according to the mean value theorem, for every \(i\in\lbrace 1,...,n\rbrace,\exists c_i\in(x_{i-1},x_i)\) such that

\[F'(c_i)=f(c_i)=\frac{F(x_i)-F(x_{i-1})}{x_i-x_{i-1}}\]

Therefore \(F(b)-F(a)=\sum_{i=1}^nf(c_i)(x_i-x_{i-1})\) which is a Riemann sum of \(f\) with respect to \(P\). The lower sum \(L(P,f)\) and upper sum \(U(P,f)\) of \(f\) with respect to \(P\) are defined as

\[\begin{split}L(P,f)=\sum_{i=1}^nf(p_i)(x_i-x_{i-1}),f(p_i)=\inf\lbrace f(x):x\in[x_{i-1},x_i]\rbrace\\ U(P,f)=\sum_{i=1}^nf(q_i)(x_i-x_{i-1}),f(q_i)=\sup\lbrace f(x):x\in[x_{i-1},x_i]\rbrace\end{split}\]

Therefore \(L(P,f)\leq F(b)-F(a)\leq U(P,f)\). Since \(P\) was chosen arbitrarily, \(F(b)-F(a)\) is an upper bound for the set of all lower sums of \(f\) and a lower bound for the set of all upper sums of \(f\) on the interval \([a,b]\). Since \(\int_a^b f(x)dx\) exists, by definition the upper and lower integrals of \(f\) on \([a,b]\) must both be equal to \(\int_a^b f(x)dx\). The upper integral \(U(f)\) is the greatest lower bound of the set of all upper sums of \(f\) and the lower integral \(L(f)\) is the least upper bound of the set of all lower sums of \(f\).

\[\begin{split}L(f)=\sup\lbrace L(P,f):P\text{ partitions }[a,b]\rbrace\\ U(f)=\inf\lbrace U(P,f):P\text{ partitions }[a,b]\rbrace\end{split}\]

From the above definitions it follows that

\[L(f)\leq F(b)-F(a)\leq U(f)\Rightarrow \boxed{F(b)-F(a)=\int_a^b f(x)dx}\]
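Here is a numerical illustration of the boxed identity, with the arbitrary choices \(f(x)=\cos x\), \(F(x)=\sin x\) on \([0,\pi/2]\) and a fine uniform partition:

    // F(b) - F(a) equals the integral of f, with f = cos, F = sin,
    // approximated by a left Riemann sum.
    const n3 = 1e6, a3 = 0, b3 = Math.PI / 2, dx3 = (b3 - a3) / n3;
    let riemann = 0;
    for (let i = 0; i < n3; i++) riemann += Math.cos(a3 + i * dx3) * dx3;
    console.log(riemann, Math.sin(b3) - Math.sin(a3)); // both about 1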

According to the fundamental theorem of calculus if \(g:[a,b]\to\mathbb{R}\) is integrable on \([a,b]\), and \(G(x)=\displaystyle\int_a^xg(t)dt\) for any \(x\in[a,b]\), then \(G(x)\) is continuous on \([a,b]\). Also, if \(g\) is continuous at some \(c\in[a,b]\) then \(G'(c)=g(c)\). First of all, since \(g\) is integrable, it is also bounded by some \(M\in\mathbb{R}\). Let \(x,y\in[a,b]\) and \(x\neq y\). Consider \(|G(x)-G(y)|=|\int_x^yg(t)dt|\leq M|x-y|\Rightarrow \displaystyle\frac{|G(x)-G(y)|}{|x-y|}\leq M\) which proves that \(G\) is Lipschitz and therefore continuous on \([a,b]\).

Suppose that \(g\) is continuous at some \(c\in [a,b]\). Then for every \(\varepsilon >0\) there exists \(\delta >0\) such that if \(|x-c|<\delta\) then \(|g(x)-g(c)|<\varepsilon\). Let \(\varepsilon>0\) and choose \(x\) such that \(0<|x-c|<\delta\). Consider \(\displaystyle\Big|\frac{G(x)-G(c)}{x-c}-g(c)\Big|=\Big|\frac{1}{x-c}\int_c^x g(t)dt-g(c)\Big|=\Big|\frac{1}{x-c}\int_c^x[g(t)-g(c)]dt\Big|\). Since every \(t\) between \(c\) and \(x\) satisfies \(|t-c|<\delta\), we have \(|g(t)-g(c)|<\varepsilon\).

\[\Rightarrow \Big| \frac{G(x)-G(c)}{x-c}-g(c) \Big|<\frac{1}{|x-c|}\varepsilon |x-c|=\varepsilon\]
\[\therefore \lim_{x\to c}\frac{G(x)-G(c)}{x-c}=G'(c)=g(c)\]

It can also be proven that a function which is Lipschitz on an interval, is also uniformly continuous and therefore continuous on this interval. Assume that \(G\) is Lipschitz but not uniformly continuous on \([a,b]\). Then, there exists \(\varepsilon >0\) such that for all \(n\in\mathbb{N}\) there exist \(x_n,y_n\in[a,b]\) with \(|x_n-y_n|<1/n\) and \(|G(x_n)-G(y_n)|\geq \varepsilon\). Since \(G\) is Lipschitz, there exists \(M\in\mathbb{R}\) such that \(\displaystyle|\frac{G(x_n)-G(y_n)}{x_n-y_n}|\leq M\) for all \(n\). It follows that for large enough \(n\):

\[|G(x_n)-G(y_n)|\leq M|x_n-y_n|<\frac{M}{n}<\varepsilon\]

But our assumption was that \(|G(x_n)-G(y_n)|\geq \varepsilon\) for all \(n\). This contradiction proves that on some interval \([a,b]\) if a function is Lipschitz then it is uniformly continuous.

Differentiation Rules

While proving Taylor’s theorem we made use of the product rule and the chain rule of differentiation.

The Product Rule

The product rule was utilized while taking the derivative of \(\displaystyle\frac{f^{(k)}(x)}{k!}(b-x)^k\) with respect to \(x\). Let \(G(x)=f(x)g(x)\) where \(f'\) and \(g'\) both exist at some \(x=a\). Then the derivative of \(G(x)\) at \(x=a\) can be expressed as follows [1]:

\[\begin{split}G'(a)&=\lim_{x\to a} \frac{G(x)-G(a)}{x-a}\\ &=\lim_{x\to a}\frac{f(x)g(x)-f(a)g(x)+f(a)g(x)-f(a)g(a)}{x-a}\\ &=\lim_{x\to a}\Big[\frac{f(x)-f(a)}{x-a}g(x)+\frac{g(x)-g(a)}{x-a}f(a)\Big]\\ &=f'(a)g(a) +g'(a)f(a)\end{split}\]

This gives us the product rule of differentiation. The existence of \(f'(a)\) and \(g'(a)\) implies the continuity of \(f\) and \(g\) at \(x=a\), which is used in the last step of the above proof in order to obtain \(\displaystyle\lim_{x\to a}g(x)=g(a)\) and \(\displaystyle\lim_{x\to a}f(x)=f(a)\). This can be shown using the definition of the derivative as follows:

\[f(x)-f(a)=\frac{f(x)-f(a)}{x-a}(x-a)\Rightarrow f(x)=f(a)+\frac{f(x)-f(a)}{x-a}(x-a)\]
\[\begin{split}\Rightarrow \lim_{x\to a}f(x)&=\lim_{x\to a} f(a)+\lim_{x\to a}\frac{f(x)-f(a)}{x-a}(x-a)\\ &=f(a)+\lim_{x\to a}\frac{f(x)-f(a)}{x-a}\lim_{x\to a}(x-a)\\ &=f(a)+f'(a)\cdot 0=f(a)\end{split}\]

While proving the continuity of a function at a point where it is differentiable, we used the product rule of the limit operator which says that if f and g are two functions such that \(\displaystyle\lim_{x\to x_0}f(x)=F\) and \(\displaystyle\lim_{x\to x_0}g(x)=G\) then \(\displaystyle\lim_{x\to x_0}f(x)g(x)=FG\). The proof of that statement is as follows [3]: Since the limits exist, we know that for any \(\varepsilon>0\), there exist \(\delta_f\), \(\delta_g\) such that whenever \(|x-x_0|<\delta_f\), \(|f(x)-F|<\displaystyle\frac{\varepsilon}{2(1+|G|)}\) and whenever \(|x-x_0|<\delta_g\), \(|g(x)-G|<\displaystyle\frac{\varepsilon}{2(1+|F|)}\). Also for \(\varepsilon=1\) we know that there exists \(\delta_1\) such that whenever \(|x-x_0|<\delta_1\), \(|g(x)-G|<1\). Suppose that \(\varepsilon >0\) and \(\delta=\min \lbrace\delta_f,\delta_g,\delta_1\rbrace\). If \(|x-x_0|<\delta\), then we obtain:

\[\begin{split}|f(x)g(x)-FG|&=|f(x)g(x)-Fg(x)+Fg(x)-FG|= |g(x)(f(x)-F)+F(g(x)-G)|\\ &\leq |g(x)(f(x)-F)|+|F(g(x)-G)|=|g(x)|\cdot |f(x)-F|+|F|\cdot |g(x)-G|\\ &<|g(x)|\frac{\varepsilon}{2(1+|G|)}+(1+|F|)\frac{\varepsilon}{2(1+|F|)}\end{split}\]

At this point we need to show that \(|g(x)|<(1+|G|)\):

\[|g(x)|=|g(x)-G+G|\leq |g(x)-G|+|G| < 1+|G|\]

Therefore

\[|f(x)g(x)-FG|<(1+|G|)\frac{\varepsilon}{2(1+|G|)}+(1+|F|)\frac{\varepsilon}{2(1+|F|)}=\frac{\varepsilon}{2}+\frac{\varepsilon}{2}=\varepsilon\]

The Chain Rule

The chain rule of differentiation is applied in order to take the derivative of compound functions of the form \(f(g(x))\) or \(f\circ g(x)\) with respect to \(x\). If we equate \(g(x)\) to a variable \(u\), then the derivative of \(f(g(x))\) with respect to \(x\) is \(f'(u)g'(x)\). In order to prove this formula we can use the definition of the derivative as follows [4]: Let \(y=f(u)\), \(y_0=f(u_0)\), \(u_0=g(x_0)\), then

\[\frac{dy}{dx}\Big \rvert_{x=x_0}=\lim_{x\to x_0}\frac{y-y_0}{x-x_0}=\lim_{x\to x_0}\frac{y-y_0}{u-u_0}\frac{u-u_0}{x-x_0}\]

Using Taylor’s theorem, at any value of \(x\) and \(u\), \(f(u)\) and \(g(x)\) can be expressed as follows:

\[\begin{split}&f(u)=f(u_0)+f'(u_0)(u-u_0)+ ... +\frac{f^{(n)}(\xi)}{n!}(u-u_0)^n,\qquad \xi\in(u_0,u)\\ &f(u)-f(u_0)=f'(u_0)(u-u_0)+\varepsilon_1(u-u_0)\\ &\Rightarrow\frac{f(u)-f(u_0)}{u-u_0}(u-u_0)=(f'(u_0)+\varepsilon_1)(u-u_0)\end{split}\]
\[\begin{split}&g(x)=g(x_0)+g'(x_0)(x-x_0)+ ... + \frac{g^{(n)}(c)}{n!}(x-x_0)^n,\qquad c\in(x_0,x)\\ &g(x)-g(x_0)=g'(x_0)(x-x_0)+\varepsilon_2(x-x_0)\\ &\Rightarrow\frac{g(x)-g(x_0)}{x-x_0}(x-x_0)=(g'(x_0)+\varepsilon_2)(x-x_0)\end{split}\]

In the above expressions, after the first derivative of f and g, the remaining parts of the Taylor expansions are summarized as \(\varepsilon_1(u-u_0)\) and \(\varepsilon_2(x-x_0)\) respectively. Using the Taylor expansions it can be shown that \(\varepsilon_1\) and \(\varepsilon_2\) both converge to zero as \(x\) converges to \(x_0\):

\[\lim_{x\to x_0}\frac{g(x)-g(x_0)}{x-x_0}-g'(x_0)=\lim_{x\to x_0}\varepsilon_2=0\]
\[\lim_{x\to x_0}u-u_0=\lim_{x\to x_0}g(x)-g(x_0)=\lim_{x\to x_0}(g'(x_0)+\varepsilon_2)(x-x_0)=0\]
\[\lim_{x\to x_0}\frac{f(u)-f(u_0)}{u-u_0}-f'(u_0)=\lim_{u\to u_0}\frac{f(u)-f(u_0)}{u-u_0}-f'(u_0)=\lim_{u\to u_0}\varepsilon_1=0\]

Using this result the derivative of \(f(g(x))\) with respect to \(x\) is computed as follows:

\[\begin{split}y-y_0&=(f'(u_0)+\varepsilon_1)(u-u_0)\\ &=(f'(u_0)+\varepsilon_1)(g'(x_0)+\varepsilon_2)(x-x_0)\end{split}\]
\[\begin{split}\lim_{x\to x_0}\frac{y-y_0}{x-x_0}&=\lim_{x\to x_0}\Big[f'(u_0)\cdot g'(x_0)+\varepsilon_1\cdot g'(x_0)+\varepsilon_2\cdot f'(u_0)+\varepsilon_1 \cdot \varepsilon_2\Big]\\ &=f'(u_0)\cdot g'(x_0)=f'(g(x_0))\cdot g'(x_0)\end{split}\]

Another differentiation rule that we used while proving Taylor’s theorem is the rule for the derivative of a power. According to this rule, if a function has the form \(f(x)=x^n\), then its derivative with respect to \(x\) is \(nx^{n-1}\). There are two ways to prove this formula. The first one uses the binomial theorem. The derivative of \(f\) at some \(x=x_0\) is computed as \(\displaystyle\lim_{h\to 0}\displaystyle\frac{f(x_0+h)-f(x_0)}{h}\). Using the binomial expansion of \(f(x_0+h)\) we obtain

\[\begin{split}f'(x_0)&=\lim_{h\to 0}\frac{(x_0+h)^n-{x_0}^n}{h}\\ &=\lim_{h\to 0}\frac{\binom{n}{0}{x_0}^n+\binom{n}{1}{x_0}^{n-1}h+...+\binom{n}{n-1}x_0h^{n-1}+\binom{n}{n}h^n-{x_0}^n}{h}\\ &=n{x_0}^{n-1}+\lim_{h\to 0}h\Bigg(\binom{n}{2}{x_0}^{n-2}+\binom{n}{3}{x_0}^{n-3}h+\quad ...\quad+\binom{n}{n-1}x_0h^{n-2}+\binom{n}{n}h^{n-1}\Bigg)\\ &=\boxed{n{x_0}^{n-1}}\end{split}\]

The second way to prove the formula for the derivative of a power uses the following expansion

\[x^n-{x_0}^n=(x-x_0)(x^{n-1}+x_0x^{n-2}+{x_0}^2x^{n-3}+{x_0}^3x^{n-4}+\quad ...\quad +{x_0}^{n-2}x+{x_0}^{n-1})\]

The derivative of \(f\) at some \(x=x_0\) can also be computed as \(\displaystyle\lim_{x\to x_0}\displaystyle\frac{f(x)-f(x_0)}{x-x_0}\). Using the above expansion we obtain:

\[\begin{split}f'(x_0)&=\lim_{x\to x_0}\frac{f(x)-f(x_0)}{x-x_0}=\lim_{x\to x_0}\frac{x^n-{x_0}^n}{x-x_0}\\ &=\lim_{x\to x_0}(x^{n-1}+x_0x^{n-2}+{x_0}^2x^{n-3}+{x_0}^3x^{n-4}+ \quad ... \quad +{x_0}^{n-2}x+{x_0}^{n-1})\\ &=({x_0}^{n-1}+x_0{x_0}^{n-2}+{x_0}^2{x_0}^{n-3}+...+{x_0}^{n-2}x_0+{x_0}^{n-1})\\ &=\boxed{n{x_0}^{n-1}}\end{split}\]

The expansion used in the above proof can be obtained using the finite geometric series summation formula. This formula states that:

\[\sum_{k=0}^{n-1}r^k=\frac{1-r^n}{1-r},\quad r\neq 1\]

In the above formula let \(r=x_0/x\). If \(x=x_0\) then \(x^n-{x_0}^n=0\) and there is no need for an expansion formula. Suppose \(x\neq x_0\). Then \(\displaystyle\sum_{k=0}^{n-1}\Big(\frac{x_0}{x}\Big)^k=\displaystyle\frac{1-\Big(\displaystyle\frac{x_0}{x}\Big)^n}{1-\displaystyle\frac{x_0}{x}}\). Using this result we can write \(x^n-{x_0}^n\) in the following form:

\[\begin{split}x^n-{x_0}^n&=x^n\Big(1-(\frac{x_0}{x})^n\Big)=x^n\Big(1-\frac{x_0}{x}\Big)\sum_{k=0}^{n-1}(\frac{x_0}{x})^k\\ &=x\Big(1-\frac{x_0}{x}\Big)x^{n-1}\Big(1+\frac{x_0}{x}+(\frac{x_0}{x})^2+...+(\frac{x_0}{x})^{n-2}+(\frac{x_0}{x})^{n-1}\Big)\\ &=(x-x_0)(x^{n-1}+x_0x^{n-2}+{x_0}^2x^{n-3}+...+{x_0}^{n-2}x+{x_0}^{n-1})\end{split}\]
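A difference quotient evaluated in the console agrees with the formula; the exponent \(n=5\), the point \(x_0=1.3\) and the step size are arbitrary choices:

    // Difference quotient of f(x) = x^5 at x0 against the formula n*x0^(n-1).
    const x0 = 1.3, nPow = 5, h = 1e-7;
    const numeric = (Math.pow(x0 + h, nPow) - Math.pow(x0, nPow)) / h;
    console.log(numeric, nPow * Math.pow(x0, nPow - 1)); // both about 14.2805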

Binomial theorem

The binomial theorem states that for any \(a,b\in\mathbb{R}\) and \(n\in\mathbb{N}\) [1]:

\[\begin{split}(a+b)^n&=\binom{n}{0}a^n+\binom{n}{1}a^{n-1}b+\binom{n}{2}a^{n-2}b^2+\quad ...\quad \\ &+\binom{n}{n-2}a^2b^{n-2}+\binom{n}{n-1}ab^{n-1}+\binom{n}{n}b^n\end{split}\]

This can be inductively proven with the help of Pascal’s triangle theorem which states that

\[\binom{n}{k-1}+\binom{n}{k}=\binom{n+1}{k}\]

Pascal’s triangle theorem can be proven by inserting the definition of the binomial coefficient \(\binom{n}{k}\) in the above equation:

\[\begin{split}&\frac{n!}{(k-1)!(n-k+1)!}+\frac{n!}{k!(n-k)!}=\frac{n!}{(n-k)!(k-1)!}\Big[\frac{1}{n-k+1}+\frac{1}{k}\Big]\\ &=\frac{n!}{(n-k)!(k-1)!}\Big[\frac{n+1}{k(n-k+1)}\Big]=\frac{(n+1)!}{k!(n+1-k)!}=\binom{n+1}{k}\end{split}\]

For \(n=1\), \(\displaystyle\sum_{k=0}^1\displaystyle\binom{1}{k}a^{1-k}b^k=\displaystyle\binom{1}{0}a+\displaystyle\binom{1}{1}b=(a+b)^1\) and the binomial formula for \((a+b)^n\) is true. Suppose the formula is also true for some \(n\in\mathbb{N}\). Then

\[\begin{split}(a+b)^{n+1}&=(a+b)(a+b)^n=(a+b)\sum_{k=0}^n\binom{n}{k}a^{n-k}b^k\\ &=\Big[\binom{n+1}{0}a^{n+1}+\binom{n}{1}a^nb+\binom{n}{2}a^{n-1}b^2+\binom{n}{3}a^{n-2}b^3+... \\ &+\binom{n}{n-3}a^4b^{n-3}+\binom{n}{n-2}a^3b^{n-2}+\binom{n}{n-1}a^2b^{n-1}+\binom{n}{n}ab^n\Big]\\ &+\Big[\binom{n}{0}a^nb+\binom{n}{1}a^{n-1}b^2+\binom{n}{2}a^{n-2}b^3+\binom{n}{3}a^{n-3}b^4 + ... \\ &+ \binom{n}{n-3}a^3b^{n-2}+\binom{n}{n-2}a^2b^{n-1}+\binom{n}{n-1}ab^n+\binom{n+1}{n+1}b^{n+1}\Big]\\ &=\binom{n+1}{0}a^{n+1}+a^nb\Big[\binom{n}{1}+\binom{n}{0}\Big]+a^{n-1}b^2\Big[\binom{n}{1}+\binom{n}{2}\Big]\\ &+a^{n-2}b^3\Big[\binom{n}{3}+\binom{n}{2}\Big]+...+a^3b^{n-2}\Big[\binom{n}{n-2}+\binom{n}{n-3}\Big]\\ &+a^2b^{n-1}\Big[\binom{n}{n-1}+\binom{n}{n-2}\Big]+ab^n\Big[\binom{n}{n}+\binom{n}{n-1}\Big]+\binom{n+1}{n+1}b^{n+1}\\ &=\binom{n+1}{0}a^{n+1}+\binom{n+1}{1}a^nb+\binom{n+1}{2}a^{n-1}b^2+\binom{n+1}{3}a^{n-2}b^3 +... \\ &+\binom{n+1}{n-2}a^3b^{n-2}+\binom{n+1}{n-1}a^2b^{n-1}+\binom{n+1}{n}ab^n+\binom{n+1}{n+1}b^{n+1}\\ &=\sum_{k=0}^{n+1}\binom{n+1}{k}a^{n+1-k}b^k\end{split}\]

which shows that the formula is also true for \(n+1\) if it is true for \(n\). This completes the proof of the binomial theorem.

The binomial theorem is also one of the reasons why \(0^0\) was defined as equal to \(1\) by mathematicians. Consider the following expansion [5]:

\[\begin{split}(0+x)^n&=\binom{n}{0}0^nx^0+\binom{n}{1}0^{n-1}x+...+\binom{n}{n-1}0^1x^{n-1}+\binom{n}{n}0^0x^n\\ &=x^n\end{split}\]

If \(0^0\) were undefined or defined as zero, then the binomial theorem would yield \(x^n=0^0x^n=0\) or leave \(x^n\) undefined.
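JavaScript follows the same convention, which can be verified directly in the console:

    // Both the exponentiation operator and Math.pow define 0^0 as 1,
    // so the expansion of (0 + x)^n collapses to x^n as expected.
    console.log(0 ** 0, Math.pow(0, 0)); // 1 1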

Weierstrass maximum-minimum theorem

While proving Rolle’s theorem we made use of Weierstrass’ maximum-minimum theorem, which states that if a function is continuous on a closed interval \([a,b]\), then this function has a maximum and a minimum value on \([a,b]\). We can start the proof of Weierstrass’ maximum-minimum theorem by showing that the continuity of \(f:[a,b]\to\mathbb{R}\) on \([a,b]\) implies its boundedness on \([a,b]\). This can be proven by contradiction. Assume that \(f:[a,b]\to\mathbb{R}\) is continuous but not bounded. Then for any \(n\in\mathbb{N}\) there must be \(x_n\in [a,b]\) such that \(\lvert f(x_n)\rvert > n\). Obviously, \(\lbrace x_n \rbrace\) is a sequence bounded by \(a\) and \(b\). From the boundedness of \(\lbrace x_n \rbrace\) it follows that \(\lbrace x_n \rbrace\) has a convergent subsequence \(\lbrace x_{n_k} \rbrace\) such that \(x_{n_k}\to c\in [a,b]\). Since \(f\) is a continuous function, \(f(x_{n_k})\to f(c)\). This means that for any real number \(\varepsilon > 0\), there exists \(\delta>0\) such that if \(\lvert x_{n_k}-c \rvert < \delta\) then \(\lvert f(x_{n_k})-f(c)\rvert < \varepsilon\) and \(\lvert f(x_{n_k})\rvert < \varepsilon + \lvert f(c) \rvert\). Since \(\lbrace x_{n_k} \rbrace\) converges to \(c\), it is possible to choose \(k\) large enough so that \(\lvert x_{n_k}-c \rvert <\delta\) and \(\varepsilon+\lvert f(c) \rvert < n_k\). But in this case we obtain \(\lvert f(x_{n_k})\rvert < n_k\), which is in contradiction with our initial assumption that \(\lvert f(x_n)\rvert >n\) for every \(n\in\mathbb{N}\). This proves the boundedness of \(f:[a,b]\to\mathbb{R}\).

As a result, \(f\) has a supremum \(S\) on \([a,b]\). Using the definition of the supremum, we know that for every \(n\in\mathbb{N}\) there exists \(x_n \in [a,b]\) such that \(S-1/n < f(x_n) \leq S\), from which \(f(x_n)\to S\) follows. This gives us another bounded sequence \(\lbrace x_n \rbrace\) with a convergent subsequence \(x_{n_k}\to c\) in \([a,b]\) and \(f(x _{n_k})\to f(c)\). Since \(f(x _{n_k})\) is a subsequence of \(f(x_n)\), these two sequences have to converge to the same limit, so \(f(c)=S\). Since \(c\in[a,b]\) and \(\forall x\in[a,b]\), \(f(x)\leq f(c)\), this completes the proof of the maximum part of Weierstrass’ maximum-minimum theorem. The minimum part can be proven in the same way.

Combining the Weierstrass maximum-minimum theorem with the integrability of a continuous function, we obtain the mean value theorem for integrals.

Mean Value Theorem for Integrals

Let \(f\) be continuous on \([a,b]\). Then \(\exists p,q\in[a,b]:f(p)=\inf\lbrace f(x):x\in[a,b]\rbrace,f(q)=\sup\lbrace f(x):x\in[a,b]\rbrace\). It follows that

\[f(p)(b-a)\leq\int_a^bf\leq f(q)(b-a)\]
\[\Rightarrow f(p)\leq \frac{1}{b-a}\int_a^bf\leq f(q)\]

Therefore, according to the Bolzano intermediate value theorem (applied to \(f(x)-\frac{1}{b-a}\int_a^bf\)), \(\exists c\in[a,b]:f(c)=\frac{1}{b-a}\int_a^bf\Rightarrow \int_a^bf=f(c)(b-a)\).

Bolzano intermediate value theorem

This theorem states that if \(f\) is continuous on \([a,b]\) and \(f(a)<0<f(b)\) then \(\exists c\in(a,b):f(c)=0\).

Consider \(\displaystyle f\Bigg(\frac{a+b}{2}\Bigg)\). If \(\displaystyle f\Bigg(\frac{a+b}{2}\Bigg)=0\) then we found \(c\). Otherwise, if \(\displaystyle f\Bigg(\frac{a+b}{2}\Bigg)<0\) let \(a_1=(a+b)/2, b_1=b\) and consider \(\displaystyle f\Bigg(\frac{a_1+b_1}{2}\Bigg)\). If \(\displaystyle f\Bigg(\frac{a_1+b_1}{2}\Bigg)=0\) then we found \(c\). Otherwise, if \(\displaystyle f\Bigg(\frac{a_1+b_1}{2}\Bigg)>0\) let \(b_2=(a_1+b_1)/2, a_2=a_1\) and consider \(\displaystyle f\Bigg(\frac{a_2+b_2}{2}\Bigg)\). Continuing this way we either find \(c\) after a finite number of updates or we get a monotone increasing sequence \(\lbrace a_n\rbrace\) and a monotone decreasing sequence \(\lbrace b_n\rbrace\). Since these sequences are also bounded, they are both convergent. Also, because \(b_n=a_n+\displaystyle\frac{b-a}{2^n}\), they converge to the same limit \(c\in[a,b]\). Due to the continuity of \(f\) on \([a,b]\) it follows that \(f(a_n)\to f(c),f(b_n)\to f(c)\). Furthermore \(\forall n,f(a_n)<0, f(b_n)>0\) which implies that \(0\leq f(c)\leq 0\Rightarrow \boxed{f(c)=0}\).
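The proof is constructive: it is exactly the bisection method. Below is a minimal sketch; the tolerance and the example function are arbitrary choices:

    // Bisection: find c with f(c) near 0, given f continuous and f(a) < 0 < f(b).
    function bisect(f, a, b, tol = 1e-12) {
      while (b - a > tol) {
        const mid = (a + b) / 2;
        if (f(mid) === 0) return mid;   // found c exactly
        if (f(mid) < 0) a = mid; else b = mid;
      }
      return (a + b) / 2;
    }
    console.log(bisect(x => x * x - 2, 0, 2)); // about 1.41421356..., the square root of 2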

In the proof of the Weierstrass’ maximum-minimum theorem we made use of several facts without showing why they are true. The first one of these facts is that any bounded sequence has a convergent subsequence (Bolzano-Weierstrass theorem).

Every bounded sequence has a convergent subsequence (Bolzano-Weierstrass)

Let \(\lbrace x_n \rbrace\) be any real valued sequence. We can call \(x_p\) a peak value of \(\lbrace x_n\rbrace\) if for all \(k\in\mathbb{N}\), \(x_{p+k}\leq x_p\). Then \(\lbrace x_n \rbrace\) has either an infinite number of peak values or only a finite number of them. In case of infinitely many peak values, for any \(k\in\mathbb{N}\) there exists a peak value \(x_{n_k}\), and these peak values build a monotone decreasing subsequence \(\lbrace x_{n_k} \rbrace\). In case of a finite number of peak values, let \(x_N\) be the last of them and let \(n_1 > N\). Then, \(x_{n_1}\) is not a peak value and therefore there exists \(x_{n_2}\) such that \(x_{n_1} \leq x_{n_2}\). Also, for any \(k\in\mathbb{N}\), there exist \(x_{n_k}\) and \(x_{n_{k+1}}\) such that \(n_k >N\) and \(x_{n_k} \leq x_{n_{k+1}}\). Therefore, a monotone increasing subsequence \(\lbrace x_{n_k} \rbrace\) of \(\lbrace x_n \rbrace\) can be built using these non-peak values with indices greater than \(N\). It follows that any real valued sequence has a monotone subsequence. It can also be shown that if a monotone sequence is bounded, then it is convergent. Now suppose that \(\lbrace x_n \rbrace\) is a real-valued and bounded sequence and \(\lbrace x_{n_k} \rbrace\) is its monotone increasing subsequence. Then \(\lbrace x_{n_k} \rbrace\) is also bounded. Let \(S\) be the supremum of \(\lbrace x_{n_k} \rbrace\). Then, for every \(\varepsilon >0\), there exists \(K\in\mathbb{N}\) such that \(S-\varepsilon < x_{n_K} \leq S\). Since \(\lbrace x_{n_k} \rbrace\) is an increasing sequence, \(\forall k>K\), \(S-\varepsilon < x_{n_K}\leq x_{n_k} \leq S\), from which we obtain \(\lvert x_{n_k}-S \rvert <\varepsilon\). This completes the proof that the monotone subsequence of a bounded sequence is convergent and therefore every bounded sequence has a convergent subsequence.

The next fact that we used in the proof of Weierstrass’ maximum-minimum theorem is that if a convergent sequence \(a_n \to L\) is in \([A,B]\) then its limit \(L\) is also in \([A,B]\). We can start the proof of this fact by first proving that the limit of a non-negative convergent sequence \(a_n \to L\) is also non-negative. Clearly, for any \(\varepsilon > 0\), there exists \(N_{\varepsilon}\in\mathbb{N}\) such that \(n>N_{\varepsilon}\) implies \(\lvert a_n - L \rvert <\varepsilon\). If we assume a negative limit then we obtain \(a_n-L <\varepsilon \Rightarrow a_n <\varepsilon + L\). However we could choose \(\varepsilon\) small enough such that \(\varepsilon <\lvert L \rvert\). Then we would obtain \(a_n <\varepsilon +L <0\), which is a contradiction. Therefore the limit of a non-negative convergent sequence must be non-negative. The next step in the proof is to observe the behaviour of the non-negative sequences \(\lbrace a_n-A \rbrace\) and \(\lbrace B-a_n \rbrace\). Clearly, \(a_n-A \to L-A\geq 0\Rightarrow A \leq L\) and \(B-a_n\to B-L \geq 0 \Rightarrow L\leq B\). It follows that \(L\in [A,B]\).

In the proof of Weierstrass’ maximum-minimum theorem we also used the fact that a sequence is convergent with a limit if and only if each of its subsequences is convergent with the same limit. In order to prove this let \(x_n\to L\). Then for any \(\varepsilon >0\) there exists \(N_{\varepsilon}\) such that \(n>N_{\varepsilon}\) implies \(\lvert x_n-L\rvert<\varepsilon\). Then let \(\lbrace x_{n_k}\rbrace\) be any subsequence of \(\lbrace x_n \rbrace\). For every \(k>N_{\varepsilon}\) we know that \(n_k\geq k> N_{\varepsilon}\) and \(\lvert x_{n_k}-L\rvert<\varepsilon\) and therefore \(x_{n_k}\to L\). Conversely, if any subsequence of \(\lbrace x_n \rbrace\) converges to \(L\), then since \(\lbrace x_n\rbrace\) is a subsequence of itself \(x_n\to L\).

A continuous function is integrable

Another place where Weierstrass’ maximum-minimum theorem can be used is in the proof of the integrability of a continuous function. While proving the Weierstrass’ maximum-minimum theorem, we made use of the boundedness of a continuous function. A further implication of continuity is that a function \(f\) which is continuous on an interval \([a,b]\subset\mathbb{R}\) is integrable on \([a,b]\). In order to prove this, we use the fact that \(f\) is also uniformly continuous on \([a,b]\). Suppose \(\varepsilon >0\); then \(\exists \delta >0\) such that for any \(x,y\) with \(|x-y|<\delta\), \(|f(x)-f(y)|<\varepsilon / (b-a)\). We can choose a partition \(P=\lbrace x_0,x_1, ... , x_n\rbrace\) of \([a,b]\) such that for any \(i\in\lbrace 1,...,n\rbrace\), \(|x_i-x_{i-1}|<\delta\). Since \(f\) is continuous on every interval \([x_{i-1},x_i]\), according to Weierstrass’ maximum-minimum theorem on each one of these intervals there exist \(p_i,q_i\in[x_{i-1},x_i]\) such that \(f(p_i)=\inf\lbrace f(x):x\in[x_{i-1},x_i]\rbrace\) and \(f(q_i)=\sup\lbrace f(x):x\in[x_{i-1},x_i]\rbrace\). Furthermore, since \(|q_i-p_i|\) is always less than \(\delta\), \(|f(q_i)-f(p_i)|\) is always less than \(\varepsilon/(b-a)\). Now, \(U(P,f)-L(P,f)\) can be computed as follows:

\[\begin{split}U(P,f)-L(P,f)&=\sum_{i=1}^{n}(f(q_i)-f(p_i))(x_i-x_{i-1})\\ &<\frac{\varepsilon}{b-a}\sum_{i=1}^{n}(x_i-x_{i-1})\\ &=\frac{\varepsilon}{b-a}(b-a)=\varepsilon\end{split}\]

Therefore, according to the Cauchy criterion for integrability, \(\int_a^bf(x)dx\) exists. The definitions of \(U(P,f)\) and \(L(P,f)\) can be found in the section about the fundamental theorem of calculus.

In order to prove that if \(f\) is continuous on \([a,b]\) then it is uniformly continuous on \([a,b]\), we assume the contrary: \(\exists \varepsilon>0 : \forall n\, \exists x_n,y_n\in[a,b]: |x_n-y_n|<1/n \text{ and }|f(x_n)-f(y_n)|\geq\varepsilon\). Then \(\lbrace x_n \rbrace,\lbrace y_n \rbrace\) have convergent subsequences \(\lbrace x_{n_k} \rbrace,\lbrace y_{n_k} \rbrace\) with \(|x_{n_k}-y_{n_k}|<1/n_k\) for all \(k\). It follows that \(x_{n_k}\to c\in[a,b],y_{n_k}\to c\in[a,b]\Rightarrow f(x_{n_k})\to f(c),f(y_{n_k})\to f(c)\). Therefore for large enough \(k\), \(|f(x_{n_k})-f(y_{n_k})|<\varepsilon\). This contradiction completes the proof.

Cauchy criterion for integrability

According to this criterion a function \(f\) is integrable on an interval \([a,b]\) if and only if for every \(\varepsilon >0\) there exists a partition \(P\) of \([a,b]\) such that \(U(P,f)-L(P,f)<\varepsilon\).

If \(\int_a^b f=\alpha\) then there exists a sequence of partitions \(\lbrace P_n \rbrace\) such that \(U(P_n,f)\to\alpha\) and \(L(P_n,f)\to\alpha\). Then \(U(P_n,f)-L(P_n,f)\to 0\) and for every \(\varepsilon>0\), for large enough \(n\), \(U(P_n,f)-L(P_n,f)<\varepsilon\).

Conversely, if for every \(\varepsilon>0\) there exists \(P_{\varepsilon}\) such that \(U(P_{\varepsilon},f)-L(P_{\varepsilon},f)<\varepsilon\) then \(0\leq U(f)-L(f)<\varepsilon\) for every positive \(\varepsilon\) which implies that \(U(f)=L(f)=\int_a^b f\).

In the proof of the Cauchy integrability criterion we used the fact that if \(f\) is integrable on \([a,b]\) then there exists a sequence of partitions \(\lbrace P_n \rbrace\) such that \(U(P_n,f)\to\alpha\) and \(L(P_n,f)\to \alpha\).

If \(f\) is integrable on \([a,b]\) then \(U(f)=L(f)=\alpha\), from which it follows that for every \(n\in\mathbb{N}\) there exist partitions \(Q_n,R_n\) of \([a,b]\) and their common refinement \(P_n=Q_n\cup R_n\) such that

\[\alpha-\frac{1}{n}<L(Q_n,f)\leq L(P_n,f)\leq U(P_n,f)\leq U(R_n,f)<\alpha+\frac{1}{n}\]
\[\Rightarrow |L(P_n,f)-\alpha|<\frac{1}{n},\quad |U(P_n,f)-\alpha|<\frac{1}{n} \Rightarrow L(P_n,f)\to\alpha, U(P_n,f)\to\alpha\]

Conversely, if there exists a sequence of partitions \(\lbrace P_n \rbrace\) such that \(U(P_n,f)\to\alpha\) and \(L(P_n,f)\to \alpha\), then

\[\alpha\leq L(f)\leq U(f)\leq\alpha\Rightarrow L(f)=U(f)=\alpha=\int_a^bf\]

Assume that \(L(P_n,f)\to\alpha\) and \(L(f)<\alpha\). Then for large enough \(n\), \(|L(P_n,f)-\alpha|<\alpha-L(f)\). It follows that \(L(f)-\alpha <L(P_n,f)-\alpha\) and \(L(P_n,f)\) is greater than the least upper bound of lower sums of \(f\) which is a contradiction. \(U(f)\leq \alpha\) can be proven similarly.

In order to prove that \(L(f)\leq U(f)\), let \(Q,R\) be any partitions and let \(P=Q\cup R\). Then \(L(Q,f)\leq L(P,f)\leq U(P,f)\leq U(R,f)\). Therefore any lower sum is less than or equal to any upper sum. In other words any lower sum is a lower bound for the set of all upper sums. Since \(U(f)\) is the greatest lower bound for the set of all upper sums, we have \(L(P,f)\leq U(f)\). Since \(P\) could be any partition, it follows that \(U(f)\) is an upper bound for the set of all lower sums. Since \(L(f)\) is the least upper bound for the set of all lower sums, \(L(f)\leq U(f)\) follows.

In the above proofs we frequently used the fact that refining a partition increases lower sums and decreases upper sums. The two statements can be proven in the same way, so we only treat the lower sums. In order to prove the increase of lower sums we can insert an additional point \(p\) into the partition \(P\) and call the refined partition \(P'\), such that

\[P'=\lbrace x_0,x_1, ... , x _{k-1}, p, x_k, x _{k+1}, ..., x_n\rbrace\]

Let \(m'=\inf\lbrace f(x):x\in[x _{k-1}, p] \rbrace\), \(m''=\inf\lbrace f(x):x\in[p, x_k] \rbrace\), \(m_i=\inf\lbrace f(x):x\in[x_{i-1}, x_i] \rbrace\). It follows that \(m'\geq m_k\) and \(m''\geq m_k\). Therefore

\[\begin{split}L(P',f)&=\sum _{i=1}^{k-1}m_i(x_i-x _{i-1})+m'(p-x _{k-1})+m''(x_k-p)+\sum _{i=k+1}^n m_i(x_i-x _{i-1})\\ &\geq \sum _{i=1}^{k-1}m_i(x_i-x _{i-1})+m_k(p-x _{k-1})+m_k(x_k-p)+\sum _{i=k+1}^n m_i(x_i-x _{i-1})\\ &=\sum _{i=1}^{k-1}m_i(x_i-x _{i-1})+m_k(x_k-x _{k-1})+\sum _{i=k+1}^n m_i(x_i-x _{i-1})\\ &=L(P,f)\end{split}\]

References

[1] Muldowney, James S.; “Mathematics 117 Lecture Notes”, University of Alberta

[2] Bowman, John C.; “Math 117/118 Honours Calculus Lecture Notes”, University of Alberta

[3] http://planetmath.org/proofoflimitruleofproduct

[4] Thomas’ Calculus, 12th edition.

[5] http://www.askamathematician.com/2010/12/q-what-does-00-zero-raised-to-the-zeroth-power-equal-why-do-mathematicians-and-high-school-teachers-disagree/

[6] Spivak, M. (1965); “Calculus on Manifolds”, ISBN 0-8053-9021-9

[7] Rudin, W. (1976); “Principles of Mathematical Analysis”, ISBN 0-07-054235-X