Calculus and Linear Algebra

1 - NATURAL NUMBERS

Natural numbers are used for counting and ordering

ℕ := {0,1,2,3,...}

Some definitions, including the standard ISO 80000-2, begin the natural numbers with 0, corresponding to the non-negative integers 0, 1, 2, 3, ..., sometimes collectively denoted by the symbol N₀, to emphasize that zero is included, whereas others start with 1, corresponding to the positive integers 1, 2, 3, ..., sometimes collectively denoted by the symbol N₁, N⁺, or N^* for emphasizing that zero is excluded

Peano axioms: 1) zero (0) is a natural number; 2) to every natural number n is associated a natural number σ(n) different from 0, called the successor of n, and different natural numbers have different successors; 3) if a set A of natural numbers contains 0 and contains the successor of each of its elements, then A = ℕ

Starting from the notion of successor we can define addition, and on the basis of addition the multiplication of natural numbers

Adding 1 to a natural number means passing from n to the successor of n

n+0 := n; n+1 = n+σ(0) := σ(n+0) = σ(n); n+2 = n+σ(1) := σ(n+1) = σ(σ(n))
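The recursive definition of addition above can be sketched in Python. This is only an illustration: natural numbers are modelled as non-negative `int` values, and the names `succ`, `add` and `mul` are hypothetical. The multiplication rule used here (n⋅0 := 0, n⋅σ(m) := n⋅m + n) is the usual recursive definition and is an assumption, since the text does not spell it out.

```python
def succ(n: int) -> int:
    """Successor σ(n) of a natural number n (modelled as a non-negative int)."""
    return n + 1


def add(n: int, m: int) -> int:
    """Addition built only on the successor: n+0 := n, n+σ(m) := σ(n+m)."""
    if m == 0:
        return n
    # m is the successor of m-1, so n + m = σ(n + (m-1))
    return succ(add(n, m - 1))


def mul(n: int, m: int) -> int:
    """Multiplication built on addition (assumed rule): n⋅0 := 0, n⋅σ(m) := n⋅m + n."""
    if m == 0:
        return 0
    return add(mul(n, m - 1), n)


print(add(3, 2))  # σ(σ(3))
print(mul(3, 4))
```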

Commutative property: a binary operation is commutative if changing the order of the operands does not change the result; addition is commutative, a+b = b+a; multiplication is commutative, a⋅b = b⋅a; subtraction and division are not commutative, for example 3−5 ≠ 5−3 and 7/5 ≠ 5/7

Associative property: the associative property is a property of some binary operations, which means that rearranging the parentheses in an expression will not change the result; (a+b)+c = a+(b+c); (a⋅b)⋅c = a⋅(b⋅c)

The neutral element of the addition is the number 0

The neutral element of multiplication is the number 1

Distributive property: a(b+c) = ab+ac

The mathematical induction or induction principle is a mathematical proof technique; it is essentially used to prove that a statement P(n) holds for every natural number n = 0, 1, 2, 3, ...; a proof by induction consists of two cases, the first, the base case or base, proves the statement for n = 0 without assuming any knowledge of other cases, the second case, the induction step, proves that if the statement holds for any given case n = k, then it must also hold for the next case n = k + 1; these two steps establish that the statement holds for every natural number n; the base case does not necessarily begin with n = 0, but often with n = 1, and possibly with any fixed natural number n = N

The induction principle asserts that if P(0) is true, and if P(n) ⇒ P(n+1) ∀ n, then P(n) is true for every n ∈ ℕ

A ⊆ ℕ; 0 ∈ A; n ∈ A ⇒ n+1 ∈ A

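The sum of the natural numbers from 0 to n is n(n+1)/2: 0+1+2+...+n = n(n+1)/2; base case P(0): 0 = 0, true; induction step: assuming P(n), 0+1+...+n+(n+1) = (n(n+1)/2)+(n+1) = (n(n+1)+2(n+1))/2 = (n+1)(n+2)/2, which is P(n+1), so P(n) ⇒ P(n+1); therefore P(n) is true for every n ∈ ℕ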
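A numerical check is not a proof, but it is a quick way to catch a mistake in a closed formula before attempting the induction. The following sketch compares the term-by-term sum with n(n+1)/2; the function names `sum_to` and `closed_form` are illustrative.

```python
def sum_to(n: int) -> int:
    """Left-hand side: 0 + 1 + ... + n, added term by term."""
    return sum(range(n + 1))


def closed_form(n: int) -> int:
    """Right-hand side: n(n+1)/2 (the product is always even, so // is exact)."""
    return n * (n + 1) // 2


# Compare both sides for the first hundred natural numbers.
for n in range(100):
    assert sum_to(n) == closed_form(n)
print(closed_form(10))
```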

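Calculate into how many parts r(n) a plane is divided by n straight lines in general position (no two parallel, no three through the same point); r(0) = 1; the (n+1)-th line crosses the previous n lines and adds n+1 parts, r(n+1) = r(n)+(n+1); r(n) = r(n-1)+n; r(n) = 1+n(n+1)/2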
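The recurrence for the number of parts can be unrolled directly and compared with the closed formula. This is a sketch under the assumption that the lines are in general position and that the plane with no lines counts as one part; the name `regions` is illustrative.

```python
def regions(n: int) -> int:
    """Number of parts of the plane cut by n lines, via r(k) = r(k-1) + k."""
    r = 1  # assumption: with no lines the plane is a single part
    for k in range(1, n + 1):
        r += k  # the k-th line adds k new parts
    return r


for n in range(6):
    # recurrence value next to the closed formula 1 + n(n+1)/2
    print(n, regions(n), 1 + n * (n + 1) // 2)
```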

a⁰ := 1; aⁿ⁺¹ := a⋅aⁿ; 0! := 1; (n+1)! := (n+1)n!

2 - COMBINATORIAL CALCULUS

a² := a⋅a; aⁿ := a⋅a⋅...⋅a for n factors; aⁿ⁺¹ = a⋅aⁿ; a⋅a = a² = a⋅a¹ ⇒ a¹ = a⋅a/a = a; a = a¹ = a⋅a⁰ ⇒ a⁰ = a/a = 1; a^-n = 1/aⁿ

n! := 1⋅...⋅n; (n+1)! = (n+1)⋅n!; 1 = 1! = 1⋅0! ⇒ 0! := 1

The Cartesian product of two sets A and B, denoted A × B, is the set of all ordered pairs (a, b) where a is in A and b is in B

A₁ = {a,b,c}, A₂ = {1,2}; A₁ × A₂ = {(a,1), (a,2), (b,1), (b,2), (c,1), (c,2)}; the number of pairs formed is n₁⋅n₂ = 3⋅2 = 6

Ordered lists = dispositions

Dispositions with repetitions of n elements in groups of k elements: ^(r)D_n,k = n^k

A = {a,b,c}, n = 3, k = 2; (a,a), (a,b), (a,c), (b,a), (b,b), (b,c), (c,a), (c,b), (c,c); ^(r)D_3,2 = 3² = 9 dispositions with repetitions of 3 elements in groups of 2 elements

A byte is composed of 8 bits, a bit is 0 or 1, so a byte can assume 256 different values; {0,1}, n = 2, k = 8; ^(r)D_2,8 = 2⁸ = 256

Simple dispositions, dispositions with no repetitions of n elements in groups of k elements: D_n,k = n(n-1)...(n-k+1); k ≤ n

A = {a,b,c}, n = 3, k = 2; (a,b), (a,c), (b,a), (b,c), (c,a), (c,b); D_3,2 = 3⋅2 = 6 dispositions with no repetitions of 3 elements in groups of 2 elements

Permutations, simple dispositions of n elements in groups of n elements: P_n = D_n,n = n(n-1)...2⋅1 = n!; permutations are simple dispositions with k = n

4 friends go to the theater and there are 4 armchairs; in how many ways can they sit? In 24 different ways; this is a permutation, counted by n! = 4! = 24

Unordered lists = combinations; each combination of k elements corresponds to the k! simple dispositions that contain the same elements in a different order
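The counting formulas for dispositions and permutations can be checked by listing the groups explicitly. The sketch below uses the standard `itertools` module; the variable names are illustrative.

```python
from itertools import permutations, product
from math import factorial

A = ["a", "b", "c"]

# Dispositions with repetitions of 3 elements in groups of 2: n^k
with_repetitions = list(product(A, repeat=2))

# Simple dispositions of 3 elements in groups of 2: n(n-1)...(n-k+1)
simple = list(permutations(A, 2))

# Permutations of 4 friends on 4 armchairs: n!
seatings = list(permutations(range(4)))

# All values of a byte: dispositions with repetitions of {0,1} in groups of 8
byte_values = list(product([0, 1], repeat=8))

print(len(with_repetitions), len(simple), len(seatings), len(byte_values))
```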

Simple combinations of n elements in groups of k elements: C_n,k = D_n,k/P_k = D_n,k/k! = (n(n-1)...(n-k+1))/k! = (n(n-1)...(n-k+1)(n-k)!)/(k!(n-k)!) = n!/(k!(n-k)!)

Symmetry of binomial coefficients: C_n,k = C_n,n-k

C_5,2 = C_5,3; C_5,2 = 5!/(2!3!) = 10; C_5,3 = 5!/(3!2!) = 10

C_n,n = 1; n!/(n!0!) = 1

Binomial coefficient with upper index n and lower index k and 0 ≤ k ≤ n: C(n,k) = n!/(k!(n-k)!); C(n,k) ∈ ℕ
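A short sketch can confirm that the factorial formula agrees with an explicit listing of the combinations and with the symmetry C_n,k = C_n,n-k. The helper name `binom` is illustrative; `math.comb` is the standard-library function (Python 3.8 or later).

```python
from itertools import combinations
from math import comb, factorial


def binom(n: int, k: int) -> int:
    """Binomial coefficient from the formula n!/(k!(n-k)!), for 0 <= k <= n."""
    return factorial(n) // (factorial(k) * factorial(n - k))


# Unordered groups of 2 elements chosen among 5
groups = list(combinations("abcde", 2))
print(len(groups), binom(5, 2), binom(5, 3), comb(5, 2))
```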

(a+b)⁰ = 1

(a+b)¹ = a+b

(a+b)² = a²+2ab+b²

(a+b)³ = a³+3a²b+3ab²+b³

1 1

1 2 1

1 3 3 1

1 4 6 4 1

1 5 10 10 5 1

1 6 15 20 15 6 1

1 7 21 35 35 21 7 1

1 8 28 56 70 56 28 8 1

1 9 36 84 126 126 84 36 9 1

1 10 45 120 210 252 210 120 45 10 1

Binomial formula: (a+b)ⁿ = C(n,0)aⁿ+C(n,1)a^(n-1)b+C(n,2)a^(n-2)b²+...+C(n,n-1)ab^(n-1)+C(n,n)bⁿ = Σ_(k=0..n) C(n,k)a^(n-k)b^k

C(n,0) = C(n,n) = 1

C(n,k) = C(n-1,k)+C(n-1,k-1), 1 ≤ k ≤ n-1; each row of the arithmetic triangle is calculated starting from the previous row
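The rule C(n,k) = C(n-1,k)+C(n-1,k-1) is enough to rebuild the arithmetic triangle row by row. A minimal sketch, with the hypothetical helper names `next_row` and `triangle`:

```python
def next_row(row: list[int]) -> list[int]:
    """Build row n from row n-1: the ends are 1, inner terms are sums of neighbours."""
    inner = [row[k - 1] + row[k] for k in range(1, len(row))]
    return [1] + inner + [1]


def triangle(n_rows: int) -> list[list[int]]:
    """First n_rows rows of the arithmetic triangle, starting from the row of (a+b)^0."""
    rows = [[1]]
    while len(rows) < n_rows:
        rows.append(next_row(rows[-1]))
    return rows


for row in triangle(6):
    print(*row)
```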

n!/(k!(n-k)!) = (n!/((k-1)!(n-(k-1))!))((n-(k-1))/k) = C(n,k-1)((n-k+1)/k)

3 - INTEGERS AND RATIONALS

Commutative property: a+b = b+a; ab = ba

Associative property: (a+b)+c = a+(b+c); (ab)c = a(bc)

Neutral element: a+0 = a; a⋅1 = a

Distributive property: a(b+c) = ab+ac

The equation a+x = b can be solved in ℕ only if a ≤ b

The equation a+x = 0 can be solved in ℕ only if a = 0

The set of integers consists of zero (0), the positive natural numbers (1, 2, 3, ...), and their additive inverses, the negative integers, −1, −2, −3, ...; the set of integers is denoted by the symbol ℤ

Commutative property for integers: 2+(-5) = -3, -5+2 = -3

The solution of the equation a+x = b in ℤ is x = b-a that is x = b+(-a), the sum of b with the opposite of a; subtraction is equivalent to addition with the opposite

The opposite of x is -x; if x is negative, -x is positive

|x| := {x, x ∈ ℕ; -x, x ∉ ℕ}

3⋅4 = 4+4+4 = 12; 4⋅3 = 3+3+3+3 = 12; 3⋅(-4) = -4-4-4 = -12; -4⋅3 = -12; 3⋅0 = 0⋅3 = 0; -4⋅0 = 0⋅(-4) = 0; -3⋅(-4) = 12

0 = -3⋅0 = -3(4-4) = -3⋅4+(-3)(-4) = -12+(-3)(-4), so (-3)(-4) must be 12; because of the distributive property, the product of two negative numbers is positive.

The sum of a number with its opposite is zero.

The product of a number by its reciprocal is 1.

The product of two integers a and b is equal to the product of their absolute values if a and b have the same sign, it is the opposite of the product of their absolute values if they have opposite signs; the product is zero if at least one of the factors is zero

a+x = b is always solvable in ℤ

a+x = 0 is always solvable in ℤ; every number has its opposite

a⋅x = 1 in ℕ if and only if a = 1; in ℕ the only number to have a reciprocal is 1

a⋅x = 1 in ℤ if and only if a = 1 or a = -1; in ℤ the only numbers to have a reciprocal are 1 and -1

3⋅x = 1; there are no solutions in ℕ and ℤ

A rational number is a number that can be expressed as the quotient or fraction p/q of two integers, a numerator p and a non-zero denominator q; rationals are denoted with the symbol ℚ

The fraction n/m, where n is an integer and m is an integer different from 0, has n as its numerator and m as its denominator; if n is a multiple of m, n = q⋅m, then the fraction is apparent: n/m = q

n/m = p/q ⇔ n⋅q = m⋅p

3/(-2) = (-3)/2 = -(3/2)

3/4 is a fraction reduced to the lowest terms: numerator and denominator are coprime, that is they have no common divisor other than 1

An irreducible fraction, or fraction in lowest terms or simplest form or reduced fraction, is a fraction in which the numerator and denominator are integers that have no other common divisors than 1, or -1 when negative numbers are considered; a fraction a⁄b is irreducible if and only if a and b are coprime that is if a and b have a greatest common divisor of 1.

4 - RATIONALS

(1/3)+(2/3) = 3/3 = 1

(2/3)+(4/3) = 6/3 = 2

(2/3)+(3/4) = (2⋅4+3⋅3)/(3⋅4) = 17/12

The least common multiple, lowest common multiple, or smallest common multiple of two integers a and b, usually denoted by lcm(a,b), is the smallest positive integer that is divisible by both a and b; when adding, subtracting, or comparing simple fractions, the least common multiple of the denominators, often called the lowest common denominator, is used because each of the fractions can be expressed as a fraction with this denominator; (2/21)+(1/6) = (2/(3⋅7))+(1/(2⋅3)) = (2⋅2/(2⋅3⋅7))+(1⋅7/(2⋅3⋅7)) = (4/42)+(7/42) = 11/42
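The addition with the lowest common denominator can be sketched as follows. The helper names `lcm` and `add_with_lcd` are illustrative, and the result is deliberately left unreduced so that the common denominator stays visible; `fractions.Fraction` is used only to cross-check the value.

```python
from fractions import Fraction
from math import gcd


def lcm(a: int, b: int) -> int:
    """Least common multiple of two non-zero integers."""
    return abs(a * b) // gcd(a, b)


def add_with_lcd(n: int, m: int, p: int, q: int) -> tuple[int, int]:
    """Return n/m + p/q as (numerator, denominator) over the lowest common denominator."""
    d = lcm(m, q)
    # Each fraction is expanded so that its denominator becomes d.
    return n * (d // m) + p * (d // q), d


print(add_with_lcd(2, 21, 1, 6))          # numerator and denominator of 2/21 + 1/6
print(Fraction(2, 21) + Fraction(1, 6))   # cross-check with exact rational arithmetic
```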

3⋅(3/4) = 9/4

(2/3)⋅(4/7) = 8/21

Operations in ℚ: (n/m)+(p/q) = (nq+pm)/(mq); (n/m)⋅(p/q) = (np)/(mq)

Comparing n/m and p/q with positive denominators m, q > 0: nq/(mq) < pm/(mq) ⇔ nq < pm; 2/3 < 3/4 because 2⋅4 < 3⋅3

ℚ is an ordered field: for m, q > 0, n/m < p/q ⇔ nq < pm

ℚ is a field because every element has an opposite and every element except 0 has a reciprocal or multiplicative inverse; ℕ and ℤ are not fields because their elements, apart from 1 (and -1 in ℤ), do not have multiplicative inverses

Between two distinct rational numbers there are always infinitely many rational numbers; ℚ is a dense numerical set

The rational numbers are a dense subset of the real numbers

ℕ and ℤ are not dense, ℚ and ℝ are dense

It is possible to find a rational number between 2/3 and 3/4; (2/3)+(3/4) = (8+9)/12 = 17/12; (17/12)/2 = 17/24; 2/3 < 17/24 < 3/4; 2⋅24 < 3⋅17, 48 < 51; 17⋅4 < 3⋅24, 68 < 72
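The midpoint construction can be repeated as often as desired, which is why there are infinitely many rationals between two given ones. A small sketch with exact rational arithmetic; the name `between` is illustrative.

```python
from fractions import Fraction


def between(x: Fraction, y: Fraction) -> Fraction:
    """Arithmetic mean of x and y, a rational lying between them when x != y."""
    return (x + y) / 2


low, high = Fraction(2, 3), Fraction(3, 4)
mid = between(low, high)
print(mid)  # the value computed in the text

# Repeating the step keeps producing new rationals inside the interval.
found = []
for _ in range(5):
    mid = between(low, mid)
    found.append(mid)
print(found)
```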

A majorant or upper bound of a set A is a number M greater than or equal to any element of A; ∀ a ∈ A, a ≤ M

A non-empty set A is bounded above if it admits majorants

If 0 < p < q then p/q < 1; 1 is a majorant of the set of these fractions

A minorant or lower bound of a set A is a number m less than or equal to any element of A; ∀ a ∈ A, a ≥ m

A non-empty set A is bounded below if it admits minorants

The smallest of the majorants is called supremum; sup(A)

The largest of the minorants is called infimum; inf(A)

If the supremum belongs to the set, it is the maximum of the set; max(A) = M

If the infimum belongs to the set, it is the minimum of the set; min(A) = m

If a set is bounded, it has minorants and majorants

If a set is bounded above but unbounded below, it has majorants but no minorants

If a set is bounded below but unbounded above, it has minorants but no majorants

If a set is unbounded below and above, it has no minorants and no majorants

A = {p/q, 0 < p < q}; A is bounded above and 1 is the supremum, but it has no maximum; A is bounded below and 0 is the infimum, but it has no minimum

p/q < (p+1)/(q+1); p(q+1) < q(p+1), pq+p < pq+q, p < q; 3/4 < 4/5

In ℤ a non-empty set bounded above has a maximum

In ℤ a non-empty set bounded below has a minimum

In ℚ a set bounded above may not have a maximum

In ℚ a set bounded below may not have a minimum

The Pythagoreans discovered that the diagonal of a square is incommensurable with the side; 1²+1² = (n/m)², 2 = n²/m² ⇔ 2m² = n², but there are no non-zero integers that can satisfy this equation

There is no rational whose square is equal to 2; the double of a perfect square cannot be a perfect square

A = {p/q; p²/q² < 2}, there is no maximum; B = {n/m; n²/m² > 2}, there is no minimum

ℚ is dense but not complete; in ℚ it is impossible to solve x² = 2; there are sets that are bounded above but have no supremum in ℚ

In ℚ there are sets bounded above but without supremum and maximum, and sets bounded below but without infimum and minimum, because ℚ is dense but not complete; this can be understood by studying the hyperbola y = (2x+2)/(x+2)
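One way to read the hyperbola y = (2x+2)/(x+2) is as a map that sends a positive rational x to another rational y with y² - 2 = 2(x²-2)/(x+2)² and y - x = (2-x²)/(x+2). So if x² < 2 then y is larger than x and still has y² < 2, which is why the set A has no maximum; starting from x² > 2 the same map gives a smaller element of B. The sketch below checks this with exact fractions; the name `step` is illustrative and this reading of the hyperbola is an interpretation of the text, not something it states explicitly.

```python
from fractions import Fraction


def step(x: Fraction) -> Fraction:
    """Image of x under the hyperbola y = (2x+2)/(x+2)."""
    return (2 * x + 2) / (x + 2)


# From below: each image is a larger rational whose square is still < 2.
x = Fraction(1)
below = [x]
for _ in range(5):
    x = step(x)
    below.append(x)
print(below)

# From above: each image is a smaller rational whose square is still > 2.
y = Fraction(2)
above = [y]
for _ in range(5):
    y = step(y)
    above.append(y)
print(above)
```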

5 - DECIMALS

√(2) is not present in the rational number line

So there is a need for a numerical set without holes, which is dense but also continuous; therefore it is necessary to introduce irrational numbers; rational numbers and irrational numbers form the set of real numbers, denoted by the symbol ℝ

A decimal fraction is a fraction that has a power of 10 as its denominator; a decimal or equivalent fraction generates a finite decimal number

123 = 1⋅100+2⋅10+3⋅1; 1⋅10²+2⋅10¹+3⋅10⁰

1.25 = 1⋅10⁰+2⋅10^-1+5⋅10^-2

3/2 = 15/10 = (10+5)/10 = 10/10 + 5/10 = 1.5

9/4 = 225/100 = (200+20+5)/100 = 200/100 + 20/100 + 5/100 = 2.25

1/7 = 0.142857142857... = 0.(142857), where the digits in parentheses repeat; this is a simple repeating decimal representation, simple because the block of repeating digits begins immediately after the decimal point

The periodicity of a number is given by the remainders that are repeated during the division process

Rational numbers have periodic decimal representation, including period 0, excluding period 9

11/15 = 0.7333... = 0.7(3); a decimal in which at least one of the digits after the decimal point is non-repeated and some digits are repeated is called a mixed repeating decimal

x = 1.2(35) = 1.2353535...; 100x = 123.5353535...; 100x-x = 99x = 122.3; x = 122.3/99 = 1223/990
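The subtraction trick used above generalizes to any mixed repeating decimal. The following sketch assumes a non-negative number given as three pieces (integer part, non-repeating digits, repeating block); the function name `repeating_to_fraction` and this input format are illustrative choices.

```python
from fractions import Fraction


def repeating_to_fraction(integer_part: int, non_repeating: str, repeating: str) -> Fraction:
    """Convert integer_part.non_repeating(repeating) into a fraction.

    With k non-repeating digits and a block of p repeating digits,
    10^(k+p) x - 10^k x is an integer, so x = difference / (10^k (10^p - 1)).
    """
    k, p = len(non_repeating), len(repeating)
    up_to_first_block = int(str(integer_part) + non_repeating + repeating)
    up_to_prefix = int(str(integer_part) + non_repeating)
    return Fraction(up_to_first_block - up_to_prefix, 10 ** k * (10 ** p - 1))


print(repeating_to_fraction(1, "2", "35"))     # 1.2(35)
print(repeating_to_fraction(0, "", "142857"))  # 0.(142857)
print(repeating_to_fraction(0, "7", "3"))      # 0.7(3)
```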

A repeating decimal or recurring decimal is a decimal representation of a number whose digits are periodic, repeating its values at regular intervals, and the infinitely repeated portion is not zero; it can be shown that a number is rational if and only if its decimal representation is repeating or terminating; the infinitely repeated digit sequence is called the repetend or reptend; if the repetend is a zero, this decimal representation is called a terminating decimal rather than a repeating decimal, since the zeros can be omitted and the decimal terminates before these zeros; every terminating decimal representation can be written as a decimal fraction, a fraction whose denominator is a power of 10 like 1.585 = 1585/1000; 1.585 may also be written as a ratio of the form k/(2ⁿ5^m), 1.585 = 317/(2³5²); every number with a terminating decimal representation also has an alternative representation as a repeating decimal whose repetend is the digit 9, this is obtained by decreasing the final (rightmost) non-zero digit by one and appending a repetend of 9 like 1.000... = 0.999...

Any number that cannot be expressed as a ratio of two integers is said to be irrational; their decimal representation neither terminates nor infinitely repeats but extends forever without regular repetition; examples of such irrational numbers are the square root of 2 and π

Irrational numbers are represented by unlimited, non-periodic decimal expansions

6 - REAL NUMBERS

The set ℝ of real numbers is the union of the set of rational numbers with the set of irrational numbers

A rational number can be represented as a fraction, and can be either a periodic decimal number or a finite decimal number derived from a decimal fraction or equivalent

Irrational numbers have infinite non-periodic decimal digits

The set of rational numbers and the set of irrational numbers form the set of real numbers

ℕ ⊂ ℤ ⊂ ℚ ⊂ ℝ

ℚ and ℝ are ordered fields, but ℝ is complete

In ℝ, every set that is bounded above has a supremum

x = 1.23751..., y = 1.23738...; 1.2375 < x < 1.2376; 1.2373 < y < 1.2374; y < x

The absolute value or modulus is equal to the number when it is positive and the opposite when it is negative; the absolute value or modulus exists in ℤ, ℚ, and ℝ

x ∈ ℝ, |x| := {x, x ≥ 0; -x, x < 0}

3.7 > 2.3; -3.7 < -2.3

The set ℝ of real numbers is complete because every set that is bounded above has supremum and every set that is bounded below has infimum.

∀ a ∈ A and ∀ b ∈ B, a < b; a ≤ sup(A) ≤ inf(B) ≤ b; the sets A and B are separated by the interval [sup(A),inf(B)]

Two sets A and B are separate if ∀ a ∈ A and ∀ b ∈ B, a < b, and this implies that sup(A) ≤ inf(B)

Two separate sets A and B are contiguous if sup(A) = inf(B)

0 ≤ inf(B)-sup(A) ≤ b-a

The notion of contiguity between sets can be used to define π, the ratio between the length of a circumference and the length of its diameter or, in equivalent form, the length of the semicircle of radius 1

Archimedes c. 240 BC, 3+10/71 < π < 3+1/7; J.H. Lambert 1767, π is irrational; F. Lindemann 1882, π is transcendental

An irrational number cannot be represented by a fraction and its decimal representation is unlimited and not periodic

A transcendental number is an irrational number that cannot be obtained as a solution of an algebraic equation, that is a polynomial equation with integer coefficients

x = x₀.c₁c₂...c_n..., x₀ ∈ ℕ; x_n = x₀.c₁c₂...c_n; x'_n = x_n+1/10ⁿ; x_n ≤ x ≤ x'_n; x_n approximates x with an uncertainty of 1/10ⁿ

7 - INEQUALITY

|x+y| ≤ |x|+|y|; it is the triangle inequality

|x⋅y| = |x|⋅|y|

The geometric mean of two non-negative numbers does not exceed the arithmetic mean: √(xy) ≤ (x+y)/2

If x is a real number of any sign, the square root of its square is the absolute value of x: √(x²) = |x|

Considering a right triangle inscribed in a semicircle of radius r, where x and y are the projections of the legs (catheti) on the hypotenuse; r = (x+y)/2; a theorem of Euclid states that in a right triangle the height h relative to the hypotenuse is the mean proportional between the projections of the legs on the hypotenuse, that is x/h = h/y, xy = h², h = √(xy); h ≤ r, √(xy) ≤ (x+y)/2

√(xy) ≤ (x+y)/2; xy ≤ ((x+y)/2)², xy ≤ (x+y)²/4, 4xy ≤ x²+2xy+y², 0 ≤ x²-2xy+y², 0 ≤ (x-y)²; the square of a real number is always ≥ 0

The geometric mean of n ≥ 2 non-negative numbers does not exceed the arithmetic mean, ⁿ√(x₁x₂...x_n) ≤ (x₁+x₂+...+x_n)/n

If n ≥ 2 non-negative real numbers have n as their sum, their product does not exceed 1, (x₁+x₂+...+x_n = n) ⇒ x₁x₂...x_n ≤ 1; for n = 2, x₁+x₂ = 2, x₁ = 1-α, x₂ = 1+α, 1-α+1+α = 2, x₁x₂ = (1-α)(1+α) = 1-α² ≤ 1, with equality only when α = 0

Geometric mean ≤ Arithmetic mean, G ≤ A; A = (x₁+x₂+...+x_n)/n, n = (x₁+x₂+...+x_n)/A = x₁/A+x₂/A+...+x_n/A; if a sum of n numbers equals n, then their product is ≤ 1, x₁x₂...x_n/Aⁿ ≤ 1, x₁x₂...x_n ≤ Aⁿ; G = ⁿ√(x₁x₂...x_n) ≤ ⁿ√(Aⁿ) = A

Considering a = a⋅1⋅1⋅...⋅1 where the number 1 is repeated n-1 times and a > 1; 1 < ⁿ√(a) ≤ (a+n-1)/n = 1+(a-1)/n, with this formula we can overestimate the nth root of a number; 1 < ³√(1.2) < 1+0.2/3 = 1+2/30

Bernoulli's inequality: (1+x)ⁿ ≥ 1+nx, n ∈ ℕ, x ∈ ℝ, x > -1

First demonstration of Bernoulli's inequality, for 1+nx ≥ 0 (if 1+nx < 0 the inequality is obvious, because (1+x)ⁿ ≥ 0): x₁ = 1+nx, x₂ = ... = x_n = 1, ⁿ√(1+nx) ≤ (1+nx+n-1)/n, ⁿ√(1+nx) ≤ 1+x, 1+nx ≤ (1+x)ⁿ

Second demonstration of Bernoulli's inequality: base of induction, P(0): (1+x)⁰ ≥ 1+0x, 1 ≥ 1 true; inductive step, since 1+x > 0, (1+x)ⁿ⁺¹ = (1+x)(1+x)ⁿ ≥ (1+x)(1+nx) = 1+x+nx+nx² ≥ 1+(n+1)x
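Bernoulli's inequality can be spot-checked with exact rational arithmetic, which avoids rounding doubts near the case of equality. This is a sketch only; the name `bernoulli_holds` and the sample grid of values are arbitrary choices.

```python
from fractions import Fraction


def bernoulli_holds(x: Fraction, n: int) -> bool:
    """Check (1+x)^n >= 1+nx for one pair (x, n), using exact fractions."""
    return (1 + x) ** n >= 1 + n * x


# Sample x = -0.9, -0.8, ..., 3.0 (all > -1) and n = 0, ..., 20.
samples = [Fraction(k, 10) for k in range(-9, 31)]
print(all(bernoulli_holds(x, n) for x in samples for n in range(21)))

# Equality holds for n = 0, n = 1, or x = 0.
print((1 + Fraction(1, 2)) ** 1 == 1 + 1 * Fraction(1, 2))
```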

8 - REAL FUNCTIONS AND SEQUENCES

A function is a binary relation between two sets that associates each element of the first set to exactly one element of the second set; typical examples are functions from integers to integers, or from the real numbers to real numbers; a function is a process or a relation that associates each element x of a set X, the domain of the function or definition set, to a single element y of the set Y, the codomain of the function; the set of the values actually taken by the function is the image of the function, which is a subset of the codomain; if the function is called f, this relation is denoted by y = f(x), where the element x is the argument or input of the function, and y is the value of the function, the output, or the image of x by f; the symbol that is used for representing the input is the variable of the function, for example f is a function of the variable x; a function is uniquely represented by the set of all pairs (x, f(x)), called the graph of the function

f: A → ℝ, A ⊆ ℝ, x → f → f(x); each number x of the set A is associated with a real number

Domain of the function = definition set = A = dom f

Codomain of the function = set that contains the output values of the function, usually it is ℝ

Image of the function = set of the output values of the function = f(A) = im(f)

f(x) = x²+2x+3, A = ℝ, A = (-∞,+∞)

f(x) = √(x), A = {x ∈ ℝ, x ≥ 0}, A = [0,+∞)

f(x) = √(x(1-x)), y = x-x², A = {x ∈ ℝ, 0 ≤ x ≤ 1}, A = [0,1]

f(x) = 1/(x²-1), A = ℝ \ {-1,1}, A = (-∞,-1) ∪ (-1,1) ∪ (1,+∞)

A real interval is a set of real numbers that contains all real numbers lying between any two numbers of the set

[a,b] := {x ∈ ℝ, a ≤ x ≤ b}, bounded and closed interval; a is the left bound or infimum and b is the right bound or supremum, inf(A) = a, sup(A) = b; a is the minimum and b is the maximum, min(A) = a, max(A) = b

(a,b] := {x ∈ ℝ, a < x ≤ b}, bounded and left-open and right-closed interval; a is the left bound or infimum and b is the right bound or supremum, inf(A) = a, sup(A) = b; there is no minimum and b is the maximum, max(A) = b

[a,b) := {x ∈ ℝ, a ≤ x < b}, bounded and left-closed and right-open interval; a is the left bound or infimum and b is the right bound or supremum, inf(A) = a, sup(A) = b; a is the minimum and there is no maximum, min(A) = a

(a,b) := {x ∈ ℝ, a < x < b}, bounded and open interval; a is the left bound or infimum and b is the right bound or supremum, inf(A) = a, sup(A) = b; there are no minimum and maximum

[a,+∞) := {x ∈ ℝ, a ≤ x}, unbounded above and closed interval; a is the left bound or infimum and the right bound or supremum is +∞ and there are no majorants, inf(A) = a, sup(A) = +∞; a is the minimum and there is no maximum, min(A) = a

(a,+∞) := {x ∈ ℝ, a < x}, unbounded above and open interval; a is the left bound or infimum and the right bound or supremum is +∞ and there are no majorants, inf(A) = a, sup(A) = +∞; there are no minimum and maximum

(-∞,b] := {x ∈ ℝ, x ≤ b}, unbounded below and closed interval; the left bound or infimum is -∞ and the right bound or supremum is b and there are no minorants, inf(A) = -∞, sup(A) = b; there is no minimum and b is the maximum, max(A) = b

(-∞,b) := {x ∈ ℝ, x < b}, unbounded below and open interval; the left bound or infimum is -∞ and the right bound or supremum is b and there are no minorants, inf(A) = -∞, sup(A) = b; there are no minimum and maximum

In each interval, the connection property is valid: if x₁ and x₂ ∈ A, and x₁ < x₂, and x₁ < x < x₂, then x ∈ A

A sequence is a function defined in the set ℕ of natural numbers or in the set ℕ^* of natural numbers greater than 0; n ∈ ℕ → f → f(n) ∈ ℝ

n ∈ ℕ^* → 1+(1/n) = (n+1)/n

a₀, a₁, a₂, ..., a_n, ...; (a_n)_(n∈ℕ)

An arithmetic progression or arithmetic sequence is a sequence of numbers such that the difference between the consecutive terms is constant; an arithmetic progression is a sequence in which, given an initial term, each term is obtained from the previous one by adding a constant

Arithmetic progression: d = a_(n+1)-a_n, d is the common difference and d ≠ 0, a_(n+1) := a_n + d; a₀, a₁ = a₀ + d, a₂ = a₁ + d = a₀ + 2d, a_n = a₀ + nd, a_n = a₁ + (n-1)d, a_n = a_m + (n-m)d

A finite portion of an arithmetic progression is called a finite arithmetic progression and sometimes just called an arithmetic progression; the sum of a finite arithmetic progression is called an arithmetic series

A geometric progression or geometric sequence is a sequence of non-zero numbers where each term after the first is found by multiplying the previous one by a fixed, non-zero number called the common ratio; in a geometric progression, each term is obtained from the previous one by multiplying by a constant

Geometric progression: r = a_(n+1)/a_n, r is the common ratio and r ≠ 1, a_(n+1) := a_n⋅r; a₀, a₁ = a₀⋅r, a₂ = a₁⋅r = a₀⋅r², a_n = a₀⋅rⁿ, a_n = a₁⋅r^(n-1)

A geometric series is the sum of the numbers in a geometric progression

The distinction between a progression and a series is that a progression is a sequence, whereas a series is a sum

Formula to calculate the sum of n terms in geometric progression with initial term = 1 and ratio = r ≠ 1; 1+r+...+r^(n-1) = ((1-r)/(1-r))(1+r+...+r^(n-1)) = (1+r+...+r^(n-1)-r-r²-...-r^(n-1)-rⁿ)/(1-r) = (1-rⁿ)/(1-r); if 0 < r < 1, 1+r+...+r^(n-1) = (1-rⁿ)/(1-r) < 1/(1-r)

Sum of n terms in geometric progression with initial term = 1 and ratio = 1/2; 1+1/2+...+(1/2)^(n-1) < 1/(1-1/2) = 2
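The closed formula for the geometric sum, and the bound 2 for ratio 1/2, can be verified term by term with exact fractions. The names `geometric_sum` and `closed_form` are illustrative.

```python
from fractions import Fraction


def geometric_sum(r: Fraction, n: int) -> Fraction:
    """1 + r + ... + r^(n-1), added term by term."""
    return sum((r ** k for k in range(n)), Fraction(0))


def closed_form(r: Fraction, n: int) -> Fraction:
    """(1 - r^n)/(1 - r), valid for r != 1."""
    return (1 - r ** n) / (1 - r)


half = Fraction(1, 2)
for n in (1, 2, 5, 10):
    # term-by-term sum, closed formula, and whether the sum stays below 2
    print(n, geometric_sum(half, n), closed_form(half, n), geometric_sum(half, n) < 2)
```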

It is interesting to understand whether a sequence has a trend value, that is a number that the terms of the sequence approach for large indices

In a constant sequence the trend value is obvious

(-1)ⁿ = 1,-1,1,-1,1,-1,...; when n is even it is 1, when n is odd it is -1; there is no trend value

1/n, n ∈ ℕ^*; n = 1, 1/n = 1; n = 2, 1/n = 0.5; n = 3, 1/n = 0.333...; n = 4, 1/n = 0.25; n = 5, 1/n = 0.2; n = 10, 1/n = 0.1; n = 11, 1/n = 0.0909...; n = 100, 1/n = 0.01; the trend value is 0; the k-th decimal digit reaches the value 0 when n > 10^k

∀ ε > 0, ∃ n_ε : ∀ n > n_ε, |(1/n)-0| = 1/n < ε; 1/n < ε ⇔ n > 1/ε, so it is enough that n > n_ε ≥ 1/ε; the set of natural numbers is unbounded above, so for every ε there is a natural number n_ε ≥ 1/ε
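The choice of n_ε can be made concrete. A minimal sketch, assuming ε is given as an exact fraction and taking n_ε as the smallest natural number ≥ 1/ε; the name `n_eps` is illustrative.

```python
from fractions import Fraction
from math import ceil


def n_eps(eps: Fraction) -> int:
    """Smallest natural number n_eps with n_eps >= 1/eps (one admissible choice)."""
    return ceil(1 / eps)


eps = Fraction(1, 100)
start = n_eps(eps)
print(start)

# Every index beyond n_eps gives a term closer to 0 than eps.
print(all(Fraction(1, n) < eps for n in range(start + 1, start + 1000)))
```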

n ∈ ℕ^*, n ↦ ⁿ√(a), a > 1; ¹√(3) = 3, ²√(3) = 1.73205080757, ³√(3) = 1.44224957031, ⁴√(3) = 1.31607401295, ⁵√(3) = 1.24573093962, ⁶√(3) = 1.20093695518, ⁷√(3) = 1.16993081276, ⁸√(3) = 1.14720269044, ⁹√(3) = 1.12983096391, ¹⁰√(3) = 1.11612317403, ¹⁰⁰√(3) = 1.01104669194, ¹⁰⁰⁰√(3) = 1.00109921598, ¹⁰⁰⁰⁰√(3) = 1.00010986726, ¹⁰⁰⁰⁰⁰√(3) = 1.00001098618, ^1000000√(3) = 1.00000109861

a > 1, a^(1/2) > a^(1/3); (a^(1/2))⁶ > (a^(1/3))⁶, a^(6/2) > a^(6/3), a³ > a²; aⁿ⁺¹ > aⁿ

ⁿ√(a) = a^(1/n)

²√(a) = a^(1/2); ²√(²√(a)) = ⁴√(a) = a^(1/4)

9 - LIMIT OF SEQUENCES - PART 1

n ↦ c/n, ∀ ε > 0, 0 < c/n < ε, n/c > 1/ε, n > c/ε, n > n_ε ≥ c/ε, lim_n→∞(c/n) = 0

1 < ⁿ√(a) < 1+(a-1)/n; 1 < ⁿ√(a) < 1+c/n; |ⁿ√(a)-1| = ⁿ√(a)-1 < c/n

a_n → L, lim_n→∞(a_n) = L; ∀ ε > 0, ∃ n_ε : ∀ n > n_ε ⇒ |a_n-L| < ε; -ε < a_n-L < ε, L-ε < a_n < L+ε

Definition of limit of a sequence: ∀ ε > 0, ∃ n_ε : n > n_ε ⇒ |a_n-L| < ε

Neighborhood of center L and radius ε: I(L;ε) = (L-ε,L+ε)

Increasing sequence: ∀ n : a_n ≤ a_(n+1)

Decreasing sequence: ∀ n : a_n ≥ a_(n+1)

Strictly increasing sequence: ∀ n : a_n < a_(n+1)

Strictly decreasing sequence: ∀ n : a_n > a_(n+1)

Constant sequence: ∀ n : a_n = a_(n+1)

n ↦ c/n, with c > 0, is a strictly decreasing sequence

n ↦ ⁿ√(a), for a > 1 is a strictly decreasing sequence that converges to the limit 1, for a = 1 is a constant sequence with the constant value 1, for 0 < a < 1 is a strictly increasing sequence that converges to the limit 1

Increasing and decreasing sequences are also called monotone sequences

Monotone sequences always have a limit, finite or infinite, and for this reason they are called regular

Every increasing sequence, bounded above, has a limit that is its supremum

If the increasing sequence is not bounded above, the limit is the supremum that is +∞

An increasing monotone sequence has a limit that is its supremum

Every decreasing sequence, bounded below, has a limit that is its infimum

If the decreasing sequence is not bounded below, the limit is the infimum that is -∞

A decreasing monotone sequence has a limit that is its infimum

Supremum of a sequence means supremum of the image of the sequence that is the set of values resulting from the sequence

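Demonstration that an increasing sequence bounded above converges to its supremum: a_n ≤ a_(n+1); ∀ n, a_n ≤ sup(a_n) := sup{a_n, n ∈ ℕ}; let sup(a_n) = L; for every ε > 0, L-ε is not a majorant, so ∃ n_ε : L-ε < a_(n_ε); for n > n_ε, L-ε < a_(n_ε) ≤ a_n, hence L-ε < a_n ≤ L < L+ε; this demonstration is the verification of the definition of limit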

A sequence whose limit is +∞ or -∞ is called divergent; a divergent sequence diverges to +∞ or diverges to -∞; if a divergent sequence diverges to +∞, it diverges positively; if a divergent sequence diverges to -∞, it diverges negatively

Definition of a sequence whose limit is +∞: lim_n→+∞(a_n) = +∞; ∀ M > 0, ∃ n_M : ∀ n > n_M ⇒ a_n > M

Definition of a sequence whose limit is -∞: lim_n→+∞(a_n) = -∞; ∀ m < 0, ∃ n_m : ∀ n > n_m ⇒ a_n < m

Geometric progression, or geometric sequence, with first element 1 and ratio a; if a = 1 it is a constant sequence; prove that if a > 1 then the sequence diverges to +∞; 1, a, a², a³, ...; writing a = 1+d with d > 0, Bernoulli's inequality gives aⁿ = (1+d)ⁿ ≥ 1+nd, and 1+nd > M when nd > M-1, n > (M-1)/d, n > n_M ≥ (M-1)/d

Considering the geometric progression aⁿ, if 0 < a < 1, lim_n→+∞(aⁿ) = 0; (1/2)ⁿ = 1/2ⁿ, 2ⁿ diverges to +∞, so 1/2ⁿ converges to 0

n ↦ ⁿ√(n) = n^(1/n), n ∈ ℕ^*; n = 1, 1; n = 2, ²√(2) = 1.41421356237; n = 3, ³√(3) = 1.44224957031; n = 4, ⁴√(4) = 1.41421356237; n = 5, ⁵√(5) = 1.37972966146; n = 6, ⁶√(6) = 1.3480061546; n = 7, ⁷√(7) = 1.32046924776; n = 8, ⁸√(8) = 1.29683955465; n = 9, ⁹√(9) = 1.27651800701; n = 10, ¹⁰√(10) = 1.25892541179; n = 100, ¹⁰⁰√(100) = 1.04712854805; n = 1000, ¹⁰⁰⁰√(1000) = 1.00693166885; n = 10000, ¹⁰⁰⁰⁰√(10000) = 1.00092145832; n = 100000, ¹⁰⁰⁰⁰⁰√(100000) = 1.00011513588; n = 1000000, ^1000000√(1000000) = 1.00001381561

To compare two powers with different bases and rational exponents, both powers are raised to the least common multiple of the denominators of the exponents, so that the exponents become integers

2^(1/2), 3^(1/3); (2^(1/2))⁶, (3^(1/3))⁶; 2^(6/2), 3^(6/3); 2³ = 8, 3² = 9; 8 < 9 therefore 2^(1/2) < 3^(1/3)

3^(1/3), 4^(1/4); (3^(1/3))¹², (4^(1/4))¹²; 3^(12/3), 4^(12/4); 3⁴ = 81, 4³ = 64, 81 > 64, therefore 3^(1/3) > 4^(1/4)

The sequence ⁿ√(n) tends to 1 and it is a decreasing monotone sequence from the third term onwards

Demonstration that the sequence ⁿ√(n) tends to 1: for n ≥ 2, n = √(n)⋅√(n)⋅1⋅...⋅1, where 1 is repeated n-2 times; by the inequality between geometric and arithmetic mean, 1 ≤ ⁿ√(n) = ⁿ√(√(n)⋅√(n)⋅1⋅...⋅1) ≤ (2√(n)+n-2)/n < 1+(2/√(n)); 1 ≤ ⁿ√(n) < 1+(2/√(n)); 0 ≤ ⁿ√(n)-1 < 2/√(n) < ε; 4/n < ε², n/4 > 1/ε², n > 4/ε², so it is enough to take n_ε ≥ 4/ε²

Demonstration that the sequence ⁿ√(n) is a decreasing monotone sequence from the third term onwards: for n ≥ 3 we want n^(1/n) > (n+1)^(1/(n+1)); raising both sides to the power n(n+1), (n^(1/n))^(n(n+1)) > ((n+1)^(1/(n+1)))^(n(n+1)), n⋅nⁿ = nⁿ⁺¹ > (n+1)ⁿ, n > (n+1)ⁿ/nⁿ, n > (1+1/n)ⁿ; so it is enough to show that (1+1/n)ⁿ < n for n ≥ 3, which follows from (1+1/n)ⁿ < 3 ≤ n; considering the binomial formula (a+b)ⁿ = Σ_(k=0..n) C(n,k)a^(n-k)b^k, with C(n,k) = n!/(k!(n-k)!); (1+1/n)ⁿ = Σ_(k=0..n) C(n,k)1^(n-k)(1/n)^k = Σ_(k=0..n) C(n,k)/n^k = Σ_(k=0..n) ((n(n-1)...(n-k+1))/n^k)(1/k!) < Σ_(k=0..n) 1/k! = 1+1+1/2!+1/3!+...+1/n! < 1+1+1/2+1/2²+...+1/2^(n-1), because k! ≥ 2^(k-1); 1+1/2+1/2²+...+1/2^(n-1) is the sum of a geometric progression with first element = 1 and ratio = 1/2, and this sum is < 2, so 1 + this sum < 3
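Both claims about ⁿ√(n) can be spot-checked. The monotonicity test below uses the equivalent integer inequality nⁿ⁺¹ > (n+1)ⁿ from the demonstration, so it is exact; the bound 1+2/√(n) is checked in floating point, which is only an approximation. The function names are illustrative.

```python
def decreases_at(n: int) -> bool:
    """True when n^(1/n) > (n+1)^(1/(n+1)), tested as n^(n+1) > (n+1)^n."""
    return n ** (n + 1) > (n + 1) ** n


def within_bound(n: int) -> bool:
    """Floating-point check of 1 <= n^(1/n) < 1 + 2/sqrt(n)."""
    root = n ** (1 / n)
    return 1 <= root < 1 + 2 / n ** 0.5


# Whether the sequence decreases from index n to n+1, for the first few n
print([decreases_at(n) for n in range(1, 6)])
print(all(decreases_at(n) for n in range(3, 300)))
print(all(within_bound(n) for n in range(1, 1000)))
```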

10 - LIMIT OF SEQUENCES - PART 2

Being monotonic is a sufficient condition for the existence of the limit, but it is not a necessary condition; a monotonic sequence always has a limit, but non-monotonic sequences can also have a limit

Fibonacci numbers: F₀ := 0, F₁ := 1, F_(n+2) := F_n + F_(n+1); equivalently F₀ := 0, F₁ := 1, F_n := F_(n-1) + F_(n-2), n > 1; n = 0, F_n = 0; n = 1, F_n = 1; n = 2, F_n = 1; n = 3, F_n = 2; n = 4, F_n = 3; n = 5, F_n = 5; n = 6, F_n = 8; n = 7, F_n = 13; n = 8, F_n = 21; n = 9, F_n = 34; n = 10, F_n = 55

Studying the ratio of each Fibonacci number to the previous one: r_n := F_(n+1)/F_n, n ≥ 1; n = 1, r_n = 1/1 = 1; n = 2, r_n = 2/1 = 2; n = 3, r_n = 3/2 = 1.5; n = 4, r_n = 5/3 = 1.666...; n = 5, r_n = 8/5 = 1.6; r_(n+1) = F_(n+2)/F_(n+1) = (F_(n+1)+F_n)/F_(n+1) = 1+1/r_n; r₁ := 1, r₂ = 2; r_(n+1) := 1+1/r_n; 1 ≤ r₁ < r₂, 1/r₁ > 1/r₂, 1+1/r₁ > 1+1/r₂ ⇔ r₂ > r₃, so the sequence is not monotonic; r_(n+1)-r_n = (1+1/r_n)-(1+1/r_(n-1)) = 1/r_n-1/r_(n-1) = (r_(n-1)-r_n)/(r_n⋅r_(n-1)) = -(r_n-r_(n-1))/(r_n⋅r_(n-1)); |r_(n+1)-r_n| = |r_n-r_(n-1)|/(r_n⋅r_(n-1)); r_n ≥ 3/2 for n ≥ 2, so r_n⋅r_(n-1) ≥ 2 for n ≥ 3 and |r_(n+1)-r_n| ≤ |r_n-r_(n-1)|/2; this shows that this sequence converges to a limit value; lim_n→+∞(r_n) = L, r_(n+1) = 1+1/r_n ⇒ L = 1+1/L, L = (L+1)/L, L²-L-1 = 0, using the quadratic formula x = (-b±√(b²-4ac))/(2a), L = (1±√(1+4))/2, L = (1±√(5))/2, and since L ≥ 1 the negative root is discarded, L = (1+√(5))/2; (1+√(5))/2 is an irrational number called golden ratio that is 1.61803398875...; this shows that a non-monotonic sequence can be convergent
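The behaviour of the ratios can be observed by iterating r ↦ 1+1/r with exact fractions. The sketch below lists the first ratios, checks that consecutive differences are at least halved at each step, and compares a late term with (1+√(5))/2 in floating point; the name `ratios` is illustrative.

```python
from fractions import Fraction


def ratios(count: int) -> list[Fraction]:
    """First `count` ratios r_1, r_2, ... obtained from r_1 = 1 and r_(n+1) = 1 + 1/r_n."""
    r = Fraction(1)
    out = [r]
    for _ in range(count - 1):
        r = 1 + 1 / r
        out.append(r)
    return out


rs = ratios(30)
print(rs[:5])

# Consecutive differences: each one is at most half of the previous one.
diffs = [abs(b - a) for a, b in zip(rs, rs[1:])]
print(all(d2 <= d1 / 2 for d1, d2 in zip(diffs, diffs[1:])))

golden = (1 + 5 ** 0.5) / 2
print(float(rs[-1]), golden)
```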

Sequence studied by the Swiss mathematician Leonhard Euler (1707-1783): a_n := (1+1/n)ⁿ < 3, this monotonic increasing sequence is bounded above by the number 3, so it has a limit ≤ 3, and the limit is its supremum; b_n := (1+1/n)ⁿ⁺¹ = a_n(1+1/n) = a_n((n+1)/n), b_n > a_n; b_n is a monotonic decreasing sequence and the difference between b_n and a_n tends to 0; the sequences a_n and b_n converge to the same limit, called Euler's number, e = 2.7182818284590...; a_n is a monotonic increasing sequence because a_n < a_(n+1): considering the n+1 factors 1⋅(1+1/n)⋅...⋅(1+1/n), where (1+1/n) is repeated n times, the geometric mean is less than the arithmetic mean, ⁿ⁺¹√((1+1/n)ⁿ) < (1+n(1+1/n))/(n+1) = (n+1+1)/(n+1) = 1+1/(n+1), (ⁿ⁺¹√((1+1/n)ⁿ))ⁿ⁺¹ < (1+1/(n+1))ⁿ⁺¹, (1+1/n)ⁿ < (1+1/(n+1))ⁿ⁺¹, a_n < a_(n+1); b_n = a_n((n+1)/n), a_n = b_n(n/(n+1)), b_n-a_n = b_n-b_n(n/(n+1)) = b_n(1-(n/(n+1))) = b_n((n+1-n)/(n+1)) = b_n(1/(n+1)); b_n is a monotonic decreasing sequence, so no term exceeds the first, b₁ = 4, and b_n-a_n ≤ 4/(n+1) < 4/n, so the difference b_n-a_n tends to 0

lim_n→+∞((1+1/n)ⁿ) = lim_n→+∞((1+1/n)ⁿ⁺¹) = e; e is Euler's number and it is an irrational number; e = 2.7182818284590...

Considering the exponential function a^x, for any value of a > 0, the graph always passes through the point x = 0 and y = 1; the tangent to the graph at the point x = 0 and y = 1 has angular coefficient (slope) 1 only when a = e, therefore in this case the exponential function is e^x
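The two sequences that squeeze e can be computed exactly for small n. The sketch below checks, for the first fifty indices, that a_n increases, b_n decreases, a_n < b_n, and b_n - a_n = b_n/(n+1) ≤ 4/(n+1); the function names `a` and `b` mirror the notation of the text and the range of indices is an arbitrary choice.

```python
from fractions import Fraction
from math import e


def a(n: int) -> Fraction:
    """a_n = (1 + 1/n)^n, computed exactly."""
    return (1 + Fraction(1, n)) ** n


def b(n: int) -> Fraction:
    """b_n = (1 + 1/n)^(n+1), computed exactly."""
    return (1 + Fraction(1, n)) ** (n + 1)


for n in range(1, 50):
    assert a(n) < a(n + 1) < b(n + 1) < b(n)
    assert b(n) - a(n) == b(n) / (n + 1) <= Fraction(4, n + 1)

# a_n and b_n enclose Euler's number (compared here with the float math.e)
print(float(a(1000)), e, float(b(1000)))
```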

11 - LIMIT OF FUNCTIONS

f(x), f: A → ℝ, lim_x→x₀(f(x)) = L

The function does not need to be defined at point x₀; the value of the function at x₀ is not important, but it is important that the function is defined close to x₀

A neighborhood of a real number is an open interval centered on the number itself; I(x₀,r) := (x₀-r,x₀+r)

A real number is an accumulation point of a set A if every neighborhood of this number contains infinitely many elements of A; if x₀ is an accumulation point of A, then ∀ r > 0 the intersection I(x₀,r)⋂A contains infinitely many elements

A finite set has no accumulation points

The set ℕ of natural numbers is infinite but has no accumulation points; +∞ can be thought of as the only accumulation point of ℕ

A = (0,1]; 0 < x ≤ 1; points < 0 and > 1 are not accumulation points; the point 0 does not belong to A, but it is an accumulation point because the intersection I(0,r)⋂A contains infinitely many elements for every r > 0; all the points of the interval A are accumulation points, and so is 0 that does not belong to A

An accumulation point may or may not be a point of A

When an accumulation point belongs to the set, it is said to be a non-isolated point of the set

f(x) = x², A = ℝ; x₀ = 1, f(x₀) = 1; r(x) := (x²-1)/(x-1), A' = ℝ \ {1}, 1 does not belong to the definition set but is an accumulation point; r(x) is the angular coefficient of the secant through the points (1,1) and (x,x²), and as x approaches 1 it tends to the angular coefficient of the tangent at the point (1,1), which is the derivative of f(x) calculated for x = 1; r(x) := (x²-1)/(x-1) = (x-1)(x+1)/(x-1) = x+1 =: r*(x); x = 1, r*(x) = x+1 = 2, and for x ≠ 1 r(x) = r*(x), |r(x)-2| = |x+1-2| = |x-1| < ε, so δ_ε = ε
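
The behavior of r(x) near x = 1 can be tabulated. This is an illustrative Python sketch; the name secant_slope and the step sizes are assumptions chosen for the example. The printed error |r(x)-2| equals |x-1| up to rounding, in agreement with δ_ε = ε.

```python
def secant_slope(x):
    """Slope of the secant of f(x) = x**2 through (1, 1) and (x, x**2); not defined at x = 1."""
    return (x**2 - 1) / (x - 1)

for h in (0.1, 0.01, 0.001):
    right = secant_slope(1 + h)
    left = secant_slope(1 - h)
    # Both one-sided values approach 2, the slope of the tangent at (1, 1).
    print(h, left, right, abs(left - 2), abs(right - 2))
```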

The limit of the angular coefficient of the secant, that is the angular coefficient of the tangent, measures the instantaneous rate of change of the function

|f(x)-L| < ε, 0 < |x-x₀| < δ_ε

Definition of limit of a function: ∀ ε > 0, ∃ δ_ε > 0: x ∈ (A\{x₀})⋂I(x₀,δ_ε) ⇒ |f(x)-L| < ε

lim_x→x₀(f(x)) = L, the function converges to the limit L when x approaches x₀ that is an accumulation point of the set A where the function is defined

|f(x)-L| < ε ⇔ f(x) ∈ I(L,ε) ⇔ L-ε < f(x) < L+ε

0 < ε ≤ ε₀, δ_ε > 0

When L = f(x₀), lim_x→x₀(f(x)) = f(x₀); |f(x)-f(x₀)| < ε, |x-x₀| < δ_ε

A function is continuous at a non-isolated point x₀ of its domain if the limit of the function at x₀ coincides with the value f(x₀)

All polynomial functions are continuous functions

f(x) = c, it is a continuous function, |f(x)-f(x₀)| = 0

f(x) = x, it is a continuous function, |f(x)-f(x₀)| = |x-x₀|, δ_ε = ε

f(x) = 2x, it is a continuous function, |f(x)-f(x₀)| = |2x-2x₀| = 2|x-x₀| < ε, |x-x₀| < ε/2 = δ_ε

f(x) = mx, m ≠ 0, it is a continuous function, |f(x)-f(x₀)| = |mx-mx₀| = |m||x-x₀| < ε, |x-x₀| < ε/|m| = δ_ε

If a function is continuous then lim_x→x₀(f(x)) = f(x₀)

Continuous functions: all polynomial functions, exponential functions a^x with a > 0 like e^x, logarithm functions log(x) and ln(x), sin(x), cos(x), tan(x) where it is defined, cot(x) where it is defined; these are called elementary functions

Example of discontinuous function: f(x) = [x] on the interval [0,1]; [x] represents the integer part; [x] = max {z ∈ ℤ; z ≤ x}; f(1) = 1, lim_x→1(f(x)) = 0, so the limit does not coincide with the value of the function

12 - EXTENSION OF THE CONCEPT OF LIMIT

f: A → ℝ, L ∈ ℝ, x → x₀, lim_x→x₀(f(x)) = L

∀ ε > 0, ∃ δ_ε > 0: x ∈ (A\{x₀})⋂I(x₀,δ_ε) ⇒ |f(x)-L| < ε, f(x) ∈ I(L,ε)

Convergent function to the limit L, for x that tends to an accumulation point of domain A: ∀ ε > 0, ∃ δ_ε > 0: x ∈ (A\{x₀})⋂I(x₀,δ_ε) ⇒ |f(x)-L| < ε

A*(x₀,δ) := (A\{x₀})⋂I(x₀,δ); A(x₀,δ) := A⋂I(x₀,δ); I(L,ε), f(A*(x₀,δ)) ⊆ I(L,ε); the definition of a continuous function is: I(f(x₀),ε), f(A(x₀,δ)) ⊆ I(f(x₀),ε)

When the function is continuous at the point x₀, x₀ belongs to the definition set and is an accumulation point, so x₀ is a non-isolated point; the function is continuous when the limit coincides with f(x₀)

The concept of a function converging to an accumulation point is related to the concept of a continuous function at the same point

r(x) := (x²-1)/(x-1), r*(x) := x+1; f*(x) := {f(x), x ≠ x₀; L, x = x₀}; if f tends to L at x₀, then f* is continuous at x₀

f(x) = mx+q, m ≠ 0; f(x₀) = mx₀+q, f(x)-f(x₀) = m(x-x₀), |f(x)-f(x₀)| = |m||x-x₀| < ε, |x-x₀| < ε/|m| =: δ_ε; ε = f(x)-f(x₀), δ_ε = x-x₀, |m| = ε/δ_ε, δ_ε = ε/|m|

f(x) = √(x), A = [0,+∞), this function is continuous in all points of the domain; y = √(x), y² = x; 0 ≤ x₁ < x₂ ⇒ 0 ≤ √(x₂)-√(x₁) ≤ √(x₂-x₁), because squaring both sides gives x₂+x₁-2√(x₁x₂) ≤ x₂-x₁, 2x₁ ≤ 2√(x₁x₂), x₁ ≤ √(x₁x₂), x₁² ≤ x₁x₂, x₁ ≤ x₂; verifying the continuity of the function, x₀ < x, 0 < f(x)-f(x₀) = √(x)-√(x₀) ≤ √(x-x₀) < ε, x-x₀ < ε² =: δ_ε; note that if 0 < ε < 1, then ε² < ε

f(x) = 1/x², x ≠ 0; it is an even function or f(x) = f(-x); the graph is symmetrical with respect to the y-axis; ∀ M > 0, f(x) > M ⇔ 1/x² > M ⇔ x² < 1/M, |x| < 1/√(M) := δ_M, for 0 < |x| < δ_M f(x) > M that is for x ∈ A*(x₀,δ_M) f(x) > M

Positively diverging function for x tending to an accumulation point of domain A: ∀ M > 0, ∃ δ_M > 0: x ∈ (A\{x₀})⋂I(x₀,δ_M) ⇒ f(x) > M

Negatively diverging function for x tending to an accumulation point of domain A: ∀ m < 0, ∃ δ_m > 0: x ∈ (A\{x₀})⋂I(x₀,δ_m) ⇒ f(x) < m

Considering a function f defined in a set A unbounded above that is A has no majorants and sup(A) = +∞, lim_x→+∞(f(x)) = L; ∀ ε > 0, ∃ δ_ε: x ∈ A⋂(δ_ε,+∞) ⇒ |f(x)-L| < ε

f(x) = 1/x, A = ℝ* = ℝ\{0}; L = 0, ∀ ε > 0, ∃ δ_ε: x > δ_ε, 0 < 1/x < ε, x > 1/ε := δ_ε

Convergent function to the limit L as x tends to +∞: ∀ ε > 0, ∃ δ_ε > 0: x ∈ A⋂(δ_ε,+∞) ⇒ |f(x)-L| < ε

lim_x→0(1/x) does not exist (∄), because 1/x diverges positively for x > 0 and negatively for x < 0

If a function has a limit, it is unique; it is impossible for the values assumed by a function to be in two different disjoint neighborhoods at the same time, considering for example L₁ < L₂ and ε < (L₂-L₁)/2

A function cannot be convergent and divergent at the same time, or simultaneously positively divergent and negatively divergent

The notions of limit on the right and limit on the left are related to the presence of an order relation in ℝ

f: A → ℝ, x₀ is the accumulation point; A⁺(x₀) := {x ∈ A, x > x₀}, A^-(x₀) := {x ∈ A, x < x₀}; lim_x→x₀⁺(f(x)) = L⁺; lim_x→x₀^-(f(x)) = L^-

If the limit on the right and the limit on the left exist and coincide, their common value is also the limit, and vice versa

f(x) := [x] = max {z ∈ ℤ, z ≤ x}; x₀ = 1, lim_x→1⁺([x]) = 1 = [1], lim_x→1^-([x]) = 0; the limit on the left is different from the limit on the right, the limit on the right coincides with the value of the function at the point, the function is continuous on the right and discontinuous on the left
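
The different limits on the left and on the right of [x] at x₀ = 1 can be seen by sampling points on both sides. This is a minimal Python sketch; the name integer_part is an illustrative assumption, and math.floor is used because it returns the largest integer not exceeding x.

```python
import math

def integer_part(x):
    """[x] = max {z in Z, z <= x}, computed with math.floor."""
    return math.floor(x)

for h in (0.1, 0.01, 0.001):
    # Values on the left of 1 give 0, values on the right of 1 give 1.
    print(h, integer_part(1 - h), integer_part(1 + h))

print("value at 1:", integer_part(1))
```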

13 - LIMIT THEOREMS - PART 1

The limit of the sum is equal to the sum of the limits

f,g: A → ℝ, x₀ is the accumulation point; lim_x→x₀(f(x)) = α, lim_x→x₀(g(x)) = β; lim_x→x₀(f(x)+g(x)) = α+β; the sum of two continuous functions is a continuous function; ∀ ε > 0, ∃ δ₁ ∀ x ∈ A*(x₀,δ₁), |f(x)-α| < ε; ∀ ε > 0, ∃ δ₂ ∀ x ∈ A*(x₀,δ₂), |g(x)-β| < ε; δ = min(δ₁,δ₂), x ∈ A*(x₀,δ), |f(x)+g(x)-(α+β)| = |f(x)-α+g(x)-β| ≤ |f(x)-α|+|g(x)-β| < 2ε

If a function f(x) is convergent, as x tends to an accumulation point of its domain, it is bounded in a neighborhood of the same point

A function is regular when it has limit and can be convergent or divergent

A function is irregular when it has no limit

If a function is convergent, in A*(x₀,δ) it is bounded that is with majorant and minorant

Remembering the triangle inequality |x+y| ≤ |x|+|y|, and |x⋅y| = |x|⋅|y|

f(x) → α, g(x) → β; A*(x₀,δ), β-ε < g(x) < β+ε, |g(x)| = |g(x)-β+β| ≤ |g(x)-β|+|β| < ε+|β| ≤ |β|+1 =: c for ε ≤ 1

|f(x)g(x)-αβ| = |f(x)g(x)-αg(x)+αg(x)-αβ| = |g(x)(f(x)-α)+α(g(x)-β)| ≤ |g(x)||f(x)-α|+|α||g(x)-β| ≤ cε+|α|ε = (c+|α|)ε

The limit of the product is equal to the product of the limits

Sums and products of continuous functions are still continuous functions; therefore polynomial functions are continuous functions

Every monomial function is continuous, so polynomial functions are continuous

x → c, it is the constant function and it is a continuous function

x → x, it is the identity function and it is a continuous function

x → cxⁿ, it is a monomial function and it is a continuous function

A rational function is a ratio between polynomials and is defined at all points x where the denominator is non-zero

Property of permanence of the sign: if a function f(x) converges to a limit other than 0, when x tends to an accumulation point of the domain, it maintains the same sign as the limit in a suitable neighborhood of the point itself

f(x)/g(x); β ≠ 0, β > 0, 0 < β/2 ≤ β-ε < g(x) < β+ε, 0 < ε ≤ β/2

The limit of the quotient is equal to the quotient of the limits, but the denominator function must tend to a limit other than 0

The reciprocal function of g(x) is 1/g(x)

1/g(x) → 1/β, β > 0; g(x) > β/2 for 0 < ε ≤ β/2, |1/g(x)-1/β| = |(β-g(x))/(βg(x))| < ε/(β⋅β/2) = 2ε/β² = (2/β²)ε

f(x)/g(x) = f(x)(1/g(x)) → α/β

Polynomial functions are continuous

Rational functions are continuous in all points where they are defined that is in all points where the denominator is non-zero

lim_x→x₀(f(x)) = 0, f(x) > 0, ∀ x ∈ A*(x₀,δ), lim_x→x₀(1/f(x)) = +∞; ∀ M > 0, 1/f(x) > M ⇔ 0 < f(x) < 1/M := ε

lim_x→x₀(f(x)) = 0, f(x) < 0, ∀ x ∈ A*(x₀,δ), lim_x→x₀(1/f(x)) = -∞

lim_x→0(x²) = 0, x² > 0 for x ≠ 0, lim_x→0(1/x²) = +∞

lim_x→0⁺(1/x) = +∞, limit on the right for x > 0; lim_x→0^-(1/x) = -∞, limit on the left for x < 0

lim_x→x₀(|f(x)|) = +∞ ⇒ lim_x→x₀(1/f(x)) = 0; |1/f(x)| = 1/|f(x)| < ε, |f(x)| > 1/ε := M

a > 1, lim_n→+∞(aⁿ) = +∞; a = 1+d with d > 0, aⁿ = (1+d)ⁿ ≥ 1+nd by Bernoulli's inequality, 1+nd diverges to +∞ and therefore also aⁿ; n < 0, n = -|n|, lim_n→-∞(aⁿ) = lim_|n|→+∞(a^-|n|) = lim_k→+∞(1/a^k) = 0

A function is even when f(-x) = f(x), and is symmetrical to the y-axis

A function is odd when f(-x) = -f(x), and is symmetrical to the origin of the x and y axes

tg(x) = tan(x) = sin(x)/cos(x), cos(x) ≠ 0; cos(x) is an even function because cos(-x) = cos(x), and cos(x) is symmetrical to the y-axis; cos(x) = 0 for x = π/2 and x = -π/2, and in general for x = π/2+kπ, k ∈ ℤ; considering tan(x) for -π/2 < x < π/2; tan(x) is an odd function because tan(-x) = -tan(x), and tan(x) is symmetrical to the origin of the x and y axes; lim_x→(π/2)^-(tan(x)) = +∞; lim_x→(-π/2)⁺(tan(x)) = -∞; tan(x) = sin(x)/cos(x) = 1/(cos(x)/sin(x)) → +∞, for A^-(π/2,δ), because cos(x)/sin(x) → 0 with positive values; tan(x) = sin(x)/cos(x) = 1/(cos(x)/sin(x)) → -∞, for A⁺(-π/2,δ), because cos(x)/sin(x) → 0 with negative values

14 - LIMIT THEOREMS - PART 2

f₁: A₁ → ℝ, f₂: A₂ → ℝ, f₁(A₁) ⊆ A₂; f₁ is defined in A₁ and f₂ is defined in A₂ and the image of A₁, obtained through f₁, is contained in the set A₂ that is the domain of the function f₂; x ∈ A₁ → f₁ → f₁(x) =: y → f₂ → f₂(f₁(x)); x ↦ f₂(f₁(x)) = (f₂∘f₁)(x), the order of the functions is important, f₁ acts before f₂; the commutative property does not apply to the composition of functions

f₁: x → x+1, f₂: x → 2x; x → f₁ → x+1 → f₂ → 2(x+1) = 2x+2; x → f₂ → 2x → f₁ → 2x+1; f₂(f₁(x)) ≠ f₁(f₂(x))
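
The two orders of composition can be written directly as code. This is a small Python sketch; the helper compose and the variable names are illustrative assumptions, while f1 and f2 are the two functions of the example.

```python
def compose(outer, inner):
    """Return the function x -> outer(inner(x)); inner acts first."""
    return lambda x: outer(inner(x))

f1 = lambda x: x + 1   # f1: x -> x + 1
f2 = lambda x: 2 * x   # f2: x -> 2x

f2_after_f1 = compose(f2, f1)  # x -> 2(x + 1) = 2x + 2
f1_after_f2 = compose(f1, f2)  # x -> 2x + 1

for x in (0, 1, 3):
    print(x, f2_after_f1(x), f1_after_f2(x))
```

The two columns never agree, since 2x+2 and 2x+1 differ by 1 for every x.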

By composing continuous functions we obtain continuous functions

f₁: A₁ → ℝ, f₂: A₂ → ℝ, f₁(A₁) ⊆ A₂; f₁(x₀) = y₀, f₂(y₀) = f₂(f₁(x₀)); x₀ → f₁ → f₁(x₀) := y₀ → f₂ → f₂(f₁(x₀)); there is a positive δ such that the image of A₁(x₀,δ) through f₂ after f₁ is contained in the neighborhood of ε radius of the point f₂(f₁(x₀)), and this is the continuity of the compound function

x ↦ √(x²+x+1); f₁(x) = x²+x+1 =: y, f₂(y) = √(y); x ↦ f₁ ↦ x²+x+1 =: y ↦ f₂ ↦ √(y); f₁ and f₂ are continuous functions and the compound function f₂ after f₁ is also a continuous function; f₁ is continuous because polynomial functions are continuous; f₂ is continuous because the root function is continuous

The function f is minorant of g, in a set A, if f(x) is less than or equal to g(x) for every x of A

f,g: A → ℝ, f is minorant of g if f(x) ≤ g(x)

If the function f(x) is greater than or equal to the value c in a neighborhood of an accumulation point, the limit of f, if any, is greater than or equal to c

f(x) ≥ c, A*(x₀,δ); L < c is false because, choosing ε < c-L, f(x) < L+ε < c in a neighborhood of x₀, which contradicts f(x) ≥ c

If the function f(x) converges to the limit L > c, as x tends to an accumulation point of its domain, it is greater than c in a suitable neighborhood of the same point

f*(x) = f(x)-c, f* → L-c > 0; it is an extension of the theorem of sign permanence

Squeeze theorem or theorem of the two carabinieri: a function between two functions converging to the same limit, also converges to the same limit

f,g,h: A → ℝ, x₀, x ∈ A*(x₀,δ); f(x) ≤ g(x) ≤ h(x); if f(x) → L and h(x) → L, then g(x) → L; L-ε < f(x) ≤ g(x) ≤ h(x) < L+ε in a suitable neighborhood of x₀, so for the squeeze theorem, or theorem of the two carabinieri, g(x) → L

If a function has a positively divergent function as minorant, then it is positively divergent

f(x) ≤ g(x), if lim_x→x₀(f(x)) = +∞ then lim_x→x₀(g(x)) = +∞

a > 1, lim_n→+∞(aⁿ) = +∞; aⁿ = (1+d)ⁿ ≥ 1+nd, 1+nd is a positively divergent function; aⁿ has a positively divergent function as minorant, so it is positively divergent

If a function has a negatively divergent function as majorant, then it is negatively divergent

f(x) ≤ g(x), if lim_x→x₀(g(x)) = -∞ then lim_x→x₀(f(x)) = -∞

a > 1, n ∈ ℕ*, n ≥ 2, 1 < ⁿ√(a) < 1+(a-1)/n; 1 → 1 and 1+(a-1)/n → 1 because (a-1)/n → 0, so for the squeeze theorem, or theorem of the two carabinieri, ⁿ√(a) → 1

With the squeeze theorem, or theorem of the two carabinieri, it is also possible to prove that ⁿ√(n) → 1
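
The squeeze of ⁿ√(a) between 1 and 1+(a-1)/n can be checked for a sample base. This is a Python sketch under stated assumptions: the name nth_root, the base 5 and the sampled values of n are arbitrary choices for illustration, and n starts from 2 because for n = 1 the upper bound is attained.

```python
def nth_root(a, n):
    """n-th root of a positive number a, computed as a**(1/n)."""
    return a ** (1 / n)

base = 5.0
for n in (2, 10, 100, 1000):
    upper = 1 + (base - 1) / n
    # 1 < nth_root(base, n) < upper, and upper tends to 1.
    print(n, nth_root(base, n), upper)

# The same behavior for the n-th root of n.
for n in (10, 100, 1000):
    print(n, nth_root(n, n))
```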

The exponential function is continuous in ℝ

x ↦ a^x, x₀ ∈ ℝ, a > 1, x > x₀; a^(x₁+x₂) = a^x₁⋅a^x₂; 0 < a^x-a^x₀ = a^(x-x₀+x₀)-a^x₀ = a^(x-x₀)⋅a^x₀-a^x₀ = a^x₀(a^(x-x₀)-1); ⁿ√(a) = a^(1/n) → 1; 0 < x-x₀ ≤ 1/n ⇔ n ≤ 1/(x-x₀); considering the integer part n ≤ [1/(x-x₀)], n = n(x) = [1/(x-x₀)]; 0 < a^x₀(a^(x-x₀)-1) ≤ a^x₀(a^(1/n)-1) ≤ a^x₀(a-1)/n because 1 < ⁿ√(a) < 1+(a-1)/n, 0 < ⁿ√(a)-1 < (a-1)/n; (a-1)/n → 0 and a^x → a^x₀; this is the demonstration of the continuity to the right of the function a^x

a_n = (1+1/n)ⁿ, b_n = (1+1/n)ⁿ⁺¹; f(x) = (1+1/x)^x = ((x+1)/x)^x, x > 0 or x < -1; lim_x→±∞((1+1/x)^x) = e ≈ 2.71

15 - LIMIT THEOREMS - PART 3

x ↦ a^x, 0 < a ≠ 1; lim_x→+∞((1+1/x)^x) = e, lim_x→-∞((1+1/x)^x) = e; (x+1)/x, x > 0, x < -1

n = [x], n is the integer part of x or the largest integer that does not exceed x; [x] ≤ x < [x]+1, n ≤ x < n+1; (1+1/([x]+1))^[x] < (1+1/x)^x < (1+1/[x])^([x]+1); (1+1/(n+1))ⁿ < (1+1/x)^x < (1+1/n)ⁿ⁺¹; ((1+1/(n+1))ⁿ⁺¹)/(1+1/(n+1)) < (1+1/x)^x < (1+1/n)ⁿ⁺¹; as x → +∞, n → +∞, ((1+1/(n+1))ⁿ⁺¹)/(1+1/(n+1)) → e/1 = e and (1+1/n)ⁿ⁺¹ → e, so for the squeeze theorem lim_x→+∞((1+1/x)^x) = e

Circular functions, sine and cosine, are continuous in ℝ

One radian is defined as the angle subtended from the center of a circle which intercepts an arc equal in length to the radius of the circle; the magnitude in radians of a subtended angle is equal to the ratio of the arc length to the radius of the circle; θ = a/r, where θ is the subtended angle in radians, a is arc length, and r is radius

The measure in radians is the ratio between the length of the arc and the length of the radius; an angle measures a radian when the length of the subtended arc is equal to the length of the radius; 1 radian in degrees = 180/π = 57.2957795131...°, it is an irrational number like π, 1 rad = 180/π ≈ 57°

|sin(x)| ≤ |x|; for 0 < x < π/2, sin(x) is the length of the cathetus which is less than the length of the hypotenuse which is less than the length of the arc which is the supremum of the lengths of the inscribed polygons, therefore |sin(x)| < |x| for x ≠ 0

Using the prosthaphaeresis formula it is possible to demonstrate the continuity of the function sin(x); sin(x)-sin(x₀) = 2sin((x-x₀)/2)cos((x+x₀)/2); |sin(x)-sin(x₀)| = 2|sin((x-x₀)/2)||cos((x+x₀)/2)|; |sin(x)-sin(x₀)| ≤ 2|(x-x₀)/2|⋅1 = |x-x₀|, δ_ε = ε

cos(x) = sin(x+π/2); x ↦ x+π/2 =: t ↦ sin(t), so cos(x) is a composition of continuous functions and is continuous; using the prosthaphaeresis formula it is also possible to demonstrate the continuity of the function cos(x)

The trigonometric functions, or circular functions, are continuous functions

sin(x)/x tends to 1 as x tends to 0

lim_x→0(sin(x)/x) = 1, x ≠ 0; sin(x)/x is an even function because the ratio between two odd functions is an even function; an even function is symmetrical with respect to the y-axis and for this reason the limit on the right coincides with the limit on the left; sin(x) → 0, x → 0, so 0/0 is an indeterminate form; lim_x→0((1-cos(x))/x) = 0, 1-cos(x) → 0, x → 0; geometrically we know that, for 0 < x < π/2, 0 < sin(x) < x < tan(x), sin(x) < x < sin(x)/cos(x), sin(x)/sin(x) < x/sin(x) < sin(x)/(cos(x)sin(x)), 1 < x/sin(x) < 1/cos(x), and 1/cos(x) → 1 as x → 0, so for the squeeze theorem, or theorem of the two carabinieri, x/sin(x) → 1 and therefore sin(x)/x → 1; f(x) = {sin(x)/x, x ≠ 0; 1, x = 0}, in this way we obtain a continuous function on the whole real axis
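
Taking reciprocals in 1 < x/sin(x) < 1/cos(x) gives cos(x) < sin(x)/x < 1 for 0 < x < π/2, and this can be observed numerically. This is a minimal Python sketch; the name sinc for the continuous extension of sin(x)/x is an illustrative assumption.

```python
import math

def sinc(x):
    """Continuous extension of sin(x)/x, with the value 1 at x = 0."""
    return 1.0 if x == 0 else math.sin(x) / x

for x in (1.0, 0.1, 0.01, 0.001):
    # cos(x) < sin(x)/x < 1, and both bounds tend to 1 as x tends to 0.
    print(x, math.cos(x), sinc(x))
```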

A monotone sequence, increasing or decreasing, is regular and therefore has a limit; if the sequence is monotone increasing, it tends to its supremum, finite or +∞; if the sequence is monotone decreasing, it tends to its infimum, finite or -∞

If f(x) is an increasing monotone function in a set A unbounded above, then lim_x→+∞(f(x)) = sup(f(A))

If f(x) is a decreasing monotone function in a set A unbounded above, then lim_x→+∞(f(x)) = inf(f(A))

a^x, a > 1, it is an increasing monotone function; this function is unbounded above because the sequence aⁿ diverges positively; the values assumed by aⁿ constitute a set which is unbounded above or without majorants, so the image of the function a^x is devoid of majorants; a^x is monotone increasing and lim_x→+∞(a^x) = +∞; the image of a^x, that is the set of values it assumes, is unbounded above, that is the supremum is +∞, and the function consists of positive values, but lim_n→-∞(aⁿ) = 0, so the infimum of the image of the function is zero and the function, being monotone, tends to its infimum as x → -∞, so lim_x→-∞(a^x) = 0; a^x, a > 1, lim_x→+∞(a^x) = +∞, lim_x→-∞(a^x) = 0

Property of connection of the intervals: if x₁,x₂ ∈ I and x₁ < x < x₂, then x ∈ I

If a function is continuous in a closed bounded interval, the image, that is the set of values it assumes, is bounded above and below, that is it has majorants and minorants; f: [a,b] → ℝ and f is continuous ⇒ sup(f([a,b])) ∈ ℝ, we want to show that the function is bounded above, that is the supremum is a real number; with a constructive proof we need to show the existence of a majorant; with a proof by absurdity, the hypothesis is the negation of the thesis and we need to show that it leads to a contradiction; we want to show that the function is bounded above, or that the supremum is a real number, so we suppose absurdly that the function is unbounded above; absurdly we consider sup(f([a,b])) = +∞, but by continuity f(x₀)-ε < f(x) < f(x₀)+ε in a neighborhood of every point x₀, and if a function is bounded in a set A, it is bounded in every subset of A, so the function is bounded above and below; sup(f([a,b])) = E ∈ ℝ, inf(f([a,b])) = e ∈ ℝ

If a function is continuous in a bounded and closed interval, the image is bounded above and below, the supremum is the maximum, and the infimum is the minimum

16 - PROPERTIES OF CONTINUOUS FUNCTIONS ON AN INTERVAL

A continuous function on a bounded and closed interval [a,b] is bounded

If the hypothesis is denied, the thesis can also fail; f(x) = 1/x on (0,1], an interval that is bounded but not closed, the graph of the function is a branch of equilateral hyperbola, this function is unbounded above and bounded below

Weierstrass' theorem or extreme value theorem: a continuous function f on a bounded and closed interval [a,b] has a maximum and a minimum

Demonstration by absurdity of the Weierstrass' theorem: E := sup(f([a,b])), ∀ x, f(x) ≤ E, if absurdly f(x) < E for every x, then E-f(x) > 0, g(x) := 1/(E-f(x)), g(x) is continuous on [a,b] and should be bounded above, instead it is unbounded above, considering that the supremum is the smallest of the majorants, ∀ ε > 0 ∃ x_ε, E-ε < f(x_ε) < E, E-f(x_ε) < ε, 1/(E-f(x_ε)) = g(x_ε) > 1/ε = M, so g(x) is unbounded above but it should be bounded above, so f(x) < E for every x is a contradiction and E is the maximum

f(x) = 1/x, A = [1,+∞), f(A) = im(f) = (0,1], 1 is the maximum of the function, but there is no minimum because the function never assumes the value of 0

If a function is continuous on an interval and passes from negative to positive values, or vice versa, then it assumes the value 0 at least at one point; considering two points a and b where a < b, f(a) < 0 and f(b) > 0, it is possible to find an intermediate point c for studying the sign of f(c) and to keep the half of the interval where the sign changes; repeating this we obtain intervals [a_n,b_n] with f(a_n)f(b_n) < 0 whose endpoints tend to a common point x₀, so by continuity f(a_n)f(b_n) → f(x₀)², 0 ≤ f(x₀)² ≤ 0; considering that the square of a real number is always ≥ 0, f(x₀) = 0
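
The halving of the interval described above is the bisection method. The following is a minimal Python sketch, not taken from the notes; the name bisect_zero, the tolerance and the sample function x²-2 are illustrative assumptions.

```python
def bisect_zero(f, a, b, tolerance=1e-12):
    """Approximate a zero of a continuous f on [a, b], assuming a sign change between a and b."""
    fa, fb = f(a), f(b)
    if fa == 0:
        return a
    if fb == 0:
        return b
    if fa * fb > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    while b - a > tolerance:
        c = (a + b) / 2      # intermediate point
        fc = f(c)
        if fc == 0:
            return c
        if fa * fc < 0:      # the sign changes in [a, c]
            b = c
        else:                # the sign changes in [c, b]
            a, fa = c, fc
    return (a + b) / 2

root = bisect_zero(lambda x: x**2 - 2, 0.0, 2.0)
print(root)  # close to the square root of 2
```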

A continuous function on an interval cannot take two values without taking all the intermediate values; f: I → ℝ, f(x₁) = y₁ < y₂ = f(x₂), y₁ < y < y₂, g(x) := f(x)-y = 0, f(x) = y

Continuous functions transform intervals into intervals, and in particular they transform bounded and closed intervals into intervals of the same type

sin(x), ℝ → [-1,1]; sin(π/2) = 1, sin(-π/2) = -1, sin(ℝ) = [-1,1]

tan(x), (-π/2,π/2) → (-∞,+∞) = ℝ

x ↦ a^x, a > 1; ℝ → (0,+∞) = ℝ⁺

ⁿ√(x), [0,+∞) = ℝ₊ → [0,+∞) = ℝ₊

The inverse function is not the reciprocal function

The reciprocal function of f(x) is 1/f(x)

An injective function (also known as injection, or one-to-one function) is a function f that maps distinct elements to distinct elements; that is f(x₁) = f(x₂) implies x₁ = x₂; every element of the function's codomain is the image of at most one element of its domain; continuous injective functions on an interval are strictly monotone functions, strictly increasing or strictly decreasing; a function is injective when lines parallel to the x-axis intersect the curve of the function in at most one point; sin(x) is not an injective function because it is possible to draw lines parallel to the x-axis that intersect the curve of sin(x) in several points

If a function is injective it is possible to obtain the inverse function; x ∈ I → f → f(x), f(x) = y → f^-1 → x

f(x) = y = mx+q, y-q = mx, x = (y-q)/m = (y-q)(1/m) = f^-1(y)

The function y = x² is represented by a parabola and is not injective, since the lines parallel to the x-axis cut the curve of the function in two points and so it is impossible to obtain the inverse function; considering y = x² defined in [0,+∞) that is the branch of the parabola in the first quadrant, we have obtained an injective function from which we can obtain the inverse function; y = x² ⇔ x = √(y); f: x ↦ x², x ≥ 0, f^-1: x ↦ √(x), x ≥ 0; the graph of √(x) is a half parabola that is symmetrical to the half parabola y = x² [0,+∞) with respect to the bisector of the first quadrant

The graph of the inverse function is symmetrical to the graph of the starting function with respect to the bisector of the first quadrant

If f is a continuous and strictly monotone function that transforms the interval I into the interval J, the inverse function is continuous and strictly monotone on J

f: I → J, f^-1: J → I

A surjective function (also known as surjection, or onto function) is a function f that maps an element x to every element y; for every y, there is an x such that f(x) = y; every element of the function's codomain is the image of at least one element of its domain; it is not required that x be unique, the function f may map one or more elements of X to the same element of Y

A function is surjective if the codomain is the image of the function

If a function is continuous and strictly monotone on an interval, the inverse function is also continuous and strictly monotone

Exponential function: x ↦ a^x, x ∈ ℝ, ℝ → ℝ⁺ = (0,+∞)

Logarithm function: x ↦ log_a(x), x ∈ ℝ⁺ = (0,+∞), ℝ⁺ = (0,+∞) → ℝ; the logarithm function is the inverse function of the exponential function

x → f = exponential function → a^x → f^-1 = logarithm function → log_a(a^x) = x; in the other order, a^log_a(x) = x for x > 0

The logarithm is the inverse function to exponentiation; the logarithm of a number x is the exponent to which another fixed number, the base b, must be raised, to produce that number x; log_b(x) = y if b^y = x, x > 0, b > 0, b ≠ 1

The logarithm function is defined on ℝ⁺ and the image is ℝ

The exponential function is defined on ℝ and the image is ℝ⁺

The graph of the logarithm function is symmetrical to the graph of the exponential function with respect to the bisector of the first and third quadrant

b^y = x, the base b logarithm of x is log_b(x) = y

a^x = y, the base a logarithm of y is log_a(y) = x; 10² = 100, log₁₀(100) = 2; 10³ = 1000, log₁₀(1000) = 3; ln(e²) = 2; ln(e³) = 3

b^y = x, y = log_b(x), the anti logarithm or inverse logarithm is calculated by raising the base b to the logarithm y, x = log_b^-1(y) = b^y = b^log_b(x), log_b^-1(y) = b^y = b^log_b(b^y)

a^x = y, x = log_a(y), the anti logarithm or inverse logarithm is calculated by raising the base a to the logarithm x, y = log_a^-1(x) = a^x = a^log_a(y), log_a^-1(x) = a^x = a^log_a(a^x); 10² = 10^log₁₀(10²); 10³ = 10^log₁₀(10³); e² = e^ln(e²); e³ = e^ln(e³)

Logarithm product rule: log_b(x⋅y) = log_b(x)+log_b(y)

Logarithm product rule: log_a(x⋅y) = log_a(x)+log_a(y); 5 = log₁₀(10⁵) = log₁₀(10²⋅10³) = log₁₀(10²)+log₁₀(10³) = 2+3 = 5; 5 = ln(e⁵) = ln(e²⋅e³) = ln(e²)+ln(e³) = 2+3 = 5

Logarithm quotient rule: log_b(x/y) = log_b(x)-log_b(y)

Logarithm quotient rule: log_a(x/y) = log_a(x)-log_a(y); 2 = log₁₀(10²) = log₁₀(10⁵/10³) = log₁₀(10⁵)-log₁₀(10³) = 5-3 = 2; 2 = ln(e²) = ln(e⁵/e³) = ln(e⁵)-ln(e³) = 5-3 = 2

Logarithm power rule: log_b(x^y) = y⋅log_b(x)

Logarithm power rule: log_a(x^y) = y⋅log_a(x); 2 = log₁₀(10²) = 2⋅log₁₀(10) = 2⋅1 = 2; 2 = ln(e²) = 2⋅ln(e) = 2⋅1 = 2

Logarithm base switch rule: log_b(c) = 1/log_c(b)

Logarithm base switch rule: a^x = y, log_a(y) = x, log_a(y) = 1/log_y(a); 2 = log₁₀(10²) = 1/log_10²(10) = 1/(1/2) = 1⋅2 = 2; 2 = ln(e²) = 1/log_e²(e) = 1/(1/2) = 1⋅2 = 2

Logarithm change of base rule: log_b(x) = log_c(x)/log_c(b)

Logarithm change of base rule: a^x = y, log_a(y) = x, log_a(y) = log_b(y)/log_b(a); 2 = log₁₀(10²) = ln(10²)/ln(10) = 2⋅ln(10)/ln(10) = 2
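
The logarithm rules listed above can be verified numerically with the standard library. This is a small Python sketch; the sample values x = 8, y = 32, b = 2, c = 10 are arbitrary assumptions, and math.log(value, base) is the two-argument logarithm of the math module. Comparisons use math.isclose because floating-point logarithms are rounded.

```python
import math

x, y, b, c = 8.0, 32.0, 2.0, 10.0
log = math.log  # log(value, base)

# Each entry pairs the left and right sides of one rule.
checks = {
    "product rule": (log(x * y, b), log(x, b) + log(y, b)),
    "quotient rule": (log(x / y, b), log(x, b) - log(y, b)),
    "power rule": (log(x ** 3, b), 3 * log(x, b)),
    "base switch rule": (log(c, b), 1 / log(b, c)),
    "change of base rule": (log(x, b), log(x, c) / log(b, c)),
}

for name, (left, right) in checks.items():
    print(name, left, right, math.isclose(left, right))
```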

Circular functions, or trigonometric functions, are not injective because they are periodic, so they cannot be reversed globally, but they can be reversed locally

It is incorrect to say that the square root function is the inverse of the squaring function, because the squaring function is not injective; the square root function is the inverse of the squaring function but restricted to the set of non-negative numbers

sin(x) [-π/2,π/2] → [-1,1]; the function is continuous and strictly increasing on this interval and therefore it can be inverted and its inverse is continuous and strictly increasing; arcsin(x), [-1,1] → [-π/2,π/2]

cos(x) [0,π] → [-1,1]; the function is continuous and strictly decreasing on this interval and therefore it can be inverted and its inverse is continuous and strictly decreasing; arccos(x), [-1,1] → [0,π]

tan(x) (-π/2,π/2) → ℝ; the function is continuous and strictly increasing on this interval and therefore it can be inverted and its inverse is continuous and strictly increasing; arctan(x), ℝ → (-π/2,π/2); the oscillation of arctan(x), that is the difference between the supremum and the infimum of the image, or the variation of the function, is π
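
That the circular functions can be reversed only locally can be seen with the inverse functions of the standard library. This is an illustrative Python sketch; the sample angles 1 and 2 are arbitrary assumptions. The angle 2 lies outside [-π/2,π/2], so arcsin(sin(2)) returns π-2 instead of 2, while arccos(cos(2)) returns 2 because 2 lies in [0,π].

```python
import math

inside, outside = 1.0, 2.0  # 1 is inside [-pi/2, pi/2], 2 is outside

print(math.asin(math.sin(inside)))   # recovers 1
print(math.asin(math.sin(outside)))  # gives pi - 2, not 2
print(math.acos(math.cos(outside)))  # recovers 2, because 2 is in [0, pi]

# arctan stays strictly between -pi/2 and pi/2, even for large arguments.
for t in (1.0, 1e3, 1e6):
    print(t, math.atan(t), math.pi / 2)
```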

17 - INTRODUCTION TO THE CONCEPT OF VECTOR SPACE

In physics, force, velocity, and acceleration are vectors; F = m⋅a, the force F and the acceleration a are vectors; v = a⋅t, the velocity v and the acceleration a are vectors; F_g = m⋅g, the force of gravity F_g and the acceleration of gravity g are vectors directed towards the center of the Earth

A Euclidean vector or simply a vector, sometimes called a geometric vector or spatial vector, is a geometric object that has magnitude or length and direction; vectors can be added to other vectors according to vector algebra; a Euclidean vector is frequently represented by a ray, a directed line segment, or graphically as an arrow connecting an initial point A with a terminal point B; a vector is what is needed to carry the point A to the point B, and the Latin word vector means carrier; it was first used by 18th century astronomers investigating planetary revolution around the Sun; the magnitude of the vector is the distance between the two points, and the direction refers to the direction of displacement from A to B

Vectors with different direction can be added using the parallelogram rule; the sum of vectors with equal direction is a simple addition of the lengths of the vectors

Two vectors that have the same length but opposite direction are called opposite and their sum is the null vector

Commutative property of vectors: v+w = w+v

Associative property of vectors: (v+w)+z = v+(w+z)

A vector can be multiplied by a real number and the length of the resulting vector is the product of the absolute value of the number and the length of the vector; if the real number is negative, the resulting vector has opposite direction; if the real number is zero, the resulting vector is null

First distributive property of vectors: a(v+w) = av+aw

Multiplying a vector by the number 1 does not change the vector, 1⋅v = v

Second distributive property of vectors: (a+b)⋅v = a⋅v+b⋅v

(a⋅b)⋅v = a⋅(b⋅v) = b⋅(a⋅v)

A tuple is a finite ordered list, or sequence, of elements; an n-tuple is a sequence, or ordered list, of n elements, where n is a non-negative integer; there is only one 0-tuple, referred to as the empty tuple; an n-tuple is defined inductively using the construction of an ordered pair; tuples are pairs of numbers, triples of numbers, quadruples of numbers, and so on

ℝ² is the set of all pairs (a,b) of real numbers

Sum of pairs: (a,b)+(c,d) = (a+c,b+d)

Product of a number by a pair: m(a,b) = (ma,mb)

Commutative property in ℝ² for the sum: (a,b)+(c,d) = (c,d)+(a,b)

Associative property in ℝ² for the sum: ((a,b)+(c,d))+(e,f) = (a,b)+((c,d)+(e,f))

Existence of 0 in ℝ² for the sum: (a,b)+(0,0) = (a,b)

Existence of the opposite in ℝ² for the sum: (a,b)+(-a,-b) = (0,0)

First distributive property in ℝ² for the product: m((a,b)+(c,d)) = m(a,b)+m(c,d)

Second distributive property in ℝ² for the product: (m+n)(a,b) = m(a,b)+n(a,b)

Property of the number 1 in ℝ² for the product: 1(a,b) = (a,b)

Property in ℝ² for the product: (mn)(a,b) = m(n(a,b))

Pairs have the same properties as vectors

Vectors and tuples from ℝ² to ℝⁿ have sum, product by a number, and these eight properties
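
The sum and the product by a number on tuples can be written componentwise, and some of the eight properties can be checked on sample values. This is a minimal Python sketch; the names add and scale and the sample pairs are illustrative assumptions, and checking a few values illustrates the properties without proving them.

```python
def add(v, w):
    """Componentwise sum of two tuples of the same length."""
    return tuple(a + b for a, b in zip(v, w))

def scale(m, v):
    """Product of the number m by the tuple v."""
    return tuple(m * a for a in v)

v, w = (1, 2), (3, 4)

print(add(v, w), add(w, v))                               # commutative property
print(scale(3, add(v, w)), add(scale(3, v), scale(3, w)))  # first distributive property
print(scale(2 + 3, v), add(scale(2, v), scale(3, v)))      # second distributive property
print(add(v, scale(-1, v)))                                # opposite: gives (0, 0)
```

The same two functions work for tuples of any length n, which matches the passage from ℝ² to ℝⁿ.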

18 - VECTOR SPACES - LINEAR DEPENDENCE AND INDEPENDENCE

A vector space, also called a linear space, is a set of objects called vectors, which may be added together and multiplied, scaled, by numbers, called scalars; scalars are often taken to be real numbers, but there are also vector spaces with scalar multiplication by complex numbers, rational numbers, or generally any field; the operations of vector addition and scalar multiplication must satisfy certain requirements, called vector axioms; to specify that the scalars are real or complex numbers, the terms real vector space and complex vector space are often used

Axiom 1 of vector spaces, concerning the sum: v+w = w+v, commutative property

Axiom 2 of vector spaces, concerning the sum: (v+w)+z = v+(w+z), associative property

Axiom 3 of vector spaces, concerning the sum: v+0_v = v, existence of the null element or zero

Axiom 4 of vector spaces, concerning the sum: v+(-v) = 0_v, existence of the opposite

Axiom 5 of vector spaces, concerning the product: a(v+w) = av+aw, first distributive property

Axiom 6 of vector spaces, concerning the product: (a+b)v = av+bv, second distributive property

Axiom 7 of vector spaces, concerning the product: 1v = v, 1 is the neutral element of the product

Axiom 8 of vector spaces, concerning the product: (ab)v = a(bv) = b(av)

ℝ, ℝ², ℝⁿ, are vector spaces

Vectors of two-dimensional space form a vector space

{(x,x) ∈ ℝ²}; (1,1)+(2,2) = (3,3); α(x,x) = (αx,αx); (0,0)+(x,x) = (x,x); -(x,x) = (-x,-x)

v+(-v) = 0_v; v-v = 0_v

(x,y)+(-(x,y)) = (x,y)-(x,y) = (0,0)

0⋅v = 0_v; a⋅0_v = 0_v

Product cancellation law: a⋅v = 0_v ⇒ a = 0 or v = 0_v

-(av) = (-a)v = a(-v)

-(4(2,3)) = (-4)(2,3) = (-8,-12); -(4(2,3)) = 4(-2,-3) = (-8,-12)

W is a vector subspace of V if W ⊆ V and W is a vector space with the sum and the product of V

{(x,x) ∈ ℝ²} is a subset of ℝ², and it is a vector subspace of ℝ²

{(x,y,x+y) ∈ ℝ³} is a subset of ℝ³ and a vector subspace of ℝ³; for example it contains (3,2,5)

If in a set the properties of vector space are not valid, then the set is not a vector space

W = {(x,x+1)} is a subset of ℝ² but it is not a vector subspace of ℝ², because it is not closed under the sum and the product by a number; (3,3+1) = (3,4), (3,4)+(5,6) = (8,10) ∉ {(x,x+1)}, 2(3,4) = (6,8) ∉ {(x,x+1)}; W = {(x,x+1)} is not a vector space
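
The failure of closure for {(x,x+1)} can be reproduced with membership tests. This is a small Python sketch; the function names on_diagonal and on_shifted_line are illustrative assumptions, and the sample pairs are the ones used in the text.

```python
def add(v, w):
    return tuple(a + b for a, b in zip(v, w))

def scale(m, v):
    return tuple(m * a for a in v)

def on_diagonal(v):
    """Membership test for {(x, x)}."""
    return v[0] == v[1]

def on_shifted_line(v):
    """Membership test for {(x, x+1)}."""
    return v[1] == v[0] + 1

# {(x, x)} is closed under both operations on these samples.
print(on_diagonal(add((1, 1), (2, 2))), on_diagonal(scale(5, (2, 2))))

# {(x, x+1)} is not: the sum and the multiple leave the set.
print(on_shifted_line((3, 4)), on_shifted_line((5, 6)))
print(on_shifted_line(add((3, 4), (5, 6))), on_shifted_line(scale(2, (3, 4))))
```

The null pair (0,0) also fails the second test, which is another way to see that {(x,x+1)} is not a vector subspace.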

Linear combination: a₁v₁+a₂v₂+...+a_nv_n

3(2,-2)+(-4)(0,1)+7(1,7) is a linear combination in ℝ²; 3(2,-2)+(-4)(0,1)+7(1,7) = (6,-6)+(0,-4)+(7,49) = (13,39)

food A: 30% fats, 20% carbohydrates, 40% protein; food B: 10% fats, 30% carbohydrates, 5% protein; food C: 12% fats, 2% carbohydrates, 9% protein; 100 grams of food A (30,20,40); 100 grams of food B (10,30,5); 100 grams of food C (12,2,9); 40 grams of A with 50 grams of B with 40 grams of C = (40/100)(30,20,40)+(50/100)(10,30,5)+(40/100)(12,2,9) = (12,8,16)+(5,15,2.5)+(4.8,0.8,3.6) = (21.8,23.8,22.1); it is a linear combination of 3 elements of vector space of triples
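
As a minimal sketch, the two linear combinations computed above can be reproduced in Python. The function name `linear_combination` is an assumption made for illustration; exact fractions are used for the food example so that the decimal results are not affected by rounding.

```python
from fractions import Fraction

def linear_combination(coefficients, vectors):
    """Return a1*v1 + ... + an*vn for vectors given as tuples of equal length."""
    if len(coefficients) != len(vectors):
        raise ValueError("one coefficient per vector is required")
    size = len(vectors[0])
    result = [0] * size
    for a, v in zip(coefficients, vectors):
        if len(v) != size:
            raise ValueError("all vectors must have the same number of components")
        for i in range(size):
            result[i] += a * v[i]
    return tuple(result)

# 3(2,-2) + (-4)(0,1) + 7(1,7)
print(linear_combination([3, -4, 7], [(2, -2), (0, 1), (1, 7)]))  # (13, 39)

# 40 g of A, 50 g of B, 40 g of C, with each food described per 100 g
weights = [Fraction(40, 100), Fraction(50, 100), Fraction(40, 100)]
foods = [(30, 20, 40), (10, 30, 5), (12, 2, 9)]
meal = linear_combination(weights, foods)
print([float(x) for x in meal])  # [21.8, 23.8, 22.1]
```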

Calculate the linear combination of these 3 vectors of ℝ³: 2(1,1,1)+1(2,2,2)-(4,4,4) = (0,0,0); these are linearly dependent vectors because it is possible to obtain a null linear combination using non-zero coefficients

a(1,0)+b(1,1) = (0,0); a+b = 0, b = 0, a = 0; these are linearly independent vectors because it is not possible to obtain a null linear combination using non-null coefficients

Linearly independent vectors: a₁v₁+a₂v₂+...+a_nv_n = 0 ⇒ a₁ = a₂ = ... = a_n = 0

If it is possible to linearly combine vectors with at least one non-null coefficient and get the null vector, then the vectors are linearly dependent

v₁, ..., v_m are linearly independent vectors if and only if: v₁ ≠ 0_v, v₂ is not a multiple of v₁, v₃ is not a linear combination of v₁ and v₂, ..., v_m is not a linear combination of v₁, ..., v_(m-1)

Vectors are linearly independent if the first one is not null and none of the others can be obtained as a linear combination of the preceding vectors

The fundamental versors of ℝ³ e₁ = (1,0,0), e₂ = (0,1,0), e₃ = (0,0,1) are linearly independent in ℝ³; (1,0,0) ≠ (0,0,0), (0,1,0) ≠ a(1,0,0), (0,0,1) ≠ a(1,0,0)+b(0,1,0) because it is impossible that 0 = a and 0 = b and 1 = 0

The fundamental versors of ℝⁿ e₁ = (1,0,0,...,0), e₂ = (0,1,0,...,0), e_n = (0,0,0,...,1), are linearly independent in ℝⁿ

In a linear combination of linearly independent vectors the coefficients are unique; if v₁,...,v_n are linearly independent and v = a₁v₁+...+a_nv_n = b₁v₁+...+b_nv_n, then a₁ = b₁,...,a_n = b_n

A vector can be written as a linear combination of given linearly independent vectors in only one way, because the coefficients are unique

(1,0,0),(0,1,0),(0,0,1); (1,2,1) = a(1,0,0)+b(0,1,0)+c(0,0,1), a = 1, b = 2, c = 1

If the vectors are linearly dependent their linear combination can be written in different ways because the coefficients are not unique

The coefficients of a linear combination are also called components of the vector

19 - GENERATORS, BASES AND DIMENSION OF A VECTOR SPACE

{v₁,v₂,...,v_n} generate V if each v in V can be written as v = a₁v₁+a₂v₂+...+a_nv_n for some numbers a₁,a₂,...,a_n

The fundamental versors e₁ = (1,0,0,...,0), e₂ = (0,1,0,...,0), e_n = (0,0,0,...,1), generate ℝⁿ; fundamental versors are generators of ℝⁿ that is each element of ℝⁿ can be written as a linear combination of fundamental versors

n = 3, (1,0,0),(0,1,0),(0,0,1); ℝ³, (a,b,c) = a(1,0,0)+b(0,1,0)+c(0,0,1); fundamental versors can generate any vector

(1,1),(1,0) are not fundamental versors of ℝ² but are generators of ℝ² that is with their linear combinations a(1,1)+b(1,0) it is possible to obtain any vector of ℝ²; a(1,1)+b(1,0) = (α,β), a+b = α, a = β, b = α-a = α-β; a(1,1)+b(1,0) = (3,5), a+b = 3, a = 5 = β, b = 3-a = 3-5 = -2 = α-β; this is a system of generators for the vector space ℝ²
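
The solution a = β, b = α-β found above can be checked mechanically. The following is a small sketch with hypothetical helper names (`coefficients_for`, `combine`); it only covers the generators (1,1) and (1,0) of this example.

```python
def coefficients_for(alpha, beta):
    """Solve a(1,1) + b(1,0) = (alpha, beta), which gives a = beta and b = alpha - beta."""
    a = beta
    b = alpha - beta
    return a, b

def combine(a, b):
    """Compute the linear combination a(1,1) + b(1,0)."""
    return (a * 1 + b * 1, a * 1 + b * 0)

a, b = coefficients_for(3, 5)
print(a, b)           # 5 -2
print(combine(a, b))  # (3, 5)
```

Since `combine(*coefficients_for(alpha, beta))` returns (alpha, beta) for every pair, each vector of ℝ² is reached, which is exactly what it means for (1,1) and (1,0) to be generators.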

v₁ = (1,1), v₂ = (1,0), v₃ = (0,1), these 3 vectors are generators for ℝ²; a(1,1)+b(1,0)+c(0,1) = (α,β), a+b = α, a+c = β, there are infinitely many solutions; these generators are not a base of the vector space

The bases of vector spaces are linearly independent generators; {v₁,v₂,...,v_n} is a base of V ⇔ v₁,v₂,...,v_n are linearly independent generators of V

When a vector is a linear combination of linearly independent vectors, the coefficients are unique; v = a₁v₁+...+a_nv_n, the coefficients a₁,...,a_n are unique because it is a linear combination of linearly independent vectors

A base is a set of generators such that each vector v of the vector space is written as a linear combination of these vectors in a unique way

The fundamental versors of ℝⁿ are not only generators but are also a base of ℝⁿ because they are linearly independent; each element of ℝⁿ can be written uniquely as a linear combination of these vectors; e₁ = (1,0,...,0), e₂ = (0,1,...,0), e_n = (0,0,...,1), versors of ℝⁿ are a base of ℝⁿ

In vector spaces there are other bases besides the fundamental versors

The fundamental versors of ℝ² are (1,0) and (0,1) and are generators and bases of ℝ²; vectors (1,1) and (1,0) are generating vectors of ℝ² because they can generate any other vector of ℝ², and they form a base because they are linearly independent since (1,1) is not zero and (1,0) is not a multiple of (1,1); any linear combination of the vectors (1,1) and (1,0) has unique coefficients

v₁ = (1,1), v₂ = (1,0), v₃ = (0,1), are generators of ℝ², but they are not a base of the vector space because they are not linearly independent; if they were linearly independent, each vector of the vector space could be written in a unique way as a linear combination of these vectors, while for example (1,1) = 1v₁ and also (1,1) = 1v₂+1v₃

Vectors are generators of a vector space when they can generate all the vectors of the vector space; the generating vectors can form a base when the linear combination of these vectors has unique coefficients

In a vector v = a₁v₁+a₂v₂+...+a_nv_n the coefficients, a₁, a₂, a_n written in order, are the components of v with respect to the base v₁, v₂, v_n

A base must be in order; the vectors of a base have a precise order; changing the order of the vectors in the linear combination changes the base because the components change

Steinitz theorem: if x₁,x₂,...,x_n are generators, and y₁,y₂,...,y_m are linearly independent, then m ≤ n; the number m of linearly independent vectors can never exceed the number n of generators

Corollary of Steinitz's theorem: all the bases of a vector space have the same number of elements; base₁ = v₁,...,v_m, base₂ = w₁,...,w_r, each base is a set of linearly independent generators, so applying Steinitz theorem twice gives m ≤ r and r ≤ m, therefore m = r that is all the bases of a vector space have the same number of elements

Dimension of a vector space: if v₁,v₂,...,v_n is a base of V, all bases of V have n elements, then n = dim(V); in a vector space all the bases have the same number of vectors, and this number is the dimension of the vector space; the dimension of a vector space is the number of elements of any base of the vector space

dim(ℝ²) = 2, all bases of ℝ² are of 2 elements

dim(ℝ³) = 3, all bases of ℝ³ are of 3 elements

dim(ℝ⁴) = 4, all bases of ℝ⁴ are of 4 elements

dim(ℝⁿ) = n, all bases of ℝⁿ are of n elements

3 vectors cannot form a base of ℝ², because a base of ℝ² is of 2 elements

Consequences of Steinitz's theorem: dim(V) = n ⇒ n linearly independent vectors form a base and there is no need to verify that they are generators, dim(V) = n ⇒ n generators form a base and there is no need to verify that they are linearly independent

Dimension of subspaces: if W is a subspace of V then dim(W) ≤ dim(V); if dim(V) = n, then 0 ≤ dim(W) ≤ n; if dim(W) = 0, then W = {0_v}; if dim(W) = n, then W = V

In an environment V there is a subspace W = L(v₁,...,v_m); the subspace W has the vectors v₁,...,v_m as generators; vectors of W are linear combinations of vectors v₁,...,v_m; it is important to find a method for obtaining a base for this vector subspace; a system of generators is not a base when the vectors are not linearly independent because they are too many, and then the removal method is used

Removal method: to find the base, from a generator system v₁,v₂,...,v_m, the null vectors and any vector that is a linear combination of the previous ones are removed

v₁ = (1,1), v₂ = (1,0), v₃ = (0,1), are generators of ℝ², ℝ² = L(v₁,v₂,v₃); a base is (v₁,v₂), but also (v₃,v₁) is a base

Completion method of linearly independent vectors: if w₁,w₂,...,w_r is a system of linearly independent vectors, it is possible to add a system of generators of V v₁,v₂,...,v_m obtaining the system w₁,w₂,...,w_r,v₁,v₂,...,v_m, and from this it is possible to obtain a base with the method of removal

Find a base from the generator system of R³ (1,1,1),(1,0,0),(0,1,0),(0,0,1); (1,1,1) is not null, so it is part of the base; (1,0,0) is not a multiple of (1,1,1), so it is part of the base; (0,1,0) = a(1,1,1)+b(1,0,0), 0 = a+b, 1 = a, 0 = a, which is impossible, so the vector (0,1,0) is not a linear combination of the previous ones and it is part of the base; by Steinitz's theorem the number of elements of a base is equal to the dimension of the vector space, which is 3, therefore a base of R³ is formed by the 3 vectors (1,1,1),(1,0,0),(0,1,0), and the vector (0,0,1) is removed
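
The removal method can be sketched in Python as follows. This is only one possible implementation, under the assumption that "is a linear combination of the vectors already kept" is tested by counting independent rows with Gaussian elimination on exact fractions; the names `independent_count` and `removal_method` are hypothetical.

```python
from fractions import Fraction

def independent_count(rows):
    """Number of linearly independent rows, found by Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    found = 0  # number of pivots found so far
    for col in range(n_cols):
        # look for a row, at or below position `found`, with a non-zero entry in this column
        pivot = next((i for i in range(found, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[found], m[pivot] = m[pivot], m[found]
        for i in range(found + 1, len(m)):
            factor = m[i][col] / m[found][col]
            m[i] = [x - factor * y for x, y in zip(m[i], m[found])]
        found += 1
    return found

def removal_method(generators):
    """Keep a vector only if it is not a linear combination of the vectors already kept."""
    base = []
    for v in generators:
        if independent_count(base + [v]) > len(base):
            base.append(v)
    return base

print(removal_method([(1, 1, 1), (1, 0, 0), (0, 1, 0), (0, 0, 1)]))
# [(1, 1, 1), (1, 0, 0), (0, 1, 0)]
```

A null vector is removed automatically, because adding it never increases the count of independent rows.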

Base: linearly independent generators; a base of a vector space is a set of linearly independent generators

Dimension: number of elements of each base; the dimension of a vector space is the number of vectors of the base

20 - MATRICES - PART 1 - RANK AND REDUCTION

Example of matrix: [(2,1,0),(3,4,-2)]

Example of matrix: [(0,3,4,1),(0,1,1,1),(1,-1,0,0)]

Matrix with m rows and n columns: A = [(a_1,1,...,a_1,n),(...,...,...),(a_m,1,...,a_m,n)]

a_i,j is the element a of the matrix in row i and column j

Row space: L(R₁,...,R_m) = vector space in Rⁿ generated by R₁ = (a_1,1,...,a_1,n), ..., R_m = (a_m,1,...,a_m,n); each row is a tuple and an element of Rⁿ; the rows are vectors of Rⁿ and generate a vector subspace of Rⁿ; linear combinations of the row vectors of the matrix generate the space of rows

Considering the matrix [(1,1,1),(2,1,0)], the row space is generated by the 2 vectors (1,1,1) and (2,1,0), and it is a vector subspace of ℝ³

Column space: L(C₁,...,C_n) = vector space in R^m generated by C₁ = (a_1,1,...,a_m,1), ..., C_n = (a_1,n,...,a_m,n); each column is a tuple and an element of R^m; the columns are vectors of R^m and generate a vector subspace of R^m; linear combinations of the column vectors of the matrix generate the space of columns

Considering the matrix [(1,1,1),(2,1,0)], the column space is generated by the 3 vectors (1,2), (1,1), (1,0), and it is a vector subspace of ℝ²

The row space and the column space are completely different spaces, generated by different vectors, which can also be in different environments

A matrix is square if the number of rows is equal to the number of columns, so the row space and the column space are contained in the same environment

The matrix [(1,1,1),(2,2,2),(0,0,3)] is square; the space of rows is generated by the 3 vectors (1,1,1), (2,2,2), (0,0,3) and is a vector subspace of ℝ³; the space of columns is generated by the 3 vectors (1,2,0), (1,2,0), (1,2,3) and is also a vector subspace of ℝ³; only in the special case of a square matrix the row space and the column space are contained in the same environment

Rank of a matrix: dim(L(R₁,...,R_m)) = dim(L(C₁,...,C_n)) = rank of A = ρ(A)

Calculate the rank of the matrix A = [(1,2,1),(0,1,3)]; row space = L((1,2,1),(0,1,3)) ⊆ R³, the vector (1,2,1) is not null and the vector (0,1,3) is not a multiple of (1,2,1), so the dimension of the vector row space is 2 = ρ(A); column space = L((1,0),(2,1),(1,3)) ⊆ R², the vector (1,0) is not null and the vector (2,1) is not a multiple of (1,0), dimension of column space = 2 = ρ(A)

Considering the matrix A = [(1,2,3,-1),(4,3,2,1),(5,5,-1,7)], we need a technique for finding the dimension of the row space, the dimension of the column space, and the rank of a matrix, even when the matrix has many rows and many columns

Calculate the dimension of the row space, the dimension of the column space and the rank of the matrix A = [(1,2,1),(0,1,3),(0,5,0)]; the first row is not null, the second row is not a multiple of the first row; we need to understand if the third row is a linear combination of the first and the second row, R₃ = aR₁+bR₂, that is (0,5,0) = (a,2a+b,a+3b), which is impossible because a = 0 and b = 5 give a+3b = 15 ≠ 0; this matrix has 3 linearly independent rows, which are 3 linearly independent vectors that generate the row space, therefore they are a base of the row space, so ρ(A) = 3; this is an example of a matrix reduced by rows: the dimension of the row space is simply the number of non-zero rows, and from this we immediately obtain the rank of the matrix, which is equal to the dimension of the row space

A matrix is reduced by rows when each non-zero row has a non-zero element, called special element, with only zeros under it; if a matrix is reduced by rows, the rank of the matrix is the number of non-zero rows

[(1,0,0),(2,4,0),(4,3,-1)] this is a matrix reduced by columns, so the rank of the matrix is the number of non-zero columns

A matrix is reduced by columns when each non-zero column has a non-zero element with only zeros to its right; if a matrix is reduced by columns, the rank of the matrix is the number of non-zero columns

To find the rank of a non-reduced matrix it is possible to calculate the rank of a reduced matrix that has the same rank as the starting matrix

To reduce a matrix, it is possible to perform elementary transformations on the rows that do not change the row space and therefore the rank: multiply a row by a non-zero number, R_i → aR_i, a ≠ 0; exchange two rows, R_i ↔ R_j, i ≠ j; add to a row a multiple of another row, R_i → R_i+aR_j, i ≠ j; to calculate the rank of a matrix we perform these elementary transformations, obtaining a new matrix reduced by rows, and the rank of this new matrix is equal to the rank of the starting matrix

Calculate the rank of the matrix A = [(1,1,1),(2,1,1),(3,1,-1)]; A → A₁, ρ(A) = ρ(A₁); [(1,1,1),(2,1,1),(3,1,-1)], R₂ → R₂-R₁, [(1,1,1),(1,0,0),(3,1,-1)], R₃ → R₃+R₁, [(1,1,1),(1,0,0),(4,2,0)], R₃ → R₃-4R₂, [(1,1,1),(1,0,0),(0,2,0)], the rank of this matrix is 3, ρ(A) = 3; another method is [(1,1,1),(2,1,1),(3,1,-1)], R₂ → R₂-R₁, [(1,1,1),(1,0,0),(3,1,-1)], R₃ → R₃+R₁, [(1,1,1),(1,0,0),(4,2,0)], R₂ ↔ R₃, [(1,1,1),(4,2,0),(1,0,0)], the rank of this matrix is 3, ρ(A) = 3
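
The steps of this reduction can be replayed in Python. The sketch below assumes that "reduced by rows" means that each non-zero row has a non-zero element with only zeros under it; the function names are illustrative, and rows are counted from 0, so R₂ is row index 1.

```python
def add_multiple(matrix, i, j, a):
    """Return a copy of the matrix after R_i -> R_i + a*R_j."""
    result = [row[:] for row in matrix]
    result[i] = [x + a * y for x, y in zip(matrix[i], matrix[j])]
    return result

def swap_rows(matrix, i, j):
    """Return a copy of the matrix after R_i <-> R_j."""
    result = [row[:] for row in matrix]
    result[i], result[j] = result[j], result[i]
    return result

def is_reduced_by_rows(matrix):
    """True if each non-zero row has a non-zero element with only zeros under it."""
    for i, row in enumerate(matrix):
        if all(x == 0 for x in row):
            continue
        has_special = any(
            row[j] != 0 and all(matrix[k][j] == 0 for k in range(i + 1, len(matrix)))
            for j in range(len(row))
        )
        if not has_special:
            return False
    return True

def rank_of_reduced(matrix):
    """Number of non-zero rows; it is the rank only for a matrix reduced by rows."""
    if not is_reduced_by_rows(matrix):
        raise ValueError("the matrix is not reduced by rows")
    return sum(1 for row in matrix if any(x != 0 for x in row))

A = [[1, 1, 1], [2, 1, 1], [3, 1, -1]]
A1 = add_multiple(A, 1, 0, -1)    # R2 -> R2 - R1
A2 = add_multiple(A1, 2, 0, 1)    # R3 -> R3 + R1
A3 = add_multiple(A2, 2, 1, -4)   # R3 -> R3 - 4*R2
print(A3)                         # [[1, 1, 1], [1, 0, 0], [0, 2, 0]]
print(rank_of_reduced(A3))        # 3
print(rank_of_reduced(swap_rows(A2, 1, 2)))  # 3, the second method
```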

The rank of a matrix is obtained by reducing the matrix by rows or by columns, and this allows us to calculate the dimension of the vector space generated by given vectors

The vectors that generate a vector space can be written as rows of a matrix; calculating the rank of the matrix gives the dimension of the row space, and the row space is the vector space that has these vectors as generators

v₁ = (a_1,1,...,a_1,n), ..., v_m = (a_m,1,...,a_m,n) ⇒ A = [(a_1,1,...,a_1,n),(...,...,...),(a_m,1,...,a_m,n)]

In the vector space R⁴, calculate the dimension of the subspace W generated by the vectors (1,1,2,1),(2,1,0,3),(4,4,1,0); these 3 vectors become the rows of a matrix with 3 rows and 4 columns [(1,1,2,1),(2,1,0,3),(4,4,1,0)]; reducing this matrix by rows we obtain the dimension of the row space that is the dimension of the vector subspace; [(1,1,2,1),(2,1,0,3),(4,4,1,0)], R₃ → 2R₃-R₁, [(1,1,2,1),(2,1,0,3),(7,7,0,-1)], R₃ → R₃-7R₂, [(1,1,2,1),(2,1,0,3),(-7,0,0,-22)], ρ(A) = 3, dim(W) = 3; the 3 rows are 3 linearly independent vectors, and are a base of the row space

21 - MATRICES - PART 2 - OPERATIONS

It is possible to add two matrices when they have the same number of rows and the same number of columns, and the procedure is to add the elements of the two matrices that have the same indices: [(a_1,1,...,a_1,n),(...,...,...),(a_m,1,...,a_m,n)]+[(b_1,1,...,b_1,n),(...,...,...),(b_m,1,...,b_m,n)] = [(a_1,1+b_1,1,...,a_1,n+b_1,n),(...,...,...),(a_m,1+b_m,1,...,a_m,n+b_m,n)]; the result of the addition is a matrix with the same number of rows and columns as the initial matrices; it is not possible to add two matrices that have different number of rows and columns

A = [(2,1,0),(3,0,2)], B = [(1,4,1),(2,1,-1)]; A+B = [(3,5,1),(5,1,1)]

It is possible to multiply a matrix by a real number: r⋅A = r[(a_1,1,...,a_1,n),(...,...,...),(a_m,1,...,a_m,n)] = [(r⋅a_1,1,...,r⋅a_1,n),(...,...,...),(r⋅a_m,1,...,r⋅a_m,n)]; the result of the product is a matrix with the same number of rows and columns as the initial matrix; it is possible to multiply any number by any matrix

a = -2, A = [(2,1,0),(3,0,2)]; -2A = [(-4,-2,0),(-6,0,-4)]

a = 0, A = [(2,1,0),(3,0,2)]; 0A = [(0,0,0),(0,0,0)]; multiplying a matrix by 0, the result is a null matrix
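
The sum of matrices and the product of a matrix by a number can be sketched in Python with matrices stored as lists of rows. The names `add_matrices` and `scale_matrix` are assumptions made for this illustration.

```python
def add_matrices(a, b):
    """Add two matrices with the same number of rows and columns, element by element."""
    if len(a) != len(b) or any(len(r) != len(s) for r, s in zip(a, b)):
        raise ValueError("the matrices must have the same number of rows and columns")
    return [[x + y for x, y in zip(r, s)] for r, s in zip(a, b)]

def scale_matrix(r, a):
    """Multiply every element of the matrix a by the number r."""
    return [[r * x for x in row] for row in a]

A = [[2, 1, 0], [3, 0, 2]]
B = [[1, 4, 1], [2, 1, -1]]
print(add_matrices(A, B))    # [[3, 5, 1], [5, 1, 1]]
print(scale_matrix(-2, A))   # [[-4, -2, 0], [-6, 0, -4]]
print(scale_matrix(0, A))    # [[0, 0, 0], [0, 0, 0]]
```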

The sum of matrices is commutative: A+B = B+A

The sum of matrices is associative: A+(B+C) = (A+B)+C

The null matrix exists: A+0 = 0+A = A

The opposite matrix exists: A+(-A) = A-A = 0

Null matrix = 0 = [(0,...,0),(...,...,...),(0,...,0)]

A+(-A) = [(0,...,0),(...,...,...),(0,...,0)]

A = [(1,2,0),(-1,4,2)], -A = [(-1,-2,0),(1,-4,-2)]; A-A = [(0,0,0),(0,0,0)]

Distributive property of the product of a number for a sum of matrices: a(A+B) = aA+aB

Distributive property of the sum of two numbers for a matrix: (a+b)A = aA+bA

A matrix multiplied by 1 remains unchanged, the number 1 is the neutral element of the product: 1A = A

Distributive property of a product of two numbers for a matrix: (ab)A = a(bA) = b(aA)

The set of matrices with m rows and n columns is indicated as ℝ^m,n; on this set the operations sum of matrices and product of a matrix by a real number are defined, and with these operations it is a vector space; ℝ^2,3 is the vector space of matrices with 2 rows and 3 columns; ℝ^4,2 is the vector space of matrices with 4 rows and 2 columns; ℝ^n,n is the vector space of square matrices with n rows and n columns; the matrix is an extension of the concept of tuple

Product of matrices: A = [(a_1,1,a_1,2,a_1,3),(a_2,1,a_2,2,a_2,3),(a_3,1,a_3,2,a_3,3)], B = [(b_1,1,b_1,2,b_1,3),(b_2,1,b_2,2,b_2,3),(b_3,1,b_3,2,b_3,3)]; calculating the element c_1,3 of the matrix C = AB, c_1,3 = a_1,1⋅b_1,3+a_1,2⋅b_2,3+a_1,3⋅b_3,3

The product of a matrix A by a matrix B can be made if the number of elements of the rows of A is equal to the number of elements of the columns of B; the number of elements in a row equals the number of columns, and the number of elements in a column equals the number of rows; to make the product A⋅B, the number of columns of the matrix A must be equal to the number of rows of the matrix B; A^m,p⋅B^p,n

A = [(2,1,0),(3,3,0)], B = [(4,4,2),(2,2,4)]; the product A⋅B cannot be done because A has 3 columns and B has 2 rows

A = [(2,1),(3,3)], B = [(4,4,2),(2,2,4)]; C = A⋅B, c_1,1 = 2⋅4+1⋅2 = 8+2 = 10, c_1,2 = 2⋅4+1⋅2 = 8+2 = 10, c_1,3 = 2⋅2+1⋅4 = 4+4 = 8, c_2,1 = 3⋅4+3⋅2 = 12+6 = 18, c_2,2 = 3⋅4+3⋅2 = 12+6 = 18, c_2,3 = 3⋅2+3⋅4 = 6+12 = 18; C = A⋅B = [(2,1),(3,3)][(4,4,2),(2,2,4)] = [(10,10,8),(18,18,18)]; the number of rows of C is 2 as the number of rows of A, the number of columns of C is 3 as the number of columns of B

The matrix resulting from the product of matrix A by matrix B has a number of rows equal to the number of rows in matrix A, and a number of columns equal to the number of columns in matrix B; A^m,p⋅B^p,n = C^m,n

A = [(1,1,1),(3,3,0),(0,-1,4)], B = [(-1),(2),(1)]; C = A⋅B, c_1,1 = 1⋅(-1)+1⋅2+1⋅1 = -1+2+1 = 2, c_2,1 = 3⋅(-1)+3⋅2+0⋅1 = -3+6+0 = 3, c_3,1 = 0⋅(-1)+(-1)⋅2+4⋅1 = 0+(-2)+4 = 2; C = A⋅B = [(1,1,1),(3,3,0),(0,-1,4)][(-1),(2),(1)] = [(2),(3),(2)]; the matrix C, resulting from the product of A⋅B, has 3 rows as the matrix A and 1 column as the matrix B

Matrix A can be multiplied by matrix B when the number of columns of A equals the number of rows of B, and the resulting matrix C has the number of rows of A and the number of columns of B
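
The row-by-column rule and the condition on the sizes can be sketched in Python. The function name `multiply` is an assumption; matrices are lists of rows, and the column matrix with entries -1, 2, 1 used in the computation above is written as three rows of one element each.

```python
def multiply(a, b):
    """Product of A (m rows, p columns) by B (p rows, n columns), as lists of rows."""
    p = len(b)
    if any(len(row) != p for row in a):
        raise ValueError("the number of columns of A must equal the number of rows of B")
    n = len(b[0])
    # c_i,j is the sum of the products of the elements of row i of A and column j of B
    return [[sum(row[k] * b[k][j] for k in range(p)) for j in range(n)] for row in a]

print(multiply([[2, 1], [3, 3]], [[4, 4, 2], [2, 2, 4]]))
# [[10, 10, 8], [18, 18, 18]]
print(multiply([[1, 1, 1], [3, 3, 0], [0, -1, 4]], [[-1], [2], [1]]))
# [[2], [3], [2]]
```

Calling `multiply([[2, 1, 0], [3, 3, 0]], [[4, 4, 2], [2, 2, 4]])` raises `ValueError`, because the first matrix has 3 columns and the second has 2 rows.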

In an identity matrix, the elements of the diagonal are equal to 1 and all the others are equal to 0; a_1,1 = 1, a_2,2 = 1, a_3,3 = 1, a_p,p = 1; I = [(1,0,0),(0,1,0),(0,0,1)]; I = [(1,0,0,0),(0,1,0,0),(0,0,1,0),(0,0,0,1)]

Multiplying a matrix A by the identity matrix I, the result is the same matrix A; A⋅I = A; A = [(1,2,3),(4,4,2)], I = [(1,0,0),(0,1,0),(0,0,1)], A⋅I = [(1,2,3),(4,4,2)][(1,0,0),(0,1,0),(0,0,1)], c_1,1 = 1⋅1+2⋅0+3⋅0 = 1+0+0 = 1, c_1,2 = 1⋅0+2⋅1+3⋅0 = 0+2+0 = 2, c_1,3 = 1⋅0+2⋅0+3⋅1 = 0+0+3 = 3, c_2,1 = 4⋅1+4⋅0+2⋅0 = 4+0+0 = 4, c_2,2 = 4⋅0+4⋅1+2⋅0 = 0+4+0 = 4, c_2,3 = 4⋅0+4⋅0+2⋅1 = 0+0+2 = 2, A⋅I = [(1,2,3),(4,4,2)][(1,0,0),(0,1,0),(0,0,1)] = [(1,2,3),(4,4,2)] = A

food A = 30% fats, 10% carbohydrates, 10% proteins; food B = 20% fats, 20% carbohydrates, 5% proteins; food C = 15% fats, 15% carbohydrates, 10% proteins; 100 grams of food A = (30,10,10); 100 grams of food B = (20,20,5); 100 grams of food C = (15,15,10); arranging each food in a column, M = [(30,20,15),(10,20,15),(10,5,10)]; calculate fats, carbohydrates, proteins contained in a meal consisting of 120 grams of A, 50 grams of B, 150 grams of C; the quantities, expressed in units of 100 grams, form the column [(1.2),(0.5),(1.5)]; [(30,20,15),(10,20,15),(10,5,10)][(1.2),(0.5),(1.5)], c_1,1 = 30⋅1.2+20⋅0.5+15⋅1.5 = 36+10+22.5 = 68.5, c_2,1 = 10⋅1.2+20⋅0.5+15⋅1.5 = 12+10+22.5 = 44.5, c_3,1 = 10⋅1.2+5⋅0.5+10⋅1.5 = 12+2.5+15 = 29.5, [(30,20,15),(10,20,15),(10,5,10)][(1.2),(0.5),(1.5)] = [(68.5),(44.5),(29.5)], the meal contains 68.5 grams of fats, 44.5 grams of carbohydrates, 29.5 grams of proteins

Associative property of the product of matrices: A(BC) = (AB)C

Distributive property of the product with respect to the sum of matrices: A(B+C) = AB+AC

Properties of the identity matrix: AI = IA = A

Multiplying any matrix by the null matrix yields a null matrix; A = [(1,2),(3,4)], B = [(0,0),(0,0)], A⋅B = [(1,2),(3,4)][(0,0),(0,0)], c_1,1 = 1⋅0+2⋅0 = 0+0 = 0, c_1,2 = 1⋅0+2⋅0 = 0+0 = 0, c_2,1 = 3⋅0+4⋅0 = 0+0 = 0, c_2,2 = 3⋅0+4⋅0 = 0+0 = 0, A⋅B = [(1,2),(3,4)][(0,0),(0,0)] = C = [(0,0),(0,0)]

The product of two real numbers is zero only if at least one of the two numbers is zero, while the product of two matrices can be zero even if both matrices are different from zero; A ≠ 0, B ≠ 0, it is possible that AB = 0; with matrices the product cancellation law of real numbers does not apply; obviously the product of any matrix by a null matrix is a null matrix

Two square matrices can be multiplied with each other as long as they are squares of the same order that is they must have the same number of rows and columns

A = [(0,-1),(0,-1)], B = [(0,-1),(0,0)]; AB = [(0,-1),(0,-1)][(0,-1),(0,0)], c_1,1 = 0⋅0+(-1)⋅0 = 0+0 = 0, c_1,2 = 0⋅(-1)+(-1)⋅0 = 0+0 = 0, c_2,1 = 0⋅0+(-1)⋅0 = 0+0 = 0, c_2,2 = 0⋅(-1)+(-1)⋅0 = 0+0 = 0, AB = [(0,-1),(0,-1)][(0,-1),(0,0)] = C = [(0,0),(0,0)]; this example is the product of two square matrices and neither is a null matrix, but their product is a null matrix
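
This example can be verified with a short Python sketch limited to matrices with 2 rows and 2 columns; the helper name `multiply_2x2` is hypothetical. The same two matrices multiplied in the opposite order give a non-null result, which also illustrates that the product of matrices is not commutative.

```python
def multiply_2x2(a, b):
    """Row-by-column product of two matrices with 2 rows and 2 columns."""
    return [
        [a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]],
        [a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]],
    ]

A = [[0, -1], [0, -1]]
B = [[0, -1], [0, 0]]
print(multiply_2x2(A, B))  # [[0, 0], [0, 0]]: neither factor is null, the product is
print(multiply_2x2(B, A))  # [[0, 1], [0, 0]]: in the opposite order the product is not null
```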

To reduce a matrix containing a parameter you have to pay attention to the value of the parameter; in this example of row reduction of a matrix with a parameter, it is necessary to avoid choosing the element h as a special element, because it could be zero; A = [(1,1,h),(2,1,3)], R₂ → R₂-R₁, [(1,1,h),(1,0,3-h)] = B, matrix B is reduced by rows and has the same rank as matrix A; for any value of h the two rows of B are not zero, so ρ(A) = ρ(B) = 2

In this example, row reduction of a matrix containing a parameter requires more attention: A = [(1,1,h),(1,1,3)], R₂ → R₂-R₁, [(1,1,h),(0,0,3-h)] = B, the rank changes according to the value of the parameter h: ρ(A) = ρ(B) = 2 if h ≠ 3, and ρ(A) = ρ(B) = 1 if h = 3, because in that case the second row of B is null

Two matrices with the same number of rows and columns can be added; a matrix A^m,n can be multiplied by a number; two matrices can be multiplied with each other when the number of columns of the first matrix is equal to the number of rows of the second matrix, A^m,n⋅B^n,p = C^m,p

22 - MATRICES - PART 3 - INVERSE MATRIX AND TRANSPOSE MATRIX

Inverse matrix: if the matrix A is invertible, A^-1 is the inverse matrix of A, hence A⋅A^-1 = A^-1⋅A = I, where I is the identity matrix

A = [(2,1),(3,-1)], X = [(a,b),(c,d)], A⋅X = I = [(1,0),(0,1)]; A⋅X = [(2,1),(3,-1)][(a,b),(c,d)] = [(2a+c,2b+d),(3a-c,3b-d)], so 2a+c = 1, 2b+d = 0, 3a-c = 0, 3b-d = 1; adding 2a+c = 1 and 3a-c = 0 the result is 5a = 1 that is a = 1/5 and c = 3/5; adding 2b+d = 0 and 3b-d = 1 the result is 5b = 1 that is b = 1/5 and d = -2/5; X = A^-1 = [(1/5,1/5),(3/5,-2/5)]; we have proved that the matrix A is invertible, the inverse matrix exists and we have computed it; as a check, A⋅A^-1 = [(2,1),(3,-1)][(1/5,1/5),(3/5,-2/5)], i_1,1 = 2⋅1/5+1⋅3/5 = 2/5+3/5 = 5/5 = 1, i_1,2 = 2⋅1/5+1⋅(-2/5) = 2/5-2/5 = 0, i_2,1 = 3⋅1/5+(-1)⋅3/5 = 3/5-3/5 = 0, i_2,2 = 3⋅1/5+(-1)⋅(-2/5) = 3/5+2/5 = 5/5 = 1, A⋅A^-1 = I = [(1,0),(0,1)]

Calculate the inverse matrix of the matrix A = [(2,1,-1),(3,0,4),(0,0,2)]; A⋅X = I, X = [(a,b,c),(d,e,f),(g,h,i)], A⋅X = I = [(1,0,0),(0,1,0),(0,0,1)], A⋅X = [(2,1,-1),(3,0,4),(0,0,2)][(a,b,c),(d,e,f),(g,h,i)], x_1,1 = 2a+d-g, x_1,2 = 2b+e-h, x_1,3 = 2c+f-i, x_2,1 = 3a+4g, x_2,2 = 3b+4h, x_2,3 = 3c+4i, x_3,1 = 2g, x_3,2 = 2h, x_3,3 = 2i, I = [(2a+d-g,2b+e-h,2c+f-i),(3a+4g,3b+4h,3c+4i),(2g,2h,2i)]; 2a+d-g = 1, 2b+e-h = 0, 2c+f-i = 0, 3a+4g = 0, 3b+4h = 1, 3c+4i = 0, 2g = 0, 2h = 0, 2i = 1, 9 first degree equations with 9 unknowns; 2g = 0, g = 0; 2h = 0, h = 0; 2i = 1, i = 1/2; 3a+4g = 0, 3a = 0, a = 0; 3b+4h = 1, 3b = 1, b = 1/3; 3c+4i = 0, 3c+4(1/2) = 0, 3c+4/2 = 0, 3c = -4/2, 3c = -2, c = -2/3; 2a+d-g = 1, d = 1; 2b+e-h = 0, 2(1/3)+e = 0, (2/3)+e = 0, e = -2/3; 2c+f-i = 0, 2(-2/3)+f-1/2 = 0, (-4/3)+f-(1/2) = 0, f = (4/3)+(1/2) = (8/6)+(3/6) = 11/6; a = 0, b = 1/3, c = -2/3, d = 1, e = -2/3, f = 11/6, g = 0, h = 0, i = 1/2; X = A^-1 = [(0,1/3,-2/3),(1,-2/3,11/6),(0,0,1/2)]; A⋅A^-1 = I, A⋅A^-1 = [(2,1,-1),(3,0,4),(0,0,2)][(0,1/3,-2/3),(1,-2/3,11/6),(0,0,1/2)], i_1,1 = 2⋅0+1⋅1+(-1)⋅0 = 0+1+0 = 1, i_1,2 = 2(1/3)+1(-2/3)+(-1)0 = (2/3)-(2/3) = 0, i_1,3 = 2(-2/3)+1(11/6)+(-1)(1/2) = (-4/3)+(11/6)-(1/2) = (-8/6)+(11/6)-(3/6) = (11/6)-(11/6) = 0, i_2,1 = 3⋅0+0⋅1+4⋅0 = 0+0+0 = 0, i_2,2 = 3(1/3)+0(-2/3)+4⋅0 = 3/3+0+0 = 1, i_2,3 = 3(-2/3)+0(11/6)+4(1/2) = -2+0+2 = 0, i_3,1 = 0⋅0+0⋅1+2⋅0 = 0+0+0 = 0, i_3,2 = 0(1/3)+0(-2/3)+2⋅0 = 0+0+0 = 0, i_3,3 = 0(-2/3)+0(11/6)+2(1/2) = 0+0+1 = 1; I = [(1,0,0),(0,1,0),(0,0,1)]

Example of a non-invertible matrix: A = [(2,3),(0,0)]; A⋅X = I = [(1,0),(0,1)], X = [(a,b),(c,d)], A⋅X = [(2,3),(0,0)][(a,b),(c,d)] = I, i_1,1 = 2a+3c, i_1,2 = 2b+3d, i_2,1 = 0a+0c = 0, i_2,2 = 0b+0d = 0; I = [(2a+3c,2b+3d),(0,0)] = [(1,0),(0,1)], 2a+3c = 1, 2b+3d = 0, 0 = 0, 0 = 1, it cannot be solved, the matrix A is not invertible
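
The three examples above can be checked with a Python sketch. Note that it does not follow the method of the notes, which writes the equations of A⋅X = I one unknown at a time: it swaps in Gauss-Jordan elimination on the matrix A with the identity matrix written beside it, using exact fractions. The function name `inverse` is an assumption.

```python
from fractions import Fraction

def inverse(matrix):
    """Return the inverse of a square matrix as rows of Fractions, or None if it does not exist."""
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        raise ValueError("only a square matrix can have an inverse")
    # write the identity matrix to the right of A
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next((i for i in range(col, n) if aug[i][col] != 0), None)
        if pivot is None:
            return None  # no usable element in this column: A is not invertible
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [x / pivot_value for x in aug[col]]
        for i in range(n):
            if i != col:
                factor = aug[i][col]
                aug[i] = [x - factor * y for x, y in zip(aug[i], aug[col])]
    # the left half is now the identity matrix, the right half is the inverse
    return [row[n:] for row in aug]

print(inverse([[2, 1], [3, -1]]))   # rows (1/5, 1/5) and (3/5, -2/5), as Fractions
print(inverse([[2, 3], [0, 0]]))    # None
```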

Transpose matrix: considering the matrix A with m rows and n columns, A = [(a_1,1,...,a_1,n),(...,...,...),(a_m,1,...,a_m,n)], to obtain the transpose matrix, the rows are exchanged with the columns, transpose of A = A^T = [(a_1,1,...,a_m,1),(...,...,...),(a_1,n,...,a_m,n)], the rows of matrix A become the columns of the transpose matrix, and the columns of matrix A become the rows of the transpose matrix; if matrix A has m rows and n columns, the transpose matrix has n rows and m columns; the exchange between rows and columns determines the change of the indexes of the elements

A = [(4,5,6),(7,8,9),(0,4,3)]; A^T = [(4,7,0),(5,8,4),(6,9,3)]

A = [(1,0,7,4),(3,3,2,2)]; A^T = [(1,3),(0,3),(7,2),(4,2)]

First property of the transpose matrix: (A+B)^T = A^T+B^T, the transpose of the sum of two matrices is equal to the sum of the transposes of the two matrices

Second property of the transpose matrix: (AB)^T = B^TA^T, the transpose of the product of two matrices is equal to the product of the transposes of the two matrices but with inverted order, because the product of matrices is not commutative, in fact, to make the product of matrix A by matrix B, the number of columns of A must be equal to the number of rows of B

Third property of the transpose matrix: (aA)^T = aA^T, the transpose of the product of a number by a matrix is equal to the product of the number by the transpose of the matrix

If a matrix is equal to its transposed matrix, A = A^T, then the matrix is symmetrical; a symmetrical matrix has its elements symmetrical with respect to its diagonal; [(1,4,5),(4,2,0),(5,0,3)] is a symmetrical matrix; obviously a symmetrical matrix must be square

Example of a symmetrical matrix: A = [(1,0,3),(0,2,4),(3,4,-1)]

A = [(1,0,2),(0,2,4),(3,4,-1)] is not a symmetrical matrix

If a matrix is equal to the opposite of its transpose matrix, A = -A^T, then the matrix is antisymmetrical; [(0,1,2),(-1,0,3),(-2,-3,0)] is an antisymmetrical matrix; in an antisymmetrical matrix, the diagonal is made up of zeros as 0 = -0, 0 is the only number that is equal to its opposite

Example of antisymmetrical matrix: [(0,1,5),(-1,0,2),(-5,-2,0)]
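
The transpose and the two tests A = A^T and A = -A^T can be sketched in Python with matrices stored as lists of rows. The function names are illustrative assumptions.

```python
def transpose(a):
    """Exchange rows and columns: the rows of the result are the columns of a."""
    return [list(column) for column in zip(*a)]

def is_symmetric(a):
    """True when the matrix is equal to its transpose."""
    return a == transpose(a)

def is_antisymmetric(a):
    """True when the matrix is equal to the opposite of its transpose."""
    return a == [[-x for x in row] for row in transpose(a)]

print(transpose([[1, 0, 7, 4], [3, 3, 2, 2]]))             # [[1, 3], [0, 3], [7, 2], [4, 2]]
print(is_symmetric([[1, 0, 3], [0, 2, 4], [3, 4, -1]]))     # True
print(is_symmetric([[1, 0, 2], [0, 2, 4], [3, 4, -1]]))     # False
print(is_antisymmetric([[0, 1, 5], [-1, 0, 2], [-5, -2, 0]]))  # True
```

A matrix that is not square is never reported as symmetrical or antisymmetrical, because its transpose has a different number of rows.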

An orthogonal matrix must be square and invertible and its transpose must be equal to its inverse, A^T = A^-1; in an orthogonal matrix, inverse matrix and transpose matrix coincide; orthogonal matrices are particular invertible square matrices, such that the inverse is the transpose matrix

Example of orthogonal matrix: A = [(cos(α),-sin(α)),(sin(α),cos(α))]; A^T = [(cos(α),sin(α)),(-sin(α),cos(α))]; A^T⋅A = [(cos(α),sin(α)),(-sin(α),cos(α))][(cos(α),-sin(α)),(sin(α),cos(α))], e_1,1 = cos(α)⋅cos(α)+sin(α)⋅sin(α) = (cos(α))²+(sin(α))² = 1, e_1,2 = cos(α)⋅(-sin(α))+sin(α)⋅cos(α) = -sin(α)⋅cos(α)+sin(α)⋅cos(α) = 0, e_2,1 = -sin(α)⋅cos(α)+cos(α)⋅sin(α) = -sin(α)⋅cos(α)+sin(α)⋅cos(α) = 0, e_2,2 = -sin(α)⋅(-sin(α))+cos(α)⋅cos(α) = (sin(α))²+(cos(α))² = 1, A^T⋅A = [(1,0),(0,1)], so the transpose matrix of A is also the inverse matrix of A, A^T = A^-1, so A is an orthogonal matrix
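
A numerical check of this example can be sketched in Python for a few values of α. Since sines and cosines are computed with floating point numbers, the comparison with the identity matrix uses a small tolerance; the function names and the tolerance value are assumptions of this sketch.

```python
import math

def rotation(alpha):
    """The matrix [(cos(α),-sin(α)),(sin(α),cos(α))] as a list of rows."""
    return [[math.cos(alpha), -math.sin(alpha)],
            [math.sin(alpha), math.cos(alpha)]]

def transpose(a):
    return [list(column) for column in zip(*a)]

def multiply(a, b):
    """Row-by-column product of matrices given as lists of rows."""
    return [[sum(row[k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for row in a]

def is_identity(m, tolerance=1e-12):
    """True when m equals the identity matrix up to the given tolerance."""
    return all(
        math.isclose(m[i][j], 1.0 if i == j else 0.0, abs_tol=tolerance)
        for i in range(len(m)) for j in range(len(m))
    )

for alpha in (0.0, 0.3, math.pi / 4, 2.0):
    A = rotation(alpha)
    print(is_identity(multiply(transpose(A), A)))  # True for each α
```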

Calculate the inverse of the matrix A = [(1,h),(2,0)]; X = [(a,b),(c,d)], AX = I = [(1,0),(0,1)]; [(1,h),(2,0)][(a,b),(c,d)] = [(1,0),(0,1)], a+hc = 1, b+hd = 0, 2a = 0, 2b = 1; a = 0; b = 1/2; a+hc = 1, 0+hc = 1, hc = 1, which has no solution for h = 0, so matrix A is not invertible for h = 0; if h ≠ 0 then c = 1/h and, from b+hd = 0, d = -1/(2h); for any h other than zero the matrix is invertible; X = A^-1 = [(0,1/2),(1/h,-1/(2h))]; if h = 0 then matrix A = [(1,0),(2,0)] and ρ(A) = 1

A matrix A is invertible when the inverse matrix A^-1 exists such that A⋅A^-1 = A^-1⋅A = I, where I denotes the identity matrix; the inverse matrix is only defined when matrix A is square

A matrix is symmetrical when it is equal to transpose matrix, A = A^T

A matrix is antisymmetrical when it is equal to the opposite of the transpose matrix, A = -A^T

The orthogonal matrix is a matrix whose inverse coincides with the transpose matrix

23 - THE CONCEPT OF LINEAR APPLICATION

Application is synonymous with function, but the term application is more used in this context

Function, or application, between the vector space ℝ² and the vector space ℝ: ℝ² → ℝ, f(x,y) = x+y; (x,y) → f → x+y; (x',y') → f → x'+y'; (x,y)+(x',y') = (x+x',y+y') → f → (x+x')+(y+y'); (x+y)+(x'+y') = (x+x')+(y+y'), the function f preserves the sum of pairs; a(x,y) = (ax,ay); a(x,y) → f → a(x+y); (ax,ay) → f → ax+ay; a(x+y) = ax+ay, the function f preserves the product by a number; a linear application between two vector spaces preserves the sum and the product by a number

Properties of linear applications: the function f is a linear application between the vector spaces V and W, f: V → W; f(v+v') = f(v)+f(v'), the sum is preserved; f(a⋅v) = a⋅f(v), the product by a number is preserved; f(0_V) = 0_W, the 0 of V is transformed into the 0 of W; f(-v) = -f(v), the opposite of v in the vector space V coincides with the opposite of f(v) in the vector space W

ℝ³ → f → ℝ², f(x,y,z) = (x,y+z); it is a linear application because it preserves the sum and the product; f(x,y,z) = (x,y+z), f(x',y',z') = (x',y'+z'), f(x+x',y+y',z+z') = (x+x',(y+y')+(z+z')); f(x,y,z) = (x,y+z), f(x',y',z') = (x',y'+z'), (x,y+z)+(x',y'+z') = (x+x',(y+z)+(y'+z')) = (x+x',(y+y')+(z+z')); the function of the sum of elements is equal to the sum of the functions of the individual elements, the sum is preserved; a(x,y,z) = (ax,ay,az), f(ax,ay,az) = (ax,ay+az) = a(x,y+z) = af(x,y,z), the product by a number is preserved; f(0,0,0) = (0,0), the 0 is transformed into the 0; f(-x,-y,-z) = (-x,-y-z) = -(x,y+z) = -f(x,y,z), in agreement with f(-v) = -f(v)

f(x,y) = x², f: ℝ² → ℝ; (x,y)+(x',y') = (x+x',y+y'), (1,2)+(3,1) = (1+3,2+1) = (4,3); f(1,2) = 1² = 1, f(3,1) = 3² = 9, f(4,3) = 4² = 16, 1+9 ≠ 16, the sum is not preserved; f(2,1) = 4, 5⋅4 = 20, 5(2,1) = (10,5), f(10,5) = 100, 20 ≠ 100, the product by a number is not preserved; it is not a linear application

f(x,y) = x+1, f: ℝ² → ℝ; it is not a linear application, the sum is not preserved, the product by a number is not preserved, f(0,0) = 1 ≠ 0
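
The four examples above can be tested on a small grid of integer vectors with the Python sketch below. It is only a sampling test: a failure proves that the function is not a linear application, while passing all the samples is evidence and not a proof. The name `looks_linear`, the sample ranges and the convention that every function takes and returns a tuple (with one component for values in ℝ) are assumptions of this sketch.

```python
from itertools import product

def looks_linear(f, dim, samples=range(-2, 3), scalars=(-3, 0, 2, 5)):
    """Check f(v+w) = f(v)+f(w) and f(a*v) = a*f(v) on sample vectors of ℝ^dim."""
    vectors = list(product(samples, repeat=dim))
    for v in vectors:
        # the product by a number must be preserved
        for a in scalars:
            scaled = tuple(a * x for x in v)
            if f(scaled) != tuple(a * y for y in f(v)):
                return False
        # the sum must be preserved
        for w in vectors:
            total = tuple(x + y for x, y in zip(v, w))
            if f(total) != tuple(p + q for p, q in zip(f(v), f(w))):
                return False
    return True

print(looks_linear(lambda v: (v[0] + v[1],), 2))         # f(x,y) = x+y: True
print(looks_linear(lambda v: (v[0], v[1] + v[2]), 3))    # f(x,y,z) = (x,y+z): True
print(looks_linear(lambda v: (v[0] ** 2,), 2))           # f(x,y) = x²: False
print(looks_linear(lambda v: (v[0] + 1,), 2))            # f(x,y) = x+1: False
```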

Properties of linear applications: f: V → W, f is a linear application, V and W are vector spaces; f(v) = 0_W ⇔ v ∈ Ker(f); f(0_V) = 0_W always holds, but non-zero vectors can exist in the vector space V whose image is zero, and all the vectors whose image is zero form a set called Ker(f); the vector subspace Ker(f) can also contain only the null vector; w = f(v) for some v ∈ V ⇔ w ∈ Im(f), the vectors of W that are images under f form Im(f), a subset of W

f(x,y) = x+y; f(0,0) = 0; f(x,y) = 0, x+y = 0, x = -y, (x,-x) → f → 0, Ker(f) is the subset of ℝ², formed by the pairs (x,-x), and this is a vector subspace generated by the pair (1,-1), because it is x(1,-1); f: V → W, if v → 0_W, it is the vector subspace Ker(f) ⊆ V

f: V → W, f(v) = w, Im(f) ⊆ W; f(x,y) = (x,x), ℝ² → ℝ²; the pairs (0,0), (1,1), (2,2), are images of the function; the pairs (1,0), (0,1), (1,2), are not images of the function; the image is the set of all the pairs obtained by multiplying the pair (1,1) by an arbitrary number x; it is the vector subspace of ℝ² formed by the multiples of (1,1), that is it has (1,1) as base, and the image is only a part of ℝ²

f: V → W; Ker(f) or nucleus of f, is a subspace of V formed by all the elements that transform into 0

f: V → W; Im(f) or image of f, is a subspace of W formed by the vectors w for which there is at least one v ∈ V with f(v) = w

Ker is the abbreviation of the term kernel which means core or nucleus; Ker(f) is formed by the vectors that transform into 0; Ker(f) can be formed by only {0_V} or it can also contain other vectors, if it is formed by only {0_V} it means that f is an injective linear application; injective means that if v ≠ v' then f(v) ≠ f(v'); to check that a linear application is injective it is enough to check that Ker(f) contains only 0

Determine if the function f(x,y) = (x,x) is injective; if its core contains only 0 it is injective, but in this case it is not true; f(x,y) = (x,x) is (0,0) for every (0,y) = y(0,1) that is the vectorial subspace of ℝ² generated by the vector (0,1), Ker(f) is not reduced to just zero, so this application is not injective

f(x,y) = (2x+y,x-y); 2x+y = 0; x-y = 0, x = y; 2x+y = 0, x = y, 2y+y = 0, 3y = 0, y = 0; x = y = 0; there are no pairs other than the pair (0,0) that have the pair (0,0) as image; this f is an injective linear application

Ker(f), the nucleus of a linear application, allows us to study injectivity; Im(f), the image of a linear application, allows us to study surjectivity

The function f: V → W is surjective if ∀ w ∈ W ∃ v ∈ V: f(v) = w that is Im(f) = W; therefore, to understand if a linear application is surjective, it is necessary to study the image

f(x,y) = (x,x) is not a surjective linear application because Im(f) ≠ ℝ²

f(x,y) = 3x, ℝ² → ℝ; m ∈ ℝ, f(x,y) = m, 3x = m, x = m/3, f(m/3,y) = m, each number m has a counter image in ℝ²; this is a surjective linear application; by studying the image of f, which is a vector subspace, we understand if the linear application is surjective

A linear application f: V → W can be both injective and surjective at the same time; it is a function because every v has an image in W, ∀ v ∈ V ∃ f(v) ∈ W; it is injective because two different v have different images, v ≠ v' ⇒ f(v) ≠ f(v'); it is surjective because every w comes from a v, ∀ w ∈ W ∃ v ∈ V: f(v) = w that is Im(f) = W; between V and W there is a one-to-one correspondence that is f is a bijective linear application, also called isomorphism; linear applications which are isomorphisms are simultaneously injective and surjective, the nucleus is reduced to only 0 and the image is all W

f(x,y) = (2x+y,x-y) it is an injective and surjective linear application that is bijective, therefore it is an isomorphism; to prove that this linear application is surjective, for every α and β, we have to find x and y; (α,β) = (2x+y,x-y); 2x+y = α; x-y = β, y = x-β; 2x+y = α, 2x+x-β = α, 3x-β = α, 3x = α+β, x = (α+β)/3; y = x-β, y = ((α+β)/3)-β; this linear application is injective and surjective that is bijective that is an isomorphism; an isomorphism is a bijective linear application

Find the Ker(f) of f(x,y,z) = (x+y,x-y,x+2z), ℝ³ → ℝ³; finding the Ker(f) means solving these first degree equations: x+y = 0, x-y = 0, x+2z = 0; finding the Im(f) means solving these first degree equations: α = x+y, β = x-y, γ = x+2z

Sum of linear applications: f: V → W, g: V → W, (f+g)(v) = f(v)+g(v)

Product of a number by an application: f: V → W, αf: v → αf(v)

Product or composition of linear applications: g∘f, f: V → W, g: W → Z, v → f → f(v) → g → g(f(v))

The function f: V → W is a linear application when f(v+w) = f(v)+f(w) and f(av) = af(v)

f(v) = 0_W ⇔ v ∈ Ker(f); the function f is injective ⇔ Ker(f) = {0_V}

w = f(v) for some v ∈ V ⇔ w ∈ Im(f); the function f is surjective ⇔ Im(f) = W

24 - LINEAR APPLICATIONS AND MATRICES

Matrix associated with a linear application: f(x,y,z) = (ax+by+cz,dx+ey+fz), ℝ³ → ℝ², M_f = [(a,b,c),(d,e,f)]; the fundamental versors of ℝ³ are e₁ = (1,0,0), e₂ = (0,1,0), e₃ = (0,0,1); e₁ = (1,0,0) → f → (a,d); e₂ = (0,1,0) → f → (b,e); e₃ = (0,0,1) → f → (c,f); (a,d) is the first column of the matrix M_f that is f(e₁); (b,e) is the second column of the matrix M_f that is f(e₂); (c,f) is the third column of the matrix M_f that is f(e₃); the elements of the matrix M_f have a double meaning, the rows contain the coefficients of the variables x, y, z, the first row contains the coefficients of x, y, z of the first component, the second row contains the coefficients of x, y, z of the second component, the columns express the f of the fundamental versors, the first column indicates f of the first versor, the second column indicates f of the second versor, the third column indicates f of the third versor; now we have to understand what happens with linear applications between ℝⁿ and ℝ^m, ℝⁿ → ℝ^m

The relationship between a matrix and a linear application is given by the fact that the elements of the rows of the matrix indicate the coefficients of the components of the linear application, and the elements of the columns of the matrix indicate the images of the fundamental versors of the linear application

f: ℝⁿ → ℝ^m, f is a linear application of ℝⁿ in ℝ^m, n is the number of elements contained in a row and equals the number of columns, m is the number of elements contained in a column and equals the number of rows; e₁ = (1,0,...,0) → f → (a_1,1, a_2,1, ..., a_m,1), e₂ = (0,1,...,0) → f → (a_1,2, a_2,2, ..., a_m,2), e_n = (0,0,...,1) → f → (a_1,n, a_2,n, ..., a_m,n); in this way we have constructed n vectors which are the images of the fundamental versors of ℝⁿ; M_f = [(a_1,1, a_1,2, ..., a_1,n), (a_2,1, a_2,2, ..., a_2,n), ..., (a_m,1, a_m,2, ..., a_m,n)], this is the matrix associated with the linear application f, the first column is the image of the first fundamental versor or f(e₁), the second column is the image of the second fundamental versor or f(e₂), the nth column is the image of the nth fundamental versor or f(e_n); f operates on tuples, (x₁,x₂,...,x_n) is an element of ℝⁿ, the linear application is applied to this tuple and the result is an element of ℝ^m, so it must have m components, and the first row is the set of coefficients that must be applied to x₁, x₂, ..., x_n to obtain the first component, f(x₁,x₂,...,x_n) = (a_1,1x₁+a_1,2x₂+...+a_1,nx_n, a_2,1x₁+a_2,2x₂+...+a_2,nx_n, ..., a_m,1x₁+a_m,2x₂+...+a_m,nx_n); the rows of the matrix are the coefficients to assign to x₁, x₂, ..., x_n, with the first row of the matrix we get the first component of f(x₁,x₂,...,x_n), with the second row of the matrix the second component, with the last row of the matrix we get the last component; the matrix M_f associated with the linear application f: ℝⁿ → ℝ^m has m rows and n columns, the number of rows coincides with the dimension of the arrival space ℝ^m, the number of columns coincides with the dimension of the starting space ℝⁿ on which f is defined; the components of f(x₁,x₂,...,x_n) are m elements and each is a linear combination of x₁, x₂, ..., x_n that is a homogeneous polynomial of first degree in x₁, x₂, ..., x_n, with no constant term, and this is valid for all the components; a linear application is characterized by the fact that f of a tuple has m components, each of which is a linear combination, a homogeneous first degree polynomial in x₁, x₂, ..., x_n with coefficients deducible from the matrix

f(x,y) = (x+2y,x-2y,x+y), ℝ² → ℝ³; the associated matrix must have 2 columns and 3 rows; M_f = [(1,2),(1,-2),(1,1)]; f(1,0) = (1,1,1) that is the first column, f(0,1) = (2,-2,1) that is the second column; the application is linear because the 3 components (x+2y,x-2y,x+y) are homogeneous first degree polynomials in the variables x and y in ℝ²; a polynomial is homogeneous when all the monomials that compose it have the same degree
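The construction of the matrix from the images of the fundamental versors can be sketched in Python; the helper names (standard_basis, matrix_of, apply) are hypothetical, and the sketch assumes that the function passed in is really linear

```python
def standard_basis(n):
    """Fundamental versors e_1, ..., e_n of R^n as tuples."""
    return [tuple(1 if i == j else 0 for i in range(n)) for j in range(n)]

def matrix_of(f, n):
    """Matrix M_f (as a list of rows) of a linear application f defined on n-tuples."""
    columns = [f(e) for e in standard_basis(n)]   # column j is f(e_j)
    return [list(row) for row in zip(*columns)]   # transpose columns into rows

def apply(matrix, v):
    """Each component is one row of the matrix applied to the tuple v."""
    return tuple(sum(a * x for a, x in zip(row, v)) for row in matrix)

def f(v):
    """The example of the notes: f(x,y) = (x+2y, x-2y, x+y)."""
    x, y = v
    return (x + 2 * y, x - 2 * y, x + y)

M_f = matrix_of(f, 2)   # 3 rows and 2 columns
```

Here M_f has the images f(1,0) and f(0,1) as columns, and apply(M_f, v) gives the same result as f(v), which is the double meaning of rows and columns described above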

g(x,y,z) = (x+z,y+1), ℝ³ → ℝ² that means from 3 to 2 variables; it is not a linear application because x+z is a homogeneous polynomial of first degree in (x,y,z), but y+1 is not a homogeneous polynomial of first degree in (x,y,z) because there is the constant term +1

To create the matrix associated with a linear application, the coefficients of the components are written on the rows, or the images of the fundamental versors are written on the columns

From a matrix to a linear application, using this matrix with 2 rows and 3 columns, A = [(1,-1,3),(2,2,0)]; f: ℝ³ → ℝ², f(x,y,z) = (x-y+3z,2x+2y); it is a linear application because x-y+3z and 2x+2y are homogeneous polynomials of first degree in (x,y,z); from this linear application it is possible to obtain the matrix again, so the correspondence is one-to-one, M_f = [(1,-1,3),(2,2,0)] = A

From a linear application we obtain a matrix, and from a matrix we obtain a linear application; from a linear application of ℝⁿ in ℝ^m we obtain a matrix with m rows and n columns; from a matrix with m rows and n columns we obtain a linear application of ℝⁿ in ℝ^m

A linear application f: ℝⁿ → ℝ^m can be injective when v ≠ v' ⇒ f(v) ≠ f(v'), surjective if Im(f) = ℝ^m; when a linear application is injective and surjective or bijective, it is invertible that is the linear application f^-1 is the inverse of the linear application f, and ℝⁿ and ℝ^m must coincide for a linear application to be invertible that is f: ℝⁿ → ℝⁿ, v → f → f(v) = v', v' → f^-1 → v = f^-1(v')

A linear application f: ℝⁿ → ℝ^m is surjective if Im(f) = ℝ^m; the matrix M_f consists of the columns f(e₁), ..., f(e_n); a vector v is a linear combination of e₁, ..., e_n; f(v) is a linear combination of f(e₁), ..., f(e_n); f(e₁), ..., f(e_n) are generators of the image of f; Im(f) is a vector subspace of ℝ^m generated by these vectors, and the linear application is surjective if f(e₁), ..., f(e_n) generate all the vector space ℝ^m; the vector space ℝ^m has dimension m, and to calculate the dimension of the space generated by these generating vectors we must calculate the rank; if ρ(M_f) = m then the dimension of the column space is m that is Im(f) = ℝ^m; a linear application is surjective if the rank of the matrix is equal to the number of rows, ρ(M_f) = m = number of rows

A linear application f: ℝⁿ → ℝ^m is injective if Ker(f) = {0_ℝⁿ}; dim(Ker(f)) = n-dim(Im(f)) = n-ρ(M_f) = 0, ρ(M_f) = n = number of columns; the linear application is injective if the rank of the associated matrix is equal to the number of columns

Comparing the rank with the number of rows of the matrix indicates if the linear application is surjective, comparing the rank with the number of columns indicates if the linear application is injective

f: ℝⁿ → ℝⁿ, ρ(M_f) = number of columns n = number of rows n, so the linear application is injective and surjective at the same time, so it is an invertible linear application

f: ℝⁿ → ℝ^m, ρ(M_f) = n, Ker(f) = {0_V}, the linear application is injective

f: ℝⁿ → ℝ^m, ρ(M_f) = m, Im(f) = ℝ^m, the linear application is surjective

f: ℝⁿ → ℝ^m, ρ(M_f) = n = m, the linear application is invertible, the matrix is square

f(x,y,z,t) = (x-y+z,x+y-t), ℝ⁴ → ℝ²; the matrix has 2 rows and 4 columns, M_f = [(1,-1,1,0),(1,1,0,-1)]; this matrix has two non-zero rows, ρ(M_f) = 2 = dim(ℝ²), so this matrix represents a surjective linear application; to be injective the number of columns must be equal to the rank of the matrix, and in this example it is false because the columns are 4 and ρ(M_f) = 2; this linear application is surjective but not injective

Considering f: ℝⁿ → ℝ^m the rank of the associated matrix cannot be greater than the smaller of the two numbers n and m; if n > m, the rank of the matrix cannot be n, and the linear application cannot be injective; if n < m, the rank of the matrix cannot be m, and the linear application cannot be surjective

f(x,y) = (x-y,x+y,x), ℝ² → ℝ³; the matrix has 2 columns and 3 rows, M_f = [(1,-1),(1,1),(1,0)]; ρ(M_f) = 2 = dim(ℝ²); this linear application is injective but not surjective because the rank of the matrix is not 3
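The rank criteria can be checked on the last two examples with a short Python sketch; rank and classify are hypothetical helper names, and the rank is computed here by ordinary Gaussian elimination with exact fractions, which may follow different steps from the row reduction of the notes but gives the same rank

```python
from fractions import Fraction

def rank(matrix):
    """Rank of a matrix (list of rows), by Gaussian elimination with exact fractions."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_cols = len(rows[0]) if rows else 0
    r = 0  # number of pivot rows found so far
    for col in range(n_cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def classify(matrix):
    """Injective when rank = number of columns, surjective when rank = number of rows."""
    m, n = len(matrix), len(matrix[0])
    p = rank(matrix)
    return {"rank": p, "injective": p == n, "surjective": p == m}

wide = [[1, -1, 1, 0], [1, 1, 0, -1]]   # f(x,y,z,t) = (x-y+z, x+y-t)
tall = [[1, -1], [1, 1], [1, 0]]        # f(x,y) = (x-y, x+y, x)
```

classify(wide) reports a surjective but not injective application, classify(tall) an injective but not surjective one, as found above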

f(x,y) = (x-y,x+y), ℝ² → ℝ²; the matrix has 2 columns and 2 rows, M_f = [(1,-1),(1,1)]; ρ(M_f) = 2; to calculate the rank there is no need to reduce the matrix, because the two rows are linearly independent, since the second row is not a multiple of the first; the rank of the matrix is 2 which is equal to the number of rows and columns, therefore the linear application is simultaneously injective and surjective, therefore it is invertible; f(x,y) = (x-y,x+y) is invertible; the inverse linear application of f is f^-1: ℝ² → ℝ²; M_f is the matrix associated with the linear application f, M_f^-1 is the matrix associated with the inverse linear application f^-1, M_f^-1 is equal to the inverse of the matrix M_f, M_f^-1 = (M_f)^-1; to find the inverse of the linear application we have to calculate the inverse of the matrix; M_f⋅(M_f)^-1 = I, [(1,-1),(1,1)][(a,b),(c,d)] = [(1,0),(0,1)], a-c = 1, b-d = 0, a+c = 0, b+d = 1; a-c = 1, a = c+1; b-d = 0, b = d; a+c = 0, a = -c; b+d = 1, b = 1-d; b = d, b+d = 1, d+d = 1, 2d = 1, d = 1/2; b = d, b = 1/2; a = c+1, a = -c, -c = c+1, -2c = 1, 2c = -1, c = -1/2; a = -c, c = -1/2, a = -(-1/2) = 1/2; a = 1/2, b = 1/2, c = -1/2, d = 1/2; M_f^-1 = (M_f)^-1 = [(1/2,1/2),(-1/2,1/2)]; the inverse linear application, associated with the inverse matrix is f^-1(x,y) = ((1/2)x+(1/2)y,(-1/2)x+(1/2)y)
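The inverse matrix found above can be verified with a few lines of Python; note that this sketch uses the closed formula for the inverse of a 2×2 matrix instead of solving the system in a, b, c, d as done in the notes, and that inverse_2x2 and matmul are hypothetical helper names

```python
from fractions import Fraction

def inverse_2x2(m):
    """Inverse of [[a, b], [c, d]] through the formula (1/det) * [[d, -b], [-c, a]]."""
    (a, b), (c, d) = m
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("the matrix is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul(p, q):
    """Rows by columns product of two matrices given as lists of rows."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))]
            for i in range(len(p))]

M_f = [[1, -1], [1, 1]]     # f(x,y) = (x-y, x+y)
M_inv = inverse_2x2(M_f)    # expected to match the matrix computed by hand
```

The product matmul(M_f, M_inv) is the identity matrix, which confirms M_f⋅(M_f)^-1 = I for this example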

M_f^-1 = (M_f)^-1

g∘f, M_g⋅M_f = M_g∘f; g∘f is the linear application composed of the linear application g and the linear application f; the product of the matrices of linear applications is equal to the matrix of the composite application
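A small Python check of M_g⋅M_f = M_g∘f, using as an assumption two applications taken from earlier examples, f(x,y) = (x+2y,x-2y,x+y) and g(x,y,z) = (x-y+3z,2x+2y); matmul and apply are hypothetical helper names

```python
def matmul(p, q):
    """Rows by columns product of two matrices given as lists of rows."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))]
            for i in range(len(p))]

def apply(matrix, v):
    """Image of the tuple v under the linear application with the given matrix."""
    return tuple(sum(a * x for a, x in zip(row, v)) for row in matrix)

M_f = [[1, 2], [1, -2], [1, 1]]    # f: R^2 -> R^3, f(x,y) = (x+2y, x-2y, x+y)
M_g = [[1, -1, 3], [2, 2, 0]]      # g: R^3 -> R^2, g(x,y,z) = (x-y+3z, 2x+2y)

M_gf = matmul(M_g, M_f)            # matrix of g∘f: R^2 -> R^2

v = (1, 1)
step_by_step = apply(M_g, apply(M_f, v))   # first f, then g
in_one_step = apply(M_gf, v)               # directly with the product matrix
```

Both ways of computing the image of v give the same pair, and M_gf corresponds to (g∘f)(x,y) = (3x+7y,4x)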

f∘f^-1 = identity; M_f⋅(M_f)^-1 = I

25 - LINEAR SYSTEMS - PART 1 - RESOLUTION OF REDUCED SYSTEMS

To find the Ker(f) of the linear application f(x,y,z) = (x+y,x-z), ℝ³ → ℝ², we need to solve the system of linear equations of first degree {x+y = 0, x-z = 0}; y = -x, x = z; the solution is (x,-x,x), by varying x we find all the solutions

Example of a system of linear equations of first degree: {x+3y-2z+t = 5, 2x-8y+4z-t = 0, -x+5y-3t = 12}

{x+y+2z = 1, 2y-z = 0, 4y = 5} this system of linear equations of first degree is easy to solve because there are many zeros

A system of m equations with n unknowns can be represented like this: {a_1,1x₁+a_1,2x₂+...+a_1,nx_n = b₁, a_2,1x₁+a_2,2x₂+...+a_2,nx_n = b₂, ..., a_m,1x₁+a_m,2x₂+...+a_m,nx_n = b_m}; the coefficients a_i,j form a matrix with m rows and n columns; the coefficients a_i,j are real numbers; in addition to the matrix formed by the coefficients of the unknowns x, there is also a column formed by the constant terms; the matrix formed by the coefficients of the x is indicated by A, the matrix formed by the coefficients of the x and by the constant terms is indicated by (A|B)

A system of linear equations follows the formula AX = B, where A is the matrix of the coefficients formed by m rows and n columns, X is the matrix of unknowns, B is the matrix of constant terms and consists of a column; (A|B) = [(a_1,1,...,a_1,n)|(b₁),...,(a_m,1,...,a_m,n)|(b_m)]; X = [(x₁),(x₂),...,(x_n)]

x+y-z = 1, 2x+y-2z = 0, x+y = 4; (A|B) = [(1,1,-1)|(1),(2,1,-2)|(0),(1,1,0)|(4)], X = [(x),(y),(z)]; AX = B

A linear system can be solved more easily if there are many zeros; if AX = B is a linear system of m equations with n unknowns, it is easier to solve the system if it is reduced

The linear system AX = B is reduced when the matrix A is reduced by rows

{2x-y+z = 1, 4y-3z = 0, 5y = 10}, this is a system of 3 equations with 3 unknowns x, y, z; AX = B, A = [(2,-1,1),(0,4,-3),(0,5,0)], B = [(1),(0),(10)]; the matrix A of the coefficients is reduced by rows, and the special elements of the matrix A are 2, 5, -3; to solve a reduced system we start from the last row of the matrix that is the last equation, because the last equation contains more zeros that is fewer unknowns; 5y = 10, y = 10/5, y = 2; 4y-3z = 0, 4⋅2-3z = 0, 8-3z = 0, 3z = 8, z = 8/3; 2x-y+z = 1, 2x = y-z+1, x = (y-z+1)/2, x = (2-(8/3)+1)/2 = (3-(8/3))/2 = (1/3)/2 = 1/6; 5y = 10, y = 2, 5⋅2 = 10, 10 = 10, true; 4y-3z = 0, y = 2, z = 8/3, 4⋅2-3(8/3) = 0, 8-8 = 0, 0 = 0, true; 2x-y+z = 1, y = 2, z = 8/3, x = 1/6, 2⋅(1/6)-2+(8/3) = 1, (1/3)-2+(8/3) = 1, (9/3)-2 = 1, 3-2 = 1, 1 = 1, true; the solution is easy when the system is reduced by rows; (A|B) = [(2,-1,1)|(1),(0,4,-3)|(0),(0,5,0)|(10)], this is a row reduced system because the matrix A is row reduced, and therefore the complete matrix (A|B) is also row reduced; when a system is row reduced, it can be solved starting from the last equation with fewer unknowns, up to the first; this example is simple because the system is made up of only 3 equations with 3 unknowns, with a row reduced matrix of rank 3, the situation is more complicated when the number of equations and the number of unknowns are not the same
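The procedure from the last equation up to the first can be sketched in Python; solve_reduced is a hypothetical name, and the sketch assumes a row reduced system with exactly one solution, so that each equation, read from the bottom, contains exactly one unknown not yet determined

```python
from fractions import Fraction

def solve_reduced(A, B):
    """Solve a row reduced system AX = B with one solution, last equation first."""
    n = len(A[0])
    known = {}  # index of the unknown -> value already found
    for row, b in reversed(list(zip(A, B))):
        new = [j for j in range(n) if row[j] != 0 and j not in known]
        if len(new) != 1:
            raise ValueError("expected exactly one new unknown in this equation")
        j = new[0]
        # Move the already known terms to the second member, then divide.
        rest = sum(row[k] * known[k] for k in known)
        known[j] = (Fraction(b) - rest) / row[j]
    return [known[j] for j in range(n)]

A = [[2, -1, 1], [0, 4, -3], [0, 5, 0]]
B = [1, 0, 10]
x, y, z = solve_reduced(A, B)
```

For this system the function finds y from 5y = 10, then z from 4y-3z = 0, then x from the first equation, in the same order used by hand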

A system AX = B is reduced when the matrix A is reduced that is when in the last row there are all zeros except one number, in the penultimate row there are two numbers, up to the first row where there are no zeros but only numbers; special elements are those numbers below which there are only zeros

The general rule to solve a row reduced system AX = B is that we take the last equation, where there is the least possible number of unknowns because there are as many zeros as possible, we solve this equation with respect to one of the unknowns as a function of the others, we repeat everything in the penultimate equation, and so on up to the first

(A|B) = [(2,1,4,1)|(1),(0,2,-1,1)|(1),(0,0,2,1)|(0)], this is the complete matrix of the system AX = B, and by counting the elements of the first row we understand that there are 4 unknowns, but there are only 3 equations; the last row of the matrix allows to obtain the last equation that is 2z+t = 0, so it is possible to derive one of these two unknowns as a function of the other; t = -2z; the second row of the matrix allows to obtain the second equation that is 2y-z+t = 1; t = -2z, 2y-z+t = 1, 2y-z-2z = 1, 2y-3z = 1; we have to write the unknown y as a function of the unknown z because the unknown y does not appear in the last equation; 2y-3z = 1, 2y = 3z+1, y = (3z+1)/2; now it is possible to replace all the unknowns found in the first equation; 2x+y+4z+t = 1, t = -2z, y = (3z+1)/2, 2x+((3z+1)/2)+4z-2z = 1; we need to obtain the unknown x as a function of the unknown z because the unknown x does not appear in the other equations; 2x+((3z+1)/2)+4z-2z = 1, 2x+((3z+1)/2)+2z = 1, 2x = -((3z+1)/2)-2z+1, 2x = ((-3z-1)/2)-2z+1, 2x = ((-3z-1)/2)-(4z/2)+(2/2), 2x = (-3z-1-4z+2)/2, 2x = (-7z+1)/2, x = (-7z+1)/4; there are 4 unknowns and 3 equations, one of the unknowns can be chosen at will, in this example we have chosen the unknown z, the other unknowns are expressed as a function of z, then z is the free unknown; free unknown means that it can be assigned at will, it means that the system has infinite solutions; z is the free unknown and the system has infinite solutions, so for each z we have a different solution; z = 0, x = (-7z+1)/4, x = 1/4; z = 0, y = (3z+1)/2, y = 1/2; z = 0, t = -2z, t = 0; if we assign another value to z, we find other values of x, y, t that is a different solution of the linear system
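The role of the free unknown can be illustrated with a short Python sketch, in which solution and satisfies are hypothetical helper names; each value assigned to z produces a different tuple (x,y,z,t), and every such tuple satisfies the three equations

```python
from fractions import Fraction

A = [[2, 1, 4, 1], [0, 2, -1, 1], [0, 0, 2, 1]]
B = [1, 1, 0]

def solution(z):
    """Solution (x, y, z, t) of the system for a chosen value of the free unknown z."""
    z = Fraction(z)
    t = -2 * z
    y = (3 * z + 1) / 2
    x = (-7 * z + 1) / 4
    return (x, y, z, t)

def satisfies(A, B, v):
    """True when the tuple v turns every equation of AX = B into an identity."""
    return all(sum(a * x for a, x in zip(row, v)) == b for row, b in zip(A, B))

# A few arbitrary values of z, each giving a different solution.
examples = [solution(z) for z in (0, 1, -3, Fraction(1, 2))]
```

For z = 0 the sketch returns (1/4,1/2,0,0), the solution computed above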

If a reduced system has a free unknown, then the system has infinite solutions

To solve a system of linear equations we start from the last equation, taking an unknown and obtain it as a function of the others, we go up to the penultimate equation and we obtain an unknown that does not appear in the last equation and we continue up to the first equation; never start with the first equation, always start with the last equation and go up

A reduced system can be without solutions

(A|B) = [(2,1,4,1)|(1),(0,2,-1,1)|(1),(0,0,2,1)|(0),(0,0,0,0)|(2)]; the last equation is 0x+0y+0z+0t = 2, so there are no solutions

There are systems without solutions and they are called irresoluble or incompatible systems; if the system is reduced and on a row of the coefficients matrix there are all zeros and the constant term is different from zero, then the system has no solutions that is the system is irresoluble, also called incompatible system

With a system of the type AX = B we must understand if the system is solvable that is if there are solutions; solvable means that there are numbers that put into the equation AX = B lead to an identity that is the first member is equal to the second member; if the system is solvable, it is necessary to find the solutions or to find all the tuples of numbers that put into the equation AX = B give the identity; if the system is reduced by rows, finding the solutions is simple, but if the system is not reduced by rows then finding the solutions can be complicated

What we have to do is transform the system AX = B into the reduced system by rows A'X = B' which has the same solutions as the starting system; if the system is reduced by rows we immediately understand if it is solvable and we can apply the method to find the solutions

The system AX = B can be reduced with the elementary transformations on the rows to become the system A'X = B'; the complete matrix (A|B) is reduced as a consequence of the row reduction of the matrix A of the coefficients; the matrix A' of the coefficients is row reduced and the linear system A'X = B' is therefore row reduced; after the system has been reduced, solutions can be found

{x+y+z-t = 1, x-y+2z+t = 0}, this is a system of 2 equations with 4 unknowns, (A|B) = [(1,1,1,-1)|(1),(1,-1,2,1)|(0)], as can be seen from the matrix the system is not reduced; we can reduce it by adding the first row to the second row, R₂ → R₂+R₁, (A|B) = [(1,1,1,-1)|(1),(2,0,3,0)|(1)], the reduced system is {x+y+z-t = 1, 2x+3z = 1}, reducing the system by rows is the best way to solve it; 2x+3z = 1, 2x = -3z+1, x = (-3z+1)/2; x+y+z-t = 1, ((-3z+1)/2)+y+z-t = 1, ((-3z+1)/2)+(2z/2)+y-t = 1, ((-3z+1+2z)/2)+y-t = 1, ((-z+1)/2)+y-t = 1, t = ((-z+1)/2)+y-1; y and z are the free unknowns, and the system has infinite solutions
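The elementary transformation R₂ → R₂+R₁ and the solutions found above can be checked in Python; add_multiple, solution and satisfies are hypothetical helper names, and rows are numbered from 0 in the code, so R₂ and R₁ become the indices 1 and 0

```python
from fractions import Fraction

def add_multiple(matrix, target, source, factor=1):
    """Return a copy of the matrix after R_target -> R_target + factor * R_source."""
    result = [list(row) for row in matrix]
    result[target] = [t + factor * s
                      for t, s in zip(matrix[target], matrix[source])]
    return result

# Complete matrix (A|B): the last entry of each row is the constant term.
complete = [[1, 1, 1, -1, 1],
            [1, -1, 2, 1, 0]]
reduced = add_multiple(complete, target=1, source=0)

def solution(y, z):
    """Solution (x, y, z, t) for chosen values of the unknowns y and z."""
    y, z = Fraction(y), Fraction(z)
    x = (-3 * z + 1) / 2
    t = (-z + 1) / 2 + y - 1
    return (x, y, z, t)

def satisfies(complete_matrix, v):
    """True when v solves every equation encoded in the complete matrix."""
    return all(sum(a * x for a, x in zip(row[:-1], v)) == row[-1]
               for row in complete_matrix)
```

The same tuples satisfy both the starting system and the reduced one, which is the reason why the elementary transformations on the rows can be used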

A system can be solved using the row reduction, instead the column reduction is not useful; if the coefficient matrix is not reduced, then we have to reduce it by rows

26 - LINEAR SYSTEMS - PART 2 - ROUCHE-CAPELLI THEOREM - FREE UNKNOWNS

AX = B, A = [(1,1,-1,1,1),(-1,-1,0,1,1),(-3,-1,0,1,2)], B = [(0),(1),(1)]; the matrix A of the coefficients consists of 3 rows and 5 columns, therefore the linear system is composed of 3 equations with 5 unknowns; to solve this system we need to reduce the complete matrix (A|B) = [(1,1,-1,1,1)|(0),(-1,-1,0,1,1)|(1),(-3,-1,0,1,2)|(1)], R₃ → R₃-R₂, (A|B) = [(1,1,-1,1,1)|(0),(-1,-1,0,1,1)|(1),(-2,0,0,0,1)|(0)], now the matrix is row reduced and it is possible to write the linear system {x+y-z+t+u = 0, -x-y+t+u = 1, -2x+u = 0}; -2x+u = 0, u = 2x; -x-y+t+u = 1, -x-y+t+2x = 1, x-y+t = 1, the unknown t is present in the second equation but is not present in the last equation, t = -x+y+1; x+y-z+t+u = 0, x+y-z-x+y+1+2x = 0, 2x+2y-z+1 = 0, 2x+2y-z = -1, the unknown z appears only in the first equation, z = 2x+2y+1; z = 2x+2y+1, t = -x+y+1, u = 2x; x and y are free unknowns that is they can be assigned at will, but the unknowns z, t, u, cannot be assigned at will because they depend on x and y; this system contains 5 unknowns, 2 are free unknowns, and 3 unknowns depend on the 2 free unknowns; the reduced matrix of the coefficients has rank 3 because it consists of 3 non-zero rows, ρ(A') = 3, and the rank of the reduced matrix of the coefficients A' is equal to the rank of the reduced complete matrix (A'|B'), ρ(A') = ρ(A'|B') = 3; this is the first property of the Rouché-Capelli theorem, a system that is solvable has the rank of the coefficient matrix equal to the rank of the complete matrix, this is the solvability condition; the number of non-free unknowns is equal to the rank of the coefficient matrix, and the number of free unknowns is equal to the total number of unknowns minus the rank of the coefficient matrix, number of free unknowns = n-ρ(A), and this is another property of the Rouché-Capelli theorem

AX = B, (A|B) = [(1,1,1)|(1),(2,1,1)|(0),(3,2,2)|(4)], R₂ → R₂-R₁, R₃ → R₃-2R₁, (A|B) = [(1,1,1)|(1),(1,0,0)|(-1),(1,0,0)|(2)], R₃ → R₃-R₂, (A|B) = [(1,1,1)|(1),(1,0,0)|(-1),(0,0,0)|(3)], the last row corresponds to the equation 0 = 3, so the system is without solution; the rank of the reduced coefficient matrix is ρ(A) = 2, because the rank is given by the number of non-zero rows; the rank of the reduced complete matrix is ρ(A|B) = 3, because the third row of the reduced complete matrix is not null; ρ(A) = 2 < ρ(A|B) = 3; if the rank of the coefficient matrix is different from the rank of the complete matrix, the system is not solvable

When a linear system has no solutions, the rank of the complete matrix is different from the rank of the coefficient matrix

A system is said to be solvable or compatible when it admits at least one solution

Rouché-Capelli theorem: in a linear system AX = B of m equations in n unknowns with ρ(A) = p and ρ(A|B) = q, the system is solvable ⇔ p = q; when the system is solvable there are ∞^(n-p) solutions, that is n-p free unknowns

The Rouché-Capelli theorem allows us to understand if a linear system admits solutions even before solving it; a linear system can be studied using the Rouché-Capelli theorem before being solved; AX = B, if ρ(A) = ρ(A|B) then the system can be solved and we have to calculate the solutions, but if ρ(A) ≠ ρ(A|B) then the system cannot be solved and it is useless to calculate the solutions; to calculate the rank it is necessary to reduce by rows the matrix A of the coefficients, and the rank is the number of non-zero reduced rows; the Rouché-Capelli theorem says that the number of free unknowns is equal to the total number of unknowns minus the rank, number of free unknowns = n-ρ(A), and if we find a different number of free unknowns then there is an error in the resolution

{x+y = 1, x-3y = 2, x+7y = 0}, this linear system has 2 unknowns and 3 equations; the number of equations is not important, the rank is important; (A|B) = [(1,1)|(1),(1,-3)|(2),(1,7)|(0)], the complete matrix (A|B) has 3 rows and 3 columns, but the coefficient matrix A has 3 rows and 2 columns, so its rank cannot be greater than 2, ρ(A) ≤ 2; (A|B) = [(1,1)|(1),(1,-3)|(2),(1,7)|(0)], R₂ → R₂-R₁ and R₃ → R₃-R₁, (A|B) = [(1,1)|(1),(0,-4)|(1),(0,6)|(-1)], R₃ → (4/6)R₃+R₂, (A|B) = [(1,1)|(1),(0,-4)|(1),(0,0)|(1/3)]; the matrix of the coefficients A has rank 2 which is the maximum possible, ρ(A) = 2, the complete matrix (A|B) has rank 3, ρ(A|B) = 3, and from this we understand that the system is not solvable; for the Rouché-Capelli theorem, this linear system is not solvable, therefore it is useless to try to solve it, because there are no solutions
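The Rouché-Capelli test can be sketched in Python on the three systems studied in this part; rank and rouche_capelli are hypothetical helper names, and the rank is computed by ordinary Gaussian elimination with exact fractions, an assumption of this sketch that gives the same rank as the row reduction of the notes

```python
from fractions import Fraction

def rank(matrix):
    """Rank of a matrix (list of rows), by Gaussian elimination with exact fractions."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_cols = len(rows[0]) if rows else 0
    r = 0  # number of pivot rows found so far
    for col in range(n_cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def rouche_capelli(A, B):
    """Compare the rank of A with the rank of (A|B) and count the free unknowns."""
    n = len(A[0])
    complete = [list(row) + [b] for row, b in zip(A, B)]
    p, q = rank(A), rank(complete)
    free = n - p if p == q else None  # free unknowns only make sense if solvable
    return {"rank_A": p, "rank_AB": q, "solvable": p == q, "free_unknowns": free}

first = rouche_capelli([[1, 1, -1, 1, 1], [-1, -1, 0, 1, 1], [-3, -1, 0, 1, 2]],
                       [0, 1, 1])
second = rouche_capelli([[1, 1, 1], [2, 1, 1], [3, 2, 2]], [1, 0, 4])
third = rouche_capelli([[1, 1], [1, -3], [1, 7]], [1, 2, 0])
```

The first system is solvable with 2 free unknowns, while the second and the third have ρ(A) = 2 and ρ(A|B) = 3 and are therefore not solvable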

A homogeneous system AX = 0 is a system where the constant terms are 0; a homogeneous system AX = 0 always has solutions for the Rouché-Capelli theorem; a homogeneous system AX = 0 always has a number of free unknowns which is equal to the number of unknowns minus the rank of the matrix of the coefficients A, number of free unknowns = n-ρ(A); for the Rouché-Capelli theorem, in a homogeneous system AX = 0, the matrix B of the constant terms is zero, therefore the rank of the complete matrix can never be greater than the rank of the coefficient matrix; in a homogeneous system ρ(A) = ρ(A|B); moreover if x₁ = x₂ = ... = x_n = 0, all null unknowns form a solution; homogeneous systems always have solutions

In a linear system consisting of n unknowns and m equations, p = q where p is the rank of the matrix of the coefficients and q is the rank of the complete matrix, the free unknowns are n-p, so there is only one solution when there are no free unknowns or when n = p, or when the number of unknowns is equal to the rank of the coefficient matrix; in general, for any solvable linear system, if there are no free unknowns, there is one and only one solution, and this happens when the number of unknowns is equal to the rank of the coefficient matrix; when there are no free unknowns, this is denoted by ∞^(n-p) = ∞⁰ = 1, but it is only a symbolic notation with no mathematical meaning

A homogeneous system has only one solution, which is null, formed by all zeros, when n = p, when the number of unknowns is equal to the rank of the coefficient matrix; a homogeneous system has other solutions besides the null solution when n-p > 0 that is n > p that is when the number of unknowns is greater than the rank of the coefficient matrix

A non-homogeneous system can have no solutions, while a homogeneous system always has at least one solution; homogeneous and non-homogeneous systems have only one solution when all the unknowns can assume only one value; non-homogeneous systems have infinite solutions when they are solvable and have at least one free unknown; it is not possible for a linear system to have a finite number of solutions other than 1 or 0; a solvable system has either 1 solution or infinite solutions; homogeneous systems always have either only 1 null solution, or they have infinite solutions; in a homogeneous system, looking for non-zero solutions is equivalent to seeing if there are infinite solutions that is the number of unknowns must be greater than the rank of the coefficient matrix also called the system rank, n-p > 0, n > p

{x-y+z = 0, 2x+3y-5z = 0}, it is a system of 2 equations and 3 unknowns; we must understand if this system admits solutions other than the null solution; p = ρ(A) ≤ 2 and the number of unknowns n is 3, n-p > 0, there is at least one free unknown and therefore there are infinite solutions; to find the solutions it is necessary to reduce the matrix by rows

{x-y+z = 0, 2x+3y-5z = 0, x+5y-z = 0, y+2z = 0}, is a homogeneous linear system of 4 equations and 3 unknowns; usually when there are many equations and few unknowns there are no solutions, but a homogeneous system always has solutions, so it is necessary to understand if there are non-zero solutions; A = [(1,-1,1),(2,3,-5),(1,5,-1),(0,1,2)], now it is necessary to calculate the rank of the matrix which is certainly at least 2 because we immediately see that the first and second rows are linearly independent; A = [(1,-1,1),(2,3,-5),(1,5,-1),(0,1,2)], R₂ → R₂-2R₁ and R₃ → R₃-R₁, A = [(1,-1,1),(0,5,-7),(0,6,-2),(0,1,2)], R₃ → 5R₃-6R₂ and R₄ → 5R₄-R₂, A = [(1,-1,1),(0,5,-7),(0,0,32),(0,0,17)], R₄ → 32R₄-17R₃, A = [(1,-1,1),(0,5,-7),(0,0,32),(0,0,0)]; the rank of the matrix is 3, p = ρ(A) = 3, the number of unknowns n is 3, n = p, and therefore there is only a null solution

If f: ℝⁿ → ℝ^m is a linear application, to find the elements that form the nucleus of the linear application we have to solve a homogeneous system; if A is the matrix associated with the linear application, A = M_f, the system we have to solve to find the nucleus of the linear application is AX = 0, X = [(x₁),...,(x_n)], f(x₁,...,x_n) = (a_1,1x₁+...); M_f = A, Ker(f) = solutions of AX = 0, dim(Ker(f)) = n-ρ(M_f), the dimension of the nucleus is related to the rank of the M_f matrix that is the dimension of the nucleus is equal to the dimension of the starting space minus the rank of the matrix M_f which is the dimension of the image; the nucleus is the set of solutions of the homogeneous system AX = 0; ρ(A) = ρ(M_f) = p, the nucleus is the set of solutions of a homogeneous system that has an associated matrix of rank p, then n-p is the number of free unknowns which is equal to the dimension of Ker(f)
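The relation dim(Ker(f)) = n-ρ(M_f) can be applied in Python to the two homogeneous systems studied above; rank and kernel_dimension are hypothetical helper names, and the rank is again computed by Gaussian elimination with exact fractions, an assumption of this sketch

```python
from fractions import Fraction

def rank(matrix):
    """Rank of a matrix (list of rows), by Gaussian elimination with exact fractions."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_cols = len(rows[0]) if rows else 0
    r = 0  # number of pivot rows found so far
    for col in range(n_cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def kernel_dimension(matrix):
    """Number of free unknowns of AX = 0, that is n minus the rank of A."""
    n = len(matrix[0])
    return n - rank(matrix)

two_equations = [[1, -1, 1], [2, 3, -5]]
four_equations = [[1, -1, 1], [2, 3, -5], [1, 5, -1], [0, 1, 2]]
```

The first matrix gives dimension 1, so there are non-zero solutions, the second gives dimension 0, so the only solution is the null one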

To solve any linear system AX = B it is necessary to reduce matrix A by rows; if AX = B is solvable then ρ(A) = ρ(A|B) = p, and the system has n-p free unknowns, where n is the number of unknowns and p is the rank of matrix A

27 - LINEAR SYSTEMS - PART 3 - EXAMPLES AND APPLICATIONS

Counter image of a vector: if f: ℝⁿ → ℝ^m is a linear application with associated matrix M_f = A, the counter image of a vector (a₁,...,a_m) of ℝ^m is the set of solutions of the linear system AX = [(a₁),...,(a_m)]; it is written f^-1(a₁,...,a_m), but here f^-1 denotes the counter image and not the inverse function, which may not exist

f(x,y,z) = (x+y,x-z), is a linear application ℝ³ → ℝ²; we choose an element of ℝ², a vector v with arbitrarily chosen components (1,-1), and find all vectors of ℝ³, the triples (x,y,z), such that f(x,y,z) = (1,-1); M_f = A = [(1,1,0),(1,0,-1)]; {x+y = 1, x-z = -1}; x-z = -1, x = z-1; x+y = 1, y = -x+1 = -(z-1)+1 = -z+1+1 = -z+2; z is the free unknown and the solution is (z-1,-z+2,z); the counter images of the vector (1,-1) are infinite and are triples in the form (z-1,-z+2,z); the counter image for z = 0 is (-1,2,0); the counter image for z = 10 is (9,-8,10)
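The family of counter images found above can be verified by substitution. A minimal sketch in Python follows; the function names f and counter_image and the list checks are hypothetical names used only for this illustration.

```python
def f(x, y, z):
    """The linear application f(x,y,z) = (x+y, x-z) from the example."""
    return (x + y, x - z)

def counter_image(z):
    """Triple (z-1, -z+2, z) obtained by choosing a value for the free unknown z."""
    return (z - 1, -z + 2, z)

# every choice of z must give a triple that f sends to (1, -1)
checks = [f(*counter_image(z)) for z in (0, 10, -3, 7)]
print(checks)
```

Each entry of checks is the image of one counter image, so every entry should be the vector (1,-1).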

It is possible that there is no counter image

f(x,y,z) = (2x-y,2x-y,0), is a linear application ℝ³ → ℝ³ with associated matrix A = [(2,-1,0),(2,-1,0),(0,0,0)]; to calculate the counter image of the triad (4,4,1) it is necessary to solve the system {2x-y = 4, 2x-y = 4, 0 = 1}, which has no solutions, therefore the counter image of the vector (4,4,1) does not exist; in this case the linear application is not surjective, because not all the elements of ℝ³ have a counter image; when a square matrix has a null row or a null column, the linear application is not surjective; all columns are multiples of (1,1,0) so ρ(A) = 1 < 3, the linear application is neither injective nor surjective and this explains why there are vectors without counter image; it is interesting to find the counter image of zero, in this case of the triad (0,0,0), f^-1(0,0,0), {2x-y = 0, 2x-y = 0, 0 = 0}, y = 2x, x can be assigned at will, but z can also be assigned at will, x and z are 2 free unknowns, whereas y is a non-free unknown that depends on x, so the solution is (x,2x,z), therefore there are infinite elements in the nucleus of this linear application; the nucleus coincides with the counter image of the null vector and contains infinite elements dependent on the 2 free unknowns x and z; examples of nucleus elements are (1,2,0), (0,0,1), (10,20,85); the nucleus of this linear application is (x,2x,z), there are 2 free unknowns and this means that the dimension of the nucleus is 2, in agreement with n-ρ(A) = 3-1 = 2; if x = 1 and z = 0 we find the vector (1,2,0), if x = 0 and z = 1 we find the vector (0,0,1), and these 2 vectors together form a base of the nucleus; when in f^-1 of 0, in the nucleus, there are free unknowns, it is enough to set, in turn, one free unknown to 1 and the others to 0, and we find a base of the nucleus

The counter image of 0 is the nucleus of the linear application
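For the application f(x,y,z) = (2x-y,2x-y,0) discussed above, the base of the nucleus can be checked by substitution. This is a minimal sketch in Python; the names f, basis and combine are hypothetical and serve only this illustration.

```python
def f(x, y, z):
    """The linear application f(x,y,z) = (2x-y, 2x-y, 0) from the example."""
    return (2 * x - y, 2 * x - y, 0)

# base of the nucleus found by setting (x, z) = (1, 0) and (x, z) = (0, 1)
basis = [(1, 2, 0), (0, 0, 1)]

def combine(s, t):
    """Linear combination s*(1,2,0) + t*(0,0,1), which equals (s, 2s, t)."""
    return tuple(s * u + t * v for u, v in zip(*basis))

# every linear combination of the base vectors is sent to the null vector
images = [f(*combine(s, t)) for s, t in ((1, 0), (0, 1), (10, 85), (-4, 3))]
print(images)
```

The pair (s,t) = (10,85) reproduces the nucleus element (10,20,85) listed in the example.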

The matrix A multiplied by its inverse A^-1 is equal to the identity matrix I, A⋅A^-1 = A^-1⋅A = I; find the inverse of the matrix A = [(1,1),(2,3)]; A⋅A^-1 = I, [(1,1),(2,3)][(a,b),(c,d)] = [(1,0),(0,1)], {a+c = 1, b+d = 0, 2a+3c = 0, 2b+3d = 1}; a+c = 1, a = -c+1; b+d = 0, b = -d; 2a+3c = 0, a = -c+1, 2(-c+1)+3c = 0, -2c+2+3c = 0, c+2 = 0, c = -2; a = -c+1, c = -2, a = -(-2)+1 = 2+1 = 3, a = 3; 2b+3d = 1, b = -d, 2(-d)+3d = 1, -2d+3d = 1, d = 1; b = -d, d = 1, b = -1; a = 3, b = -1, c = -2, d = 1; A^-1 = [(3,-1),(-2,1)]; A⋅A^-1 = I, [(1,1),(2,3)][(3,-1),(-2,1)] = [(1,0),(0,1)]; 1⋅3+1(-2) = 1, 3-2 = 1, 1 = 1; 1(-1)+1⋅1 = 0, -1+1 = 0, 0 = 0; 2⋅3+3(-2) = 0, 6-6 = 0, 0 = 0; 2(-1)+3⋅1 = 1, -2+3 = 1, 1 = 1; using this method, to find the inverse matrix, we solve a linear system of equations of first degree where the unknowns are the elements of the inverse matrix; if we look for the inverse of a square matrix 2x2 the unknowns are 4 as in this case; if we look for the inverse of a square matrix 3x3 the unknowns are 9; if the matrix were of order 25 we would have a system of linear equations with 25⋅25 = 625 unknowns; if n is the order of the matrix then n² is the number of unknowns and equations, so it is necessary to find a better method for calculating the inverse matrix

A⋅A^-1 = I, A⋅X = I, [(a,b,c),(a',b',c'),(a",b",c")]X = [(1,0,0),(0,1,0),(0,0,1)], X = [(X₁),(X₂),(X₃)]; X₁, X₂, X₃ = rows of X = A^-1; we consider unknowns not the coefficients of the matrix X, but the rows of the matrix X; instead of having the 9 coefficients of the matrix X as unknowns, we take the 3 rows of the matrix X as unknowns, and obviously each of the unknowns is a triple because each row of the matrix contains 3 coefficients

Let us consider a linear system in which the unknowns are the rows; we can consider the rows X₁, X₂, X₃ as 3 unknowns; we consider as unknowns the rows of a 3x3 matrix, and each equation sets a linear combination of the 3 rows equal to the corresponding row of the identity matrix; {aX₁+bX₂+cX₃ = (1,0,0), a'X₁+b'X₂+c'X₃ = (0,1,0), a"X₁+b"X₂+c"X₃ = (0,0,1)}; in this linear system the unknowns are triples, the coefficients are the elements of the matrix A, and the column of constant terms is not a column of numbers because each of the constant terms is a triple; this linear system is associated with the coefficient matrix A = [(a,b,c),(a',b',c'),(a",b",c")] and the complete matrix (A|B) = [(a,b,c)|(1,0,0),(a',b',c')|(0,1,0),(a",b",c")|(0,0,1)]; this system can be solved by reducing by rows, finding the reduced system, the unknowns are the rows of the inverse matrix, following the rules of linear systems

A = [(2,1),(0,3)]; the inverse matrix A^-1 = X must have 2 rows and 2 columns; matrix A has two linearly independent rows so the rank is 2, it is invertible, that is the inverse matrix exists; X₁ and X₂ are the rows of the inverse matrix A^-1, and they are the unknowns of our linear system; A⋅X = I, [(2,1),(0,3)][(X₁),(X₂)] = [(1,0),(0,1)]; 2X₁+X₂ = (1,0), 3X₂ = (0,1); we have to solve a system of 2 equations with 2 unknowns X₁ and X₂ which represent 2 rows with 2 elements each; 3X₂ = (0,1), X₂ = (0,1/3); 2X₁+X₂ = (1,0), 2X₁ = (1,0)-X₂, X₂ = (0,1/3), 2X₁ = (1,0)-(0,1/3), 2X₁ = (1,-1/3), X₁ = (1/2,-1/6); X = A^-1 = [(1/2,-1/6),(0,1/3)]; A⋅A^-1 = I, [(2,1),(0,3)][(1/2,-1/6),(0,1/3)] = [(1,0),(0,1)]; 2(1/2)+1⋅0 = 1+0 = 1, 2(-1/6)+1(1/3) = -1/3+1/3 = 0, 0(1/2)+3⋅0 = 0+0 = 0, 0(-1/6)+3(1/3) = 0+1 = 1

It is impossible to find the inverse matrix of the matrix A = [(2,1),(4,2)], because this matrix has rank 1 and therefore corresponds to a non-invertible linear application; AX = I, 2X₁+X₂ = (1,0), 4X₁+2X₂ = (0,1); (A|B) = [(2,1)|(1,0),(4,2)|(0,1)], R₂ → R₂-2R₁, (A|B) = [(2,1)|(1,0),(0,0)|(-2,1)]; the second row reads 0 = (-2,1), which is impossible, so ρ(A) = 1 < ρ(A|B) = 2 and the system has no solution, therefore it is impossible to find the inverse matrix because matrix A is not invertible

To the linear systems used to find the inverse of a matrix, we can apply the Rouché-Capelli theorem, precisely this theorem says that a linear system admits solutions if and only if the rank of the matrix of the coefficients A is equal to the rank of the complete matrix (A|B), ρ(A) = ρ(A|B); the identity matrix I has rows that are all linearly independent as they are fundamental versors, then the complete matrix (A|I) has the maximum possible rank, therefore the linear system that allows us to find the inverse matrix is solvable if and only if the matrix of coefficients A has the highest possible rank, and since it is a square matrix of order n, its rank is equal to the number n of the rows, which is equal to the number of columns, and to the order of the matrix

With a matrix of order 3, if we use the rows of the inverse matrix we get 3 equations, if we use the coefficients of the inverse matrix we get 9 equations; A is a 3x3 matrix, we look for the inverse following the formula AX = I, if we take the elements of X as unknowns we get 9 equations, X = [(a,b,c),(d,e,f),(g,h,i)]; if we take the rows X₁, X₂, X₃, we get 3 equations and each unknown is a triple; A = [(1,2,1),(2,0,1),(0,0,-1)], find the inverse matrix of this 3x3 matrix; finding the inverse means writing the linear system that has the 3 rows of the inverse matrix as unknowns, each of which is a triple, the coefficient matrix is A, and the constant terms are the 3 rows of the 3x3 identity matrix; (A|I) = [(1,2,1)|(1,0,0),(2,0,1)|(0,1,0),(0,0,-1)|(0,0,1)], we have to reduce this system by rows, the matrix A is already reduced by rows and it has rank 3, therefore the inverse exists; being already reduced by rows we can start from the last row to solve the system; -X₃ = (0,0,1), X₃ = (0,0,-1); 2X₁+X₃ = (0,1,0), 2X₁ = (0,1,0)-X₃, X₃ = (0,0,-1), 2X₁ = (0,1,0)-(0,0,-1), 2X₁ = (0,1,1), X₁ = (0,1/2,1/2); X₁+2X₂+X₃ = (1,0,0), 2X₂ = (1,0,0)-X₁-X₃, X₁ = (0,1/2,1/2), X₃ = (0,0,-1), 2X₂ = (1,0,0)-(0,1/2,1/2)-(0,0,-1), 2X₂ = (1,-1/2,1/2), X₂ = (1/2,-1/4,1/4); A^-1 = [(0,1/2,1/2),(1/2,-1/4,1/4),(0,0,-1)]; A⋅A^-1 = I, [(1,2,1),(2,0,1),(0,0,-1)][(0,1/2,1/2),(1/2,-1/4,1/4),(0,0,-1)] = [(1,0,0),(0,1,0),(0,0,1)]; I_1,1 = 1⋅0+2(1/2)+1⋅0 = 0+1+0 = 1; I_1,2 = 1(1/2)+2(-1/4)+1⋅0 = 1/2-1/2+0 = 0; I_1,3 = 1(1/2)+2(1/4)+1(-1) = 1/2+1/2-1 = 0; I_2,1 = 2⋅0+0(1/2)+1⋅0 = 0+0+0 = 0; I_2,2 = 2(1/2)+0(-1/4)+1⋅0 = 1+0+0 = 1; I_2,3 = 2(1/2)+0(1/4)+1(-1) = 1+0-1 = 0; I_3,1 = 0⋅0+0(1/2)-1⋅0 = 0+0-0 = 0; I_3,2 = 0(1/2)+0(-1/4)+(-1)0 = 0+0+0 = 0; I_3,3 = 0(1/2)+0(1/4)+(-1)(-1) = 0+0+1 = 1
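The method of reducing the complete matrix (A|I) by rows can be sketched in Python. This is only an illustration under stated assumptions: it uses full Gauss-Jordan reduction with exact fractions instead of the back substitution done by hand above, and the function name inverse is a hypothetical name.

```python
from fractions import Fraction

def inverse(rows):
    """Invert a square matrix by row-reducing (A|I); return None if the rank is not maximal."""
    n = len(rows)
    # build the complete matrix (A|I) with exact fractions
    m = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(rows)]
    for c in range(n):
        # find a row at or below position c with a non-zero entry in column c
        pivot = next((i for i in range(c, n) if m[i][c] != 0), None)
        if pivot is None:
            return None  # no pivot in this column: rank < n, A is not invertible
        m[c], m[pivot] = m[pivot], m[c]
        p = m[c][c]
        m[c] = [x / p for x in m[c]]  # make the pivot equal to 1
        for i in range(n):
            if i != c and m[i][c] != 0:
                factor = m[i][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[c])]
    # the left block is now I, the right block contains the rows of the inverse
    return [row[n:] for row in m]

A = [[1, 2, 1], [2, 0, 1], [0, 0, -1]]
for row in inverse(A):
    print([str(x) for x in row])

print(inverse([[2, 1], [4, 2]]))  # matrix of rank 1
```

The rows returned by inverse play the role of the unknowns X₁, X₂, X₃; for the matrix of rank 1 the function returns None because no pivot can be found in the second column.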

Calculate for which values of a parameter the matrix is invertible; the matrix is dependent on a parameter h, we must calculate for which real values of the parameter h the matrix is invertible and we must find the inverse matrix; A = [(1,h),(3,-1)], this matrix depends on the parameter h, we want to find the values of h for which matrix A is invertible; (A|I) = [(1,h)|(1,0),(3,-1)|(0,1)], calculate for which h this system admits solutions; this system admits solutions if and only if the rank of the matrix A is 2; A = [(1,h),(3,-1)], R₂ → R₂-3R₁, [(1,h),(0,-1-3h)], the matrix is invertible for the values of h for which the rank is 2, or for the values of h for which the two rows are linearly independent, for the other values of h it is not invertible; the matrix is not invertible when -1-3h = 0, -3h = 1, h = -1/3, the matrix is invertible for h ≠ -1/3; we reduce by rows the complete matrix (A|I) = [(1,h)|(1,0),(3,-1)|(0,1)], R₂ → R₂-3R₁, [(1,h)|(1,0),(0,-1-3h)|(-3,1)]; it is possible to find the inverse matrix for any value of h other than -1/3, for example h = -1, [(1,-1)|(1,0),(0,2)|(-3,1)]; 2X₂ = (-3,1), X₂ = (-3/2,1/2); X₁-X₂ = (1,0), X₁ = (1,0)+X₂, X₂ = (-3/2,1/2), X₁ = (1,0)+(-3/2,1/2) = (-1/2,1/2); A^-1 = [(-1/2,1/2),(-3/2,1/2)]; A⋅A^-1 = I, A = [(1,h),(3,-1)]; for h = -1, A = [(1,-1),(3,-1)]; for h = -1, A^-1 = [(-1/2,1/2),(-3/2,1/2)]; [(1,-1),(3,-1)][(-1/2,1/2),(-3/2,1/2)] = [(1,0),(0,1)]; I_1,1 = 1(-1/2)+(-1)(-3/2) = -1/2+3/2 = 2/2 = 1; I_1,2 = 1(1/2)+(-1)(1/2) = 1/2-1/2 = 0; I_2,1 = 3(-1/2)+(-1)(-3/2) = -3/2+3/2 = 0; I_2,2 = 3(1/2)+(-1)(1/2) = 3/2-1/2 = 2/2 = 1; for any h different from -1/3 it is possible to find the inverse matrix of the matrix A, while for h = -1/3 it is not possible to solve the system.
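The dependence on the parameter h can be explored with a small program that repeats the two steps of the reduced system, (-1-3h)X₂ = (-3,1) and X₁+hX₂ = (1,0). A minimal sketch in Python follows, assuming exact fractions; the function name inverse_for is hypothetical.

```python
from fractions import Fraction

def inverse_for(h):
    """Rows (X1, X2) of the inverse of [(1,h),(3,-1)], or None when h = -1/3."""
    h = Fraction(h)
    pivot = -1 - 3 * h  # second-row pivot after R2 -> R2 - 3R1
    if pivot == 0:
        return None  # rank 1: the matrix is not invertible
    x2 = (Fraction(-3) / pivot, Fraction(1) / pivot)  # from pivot*X2 = (-3, 1)
    x1 = (1 - h * x2[0], 0 - h * x2[1])               # from X1 + h*X2 = (1, 0)
    return [x1, x2]

print(inverse_for(-1))
print(inverse_for(Fraction(-1, 3)))
```

For h = -1 the result should agree with the rows X₁ and X₂ computed by hand, and for h = -1/3 the pivot vanishes, so no inverse is returned.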

Linear systems are used to find inverse matrices, counter images, Ker(f); by Ker(f) we mean the nucleus of f: ℝⁿ → ℝ^m, M_f = A, and AX = 0 is the system that must be solved; to find the counter image and the nucleus of a linear application we have to solve ordinary linear systems, that is linear systems in which the unknowns are real numbers; when we use the technique to reduce the number of equations and unknowns for finding the inverse matrix, the unknowns are not numbers but tuples, and the constant terms are not real numbers but tuples

28 - THE DETERMINANT OF A SQUARE MATRIX

A = [(a_1,1,a_1,2),(a_2,1,a_2,2)], is a square matrix 2x2; the determinant of A is denoted by |A|, or det(A), or |(a_1,1,a_1,2),(a_2,1,a_2,2)|; the determinant of a square matrix is a real number calculated from the coefficients of the matrix; det(A) = a_1,1⋅a_2,2-a_1,2⋅a_2,1; the determinant is given by the product of the elements of the main diagonal minus the product of the elements of the secondary diagonal; det[(1,3),(5,-2)] = 1(-2)-3⋅5 = -2-15 = -17

We consider a square matrix of order n, A = [(a_1,1,a_1,2,...,a_1,n),(a_2,1,a_2,2,...,a_2,n),...,(a_n,1,a_n,2,...,a_n,n)], and for each element of the matrix we define its algebraic complement; the algebraic complement of an element is the determinant of the matrix which is obtained by deleting the row and the column that cross in that element, considering also the sign; we calculate the algebraic complement of a_2,1 obtaining a submatrix which has one row less and one column less than the matrix A, in this case we delete the second row and the first column obtaining the submatrix [(a_1,2,...,a_1,n),(a_3,2,...,a_3,n),...,(a_n,2,...,a_n,n)]; the determinant of this submatrix, taken with the appropriate sign, is the algebraic complement of the element a_2,1; the sign of the algebraic complement is positive when the row number of the element added to the column number of the element is an even number, while the sign of the algebraic complement is negative when the row number of the element added to the column number of the element is an odd number; in this case the determinant is obtained by deleting row 2 and column 1, 2+1 = 3 which is an odd number, therefore the determinant is preceded by the sign -; the determinant in the algebraic complement of the element a_2,2 is preceded by the sign + because 2+2 = 4 which is an even number

A = [(1,3,5,-1),(2,2,0,1),(1,0,1,4),(1,1,-1,0)]; calculate the algebraic complement of a_2,3, element in the second row and third column which is 0; we delete the second row and the third column and we get the determinant |(1,3,-1),(1,0,4),(1,1,0)| which is a 3x3 submatrix; the element is in row 2 and column 3, 2+3 = 5 which is an odd number, so the algebraic complement of a_2,3 is -|(1,3,-1),(1,0,4),(1,1,0)|

The algebraic complement of an element is the determinant of the matrix that we get by deleting the row and column of the element, and the sign is positive if the row number of the element added to the column number of the element is an even number, while the sign is negative if the row number of the element added to the column number of the element is an odd number

To define the determinant of a 4x4 matrix, we take the elements of a row, for example the first, a_1,1, a_1,2, a_1,3, a_1,4, and we multiply each element of the row by its algebraic complement which is a 3x3 determinant; the algebraic complement of a_1,1 is A_1,1, the algebraic complement of a_1,2 is A_1,2, the algebraic complement of a_1,3 is A_1,3, the algebraic complement of a_1,4 is A_1,4; the determinant of the matrix A is therefore det(A) = a_1,1⋅A_1,1+a_1,2⋅A_1,2+a_1,3⋅A_1,3+a_1,4⋅A_1,4; to define the determinant of a 4x4 matrix we could also take the elements of a column, for example the first column, a_1,1, a_2,1, a_3,1, a_4,1, and we multiply each element of the column by its algebraic complement which is a determinant 3x3; the algebraic complement of a_1,1 is A_1,1, the algebraic complement of a_2,1 is A_2,1, the algebraic complement of a_3,1 is A_3,1, the algebraic complement of a_4,1 is A_4,1; the determinant of matrix A is therefore det(A) = a_1,1⋅A_1,1+a_2,1⋅A_2,1+a_3,1⋅A_3,1+a_4,1⋅A_4,1; this same rule is also used for larger square matrices, the elements of a row or a column are taken and multiplied by the algebraic complements and then the sum is made, in this way we can calculate the determinant of a square matrix of any order

In a 1x1 matrix A = [(a)], det (A) = a

Laplace's rule: the determinant of a square matrix A is obtained by multiplying the elements of a row, or column, by their algebraic complements and adding the results; det(A) = a_i,1⋅A_i,1+a_i,2⋅A_i,2+...+a_i,n⋅A_i,n = a_1,j⋅A_1,j+a_2,j⋅A_2,j+...+a_n,j⋅A_n,j

Using Laplace's rule, calculate the determinant of the 3x3 square matrix A = [(1,0,1),(2,1,-1),(0,0,2)]; using the second row, det[(1,0,1),(2,1,-1),(0,0,2)] = -2|(0,1),(0,2)|+1|(1,1),(0,2)|-(-1)|(1,0),(0,0)| = -2(0⋅2-1⋅0)+1(1⋅2-1⋅0)+1(1⋅0-0⋅0) = -2(0-0)+1(2-0)+1(0-0) = -2(0)+1(2)+1(0) = 0+2+0 = 2
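Laplace's rule translates directly into a recursive procedure. The following is a minimal sketch in Python, assuming the expansion is always made along the first row (any row or column gives the same value); the function names minor and det are hypothetical, and indices start from 0 as usual in Python, which does not change the parity of row number plus column number.

```python
def minor(m, i, j):
    """Submatrix obtained by deleting row i and column j (0-based indices)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]

def det(m):
    """Determinant by Laplace expansion along the first row."""
    if len(m) == 1:
        return m[0][0]  # determinant of a 1x1 matrix is its only element
    # each element times its algebraic complement; null elements are skipped
    return sum((-1) ** j * m[0][j] * det(minor(m, 0, j))
               for j in range(len(m)) if m[0][j] != 0)

print(det([[1, 3], [5, -2]]))
print(det([[1, 0, 1], [2, 1, -1], [0, 0, 2]]))
```

The two calls reuse the 2x2 and 3x3 matrices of the worked examples, so the printed values can be compared with the hand calculations. The recursion is convenient for small matrices, but the number of operations grows very quickly with the order n.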

Properties of the determinants: A is a square matrix of order n, and det(A) is the determinant of matrix A which is obtained using Laplace's rule; when a row of matrix A is null, or a column of matrix A is null, det(A) = 0; when there are two equal rows R_i = R_j, or when there are two equal columns C_i = C_j, det (A) = 0; if we multiply each element of the matrix A by a number a, then det(A') = det(aA) = aⁿ⋅det(A); if we multiply the elements of a row of matrix A by a number a, then det(A') = a⋅det(A); if in a matrix A a row is the sum of two rows that is each element of a row is the sum of two elements that we call a and b, the determinant of the matrix is equal to the determinant of the matrix that contains only the elements a in that row, added to the determinant of the matrix that contains only the elements b in that row that is det[(a_1,1+b_1,1,...,a_1,n+b_1,n),(...)] = det[(a_1,1,...,a_1,n),(...)]+det[(b_1,1,...,b_1,n),(...)]

If A is a square matrix of order n, and if we apply elementary transformations to the rows or columns of matrix A, the determinant follows some rules; if we exchange 2 rows between them, the new matrix A' has opposite determinant to the determinant of matrix A, R_i ↔ R_j, det(A') = -det(A); if we multiply a row by a number other than 0, R_i → aR_i, A → A', then det(A') = a⋅det(A); if R_i → R_i+aR_j, the determinant of the new matrix does not change, A → A', det(A') = det(A)
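The rules on elementary transformations can be checked numerically on a small matrix. This is a minimal sketch in Python, assuming a determinant computed by Laplace expansion along the first row; the names minor, det, swapped, scaled, added and all_scaled are hypothetical, and the matrix A is the 3x3 matrix of the Laplace example.

```python
def minor(m, i, j):
    """Submatrix obtained by deleting row i and column j (0-based indices)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]

def det(m):
    """Determinant by Laplace expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det(minor(m, 0, j)) for j in range(len(m)))

A = [[1, 0, 1], [2, 1, -1], [0, 0, 2]]

swapped = [A[1], A[0], A[2]]                                   # R1 <-> R2
scaled = [A[0], [3 * x for x in A[1]], A[2]]                   # R2 -> 3R2
added = [A[0], A[1], [x + 5 * y for x, y in zip(A[2], A[0])]]  # R3 -> R3 + 5R1
all_scaled = [[3 * x for x in row] for row in A]               # every element times 3

print(det(A), det(swapped), det(scaled), det(added), det(all_scaled))
```

The printed values should follow the rules stated above: opposite sign after the exchange, multiplication by 3 after scaling one row, no change after adding a multiple of another row, and multiplication by 3³ when every element is multiplied by 3.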

Matrix A is a 4x4 square matrix, A = [(1,1,1,1),(1,-1,0,2),(0,1,0,0),(-1,4,2,1)], we can calculate the determinant of this matrix using Laplace's rule on the third row, specifically on the second element of the third row, which is the only non-null element of that row; we have to delete row 3 and column 2, and the determinant is preceded by the sign - because the element chosen is from row 3 and column 2 and 3+2 = 5 which is an odd number, det(A) = -1|(1,1,1),(1,0,2),(-1,2,1)|; C₁ → C₁-(1/2)C₃, det(A) = -1|(1/2,1,1),(0,0,2),(-3/2,2,1)|, with this operation the determinant remains unchanged; now we develop the calculation using the second row, where the only non-null element is 2 which is in row 2 and column 3, 2+3 = 5 which is an odd number, so the sign is -, det(A) = (-1)2(-1)|(1/2,1),(-3/2,2)| = 2|(1/2,1),(-3/2,2)| = 2((1/2)2-1(-3/2)) = 2(1+3/2) = 2(5/2) = 5; we computed the determinant in a simpler way by making zeroes appear with the elementary transformations

Calculate the determinant of the matrix A = [(1,1,1,2),(1,3,2,1),(4,3,2,1),(-1,-1,2,5)]

The determinants are used to understand some properties of matrices; A is a square matrix of order n and we want to know when the matrix is invertible; if det(A) = 0, then matrix A is not invertible; if det(A) ≠ 0, then matrix A is invertible; A = [(1,2),(3,-1)], det(A) = 1(-1)-2(3) = -1-6 = -7, matrix A is invertible; B = [(1,4),(2,8)], det(B) = 1⋅8-4⋅2 = 8-8 = 0, matrix B is not invertible; invertible matrices have the highest possible rank, matrix A is invertible when det(A) ≠ 0, so ρ(A) = n; non-invertible matrices have rank less than n, matrix A is not invertible when det(A) = 0, so ρ(A) < n

The determinant is used to find the inverse matrix; if A is an invertible matrix because det(A) ≠ 0, the inverse matrix A^-1 is equal to the reciprocal of the determinant of A multiplied by the matrix obtained by writing instead of the elements their algebraic complements, after having swapped the rows with the columns, A^-1 = (1/det(A))[(A_1,1,A_2,1,...,A_n,1),...,(A_1,n,A_2,n,...,A_n,n)]; det(A^-1) = 1/det(A)

A = [(1,3),(-1,5)]; det(A) = 1⋅5-3(-1) = 5+3 = 8, 8 ≠ 0, so matrix A is invertible; the algebraic complement of a_1,1 = 1 is A_1,1 = 5; the algebraic complement of a_1,2 = 3 is A_1,2 = 1; the algebraic complement of a_2,1 = -1 is A_2,1 = -3; the algebraic complement of a_2,2 = 5 is A_2,2 = 1; A^-1 = (1/8)[(5,-3),(1,1)] = [(5/8,-3/8),(1/8,1/8)]; A⋅A^-1 = I, [(1,3),(-1,5)][(5/8,-3/8),(1/8,1/8)] = [(1,0),(0,1)], i_1,1 = 1(5/8)+3(1/8) = 5/8+3/8 = 8/8 = 1, i_1,2 = 1(-3/8)+3(1/8) = -3/8+3/8 = 0, i_2,1 = -1(5/8)+5(1/8) = -5/8+5/8 = 0, i_2,2 = -1(-3/8)+5(1/8) = 3/8+5/8 = 8/8 = 1
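The rule of algebraic complements for the inverse matrix can be sketched in Python. This is only an illustration, assuming matrices of order at least 2 with integer entries and exact fractions for the result; the function names minor, det, cofactor and inverse_by_cofactors are hypothetical.

```python
from fractions import Fraction

def minor(m, i, j):
    """Submatrix obtained by deleting row i and column j (0-based indices)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]

def det(m):
    """Determinant by Laplace expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det(minor(m, 0, j)) for j in range(len(m)))

def cofactor(m, i, j):
    """Algebraic complement of the element in row i, column j (0-based indices)."""
    return (-1) ** (i + j) * det(minor(m, i, j))

def inverse_by_cofactors(m):
    """Inverse as (1/det) times the matrix of algebraic complements with rows and columns swapped."""
    d = det(m)
    if d == 0:
        return None  # det(A) = 0: the matrix is not invertible
    n = len(m)
    # entry (i, j) of the inverse is the algebraic complement of a_j,i divided by det(A)
    return [[Fraction(cofactor(m, j, i), d) for j in range(n)] for i in range(n)]

for row in inverse_by_cofactors([[1, 3], [-1, 5]]):
    print([str(x) for x in row])
```

The call uses the matrix A = [(1,3),(-1,5)] of the example, so the printed rows can be compared with the inverse obtained by hand.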

A square matrix A of order n is orthogonal if the transpose of A is equal to the inverse of A, that is if A is an orthogonal matrix then A^T = A^-1, therefore orthogonal matrices are invertible matrices; the determinant of an orthogonal matrix A can only be 1 or -1, that is if A is an orthogonal matrix then det(A) = 1, or det(A) = -1; there are orthogonal matrices with determinant equal to 1, and there are orthogonal matrices with determinant equal to -1; an orthogonal matrix is called special when it has determinant equal to 1; a matrix is square when the number of rows n equals the number of columns m, that is when n = m; a square matrix is generically A = [(a_1,1,...,a_1,n),...,(a_n,1,...,a_n,n)]; if A is an orthogonal square matrix then A⋅A^T = I ⇔ A^T = A^-1; if A is an orthogonal square matrix then det(A) = ±1; if det(A) = 1 then the orthogonal square matrix A is special

A = [(cos(α),-sin(α)),(sin(α),cos(α))], this is a special orthogonal matrix because det(A) = cos(α)cos(α)-(-sin(α))sin(α) = cos(α)cos(α)+sin(α)sin(α) = (cos(α))²+(sin(α))² = 1
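The two conditions A⋅A^T = I and det(A) = 1 can be checked numerically for the matrix above. A minimal sketch in Python follows, assuming floating point arithmetic with a small tolerance tol; the function names rotation, transpose, matmul and is_special_orthogonal are hypothetical and the check is written for 2x2 matrices only.

```python
import math

def rotation(alpha):
    """The matrix [(cos(α),-sin(α)),(sin(α),cos(α))]."""
    return [[math.cos(alpha), -math.sin(alpha)],
            [math.sin(alpha), math.cos(alpha)]]

def transpose(m):
    """Swap rows with columns."""
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """Row by column product of two matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def is_special_orthogonal(m, tol=1e-12):
    """True when m is 2x2, m times its transpose is I and det(m) = 1, up to tol."""
    p = matmul(m, transpose(m))
    identity_ok = all(abs(p[i][j] - (1.0 if i == j else 0.0)) < tol
                      for i in range(2) for j in range(2))
    d = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return identity_ok and abs(d - 1.0) < tol

print([is_special_orthogonal(rotation(a)) for a in (0.0, 0.3, math.pi / 2, 2.5)])
print(is_special_orthogonal([[1.0, 0.0], [0.0, -1.0]]))  # orthogonal, determinant -1
```

The last matrix is orthogonal but has determinant -1, so it is not special.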

Laplace's rule: det(A) = a_i,1⋅A_i,1+a_i,2⋅A_i,2+...+a_i,n⋅A_i,n = a_1,j⋅A_1,j+a_2,j⋅A_2,j+...+a_n,j⋅A_n,j; A is invertible if and only if det(A) ≠ 0; det(A^-1) = 1/det(A); if det(A) ≠ 0, then A^-1 = (1/det(A))[(A_1,1,...,A_n,1),...,(A_1,n,...,A_n,n)]

det(A⋅B) = det(A)⋅det(B)

29 - CRAMER'S RULE

Cramer's rule is used to solve linear systems that have one and only one solution; Cramer's rule must be modified when a solvable linear system has infinite solutions that is when it has free unknowns

The determinant of a matrix with a row and a column is equal to the only element present in the matrix, this is a special case of Laplace's rule

Considering a linear system AX = B, A is the matrix of the coefficients, X is the column of unknowns, B is the column of known terms; suppose that our system is solvable, and therefore ρ(A) = ρ(A|B), that is the matrix of the coefficients and the complete matrix must have the same rank; if this system has m equations and n unknowns, there is one and only one solution when number of free unknowns = n-ρ(A) = 0, that is the matrix A and the complete matrix (A|B) must have rank equal to the number of unknowns; we are not interested in the number of equations m, but in the number of independent equations, that is the rank, that is the number of independent rows of the matrices A and (A|B); the superfluous equations, which are linear combinations of the others, must be discarded, and after this the rank of A is also equal to the number of remaining equations, ρ(A) = m; therefore ρ(A) is equal to the number of unknowns and is equal to the number of equations after having canceled the superfluous equations, so n = m, so the system is square; the first condition for using Cramer's rule is that the system must have n equations and n unknowns; if the system is solvable and ρ(A) = n, the system has one and only one solution; to find the solutions of a solvable linear system, the complete matrix (A|B) must be reduced by rows

A = [(2,1),(1,3)], B = [(0),(1)], 2x+y = 0, x+3y = 1; it is a system with 2 equations and 2 unknowns; we want to solve this system not with the method of reducing the rows, but using the determinant; in matrix form this system is written AX = B, the matrix A is a square matrix of order n and rank n, ρ(A) = n = 2; det(A) = 2⋅3-1⋅1 = 6-1 = 5 ≠ 0, it means that the matrix has maximum rank ρ(A) = 2 and is invertible so the inverse matrix A^-1 exists; A⋅X = B, A^-1⋅A⋅X = A^-1⋅B, I⋅X = A^-1⋅B, X = A^-1⋅B, this is the only solution of the linear system; we write the inverse matrix A^-1 with the rule of algebraic complements, A^-1 = (1/5)[(3,-1),(-1,2)] = [(3/5,-1/5),(-1/5,2/5)]; X = A^-1⋅B = [(3/5,-1/5),(-1/5,2/5)][(0),(1)], x_1,1 = (3/5)0+(-1/5)1 = 0-1/5 = -1/5, x_2,1 = (-1/5)0+(2/5)1 = 0+2/5 = 2/5, X = [(-1/5),(2/5)]; x_1,1 = -1/5 = x, x_2,1 = 2/5 = y, so x = -1/5 and y = 2/5; this is a way of finding the solution using the inverse matrix; A = [(2,1),(1,3)], B = [(0),(1)], we replace the first column of matrix A with column B of the constant terms obtaining [(0,1),(1,3)], and from this new matrix we get the determinant |(0,1),(1,3)| = 0⋅3-1⋅1 = 0-1 = -1, and we divide this number by the determinant of the matrix A which is 5, and we get -1/5 which is the value of the unknown x; A = [(2,1),(1,3)], B = [(0),(1)], we replace the second column of matrix A with column B of the constant terms, obtaining [(2,0),(1,1)], and from this new matrix we get the determinant |(2,0),(1,1)| = 2⋅1-0⋅1 = 2-0 = 2, and we divide this number by the determinant of the matrix A which is 5, and we get 2/5 which is the value of the unknown y; when a square linear system, with non-zero determinant, has only one solution, this unique solution can be obtained with this method

Cramer's rule: A is an invertible square matrix of order n, A⋅X = B is a linear system of n equations in n unknowns that admits one and only one solution, the solution is unique because the unknowns are n and A is a matrix of rank n, so there are no free unknowns; the only solution is equal to the product of the inverse matrix of A by the matrix of constant terms which is a column, following the formula X = A^-1⋅B = (1/det(A))[(A_1,1,...,A_n,1),...,(A_1,n,...,A_n,n)][(b₁),...,(b_n)], where the capital letters A indicate the algebraic complements of the elements of the matrix A after having swapped the rows with the columns, and matrix B is a column that contains the constant terms; the unknowns contained in the column X = [(x₁),...,(x_n)] are obtained with the formula x_i = det(α_i)/det(A), where by α_i we mean the matrix A with column i replaced by the column of constant terms, the first unknown is obtained by replacing the first column, the second unknown is obtained by replacing the second column, and so on

AX = B, A = [(1,2,-1),(0,4,2),(0,0,3)], B = [(0),(1),(0)]; this is a non-homogeneous system with 3 equations and 3 unknowns; the matrix A has 3 rows and 3 columns, it is reduced by rows, there are no null rows, therefore ρ(A) = 3 = ρ(A|B) and therefore the system is solvable, and since the unknowns are 3 and the rank is 3, 3-3 = 0, then the system has one and only one solution; the only solution X is obtained by multiplying the inverse matrix of A by the column of known terms B, X = A^-1⋅B, but this would require calculating the inverse matrix A^-1, whereas Cramer's rule only requires determinants; we calculate the determinant using row 3, det(A) = 3|(1,2),(0,4)| = 3(1⋅4-2⋅0) = 3(4-0) = 3⋅4 = 12; x = |(0,2,-1),(1,4,2),(0,0,3)|/12 = 3|(0,2),(1,4)|/12 = 3(0⋅4-2⋅1)/12 = 3(0-2)/12 = 3(-2)/12 = -6/12 = -1/2, determinant calculated using row 3, x = -1/2; y = |(1,0,-1),(0,1,2),(0,0,3)|/12 = 1|(1,2),(0,3)|/12 = 1(1⋅3-2⋅0)/12 = 1(3-0)/12 = 1(3)/12 = 3/12 = 1/4, determinant calculated using column 1, y = 1/4; z = |(1,2,0),(0,4,1),(0,0,0)|/12 = 0/12 = 0, since row 3 is null, the determinant calculated using row 3 is 0, therefore z = 0; the solution of the system is x = -1/2, y = 1/4, z = 0; this system has one and only one solution and it is the triad (-1/2,1/4,0)
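Cramer's rule can be sketched in a few lines of Python. This is a minimal illustration, assuming a square system with integer or fractional coefficients and a determinant computed by Laplace expansion along the first row; the function names minor, det and cramer are hypothetical.

```python
from fractions import Fraction

def minor(m, i, j):
    """Submatrix obtained by deleting row i and column j (0-based indices)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]

def det(m):
    """Determinant by Laplace expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det(minor(m, 0, j)) for j in range(len(m)))

def cramer(a, b):
    """Solve the square system AX = B with det(A) != 0; b is the list of constant terms."""
    d = det(a)
    if d == 0:
        raise ValueError("det(A) = 0: Cramer's rule does not apply directly")
    solution = []
    for j in range(len(a)):
        # replace column j of A with the column of constant terms
        a_j = [row[:j] + [b[i]] + row[j + 1:] for i, row in enumerate(a)]
        solution.append(Fraction(det(a_j), d))
    return solution

print([str(v) for v in cramer([[1, 2, -1], [0, 4, 2], [0, 0, 3]], [0, 1, 0])])
print([str(v) for v in cramer([[2, 1], [1, 3]], [0, 1])])
```

The two calls reuse the 3x3 and 2x2 systems solved by hand, so the output can be compared with the triad and the pair found above.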

A = [(1,2,0),(3,1,-1)], B = [(0),(2)]; this system has 2 equations and 3 unknowns, ρ(A) = 2, there are 2 rows, the system is solvable, and since the unknowns are 3, 3-2 = 1, therefore there is a free unknown; it is not a system with one and only one solution, it is a system that has more than one solution, ∞¹ solutions dependent on a free unknown, but it is possible to solve this system using Cramer's rule; it is necessary to identify the free unknown, therefore we take inside the matrix A a submatrix formed by 2 rows and 2 columns in such a way that the matrix we obtain has the maximum possible rank; the unknown x is column 1, the unknown y is column 2, the unknown z is column 3; deleting column 3 of matrix A we obtain matrix A' = [(1,2),(3,1)] which is a square matrix, det(A') = 1⋅1-2⋅3 = 1-6 = -5 ≠ 0, therefore A' is an invertible square matrix; we move the unknown z, that is column 3, from matrix A to matrix B of the constant terms, considering that the system is {x+2y = 0, 3x+y-z = 2} that is {x+2y = 0, 3x+y = z+2}, therefore the new column of constant terms is B' = [(0),(z+2)], so we can assign to z a value of the field of real numbers; the new system A'⋅[(x),(y)] = B' has 2 equations and 2 unknowns, and the constant term depends on the free unknown z, and this new system can be solved with Cramer's rule; x = |(0,2),(z+2,1)|/(-5) = (0⋅1-2(z+2))/(-5) = (0-2z-4)/(-5) = (-2z-4)/(-5) = -(2z+4)/(-5) = (2z+4)/5, x = (2z+4)/5; y = |(1,0),(3,z+2)|/(-5) = (1(z+2)-0⋅3)/(-5) = (z+2-0)/(-5) = (z+2)/(-5) = -(z+2)/5, y = -(z+2)/5; we have found x and y as a function of the free unknown z, so the system has infinite solutions; we used Cramer's rule, after having moved the free unknown z into matrix B which is the column of constant terms

If we want to use Cramer's rule, the system AX = B must be solvable that is ρ(A) = ρ(A|B), where with ρ(A) we indicate the rank of the matrix A and with ρ(A|B) we indicate the rank of the complete matrix (A|B); when a system does not have only one solution, then it has infinite solutions, and this depends on the presence of free unknowns; number of free unknowns = n-ρ(A), where n means the number of unknowns and ρ(A) is the rank of matrix A; we cancel the superfluous equations or the linearly dependent equations, and we obtain only linearly independent equations; a free unknown is identified by deleting columns and obtaining a square submatrix of maximum rank or with a determinant other than zero; and the unknowns outside the square matrix are moved to the column of constant terms; at this point we can apply Cramer's rule

{x+y+z+t = 0, x-y-z-t = 0, 2x+y = 0}, is a homogeneous system of 3 equations and 4 unknowns; the coefficient matrix is A = [(1,1,1,1),(1,-1,-1,-1),(2,1,0,0)]; to calculate the rank we have to reduce the matrix, A = [(1,1,1,1),(1,-1,-1,-1),(2,1,0,0)], R₂ → R₂ + R₁, [(1,1,1,1),(2,0,0,0),(2,1,0,0)], R₂ ↔ R₃, [(1,1,1,1),(2,1,0,0),(2,0,0,0)], therefore the rank of this matrix is 3, ρ(A) = 3; number of free unknowns = n-ρ(A), where n indicates the number of unknowns and ρ(A) the rank of the matrix, n-ρ(A) = 4-3 = 1, therefore there is a free unknown; to identify the free unknown we must look for a 3x3 submatrix with a determinant other than 0; we want to understand if the unknown t can be chosen as a free unknown, and then we take the submatrix consisting of rows 1, 2, 3 and columns 1, 2, 3, and calculate the determinant using row 3, |(1,1,1),(1,-1,-1),(2,1,0)| = 2|(1,1),(-1,-1)|-1|(1,1),(1,-1)| = 2(1(-1)-1(-1))-1(1(-1)-1⋅1) = 2(-1+1)-1(-1-1) = 2(0)-1(-2) = 0+2 = 2, the determinant is 2, so t can be chosen as a free unknown; the homogeneous system can be rewritten as a non-homogeneous system, by moving the unknown t into the column of constant terms, {x+y+z = -t, x-y-z = t, 2x+y = 0}; the unknown t is the free unknown, that is the unknown to which we can assign a value at will, and now we can solve the system using Cramer's rule, so we can find the unknowns x, y, z, as function of the free unknown t
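The procedure of moving the free unknown into the column of constant terms and then applying Cramer's rule can be sketched in Python for the two examples above. This is only an illustration under stated assumptions: the function names minor, det, cramer, solve_2x3 and solve_3x4 are hypothetical, and each call solves the square system for one chosen value of the free unknown.

```python
from fractions import Fraction

def minor(m, i, j):
    """Submatrix obtained by deleting row i and column j (0-based indices)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]

def det(m):
    """Determinant by Laplace expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det(minor(m, 0, j)) for j in range(len(m)))

def cramer(a, b):
    """Cramer's rule for a square system with non-zero determinant."""
    d = det(a)
    if d == 0:
        raise ValueError("the chosen square submatrix has determinant 0")
    return [Fraction(det([row[:j] + [b[i]] + row[j + 1:] for i, row in enumerate(a)]), d)
            for j in range(len(a))]

def solve_2x3(z):
    """x+2y = 0, 3x+y-z = 2 with z free: the constant terms become (0, z+2)."""
    return cramer([[1, 2], [3, 1]], [0, z + 2])

def solve_3x4(t):
    """x+y+z+t = 0, x-y-z-t = 0, 2x+y = 0 with t free: the constant terms become (-t, t, 0)."""
    return cramer([[1, 1, 1], [1, -1, -1], [2, 1, 0]], [-t, t, 0])

x, y = solve_2x3(3)
print(x, y, x + 2 * y, 3 * x + y - 3)        # last two values: left sides of the equations

x, y, z = solve_3x4(4)
print(x, y, z, x + y + z + 4, x - y - z - 4, 2 * x + y)
```

Substituting the results back into the original equations is a simple way to confirm that each choice of the free unknown gives a solution.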

30 - COMPLEX NUMBERS - PART 1

ax²+bx+c = 0 is a quadratic equation, therefore non-linear, with a, b, c, 3 real numbers and a ≠ 0; the solutions of this equation can be obtained with the quadratic formula x = (-b±√(b²-4ac))/(2a); it is possible that a second degree equation with real coefficients has no real solutions because if b²-4ac is a negative number, the square root of a negative number cannot be computed in the field of real numbers; 3x²+10 = 0, b²-4ac = 0-4⋅3⋅10 = 0-120 = -120, the square root of a negative number cannot be calculated in the field of real numbers, so there are no solutions in the field of real numbers; x²+1 = 0, x² = -1, no real number squared can result in -1; no real number squared can result in a negative number; in the field of real numbers not all second degree equations can be solved; x⁴+20 = 0, there is no solution in the field of real numbers, ⁴√(-20) has no meaning in the field of real numbers; in the field of real numbers not all equations of degree higher than the first can be solved; complex numbers have been introduced to obtain solutions that do not exist in the field of real numbers; in the field of real numbers √(-1) does not make sense
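The role of the quantity b²-4ac can be shown with a short program that stays in the field of real numbers. A minimal sketch in Python follows; the function names discriminant and real_roots are hypothetical, and the equation x²-3x+2 = 0 is an extra example added only to show a case with real solutions.

```python
import math

def discriminant(a, b, c):
    """The quantity b²-4ac of the equation ax²+bx+c = 0."""
    return b * b - 4 * a * c

def real_roots(a, b, c):
    """Real solutions of ax²+bx+c = 0 with a != 0; an empty list when b²-4ac < 0."""
    d = discriminant(a, b, c)
    if d < 0:
        return []  # the square root of a negative number is not a real number
    r = math.sqrt(d)
    return sorted({(-b - r) / (2 * a), (-b + r) / (2 * a)})

print(discriminant(3, 0, 10), real_roots(3, 0, 10))  # 3x²+10 = 0
print(discriminant(1, 0, 1), real_roots(1, 0, 1))    # x²+1 = 0
print(real_roots(1, -3, 2))                          # x²-3x+2 = 0
```

An empty list corresponds to an equation that cannot be solved in the field of real numbers.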

√(-1) does not exist in the field of real numbers, so we introduce the symbol i such that i² = -1

Complex numbers are formal expressions of the type a+bi or a+ib, where a and b are real numbers and i is a symbol such that i² = -1; a is the real part of the complex number, b is the imaginary coefficient, bi is the imaginary part of the complex number; a+0i is a real number; 0+bi is a pure imaginary number; a+bi is a complex number; a-bi is the complex conjugate of a+bi, z = a+bi, z̄ = a-bi, z usually indicates a complex number, z̄ usually indicates its complex conjugate

3+2i, 3 is the real part, 2i is the imaginary part, 2 is the imaginary coefficient

Complex numbers a+bi, with a and b real numbers and i² = -1, can be added together

(a+bi)+(a'+b'i) = (a+a')+i(b+b'), sum of 2 complex numbers; (3+2i)+(5-7i) = 8-5i

(a+bi)+(a'+b'i) = (a'+b'i)+(a+bi), commutative property of the sum of complex numbers

((a+bi)+(a'+b'i))+(a''+b''i) = (a+bi)+((a'+b'i)+(a''+b''i)), associative property of the sum of complex numbers

Existence of the zero, 0+0i; (a+bi)+(0+0i) = a+bi, zero is the number that, added to any other, leaves it unchanged

Existence of the opposite, (-a)+(-b)i = -a-bi; (a+bi)+(-a-bi) = 0, a number added to its opposite gives 0 as result

Complex numbers a+bi, with a and b real numbers and i² = -1, can be multiplied with each other

(a+bi)(a'+b'i) = aa'+ab'i+a'bi+bb'ii = aa'+ab'i+a'bi-bb' = (aa'-bb')+i(ab'+a'b), product of 2 complex numbers

(a+bi)(a'+b'i) = (a'+b'i)(a+bi), commutative property of the product of complex numbers

((a+bi)(a'+b'i))(a''+b''i) = (a+bi)((a'+b'i)(a''+b''i)), associative property of the product of complex numbers

Existence of the neutral element, 1+0i; (1+0i)(a+bi) = a+bi, the complex number 1+0i is a neutral element because multiplied by any other leaves it unchanged

The difference between complex numbers and real numbers is given by the number i; in the field of real numbers there is no number that multiplied by itself gives -1; the complex number i, also called imaginary unit, multiplied by itself gives as a result -1

(0+1i)(0+1i) = 0⋅0+0⋅1i+1i⋅0+1i⋅1i = i² = -1

z = a+bi, z̄ = a-bi, zz̄ = (a+bi)(a-bi) = aa-abi+abi-bbii = a²-b²i² = a²-b²(-1) = a²+b², the product of a complex number by its conjugate is the real number a²+b², which is the square of the modulus of z, |z| = |z̄| = √(a²+b²)

If z = 0 then a = 0 and b = 0, so |z| = |z̄| = 0; if z ≠ 0, then a ≠ 0 or b ≠ 0, so |z| = |z̄| ≠ 0

r ∈ ℝ, r ≠ 0, ∃ r^-1 = 1/r, called inverse or reciprocal of r, such that r(1/r) = 1; z ∈ ℂ, z = a+bi, z ≠ 0, 1/z = 1/(a+bi) = (a-bi)/((a+bi)(a-bi)) = (a-bi)/(a²+b²); z = a+bi ≠ 0, 1/z = z^-1 = (a/(a²+b²))-(b/(a²+b²))i = (a/|z|²)-(b/|z|²)i = z̄/|z|²; check: (a+bi)((a/(a²+b²))-(b/(a²+b²))i) = (a²/(a²+b²))-(ab/(a²+b²))i+(ab/(a²+b²))i-(b²/(a²+b²))i² = (a²/(a²+b²))+(b²/(a²+b²)) = (a²+b²)/(a²+b²) = 1

Calculate the inverse of the complex number z = 1+2i; z = 1+2i ≠ 0, a = 1, b = 2, |z| = √(a²+b²) = √(1²+2²) = √(1+4) = √(5); 1/z = z^-1 = (a/(a²+b²))-(b/(a²+b²))i = (a/|z|²)-(b/|z|²)i = (1/(√(5))²)-(2/(√(5))²)i = (1/5)-(2/5)i = (1-2i)/5; 1/z = z̄/|z|²

i² = -1; i³ = i²i = -1i = -i; i⁴ = i²i² = (-1)(-1) = 1

Calculate 1/i; z = i, a = 0, b = 1; 1/z = z̄/|z|²; the conjugate of i = 0+1i is 0-1i = -i; |i|² = a²+b² = 0²+1² = 1; 1/i = -i/|i|² = -i/1 = -i; i(-i) = -i² = 1, so the reciprocal of i is -i

To calculate the quotient of two complex numbers, we multiply the dividend by the inverse of the divisor; (2+i)/i = (2+i)(1/i) = (2+i)(-i) = -2i+i(-i) = -2i-i² = -2i-(-1) = -2i+1 = 1-2i
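
The inverse and the quotient can be sketched in Python with the built-in complex type, where i is written 1j; the function name inverse is a hypothetical helper that applies the formula 1/z = (a/(a²+b²))-(b/(a²+b²))i.

```python
def inverse(z):
    """Inverse of a non-zero complex number, computed as conjugate / |z|^2."""
    a, b = z.real, z.imag
    norm_sq = a * a + b * b  # |z|^2 = a^2 + b^2
    if norm_sq == 0:
        raise ZeroDivisionError("0 has no inverse")
    return complex(a / norm_sq, -b / norm_sq)

print(inverse(1 + 2j))         # inverse of 1+2i
print(inverse(1j))             # inverse of i
print((2 + 1j) * inverse(1j))  # the quotient (2+i)/i
```

The three printed values correspond to (1/5)-(2/5)i, -i and 1-2i, up to floating point rounding.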

In the field of complex numbers we can solve, x²+1 = 0, x² = -1, x = ±√(-1), x₁ = i, x₂ = -i

ax²+bx+c = 0, x = (-b±√(b²-4ac))/(2a), ∆ = b²-4ac; a second degree equation with real coefficients has no solutions in the field of real numbers when the discriminant ∆ = b²-4ac < 0, for example if ∆ = -20, ±√(∆) = ±√(-20) = ±√(-1)√(20) = ±i√(20), therefore when a second degree equation has a negative discriminant, the solutions can be found in the field of complex numbers

ax²+bx+c = 0 is a second degree equation with real coefficients, but we can also solve second degree equations with complex coefficients such as x²+2ix+5 = 0; x = (-b±√(b²-4ac))/(2a) = (-2i±√((2i)²-4⋅1⋅5))/(2⋅1) = (-2i±√(-4-20))/2 = (-2i±√(-24))/2 = (-2i±√(24)i)/2 = -i±(√(24)i)/2 = i(-1±(√(24))/2) = i(-1±√(6)), so x₁ = (-1+√(6))i, x₂ = (-1-√(6))i
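
The quadratic formula with complex coefficients can be sketched in Python with the standard cmath module, whose sqrt also accepts negative and complex arguments; the function name solve_quadratic is a hypothetical helper.

```python
import cmath

def solve_quadratic(a, b, c):
    """Both roots of a*x^2 + b*x + c = 0 for complex a, b, c with a != 0."""
    if a == 0:
        raise ValueError("not a second degree equation")
    root_of_delta = cmath.sqrt(b * b - 4 * a * c)  # one square root of the discriminant
    return (-b + root_of_delta) / (2 * a), (-b - root_of_delta) / (2 * a)

# The example x^2 + 2ix + 5 = 0, with i written as 1j.
x1, x2 = solve_quadratic(1, 2j, 5)
print(x1, x2)
```

Substituting each printed root back into x²+2ix+5 gives a value equal to 0 up to floating point rounding.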

In the field of real numbers ℝ it is possible to define vector spaces, but also in the field of complex numbers ℂ it is possible to define vector spaces

ℂ² is the set of pairs of complex numbers (z₁,z₂), (z₁,z₂)+(z₁',z₂') = (z₁+z₁',z₂+z₂'); α(z₁,z₂) = (αz₁,αz₂); vector spaces in the field of complex numbers have the same properties as vector spaces in the field of real numbers

V is a vector space over the field of complex numbers ℂ if the sum of two elements of V and the product of an element of V by a complex number are defined and satisfy the usual properties; ℝ, ℝ², ℝ³,...,ℝⁿ, are examples of real vector spaces; ℂ, ℂ², ℂ³,...,ℂⁿ, are examples of complex vector spaces; ℂⁿ is the set of the tuples of n complex numbers; in complex vector spaces the generators, the bases, the linear independence and the dimension can be defined exactly as in the field of real numbers; a base of ℂ² is formed by the 2 pairs (1,0), (0,1), that is each pair of complex numbers (z,z') can be written as the linear combination z(1,0)+z'(0,1); matrices with complex elements can be defined

We want to know if the vectors (i,0,1), (0,i,0), (0,0,2), which are 3 triples, elements of ℂ³, are linearly independent in ℂ³; we have to reduce the matrix by rows A = [(i,0,1),(0,i,0),(0,0,2)]; the matrix is already reduced by rows, therefore ρ(A) = 3, so these 3 vectors of ℂ³ are linearly independent; ℂ³ has dimension 3, then these 3 linearly independent vectors are a base of ℂ³
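
As an alternative check, not the row reduction used above, 3 vectors of ℂ³ are linearly independent exactly when the determinant of the matrix that has them as rows is different from 0; the sketch below uses a hypothetical helper det3 and Python's built-in complex numbers, where i is written 1j.

```python
def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

# The three vectors of C^3 as rows of a matrix.
vectors = [(1j, 0, 1), (0, 1j, 0), (0, 0, 2)]
d = det3(vectors)
print(d, d != 0)
```

The matrix is triangular, so the determinant is the product of the diagonal elements, i⋅i⋅2 = -2, which is different from 0.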

Complex vector spaces have a double nature of complex vector space and real vector space because, if we multiply a real number by the complex numbers of a tuple, we obtain complex numbers; real vector spaces do not have the double nature of complex vector space and real vector space because, if we multiply a complex number by the real numbers of a tuple, we do not obtain real numbers; α(z,z') = (αz,αz'), αz and αz' are complex numbers; i(x,y) = (ix,iy), ix and iy are not real numbers

If we multiply a real number by a tuple of complex numbers we get a tuple of complex numbers; 3(i,2i) = (3i,6i); ℂ² also has the nature of a real vector space; every complex vector space also has the nature of a real vector space

ℂ² has dimension 2 over ℂ, but has dimension 4 over ℝ; in ℂ² we have 4 vectors which are (1,0), (0,1), (i,0), (0,i), and these 4 pairs of ℂ² form a base of ℂ² on ℝ that is each pair (a+bi,a'+b'i) can be written as a linear combination of the 4 vectors (1,0), (0,1), (i,0), (0,i), therefore (a+bi,a'+b'i) = a(1,0)+a'(0,1)+b(i,0)+b'(0,i); therefore ℂ² has dimension 4 on the field of real numbers ℝ

If the vector space V has dimension n on ℂ, then V has dimension 2n on ℝ

{ix+y = 0, x+2iy = 1} is a linear system with complex coefficients and the associated matrix is [(i,1)|(0),(1,2i)|(1)]; to solve this system we must first reduce it by rows
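
The solution of this system can be sketched in Python; note that the sketch applies Cramer's rule for a 2x2 system instead of the reduction by rows mentioned above, and that the function name solve2 is a hypothetical helper.

```python
def solve2(a11, a12, a21, a22, b1, b2):
    """Solve {a11*x + a12*y = b1, a21*x + a22*y = b2} with Cramer's rule."""
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("Cramer's rule needs a non-zero determinant")
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - b1 * a21) / det
    return x, y

# {ix + y = 0, x + 2iy = 1}, with i written as 1j.
x, y = solve2(1j, 1, 1, 2j, 0, 1)
print(x, y)
```

The determinant of the coefficient matrix is i⋅2i-1⋅1 = -3, and the solution is x = 1/3, y = -(1/3)i; in fact i(1/3)-(1/3)i = 0 and (1/3)+2i(-(1/3)i) = (1/3)+(2/3) = 1.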

31 - COMPLEX NUMBERS - PART 2

The complex plane, or Argand-Gauss plane, is the plane formed by the complex numbers, with a Cartesian coordinate system such that the x-axis, called real axis, is formed by the real numbers, and the y-axis, called imaginary axis, is formed by the pure imaginary numbers

z = x+yi, x is the real part and y is the coefficient of the imaginary part; in the Argand-Gauss plane the complex number z is represented by a point which has the real part as x coordinate and the coefficient of the imaginary part as y coordinate

In the Argand-Gauss plane we can trace a vector that starts from the origin o and arrives at the point z, called the vector oz; the vector oz has a length that is the modulus, which we denote by ρ; the modulus ρ is always a positive number, but it can be 0 if z coincides with the origin that is when z = 0+0i; with θ we denote the angle that the vector oz forms with the x-axis; the modulus ρ can vary from 0 to +∞, the angle θ can vary from 0 to 2π; x = ρ·cos(θ), y = ρ·sin(θ); z = x+yi = ρ·cos(θ)+ρ·sin(θ)·i = ρ(cos(θ)+i·sin(θ)), this is the trigonometric form of a complex number; ρ is the modulus or the length of the vector oz, and θ is the argument or the angle formed by the vector oz with the x-axis; for the same value of modulus ρ, different angles θ can lead to the same coordinates of points z, when θ' = θ±2π, in fact the same value of the angle θ is repeated with a periodicity of 2π; x = ρ·cos(θ), x² = (ρ·cos(θ))² = ρ²(cos(θ))², y = ρ·sin(θ) , y² = (ρ·sin(θ))² = ρ²(sin(θ))², x²+y² = ρ²(cos(θ))²+ρ²(sin(θ))² = ρ²((cos(θ))²+(sin(θ))²) = ρ², ρ² = x²+y², ρ = √(x²+y²) = |z|; y/x = (ρ·sin(θ))/(ρ·cos(θ)) = sin(θ)/cos(θ) = tan(θ); the angle θ is the arc whose tangent is equal to y/x; cos(θ) = x/ρ, sin(θ) = y/ρ

x = ρ·cos(θ); y = ρ·sin(θ); x+iy = ρ(cos(θ)+i·sin(θ)); ρ = √(x²+y²) = |z|; cos(θ) = x/ρ = x/√(x²+y²) = x/|z|; sin(θ) = y/ρ = y/√(x²+y²) = y/|z|
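
The passage between the form x+iy and the trigonometric form can be sketched in Python; the function names are hypothetical, and the sketch uses math.atan2 because tan(θ) = y/x alone does not distinguish between opposite quadrants, while atan2 looks at the signs of x and y separately.

```python
import math

def to_trigonometric(x, y):
    """Return (rho, theta) for z = x + iy, with theta reduced to [0, 2*pi)."""
    rho = math.hypot(x, y)                    # sqrt(x^2 + y^2)
    theta = math.atan2(y, x) % (2 * math.pi)  # atan2 returns an angle in [-pi, pi]
    return rho, theta

def from_trigonometric(rho, theta):
    """Return (x, y) with x = rho*cos(theta), y = rho*sin(theta)."""
    return rho * math.cos(theta), rho * math.sin(theta)

print(to_trigonometric(1, 1))   # modulus and argument of 1+i
print(to_trigonometric(1, -1))  # modulus and argument of 1-i
print(from_trigonometric(*to_trigonometric(0, 1)))
```

For z = 0 the argument is not determined; math.atan2(0, 0) returns 0, so the sketch reports θ = 0 in that case.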

z = x+iy, x = 1, y = 0, z = 1+i·0 = 1, the complex number z coincides with the real number 1, ρ = 1, θ = 0; positive real numbers have θ = 0, negative real numbers have θ = π, pure imaginary numbers with positive imaginary coefficient have θ = π/2, pure imaginary numbers with negative imaginary coefficients have θ = (3/2)π

z = 1+i, x = 1, y = 1, θ = π/4, ρ = √(x²+y²) = √(1²+1²) = √(1+1) = √(2), z = x+iy = ρ(cos(θ)+i·sin(θ)), z = 1+i = √(2)(cos(π/4)+i·sin(π/4)), z = √(2)(cos(π/4)+i·sin(π/4)) is the trigonometric form of z = 1+i, because z = √(2)(cos(π/4)+i·sin(π/4)) = √(2)(√(2)/2+i√(2)/2) = √(2)(√(2)/2)(1+i) = 2/2(1+i) = 1(1+i) = 1+i

z = 0+1i = i, x = 0, y = 1, θ = π/2, ρ = √(x²+y²) = √(0²+1²) = √(0+1) = √(1) = 1, z = x+iy = ρ(cos(θ)+i·sin(θ)), z = 0+i = 1(cos(π/2)+i·sin(π/2)), in fact z = 1(cos(π/2)+i·sin(π/2)) = 1(0+i·1) = 1(i) = i

The trigonometric form of complex numbers is useful for simplifying the multiplication operation between complex numbers; z₁ = 1+i, z₂ = 1-i; z₂ is the symmetrical of z₁ with respect to the x-axis; z₁ = 1+i = √(2)(cos(π/4)+i·sin(π/4)), z₂ = 1-i = √(2)(cos(7π/4)+i·sin(7π/4)); z₁·z₂ = 2(cos(π/4)·cos(7π/4)-sin(π/4)·sin(7π/4)+i(cos(π/4)·sin(7π/4)+sin(π/4)·cos(7π/4))) = 2(cos(2π)+i·sin(2π)) = 2(1+i·0) = 2(1+0) = 2(1) = 2, and this is confirmed by z₁·z₂ = (1+i)(1-i) = 1·1+1(-i)+i·1+i(-i) = 1-i+i-i² = 1-(-1) = 1+1 = 2

sin(α+β) = sin(α)·cos(β)+cos(α)·sin(β)

sin(α-β) = sin(α)·cos(β)-cos(α)·sin(β)

cos(α+β) = cos(α)·cos(β)-sin(α)·sin(β)

cos(α-β) = cos(α)·cos(β)+sin(α)·sin(β)

If we have 2 complex numbers, z₁ of modulus ρ₁ and argument θ₁, and z₂ of modulus ρ₂ and argument θ₂, then the product of complex numbers z₁·z₂ has as modulus the product of the moduli ρ₁·ρ₂ and as argument the sum of the arguments θ₁+θ₂; in particular, if z has modulus ρ and argument θ, then z² has modulus ρ² and argument 2θ, z³ has modulus ρ³ and argument 3θ, and in general zⁿ has modulus ρⁿ and argument nθ
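
The rule for the product can be sketched in Python and compared with the ordinary product; the function name multiply_trigonometric is hypothetical, and cmath.rect(ρ, θ) is the standard library function that returns ρ(cos(θ)+i·sin(θ)).

```python
import cmath
import math

def multiply_trigonometric(rho1, theta1, rho2, theta2):
    """Product in trigonometric form: multiply the moduli, add the arguments."""
    return rho1 * rho2, (theta1 + theta2) % (2 * math.pi)

# z1 = 1+i and z2 = 1-i in trigonometric form.
rho, theta = multiply_trigonometric(
    math.sqrt(2), math.pi / 4, math.sqrt(2), 7 * math.pi / 4
)
product = cmath.rect(rho, theta)  # back to the form x + iy
print(product, (1 + 1j) * (1 - 1j))
```

Both printed values are equal to 2 up to floating point rounding.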

z is a complex number, n > 0, w is a complex number, ⁿ√(z) = w, wⁿ = z, the complex number z has modulus ρ and argument θ, the complex number w has modulus σ and argument φ, σⁿ = ρ, nφ = θ+2kπ, σ = ⁿ√(ρ), φ = (θ+2kπ)/n, to find the nth roots we have to vary k from 0 to n-1; to find the nth root of a complex number it is better to use the trigonometric form

Nth root of a complex number z: wⁿ = z, σ = ⁿ√(ρ), φ = (θ+2kπ)/n, ⁿ√(ρ) is the nth arithmetic root of ρ, and k is an integer between 0 and n-1

For n = 2, there are 2 nth roots of 1, both real, 1 and -1, so the square roots of the real number 1 are the real numbers 1 and -1; ²√(1) = ±1, 1² = 1, (-1)² = 1

For n = 3, there are 3 nth roots of 1, 1 root is real and 2 roots are complex; the real root is 1 because in the field of real numbers ³√(1) = 1, in fact 1³ = 1; z = a+bi, 1 = 1+0i, a = 1, b = 0, ρ = √(a²+b²) = √(1²+0²) = √(1+0) = √(1) = 1, θ = 0, 1 = 1(cos(0)+i·sin(0)); ρ' = ³√(ρ) = ³√(1) = 1; θ' = (θ+2kπ)/n with n = 3 and k from 0 to n-1, θ₁' = (0+2·0π)/3 = 0, θ₂' = (0+2·1π)/3 = (2/3)π, θ₃' = (0+2·2π)/3 = (4/3)π; the first cube root of the real number 1 is 1(cos(0)+i·sin(0)) = 1(1+i0) = 1(1) = 1, that we had already found as it is the real root; the second cube root of the real number 1 is 1(cos(2π/3)+i·sin(2π/3)) = 1(-1/2+i(√(3)/2)) = -1/2+i(√(3)/2); the third cube root of the real number 1 is 1(cos(4π/3)+i·sin(4π/3)) = 1(-1/2-i(√(3)/2)) = -1/2-i(√(3)/2)

Find the square roots of i; there are 2 square roots of i, which are 2 complex numbers that we can find using the trigonometric form; z = a+bi, z = i, a = 0, b = 1, i = 0+1i, ρ = 1, θ = π/2, z = ρ(cos(θ)+i·sin(θ)), i = 1(cos(π/2)+i·sin(π/2)) = cos(π/2)+i·sin(π/2); ρ' = √(ρ) = √(1) = 1; θ' = (θ+2kπ)/n with n = 2 and k from 0 to n-1, θ₁' = ((π/2)+2·0π)/2 = (π/2)/2 = π/4, θ₂' = ((π/2)+2·1π)/2 = ((π/2)+2π)/2 = ((5/2)π)/2 = (5/4)π; the first square root of the complex number i is 1(cos(π/4)+i·sin(π/4)) = 1(√(2)/2+i√(2)/2) = √(2)/2+i√(2)/2 = (√(2)/2)(1+i); the second square root of the complex number i is 1(cos(5π/4)+i·sin(5π/4)) = 1(-√(2)/2-i√(2)/2) = -√(2)/2-i√(2)/2 = -(√(2)/2)(1+i)
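
The formula for the nth roots, σ = ⁿ√(ρ) and φ = (θ+2kπ)/n with k from 0 to n-1, can be sketched in Python; the function name nth_roots is hypothetical, and cmath.polar returns the argument in the interval from -π to π instead of 0 to 2π, which gives the same set of roots.

```python
import cmath
import math

def nth_roots(z, n):
    """All n-th roots of the complex number z, using the trigonometric form."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    if z == 0:
        return [0j]  # the only root of 0 is 0
    rho, theta = cmath.polar(z)  # modulus and argument of z
    sigma = rho ** (1.0 / n)     # arithmetic n-th root of the modulus
    return [cmath.rect(sigma, (theta + 2 * k * math.pi) / n) for k in range(n)]

print(nth_roots(1, 3))   # the 3 cube roots of 1
print(nth_roots(1j, 2))  # the 2 square roots of i
```

Raising each printed root to the power n gives back z up to floating point rounding.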

The nth roots of a complex number z are n distinct numbers if z ≠ 0, if z = 0 the only root is 0; ⁿ√z has n distinct roots if z ≠ 0, has only root 0 if z = 0; the modulus of z is ρ, |z| = ρ, then the modulus of all nth roots is the nth root of ρ, ρ' = ⁿ√ρ; in the Argand-Gauss plane the roots lie on the circle centered at the origin with radius equal to the nth root of ρ, and the nth roots are the vertices of a regular polygon with n sides inscribed in the circle with radius ⁿ√ρ; if n = 2 it is a degenerate polygon, because the roots are two diametrically opposite points, symmetric with respect to the origin, if n = 3 it is a triangle, if n = 4 it is a square, and so on

All complex numbers other than 0 have n distinct nth roots, that is the equation xⁿ = z has n solutions in the field of complex numbers; more generally, we consider the equation a₀xⁿ+a₁x^(n-1)+...+a_n = 0, where a₀,...,a_n are complex numbers and a₀ ≠ 0; by the fundamental theorem of algebra an equation of this type always has n roots; to be precise, the fundamental theorem of algebra says that an equation of this type has at least 1 root, but if there is 1 root then it can be shown that there are n roots, counted with their multiplicity; the equation (x-i)³ = 0 apparently has only one root, x = i, but the roots must be counted with their multiplicity; if the number z is a root of the polynomial P(x), then P(x) is divisible by (x-z), P(x) = (x-z)⋅Q(x), and it is possible that Q(x) is also divisible by (x-z), P(x) = (x-z)²⋅Q₁(x), and so on, until we reach P(x) = (x-z)^m⋅R(x) with R(z) ≠ 0, and the number m is the multiplicity of the root z; the equation (x-i)³ = 0 has the root x = i with multiplicity 3; therefore if we count the roots with their multiplicity, the number of the roots is equal to n

Fundamental theorem of algebra: let p(x) ∈ ℂ[x] be a polynomial of degree n > 0, then p(x) is a product of complex polynomials of first degree, in particular p(x) always has at least one root in ℂ; the fundamental theorem of algebra concerns complex polynomials, if p(x) is a polynomial of degree n positive with complex coefficients, then p(x) is a product of complex polynomials of degree 1, and consequently p(x) has always at least one complex root; the fundamental theorem of algebra is true in the field of complex numbers, it is not true in the field of real numbers because a real polynomial may not be a product of first degree polynomials and may have no root; it is important to remember that the roots must be considered with their multiplicity, for example the polynomial (x-i)³ has only the root x = i, but this root is triple that is multiplicity equal to 3
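
The multiplicity of a root can be sketched in Python by dividing repeatedly by (x-z) until the remainder is different from 0; the function names are hypothetical, the division uses Horner's scheme, and the tolerance tol is an assumption needed because the arithmetic is in floating point.

```python
def divide_by_linear(coeffs, z):
    """Divide a polynomial by (x - z) with Horner's scheme.

    coeffs lists the coefficients from the highest degree to the constant term.
    Returns (coefficients of the quotient, remainder).
    """
    values = []
    carry = 0
    for c in coeffs:
        carry = carry * z + c
        values.append(carry)
    return values[:-1], values[-1]

def multiplicity(coeffs, z, tol=1e-9):
    """Number of times (x - z) divides the polynomial (0 if z is not a root)."""
    m = 0
    while len(coeffs) > 1:
        quotient, remainder = divide_by_linear(coeffs, z)
        if abs(remainder) > tol:
            break
        coeffs = quotient
        m += 1
    return m

# (x-i)^3 = x^3 - 3i x^2 - 3x + i, with i written as 1j.
p = [1, -3j, -3, 1j]
print(multiplicity(p, 1j))
```

For the polynomial (x-i)³ the three successive divisions by (x-i) all have remainder 0, so the sketch reports multiplicity 3 for the root i.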

A consequence of the fundamental theorem of algebra is the identity principle that is if 2 complex or real polynomials assume the same values for each value of the variable x, then they have the same coefficients

32 - EIGENVALUES AND EIGENVECTORS OF AN ENDOMORPHISM

An endomorphism is a linear application of a vector space ℝⁿ in itself, ℝⁿ → ℝⁿ

We consider a linear application of ℝ² in itself, f: ℝ² → ℝ², that is an endomorphism of ℝ², defined as f(x,y) = (x+y,-y); this linear application is associated with the matrix A = [(1,1),(0,-1)]; we look for a vector v = (x,y) ≠ (0,0) such that f(x,y) = λ(x,y), so we look for a pair (x,y) different from (0,0) which transformed by f gives the same pair multiplied by some real number λ; {x+y = λx, -y = λy}, {x+y-λx = 0, -y-λy = 0}, {(1-λ)x+y = 0, (-1-λ)y = 0}, this is a homogeneous system of 2 equations in 2 unknowns, therefore it admits solutions other than the solution (0,0) only if there is a free unknown; A_λ = [(1-λ,1),(0,-1-λ)], this matrix must have a rank lower than the maximum, λI = [(λ,0),(0,λ)], A_λ = A-λI, a square matrix has rank less than the maximum when its determinant is equal to 0, |A-λI| = 0, (1-λ)(-1-λ) = 0, this equation reveals for which real numbers λ it is possible to find a pair (x,y) different from (0,0) which through f is transformed into its own multiple according to λ, 1-λ₁ = 0, λ₁ = 1, -1-λ₂ = 0, λ₂ = -1; to find the pairs (x,y) that satisfy this condition we must use the system {(1-λ)x+y = 0, (-1-λ)y = 0}, where instead of λ we have to substitute λ₁ and λ₂; λ₁ = 1, {(1-λ)x+y = 0, (-1-λ)y = 0}, {(1-1)x+y = 0, (-1-1)y = 0}, {0x+y = 0, -2y = 0}, {y = 0, y = 0}; λ₂ = -1, {(1-λ)x+y = 0, (-1-λ)y = 0}, {(1-(-1))x+y = 0, (-1-(-1))y = 0}, {(1+1)x+y = 0, (-1+1)y = 0}, {2x+y = 0, 0y = 0}, {y = -2x, 0 = 0}; if λ₁ = 1, the solutions are all pairs with y = 0, all pairs (x,0); if λ₂ = -1, the solutions are all pairs with y = -2x, all pairs (x,-2x); the real numbers λ₁ = 1 and λ₂ = -1 are the eigenvalues of the endomorphism f and are the solutions of the equation |A-λI| = 0 or det(A-λI) = 0; the eigenvectors, that are the pairs (x,y) ≠ (0,0) such that f(x,y) = λ(x,y), are the non-zero solutions of the system {(1-λ)x+y = 0, (-1-λ)y = 0} where instead of λ we put λ₁ and λ₂

f: ℝ² → ℝ², A = [(1,1),(-1,1)], is a square matrix of an endomorphism of ℝ², f(x,y) = (x+y,-x+y); we look for the numbers λ such that f(x,y) = λ(x,y), A-λI = [(1-λ,1),(-1,1-λ)], |A-λI| = det(A-λI) = (1-λ)(1-λ)-(1)(-1) = (1-λ)²+1 = 0, this is the equation that gives the eigenvalues of the endomorphism f; if λ is a real number the left-hand side cannot be equal to zero, because (1-λ)²+1 ≥ 1 > 0; so we must consider f: ℂ² → ℂ², because in the field of complex numbers the equation (1-λ)²+1 = 0 has roots, (1-λ)² = -1, 1-λ = ±√(-1), 1-λ = ±i, λ = 1±i; therefore in the complex field there are 2 eigenvalues, λ₁ = 1+i and λ₂ = 1-i, and it is possible to find the corresponding eigenvectors, which f transforms into their own multiples according to λ₁ and λ₂; this example shows that sometimes the eigenvalues are not real numbers, but complex numbers
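
The two examples can be checked with a Python sketch; it relies on the fact that for a matrix [(p,q),(r,s)] of order 2 the equation det(A-λI) = (p-λ)(s-λ)-qr = 0 can be written as λ²-(p+s)λ+(ps-qr) = 0, and the function name eigenvalues_2x2 is a hypothetical helper.

```python
import cmath

def eigenvalues_2x2(a):
    """Roots of det(A - lambda*I) = 0 for a 2x2 matrix, possibly complex."""
    (p, q), (r, s) = a
    trace = p + s
    det = p * s - q * r
    # lambda^2 - trace*lambda + det = 0, solved with the quadratic formula.
    root = cmath.sqrt(trace * trace - 4 * det)
    return (trace + root) / 2, (trace - root) / 2

print(eigenvalues_2x2([[1, 1], [0, -1]]))  # real eigenvalues
print(eigenvalues_2x2([[1, 1], [-1, 1]]))  # complex eigenvalues
```

The first matrix gives the real eigenvalues 1 and -1, the second gives the complex eigenvalues 1+i and 1-i; the results are returned as Python complex numbers in both cases.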

f: ℝⁿ → ℝⁿ, n ≥ 1, is a linear application of ℝⁿ in itself, that is an endomorphism, and λ is an eigenvalue of f if there is a non-zero vector v such that f(v) = λv; any non-zero v such that f(v) = λv is called an eigenvector of f relative to λ

With λ we indicate an eigenvalue, with v we indicate the eigenvectors such that f(v) = λv, with v₀ we indicate the null vector, with V_λ we indicate the eigenspace that is a subset of ℝⁿ consisting of the eigenvectors and the null vector, V_λ ⊆ ℝⁿ

We have to find the eigenvalues and consequently the eigenvectors of an endomorphism; f: ℝⁿ → ℝⁿ, f is an endomorphism of ℝⁿ, we want to find v such that f(v) = λv; v is a tuple, v = (x₁,...,x_n), which we write as a column X; A is the matrix associated with our linear application, therefore A is a square matrix of order n; to find the vector of ℝⁿ that is the image of the column vector X, we make the product AX, then AX = λX = λIX, therefore AX-(λI)X = 0, (A-λI)X = 0, and we want this linear system in the unknowns X to have non-zero solutions; a homogeneous linear system has non-zero solutions when the matrix A-λI has rank < n, that is when det(A-λI) = 0

A-λI = [(a_{1,1}-λ,...,a_{1,n}),...,(a_{n,1},...,a_{n,n}-λ)]; det(A-λI) = 0; we develop the determinant of the square matrix of order n with Laplace's rule and we obtain a polynomial in the unknown λ, P(λ) = (-1)ⁿλⁿ+..., and it is a polynomial in λ of degree n, and this is the characteristic polynomial of the matrix A, or of the endomorphism f; to find the eigenvalues we look for the roots of the characteristic polynomial, P(λ) = 0; the characteristic polynomial, if we are dealing with an endomorphism of ℝⁿ, is a polynomial of degree n with real coefficients; in the field of real numbers, the characteristic polynomial could also have no roots or have fewer than n, for example P(λ) = λ²+1 has no real roots; in the field of complex numbers, the characteristic polynomial has n roots and each of these must be counted with its multiplicity, for example in the polynomial P(λ) = (λ-λ₁)³(λ-λ₂)⁴ there are multiple roots

If we want to find the eigenvalues of a linear application of ℝⁿ in ℝⁿ, we must look for the characteristic polynomial obtained from det(A-λI) = 0, and this is an equation of degree n in λ

A = [(0,0,1),(0,0,0),(0,0,3)], this matrix corresponds to an endomorphism of ℝ³, f(x,y,z) = (z,0,3z), and we must look for the eigenvalues of this matrix, that is the eigenvalues of the endomorphism of ℝ³; A-λI = [(-λ,0,1),(0,-λ,0),(0,0,3-λ)] and the characteristic polynomial is det(A-λI) = |A-λI|; if a matrix has all the elements below or above the diagonal equal to 0, it is called a triangular matrix, and its determinant is the product of the elements of the diagonal, so in this case det(A-λI) = |A-λI| = P(λ) = (-λ)(-λ)(3-λ) = λ²(3-λ) = 0, λ₁ = 3 with multiplicity 1 or simple root, λ₂ = 0 with multiplicity 2 or double root, therefore this matrix A or the endomorphism f has 3 real eigenvalues counted with their multiplicity, λ₁ = 3 with multiplicity 1 and λ₂ = 0 with multiplicity 2; to find the eigenvectors we have to use λ₁ and λ₂ separately; λ₁ = 3, we have to rewrite the linear system corresponding to the matrix A-λI = [(-λ,0,1),(0,-λ,0),(0,0,3-λ)] where instead of λ we substitute the value 3, {-3x+z = 0, -3y = 0, 0 = 0}, there are 2 independent equations in 3 unknowns so there are non-zero solutions, y = 0, z = 3x, the most general solution is formed by the vectors of the form (x,0,3x), there is one free unknown, and a base of the eigenspace V_λ₁ is formed by the vector (1,0,3), which is obtained by setting the only free unknown x equal to 1, so V_λ₁ is a vector space of dimension 1; λ₂ = 0, we have to rewrite the linear system corresponding to the matrix A-λI = [(-λ,0,1),(0,-λ,0),(0,0,3-λ)] where instead of λ we substitute the value 0, {z = 0, 0 = 0, 3z = 0}, there is only one independent equation, the solution is (x,y,0) where x and y are free unknowns, therefore a base of the eigenspace V_λ₂ is formed by 2 linearly independent vectors because 2 are the free unknowns, (1,0,0), (0,1,0), therefore V_λ₂ is a subspace of dimension 2; the eigenspace V_λ₁ has a base formed by the vector (1,0,3), the eigenspace V_λ₂ has a base formed by the 2 vectors (1,0,0), (0,1,0); in this example an eigenvalue of multiplicity 1 has generated an eigenspace of dimension 1, and an eigenvalue of multiplicity 2 has generated an eigenspace of dimension 2, but this equality between multiplicity and dimension does not hold in general
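
The vectors of the bases found in this example can be checked directly against the definition f(v) = λv with a short Python sketch; the function names apply and is_eigenvector are hypothetical helpers.

```python
def apply(matrix, v):
    """Product of a square matrix by a column vector, returned as a list."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in matrix]

def is_eigenvector(matrix, v, lam):
    """True if v is a non-zero vector with matrix*v equal to lam*v."""
    if all(c == 0 for c in v):
        return False  # the null vector is never an eigenvector
    return apply(matrix, v) == [lam * c for c in v]

A = [[0, 0, 1], [0, 0, 0], [0, 0, 3]]
print(is_eigenvector(A, [1, 0, 3], 3))  # base of the eigenspace of 3
print(is_eigenvector(A, [1, 0, 0], 0))  # first vector of the base for 0
print(is_eigenvector(A, [0, 1, 0], 0))  # second vector of the base for 0
```

The comparison is exact because all the entries are integers; with floating point entries a tolerance would be needed.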

A = [(0,0,2),(0,0,0),(0,0,0)], this matrix corresponds to the linear application of ℝ³ in ℝ³, f(x,y,z) = (2z,0,0); we must find the characteristic polynomial; A-λI = [(-λ,0,2),(0,-λ,0),(0,0,-λ)], det(A-λI) = |A-λI| = P(λ) = (-λ)(-λ)(-λ) = -λ³ = 0, the only solution is λ₁ = 0 with multiplicity m₁ = 3; we look for the eigenvectors relative to the eigenvalue λ₁ = 0, in the matrix A-λI we have to replace λ with 0 and we get again the matrix A with the system {2z = 0, 0 = 0, 0 = 0}, therefore we must study the system AX = 0, that is if the eigenvalue is λ = 0, the corresponding eigenvectors, together with the null vector, are exactly the vectors of the nucleus of the linear application, ker(f); the equation 2z = 0 has as solution (x,y,0), and a base of the eigenspace is formed by the vectors (1,0,0) and (0,1,0); therefore the eigenspace, which coincides with ker(f), has dimension 2, which is the number of vectors of the base; in this example the eigenspace has dimension 2, but it originates from λ₁ = 0 with multiplicity 3; therefore the dimension of the eigenspace is not necessarily equal to the multiplicity of the eigenvalue as root of the characteristic polynomial; in this example there is only 1 eigenvalue, and its multiplicity is 3, equal to the dimension of the vector space ℝ³

By definition, an eigenvalue is a number λ for which there is a non-zero vector v such that f(v) = λv, so the eigenspace of an eigenvalue λ cannot consist of the null vector alone, because this would contradict the definitions of eigenvalue and eigenvector; therefore it is impossible that V_λ = {0_ℝⁿ}

A = [(3,0,0),(0,5,0),(0,0,-2)] we look for the eigenvalues of this matrix; A-λI = [(3-λ,0,0),(0,5-λ,0),(0,0,-2-λ)], matrix A-λI, like matrix A, is a diagonal matrix that is a matrix where only the elements of the main diagonal are non-zero; the determinant of the matrix A-λI, which is its characteristic polynomial, is the product of the elements of the diagonal, det(A-λI) = |A-λI| = P(λ) = (3-λ)(5-λ)(-2-λ), the roots of this characteristic polynomial are λ₁ = 3, λ₂ = 5, λ₃ = -2; if a matrix is diagonal the eigenvalues are equal to the n elements of the diagonal, and if in the diagonal there are elements equal to each other we find eigenvalues with multiplicity > 1

A = [(3,0,0),(0,5,0),(0,0,3)], A-λI = [(3-λ,0,0),(0,5-λ,0),(0,0,3-λ)], det(A-λI) = |A-λI| = P(λ) = (3-λ)(5-λ)(3-λ) = (3-λ)²(5-λ), λ₁ = 3 with multiplicity m₁ = 2, λ₂ = 5 with multiplicity m₂ = 1; we calculate the eigenvectors relative to λ₁ = 3, therefore in the matrix A-λI we replace λ with the value 3, A-λI = [(3-λ,0,0),(0,5-λ,0),(0,0,3-λ)] = [(3-3,0,0),(0,5-3,0),(0,0,3-3)] = [(0,0,0),(0,2,0),(0,0,0)], therefore the only equation is 2y = 0, and x and z are free unknowns, so the eigenspace has dimension 2, which in this case is equal to the multiplicity m₁ = 2 of the eigenvalue λ₁ = 3; the equation 2y = 0 has as solution (x,0,z), and a base of the eigenspace is formed by the vectors (1,0,0) and (0,0,1), therefore the eigenspace relative to the eigenvalue λ₁ = 3 has dimension 2

To find the eigenvalues we use the characteristic polynomial P(λ) which is obtained by calculating the determinant of the matrix A-λI and setting it equal to 0, P(λ) = det(A-λI) = |A-λI| = 0; these eigenvalues can be in the real field, and some can have multiplicity > 1; the corresponding eigenspaces always have dimension at least 1, their dimension is never 0; to find the eigenvectors that form the eigenspace we have to solve the homogeneous linear system (A-λI)X = 0, putting the value of the eigenvalue in the place of λ

33 - THE DIAGONALIZATION OF SQUARE MATRICES

Considering an endomorphism of ℝⁿ, we can calculate the dimension of the eigenspaces; an eigenspace always has dimension ≥ 1; A = [(0,0,2),(0,0,0),(0,0,0)], this is a square matrix of order 3, so it represents an endomorphism of ℝ³, and if we want to find eigenvalues and eigenspaces we have to set the determinant of the matrix A-λI equal to 0, det(A-λI) = |(-λ,0,2),(0,-λ,0),(0,0,-λ)| = (-λ)³ = 0, λ₁ = 0 with multiplicity m₁ = 3; the only eigenspace of this endomorphism is calculated by replacing λ in the matrix A-λI with the value λ₁ = 0, so A-λI = [(0,0,2),(0,0,0),(0,0,0)], and therefore the homogeneous linear system obtained is {2z = 0, 0 = 0, 0 = 0}, x and y are free unknowns and the solution of the system is (x,y,0); V_λ₁ is the eigenspace generated by the vectors (1,0,0) and (0,1,0), V_λ₁ = L((1,0,0),(0,1,0)); to the eigenvalue λ₁ = 0 with multiplicity m₁ = 3 is related the eigenspace with dimension d₁ = 2 and d₁ < m₁; the dimension of the eigenspace must always be ≥ 1

A = [(0,2,0),(0,0,1),(0,0,0)], we need to find the eigenvalues and eigenvectors of this matrix; A-λI = [(-λ,2,0),(0,-λ,1),(0,0,-λ)] is a triangular matrix, so the determinant is calculated by multiplying the elements of the main diagonal, det(A-λI) = (-λ)(-λ)(-λ) = -λ³, -λ³ = 0, so λ₁ = 0 with multiplicity m₁ = 3; to find the eigenspace corresponding to λ₁ = 0, in the matrix A-λI = [(-λ,2,0),(0,-λ,1),(0,0,-λ)] we have to substitute λ with 0 and we get the matrix [(0,2,0),(0,0,1),(0,0,0)], which is equal to matrix A, and from this matrix we write the corresponding system of equations {2y = 0, z = 0, 0 = 0}, therefore x is a free unknown and the solutions are of the form (x,0,0), so a base of V_λ₁ is formed by the only vector (1,0,0), from which we deduce that dim(V_λ₁) = d₁ = 1; the value 1 is the minimum possible, and it is different from the multiplicity m₁ = 3 of λ₁ = 0

To calculate the dimension of the eigenspace we can look for the solutions of the homogeneous system of the eigenspace and count the elements of a base, but there is also another method; looking for the eigenspace means looking for the solutions of the homogeneous system (A-λI)X = 0, that is looking for the nucleus of the linear application associated with the matrix A-λI; this is a new linear application that we denote by f_λ: ℝⁿ → ℝⁿ, and f_λ is f-λ⋅id_ℝⁿ, that is the application f associated with the matrix A minus λ times the identity application of ℝⁿ, which for n = 3 associates to each element (x,y,z) the element f(x,y,z)-λ(x,y,z); we want to calculate the dimension of this nucleus without doing the calculations to find a base of it; f_λ is a linear application between ℝⁿ and ℝⁿ, so the dimension of its nucleus is equal to n minus the dimension of its image, dim(Ker(f_λ)) = n-ρ(A-λI); therefore to calculate the dimension of the eigenspace relative to the eigenvalue λ, we can use the formula dim(Ker(f_λ)) = n-ρ(A-λI)

Calculate the dimension of the nucleus of A = [(0,2,0),(0,0,1),(0,0,0)]; n = 3, A-λI = [(-λ,2,0),(0,-λ,1),(0,0,-λ)], λ = 0, A-λI = A = [(0,2,0),(0,0,1),(0,0,0)], ρ(A-λI) = 2, because matrix A is reduced by rows and has 2 non-zero rows, dim(Ker(f)) = n-ρ(A-λI) = 3-2 = 1, which is the dimension of the nucleus

Calculate the dimension of the nucleus of A = [(0,0,2),(0,0,0),(0,0,0)]; A-λI = [(-λ,0,2),(0,-λ,0),(0,0,-λ)], det(A-λI) = (-λ)(-λ)(-λ) = -λ³, -λ³ = 0, λ₁ = 0 with m₁ = 3, the only eigenvalue is the null eigenvalue of multiplicity 3; dim(Ker(f)) = dim(V_λ₁) = n-ρ(A-λI) = 3-1 = 2

In general the eigenspace is the nucleus of the linear application associated with the matrix A-λI, therefore the dimension of the eigenspace relative to the eigenvalue λ is dim(V_λ) = n-ρ(A-λI)
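
The formula dim(V_λ) = n-ρ(A-λI) can be sketched in Python; the function names rank and eigenspace_dimension are hypothetical, and the rank is computed by reduction by rows on exact fractions, so the sketch assumes integer or rational entries.

```python
from fractions import Fraction

def rank(matrix):
    """Rank of a matrix, computed by Gaussian elimination on exact fractions."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_rows, n_cols = len(rows), len(rows[0])
    r = 0  # number of pivots found so far
    for col in range(n_cols):
        # Look for a row at or below position r with a non-zero entry in this column.
        pivot = next((i for i in range(r, n_rows) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, n_rows):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
        if r == n_rows:
            break
    return r

def eigenspace_dimension(a, lam):
    """n - rank(A - lam*I) for a square matrix a of order n."""
    n = len(a)
    shifted = [[a[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]
    return n - rank(shifted)

print(eigenspace_dimension([[0, 2, 0], [0, 0, 1], [0, 0, 0]], 0))
print(eigenspace_dimension([[0, 0, 2], [0, 0, 0], [0, 0, 0]], 0))
```

If the number passed as lam is not an eigenvalue, the matrix A-λI has rank n and the function returns 0, which means that there are no eigenvectors for that number.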

The eigenspace relating to an eigenvalue can have a variable dimension; the dimension of the eigenspace relative to an eigenvalue is always ≥ 1 and is always ≤ the multiplicity of the eigenvalue as the root of the characteristic polynomial, and this is a theorem

λ is an eigenvalue of multiplicity m, V_λ is the eigenspace associated with λ, then 1 ≤ dim(V_λ) ≤ m

λ is an eigenvalue of multiplicity m of the endomorphism f: ℝⁿ → ℝⁿ, V_λ is the eigenspace associated with λ or the set of eigenvectors relative to the eigenvalue λ, then 1 ≤ dim(V_λ) ≤ m, the dimension of the eigenspace associated with the eigenvalue λ is always between 1 and the multiplicity of the eigenvalue λ as the root of the characteristic polynomial

We need to understand how we can diagonalize a matrix; we have a linear application f: ℝⁿ → ℝⁿ with the associated matrix A, and this matrix A is diagonalizable if we can transform it into a diagonal matrix; a matrix is called diagonal when all the elements outside the main diagonal are 0, while the elements on the main diagonal can be any numbers; an example of a diagonal matrix is D = [(1,0,0),(0,2,0),(0,0,-1)]; an example of a non-diagonal matrix is A = [(1,1),(0,1)]; only some matrices can be diagonalized, because only some matrices, after a suitable transformation, can become diagonal

We consider an endomorphism f: ℝⁿ → ℝⁿ with the associated matrix A, and with eigenvalues λ₁,...,λ_r, with multiplicity m₁,...,m_r, as roots of the characteristic polynomial P(λ) = (-1)ⁿ(λ-λ₁)^m₁⋅...⋅(λ-λ_r)^m_r; the characteristic polynomial can have all real roots, or some real roots and some complex roots, or all complex roots; the characteristic polynomial λ²+1 has no real roots; the characteristic polynomial (λ-1)(λ+1)(λ²+2) has real and complex roots; we consider a characteristic polynomial with all real roots, that is when λ₁,...,λ_r are all real roots with their multiplicities m₁,...,m_r, and the corresponding eigenspaces are V_λ₁,...,V_{λ_r} with their dimensions d₁,...,d_r, and 1 ≤ d_i ≤ m_i for each i; if d₁ = m₁, ..., d_r = m_r then f is a simple endomorphism; if all the eigenspaces have the dimension equal to the maximum possible then f is a simple endomorphism

If all the eigenvalues are real and all the dimensions of the eigenspaces coincide with the multiplicities of the eigenvalues as roots of the characteristic polynomial, then f is a simple endomorphism; the diagonalization is related to the condition of simple endomorphism; endomorphism is simple when all the eigenvalues are real and the dimensions of the eigenpaces are as high as possible

Endomorphism f(x,y) = (x+y,-y) with associated matrix A = [(1,1),(0,-1)]; A-λI = [(1-λ,1),(0,-1-λ)], det(A-λI) = |(1-λ,1),(0,-1-λ)| = P(λ) = (1-λ)(-1-λ), (1-λ)(-1-λ) = 0, λ₁ = 1, λ₂ = -1; λ₁ = 1 and λ₂ = -1 are two eigenvalues, and in general the number of eigenvalues of an endomorphism of ℝⁿ, counted with multiplicity in ℂ, is n; here n = 2, therefore all the eigenvalues of this endomorphism are real numbers; the dimension of the eigenspace is n minus the rank of the matrix A-λI after having replaced λ with its value; λ₁ = 1, A-λ₁I = [(1-λ₁,1),(0,-1-λ₁)] = [(1-1,1),(0,-1-1)] = [(0,1),(0,-2)], the rank of the matrix A-λ₁I is 1, so the dimension of the eigenspace of the eigenvalue λ₁ is 2-1 = 1, dim(V_λ₁) = 1 = multiplicity of λ₁; λ₂ = -1, A-λ₂I = [(1-λ₂,1),(0,-1-λ₂)] = [(1-(-1),1),(0,-1-(-1))] = [(1+1,1),(0,-1+1)] = [(2,1),(0,0)], the rank of the matrix A-λ₂I is 1, so the dimension of the eigenspace of the eigenvalue λ₂ is 2-1 = 1, dim(V_λ₂) = 1 = multiplicity of λ₂; this is a simple endomorphism

If an endomorphism ℝⁿ → ℝⁿ has eigenvalues λ₁,...,λ_n all real and distinct, distinct because the multiplicity is 1, m₁ = 1, ..., m_n = 1, the corresponding eigenspaces all have dimension 1, dim(V_λ₁) = 1, ..., dim(V_{λ_n}) = 1; if all the eigenvalues are real and have multiplicity 1 that is when all the eigenvalues are real and distinct, then all the eigenspaces have dimension 1, and this happens in a simple endomorphism

A = [(0,0,2),(0,0,0),(0,0,0)]; A-λI = [(-λ,0,2),(0,-λ,0),(0,0,-λ)], det(A-λI) = (-λ)(-λ)(-λ) = -λ³, -λ³ = 0, λ₁ = 0 with multiplicity m₁ = 3; for λ = λ₁ = 0 the matrix A-λ₁I = A has rank 1, so dim(V_λ₁) = n-ρ(A-λ₁I) = 3-1 = 2; this is an example of an endomorphism which is not simple, because the dimension of the eigenspace V_λ₁ is less than the multiplicity of λ₁
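
The relation dim(V_λ) = n-ρ(A-λI) can be checked with a short sketch. The helper names `rank` and `eigenspace_dimension` are illustrative assumptions, not part of these notes, and the rank is computed by plain Gaussian elimination with exact fractions.

```python
from fractions import Fraction

def rank(matrix):
    """Rank of a matrix, by Gaussian elimination with exact fractions."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    found = 0  # number of pivots found so far
    for col in range(len(rows[0])):
        # look for a non-zero entry in this column, at or below row `found`
        pivot = next((r for r in range(found, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[found], rows[pivot] = rows[pivot], rows[found]
        for r in range(len(rows)):
            if r != found and rows[r][col] != 0:
                factor = rows[r][col] / rows[found][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[found])]
        found += 1
    return found

def eigenspace_dimension(matrix, eigenvalue):
    """dim(V_lambda) = n - rank(A - lambda*I)."""
    n = len(matrix)
    shifted = [[matrix[i][j] - (eigenvalue if i == j else 0) for j in range(n)]
               for i in range(n)]
    return n - rank(shifted)

A = [[0, 0, 2], [0, 0, 0], [0, 0, 0]]   # eigenvalue 0 with multiplicity 3
B = [[1, 1], [0, -1]]                   # eigenvalues 1 and -1
print(eigenspace_dimension(A, 0))                               # 2, less than 3: not simple
print(eigenspace_dimension(B, 1), eigenspace_dimension(B, -1))  # 1 1: simple
```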

A matrix A is diagonalizable if it can be transformed into a diagonal matrix; suppose that there exists a square matrix P of the same order n as matrix A, and P is an invertible matrix so it admits the inverse matrix P^-1; P^-1⋅A⋅P = D, the matrices A, P, P^-1 are 3 square matrices of order n, therefore also D is a square matrix of order n, and the matrix D is said to be similar to A; if we can find an invertible matrix P such that matrix D is diagonal, then A is a diagonalizable matrix

There are matrices that are already diagonal and therefore are certainly diagonalizable; for example, A = [(1,0),(0,2)] is a diagonal matrix, and to diagonalize a matrix that is already diagonal we can take P = I, since I^-1⋅A⋅I = A because I^-1 = I; so diagonal matrices are diagonalizable

f(x,y) = (x+y,-y) is a simple endomorphism and we need to understand if its matrix is diagonalizable; the components x+y and -y give the associated matrix A = [(1,1),(0,-1)]; A-λI = [(1-λ,1),(0,-1-λ)], det(A-λI) = P(λ) = (1-λ)(-1-λ), (1-λ)(-1-λ) = 0, 1-λ = 0, λ₁ = 1, -1-λ = 0, λ₂ = -1; eigenvalue λ₁ = 1, A-λ₁I = [(1-λ₁,1),(0,-1-λ₁)] = [(1-1,1),(0,-1-1)] = [(0,1),(0,-2)], {y = 0, -2y = 0}, y = 0, eigenvector v₁ = (1,0); eigenvalue λ₂ = -1, A-λ₂I = [(1-λ₂,1),(0,-1-λ₂)] = [(1-(-1),1),(0,-1-(-1))] = [(1+1,1),(0,-1+1)] = [(2,1),(0,0)], {2x+y = 0, 0 = 0}, y = -2x, eigenvector v₂ = (1,-2); P = [(1,1),(0,-2)], the columns of this matrix P are the eigenvectors v₁ and v₂, and this matrix P is certainly invertible because the determinant of P is different from 0, det(P) = 1(-2)-1⋅0 = -2 ≠ 0; to calculate the matrix P^-1, that is the inverse matrix of the matrix P, we use the method of algebraic complements, P^-1 = -1/2[(-2,-1),(0,1)] = [(1,1/2),(0,-1/2)]; P⋅P^-1 = [(1,1),(0,-2)][(1,1/2),(0,-1/2)], a_1,1 = 1⋅1+1⋅0 = 1+0 = 1, a_1,2 = 1(1/2)+1(-1/2) = (1/2)-(1/2) = 0, a_2,1 = 0⋅1+(-2)0 = 0+0 = 0, a_2,2 = 0(1/2)+(-2)(-1/2) = 0+1 = 1, P⋅P^-1 = [(1,1),(0,-2)][(1,1/2),(0,-1/2)] = [(1,0),(0,1)] = I; P^-1⋅A = [(1,1/2),(0,-1/2)][(1,1),(0,-1)], a_1,1 = 1⋅1+(1/2)0 = 1+0 = 1, a_1,2 = 1⋅1+(1/2)(-1) = 1-(1/2) = 1/2, a_2,1 = 0⋅1+(-1/2)0 = 0+0 = 0, a_2,2 = 0⋅1+(-1/2)(-1) = 0+1/2 = 1/2, P^-1⋅A = [(1,1/2),(0,-1/2)][(1,1),(0,-1)] = [(1,1/2),(0,1/2)]; P^-1⋅A⋅P = [(1,1/2),(0,1/2)][(1,1),(0,-2)], a_1,1 = 1⋅1+(1/2)0 = 1+0 = 1, a_1,2 = 1⋅1+(1/2)(-2) = 1-1 = 0, a_2,1 = 0⋅1+(1/2)0 = 0+0 = 0, a_2,2 = 0⋅1+(1/2)(-2) = 0-1 = -1, P^-1⋅A⋅P = [(1,1/2),(0,1/2)][(1,1),(0,-2)] = [(1,0),(0,-1)] = D; the result is the matrix D, which is a diagonal matrix, and the elements of the diagonal of matrix D are the eigenvalues of matrix A, which are λ₁ = 1 and λ₂ = -1
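
The product P^-1⋅A⋅P for this example can be verified mechanically. The following is a minimal sketch with exact fractions; the helpers `matmul` and `inverse_2x2` are hypothetical names introduced only for this check.

```python
from fractions import Fraction

def matmul(X, Y):
    """Row-by-column product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def inverse_2x2(M):
    """Inverse of a 2x2 matrix by algebraic complements."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[Fraction(1), Fraction(1)], [Fraction(0), Fraction(-1)]]
P = [[Fraction(1), Fraction(1)], [Fraction(0), Fraction(-2)]]  # columns: eigenvectors (1,0) and (1,-2)

P_inv = inverse_2x2(P)
D = matmul(matmul(P_inv, A), P)
print(P_inv)  # entries 1, 1/2, 0, -1/2
print(D)      # diagonal entries 1 and -1, the eigenvalues of A
```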

If f: ℝⁿ → ℝⁿ is a simple endomorphism, then there are real eigenvalues λ₁,...,λ_r with multiplicities m₁,...,m_r, there are the eigenspaces V_λ₁,...,V_{λ_r} with dim(V_λ₁) = m₁, ..., dim(V_{λ_r}) = m_r, each of these eigenspaces has a base, and m₁+...+m_r = n; v₁,...,v_n is the set of all these base vectors, and these base vectors are the columns of the matrix P; the vectors v₁,...,v_n are linearly independent, because each group is a base of its eigenspace and eigenvectors relative to different eigenvalues are linearly independent; the matrix P has columns formed by n linearly independent vectors, so the matrix P has maximum rank, ρ(P) = n, therefore det(P) ≠ 0, therefore P is invertible, and we can use the formula P^-1⋅A⋅P = D, where D is a diagonal matrix having on the main diagonal the eigenvalues of matrix A and elsewhere all 0; D = P^-1⋅A⋅P = [(λ₁,0,...,0,0),(0,λ₁,...,0,0),...,(0,0,...,0,λ_r)], where the number of times that an eigenvalue is repeated on the main diagonal is its multiplicity as a root of the characteristic polynomial; if f is a simple endomorphism, that is if all its eigenvalues are real, the sum of the multiplicities of the eigenvalues is equal to n, which is the dimension of ℝⁿ, and the dimension of each eigenspace is equal to the multiplicity of the corresponding eigenvalue, then the matrix A associated with f is a diagonalizable matrix, so there is an invertible matrix P such that the product P^-1⋅A⋅P is the diagonal matrix D; the matrix P is obtained by inserting in the first columns the vectors of a base of V_λ₁, and so on up to the last columns, where we insert the vectors of a base of V_{λ_r}, and on the diagonal of the diagonalized matrix D we obtain the eigenvalues of the matrix A, each repeated according to its multiplicity

Not all endomorphisms are simple, therefore not all matrices are diagonalizable; for example the matrix A = [(0,0,2), (0,0,0), (0,0,0)] is not diagonalizable, therefore the result of the product P^-1⋅A⋅P is not a diagonal matrix; the matrix A = [(0,0,2),(0,0,0),(0,0,0)] cannot be diagonalized because there are too few eigenvectors, the eigenspace associated with the only eigenvalue has dimension 2, so it is not possible to create the column 3 of matrix P

The characteristic polynomial λ²+1 admits in ℂ the 2 roots i and -i; an endomorphism with this characteristic polynomial has 2 distinct eigenvalues in ℂ, so over ℂ it is a simple endomorphism and its matrix is diagonalizable, but this is true in ℂ, it is not true in ℝ

Eigenvalue: ∃ v ≠ 0 so that f(v) = λv, and det(A-λI) = 0; the eigenvalues are calculated by solving the equation det(A-λI) = 0

Eigenspace: (A-λI)X = 0; the eigenspaces are computed by solving the system (A-λI)X = 0

D = P^-1⋅A⋅P = [(λ₁,0,...,0,0),(0,λ₁,...,0,0),...,(0,0,...,0,λ_r)], with each eigenvalue λ₁,...,λ_r repeated on the diagonal according to its multiplicity; this is when the matrix A is diagonalizable

34 - CONCEPT OF DERIVATIVE

The function f(x) = x² has a parabola as its graph; we consider two points with abscissas x₀ and x, and the angular coefficient of the secant joining the points (x₀,f(x₀)) and (x,f(x)); the slope is given by the ratio between the variation of the function and the variation of the independent variable, r(x) = (f(x)-f(x₀))/(x-x₀) = (x²-x₀²)/(x-x₀); this ratio is a function defined for all x except x = x₀, and we are interested in its limit as x approaches x₀, because the ratio represents the angular coefficient, that is the slope, of the secant to the graph of f passing through the points of abscissas x₀ and x, and its limit represents the slope of the tangent to the graph at the point of abscissa x₀; factoring the numerator, (x²-x₀²)/(x-x₀) = ((x-x₀)(x+x₀))/(x-x₀) = x+x₀, that is for all x ≠ x₀ the ratio coincides with the function x+x₀, which is a polynomial function of first degree, is defined on all ℝ, is continuous, and tends to 2x₀ as x approaches x₀; so the slope of the tangent at the point of abscissa x₀ to the parabola which is the graph of f(x) = x² is 2x₀; when x₀ is positive, the slope is positive; when x₀ = 0, the slope is zero; when x₀ is negative, the slope is negative; f(x) = x², x₀, x, r(x) = (f(x)-f(x₀))/(x-x₀) = (x²-x₀²)/(x-x₀) = ((x-x₀)(x+x₀))/(x-x₀) = x+x₀ → 2x₀
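
A numerical illustration of the secant slopes of the parabola approaching the tangent slope 2x₀. This is only a sketch: the names `secant_slope` and `square` and the choice x₀ = 3 are assumptions made for the example.

```python
def secant_slope(f, x0, x):
    """Incremental ratio (f(x) - f(x0)) / (x - x0), defined for x != x0."""
    return (f(x) - f(x0)) / (x - x0)

def square(x):
    return x * x

x0 = 3.0
for h in (1.0, 0.1, 0.01, 0.001):
    # for f(x) = x^2 the slope equals x + x0, so it approaches 2 * x0 = 6
    print(h, secant_slope(square, x0, x0 + h))
```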

Galileo Galilei, born in Pisa in 1564 and died in Arcetri in 1642, studied falling bodies and their naturally accelerated motion using inclined planes, and discovered that the spaces covered by falling bodies are proportional to the squares of the times

s(t) = (g/2)t², s(t) is the space traveled by a falling body after the time t; we want to calculate the average speed in the time interval between t₀ and t, and then the limit of this average speed as t tends to t₀, obtaining the instantaneous speed; (s(t)-s(t₀))/(t-t₀) = average speed relative to the time interval between t₀ and t; (s(t)-s(t₀))/(t-t₀) = ((g/2)t²-(g/2)t₀²)/(t-t₀) = (g/2)(t²-t₀²)/(t-t₀) = (g/2)(t-t₀)(t+t₀)/(t-t₀) = (g/2)(t+t₀); when t tends to t₀, (g/2)(t+t₀) tends to (g/2)(t₀+t₀) = (g/2)(2t₀) = g⋅t₀, therefore the limit exists and is g⋅t₀; the instantaneous speed, which is the limit of the average speed, exists at every instant t₀, is well defined and is g⋅t₀

s(t) = (g/2)t², (s(t+h)-s(t))/h = ((g/2)(t+h)²-(g/2)t²)/h = (g/2)((t+h)²-t²)/h = (g/2)(t²+2th+h²-t²)/h = (g/2)(2th+h²)/h = (g/2)(2t+h), the limit for h tending to 0 is (g/2)(2t) = gt
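
The same limit can be observed numerically. In this sketch the value G = 9.81 (metres per second squared) and the function names are assumptions chosen for illustration; the notes leave g symbolic.

```python
G = 9.81  # assumed numerical value of g, in m/s^2

def distance(t):
    """s(t) = (g/2) t^2, space covered after time t."""
    return 0.5 * G * t * t

def average_speed(t, h):
    """Average speed over the time interval between t and t + h."""
    return (distance(t + h) - distance(t)) / h

t = 2.0
for h in (1.0, 0.1, 0.001):
    # equals (g/2)(2t + h), so it approaches the instantaneous speed g * t
    print(h, average_speed(t, h))
print("g * t =", G * t)
```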

f: I → ℝ, x₀ ∈ I, x ≠ x₀, (f(x)-f(x₀))/(x-x₀); this incremental ratio is the angular coefficient of the secant to the graph passing through the points of abscissas x₀ and x; this function, which is an incremental ratio, is defined for all x of the interval I except the point x₀, and we are interested in the limit as x approaches x₀; lim_x→x₀((f(x)-f(x₀))/(x-x₀)) = f'(x₀) = Df(x₀) = Df(x)|_x=x₀, this is the derivative of the function f at the point x₀

f: I → ℝ, x₀ ∈ I, x ≠ x₀, lim_x→x₀((f(x)-f(x₀))/(x-x₀)) = f'(x₀)

The derivative is the limit of the incremental ratio: lim_x→x₀((f(x)-f(x₀))/(x-x₀)) = f'(x₀)

The incremental ratio is (f(x)-f(x₀))/(x-x₀), ratio between the variation of the function and the variation of the independent variable; incremental in this context means variation

f(x) = mx+q, is a polynomial function of first degree; we take 2 points on a straight line, the straight line joining these 2 points is the starting line; for polynomial functions of degree ≤ 1 the incremental ratio must be constant; f(x) = mx+q, f(x₀) = mx₀+q, (f(x)-f(x₀))/(x-x₀) = (mx+q-mx₀-q)/(x-x₀) = m(x-x₀)/(x-x₀) = m

We compute the derivative of xⁿ, where n is a natural number; f(x) = xⁿ, (f(x+h)-f(x))/h = ((x+h)ⁿ-xⁿ)/h, we must apply the binomial formula, ((x+h)ⁿ-xⁿ)/h = (xⁿ+nx^(n-1)h+C(n,2)x^(n-2)h²+...+hⁿ-xⁿ)/h = nx^(n-1)+C(n,2)x^(n-2)h+...+h^(n-1); calculating the limit for h which tends to zero, all the addends containing h vanish, so nx^(n-1)+C(n,2)x^(n-2)h+...+h^(n-1) → nx^(n-1), and the derivative of xⁿ is nx^(n-1)

f(x) = c, f'(x) = 0; f(x) = x, f'(x) = 1; f(x) = x², f'(x) = 2x; f(x) = xⁿ, f'(x) = nx^(n-1)
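
The binomial expansion of the incremental ratio can be compared with the direct computation. This is a sketch with exact fractions; the function names and the sample values n = 5, x = 2 are assumptions.

```python
from fractions import Fraction
from math import comb

def quotient_direct(n, x, h):
    """((x + h)^n - x^n) / h computed directly."""
    return ((x + h) ** n - x ** n) / h

def quotient_binomial(n, x, h):
    """n x^(n-1) + C(n,2) x^(n-2) h + ... + h^(n-1), from the binomial formula."""
    return sum(comb(n, k) * x ** (n - k) * h ** (k - 1) for k in range(1, n + 1))

n, x = 5, Fraction(2)
for h in (Fraction(1, 10), Fraction(1, 1000)):
    assert quotient_direct(n, x, h) == quotient_binomial(n, x, h)
    # every term except the first contains h, so the value approaches n * x^(n-1) = 80
    print(h, float(quotient_binomial(n, x, h)))
```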

f(x) = x⁰ = 1, f'(x) = 0, because x⁰ = 1 = c

A function is differentiable at a point when the derivative exists and is a real number; if the limit of the incremental ratio is +∞ or -∞, the function is not called differentiable at that point, although we say that its derivative there is +∞ or -∞

f(x) = c, f'(x) = 0; f(x) = x, f'(x) = 1; f(x) = x², f'(x) = 2x; f(x) = xⁿ, f'(x) = nx^(n-1); these are all differentiable functions in all the points of their natural domain, which is all ℝ; the derivative function is the function which, to each x of the domain of the starting function, associates the value of the derivative calculated at that point

The function f is said to be differentiable at a point if its derivative exists and is finite at the same point

Continuity is a necessary but not sufficient condition for differentiability; a function that can be differentiated at a point is certainly continuous at that point, but a function that is continuous at a point may not be differentiable at that point

f: I → ℝ, in x₀ the function is differentiable and we must prove that it is also continuous; we must show that the limit of f(x) for x tending to x₀ is f(x₀) that is the difference f(x)-f(x₀) tends to zero as x approaches x₀; f(x)-f(x₀) = ((f(x)-f(x₀))/(x-x₀))(x-x₀), calculating the limit for x tending to x₀ we obtain f'(x₀)⋅0, therefore by the product limit theorem everything tends to 0

A function can be continuous at a point but not differentiable at that point; f(x) = |x|, the graph is the union of the bisectors of the first and second quadrant, and this function is continuous everywhere but cannot be differentiated in the origin; this curve which is actually a broken line does not have a tangent in the origin, in the origin the derivative does not exist; the incremental ratio of the function f(x) = |x| in the origin is (|x|-0)/(x-0) = |x|/x, this incremental ratio is 1 when x is positive, because the absolute value of x coincides with x, but it is -1 for negative x, therefore this incremental ratio does not admit a limit, but admits a limit on the right which is 1, and in fact in the first quadrant the graph is a ray with angular coefficient 1, and admits the limit to the left which is -1, and in fact in the second quadrant the graph is a ray with angular coefficient -1; therefore in these cases we must consider the derivative on the right and the derivative on the left
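
The two one-sided incremental ratios of |x| in the origin can be computed directly. A minimal sketch; `one_sided_quotients` is a hypothetical helper name.

```python
def one_sided_quotients(f, x0, h):
    """Incremental ratios taken from the left and from the right of x0 (h > 0)."""
    right = (f(x0 + h) - f(x0)) / h
    left = (f(x0 - h) - f(x0)) / (-h)
    return left, right

for h in (1.0, 0.01, 1e-6):
    # for f(x) = |x| at the origin the ratio |x|/x is -1 on the left and 1 on the right
    print(h, one_sided_quotients(abs, 0.0, h))
```

Since the two values stay at -1 and 1 for every h, they cannot have a common limit, which agrees with the absence of a tangent in the origin.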

The derivative on the right is the limit of the incremental ratio when x approaches x₀⁺, that is from the right; the derivative on the left is the limit of the incremental ratio when x approaches x₀⁻, that is from the left

The derivative at a point exists if and only if at the same point the derivative on the right and the derivative on the left exist and coincide

We know how to calculate the derivative of monomial functions, but now we are interested in calculating the derivative of the circular functions, sine and cosine

Using a prosthaphaeresis (sum-to-product) formula, sin(x)-sin(x₀) = 2(sin((x-x₀)/2))(cos((x+x₀)/2)), we get the incremental ratio of the sine function, (sin(x)-sin(x₀))/(x-x₀) = (2/(x-x₀))(sin((x-x₀)/2))(cos((x+x₀)/2)); we have to calculate the limit for x which approaches x₀; setting (x-x₀)/2 = t, the incremental ratio becomes (sin(t)/t)(cos((x+x₀)/2)), and the limit for x tending to x₀ is the limit for t tending to 0; we know that lim_t→0(sin(t)/t) = 1, and by the continuity of the cosine lim_x→x₀(cos((x+x₀)/2)) = cos((x₀+x₀)/2) = cos(2x₀/2) = cos(x₀); therefore the derivative of the sine function calculated at the point x₀ is the cosine of x₀, (sin(x))' = cos(x), D(sin(x)) = cos(x)

The cosine function is differentiable at any point of its domain, which is all ℝ; (cos(x))' = -sin(x), D(cos(x)) = -sin(x)
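
Both the sum-to-product identity and the resulting limit can be checked numerically. This is only a sketch; the sample points x₀ = 0.7 and x = 1.9 and the helper name are assumptions.

```python
import math

def incremental_ratio(f, x0, h):
    return (f(x0 + h) - f(x0)) / h

x0, x = 0.7, 1.9
lhs = math.sin(x) - math.sin(x0)
rhs = 2 * math.sin((x - x0) / 2) * math.cos((x + x0) / 2)
print(lhs, rhs)  # the two sides of the sum-to-product formula agree

for h in (0.1, 0.001, 1e-6):
    # the incremental ratio of the sine approaches cos(x0)
    print(h, incremental_ratio(math.sin, x0, h), math.cos(x0))
```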

We know how to derive monomials and the circular functions sine and cosine; we will study how to derive the exponential function and the logarithm function; we will study how to derive integer rational functions also called polynomial functions

f(x) = a^x, a > 1; if the base is a = e, the tangent to the exponential curve at the point (0,1) has angular coefficient 1, so it is parallel to the bisector of the first and third quadrant; if we choose as the base a the Euler number, denoted by the letter e, the derivative of the function e^x calculated for x = 0 is 1, and this is the reason that makes the Euler number important in the context of exponential functions; we calculate the incremental ratio for a generic exponential function f(x) = a^x, with h = x-x₀: (f(x)-f(x₀))/(x-x₀) = (a^x-a^x₀)/(x-x₀) = (a^(x-x₀+x₀)-a^x₀)/(x-x₀) = (a^x₀⋅a^(x-x₀)-a^x₀)/(x-x₀) = a^x₀((a^(x-x₀)-1)/(x-x₀)) = a^x₀((a^h-1)/h); since 1 = a⁰, (a^h-1)/h = (a^h-a⁰)/(h-0) is the incremental ratio of the exponential function in the origin; its limit for x tending to x₀, that is for h tending to 0, is a quantity that does not depend on x₀ but is a constant that depends on a, c(a), which is the derivative of a^x calculated at x = 0, c(a) = D(a^x)|_x=0; therefore lim_h→0(a^x₀((a^h-1)/h)) = a^x₀⋅c(a); D(a^x) = c(a)⋅a^x = D(a^x)|_x=0⋅a^x, therefore the derivative of a^x is equal to a^x multiplied by the derivative of a^x at the point x = 0; D(e^x) = e^x, because the derivative of e^x for x = 0 is 1; the derivative of e^x is equal to e^x, that is in the curve e^x the ordinate of each point is equal to the angular coefficient of the tangent line at that point

The inverse function of the exponential function is the logarithm function; f(x) = log_a(x), the function is defined for x > 0, and the incremental ratio is (log_a(x+h)-log_a(x))/h = (1/h)(log_a((x+h)/x)) = (1/h)(log_a(1+h/x)); setting h/x = t, (1/h)(log_a(1+h/x)) = (1/x)(x/h)log_a(1+t) = (1/x)(1/t)log_a(1+t) = (1/x)log_a((1+t)^(1/t)); since x is fixed, h tending to 0 means t tending to 0, so we need to calculate lim_t→0((1/x)log_a((1+t)^(1/t))); setting t = 1/u, (1+t)^(1/t) = (1+1/u)^u, and lim_u→±∞((1+1/u)^u) = e, so lim_t→0^±((1+t)^(1/t)) = e; therefore D(log_a(x)) = (1/x)log_a(e), and the derivative of the logarithm with base e is 1/x
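
The constant c(a) can be estimated numerically from the incremental ratio in the origin. This is a sketch under two assumptions: the helper `c` uses the fixed small increment h = 1e-6 in place of the limit, and the comparison with ln(a) relies on the standard fact c(a) = ln(a), which these notes have not derived at this point.

```python
import math

def c(a, h=1e-6):
    """Incremental ratio of a^x in the origin, (a^h - 1) / h, for a small h."""
    return (a ** h - 1) / h

for a in (2.0, math.e, 10.0):
    # c(e) is close to 1; in general the values are close to ln(a)
    print(a, c(a), math.log(a))

# D(a^x) at x0 is approximately a^x0 * c(a)
a, x0, h = 2.0, 1.5, 1e-6
print((a ** (x0 + h) - a ** x0) / h, a ** x0 * c(a))
```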

35 - DERIVATIVE THEOREMS

Calculate the derivative of the function log_a(x), a > 1; (log_a(x+h)-log_a(x))/h = (1/x)log_a((1+t)^(1/t)), t := h/x; x is a fixed positive number, h is the increment that tends to zero, therefore t tends to zero; lim_u→±∞((1+(1/u))^u) = e; lim_t→0^±((1+t)^(1/t)) = e; by virtue of the continuity of the logarithm function, D(log_a(x)) = (1/x)log_a(e); a = e, D(log_e(x)) = D(ln(x)) = 1/x; D(e^x) = e^x; the curve of the function ln(x), at the point (1,0), has a tangent with angular coefficient 1, since the derivative is 1/x and 1/1 = 1, therefore the tangent at the point (1,0) of the function ln(x) is parallel to the bisector of the first and third quadrant; the tangent at the point (0,1) of the function e^x has angular coefficient 1, because D(e^x) calculated for x = 0 is 1, so it is parallel to the bisector of the first and third quadrant; the tangent at the point (0,1) of the function e^x is parallel to the tangent at the point (1,0) of the function ln(x), and they are both parallel to the bisector of the first and third quadrant, as they have an angular coefficient of 1; the graphs of the function e^x and of the function ln(x) are symmetrical with respect to the bisector of the first and third quadrant, because the two functions are the inverse of each other

When the limits involved exist and are finite, the limit of the sum is equal to the sum of the limits; the limit of the difference is equal to the difference of the limits; the limit of the product is equal to the product of the limits; the limit of the quotient is equal to the quotient of the limits, provided that the limit of the denominator is not 0
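
A numerical look at the two facts used above: (1+t)^(1/t) approaches e from both sides, and the incremental ratio of log_a(x) approaches (1/x)log_a(e). This is only a sketch; the base a = 10, the point x = 3 and the function names are assumptions.

```python
import math

def power_expression(t):
    """(1 + t)^(1/t), defined for t != 0 and t > -1."""
    return (1 + t) ** (1 / t)

for t in (0.1, -0.1, 1e-3, -1e-3, 1e-6, -1e-6):
    print(t, power_expression(t))  # approaches e = 2.71828...

def log_incremental_ratio(a, x, h):
    return (math.log(x + h, a) - math.log(x, a)) / h

a, x = 10.0, 3.0
print(log_incremental_ratio(a, x, 1e-6), (1 / x) * math.log(math.e, a))
```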

The derivative of the sum of two functions is equal to the sum of their respective derivatives: (f(x)+g(x))' = f'(x)+g'(x); the incremental ratio of the sum function is equal to the sum of the incremental ratios; lim_x→x₀((f(x)+g(x)-(f(x₀)+g(x₀)))/(x-x₀)) = lim_x→x₀((f(x)-f(x₀))/(x-x₀))+lim_x→x₀((g(x)-g(x₀))/(x-x₀)) = f'(x₀)+g'(x₀), and therefore the theorem is proved

(c⋅f(x))' = c⋅(f(x))', lim_x→x₀((c⋅f(x)-c⋅f(x₀))/(x-x₀)) = c⋅lim_x→x₀((f(x)-f(x₀))/(x-x₀))

To derive a polynomial it is enough to know how to derive monomials, because the derivative of a polynomial is obtained simply by deriving term by term

p(x) = x³-5x²+4x+3; the first derivative of p(x) is p'(x) = 3x²-10x+4; the second derivative of p(x) is p''(x) = 6x-10; the third derivative of p(x) is p'''(x) = 6; the fourth derivative of p(x) is p''''(x) = 0; starting from a polynomial of third degree and deriving 4 times, or once more than its degree, we have found the constant 0; starting from a polynomial of degree n, its nth derivative is a constant, and the (n+1)th derivative is zero
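
Term-by-term derivation of a polynomial is easy to sketch on a list of coefficients. The representation (index k holds the coefficient of x^k) and the name `derivative` are assumptions made for this example.

```python
def derivative(coeffs):
    """Derive a polynomial term by term; coeffs[k] is the coefficient of x^k."""
    derived = [k * c for k, c in enumerate(coeffs)][1:]
    return derived or [0]  # the derivative of a constant is the zero polynomial

p = [3, 4, -5, 1]  # 3 + 4x - 5x^2 + x^3
current = p
for order in range(1, 5):
    current = derivative(current)
    print(order, current)
# order 1: [4, -10, 3]   that is 3x^2 - 10x + 4
# order 2: [-10, 6]      that is 6x - 10
# order 3: [6]
# order 4: [0]
```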

The derivative of the product is not the product of derivatives; the derivative of the product of two functions is not the product of the derivatives of the functions; x² = x⋅x, if it were true that the derivative of the product is the product of the derivatives, then D(x²) = D(x)⋅D(x) = 1⋅1 = 1, but this is false because D(x²) = 2x

The derivative of the product of 2 functions is equal to the derivative of the first function times the second function, plus the first function times the derivative of the second function: (f(x)⋅g(x))' = f'(x)⋅g(x)+f(x)⋅g'(x); D(x²) = D(x⋅x) = 1⋅x+x⋅1 = x+x = 2x; lim_x→x₀((f(x)g(x)-f(x₀)g(x₀))/(x-x₀)) = lim_x→x₀((f(x)g(x)-f(x₀)g(x₀)-f(x₀)g(x)+f(x₀)g(x))/(x-x₀)) = lim_x→x₀(g(x)((f(x)-f(x₀))/(x-x₀))+f(x₀)((g(x)-g(x₀))/(x-x₀))) = f'(x₀)g(x₀)+f(x₀)g'(x₀), where g(x) tends to g(x₀) because g is differentiable, and therefore continuous, at x₀

Differentiability implies continuity

D(sin(x)²) = D(sin(x)sin(x)) = cos(x)sin(x)+sin(x)cos(x) = 2sin(x)cos(x)

D(x⋅e^x) = 1⋅e^x+x⋅e^x = e^x(1+x)
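
The product rule can be compared with the incremental ratio of the product. A minimal sketch; the helper names and the sample point x = 1.5 are assumptions.

```python
import math

def product_rule(f, f_prime, g, g_prime, x):
    """f'(x) g(x) + f(x) g'(x)."""
    return f_prime(x) * g(x) + f(x) * g_prime(x)

def incremental_ratio(f, x0, h=1e-6):
    return (f(x0 + h) - f(x0)) / h

x = 1.5
# D(x * e^x) with f(x) = x and g(x) = e^x
by_rule = product_rule(lambda t: t, lambda t: 1.0, math.exp, math.exp, x)
numeric = incremental_ratio(lambda t: t * math.exp(t), x)
print(by_rule, math.exp(x) * (1 + x), numeric)

# D(sin^2(x)) with f = g = sin
print(product_rule(math.sin, math.cos, math.sin, math.cos, x),
      2 * math.sin(x) * math.cos(x))
```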

The derivative of the reciprocal function is equal to minus the ratio between the derivative of the function and the square of the function: (1/g(x))' = -(g'(x)/(g(x))²)

((1/g(x))-(1/g(x₀)))/(x-x₀) = (1/(x-x₀))((1/g(x))-(1/g(x₀))) = (1/(x-x₀))((g(x₀)-g(x))/(g(x)g(x₀))) = (-1/(x-x₀))((g(x)-g(x₀))/(g(x)g(x₀))) = -((g(x)-g(x₀))/(x-x₀))(1/(g(x)g(x₀))); lim_x→x₀(-((g(x)-g(x₀))/(x-x₀))(1/(g(x)g(x₀)))) = -(g'(x₀)/(g(x₀))²); therefore (1/g(x))' = -g'(x)/(g(x))², where g(x) ≠ 0

D(1/x) = -1/x²

D(1/xⁿ) = D(x^(-n)) = -nx^(n-1)/x^(2n) = (-nx^(n-1))(x^(-2n)) = -nx^(-2n+n-1) = -nx^(-n-1)

D(1/x³) = D(x^-3) = -3x^-4

D(1/x⁴) = D(x^-4) = -4x^-5

The derivative of the quotient of two functions is equal to the derivative of the numerator times the denominator, minus the numerator times the derivative of the denominator, all divided by the square of the denominator: (f(x)/g(x))' = (f'(x)g(x)-f(x)g'(x))/(g(x))²

f(x)/g(x) = f(x)(1/g(x)); D(f(x)/g(x)) = D(f(x)(1/g(x))) = f'(x)/g(x)-f(x)g'(x)/(g(x))² = (f'(x)g(x)-f(x)g'(x))/(g(x))²

D(tan(x)) = D(sin(x)/cos(x)) = (cos(x)cos(x)-sin(x)(-sin(x)))/(cos(x))² = ((cos(x))²+(sin(x))²)/(cos(x))² = 1+(tan(x))² = 1/(cos(x))²; the tangent function is sin(x)/cos(x), and is defined where the cosine is not null that is for x ≠ (π/2)+kπ with k ∈ ℤ

D(cot(x)) = D(cos(x)/sin(x)) = (-sin(x)sin(x)-cos(x)cos(x))/(sin(x))² = (-(sin(x))²-(cos(x))²)/(sin(x))² = -((sin(x))²+(cos(x))²)/(sin(x))² = -1/(sin(x))²; the cotangent function is cos(x)/sin(x), and it is defined where the sine is not null that is for x ≠ kπ with k ∈ ℤ

We can calculate the derivative of any function that is a ratio between polynomials

(1-x²)/(1+x²), the denominator has no real zeros, it is not null for any real value of x, so this function is defined on the whole real line; D((1-x²)/(1+x²)) = (-2x(1+x²)-(1-x²)2x)/(1+x²)² = (-2x-2x³-2x+2x³)/(1+x²)² = -4x/(1+x²)²
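
The quotient rule applied to this example can be checked against the simplified result -4x/(1+x²)² and against the incremental ratio. A sketch only; the helper names are assumptions.

```python
def quotient_rule(f, f_prime, g, g_prime, x):
    """(f'(x) g(x) - f(x) g'(x)) / g(x)^2, valid where g(x) != 0."""
    return (f_prime(x) * g(x) - f(x) * g_prime(x)) / g(x) ** 2

def numerator(x):
    return 1 - x * x

def denominator(x):
    return 1 + x * x

def simplified(x):
    return -4 * x / (1 + x * x) ** 2

for x in (-2.0, 0.0, 0.5, 2.0):
    by_rule = quotient_rule(numerator, lambda t: -2 * t, denominator, lambda t: 2 * t, x)
    h = 1e-6
    numeric = (numerator(x + h) / denominator(x + h) - numerator(x) / denominator(x)) / h
    print(x, by_rule, simplified(x), numeric)
```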

The sine function is an odd function, changing x to -x, then f becomes -f; an odd function, such as the sine function, is symmetrical with respect to the origin; in an odd function such as the sine function, the tangent at a point x and the tangent at a point -x are parallel, therefore in an odd function the derivative calculated at the point x is equal to the derivative calculated at the point -x; the derivative of an odd function is an even function, in fact the derivative of the odd sine function is the even cosine function; an even function is symmetrical with respect to the ordinate axis and the tangents at the point x and -x are symmetrical with respect to the ordinate axis and the tangents have opposite angular coefficients; deriving an even function we obtain an odd function, and deriving an odd function we obtain an even function; the derivative of the even function cos(x) is the odd function -sin(x)

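The function xⁿ is even when n is even; the function xⁿ is odd when n is odd; D(xⁿ) = nx^(n-1), so the derivative of an even power is an odd power and the derivative of an odd power is an even power; the derivative of an even function is an odd function, and the derivative of an odd function is an even function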

The tangent function is an odd function, its graph is symmetrical with respect to the origin; D(tan(x)) = 1+(tan(x))² = 1/(cos(x))², 1+(tan(x))² and 1/(cos(x))² are even functions

The derivative of an even function is an odd function, and the derivative of an odd function is an even function
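
For polynomials the parity statement can be checked exactly on the coefficients: an even polynomial has only even powers, an odd polynomial only odd powers. The following sketch uses a coefficient list (index k is the coefficient of x^k); all names are assumptions.

```python
def derivative(coeffs):
    """Term-by-term derivative; coeffs[k] is the coefficient of x^k."""
    return [k * c for k, c in enumerate(coeffs)][1:] or [0]

def is_even(coeffs):
    """True when only even powers appear, so that p(-x) = p(x)."""
    return all(c == 0 for k, c in enumerate(coeffs) if k % 2 == 1)

def is_odd(coeffs):
    """True when only odd powers appear, so that p(-x) = -p(x)."""
    return all(c == 0 for k, c in enumerate(coeffs) if k % 2 == 0)

even_poly = [1, 0, -3, 0, 2]  # 1 - 3x^2 + 2x^4
odd_poly = [0, 5, 0, -1]      # 5x - x^3
print(derivative(even_poly), is_odd(derivative(even_poly)))  # [0, -6, 0, 8] True
print(derivative(odd_poly), is_even(derivative(odd_poly)))   # [5, 0, -3] True
```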

D((1+e^x)/(1+x²)) = (e^x(1+x²)-(1+e^x)2x)/(1+x²)² = (e^x(1+x²)-2x(1+e^x))/(1+x²)²

The graph f(x) and the graph f(-x) are symmetrical with respect to the ordinate axis; the graph f(x) and the graph -f(x) are symmetrical with respect to the abscissa axis

D(e^-x) = D(1/e^x); the graphs e^x and e^-x = 1/e^x are symmetrical with respect to the ordinate axis; D(e^-x) = D(1/e^x) = -e^x/e^2x = -e^x⋅e^-2x = -e^-x; the graphs e^-x and -e^-x are symmetrical with respect to the abscissa axis

The function e^x is increasing in every point, therefore in every point the tangent has a positive angular coefficient, in fact the derivative of e^x is e^x which is a positive function; the function e^-x is decreasing in every point, therefore in every point the tangent has negative angular coefficient, in fact the derivative of e^-x is -e^-x which is a negative function

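If a differentiable function is increasing in an interval, the derivative is ≥ 0; if the function is strictly increasing the derivative is still ≥ 0, and it can be 0 at some points, for example f(x) = x³ is strictly increasing and f'(0) = 0

If a differentiable function is decreasing in an interval, the derivative is ≤ 0; if the function is strictly decreasing the derivative is still ≤ 0, and it can be 0 at some points, for example f(x) = -x³ is strictly decreasing and f'(0) = 0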

If in an interval the derivative is positive, then in that interval the function is increasing

If in an interval the derivative is negative, then in that interval the function is decreasing

f(x) = √(x), x ≥ 0, x ∈ [0,+∞); the function f(x) = √x is increasing therefore the derivative is positive, but in the origin the tangent ray coincides with the ordinate axis, therefore for x = 0 the function is not differentiable because the derivative is +∞; a function is differentiable at a point when the derivative exists and when the value of the derivative is a finite number; (f(x+h)-f(x))/h = (√(x+h)-√(x))/h = ((√(x+h)-√(x))/h)((√(x+h)+√(x))/(√(x+h)+√(x))) = (x+h-x)/(h(√(x+h)+√(x))) = h/(h(√(x+h)+√(x))) = 1/(√(x+h)+√(x)); lim_h→0(1/(√(x+h)+√(x))) = 1/(√(x)+√(x)) = 1/(2√(x)); D(√(x)) = 1/(2√(x)); when x approaches 0⁺, 1/(2√(x)) tends to +∞; calculating the derivative in the origin, (f(x)-f(0))/(x-0) = (√(x)-0)/(x-0) = √(x)/x = 1/√(x), when x tends to 0⁺, 1/√(x) tends to +∞; D(√(x)) = {1/(2√(x)) for x > 0; +∞ for x = 0}; the function √(x) is strictly increasing and its derivative 1/(2√(x)) is strictly positive
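
The behaviour of the incremental ratio of √(x) in the origin and at an interior point can be observed numerically. A minimal sketch; the helper name and the sample point x = 4 are assumptions.

```python
import math

def incremental_ratio(f, x0, h):
    return (f(x0 + h) - f(x0)) / h

for h in (1e-2, 1e-4, 1e-6):
    # in the origin the ratio is 1 / sqrt(h), which grows without bound
    print(h, incremental_ratio(math.sqrt, 0.0, h))

# at x = 4 the ratio approaches 1 / (2 * sqrt(4)) = 0.25
print(incremental_ratio(math.sqrt, 4.0, 1e-6))
```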

There is a close link between the sign of the derivative and the monotonic property of the function

36 - DERIVATION OF COMPOUND FUNCTIONS

f₁: I₁ → ℝ, f₂: I₂ → ℝ, suppose that the image of f₁, f₁(I₁), is contained in I₂, x → f₁ → f₁(x) → f₂ → f₂(f₁(x)) that is (f₂∘f₁)(x), or composite function f₂ after f₁; if f₁ is differentiable at a point x₀, x₀ → f₁(x₀), and if f₂ is differentiable in f₁(x₀), it is possible to calculate the derivative of the composite function (f₂∘f₁)'(x₀)

The derivative of a compound function is equal to the product of the derivatives of the component functions: (f₂(f₁(x)))' = (f₂'(f₁(x)))(f₁'(x)); we can demonstrate this with the incremental ratio (f₂(f₁(x))-f₂(f₁(x₀)))/(x-x₀); suppose that x ≠ x₀ implies f₁(x) ≠ f₁(x₀), which is true for example when f₁ is an injective function, so that we can multiply and divide by f₁(x)-f₁(x₀): ((f₂(f₁(x))-f₂(f₁(x₀)))/(f₁(x)-f₁(x₀)))((f₁(x)-f₁(x₀))/(x-x₀)), f₁(x) = y, f₁(x₀) = y₀, ((f₂(y)-f₂(y₀))/(y-y₀))((f₁(x)-f₁(x₀))/(x-x₀)); lim_x→x₀((f₁(x)-f₁(x₀))/(x-x₀)) = f₁'(x₀); since f₁ is differentiable, and therefore continuous, at x₀, y tends to y₀ when x tends to x₀, and lim_y→y₀((f₂(y)-f₂(y₀))/(y-y₀)) = f₂'(y₀) = f₂'(f₁(x₀)); (f₂(f₁(x₀)))' = (f₂'(f₁(x₀)))(f₁'(x₀))

Calculate the derivative of f(x) = sin(x²); x → f₁ → x² := t → f₂ → sin(t) = sin(x²) := y; to calculate the derivative of y with respect to x, we must derive y with respect to t, and t with respect to x, and the order is not important as the product has the commutative property; (sin(t))' = cos(t) = cos(x²) = f₂'(f₁(x)); (t)' = (x²)' = 2x = f₁'(x); (sin(x²))' = 2x⋅cos(x²)

Calculate the derivative of f(x) = sin²(x) = (sin(x))²; x → f₁ → sin(x) := t → f₂ → t² = sin²(x) := y; we must derive the variable y with respect to x using the chain rule that is we derive y with respect to t and t with respect to x, and we make the product, and the order is not important because the product has the commutative property; (t²)' = 2t = 2sin(x); (t)' = (sin(x))' = cos(x); (sin²(x))' = 2sin(x)cos(x); this derivative can also be calculated with the product rule because sin²(x) = sin(x)sin(x), (sin²(x))' = (sin(x)sin(x))' = cos(x)sin(x)+sin(x)cos(x) = 2sin(x)cos(x)
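
The two examples sin(x²) and sin²(x) use the same inner and outer functions in opposite order, which a short sketch makes explicit. The helper `chain_rule` and the sample point x = 0.8 are assumptions.

```python
import math

def chain_rule(outer_prime, inner, inner_prime, x):
    """f2'(f1(x)) * f1'(x), with f1 = inner and f2 the outer function."""
    return outer_prime(inner(x)) * inner_prime(x)

def incremental_ratio(f, x0, h=1e-6):
    return (f(x0 + h) - f(x0)) / h

x = 0.8
# sin(x^2): inner x -> x^2, outer t -> sin(t)
d_sin_of_square = chain_rule(math.cos, lambda t: t * t, lambda t: 2 * t, x)
# sin^2(x): inner x -> sin(x), outer t -> t^2
d_square_of_sin = chain_rule(lambda t: 2 * t, math.sin, math.cos, x)

print(d_sin_of_square, incremental_ratio(lambda t: math.sin(t * t), x))
print(d_square_of_sin, incremental_ratio(lambda t: math.sin(t) ** 2, x))
```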

f₂(f₁(x)) ≠ f₁(f₂(x)); (f₂(f₁(x)))' ≠ (f₁(f₂(x)))'; in a compound function, the order of the functions is fundamental; the composition of functions is not always possible; f₂ after f₁ may be possible, but f₁ after f₂ may not be possible

Usually the independent variable is denoted by x and the dependent variable is denoted by y, but we can also use other symbols

f(t) = A⋅cos(ωt+γ); A, ω, γ are constants, t is the independent variable; t → ωt+γ := y → A⋅cos(y) := z; to derive z with respect to t, we must derive z with respect to y, and y with respect to t, and make the product of the 2 derivatives, that is dz/dt = (dz/dy)(dy/dt); (ωt+γ)' = ω, because ωt+γ is a first degree polynomial function and its derivative is ω; (A⋅cos(y))' = -A⋅sin(y) = -A⋅sin(ωt+γ); (A⋅cos(ωt+γ))' = (A⋅cos(y))'(ωt+γ)' = -A⋅sin(ωt+γ)⋅ω = -Aω⋅sin(ωt+γ)

Symbols to indicate the derivative: (f(x))' = f'(x), used by Lagrange; D(f(x)), used by Cauchy; if y = f(x) the derivative of y with respect to x is dy/dx, used by Leibniz; Newton used a dot over the variable, ẏ
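
A numerical check of the formula for the derivative of A⋅cos(ωt+γ). The values of the constants below are arbitrary assumptions chosen only for illustration.

```python
import math

A, OMEGA, GAMMA = 2.0, 3.0, 0.5  # assumed illustrative constants

def f(t):
    return A * math.cos(OMEGA * t + GAMMA)

def f_prime(t):
    """dz/dt = (dz/dy)(dy/dt) = (-A sin(y)) * omega, with y = omega*t + gamma."""
    return -A * OMEGA * math.sin(OMEGA * t + GAMMA)

t, h = 0.4, 1e-6
print((f(t + h) - f(t)) / h, f_prime(t))  # the two values agree to several digits
```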

The derivative of the inverse function is equal to the reciprocal of the derivative of the direct function: (f^-1)'(f(x)) = 1/f'(x); writing y = f(x), (f^-1)'(y) = 1/f'(f^-1(y)), provided that f'(x) ≠ 0

To be invertible, a function must be injective; function means that only one value of the dependent variable is associated with each value of the independent variable, so only one output corresponds to an input, therefore a vertical line meets the graph of a function at no more than one point; injective function means that distinct values of the independent variable are associated with distinct values of the dependent variable, so an output can be obtained from a single input, therefore a horizontal line meets the graph of an injective function at no more than one point

An inverse function allows us to return to the value of x; x → f → f(x) → f^-1 → f^-1(f(x)) = x; f^-1(f(x)) = x, so f^-1∘f is the identity function, whose derivative is 1; suppose that f'(x) ≠ 0 and that f^-1 is differentiable at f(x); by the chain rule (f^-1)'(f(x))⋅f'(x) = 1, so (f^-1)'(f(x)) and f'(x) must be the reciprocal of one another, but this is true only if the derivative of f^-1 at f(x) exists

A strictly increasing function is certainly injective; the derivative is the angular coefficient of the tangent line at a point; the graph of the inverse function is symmetrical to the graph of the direct function with respect to the bisector of the first and third quadrant; the tangent line at the symmetrical point of the inverse function has an angular coefficient which is the reciprocal of the angular coefficient of the tangent line of the direct function, that is (f^-1)'(f(x)) = 1/f'(x)

The derivative at a point of a function is zero when the tangent at the point is parallel to the x-axis; if the derivative at a point of an increasing function is zero, then the derivative of the inverse function, in the symmetrical point with respect to the bisector of the first and third quadrant, is +∞, therefore the inverse function is not differentiable at that point; if the derivative at a point of a decreasing function is zero, then the derivative of the inverse function, in the symmetrical point with respect to the bisector of the first and third quadrant, is -∞, therefore the inverse function is not differentiable at that point

If f^-1 is the inverse function of f, then f is the inverse function of f^-1

y = f(x) = √(x), x ≥ 0, y² = x, x = f^-1(y) = y², y ≥ 0, dy/dx = 1/(dx/dy) = 1/2y = 1/2√(x); lim_x→0(1/(2(√(x)))) = +∞

If a strictly increasing function has null derivative at a point, the inverse function has derivative +∞ at the corresponding point; if a strictly decreasing function has null derivative at a point, the inverse function has derivative -∞ at the corresponding point

If a function is increasing, its inverse function is also increasing; if a function is decreasing, its inverse function is also decreasing

The arctangent function is the inverse function of the tangent function restricted to the open interval (-π/2,π/2); y = arctan(x), x ∈ ℝ ⇔ x = tan(y), -π/2 < y < π/2; the derivative of the inverse function is equal to the reciprocal of the derivative of the direct function, dy/dx = 1/(dx/dy) = 1/(1+tan²(y)) = 1/(1+x²) > 0
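
The reciprocal rule for the arctangent can be checked numerically; the following is a minimal sketch, where the helper names numerical_derivative and inverse_derivative, the step h and the sample point x = 2 are illustrative assumptions that do not come from the notes

```python
import math

def numerical_derivative(f, x, h=1e-6):
    """Central-difference approximation of f'(x) (an assumed helper)."""
    return (f(x + h) - f(x - h)) / (2 * h)

def inverse_derivative(f, y, h=1e-6):
    """Derivative of the inverse of f at the point f(y), computed as 1 / f'(y)."""
    return 1.0 / numerical_derivative(f, y, h)

x = 2.0                      # sample point, chosen only for illustration
y = math.atan(x)             # so that x = tan(y), with -pi/2 < y < pi/2

via_reciprocal = inverse_derivative(math.tan, y)   # 1 / (dx/dy)
closed_form = 1.0 / (1.0 + x * x)                  # 1 / (1 + x^2)
direct = numerical_derivative(math.atan, x)        # dy/dx computed directly

print(via_reciprocal, closed_form, direct)
```

The three printed values should agree up to the error of the finite-difference approximation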

If a differentiable function is increasing on an interval its derivative is ≥ 0, and if the derivative is ≥ 0 on the interval the function is increasing; if a differentiable function is decreasing on an interval its derivative is ≤ 0, and if the derivative is ≤ 0 on the interval the function is decreasing

We can get the derivative of the exponential function from the derivative of the logarithm function, and we can get the derivative of the logarithm function from the derivative of the exponential function

The exponential function e^x has a derivative equal to 1 in the point (0,1), so in the point (0,1) the tangent is parallel to the bisector of the first and third quadrant; the logarithm function ln(x) has a derivative equal to 1 in the point (1,0), so in the point (1,0) the tangent is parallel to the bisector of the first and third quadrant; the tangent of the exponential function e^x at the point (0,1) is parallel to the tangent of the logarithm function ln(x) at the point (1,0), therefore they have the same angular coefficient, and both are symmetrical with respect to the bisector of the first and third quadrant

D(e^x) = e^x; y = e^x, x ∈ ℝ, x = ln(y), y > 0; the derivative of the exponential function is dy/dx = e^x; the derivative of the inverse function is equal to the reciprocal of the derivative of the direct function, dx/dy = 1/(dy/dx) = 1/e^x = 1/y, the derivative of the natural logarithm of y with respect to y is 1/y, D(ln(y)) = 1/y, therefore the derivative of the natural logarithm of x with respect to x is 1/x, D(ln(x)) = 1/x

The inverse function of sin(x) in the closed interval [-π/2,π/2] is arcsin(x); the inverse function of cos(x) in the closed interval [0,π] is arccos(x); the inverse function of tan(x) in the open interval (-π/2,π/2) is arctan(x)

When we talk about the maximum, minimum, supremum, infimum of a function, we are referring to the image of the function, that is the set of values it assumes; f: I → ℝ, if x₀ is an absolute maximum point, then f(x₀) is the maximum of the image of the function, and this maximum does not always exist; Weierstrass's theorem states that if f is continuous and I is a compact interval, that is a bounded and closed interval, then the maximum and the minimum exist, so there are points where the function assumes its maximum value and its minimum value; x₀ can also indicate a point where the function assumes its maximum or minimum only in a neighborhood of x₀, in which case it is not necessarily an absolute maximum or minimum point but a relative maximum or minimum point, so f(x₀) is the maximum or minimum that the function assumes in a neighborhood [x₀-δ, x₀+δ]

f(x) = x⁴-x² = x²(x²-1), the graph of the function intercepts the abscissa axis at the points (-1,0), (0,0), (1,0); for -1 < x < 1 and x ≠ 0, f(x) < 0; this function is unbounded above, therefore it has no maximum; it is continuous and tends to +∞ for x → ±∞, therefore it has an absolute minimum; the point x = 0 is a point of relative maximum, because f(0) = 0 ≥ f(x) in any neighborhood of radius < 1
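
As a sketch of how the critical points of this example can be checked, the code below uses the derivative f'(x) = 4x³-2x = 2x(2x²-1), whose zeros are x = 0 and x = ±1/√2; these values and the minimum value -1/4 are computed here and are not stated in the notes, and the grid used for the comparison is an arbitrary choice

```python
import math

def f(x):
    return x**4 - x**2

def f_prime(x):
    return 4 * x**3 - 2 * x          # = 2x(2x^2 - 1)

# Zeros of the derivative: a relative maximum at 0, absolute minima at +-1/sqrt(2)
critical_points = [-1 / math.sqrt(2), 0.0, 1 / math.sqrt(2)]
for x0 in critical_points:
    print(x0, f_prime(x0), f(x0))

# Compare with the values on a grid over [-2, 2] (illustrative range)
grid = [-2 + i * 0.001 for i in range(4001)]
grid_minimum = min(f(x) for x in grid)
print(grid_minimum)
```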

At a relative maximum the left derivative is ≥ 0, and the right derivative is ≤ 0

At a relative minimum the left derivative is ≤ 0, and the right derivative is ≥ 0

If a point is an absolute maximum it is also a relative maximum, but not vice versa; if a point is an absolute minimum it is also a relative minimum, but not vice versa

Consider the graph of a function with maximum point x₀; the derivative on the left is the angular coefficient of the ray tangent to the left; the derivative on the right is the angular coefficient of the ray tangent to the right; to the left of the maximum point x₀ the incremental ratio has numerator ≤ 0 and denominator < 0, so lim_x→x₀⁻((f(x)-f(x₀))/(x-x₀)) = f_l'(x₀) ≥ 0; to the right of the maximum point x₀ the numerator is ≤ 0 and the denominator is > 0, so lim_x→x₀⁺((f(x)-f(x₀))/(x-x₀)) = f_r'(x₀) ≤ 0
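
The signs of the one-sided incremental ratios can be seen on a function with a corner at its maximum point; the sketch below uses f(x) = -|x| with maximum point x₀ = 0, which is a hypothetical example chosen here and not taken from the notes, as are the helper names

```python
def f(x):
    # Hypothetical example: maximum at x0 = 0, with a corner there
    return -abs(x)

def left_quotient(f, x0, h):
    """Incremental ratio with x = x0 - h, to the left of x0 (h > 0)."""
    return (f(x0 - h) - f(x0)) / (-h)

def right_quotient(f, x0, h):
    """Incremental ratio with x = x0 + h, to the right of x0 (h > 0)."""
    return (f(x0 + h) - f(x0)) / h

for h in (1e-1, 1e-3, 1e-6):
    print(h, left_quotient(f, 0.0, h), right_quotient(f, 0.0, h))
```

In this example the left ratio is always 1 and the right ratio is always -1, so the left derivative is ≥ 0 and the right derivative is ≤ 0, but the two do not coincide and the derivative at the maximum point does not exist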

At an interior point of relative maximum, or relative minimum, where the derivative exists, the left and right derivatives coincide, so the derivative is zero and the tangent is parallel to the x-axis

If f is differentiable in an interval and admits a relative maximum or minimum point, inside this interval, the derivative at that point is zero

The zeros of the first derivative are called critical points or stationary points of the function

The points of relative maximum or minimum of a differentiable function that are interior to its domain are to be found among the critical points, that is the points where the derivative vanishes

This statement applies only to relative maximum or minimum points inside the definition interval; the function √(x) has an absolute minimum point, and therefore also a relative one, at x = 0, but this point is not inside the definition interval [0,+∞), it is an endpoint; the right derivative of √(x) at the minimum point is +∞, so it is not 0

The relative maximum and minimum points are to be found among the critical points within the function definition interval

37 - MAXIMUM AND MINIMUM POINTS

If a differentiable function admits a point of relative maximum or minimum inside the definition interval, at this point the derivative of the function is zero

If f is differentiable in an interval, and admits a relative maximum or minimum point within that interval, the derivative at that point is zero

The points where the first derivative is equal to 0 are called critical points or points of stationarity

The relative maximum or minimum points of a differentiable function, within its definition range, are critical points that are points where the derivative is equal to 0

A point of relative maximum or minimum, within the interval, has derivative 0, but the opposite is not always true, because a point that has derivative 0 may not be a point of relative maximum or minimum

f(x) = x³; it is an odd function, so the graph is symmetrical with respect to the origin and it passes through the points (-1,-1), (0,0), (1,1); the function is strictly increasing; the first derivative f'(x) = 3x² is zero for x = 0, therefore the point (0,0) is a critical point but it is not a point of relative maximum or minimum; in this case the point (0,0) is an inflection point, that is a point where the curve passes from one side of its tangent to the other
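
A short numerical sketch of the same example, with an arbitrary small step h chosen for illustration: the derivative vanishes at 0, but the function takes smaller values on the left and larger values on the right, so 0 is neither a maximum nor a minimum point

```python
def f(x):
    return x**3

def f_prime(x):
    return 3 * x**2

h = 1e-3  # illustrative step

is_critical = f_prime(0.0) == 0.0
passes_through = f(-h) < f(0.0) < f(h)   # below f(0) on the left, above on the right

print(is_critical, passes_through)
```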

Usually, considering the graph of a function, in a point with derivative 0 there is either a maximum point, or a minimum point, or an inflection point, but there are more complicated functions where a point with derivative equal to 0 may not be a maximum, minimum, or inflection point

Weierstrass's theorem says that a continuous function on a compact interval, that is a bounded and closed interval, has a maximum and a minimum

Combining Weierstrass's theorem with the fact that the derivative is zero at an interior point of relative maximum or minimum, we can understand the behavior of a function

We have a square sheet of paper with side L and we want to obtain an open container in the shape of a parallelepiped by cutting 4 squares of side x at the corners of the sheet and folding up the sides, and we must find the value of x that gives the parallelepiped with maximum volume; the parallelepiped has a square base of side L-2x and height x, therefore the volume as a function of x is the product of the area of the base by the height, V(x) = (L-2x)²x ≥ 0, 0 ≤ x ≤ L/2, so this function is 0 at the extremes and positive inside the interval

Weierstrass's theorem assures us that there is at least one maximum point; this maximum point is internal because at the extremes the function is 0, and since V is a third degree polynomial, hence differentiable, the first derivative at this internal maximum point is 0; if we find that within the interval there is only one critical point, then that critical point is necessarily the maximum point

V(x) = (L-2x)²x, V'(x) = 2(L-2x)(-2)x+(L-2x)² = -4x(L-2x)+(L-2x)² = (L-2x)(-4x+L-2x) = (L-2x)(L-6x); the critical points are the points where the first derivative is 0; the first zero of the first derivative is L-2x = 0, x = L/2, which is an extreme of the interval, not an interior point; the second zero is L-6x = 0, x = L/6, which is a point inside the interval; x = L/6 is the only critical point inside the interval, therefore x = L/6 is the maximum point; to obtain the parallelepiped of maximum volume we have to cut at the corners of the large square 4 squares whose side is 1/6 of the side of the large square

The maximum volume is V(L/6) = (L-2(L/6))²(L/6) = (L-(L/3))²(L/6) = (2L/3)²(L/6) = (4L²/9)(L/6) = 4L³/54 = 2L³/27 = (2/27)L³
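
The result x = L/6 can be compared with a brute-force search; this is only a sketch, where the side L = 6 and the grid of 1001 candidate values of x are arbitrary choices made for illustration

```python
def volume(x, L):
    """Volume of the box: square base of side L - 2x, height x."""
    return (L - 2 * x) ** 2 * x

def volume_prime(x, L):
    """Factored first derivative V'(x) = (L - 2x)(L - 6x)."""
    return (L - 2 * x) * (L - 6 * x)

L = 6.0                      # illustrative side of the sheet
x_star = L / 6               # critical point inside (0, L/2)

# Brute-force comparison on a grid over [0, L/2]
step = (L / 2) / 1000
grid = [i * step for i in range(1001)]
best = max(grid, key=lambda x: volume(x, L))

print(x_star, volume(x_star, L), 2 * L**3 / 27, best)
```

The grid maximizer should fall within one grid step of L/6, and the volume at L/6 should equal (2/27)L³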

Two variables x and y are ≥ 0 and their sum s ≥ 0 is fixed, and we must find for which values of x and y the product xy is maximum; x ≥ 0, y ≥ 0, x+y = s ⇔ y = s-x; we have to find the maximum of xy = x(s-x) = f(x), 0 ≤ x ≤ s; f(x) = x(s-x) = -x²+sx is a second degree polynomial function, the graph is a parabola, and since the coefficient of x² is negative the parabola turns its concavity downwards; the function is equal to 0 for x = 0 and for x = s, and by symmetry the point of maximum is x = s/2, which is the vertex of the parabola; the first derivative is f'(x) = -2x+s; setting the first derivative equal to 0 we find the maximum point, -2x+s = 0, x = s/2, and then y = s-x = s/2; the maximum value is f(s/2) = -(s/2)²+s(s/2) = (-s²/4)+(s²/2) = s²/4; therefore xy ≤ s²/4 = (x+y)²/4, and applying the square root √(xy) ≤ (x+y)/2, so the geometric mean of two non-negative numbers is less than or equal to their arithmetic mean, and the two means are equal when the product reaches its maximum value, that is when the numbers x and y are equal to each other
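
A small numerical check of this inequality, as a sketch: the sum s = 10 and the grid of 101 values of x are illustrative assumptions

```python
import math

def product_given_sum(x, s):
    """Product xy when y = s - x."""
    return x * (s - x)

s = 10.0                                    # illustrative fixed sum
xs = [i * s / 100 for i in range(101)]      # candidate values of x in [0, s]

best_x = max(xs, key=lambda x: product_given_sum(x, s))
print(best_x, product_given_sum(best_x, s), s**2 / 4)

# Geometric mean vs arithmetic mean for every pair (x, s - x) on the grid
am_gm_holds = all(math.sqrt(x * (s - x)) <= s / 2 + 1e-12 for x in xs)
print(am_gm_holds)
```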

A light beam starts from point A and reaches point B, moving through two mediums with different refractive indices; in the first medium the light beam travels the straight path from A to P with speed v₁, and in the second medium it travels the straight path from P to B with speed v₂ ≠ v₁; the shortest path would be the straight segment from A to B, but the light beam moves minimizing the travel time, and for this reason it follows a broken path from A to P and from P to B

In the Cartesian plane we consider the points A(0,a), B(c,b), P(x,0); we want to minimize time, and time is a function of x; space = velocity⋅time, s = v⋅t, so time = space/velocity, t = s/v; s₁ is the distance between point A and point P and is obtained using the Pythagorean theorem, s₁² = x²+a², s₁ = √(x²+a²), t₁ = s₁/v₁ = √(x²+a²)/v₁; s₂ is the distance between point P and point B, s₂² = (c-x)²+b², s₂ = √((c-x)²+b²), t₂ = s₂/v₂ = √((c-x)²+b²)/v₂; t(x) = t₁+t₂ = (√(x²+a²)/v₁)+(√((c-x)²+b²)/v₂)

We must find the minimum of the function t(x) with 0 ≤ x ≤ c, which exists by Weierstrass's theorem; we calculate the first derivative of t(x) and set it equal to 0; t'(x) = (1/v₁)(x/√(x²+a²))+(1/v₂)((x-c)/√((c-x)²+b²)) = 0, that is (1/v₁)(x/√(x²+a²)) = (1/v₂)((c-x)/√((c-x)²+b²)); since t'(0) < 0 and t'(c) > 0 the minimum is not at an extreme of the interval, so it is a critical point; we call x₀ the position of the critical point, (1/v₁)(x₀/√(x₀²+a²)) = (1/v₂)((c-x₀)/√((c-x₀)²+b²))

We have to understand what this equality means geometrically; the critical point is P = P₀ and its x coordinate is x₀; the segment AP = s₁ forms an angle α with the y-axis, and this angle is equal to the angle that AP forms with the perpendicular to the x-axis passing through the point P₀; α = angle of incidence, sin(α) = (opposite cathetus)/(hypotenuse) = x₀/√(x₀²+a²); the segment PB = s₂ forms an angle β with the perpendicular to the x-axis passing through point B, and this angle is equal to the angle that PB forms with the perpendicular to the x-axis passing through the point P₀; β = angle of refraction, sin(β) = (opposite cathetus)/(hypotenuse) = (c-x₀)/√((c-x₀)²+b²); therefore (1/v₁)sin(α) = (1/v₂)sin(β), sin(α)/sin(β) = v₁/v₂, and this is Snell's law: when a light ray passes through 2 mediums with different refractive indices, the ratio between the sine of the angle of incidence and the sine of the angle of refraction is equal to the ratio between the speed in the first medium and the speed in the second medium

It remains to prove that the critical point is unique, so that it coincides with the minimum point of t(x); we consider the equation (1/v₁)(x/√(x²+a²)) = (1/v₂)((c-x)/√((c-x)²+b²)); the function x/√(x²+a²) is 0 for x = 0, positive for x > 0, and has always positive derivative, therefore it is strictly increasing; the function (c-x)/√((c-x)²+b²) is positive when x < c, is 0 for x = c, and has always negative derivative, so it is strictly decreasing; the point where the equality is true is the intersection point of the graph of the function to the left of the equal sign and of the graph of the function to the right of the equal sign; a strictly increasing function and a strictly decreasing function can intersect at only one point, so there is a single point of intersection

f₁(x) = x/√(x²+a²), f₁(x) = g(x)/h(x), f₁'(x) = (g'(x)h(x)-g(x)h'(x))/h(x)², f₁'(x) = (√(x²+a²)-x(2x/(2√(x²+a²))))/(x²+a²) = (√(x²+a²)-(x²/√(x²+a²)))/(x²+a²) = ((x²+a²-x²)/√(x²+a²))/(x²+a²) = a²/(√(x²+a²)(x²+a²)) = a²/(x²+a²)^(3/2) > 0

f₂(x) = (c-x)/√((c-x)²+b²), f₂(x) = g(x)/h(x), f₂'(x) = (g'(x)h(x)-g(x)h'(x))/h(x)², f₂'(x) = (-√((c-x)²+b²)-(c-x)((2x-2c)/(2√((c-x)²+b²))))/((c-x)²+b²) = (-((c-x)²+b²)-(c-x)(x-c))/(√((c-x)²+b²)((c-x)²+b²)) = (-(c²-2cx+x²+b²)-(cx-x²-c²+cx))/(√((c-x)²+b²)((c-x)²+b²)) = (-c²+2cx-x²-b²+x²+c²-2cx)/((c-x)²+b²)^(3/2) = -b²/((c-x)²+b²)^(3/2) < 0; we have proved Snell's law
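
The critical point x₀ can also be located numerically and the ratio sin(α)/sin(β) compared with v₁/v₂; the sketch below uses bisection on t'(x), which is a technique chosen here and not the method of the notes, and the values of a, b, c, v₁, v₂ are hypothetical

```python
import math

def travel_time_derivative(x, a, b, c, v1, v2):
    """t'(x) = x / (v1 sqrt(x^2 + a^2)) + (x - c) / (v2 sqrt((c - x)^2 + b^2))."""
    return x / (v1 * math.hypot(x, a)) + (x - c) / (v2 * math.hypot(c - x, b))

def critical_point(a, b, c, v1, v2, tol=1e-12):
    """Bisection on t'(x) over [0, c]; relies on t'(0) < 0 < t'(c) and t' increasing."""
    lo, hi = 0.0, c
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if travel_time_derivative(mid, a, b, c, v1, v2) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical data, chosen only for illustration
a, b, c, v1, v2 = 1.0, 1.0, 2.0, 1.0, 0.75

x0 = critical_point(a, b, c, v1, v2)
sin_alpha = x0 / math.hypot(x0, a)              # sine of the angle of incidence
sin_beta = (c - x0) / math.hypot(c - x0, b)     # sine of the angle of refraction

print(x0, sin_alpha / sin_beta, v1 / v2)
```

The last two printed values should agree, which is the content of Snell's law for this data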

Descartes used the first letters of the alphabet for the known quantities and the last letters of the alphabet for the unknowns, also called variables

We have n pairs (x,y) and we represent them in the form of points in a Cartesian plane, and a generic point is (x_k,y_k); these points are approximately aligned, therefore we can hypothesize a link between x and y of the type y = mx+q, that is a linear relationship, also called affine, between x and y; the equation y = mx+q is represented by a straight line on the plane of the Cartesian axes; the points (x_k,y_k) are not exactly aligned, if they were exactly aligned we would take two of these points and find the line passing through them and through all the others; since the cloud of points is only approximately aligned, whatever m and q we choose the line y = mx+q does not pass exactly through all the points

The equation y = mx+q is a mathematical model, which we now denote by ŷ_k = mx_k+q, where k is the index of the point we are considering, x_k is the abscissa of the point, and ŷ_k is the estimated ordinate of the point, which in general is different from y_k, the experimental ordinate of the point; the difference ŷ_k-y_k = mx_k+q-y_k can be positive or negative; to treat the positive and negative differences in the same way we square them, (ŷ_k-y_k)² = (mx_k+q-y_k)², and we look for the values of m and q that minimize the sum of the squares of these differences, that is the function of 2 variables f(m,q) = ⁿΣ_k=1(mx_k+q-y_k)², where n is the number of points; the line obtained is the line of least squares

First we fix m and think of q as the only variable; fixing m means choosing the slope of the line, or its angular coefficient, and varying q means varying the y-intercept, so we look, among all the lines that have slope m, for the one that minimizes the sum of the squares of the differences; developing the squares we obtain a second degree trinomial in q, with second degree term nq², a first degree term in q, and a known term without q; if we put q on the abscissa the graph is a parabola, and since the coefficient n is positive the parabola has the concavity facing upwards, so the vertex is the only minimum point and it is obtained by setting the first derivative with respect to q equal to 0; D_qf = 2⋅ⁿΣ_k=1(mx_k+q-y_k) = 0, ⁿΣ_k=1(mx_k+q-y_k) = 0, m⋅ⁿΣ_k=1(x_k)+nq-ⁿΣ_k=1(y_k) = 0, m(1/n)ⁿΣ_k=1(x_k)+q = (1/n)ⁿΣ_k=1(y_k); the sum of n numbers divided by n is the arithmetic mean, so with x̄ = (1/n)ⁿΣ_k=1(x_k) and ȳ = (1/n)ⁿΣ_k=1(y_k) we have mx̄+q = ȳ, q = ȳ-mx̄; the line we are looking for passes through the point (x̄,ȳ), which is the center of gravity of the cloud of points, the point whose coordinates are the mean of the x-values and the mean of the y-values; therefore among all the straight lines that have an assigned slope m, the one that minimizes the sum of the squares of the differences is the one that passes through the center of gravity

Now we must find the optimal line among all the lines that pass through the center of gravity; f(m,q) = ⁿΣ_k=1(mx_k+q-y_k)² = ⁿΣ_k=1(mx_k+ȳ-mx̄-y_k)² = ⁿΣ_k=1(m(x_k-x̄)-(y_k-ȳ))², f(m) = ⁿΣ_k=1(m(x_k-x̄)-(y_k-ȳ))²; it is a second degree polynomial in m and the coefficient of m², ⁿΣ_k=1(x_k-x̄)², is positive when the x_k are not all equal, so the graph of this function in the variable m is a parabola with concavity upward; we calculate the derivative of f with respect to m and set it equal to 0, D_mf = 2⋅ⁿΣ_k=1(m(x_k-x̄)-(y_k-ȳ))(x_k-x̄) = 0, m = (ⁿΣ_k=1(x_k-x̄)(y_k-ȳ))/(ⁿΣ_k=1(x_k-x̄)²); m is the angular coefficient of the least squares line and q = ȳ-mx̄ is its y-intercept; we have found the linear model, also called affine model, which allows us to find the line that best approximates a distribution of points on the plane
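
The two formulas, the slope as a ratio of sums of centered products and the intercept from the center of gravity, can be sketched in a few lines; the function name least_squares_line and the sample data are hypothetical and serve only as an illustration

```python
def least_squares_line(xs, ys):
    """Return (m, q) of the least squares line y = m x + q.

    Assumes at least two points and that the x-values are not all equal.
    """
    n = len(xs)
    x_bar = sum(xs) / n                       # mean of the abscissas
    y_bar = sum(ys) / n                       # mean of the ordinates
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sxy / sxx                             # slope
    q = y_bar - m * x_bar                     # line passes through (x_bar, y_bar)
    return m, q

# Hypothetical, approximately aligned points
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]

m, q = least_squares_line(xs, ys)
print(m, q)
```

When the points are exactly aligned the same function returns the line through them, for example the points (0,1), (1,3), (2,5) give m = 2 and q = 1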

38 - MEAN VALUE THEOREM

We have a function f defined in a closed interval [a,b] with real values, and suppose that the function is continuous in [a,b], differentiable at the interior points, and that f(a) = f(b), then there is at least one point inside the interval (a,b), which we indicate with the Greek letter ξ (xi), where the first derivative is 0; if f: [a,b] → ℝ, f is continuous in [a,b], ∃ f'(x) ∀ x ∈ (a,b), f(a) = f(b), then ∃ ξ ∈ (a,b) such that f'(ξ) = 0; this is Rolle's theorem; Rolle's theorem was originally stated for polynomial functions, but this restriction is inessential; Weierstrass's theorem assures us that this function has a maximum and a minimum, so there is a point x₁ where the function reaches the minimum m, f(x₁) = m, and there is a point x₂ where the function reaches the maximum M, f(x₂) = M; if x₁ and x₂ are both extremes of the interval, then since f(a) = f(b) we have m = M, therefore the function is constant, and at all points ξ of the interval (a,b) the first derivative is zero, in fact the derivative of a constant is 0; otherwise at least one of x₁ and x₂ is an interior point, suppose it is x₂, then x₂ is an absolute maximum point and therefore also a relative maximum point, and since the function is differentiable there, x₂ is a critical point, so f'(x₂) = 0 and we can take ξ = x₂; if we drop the hypothesis f(a) = f(b), the thesis fails, for example for the function f(x) = x, or for any function f(x) = mx+q with m ≠ 0: the function is continuous in the compact interval and differentiable, but there are no points where the first derivative is zero

Rolle's theorem: if f is continuous in the interval [a,b] and differentiable at least at the points inside this interval, and f(a) = f(b), then there is a point in the open interval (a,b) where the derivative of f is 0

In Rolle's theorem f(a) = f(b), that is at the extremes of the interval the graph of the function has the same ordinate, and within this interval there is at least one point where the first derivative is zero, that is where the tangent line is horizontal, so it is parallel to the x-axis and it is also parallel to the secant that passes through the points (a,f(a)) and (b,f(b))

The function f is defined and continuous in the interval [a,b] with real values, f: [a,b] → ℝ, and it is differentiable at least at the interior points, ∃ f'(x) ∀ x ∈ (a,b), then there is at least one point ξ ∈ (a,b) at which the first derivative is equal to the incremental ratio over the interval, ∃ ξ ∈ (a,b) such that f'(ξ) = (f(b)-f(a))/(b-a); here f(a) need not equal f(b); f'(ξ) is the slope of the tangent to the graph at the point (ξ,f(ξ)), (f(b)-f(a))/(b-a) is the angular coefficient of the secant passing through the points (a,f(a)) and (b,f(b)), therefore the tangent to the graph at the point (ξ,f(ξ)) is parallel to the secant passing through the extremes, and this is Lagrange's theorem or mean value theorem

Lagrange's theorem or mean value theorem: if f is continuous in the interval [a,b] and differentiable at least at the points inside this interval, then there is a point in the open interval (a,b) at which the derivative of f is (f(b)-f(a))/(b-a)

To prove Lagrange's theorem, or mean value theorem, we use Rolle's theorem; we use an auxiliary function F(x) which is the difference between f(x) and a function g(x) which we choose to be a simple function like a first degree polynomial, so g(x) is a linear or affine function and has a straight line as a graph; F(x) := f(x)-g(x), and we choose g(x) so that it coincides with f(x) at the extremes of the interval, g(a) = f(a), g(b) = f(b), therefore F(x) is null at the extremes a and b; if the function g(x) is a first degree polynomial and at the extremes a and b coincides with the function f(x), then the graph of g(x) is the secant passing through the points (a,f(a)) and (b,f(b)); g(x) = f(a)+((f(b)-f(a))/(b-a))(x-a), where (f(b)-f(a))/(b-a) is the slope of the secant and is a constant that we could indicate with m; if x = a then g(x) = f(a), if x = b then g(x) = f(b); F(x) = f(x)-f(a)-((f(b)-f(a))/(b-a))(x-a); this function F(x) verifies the hypotheses of Rolle's theorem, because it is continuous in [a,b] and differentiable in (a,b) like f(x), and at the extremes it is 0, therefore it also verifies the thesis of Rolle's theorem, so there is a point ξ in which its first derivative is 0; F'(x) = f'(x)-(f(b)-f(a))/(b-a), so F'(ξ) = 0 means f'(ξ) = (f(b)-f(a))/(b-a), which is precisely the thesis of Lagrange's theorem that we have therefore proved
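
The theorem guarantees that the point ξ exists but does not say how to find it; as a sketch, ξ can be located numerically by bisection on F'(x) = f'(x)-(f(b)-f(a))/(b-a), under the extra assumption (not part of the theorem) that F' changes sign between a and b; the function name mean_value_point and the example f(x) = x³ on [0,2] are illustrative choices

```python
import math

def mean_value_point(f, f_prime, a, b, tol=1e-12):
    """Find xi in (a, b) with f'(xi) = (f(b) - f(a)) / (b - a) by bisection.

    Assumes f' - slope is continuous and has opposite signs at a and b.
    """
    slope = (f(b) - f(a)) / (b - a)
    g = lambda x: f_prime(x) - slope          # this is F'(x) in the proof
    lo, hi = a, b
    if g(lo) * g(hi) > 0:
        raise ValueError("bisection needs a sign change of f' - slope on [a, b]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(lo) * g(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

# Illustrative example: f(x) = x^3 on [0, 2], secant slope (8 - 0) / 2 = 4
xi = mean_value_point(lambda x: x**3, lambda x: 3 * x**2, 0.0, 2.0)
print(xi, 2 / math.sqrt(3))
```

For this example 3ξ² = 4, so ξ = 2/√3, and the two printed values should agree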

If a function is constant, its derivative is null everywhere; if a function has a null derivative in all points of an interval, then all points in the interval are critical points, so in all points of the interval the tangent line is horizontal, therefore the function is constant

If a function has a null derivative at all points of an interval, it is constant over that interval

Suppose that a function has null derivative at all points of an interval, f'(x) = 0, ∀ x ∈ I; we consider two generic points x₀ and x of the interval and apply Lagrange's theorem to the interval of endpoints x₀ and x, so there is a point ξ between x₀ and x such that f'(ξ) = (f(x)-f(x₀))/(x-x₀); since the derivative is null at all points, in particular it is null at the point ξ, so the ratio (f(x)-f(x₀))/(x-x₀) is null and f(x) = f(x₀); x is any point of the interval, therefore f(x) is a constant function

f'(x) ≥ 0, f'(ξ) = (f(x₂)-f(x₁))/(x₂-x₁) ≥ 0, if x₂ ≥ x₁ then f(x₂) ≥ f(x₁), and if x₂ ≤ x₁ then f(x₂) ≤ f(x₁), the function is increasing

f'(x) > 0, f'(ξ) = (f(x₂)-f(x₁))/(x₂-x₁) > 0, if x₂ > x₁ then f(x₂) > f(x₁), and if x₂ < x₁ then f(x₂) < f(x₁), the function is strictly increasing

f'(x) ≤ 0, f'(ξ) = (f(x₂)-f(x₁))/(x₂-x₁) ≤ 0, if x₂ ≥ x₁ then f(x₂) ≤ f(x₁), and if x₂ ≤ x₁ then f(x₂) ≥ f(x₁), the function is decreasing

f'(x) < 0, f'(ξ) = (f(x₂)-f(x₁))/(x₂-x₁) < 0, if x₂ > x₁ then f(x₂) < f(x₁), and if x₂ < x₁ then f(x₂) > f(x₁), the function is strictly decreasing

A function is increasing in an interval if it has derivative ≥ 0 at all points in the interval

A function is strictly increasing in an interval if it has derivative > 0 at all points in the interval

A function is decreasing in an interval if it has derivative ≤ 0 at all points in the interval

A function is strictly decreasing in an interval if it has derivative < 0 at all points in the interval

Cauchy's theorem: if f and g are continuous in the interval [a,b] and differentiable at least in (a,b) and g(b) ≠ g(a), then there is a point ξ in (a,b) where either f'(ξ) = g'(ξ) = 0, or f'(ξ)/g'(ξ) = (f(b)-f(a))/(g(b)-g(a))

f,g: [a,b] → ℝ continuous, g(a) ≠ g(b), ∃ f'(x) ∀ x ∈ (a,b), ∃ g'(x) ∀ x ∈ (a,b), then ∃ ξ ∈ (a,b) such that either f'(ξ) = g'(ξ) = 0, or f'(ξ)/g'(ξ) = (f(b)-f(a))/(g(b)-g(a)); if g'(x) ≠ 0 ∀ x ∈ (a,b), then the first alternative f'(ξ) = g'(ξ) = 0 cannot occur; if g'(x) ≠ 0 ∀ x ∈ (a,b), then automatically g(a) ≠ g(b), because by Rolle's theorem if g(a) = g(b) there would be a point where g'(x) = 0

If in Cauchy's theorem we introduce the hypothesis g(x) = x, then Cauchy's theorem coincides with Lagrange's theorem because g(b) = b and g(a) = a and the derivative of g(x) = x is 1 at every point, therefore f'(ξ) = (f(b)-f(a))/(b-a), which is Lagrange's theorem; if in Lagrange's mean value theorem we introduce the hypothesis f(b) = f(a), then we find Rolle's theorem, because if f(b) = f(a) the numerator of the incremental ratio (f(b)-f(a))/(b-a) is 0; Cauchy's theorem is a generalization of Lagrange's theorem and Lagrange's theorem is a generalization of Rolle's theorem

If the derivative is > 0 at all points of an interval, then the function is strictly increasing, but the reverse is not true; if the derivative is < 0 at all points of an interval, then the function is strictly decreasing, but the reverse is not true; the function x³ is strictly increasing on all ℝ but at the point x = 0 the derivative is zero; if a differentiable function is increasing or strictly increasing its derivative is ≥ 0, because it is the limit of an incremental ratio that is ≥ 0, and a positive quantity can tend to a positive or null limit; likewise if a function is strictly decreasing then the derivative is ≤ 0, but we cannot exclude that at some point this derivative is 0

The proof of Cauchy's theorem is similar to that of Lagrange's theorem; we construct an auxiliary function F(x) = f(x)-f(a)-((f(b)-f(a))/(g(b)-g(a)))(g(x)-g(a)); F(x) is continuous and differentiable like f(x) and g(x), if x = a then F(x) = 0, if x = b then F(x) = 0; this auxiliary function verifies the hypotheses of Rolle's theorem and therefore also verifies the thesis of Rolle's theorem, so there is a point ξ where F'(ξ) = f'(ξ)-((f(b)-f(a))/(g(b)-g(a)))g'(ξ) = 0, that is f'(ξ) = ((f(b)-f(a))/(g(b)-g(a)))g'(ξ); if g'(ξ) = 0 then also f'(ξ) = 0, which is the first alternative; if g'(ξ) ≠ 0 we can divide and obtain f'(ξ)/g'(ξ) = (f(b)-f(a))/(g(b)-g(a)), which is the second alternative; we have proved Cauchy's theorem

f'(ξ)/g'(ξ) = (f(b)-f(a))/(g(b)-g(a)), f'(ξ)/(f(b)-f(a)) = g'(ξ)/(g(b)-g(a))

A geometric interpretation can be given to Cauchy's theorem; f'(ξ)/g'(ξ) = (f(b)-f(a))/(g(b)-g(a)), f'(ξ)/(f(b)-f(a)) = g'(ξ)/(g(b)-g(a)); we consider time as the independent variable and we indicate it with the letter t, and we have two functions x = f(t) and y = g(t) which are the parametric representation of a curve in the xy plane; when t goes from a to b, the curve starts from the point (f(a),g(a)) and arrives at the point (f(b),g(b)), and it is a continuous curve because f(t) and g(t) are continuous functions, and at the interior points of the interval [a,b] there exist f'(t) and g'(t); the pair (f'(t),g'(t)) represents the tangent vector, which is the velocity vector, so f'(t) and g'(t) are the components of the velocity vector at instant t; (f(b)-f(a),g(b)-g(a)) are the components of the displacement vector, that is the vector that joins the initial point with the final point of the displacement; f'(ξ)/(f(b)-f(a)) = g'(ξ)/(g(b)-g(a)) means that there is an instant ξ in which the components of the velocity vector (f'(ξ),g'(ξ)) are proportional to the components of the displacement vector (f(b)-f(a),g(b)-g(a)), so there exists an instant ξ in which the velocity vector is parallel to the displacement vector, where parallel means same direction or opposite direction

Cauchy's theorem is also called the finite growth theorem, or finite increments theorem, where finite growth or finite increment indicates the variations f(b)-f(a) and g(b)-g(a); at the time of Cauchy the term finite was opposed to the term infinitesimal; Cauchy's theorem concerns the variations of function f and function g in the passage from point a to point b

Cauchy's mean value theorem, also known as the extended mean value theorem, is a generalization of the mean value theorem; it states that if the functions f and g are both continuous on the closed interval [a,b] and differentiable on the open interval (a,b), then there exists some c ∈ (a,b), such that (f(b)-f(a))g'(c) = (g(b)-g(a))f'(c); if g(a) ≠ g(b) and g'(c) ≠ 0, then f'(c)/g'(c) = (f(b)-f(a))/(g(b)-g(a)); geometrically this means that there is some tangent to the graph of the curve {[a,b] → ℝ², t → (f(t),g(t))} which is parallel to the line defined by the points (f(a),g(a)) and (f(b),g(b))
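
The statement above can be checked numerically on a concrete pair of functions. The following is only a sketch: the functions f(t) = t² and g(t) = t³ on [0,1], the helper name cauchy_point and the bracketing interval [0.1,0.9] are assumptions chosen for illustration, and bisection is just one possible way of locating the point c.

```python
def cauchy_point(f, g, df, dg, a, b, lo, hi, tol=1e-12):
    """Bisection for c in (lo, hi) with (f(b)-f(a))*g'(c) = (g(b)-g(a))*f'(c).

    Assumes the function h below changes sign between lo and hi.
    """
    def h(c):
        return (f(b) - f(a)) * dg(c) - (g(b) - g(a)) * df(c)

    if h(lo) * h(hi) > 0:
        raise ValueError("h does not change sign on [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if h(lo) * h(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


# Hypothetical example: f(t) = t^2, g(t) = t^3 on [0, 1].
# Here h(c) = 3c^2 - 2c, so the point found should be close to 2/3.
c = cauchy_point(lambda t: t**2, lambda t: t**3,
                 lambda t: 2 * t, lambda t: 3 * t**2,
                 a=0.0, b=1.0, lo=0.1, hi=0.9)
print(c)
```

At this point f'(c)/g'(c) = 2/(3c) = 1, which is the same as (f(1)-f(0))/(g(1)-g(0)) = 1.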

When we solve the limit of a quotient we must assume that the function in the denominator must be different from 0 and must tend towards a limit other than 0; when we solve the limit of a sum or a difference we must assume that the limits both exist and that they are finite; lim_x→∞(f(x)) = ∞, lim_x→∞(g(x)) = ∞, lim_x→∞(f(x)-g(x)) = ∞-∞ that is not 0 but it is an indeterminate form; lim_x→∞(x²-x) = ∞-∞ is an indeterminate form, lim_x→∞(x(x-1)) = +∞·+∞ = +∞, in fact ∞-∞ ≠ 0; lim_x→0(f(x)) = 0, lim_x→0(g(x)) = 0, lim_x→0(f(x)/g(x)) = 0/0 that is an indeterminate form; lim_x→0(sin(x)) = 0, lim_x→0(x) = 0, lim_x→0(sin(x)/x) = 0/0 that is an indeterminate form, lim_x→0(sin(x)/x) = 1; the derivative is the limit of an incremental ratio, lim_x→x₀((f(x)-f(x₀))/(x-x₀)) = 0/0 when f is continuous, and 0/0 is an indeterminate form; a limit can be finite, infinite, null, non-null, it may not exist when the function is irregular or oscillating; limits of the type 0/0, ∞-∞, 0·∞, are indeterminate forms, forms of indecision, it does not mean that the limit does not exist, but that the information we have is insufficient to determine if the limit exists or does not exist and when it exists what is its value; knowing that the numerator tends to 0 and the denominator tends to 0 is not sufficient to decide the behavior at the limit of this ratio, it is therefore a form of indecision or an indeterminate form

Guillaume de l'Hôpital was a French mathematician, a friend of the Swiss mathematician Johann Bernoulli; de l'Hôpital's theorem is used to compute limits of the type f(x)/g(x) when f(x) and g(x) are both infinitesimal functions that are functions converging to 0

39 - L'HOSPITAL'S RULE

The rule of De l'Hôpital, or L'Hospital's theorem, allows us to compute limits of quotients of real functions of a real variable that result in the indeterminate forms 0/0 and ∞/∞; it states that the limit of the quotient of two functions is equal to the limit of the quotient of their derivatives, provided that the latter limit exists

f(x) and g(x) are 2 continuous functions defined in an interval I of the real line, and at a point x₀ inside the interval both functions are 0, lim_x→x₀(f(x)) = 0 and lim_x→x₀(g(x)) = 0, therefore f(x) and g(x) are infinitesimal functions, that is functions that tend to 0, as x tends to x₀; f,g: I → ℝ, x₀ ∈ I, f(x₀) = g(x₀) = 0, lim_x→x₀(f(x)) = 0, lim_x→x₀(g(x)) = 0; we want to calculate lim_x→x₀(f(x)/g(x)); suppose that g(x) ≠ 0 for x ≠ x₀, that f(x) and g(x) are differentiable in the interval I, and that g'(x) ≠ 0 for x ≠ x₀; then L'Hospital's theorem states that lim_x→x₀(f'(x)/g'(x)) = L ⇒ lim_x→x₀(f(x)/g(x)) = L; it is important to note that the ratio of the derivatives is different from the derivative of the ratio, that is f'(x)/g'(x) ≠ (f(x)/g(x))'; if the limit of the ratio of the derivatives exists, finite or infinite, then the limit of the ratio of the functions also exists and has the same value, but not vice versa: there are situations in which lim_x→x₀(f(x)/g(x)) exists but lim_x→x₀(f'(x)/g'(x)) does not exist, and in this case L'Hospital's rule cannot be used

L'Hôpital's rule: if f(x) and g(x) are continuous in the interval [a,b], null in x₀ and differentiable for x ≠ x₀ with g'(x) ≠ 0, then if the limit lim_x→x₀(f'(x)/g'(x)) exists, the limit lim_x→x₀(f(x)/g(x)) also exists and has the same value

To prove de l'Hospital's theorem we use Cauchy's theorem, applying it to the interval with endpoints x₀ and x; we assumed that g'(x) ≠ 0, therefore the first of the two alternatives of the thesis of Cauchy's theorem, that there is a point ξ in which f'(ξ) and g'(ξ) are both 0, cannot occur; the second alternative of the thesis of Cauchy's theorem holds: considering the interval [x₀,x], all the hypotheses of Cauchy's theorem are verified, so there is a point ξ = ξ(x) between x₀ and x where f'(ξ)/g'(ξ) = (f(x)-f(x₀))/(g(x)-g(x₀)), but f(x₀) = 0 and g(x₀) = 0, then f'(ξ)/g'(ξ) = (f(x)-f(x₀))/(g(x)-g(x₀)) = f(x)/g(x); suppose that the limit of f'(x)/g'(x) exists and is L; given a tolerance ε, we can find a δ dependent on ε, so that L-ε < f'(x)/g'(x) < L+ε for x ≠ x₀ and |x-x₀| < δ_ε; since ξ lies between x₀ and x, |ξ-x₀| < δ_ε and L-ε < f'(ξ)/g'(ξ) < L+ε, therefore if x ≠ x₀ and |x-x₀| < δ_ε then L-ε < f(x)/g(x) < L+ε, and this is the definition of limit, so lim_x→x₀(f(x)/g(x)) = lim_x→x₀(f'(x)/g'(x)) = L; the proof is also true for lim_x→x₀(f'(x)/g'(x)) = +∞ considering f'(x)/g'(x) > M; the proof is also true for lim_x→x₀(f'(x)/g'(x)) = -∞ considering f'(x)/g'(x) < -M; L'Hospital's theorem is also true when the functions f(x) and g(x) tend to ∞; the cases 0/0 and ∞/∞ are closely related because f(x)/g(x) = (1/g(x))/(1/f(x)), so if f(x) and g(x) tend to 0, then 1/f(x) and 1/g(x) tend to +∞ or -∞ depending on the sign of f(x) and g(x), and vice versa, if f(x) and g(x) tend to ∞ then 1/f(x) and 1/g(x) tend to 0

The implication contained in de l'Hospital's theorem cannot be reversed: lim_x→x₀(f(x)/g(x)) can exist even when lim_x→x₀(f'(x)/g'(x)) does not exist

f(x) = x²⋅sin(1/x), x ≠ 0, g(x) = x, f(x)/g(x) = x²⋅sin(1/x)/x = x⋅sin(1/x), |sin(1/x)| ≤ 1, lim_x→0(x) = 0, |x⋅sin(1/x)| ≤ |x|, lim_x→0(x⋅sin(1/x)) = 0; f'(x)/g'(x) = 2x⋅sin(1/x)+x²⋅cos(1/x)⋅(-1x^-2) = 2x⋅sin(1/x)-x²⋅cos(1/x)⋅(1/x²) = 2x⋅sin(1/x)-cos(1/x); lim_x→0(2x⋅sin(1/x)-cos(1/x)) = lim_x→0(2x⋅sin(1/x))-lim_x→0(cos(1/x)), lim_x→0(2x⋅sin(1/x)) = 0, lim_x→0(cos(1/x)) = ∄, lim_x→0(2x⋅sin(1/x)-cos(1/x)) = lim_x→0(2x⋅sin(1/x))-lim_x→0(cos(1/x)) = ∄; this example shows that in some situations L'Hospital's theorem cannot be applied
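
The behavior of this counterexample can be observed numerically. The sketch below evaluates both quotients along two sequences of points tending to 0; the sample points 1/(2kπ) and 1/((2k+1)π) and the function names are assumptions chosen so that cos(1/x) is alternately 1 and -1.

```python
import math

def ratio(x):
    # f(x)/g(x) = x*sin(1/x), which tends to 0 as x -> 0
    return x * math.sin(1 / x)

def derivative_ratio(x):
    # f'(x)/g'(x) = 2x*sin(1/x) - cos(1/x), which keeps oscillating
    return 2 * x * math.sin(1 / x) - math.cos(1 / x)

for k in (10, 100, 1000):
    x_even = 1 / (2 * k * math.pi)        # cos(1/x) = 1
    x_odd = 1 / ((2 * k + 1) * math.pi)   # cos(1/x) = -1
    print(k, ratio(x_even), derivative_ratio(x_even), derivative_ratio(x_odd))
```

Along the first sequence the quotient of the derivatives stays close to -1, along the second it stays close to +1, so it has no limit, while the quotient of the functions shrinks to 0.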

lim_x→0((1-cos(x))/x²) =^H lim_x→0(sin(x)/2x) = 1/2, because lim_x→0(sin(x)/x) = 1; the letter H indicates that we apply L'Hospital's theorem, it is a conditional equality, after we show that the second member exists, finite or infinite, we can say that the first member also exists and that they are equal; 1-cos(x) tends to zero with the same speed as x² because this ratio produces a finite limit other than 0

lim_x→0((1-cos(x))/x) =^H lim_x→0(sin(x)/1) = lim_x→0(sin(x)) = 0 that is 1-cos(x) tends to 0 faster than x, in fact 1-cos(x) tends to 0 with the same speed as x²

lim_x→0((x-sin(x))/x³) =^H lim_x→0((1-cos(x))/3x²) =^H lim_x→0(sin(x)/6x) = 1/6, because lim_x→0(sin(x)/x) = 1

When the limit of the ratio of 2 functions tends to 1, then the 2 functions are asymptotically equal, they behave asymptotically in the same way; lim_x→x₀(f(x)/g(x)) = 1, f(x) ~ g(x); the symbol ~ is called tilde, and means asymptotically equal; sin(x) ~ x, x→0; 1-cos(x) ~ x²/2, x→0; x-sin(x) ~ x³/6, x→0
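
The three asymptotic equalities can be checked numerically by dividing each function by its claimed equivalent. This is a rough sketch: the function name ratios and the sample values of x are assumptions, and values of x that are too small would suffer from floating point cancellation.

```python
import math

def ratios(x):
    """Return the three quotients that should approach 1 as x -> 0."""
    return (
        math.sin(x) / x,
        (1 - math.cos(x)) / (x**2 / 2),
        (x - math.sin(x)) / (x**3 / 6),
    )

for x in (0.5, 0.1, 0.01):
    print(x, ratios(x))
```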

lim_x→∞(√(1+x²)/x) =^H lim_x→∞(2x/2√(1+x²)) = lim_x→∞(x/√(1+x²)), in this case the L'Hospital's rule is ineffective; lim_x→∞(√(1+x²)/x) = lim_x→∞(√(1+x²)/√(x²)) = lim_x→∞(√((1/x²)+1)) = 1; √(1+x²) ~ x, x→+∞, so y = x is the asymptote to the right of the function y = √(1+x²) as x tends to +∞

lim_x→0⁺(x⋅ln(x)) = 0⋅(-∞), lim_x→0⁺(x) = 0, lim_x→0⁺(ln(x)) = -∞, the function x⋅ln(x) is negative for 0 < x < 1, lim_x→0⁺(x⋅ln(x)) = lim_x→0⁺(ln(x)/(1/x)) = -∞/+∞, lim_x→0⁺(x⋅ln(x)) = lim_x→0⁺(x/(1/ln(x))) = 0/0; lim_x→0⁺(x⋅ln(x)) = lim_x→0⁺(ln(x)/(1/x)) =^H lim_x→0⁺((1/x)/(-1/x²)) = lim_x→0⁺((1/x)(-x²)) = lim_x→0⁺(-x) = 0; x prevails over ln(x) because x tends to 0 faster than ln(x) tends to -∞, so the limit is 0; lim_x→0⁺(x⋅ln(x)) = lim_x→0⁺(x/(1/ln(x))) =^H lim_x→0⁺(1/(-1/(x⋅ln²(x)))) = lim_x→0⁺(-x⋅ln²(x)), which is more complicated than the starting limit, so it is not the easiest way to proceed
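
A quick numerical look at x⋅ln(x) near 0 from the right agrees with the computed limit. The function name x_ln_x and the sample points are assumptions made only for this sketch.

```python
import math

def x_ln_x(x):
    return x * math.log(x)

for x in (1e-1, 1e-3, 1e-6, 1e-9):
    # last column: -x, the quotient obtained after one application of the rule
    print(x, x_ln_x(x), -x)
```

The values are negative and shrink towards 0, more slowly than -x because of the factor ln(x).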

lim_x→+∞(ln(x)/x) = +∞/+∞; the tangent to the curve of the function ln(x) at the point (1,0) has angular coefficient 1, and therefore is parallel to the bisector of the first and third quadrant; the function ln(x) grows slower than the linear function x, therefore we expect the limit of their ratio to be equal to 0; lim_x→+∞(ln(x)/x) =^H lim_x→+∞(1/x) = 0

lim_x→+∞(e^x/x) = +∞/+∞; the exponential function is the inverse function of the logarithm function; e^x is the inverse function of ln(x) and their graphs are symmetrical with respect to the bisector of the first and third quadrant; the tangent to the curve of the function e^x in the point (0,1) has angular coefficient 1, and therefore is parallel to the bisector of the first and third quadrant; the exponential function e^x grows faster than the linear function x, so we expect the limit of their ratio to be equal to +∞; lim_x→+∞(e^x/x) =^H lim_x→+∞(e^x) = +∞

lim_x→+∞(e^x/x²) = +∞/+∞; we study the trend of the exponential function e^x with respect to the parabola function x²; lim_x→+∞(e^x/x²) =^H lim_x→+∞(e^x/2x) =^H lim_x→+∞(e^x/2) = +∞, therefore the exponential function e^x grows faster than the parabola function x²

We conjecture that e^x goes to infinity faster than x, faster than x², faster than xⁿ, whatever the positive integer exponent n; n ≥ 1, lim_x→+∞(e^x/xⁿ) =^H lim_x→+∞(e^x/(n⋅x^(n-1))) =^H lim_x→+∞(e^x/(n(n-1)⋅x^(n-2))) =^H lim_x→+∞(e^x/(n(n-1)(n-2)⋅x^(n-3))) =^H ...; applying the rule of L'Hospital the numerator remains unchanged and the denominator takes as value the successive derivatives of xⁿ, and after having differentiated n times, in the numerator we have e^x, and in the denominator we have the product of the numbers from n to 1, which is n!, a constant, so this ratio tends to +∞, and in conclusion e^x grows faster than xⁿ whatever the positive integer exponent n is; e^x is a higher order infinity than xⁿ
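
The two ingredients of this argument can be illustrated with a short sketch: repeated differentiation of xⁿ leaves the constant n!, and the quotient e^x/xⁿ grows without bound. The function names and the choice n = 5 are assumptions for illustration only.

```python
import math

def growth_ratio(x, n):
    """e^x / x^n for x > 0."""
    return math.exp(x) / x**n

def nth_derivative_constant(n):
    """Differentiate x^n exactly n times; the result is the constant n!."""
    coeff, power = 1, n
    while power > 0:
        coeff *= power   # d/dx (c*x^p) = c*p*x^(p-1)
        power -= 1
    return coeff

n = 5
print(nth_derivative_constant(n))   # constant left in the denominator
for x in (10, 50, 100, 200):
    print(x, growth_ratio(x, n))
```

For small x the quotient can still decrease, since the derivative of e^x/xⁿ is negative for 0 < x < n; the growth shows once x is larger than n.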

We must understand the link between the sign of the second derivative at a critical point and the behavior of the function; if a function is defined in an interval I, and x₀ is a relative maximum or minimum point inside the interval I, and if the function is differentiable, then x₀ is a critical point, therefore the first derivative at this point is zero; f: I → ℝ, x₀, f'(x₀) = 0; suppose that this function admits the second derivative, so the first derivative exists in the interval I and is itself differentiable, since the second derivative is the derivative of the first derivative; f''(x₀) = lim_x→x₀((f'(x)-f'(x₀))/(x-x₀)), x₀ is a critical point therefore f'(x₀) = 0; we assume that x₀ is a critical point and that the second derivative exists, lim_x→x₀((f(x)-f(x₀))/(x-x₀)²) =^H lim_x→x₀((f'(x)-f'(x₀))/(2(x-x₀))) = (1/2)f''(x₀); suppose we know that the second derivative at the point x₀ exists and is positive, f'(x₀) = 0, f''(x₀) > 0, and we know that if a function tends towards a positive limit, the function is also strictly positive for all x close enough to x₀; the ratio (f(x)-f(x₀))/(x-x₀)² is therefore positive for x close to x₀, the denominator is a square and it is certainly positive for x ≠ x₀, therefore the numerator is positive for x close to x₀, so f(x) > f(x₀), therefore x₀ is a relative minimum point; if instead f''(x₀) < 0, the ratio (f(x)-f(x₀))/(x-x₀)² is negative for x close to x₀, the denominator is a square and it is certainly positive, therefore the numerator is negative for x close to x₀, so f(x) < f(x₀), therefore x₀ is a relative maximum point

If f is twice differentiable in an interval I, and at a critical point inside I the second derivative is positive, then this point is a proper relative minimum point, and if the second derivative is negative, then this point is a proper relative maximum point; proper means that f(x) is strictly greater than f(x₀) at a minimum, or strictly less than f(x₀) at a maximum, for x close enough to x₀ and distinct from it

At a critical point the first derivative is zero, and if the second derivative is negative then it is a relative maximum point, if the second derivative is positive then it is a relative minimum point

f(x) = sin(x), we must verify that x = π/2 is a critical point, an absolute maximum point and therefore also a relative maximum point, in fact sin(π/2) = 1; f'(x) = cos(x), if x = π/2 then cos(π/2) = 0, so x = π/2 is a critical point; f''(x) = -sin(x), if x = π/2 then -sin(π/2) = -1, so x = π/2 is a maximum point; the function sin(x) at the point x = π/2 has the first derivative equal to zero and the second derivative is negative, therefore x = π/2 is a maximum point of the function sin(x)

f(x) = sin(x), we must verify that x = -π/2 is a critical point, an absolute minimum point and therefore also a relative minimum point, in fact sin(-π/2) = -1; f'(x) = cos(x), if x = -π/2 then cos(-π/2) = 0, so x = -π/2 is a critical point; f''(x) = -sin(x), if x = -π/2 then -sin(-π/2) = 1, so x = -π/2 is a minimum point; the function sin(x) at the point x = -π/2 has the first derivative equal to zero and the second derivative is positive, therefore x = -π/2 is a minimum point of the function sin(x)

At a critical point, that is a point with first derivative equal to 0, lim_x→x₀((f(x)-f(x₀))/(x-x₀)²) =^H lim_x→x₀((f'(x)-f'(x₀))/(2(x-x₀))) = (1/2)f''(x₀); if x₀ is a minimum point then f(x) > f(x₀), f(x)-f(x₀) > 0, (x-x₀)² > 0, so passing to the limit f''(x₀) ≥ 0; if x₀ is a maximum point then f(x) < f(x₀), f(x)-f(x₀) < 0, (x-x₀)² > 0, so passing to the limit f''(x₀) ≤ 0; if we know the sign of the second derivative, we know if the critical point is a relative maximum or a relative minimum, while if we know the behavior of the function, we do not have precise information on the sign of the second derivative: at a minimum point we can only exclude that it is negative, and at a maximum point we can only exclude that it is positive

f(x) = x⁴, the curve of this function looks like a parabola but it is not, it has a concavity pointing upwards but the tip is flattened; the function is strictly positive for all x other than 0, and is 0 for x equal to 0, f(x) > 0 for x ≠ 0, f(x) = 0 for x = 0; x = 0 is the absolute minimum point and therefore also the relative minimum point; f'(x) = 4x³; f''(x) = 12x²; if x = 0 then f'(x) = 4x³ = 0, at the point x = 0 the first derivative is zero; if x = 0 then f''(x) = 12x² = 0, at the point x = 0 the second derivative is zero; x = 0 is a point of absolute minimum and therefore also a point of relative minimum, the first derivative is zero and the second derivative is greater than or equal to 0, therefore it is not less than 0, but we cannot exclude that it is equal to 0

If f is two times differentiable in an interval I, in a point of relative minimum inside I the first derivative is 0 and the second derivative is greater than or equal to 0, in a point of relative maximum inside I the first derivative is 0 and the second derivative is less than or equal to 0

If the derivative is greater than or equal to 0, then we can exclude that it is less than 0; if the derivative is less than or equal to 0, then we can exclude that it is greater than 0; it may happen that the second derivative is zero and then we could examine the sign of the successive derivatives
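
The second derivative test described above can be sketched with a finite difference estimate of f''(x₀). Everything in this block is an assumption made for illustration: the function name classify_critical_point, the step h, the tolerance tol, and the use of a central difference instead of an exact derivative.

```python
import math

def classify_critical_point(f, x0, h=1e-4, tol=1e-6):
    """Second derivative test with a central-difference estimate of f''(x0).

    Assumes x0 is already known to be a critical point, f'(x0) = 0.
    """
    second = (f(x0 + h) - 2 * f(x0) + f(x0 - h)) / h**2
    if second > tol:
        return "relative minimum"
    if second < -tol:
        return "relative maximum"
    return "inconclusive"

print(classify_critical_point(math.sin, math.pi / 2))
print(classify_critical_point(math.sin, -math.pi / 2))
print(classify_critical_point(lambda x: x**4, 0.0))
```

For x⁴ at x = 0 the estimate is numerically zero, so the test returns "inconclusive" even though the point is a minimum, which matches the remark that a null second derivative requires examining the successive derivatives.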

40 - CONCAVITY AND CONVEXITY

The second derivative is the derivative of the first derivative; it is important to study the sign of the second derivative to understand the trend of the function; the analysis of the sign of the second derivative in a critical point of the function, or in a point where the first derivative is 0, allows us to identify whether this point is a relative maximum or a relative minimum

A second degree polynomial function has a parabola as its graph; f(x) = ax²+bx+c; f'(x) = 2ax+b; f''(x) = 2a; the second derivative is constant and has as its sign the sign of the coefficient a that is the coefficient of the second degree term of the polynomial; if the coefficient a is positive, the parabola has upward concavity, if the coefficient a is negative, the parabola has downward concavity; a = 1, b = 0, c = 0, f(x) = x², the parabola has its vertex at the point (0,0), is symmetrical with respect to the y-axis, has concavity upwards; a = -1, b = 0, c = 0, f(x) = -x², the parabola has its vertex at the point (0,0), is symmetrical with respect to the y-axis, has concavity downwards

Let us consider the simplest third degree polynomial function, f(x) = x³; f(x) = x³ is an odd function that is f(-x) = -f(x), the graph of the function is symmetrical with respect to the origin of the Cartesian axes; the graph of an odd function is symmetrical with respect to the point (0,0); each point (x,y) of an odd function has its symmetrical with respect to the origin that is the point (-x,-y); f(x) = x³; f'(x) = 3x²; f''(x) = 6x; the second derivative, f''(x) = 6x, has the same sign as x, the second derivative is negative if x is negative, the second derivative is zero if x is zero, the second derivative is positive if x is positive; f''(x) < 0 if x < 0, f''(x) = 0 if x = 0, f''(x) > 0 if x > 0; the graph of the function f(x) = x³ is positive for x > 0, is 0 for x = 0, is negative for x < 0, and passes through the points (-1,-1), (0,0), (1,1); f'(x) = 3x², the first derivative is always positive but it is 0 for x = 0, and the tangent to the point (0,0) coincides with the x-axis; half of the graph for x > 0 has concavity upwards, while half of the graph for x < 0 has concavity downwards; at point (0,0) the graph changes curvature, the graph of the function passes from one side of its tangent to the other, the graph crosses the tangent

A set of the plane is said to be convex if for each pair of points belonging to it all the segment that joins them belongs to the same set

A half-plane is a convex set, because a segment that joins any two points of the half-plane is entirely contained in the half-plane; a triangle is a convex set, because a segment joining any two points of the triangle is entirely contained in the triangle; a regular polygon is a convex set, because a segment that joins any two points of the regular polygon is entirely contained in the regular polygon; a quadrilateral can be convex or concave; a quadrilateral is convex when a segment joining any two points of the quadrilateral is entirely contained in the quadrilateral; a quadrilateral is concave when there are two points of the quadrilateral such that the segment joining them is not entirely contained in the quadrilateral; a circular crown, or annulus, that is the area of the plane contained between 2 concentric circumferences, is concave, because there are pairs of points whose joining segment is not entirely contained in the circular crown; a half moon is concave, because there are pairs of points of the half moon whose joining segment is not entirely contained in the half moon
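
The definition can be tried out on a circular crown by sampling points along a segment. This is only a sketch: the radii, the two endpoints, the comparison with a full disk and the function names are assumptions, and sampling can exhibit a segment that leaves the set but cannot by itself prove that a set is convex.

```python
import math

def in_disk(p, radius=2.0):
    return math.hypot(p[0], p[1]) <= radius

def in_annulus(p, inner=1.0, outer=2.0):
    return inner <= math.hypot(p[0], p[1]) <= outer

def segment_inside(member, p, q, samples=101):
    """True if every sampled point of the segment pq satisfies member()."""
    for i in range(samples):
        t = i / (samples - 1)
        point = (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))
        if not member(point):
            return False
    return True

p, q = (-1.5, 0.0), (1.5, 0.0)
print(segment_inside(in_disk, p, q))      # the segment stays in the disk
print(segment_inside(in_annulus, p, q))   # the segment crosses the hole
```

Both endpoints belong to the circular crown, but the midpoint (0,0) does not, so this single pair of points is enough to show that the circular crown is not convex.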

The graph of a function has upward concavity if the set of points above the graph is convex

Let us consider the parabola corresponding to the second degree polynomial f(x) = ax²+bx+c, with a > 0; the set of points above this parabola is {(x,y), x ∈ ℝ, y ≥ f(x)}

The set of points above the graph of a function is indicated with f: I → ℝ, G⁺(f) := {(x,y), x ∈ I, y ≥ f(x)}

The set of points under the graph of a function is indicated with f: I → ℝ, G^-(f) := {(x,y), x ∈ I, y ≤ f(x)}

When a function, in a certain interval, has concavity upwards, then the set of points above the graph is a convex set

The graph of a function has downward concavity if the set of points below the graph is convex

The graph of the function f(x) = x³ has upward concavity for x ≥ 0 and has downward concavity for x ≤ 0; the point (0,0) is an inflection point because it separates the interval of x ≤ 0 in which the function has concavity downwards, from the interval of x ≥ 0 in which the function has concavity upwards

An inflection point, or flex, is a point on a smooth plane curve at which the curvature changes sign; it is a point where the function changes from being concave to convex, or vice versa

f(x) = sin(x), in the interval [0,π] the graph of the function sin(x) has downward concavity, so the set of points under the graph is convex, and in the interval [π,2π] the graph of the function sin(x) has upward concavity, so the set of points above the graph is convex; all points kπ with k ∈ ℤ are inflection points or points where the function inverts its concavity

A function f is convex in an interval I if it has upward concavity; a function f is concave in an interval I if it has downward concavity; this convention gives importance to the points above the graph, that is if the set of points above the graph is convex then the function is called convex, while the function is called concave when the set of points below the graph is convex; with this terminology the function sin(x) is concave in the interval [0,π] and convex in the interval [π,2π]

A parabola, ax²+bx+c, is convex when a > 0, and is concave when a < 0; f(x) = ax²+bx+c, f'(x) = 2ax+b, f''(x) = 2a; the second derivative has the same sign as the coefficient a that is the coefficient of the second degree term

If the second derivative is positive then the function is convex or has upward concavity; if the second derivative is negative then the function is concave or has downward concavity

f(x) = sin(x), f'(x) = cos(x), f''(x) = -sin(x); f''(x) = -sin(x) = -f(x), the second derivative of the function sin(x) is the opposite of the function sin(x); in the interval [0,π], f(x) = sin(x) > 0, f''(x) = -sin(x) < 0, the second derivative is negative and the curve of the function sin(x) is concave, it has downward concavity; in the interval [π,2π], f(x) = sin(x) < 0, f''(x) = -sin(x) > 0, the second derivative is positive and the curve of the function sin(x) is convex, it has upward concavity

The first derivative is the angular coefficient of the tangent at a point on the curve of the function; the second derivative is the derivative of the first derivative that is the angular coefficient of the tangent at a point on the curve of the first derivative; if the second derivative is negative the first derivative is decreasing, if the second derivative is positive the first derivative is increasing; x = 0, f'(x) = cos(x) = cos(0) = 1; x = π/2, f'(x) = cos(x) = cos(π/2) = 0; x = π, f'(x) = cos(x) = cos(π) = -1; x = 3π/2, f'(x) = cos(x) = cos(3π/2) = 0; x = 2π, f'(x) = cos(x) = cos(2π) = 1; the first derivative is 1 in x = 0 and decreases to -1 in x = π, so the angular coefficient of the tangent to the function sin(x) decreases from x = 0 to x = π, in fact the second derivative is negative in the interval [0,π]; the first derivative is -1 in x = π and increases to 1 in x = 2π, so the angular coefficient of the tangent to the function sin(x) increases from x = π to x = 2π, in fact the second derivative is positive in the interval [π,2π]

The second derivative is the derivative of the first derivative; the second derivative is negative in an interval when the first derivative is decreasing that is the angular coefficient of the tangent line to the function is decreasing; the second derivative is positive in an interval when the first derivative is increasing that is the angular coefficient of the tangent line to the function is increasing

If the second derivative is positive or null, the function is convex; if the second derivative is negative or null, the function is concave

If a function f is twice differentiable in an interval I and its second derivative is at any point greater than or equal to 0, then f is convex, and if the second derivative is less than or equal to 0, then f is concave

f: I → ℝ, f''(x) ≥ 0, we must show that the set of points above the graph is convex; to do this we show that the segment joining any 2 points of the graph is entirely contained in the set of points above the graph; we take 2 points of the graph, the point (x₁,y₁) and the point (x₂,y₂) with x₂ > x₁, and we indicate with (x,r(x)) the point of the segment with abscissa x between x₁ and x₂, while (x,f(x)) is the point of the curve with the same abscissa; we must show that r(x) ≥ f(x) for every x between x₁ and x₂, that is r(x)-f(x) ≥ 0, assuming that f''(x) ≥ 0; the equation of the bundle of straight lines passing through the point (0,0) is y = mx, the equation of the bundle of straight lines passing through the point (x₁,y₁) is y-y₁ = m(x-x₁), and the angular coefficient m of the straight line through the 2 points is given by the ratio (y₂-y₁)/(x₂-x₁), so y-y₁ = ((y₂-y₁)/(x₂-x₁))(x-x₁), r(x) = y₁+((y₂-y₁)/(x₂-x₁))(x-x₁) = (y₁(x₂-x₁)+(y₂-y₁)(x-x₁))/(x₂-x₁) = (y₁x₂-y₁x₁+y₂x-y₁x-y₂x₁+y₁x₁)/(x₂-x₁) = (y₁(x₂-x)+y₂(x-x₁))/(x₂-x₁); r(x)-f(x) = ((y₁(x₂-x)+y₂(x-x₁))/(x₂-x₁))-f(x) = (y₁(x₂-x)+y₂(x-x₁)-f(x)(x₂-x₁))/(x₂-x₁); with y₁ = f(x₁), y₂ = f(x₂) and x₂-x₁ = (x₂-x)+(x-x₁), r(x)-f(x) = (f(x₁)(x₂-x)+f(x₂)(x-x₁)-f(x)((x₂-x)+(x-x₁)))/(x₂-x₁) = ((f(x₁)-f(x))(x₂-x)+(f(x₂)-f(x))(x-x₁))/(x₂-x₁), where we used the identity aα+bβ-c(α+β) = aα+bβ-cα-cβ = (a-c)α+(b-c)β; according to Lagrange's mean value theorem (f(b)-f(a))/(b-a) = f'(ξ) where ξ is an intermediate point between point a and point b, therefore f(b)-f(a) = (b-a)f'(ξ), so there are a point ξ₁ between x₁ and x and a point ξ₂ between x and x₂ such that r(x)-f(x) = (f'(ξ₁)(x₁-x)(x₂-x)+f'(ξ₂)(x₂-x)(x-x₁))/(x₂-x₁) = (-f'(ξ₁)(x-x₁)(x₂-x)+f'(ξ₂)(x₂-x)(x-x₁))/(x₂-x₁) = ((f'(ξ₂)-f'(ξ₁))(x-x₁)(x₂-x))/(x₂-x₁); applying Lagrange's theorem to f' on the interval [ξ₁,ξ₂], there is a point ξ between ξ₁ and ξ₂ such that f'(ξ₂)-f'(ξ₁) = f''(ξ)(ξ₂-ξ₁), so r(x)-f(x) = (f''(ξ)(ξ₂-ξ₁)(x-x₁)(x₂-x))/(x₂-x₁); for x₁ < x < x₂ we have ((ξ₂-ξ₁)(x-x₁)(x₂-x))/(x₂-x₁) > 0, so if f''(ξ) ≥ 0 then r(x) ≥ f(x), therefore the function f(x) is convex; the function is convex because the segment r(x), which has as extreme points 2 points of the graph of f(x), is above the graph of f(x); if r(x) > f(x) for every x strictly between x₁ and x₂ the function f(x) is strictly convex and we can exclude that the graph contains straight segments

If the second derivative is ≥ 0 the function is convex; if the second derivative is > 0 the function is strictly convex; if the second derivative is ≤ 0 the function is concave; if the second derivative is < 0 the function is strictly concave
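
The chord condition r(x) ≥ f(x) used in the proof can be tested numerically on sampled points. This is a sketch under stated assumptions: the function name chord_above_graph, the number of samples and the small tolerance for rounding errors are all choices made here, and a sampled check is an illustration, not a proof of convexity.

```python
import math

def chord_above_graph(f, x1, x2, samples=101, tol=1e-12):
    """Check r(x) >= f(x) at sampled x in [x1, x2], r being the chord."""
    y1, y2 = f(x1), f(x2)
    for i in range(samples):
        x = x1 + (x2 - x1) * i / (samples - 1)
        # ordinate of the chord, same formula as in the proof
        r = (y1 * (x2 - x) + y2 * (x - x1)) / (x2 - x1)
        if r < f(x) - tol:
            return False
    return True

print(chord_above_graph(math.exp, -1.0, 2.0))              # e^x, f'' > 0
print(chord_above_graph(math.sin, 0.0, math.pi))           # sin on [0, pi]
print(chord_above_graph(math.sin, math.pi, 2 * math.pi))   # sin on [pi, 2pi]
```

The chord lies above the graph for e^x and for sin(x) on [π,2π], where the second derivative is non-negative, and fails to do so for sin(x) on [0,π], where the function is concave.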

The function sin(x) is strictly concave in the interval [0,π] and strictly convex in the interval [π,2π]

The function x³ is strictly concave for x < 0 and strictly convex for x > 0

A linear or affine function has a straight line as its graph and can be considered either convex or concave

The graph of a function, such as the graph of a line, can be convex and concave at the same time, but it cannot be strictly convex and strictly concave at the same time

If f(x) is concave then -f(x) is convex and vice versa

The sign of the first derivative indicates whether the function increases or decreases in an interval, and the sign of the second derivative indicates whether the function is concave or convex in the interval

An inflection point is a point that separates an interval of concavity from an interval of convexity, so if a function is differentiable twice, an inflection point is a point where the second derivative is zero

In an inflection point the second derivative is necessarily equal to 0, but that the second derivative is equal to 0 is not sufficient to establish whether the point is an inflection point

We must not confuse a sufficient condition for a necessary condition; if a function is twice differentiable, in an inflection point the second derivative is necessarily zero, but there are situations in which the second derivative is zero in a point but it is not an inflection point

f(x) = x⁴ is an even function, strictly positive for x ≠ 0 and null only for x = 0; f'(x) = 4x³; f''(x) = 12x²; the point (0,0) has a first derivative equal to 0 and is a point of absolute and relative minimum, and has a second derivative equal to 0 but it is not an inflection point; an inflection point separates an interval where the second derivative is ≥ 0 from an interval where the second derivative is ≤ 0, but in this case the point (0,0) separates 2 intervals in which the second derivative is > 0, both to the left and to the right; the function f(x) = x⁴ is strictly convex; the condition f''(x) > 0 is sufficient for the function to be strictly convex, but as this example shows, it is not necessary: a function can be strictly convex and have a point where the second derivative is equal to 0

The cancellation points of the first derivative are called critical points; not all critical points are points of relative maximum and relative minimum, but the points of relative maximum and relative minimum must be sought among the critical points; being a critical point is a necessary condition for an interior point to be a point of relative minimum or relative maximum, but it is not a sufficient condition

The cancellation points of the second derivative are possible inflection points; the inflection points are to be found among the cancellation points of the second derivative, but then it is necessary to verify if the considered point is really an inflection point

For the circular function sin(x) the points kπ, with k ∈ ℤ, are points of inflection; for cos(x) the points of inflection are π/2+kπ

The exponential and logarithm functions have no inflection points

f(x) = e^x > 0; f'(x) = e^x > 0, the function is strictly increasing; f''(x) = e^x > 0, the function is strictly convex

f(x) = ln(x), x > 0; f'(x) = 1/x > 0, the domain of the logarithm is the set of points > 0, the function is strictly increasing; f''(x) = -1/x² < 0, the function is strictly concave

41 - GRAPHS OF FUNCTIONS - PART 1

The sign of the first derivative gives us information on the monotonicity of a function, that is if the first derivative is ≥ 0 the function is increasing, if the first derivative is ≤ 0 the function is decreasing, and a point where the first derivative is 0 can be a point of minimum or maximum

The sign of the second derivative gives us information on the concavity or convexity of the function that is if the second derivative is ≥ 0 the function is convex, if the second derivative is ≤ 0 the function is concave, if in a point the second derivative is 0 it can be an inflection point

The cubic function f(x) = x³ has an inflection point in the origin of the Cartesian axes; we need to understand how many inflection points a generic third degree polynomial function can have; f(x) = ax³+bx²+cx+d, a ≠ 0; f'(x) = 3ax²+2bx+c; f''(x) = 6ax+2b; at an inflection point the second derivative is 0, f''(x) = 6ax+2b = 0, 3ax+b = 0, 3ax = -b, x = -b/3a is the x coordinate of the inflection point; at the inflection point which has coordinate x = -b/3a, depending on the sign of the coefficient a, the second derivative passes from negative to positive values or vice versa; each cubic polynomial function has only one inflection point which has coordinate x = -b/3a, and is the symmetry point of the graph of the function
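
The formula x = -b/(3a) and the symmetry of the graph about the inflection point can be checked on a concrete cubic. The coefficients below and the function names are hypothetical, chosen only for illustration.

```python
def cubic(a, b, c, d):
    return lambda x: a * x**3 + b * x**2 + c * x + d

def inflection_abscissa(a, b):
    """Zero of f''(x) = 6ax + 2b, assuming a != 0."""
    return -b / (3 * a)

# Hypothetical coefficients chosen only for illustration.
a, b, c, d = 2.0, -6.0, 1.0, 4.0
f = cubic(a, b, c, d)
x0 = inflection_abscissa(a, b)
print(x0)
for h in (0.5, 1.0, 3.0):
    # point symmetry about (x0, f(x0)): the two sides average to f(x0)
    print(h, (f(x0 + h) + f(x0 - h)) / 2, f(x0))
```

For these coefficients the inflection point is at x = 1, and the average of f(x₀+h) and f(x₀-h) coincides with f(x₀) for every h, which is what symmetry with respect to the point (x₀,f(x₀)) means.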

f(x) = x³-x is an odd function, that is f(-x) = -f(x), therefore the graph is symmetrical with respect to the origin; f'(x) = 3x²-1; f''(x) = 6x; f''(x) < 0 when x < 0, f''(x) = 0 when x = 0, f''(x) > 0 when x > 0, therefore the point (0,0) is an inflection point, for x < 0 the function is strictly concave and for x > 0 the function is strictly convex; f(x) = x³-x = x(x²-1) = x(x+1)(x-1), therefore the graph of the function intersects the x-axis at the points x = -1, x = 0, x = 1, so the equation x³-x = 0 has 3 distinct real zeros; f(x) < 0 when x < -1 or 0 < x < 1, f(x) > 0 when -1 < x < 0 or x > 1; f'(x) = 3x²-1, the first derivative at the point x = 0 is -1, the tangent to the graph of the function at the origin has angular coefficient -1 and is the bisector of the second and fourth quadrant; to locate the points of relative maximum and minimum we must examine the critical points, that is the points where the first derivative is 0, 3x²-1 = 0, 3x² = 1, x² = 1/3, x = ±√(1/3) = ±1/√(3) = ±√(3)/3; since f''(√(3)/3) > 0 and f''(-√(3)/3) < 0, the relative minimum point is x = √(3)/3 and the relative maximum point is x = -√(3)/3; the function f(x) = x³-x is odd and therefore the first derivative is even; the function f(x) = x³-x has as domain all ℝ and as image all ℝ, because continuous functions transform intervals into intervals and the function is unbounded below and above; the function f(x) = x³-x tends to -∞ as x tends to -∞, and tends to +∞ as x tends to +∞, therefore it is unbounded below and unbounded above, therefore it has no absolute minimum and no absolute maximum, the infimum is by convention -∞ and the supremum is by convention +∞; it has an inflection point at x = 0, it is concave for x < 0 and convex for x > 0, it has a relative minimum point at x = √(3)/3 and a relative maximum point at x = -√(3)/3

Euler found a constant, denoted by the symbol e, as the limit of the sequence of (1+1/n)ⁿ because he wanted to find an exponential function of the type a^x that would meet the point (0,1) with an angular coefficient equal to 1, therefore the tangent line to the function f(x) = e^x at the point (0,1) is parallel to the bisector of the first and third quadrant

f(x) = a^x, a > 0; a^x = e^(ln(a^x)) = e^(x⋅ln(a)) = e^(λx) > 0, λ = ln(a), e^λ = a is the ordinate of the exponential function e^(λx) at the abscissa 1; if a > 1, then ln(a) > 0 and e^(x⋅ln(a)) is a strictly increasing and strictly convex function, it has no inflection points; if 0 < a < 1, then ln(a) < 0 and e^(x⋅ln(a)) is a strictly decreasing and strictly convex function, it has no inflection points

f(x) and -f(x) are symmetrical functions with respect to the x-axis

f(x) and f(-x) are symmetrical functions with respect to the y-axis

f(x) and -f(-x) are symmetrical functions with respect to the origin of the x and y axes

f(x) = e^-x = 1/e^x is a strictly decreasing and strictly convex function, and the tangent at the point (0,1) has angular coefficient -1, so it is parallel to the bisector of the second and fourth quadrant; e^-x is the symmetrical function of e^x with respect to the y-axis

f(x) = -e^x is a strictly decreasing and strictly concave function, and the tangent at the point (0,-1) has angular coefficient -1, so it is parallel to the bisector of the second and fourth quadrant; -e^x is the symmetrical function of e^x with respect to the x-axis

f(x) = -e^-x is a strictly increasing and strictly concave function, and the tangent at the point (0,-1) has angular coefficient of 1, so it is parallel to the bisector of the first and third quadrant; -e^-x is the symmetrical function of e^x with respect to the origin of the x and y axes

Hyperbolic functions are analogues of the ordinary trigonometric functions, but defined using the hyperbola rather than the circle; just as the points (cos(t),sin(t)) form a circle with a unit radius, the points (cosh(t),sinh(t)) form the right half of the unit hyperbola; the derivatives of sin(t) and cos(t) are cos(t) and -sin(t), the derivatives of sinh(t) and cosh(t) are cosh(t) and sinh(t)

cosh(x) := (e^x+e^-x)/2, this is the hyperbolic cosine

sinh(x) := (e^x-e^-x)/2, this is the hyperbolic sine

cosh(x) is an even function because f(x) = f(-x), therefore the graph of the hyperbolic cosine function is symmetrical with respect to the y-axis; cosh(x) is always a positive function because the numerator and denominator are always positive; the hyperbolic cosine graph is entirely contained in the first and second quadrant; (cosh(x))' = (e^x-e^-x)/2 = sinh(x); the derivative of e^x is e^x, and the derivative of e^-x with respect to -x is e^-x multiplied by the derivative of -x with respect to x which is -1; (cosh(x))' < 0 for x < 0, (cosh(x))' = 0 for x = 0, (cosh(x))' > 0 for x > 0; the hyperbolic cosine is a decreasing function in the second quadrant, it intersects the y-axis at the point (0,1) which is the absolute minimum point, and is increasing in the first quadrant, therefore cosh(x) is ≥ 1; (cosh(x))' = sinh(x), (sinh(x))' = cosh(x), (cosh(x))'' = cosh(x), the hyperbolic cosine has the second derivative equal to the function itself and so it is always > 0, therefore the hyperbolic cosine is a strictly convex function

sinh(x) is an odd function because f(x) = -f(-x), therefore the graph of the hyperbolic sine function is symmetrical with respect to the origin of the x and y axes; sinh(x) is a positive function for x > 0, and negative for x < 0; the hyperbolic sine graph is entirely contained in the first and third quadrant; (sinh(x))' = (e^x+e^-x)/2 = cosh(x); the derivative of e^x is e^x, and the derivative of e^-x with respect to -x is e^-x multiplied by the derivative of -x with respect to x which is -1; (sinh(x))' > 1 for x < 0, (sinh(x))' = 1 for x = 0, (sinh(x))' > 1 for x > 0; the hyperbolic sine is an increasing function in the third quadrant, passes through the origin of the x and y axes which is the inflection point, and is increasing in the first quadrant; (sinh(x))' = cosh(x), (cosh(x))' = sinh(x), (sinh(x))'' = sinh(x), the hyperbolic sine has the second derivative equal to the function itself and therefore (sinh(x))'' < 0 for x < 0, so the function is strictly concave for x < 0, (sinh(x))'' = 0 for x = 0, so the point (0,0) is an inflection point, (sinh(x))'' > 0 for x > 0, so the function is strictly convex for x > 0; the first derivative of the hyperbolic sine is the hyperbolic cosine, therefore the tangent of the curve sinh(x) at the point (0,0) has angular coefficient 1, therefore it is parallel to the bisector of the first and third quadrant

(sin(x))' = cos(x), (cos(x))' = -sin(x); (sinh(x))' = cosh(x), (cosh(x))' = sinh(x)

(cosh(x)+sinh(x))/2 = (((e^x+e^-x)/2)+((e^x-e^-x)/2))/2 = ((e^x+e^-x+e^x-e^-x)/2)/2 = (2e^x/2)/2 = e^x/2, the arithmetic mean between the hyperbolic cosine and the hyperbolic sine is e^x/2

The hyperbolic cosine is an even function, it is ≥ 1 and it is unbounded above; the hyperbolic sine is an odd function and is unbounded below and unbounded above; the arithmetic mean between cosh(x) and sinh(x) is the function e^x/2; the function e^x/2 meets the y-axis at the point (0,1/2)

The fundamental identity of circular functions is cos²(x)+sin²(x) = 1, but for the hyperbolic cosine and the hyperbolic sine the relation is cosh²(x)-sinh²(x) = 1; cosh²(x) = ((e^x+e^-x)/2)² = (e^2x+2e^xe^-x+e^-2x)/4; sinh²(x) = ((e^x-e^-x)/2)² = (e^2x-2e^xe^-x+e^-2x)/4; cosh²(x)-sinh²(x) = ((e^x+e^-x)/2)²-((e^x-e^-x)/2)² = ((e^2x+2e^xe^-x+e^-2x)/4)-((e^2x-2e^xe^-x+e^-2x)/4) = (e^2x+2e^xe^-x+e^-2x-e^2x+2e^xe^-x-e^-2x)/4 = 4e^xe^-x/4 = e^xe^-x = e^x(1/e^x) = e^x/e^x = 1, so cosh²(x)-sinh²(x) = 1
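
The identity cosh²(x)-sinh²(x) = 1 can also be verified numerically from the definitions. This is a small Python sketch; the function names and the sample points are assumptions chosen for illustration.

```python
import math


def cosh(x):
    """Hyperbolic cosine from its definition (e^x + e^-x) / 2."""
    return (math.exp(x) + math.exp(-x)) / 2


def sinh(x):
    """Hyperbolic sine from its definition (e^x - e^-x) / 2."""
    return (math.exp(x) - math.exp(-x)) / 2


def identity_residual(x):
    """cosh^2(x) - sinh^2(x) - 1, which should vanish up to rounding."""
    return cosh(x) ** 2 - sinh(x) ** 2 - 1


# Hypothetical sample points on both sides of the origin
samples = [-3.0, -1.0, 0.0, 0.5, 2.0]
residuals = [identity_residual(x) for x in samples]
```

For large |x| the two squares become huge and nearly equal, so in floating point the residual is only approximately 0; this is a limit of the numerical check, not of the identity.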

{x = cos(t), y = sin(t)}, t is the independent variable, and the curve represented by this parametric equation is the circle with center (0,0) and radius 1; cos²(t)+sin²(t) = 1, x²+y² = 1, this is the equation of the circle with center (0,0) and radius 1

{x = cosh(t), y = sinh(t)}, t is the independent variable and the curve represented by this parametric equation is an equilateral hyperbola; cosh(t) ≥ 1, for t = 0 the curve intersects the x-axis at the point (1,0); the hyperbolic cosine is an even function and the hyperbolic sine is an odd function, therefore the curve obtained from the parametric equation is symmetrical with respect to the x-axis; cosh²(t)-sinh²(t) = 1, x²-y² = 1, this is the equation of an equilateral hyperbola, precisely of the branch in the semi-plane x > 0 since cosh(t) ≥ 1, and since it is equilateral the asymptotes are perpendicular to each other and are the bisector of the first quadrant and the bisector of the fourth quadrant; a generic equation of a hyperbola is (x²/a²)-(y²/b²) = 1, with a ≠ 0 and b ≠ 0, and if a = b, then the hyperbola is equilateral, so the asymptotes are perpendicular to each other; this is why cosh(x) and sinh(x) are called hyperbolic functions

f(x) = a⋅e^(-(x-b)²/c²), Gaussian function

f(x) = e^-x², simplified version of the Gaussian function; f(x) is an even and positive function; e^x > 0, e^-x² > 0; the graph of f(x) is in the first and second quadrant and intersects the y-axis at the point (0,1); f(x) = e^-x² = 1/e^x², when x tends to +∞ or -∞ f(x) tends to 0 always keeping itself above the x-axis which is the horizontal asymptote, lim_x→+∞(1/e^x²) = 0, lim_x→-∞(1/e^x²) = 0; f(x) is increasing for x < 0 and decreasing for x > 0; the first derivative of f(x) is the derivative of -x² with respect to x which is -2x, multiplied by the derivative of e^-x² with respect to -x² that is e^-x², so f'(x) = -2x⋅e^-x²; f(x) = e^-x², f'(x) = -2x⋅e^-x²; f'(x) > 0 for x < 0, so the function is increasing for x < 0; f'(x) < 0 for x > 0, so the function is decreasing for x > 0; f''(x) = -2e^-x²+(-2x)(-2xe^-x²) = -2e^-x²+4x²e^-x² = 2e^-x²(-1+2x²) = 2e^-x²(2x²-1); f(x) = e^-x², f'(x) = -2x⋅e^-x², f''(x) = 2e^-x²(2x²-1); 2x²-1 = 0, 2x² = 1, x² = 1/2, x = ±√(1/2) = ±(1/√(2)) = ±(√(2)/2), so f(x) has 2 inflection points, for x = -√(2)/2, and for √(2)/2; f''(x) > 0 so f(x) is strictly convex in the intervals (-∞,-√(2)/2) and (√(2)/2,+∞), f''(x) < 0 so f(x) is strictly concave in the interval (-√(2)/2,√(2)/2); the graph of the function f(x) = e^-x² is a bell-shaped curve, also called Gaussian
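
The sign pattern of the second derivative can be cross-checked against a finite-difference approximation. The sketch below is in Python; the step h and the test points are assumptions, and the central difference is only an approximation of f''.

```python
import math


def f(x):
    return math.exp(-x**2)


def second_derivative(x):
    """Closed form f''(x) = 2 e^(-x^2) (2x^2 - 1) from the text."""
    return 2 * math.exp(-x**2) * (2 * x**2 - 1)


def second_derivative_numeric(x, h=1e-4):
    """Central second difference, an approximation of f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h**2


x_inflection = math.sqrt(2) / 2
inside = second_derivative(0.0)    # negative: concave between the inflection points
outside = second_derivative(1.0)   # positive: convex outside them
```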

A horizontal line y = q is a horizontal asymptote of a function when lim_x→+∞(f(x)-q) = 0, or when lim_x→-∞(f(x)-q) = 0, so the graph of the function approaches the straight line when x tends to +∞ or -∞; a horizontal asymptote can be left or right, or simultaneously left and right; there are also vertical asymptotes and oblique asymptotes

f(x) = 1/(1+x²), this rational function is the derivative of the arctangent function, it is a positive and even function, it is increasing for x < 0 and decreasing for x > 0; the derivative of the function can be calculated using the formula (1/g(x))' = -g'(x)/(g(x))², or the formula (g(x)/h(x))' = (g'(x)h(x)-g(x)h'(x))/(h(x))², or the rule of the compound function considering that f(x) = 1/(1+x²) = (1+x²)^-1; f'(x) = -2x/(1+x²)²; f'(x) > 0 for x < 0, so f(x) is increasing for x < 0; f'(x) < 0 for x > 0, so f(x) is decreasing for x > 0; f'(x) = 0 for x = 0, so the point (0,1) is a critical point, a relative and also an absolute maximum point; by calculating the second derivative we can find the inflection points

42 - GRAPHS OF FUNCTIONS - PART 2

f(x) = 1/(1+x²) = (1+x²)^-1; f'(x) = -(2x)/(1+x²)² = -2x(1+x²)^-2; f''(x) = -2((1+x²)^-2-2x(1+x²)^-3⋅2x) = -2(1/(1+x²)²-4x²/(1+x²)³) = -2((1+x²-4x²)/(1+x²)³) = -2((1-3x²)/(1+x²)³) = 2((3x²-1)/(1+x²)³); f''(x) = 0 when 3x²-1 = 0, 3x² = 1, x² = 1/3, x = ±√(1/3) = ±(1/√(3)) = ±(√(3)/3), the inflection point I₁ has coordinate x₁ = -1/√(3), the inflection point I₂ has coordinate x₂ = 1/√(3), y = 1/(1+x²), y₁ = 1/(1+(-1/√(3))²) = 1/(1+1/3) = 1/(4/3) = 3/4, y₂ = 1/(1+(1/√(3))²) = 1/(1+1/3) = 1/(4/3) = 3/4, I₁(-1/√(3),3/4), I₂(1/√(3),3/4); at the inflection points the second derivative is equal to 0 and changes sign, so the concavity changes; the second derivative of this function is negative between -1/√(3) and 1/√(3), and therefore the graph of the function is strictly concave in the interval between these two inflection points; the second derivative of this function is positive for x < -1/√(3) and for x > 1/√(3), and therefore the graph of the function is strictly convex for x < -1/√(3) and for x > 1/√(3); lim_x→+∞(1/(1+x²)) = 0, lim_x→-∞(1/(1+x²)) = 0, therefore this function has the x-axis as its horizontal asymptote
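
As a cross-check of the algebra, the positive inflection point can be located by searching for the sign change of f''. The following Python sketch uses plain bisection, which is a technique chosen here for illustration and not one used in the notes; the bracket [0,1] and the function names are assumptions.

```python
import math


def f(x):
    return 1 / (1 + x * x)


def second_derivative(x):
    """f''(x) = 2(3x^2 - 1) / (1 + x^2)^3 for f(x) = 1/(1 + x^2)."""
    return 2 * (3 * x * x - 1) / (1 + x * x) ** 3


def bisect_sign_change(g, lo, hi, iterations=60):
    """Locate a zero of g in [lo, hi], assuming g(lo) < 0 < g(hi)."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# f'' is negative at 0 and positive at 1, so the sign change lies in between
x2 = bisect_sign_change(second_derivative, 0.0, 1.0)
y2 = f(x2)
```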

A rational function is a ratio of polynomials

When in a rational function f(x) = p(x)/q(x), the polynomial in the numerator has a lower degree than the polynomial in the denominator, then lim_x→±∞(f(x)) = 0, therefore the x-axis is a horizontal asymptote

When in a rational function f(x) = p(x)/q(x), the polynomial in the numerator and the polynomial in the denominator have the same degree n, then f(x) = (a_nxⁿ+...+a₀)/(b_nxⁿ+...+b₀); dividing numerator and denominator by xⁿ, f(x) = (a_n+a_(n-1)/x+...+a₀/xⁿ)/(b_n+b_(n-1)/x+...+b₀/xⁿ), lim_x→±∞((a_n+a_(n-1)/x+...+a₀/xⁿ)/(b_n+b_(n-1)/x+...+b₀/xⁿ)) = a_n/b_n, y = a_n/b_n is the equation of the horizontal asymptote, and in this case the horizontal asymptote on the right, for x tending to +∞, is equal to the horizontal asymptote on the left, for x tending to -∞

In some cases the horizontal asymptote on the left, for x tending to -∞, is equal to the horizontal asymptote on the right, for x tending to +∞, but in other cases the horizontal asymptote on the left, for x tending to -∞, is different from the horizontal asymptote on the right, for x tending to +∞

f(x) = 1/(1+e^-x) = 1/(1+1/e^x) = e^x/(e^x+1); 0 < f(x) < 1, the image of the function is contained in the open interval (0,1); lim_x→+∞(f(x)) = 1; lim_x→-∞(f(x)) = 0; (1/g(x))' = -g'(x)/(g(x))², (1/(1+e^-x))' = -(1+e^-x)'/(1+e^-x)² = -(-e^-x)/(1+e^-x)² = e^-x/(1+e^-x)² = 1/(e^x(1/e^x+1)²) = 1/(e^x((1+e^x)/e^x)²) = 1/(e^x((1+e^x)²/e^(2x))) = 1/((1+e^x)²/e^x) = e^x/(1+e^x)², f'(x) > 0, the function is strictly increasing; the graph of the function intersects the y-axis at the point (0,1/2), the x-axis is the horizontal asymptote on the left, y = 1 is the horizontal asymptote on the right; continuous functions transform intervals into intervals, and the function is defined on the whole set ℝ, which is an interval, therefore the image of the function is an interval contained in the open interval (0,1); the function tends to 1 when x tends to +∞, therefore the supremum of the image is 1; the function tends to 0 when x tends to -∞, therefore the infimum of the image is 0; an interval contained in (0,1) that has 0 as infimum and 1 as supremum is the open interval (0,1) itself, therefore im(f) = (0,1)
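
The closed form of the derivative and the two horizontal asymptotes can be checked numerically. This is a Python sketch under stated assumptions: the names logistic, logistic_derivative and central_difference are illustrative, and the central difference only approximates the derivative.

```python
import math


def logistic(x):
    """f(x) = 1 / (1 + e^-x)."""
    return 1 / (1 + math.exp(-x))


def logistic_derivative(x):
    """Closed form f'(x) = e^x / (1 + e^x)^2."""
    return math.exp(x) / (1 + math.exp(x)) ** 2


def central_difference(g, x, h=1e-6):
    """Symmetric difference quotient, an approximation of g'(x)."""
    return (g(x + h) - g(x - h)) / (2 * h)


far_left = logistic(-30.0)    # close to 0, the asymptote on the left
far_right = logistic(30.0)    # close to 1, the asymptote on the right
```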

The graph of the function f(x) = 1/x is an equilateral hyperbola with a branch in the first quadrant and a branch in the third quadrant; lim_x→0⁺(1/x) = +∞; lim_x→0^-(1/x) = -∞; the line of equation y = 0 that is the x-axis, is the horizontal asymptote; the line of equation x = 0 that is the y-axis, is the vertical asymptote

The graph of the function f(x) = 1/x² is contained in the first and second quadrant; lim_x→0⁺(1/x²) = +∞; lim_x→0^-(1/x²) = +∞; the line of equation y = 0 that is the x-axis, is the horizontal asymptote; the line of equation x = 0 that is the y-axis, is the vertical asymptote

When in a rational function, which is the ratio of two polynomials, the denominator is equal to 0 in a point, but the numerator is different from 0 in this point, then at this point the limit of the function is +∞ or -∞, and the graph of the function has a vertical asymptote passing through this point; f(x) = p(x)/q(x), q(x₀) = 0, p(x₀) ≠ 0, x = x₀ is a vertical asymptote

Considering any oblique line with equation y = mx+q, if lim_x→+∞(f(x)-(mx+q)) = 0 then the line is an oblique asymptote to the right, if lim_x→-∞(f(x)-(mx+q)) = 0 then the line is an oblique asymptote to the left

f(x) = √(1+x²) is an even function, so the behaviour as x tends to -∞ mirrors the behaviour as x tends to +∞; f(x) = √(1+x²) ~ √(x²) = |x|; the limit lim_x→+∞(√(1+x²)-x) presents the indeterminate form ∞-∞; lim_x→+∞(√(1+x²)-x) = lim_x→+∞((√(1+x²)-x)((√(1+x²)+x)/(√(1+x²)+x))) = lim_x→+∞(((√(1+x²)-x)(√(1+x²)+x))/(√(1+x²)+x)) = lim_x→+∞((1+x²-x²)/(√(1+x²)+x)) = lim_x→+∞(1/(√(1+x²)+x)) = 0, because the denominator tends to +∞; therefore y = x, that is the bisector of the first and third quadrant, is an oblique asymptote to the right; by symmetry y = -x, that is the bisector of the second and fourth quadrant, is an oblique asymptote to the left

If the graph of a function f(x) has an oblique asymptote y = mx+q to the right, then lim_x→+∞(f(x)-mx-q) = 0; dividing by x we get lim_x→+∞(f(x)/x-mx/x-q/x) = lim_x→+∞(f(x)/x-m-q/x) = 0, and since q/x tends to 0, if there is an oblique asymptote its angular coefficient is m = lim_x→+∞(f(x)/x); if m ∈ ℝ then q = lim_x→+∞(f(x)-mx); if m = 0 and q ∈ ℝ the asymptote is horizontal; if one of the two limits m and q does not exist or is not finite, then the asymptote does not exist; the same holds to the left, for x tending to -∞
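
The two limits can be estimated numerically for f(x) = √(1+x²). The Python sketch below is an illustration under assumptions: instead of f(x)/x it estimates m with the slope of the chord between x and 2x (using f(x)/x at the same point would make the estimate of q trivially 0), and the sample abscissa 10⁶ is arbitrary.

```python
import math


def estimate_oblique_asymptote(f, x_large):
    """Estimate (m, q) of y = m*x + q from values of f far from the origin.

    m is approximated by the slope of the chord between x_large and 2*x_large,
    q by f(x_large) - m * x_large. Use a negative x_large for the left side.
    """
    m = (f(2 * x_large) - f(x_large)) / x_large
    q = f(x_large) - m * x_large
    return m, q


def f(x):
    return math.sqrt(1 + x * x)


m_right, q_right = estimate_oblique_asymptote(f, 1e6)    # expected near (1, 0)
m_left, q_left = estimate_oblique_asymptote(f, -1e6)     # expected near (-1, 0)
```

These are finite-x estimates, so they only suggest the values of the limits; the proof remains the computation of the limits themselves.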

f(x) = e^x; lim_x→-∞(e^x) = 0, the x-axis is a horizontal asymptote to the left; the angular coefficient of a possible asymptotic line would be m = lim_x→+∞(f(x)/x) = lim_x→+∞(e^x/x) =^H lim_x→+∞(e^x) = +∞, therefore there is no oblique asymptote; the exponential function f(x) = e^x has a horizontal asymptote on the left which is the x-axis, it has no oblique asymptotes, and it has no vertical asymptotes because the function is defined on the whole line of real numbers

The graph of the exponential function e^x and the graph of the logarithm function ln(x) are symmetrical with respect to the bisector of the first and third quadrant

f(x) = ln(x); lim_x→0⁺(ln(x)) = -∞, the y-axis is the vertical asymptote, in fact the logarithm function f(x) = ln(x) is defined for x > 0; the angular coefficient of a possible oblique asymptote would be m = lim_x→+∞(ln(x)/x) =^H lim_x→+∞(1/x) = 0, but then q = lim_x→+∞(ln(x)-0⋅x) = lim_x→+∞(ln(x)) = +∞ is not finite, therefore the logarithm function f(x) = ln(x) has no oblique asymptotes and has no horizontal asymptotes; the logarithm function when x tends to +∞ diverges positively, it tends to +∞

The domain of the exponential function is all ℝ; the image of the logarithm function is all ℝ

ln(eⁿ) = n, the logarithm assumes all natural values corresponding to the powers eⁿ

The image of the logarithm function is unbounded above; the image of the logarithm function is the domain of the exponential function

The domain of the exponential function is ℝ, which is the image of the logarithm function; the image of the exponential function is (0,+∞), which is the domain of the logarithm function

f(x) = e^(1/x), x ≠ 0; lim_x→+∞(e^(1/x)) = e⁰ = 1; lim_x→-∞(e^(1/x)) = e⁰ = 1; x > 0 ⇒ f(x) > 1; x < 0 ⇒ 0 < f(x) < 1; the straight line y = 1 is a horizontal asymptote to the left, as x approaches -∞, and to the right, as x approaches +∞; the function is not defined at the point x = 0; lim_x→0⁺(e^(1/x)) = +∞, because 1/x tends to +∞, therefore the y-axis is a vertical asymptote from the right, for x > 0; lim_x→0^-(e^(1/x)) = 0, because 1/x tends to -∞, therefore the y-axis is not a vertical asymptote from the left, for x < 0

A vertical line can be an asymptote on one side only, as in the previous example, but this can happen only when the function is not rational, that is when the function is not a ratio of polynomials

The slope of a line is the tangent of the angle that the line forms with the x-axis; equivalently, the slope of a line is the variation of the ordinate of a point on the line when the abscissa is increased by one unit

If x > 0, then x = e^ln(x)

f(x) = x^x, x > 0; f(x) = x^x = e^(ln(x^x)) = e^(x⋅ln(x)), the function is defined for x > 0 and assumes only positive values, in fact the logarithm function is defined only for positive values, and the exponential function assumes only positive values; lim_x→0⁺(x⋅ln(x)) = lim_x→0⁺(ln(x)/(1/x)) =^H lim_x→0⁺((1/x)/(-1/x²)) = lim_x→0⁺(-x²/x) = lim_x→0⁺(-x) = 0; lim_x→0⁺(e^(x⋅ln(x))) = e⁰ = 1; lim_x→0⁺(f(x)) = 1; lim_x→+∞(f(x)) = lim_x→+∞(e^(x⋅ln(x))) = +∞, because x⋅ln(x) tends to +∞; f(x) = x^x = e^(x⋅ln(x)), f'(x) = e^(x⋅ln(x))(1⋅ln(x)+x(1/x)) = x^x(ln(x)+1), x^x is always > 0, therefore the sign of the first derivative depends on ln(x)+1, ln(x)+1 = 0, ln(x) = -1, x = e^ln(x) = e^-1 = 1/e, x = 1/e is the critical point; f'(x) < 0 for 0 < x < 1/e, therefore f(x) = x^x is decreasing for 0 < x < 1/e; f'(x) = 0 for x = 1/e; f'(x) > 0 for x > 1/e, therefore f(x) = x^x is increasing for x > 1/e; so x = 1/e is the absolute minimum point of f(x) = x^x, and the minimum point of the graph is (1/e,(1/e)^(1/e))
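
The study of f(x) = x^x can be checked numerically. The following Python sketch evaluates the function through e^(x⋅ln(x)) and the derivative through x^x(ln(x)+1); the function names and the sample points are assumptions made for illustration.

```python
import math


def f(x):
    """x^x written as e^(x ln x), defined for x > 0."""
    return math.exp(x * math.log(x))


def df(x):
    """f'(x) = x^x (ln(x) + 1)."""
    return f(x) * (math.log(x) + 1)


x_min = 1 / math.e          # critical point, where ln(x) + 1 = 0
y_min = f(x_min)            # equals (1/e)^(1/e)
near_zero = f(1e-9)         # close to the limit 1 as x tends to 0 from the right
```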

43 - DEFINITION OF INTEGRAL

The integral of a function is the area of the region of the plane contained between the x-axis and the graph of the function

The trapezoid of a non-negative function f defined on the interval [a,b] is the set of points (x,y) of the plane for which x is between a and b and y is between 0 and f(x)

f: [a,b] → ℝ, f(x) ≥ 0, trapezoid(f) = {(x,y) ∈ ℝ², a ≤ x ≤ b, 0 ≤ y ≤ f(x)}; we want to calculate the area of the set of points (x,y) of the plane ℝ² bounded by the x-axis, by the vertical line x = a, by the vertical line x = b, and by the graph of the function; the area of the trapezoid is the integral of the function; if f(x) = c, the trapezoid is a rectangle and the area is c(b-a); the area of the trapezoid has a positive value when it is contained in the half-plane of the positive ordinates, and has a negative value when it is contained in the half-plane of the negative ordinates; if the function f is a polynomial of first degree, the trapezoid has as bases f(a) and f(b), and as height b-a, and the area is the semisum of the bases times the height, that is ((f(a)+f(b))/2)(b-a) = ((f(a)+f(b))(b-a))/2; for a generic function we proceed as for the length of an arc of circumference, which is the supremum of the numerical set consisting of the lengths of the inscribed polygons, or the infimum of the numerical set consisting of the lengths of the circumscribed polygons: these two numerical sets are separate and contiguous, and the supremum of the lengths of the inscribed polygons coincides with the infimum of the lengths of the circumscribed polygons

We decompose the interval [a,b] by means of the set of points σ = {x₀ = a, x₁, x₂, ..., x_n = b}, with x₀ < x₁ < ... < x_n; m ≤ f(x) ≤ M, the function is bounded; we consider interval by interval the minimum value that the function assumes, and since the function is continuous the minimum value exists, while if the function were not continuous there might not be a minimum, but only an infimum; e_k := inf{f(x), x_(k-1) ≤ x ≤ x_k}, k = 1,2,...,n; the trapezoid is approximated from below by rectangles that have base x_k-x_(k-1) and height e_k, that is the infimum that the function assumes in that interval; the area of each rectangle is e_k(x_k-x_(k-1)); the total area of all the rectangles is ⁿΣ_k=1(e_k(x_k-x_(k-1))), and this area is contained in the area of the trapezoid of the function f(x); the total area of all the rectangles depends on the function f and on how the interval [a,b] has been decomposed, therefore the sum of the areas of the rectangles depends on f and σ; s(f,σ) := ⁿΣ_k=1(e_k(x_k-x_(k-1))), lower sum relative to the function f and the decomposition σ, that is the area of the pluri-rectangle contained in the trapezoid of the function f(x)

E_k := sup{f(x), x_(k-1) ≤ x ≤ x_k}, k = 1,2,...,n; the trapezoid is approximated from above by rectangles that have base x_k-x_(k-1) and height E_k, that is the supremum that the function assumes in that interval; the area of each rectangle is E_k(x_k-x_(k-1)); S(f,σ) := ⁿΣ_k=1(E_k(x_k-x_(k-1))), upper sum relative to the function f and the decomposition σ, that is the area of the pluri-rectangle containing the trapezoid of the function f(x); the lower sum is an underestimate of the area of the trapezoid, the upper sum is an overestimate of the area of the trapezoid

e_k := inf{f(x), x_(k-1) ≤ x ≤ x_k}, k = 1,2,...,n; s(f,σ) := ⁿΣ_k=1(e_k(x_k-x_(k-1)))

E_k := sup{f(x), x_(k-1) ≤ x ≤ x_k}, k = 1,2,...,n; S(f,σ) := ⁿΣ_k=1(E_k(x_k-x_(k-1)))

area(trapezoid(f)) := sup_σ(s(f,σ)), in this definition the area of the trapezoid is the supremum of the lower sums

area(trapezoid(f)) := inf_σ(S(f,σ)), in this definition the area of the trapezoid is the infimum of the upper sums
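
The lower and upper sums can be computed directly once the infimum and the supremum on each piece are known. The Python sketch below makes a simplifying assumption that is not in the notes: f is increasing on [a,b], so the infimum on each piece is the value at the left endpoint and the supremum is the value at the right endpoint. The names lower_upper_sums and uniform_partition are illustrative.

```python
def uniform_partition(a, b, n):
    """Decomposition of [a, b] into n pieces of equal length."""
    return [a + (b - a) * k / n for k in range(n + 1)]


def lower_upper_sums(f, partition):
    """Return (s, S) for an increasing function f on the given decomposition.

    Assumption: f is increasing, so inf = f(left endpoint), sup = f(right endpoint).
    """
    s = 0.0
    S = 0.0
    for left, right in zip(partition, partition[1:]):
        width = right - left
        s += f(left) * width    # rectangle contained in the trapezoid
        S += f(right) * width   # rectangle containing that piece of the trapezoid
    return s, S


def square(x):
    return x * x


s_100, S_100 = lower_upper_sums(square, uniform_partition(0.0, 1.0, 100))
s_200, S_200 = lower_upper_sums(square, uniform_partition(0.0, 1.0, 200))
```

With a finer decomposition the lower sum grows and the upper sum shrinks, and both squeeze the area of the trapezoid between them.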

The area of the trapezoid of a continuous and non-negative function f is defined as the supremum of the lower sums relative to f and to all the possible decompositions of the interval [a,b]

The area of the trapezoid of a non-negative function f, defined and continuous on the interval [a,b], is called the integral of the function and is indicated by ^b∫_a(f(x)dx)

The integral symbol is an elongated letter S, from the Latin word Summa, and the extremes of the integration interval are indicated above and below the integral symbol

The integration of a function is the operation that associates the function with the area of its trapezoid, that is the integral

The German Gottfried Wilhelm von Leibniz, born in Leipzig in 1646 and died in Hanover in 1716, was, together with the Englishman Isaac Newton, one of the developers of the concept of integral; today the symbols used for integral calculus are those of Leibniz

A function whose graph crosses the x-axis is a function of variable sign; such a function can be written as the difference between two non-negative functions, which are its positive part and its negative part

f⁺(x) := max{f(x),0}, positive part of the function; the part above the x-axis does not change, the part below the x-axis becomes 0; f⁺(x) ≥ 0

f^-(x) := max{-f(x),0}, negative part of the function; the function is overturned with respect to the x-axis, the part above the x-axis does not change, the part below the x-axis becomes 0; f^-(x) ≥ 0

f⁺(x) = max{f(x),0}, positive part of the function f, f⁺(x) ≥ 0; f^-(x) = max{-f(x),0}, negative part of the function f, f^-(x) ≥ 0

The positive part of a function f is f⁺(x) = max{f(x),0}, at any point x f⁺(x) is the maximum between f(x) and 0; the negative part of a function f is f^-(x) = max{-f(x),0}, at any point x f^-(x) is the maximum between -f(x) and 0

f(x) = f⁺(x)-f^-(x)

|f(x)| = f⁺(x)+f^-(x)
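
The two identities f = f⁺-f^- and |f| = f⁺+f^- can be verified pointwise. This is a short Python sketch; positive_part and negative_part are hypothetical helper names, and the sample function x³-x is taken from the earlier study of graphs.

```python
def positive_part(f):
    """Return the function x -> max(f(x), 0)."""
    return lambda x: max(f(x), 0)


def negative_part(f):
    """Return the function x -> max(-f(x), 0), which is also non-negative."""
    return lambda x: max(-f(x), 0)


def f(x):
    return x**3 - x


f_plus = positive_part(f)
f_minus = negative_part(f)
points = [-2, -0.5, 0, 0.5, 2]
```

At every point at most one of f⁺(x) and f^-(x) is different from 0, which is why both identities hold.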

f(x) = f⁺(x)-f^-(x), f⁺(x) and f^-(x) are two non-negative functions; the integral of a non-negative function is the area of its trapezoid defined as the supremum of the set of lower sums

The operation that associates a function with its integral is linear, therefore the integral of the sum of two functions is equal to the sum of the integrals of the two functions, and the integral of the difference of two functions is equal to the difference of the integrals of the two functions

f(x) = f⁺(x)-f^-(x); ^b∫_a(f(x)dx) := ^b∫_a(f⁺(x)dx)-^b∫_a(f^-(x)dx)

^b∫_a(f⁺(x)dx) is the area of the graph of f above the x-axis; ^b∫_a(f^-(x)dx) is the area of the graph of f under the x-axis; ^b∫_a(f(x)dx) is the difference between the area of the graph of f above the x-axis and the area of the graph of f below the x-axis; ^b∫_a(f(x)dx) > 0 if the area of the graph of f above the x-axis is greater than the area of the graph of f below the x-axis; ^b∫_a(f(x)dx) < 0 if the area of the graph of f above the x axis is less than the area of the graph of f below the x axis; ^b∫_a(f(x)dx) = 0 if the area of the graph of f above the x axis is equal to the area of the graph of f below the x axis

The integral of the sin(x) function between 0 and 2π is 0, because the area of sin(x) between 0 and π and the area of sin(x) between π and 2π are equal but of opposite sign, therefore their sum is 0; the integral of sin(x) in the interval [0,2π] is equal to 0 due to the symmetry of the graph

The integral of a function f of variable sign is defined as the difference between the integrals of the positive and negative parts of f
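
This definition can be illustrated on sin(x) over [0,2π]. The Python sketch below approximates the two integrals with midpoint Riemann sums, which is an approximation chosen for convenience and not the supremum of the lower sums used in the notes; the names midpoint_sum and signed_integral and the number of pieces are assumptions.

```python
import math


def midpoint_sum(g, a, b, n=20_000):
    """Midpoint Riemann sum, an approximation of the integral of g on [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))


def signed_integral(f, a, b):
    """Integral of f computed as (integral of f+) - (integral of f-)."""
    above = midpoint_sum(lambda x: max(f(x), 0.0), a, b)
    below = midpoint_sum(lambda x: max(-f(x), 0.0), a, b)
    return above, below, above - below


# Area above the axis, area below the axis, and their difference
above, below, total = signed_integral(math.sin, 0.0, 2 * math.pi)
```

The area above the axis and the area below the axis are both close to 2, so their difference is close to 0.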

^b∫_a((f(x)±g(x))dx) = ^b∫_a(f(x)dx)±^b∫_a(g(x)dx)

^b∫_a((f(x)+g(x))dx) = ^b∫_a(f(x)dx)+^b∫_a(g(x)dx)

^b∫_a((f(x)-g(x))dx) = ^b∫_a(f(x)dx)-^b∫_a(g(x)dx)

If the function is ≥ 0, the integral equals the area between the curve and the x-axis; if the function is ≤ 0, the integral is the opposite of the area between the x-axis and the curve; if the function is of variable sign, the integral is the algebraic sum of the areas above and below the x-axis; the integral can have a positive, negative, or null value

f(x) = x, 0 ≤ x ≤ 1; the trapezoid is an isosceles right triangle, and the area of a triangle is (base⋅height)/2, therefore the integral of this function in the interval [0,1] must be 1/2; decomposing the base into n equal parts, x₀ = 0, x₁ = 1/n, x₂ = 2/n, ..., x_k = k/n, ..., x_n = 1, therefore the generic interval is [x_(k-1),x_k] and x_(k-1) = (k-1)/n; the function is increasing, therefore on each interval the infimum is also the minimum and is assumed at the left endpoint, and the supremum is also the maximum and is assumed at the right endpoint; e_k = x_(k-1) = (k-1)/n, E_k = x_k = k/n; x_k-x_(k-1) = 1/n; s(f,σ_n) = ⁿΣ_k=1(e_k(x_k-x_(k-1))) = ⁿΣ_k=1(((k-1)/n)(1/n)) = (1/n)ⁿΣ_k=1((k-1)/n) = (1/n²)ⁿΣ_k=1(k-1), the sum of the numbers from 1 to n is (n(n+1))/2, therefore ⁿΣ_k=1(k-1) = 0+1+...+(n-1) = ((n-1)n)/2, (1/n²)ⁿΣ_k=1(k-1) = (1/n²)(((n-1)n)/2) = (n-1)/(2n) = n/(2n)-1/(2n) = 1/2-1/(2n), the lower sums are < 1/2, and the supremum of 1/2-1/(2n) is lim_n→+∞(1/2-1/(2n)) = 1/2; S(f,σ_n) = ⁿΣ_k=1(E_k(x_k-x_(k-1))) = ⁿΣ_k=1((k/n)(1/n)) = ⁿΣ_k=1(k/n²) = (1/n²)ⁿΣ_k=1(k) = (1/n²)((n(n+1))/2) = (n+1)/(2n) = n/(2n)+1/(2n) = 1/2+1/(2n), the upper sums are > 1/2, and the infimum of 1/2+1/(2n) is lim_n→+∞(1/2+1/(2n)) = 1/2; 1/2-1/(2n) ≤ area of the trapezoid ≤ 1/2+1/(2n) for every n, therefore the integral of the function is 1/2, which is both the supremum of the lower sums and the infimum of the upper sums
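
The closed forms 1/2-1/(2n) and 1/2+1/(2n) can be confirmed with exact rational arithmetic. The following Python sketch uses fractions.Fraction from the standard library; the function name sums_for_identity is an assumption.

```python
from fractions import Fraction


def sums_for_identity(n):
    """Exact lower and upper sums of f(x) = x on [0, 1] split into n equal parts."""
    width = Fraction(1, n)
    lower = sum(Fraction(k - 1, n) * width for k in range(1, n + 1))  # e_k = (k-1)/n
    upper = sum(Fraction(k, n) * width for k in range(1, n + 1))      # E_k = k/n
    return lower, upper


lower_10, upper_10 = sums_for_identity(10)   # 9/20 and 11/20
```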

f(x) = x², 0 ≤ x ≤ 1; x_k = k/n, x_(k-1) = (k-1)/n; x_k-x_(k-1) = 1/n; the function is increasing on [0,1], therefore e_k = ((k-1)/n)² and E_k = (k/n)²; S(f,σ_n) = ⁿΣ_k=1(E_k(x_k-x_(k-1))) = ⁿΣ_k=1((k/n)²(1/n)) = (1/n³)ⁿΣ_k=1(k²), the sum of the squares of the numbers from 1 to n is (n(n+1)(2n+1))/6, (1/n³)ⁿΣ_k=1(k²) = (1/n³)((n(n+1)(2n+1))/6) = (2n³+...)/(6n³) = 1/3+((...)/(6n³)), in the ratio (...)/(6n³) the polynomial in the numerator has a lower degree than the polynomial in the denominator, therefore lim_n→+∞((...)/(6n³)) = 0, the upper sums are 1/3 plus a term that tends to 0 when n tends to +∞, so this sequence of numbers tends to 1/3; in the same way s(f,σ_n) = 1/3-((...)/(6n³)), lim_n→+∞((...)/(6n³)) = 0, the lower sums are 1/3 minus a term that tends to 0 when n tends to +∞; ¹∫₀(x²dx) = 1/3
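
The same computation can be carried out exactly for f(x) = x². In the Python sketch below the closed forms (n+1)(2n+1)/(6n²) and (n-1)(2n-1)/(6n²) follow from the formula for the sum of the squares; they are written out here as a check and the function name sums_for_square is an assumption.

```python
from fractions import Fraction


def sums_for_square(n):
    """Exact lower and upper sums of f(x) = x^2 on [0, 1] split into n equal parts."""
    width = Fraction(1, n)
    lower = sum(Fraction(k - 1, n) ** 2 * width for k in range(1, n + 1))
    upper = sum(Fraction(k, n) ** 2 * width for k in range(1, n + 1))
    return lower, upper


def closed_forms(n):
    """Closed forms obtained from 1^2 + ... + n^2 = n(n+1)(2n+1)/6."""
    lower = Fraction((n - 1) * (2 * n - 1), 6 * n * n)
    upper = Fraction((n + 1) * (2 * n + 1), 6 * n * n)
    return lower, upper


gap_1000 = sums_for_square(1000)[1] - sums_for_square(1000)[0]   # equals 1/n here
```

The gap between the upper and the lower sum is exactly 1/n, so both sequences are forced towards the same value 1/3.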

44 - THE FUNDAMENTAL THEOREM OF INTEGRAL CALCULUS

The integral of a function is the supremum of the lower sums, or equivalently the infimum of the upper sums

A primitive of the function f is a function F having f as a derivative

F: [a,b] → ℝ, f: [a,b] → ℝ, ∀ x F'(x) = f(x), F is a primitive of f

f(x) = xⁿ, n ≠ -1, F(x) = xⁿ⁺¹/(n+1); (xⁿ⁺¹/(n+1))' = xⁿ, xⁿ⁺¹/(n+1) is a primitive of xⁿ

f(x) = 1/x, x > 0, F(x) = log(x); (log(x))' = 1/x, log(x) is a primitive of 1/x

f(x) = sin(x), F(x) = -cos(x); (-cos(x))' = sin(x), -cos(x) is a primitive of sin(x)

f(x) = cos(x), F(x) = sin(x); (sin(x))' = cos(x), sin(x) is a primitive of cos(x)

If f'(x) = g(x), then f(x) is a primitive of g(x)

As a consequence of the mean value theorem, also called Lagrange's theorem, we know that if a function has null derivative everywhere on an interval then the function is constant; if log(x) is a primitive of 1/x, since the derivative of a constant is 0, then log(x)+c is also a primitive of 1/x, where c is a constant; a function that has a primitive has infinitely many primitives, which all differ from one another by a constant

F₁ and F₂ are two primitives of a function f; the difference between F₁ and F₂ is constant, F₁ = F₂+c, and this is a consequence of the mean value theorem, because F₁'(x)-F₂'(x) = f(x)-f(x) = 0, (F₁-F₂)' = 0, therefore the function F₁-F₂ is constant

Two functions that differ by a constant have the graph translated parallel to the y-axis; F₂(x) = F₁(x)+c; F₂(b)-F₂(a) = F₁(b)-F₁(a), (F₁(b)+c)-(F₁(a)+c) = F₁(b)-F₁(a), F₁(b)+c-F₁(a)-c = F₁(b)-F₁(a), F₁(b)-F₁(a) = F₁(b)-F₁(a); different primitives of the same function differ by a constant, therefore the variation of a primitive from point a to point b is the same for all the primitives of the same function

Two primitives of the same function differ by a constant

f: [a,b] → ℝ, x ↦ F(x) := ^x∫_a(f(t)dt); we have a continuous function f defined on an interval [a,b] with values in ℝ, to the function f we can associate the integral from point a to point b which is a number, and to the function f we can associate a function F which is the result of the integration from point a to point x, where the letter x indicates the second extreme of the integration interval, and we denote the integration variable with the letter t; F(x) is the integral function of the function f; we have to integrate between a fixed integration limit which is the extreme left of the integration interval, and a variable limit that we call x which is the extreme right of the integration interval, and we get a function of the variable x

The integral function relative to the function f is obtained by integrating f between a fixed point and a variable point, x ↦ ^x∫_a(f(t)dt)

f(x) = 1, integrating from 0 to x we obtain a rectangle that has base x and height 1, therefore the result of the integration is x; f(x) = 1, F(x) = x, F'(x) = f(x), (x)' = 1, F is a primitive of f, for example the primitive that is 0 for x = 0

f(x) = x, this is the identity function or the bisector of the first and third quadrant, integrating between 0 and x we obtain a triangle that has base x and height x; f(x) = x, F(x) = (x⋅x)/2 = x²/2, F'(x) = f(x), (x²/2)' = x

The integral function of a continuous function is a primitive of the integrand function

The integration operation is the inverse of the differentiation operation

If starting from a continuous function we apply the integration operation and then the differentiation operation, we get back the starting function

The fundamental theorem of calculus states that if f is continuous on [a,b], then the integral function x ↦ ^x∫_a(f(t)dt) is a primitive of f
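
The statement can be illustrated numerically for f(t) = cos(t) with a = 0. The Python sketch below is only an approximation under assumptions: the integral function is computed with a midpoint Riemann sum, its derivative with a symmetric difference quotient, and the names integral_function and derivative are illustrative.

```python
import math


def integral_function(f, a, n=2000):
    """Return F(x), a midpoint-sum approximation of the integral of f from a to x."""
    def F(x):
        h = (x - a) / n
        return h * sum(f(a + (k + 0.5) * h) for k in range(n))
    return F


def derivative(F, x, h=1e-4):
    """Symmetric difference quotient, an approximation of F'(x)."""
    return (F(x + h) - F(x - h)) / (2 * h)


F = integral_function(math.cos, 0.0)
value_at_1 = F(1.0)               # close to sin(1), a primitive of cos that is 0 at 0
slope_at_1 = derivative(F, 1.0)   # close to cos(1), the integrand at the same point
```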

x ↦ ^x∫_a(f(t)dt) is a special primitive because it is the primitive that is equal to 0 for x = a; ^a∫_a(f(t)dt) = 0

F(x) := ^x∫_a(f(t)dt), the fundamental theorem of calculus states that this integral function is a primitive of the function f, and precisely the primitive that is equal to 0 for x = a

The fundamental theorem of calculus allows us to easily calculate the integral of a function f, we just need to know a primitive of the function f

The fundamental theorem of calculus is also called Torricelli-Barrow theorem; Evangelista Torricelli and Isaac Barrow were two precursors of the infinitesimal calculus; Isaac Barrow was Isaac Newton's professor; the complete understanding of this theorem came with Gottfried Wilhelm Leibniz, born in 1646 and died in 1716, and with Isaac Newton, born in 1643 and died in 1727

a < c < b, ^b∫_a(f(x)dx) = ^c∫_a(f(x)dx)+^b∫_c(f(x)dx); the integral has the property of additivity with respect to the integration interval; the integral is the area of the trapezoid, therefore the area of the trapezoid from a to b is equal to the sum of the area of the trapezoid from a to c and from c to b; with σ we denote a decomposition of the interval [a,b], and with σ* we denote a decomposition of the interval [a,b] containing c as decomposition point; σ* > σ, σ* is a decomposition greater than σ; increasing the decomposition of the base of the trapezoid we obtain a lower sum ≥ than the starting one, s(f,σ*) ≥ s(f,σ), or a higher sum ≤ than the starting one, S(f,σ*) ≤ S(f,σ); s(f,σ) ≤ s(f,σ*) = s(f,σ₁)+s(f,σ₂) ≤ ^c∫_a(f(x)dx)+^b∫_c(f(x)dx), the sum of the two integrals from a to c and from c to b is a majorant of the set of the lower sums relative to the whole interval [a,b]; the supremum of s(f,σ), which by definition is the integral from a to b, is ≤ the sum of the two integrals; the supremum is the smallest of the majorants, and the inequality in the opposite direction can also be proved, therefore we have proved the additive property of the integral with respect to the integration interval

F is a primitive of f, if f < 0 then F is decreasing, if f = 0 then F has a critical point, if f > 0 then F is increasing

F(x) = ^x∫_a(f(t)dt); (F(x+h)-F(x))/h = (1/h)(^x+h∫_a(f(t)dt)-^x∫_a(f(t)dt)), due to the additive property of the integral, (1/h)(^x+h∫_a(f(t)dt)-^x∫_a(f(t)dt)) = (1/h)(^x∫_a(f(t)dt)+^x+h∫_x(f(t)dt)-^x∫_a(f(t)dt)) = (1/h)(^x+h∫_x(f(t)dt)); we must prove that lim_h→0((1/h)(^x+h∫_x(f(t)dt))) = f(x); the function f is continuous at point x, therefore the values that f(t) assumes in the interval [x,x+h] are as close to f(x) as we want, f(x)-ε < f(t) < f(x)+ε when |h| < δ_ε; f(x)+ε is a majorant of the function on this interval; f(x)-ε is a minorant of the function on this interval; m(x,h) is the minimum of the function on [x,x+h]; M(x,h) is the maximum of the function on [x,x+h]; m(x,h) and M(x,h) converge to f(x) as h → 0 because the function is continuous; for h > 0, h⋅m(x,h) ≤ ^x+h∫_x(f(t)dt) ≤ h⋅M(x,h), dividing by h, m(x,h) ≤ (1/h)^x+h∫_x(f(t)dt) ≤ M(x,h), m(x,h) ≤ (F(x+h)-F(x))/h ≤ M(x,h), lim_h→0(m(x,h)) = f(x), lim_h→0(M(x,h)) = f(x), therefore lim_h→0((F(x+h)-F(x))/h) = f(x); the case h < 0 is analogous

If G is a primitive of the continuous function f, then ^b∫_a(f(x)dx) = G(b)-G(a)

The integral of a function is equal to the difference between the values that a primitive assumes in the two extremes of the integration interval

F(x) = ^x∫_a(f(t)dt); F(b) = ^b∫_a(f(t)dt); F(a) = ^a∫_a(f(t)dt) = 0; ^b∫_a(f(x)dx) = ^b∫_a(f(t)dt) = F(b) = F(b)-F(a); two primitives of the same function differ by a constant, G(x) = F(x)+c; the variation of a primitive from point a to point b is the same for all primitives, therefore ^b∫_a(f(x)dx) = F(b)-F(a) = G(b)-G(a); the integral of a function is equal to the difference between a primitive of the integrand function calculated in point b and the same primitive calculated in point a

^b∫_a(f(x)dx) = G(b)-G(a) = ^b[G(x)]_a

^a∫₀(x²dx) = ^a[x³/3]₀ = a³/3; x = a, y = a², the area of the rectangle that contains the arc of the parabola is x⋅y = a⋅a² = a³; the area under the parabola in the interval [0,a] is a³/3, it is the third part of the area of the rectangle that contains the trapezoid of the parabola

^π∫₀(sin(x)dx) = ^π[-cos(x)]₀ = -cos(π)-(-cos(0)) = -cos(π)+cos(0) = -(-1)+1 = 1+1 = 2, the area under a sine arc is 2

^π/2∫_-π/2(cos(x)dx) = ^π/2[sin(x)]_-π/2 = sin(π/2)-sin(-π/2) = 1-(-1) = 1+1 = 2, the area under a cosine arc is 2
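The two areas above can be reproduced with a short Python sketch. It is only an illustration: the helper names `midpoint_integral` and `barrow` are assumptions, and the midpoint sum is an independent numerical check of the value G(b)-G(a).

```python
import math

def midpoint_integral(f, a, b, n=10_000):
    """Approximate the integral of f from a to b with n midpoint rectangles."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

def barrow(G, a, b):
    """Evaluate a primitive G between a and b, that is G(b) - G(a)."""
    return G(b) - G(a)

sine_area = barrow(lambda x: -math.cos(x), 0.0, math.pi)      # primitive of sin is -cos
cosine_area = barrow(math.sin, -math.pi / 2, math.pi / 2)     # primitive of cos is sin
sine_numeric = midpoint_integral(math.sin, 0.0, math.pi)      # area from rectangles
print(sine_area, cosine_area, sine_numeric)
```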

f(x) = 1/(1+x²), F(x) = arctan(x); 1/(1+x²) is the derivative of arctan(x), arctan(x) is the primitive of 1/(1+x²); ¹∫₀(dx/(1+x²)) = arctan(1)-arctan(0) = π/4-0 = π/4

^+∞∫₀(dx/(1+x²)) = lim_b→+∞(^b∫₀(dx/(1+x²))) = lim_b→+∞(^b[arctan(x)]₀) = lim_b→+∞(arctan(b)-arctan(0)) = lim_b→+∞(arctan(b))-arctan(0) = π/2-0 = π/2; an integral that has ∞ as extreme of integration is called improper integral, and is denoted by ^+∞∫₀; the arctangent is the inverse function of the tangent restricted to the interval (-π/2,π/2); lim_x→(-π/2)⁺(tan(x)) = -∞, x = -π/2 is a vertical asymptote of tan(x); lim_x→0(tan(x)) = 0; lim_x→(π/2)^-(tan(x)) = +∞, x = π/2 is a vertical asymptote of tan(x); lim_x→-∞(arctan(x)) = -π/2, y = -π/2 is a horizontal asymptote to the left of arctan(x); lim_x→0(arctan(x)) = 0; lim_x→+∞(arctan(x)) = π/2, y = π/2 is a horizontal asymptote to the right of arctan(x); the graph of f(x) = 1/(1+x²) extends over an unbounded interval, but the area under the curve is finite; the area under the curve of f(x) = 1/(1+x²) is π/4 in the interval [0,1], is π/2 in the interval [0,+∞), is π/4 in the interval [1,+∞), therefore the line x = 1 divides the area into two parts of equal value
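The convergence of the area over [0,b] can be observed numerically. In the following Python sketch, given only as an illustration, the function name `truncated_area` is an assumption; it evaluates arctan(b)-arctan(0) for increasing values of b.

```python
import math

def truncated_area(b):
    """Integral of 1/(1+x^2) from 0 to b, computed as arctan(b) - arctan(0)."""
    return math.atan(b) - math.atan(0.0)

# The area grows with b but stays below pi/2
for k in range(0, 7, 2):
    b = 10.0 ** k
    print(b, truncated_area(b))

left_part = truncated_area(1.0)        # area over [0, 1]
right_part = math.pi / 2 - left_part   # area over [1, +inf), using the limit pi/2
print(left_part, right_part)
```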

f(x) = e^-x, f'(x) = -e^-x, F(x) = -e^-x, in this case the derivative and the primitive coincide; ^b∫₀(e^-xdx) = ^b[-e^-x]₀ = -e^-b-(-e⁰) = -e^-b+e⁰ = -e^-b+1 = 1-e^-b; ^+∞∫₀(e^-xdx) = lim_b→+∞(^b∫₀(e^-xdx)) = lim_b→+∞(^b[-e^-x]₀) = lim_b→+∞(-e^-b-(-e⁰)) = lim_b→+∞(-e^-b+e⁰) = lim_b→+∞(-e^-b+1) = lim_b→+∞(1-e^-b) = 1, therefore ^+∞∫₀(e^-xdx) = 1

¹∫₀((x/(1+x²))dx); (1+x²)' = 2x; if we multiply the integrand function by a constant, the integral is also multiplied by this constant; ¹∫₀((x/(1+x²))dx) = (1/2)⋅¹∫₀(((2x)/(1+x²))dx), now the function is in the form f'(x)/f(x), and we know that (log(x))' = 1/x, (log(f(x)))' = f'(x)/f(x), so the primitive of (2x)/(1+x²) is log(1+x²); ¹∫₀((x/(1+x²))dx) = (1/2)⋅¹∫₀(((2x)/(1+x²))dx) = (1/2)⋅¹[log(1+x²)]₀ = (1/2)(log(2)-log(1)) = (1/2)(log(2)-0) = (1/2)(log(2)) = log(2^1/2) = log(√(2))

^π/4∫₀(tan(x)dx) = ^π/4∫₀((sin(x)/cos(x))dx); (cos(x))' = -sin(x), so we write the integral as -^π/4∫₀((-sin(x)/cos(x))dx), where the integrand is in the form f'(x)/f(x) with f(x) = cos(x); (log(f(x)))' = f'(x)/f(x), (log(cos(x)))' = -sin(x)/cos(x), so a primitive of -sin(x)/cos(x) is log(cos(x)); -^π/4∫₀((-sin(x)/cos(x))dx) = -^π/4[log(cos(x))]₀ = -(log(cos(π/4))-log(cos(0))) = -(log(√(2)/2)-log(1)) = -(log(√(2)/2)-0) = -log(√(2)/2) = -log((√(2)/2)(√(2)/√(2))) = -log(2/(2√(2))) = -log(1/√(2)) = log(√(2)), because log(a) = -log(1/a)
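Both integrals of the form f'(x)/f(x) computed above give log(√(2)). A minimal Python sketch, under the assumption that a midpoint sum is accurate enough here (the helper name `midpoint_integral` is hypothetical), compares the two numerical values with log(√(2)).

```python
import math

def midpoint_integral(f, a, b, n=10_000):
    """Approximate the integral of f from a to b with n midpoint rectangles."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

target = math.log(math.sqrt(2.0))
rational = midpoint_integral(lambda x: x / (1.0 + x * x), 0.0, 1.0)  # x/(1+x^2) on [0,1]
tangent = midpoint_integral(math.tan, 0.0, math.pi / 4)              # tan(x) on [0,pi/4]
print(target, rational, tangent)
```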

45 - PROPERTIES OF THE INTEGRAL

^x∫_a(f(t)dt) := F(x), by the fundamental theorem of calculus, the integral of a continuous function between a fixed limit a and a variable limit x is a primitive of the function, and this is true for non-negative functions, but it is also true for the other cases; f(x) = f⁺(x)-f^-(x), ^x∫_a(f(t)dt) = ^x∫_a(f⁺(t)dt)-^x∫_a(f^-(t)dt) = F⁺(x)-F^-(x), D(F⁺(x)) = f⁺(x), D(F^-(x)) = f^-(x), D(F⁺(x)-F^-(x)) = f⁺(x)-f^-(x) = f(x); if the fundamental theorem of calculus is proved for non-negative functions, then it is also proved for functions of any sign

^x∫_a(f(t)dt) = ^x∫_a(f⁺(t)dt)-^x∫_a(f^-(t)dt); defined in this way, the integral is linear, that is the map that associates its integral to a function is a linear map; ^b∫_a((f(x)+g(x))dx) = ^b∫_a(f(x)dx)+^b∫_a(g(x)dx), the integral of the sum of two functions is equal to the sum of the two integrals; ^b∫_a(c⋅f(x)dx) = c⋅^b∫_a(f(x)dx), the integral of a function multiplied by a constant c is equal to the constant c multiplied by the integral of the function

The integral of the sum is equal to the sum of the integrals; the integral of the product of a constant times a function is equal to the product of the constant times the integral of the function

^b∫_a((f(x)+g(x))dx) = ^b∫_a(f(x)dx)+^b∫_a(g(x)dx), f and g are continuous functions; F(x) is a primitive of f(x), G(x) is a primitive of g(x), by the theorem of the derivative of the sum F(x)+G(x) is a primitive of f(x)+g(x), therefore ^b∫_a((f(x)+g(x))dx) = ^b[F(x)+G(x)]_a = F(b)+G(b)-F(a)-G(a) = F(b)-F(a)+G(b)-G(a) = ^b∫_a(f(x)dx)+^b∫_a(g(x)dx)

^b∫_a(c⋅f(x)dx) = ^b[c⋅F(x)]_a = c⋅F(b)-c⋅F(a) = c(F(b)-F(a)) = c⋅^b∫_a(f(x)dx)

The integral from point a to point b is the difference between the primitive calculated in point b and the primitive calculated in point a

^a∫_a(f(x)dx) = ^a[F(x)]_a = F(a)-F(a) = 0, if the extremes of integration are equal the integral is 0

^b∫_a(f(x)dx) = F(b)-F(a), a < b, ^a∫_b(f(x)dx) = F(a)-F(b); ^a∫_b(f(x)dx) = -^b∫_a(f(x)dx); ^b∫_a(f(x)dx) = -^a∫_b(f(x)dx)

^b∫_a(f(x)dx) = F(b)-F(a), definite integral; the integral is defined when the extremes of the integration interval are defined; with the definite integral we obtain a value that is the algebraic sum of the areas of the trapezoid above and below the x-axis

∫(f(x)dx) = F(x)+c, indefinite integral; the integral is indefinite when the extremes of the integration interval are not defined; the indefinite integral has no integration extremes, and is the set of all primitives that differ by a constant

The definite integral is a number which is the sum of the areas of the trapezoid, ^b∫_a(f(x)dx) = F(b)-F(a); the indefinite integral is a collection of functions that differ by an additive constant, ∫(f(x)dx) = F(x)+c

The derivative of the sum is equal to the sum of the derivatives, but the derivative of the product is not the same as the product of the derivatives; the integration operation is the reverse operation of the derivation; the integral of the sum is equal to the sum of the integrals, but the integral of the product is not the same as the product of the integrals

f(x) = {0 for 0 < x < 1, x-1 for 1 < x < 2}; g(x) = {-x+1 for 0 < x < 1, 0 for 1 < x < 2}; ²∫₁(f(x)dx) = ²[x²/2-x]₁ = (2²/2-2)-(1²/2-1) = (4/2-2)-(1/2-1) = (2-2)-(-1/2) = 0+1/2 = 1/2; ¹∫₀(g(x)dx) = ¹[-x²/2+x]₀ = (-1²/2+1)-(-0²/2+0) = (-1/2+1)-(0+0) = 1/2-0 = 1/2; f(x)g(x) = 0, the product of the 2 functions is the null function; ²∫₀((f(x)g(x))dx) = 0, the integral of the product of the 2 functions is 0, the trapezoid degenerates into an x-axis segment; the 2 functions are not null, and have a not null integral, their product is the null function, the integral of their product is null, therefore it is not true that the integral of the product is the product of the integrals
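The counterexample can be replayed in a few lines of Python. This is a sketch under stated assumptions: the helper name `midpoint_integral` is hypothetical, and the values of f and g at the single point x = 1 are chosen as 0, which does not affect the integrals.

```python
def midpoint_integral(f, a, b, n=1000):
    """Approximate the integral of f from a to b with n midpoint rectangles."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

def f(x):
    """0 on (0,1), x-1 on (1,2)."""
    return x - 1.0 if x > 1.0 else 0.0

def g(x):
    """-x+1 on (0,1), 0 on (1,2)."""
    return 1.0 - x if x < 1.0 else 0.0

integral_f = midpoint_integral(f, 0.0, 2.0)
integral_g = midpoint_integral(g, 0.0, 2.0)
integral_fg = midpoint_integral(lambda x: f(x) * g(x), 0.0, 2.0)
print(integral_f * integral_g, integral_fg)   # product of integrals vs integral of product
```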

If a function is less than a constant, the constant is a majorant of the function; if a function, in a certain interval, is less than a constant, the area of the trapezoid of the function is smaller than the area of the rectangle that has the same base and the constant as height

If a function is greater than a constant, the constant is a minorant of the function; if a function, in a certain interval, is greater than a constant, the area of the trapezoid of the function is greater than the area of the rectangle that has the same base and the constant as height

If f is less than or equal to g, the integral of f is less than or equal to the integral of g

f,g: [a,b] → ℝ, f(x) ≤ g(x) ⇒ ^b∫_a(f(x)dx) ≤ ^b∫_a(g(x)dx), this is the monotonicity property of the integral; we must prove that 0 ≤ ^b∫_a(g(x)dx)-^b∫_a(f(x)dx) = ^b∫_a((g(x)-f(x))dx); by hypothesis g(x) ≥ f(x) therefore g(x)-f(x) ≥ 0; if a function is ≥ 0, then its integral, that is the supremum of the lower sums, is ≥ 0, because the lower sums of a non-negative function are non-negative numbers, so their supremum is non-negative; if a function is non-negative its integral is non-negative; since g(x)-f(x) is non-negative by hypothesis, its integral is non-negative; g(x)-f(x) is the vertical distance between the graphs of the two functions at the point of abscissa x; ^b∫_a((g(x)-f(x))dx) is the area between the graph of g(x) and the graph of f(x), or as Leibniz would have said the sum of rectangles with an infinitesimal base and height g(x)-f(x), and this is the integral of g(x)-f(x); to obtain the area of the plane region contained between the graphs of the two functions, it is sufficient to calculate the integral of g(x)-f(x) that is ^b∫_a((g(x)-f(x))dx)

f,g: [0,1] → ℝ, f(x) = x², g(x) = √(x) = x^1/2; the graphs of f(x) and g(x) are symmetrical with respect to y = x, that is the bisector of the first and third quadrant; the primitive of f(x) = x² is F(x) = x³/3, the primitive of g(x) = √(x) = x^1/2 is G(x) = (2/3)x^3/2; we calculate the area between g(x) and f(x) in the interval [0,1], ¹∫₀((g(x)-f(x))dx) = ¹∫₀((√(x)-x²)dx) = ¹[(2/3)x^3/2-x³/3]₀ = ((2/3)·1^3/2-1³/3)-((2/3)·0^3/2-0³/3) = (2/3-1/3)-(0-0) = 1/3; the area under f(x) = x² is 1/3 of the area of the square of area 1 of the interval [0,1]; the area above g(x) = √(x) is 1/3 of the area of the square of area 1 of the interval [0,1]; the area between g(x) and f(x) is 1/3 of the area of the square of area 1 of the interval [0,1]
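The value 1/3 can be checked in two ways, with the primitive and with a sum of rectangles. The following Python sketch is an illustration only; the names `midpoint_integral`, `gap` and `gap_primitive` are assumptions.

```python
import math

def midpoint_integral(f, a, b, n=10_000):
    """Approximate the integral of f from a to b with n midpoint rectangles."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

def gap(x):
    """Vertical distance g(x) - f(x) between sqrt(x) and x^2."""
    return math.sqrt(x) - x * x

def gap_primitive(x):
    """A primitive of gap: (2/3)x^(3/2) - x^3/3."""
    return (2.0 / 3.0) * x ** 1.5 - x ** 3 / 3.0

exact = gap_primitive(1.0) - gap_primitive(0.0)
numeric = midpoint_integral(gap, 0.0, 1.0)
print(exact, numeric)
```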

To calculate the primitive of a power, the exponent is increased by one unit and everything is divided by the exponent obtained; D(x^c) = c·x^(c-1), D(x^c) = D(e^ln(x^c)) = D(e^(c·ln(x))) = (c/x)(e^(c·ln(x))) = (c/x)x^c = c·x^(c-1), we have shown that (x^c)' = c·x^(c-1) for x > 0; to integrate x^c with c ≠ -1, that is to find a primitive of x^c, we have to compute x^(c+1)/(c+1); a primitive of x^c is x^(c+1)/(c+1); for c = -1 the formula does not apply, and a primitive of 1/x is log(x)

x ≥ 0, cos(t) ≤ 1, ^x∫₀(cos(t)dt) ≤ ^x∫₀(1⋅dt), ^x[sin(t)]₀ ≤ ^x[t]₀, sin(x)-sin(0) ≤ x-0, sin(x)-0 ≤ x, sin(x) ≤ x for x ≥ 0, the sinusoid is under the bisector of the first quadrant that is also the tangent of the sinusoid at the point x = 0; (sin(x))' = cos(x), cos(0) = 1, the angular coefficient of the tangent of sin(x) at the point x = 0 is 1, the tangent of sin(x) at the point x = 0 is y = x; in the interval we consider, the curve of sin(x) is below the tangent, as happens when a function is concave, that is with concavity downwards; sin(x) ≤ x, sin(t) ≤ t, ^x∫₀(sin(t)dt) ≤ ^x∫₀(t⋅dt), ^x[-cos(t)]₀ ≤ ^x[t²/2]₀, -cos(x)-(-cos(0)) ≤ x²/2-0²/2, -cos(x)-(-1) ≤ x²/2-0, -cos(x)+1 ≤ x²/2, 1-cos(x) ≤ x²/2; 1-cos(t) ≤ t²/2, ^x∫₀((1-cos(t))dt) ≤ ^x∫₀((t²/2)dt), ^x[t-sin(t)]₀ ≤ ^x[t³/6]₀, x-sin(x)-(0-sin(0)) ≤ x³/6-0³/6, x-sin(x) ≤ x³/6, x-x³/6 ≤ sin(x), x-x³/3! ≤ sin(x) ≤ x; for x ≥ 0, the graph of the line y = x is above the graph of sin(x), and the graph of the cubic function y = x-x³/3! is below the graph of sin(x); we can continue this procedure by writing t in place of x and integrating between 0 and x, and in this way we find polynomials that are alternately majorants and minorants of the function sin(x) for x ≥ 0, and these polynomials are the Taylor polynomials of sin(x); the Taylor polynomials best approximate the function sin(x) in a neighborhood of the point x = 0; continuing we get x-x³/3! ≤ sin(x) ≤ x-x³/3!+x⁵/5!
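The chain of inequalities obtained by repeated integration can be tested on sample points. The following Python sketch is only a numerical illustration (the function name `sine_bounds` is an assumption) and does not replace the proof.

```python
import math

def sine_bounds(x):
    """Return (lower, upper) polynomial bounds for sin(x), valid for x >= 0."""
    lower = x - x ** 3 / 6.0                    # x - x^3/3!
    upper = x - x ** 3 / 6.0 + x ** 5 / 120.0   # x - x^3/3! + x^5/5!
    return lower, upper

for x in (0.1, 0.5, 1.0, 2.0):
    lower, upper = sine_bounds(x)
    print(x, lower, math.sin(x), upper)
```

Near x = 0 the three printed values are very close, in agreement with the fact that Taylor polynomials approximate sin(x) best in a neighborhood of 0.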

If f is continuous over the interval [a,b], m and M are the minimum and maximum of f, then m ≤ (1/(b-a))⋅^b∫_a(f(x)dx) ≤ M

The integral mean theorem states that if the function f is continuous in the interval [a,b], m and M are the minimum and maximum of the function f, then m ≤ (1/(b-a))⋅^b∫_a(f(x)dx) ≤ M; the function f certainly has a minimum and a maximum by the Weierstrass theorem; the Weierstrass theorem states that if [a,b] ⊂ ℝ is a closed, bounded, and non-empty interval, and f: [a,b] → ℝ is a continuous function, then f(x) has at least an absolute maximum point and an absolute minimum point in the interval [a,b]; (1/(b-a))⋅^b∫_a(f(x)dx) is the integral mean of the function f, the ratio between the integral and the length of the integration interval; the integral mean of a function is similar to the arithmetic mean of n numbers, (x₁+x₂+...+x_n)/n; the integral is a limit of sums, it is a supremum of sums; the numerator of an arithmetic mean corresponds to the integral of the function in the integral mean, and the denominator of an arithmetic mean, where n indicates the number of addends, corresponds to the sum of the bases of the rectangles of an integral mean, that is the length of the integration interval b-a; to prove m ≤ (1/(b-a))⋅^b∫_a(f(x)dx) ≤ M we use the monotonicity of the integral; m ≤ f(x) ≤ M, the function f(x) is comprised between the constant function m and the constant function M, its graph is contained between the line of equation y = m and the line of equation y = M; integrating, (b-a)m ≤ ^b∫_a(f(x)dx) ≤ (b-a)M, and dividing by b-a, which is a positive quantity so the inequality relations do not change, m ≤ (1/(b-a))⋅^b∫_a(f(x)dx) ≤ M

The concept of integral is not restricted to continuous functions; the function f need not be continuous, continuity is a sufficient condition for the function f to be integrable, that is for the two numerical sets made up of the lower sums and the upper sums to be contiguous, so that their separating element is the integral of the function; monotonicity does not imply continuity, a function can be monotonic and have discontinuity points; monotonic functions are certainly integrable, therefore also functions that are sums or differences of monotonic functions are certainly integrable; so some non-continuous functions are also integrable; for the inequality it is sufficient to suppose that the integrable function f is bounded, that is the set of values it assumes is a bounded set, contained between an infimum e and a supremum E, e ≤ f(x) ≤ E, therefore the integral mean theorem becomes e ≤ (1/(b-a))⋅^b∫_a(f(x)dx) ≤ E, the integral mean is in any case included between the extremes of the values that the function assumes in the interval [a,b], and these extremes may or may not be the maximum and the minimum

If the function is continuous over the whole interval [a,b], m ≤ (1/(b-a))⋅^b∫_a(f(x)dx) ≤ M, the integral mean is between the minimum m and the maximum M, and by a theorem already proved a value between the maximum and the minimum of a continuous function in an interval is a value assumed by the function; continuous functions transform intervals into intervals, and in particular continuous functions transform bounded and closed intervals into bounded and closed intervals; the graph of the function f is included between the line y = m, which delimits the minimum, and the line y = M, which delimits the maximum, and the integral mean is the height of a horizontal line comprised between m and M such that the rectangle with base b-a and this height has the same area as the trapezoid of the function; if the function is continuous, the value of the integral mean is a value that the function assumes, therefore there exists a point ξ such that f(ξ) = (1/(b-a))⋅^b∫_a(f(x)dx), or ^b∫_a(f(x)dx) = f(ξ)(b-a), this is the integral mean theorem

The integral mean is the ratio between the integral and the length of the integration interval

^π∫₀(sin(x)dx) = 2, the integral mean of the function sin(x) is 2/π, the integral mean theorem states that there is at least one point at which the function assumes this value; in the interval [0,π], the function sin(x) is equal to the value of its integral mean, which is 2/π, at 2 points
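The two points of [0,π] where sin(x) equals its integral mean can be computed explicitly. The following Python sketch is an illustration; the variable names `xi_1` and `xi_2` are assumptions, and the second point follows from the symmetry sin(π-x) = sin(x).

```python
import math

a, b = 0.0, math.pi
integral = (-math.cos(b)) - (-math.cos(a))   # [-cos(x)] between 0 and pi
mean = integral / (b - a)                    # integral mean, 2/pi

xi_1 = math.asin(mean)       # first point where sin(x) equals the mean
xi_2 = math.pi - xi_1        # second point, by symmetry about x = pi/2
print(mean, xi_1, xi_2, math.sin(xi_1), math.sin(xi_2))
```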

These are the main properties of the integral, but we need other integration techniques, we need other tools for determining the primitives of functions; for now we have used the theorem of the derivative of the sum, and the theorem of the derivative of a constant multiplied by a function; we must study the integration technique by parts, and the integration technique by substitution, which are based on the rule of the derivative of the product, and on the rule of the derivative of the compound function

^b∫_a((f(x)G(x))dx); f(x) is the derivative of F(x), and F(x) is a primitive of f(x); g(x) is the derivative of G(x), and G(x) is a primitive of g(x); F'(x) = f(x), G'(x) = g(x); we must find the primitive of f(x)G(x) and then calculate the integral; (F(x)G(x))' = f(x)G(x)+F(x)g(x), f(x)G(x) is the function we have to integrate, ^b∫_a((F(x)G(x))'dx) = ^b∫_a((f(x)G(x))dx)+^b∫_a((F(x)g(x))dx), a primitive of the derivative of a function is the function itself, ^b[F(x)G(x)]_a = ^b∫_a((f(x)G(x))dx)+^b∫_a((F(x)g(x))dx), ^b∫_a((f(x)G(x))dx) = ^b[F(x)G(x)]_a-^b∫_a((F(x)g(x))dx), ^b∫_a((f(x)G(x))dx) = F(b)G(b)-F(a)G(a)-^b∫_a((F(x)g(x))dx); we found the formula ^b∫_a((f(x)G(x))dx) = ^b[F(x)G(x)]_a-^b∫_a((F(x)g(x))dx) that can be used to transform one integral into another, and is often useful for simplifying calculations

46 - INTEGRATION BY PARTS AND BY SUBSTITUTION

The method of integration by parts is a tool to transform one integral into another, for example the integral of the product of 2 functions is transformed into an analogous integral, and this is useful when the integral obtained is simpler than the initial integral

The integral of the product is not equal to the product of the integrals

If f is continuous in the interval [a,b] and F is its primitive, G is differentiable on the same interval and g is its derivative, then ^b∫_a((f(x)G(x))dx) = F(b)G(b)-F(a)G(a)-^b∫_a((F(x)g(x))dx)

^b∫_a((f(x)G(x))dx), F'(x) = f(x), G'(x) = g(x), ^b∫_a((f(x)G(x))dx) = ^b[F(x)G(x)]_a-^b∫_a((F(x)g(x))dx) = F(b)G(b)-F(a)G(a)-^b∫_a((F(x)g(x))dx); the proof comes from the product rule of the derivative; (F(x)G(x))' = f(x)G(x)+F(x)g(x), integrating this equality, ^b∫_a([F(x)G(x)]'dx) = ^b∫_a((f(x)G(x))dx)+^b∫_a((F(x)g(x))dx), the primitive of the derivative of a function is the function itself, ^b[F(x)G(x)]_a = ^b∫_a((f(x)G(x))dx)+^b∫_a((F(x)g(x))dx), F(b)G(b)-F(a)G(a) = ^b∫_a((f(x)G(x))dx)+^b∫_a((F(x)g(x))dx), ^b∫_a((f(x)G(x))dx) = F(b)G(b)-F(a)G(a)-^b∫_a((F(x)g(x))dx)

³∫₁(log(x)dx) = ³∫₁((1⋅log(x))dx), the primitive of 1 is x, and the derivative of log(x) is 1/x, ³∫₁((1⋅log(x))dx) = ³[x⋅log(x)]₁-³∫₁((x(1/x))dx) = 3log(3)-1log(1)-³∫₁(1dx) = 3log(3)-0-³[x]₁ = 3log(3)-(3-1) = 3log(3)-2, the integral between 1 and 3 of the function log(x) is 3log(3)-2; the initial extreme of the integration is 1 so that log(1) = 0; the fundamental theorem of calculus states that by integrating between 1 and x we obtain a primitive of the function; we must not confuse the extreme of integration with the variable of integration; the rule of integration by parts allows us to find a primitive of log(x); ^x∫₁((1⋅log(t))dt) = ^x[t⋅log(t)]₁-^x∫₁((t⋅1/t)dt) = ^x[t⋅log(t)]₁-^x∫₁(1⋅dt) = ^x[t⋅log(t)]₁-^x[t]₁ = x⋅log(x)-1⋅log(1)-(x-1) = x⋅log(x)-0-x+1 = x⋅log(x)-x+1; x⋅log(x)-x+1 is a primitive of the function log(x), and since the derivative of a constant is 0, x⋅log(x)-x is also a primitive of log(x); the totality of the primitives of the function log(x), which is the indefinite integral, is the collection of functions x⋅log(x)-x+c; ∫(log(x)dx) = x⋅log(x)-x+c, (x⋅log(x)-x+c)' = 1⋅log(x)+x(1/x)-1+0 = log(x)+1-1 = log(x), therefore x⋅log(x)-x is a primitive of log(x)
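The result 3log(3)-2 can be compared with a direct numerical integration of log(x). The following Python sketch is an illustration; the helper names `midpoint_integral` and `log_primitive` are assumptions.

```python
import math

def midpoint_integral(f, a, b, n=10_000):
    """Approximate the integral of f from a to b with n midpoint rectangles."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

def log_primitive(x):
    """Primitive of log(x) found by integration by parts: x*log(x) - x."""
    return x * math.log(x) - x

exact = log_primitive(3.0) - log_primitive(1.0)    # equals 3*log(3) - 2
numeric = midpoint_integral(math.log, 1.0, 3.0)    # rectangles under log(x)
print(exact, numeric)
```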

To calculate the primitive of sin²(x) we must integrate from 0 to x the function sin²(t)dt; ^x∫₀(sin²(t)dt) = ^x∫₀((sin(t)sin(t))dt); the primitive of sin(t) is -cos(t), the derivative of -cos(t) is sin(t), (-cos(t))' = sin(t); the derivative of sin(t) is cos(t), the primitive of cos(t) is sin(t), (sin(t))' = cos(t); ^x∫₀(sin²(t)dt) = ^x∫₀((sin(t)sin(t))dt) = ^x[-cos(t)sin(t)]₀-^x∫₀((-cos(t)cos(t))dt) = ^x[-cos(t)sin(t)]₀+^x∫₀((cos(t)cos(t))dt) = -cos(x)sin(x)-1⋅0+^x∫₀(cos²(t)dt) = -sin(x)cos(x)-0+^x∫₀((1-sin²(t))dt) = -sin(x)cos(x)+^x∫₀(1⋅dt)+^x∫₀(-sin²(t)dt) = -sin(x)cos(x)+^x[t]₀-^x∫₀(sin²(t)dt) = -sin(x)cos(x)+x-0-^x∫₀(sin²(t)dt) = -sin(x)cos(x)+x-^x∫₀(sin²(t)dt), ^x∫₀(sin²(t)dt)+^x∫₀(sin²(t)dt) = -sin(x)cos(x)+x, 2^x∫₀(sin²(t)dt) = -sin(x)cos(x)+x, ^x∫₀(sin²(t)dt) = (-sin(x)cos(x)+x)/2, ^x∫₀(sin²(t)dt) = (x-sin(x)cos(x))/2; the function (x-sin(x)cos(x))/2 is a primitive of sin²(x); ((x-sin(x)cos(x))/2)' = 1/2(x-sin(x)cos(x))' = (1/2)(1-(cos(x)cos(x)+sin(x)(-sin(x))) = (1/2)(1-(cos²(x)-sin²(x))) = (1/2)(1-cos²(x)+sin²(x)) = (1/2)(sin²(x)+sin²(x)) = (1/2)(2sin²(x)) = sin²(x), ((x-sin(x)cos(x))/2)' = sin²(x); the primitive of sin²(x) is (x-sin(x)cos(x))/2

To calculate the primitive of cos²(x) we must integrate from 0 to x the function cos²(t)dt; ^x∫₀(cos²(t)dt) = ^x∫₀((cos(t)cos(t))dt); the primitive of cos(t) is sin(t), the derivative of sin(t) is cos(t), (sin(t))' = cos(t); the derivative of cos(t) is -sin(t), the primitive of -sin(t) is cos(t), (cos(t))' = -sin(t); ^x∫₀(cos²(t)dt) = ^x∫₀((cos(t)cos(t))dt) = ^x[sin(t)cos(t)]₀-^x∫₀((sin(t)(-sin(t)))dt) = sin(x)cos(x)-sin(0)cos(0)-^x∫₀(-sin²(t)dt) = sin(x)cos(x)-0⋅1+^x∫₀(sin²(t)dt) = sin(x)cos(x)-0+^x∫₀((1-cos²(t))dt) = sin(x)cos(x)+^x∫₀(1⋅dt)+^x∫₀(-cos²(t)dt) = sin(x)cos(x)+^x[t]₀-^x∫₀(cos²(t)dt) = sin(x)cos(x)+x-0-^x∫₀(cos²(t)dt) = sin(x)cos(x)+x-^x∫₀(cos²(t)dt), ^x∫₀(cos²(t)dt) = sin(x)cos(x)+x-^x∫₀(cos²(t)dt), ^x∫₀(cos²(t)dt)+^x∫₀(cos²(t)dt) = sin(x)cos(x)+x, 2^x∫₀(cos²(t)dt) = sin(x)cos(x)+x, ^x∫₀(cos²(t)dt) = (sin(x)cos(x)+x)/2, ^x∫₀(cos²(t)dt) = (x+sin(x)cos(x))/2, the function (x+sin(x)cos(x))/2 is the primitive of cos²(x); ((x+sin(x)cos(x))/2)' = (1/2)(1+cos(x)cos(x)+sin(x)(-sin(x)) = (1/2)(1+cos²(x)-sin²(x)) = (1/2)(1-sin²(x)+cos²(x)) = (1/2)(cos²(x)+cos²(x)) = (1/2)(2cos²(x)) = cos²(x), ((x+sin(x)cos(x))/2)' = cos²(x); the primitive of cos²(x) is (x+sin(x)cos(x))/2

¹∫₀((x⋅e^-x)dx); the primitive of x is x²/2, the derivative of x²/2 is x, (x²/2)' = x; the derivative of e^-x is -e^-x, the primitive of -e^-x is e^-x, (e^-x)' = -e^-x; ¹∫₀((x⋅e^-x)dx) = ¹[(x²/2)(e^-x)]₀-¹∫₀(((x²/2)(-e^-x))dx) = ¹[(x²/2)(e^-x)]₀+¹∫₀(((x²/2)(e^-x))dx) = (1²/2)(e^-1)-(0²/2)(e⁰)+¹∫₀(((x²/2)(e^-x))dx) = (1/2)(1/e)-0⋅1+¹∫₀(((x²/2)(e^-x))dx) = 1/(2e)+¹∫₀(((x²/2)(e^-x))dx), ¹∫₀(((x²/2)(e^-x))dx) is an integral more complicated than ¹∫₀((x⋅e^-x)dx), therefore we must exchange the roles of the two factors, integrating e^-x and differentiating x; the primitive of e^-x is -e^-x, the derivative of x is 1; ¹∫₀((e^-x⋅x)dx) = ¹[-e^-x⋅x]₀-¹∫₀((-e^-x⋅1)dx) = ¹[-x⋅e^-x]₀+¹∫₀(e^-x⋅dx) = ¹[-x⋅e^-x]₀+¹[-e^-x]₀ = ¹[-x⋅e^-x-e^-x]₀ = -1⋅e^-1-e^-1-(-0⋅e⁰-e⁰) = -e^-1-e^-1-(0-1) = -2e^-1+1 = -2/e+1 = 1-2/e; 1-2/e > 0, in fact we have integrated a positive function; (-x⋅e^-x-e^-x)' = -1⋅e^-x+(-x)(-e^-x)-(-e^-x) = -e^-x+x⋅e^-x+e^-x = x⋅e^-x; the primitive of x⋅e^-x is -x⋅e^-x-e^-x, the derivative of -x⋅e^-x-e^-x is x⋅e^-x; x⋅e^-x = x/e^x; lim_x→+∞(x/e^x) = ∞/∞, indeterminate form; lim_x→+∞(x/e^x) =^H lim_x→+∞(1/e^x) = 0; calculate the area of the trapezoid in the interval [0,b], ^b∫₀((x⋅e^-x)dx) = ^b[-x⋅e^-x-e^-x]₀ = -b⋅e^-b-e^-b-(-0⋅e⁰-e⁰) = -b⋅e^-b-e^-b-(0-1) = -b⋅e^-b-e^-b+1 = 1-b⋅e^-b-e^-b = 1-b/e^b-1/e^b; calculate the improper integral of the function x⋅e^-x, ^+∞∫₀((x⋅e^-x)dx), geometrically we want to understand how much is the area of the unbounded trapezoid whose base goes from 0 to +∞, ^+∞∫₀((x⋅e^-x)dx) = lim_b→+∞(^b∫₀((x⋅e^-x)dx)) = lim_b→+∞(1-b⋅e^-b-e^-b) = lim_b→+∞(1-b/e^b-1/e^b) = 1-0-0 = 1, the area under the graph of x⋅e^-x in the interval [0,+∞) is 1; this is another example of integration by parts
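The area over [0,b] and its limit can be checked numerically. The following Python sketch is an illustration; the names `midpoint_integral` and `area_up_to` are assumptions, and `area_up_to` evaluates the formula 1-b⋅e^-b-e^-b obtained by parts.

```python
import math

def midpoint_integral(f, a, b, n=10_000):
    """Approximate the integral of f from a to b with n midpoint rectangles."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

def area_up_to(b):
    """Integral of x*e^(-x) from 0 to b, from the primitive -x*e^(-x) - e^(-x)."""
    return 1.0 - b * math.exp(-b) - math.exp(-b)

numeric = midpoint_integral(lambda x: x * math.exp(-x), 0.0, 1.0)
print(area_up_to(1.0), numeric)     # both close to 1 - 2/e
for b in (1.0, 5.0, 10.0, 50.0):
    print(b, area_up_to(b))         # the area approaches 1 as b grows
```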

The rule of integration by parts is used to transform an integral into a simpler integral, and comes from the product rule of differentiation

The rule of integration by substitution is used to transform an integral into a simpler integral, and comes from the rule of differentiation of compound functions

Let f: [a,b] → ℝ be a continuous function and let φ: [α,β] → [a,b] be a function with continuous first derivative, such that φ(α) = a, φ(β) = b; then ^b∫_a(f(x)dx) = ^β∫_α(f(φ(t))φ'(t)dt); this is the integration by substitution

^b∫_a(f(x)dx) = ^β∫_α(f(φ(t))φ'(t)dt)

φ: [α,β] → [a,b], we suppose that φ is a bijective function, so it is injective and surjective; a function is injective when different values of the independent variable correspond to different values of the dependent variable; a function is surjective when the image of the function coincides with its codomain; x = φ(t), and using Leibniz notation for calculating the derivative, dx/dt = φ'(t), dx = φ'(t)dt, therefore the rule of integration by substitution arises from the substitution of x with φ(t), and from the substitution of dx with φ'(t)dt, and the extremes of integration change because φ(α) = a, and φ(β) = b, so ^b∫_a(f(x)dx) = ^β∫_α(f(φ(t))φ'(t)dt); all this applies to an increasing function φ; if instead we consider a decreasing function, φ(α) = b and φ(β) = a, then the lower limit of integration becomes β and the upper limit of integration becomes α, and exchanging the extremes of integration we obtain an integral of opposite sign, therefore ^b∫_a(f(x)dx) = ^α∫_β(f(φ(t))φ'(t)dt) = -^β∫_α(f(φ(t))φ'(t)dt)

The function f is continuous, therefore it admits a primitive F by the fundamental theorem of calculus, F'(x) = f(x); G(t) = F(φ(t)), to calculate G'(t) we must apply the derivation of compound functions theorem, G'(t) = F'(φ(t))φ'(t) = f(φ(t))φ'(t), G(t) is a primitive of f(φ(t))φ'(t), ^β∫_α(f(φ(t))φ'(t)dt) = G(β)-G(α) = F(φ(β))-F(φ(α)) = F(b)-F(a) = ^b∫_a(f(x)dx); if φ(β) = a and φ(α) = b, then F(φ(β))-F(φ(α)) = F(a)-F(b) = ^a∫_b(f(x)dx) = -^b∫_a(f(x)dx); ^β∫_α(f(φ(t))φ'(t)dt) = ^b∫_a(f(x)dx), if φ(α) = a and φ(β) = b; ^α∫_β(f(φ(t))φ'(t)dt) = ^a∫_b(f(x)dx), if φ(α) = a and φ(β) = b; ^β∫_α(f(φ(t))φ'(t)dt) = ^a∫_b(f(x)dx), if φ(α) = b and φ(β) = a; ^α∫_β(f(φ(t))φ'(t)dt) = ^b∫_a(f(x)dx), if φ(α) = b and φ(β) = a

Calculate the area of 1/4 of a circle of radius 1, the area of the part of a circle of radius 1 located in the first quadrant; x²+y² = 1, y² = 1-x², y = f(x) = √(1-x²), we only take the positive root because we consider the first quadrant, ¹∫₀(√(1-x²)dx); x = sin(t) = φ(t), t is the measure in radians of the angle that is the length of the arc, when t goes from 0 to π/2, the function sin(t) goes from 0 to 1, ¹∫₀(√(1-x²)dx) = ^π/2∫₀(√(1-sin²(t))cos(t)dt) = ^π/2∫₀(√(cos²(t))cos(t)dt) = ^π/2∫₀(|cos(t)|cos(t)dt), |cos(t)| = cos(t) because in the first quadrant cos(t) ≥ 0, ^π/2∫₀(|cos(t)|cos(t)dt) = ^π/2∫₀(cos(t)cos(t)dt) = ^π/2∫₀(cos²(t)dt) = ^π/2[(t+sin(t)cos(t))/2]₀ = (π/2+sin(π/2)cos(π/2))/2-(0+sin(0)cos(0))/2 = (π/2+1⋅0)/2-(0+0⋅1)/2 = (π/2+0)/2-(0+0)/2 = (π/2)/2-0/2 = π/4-0 = π/4; the area of 1/4 of a circle of radius 1 is π/4; the area of a circle is πr², if the radius is 1 then the area of the circle is π, therefore 1/4 of a circle with radius 1 has area = π/4; x = cos(t), 0 ≤ t ≤ π/2, ¹∫₀(√(1-x²)dx) = ⁰∫_π/2(√(1-cos²(t))(-sin(t))dt) = ⁰∫_π/2(√(sin²(t))(-sin(t))dt) = ⁰∫_π/2(|sin(t)|(-sin(t))dt), |sin(t)| = sin(t) because in the first quadrant sin(t) ≥ 0, ⁰∫_π/2(|sin(t)|(-sin(t))dt) = ⁰∫_π/2((sin(t))(-sin(t))dt) = ⁰∫_π/2(-sin²(t)dt) = -⁰∫_π/2(sin²(t)dt) = ^π/2∫₀(sin²(t)dt) = ^π/2[(t-sin(t)cos(t))/2]₀ = (π/2-sin(π/2)cos(π/2))/2-(0-sin(0)cos(0))/2 = (π/2-1⋅0)/2-(0-0⋅1)/2 = (π/2-0)/2-(0-0)/2 = (π/2)/2-0/2 = π/4-0 = π/4, the area of 1/4 of a circle of radius 1 is π/4
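The substitution x = sin(t) can be verified by integrating numerically on both sides of the formula. The following Python sketch is an illustration; the helper name `midpoint_integral` is an assumption, and the tolerance on the x side is looser because √(1-x²) has a vertical tangent at x = 1.

```python
import math

def midpoint_integral(f, a, b, n=10_000):
    """Approximate the integral of f from a to b with n midpoint rectangles."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

# Left side: integral of sqrt(1 - x^2) in dx over [0, 1]
in_x = midpoint_integral(lambda x: math.sqrt(1.0 - x * x), 0.0, 1.0)

# Right side: f(phi(t)) * phi'(t) = cos(t) * cos(t) in dt over [0, pi/2]
in_t = midpoint_integral(lambda t: math.cos(t) ** 2, 0.0, math.pi / 2)

print(in_x, in_t, math.pi / 4)
```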

If a function is expressible with elementary functions such as sine, cosine, exponential, logarithm, then the first derivative and all subsequent derivatives are functions that can be elementarily expressed, but the primitive functions may not be elementarily expressible; a function composed of elementary functions is always elementarily differentiable, but it is not always elementarily integrable; the Gaussian function e^-x² does not have an elementarily expressible primitive; the fundamental theorem of calculus states that if we integrate f(t)dt from 0 to x, where f is a continuous function, we obtain a primitive, but this primitive may not be elementarily expressible; ^x∫₀(e^-t²dt), the function of the variable x obtained is a primitive of the function e^-x²; all continuous functions admit primitives, but not all admit primitives that can be expressed elementarily; primitives that cannot be elementarily expressed are expressible with an integral; an integral whose integrand has no elementary primitive can be approximated by numerical methods; it is possible to construct functions using integrals; the logarithm function has been defined as the inverse of the exponential function, but it is also possible to define it as the integral between 1 and x of the function (1/t)dt, log(x) = ^x∫₁((1/t)dt)
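A primitive defined only through an integral can still be tabulated numerically. The following Python sketch is an illustration under stated assumptions: the names `midpoint_integral` and `gaussian_primitive` are hypothetical, and the comparison uses the error function `math.erf` of the Python standard library, for which ^x∫₀(e^-t²dt) = (√(π)/2)⋅erf(x); the error function is not introduced in these notes.

```python
import math

def midpoint_integral(f, a, b, n=10_000):
    """Approximate the integral of f from a to b with n midpoint rectangles."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

def gaussian_primitive(x):
    """Integral function of e^(-t^2) from 0 to x, computed numerically."""
    return midpoint_integral(lambda t: math.exp(-t * t), 0.0, x)

for x in (0.5, 1.0, 2.0):
    reference = math.sqrt(math.pi) / 2.0 * math.erf(x)
    print(x, gaussian_primitive(x), reference)
```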

47 - EXTENSION OF THE NOTION OF INTEGRAL

The notion of integral arises from the desire to assign an area to the trapezoid identified by the Cartesian graph of a function; given a non-negative function defined on an interval [a,b], we want a procedure for assigning an area to the trapezoid between the graph of the function and the x-axis; we have built this procedure for non-negative functions, and we have extended it to functions of any sign by exploiting the idea that any function can be written as the difference between two non-negative functions; the integral of a function of any sign is defined as the difference between the integral of the positive part and the integral of the negative part; the continuity of the starting function is a sufficient but not necessary condition; the procedure for approximating the area of the trapezoid does not require continuity, but only the boundedness of the function; a real-valued function f defined on an interval [a,b], f: [a,b] → ℝ, is bounded if its infimum e and its supremum E are real numbers, that is if they are finite quantities, so the values that f assumes lie between e and E, e ≤ f(x) ≤ E, and if f is non-negative then 0 ≤ e ≤ f(x) ≤ E; decompose the interval [a,b] into a finite number of parts, x₀ = a < x₁ < x₂ < ... < x_n = b; e_k is the infimum of the function f on the generic interval [x_k-1,x_k]; E_k is the supremum of the function f on the generic interval [x_k-1,x_k]; σ is the decomposition, s is the lower sum, S is the upper sum; the lower sum is s(f,σ) = ⁿΣ_k=1(e_k(x_k-x_k-1)); e_k(x_k-x_k-1) is the area of a rectangle, on the generic interval, which is contained in the trapezoid; s(f,σ) is the area of a multi-rectangle that is contained in the trapezoid of the function f; the upper sum is S(f,σ) = ⁿΣ_k=1(E_k(x_k-x_k-1)); E_k(x_k-x_k-1) is the area of a rectangle, on the generic interval, which contains the corresponding piece of the trapezoid; S(f,σ) is the area of a multi-rectangle that contains the trapezoid of the function f; the lower sums and the upper sums are themselves areas of trapezoids, namely of multi-rectangles; the step function equal to e_k on every interval has as its trapezoid the multi-rectangle whose area is s(f,σ); the step function equal to E_k on every interval has as its trapezoid the multi-rectangle whose area is S(f,σ); the first step function is a minorant of f and the second is a majorant of f, so the lower sum underestimates the area and the upper sum overestimates it
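
The construction of the lower and upper sums can be sketched in a few lines of Python. This is only an illustrative sketch: the names lower_upper_sums, square and sigma are hypothetical, and the code assumes that f is monotone on every interval of the decomposition, so that e_k and E_k are simply the values at the two endpoints (for a general bounded function the infimum and the supremum cannot be read off in this way).

```python
from typing import Callable, Sequence, Tuple


def lower_upper_sums(f: Callable[[float], float],
                     points: Sequence[float]) -> Tuple[float, float]:
    """Return (s, S) for f on the decomposition x_0 < x_1 < ... < x_n.

    Assumption: f is monotone on every interval [x_{k-1}, x_k], so its
    infimum e_k and supremum E_k are the values at the two endpoints.
    """
    s = 0.0
    S = 0.0
    for left, right in zip(points, points[1:]):
        y_left, y_right = f(left), f(right)
        e_k = min(y_left, y_right)   # infimum on [left, right]
        E_k = max(y_left, y_right)   # supremum on [left, right]
        s += e_k * (right - left)    # rectangle contained in the trapezoid
        S += E_k * (right - left)    # rectangle containing the trapezoid
    return s, S


def square(x: float) -> float:
    return x * x


sigma = [0.0, 0.25, 0.5, 0.75, 1.0]
s, S = lower_upper_sums(square, sigma)
print(s, S)  # the integral of x^2 on [0, 1], which is 1/3, lies in between
```

Adding points to sigma should bring the two printed values closer to each other, which is the behaviour that the notion of integrability is built on.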

If f is a bounded function, the numerical sets of the lower sums and of the upper sums are separated

The numerical sets of the lower and upper sums are separated, that is each lower sum is less than or equal to each upper sum; s(f,σ) ≤ S(f,σ) is obviously true for the same decomposition σ, because e_k ≤ E_k ⇒ e_k(x_k-x_k-1) ≤ E_k(x_k-x_k-1) ⇒ ⁿΣ_k=1(e_k(x_k-x_k-1)) ≤ ⁿΣ_k=1(E_k(x_k-x_k-1)); we want to prove s(f,σ₁) ≤ S(f,σ₂) when σ₁ and σ₂ are different decompositions; refining a decomposition means adding new points to it; when a point is added to the decomposition the lower sum does not decrease and the upper sum does not increase; σ := σ₁ ∪ σ₂, the decomposition σ is finer than both σ₁ and σ₂ because it is obtained by putting together the decomposition points of σ₁ and σ₂; s(f,σ₁) ≤ s(f,σ) ≤ S(f,σ) ≤ S(f,σ₂), hence s(f,σ₁) ≤ S(f,σ₂); any lower sum is less than or equal to any upper sum, therefore the sets are separated

For each bounded function, the set of lower sums is separated from the set of higher sums

The supremum of the lower sums s(f,σ) is less than or equal to the infimum of the upper sums S(f,σ); two sets of real numbers are separated when every element of the first set is less than or equal to every element of the second, so that the supremum of the smaller set is less than or equal to the infimum of the greater set; two sets of real numbers are contiguous when the supremum of the smaller set is equal to the infimum of the greater set, which happens exactly when there is only one separating element between the two sets

The bounded function f is integrable on the interval [a,b] if the numerical sets of the lower and higher sums are contiguous; in this case the integral of f over [a,b] is the element of separation between the considered sets

If there is only one number that is greater than or equal to each lower sum and less than or equal to each upper sum, then this number separates the areas of the multi-rectangles contained in the trapezoid from the areas of the multi-rectangles containing the trapezoid; this unique number is the integral of the function f over the interval [a,b] and is denoted by ^b∫_a(f(x)dx)

Continuous functions are integrable because the lower sums and the upper sums are two contiguous sets, so there is only one number that separates these two sets, and this number is the value of the integral

Integrability holds for continuous functions and for monotone functions; a function can be continuous but not monotone, and it can be monotone but not continuous, in fact monotonicity and continuity are two completely independent conditions; continuous functions and monotone functions are integrable, that is the numerical sets of the lower and upper sums are contiguous; there are bounded functions that are not integrable, namely those for which the set of the lower sums is not contiguous with the set of the upper sums

A function is integrable when the set of the lower sums is contiguous to the set of the upper sums, so there is only one element of separation between these sets and the value of this element is the integral; contiguity means that ∀ ε > 0 ∃ σ₁, σ₂ : S(f,σ₂)-s(f,σ₁) < ε; σ = σ₁∪σ₂, σ is a finer decomposition than σ₁ and σ₂ because it consists of the decomposition points of both σ₁ and σ₂; with σ the lower sum does not decrease and the upper sum does not increase, therefore we get 2 sums at least as close as the starting sums; s(f,σ₁) ≤ s(f,σ) ≤ S(f,σ) ≤ S(f,σ₂); if we refine both decompositions we obtain 2 sums, relative to the same decomposition, which are closer to each other; S(f,σ₂)-s(f,σ₁) < ε ⇒ S(f,σ)-s(f,σ) < ε; the bounded function f is integrable if and only if for any ε > 0 there exists a decomposition σ depending on ε, σ = σ_ε, such that S(f,σ_ε)-s(f,σ_ε) < ε; S(f,σ_ε) = ⁿΣ_k=1(E_k(x_k-x_k-1)), s(f,σ_ε) = ⁿΣ_k=1(e_k(x_k-x_k-1)), S(f,σ_ε)-s(f,σ_ε) = ⁿΣ_k=1(E_k(x_k-x_k-1))-ⁿΣ_k=1(e_k(x_k-x_k-1)) = ⁿΣ_k=1((E_k-e_k)(x_k-x_k-1))

A bounded function is integrable if and only if, given any positive ε, it is possible to find a sufficiently fine decomposition such that the sum of the areas of the rectangles with base x_k-x_k-1 and height E_k-e_k is less than ε; this is Bernhard Riemann's condition for integrability

If f is a bounded function on the interval [a,b], it is integrable on this interval if and only if it is possible to cover the Cartesian graph with a multi-rectangle, union of a finite number of rectangles with sides parallel to the axes, of area arbitrarily small

Given a positive quantity ε we can find a decomposition σ dependent on ε, such that the sum of the areas of the rectangles covering the Cartesian graph of the function is less than ε, and this is the integrability of Bernhard Riemann

Georg Friedrich Bernhard Riemann was a German mathematician, born in 1826 and died in 1866, who formulated the concept of integral, still today called the Riemann integral

The Riemann integrability condition states that a bounded function is integrable if it is possible to cover its Cartesian graph with a finite number of rectangles with sides parallel to the axes whose total area is less than a predetermined value ε

The Riemann integrability condition is certainly satisfied by monotone functions even if they are not continuous; a monotone function is not necessarily continuous; a monotone function can have jump discontinuities, that is points at which the right and left limits are both finite but different; we divide the interval [a,b] into n equal parts, x₀ = a < x₁ < x₂ < ... < x_n = b, and each part has length x_k-x_k-1 = (b-a)/n; suppose that the function is monotone increasing, so the infimum on each interval is e_k = f(x_k-1), and the supremum on each interval is E_k = f(x_k); if the function were monotone decreasing then e_k = f(x_k) and E_k = f(x_k-1); E_k-e_k = f(x_k)-f(x_k-1); S(f,σ_n)-s(f,σ_n) = ((b-a)/n)ⁿΣ_k=1(f(x_k)-f(x_k-1)) = ((b-a)/n)(f(x₁)-f(x₀)+f(x₂)-f(x₁)+...+f(x_n)-f(x_n-1)) = ((b-a)/n)(f(x_n)-f(x₀)) = ((b-a)/n)(f(b)-f(a)); f(b)-f(a) is the variation of the function from point a to point b; since ((b-a)/n)(f(b)-f(a)) tends to 0 as n tends to +∞, for every ε > 0 we have ((b-a)/n)(f(b)-f(a)) < ε when n is large enough; a bounded and monotone function is integrable even if it has points of discontinuity; sums and differences of monotone functions are integrable; all the linearity properties of the integral are valid in the broader class of Riemann integrable functions
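
The telescoping identity S(f,σ_n)-s(f,σ_n) = ((b-a)/n)(f(b)-f(a)) can be checked numerically on a monotone function with a jump. The following is a minimal sketch, not a definitive implementation: the names sums_increasing, parts_needed and jump are hypothetical, and the test function, increasing on [0,1] with a jump of height 1 at x = 0.5, is chosen only for illustration.

```python
import math
from typing import Callable, Tuple


def sums_increasing(f: Callable[[float], float], a: float, b: float,
                    n: int) -> Tuple[float, float]:
    """Lower and upper sums of an increasing f on n equal parts of [a, b]."""
    h = (b - a) / n
    xs = [a + (b - a) * k / n for k in range(n + 1)]
    s = h * sum(f(x) for x in xs[:-1])   # e_k = f(x_{k-1}) for increasing f
    S = h * sum(f(x) for x in xs[1:])    # E_k = f(x_k) for increasing f
    return s, S


def parts_needed(f: Callable[[float], float], a: float, b: float,
                 eps: float) -> int:
    """Return an n for which ((b-a)/n)(f(b)-f(a)) < eps (f increasing)."""
    return math.floor((b - a) * (f(b) - f(a)) / eps) + 1


def jump(x: float) -> float:
    """Increasing on [0, 1], discontinuous at x = 0.5 (hypothetical example)."""
    return x + (1.0 if x >= 0.5 else 0.0)


s, S = sums_increasing(jump, 0.0, 1.0, 10)
print(S - s, (1.0 - 0.0) / 10 * (jump(1.0) - jump(0.0)))  # both equal 2/10
n = parts_needed(jump, 0.0, 1.0, 0.03)
s_n, S_n = sums_increasing(jump, 0.0, 1.0, n)
print(n, S_n - s_n)  # the gap is below 0.03 despite the discontinuity
```

The jump does not prevent the gap between the sums from shrinking, because only the total variation f(b)-f(a) enters the estimate.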

Suppose the function f has a jump discontinuity; F(x) = ^x∫_a(f(t)dt); the fundamental theorem of calculus holds locally, so it is not necessary that the function f be continuous throughout the interval [a,b]; the fundamental theorem of calculus states that if the function f is continuous at a point x then the integral function F is differentiable at that point and its derivative is f(x); at the point of discontinuity the function F is not differentiable, but it is differentiable at the other points; the graph of F has a corner there, and if f is piecewise constant the graph of F is a broken line; the point of discontinuity of f corresponds to a point where F is continuous but not differentiable

If f is continuous on the interval [a,b], or if f is monotone on the same interval, then it is integrable

All monotone functions are integrable

The Heine-Cantor theorem states that if a function is continuous on a closed and bounded interval [a,b], then the function is uniformly continuous; a closed and bounded interval is a compact interval; the oscillation of a function on an interval is the difference between the supremum and the infimum of the function on that interval; Weierstrass's theorem states that on a compact interval a continuous function attains its supremum and its infimum, which are therefore the maximum and the minimum, so for continuous functions on compact intervals the oscillation is simply the difference between the maximum and the minimum; as a consequence of the Heine-Cantor theorem, if the function f is continuous on a closed and bounded interval [a,b], for any ε > 0 it is possible to obtain a decomposition σ_ε of the interval [a,b] such that on each interval of the decomposition the oscillation of the function is less than ε, f: [a,b] → ℝ, ∀ ε > 0 ∃ σ_ε : E_k-e_k < ε; the proof of the Heine-Cantor theorem is by contradiction, with a reasoning similar to the one used to prove the lemma preceding the Weierstrass theorem, which states that a continuous function on a compact interval is necessarily bounded; in short, if a function is continuous on a compact interval then it is bounded and the interval can be sliced into parts so small that the oscillation on each part is less than ε

The Heine-Cantor theorem is used to prove the integrability of continuous functions; S(f,σ)-s(f,σ) = ⁿΣ_k=1((E_k-e_k)(x_k-x_k-1)), E_k-e_k is the oscillation of the function on the generic interval, x_k-x_k-1 is the base of each rectangle; by the Heine-Cantor theorem we can choose σ so that E_k-e_k < ε on every interval; S(f,σ)-s(f,σ) = ⁿΣ_k=1((E_k-e_k)(x_k-x_k-1)) ≤ ⁿΣ_k=1(ε(x_k-x_k-1)) = εⁿΣ_k=1(x_k-x_k-1) = ε(b-a); ε(b-a) is an arbitrarily small quantity; therefore continuous functions are also integrable

There are infinitely many non-integrable functions; consider the function f defined on the interval [0,1] that is 0 when x is an irrational number and 1 when x is a rational number, f(x) := {0, x ∉ ℚ; 1, x ∈ ℚ}, x ∈ [0,1]; this function is not integrable; both the rational numbers and the irrational numbers are dense in the real line, so in every interval of the real line there are infinitely many rational numbers and infinitely many irrational numbers; decomposing the interval [0,1], each interval of the decomposition contains both rational and irrational numbers, therefore e_k = 0 and E_k = 1; e_k is always 0, therefore the lower sums are all 0; E_k is always 1, so the upper sums are all 1 because S(f,σ) = ⁿΣ_k=1(E_k(x_k-x_k-1)) = ⁿΣ_k=1(1(x_k-x_k-1)) = ⁿΣ_k=1(x_k-x_k-1) = 1; the set of the lower sums is {0}, the set of the upper sums is {1}, therefore these two sets are separated but not contiguous, because every number between 0 and 1 is a separating element, and this function is not integrable according to Riemann; according to Riemann's theory a function is integrable only if the set of the lower sums and the set of the upper sums are contiguous; the lower integral ^b∫̲_a(f(x)dx), written with a bar under the integral sign, is the supremum of the lower sums, sup(s(f,σ)); the upper integral ^b∫̅_a(f(x)dx), written with a bar over the integral sign, is the infimum of the upper sums, inf(S(f,σ)); here ^b∫̲_a(f(x)dx) = sup(s(f,σ)) = 0 < 1 = inf(S(f,σ)) = ^b∫̅_a(f(x)dx); the lower integral is less than the upper integral, therefore the integral of this function does not exist, so the function is not integrable

The lower integral, written with a bar under the integral sign, is the supremum of the lower sums, ^b∫̲_a(f(x)dx) = sup(s(f,σ))

The upper integral, written with a bar over the integral sign, is the infimum of the upper sums, ^b∫̅_a(f(x)dx) = inf(S(f,σ))

For continuous functions we consider as integral the supremum of the lower sums, but we could also consider as integral the infimum of the upper sums, because for continuous functions the lower integral coincides with the upper integral and both are equal to the integral; ^b∫̲_a(f(x)dx) = ^b∫̅_a(f(x)dx) = ^b∫_a(f(x)dx), where the first integral sign carries a bar below (lower integral) and the second a bar above (upper integral)

Continuous functions and monotone functions on [a,b] are bounded functions that are integrable in Riemann's theory, but not all bounded functions are integrable in Riemann's theory

Riemann's integral theory dates from the mid-19th century; at the beginning of the 20th century the Lebesgue integral theory was born

An integral can be calculated exactly when a primitive is known, but it can also be calculated numerically, in fact the lower sums underestimate the integral and the upper sums overestimate it; if we know a lower sum and an upper sum, then we know an interval that contains the integral, and in this way we know the integral approximately

It is possible to use an integral to define a function, for example the logarithm; we know that the logarithm is the inverse function of the exponential function; a logarithmic function is a function defined on the set of strictly positive numbers with values in the set of real numbers, f: ℝ⁺ → ℝ; the basic property of the logarithm is that it transforms products of positive numbers into sums, f(x₁⋅x₂) = f(x₁)+f(x₂); ℝ⁺, the set of strictly positive numbers, equipped with multiplication has an algebraic structure called a commutative group; ℝ, the set of real numbers, equipped with addition is also a commutative group; the logarithm transforms the multiplicative group structure of the strictly positive numbers into the additive group structure of the real numbers, so the logarithm is an isomorphism between the two groups; from the property f(x₁⋅x₂) = f(x₁)+f(x₂) it follows that f(1) = 0, and if f is differentiable it follows that f'(x) = f'(1)/x, which for the natural logarithm, where f'(1) = 1, gives f'(x) = 1/x; we can therefore define the logarithm as log(x) := ^x∫₁((1/t)dt), so for x ≥ 1 log(x) is the area between the branch of equilateral hyperbola 1/t and the t-axis on the interval [1,x], and we can approximate this area with the lower sums and the upper sums, and therefore estimate the function; in the past, the logarithm was tabulated, and tables of logarithms were used; defining the logarithm with an integral also provides a tool for its approximate calculation
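
The idea of estimating the logarithm from its integral definition can be sketched as follows. The function names log_bounds and log_estimate are hypothetical, the code assumes x ≥ 1 and equal parts, and the estimate is simply the average of the lower and upper sums; math.log is used only as a reference value for comparison.

```python
import math
from typing import Tuple


def log_bounds(x: float, n: int) -> Tuple[float, float]:
    """Lower and upper sums of 1/t on n equal parts of [1, x], for x >= 1."""
    h = (x - 1.0) / n
    ts = [1.0 + h * k for k in range(n + 1)]
    lower = h * sum(1.0 / t for t in ts[1:])    # 1/t decreasing: infimum at right end
    upper = h * sum(1.0 / t for t in ts[:-1])   # supremum at left end
    return lower, upper


def log_estimate(x: float, n: int = 2000) -> float:
    """Average of the lower and upper sums, used as an estimate of log(x)."""
    lower, upper = log_bounds(x, n)
    return (lower + upper) / 2.0


lo, hi = log_bounds(3.0, 2000)
print(lo, math.log(3.0), hi)                 # the reference value lies in between
print(log_estimate(2.0) + log_estimate(3.0), log_estimate(6.0))
```

The last line illustrates numerically the property f(x₁⋅x₂) = f(x₁)+f(x₂): the two printed values should agree to several decimal places.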

48 - APPLICATIONS OF INTEGRAL CALCULUS - PART 1

In the Riemann integration theory, within the class of bounded functions on an interval [a,b] there is the class of integrable functions; continuous and monotone functions belong to the class of integrable functions; a bounded function f is integrable if the numerical sets of the lower sums s(f,σ) and the upper sums S(f,σ) are contiguous, which happens if and only if ∀ ε > 0 ∃ σ_ε : S(f,σ_ε)-s(f,σ_ε) < ε

Another way to consider the integral is through Riemann sums; decompose the interval [a,b] into parts, x₀ = a < x₁ < ... < x_n = b, and in the generic interval [x_k-1,x_k] choose a point ξ_k; the Riemann sum is ⁿΣ_k=1(f(ξ_k)(x_k-x_k-1)); f(ξ_k)(x_k-x_k-1) is the area of a rectangle, x_k-x_k-1 is the base of the rectangle, f(ξ_k) is the height of the rectangle, therefore the Riemann sum is the area of a multi-rectangle, where each term has the sign of f(ξ_k); if the function is integrable, a decomposition σ_ε can be determined such that S(f,σ_ε)-s(f,σ_ε) < ε; the integral is by definition the element of separation between the lower sums and the upper sums; since e_k ≤ f(ξ_k) ≤ E_k, the Riemann sum relative to the decomposition σ_ε lies between s(f,σ_ε) and S(f,σ_ε), s(f,σ_ε) ≤ ⁿΣ_k=1(f(ξ_k)(x_k-x_k-1)) ≤ S(f,σ_ε), and so does the integral, therefore the Riemann sum differs from the integral by less than ε; therefore we can consider the integral as the limit of the Riemann sums, ^b∫_a(f(x)dx) = lim_{δ_σ→0}(ⁿΣ_k=1(f(ξ_k)(x_k-x_k-1)))

If f is a bounded and integrable function on the interval [a,b], its integral can be obtained as the limit of the Riemann sums, ^b∫_a(f(x)dx) = lim_{δ_σ→0}(ⁿΣ_k=1(f(ξ_k)(x_k-x_k-1)))

^b∫_a(f(x)dx) = lim_{δ_σ→0}(ⁿΣ_k=1(f(ξ_k)(x_k-x_k-1))); δ_σ is the greatest of the differences x_k-x_k-1, δ_σ := max(x_k-x_k-1); if the largest of the widths of the intervals of the decomposition tends to zero, then the number of these intervals tends to infinity; the converse is false, because if the number of parts of the decomposition tends to infinity it is not necessarily true that the largest interval tends to 0; if the interval [a,b] is decomposed into equal parts, x_k-x_k-1 = (b-a)/n, there is no difference between letting the number of parts n tend to +∞ and letting the width of the intervals tend to 0; the Riemann sum approaches the value of the integral as the width of every interval of the decomposition becomes smaller, regardless of the choice of the points ξ_k in each interval of the decomposition
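
Both remarks, the independence from the choice of the points ξ_k and the difference between "many parts" and "small δ_σ", can be illustrated with a short sketch. The names mesh, riemann_sum, random_tags and lopsided are hypothetical, the points ξ_k are drawn at random with a fixed seed only for reproducibility, and the test function sin on [0,π], whose integral is 2, is an arbitrary choice.

```python
import math
import random
from typing import Callable, List, Sequence


def mesh(points: Sequence[float]) -> float:
    """delta_sigma: the largest width x_k - x_{k-1} of the decomposition."""
    return max(right - left for left, right in zip(points, points[1:]))


def riemann_sum(f: Callable[[float], float], points: Sequence[float],
                tags: Sequence[float]) -> float:
    """Sum of f(xi_k) (x_k - x_{k-1}) with one tag xi_k per interval."""
    return sum(f(xi) * (right - left)
               for xi, left, right in zip(tags, points, points[1:]))


def random_tags(points: Sequence[float], rng: random.Random) -> List[float]:
    """One arbitrary point xi_k in each interval [x_{k-1}, x_k]."""
    return [rng.uniform(left, right) for left, right in zip(points, points[1:])]


rng = random.Random(0)
for n in (10, 100, 1000):
    sigma = [math.pi * k / n for k in range(n + 1)]
    print(n, mesh(sigma), riemann_sum(math.sin, sigma, random_tags(sigma, rng)))

# Many parts, but the mesh does not shrink: [0, 0.5] is never subdivided.
lopsided = [0.0] + [0.5 + 0.5 * k / 1000 for k in range(1001)]
print(len(lopsided) - 1, mesh(lopsided))
```

The printed Riemann sums should approach 2 as the mesh decreases, while the last decomposition has more than a thousand parts and still a mesh of 0.5.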

The integral viewed as a limit of Riemann sums is useful for calculating the volumes of certain solids; we have a solid in the space x,y,z and we slice it with a plane orthogonal to the x-axis, obtaining a section of the solid; the plane is the locus of the points for which x is constant; the area of the section is a function of x, A(x); for all values of x in the interval [a,b] we know A(x), and we must calculate the volume of the solid; we decompose the interval [a,b], x₀ = a < x₁ < ... < x_n = b, and the generic interval is [x_k-1,x_k]; if the section area were constant when x varies between x_k-1 and x_k, then its value would coincide with the area at any point ξ_k of the interval, A(x) = A(ξ_k) for x_k-1 ≤ x ≤ x_k, and the volume of the slice of solid would be A(ξ_k)(x_k-x_k-1); the approximate volume of the whole solid is ⁿΣ_k=1(A(ξ_k)(x_k-x_k-1)); if A(x) is a continuous function then ⁿΣ_k=1(A(ξ_k)(x_k-x_k-1)) is a Riemann sum, so lim_{δ_σ→0}(ⁿΣ_k=1(A(ξ_k)(x_k-x_k-1))) = ^b∫_a(A(x)dx) = V, and this is the volume of the solid; this is not a rigorous proof, but an intuitive, heuristic reasoning, which leads to a correct result

With the integral we can calculate the volume of a solid of rotation, that is a solid that is obtained from the rotation of a plane figure; the function z = f(x) ≥ 0 is non-negative, continuous, and integrable, a ≤ x ≤ b; the trapezoid of the function is in the x,z plane; the trapezoid rotates, making a whole revolution, around the x-axis, obtaining a solid of rotation in the space x,y,z; we cut the solid of rotation with a plane that passes through a point of the x-axis and we obtain a section that is a circle and the radius of the circle is f(x); the section area is the area of the circle of radius f(x), A = π·r² = π(f(x))²; V = ^b∫_a(A(x)dx) = ^b∫_a((π(f(x))²)dx) = π·^b∫_a((f(x))²dx)

The volume V of the solid of rotation obtained by rotating the trapezoid of the non-negative function f defined on the interval [a,b] around the x-axis is V = π·^b∫_a((f(x))²dx)

A straight line forms with the x-axis a trapezoid which is a right triangle with base h and height r, and this right triangle rotates around the x-axis forming a cone which has height h and the radius of the base is r; the equation of the line is z = f(x) = (r/h)x; V = π·^h∫₀(((r/h)x)²dx) = π(r²/h²)^h∫₀(x²dx) = π(r²/h²)^h[x³/3]₀ = π(r²/h²)(h³/3-0³/3) = π(r²/h²)(h³/3-0) = π(r²/h²)(h³/3) = π(r²)(h/3) = (1/3)π⋅r²⋅h; the volume of solids that have a tip, such as cone and pyramid, is volume = ((base area)(height))/3

Calculate the volume of a sphere using the integral; a sphere is obtained from the rotation around the x-axis of a semicircle of radius r; x²+z² = r² is the equation of a circle of radius r centered at the origin of the Cartesian axes; z² = (f(x))² = r²-x², z = f(x) = √(r²-x²); V = π·^r∫_-r((r²-x²)dx); the integrand g(x) = r²-x² is an even function, because g(-x) = g(x), and it is integrated over a symmetric interval, so V = π·^r∫_-r((r²-x²)dx) = 2π·^r∫₀((r²-x²)dx) = 2π·^r[r²x-x³/3]₀ = 2π((r³-r³/3)-(0-0)) = 2π(2r³/3) = (4/3)π·r³; V = (4/3)π·r³ is the formula for the volume of a sphere
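
The formulas for the cone and for the sphere can be checked numerically by summing thin slices of area π(f(x))². This is a minimal sketch under stated assumptions: the name volume_about_x_axis is hypothetical, the points ξ_k are taken at the midpoints of n equal parts, and the numerical values of the radii and of the height are arbitrary.

```python
import math
from typing import Callable


def volume_about_x_axis(f: Callable[[float], float], a: float, b: float,
                        n: int = 1000) -> float:
    """Riemann sum of A(x) = pi f(x)^2 with xi_k at the midpoints."""
    h = (b - a) / n
    return sum(math.pi * f(a + (k + 0.5) * h) ** 2 * h for k in range(n))


radius, height = 1.0, 3.0
cone = volume_about_x_axis(lambda x: (radius / height) * x, 0.0, height)
print(cone, math.pi * radius ** 2 * height / 3.0)

sphere_radius = 2.0
sphere = volume_about_x_axis(
    lambda x: math.sqrt(sphere_radius ** 2 - x * x),
    -sphere_radius, sphere_radius)
print(sphere, 4.0 / 3.0 * math.pi * sphere_radius ** 3)
```

Each pair of printed values should agree to several decimal places, the first being the Riemann sum and the second the closed formula.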

A function is even when f(-x) = f(x); the graph of an even function is symmetrical with respect to the y-axis; if f(x) is even, integrating over a symmetric interval [-a,a], ^a∫_-a(f(x)dx) = 2·^a∫₀(f(x)dx), because the trapezoid in the interval [-a,0] and the trapezoid in the interval [0,a] are equal

A function is odd when f(-x) = -f(x); the graph of an odd function is symmetrical with respect to the origin of the axes; if f(x) is odd, integrating over a symmetric interval [-a,a], ^a∫_-a(f(x)dx) = 0, because the trapezoid in the interval [-a,0] and the trapezoid in the interval [0,a] are equal but of opposite sign
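
The two symmetry rules can be observed numerically. In this sketch the helper midpoint_integral is a hypothetical name, and cos and sin on [-1,1] are chosen only as familiar examples of an even and an odd function.

```python
import math
from typing import Callable


def midpoint_integral(f: Callable[[float], float], a: float, b: float,
                      n: int = 1000) -> float:
    """Riemann sum on n equal parts with xi_k at the midpoints."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


even_full = midpoint_integral(math.cos, -1.0, 1.0)         # cos is even
even_half = midpoint_integral(math.cos, 0.0, 1.0, n=500)   # same width h
odd_full = midpoint_integral(math.sin, -1.0, 1.0)          # sin is odd

print(even_full, 2.0 * even_half)  # equal up to rounding
print(odd_full)                    # zero up to rounding
```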

The function 1/x is the derivative of the function log(x), and in the first quadrant it is a branch of equilateral hyperbola; calculate the volume of the solid obtained from the rotation around the x-axis of the trapezoid of the function 1/x in the interval [1,b]; V = π·^b∫₁((1/x²)dx) = π·^b[-1/x]₁ = π·¹[1/x]_b = π(1/1-1/b) = π(1-1/b); lim_b→+∞(π(1-1/b)) = π·lim_b→+∞(1-1/b) = π·(1-1/+∞) = π·(1-0) = π·1 = π, we have solved a generalized or improper integral; this solid of rotation tapers off to infinity because its section is a circle of radius 1/x, therefore as x increases the radius decreases; the solid is unlimited but its volume is finite, the region of space is unlimited but its volume is finite; similarly, the region of the plane between the x-axis and the function e^-x for x from 0 to +∞ is unlimited, but its area is finite

A trapezoid can rotate around the x-axis, but it can also rotate around the y-axis

Consider a constant function f(x) = h on the interval [a,b] with a > 0; the solid obtained from the rotation around the y-axis is a cylindrical tube of height h, external radius b, internal radius a, wall thickness b-a; the volume of the outer cylinder is V_e = (base area)·height = π·b²·h; the volume of the inner cylinder is V_i = (base area)·height = π·a²·h; the volume of the solid of rotation is V = V_e-V_i = π·b²·h-π·a²·h = π(b²-a²)h = π(b+a)(b-a)h = (2π)((a+b)/2)(b-a)h; b-a = d is the thickness of the cylindrical tube, and (a+b)/2 = r is its mean radius, since the inner radius is a and the outer radius is b; V = (2π)((a+b)/2)(b-a)h = 2π·r·d·h is the volume of the cylindrical tube, where 2π·r is the mean circumference, d is the thickness, h is the height

f: [a,b] → ℝ, 0 ≤ a < b, f(x) ≥ 0; the trapezoid of the non-negative function f(x) rotates around the y-axis; if a > 0 the solid has a hole around the axis, like a donut with a hole; if a = 0 the solid has no hole; decomposing the interval [a,b], x₀ = a < x₁ < ... < x_n = b, the generic interval is [x_k-1,x_k]; if the function were constant on the generic interval, the corresponding piece of solid would be a cylindrical tube, and taking the point ξ_k at the center of the interval, ξ_k = (x_k-1+x_k)/2, its volume would be 2π·ξ_k(x_k-x_k-1)f(ξ_k); the approximate volume of the solid is V ≈ ⁿΣ_k=1(2π·ξ_k(x_k-x_k-1)f(ξ_k)), where ξ_k is the mean radius, x_k-x_k-1 is the thickness, f(ξ_k) is the height; we can progressively refine the decomposition and calculate the limit when the largest difference x_k-x_k-1 tends to 0; the sum is a Riemann sum, not relative to the function f(x) but relative to 2π·x·f(x), so V = 2π·^b∫_a((x·f(x))dx); the pioneers of calculus would have said that 2π·x is the circumference of radius x, f(x) is the height, dx is the infinitesimal thickness

The volume V of the solid of rotation obtained by rotating around the y-axis the trapezoid of the non-negative function f defined on the interval [a,b], with a ≥ 0, is V = 2π·^b∫_a((x·f(x))dx)
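
The sum of thin cylindrical tubes can be written directly as code and compared with the volume of the cylindrical tube π(b²-a²)h. This is only a sketch: the name volume_about_y_axis is hypothetical, the decomposition is into n equal parts, and the values a = 1, b = 2, h = 3 are arbitrary.

```python
import math
from typing import Callable


def volume_about_y_axis(f: Callable[[float], float], a: float, b: float,
                        n: int = 1000) -> float:
    """Sum of tube volumes 2 pi xi_k (x_k - x_{k-1}) f(xi_k), xi_k = midpoint."""
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        xi = a + (k + 0.5) * h                     # mean radius of the k-th tube
        total += 2.0 * math.pi * xi * h * f(xi)    # circumference * thickness * height
    return total


a, b, tube_height = 1.0, 2.0, 3.0
tube = volume_about_y_axis(lambda x: tube_height, a, b)
print(tube, math.pi * (b ** 2 - a ** 2) * tube_height)
```

For a constant function the two values should coincide up to rounding, because each thin tube is computed with its exact mean radius.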

A circle with center (R,0) and radius r < R rotates around the z-axis, which in this example is the ordinate axis, forming a solid called torus, a solid that looks like a ring; since the circle is symmetrical with respect to the x-axis, we consider only the semicircle above the x-axis, of equation z = f(x), which gives half the volume, therefore the volume must be multiplied by 2, V = 4π⋅^R+r∫_R-r((x⋅f(x))dx); the equation of the semicircle above the x-axis follows from the Pythagorean theorem, (x-R)²+z² = r², z² = r²-(x-R)², z = f(x) = √(r²-(x-R)²); V = 4π⋅^R+r∫_R-r((x√(r²-(x-R)²))dx); to simplify the calculations we change the variable, x-R = t, x = R+t, dx = dt; if x = R-r then t = -r; if x = R+r then t = r; V = 4π⋅^r∫_-r(((R+t)√(r²-t²))dt) = 4π⋅^r∫_-r((R√(r²-t²)+t√(r²-t²))dt) = 4πR⋅^r∫_-r(√(r²-t²)dt)+4π⋅^r∫_-r((t√(r²-t²))dt); the function √(r²-t²) is even because the variable t is squared, t is an odd function, the product of an even function by an odd function is an odd function, and the integral of an odd function over a symmetric interval [-a,a] is 0, so 4π⋅^r∫_-r((t√(r²-t²))dt) = 0; the integral of an even function over a symmetric interval [-a,a] is double the integral over the interval [0,a], 4πR⋅^r∫_-r(√(r²-t²)dt) = 8πR⋅^r∫₀(√(r²-t²)dt); ^r∫₀(√(r²-t²)dt) is the area of 1/4 of a circle of radius r, that is (π/4)r²; V = 8πR⋅^r∫₀(√(r²-t²)dt)+0 = 8πR(π/4)r² = 2πR⋅πr² = 2π²Rr²; 2πR is the length of the circumference of radius R described by the center of the circle during the rotation, πr² is the area of the circle of radius r; the volume of the torus is the product of the length of the circumference described by the center of the circle by the area of the circle; this confirms Guldin's theorem, which states that the volume of a solid obtained by the rotation of a plane figure around a straight line external to the figure and contained in the same plane is equal to the product of the area of the figure by the length of the circumference described by the centroid of the figure, and the centroid of a circle is its center

Guldin's theorem states that the volume of a solid obtained from a plane figure rotating around an axis coplanar to the figure is V = α⋅d⋅A, where α ∈ [0,2π] is the rotation angle, d is the distance of the centroid of the flat figure from the axis of rotation, A is the area of the flat figure
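
The volume of the torus can be checked by comparing a numerical evaluation of 4π⋅^R+r∫_R-r((x√(r²-(x-R)²))dx) with the value α⋅d⋅A given by Guldin's theorem. The names torus_volume_shells and guldin_volume are hypothetical, the integral is approximated with a midpoint Riemann sum, and R = 3, r = 1 are arbitrary values with r < R.

```python
import math


def torus_volume_shells(R: float, r: float, n: int = 100_000) -> float:
    """Midpoint Riemann sum of 4 pi x sqrt(r^2 - (x-R)^2) on [R-r, R+r]."""
    h = 2.0 * r / n
    total = 0.0
    for k in range(n):
        x = (R - r) + (k + 0.5) * h
        # max(0.0, ...) only guards against tiny negative rounding errors
        half_height = math.sqrt(max(0.0, r * r - (x - R) ** 2))
        total += 4.0 * math.pi * x * half_height * h
    return total


def guldin_volume(alpha: float, d: float, area: float) -> float:
    """V = alpha * d * A: rotation angle, centroid distance, area of the figure."""
    return alpha * d * area


R, r = 3.0, 1.0
numeric = torus_volume_shells(R, r)
by_guldin = guldin_volume(2.0 * math.pi, R, math.pi * r ** 2)
print(numeric, by_guldin, 2.0 * math.pi ** 2 * R * r ** 2)
```

The three printed values should agree to a few decimal places; the numerical one converges more slowly than usual because the integrand has a vertical tangent at both ends of the interval.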

Considering the function f(x) = e^-x² on the interval [0,b], calculate the volume of the solid obtained from the rotation of the trapezoid around the y-axis; V = 2π⋅^b∫₀((x⋅e^-x²)dx); the function e^-x² has no elementary primitive, so the integral ^b∫_a(e^-x²dx) can only be calculated numerically, but the integrand x⋅e^-x² does have an elementary primitive; D(e^-x²) = -2xe^-x²; V = 2π⋅^b∫₀((xe^-x²)dx) = 2π(-1/2)^b∫₀((-2xe^-x²)dx) = -π⋅^b∫₀((-2xe^-x²)dx) = -π⋅^b[e^-x²]₀ = π⋅⁰[e^-x²]_b = π(e^-0²-e^-b²) = π(1-e^-b²); V = π(1-e^-b²) is the volume of the bounded solid of rotation with base of radius b; lim_b→+∞(π(1-e^-b²)) = lim_b→+∞(π(1-1/e^b²)) = π(1-0) = π; V = π is the volume of the unbounded solid of rotation obtained when the function f(x) = e^-x² on the interval [0,+∞) rotates around the y-axis; the solid is described by the function of 2 variables f(x,y) = e^-(x²+y²), which is constant on each circumference centered at the origin of the axes, and the volume of this solid in three-dimensional space x,y,z is π

^+∞∫_-∞(e^-x²dx) = √(π), the area between the graph e^-x² and the x-axis is √(π)
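
Both results about e^-x² can be checked numerically. In this sketch the helper midpoint_integral is a hypothetical name, b = 2 is an arbitrary value, and the unbounded interval is replaced by [-6,6] on the assumption that the tails beyond it are negligible, since e^-36 is far below the printed precision.

```python
import math
from typing import Callable


def midpoint_integral(f: Callable[[float], float], a: float, b: float,
                      n: int = 2000) -> float:
    """Riemann sum on n equal parts with xi_k at the midpoints."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


b = 2.0
volume = 2.0 * math.pi * midpoint_integral(lambda x: x * math.exp(-x * x), 0.0, b)
print(volume, math.pi * (1.0 - math.exp(-b * b)))   # V = pi (1 - e^(-b^2))

# Assumption: truncating to [-6, 6] changes the area by a negligible amount.
area = midpoint_integral(lambda x: math.exp(-x * x), -6.0, 6.0)
print(area, math.sqrt(math.pi))
```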

49 - APPLICATIONS OF INTEGRAL CALCULUS - PART 2

The fundamental theorem of calculus states that the integral of a continuous function between a fixed limit and a variable limit is a primitive of the function, and that the integral from point a to point b of a continuous function is the difference between the values of any primitive at point b and at point a

³∫₂((1/x)dx) = ³[log(x)]₂ = log(3)-log(2), long ago this result was calculated with the tables of logarithms, today calculators are used

The fundamental theorem of calculus allows us to define functions, for example we can define the logarithm function, log(x) := ^x∫₁((1/t)dt), where x is the upper limit of integration and t is the integration variable; defining a function through an integral allows its approximate calculation; ln(2) = ²∫₁((1/x)dx); the integral is the element of separation between lower and upper sums, and therefore can be used as a tool for calculation by approximation; we divide the interval [1,2] into n equal parts, x₀ = 1 < x₁ < ... < x_n = 2, the length of each interval is h = (2-1)/n = 1/n; x₀ = 1, x₁ = 1+1/n = (n+1)/n, x₂ = 1+2/n = (n+2)/n, x_k = 1+k/n = (n+k)/n; the function 1/x is decreasing on the interval [1,2], so e_k = 1/x_k and E_k = 1/x_k-1; the lower sum is s_n = ⁿΣ_k=1(e_k(x_k-x_k-1)) = ⁿΣ_k=1(e_k(1/n)) = (1/n)ⁿΣ_k=1(e_k) = (1/n)ⁿΣ_k=1(1/x_k) = (1/n)ⁿΣ_k=1(1/((n+k)/n)) = (1/n)ⁿΣ_k=1(n/(n+k)); the upper sum is S_n = ⁿΣ_k=1(E_k(x_k-x_k-1)) = ⁿΣ_k=1(E_k(1/n)) = (1/n)ⁿΣ_k=1(E_k) = (1/n)ⁿΣ_k=1(1/x_k-1) = (1/n)ⁿΣ_k=1(1/((n+k-1)/n)) = (1/n)ⁿΣ_k=1(n/(n+k-1)); n = 10, s_n = 0.6687, S_n = 0.7187; n = 20, s_n = 0.6808, S_n = 0.7058; n = 100, s_n = 0.6906, S_n = 0.6956; the bounds with n = 100 give ln(2) = 0.69..., in fact ln(2) is 0.6931...; s_n < ln(2) < S_n; both errors ln(2)-s_n and S_n-ln(2) are less than S_n-s_n = ⁿΣ_k=1((E_k-e_k)(x_k-x_k-1)); this method is called the rectangle method, because we replace the integral, which is the area of the trapezoid, with the area of a multi-rectangle contained in the trapezoid or containing the trapezoid; we want to understand the order of magnitude of the error for a general function f on [a,b]; the width of each interval of the decomposition is h = (b-a)/n; we must understand how this error goes to 0 when h tends to 0; for a continuous function E_k is the maximum and e_k is the minimum on [x_k-1,x_k], attained at two points p and q of that interval, E_k-e_k = |f(p)-f(q)|; if the first derivative of the function is continuous, by Lagrange's theorem there is a point ξ between p and q such that f(p)-f(q) = (p-q)f'(ξ), so E_k-e_k = |p-q||f'(ξ)|; if the first derivative is continuous on the interval [a,b] then it is bounded, by the theorem preceding the proof of Weierstrass's theorem, so there is a constant M such that |f'(x)| ≤ M, and E_k-e_k ≤ M(x_k-x_k-1); ⁿΣ_k=1((E_k-e_k)(x_k-x_k-1)) ≤ M⋅h⋅(b-a), because x_k-x_k-1 = h, and ⁿΣ_k=1(x_k-x_k-1) = n⋅h = n((b-a)/n) = b-a; 0 ≤ S_n-s_n ≤ M(b-a)h, and the same bound holds for the difference between S_n and the integral, and for the difference between the integral and s_n; M is a majorant of the absolute value of the first derivative, so this error goes to 0 as h goes to 0, 0 ≤ S_n-s_n ≤ M(b-a)h = O(h), where the symbol O is called big O and is an uppercase letter o; h = (b-a)/n, so h tends to 0 when n tends to ∞; the difference S_n-s_n goes to 0 at least as fast as h = (b-a)/n, that is it is bounded by a constant times h, or equivalently by a constant divided by n
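
The values of s_n and S_n quoted above for ln(2) can be reproduced with a few lines of code. This is a minimal sketch: the name ln2_bounds is hypothetical, and math.log(2) is used only as a reference value.

```python
import math
from typing import Tuple


def ln2_bounds(n: int) -> Tuple[float, float]:
    """Lower sum s_n and upper sum S_n of 1/x on n equal parts of [1, 2]."""
    s_n = sum(1.0 / (n + k) for k in range(1, n + 1))       # (1/n) * n/(n+k)
    S_n = sum(1.0 / (n + k - 1) for k in range(1, n + 1))   # (1/n) * n/(n+k-1)
    return s_n, S_n


for n in (10, 20, 100):
    s_n, S_n = ln2_bounds(n)
    print(n, s_n, S_n, S_n - s_n)

print(math.log(2.0))
```

For this function the gap S_n-s_n is exactly 1/(2n), which is consistent with the bound M(b-a)h = 1/n obtained with M = 1, the maximum of |f'(x)| = 1/x² on [1,2].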

A sequence a_n is O(1/n), written a_n = O(1/n), when there is a constant c > 0 such that |a_n| ≤ c(1/n) for every n; this is the meaning of O, and the symbol O is a capital letter o

The rectangle method comes from the definitions of lower sum and upper sum, and is related to a decomposition into equal parts; with the same number of points in which the interval [a,b] is divided, we want to obtain an estimate that tends to the integral of the function not with the speed of h, but with the speed of h²; consider a Cartesian plane where the abscissa axis is the h-axis, the bisector of the first quadrant goes to 0 as h, a parabola branch goes to 0 as h², h³ goes to 0 faster than h², h⁴ goes to 0 faster than h³, and in general the power hⁿ goes to 0, when h tends to 0, the faster the larger n is; consider a continuous function and the generic interval [x_k-1,x_k]; with the rectangle method we replace a piece of trapezoid with a rectangle, but we could also use the piece of secant that joins the point (x_k-1,f(x_k-1)) with the point (x_k,f(x_k)); f(x_k) := f_k, f(x_k-1) := f_k-1; the area of the resulting trapezoid is (x_k-x_k-1)((f_k-1+f_k)/2) = (h/2)(f_k-1+f_k); when the function is convex the concavity is upwards, and (h/2)(f_k-1+f_k) ≥ ^x_k∫_x_k-1(f(x)dx); when the function is concave the concavity is downwards, and (h/2)(f_k-1+f_k) ≤ ^x_k∫_x_k-1(f(x)dx); with N equal parts, h = (b-a)/N, we get an estimate of the integral with ᴺΣ_k=1((h/2)(f_k-1+f_k)), which we denote by T(f,N) or T(f,h), T(f,h) := (h/2)(f₀+2f₁+2f₂+...+2f_N-1+f_N)

If f is an integrable function on the interval [a,b], its integral can be approximated by the trapezoidal rule, ^b∫_a(f(x)dx) ≈ (h/2)(f₀+2f₁+2f₂+...+2f_N-1+f_N)
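The formula above translates almost literally into code. The following Python function is a minimal sketch, assuming N equal subintervals and a function f that can be evaluated at every node; the name trapezoid is illustrative.

```python
def trapezoid(f, a, b, n):
    """Composite trapezoidal rule T(f, N) with n equal subintervals."""
    h = (b - a) / n
    # The endpoints f_0 and f_N have weight 1/2 and the interior nodes weight 1,
    # which is the same as (h/2) * (f_0 + 2 f_1 + ... + 2 f_{N-1} + f_N).
    total = 0.5 * (f(a) + f(b))
    for k in range(1, n):
        total += f(a + k * h)
    return h * total

# Example: integral of x^2 on [0, 1]; the exact value is 1/3.
for n in (4, 8, 16):
    print(n, trapezoid(lambda x: x * x, 0.0, 1.0, n))
```

Since x² is convex, every printed value should lie slightly above 1/3 and move closer to it as n grows.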

The trapezoidal rule requires N+1 evaluations of the function, at the points x₀, x₁, ..., x_N; compared to the rectangle formula, which requires N evaluations, we do one extra evaluation

If f is a twice continuously differentiable function on the interval [a,b], the trapezoidal rule provides a second order method for estimating the integral of f, ^b∫_a(f(x)dx) = T(f,h)+O(h²), h → 0

We need to understand with what speed T(f,h) tends to the value of the integral; T(f,h) converges with a speed which is O(h²), so the order of convergence is double that of the rectangle method; the trapezoid formula provides a second order method and the error goes to 0 as a constant times h², the rectangle method is a first order method and the error goes to 0 as a constant times h
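The O(h²) behaviour can be made explicit with the classical error formula for the composite trapezoidal rule. It is quoted here as a standard result for twice continuously differentiable functions and is not derived in these notes; η denotes some unspecified point of [a,b].

```latex
\int_a^b f(x)\,dx - T(f,h) = -\frac{b-a}{12}\,h^2 f''(\eta), \qquad \eta \in [a,b],
\qquad\text{so}\qquad
\left|\int_a^b f(x)\,dx - T(f,h)\right| \le \frac{b-a}{12}\,\Big(\max_{[a,b]}|f''|\Big)\,h^2 .
```

The sign agrees with the geometric picture: when f'' > 0 the function is convex and T(f,h) is larger than the integral.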

The trapezoidal rule is a more accurate method of approximation than the rectangle method, because a secant segment approximates the function better than a horizontal segment

ln(2) = ²∫₁((1/x)dx), the function is convex because it has concavity upwards, therefore using the trapezoidal rule we obtain overestimates; n = 10, T = 0.6937...; n = 20, T = 0.6933...; n = 100, T = 0.69315...; we obtain a decreasing sequence of values that converges very quickly to the correct value; with 100 intervals, using the rectangle method we obtain 2 exact decimal digits, using the trapezoidal rule we obtain 4 exact decimal digits; with the trapezoidal rule the order of convergence doubles, and here the number of exact decimal digits also doubles, using the same number of decomposition intervals
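The comparison for ln(2) can be reproduced with a few lines of Python. This is a sketch under one assumption that the notes do not state: the rectangle method is taken with the left endpoint of each interval. The function names are illustrative.

```python
import math

def left_rectangle(f, a, b, n):
    """Rectangle method using the left endpoint of each subinterval (assumption)."""
    h = (b - a) / n
    return h * sum(f(a + k * h) for k in range(n))

def trapezoid(f, a, b, n):
    """Composite trapezoidal rule with n equal subintervals."""
    h = (b - a) / n
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + k * h) for k in range(1, n)))

exact = math.log(2)
for n in (10, 20, 100):
    R = left_rectangle(lambda x: 1.0 / x, 1.0, 2.0, n)
    T = trapezoid(lambda x: 1.0 / x, 1.0, 2.0, n)
    # Both errors are positive here: 1/x is decreasing and convex.
    print(f"n={n:4d}  rectangle error={R - exact:.2e}  trapezoid error={T - exact:.2e}")
```

Doubling n should roughly halve the rectangle error and divide the trapezoid error by about 4.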

The midpoint method is another method of numerical integration; in an interval [x_k-1,x_k] we consider the midpoint ξ_k = (x_k-1+x_k)/2, the area of the rectangle is f(ξ_k)(x_k-x_k-1) and is equal to the area of the trapezoid which has as its upper side a segment of the tangent at the point (ξ_k,f(ξ_k)); ⁿΣ_k=1(f(ξ_k)(x_k-x_k-1)), this Riemann sum is therefore the sum of the areas of trapezoids whose upper side is the tangent to the function at the point (ξ_k,f(ξ_k)); this midpoint method is a second order method, the difference between this sum and the integral tends to 0 as h²; if the function is convex, the trapezoidal rule gives overestimates, but the midpoint method gives underestimates; if the function is concave, the trapezoidal rule gives underestimates, but the midpoint method gives overestimates; if we combine the trapezoidal rule with the midpoint method we can obtain both underestimates and overestimates at the same time

ln(2) = ²∫₁((1/x)dx), the function is convex because it has concavity upwards, therefore by using the midpoint method we obtain underestimates; n = 10, M = 0.6928...; n = 20, M = 0.6930...; n = 100, M = 0.69314...; with 100 intervals we get 4 exact decimal digits, as with the trapezoidal rule
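Combining the two methods gives a two-sided estimate of ln(2). The Python sketch below assumes a convex integrand, so that the midpoint value is a lower bound and the trapezoid value an upper bound; the function names are illustrative.

```python
import math

def trapezoid(f, a, b, n):
    """Composite trapezoidal rule with n equal subintervals."""
    h = (b - a) / n
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + k * h) for k in range(1, n)))

def midpoint(f, a, b, n):
    """Composite midpoint method: f is evaluated at the centre of each subinterval."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

for n in (10, 20, 100):
    low = midpoint(lambda x: 1.0 / x, 1.0, 2.0, n)    # underestimate (convex f)
    high = trapezoid(lambda x: 1.0 / x, 1.0, 2.0, n)  # overestimate (convex f)
    print(n, low, high, low <= math.log(2) <= high)
```

The width high - low is a computable bound on the error of either estimate, which is useful when the exact value is not known.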

The trapezoidal rule and the midpoint method are numerical integration methods that are used to tabulate functions that can be defined by integrals or to calculate integrals that cannot be calculated elementarily

The function e^-x² is not elementarily integrable, its primitive cannot be expressed using elementary functions; the integral function F(x) = ^x∫₀(e^-t²dt) can be calculated with approximate integral calculus methods such as the trapezoidal rule and the midpoint method
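As an illustration, the integral of e^-t² from 0 to x can be tabulated with the trapezoidal rule. The sketch below compares the result with (√π/2)·erf(x), using the erf function of the Python standard library only as an independent check; the name F, the number of intervals and the sample points are arbitrary choices.

```python
import math

def trapezoid(f, a, b, n):
    """Composite trapezoidal rule with n equal subintervals."""
    h = (b - a) / n
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + k * h) for k in range(1, n)))

def F(x, n=1000):
    """Approximation of the integral of exp(-t^2) for t from 0 to x."""
    return trapezoid(lambda t: math.exp(-t * t), 0.0, x, n)

for x in (0.5, 1.0, 2.0):
    # Known identity: integral_0^x exp(-t^2) dt = (sqrt(pi) / 2) * erf(x)
    reference = 0.5 * math.sqrt(math.pi) * math.erf(x)
    print(x, F(x), reference)
```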

The arc length of the graph of a function f with continuous first derivative on the interval [a,b] is L = ^b∫_a(√(1+(f'(x))²)dx), but this integral cannot always be calculated elementarily; we want to calculate the arc length of sin(x) in the interval [0,π], L = ^π∫₀(√(1+(sin'(x))²)dx) = ^π∫₀(√(1+(cos(x))²)dx), this integral cannot be calculated elementarily, it is not possible to express a primitive of the integrand function using elementary functions, but we can calculate this integral with approximate integral calculus methods such as the trapezoidal rule and the midpoint method
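The arc length of sin(x) on [0,π] can be estimated numerically as follows. This Python sketch uses the midpoint method; the function names and the values of n are illustrative choices.

```python
import math

def midpoint(f, a, b, n):
    """Composite midpoint method with n equal subintervals."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

def integrand(x):
    # sqrt(1 + f'(x)^2) with f(x) = sin(x), so f'(x) = cos(x)
    return math.sqrt(1.0 + math.cos(x) ** 2)

for n in (10, 100, 1000):
    print(n, midpoint(integrand, 0.0, math.pi, n))
```

The printed values should settle near 3.82, which lies between π (the length of the segment joining the two endpoints) and π√2 (the bound obtained from √(1+cos²(x)) ≤ √2).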

With the trapezoidal rule we replace the integral of the function on the interval [x_k-1,x_k] with the integral of the first degree polynomial function which interpolates the function at the endpoints of the interval

With the midpoint method we replace the integral of the function on the interval [x_k-1,x_k] with the integral of the first degree polynomial function which is the tangent to the function at the midpoint of the interval

When we cannot calculate ^x_k∫_{x_k-1}(f(x)dx) then we can find a first degree polynomial p₁(x) that approximates the function f(x) in the interval, p₁(x) ≈ f(x), therefore ^x_k∫_{x_k-1}(f(x)dx) ≈ ^x_k∫_{x_k-1}(p₁(x)dx), we can calculate this new integral because it is the area of a trapezoid; the trapezoid rule and the midpoint method transform an integral into the sum of areas of trapezoids

The Cavalieri-Simpson rule is a method for the approximate numerical calculation of definite integrals formulated by the English mathematician Thomas Simpson and the Italian mathematician Bonaventura Cavalieri; the Cavalieri-Simpson rule is a method that allows the approximation of a definite integral using a second degree polynomial, the function is replaced with an arc of parabola with axis of symmetry parallel to the y-axis; we divide the interval into an even number of intervals, therefore the number of parts into which the interval has been divided is 2N; the width of each interval is h = (b-a)/(2N); considering the points x₀, x₁, x₂, we look for the parabola p₂(x) = ax²+bx+c which passes through the 3 points (x₀,f(x₀)), (x₁,f(x₁)), (x₂,f(x₂)); if the 3 points are aligned the parabola degenerates into a straight line, the coefficient a is 0; we must integrate the parabola on the interval [x₀,x₂], and the result is an expression that depends on h and on f(x₀), f(x₁), f(x₂); the contribution given by the second degree polynomial is (h/3)(f(x₀)+4f(x₁)+f(x₂)), and this formula is about the pair of intervals [x₀,x₁] and [x₁,x₂]; ^x₂∫_x₀(f(x)dx) ≈ ^x₂∫_x₀(p₂(x)dx) = (h/3)(f(x₀)+4f(x₁)+f(x₂))
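The weights 1, 4, 1 can be checked with a short computation. The sketch below assumes that the origin is moved to x₁, so that x₀ = -h and x₂ = h, and writes the parabola as p₂(t) = αt²+βt+γ; Greek letters are used here only to avoid confusion with the endpoints a and b, and f₀, f₁, f₂ stand for f(x₀), f(x₁), f(x₂).

```latex
\begin{aligned}
p_2(-h) = f_0,\quad p_2(0) = f_1,\quad p_2(h) = f_2
&\;\Longrightarrow\;
\gamma = f_1,\qquad \alpha = \frac{f_0 - 2f_1 + f_2}{2h^2}, \\
\int_{-h}^{h} p_2(t)\,dt
= \frac{2}{3}\,\alpha h^3 + 2\gamma h
&= \frac{h}{3}\,(f_0 - 2f_1 + f_2) + 2h f_1
= \frac{h}{3}\,(f_0 + 4f_1 + f_2).
\end{aligned}
```

The term βt does not contribute, because its integral on the symmetric interval [-h,h] is 0.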

If f is an integrable function on the interval [a,b], its integral can be approximated by the Cavalieri-Simpson formula, ^b∫_a(f(x)dx) ≈ (h/3)(f₀+4f₁+2f₂+4f₃+...+2f_2N-2+4f_2N-1+f_2N)
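The composite formula can be sketched in Python as follows, assuming 2N equal subintervals of width h = (b-a)/(2N); the name simpson and the test integral are illustrative.

```python
import math

def simpson(f, a, b, N):
    """Composite Cavalieri-Simpson rule on 2N equal subintervals."""
    h = (b - a) / (2 * N)
    total = f(a) + f(b)  # f_0 and f_2N have weight 1
    for k in range(1, 2 * N):
        # Odd-index nodes have weight 4, even-index interior nodes weight 2.
        weight = 4 if k % 2 == 1 else 2
        total += weight * f(a + k * h)
    return h / 3.0 * total

# Example: ln(2) as the integral of 1/x on [1, 2], with 2N = 10 subintervals.
print(simpson(lambda x: 1.0 / x, 1.0, 2.0, 5), math.log(2))
```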

If the function f is regular enough, that is differentiable 4 times with continuity, the Cavalieri-Simpson rule is a fourth order method

If f is a four times continuously differentiable function on the interval [a,b], and S(f,h) := (h/3)(f₀+4f₁+2f₂+4f₃+...+2f_2N-2+4f_2N-1+f_2N) with h = (b-a)/(2N), the Cavalieri-Simpson rule provides a fourth order method for estimating the integral of f, ^b∫_a(f(x)dx) = S(f,h)+O(h⁴), h → 0
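The fourth order can be observed numerically: when h is halved, the error should be divided by about 2⁴ = 16. The Python sketch below checks this on the integral of 1/x on [1,2]; the function name and the values of N are illustrative choices.

```python
import math

def simpson(f, a, b, N):
    """Composite Cavalieri-Simpson rule on 2N equal subintervals."""
    h = (b - a) / (2 * N)
    total = f(a) + f(b)
    for k in range(1, 2 * N):
        total += (4 if k % 2 == 1 else 2) * f(a + k * h)
    return h / 3.0 * total

exact = math.log(2)
previous_error = None
for N in (5, 10, 20, 40):
    error = abs(simpson(lambda x: 1.0 / x, 1.0, 2.0, N) - exact)
    # Ratio between the error with step 2h and the error with step h.
    ratio = previous_error / error if previous_error is not None else float("nan")
    print(N, error, ratio)
    previous_error = error
```

The same experiment with the trapezoidal rule would give ratios close to 4, and with the rectangle method ratios close to 2.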

If we apply the trapezoidal rule or the midpoint method to a first degree polynomial function, there is no approximation error, the result is equal to the integral

If we apply the Cavalieri-Simpson rule to a second degree or third degree polynomial function, there is no approximation error, the result is equal to the integral

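The Cavalieri-Simpson rule is also exact for third degree polynomials, and this has a geometric explanation; on the interval [x₀,x₂] the integral of a cubic function is equal to the integral of the parabola that interpolates the cubic function at x₀, x₁, x₂; the difference between the cubic and the parabola is c(x-x₀)(x-x₁)(x-x₂), and since x₁ is the midpoint its graph is symmetric with respect to the point (x₁,0), so the areas on [x₀,x₁] and on [x₁,x₂] have opposite signs and cancel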
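These exactness properties are easy to verify on small examples. The Python sketch below uses a single trapezoid and a single pair of Simpson intervals on [0,2]; the polynomials are arbitrary test cases and the function names are illustrative.

```python
def trapezoid(f, a, b, n):
    """Composite trapezoidal rule with n equal subintervals."""
    h = (b - a) / n
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + k * h) for k in range(1, n)))

def simpson(f, a, b, N):
    """Composite Cavalieri-Simpson rule on 2N equal subintervals."""
    h = (b - a) / (2 * N)
    total = f(a) + f(b)
    for k in range(1, 2 * N):
        total += (4 if k % 2 == 1 else 2) * f(a + k * h)
    return h / 3.0 * total

# Degree 1: the trapezoidal rule is exact (exact integral: 10).
print(trapezoid(lambda x: 3 * x + 2, 0.0, 2.0, 1), 10.0)
# Degree 2: the trapezoidal rule is no longer exact, Simpson is (exact: 8/3).
print(trapezoid(lambda x: x ** 2, 0.0, 2.0, 1), simpson(lambda x: x ** 2, 0.0, 2.0, 1), 8.0 / 3.0)
# Degree 3: Simpson is still exact (exact integral: 4).
print(simpson(lambda x: x ** 3, 0.0, 2.0, 1), 4.0)
# Degree 4: Simpson is no longer exact (exact integral: 32/5).
print(simpson(lambda x: x ** 4, 0.0, 2.0, 1), 32.0 / 5.0)
```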