
CALCULUS AND LINEAR ALGEBRA


1 - NATURAL NUMBERS

Natural numbers are used for counting and ordering

ℕ := {0,1,2,3,...}

Some definitions, including the standard ISO 80000-2, begin the natural numbers with 0, corresponding to the non-negative integers 0, 1, 2, 3, ..., sometimes collectively denoted by the symbol N0, to emphasize that zero is included, whereas others start with 1, corresponding to the positive integers 1, 2, 3, ..., sometimes collectively denoted by the symbol N1, N+, or N* for emphasizing that zero is excluded

Peano axioms: 1) zero (0) is a natural number; 2) every natural number n has an associated natural number σ(n), different from 0, called the successor (or the next) of n, and different natural numbers have different successors; 3) if a set A of natural numbers contains 0 and contains the successor of each of its elements, then A = ℕ

Starting from the notion of successor we can define addition, and on the basis of addition the multiplication of natural numbers

Adding 1 to a natural number means passing from n to the next of n

n+0 := n; n+1 = n+σ(0) := σ(n+0) = σ(n); n+2 = n+σ(1) := σ(n+1) = σ(σ(n))
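The recursive definition of addition above can be sketched in Python. This is only an illustration: the names successor, add and multiply are hypothetical, σ is modelled with the built-in + 1, and the multiplication rule n⋅0 := 0, n⋅σ(k) := n⋅k + n is the usual one, assumed here because the notes do not write it out.

```python
def successor(n: int) -> int:
    """σ(n), the successor of n (modelled here with Python's built-in + 1)."""
    return n + 1


def add(n: int, m: int) -> int:
    """n + m by recursion on m, using only the successor."""
    if m == 0:
        return n                         # n + 0 := n
    return successor(add(n, m - 1))      # n + σ(k) := σ(n + k), with k = m - 1


def multiply(n: int, m: int) -> int:
    """n⋅m by recursion on m, using only add (assumed rule: n⋅0 := 0, n⋅σ(k) := n⋅k + n)."""
    if m == 0:
        return 0
    return add(multiply(n, m - 1), n)


print(add(3, 2))        # σ(σ(3))
print(multiply(3, 4))   # 3 added to itself four times
```

The recursion depth grows with m, so the sketch is meant for small operands only.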

Commutative property: a binary operation is commutative if changing the order of the operands does not change the result; addition is commutative, a+b = b+a; multiplication is commutative, a⋅b = b⋅a; subtraction and division are not commutative, for example 3−5 ≠ 5−3 and 7/5 ≠ 5/7

Associative property: the associative property is a property of some binary operations, which means that rearranging the parentheses in an expression will not change the result; (a+b)+c = a+(b+c); (a⋅b)⋅c = a(b⋅c)

The neutral element of the addition is the number 0

The neutral element of multiplication is the number 1

Distributive property: a(b+c) = ab+ac

The mathematical induction or induction principle is a mathematical proof technique; it is essentially used to prove that a statement P(n) holds for every natural number n = 0, 1, 2, 3, ...; a proof by induction consists of two cases, the first, the base case or base, proves the statement for n = 0 without assuming any knowledge of other cases, the second case, the induction step, proves that if the statement holds for any given case n = k, then it must also hold for the next case n = k + 1; these two steps establish that the statement holds for every natural number n; the base case does not necessarily begin with n = 0, but often with n = 1, and possibly with any fixed natural number n = N

The induction principle asserts that if P(0) is true, and if P(n) ⇒ P(n+1) ∀ n ∈ ℕ, then P(n) is true ∀ n ∈ ℕ

Equivalent formulation with sets: if A ⊆ ℕ, 0 ∈ A, and n ∈ A ⇒ n+1 ∈ A, then A = ℕ

The sum of the natural numbers from 0 to n is n(n+1)/2; P(n): 0+1+2+...+n = n(n+1)/2; base case P(0): 0 = 0, true; induction step: 0+1+...+n+(n+1) = (n(n+1)/2)+(n+1) = (n(n+1)+2(n+1))/2 = (n+1)(n+2)/2, so P(n) ⇒ P(n+1); therefore P(n) is true for every n

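Calculate the maximum number of parts into which a plane is divided by n straight lines; the (n+1)-th line, crossing the previous n lines in n distinct points, adds n+1 parts: r(n+1) = r(n)+(n+1); r(n) = r(n-1)+n; r(0) = 1; r(n) = 1+n(n+1)/2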
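A small check of the recurrence for the number of parts of the plane against the closed form 1+n(n+1)/2 can be sketched as follows. The function names are hypothetical, and the starting value r(0) = 1 (no lines, one part) is assumed.

```python
def regions_by_recurrence(n: int) -> int:
    """r(n) computed from r(0) = 1 and r(k) = r(k-1) + k."""
    r = 1                      # r(0) = 1: with no lines the plane is one part
    for k in range(1, n + 1):
        r += k                 # the k-th line adds k parts
    return r


def regions_closed_form(n: int) -> int:
    """r(n) = 1 + n(n+1)/2."""
    return 1 + n * (n + 1) // 2


for n in range(6):
    print(n, regions_by_recurrence(n), regions_closed_form(n))
```

Agreement on finitely many values is not a proof; the proof is the induction argument.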

a^0 := 1; a^(n+1) := a⋅a^n; 0! := 1; (n+1)! := (n+1)⋅n!


2 - COMBINATORIAL CALCULUS

a^2 := a⋅a; a^n := a⋅a⋅...⋅a for n factors; a^(n+1) = a⋅a^n; a⋅a = a^2 = a⋅a^1 ⇒ a^1 = a⋅a/a = a; a = a^1 = a⋅a^0 ⇒ a^0 = a/a = 1; a^(-n) = 1/a^n (with a ≠ 0)

n! := 1⋅...⋅n; (n+1)! = (n+1)⋅n!; 1 = 1! = 1⋅0! ⇒ 0! := 1

The Cartesian product of two sets A and B, denoted A × B, is the set of all ordered pairs (a, b) where a is in A and b is in B

A_1 = {a,b,c}, A_2 = {1,2}; A_1 × A_2 = {(a,1), (a,2), (b,1), (b,2), (c,1), (c,2)}; the number of pairs formed is n_1⋅n_2 = 3⋅2 = 6

Ordered lists = dispositions (also called arrangements or k-permutations)

Dispositions with repetitions of n elements in groups of k elements: (r)D_(n,k) = n^k

A = {a,b,c}, n = 3, k = 2; (a,a), (a,b), (a,c), (b,a), (b,b), (b,c), (c,a), (c,b), (c,c); (r)D_(3,2) = 3^2 = 9 dispositions with repetitions of 3 elements in groups of 2 elements

A byte is composed of 8 bits, each bit is 0 or 1, so a byte can assume 256 different values; {0,1}, n = 2, k = 8; (r)D_(2,8) = 2^8 = 256

Simple dispositions, dispositions with no repetitions of n elements in groups of k elements: D_(n,k) = n(n-1)...(n-k+1); k ≤ n

A = {a,b,c}, n = 3, k = 2; (a,b), (a,c), (b,a), (b,c), (c,a), (c,b); D_(3,2) = 3⋅2 = 6 dispositions with no repetitions of 3 elements in groups of 2 elements

Permutations, simple dispositions of n elements in groups of n elements: P_n = D_(n,n) = n(n-1)...2⋅1 = n!; permutations are simple dispositions with k = n

4 friends go to the theater and there are 4 armchairs; in how many ways can they sit? In 4! = 24 different ways; this is a permutation problem, solved with n!
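The counts above can be checked by listing the ordered groups with Python's standard itertools module. This is a sketch that mirrors the examples in these notes; the labels f1 to f4 for the four friends are placeholders.

```python
from itertools import permutations, product
from math import factorial, perm

A = ["a", "b", "c"]

with_repetition = list(product(A, repeat=2))   # ordered pairs, repetition allowed
simple = list(permutations(A, 2))              # ordered pairs, no repetition
byte_values = list(product([0, 1], repeat=8))  # all strings of 8 bits
seatings = list(permutations(["f1", "f2", "f3", "f4"]))  # 4 friends, 4 armchairs

print(len(with_repetition), 3 ** 2)    # dispositions with repetitions, n^k
print(len(simple), perm(3, 2))         # simple dispositions, n(n-1)...(n-k+1)
print(len(byte_values), 2 ** 8)        # values of a byte
print(len(seatings), factorial(4))     # permutations, n!
```

math.perm requires Python 3.8 or later.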

Unordered lists = combinations; each combination of k elements corresponds to the k! dispositions that contain the same elements in a different order

Simple combinations of n elements in groups of k elements: C_(n,k) = D_(n,k)/P_k = D_(n,k)/k! = (n(n-1)...(n-k+1))/k! = (n(n-1)...(n-k+1)(n-k)!)/(k!(n-k)!) = n!/(k!(n-k)!)

Symmetry of binomial coefficients: C_(n,k) = C_(n,n-k)

C_(5,2) = C_(5,3); C_(5,2) = 5!/(2!3!) = 10; C_(5,3) = 5!/(3!2!) = 10

C_(n,n) = 1; n!/(n!0!) = 1

Binomial coefficient with upper index n and lower index k and 0 ≤ k ≤ n: C(n,k) = n!/(k!(n-k)!); C(n,k) ∈ ℕ

(a+b)^0 = 1

(a+b)^1 = a+b

(a+b)^2 = a^2+2⋅a⋅b+b^2

(a+b)^3 = a^3+3⋅a^2⋅b+3⋅a⋅b^2+b^3

1

1 1

1 2 1

1 3 3 1

1 4 6 4 1

1 5 10 10 5 1

1 6 15 20 15 6 1

1 7 21 35 35 21 7 1

1 8 28 56 70 56 28 8 1

1 9 36 84 126 126 84 36 9 1

1 10 45 120 210 252 210 120 45 10 1

Binomial formula: (a+b)^n = C(n,0)⋅a^n+C(n,1)⋅a^(n-1)⋅b+C(n,2)⋅a^(n-2)⋅b^2+...+C(n,n-1)⋅a⋅b^(n-1)+C(n,n)⋅b^n = Σ_(k=0)^n C(n,k)⋅a^(n-k)⋅b^k

C(n,0) = C(n,n) = 1

C(n,k) = C(n-1,k)+C(n-1,k-1), 1 ≤ k ≤ n-1; each row of the arithmetic triangle is calculated starting from the previous row

n!/(k!(n-k)!) = (n!/((k-1)!(n-(k-1))!))((n-(k-1))/k) = C(n,k-1)((n-k+1)/k)
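The rule C(n,k) = C(n-1,k)+C(n-1,k-1) with C(n,0) = C(n,n) = 1 is enough to build the arithmetic triangle row by row. A minimal sketch follows; the function name pascal_rows is hypothetical, and math.comb is used only as an independent check of the factorial formula.

```python
from math import comb


def pascal_rows(n_max: int) -> list[list[int]]:
    """Rows 0..n_max of the arithmetic triangle, each built from the previous row."""
    rows = [[1]]
    for n in range(1, n_max + 1):
        prev = rows[-1]
        # inner entries: C(n,k) = C(n-1,k-1) + C(n-1,k) for 1 <= k <= n-1
        inner = [prev[k - 1] + prev[k] for k in range(1, n)]
        rows.append([1] + inner + [1])
    return rows


for row in pascal_rows(5):
    print(row)

# every entry agrees with n!/(k!(n-k)!)
print(all(row[k] == comb(n, k)
          for n, row in enumerate(pascal_rows(10))
          for k in range(n + 1)))
```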


3 - INTEGERS AND RATIONALS

Commutative property: a+b = b+a; ab = ba

Associative property: (a+b)+c = a+(b+c); (ab)c = a(bc)

Neutral element: a+0 = a; a⋅1 = a

Distributive property: a(b+c) = ab+ac

The equation a+x = b can be solved in ℕ only if a ≤ b

The equation a+x = 0 can be solved in ℕ only if a = 0

The set of integers consists of zero (0), the positive natural numbers (1, 2, 3, ...), and their additive inverses, the negative integers, −1, −2, −3, ...; the set of integers is denoted by the symbol ℤ

Commutative property for integers: 2+(-5) = -3, -5+2 = -3

The solution of the equation a+x = b in ℤ is x = b-a, that is x = b+(-a), the sum of b with the opposite of a; subtraction is equivalent to addition with the opposite

The opposite of x is -x; if x is negative, -x is positive

|x| := {x, x ∈ ℕ; -x, x ∉ ℕ}

3⋅4 = 4+4+4 = 12; 4⋅3 = 3+3+3+3 = 12; 3⋅(-4) = -4-4-4 = -12; -4⋅3 = -12; 3⋅0 = 0⋅3 = 0; -4⋅0 = 0⋅(-4) = 0; -3⋅(-4) = 12

0 = -3⋅0 = -3(4-4) = -3⋅4+(-3)(-4) = -12+(-3)(-4), so (-3)(-4) = 12; by the distributive property, the product of two negative numbers is positive.

The sum of a number with its opposite is zero.

The product of a number by its reciprocal is 1.

The product of two integers a and b is equal to the product of their absolute values if a and b have the same sign, it is the opposite of the product of their absolute values if they have opposite signs; the product is zero if at least one of the factors is zero

a+x = b is always solvable in ℤ

a+x = 0 is always solvable in ℤ; every number has its opposite

a⋅x = 1 in ℕ if and only if a = 1; in ℕ the only number to have a reciprocal is 1

a⋅x = 1 in ℤ if and only if a = 1 or a = -1; in ℤ the only numbers to have a reciprocal are 1 and -1

3⋅x = 1; there are no solutions in ℕ and ℤ

A rational number is a number that can be expressed as the quotient or fraction p/q of two integers, a numerator p and a non-zero denominator q; rationals are denoted with the symbol ℚ

The fraction n/m, where n is an integer and m is an integer different from 0, has n as its numerator and m as its denominator; if n is a multiple of m, n = q⋅m, then the fraction is an apparent fraction, that is, it represents the integer q: n/m = q

n/m = p/q ⇔ n⋅q = m⋅p

3/-2 = -3/2

3/4 is a fraction reduced to the lowest terms: numerator and denominator are coprime, that is, they have no common divisor other than 1

An irreducible fraction, or fraction in lowest terms or simplest form or reduced fraction, is a fraction in which the numerator and denominator are integers that have no other common divisors than 1, or -1 when negative numbers are considered; a fraction a⁄b is irreducible if and only if a and b are coprime, that is, if a and b have a greatest common divisor of 1.


4 - RATIONALS

(1/3)+(2/3) = 3/3 = 1

(2/3)+(4/3) = 6/3 = 2

(2/3)+(3/4) = (2⋅4+3⋅3)/(3⋅4) = 17/12

The least common multiple, lowest common multiple, or smallest common multiple of two integers a and b, usually denoted by lcm(a,b), is the smallest positive integer that is divisible by both a and b; when adding, subtracting, or comparing simple fractions, the least common multiple of the denominators, often called the lowest common denominator, is used because each of the fractions can be expressed as a fraction with this denominator; (2/21)+(1/6) = (2/(3⋅7))+(1/(2⋅3)) = (2⋅2/(2⋅3⋅7))+(1⋅7/(2⋅3⋅7)) = (4/42)+(7/42) = 11/42
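The addition with the lowest common denominator can be reproduced with Python's standard library. In this sketch the helper add_with_common_denominator is a hypothetical name; fractions.Fraction performs the same addition and also reduces the result to lowest terms.

```python
from fractions import Fraction
from math import lcm  # available from Python 3.9


def add_with_common_denominator(n: int, m: int, p: int, q: int) -> tuple[int, int]:
    """Return (numerator, denominator) of n/m + p/q over lcm(m, q); not reduced further."""
    d = lcm(m, q)
    return n * (d // m) + p * (d // q), d


print(add_with_common_denominator(2, 21, 1, 6))   # numerator and denominator over lcm(21, 6)
print(Fraction(2, 21) + Fraction(1, 6))           # same sum, reduced automatically
print(Fraction(2, 3) + Fraction(3, 4))
```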

3⋅(3/4) = 9/4

(2/3)⋅(4/7) = 8/21

Operations in ℚ: (n/m)+(p/q) = (nq+pm)/(mq); (n/m)⋅(p/q) = (np)/(mq)

Comparing n/m and p/q with positive denominators m, q > 0: n/m < p/q ⇔ nq/(mq) < pm/(mq) ⇔ nq < pm; for example 2/3 < 3/4 because 2⋅4 < 3⋅3

ℚ is an ordered field: for m, q > 0, n/m < p/q ⇔ nq < pm

ℚ is a field because every element has an opposite and every element except 0 has a reciprocal, or multiplicative inverse; ℕ and ℤ are not fields because most of their elements do not have a multiplicative inverse in the set

Between two distinct rational numbers there are always infinitely many rational numbers; ℚ is a dense numerical set

The rational numbers are a dense subset of the real numbers

ℕ and ℤ are not dense, ℚ and ℝ are dense

It is possible to find a rational number between 2/3 and 3/4, for example their arithmetic mean: (2/3)+(3/4) = (8+9)/12 = 17/12; (17/12)/2 = 17/24; 2/3 < 17/24 < 3/4; check: 2⋅24 < 3⋅17, 48 < 51; 17⋅4 < 3⋅24, 68 < 72

A majorant or upper bound of a set A is a number M greater than or equal to any element of A; ∀ a ∈ A, a ≤ M

A non-empty set A is bounded above if it admits majorants

If p < q then p/q < 1; 1 is a majorant

A minorant or lower bound of a set A is a number m less than or equal to any element of A; ∀ a ∈ A, a ≥ m

A non-empty set A is bounded below if it admits minorants

The smallest of the majorants is called supremum; sup(A)

The largest of the minorants is called infimum; inf(A)

If the supremum belongs to the set, it is the maximum of the set; max(A) = M

If the infimum belongs to the set, it is the minimum of the set; min(A) = m

If a set is bounded, it has minorants and majorants

If a set is unbounded below but bounded above, it has no minorants but it has majorants

If a set is unbounded above but bounded below, it has minorants but it has no majorants

If a set is unbounded below and above, it has no minorants and no majorants

A = {p/q, 0 < p < q}; A is bounded above and 1 is the supremum, but it has no maximum; A is bounded below and 0 is the infimum, but it has no minimum

p/q < (p+1)/(q+1); p(q+1) < q(p+1), pq+p < pq+q, p < q; 3/4 < 4/5

In ℤ a set bounded above has maximum

In ℤ a set bounded below has minimum

In ℚ a set bounded above may not have maximum

In ℚ a set bounded below may not have minimum

The Pythagoreans discovered that the diagonal of a square is incommensurable with its side; for a side of length 1 the diagonal d satisfies 1^2+1^2 = d^2, and if d = n/m then 2 = n^2/m^2 ⇔ 2m^2 = n^2, but there are no positive integers that satisfy this equation

There is no rational whose square is equal to 2; the double of a perfect square cannot be a perfect square

A = {p/q > 0; p^2/q^2 < 2}, there is no maximum; B = {n/m > 0; n^2/m^2 > 2}, there is no minimum

ℚ is dense but not complete; in ℚ it is impossible to solve x^2 = 2; there are sets that are bounded above but without supremum

In ℚ there are sets bounded above but without supremum and maximum, and sets bounded below but without infimum and minimum, because ℚ is dense but not complete; this can be understood by studying the hyperbola y = (2x+2)/(x+2)
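One way to read the hint about the hyperbola, offered here as an assumption about the intended argument, is that for a positive rational x with x^2 < 2 the rational number (2x+2)/(x+2) is larger than x and its square is still < 2, so no element of the set can be the maximum; symmetrically, from x^2 > 2 one obtains a smaller rational whose square is still > 2. The sketch below, with hypothetical function names, checks this on a few exact fractions.

```python
from fractions import Fraction


def hyperbola(x: Fraction) -> Fraction:
    """y = (2x+2)/(x+2), evaluated exactly on a rational x."""
    return (2 * x + 2) / (x + 2)


def iterate(x: Fraction, steps: int) -> list[Fraction]:
    """x, hyperbola(x), hyperbola(hyperbola(x)), ... for the given number of steps."""
    values = [x]
    for _ in range(steps):
        x = hyperbola(x)
        values.append(x)
    return values


from_below = iterate(Fraction(1), 5)   # squares stay < 2, values increase
from_above = iterate(Fraction(2), 5)   # squares stay > 2, values decrease
for low, high in zip(from_below, from_above):
    print(low, float(low * low), high, float(high * high))
```

The two lists approach each other without ever containing a rational whose square is exactly 2.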


5 - DECIMALS

√(2) is not present in the rational number line

So there is a need for a numerical set without holes, which is dense but also continuous; therefore it is necessary to introduce irrational numbers; rational numbers and irrational numbers form the set of real numbers, denoted by the symbol ℝ

A decimal fraction is a fraction that has a power of 10 as its denominator; a decimal or equivalent fraction generates a finite decimal number

123 = 1⋅100+2⋅10+3⋅1 = 1⋅10^2+2⋅10^1+3⋅10^0

1.25 = 1⋅10^0+2⋅10^(-1)+5⋅10^(-2)

3/2 = 15/10 = (10+5)/10 = 10/10 + 5/10 = 1.5

9/4 = 225/100 = (200+20+5)/100 = 200/100 + 20/100 + 5/100 = 2.25

1/7 = 0.142857142857... with repeating block 142857; this is a simple repeating decimal representation, simple because the block of repeating digits begins immediately after the decimal point

The periodicity of a number is given by the remainders that are repeated during the division process

Rational numbers have periodic decimal representation, including period 0, excluding period 9

11/15 = 0.7333... with non-repeating digit 7 and repeating digit 3; a decimal in which at least one of the digits after the decimal point is non-repeated and some digits are repeated is called a mixed repeating decimal

x = 1.2353535... with non-repeating digit 2 and repeating block 35; 100x = 123.5353535...; 100x-x = 99x = 122.3; x = 122.3/99 = 1223/990
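The subtraction trick above generalizes: with a non-repeating digits and a repeating block of b digits, multiplying by 10^(a+b) and by 10^a and subtracting removes the periodic tail. A minimal sketch follows; the function name and its string arguments are hypothetical, and it assumes a non-negative number with a non-empty repeating block.

```python
from fractions import Fraction


def repeating_to_fraction(integer_part: str, non_repeating: str, repeating: str) -> Fraction:
    """Exact value of integer_part.non_repeating(repeating), the last block repeating forever."""
    a = len(non_repeating)
    b = len(repeating)          # assumed > 0, otherwise the denominator would be 0
    # digits up to the end of the first period, read as an integer: 10^(a+b)⋅x without the tail
    whole = int(integer_part + non_repeating + repeating)
    # digits before the period, read as an integer: 10^a⋅x without the tail
    head = int(integer_part + non_repeating)
    return Fraction(whole - head, 10 ** a * (10 ** b - 1))


print(repeating_to_fraction("1", "2", "35"))      # 1.2353535...
print(repeating_to_fraction("0", "", "142857"))   # 0.142857142857...
print(repeating_to_fraction("0", "7", "3"))       # 0.7333...
```

Fraction reduces the result to lowest terms automatically.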

A repeating decimal or recurring decimal is a decimal representation of a number whose digits are periodic, repeating their values at regular intervals, and whose infinitely repeated portion is not zero; it can be shown that a number is rational if and only if its decimal representation is repeating or terminating; the infinitely repeated digit sequence is called the repetend or reptend; if the repetend is a zero, this decimal representation is called a terminating decimal rather than a repeating decimal, since the zeros can be omitted and the decimal terminates before these zeros; every terminating decimal representation can be written as a decimal fraction, a fraction whose denominator is a power of 10, like 1.585 = 1585/1000; 1.585 may also be written as a ratio of the form k/(2^n⋅5^m), 1.585 = 317/(2^3⋅5^2); every number with a terminating decimal representation also has an alternative representation as a repeating decimal whose repetend is the digit 9, this is obtained by decreasing the final (rightmost) non-zero digit by one and appending a repetend of 9, like 1.000... = 0.999...

Any number that cannot be expressed as a ratio of two integers is said to be irrational; their decimal representation neither terminates nor infinitely repeats but extends forever without regular repetition; examples of such irrational numbers are the square root of 2 and π

Irrational numbers are represented by unlimited, non-periodic decimal alignments


6 - REAL NUMBERS

The set ℝ of real numbers is the union of the set of rational numbers with the set of irrational numbers

A rational number can be represented as a fraction, and can be either a periodic decimal number or a finite decimal number derived from a decimal fraction or equivalent

Irrational numbers have infinite non-periodic decimal digits

The set of rational numbers and the set of irrational numbers form the set of real numbers

ℕ ⊂ ℤ ⊂ ℚ ⊂ ℝ

ℚ and ℝ are both ordered fields, but only ℝ is complete

In ℝ, every set that is bounded above has a supremum

x = 1.23751..., y = 1.23738...; 1.2375 < x < 1.2376; 1.2373 < y < 1.2374; y < x

The absolute value or modulus is equal to the number when it is positive and the opposite when it is negative; the absolute value or modulus exists in ℤ, ℚ, and ℝ

x ∈ ℝ, |x| := {x, x ≥ 0; -x, x < 0}

3.7 > 2.3; -3.7 < -2.3

The set ℝ of real numbers is complete because every set that is bounded above has supremum and every set that is bounded below has infimum.

∀ a ∈ A and ∀ b ∈ B, a < b; a ≤ sup(A) ≤ inf(B) ≤ b; the sets A and B are separated by the interval [sup(A),inf(B)]

Two sets A and B are separate if ∀ a ∈ A and ∀ b ∈ B, a < b, and this implies that sup(A) ≤ inf(B)

Two separate sets A and B are contiguous if sup(A) = inf(B)

0 ≤ inf(B)-sup(A) ≤ b-a

The notion of contiguity between sets can be used to define π, the ratio between the length of a circumference and the length of its diameter or, in equivalent form, the length of the semicircle of radius 1

Archimedes c. 240 BC, 3+10/71 < π < 3+1/7; J.H. Lambert 1767, π is irrational; F. Lindemann 1882, π is transcendental

An irrational number cannot be represented by a fraction and its decimal representation is unlimited and not periodic

A transcendental number is a real number that is not a solution of any polynomial equation with integer coefficients; every transcendental number is irrational

x = x_0.c_1c_2...c_n..., x_0 ∈ ℕ; x_n = x_0.c_1c_2...c_n; x'_n = x_0.c_1c_2...c_n + 1/10^n; x_n ≤ x ≤ x'_n; x_n approximates x with an uncertainty of 1/10^n


7 - INEQUALITY

|x+y| ≤ |x|+|y|; it is the triangle inequality

|x⋅y| = |x|⋅|y|

The geometric mean of two non-negative numbers does not exceed the arithmetic mean: √(xy) ≤ (x+y)/2

If x is a real number of any sign, the square root of its square is the absolute value of x: √(x^2) = |x|

Consider a right triangle inscribed in a semicircle of radius r, where x and y are the projections of the legs on the hypotenuse; r = (x+y)/2; a theorem of Euclid states that in a right triangle the altitude h relative to the hypotenuse is the mean proportional between the projections of the legs on the hypotenuse, that is x/h = h/y, xy = h^2, h = √(xy); since h ≤ r, √(xy) ≤ (x+y)/2

√(xy) ≤ (x+y)/2; xy ≤ ((x+y)/2)^2, xy ≤ (x+y)^2/4, 4xy ≤ x^2+2xy+y^2, 0 ≤ x^2-2xy+y^2, 0 ≤ (x-y)^2; the square of a real number is always ≥ 0

The geometric mean of n ≥ 2 non-negative numbers does not exceed the arithmetic mean, ⁿ√(x_1x_2...x_n) ≤ (x_1+x_2+...+x_n)/n

If n ≥ 2 non-negative real numbers have n as their sum, their product does not exceed 1, (x_1+x_2+...+x_n = n) ⇒ x_1x_2...x_n ≤ 1; for n = 2: x_1+x_2 = 2, x_1 = 1-α, x_2 = 1+α, x_1x_2 = (1-α)(1+α) = 1-α^2 ≤ 1

Geometric mean ≤ Arithmetic mean, G ≤ A; A = (x_1+x_2+...+x_n)/n, n = (x_1+x_2+...+x_n)/A = x_1/A+x_2/A+...+x_n/A; if a sum of n non-negative numbers equals n, then their product is ≤ 1, x_1x_2...x_n/A^n ≤ 1, x_1x_2...x_n ≤ A^n; G = ⁿ√(x_1x_2...x_n) ≤ ⁿ√(A^n) = A

Considering a = a⋅1⋅1⋅...⋅1 where the number 1 is repeated n-1 times and a > 1; 1 < ⁿ√(a) ≤ (a+n-1)/n = 1+(a-1)/n, with this formula we can overestimate the nth root of a number; 1 < ³√(1.2) < 1+0.2/3 = 1+2/30

Bernoulli's inequality: (1+x)^n ≥ 1+nx, n ∈ ℕ, x ∈ ℝ, x > -1

First demonstration of Bernoulli's inequality, using G ≤ A when 1+nx ≥ 0 (if 1+nx < 0 the inequality is immediate because (1+x)^n > 0): x_1 = 1+nx, x_2 = ... = x_n = 1, ⁿ√(1+nx) ≤ (1+nx+n-1)/n, ⁿ√(1+nx) ≤ 1+x, 1+nx ≤ (1+x)^n

Second demonstration of Bernoulli's inequality, by induction: base case, P(0): (1+x)^0 ≥ 1+0⋅x, 1 ≥ 1 true; inductive step, (1+x)^(n+1) = (1+x)(1+x)^n ≥ (1+x)(1+nx) = 1+x+nx+nx^2 ≥ 1+(n+1)x
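Bernoulli's inequality can be spot-checked with exact rational arithmetic. The sketch below is a numerical check on a grid of values, not a proof; the function name bernoulli_holds and the chosen grid are arbitrary.

```python
from fractions import Fraction


def bernoulli_holds(x: Fraction, n: int) -> bool:
    """True when (1+x)^n >= 1+nx for this particular x and n."""
    return (1 + x) ** n >= 1 + n * x


# grid of rational x in (-1, 3] with step 1/10, exponents n = 0..20
grid = [Fraction(k, 10) for k in range(-9, 31)]
print(all(bernoulli_holds(x, n) for x in grid for n in range(21)))

# outside the hypothesis x > -1 the inequality can fail
print(bernoulli_holds(Fraction(-4), 3))
```

For x = -4 and n = 3 the left side is (-3)^3 = -27 and the right side is 1-12 = -11, so the second line prints False.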


8 - REAL FUNCTIONS AND SEQUENCES

A function is a binary relation between two sets that associates each element of the first set to exactly one element of the second set; typical examples are functions from integers to integers, or from the real numbers to real numbers; a function is a process or a relation that associates each element x of a set X, the domain of the function or definition set, to a single element y of the set Y, the codomain of the function; if the function is called f, this relation is denoted by y = f(x), where the element x is the argument or input of the function, and y is the value of the function, the output, or the image of x by f; the symbol that is used for representing the input is the variable of the function, for example f is a function of the variable x; a function is uniquely represented by the set of all pairs (x, f(x)), called the graph of the function

f: A → ℝ, A ⊆ ℝ, x → f → f(x); each number x of the set A is associated with a real number

Domain of the function = definition set = A = dom f

Codomain of the function = set that contains the output values of the function, usually it is ℝ

Image of the function = set of the output values of the function = f(A) = im(f)

f(x) = x^2+2x+3, A = ℝ, A = (-∞,+∞)

f(x) = √(x), A = {x ∈ ℝ, x ≥ 0}, A = [0,+∞)

f(x) = √(x(1-x)), the radicand y = x-x^2 must be ≥ 0, A = {x ∈ ℝ, 0 ≤ x ≤ 1}, A = [0,1]

f(x) = 1/(x^2-1), A = ℝ \ {-1,1}, A = (-∞,-1) ∪ (-1,1) ∪ (1,+∞)

A real interval is a set of real numbers that contains all real numbers lying between any two numbers of the set

[a,b] := {x ∈ ℝ, a ≤ x ≤ b}, bounded and closed interval; a is the left bound or infimum and b is the right bound or supremum, inf(A) = a, sup(A) = b; a is the minimum and b is the maximum, min(A) = a, max(A) = b

(a,b] := {x ∈ ℝ, a < x ≤ b}, bounded and left-open and right-closed interval; a is the left bound or infimum and b is the right bound or supremum, inf(A) = a, sup(A) = b; there is no minimum and b is the maximum, max(A) = b

[a,b) := {x ∈ ℝ, a ≤ x < b}, bounded and left-closed and right-open interval; a is the left bound or infimum and b is the right bound or supremum, inf(A) = a, sup(A) = b; a is the minimum and there is no maximum, min(A) = a

(a,b) := {x ∈ ℝ, a < x < b}, bounded and open interval; a is the left bound or infimum and b is the right bound or supremum, inf(A) = a, sup(A) = b; there are no minimum and maximum

[a,+∞) := {x ∈ ℝ, a ≤ x}, unbounded above and closed interval; a is the left bound or infimum and the right bound or supremum is +∞ and there are no majorants, inf(A) = a, sup(A) = +∞; a is the minimum and there is no maximum, min(A) = a

(a,+∞) := {x ∈ ℝ, a < x}, unbounded above and open interval; a is the left bound or infimum and the right bound or supremum is +∞ and there are no majorants, inf(A) = a, sup(A) = +∞; there are no minimum and maximum

(-∞,b] := {x ∈ ℝ, x ≤ b}, unbounded below and closed interval; the left bound or infimum is -∞ and the right bound or supremum is b and there are no minorants, inf(A) = -∞, sup(A) = b; there is no minimum and b is the maximum, max(A) = b

(-∞,b) := {x ∈ ℝ, x < b}, unbounded below and open interval; the left bound or infimum is -∞ and the right bound or supremum is b and there are no minorants, inf(A) = -∞, sup(A) = b; there are no minimum and maximum

In each interval, the connection property is valid: if x1 and x2 ∈ A, and x1 < x2, and x1 < x < x2, then x ∈ A

A sequence is a function defined in the set ℕ of natural numbers or in the set ℕ* of natural numbers greater than 0; n ∈ ℕ → f → f(n) ∈ ℝ

n ∈ ℕ* → 1+(1/n) = (n+1)/n

a_0,a_1,a_2,...,a_n,...; (a_n)_(n∈ℕ)

An arithmetic progression or arithmetic sequence is a sequence of numbers such that the difference between the consecutive terms is constant; an arithmetic progression is a sequence in which, given an initial term, each term is obtained from the previous one by adding a constant

Arithmetic progression: d = a_(n+1)-a_n, d is the common difference and d ≠ 0, a_(n+1) := a_n + d; a_0, a_1 = a_0 + d, a_2 = a_1 + d = a_0 + 2d, a_n = a_0 + nd, a_n = a_1 + (n-1)d, a_n = a_m + (n-m)d

A finite portion of an arithmetic progression is called a finite arithmetic progression and sometimes just called an arithmetic progression; the sum of a finite arithmetic progression is called an arithmetic series

A geometric progression or geometric sequence is a sequence of non-zero numbers where each term after the first is found by multiplying the previous one by a fixed, non-zero number called the common ratio; in a geometric progression, each term is obtained from the previous one by multiplying by a constant

Geometric progression: r = a_(n+1)/a_n, r is the common ratio and r ≠ 1, a_(n+1) := a_n⋅r; a_0, a_1 = a_0⋅r, a_2 = a_1⋅r = a_0⋅r^2, a_n = a_0⋅r^n, a_n = a_1⋅r^(n-1)

A geometric series is the sum of the numbers in a geometric progression

The distinction between a progression and a series is that a progression is a sequence, whereas a series is a sum

Formula to calculate the sum of n terms in geometric progression with initial term = 1 and ratio = r ≠ 1; 1+r+...+r^(n-1) = ((1-r)/(1-r))(1+r+...+r^(n-1)) = (1+r+...+r^(n-1)-r-r^2-...-r^(n-1)-r^n)/(1-r) = (1-r^n)/(1-r); if 0 < r < 1, 1+r+...+r^(n-1) = (1-r^n)/(1-r) < 1/(1-r)

Sum of n terms in geometric progression with initial term = 1 and ratio = 1/2; 1+1/2+...+(1/2)^(n-1) < 2

It is interesting to understand whether a sequence has a trend value, that is, a number to which the terms of the sequence are close for large indices
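The closed form for the geometric sum can be compared with the term-by-term sum using exact fractions. This is only a sketch with hypothetical function names; the ratio r is assumed to be different from 1.

```python
from fractions import Fraction


def geometric_sum_direct(r: Fraction, n: int) -> Fraction:
    """1 + r + ... + r^(n-1), added term by term."""
    return sum((r ** k for k in range(n)), Fraction(0))


def geometric_sum_closed(r: Fraction, n: int) -> Fraction:
    """(1 - r^n)/(1 - r), valid for r != 1."""
    return (1 - r ** n) / (1 - r)


half = Fraction(1, 2)
for n in (1, 2, 5, 10, 20):
    s = geometric_sum_direct(half, n)
    print(n, s, s == geometric_sum_closed(half, n), s < 2)
```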

In a constant sequence the trend value is obvious

(-1)^n = 1,-1,1,-1,1,-1,...; when n is even it is 1, when n is odd it is -1; there is no trend value

1/n, n ∈ ℕ*; n = 1, 1/n = 1; n = 2, 1/n = 0.5; n = 3, 1/n = 0.333...; n = 4, 1/n = 0.25; n = 5, 1/n = 0.2; n = 10, 1/n = 0.1; n = 11, 1/n = 0.0909...; n = 100, 1/n = 0.01; the trend value is 0; the k-th decimal digit is 0 when n > 10^k

∀ ε > 0, ∃ n_ε : ∀ n > n_ε, |(1/n)-0| = 1/n < ε; this holds when n > 1/ε, so it is enough to choose n_ε ≥ 1/ε; the set of natural numbers is unbounded above, so for every ε such an n_ε exists

n ∈ ℕ*, n ↦ ⁿ√(a) = a^(1/n), a > 1; for a = 3: 3^(1/1) = 3, 3^(1/2) = 1.73205080757, 3^(1/3) = 1.44224957031, 3^(1/4) = 1.31607401295, 3^(1/5) = 1.24573093962, 3^(1/6) = 1.20093695518, 3^(1/7) = 1.16993081276, 3^(1/8) = 1.14720269044, 3^(1/9) = 1.12983096391, 3^(1/10) = 1.11612317403, 3^(1/100) = 1.01104669194, 3^(1/1000) = 1.00109921598, 3^(1/10000) = 1.00010986726, 3^(1/100000) = 1.00001098618, 3^(1/1000000) = 1.00000109861

a > 1, a^(1/2) > a^(1/3); (a^(1/2))^6 > (a^(1/3))^6, a^(6/2) > a^(6/3), a^3 > a^2; in general a^(1/n) > a^(1/(n+1)) because a^(n+1) > a^n

ⁿ√(a) = a^(1/n)

²√(a) = a^(1/2); ²√(²√(a)) = ⁴√(a) = a^(1/4)
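The choice of the index for a given ε can be made concrete. In this sketch the function name n_epsilon is hypothetical and ε is passed as an exact fraction; the index returned is the smallest integer ≥ 1/ε, one admissible choice among infinitely many.

```python
from fractions import Fraction
from math import ceil


def n_epsilon(eps: Fraction) -> int:
    """An index n_ε >= 1/ε, so that every n > n_ε gives 1/n < ε."""
    return ceil(1 / eps)


for eps in (Fraction(1, 10), Fraction(1, 100), Fraction(3, 1000)):
    n_eps = n_epsilon(eps)
    # check the definition on the next thousand indices after n_ε
    ok = all(Fraction(1, n) < eps for n in range(n_eps + 1, n_eps + 1001))
    print(eps, n_eps, ok)
```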


9 - LIMIT OF SEQUENCES - PART 1

n ↦ c/n, c > 0; ∀ ε > 0, 0 < c/n < ε, n/c > 1/ε, n > c/ε, n > n_ε ≥ c/ε, lim_(n→∞)(c/n) = 0

1 < ⁿ√(a) < 1+(a-1)/n; 1 < ⁿ√(a) < 1+c/n with c = a-1; |ⁿ√(a)-1| = ⁿ√(a)-1 < c/n

a_n → L, lim_(n→∞)(a_n) = L; ∀ ε > 0, ∃ n_ε | ∀ n > n_ε ⇒ |a_n-L| < ε; -ε < a_n-L < ε, L-ε < a_n < L+ε

Definition of limit of a sequence: ∀ ε > 0, ∃ n_ε : n > n_ε ⇒ |a_n-L| < ε

Neighbourhood of center L and radius ε: I(L;ε) = (L-ε,L+ε)

Increasing sequence: ∀ n : a_n ≤ a_(n+1)

Decreasing sequence: ∀ n : a_n ≥ a_(n+1)

Strictly increasing sequence: ∀ n : a_n < a_(n+1)

Strictly decreasing sequence: ∀ n : a_n > a_(n+1)

Constant sequence: ∀ n : a_n = a_(n+1)

n ↦ c/n, is a strictly decreasing sequence

n ↦ ⁿ√(a), for a > 1 is a strictly decreasing sequence that converges to the limit 1, for a = 1 is a constant sequence with the constant value 1, for 0 < a < 1 is a strictly increasing sequence that converges to the limit 1

Increasing and decreasing sequences are also called monotone sequences

Increasing and decreasing sequences, also called monotone, always have a limit (finite or infinite) and for this reason they are called regular

Every increasing sequence, bounded above, has a limit that is its supremum

If the increasing sequence is not bounded above, the limit is the supremum that is +∞

An increasing monotone sequence has a limit that is its supremum

Every decreasing sequence, bounded below, has a limit that is its infimum

If the decreasing sequence is not bounded below, the limit is the infimum that is -∞

A decreasing monotone sequence has a limit that is its infimum

Supremum of a sequence means supremum of the image of the sequence, that is the set of values resulting from the sequence

Demonstration for an increasing sequence bounded above: a_n ≤ a_(n+1); ∀ n, a_n ≤ sup(a_n) := sup{a_n, n ∈ ℕ}; sup(a_n) = L; by definition of supremum, ∀ ε > 0 ∃ n_ε such that L-ε < a_(n_ε); for n > n_ε, L-ε < a_(n_ε) ≤ a_n ≤ L < L+ε; this demonstration is the verification of the definition of limit

A sequence whose limit is +∞ or -∞ is called divergent; a divergent sequence diverges to +∞ or diverges to -∞; if a divergent sequence diverges to +∞, it diverges positively; if a divergent sequence diverges to -∞, it diverges negatively

Definition of a sequence whose limit is +∞: lim_(n→+∞)(a_n) = +∞; ∀ M > 0, ∃ n_M : ∀ n > n_M ⇒ a_n > M

Definition of a sequence whose limit is -∞: lim_(n→+∞)(a_n) = -∞; ∀ m < 0, ∃ n_m : ∀ n > n_m ⇒ a_n < m

Geometric progression, or geometric sequence, with first element 1 and ratio a; if a is 1 it is a constant sequence; prove that if a > 1 then the sequence diverges to +∞; 1, a, a^2, a^3, ...; writing a = 1+d with d > 0, by Bernoulli's inequality a^n = (1+d)^n ≥ 1+nd > M when nd > M-1, n > (M-1)/d, n > n_M ≥ (M-1)/d

Considering the geometric progression a^n, if 0 < a < 1, lim_(n→+∞)(a^n) = 0; (1/2)^n = 1/2^n, 2^n diverges to +∞, so 1/2^n converges to 0

n ↦ ⁿ√(n) = n^(1/n), n ∈ ℕ*; n = 1, 1^(1/1) = 1; n = 2, 2^(1/2) = 1.41421356237; n = 3, 3^(1/3) = 1.44224957031; n = 4, 4^(1/4) = 1.41421356237; n = 5, 5^(1/5) = 1.37972966146; n = 6, 6^(1/6) = 1.3480061546; n = 7, 7^(1/7) = 1.32046924776; n = 8, 8^(1/8) = 1.29683955465; n = 9, 9^(1/9) = 1.27651800701; n = 10, 10^(1/10) = 1.25892541179; n = 100, 100^(1/100) = 1.04712854805; n = 1000, 1000^(1/1000) = 1.00693166885; n = 10000, 10000^(1/10000) = 1.00092145832; n = 100000, 100000^(1/100000) = 1.00011513588; n = 1000000, 1000000^(1/1000000) = 1.00001381561

To compare two powers with different bases and rational exponents, raise both powers to the least common multiple of the denominators of the exponents

2^(1/2), 3^(1/3); (2^(1/2))^6, (3^(1/3))^6; 2^(6/2), 3^(6/3); 2^3 = 8, 3^2 = 9; 8 < 9 therefore 2^(1/2) < 3^(1/3)

3^(1/3), 4^(1/4); (3^(1/3))^12, (4^(1/4))^12; 3^(12/3), 4^(12/4); 3^4 = 81, 4^3 = 64, 81 > 64, therefore 3^(1/3) > 4^(1/4)

The sequence ⁿ√(n) tends to 1 and it is a decreasing monotone sequence from the third term onwards

Demonstration that the sequence ⁿ√(n) tends to 1: for n ≥ 2, n = √n⋅√n⋅1⋅...⋅1, where 1 is repeated n-2 times; by G ≤ A, 1 ≤ ⁿ√(n) = ⁿ√(√n⋅√n⋅1⋅...⋅1) ≤ (2√n+n-2)/n < 1+(2/√n); 1 ≤ ⁿ√(n) < 1+(2/√n); 0 ≤ ⁿ√(n)-1 < 2/√n < ε; 4/n < ε^2, n/4 > 1/ε^2, n > 4/ε^2, n_ε ≥ 4/ε^2

Demonstration that the sequence ⁿ√(n) is a decreasing monotone sequence from the third term onwards: for n ≥ 3 we want n^(1/n) > (n+1)^(1/(n+1)); raising both sides to the power n(n+1): (n^(1/n))^(n(n+1)) > ((n+1)^(1/(n+1)))^(n(n+1)), n⋅n^n = n^(n+1) > (n+1)^n, n > (n+1)^n/n^n, n > (1+1/n)^n; so it is enough to prove (1+1/n)^n < n for n ≥ 3, which follows from (1+1/n)^n < 3 ≤ n; considering the binomial formula (a+b)^n = Σ_(k=0)^n C(n,k)⋅a^(n-k)⋅b^k, and C(n,k) = n!/(k!(n-k)!); (1+1/n)^n = Σ_(k=0)^n C(n,k)⋅1^(n-k)⋅(1/n)^k = Σ_(k=0)^n C(n,k)/n^k = Σ_(k=0)^n ((n(n-1)...(n-k+1))/n^k)(1/k!) < Σ_(k=0)^n (1/k!) = 1+1+1/2!+1/3!+...+1/n! < 1+1+1/2+1/2^2+...+1/2^(n-1); 1+1/2+1/2^2+...+1/2^(n-1) is the sum of a geometric progression with first element = 1 and ratio = 1/2, and this sum is < 2, so 1 + this sum < 3
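The two facts about the n-th root of n can be checked numerically. In this sketch (function name hypothetical) the monotonicity test uses the exact integer comparison n^(n+1) > (n+1)^n from the demonstration, so it does not depend on floating-point rounding; the printed roots are floating-point approximations.

```python
def decreases_at(n: int) -> bool:
    """True when n^(1/n) > (n+1)^(1/(n+1)), tested exactly as n^(n+1) > (n+1)^n."""
    return n ** (n + 1) > (n + 1) ** n


for n in range(1, 8):
    print(n, round(n ** (1 / n), 6), decreases_at(n))

# (1+1/n)^n < 3, written with integers as (n+1)^n < 3⋅n^n
print(all((n + 1) ** n < 3 * n ** n for n in range(1, 200)))
```

The comparison is False for n = 1 and n = 2 and True from n = 3 onwards, which matches "from the third term onwards".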


10 - LIMIT OF SEQUENCES - PART 2

Being monotonic is a sufficient condition for the existence of the limit, but it is not a necessary condition; a monotonic sequence always has a limit, but non-monotonic sequences can also have a limit

Fibonacci numbers: F_0 := 0, F_1 := 1, F_(n+2) := F_n + F_(n+1); equivalently F_n := F_(n-1) + F_(n-2), n > 1; F_0 = 0, F_1 = 1, F_2 = 1, F_3 = 2, F_4 = 3, F_5 = 5, F_6 = 8, F_7 = 13, F_8 = 21, F_9 = 34, F_10 = 55

Studying the ratio that each Fibonacci number has with the previous one: r_n := F_(n+1)/F_n, n ≥ 1; r_1 = 1/1 = 1; r_2 = 2/1 = 2; r_3 = 3/2 = 1.5; r_4 = 5/3 = 1.666...; r_5 = 8/5 = 1.6; r_(n+1) = F_(n+2)/F_(n+1) = (F_(n+1)+F_n)/F_(n+1) = 1+1/r_n; r_1 := 1, r_2 = 2; r_(n+1) := 1+1/r_n; 1 ≤ r_1 < r_2, 1/r_1 > 1/r_2, 1+1/r_1 > 1+1/r_2 ⇔ r_2 > r_3, so the sequence is not monotonic; r_(n+1)-r_n = (1+1/r_n)-(1+1/r_(n-1)) = 1/r_n-1/r_(n-1) = (r_(n-1)-r_n)/(r_n⋅r_(n-1)) = -(r_n-r_(n-1))/(r_n⋅r_(n-1)); |r_(n+1)-r_n| = |r_n-r_(n-1)|/(r_n⋅r_(n-1)); since r_n ≥ 3/2 for n ≥ 2, r_n⋅r_(n-1) ≥ 2 for n ≥ 3 and |r_(n+1)-r_n| ≤ |r_n-r_(n-1)|/2; this shows that this sequence converges to a limit value; lim_(n→+∞)(r_n) = L, r_(n+1) = 1+1/r_n ⇒ L = 1+1/L, L = (L+1)/L, L^2-L-1 = 0; using the quadratic formula x = (-b±√(b^2-4ac))/(2a), L = (1±√(1+4))/2 = (1±√5)/2; since L > 0, L = (1+√5)/2; (1+√5)/2 is an irrational number called the golden ratio, 1.61803398875...; this shows that a non-monotonic sequence can be convergent
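The behaviour of the ratios can be observed directly. The sketch below computes r_n = F_(n+1)/F_n as exact fractions and compares the last one with (1+√5)/2 in floating point; the function name fibonacci_ratios is hypothetical.

```python
from fractions import Fraction
from math import sqrt


def fibonacci_ratios(count: int) -> list[Fraction]:
    """[r_1, r_2, ..., r_count] with r_n = F_(n+1)/F_n."""
    ratios = []
    f_prev, f_curr = 1, 1                           # F_1, F_2
    for _ in range(count):
        ratios.append(Fraction(f_curr, f_prev))
        f_prev, f_curr = f_curr, f_prev + f_curr    # shift to F_(n+1), F_(n+2)
    return ratios


ratios = fibonacci_ratios(30)
golden = (1 + sqrt(5)) / 2
for n, r in enumerate(ratios[:8], start=1):
    print(n, r, float(r))
print(abs(float(ratios[-1]) - golden))
```

The ratios alternate above and below the golden ratio, and each step at least halves the distance between consecutive terms.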

Sequence studied by the Swiss mathematician Leonhard Euler (1707-1783): a_n := (1+1/n)^n < 3, this monotonic increasing sequence is bounded above by the number 3, the limit is ≤ 3, and the limit is the supremum; b_n := (1+1/n)^(n+1) = a_n(1+1/n) = a_n((n+1)/n), b_n > a_n; b_n is a monotonic decreasing sequence and the difference between b_n and a_n tends to 0; the sequences a_n and b_n converge to the same limit, called Euler's number, e = 2.7182818284590...; a_n is a monotonic increasing sequence because a_n < a_(n+1): considering the n+1 factors 1⋅(1+1/n)⋅...⋅(1+1/n) where (1+1/n) is repeated n times, by G ≤ A, ((1+1/n)^n)^(1/(n+1)) < (1+n(1+1/n))/(n+1) = (n+1+1)/(n+1) = 1+1/(n+1); raising to the power n+1, (1+1/n)^n < (1+1/(n+1))^(n+1), a_n < a_(n+1); b_n = a_n((n+1)/n), a_n = b_n(n/(n+1)), b_n-a_n = b_n-b_n(n/(n+1)) = b_n(1-(n/(n+1))) = b_n((n+1-n)/(n+1)) = b_n(1/(n+1)); b_n is a monotonic decreasing sequence, so no term exceeds the first, b_1 = 4, and b_n-a_n ≤ 4/(n+1) < 4/n, which tends to 0

lim_(n→+∞)((1+1/n)^n) = lim_(n→+∞)((1+1/n)^(n+1)) = e; e is Euler's number and it is an irrational number; e = 2.7182818284590...

Considering the exponential function a^x, for any base a > 0, the graph always passes through the point x = 0 and y = 1; the tangent to the graph at the point x = 0 and y = 1 has slope (angular coefficient) 1 only when a = e, therefore in this case the exponential function is e^x
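The two sequences that enclose Euler's number can be tabulated with exact fractions. In this sketch the function names a and b follow the notation of these notes; math.e is used only for comparison in floating point.

```python
from fractions import Fraction
from math import e


def a(n: int) -> Fraction:
    """a_n = (1+1/n)^n, increasing."""
    return (1 + Fraction(1, n)) ** n


def b(n: int) -> Fraction:
    """b_n = (1+1/n)^(n+1), decreasing."""
    return (1 + Fraction(1, n)) ** (n + 1)


for n in (1, 2, 10, 100, 1000):
    # lower bound, upper bound, and the gap b_n - a_n = b_n/(n+1)
    print(n, float(a(n)), float(b(n)), float(b(n) - a(n)))
print(e)
```

The convergence is slow: the gap b_n - a_n shrinks only like 1/n, so this pair of sequences is a way to define e rather than an efficient way to compute it.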


11 - LIMIT OF FUNCTIONS - PART 1

f(x), f: A → ℝ, limx→x0(f(x)) = L

The function does not need to be defined at point x0; the value of the function at x0 is not important, but it is important that the function is defined close to x0

A neighborhood of a real number is an open interval centered on the number itself; I(x0,r) := (x0-r,x0+r)

A real number is an accumulation point of a set A if every neighborhood of this number contains infinitely many elements of A; if x0 is an accumulation point of A, then ∀ r > 0 the intersection I(x0,r) ⋂ A contains infinitely many elements

A finite set has no accumulation points

The set ℕ of natural numbers is infinite but has no accumulation points; +∞ can be thought of as the only accumulation point of ℕ

A = (0,1]; 0 < x ≤ 1; points < 0 and > 1 are not accumulation points; the point 0 does not belong to A, but it is an accumulation point because the intersection I(0,r) ⋂ A contains infinitely many elements for every r > 0; all the points of the interval A are accumulation points, and so is 0, which does not belong to A

An accumulation point may or may not be a point of A

When an accumulation point belongs to the set, it is said to be a non-isolated point of the set

f(x) = x^2, A = ℝ; x0 = 1, f(x0) = 1; r(x) := (x^2-1)/(x-1), A' = ℝ \ {1}, 1 does not belong to the definition set but is an accumulation point; r(x) is the angular coefficient of the secant through the points (1,1) and (x,x^2), and as x approaches 1 it tends to the angular coefficient of the tangent at the point (1,1), which is the derivative of f(x) calculated at x = 1; r(x) := (x^2-1)/(x-1) = (x-1)(x+1)/(x-1) = x+1 =: r*(x); x = 1, r*(1) = 2, and for x ≠ 1 r(x) = r*(x), |r(x)-2| = |x+1-2| = |x-1| < ε as soon as |x-1| < δε := ε

The limit of the angular coefficient of the secant, that is the angular coefficient of the tangent, measures the instantaneous rate of change of the function at the point
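
The behaviour of the secant slope near x0 = 1 can be observed numerically. A minimal Python sketch, assuming the helper names secant_slope and square, which are illustrative only:

```python
def secant_slope(f, x0, x):
    """Angular coefficient of the secant through (x0, f(x0)) and (x, f(x))."""
    return (f(x) - f(x0)) / (x - x0)


def square(x):
    return x * x


for h in (0.1, 0.01, 0.001, -0.001):
    slope = secant_slope(square, 1.0, 1.0 + h)
    # For f(x) = x^2 the distance |slope - 2| equals |x - 1| = |h|.
    print(h, slope, abs(slope - 2.0))
```

The slope is never evaluated at x = 1 itself, where the quotient is not defined; only values of x close to 1 are used, exactly as in the definition of limit.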

|f(x)-L| < ε, 0 < |x-x0| < δε

Definition of limit of a function: ∀ ε > 0, ∃ δε > 0: x ∈ (A-{x0}) ⋂ I(x0,δε) ⇒ |f(x)-L| < ε; the function converges to the limit L when x approaches x0, which is an accumulation point of the set A where the function is defined

|f(x)-L| < ε; f(x) ∈ I(L,ε); L-ε < f(x) < L+ε

0 < ε ≤ ε0, δε > 0

limx→x0(f(x)) = L; |f(x)-f(x0)| < ε, |x-x0| < δε

The function is continuous if, in a non-isolated point, the limit of the function coincides with the value of the function

All polynomial functions are continuous functions

f(x) = c, it is a continuous function, |f(x)-f(x0)| = 0

f(x) = x, it is a continuous function, |f(x)-f(x0)| = |x-x0|, δε = ε

f(x) = 2x, it is a continuous function, |f(x)-f(x0)| = |2x-2x0| = 2|x-x0| < ε, |x-x0| < ε/2 = δε

f(x) = mx, m ≠ 0, it is a continuous function, |f(x)-f(x0)| = |mx-mx0| = |m||x-x0| < ε, |x-x0| < ε/|m| = δε

If a function is continuous then limx→x0(f(x)) = f(x0)

Continuous functions: all polynomial functions, exponential functions ax with a > 0 like ex, logarithmic functions log(x) and ln(x), sin(x), cos(x), tan(x) where it is defined, cot(x) where it is defined; these are called elementary functions

Example of discontinuous function: f(x) = [x], [0,1]; [x] represents the integer part; [x] = max {z ∈ ℤ; z ≤ x}; f(1) = 1, limx→1(f(x)) = 0


12 - LIMIT OF FUNCTIONS - PART 2

f: A → ℝ, L ∈ ℝ, x → x0, limx→x0(f(x)) = L

∀ ε > 0, ∃ δε > 0: x ∈ (A-{x0}) ⋂ I(x0,δε) ⇒ |f(x)-L| < ε, f(x) ∈ I(L,ε)

Convergent function to the limit L, for x that tends to an accumulation point x0 of the domain A: ∀ ε > 0, ∃ δε > 0: x ∈ (A-{x0}) ⋂ I(x0,δε) ⇒ |f(x)-L| < ε

A*(x0,δ) := (A-{x0}) ⋂ I(x0,δ); A(x0,δ) := A ⋂ I(x0,δ); I(L,ε), f(A*(x0,δ)) ⊆ I(L,ε); the definition of a continuous function is: I(f(x0),ε), f(A(x0,δ)) ⊆ I(f(x0),ε)

When the function is continuous at the point x0, x0 belongs to the definition set and is an accumulation point, so x0 is a non-isolated point; the function is continuous when the limit coincides with f(x0)

The concept of a function converging at an accumulation point is related to the concept of a function continuous at the same point

r(x) := (x^2-1)/(x-1), r*(x) := x+1; f*(x) := {f(x), x ≠ x0; L, x = x0}; if f tends to L at x0, then f* is continuous at x0

f(x) = mx+q, m ≠ 0; f(x0) = mx0+q, f(x)-f(x0) = m(x-x0), |f(x)-f(x0)| = |m||x-x0| < ε, |x-x0| < ε/|m| =: δε; ε = f(x)-f(x0), δε = x-x0, |m| = ε/δε, δε = ε/|m|

f(x) = √x, A = [0,+∞), this function is continuous at all points of the domain; y = √x, y^2 = x; 0 ≤ x1 < x2 ⇒ 0 ≤ √x2-√x1 ≤ √(x2-x1), because squaring gives x2+x1-2√(x1⋅x2) ≤ x2-x1, 2x1 ≤ 2√(x1⋅x2), x1 ≤ √(x1⋅x2), x1^2 ≤ x1⋅x2, x1 ≤ x2, which is true; verifying the continuity of the function on the right, x0 < x, 0 < f(x)-f(x0) = √x-√x0 ≤ √(x-x0) < ε, x-x0 < ε^2 =: δε; note that if 0 < ε < 1, then ε^2 < ε
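
The choice δε = ε^2 for the square root can be tested on sample points. This is a rough Python sketch; the helper names delta_for_sqrt and check and the number of samples are assumptions made for illustration, and a finite sample is only a sanity check, not a proof.

```python
import math


def delta_for_sqrt(eps):
    """The delta proposed in the text for f(x) = sqrt(x): delta = eps^2."""
    return eps ** 2


def check(x0, eps, samples=1000):
    """Check sqrt(x) - sqrt(x0) < eps on sample points with 0 < x - x0 < delta."""
    delta = delta_for_sqrt(eps)
    for k in range(1, samples):
        x = x0 + delta * k / samples
        if not (math.sqrt(x) - math.sqrt(x0) < eps):
            return False
    return True


print(check(0.0, 0.1), check(2.0, 0.01), check(0.25, 0.5))
```

The same δε works for every x0 ≥ 0, because the estimate √x-√x0 ≤ √(x-x0) does not depend on x0.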

f(x) = 1/x^2, x ≠ 0; it is an even function, f(x) = f(-x); the graph is symmetrical with respect to the y-axis; ∀ M > 0, f(x) > M ⇔ 1/x^2 > M ⇔ x^2 < 1/M ⇔ |x| < 1/√M =: δM, so for 0 < |x| < δM f(x) > M, that is for x ∈ A*(0,δM) f(x) > M

Positively diverging function for x tending to an accumulation point x0 of the domain A: ∀ M > 0, ∃ δM > 0: x ∈ (A-{x0}) ⋂ I(x0,δM) ⇒ f(x) > M

Negatively diverging function for x tending to an accumulation point x0 of the domain A: ∀ m < 0, ∃ δm > 0: x ∈ (A-{x0}) ⋂ I(x0,δm) ⇒ f(x) < m

Considering a function f defined in a set A unbounded above, that is A has no majorants and sup(A) = +∞, limx→+∞(f(x)) = L; ∀ ε > 0, ∃ δε: x ∈ A ⋂ (δε,+∞) ⇒ |f(x)-L| < ε

f(x) = 1/x, A = ℝ* = ℝ\{0}; L = 0, ∀ ε > 0, ∃ δε: x > δε, 0 < 1/x < ε, x > 1/ε := δε

Convergent function to the limit L as x tends to +∞: ∀ ε > 0, ∃ δε > 0: x ∈ A ⋂ (δε,+∞) ⇒ |f(x)-L| < ε

limx→0(1/x) does not exist, because 1/x diverges positively for x > 0 and negatively for x < 0

If a function has a limit, it is unique; it is impossible for the values assumed by a function to be in two different disjoint neighborhoods at the same time, considering for example L1 < L2 and ε < (L2-L1)/2

A function cannot be convergent and divergent at the same time, or simultaneously positively divergent and negatively divergent

The notions of limit on the right and limit on the left are related to the presence of an order relation in ℝ

f: A → ℝ, x0 is the accumulation point; A+(x0) := {x ∈ A, x > x0}, A-(x0) := {x ∈ A, x < x0}; limx→x0+(f(x)) = L+; limx→x0-(f(x)) = L-

If the limit on the right and the limit on the left exist and coincide, their common value is also the limit, and vice versa

f(x) := [x] = max {z ∈ ℤ, z ≤ x}; x0 = 1, limx→1+([x]) = 1 = [1], limx→1-([x]) = 0; the limit on the left is different from the limit on the right, the limit on the right coincides with the value of the function at the point, the function is continuous on the right and discontinuous on the left
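
The different behaviour of the integer part on the two sides of x0 = 1 can be seen by sampling points close to 1. A minimal Python sketch, where integer_part is an illustrative wrapper around the standard math.floor:

```python
import math


def integer_part(x):
    """[x] = max {z in Z : z <= x}, computed here with math.floor."""
    return math.floor(x)


for h in (0.1, 0.001, 1e-6):
    # Values just to the left and just to the right of x0 = 1.
    print(h, integer_part(1 - h), integer_part(1 + h))

print(integer_part(1))  # value of the function at x0 = 1
```

On the left the values stay at 0, on the right they stay at 1, and the value at the point is 1, in agreement with continuity on the right only.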


13 - LIMIT THEOREMS - PART 1

The limit of the sum is equal to the sum of the limits

f,g: A → ℝ, x0 is an accumulation point; limx→x0(f(x)) = α, limx→x0(g(x)) = β; limx→x0(f(x)+g(x)) = α+β; the sum of two continuous functions is a continuous function; ∀ ε > 0, ∃ δ1: ∀ x ∈ A*(x0,δ1), |f(x)-α| < ε; ∀ ε > 0, ∃ δ2: ∀ x ∈ A*(x0,δ2), |g(x)-β| < ε; δ = min(δ1,δ2), x ∈ A*(x0,δ), |f(x)+g(x)-(α+β)| = |f(x)-α+g(x)-β| ≤ |f(x)-α|+|g(x)-β| < 2ε

If a function f(x) is convergent, as x tends to an accumulation point of its domain, it is bounded in a neighborhood of the same point

A function is regular when it has a limit, finite or infinite, so it is either convergent or divergent

A function is irregular when it has no limit

If a function is convergent, in A*(x0,δ) it is bounded, that is, with majorant and minorant

Remembering the triangle inequality |x+y| ≤ |x|+|y|, and |x⋅y| = |x|⋅|y|

f(x) → α, g(x) → β; A*(x0,δ), β-ε < g(x) < β+ε, |g(x)| = |g(x)-β+β| ≤ ε+|β| ≤ |β|+1 = c

|f(x)g(x)-αβ| = |f(x)g(x)-αg(x)+αg(x)-αβ| = |g(x)(f(x)-α)+α(g(x)-β)| ≤ |g(x)||f(x)-α|+|α||g(x)-β| ≤ cε+|α|ε = (c+|α|)ε

The limit of the product is equal to the product of the limits

Sums and products of continuous functions are still continuous functions; therefore polynomial functions are continuous functions

Every monomial function is continuous, so polynomial functions are continuous

x → c, it is the constant function and it is a continuous function

x → x, it is the identity function and it is a continuous function

x → cxn, it is a monomial function and it is a continuous function

A rational function is a ratio between polynomials and is defined at all points x where the denominator is not zero

Property of permanence of the sign: if a function f(x) converges to a limit other than 0, when x tends to an accumulation point of the domain, it maintains the same sign as the limit in a suitable neighborhood of the point itself

f(x)/g(x); β ≠ 0, β > 0, 0 < β/2 ≤ β-ε < g(x) < β+ε, 0 < ε ≤ β/2

The limit of the quotient is equal to the quotient of the limits, but the denominator function must tend to a limit other than 0

The reciprocal function of g(x) is 1/g(x)

1/g(x) → 1/β, β > 0; since g(x) > β/2, |1/g(x)-1/β| = |β-g(x)|/(β⋅g(x)) < ε/(β⋅β/2) = 2ε/β^2 = (2/β^2)ε

f(x)/g(x) = f(x)(1/g(x)) → α/β

Polynomial functions are continuous

Rational functions are continuous in all points where they are defined, that is, in all points where the denominator is non-zero

limx→x0(f(x)) = 0, f(x) > 0, ∀ x ∈ A*(x0,δ), limx→x0(1/f(x)) = +∞; ∀ M > 0, 1/f(x) > M ⇔ 0 < f(x) < 1/M := ε

limx→x0(f(x)) = 0, f(x) < 0, ∀ x ∈ A*(x0,δ), limx→x0(1/f(x)) = -∞

limx→0(x2) = 0, x2 > 0 for x ≠ 0, limx→0(1/x2) = +∞

limx→0+(1/x) = +∞, that is for x > 0; limx→0-(1/x) = -∞, that is for x < 0

limx→x0(|f(x)|) = +∞ ⇒ limx→x0(1/f(x)) = 0; |1/f(x)| = 1/|f(x)| < ε, |f(x)| > 1/ε := M

a > 1, limn→+∞(a^n) = +∞; a = 1+d with d > 0, a^n = (1+d)^n ≥ 1+nd, 1+nd diverges to +∞ and therefore also a^n; n < 0, n = -|n|, limn→-∞(a^n) = lim|n|→+∞(a^(-|n|)) = limk→+∞(1/a^k) = 0

A function is even when f(-x) = f(x), and is symmetrical to the y-axis

A function is odd when f(-x) = -f(x), and is symmetrical to the origin of the x and y axes

tg(x) = tan(x) = sin(x)/cos(x), cos(x) ≠ 0; cos(x) is an even function because cos(-x) = cos(x), and the graph of cos(x) is symmetrical with respect to the y-axis; cos(x) = 0 for x = π/2 and x = -π/2, and in general for x = π/2+kπ, k ∈ ℤ; considering tan(x) for -π/2 < x < π/2; tan(x) is an odd function because tan(-x) = -tan(x), and its graph is symmetrical with respect to the origin of the x and y axes; limx→(π/2)-(tan(x)) = +∞; limx→(-π/2)+(tan(x)) = -∞; tan(x) = sin(x)/cos(x) = 1/(cos(x)/sin(x)) → +∞ for x ∈ A-(π/2,δ), because cos(x)/sin(x) → 0 with positive values; tan(x) = sin(x)/cos(x) = 1/(cos(x)/sin(x)) → -∞ for x ∈ A+(-π/2,δ), because cos(x)/sin(x) → 0 with negative values


14 - LIMIT THEOREMS - PART 2

f1: A1 → ℝ, f2: A2 → ℝ, f1(A1) ⊆ A2; f1 is defined in A1 and f2 is defined in A2 and the image of A1, obtained through f1, is contained in the set A2, that is the domain of the function f2; x ∈ A1 → f1 → f1(x) := y → f2 → f2(f1(x)); x ↦ f2(f1(x)) = (f2∘f1)(x), the order of the functions is important, f1 acts before f2; the commutative property does not apply to the composition of functions

f1: x → x+1, f2: x → 2x; x → f1 → x+1 → f2 → 2(x+1) = 2x+2; x → f2 → 2x → f1 → 2x+1; f2(f1(x)) ≠ f1(f2(x))
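
The two orders of composition in this example can be compared directly. A minimal Python sketch; the helper compose and the names f2_after_f1 and f1_after_f2 are illustrative assumptions.

```python
def compose(outer, inner):
    """Return the function x -> outer(inner(x)): inner acts first."""
    return lambda x: outer(inner(x))


def f1(x):
    return x + 1


def f2(x):
    return 2 * x


f2_after_f1 = compose(f2, f1)  # x -> 2(x + 1) = 2x + 2
f1_after_f2 = compose(f1, f2)  # x -> 2x + 1

for x in (0, 1, 3):
    print(x, f2_after_f1(x), f1_after_f2(x))
```

The two results differ by 1 for every x, so the two compositions never coincide in this example.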

By composing continuous functions we obtain continuous functions

f1: A1 → ℝ, f2: A2 → ℝ, f1(A1) ⊆ A2; f1(x0) = y0, f2(y0) = f2(f1(x0)); x0 → f1 → f1(x0) := y0 → f2 → f2(f1(x0)); there is a positive δ such that the image of A1(x0,δ) through f2 after f1 is contained in the neighborhood of ε radius of the point f2(f1(x0)), and this is the continuity of the compound function

x ↦ √(x^2+x+1); f1(x) = x^2+x+1 =: y, f2(y) = √y; x ↦ f1 ↦ x^2+x+1 =: y ↦ f2 ↦ √y; f1 and f2 are continuous functions and the compound function f2 after f1 is also a continuous function; f1 is continuous because polynomial functions are continuous; f2 is continuous because the root function is continuous

The function f is minorant of g, in a set A, if f(x) is less than or equal to g(x) for every x of A

f,g: A → ℝ, f is minorant of g if f(x) ≤ g(x)

If the function f(x) is greater than or equal to the value c in a neighborhood of an accumulation point, the limit of f, if any, is greater than or equal to c

f(x) ≥ c in A*(x0,δ); L < c is impossible, because choosing ε < c-L would give f(x) < L+ε < c close to x0

If the function f(x) converges to the limit L > c, as x tends to an accumulation point of its domain, it is greater than c in a suitable neighborhood of the same point

f*(x) = f(x)-c, f* → L-c > 0; it is an extension of the theorem of sign permanence

Squeeze theorem or theorem of the two carabinieri: a function between two functions converging to the same limit, also converges to the same limit

f,g,h: A → ℝ, x0, x ∈ A*(x0,δ); f(x) ≤ g(x) ≤ h(x); if f(x) → L and h(x) → L, then g(x) → L; L-ε < f(x) → L ≤ g(x) ≤ h(x) → L < L+ε, for the squeeze theorem, or theorem of the two carabinieri, g(x) → L

If a function has a positively divergent function as minorant, then it is positively divergent

f(x) ≤ g(x), if limx→x0(f(x)) = +∞ then limx→x0(g(x)) = +∞

a > 1, limn→+∞(a^n) = +∞; a^n = (1+d)^n ≥ 1+nd, 1+nd is a positively divergent function; a^n has a positively divergent function as minorant, so it is positively divergent
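
The minorant 1+nd can be compared with the powers of a on a few values. A small Python sketch; the base a = 1.5 and the helper name minorant_holds are arbitrary illustrative choices.

```python
def minorant_holds(a, n):
    """Check (1 + d)^n >= 1 + n*d for a = 1 + d."""
    d = a - 1
    return a ** n >= 1 + n * d


a = 1.5
for n in (1, 2, 5, 10, 50):
    # The power grows much faster than its linear minorant.
    print(n, a ** n, 1 + n * (a - 1), minorant_holds(a, n))
```

The linear minorant is a very loose bound, but divergence of the minorant is all that is needed to conclude that the powers diverge.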

If a function has a negatively divergent function as majorant, then it is negatively divergent

f(x) ≤ g(x), if limx→x0(g(x)) = -∞ then limx→x0(f(x)) = -∞

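a > 1, n ∈ ℕ*, 1 < a^(1/n) ≤ 1+(a-1)/n, where a^(1/n) is the n-th root of a; the lower bound is the constant 1 and the upper bound 1+(a-1)/n → 1 because (a-1)/n → 0, so for the squeeze theorem, or theorem of the two carabinieri, a^(1/n) → 1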

With the squeeze theorem, or theorem of the two carabinieri, it is also possible to prove that n^(1/n) → 1, where n^(1/n) is the n-th root of n
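
The squeeze on the n-th root can be followed numerically. A minimal Python sketch; the base a = 5 and the helper name nth_root are illustrative assumptions.

```python
def nth_root(value, n):
    """n-th root computed as value^(1/n)."""
    return value ** (1.0 / n)


a = 5.0
for n in (2, 10, 100, 1000):
    lower = 1.0
    upper = 1.0 + (a - 1.0) / n
    # The n-th root of a is trapped between 1 and 1 + (a - 1)/n.
    print(n, lower, nth_root(a, n), upper)

for n in (10, 1000, 10 ** 6):
    # The n-th root of n also approaches 1, only more slowly.
    print(n, nth_root(n, n))
```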

The exponential function is continuous in ℝ

x ↦ a^x, x0 ∈ ℝ, a > 1, x > x0; a^(x1+x2) = a^x1⋅a^x2; 0 < a^x-a^x0 = a^(x-x0+x0)-a^x0 = a^(x-x0)⋅a^x0-a^x0 = a^x0(a^(x-x0)-1); a^(1/n) → 1; 0 < x-x0 ≤ 1/n ⇔ n ≤ 1/(x-x0); considering the integer part, n = n(x) = [1/(x-x0)]; 0 < a^x0(a^(x-x0)-1) ≤ a^x0(a^(1/n)-1) ≤ a^x0(a-1)/n because 1 < a^(1/n) ≤ 1+(a-1)/n, 0 < a^(1/n)-1 ≤ (a-1)/n; as x tends to x0 from the right n(x) → +∞, so (a-1)/n → 0 and a^x → a^x0; this is the demonstration of the continuity on the right of the function a^x

a_n = (1+1/n)^n, b_n = (1+1/n)^(n+1); f(x) = (1+1/x)^x = ((x+1)/x)^x, x > 0 or x < -1; limx→±∞((1+1/x)^x) = e ≈ 2.718


15 - LIMIT THEOREMS - PART 3

x ↦ a^x, 0 < a ≠ 1; limx→+∞((1+1/x)^x) = e, limx→-∞((1+1/x)^x) = e; the base (x+1)/x is positive for x > 0 or x < -1

n = [x], n is the integer part of x, that is the largest integer that does not exceed x; [x] ≤ x < [x]+1, n ≤ x < n+1; (1+1/([x]+1))^[x] < (1+1/x)^x < (1+1/[x])^([x]+1); (1+1/(n+1))^n < (1+1/x)^x < (1+1/n)^(n+1); ((1+1/(n+1))^(n+1))/(1+1/(n+1)) < (1+1/x)^x < (1+1/n)^(n+1); the left side tends to e/1 = e and the right side tends to e, so for the squeeze theorem limx→+∞((1+1/x)^x) = e
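
The two bounds built from the integer part can be evaluated for some real x. A minimal Python sketch; the function name f and the sample values are illustrative assumptions.

```python
import math


def f(x):
    """f(x) = (1 + 1/x)^x, defined for x > 0 or x < -1."""
    return (1 + 1 / x) ** x


for x in (2.5, 10.7, 1000.3):
    n = math.floor(x)
    lower = (1 + 1 / (n + 1)) ** n
    upper = (1 + 1 / n) ** (n + 1)
    # f(x) is squeezed between two sequences that both tend to e.
    print(x, lower, f(x), upper)

# Large positive and large negative x give values close to e.
print(f(1e6), f(-1e6), math.e)
```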

Circular functions, sine and cosine, are continuous in ℝ

One radian is defined as the angle subtended from the center of a circle which intercepts an arc equal in length to the radius of the circle; the magnitude in radians of a subtended angle is equal to the ratio of the arc length to the radius of the circle; θ = a/r, where θ is the subtended angle in radians, a is arc length, and r is radius

The measure in radians is the ratio between the length of the arc and the length of the radius; an angle measures a radian when the length of the subtended arc is equal to the length of the radius; 1 radian in degrees = 180/π = 57.2957795131...°, it is an irrational number like π, 1 rad = 180/π ≈ 57°

|sin(x)| ≤ |x|; for a small angle x ≠ 0, |sin(x)| is the length of the cathetus, which is less than the length of the hypotenuse, which is less than the length of the arc, which is the supremum of the lengths of the inscribed polygonal lines, therefore |sin(x)| < |x| for x ≠ 0

Using the prosthaphaeresis formula it is possible to demonstrate the continuity of the function sin(x); sin(x)-sin(x0) = 2sin((x-x0)/2)cos((x+x0)/2); |sin(x)-sin(x0)| = 2|sin((x-x0)/2)||cos((x+x0)/2)|; |sin(x)-sin(x0)| ≤ 2|(x-x0)/2|⋅1 = |x-x0|, δε = ε

cos(x) = sin(x+π/2); x ↦ x+π/2 =: t ↦ sin(t); using the prosthaphaeresis formula it is possible to demonstrate the continuity of the function cos(x)

The trigonometric functions, or circular functions, are continuous functions

sin(x)/x tends to 1 as x tends to 0

limx→0(sin(x)/x) = 1, x ≠ 0; sin(x)/x is an even function because the ratio between two odd functions is an even function; an even function is symmetrical with respect to the y-axis and for this reason the limit on the right coincides with the limit on the left; sin(x) → 0, x → 0, so 0/0 is an indeterminate form; limx→0((1-cos(x))/x) = 0, 1-cos(x) → 0, x → 0; geometrically we know that 0 < sin(x) < x < tan(x), sin(x) < x < sin(x)/cos(x), sin(x)/sin(x) < x/sin(x) < sin(x)/(cos(x)sin(x)), 1 < x/sin(x) < 1/cos(x), 1 → 1 < x/sin(x) < 1/cos(x) → 1, for the squeeze theorem, or theorem of the two carabinieri, x/sin(x) → 1 and therefore sin(x)/x → 1; f(x) = {sin(x)/x, x ≠ 0; 1, x = 0}, in this way we obtain the continuous function on the whole real axis
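
The squeeze for sin(x)/x can be observed numerically. From 1 < x/sin(x) < 1/cos(x), taking reciprocals gives cos(x) < sin(x)/x < 1 for small positive x. A minimal Python sketch, where the name sinc for the extended function is an illustrative choice:

```python
import math


def sinc(x):
    """sin(x)/x for x != 0, extended with the value 1 at x = 0."""
    return 1.0 if x == 0 else math.sin(x) / x


for x in (0.5, 0.1, 0.01, 0.001):
    # cos(x) < sin(x)/x < 1, and both bounds tend to 1 as x tends to 0.
    print(x, math.cos(x), sinc(x))

print(sinc(-0.1), sinc(0.1))  # even function: same value on both sides
```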

A monotone sequence, increasing or decreasing, is regular and therefore has a limit; if the sequence is monotone increasing, it tends to its supremum, finite or +∞; if the sequence is monotone decreasing, it tends to its infimum, finite or -∞

If f(x) is an increasing monotone function in a set A unbounded above, then limx→+∞(f(x)) = sup(f(A))

If f(x) is a decreasing monotone function in a set A unbounded above, then limx→+∞(f(x)) = inf(f(A))

a^x, a > 1, is an increasing monotone function; this function is unbounded above because the sequence a^n diverges positively; the values assumed by a^n constitute a set which is unbounded above, that is without majorants, so the image of the function a^x is also without majorants; a^x is monotone increasing and therefore limx→+∞(a^x) = +∞; the function assumes only positive values and limn→-∞(a^n) = 0, so the infimum of the image of the function is zero and the function, being monotone, tends to its infimum as x → -∞, so limx→-∞(a^x) = 0; a^x, a > 1, limx→+∞(a^x) = +∞, limx→-∞(a^x) = 0

Property of connection of the intervals: if x1,x2 ∈ I and x1 < x2, then every x with x1 < x < x2 also belongs to I

If a function is continuous in a closed bounded interval, the image, that is the set of values it assumes, is bounded above and below, that is, it has majorants and minorants; f: [a,b] → ℝ and f is continuous ⇒ sup(f([a,b])) ∈ ℝ, we want to show that the function is bounded above, that is, the supremum is a real number; with a constructive proof we need to show the existence of a majorant; with a proof for absurdity, the hypothesis is the negation of the thesis and we need to show that it brings to a contradiction; we want to show that the function is bounded above, or that the supremum is a real number, so we suppose absurdly that the function is unbounded above; absurdly we consider sup(f([a,b])) = +∞, but f(x0)-ε < f(x) < f(x0)+ε, and if a function is bounded in a set A, it is bounded in a subset of A, so the function is bounded above and below; sup(f([a,b])) = E ∈ ℝ, inf f([a,b]) = e ∈ ℝ

If a function is continuous in a bounded and closed interval, the image is bounded above and below, the supremum is the maximum, and the infimum is the minimum


16 - PROPERTIES OF CONTINUOUS FUNCTIONS ON AN INTERVAL

A continuous function on a bounded and closed interval [a,b] is bounded

If the hypothesis is dropped, the thesis may fail; f(x) = 1/x on (0,1], an interval that is bounded but not closed, the graph of the function is a branch of equilateral hyperbola, this function is continuous but unbounded above, and bounded below

Weierstrass' theorem or extreme value theorem: a continuous function f on a bounded and closed interval [a,b] has a maximum and a minimum

Demonstration by absurdity of the Weierstrass' theorem: E := sup(f([a,b])), ∀ x, f(x) ≤ E, if absurdly f(x) < E, then E-f(x) > 0, g(x) := 1/(E-f(x)), g(x) is continuous and should be bounded above, instead it is unbounded above, considering that the supremum is the smallest of the majorants, ∀ ε > 0 ∃ xε, E-ε < f(xε) < E, E-f(xε) < ε, 1/(E-f(xε)) = g(xε) > 1/ε = M, so g(x) is unbounded above but it should be bounded above, so f(x) < E is a contradiction

f(x) = 1/x, A = [1,+∞), f(A) = im(f) = (0,1], 1 is the maximum of the function, but there is no minimum because the function never assumes the value of 0

If a function is continuous on an interval and passes from negative to positive values, or vice versa, then it assumes the value 0 at least at one point; considering two points a and b where a < b, f(a) < 0 and f(b) > 0, it is possible to take the midpoint c and study the sign of f(c), keeping the half of the interval on which the function still changes sign; repeating this step gives nested intervals [a_n,b_n] with f(a_n)f(b_n) < 0 that shrink to a point x0, and by continuity f(a_n)f(b_n) → f(x0)^2, so 0 ≤ f(x0)^2 ≤ 0; considering that the square of a real number is always ≥ 0, f(x0) = 0
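
The halving of the interval used in this argument is also a practical way to approximate a zero. The following Python sketch is one possible reading of the procedure, assuming the midpoint is chosen at every step; the function name bisect_zero and the tolerance are illustrative assumptions.

```python
def bisect_zero(f, a, b, tol=1e-12):
    """Approximate a zero of a continuous f on [a, b] with f(a)f(b) < 0."""
    if f(a) * f(b) >= 0:
        raise ValueError("f must take values of opposite sign at a and b")
    while b - a > tol:
        c = (a + b) / 2
        if f(a) * f(c) <= 0:
            b = c  # the sign change (or the zero) is in [a, c]
        else:
            a = c  # the sign change is in [c, b]
    return (a + b) / 2


root = bisect_zero(lambda x: x * x - 2, 0.0, 2.0)
print(root, root * root)
```

Here the function x^2-2 passes from the negative value -2 at x = 0 to the positive value 2 at x = 2, and the zero found is an approximation of √2.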

A continuous function on an interval cannot take two values without taking all the intermediate values; f: I → ℝ, f(x1) = y1 < y2 = f(x2), y1 < y < y2, g(x) := f(x)-y = 0, f(x) = y

Continuous functions transform intervals into intervals, and in particular they transform bounded and closed intervals into intervals of the same type

sin(x), ℝ → [-1,1]; sin(π/2) = 1, sin(-π/2) = -1, sin(ℝ) = [-1,1]

tan(x), (-π/2,π/2) → (-∞,+∞) = ℝ

x ↦ ax, a > 1; ℝ → (0,+∞) = ℝ+

x ↦ x^(1/n), the n-th root of x; [0,+∞) → [0,+∞)

The inverse function is not the reciprocal function

The reciprocal function of f(x) is 1/f(x)

An injective function (also known as injection, or one-to-one function) is a function f that maps distinct elements to distinct elements; that is, f(x1) = f(x2) implies x1 = x2; every element of the function's codomain is the image of at most one element of its domain; on an interval, continuous injective functions are strictly monotone, increasing or decreasing, and every strictly monotone function is injective; a function is injective when lines parallel to the x-axis intersect the curve of the function in at most one point; sin(x) is not an injective function because it is possible to draw lines parallel to the x-axis that intersect the curve of sin(x) in several points

If a function is injective it is possible to obtain the inverse function; x ∈ I → f → f(x) = y → f^(-1) → x

f(x) = y = mx+q, m ≠ 0, y-q = mx, x = (y-q)/m = (y-q)(1/m) = f^(-1)(y)

The function y = x^2 is represented by a parabola and is not injective on ℝ, since the lines parallel to the x-axis above the vertex cut the curve of the function in two points, and so it is impossible to obtain the inverse function; considering y = x^2 defined in [0,+∞), that is the branch of the parabola in the first quadrant, we obtain an injective function which has an inverse function; for x ≥ 0, y = x^2 ⇔ x = √y; f: x ↦ x^2, x ≥ 0, f^(-1): x ↦ √x, x ≥ 0; the graph of √x is a half parabola that is symmetrical to the half parabola y = x^2, x ∈ [0,+∞), with respect to the bisector of the first quadrant

The graph of the inverse function is symmetrical to the graph of the starting function with respect to the bisector of the first quadrant

If f is a continuous and strictly monotone function that transforms the interval I into the interval J, the inverse function is continuous and strictly monotone on J

f: I → J, f^(-1): J → I

A surjective function (also known as surjection, or onto function) is a function f that maps an element x to every element y; for every y, there is an x such that f(x) = y; every element of the function's codomain is the image of at least one element of its domain; it is not required that x be unique, the function f may map one or more elements of X to the same element of Y

A function is surjective if the codomain is the image of the function

If a function is continuous and monotone, the inverse function is also continuous and monotone

Exponential function: x ↦ a^x, x ∈ ℝ, ℝ → ℝ+ = (0,+∞)

Logarithmic function: x ↦ log_a(x), x ∈ ℝ+ = (0,+∞), ℝ+ = (0,+∞) → ℝ; the logarithmic function is the inverse function of the exponential function

x → f = exponential function → a^x → f^(-1) = logarithmic function → log_a(a^x) = x; in the other order, a^(log_a(x)) = x for x > 0

The logarithm is the inverse function to exponentiation; the logarithm of a number x is the exponent to which another fixed number, the base b, must be raised, to produce that number x; log_b(x) = y if b^y = x, x > 0, b > 0, b ≠ 1

The logarithmic function is defined on ℝ+ and the image is ℝ

The exponential function is defined on ℝ and the image is ℝ+

The graph of the logarithmic function is symmetrical to the graph of the exponential function with respect to the bisector of the first and third quadrant

b^y = x, the base b logarithm of x is log_b(x) = y

a^x = y, the base a logarithm of y is log_a(y) = x; 10^2 = 100, log_10(100) = 2; 10^3 = 1000, log_10(1000) = 3; ln(e^2) = 2; ln(e^3) = 3

b^y = x, y = log_b(x), the antilogarithm or inverse logarithm is calculated by raising the base b to the logarithm y, x = log_b^(-1)(y) = b^y = b^(log_b(x)), log_b^(-1)(y) = b^y = b^(log_b(b^y))

a^x = y, x = log_a(y), the antilogarithm or inverse logarithm is calculated by raising the base a to the logarithm x, y = log_a^(-1)(x) = a^x = a^(log_a(y)), log_a^(-1)(x) = a^x = a^(log_a(a^x)); 10^2 = 10^(log_10(10^2)); 10^3 = 10^(log_10(10^3)); e^2 = e^(ln(e^2)); e^3 = e^(ln(e^3))

Logarithm product rule: log_b(x⋅y) = log_b(x)+log_b(y)

Logarithm product rule: log_a(x⋅y) = log_a(x)+log_a(y); 5 = log_10(10^5) = log_10(10^2⋅10^3) = log_10(10^2)+log_10(10^3) = 2+3 = 5; 5 = ln(e^5) = ln(e^2⋅e^3) = ln(e^2)+ln(e^3) = 2+3 = 5

Logarithm quotient rule: log_b(x/y) = log_b(x)-log_b(y)

Logarithm quotient rule: log_a(x/y) = log_a(x)-log_a(y); 2 = log_10(10^2) = log_10(10^5/10^3) = log_10(10^5)-log_10(10^3) = 5-3 = 2; 2 = ln(e^2) = ln(e^5/e^3) = ln(e^5)-ln(e^3) = 5-3 = 2

Logarithm power rule: log_b(x^y) = y⋅log_b(x)

Logarithm power rule: log_a(x^y) = y⋅log_a(x); 2 = log_10(10^2) = 2⋅log_10(10) = 2⋅1 = 2; 2 = ln(e^2) = 2⋅ln(e) = 2⋅1 = 2

Logarithm base switch rule: log_b(c) = 1/log_c(b)

Logarithm base switch rule: a^x = y, log_a(y) = x, log_a(y) = 1/log_y(a); 2 = log_10(10^2) = 1/log_(10^2)(10) = 1/(1/2) = 1⋅2 = 2; 2 = ln(e^2) = 1/log_(e^2)(e) = 1/(1/2) = 1⋅2 = 2

Logarithm change of base rule: log_b(x) = log_c(x)/log_c(b)

Logarithm change of base rule: a^x = y, log_a(y) = x, log_a(y) = log_b(y)/log_b(a); 2 = log_10(10^2) = ln(10^2)/ln(10) = 2⋅ln(10)/ln(10) = 2
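
The logarithm rules listed above can be spot-checked with floating-point numbers. A minimal Python sketch; the helper log_base is an illustrative name and is itself built on the change of base rule through natural logarithms.

```python
import math


def log_base(base, x):
    """Base-`base` logarithm of x, via natural logarithms."""
    return math.log(x) / math.log(base)


# Product, quotient and power rules with base 10.
print(log_base(10, 10**2 * 10**3), log_base(10, 10**2) + log_base(10, 10**3))
print(log_base(10, 10**5 / 10**3), log_base(10, 10**5) - log_base(10, 10**3))
print(log_base(10, 10**2), 2 * log_base(10, 10))

# Base switch rule: log_10(100) = 1 / log_100(10).
print(log_base(10, 100), 1 / log_base(100, 10))
```

The printed pairs agree only up to rounding, so comparisons between them should use a tolerance rather than exact equality.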

Circular functions, or trigonometric functions, are not injective because they are periodic, so they cannot be reversed globally, but they can be reversed locally

It is incorrect to say that the square root function is the inverse of the squaring function, because the squaring function is not injective; the square root function is the inverse of the squaring function but restricted to the set of non-negative numbers

sin(x) [-π/2,π/2] → [-1,1]; the function is continuous and strictly increasing on this interval and therefore it can be inverted and its inverse is continuous and strictly increasing; arcsin(x), [-1,1] → [-π/2,π/2]

cos(x) [0,π] → [-1,1]; the function is continuous and strictly decreasing on this interval and therefore it can be inverted and its inverse is continuous and strictly decreasing; arccos(x), [-1,1] → [0,π]

tan(x), (-π/2,π/2) → ℝ; the function is continuous and strictly increasing on this open interval and therefore it can be inverted and its inverse is continuous and strictly increasing; arctan(x), ℝ → (-π/2,π/2); the oscillation, that is the difference between the supremum and the infimum of the image, or the variation of the function, is π


17 - INTRODUCTION TO THE CONCEPT OF VECTOR SPACE

In physics, force, velocity, and acceleration are vectors; F = m⋅a, the force F and the acceleration a are vectors; v = a⋅t, the velocity v and the acceleration a are vectors; Fg = m⋅g, the force of gravity Fg and the acceleration of gravity g are vectors directed towards the center of the Earth

A Euclidean vector or simply a vector, sometimes called a geometric vector or spatial vector, is a geometric object that has magnitude or length and direction; vectors can be added to other vectors according to vector algebra; a Euclidean vector is frequently represented by a ray, a directed line segment, or graphically as an arrow connecting an initial point A with a terminal point B; a vector is what is needed to carry the point A to the point B, and the Latin word vector means carrier; it was first used by 18th century astronomers investigating planetary revolution around the Sun; the magnitude of the vector is the distance between the two points, and the direction refers to the direction of displacement from A to B

Vectors with different direction can be added using the parallelogram rule; the sum of vectors with equal direction is a simple addition of the lengths of the vectors

Two vectors that have the same length but opposite direction are called opposite and their sum is the null vector

Commutative property of vectors: v+w = w+v

Associative property of vectors: (v+w)+z = v+(w+z)

A vector can be multiplied by a real number, and the length of the resulting vector is the length of the vector multiplied by the absolute value of the number; if the real number is negative, the resulting vector has opposite direction; if the real number is zero, the resulting vector is null

First distributive property of vectors: a(v+w) = av+aw

Multiplying a vector by the number 1 does not change the vector, 1⋅v = v

Second distributive property of vectors: (a+b)⋅v = a⋅v+b⋅v

(a⋅b)⋅v = a⋅(b⋅v) = b⋅(a⋅v)

A tuple is a finite ordered list, or sequence, of elements; an n-tuple is a sequence, or ordered list, of n elements, where n is a non-negative integer; there is only one 0-tuple, referred to as the empty tuple; an n-tuple is defined inductively using the construction of an ordered pair; tuples are pairs of numbers, triples of numbers, quadruples of numbers, and so on

ℝ2 is the set of all pairs (a,b) of real numbers

Sum of pairs: (a,b)+(c,d) = (a+c,b+d)

Product of a number by a pair: m(a,b) = (ma,mb)

Commutative property in ℝ2 for the sum: (a,b)+(c,d) = (c,d)+(a,b)

Associative property in ℝ2 for the sum: ((a,b)+(c,d))+(e,f) = (a,b)+((c,d)+(e,f))

Existence of 0 in ℝ2 for the sum: (a,b)+(0,0) = (a,b)

Existence of the opposite in ℝ2 for the sum: (a,b)+(-a,-b) = (0,0)

First distributive property in ℝ2 for the product: m((a,b)+(c,d)) = m(a,b)+m(c,d)

Second distributive property in ℝ2 for the product: (m+n)(a,b) = m(a,b)+n(a,b)

Property of the number 1 in ℝ2 for the product: 1(a,b) = (a,b)

Property in ℝ2 for the product: (mn)(a,b) = m(n(a,b))

Pairs have the same properties as vectors

Vectors and tuples from ℝ2 to ℝn have sum, product by a number, and these eight properties
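
The eight properties can be checked on concrete pairs. A minimal Python sketch, where pairs are represented as tuples and the names add, scale and check_eight_properties are illustrative assumptions; integer entries are used so that every comparison is exact.

```python
def add(v, w):
    """Sum of pairs: (a,b)+(c,d) = (a+c,b+d)."""
    return (v[0] + w[0], v[1] + w[1])


def scale(m, v):
    """Product of a number by a pair: m(a,b) = (ma,mb)."""
    return (m * v[0], m * v[1])


def check_eight_properties(u, v, w, m, n):
    zero = (0, 0)
    return all([
        add(u, v) == add(v, u),                                # commutative
        add(add(u, v), w) == add(u, add(v, w)),                # associative
        add(u, zero) == u,                                     # existence of 0
        add(u, scale(-1, u)) == zero,                          # opposite
        scale(m, add(u, v)) == add(scale(m, u), scale(m, v)),  # first distributive
        scale(m + n, u) == add(scale(m, u), scale(n, u)),      # second distributive
        scale(1, u) == u,                                      # property of 1
        scale(m * n, u) == scale(m, scale(n, u)),              # (mn)u = m(nu)
    ])


print(check_eight_properties((1, 2), (3, -4), (5, 6), 2, 3))
```

A check on sample values does not prove the properties, which follow from the corresponding properties of real numbers applied to each component.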


18 - VECTOR SPACES - LINEAR DEPENDENCE AND INDEPENDENCE

A vector space, also called a linear space, is a set of objects called vectors, which may be added together and multiplied, scaled, by numbers, called scalars; scalars are often taken to be real numbers, but there are also vector spaces with scalar multiplication by complex numbers, rational numbers, or generally any field; the operations of vector addition and scalar multiplication must satisfy certain requirements, called vector axioms; to specify that the scalars are real or complex numbers, the terms real vector space and complex vector space are often used

Axiom 1 of vector spaces, concerning the sum: v+w = w+v, commutative property

Axiom 2 of vector spaces, concerning the sum: (v+w)+z = v+(w+z), associative property

Axiom 3 of vector spaces, concerning the sum: v+0_V = v, where 0_V is the null vector, existence of the null element or zero

Axiom 4 of vector spaces, concerning the sum: v+(-v) = 0_V, existence of the opposite

Axiom 5 of vector spaces, concerning the product: a(v+w) = av+aw, first distributive property

Axiom 6 of vector spaces, concerning the product: (a+b)v = av+bv, second distributive property

Axiom 7 of vector spaces, concerning the product: 1v = v, 1 is the neutral element of the product

Axiom 8 of vector spaces, concerning the product: (ab)v = a(bv) = b(av)

ℝ, ℝ2, ℝn, are vector spaces

Vectors of two-dimensional space form a vector space

{(x,x) ∈ ℝ2}; (1,1)+(2,2) = (3,3); α(x,x) = (αx,αx); (0,0)+(x,x) = (x,x); -(x,x) = (-x,-x)

v+(-v) = 0_V, where 0_V is the null vector; v-v = 0_V

(x,y)+(-(x,y)) = (x,y)-(x,y) = (0,0)

0⋅v = 0_V; a⋅0_V = 0_V

Product cancellation law: a⋅v = 0_V ⇒ a = 0 or v = 0_V

-(av) = (-a)v = a(-v)

-(4(2,3)) = (-4)(2,3) = (-8,-12); -(4(2,3)) = 4(-2,-3) = (-8,-12)

W is a vector subspace of V if W ⊆ V and W is itself a vector space with the sum and the product of V

{(x,x) ∈ ℝ2} is a subset of ℝ2, and it is a vector subspace of ℝ2

{(x,y,x+y) ∈ ℝ3} is a subset of ℝ3 and a vector subspace of ℝ3; for example (3,2,5) belongs to it because 3+2 = 5

If in a set the properties of vector space are not valid, then the set is not a vector space

W = {(x,x+1)} is a subset of ℝ2 but it is not a vector subspace of ℝ2, because the results of the sum and of the product by a number can fall outside W; (3,3+1) = (3,4), (5,5+1) = (5,6), (3,4)+(5,6) = (8,10) ∉ W because 10 ≠ 8+1, 2(3,4) = (6,8) ∉ W because 8 ≠ 6+1; W = {(x,x+1)} is not a vector space
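
The difference between the two subsets can be seen by testing whether sums and multiples stay inside the set. A minimal Python sketch; the membership tests in_diagonal and in_shifted are illustrative names for {(x,x)} and {(x,x+1)}.

```python
def add(v, w):
    return (v[0] + w[0], v[1] + w[1])


def scale(m, v):
    return (m * v[0], m * v[1])


def in_diagonal(v):
    """Membership in {(x,x)}."""
    return v[1] == v[0]


def in_shifted(v):
    """Membership in {(x,x+1)}."""
    return v[1] == v[0] + 1


# {(x,x)}: sums and multiples stay inside the set.
print(in_diagonal(add((1, 1), (2, 2))), in_diagonal(scale(-7, (2, 2))))

# {(x,x+1)}: (3,4) and (5,6) belong to the set, their sum and the double do not.
print(in_shifted((3, 4)), in_shifted((5, 6)))
print(in_shifted(add((3, 4), (5, 6))), in_shifted(scale(2, (3, 4))))
```

A single failure of closure is enough to exclude a subset; note also that {(x,x+1)} does not contain the null vector (0,0).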

Linear combination: a1v1+a2v2+...+anvn

3(2,-2)+(-4)(0,1)+7(1,7) is a linear combination in ℝ2; 3(2,-2)+(-4)(0,1)+7(1,7) = (6,-6)+(0,-4)+(7,49) = (13,39)

food A: 30% fats, 20% carbohydrates, 40% protein; food B: 10% fats, 30% carbohydrates, 5% protein; food C: 12% fats, 2% carbohydrates, 9% protein; 100 grams of food A (30,20,40); 100 grams of food B (10,30,5); 100 grams of food C (12,2,9); 40 grams of A with 50 grams of B with 40 grams of C = (40/100)(30,20,40)+(50/100)(10,30,5)+(40/100)(12,2,9) = (12,8,16)+(5,15,2.5)+(4.8,0.8,3.6) = (21.8,23.8,22.1); it is a linear combination of 3 elements of vector space of triples
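
The two linear combinations computed above can be reproduced with a short function. A minimal Python sketch; the name linear_combination and the representation of vectors as tuples are illustrative assumptions.

```python
def linear_combination(coefficients, vectors):
    """Return a1*v1 + a2*v2 + ... + an*vn, component by component."""
    dimension = len(vectors[0])
    return tuple(
        sum(c * v[i] for c, v in zip(coefficients, vectors))
        for i in range(dimension)
    )


# 3(2,-2) + (-4)(0,1) + 7(1,7) in R^2
print(linear_combination([3, -4, 7], [(2, -2), (0, 1), (1, 7)]))

# 40 g of food A, 50 g of food B, 40 g of food C (vectors are per 100 g)
foods = [(30, 20, 40), (10, 30, 5), (12, 2, 9)]
print(linear_combination([40 / 100, 50 / 100, 40 / 100], foods))
```

The second result is computed with floating-point coefficients, so its components may differ from (21.8, 23.8, 22.1) in the last decimal digits.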

Calculate the linear combination of these 3 vectors of ℝ3: 2(1,1,1)+1(2,2,2)-(4,4,4) = (0,0,0); these are linearly dependent vectors because it is possible to obtain a null linear combination using non-zero coefficients

a(1,0)+b(1,1) = (0,0); a+b = 0, b = 0, a = 0; these are linearly independent vectors because it is not possible to obtain a null linear combination using non-null coefficients

Linearly independent vectors: a1v1+a2v2+...+anvn = 0 ⇒ a1 = a2 = ... = an = 0

If it is possible to linearly combine vectors with at least one non-null coefficient and get the null vector, then the vectors are linearly dependent

v1, ..., vm are linearly independent vectors if and only if: v1 ≠ 0v, v2 is not a multiple of v1, v3 is not a linear combination of v1 and v2, vm is not a linear combination of v1, ..., vm-1

Vectors are linearly independent if they cannot be obtained from a linear combination of the preceding vectors

The fundamental versors of ℝ3 e1 = (1,0,0), e2 = (0,1,0), e3 = (0,0,1) are linearly independent in ℝ3; (1,0,0) ≠ (0,0,0), (0,1,0) ≠ a(1,0,0), (0,0,1) ≠ a(1,0,0)+b(0,1,0) because it is impossible that 0 = a and 0 = b and 1 = 0

The fundamental versors of ℝn e1 = (1,0,0,...,0), e2 = (0,1,0,...,0), en = (0,0,0,...,1), are linearly independent in ℝn

In a linear combination of linearly independent vectors the coefficients are unique; if v1,...,vn are linearly independent and v = a1v1+...+anvn = b1v1+...+bnvn, then a1 = b1,...,an = bn

A vector that is a linear combination of linearly independent vectors can be written as such a combination in only one way, because the coefficients are unique

(1,0,0),(0,1,0),(0,0,1); (1,2,1) = a(1,0,0)+b(0,1,0)+c(0,0,1), a = 1, b = 2, c = 1

If the vectors are linearly dependent their linear combination can be written in different ways because the coefficients are not unique

The coefficients of a linear combination are also called components of the vector


19 - GENERATORS, BASES AND DIMENSION OF A VECTOR SPACE

{v1,v2,...,vn} generate V if each v in V can be written as a linear combination v = a1v1+a2v2+...+anvn

The fundamental versors e1 = (1,0,0,...,0), e2 = (0,1,0,...,0), en = (0,0,0,...,1), generate ℝn; fundamental versors are generators of ℝn, that is, each element of ℝn can be written as a linear combination of fundamental versors

n = 3, (1,0,0),(0,1,0),(0,0,1); ℝ3, (a,b,c) = a(1,0,0)+b(0,1,0)+c(0,0,1); fundamental versors can generate any vector

(1,1),(1,0) are not fundamental versors of ℝ2 but are generators of ℝ2, that is, with their linear combinations a(1,1)+b(1,0) it is possible to obtain any vector of ℝ2; a(1,1)+b(1,0) = (α,β), a+b = α, a = β, b = α-a = α-β; a(1,1)+b(1,0) = (3,5), a+b = 3, a = 5 = β, b = 3-a = 3-5 = -2 = α-β; this is a system of generators for the vector space ℝ2
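The formulas a = β, b = α-β derived above can be checked with a minimal Python sketch; the function name `coefficients` is hypothetical.

```python
def coefficients(alpha, beta):
    """Coefficients (a, b) such that a(1,1) + b(1,0) = (alpha, beta)."""
    a = beta          # second component: a = beta
    b = alpha - beta  # first component: a + b = alpha
    return a, b

a, b = coefficients(3, 5)
print(a, b)                            # 5 -2
print((a * 1 + b * 1, a * 1 + b * 0))  # (3, 5)
```

Since the formulas give a solution for every pair (α,β), the two vectors generate the whole of ℝ2.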

v1 = (1,1), v2 = (1,0), v3 = (0,1), these 3 vectors are generators of ℝ2; a(1,1)+b(1,0)+c(0,1) = (α,β), a+b = α, a+c = β, there are infinitely many solutions, so the coefficients are not unique; these generators are not a base of the vector space

A base of a vector space is a set of linearly independent generators; {v1,v2,...,vn} base of V ⇔ v1,v2,...,vn are linearly independent generators of V

When a vector is a linear combination of linearly independent vectors, the coefficients are unique; in v = a1v1+...+anvn the coefficients a1,...,an are unique because v1,...,vn are linearly independent

A base is a set of generators such that each vector v of the vector space is written as a linear combination of these vectors in a unique way

The fundamental versors of ℝn are not only generators but are also a base of ℝn because they are linearly independent; each element of ℝn can be written uniquely as a linear combination of these vectors; e1 = (1,0,...,0), e2 = (0,1,...,0), en = (0,0,...,1), versors of ℝn are a base of ℝn

In vector spaces there are other bases besides the fundamental versors

The fundamental versors of ℝ2 are (1,0) and (0,1), and they are generators and form a base of ℝ2; the vectors (1,1) and (1,0) are also generators of ℝ2 because they can generate any other vector of ℝ2, and they form a base because they are linearly independent, since (1,1) is not zero and (1,0) is not a multiple of (1,1); any linear combination of the vectors (1,1) and (1,0) has unique coefficients

v1 = (1,1), v2 = (1,0), v3 = (0,1), are generators of ℝ2, but they are not a base of the vector space because they are non linearly independent generators; if they were linearly independent, each vector of vector space could be written in a unique way as a linear combination of these vectors

Vectors are generators of a vector space when they can generate all the vectors of the vector space; the generating vectors can form a base when the linear combination of these vectors has unique coefficients

In a vector v = a1v1+a2v2+...+anvn the coefficients, a1, a2, an written in order, are the components of v with respect to the base v1, v2, vn

A base must be in order; the vectors of a base have a precise order; changing the order of the vectors in the linear combination changes the base because the components change

Steinitz theorem: x1,x2,...,xn are generators, and y1,y2,...,ym are linearly independent, then m ≤ n; the number m of linearly independent vectors can never exceed the number n of generators

Corollary of Steinitz's theorem: all the bases of a vector space have the same number of elements; base1 = v1,...,vm, base2 = w1,...,wr, each base is made of linearly independent generators, so by Steinitz's theorem m ≤ r (v1,...,vm are linearly independent and w1,...,wr are generators) and r ≤ m (exchanging the roles), therefore m = r, that is, all the bases of a vector space have the same number of elements

Dimension of a vector space: if v1,v2,...,vn is a base of V, all bases of V have n elements, then n = dim(V); in a vector space all the bases have the same number of vectors, and this number is the dimension of the vector space; the dimension of a vector space is the number of elements of any base of the vector space

dim(ℝ2) = 2, all bases of ℝ2 are of 2 elements

dim(ℝ3) = 3, all bases of ℝ3 are of 3 elements

dim(ℝ4) = 4, all bases of ℝ4 are of 4 elements

dim(ℝn) = n, all bases of ℝn are of n elements

3 vectors cannot form a base of ℝ2, because a base of ℝ2 is of 2 elements

Consequences of Steinitz's theorem: dim(V) = n ⇒ n linearly independent vectors form a base and there is no need to verify that they are generators, dim(V) = n ⇒ n generators form a base and there is no need to verify that they are linearly independent

Dimension of subspaces: if W is a subspace of V then dim(W) ≤ dim(V); if dim(V) = n, then 0 ≤ dim(W) ≤ n; if dim(W) = 0, then W = {0v}; if dim(W) = n, then W = V

In an environment V there is a subspace W = L(v1,...,vm); the subspace W has the vectors v1,...,vm as generators; vectors of W are linear combinations of vectors v1,...,vm; it is important to find a method for obtaining a base for this vector subspace; a system of generators is not a base when the vectors are not linearly independent because they are too many, and then the removal method is used

Removal method: to find the base, from a generator system v1,v2,...,vm, the null vectors and any vector that is a linear combination of the previous ones are removed

v1 = (1,1), v2 = (1,0), v3 = (0,1), are generators of ℝ2, ℝ2 = L(v1,v2,v3); a base is (v1,v2), but also (v3,v1) is a base

Completion method of linearly independent vectors: if w1,w2,...,wr is a system of linearly independent vectors, it is possible to add a system of generators of V v1,v2,...,vm, obtaining the system w1,w2,...,wr,v1,v2,...,vm, and from this it is possible to obtain a base with the removal method

Find a base from the system of generators of ℝ3 (1,1,1),(1,0,0),(0,1,0),(0,0,1); (1,1,1) is not null, so it is part of the base; (1,0,0) is not a multiple of (1,1,1), so it is part of the base; (0,1,0) = a(1,1,1)+b(1,0,0), 0 = a+b, 1 = a, 0 = a, which is impossible, so the vector (0,1,0) is not a linear combination of the previous ones and it is part of the base; by the corollary of Steinitz's theorem the number of elements of a base is equal to the dimension of the vector space, which is 3, so (0,0,1) is removed and the base of ℝ3 is formed by the 3 vectors (1,1,1),(1,0,0),(0,1,0)
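The removal method can be sketched in Python as follows. To decide whether a vector is a linear combination of the previous ones, this sketch uses a rank test (the rank does not grow when the vector is added), which is an assumption of this example and not the hand computation shown above; the names `rank` and `removal_method` are hypothetical.

```python
from fractions import Fraction

def rank(rows):
    """Rank computed by Gaussian elimination with exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0  # number of pivots found so far
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            m[i] = [x - factor * y for x, y in zip(m[i], m[r])]
        r += 1
    return r

def removal_method(generators):
    """Keep a vector only if it is not a linear combination of the kept ones."""
    base = []
    for v in generators:
        if rank(base + [v]) > len(base):  # v adds a new direction
            base.append(v)
    return base

print(removal_method([(1, 1, 1), (1, 0, 0), (0, 1, 0), (0, 0, 1)]))
# [(1, 1, 1), (1, 0, 0), (0, 1, 0)]
print(removal_method([(0, 1), (1, 1), (1, 0)]))
# [(0, 1), (1, 1)]
```

The result depends on the order of the generators, which is why the same system can give different bases.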

Base: linearly independent generators; a base of a vector space is a set of linearly independent generators

Dimension: number of elements of each base; the dimension of a vector space is the number of vectors of the base


20 - MATRICES - PART 1 - RANK AND REDUCTION

food A = 30% fats, 10% carbohydrates, 10% proteins; food B = 20% fats, 20% carbohydrates, 5% proteins; food C = 15% fats, 15% carbohydrates, 10% proteins; 100 grams of food A = (30,10,10); 100 grams of food B = (20,20,5); 100 grams of food C = (15,15,10); matrix = [(30,10,10),(20,20,5),(15,15,10)]; a matrix is a table, in this case each row is a food and the columns are fats, carbohydrates and proteins

Example of matrix: [(2,1,0),(3,4,-2)]

Example of matrix: [(0,3,4,1),(0,1,1,1),(1,-1,0,0)]

Matrix with m rows and n columns: A = [(a1,1,...,a1,n),(...,...,...),(am,1,...,am,n)]

ai,j is the element a of the matrix in row i and column j

Row space: L(R1,...,Rm) = vector space in Rn generated by R1 = (a1,1,...,a1,n), ..., Rm = (am,1,...,am,n); each row is a tuple and an element of Rn; the rows are vectors of Rn and generate a vector subspace of Rn; linear combinations of the row vectors of the matrix generate the space of rows

Considering the matrix [(1,1,1),(2,1,0)], the row space is the vector subspace of ℝ3 generated by the 2 vectors (1,1,1) and (2,1,0)

Column space: L(C1,...,Cn) = vector space in Rm generated by C1 = (a1,1,...,am,1), ..., Cn = (a1,n,...,am,n); each column is a tuple and an element of Rm; the columns are vectors of Rm and generate a vector subspace of Rm; linear combinations of the column vectors of the matrix generate the space of columns

Considering the matrix [(1,1,1),(2,1,0)], the column space is the vector subspace of ℝ2 generated by the 3 vectors (1,2), (1,1), (1,0)

The row space and the column space are completely different spaces, generated by different vectors, which can also be in different environments

A matrix is square if the number of rows is equal to the number of columns, so the row space and the column space are contained in the same environment

The matrix [(1,1,1),(2,2,2),(0,0,3)] is square; the space of rows is the vector subspace of ℝ3 generated by the 3 vectors (1,1,1), (2,2,2), (0,0,3); the space of columns is the vector subspace of ℝ3 generated by the 3 vectors (1,2,0), (1,2,0), (1,2,3); only in the special case of a square matrix are the row space and the column space contained in the same environment

Rank of a matrix: dim(L(R1,...,Rm)) = dim(L(C1,...,Cn)) = rank of A = ρ(A)

Calculate the rank of the matrix A = [(1,2,1),(0,1,3)]; row space = L((1,2,1),(0,1,3)) ⊆ R3, the vector (1,2,1) is not null and the vector (0,1,3) is not a multiple of (1,2,1), so the dimension of the vector row space is 2 = ρ(A); column space = L((1,0),(2,1),(1,3)) ⊆ R2, the vector (1,0) is not null and the vector (2,1) is not a multiple of (1,0), dimension of column space = 2 = ρ(A)

Considering the matrix A = [(1,2,3,-1),(4,3,2,1),(5,5,-1,7)], we need a technique for finding the dimension of the row space, the dimension of the column space, and the rank of a matrix, even when the matrix has many rows and many columns

Calculate the dimension of the row space, the dimension of the column space and the rank of the matrix A = [(1,2,1),(0,1,3),(0,5,0)]; the first row is not null, the second row is not a multiple of the first row; we need to understand if the third row is a linear combination of the first and the second row, R3 = aR1+bR2, that is (0,5,0) = (a,2a+b,a+3b), which gives a = 0, b = 5 and 3b = 0, so it is false; this matrix has 3 linearly independent rows, which are 3 linearly independent vectors that generate the row space, therefore they are a base of the row space, so ρ(A) = 3; this is an example of a matrix reduced by rows, where the dimension of the row space is simply the number of non-zero rows, and from this we immediately obtain the rank of the matrix, which is equal to the dimension of the row space

A matrix is reduced by rows when each non-zero row contains a non-zero element, called special element, with only zeros under it; if a matrix is reduced by rows, the rank of the matrix is the number of non-zero rows

[(1,0,0),(2,4,0),(4,3,-1)] this is a matrix reduced by columns, so the rank of the matrix is the number of non-zero columns

A matrix is reduced by columns when each non-zero column contains a non-zero element with only zeros to its right; if a matrix is reduced by columns, the rank of the matrix is the number of non-zero columns

To find the rank of a non-reduced matrix it is possible to calculate the rank of a reduced matrix that has the same rank as the starting matrix

To reduce a matrix, it is possible to perform elementary transformations on the rows that do not change the row space and therefore the rank: multiply a row by a non-zero number, Ri → aRi, a ≠ 0; exchange two rows, Ri ↔ Rj, i ≠ j; add to a row a multiple of another row, Ri → Ri+aRj, a ≠ 0, i ≠ j; to calculate the rank of a matrix we perform these elementary transformations, obtaining a new matrix reduced by rows, and the rank of this new matrix is equal to the rank of the starting matrix

Calculate the rank of the matrix A = [(1,1,1),(2,1,1),(3,1,-1)]; A → A1, ρ(A) = ρ(A1); [(1,1,1),(2,1,1),(3,1,-1)], R2 → R2-R1, [(1,1,1),(1,0,0),(3,1,-1)], R3 → R3+R1, [(1,1,1),(1,0,0),(4,2,0)], R3 → R3-4R2, [(1,1,1),(1,0,0),(0,2,0)], the rank of this matrix is 3, ρ(A) = 3; another method is [(1,1,1),(2,1,1),(3,1,-1)], R2 → R2-R1, [(1,1,1),(1,0,0),(3,1,-1)], R3 → R3+R1, [(1,1,1),(1,0,0),(4,2,0)], R2 ↔ R3, [(1,1,1),(4,2,0),(1,0,0)], the rank of this matrix is 3, ρ(A) = 3

The rank of a matrix is obtained by reducing the matrix by rows or by columns, and this makes it possible to calculate the dimension of the vector space generated by its rows or by its columns

The generators of a vector space can be written as rows of a matrix; calculating the rank of the matrix gives the dimension of the row space, and the row space is the vector space that has these vectors as generators

v1 = (a1,1,...,a1,n), ..., vm = (am,1,...,am,n) ⇒ A = [(a1,1,...,a1,n),(...,...,...),(am,1,...,am,n)]

In the vector space R4, calculate the dimension of the subspace W generated by the vectors (1,1,2,1),(2,1,0,3),(4,4,1,0); these 3 vectors become the rows of a matrix with 3 rows and 4 columns [(1,1,2,1),(2,1,0,3),(4,4,1,0)]; reducing this matrix by rows we obtain the dimension of the row space, that is the dimension of the vector subspace; [(1,1,2,1),(2,1,0,3),(4,4,1,0)], R3 → 2R3-R1, [(1,1,2,1),(2,1,0,3),(7,7,0,-1)], R3 → R3-7R2, [(1,1,2,1),(2,1,0,3),(-7,0,0,-22)], ρ(A) = 3, dim(W) = 3; the 3 rows are 3 linearly independent vectors, and are a base of the row space
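The rank computations above can be cross-checked with a small Python sketch that applies only the elementary row transformations Ri ↔ Rj and Ri → Ri+aRj. It produces a row echelon form, which is one particular matrix reduced by rows and not necessarily the same reduced matrix obtained by hand; the names `row_reduce` and `rank` are hypothetical.

```python
from fractions import Fraction

def row_reduce(rows):
    """Return a row echelon form obtained with elementary row transformations."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0  # index of the next row to be fixed
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue                      # no special element in this column
        m[r], m[pivot] = m[pivot], m[r]   # Ri <-> Rj
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            # Ri -> Ri - factor*Rr, which puts a zero under the special element
            m[i] = [x - factor * y for x, y in zip(m[i], m[r])]
        r += 1
    return m

def rank(rows):
    """Number of non-zero rows of the reduced matrix."""
    return sum(1 for row in row_reduce(rows) if any(x != 0 for x in row))

print(rank([(1, 1, 1), (2, 1, 1), (3, 1, -1)]))           # 3
print(rank([(1, 1, 2, 1), (2, 1, 0, 3), (4, 4, 1, 0)]))   # 3
print(rank([(1, 1, 1), (2, 2, 2), (0, 0, 3)]))            # 2
```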


21 - MATRICES - PART 2 - OPERATIONS

It is possible to add two matrices when they have the same number of rows and the same number of columns, and the procedure is to add the elements of the two matrices that have the same indices: [(a1,1,...,a1,n),(...,...,...),(am,1,...,am,n)]+[(b1,1,...,b1,n),(...,...,...),(bm,1,...,bm,n)] = [(a1,1+b1,1,...,a1,n+b1,n),(...,...,...),(am,1+bm,1,...,am,n+bm,n)]; the result of the addition is a matrix with the same number of rows and columns as the initial matrices; it is not possible to add two matrices that have different number of rows and columns

A = [(2,1,0),(3,0,2)], B = [(1,4,1),(2,1,-1)]; A+B = [(3,5,1),(5,1,1)]

It is possible to multiply a matrix by a real number: r⋅A = r[(a1,1,...,a1,n),(...,...,...),(am,1,...,am,n)] = [(r⋅a1,1,...,r⋅a1,n),(...,...,...),(r⋅am,1,...,r⋅am,n)]; the result of the product is a matrix with the same number of rows and columns as the initial matrix; it is possible to multiply any number by any matrix

a = -2, A = [(2,1,0),(3,0,2)]; -2A = [(-4,-2,0),(-6,0,-4)]

a = 0, A = [(2,1,0),(3,0,2)]; 0A = [(0,0,0),(0,0,0)]; multiplying a matrix by 0, the result is a null matrix
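The sum of matrices and the product of a matrix by a number can be sketched in Python with matrices stored as lists of rows; `mat_add` and `mat_scale` are hypothetical helper names.

```python
def mat_add(A, B):
    """Sum of two matrices with the same number of rows and columns."""
    if len(A) != len(B) or any(len(r) != len(s) for r, s in zip(A, B)):
        raise ValueError("matrices must have the same number of rows and columns")
    return [[a + b for a, b in zip(r, s)] for r, s in zip(A, B)]

def mat_scale(r, A):
    """Product of the number r by the matrix A."""
    return [[r * a for a in row] for row in A]

A = [[2, 1, 0], [3, 0, 2]]
B = [[1, 4, 1], [2, 1, -1]]
print(mat_add(A, B))     # [[3, 5, 1], [5, 1, 1]]
print(mat_scale(-2, A))  # [[-4, -2, 0], [-6, 0, -4]]
print(mat_scale(0, A))   # [[0, 0, 0], [0, 0, 0]]
```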

The sum of matrices is commutative: A+B = B+A

The sum of matrices is associative: A+(B+C) = (A+B)+C

The null matrix exists: A+0 = 0+A = A

The opposite matrix exists: A+(-A) = A-A = 0

Null matrix = 0 = [(0,...,0),(...,...,...),(0,...,0)]

A+(-A) = [(0,...,0),(...,...,...),(0,...,0)]

A = [(1,2,0),(-1,4,2)], -A = [(-1,-2,0),(1,-4,-2)]; A-A = [(0,0,0),(0,0,0)]

Distributive property of the product of a number for a sum of matrices: a(A+B) = aA+aB

Distributive property of the sum of two numbers for a matrix: (a+b)A = aA+bA

A matrix multiplied by 1 remains unchanged, the number 1 is the neutral element of the product: 1A = A

Associative property of the product of two numbers by a matrix: (ab)A = a(bA) = b(aA)

The set of matrices with m rows and n columns is indicated as ℝm,n; on this set the sum of matrices and the product of a matrix by a real number are defined, and with these operations it is a vector space; ℝ2,3 is the vector space of matrices with 2 rows and 3 columns; ℝ4,2 is the vector space of matrices with 4 rows and 2 columns; ℝn,n is the vector space of square matrices with n rows and n columns; the matrix is an extension of the concept of tuple

Product of matrices: A = [(a1,1,a1,2,a1,3),(a2,1,a2,2,a2,3),(a3,1,a3,2,a3,3)], B = [(b1,1,b1,2,b1,3),(b2,1,b2,2,b2,3),(b3,1,b3,2,b3,3)]; calculating the element c1,3 of the matrix C = AB, c1,3 = a1,1⋅b1,3+a1,2⋅b2,3+a1,3⋅b3,3

The product of a matrix A by a matrix B can be made if the number of elements of the rows of A is equal to the number of elements of the columns of B; the number of elements in a row equals the number of columns, and the number of elements in a column equals the number of rows; to make the product A⋅B, the number of columns of the matrix A must be equal to the number of rows of the matrix B; Am,p⋅Bp,n

A = [(2,1,0),(3,3,0)], B = [(4,4,2),(2,2,4)]; the product A⋅B cannot be done because A has 3 columns and B has 2 rows

A = [(2,1),(3,3)], B = [(4,4,2),(2,2,4)]; C = A⋅B, c1,1 = 2⋅4+1⋅2 = 8+2 = 10, c1,2 = 2⋅4+1⋅2 = 8+2 = 10, c1,3 = 2⋅2+1⋅4 = 4+4 = 8, c2,1 = 3⋅4+3⋅2 = 12+6 = 18, c2,2 = 3⋅4+3⋅2 = 12+6 = 18, c2,3 = 3⋅2+3⋅4 = 6+12 = 18; C = A⋅B = [(2,1),(3,3)][(4,4,2),(2,2,4)] = [(10,10,8),(18,18,18)]; the number of rows of C is 2 as the number of rows of A, the number of columns of C is 3 as the number of columns of B

The matrix resulting from the product of matrix A by matrix B has a number of rows equal to the number of rows in matrix A, and a number of columns equal to the number of columns in matrix B; Am,p⋅Bp,n = Cm,n

A = [(1,1,1),(3,3,0),(0,-1,4)], B = [(-1),(2),(1)], a matrix with 3 rows and 1 column; C = A⋅B, c1,1 = 1⋅(-1)+1⋅2+1⋅1 = -1+2+1 = 2, c2,1 = 3⋅(-1)+3⋅2+0⋅1 = -3+6+0 = 3, c3,1 = 0⋅(-1)+(-1)⋅2+4⋅1 = 0+(-2)+4 = 2; C = A⋅B = [(1,1,1),(3,3,0),(0,-1,4)][(-1),(2),(1)] = [(2),(3),(2)]; the matrix C, resulting from the product A⋅B, has 3 rows as the matrix A and 1 column as the matrix B

Matrix A can be multiplied by matrix B when the number of columns of A equals the number of rows of B, and the resulting matrix C has the number of rows of A and the number of columns of B
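The row-by-column rule can be sketched in Python with matrices stored as lists of rows; `mat_mul` is a hypothetical helper name, and the column matrix in the second example uses the entries -1, 2, 1 that appear in the computation above.

```python
def mat_mul(A, B):
    """Product of A (m rows, p columns) by B (p rows, n columns)."""
    p = len(B)
    if any(len(row) != p for row in A):
        raise ValueError("columns of A must equal rows of B")
    n = len(B[0])
    # c[i][j] = a[i][0]*b[0][j] + ... + a[i][p-1]*b[p-1][j]
    return [[sum(A[i][k] * B[k][j] for k in range(p)) for j in range(n)]
            for i in range(len(A))]

print(mat_mul([[2, 1], [3, 3]], [[4, 4, 2], [2, 2, 4]]))
# [[10, 10, 8], [18, 18, 18]]
print(mat_mul([[1, 1, 1], [3, 3, 0], [0, -1, 4]], [[-1], [2], [1]]))
# [[2], [3], [2]]
```

Calling `mat_mul([[2, 1, 0], [3, 3, 0]], [[4, 4, 2], [2, 2, 4]])` raises `ValueError`, because the first matrix has 3 columns and the second has 2 rows.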

In an identity matrix, the elements of the diagonal are equal to 1 and all the others are equal to 0; a1,1 = 1, a2,2 = 1, a3,3 = 1, ..., ap,p = 1; I = [(1,0,0),(0,1,0),(0,0,1)]; I = [(1,0,0,0),(0,1,0,0),(0,0,1,0),(0,0,0,1)]

Multiplying a matrix A by the identity matrix I, the result is the same matrix A; A⋅I = A; A = [(1,2,3),(4,4,2)], I = [(1,0,0),(0,1,0),(0,0,1)], A⋅I = [(1,2,3),(4,4,2)][(1,0,0),(0,1,0),(0,0,1)], c1,1 = 1⋅1+2⋅0+3⋅0 = 1+0+0 = 1, c1,2 = 1⋅0+2⋅1+3⋅0 = 0+2+0 = 2, c1,3 = 1⋅0+2⋅0+3⋅1 = 0+0+3 = 3, c2,1 = 4⋅1+4⋅0+2⋅0 = 4+0+0 = 4, c2,2 = 4⋅0+4⋅1+2⋅0 = 0+4+0 = 4, c2,3 = 4⋅0+4⋅0+2⋅1 = 0+0+2 = 2, A⋅I = [(1,2,3),(4,4,2)][(1,0,0),(0,1,0),(0,0,1)] = [(1,2,3),(4,4,2)] = A

food A = 30% fats, 10% carbohydrates, 10% proteins; food B = 20% fats, 20% carbohydrates, 5% proteins; food C = 15% fats, 15% carbohydrates, 10% proteins; 100 grams of food A = (30,10,10); 100 grams of food B = (20,20,5); 100 grams of food C = (15,15,10); arranging each food in a column, M = [(30,20,15),(10,20,15),(10,5,10)]; calculate fats, carbohydrates, proteins contained in a meal consisting of 120 grams of A, 50 grams of B, 150 grams of C; the quantities, in units of 100 grams, form a matrix with 3 rows and 1 column, [(30,20,15),(10,20,15),(10,5,10)][(1.2),(0.5),(1.5)], c1,1 = 30⋅1.2+20⋅0.5+15⋅1.5 = 36+10+22.5 = 68.5, c2,1 = 10⋅1.2+20⋅0.5+15⋅1.5 = 12+10+22.5 = 44.5, c3,1 = 10⋅1.2+5⋅0.5+10⋅1.5 = 12+2.5+15 = 29.5, [(30,20,15),(10,20,15),(10,5,10)][(1.2),(0.5),(1.5)] = [(68.5),(44.5),(29.5)], the meal contains 68.5 grams of fats, 44.5 grams of carbohydrates, 29.5 grams of proteins

Associative property of the product of matrices: A(BC) = (AB)C

Distributive property of the product with respect to the sum of matrices: A(B+C) = AB+AC

Properties of the identity matrix: AI = IA = A

Multiplying any matrix by the null matrix yields a null matrix; A = [(1,2),(3,4)], B = [(0,0),(0,0)], A⋅B = [(1,2),(3,4)][(0,0),(0,0)], c1,1 = 1⋅0+2⋅0 = 0+0 = 0, c1,2 = 1⋅0+2⋅0 = 0+0 = 0, c2,1 = 3⋅0+4⋅0 = 0+0 = 0, c2,2 = 3⋅0+4⋅0 = 0+0 = 0, A⋅B = [(1,2),(3,4)][(0,0),(0,0)] = C = [(0,0),(0,0)]

The product of two real numbers is zero only if at least one of the two numbers is zero, while the product of two matrices can be zero even if both matrices are different from zero; A ≠ 0, B ≠ 0, it is possible that AB = 0; the product cancellation law of real numbers does not apply to matrices; obviously the product of any matrix by a null matrix is a null matrix

Two square matrices can be multiplied with each other as long as they are squares of the same order, that is, they must have the same number of rows and columns

A = [(0,-1),(0,-1)], B = [(0,-1),(0,0)]; AB = [(0,-1),(0,-1)][(0,-1),(0,0)], c1,1 = 0⋅0+(-1)⋅0 = 0+0 = 0, c1,2 = 0⋅(-1)+(-1)⋅0 = 0+0 = 0, c2,1 = 0⋅0+(-1)⋅0 = 0+0 = 0, c2,2 = 0⋅(-1)+(-1)⋅0 = 0+0 = 0, AB = [(0,-1),(0,-1)][(0,-1),(0,0)] = C = [(0,0),(0,0)]; this example is the product of two square matrices and neither is a null matrix, but their product is a null matrix

To reduce a matrix containing a parameter you have to pay attention to the value of the parameter; in this example of row reduction of a matrix with a parameter, it is necessary to avoid that the element h is a special element, because it could be zero; A = [(1,1,h),(2,1,3)], R2 → R2-R1, [(1,1,h),(1,0,3-h)] = B, matrix B has the same rank as matrix A because for any value of h the two rows are not zero, ρ(A) = ρ(B) = 2

In this example, row reduction of a matrix containing a parameter requires more attention: A = [(1,1,h),(1,1,3)], R2 → R2-R1, [(1,1,h),(0,0,3-h)] = B; the rank of matrix B, and therefore the rank of matrix A, changes according to the value of the parameter h: ρ(A) = ρ(B) = 2 if h ≠ 3 and ρ(A) = ρ(B) = 1 if h = 3, because for h = 3 the second row of B is null

Two matrices Am,n and Bm,n with the same number of rows and columns can be added; a matrix Am,n can be multiplied by a number; two matrices can be multiplied with each other when the number of columns of the first matrix is equal to the number of rows of the second matrix, Am,n⋅Bn,p = Cm,p


22 - MATRICES - PART 3 - INVERSE MATRIX AND TRANSPOSE MATRIX

Inverse matrix: if the matrix A is invertible, A-1 is the inverse matrix of A, that is, A⋅A-1 = A-1⋅A = I, where I is the identity matrix

A = [(2,1),(3,-1)], X = [(a,b),(c,d)], A⋅X = I = [(1,0),(0,1)]; A⋅X = [(2,1),(3,-1)][(a,b),(c,d)] = [(2a+c,2b+d),(3a-c,3b-d)], so 2a+c = 1, 2b+d = 0, 3a-c = 0, 3b-d = 1; adding 2a+c = 1 and 3a-c = 0 the result is 5a = 1, that is a = 1/5 and c = 3a = 3/5; adding 2b+d = 0 and 3b-d = 1 the result is 5b = 1, that is b = 1/5 and d = -2b = -2/5; X = A-1 = [(1/5,1/5),(3/5,-2/5)]; we have proved that the matrix A is invertible, the inverse matrix exists and we have computed it; check: A⋅A-1 = [(2,1),(3,-1)][(1/5,1/5),(3/5,-2/5)], i1,1 = 2⋅1/5+1⋅3/5 = 2/5+3/5 = 5/5 = 1, i1,2 = 2⋅1/5+1⋅(-2/5) = 2/5-2/5 = 0, i2,1 = 3⋅1/5+(-1)⋅3/5 = 3/5-3/5 = 0, i2,2 = 3⋅1/5+(-1)⋅(-2/5) = 3/5+2/5 = 5/5 = 1, A⋅A-1 = I = [(1,0),(0,1)]

Calculate the inverse matrix of the matrix A = [(2,1,-1),(3,0,4),(0,0,2)]; A⋅X = I, X = [(a,b,c),(d,e,f),(g,h,i)], A⋅X = I = [(1,0,0),(0,1,0),(0,0,1)], A⋅X = [(2,1,-1),(3,0,4),(0,0,2)][(a,b,c),(d,e,f),(g,h,i)], x1,1 = 2a+d-g, x1,2 = 2b+e-h, x1,3 = 2c+f-i, x2,1 = 3a+4g, x2,2 = 3b+4h, x2,3 = 3c+4i, x3,1 = 2g, x3,2 = 2h, x3,3 = 2i, I = [(2a+d-g,2b+e-h,2c+f-i),(3a+4g,3b+4h,3c+4i),(2g,2h,2i)]; 2a+d-g = 1, 2b+e-h = 0, 2c+f-i = 0, 3a+4g = 0, 3b+4h = 1, 3c+4i = 0, 2g = 0, 2h = 0, 2i = 1, 9 first degree equations with 9 unknowns; 2g = 0, g = 0; 2h = 0, h = 0; 2i = 1, i = 1/2; 3a+4g = 0, 3a = 0, a = 0; 3b+4h = 1, 3b = 1, b = 1/3; 3c+4i = 0, 3c+4(1/2) = 0, 3c+4/2 = 0, 3c = -4/2, 3c = -2, c = -2/3; 2a+d-g = 1, d = 1; 2b+e-h = 0, 2(1/3)+e = 0, (2/3)+e = 0, e = -2/3; 2c+f-i = 0, 2(-2/3)+f-1/2 = 0, (-4/3)+f-(1/2) = 0, f = (4/3)+(1/2) = (8/6)+(3/6) = 11/6; a = 0, b = 1/3, c = -2/3, d = 1, e = -2/3, f = 11/6, g = 0, h = 0, i = 1/2; X = A-1 = [(0,1/3,-2/3),(1,-2/3,11/6),(0,0,1/2)]; A⋅A-1 = I, A⋅A-1 = [(2,1,-1),(3,0,4),(0,0,2)][(0,1/3,-2/3),(1,-2/3,11/6),(0,0,1/2)], i1,1 = 2⋅0+1⋅1+(-1)⋅0 = 0+1+0 = 1, i1,2 = 2(1/3)+1(-2/3)+(-1)0 = (2/3)-(2/3) = 0, i1,3 = 2(-2/3)+1(11/6)+(-1)(1/2) = (-4/3)+(11/6)-(1/2) = (-8/6)+(11/6)-(3/6) = (11/6)-(11/6) = 0, i2,1 = 3⋅0+0⋅1+4⋅0 = 0+0+0 = 0, i2,2 = 3(1/3)+0(-2/3)+4⋅0 = 3/3+0+0 = 1, i2,3 = 3(-2/3)+0(11/6)+4(1/2) = -2+0+2 = 0, i3,1 = 0⋅0+0⋅1+2⋅0 = 0+0+0 = 0, i3,2 = 0(1/3)+0(-2/3)+2⋅0 = 0+0+0 = 0, i3,3 = 0(-2/3)+0(11/6)+2(1/2) = 0+0+1 = 1; I = [(1,0,0),(0,1,0),(0,0,1)]

Example of a non-invertible matrix: A = [(2,3),(0,0)]; A⋅X = I = [(1,0),(0,1)], X = [(a,b),(c,d)], A⋅X = [(2,3),(0,0)][(a,b),(c,d)] = I, i1,1 = 2a+3c, i1,2 = 2b+3d, i2,1 = 0a+0c = 0, i2,2 = 0b+0d = 0; I = [(2a+3c,2b+3d),(0,0)] = [(1,0),(0,1)], 2a+3c = 1, 2b+3d = 0, 0 = 0, 0 = 1, it cannot be solved, the matrix A is not invertible
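The three examples above can be cross-checked with a Python sketch. It does not solve the system A⋅X = I unknown by unknown as done by hand; instead it uses Gauss-Jordan elimination on the matrix A with the identity matrix placed beside it, a different but equivalent technique chosen here as an assumption of the example. The name `inverse` is hypothetical.

```python
from fractions import Fraction

def inverse(A):
    """Inverse of a square matrix by Gauss-Jordan elimination, or None."""
    n = len(A)
    # Build the matrix [A | I] with exact fractions
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        pivot = next((i for i in range(c, n) if M[i][c] != 0), None)
        if pivot is None:
            return None                    # A is not invertible
        M[c], M[pivot] = M[pivot], M[c]    # Ri <-> Rj
        p = M[c][c]
        M[c] = [x / p for x in M[c]]       # Ri -> (1/p)Ri
        for i in range(n):
            if i != c and M[i][c] != 0:
                factor = M[i][c]
                M[i] = [x - factor * y for x, y in zip(M[i], M[c])]
    return [row[n:] for row in M]          # right half is the inverse

F = Fraction
print(inverse([[2, 1], [3, -1]]) == [[F(1, 5), F(1, 5)], [F(3, 5), F(-2, 5)]])  # True
print(inverse([[2, 3], [0, 0]]))  # None
```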

Transpose matrix: considering the matrix A with m rows and n columns, A = [(a1,1,...,a1,n),(...,...,...),(am,1,...,am,n)], to obtain the transpose matrix, the rows are exchanged with the columns, transpose of A = AT = [(a1,1,...,am,1),(...,...,...),(a1,n,...,am,n)], the rows of matrix A become the columns of the transpose matrix, and the columns of matrix A become the rows of the transpose matrix; if matrix A has m rows and n columns, the transpose matrix has n rows and m columns; the exchange between rows and columns determines the change of the indexes of the elements

A = [(4,5,6),(7,8,9),(0,4,3)]; AT = [(4,7,0),(5,8,4),(6,9,3)]

A = [(1,0,7,4),(3,3,2,2)]; AT = [(1,3),(0,3),(7,2),(4,2)]

First property of the transpose matrix: (A+B)T = AT+BT, the transpose of the sum of two matrices is equal to the sum of the transposes of the two matrices

Second property of the transpose matrix: (AB)T = BTAT, the transpose of the product of two matrices is equal to the product of the transposes of the two matrices but with inverted order, because the product of matrices is not commutative, in fact, to make the product of matrix A by matrix B, the number of columns of A must be equal to the number of rows of B

Third property of the transpose matrix: (aA)T = aAT, the transpose of the product of a number by a matrix is equal to the product of the number by the transpose of the matrix

If a matrix is equal to its transposed matrix, A = AT, then the matrix is symmetrical; a symmetrical matrix has its elements symmetrical with respect to its diagonal; [(1,4,5),(4,2,0),(5,0,3)] is a symmetrical matrix; obviously a symmetrical matrix must be square

Example of a symmetrical matrix: A = [(1,0,3),(0,2,4),(3,4,-1)]

A = [(1,0,2),(0,2,4),(3,4,-1)] is not a symmetrical matrix

If a matrix is equal to the opposite of its transpose matrix, A = -AT, then the matrix is antisymmetrical; [(0,1,2),(-1,0,3),(-2,-3,0)] is an antisymmetrical matrix; in an antisymmetrical matrix, the diagonal is made up of zeros as 0 = -0, 0 is the only number that is equal to its opposite

Example of antisymmetrical matrix: [(0,1,5),(-1,0,2),(-5,-2,0)]
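The transpose and the two symmetry tests can be sketched in Python with matrices stored as lists of rows; the names `transpose`, `is_symmetric` and `is_antisymmetric` are hypothetical.

```python
def transpose(A):
    """Rows of A become the columns of the result."""
    return [list(col) for col in zip(*A)]

def is_symmetric(A):
    """A = AT"""
    return A == transpose(A)

def is_antisymmetric(A):
    """A = -AT"""
    return A == [[-x for x in row] for row in transpose(A)]

print(transpose([[1, 0, 7, 4], [3, 3, 2, 2]]))  # [[1, 3], [0, 3], [7, 2], [4, 2]]
print(is_symmetric([[1, 0, 3], [0, 2, 4], [3, 4, -1]]))        # True
print(is_symmetric([[1, 0, 2], [0, 2, 4], [3, 4, -1]]))        # False
print(is_antisymmetric([[0, 1, 5], [-1, 0, 2], [-5, -2, 0]]))  # True
```

A non-square matrix fails both tests automatically, because its transpose has a different number of rows.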

An orthogonal matrix must be square and invertible and its transpose must be equal to its inverse, AT = A-1; in an orthogonal matrix, inverse matrix and transpose matrix coincide; orthogonal matrices are particular invertible square matrices, such that the inverse is the transpose matrix

Example of orthogonal matrix: A = [(cos(α),-sin(α)),(sin(α),cos(α))]; AT = [(cos(α),sin(α)),(-sin(α),cos(α))]; AT⋅A = [(cos(α),sin(α)),(-sin(α),cos(α))][(cos(α),-sin(α)),(sin(α),cos(α))], e1,1 = cos(α)⋅cos(α)+sin(α)⋅sin(α) = (cos(α))²+(sin(α))² = 1, e1,2 = cos(α)⋅(-sin(α))+sin(α)⋅cos(α) = -sin(α)⋅cos(α)+sin(α)⋅cos(α) = 0, e2,1 = -sin(α)⋅cos(α)+cos(α)⋅sin(α) = -sin(α)⋅cos(α)+sin(α)⋅cos(α) = 0, e2,2 = -sin(α)⋅(-sin(α))+cos(α)⋅cos(α) = (sin(α))²+(cos(α))² = 1, AT⋅A = [(1,0),(0,1)], so the transpose matrix of A is also the inverse matrix of A, AT = A-1, so A is an orthogonal matrix
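The identity AT⋅A = I for this matrix can also be checked numerically for a chosen angle with a short Python sketch. Floating-point arithmetic is used, so the comparison is made up to a small tolerance, which is an assumption of this example; the helper names are hypothetical.

```python
import math

def rotation(alpha):
    """The matrix [(cos a, -sin a), (sin a, cos a)] as a list of rows."""
    c, s = math.cos(alpha), math.sin(alpha)
    return [[c, -s], [s, c]]

def transpose(A):
    return [list(col) for col in zip(*A)]

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def is_orthogonal(A, tol=1e-12):
    """Check that AT*A equals the identity matrix up to the tolerance tol."""
    n = len(A)
    P = mat_mul(transpose(A), A)
    return all(math.isclose(P[i][j], 1.0 if i == j else 0.0, abs_tol=tol)
               for i in range(n) for j in range(n))

print(is_orthogonal(rotation(0.7)))     # True
print(is_orthogonal([[1, 1], [0, 1]]))  # False
```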

Calculate the inverse of the matrix A = [(1,h),(2,0)]; X = [(a,b),(c,d)], AX = I = [(1,0),(0,1)]; [(1,h),(2,0)][(a,b),(c,d)] = [(1,0),(0,1)], a+hc = 1, b+hd = 0, 2a = 0, 2b = 1; a = 0; b = 1/2; a+hc = 1, 0+hc = 1, hc = 1, which has no solution for h = 0, so matrix A is not invertible for h = 0; if h ≠ 0 then c = 1/h and, from 1/2+hd = 0, d = -1/(2h); for any h other than zero the matrix is invertible, X = A-1 = [(0,1/2),(1/h,-1/(2h))]; if h = 0 then matrix A = [(1,0),(2,0)] and ρ(A) = 1

A matrix A is invertible when the inverse matrix A-1 exists such that A⋅A-1 = A-1⋅A = I, where I denotes the identity matrix; the inverse matrix is only defined when matrix A is square

A matrix is symmetrical when it is equal to transpose matrix, A = AT

A matrix is antisymmetrical when it is equal to the opposite of the transpose matrix, A = -AT

The orthogonal matrix is a matrix whose inverse coincides with the transpose matrix


23 - THE CONCEPT OF LINEAR APPLICATION

Application is synonymous with function, but the term application is more used in this context

Function, or application, between the vector space ℝ2 and the vector space ℝ: ℝ2 → ℝ, f(x,y) = x+y; (x,y) → f → x+y; (x',y') → f → x'+y'; (x,y)+(x',y') = (x+x',y+y') → f → (x+x')+(y+y'); (x+y)+(x'+y') = (x+x')+(y+y'), the function f preserves the sum of pairs; a(x,y) = (ax,ay); a(x,y) → f → a(x+y); (ax,ay) → f → ax+ay; a(x+y) = ax+ay, the function f preserves the product by a number; a linear application between two vector spaces preserves the sum and the product by a number

Properties of linear applications: the function f is a linear application between the vector spaces V and W, f: V → W; f(v+v') = f(v)+f(v'), the sum is preserved; f(a⋅v) = a⋅f(v), the product by a number is preserved; f(0V) = 0W, the 0 of V is transformed into the 0 of W; f(-v) = -f(v), the opposite of v in the vector space V coincides with the opposite of f(v) in the vector space W

ℝ3 → f → ℝ2, f(x,y,z) = (x,y+z); it is a linear application because it preserves the sum and the product by a number; f(x,y,z) = (x,y+z), f(x',y',z') = (x',y'+z'), f(x+x',y+y',z+z') = (x+x',(y+y')+(z+z')); (x,y+z)+(x',y'+z') = (x+x',(y+z)+(y'+z')) = (x+x',(y+y')+(z+z')); the function of the sum of elements is equal to the sum of the functions of the individual elements, the sum is preserved; a(x,y,z) = (ax,ay,az), f(ax,ay,az) = (ax,ay+az) = a(x,y+z) = af(x,y,z), the product by a number is preserved; f(0,0,0) = (0,0), the 0 of ℝ3 is transformed into the 0 of ℝ2; f(-x,-y,-z) = (-x,-y-z) = -(x,y+z) = -f(x,y,z), that is f(-v) = -f(v)

f(x,y) = x², f: ℝ2 → ℝ; (x,y)+(x',y') = (x+x',y+y'), (1,2)+(3,1) = (1+3,2+1) = (4,3); f(1,2) = 1² = 1, f(3,1) = 3² = 9, f(4,3) = 4² = 16, 1+9 ≠ 16, the sum is not preserved; f(2,1) = 4, 5⋅4 = 20, 5(2,1) = (10,5), f(10,5) = 100, 20 ≠ 100, the product by a number is not preserved; it is not a linear application

f(x,y) = x+1, f: ℝ2 → ℝ; it is not a linear application, the sum is not preserved, the product by a number is not preserved, f(0,0) = 1 ≠ 0
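The three examples above can be tested on sample vectors with a Python sketch. A failed check proves that a function is not linear, while passed checks on a few samples are only a sanity check and not a proof of linearity. The helper names are hypothetical, and scalar results are wrapped in 1-element tuples so that the same helpers work for every function.

```python
def add(v, w):
    return tuple(a + b for a, b in zip(v, w))

def scale(a, v):
    return tuple(a * c for c in v)

def preserves_sum(f, v, w):
    """Check f(v+w) = f(v)+f(w) on one pair of vectors."""
    return f(add(v, w)) == add(f(v), f(w))

def preserves_product(f, a, v):
    """Check f(a*v) = a*f(v) on one number and one vector."""
    return f(scale(a, v)) == scale(a, f(v))

f = lambda v: (v[0], v[1] + v[2])  # f(x,y,z) = (x,y+z)
g = lambda v: (v[0] ** 2,)         # f(x,y) = x squared
h = lambda v: (v[0] + 1,)          # f(x,y) = x+1

print(preserves_sum(f, (1, 2, 3), (4, 5, 6)), preserves_product(f, 5, (1, 2, 3)))  # True True
print(preserves_sum(g, (1, 2), (3, 1)), preserves_product(g, 5, (2, 1)))           # False False
print(h((0, 0)))  # (1,), so the 0 is not transformed into the 0
```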

Properties of linear applications: f: V → W, f is a linear application, V and W are vector spaces; f(v) = 0W ⇔ v ∈ Ker(f); f(0V) = 0W, but non-zero vectors whose image is zero can also exist in the vector space V, and all the vectors whose image is zero form a set called Ker(f); the vector subspace Ker(f) can also contain only the null vector; w = f(v) for some v in V ⇔ w ∈ Im(f), the vectors of W that are images through f form Im(f), a subset of W

f(x,y) = x+y; f(0,0) = 0; f(x,y) = 0, x+y = 0, y = -x, (x,-x) → f → 0, Ker(f) is the subset of ℝ2 formed by the pairs (x,-x), and this is a vector subspace generated by the pair (1,-1), because (x,-x) = x(1,-1); in general, for f: V → W, the vectors v such that v → f → 0W form the vector subspace Ker(f) ⊆ V

f: V → W, f(v) = w, Im(f) ⊆ W; f(x,y) = (x,x), ℝ2 → ℝ2; the pairs (0,0), (1,1), (2,2), are images of the function; the pairs (1,0), (0,1), (1,2), are not images of the function; the image is the set of all the pairs obtained by multiplying the pair (1,1) by an arbitrary number x; is the vector subspace of ℝ2 formed by multiples of (1,1), that is, it has as base (1,1), and the image is only a part of ℝ2

f: V → W; Ker(f) or nucleus of f, is a subspace of V formed by all the elements that transform into 0

f: V → W; Im(f) or image of f, is a subspace of W formed by the vectors w that are images of some vector of V, w = f(v)

Ker is the abbreviation of the term kernel which means core or nucleus; Ker(f) is formed by the vectors that transform into 0; Ker(f) can be formed by only {0V} or it can also contain other vectors; if it is formed by only {0V}, then f is an injective linear application; injective means that if v ≠ v' then f(v) ≠ f(v'); this holds because f(v) = f(v') implies f(v-v') = 0W, that is v-v' ∈ Ker(f); to check that a linear application is injective, it is enough to check that Ker(f) contains only 0

Determine if the function f(x,y) = (x,x) is injective; if its core contains only 0 it is injective, but in this case it is not true; f(x,y) = (x,x) is (0,0) for every (0,y) = y(0,1) that is the vectorial subspace of ℝ2 generated by the vector (0,1), Ker(f) is not reduced to just zero, so this application is not injective

f(x,y) = (2x+y,x-y); 2x+y = 0; x-y = 0, x = y; 2x+y = 0, x = y, 2y+y = 0, 3y = 0, y = 0; x = y = 0; there are no pairs other than the pair (0,0) that have the pair (0,0) as image; this f is an injective linear application

Ker(f), or nucleus of a linear application, allows us to study injectivity; Im(f), or image of a linear application, allows us to study surjectivity

The function f: V → W is surjective if ∀ w ∈ W ∃ v ∈ V: f(v) = w, that is Im(f) = W; therefore, to understand if a linear application is surjective, it is necessary to study the image

f(x,y) = (x,x) is not a surjective linear application because Im(f) ≠ ℝ2

f(x,y) = 3x, ℝ2 → ℝ; m ∈ ℝ, f(x,y) = m, 3x = m, x = m/3, f(m/3,y) = m, each number m has a counter image in ℝ2; this is a surjective linear application; by studying the image of f, which is a vector subspace, we understand if the linear application is surjective

A linear application f: V → W can be both injective and surjective at the same time; it is a function because every v has an image in W, ∀ v ∈ V ∃ f(v) ∈ W; it is injective because two different v have different images, v ≠ v' ⇒ f(v) ≠ f(v'); it is surjective because every w comes from a v, ∀ w ∈ W ∃ v ∈ V: f(v) = w, that is Im(f) = W; between V and W there is a one-to-one correspondence, that is, f is a bijective linear application, also called isomorphism; linear applications which are isomorphisms are simultaneously injective and surjective, the nucleus is reduced to only 0 and the image is all W

f(x,y) = (2x+y,x-y) is an injective and surjective linear application, that is bijective, therefore it is an isomorphism; to prove that this linear application is surjective, for every α and β we have to find x and y such that (α,β) = (2x+y,x-y); 2x+y = α; x-y = β, y = x-β; 2x+y = α, 2x+x-β = α, 3x-β = α, 3x = α+β, x = (α+β)/3; y = x-β, y = ((α+β)/3)-β = (α-2β)/3; every pair (α,β) has a counter image, so the linear application is surjective, and since it is also injective it is bijective, that is an isomorphism; an isomorphism is a bijective linear application
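
The formulas x = (α+β)/3 and y = x-β found above can be checked by applying f to the pair they produce. This is a small Python sketch added for illustration; the function name `counter_image` is hypothetical, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def f(x, y):
    return (2 * x + y, x - y)

def counter_image(alpha, beta):
    """Pair (x, y) with f(x, y) = (alpha, beta), using x = (alpha+beta)/3, y = x-beta."""
    x = Fraction(alpha + beta) / 3
    y = x - beta
    return (x, y)

for target in [(1, 0), (0, 1), (4, -7)]:
    x, y = counter_image(*target)
    print(target, "comes from", (x, y), "check:", f(x, y) == target)
```

Only (0,0) is sent to (0,0), which agrees with the nucleus being reduced to the null vector.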

Find the Ker(f) of f(x,y,z) = (x+y,x-y,x+2z), ℝ3 → ℝ3; finding the Ker(f) means solving these first degree equations: x+y = 0, x-y = 0, x+2z = 0; finding the Im(f) means solving these first degree equations: α = x+y, β = x-y, γ = x+2z

Sum of linear applications: f: V → W, g: V → W, (f+g)(v) = f(v)+g(v)

Product of a number by an application: f: V → W, αf: v → αf(v)

Product or composition of linear applications: g∘f, f: V → W, g: W → Z, v → f → f(v) → g → g(f(v))

The function f: V → W is a linear application when f(v+w) = f(v)+f(w) and f(av) = af(v)

The linear application f is injective when Ker(f) = {0V}, where v ∈ Ker(f) ⇔ f(v) = 0W

The linear application f is surjective when Im(f) = W, where w ∈ Im(f) ⇔ w = f(v) for some v ∈ V


24 - LINEAR APPLICATIONS AND MATRICES

Matrix associated with a linear application: f(x,y,z) = (ax+by+cz,dx+ey+fz), ℝ3 → ℝ2, Mf = [(a,b,c),(d,e,f)]; the fundamental versors of ℝ3 are e1 = (1,0,0), e2 = (0,1,0), e3 = (0,0,1); e1 = (1,0,0) → f → (a,d); e2 = (0,1,0) → f → (b,e); e3 = (0,0,1) → f → (c,f); (a,d) is the first column of the matrix Mf that is f(e1); (b,e) is the second column of the matrix Mf that is f(e2); (c,f) is the third column of the matrix Mf that is f(e3); the elements of the matrix Mf have a double meaning, the rows contain the coefficients of the variables x, y, z, the first row contains the coefficients of x, y, z of the first component, the second row contains the coefficients of x, y, z of the second component, the columns express the f of the fundamental versors, the first column indicates f of the first versor, the second column indicates f of the second versor, the third column indicates f of the third versor; now we have to understand what happens with linear applications between ℝn and ℝm, ℝn → ℝm

The relationship between a matrix and a linear application is given by the fact that the elements of the rows of the matrix indicate the coefficients of the components of the linear application, and the elements of the columns of the matrix indicate the images of the fundamental versors of the linear application

f: ℝn → ℝm, f is a linear application of ℝn in ℝm, n is the number of elements contained in a row and equals the number of columns, m is the number of elements contained in a column and equals the number of rows; e1 = (1,0,...,0) → f → (a1,1,a2,1,...,am,1), e2 = (0,1,...,0) → f → (a1,2,a2,2,...,am,2), en = (0,0,...,1) → f → (a1,n,a2,n,...,am,n); in this way we have constructed n vectors which are the images of the fundamental versors of ℝn; Mf = [(a1,1,a1,2,...,a1,n),(a2,1,a2,2,...,a2,n),...,(am,1,am,2,...,am,n)], this is the matrix associated with the linear application f, the first column is the image of the first fundamental versor or f(e1), the second column is the image of the second fundamental versor or f(e2), the nth column is the image of the nth fundamental versor or f(en); f operates on tuples, (x1,x2,...,xn) is an element of ℝn, the linear application is applied to this tuple and the result is an element of ℝm, so it must have m components, and the first row is the set of coefficients that must be applied to x1, ..., xn to obtain the first component, f(x1,x2,...,xn) = (a1,1x1+a1,2x2+...+a1,nxn,a2,1x1+a2,2x2+...+a2,nxn,...,am,1x1+am,2x2+...+am,nxn); the rows of the matrix are the coefficients to assign to x1, x2, ..., xn, with the first row of the matrix we get the first component of f(x1,...,xn), with the second row of the matrix the second component, with the last row of the matrix we get the last component; the matrix Mf associated with the linear application f: ℝn → ℝm has m rows and n columns, the number of rows coincides with the dimension of the arrival space ℝm, the number of columns coincides with the dimension of ℝn, the space on which f is defined, that is the starting space; the components of f(x1,x2,...,xn) are m elements and are each a linear combination of x1, x2, ..., xn, that is a homogeneous polynomial of first degree in x1, x2, ..., xn, with no constant term, and this is valid for all the components; a linear application is characterized by the fact that f of a tuple has m components, each of which is a linear combination, a homogeneous first degree polynomial in x1, x2, ..., xn with coefficients deducible from the matrix

f(x,y) = (x+2y,x-2y,x+y), ℝ2 → ℝ3; the associated matrix must have 2 columns and 3 rows; Mf = [(1,2),(1,-2),(1,1)]; f(1,0) = (1,1,1) that is the first column, f(0,1) = (2,-2,1) that is the second column; the application is linear because the 3 components (x+2y,x-2y,x+y) are homogeneous first degree polynomials in the variables x and y in ℝ2; a polynomial is homogeneous when all the monomials that compose it have the same degree
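
The rule "the columns are the images of the fundamental versors" can be turned into a short procedure. The Python sketch below is an added illustration; `matrix_of` is a hypothetical helper name, and the result is meaningful only if the function passed in really is a linear application.

```python
def matrix_of(f, n):
    """Matrix (list of rows) of a linear application f: R^n -> R^m.

    Column j is f(e_j), the image of the j-th fundamental versor.
    """
    columns = []
    for j in range(n):
        e_j = [1 if i == j else 0 for i in range(n)]
        columns.append(f(*e_j))
    m = len(columns[0])
    # Transpose the list of columns into a list of rows.
    return [[columns[j][i] for j in range(n)] for i in range(m)]

def f(x, y):
    return (x + 2 * y, x - 2 * y, x + y)

print(matrix_of(f, 2))  # [[1, 2], [1, -2], [1, 1]]
```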

g(x,y,z) = (x+z,y+1), ℝ3 → ℝ2 that means from 3 to 2 variables; it is not a linear application because x+z is a homogeneous polynomial of first degree in (x,y,z), but y+1 is not a homogeneous polynomial of first degree in (x,y,z) because there is the constant term +1

To create the matrix associated with a linear application, the coefficients of the components are written on the rows, or the images of the fundamental versors are written on the columns

From a matrix to a linear application, using this matrix with 2 rows and 3 columns, A = [(1,-1,3),(2,2,0)]; f: ℝ3 → ℝ2, f(x,y,z) = (x-y+3z,2x+2y); it is a linear application because x-y+3z and 2x+2y are homogeneous polynomials of first degree in (x,y,z); from this linear application it is possible to obtain the matrix again, so the correspondence is one-to-one, Mf = [(1,-1,3),(2,2,0)] = A
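
Going from a matrix to the linear application means using each row as the coefficients of one component. A minimal Python sketch of this step, added for illustration (the helper name `apply` is hypothetical):

```python
def apply(matrix, v):
    """Image of the tuple v under the linear application with the given matrix.

    Each component is the linear combination of v with one row of the matrix.
    """
    if any(len(row) != len(v) for row in matrix):
        raise ValueError("each row needs one coefficient per unknown")
    return tuple(sum(a * x for a, x in zip(row, v)) for row in matrix)

A = [(1, -1, 3), (2, 2, 0)]
print(apply(A, (1, 0, 0)))  # (1, 2), the first column of A
print(apply(A, (1, 2, 3)))  # (8, 6), that is (1-2+9, 2+4)
```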

From a linear application we obtain a matrix, and from a matrix we obtain a linear application; from a linear application of ℝn in ℝm we obtain a matrix with m rows and n columns; from a matrix with m rows and n columns we obtain a linear application of ℝn in ℝm

A linear application f: ℝn → ℝm is injective when v ≠ v' ⇒ f(v) ≠ f(v'), and surjective when Im(f) = ℝm; when a linear application is both injective and surjective, that is bijective, it is invertible, that is the linear application f-1 is the inverse of the linear application f; ℝn and ℝm must coincide for a linear application to be invertible, that is f: ℝn → ℝn, v → f → f(v) = v', v' → f-1 → v = f-1(v')

A linear application f: ℝn → ℝm is surjective if Im(f) = ℝm; the matrix Mf consists of the columns f(e1), ..., f(en); a vector v is a linear combination of e1, ..., en; f(v) is a linear combination of f(e1), ..., f(en); f(e1), ..., f(en) are generators of the image of f; Im(f) is a vector subspace of ℝm generated by these vectors, and the linear application is surjective if f(e1), ..., f(en) generate all the vector space ℝm; the vector space ℝm has dimension m, and to calculate the dimension of the space generated by these generating vectors we must calculate the rank; if ρ(Mf) = m then the dimension of the column space is m, that is Im(f) = ℝm; a linear application is surjective if the rank of the matrix is equal to the number of rows, ρ(Mf) = m = number of rows

A linear application f: ℝn → ℝm is injective if Ker(f) = {0n}; dim(Ker(f)) = n-dim(Im(f)) = n-ρ(Mf) = 0, ρ(Mf) = n = number of columns; the linear application is injective if the rank of the associated matrix is equal to the number of columns

Comparing the rank with the number of rows of the matrix indicates if the linear application is surjective, comparing the rank with the number of columns indicates if the linear application is injective

f: ℝn → ℝn, ρ(Mf) = number of columns n = number of rows n, so the linear application is injective and surjective at the same time, so it is an invertible linear application

f: ℝn → ℝm, ρ(Mf) = n, Ker(f) = {0V}, the linear application is injective

f: ℝn → ℝm, ρ(Mf) = m, Im(f) = ℝm, the linear application is surjective

f: ℝn → ℝm, ρ(Mf) = n = m, the linear application is invertible, the matrix is square

f(x,y,z,t) = (x-y+z,x+y-t), ℝ4 → ℝ2; the matrix has 2 rows and 4 columns, Mf = [(1,-1,1,0),(1,1,0,-1)]; the two rows are linearly independent because neither is a multiple of the other, ρ(Mf) = 2 = dim(ℝ2), so this matrix represents a surjective linear application; to be injective the number of columns must be equal to the rank of the matrix, and in this example it is false because the columns are 4 and ρ(Mf) = 2; this linear application is surjective but not injective

Considering f: ℝn → ℝm the rank of the associated matrix cannot be greater than the smaller of the two numbers n and m; if n > m, the rank of the matrix cannot be n, and the linear application cannot be injective; if n < m, the rank of the matrix cannot be m, and the linear application cannot be surjective

f(x,y) = (x-y,x+y,x), ℝ2 → ℝ3; the matrix has 2 columns and 3 rows, Mf = [(1,-1),(1,1),(1,0)]; ρ(Mf) = 2 = dim(ℝ2); this linear application is injective but not surjective because the rank of the matrix is not 3
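
The rank criteria can be checked mechanically. The following Python sketch is an added illustration, not part of the notes: `rank` and `classify` are hypothetical helper names, and the rank is computed with ordinary Gaussian elimination on exact fractions instead of the row reduction by special elements used in these notes (the rank obtained is the same).

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix given as a list of rows, by Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0  # number of pivot rows found so far
    for c in range(len(m[0]) if m else 0):
        # Look for a row at or below position r with a non-zero entry in column c.
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

def classify(rows):
    """Return (injective, surjective) for the application with matrix `rows`."""
    p = rank(rows)
    return (p == len(rows[0]), p == len(rows))

print(classify([(1, -1, 1, 0), (1, 1, 0, -1)]))  # (False, True)
print(classify([(1, -1), (1, 1), (1, 0)]))       # (True, False)
print(classify([(1, -1), (1, 1)]))               # (True, True)
```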

f(x,y) = (x-y,x+y), ℝ2 → ℝ2; the matrix has 2 columns and 2 rows, Mf = [(1,-1),(1,1)]; ρ(Mf) = 2; to calculate the rank there is no need to reduce the matrix, because the two rows are linearly independent, since the second row is not a multiple of the first; the rank of the matrix is 2 which is equal to the number of rows and columns, therefore the linear application is simultaneously injective and surjective, therefore it is invertible; the inverse linear application of f is f-1: ℝ2 → ℝ2; Mf is the matrix associated with the linear application f, Mf-1 is the matrix associated with the inverse linear application f-1, Mf-1 is equal to the inverse of the matrix Mf, Mf-1 = (Mf)-1; to find the inverse of the linear application we have to calculate the inverse of the matrix; Mf⋅(Mf)-1 = I, [(1,-1),(1,1)][(a,b),(c,d)] = [(1,0),(0,1)], a-c = 1, b-d = 0, a+c = 0, b+d = 1; b = d, b+d = 1, d+d = 1, 2d = 1, d = 1/2; b = d, b = 1/2; a = c+1, a = -c, -c = c+1, -2c = 1, c = -1/2; a = -c, a = -(-1/2) = 1/2; a = 1/2, b = 1/2, c = -1/2, d = 1/2; Mf-1 = (Mf)-1 = [(1/2,1/2),(-1/2,1/2)]; the inverse linear application, associated with the inverse matrix, is f-1(x,y) = ((1/2)x+(1/2)y,(-1/2)x+(1/2)y)
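
The inverse matrix found above can be double-checked by multiplying it back. The Python sketch below is an added illustration with hypothetical helper names; note that it uses the standard closed formula for a 2×2 inverse, (1/(ad-bc))[(d,-b),(-c,a)], which is a different route from the four equations solved in these notes.

```python
from fractions import Fraction

def inverse_2x2(m):
    """Inverse of [(a, b), (c, d)] by the closed formula, with exact fractions."""
    (a, b), (c, d) = m
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul(p, q):
    """Row-by-column product of two matrices given as lists of rows."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]

Mf = [(1, -1), (1, 1)]
Mf_inv = inverse_2x2(Mf)
print(Mf_inv)                                  # entries 1/2, 1/2, -1/2, 1/2
print(matmul(Mf, Mf_inv) == [[1, 0], [0, 1]])  # True
```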

Mf-1 = (Mf)-1

g∘f, Mg⋅Mf = Mg∘f; g∘f is the linear application composed of the linear application g and the linear application f; the product of the matrices of linear applications is equal to the matrix of the composite application
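
The identity Mg⋅Mf = Mg∘f can be tested on a concrete pair of applications. In this added Python sketch the applications f: ℝ3 → ℝ2 and g: ℝ2 → ℝ3 are borrowed from earlier examples in these notes and paired only for illustration; `matrix_of` and `matmul` are hypothetical helper names.

```python
def matrix_of(f, n):
    """Matrix (list of rows) whose j-th column is the image of the j-th versor."""
    columns = [f(*[1 if i == j else 0 for i in range(n)]) for j in range(n)]
    return [[col[i] for col in columns] for i in range(len(columns[0]))]

def matmul(p, q):
    """Row-by-column product of two matrices given as lists of rows."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]

def f(x, y, z):
    return (x - y + 3 * z, 2 * x + 2 * y)

def g(x, y):
    return (x + 2 * y, x - 2 * y, x + y)

def g_after_f(x, y, z):
    return g(*f(x, y, z))

Mf = matrix_of(f, 3)
Mg = matrix_of(g, 2)
print(matmul(Mg, Mf))                             # [[5, 3, 3], [-3, -5, 3], [3, 1, 3]]
print(matmul(Mg, Mf) == matrix_of(g_after_f, 3))  # True
```

The order matters: the matrix of g, the application applied last, is the left factor.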

f∘f-1 = identity; Mf⋅(Mf)-1 = I


25 - LINEAR SYSTEMS - PART 1 - RESOLUTION OF REDUCED SYSTEMS

To find the Ker(f) of the linear application f(x,y,z) = (x+y,x-z), ℝ3 → ℝ2, we need to solve the system of linear equations of first degree {x+y = 0, x-z = 0}; y = -x, x = z; the solution is (x,-x,x), by varying x we find all the solutions

Example of a system of linear equations of first degree: {x+3y-2z+t = 5, 2x-8y+4z-t = 0, -x+5y-3t = 12}

{x+y+2z = 1, 2y-z = 0, 4y = 5} this system of linear equations of first degree is easy to solve because there are many zeros

A system of m equations with n unknowns can be represented like this: {a1,1x1+a1,2x2+...+a1,nxn = b1, a2,1x1+a2,2x2+...+a2,nxn = b2, ..., am,1x1+am,2x2+...+am,nxn = bm}; the coefficients a form a matrix with m rows and n columns; the coefficients a are real numbers; in addition to the matrix formed by the coefficients of the unknowns x, there is also a column formed by the constant terms; the matrix formed by the coefficients of the x is indicated by A, the matrix formed by the coefficients of the x and by the constant terms is indicated by (A|B)

A system of linear equations can be written as AX = B, where A is the matrix of the coefficients formed by m rows and n columns, X is the column matrix of the n unknowns, B is the column matrix of the m constant terms; (A|B) = [(a1,1,...,a1,n)|(b1),...,(am,1,...,am,n)|(bm)]; X = [(x1),(x2),...,(xn)]

x+y-z = 1, 2x+y-2z = 0, x+y = 4; (A|B) = [(1,1,-1)|(1),(2,1,-2)|(0),(1,1,0)|(4)], X = [(x),(y),(z)]; AX = B

A linear system can be solved more easily if there are many zeros; if AX = B is a linear system of m equations with n unknowns, it is easier to solve the system if it is reduced

The linear system AX = B is reduced when the matrix A is reduced by rows

{2x-y+z = 1, 4y-3z = 0, 5y = 10}, this is a system of 3 equations with 3 unknowns x, y, z; AX = B, A = [(2,-1,1),(0,4,-3),(0,5,0)], B = [(1),(0),(10)]; the matrix A of the coefficients is reduced by rows, and the special elements of the matrix A are 2, 5, -3; to solve a reduced system we start from the last row of the matrix, that is the last equation, because the last equation contains more zeros, that is fewer unknowns; 5y = 10, y = 10/5, y = 2; 4y-3z = 0, 4⋅2-3z = 0, 8-3z = 0, 3z = 8, z = 8/3; 2x-y+z = 1, 2x = y-z+1, x = (y-z+1)/2, x = (2-(8/3)+1)/2 = (3-(8/3))/2 = (1/3)/2 = 1/6; 5y = 10, y = 2, 5⋅2 = 10, 10 = 10, true; 4y-3z = 0, y = 2, z = 8/3, 4⋅2-3(8/3) = 0, 8-8 = 0, 0 = 0, true; 2x-y+z = 1, y = 2, z = 8/3, x = 1/6, 2⋅(1/6)-2+(8/3) = 1, (1/3)-2+(8/3) = 1, (9/3)-2 = 1, 3-2 = 1, 1 = 1, true; the solution is easy when the system is reduced by rows; (A|B) = [(2,-1,1)|(1),(0,4,-3)|(0),(0,5,0)|(10)], this is a row reduced system because the matrix A is row reduced, and therefore the complete matrix (A|B) is also row reduced; when a system is row reduced, it can be solved starting from the last equation with fewer unknowns, up to the first; this example is simple because the system is made up of only 3 equations with 3 unknowns, with a row reduced matrix of rank 3, the situation is more complicated when the number of equations and the number of unknowns are not the same

A system AX = B is reduced when the matrix A is reduced, that is, when in the last row there are all zeros except one number, in the penultimate row there are two numbers, up to the first row where there are no zeros but only numbers; special elements are those numbers below which there are only zeros

The general rule to solve a row reduced system AX = B is that we take the last equation, where there is the least possible number of unknowns because there are as many zeros as possible, we solve this equation with respect to one of the unknowns as a function of the others, we repeat everything in the penultimate equation, and so on up to the first
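
Applied to the reduced system {2x-y+z = 1, 4y-3z = 0, 5y = 10}, the rule amounts to three substitutions from the last equation up to the first. A minimal Python sketch added for illustration, using exact fractions:

```python
from fractions import Fraction

# Last equation: 5y = 10
y = Fraction(10, 5)
# Second equation: 4y - 3z = 0, solved for z
z = 4 * y / 3
# First equation: 2x - y + z = 1, solved for x
x = (1 + y - z) / 2

print(x, y, z)  # 1/6 2 8/3

# Substitute back into the three equations.
print(2 * x - y + z == 1, 4 * y - 3 * z == 0, 5 * y == 10)  # True True True
```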

(A|B) = [(2,1,4,1)|(1),(0,2,-1,1)|(1),(0,0,2,1)|(0)], this is the complete matrix of the system AX = B, and by counting the elements of the first row we understand that there are 4 unknowns, but there are only 3 equations; the last row of the matrix allows to obtain the last equation that is 2z+t = 0, so it is possible to derive one of these two unknowns as a function of the other; t = -2z; the second row of the matrix allows to obtain the second equation that is 2y-z+t = 1; t = -2z, 2y-z+t = 1, 2y-z-2z = 1, 2y-3z = 1; we have to write the unknown y as a function of the unknown z because the unknown y does not appear in the last equation; 2y-3z = 1, 2y = 3z+1, y = (3z+1)/2; now it is possible to replace all the unknowns found in the first equation; 2x+y+4z+t = 1, t = -2z, y = (3z+1)/2, 2x+((3z+1)/2)+4z-2z = 1; we need to obtain the unknown x as a function of the unknown z because the unknown x does not appear in the other equations; 2x+((3z+1)/2)+4z-2z = 1, 2x+((3z+1)/2)+2z = 1, 2x = -((3z+1)/2)-2z+1, 2x = ((-3z-1)/2)-2z+1, 2x = ((-3z-1)/2)-(4z/2)+(2/2), 2x = (-3z-1-4z+2)/2, 2x = (-7z+1)/2, x = (-7z+1)/4; there are 4 unknowns and 3 equations, one of the unknowns can be chosen at will, in this example we have chosen the unknown z, the other unknowns are expressed as a function of z, then z is the free unknown; free unknown means that it can be assigned at will, it means that the system has infinite solutions; z is the free unknown and the system has infinite solutions, so for each z we have a different solution; z = 0, x = (-7z+1)/4, x = 1/4; z = 0, y = (3z+1)/2, y = 1/2; z = 0, t = -2z, t = 0; if we assign another value to z, we find other values of x, y, t, that is a different solution of the linear system
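
Because z is free, the solutions form a family indexed by z. The added Python sketch below encodes the formulas x = (-7z+1)/4, y = (3z+1)/2, t = -2z and substitutes several values of z back into the three equations; `solution` and `satisfies` are hypothetical names.

```python
from fractions import Fraction

def solution(z):
    """Solution (x, y, z, t) of the reduced system for a chosen value of z."""
    z = Fraction(z)
    t = -2 * z              # from 2z + t = 0
    y = (3 * z + 1) / 2     # from 2y - z + t = 1
    x = (-7 * z + 1) / 4    # from 2x + y + 4z + t = 1
    return (x, y, z, t)

def satisfies(x, y, z, t):
    return (2 * x + y + 4 * z + t == 1
            and 2 * y - z + t == 1
            and 2 * z + t == 0)

print(solution(0))  # x = 1/4, y = 1/2, z = 0, t = 0
print(all(satisfies(*solution(z)) for z in range(-3, 4)))  # True
```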

If a reduced system has a free unknown, then the system has infinite solutions

To solve a system of linear equations we start from the last equation, taking an unknown and obtain it as a function of the others, we go up to the penultimate equation and we obtain an unknown that does not appear in the last equation and we continue up to the first equation; never start with the first equation, always start with the last equation and go up

A reduced system can be without solutions

(A|B) = [(2,1,4,1)|(1),(0,2,-1,1)|(1),(0,0,2,1)|(0),(0,0,0,0)|(2)]; the last equation is 0x+0y+0z+0t = 2, so there are no solutions

There are systems without solutions and they are called irresoluble or incompatible systems; if the system is reduced and on a row of the coefficients matrix there are all zeros and the constant term is different from zero, then the system has no solutions, that is, the system is irresoluble, also called incompatible system

With a system of the type AX = B we must understand if the system is solvable, that is, if there are solutions; solvable means that there are numbers that put into the equation AX = B lead to an identity, that is, the first member is equal to the second member; if the system is solvable, it is necessary to find the solutions or to find all the tuples of numbers that put into the equation AX = B give the identity; if the system is reduced by rows, finding the solutions is simple, but if the system is not reduced by rows then finding the solutions can be complicated

What we have to do is transform the system AX = B into the reduced system by rows A'X = B' which has the same solutions as the starting system; if the system is reduced by rows we immediately understand if it is solvable and we can apply the method to find the solutions

The system AX = B can be reduced with the elementary transformations on the rows to become the system A'X = B'; the complete matrix (A|B) is reduced as a consequence of the row reduction of the matrix A of the coefficients; the matrix A' of the coefficients is row reduced and the linear system A'X = B' is therefore row reduced; after the system has been reduced, solutions can be found

{x+y+z-t = 1, x-y+2z+t = 0}, this is a system of 2 equations with 4 unknowns, (A|B) = [(1,1,1,-1)|(1),(1,-1,2,1)|(0)], as can be seen from the matrix the system is not reduced; we can reduce it by adding the first row to the second row, R2 → R2+R1, (A|B) = [(1,1,1,-1)|(1),(2,0,3,0)|(1)], the reduced system is {x+y+z-t = 1, 2x+3z = 1}, reducing the system by rows is the best way to solve it; 2x+3z = 1, 2x = -3z+1, x = (-3z+1)/2; x+y+z-t = 1, ((-3z+1)/2)+y+z-t = 1, ((-3z+1)/2)+(2z/2)+y-t = 1, ((-3z+1+2z)/2)+y-t = 1, ((-z+1)/2)+y-t = 1, t = ((-z+1)/2)+y-1; y and z are the free unknowns

A system can be solved using the row reduction, instead the column reduction is not useful; if the coefficient matrix is not reduced, then we have to reduce it by rows


26 - LINEAR SYSTEMS - PART 2 - ROUCHE-CAPELLI THEOREM - FREE UNKNOWNS

AX = B, A = [(1,1,-1,1,1),(-1,-1,0,1,1),(-3,-1,0,1,2)], B = [(0),(1),(1)]; the matrix A of the coefficients consists of 3 rows and 5 columns, therefore the linear system is composed of 3 equations with 5 unknowns; to solve this system we need to reduce the complete matrix (A|B) = [(1,1,-1,1,1)|(0),(-1,-1,0,1,1)|(1),(-3,-1,0,1,2)|(1)], R3 → R3-R2, (A|B) = [(1,1,-1,1,1)|(0),(-1,-1,0,1,1)|(1),(-2,0,0,0,1)|(0)], now the matrix is row reduced and it is possible to write the linear system {x+y-z+t+u = 0, -x-y+t+u = 1, -2x+u = 0}; -2x+u = 0, u = 2x; -x-y+t+u = 1, -x-y+t+2x = 1, x-y+t = 1, the unknown t is present in the second equation but is not present in the last equation, t = -x+y+1; x+y-z+t+u = 0, x+y-z-x+y+1+2x = 0, 2x+2y-z+1 = 0, 2x+2y-z = -1, the unknown z appears only in the first equation, z = 2x+2y+1; z = 2x+2y+1, t = -x+y+1, u = 2x; x and y are free unknowns, that is, they can be assigned at will, but the unknowns z, t, u, cannot be assigned at will because they depend on x and y; this system contains 5 unknowns, 2 are free unknowns, and 3 unknowns depend on the 2 free unknowns; the reduced matrix of the coefficients has rank 3 because it consists of 3 non-zero rows, ρ(A') = 3, and the rank of the reduced matrix of the coefficients A' is equal to the rank of the reduced complete matrix (A'|B'), ρ(A') = ρ(A'|B') = 3; this is the first property of the Rouché-Capelli theorem, a system that is solvable has the rank of the coefficient matrix equal to the rank of the complete matrix, this is the solvability condition; the number of non-free unknowns is equal to the rank of the coefficient matrix, and the number of free unknowns is equal to the total number of unknowns minus the rank of the coefficient matrix, number of free unknowns = n-ρ(A), and this is another property of the Rouché-Capelli theorem
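
With two free unknowns the solutions depend on the pair (x,y). The added Python sketch below builds a solution from z = 2x+2y+1, t = -x+y+1, u = 2x and substitutes it into the three original equations (before the row reduction), so it also confirms that the reduction did not change the solutions; `solution` and `satisfies` are hypothetical names.

```python
from fractions import Fraction

def solution(x, y):
    """Solution (x, y, z, t, u) for chosen values of the free unknowns x and y."""
    x, y = Fraction(x), Fraction(y)
    z = 2 * x + 2 * y + 1
    t = -x + y + 1
    u = 2 * x
    return (x, y, z, t, u)

def satisfies(x, y, z, t, u):
    # The rows of the original complete matrix (A|B).
    return (x + y - z + t + u == 0
            and -x - y + t + u == 1
            and -3 * x - y + t + 2 * u == 1)

print(solution(0, 0))  # x = 0, y = 0, z = 1, t = 1, u = 0
print(all(satisfies(*solution(x, y))
          for x in range(-2, 3) for y in range(-2, 3)))  # True
```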

AX = B, (A|B) = [(1,1,1)|(1),(2,1,1)|(0),(3,2,2)|(4)], R2 → R2-R1, R3 → R3-2R1, (A|B) = [(1,1,1)|(1),(1,0,0)|(-1),(1,0,0)|(2)], R3 → R3-R2, (A|B) = [(1,1,1)|(1),(1,0,0)|(-1),(0,0,0)|(3)], the last row corresponds to the equation 0 = 3, so the system has no solution; the rank of the reduced coefficient matrix is ρ(A) = 2, because the rank is given by the number of non-zero rows; the rank of the reduced complete matrix is ρ(A|B) = 3, because the third row of the reduced complete matrix is not null; ρ(A) = 2 < ρ(A|B) = 3; if the rank of the coefficient matrix is different from the rank of the complete matrix, this indicates that the system is not solvable

When a linear system has no solutions, the rank of the complete matrix is different from the rank of the coefficient matrix

A system is said to be solvable or compatible when it admits at least one solution

Rouché-Capelli theorem: in a linear system AX = B of m equations in n unknowns with ρ(A) = p and ρ(A|B) = q, the system is solvable ⇔ p = q, and in that case there are ∞^(n-p) solutions, that is n-p free unknowns

The Rouché-Capelli theorem allows us to understand if a linear system admits solutions even before solving it; a linear system can be studied using the Rouché-Capelli theorem before being solved; AX = B, if ρ(A) = ρ(A|B) then the system can be solved and we have to calculate the solutions, but if ρ(A) ≠ ρ(A|B) then the system cannot be solved and it is useless to calculate the solutions; to calculate the rank it is necessary to reduce by rows the matrix A of the coefficients, and the rank is the number of non-zero reduced rows; the Rouché-Capelli theorem says that the number of free unknowns is equal to the total number of unknowns minus the rank, number of free unknowns = n-ρ(A), and if we find a different number of free unknowns then there is an error in the resolution

{x+y = 1, x-3y = 2, x+7y = 0}, this linear system has 2 unknowns and 3 equations; the number of equations is not important, the rank is important; (A|B) = [(1,1)|(1),(1,-3)|(2),(1,7)|(0)], the complete matrix (A|B) has 3 rows and 3 columns, but the coefficient matrix A has 3 rows and 2 columns, so its rank cannot be greater than 2, ρ(A) ≤ 2; R2 → R2-R1 and R3 → R3-R1, (A|B) = [(1,1)|(1),(0,-4)|(1),(0,6)|(-1)], R3 → (4/6)R3+R2, (A|B) = [(1,1)|(1),(0,-4)|(1),(0,0)|(1/3)]; the matrix of the coefficients A has rank 2 which is the maximum possible, ρ(A) = 2, the complete matrix (A|B) has rank 3, ρ(A|B) = 3, and from this we understand that the system is not solvable; by the Rouché-Capelli theorem, this linear system is not solvable, therefore it is useless to try to solve it, because there are no solutions
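
The Rouché-Capelli test only needs two ranks, so it can be automated. The Python sketch below is an added illustration: `rank` and `rouche_capelli` are hypothetical helper names, and the rank is computed with ordinary Gaussian elimination on exact fractions, which gives the same rank as the row reduction used in these notes.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix given as a list of rows, by Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0  # number of pivot rows found so far
    for c in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

def rouche_capelli(a, b):
    """Return (solvable, free_unknowns); free_unknowns is None if not solvable."""
    complete = [list(row) + [rhs] for row, rhs in zip(a, b)]
    p, q = rank(a), rank(complete)
    if p != q:
        return (False, None)
    return (True, len(a[0]) - p)

# {x+y = 1, x-3y = 2, x+7y = 0}: rank 2 against rank 3
print(rouche_capelli([(1, 1), (1, -3), (1, 7)], [1, 2, 0]))  # (False, None)
# The system of 3 equations in 5 unknowns solved earlier in this section
print(rouche_capelli([(1, 1, -1, 1, 1), (-1, -1, 0, 1, 1), (-3, -1, 0, 1, 2)],
                     [0, 1, 1]))                             # (True, 2)
```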

A homogeneous system AX = 0 is a system where the constant terms are 0; by the Rouché-Capelli theorem a homogeneous system AX = 0 always has solutions, because the matrix B of the constant terms is zero, therefore the rank of the complete matrix can never be greater than the rank of the coefficient matrix, ρ(A) = ρ(A|B); the number of free unknowns is equal to the number of unknowns minus the rank of the matrix of the coefficients A, number of free unknowns = n-ρ(A); moreover x1 = x2 = ... = xn = 0, with all the unknowns null, is always a solution; homogeneous systems always have solutions

In a solvable linear system consisting of n unknowns and m equations, p = q where p is the rank of the matrix of the coefficients and q is the rank of the complete matrix, the free unknowns are n-p, so there is only one solution when there are no free unknowns, that is when n = p, when the number of unknowns is equal to the rank of the coefficient matrix; in general, for any solvable linear system, if there are no free unknowns, there is one and only one solution, and this happens when the number of unknowns is equal to the rank of the coefficient matrix; when there are no free unknowns, this is denoted by ∞^(n-p) = ∞^0 = 1, but it is only a symbolic notation with no mathematical meaning

A homogeneous system has only one solution, which is null, formed by all zeros, when n = p, when the number of unknowns is equal to the rank of the coefficient matrix; a homogeneous system has other solutions besides the null solution when n-p > 0, that is n > p, that is when the number of unknowns is greater than the rank of the coefficient matrix

A non-homogeneous system can have no solutions, while a homogeneous system always has at least one solution; homogeneous and non-homogeneous systems have only one solution when all the unknowns can assume only one value; solvable non-homogeneous systems have infinite solutions when they have at least one free unknown; it is not possible for a linear system to have a finite number of solutions other than 1 or 0: either there are no solutions, or there is 1 solution, or there are infinite solutions; homogeneous systems always have either only the null solution, or infinite solutions; in a homogeneous system, looking for non-zero solutions is equivalent to seeing if there are infinite solutions, that is the number of unknowns must be greater than the rank of the coefficient matrix, also called the system rank, n-p > 0, n > p

{x-y+z = 0, 2x+3y-5z = 0}, it is a system of 2 equations and 3 unknowns; we must understand if this system admits solutions other than the null solution; p = ρ(A) ≤ 2 and the number of unknowns n is 3, n-p > 0, there is at least one free unknown and therefore there are infinite solutions; to find the solutions it is necessary to reduce the matrix by rows

{x-y+z = 0, 2x+3y-5z = 0, x+5y-z = 0, y+2z = 0}, is a homogeneous linear system of 4 equations and 3 unknowns; usually when there are many equations and few unknowns there are no solutions, but a homogeneous system always has solutions, so it is necessary to understand if there are non-zero solutions; A = [(1,-1,1),(2,3,-5),(1,5,-1),(0,1,2)], now it is necessary to calculate the rank of the matrix which is certainly at least 2 because we immediately see that the first and second rows are linearly independent; A = [(1,-1,1),(2,3,-5),(1,5,-1),(0,1,2)], R2 → R2-2R1 and R3 → R3-R1, A = [(1,-1,1),(0,5,-7),(0,6,-2),(0,1,2)], R3 → 5R3-6R2 and R4 → 5R4-R2, A = [(1,-1,1),(0,5,-7),(0,0,32),(0,0,17)], R4 → 32R4-17R3, A = [(1,-1,1),(0,5,-7),(0,0,32),(0,0,0)]; the rank of the matrix is 3, p = ρ(A) = 3, the number of unknowns n is 3, n = p, and therefore there is only a null solution

If f: ℝn → ℝm is a linear application, to find the elements that form the nucleus of the linear application we have to solve a homogeneous system; if A is the matrix associated with the linear application, A = Mf, the system we have to solve to find the nucleus of the linear application is AX = 0, X = [(x1),...,(xn)], f(x1,...,xn) = (a1,1x1+...); Mf = A, Ker(f) = solutions of AX = 0, dim(Ker(f)) = n-ρ(Mf), the dimension of the nucleus is related to the rank of the Mf matrix, that is, the dimension of the nucleus is equal to the dimension of the starting space minus the rank of the matrix Mf which is the dimension of the image; the nucleus is the set of solutions of the homogeneous system AX = 0; ρ(A) = ρ(Mf) = p, the nucleus is the set of solutions of a homogeneous system that has an associated matrix of rank p, then n-p is the number of free unknowns which is equal to the dimension of Ker(f)

To solve any linear system AX = B it is necessary to reduce matrix A by rows; if AX = B is solvable then ρ(A) = ρ(A|B) = p, and the system has n-p free unknowns, where n is the number of unknowns and p is the rank of matrix A


27 - LINEAR SYSTEMS - PART 3 - EXAMPLES AND APPLICATIONS

Counter image of a vector: f: ℝn → ℝm is a linear application and the associated matrix is Mf = A; the counter image of the vector (a1,...,am) is the set of solutions of the system AX = [(a1),...,(am)] and is written f-1(a1,...,am), but here f-1 does not denote the inverse function

f(x,y,z) = (x+y,x-z) is a linear application ℝ3 → ℝ2; we choose an element of ℝ2, for example the vector v = (1,-1), and find all vectors of ℝ3, the triples (x,y,z), such that f(x,y,z) = (1,-1); Mf = A = [(1,1,0),(1,0,-1)]; {x+y = 1, x-z = -1}; x-z = -1, x = z-1; x+y = 1, y = -x+1 = -(z-1)+1 = -z+1+1 = -z+2; z is the free unknown and the solution is (z-1,-z+2,z); the counter images of the vector (1,-1) are infinite and are triples in the form (z-1,-z+2,z); the counter image for z = 0 is (-1,2,0); the counter image for z = 10 is (9,-8,10)
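
As a quick check of this example, the short Python sketch below applies f to the triples (z-1,-z+2,z) for a few values of z. The function names f and counter_image are illustrative and do not come from the notes.

```python
def f(x, y, z):
    # the linear application f(x,y,z) = (x+y, x-z)
    return (x + y, x - z)

def counter_image(z):
    # the triple obtained from the free unknown z
    return (z - 1, -z + 2, z)

for z in (0, 10, -3):
    # every triple of this form is sent by f to the vector (1,-1)
    assert f(*counter_image(z)) == (1, -1)

print(counter_image(0), counter_image(10))
```

The two printed triples are the counter images for z = 0 and z = 10 found above.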

It is possible that there is no counter image

f(x,y,z) = (2x-y,2x-y,0) is a linear application ℝ3 → ℝ3 with associated matrix A = [(2,-1,0),(2,-1,0),(0,0,0)]; to calculate the counter image of the triad (4,4,1) it is necessary to solve the system {2x-y = 4, 2x-y = 4, 0 = 1}, which has no solutions, therefore the counter image of the vector (4,4,1) does not exist; in this case the linear application is not surjective, because not all the elements of ℝ3 have a counter image; when the matrix has a null row the linear application is not surjective, and when a square matrix has a null column its rank is less than n, so the linear application is neither injective nor surjective; here all columns are dependent so ρ(A) = 1 < 3, the linear application is neither injective nor surjective and this explains why there are vectors without counter image; it is interesting to find the counter image of zero, in this case of the triad (0,0,0), f-1(0,0,0), {2x-y = 0, 2x-y = 0, 0 = 0}, y = 2x, x can be assigned at will, and z can also be assigned at will, x and z are 2 free unknowns, whereas y is a non-free unknown that depends on x, so the solution is (x,2x,z); the nucleus coincides with the counter image of the null vector and contains infinite elements dependent on the 2 free unknowns x and z; examples of nucleus elements are (1,2,0), (0,0,1), (10,20,85); the nucleus of this linear application is the set of triples (x,2x,z), there are 2 free unknowns and this means that the dimension of the nucleus is 2; if x = 1 and z = 0 we find the vector (1,2,0), if x = 0 and z = 1 we find the vector (0,0,1), and these 2 vectors together form a base of the nucleus; when in f-1 of 0, in the nucleus, there are free unknowns, it is enough to set one of them to 1 and the others to 0, in turn, and we find a base of the nucleus

The counter image of 0 is the nucleus of the linear application
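
For the application f(x,y,z) = (2x-y,2x-y,0) discussed above, the following Python sketch checks that the two base vectors of the nucleus are sent to the null vector and that another element of the nucleus is a linear combination of them. The names f, base and combine are illustrative assumptions.

```python
def f(x, y, z):
    # the linear application f(x,y,z) = (2x-y, 2x-y, 0)
    return (2 * x - y, 2 * x - y, 0)

# base of the nucleus, obtained by setting the free unknowns (x,z) to (1,0) and (0,1)
base = [(1, 2, 0), (0, 0, 1)]

def combine(x, z):
    # x*(1,2,0) + z*(0,0,1) = (x, 2x, z)
    return tuple(x * u + z * v for u, v in zip(*base))

for v in base:
    assert f(*v) == (0, 0, 0)

print(combine(10, 85), f(*combine(10, 85)))
```

The third component of f is always 0, which is why a vector such as (4,4,1) has no counter image.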

The matrix A multiplied by its inverse A-1 is equal to the identity matrix I, A⋅A-1 = A-1⋅A = I; find the inverse of the matrix A = [(1,1),(2,3)]; A⋅A-1 = I, [(1,1),(2,3)][(a,b),(c,d)] = [(1,0),(0,1)], {a+c = 1, b+d = 0, 2a+3c = 0, 2b+3d = 1}; a+c = 1, a = -c+1; b+d = 0, b = -d; 2a+3c = 0, a = -c+1, 2(-c+1)+3c = 0, -2c+2+3c = 0, c+2 = 0, c = -2; a = -c+1, c = -2, a = -(-2)+1 = 2+1 = 3, a = 3; 2b+3d = 1, b = -d, 2(-d)+3d = 1, -2d+3d = 1, d = 1; b = -d, d = 1, b = -1; a = 3, b = -1, c = -2, d = 1; A-1 = [(3,-1),(-2,1)]; A⋅A-1 = I, [(1,1),(2,3)][(3,-1),(-2,1)] = [(1,0),(0,1)]; 1⋅3+1(-2) = 1, 3-2 = 1, 1 = 1; 1(-1)+1⋅1 = 0, -1+1 = 0, 0 = 0; 2⋅3+3(-2) = 0, 6-6 = 0, 0 = 0; 2(-1)+3⋅1 = 1, -2+3 = 1, 1 = 1; using this method, to find the inverse matrix, we solve a linear system of first-degree equations where the unknowns are the elements of the inverse matrix; if we look for the inverse of a square matrix 2x2 the unknowns are 4 as in this case; if we look for the inverse of a square matrix 3x3 the unknowns are 9; if the matrix were of order 25 we would have a system of linear equations with 25⋅25 = 625 unknowns; if n is the order of the matrix then n² is the number of unknowns and equations, so it is necessary to find a better method for calculating the inverse matrix

A⋅A-1 = I, A⋅X = I, [(a,b,c),(a',b',c'),(a",b",c")]X = [(1,0,0),(0,1,0),(0,0,1)], X = [(X1),(X2),(X3)]; X1, X2, X3 = rows of X = A-1; we consider unknowns not the coefficients of the matrix X, but the rows of the matrix X; instead of having the 9 coefficients of the matrix X as unknowns, we take the 3 rows of the matrix X as unknowns, and obviously each of the unknowns is a triple because each row of the matrix contains 3 coefficients

Let us consider a linear system in which the unknowns are the rows; we can consider the rows X1, X2, X3 as 3 unknowns; for a 3x3 matrix each equation is a linear combination of the 3 unknown rows, set equal to the corresponding row of the identity matrix; {aX1+bX2+cX3 = (1,0,0), a'X1+b'X2+c'X3 = (0,1,0), a"X1+b"X2+c"X3 = (0,0,1)}; in this linear system the unknowns are triples, the coefficients are the elements of the matrix A, and the constant terms do not form a column of numbers because each of the constant terms is a triple; this linear system is associated with the coefficient matrix A = [(a,b,c),(a',b',c'),(a",b",c")] and the complete matrix (A|B) = [(a,b,c)|(1,0,0),(a',b',c')|(0,1,0),(a",b",c")|(0,0,1)]; this system can be solved by reducing by rows, following the rules of linear systems, and the solutions of the reduced system are the rows of the inverse matrix

A = [(2,1),(0,3)]; the inverse matrix A-1 = X must have 2 rows and 2 columns; matrix A has two linearly independent rows so the rank is 2, it is invertible, that is the inverse matrix exists; X1 and X2 are the rows of the inverse matrix A-1, and they are the unknowns of our linear system; A⋅X = I, [(2,1),(0,3)][(X1),(X2)] = [(1,0),(0,1)]; 2X1+X2 = (1,0), 3X2 = (0,1); we have to solve a system of 2 equations with 2 unknowns X1 and X2 which represent 2 rows with 2 elements each; 3X2 = (0,1), X2 = (0,1/3); 2X1+X2 = (1,0), 2X1 = (1,0)-X2, X2 = (0,1/3), 2X1 = (1,0)-(0,1/3), 2X1 = (1,-1/3), X1 = (1/2,-1/6); X = A-1 = [(1/2,-1/6),(0,1/3)]; A⋅A-1 = I, [(2,1),(0,3)][(1/2,-1/6),(0,1/3)] = [(1,0),(0,1)]; 2(1/2)+1⋅0 = 1+0 = 1, 2(-1/6)+1(1/3) = -1/3+1/3 = 0, 0(1/2)+3⋅0 = 0+0 = 0, 0(-1/6)+3(1/3) = 0+1 = 1

It is impossible to find the inverse matrix of the matrix A = [(2,1),(4,2)], because this matrix has rank 1 and therefore corresponds to a non-invertible linear application; AX = I, 2X1+X2 = (1,0), 4X1+2X2 = (0,1); (A|B) = [(2,1)|(1,0),(4,2)|(0,1)], R2 → R2-2R1, (A|B) = [(2,1)|(1,0),(0,0)|(-2,1)]; the second equation has become 0 = (-2,1), which is impossible, therefore the system has no solutions and matrix A is not invertible

To the linear systems used to find the inverse of a matrix, we can apply the Rouché-Capelli theorem; this theorem says that a linear system admits solutions if and only if the rank of the matrix of the coefficients A is equal to the rank of the complete matrix (A|B), ρ(A) = ρ(A|B); the identity matrix I has rows that are all linearly independent as they are fundamental versors, so the complete matrix (A|I) has the maximum possible rank, therefore the linear system that allows us to find the inverse matrix is solvable if and only if the matrix of coefficients A has the highest possible rank, and since it is a square matrix of order n, its rank must be equal to the number n of the rows, which is equal to the number of columns, and to the order of the matrix

With a matrix of order 3, if we use the rows of the inverse matrix we get 3 equations, if we use the coefficients of the inverse matrix we get 9 equations; A is a 3x3 matrix, we look for the inverse following the formula AX = I, if we take the elements of X as unknowns we get 9 equations, X = [(a,b,c),(d,e,f),(g,h,i)]; if we take the rows X1, X2, X3, we get 3 equations and each unknown is a triple; A = [(1,2,1),(2,0,1),(0,0,-1)], find the inverse matrix of this 3x3 matrix; finding the inverse means writing the linear system that has the 3 rows of the inverse matrix as unknowns, each of which is a triple, the coefficient matrix is A, and the constant terms are the 3 rows of the 3x3 identity matrix; (A|I) = [(1,2,1)|(1,0,0),(2,0,1)|(0,1,0),(0,0,-1)|(0,0,1)], we have to reduce this system by rows, the matrix A is already reduced by rows and it has rank 3, therefore the inverse exists; being already reduced by rows we can start from the last row to solve the system; -X3 = (0,0,1), X3 = (0,0,-1); 2X1+X3 = (0,1,0), 2X1 = (0,1,0)-X3, X3 = (0,0,-1), 2X1 = (0,1,0)-(0,0,-1), 2X1 = (0,1,1), X1 = (0,1/2,1/2); X1+2X2+X3 = (1,0,0), 2X2 = (1,0,0)-X1-X3, X1 = (0,1/2,1/2), X3 = (0,0,-1), 2X2 = (1,0,0)-(0,1/2,1/2)-(0,0,-1), 2X2 = (1,-1/2,1/2), X2 = (1/2,-1/4,1/4); A-1 = [(0,1/2,1/2),(1/2,-1/4,1/4),(0,0,-1)]; A⋅A-1 = I, [(1,2,1),(2,0,1),(0,0,-1)][(0,1/2,1/2),(1/2,-1/4,1/4),(0,0,-1)] = [(1,0,0),(0,1,0),(0,0,1)]; I1,1 = 1⋅0+2(1/2)+1⋅0 = 0+1+0 = 1; I1,2 = 1(1/2)+2(-1/4)+1⋅0 = 1/2-1/2+0 = 0; I1,3 = 1(1/2)+2(1/4)+1(-1) = 1/2+1/2-1 = 0; I2,1 = 2⋅0+0(1/2)+1⋅0 = 0+0+0 = 0; I2,2 = 2(1/2)+0(-1/4)+1⋅0 = 1+0+0 = 1; I2,3 = 2(1/2)+0(1/4)+1(-1) = 1+0-1 = 0; I3,1 = 0⋅0+0(1/2)-1⋅0 = 0+0-0 = 0; I3,2 = 0(1/2)+0(-1/4)+(-1)0 = 0+0+0 = 0; I3,3 = 0(1/2)+0(1/4)+(-1)(-1) = 0+0+1 = 1
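
Solving AX = I with the rows of X as unknowns amounts to reducing the complete matrix (A|I) by rows. The Python sketch below is one possible way to automate this, under the assumption that a full reduction (zeros both below and above each pivot) is acceptable; the function name inverse is illustrative, and exact fractions are used.

```python
from fractions import Fraction

def inverse(rows):
    """Inverse of a square matrix by row reduction of (A|I); None if rank < n."""
    n = len(rows)
    # build the complete matrix (A|I)
    m = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(rows)]
    for col in range(n):
        pivot = next((i for i in range(col, n) if m[i][col] != 0), None)
        if pivot is None:
            return None  # no pivot in this column: the rank is less than n
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [x / p for x in m[col]]          # Ri -> (1/p)Ri
        for i in range(n):
            if i != col:
                factor = m[i][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[col])]  # Ri -> Ri-aRj
    # the right-hand block now contains the rows X1, ..., Xn of the inverse
    return [row[n:] for row in m]

print(inverse([(1, 2, 1), (2, 0, 1), (0, 0, -1)]))
print(inverse([(2, 1), (4, 2)]))
```

The first call reproduces the rows X1, X2, X3 found by hand; the second returns None because the matrix [(2,1),(4,2)] has rank 1.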

Calculate for which values of a parameter the matrix is invertible; the matrix is dependent on a parameter h, we must calculate for which real values of the parameter h the matrix is invertible and we must find the inverse matrix; A = [(1,h),(3,-1)], this matrix depends on the parameter h, we want to find the values of h for which matrix A is invertible; (A|I) = [(1,h)|(1,0),(3,-1)|(0,1)], calculate for which h this system admits solutions; this system admits solutions if and only if the rank of the matrix A is 2; A = [(1,h),(3,-1)], R2 → R2-3R1, [(1,h),(0,-1-3h)], the matrix is invertible for the values of h for which the rank is 2, that is for the values of h for which the two rows are linearly independent, and for the other values of h it is not invertible; the matrix is not invertible when -1-3h = 0, -3h = 1, h = -1/3, the matrix is invertible for h ≠ -1/3; we reduce by rows the complete matrix (A|I) = [(1,h)|(1,0),(3,-1)|(0,1)], R2 → R2-3R1, [(1,h)|(1,0),(0,-1-3h)|(-3,1)]; it is possible to find the inverse matrix for any value of h other than -1/3, for example h = -1, [(1,-1)|(1,0),(0,2)|(-3,1)]; 2X2 = (-3,1), X2 = (-3/2,1/2); X1-X2 = (1,0), X1 = (1,0)+X2, X2 = (-3/2,1/2), X1 = (1,0)+(-3/2,1/2) = (-1/2,1/2); A-1 = [(-1/2,1/2),(-3/2,1/2)]; A⋅A-1 = I, A = [(1,h),(3,-1)]; for h = -1, A = [(1,-1),(3,-1)]; for h = -1, A-1 = [(-1/2,1/2),(-3/2,1/2)]; [(1,-1),(3,-1)][(-1/2,1/2),(-3/2,1/2)] = [(1,0),(0,1)]; I1,1 = 1(-1/2)+(-1)(-3/2) = -1/2+3/2 = 2/2 = 1; I1,2 = 1(1/2)+(-1)(1/2) = 1/2-1/2 = 0; I2,1 = 3(-1/2)+(-1)(-3/2) = -3/2+3/2 = 0; I2,2 = 3(1/2)+(-1)(1/2) = 3/2-1/2 = 2/2 = 1; for any h different from -1/3 it is possible to find the inverse matrix of the matrix A, while for h = -1/3 it is not possible to solve the system
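
The dependence on the parameter h can also be explored numerically. The following Python sketch follows the reduced system [(1,h)|(1,0),(0,-1-3h)|(-3,1)] step by step; the names second_pivot and inverse_for are hypothetical, chosen only for this illustration.

```python
from fractions import Fraction

def second_pivot(h):
    # after R2 -> R2-3R1 the matrix [(1,h),(3,-1)] becomes [(1,h),(0,-1-3h)]
    return -1 - 3 * h

def inverse_for(h):
    """Rows X1, X2 of the inverse of [(1,h),(3,-1)], or None if it does not exist."""
    p = second_pivot(h)
    if p == 0:
        return None  # rank 1: the system (A|I) has no solution
    X2 = (Fraction(-3) / p, Fraction(1) / p)   # from p*X2 = (-3,1)
    X1 = (1 - h * X2[0], 0 - h * X2[1])        # from X1 + h*X2 = (1,0)
    return [X1, X2]

print(inverse_for(-1))
print(inverse_for(Fraction(-1, 3)))
```

For h = -1 the sketch returns the rows (-1/2,1/2) and (-3/2,1/2); for h = -1/3 it returns None.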

Linear systems are used to find inverse matrices, counter images, Ker(f); by Ker(f) we mean the nucleus of f: ℝn → ℝm, Mf = A, and AX = 0 is the system that must be solved; to find the counter image and the nucleus of a linear application we have to solve ordinary linear systems, that is linear systems in which the unknowns are real numbers; when we use the technique that reduces the number of equations and unknowns for finding the inverse matrix, the unknowns are not numbers but tuples, and the constant terms are not real numbers but tuples


28 - THE DETERMINANT OF A SQUARE MATRIX

A = [(a1,1,a1,2),(a2,1,a2,2)], is a square matrix 2x2; the determinant of A is denoted by |A|, or det(A), or |(a1,1,a1,2),(a2,1,a2,2)|; the determinant of a square matrix is a real number calculated from the coefficients of the matrix; det(A) = a1,1⋅a2,2-a1,2⋅a2,1; the determinant is given by the product of the elements of the main diagonal minus the product of the elements of the secondary diagonal; det[(1,3),(5,-2)] = 1(-2)-3⋅5 = -2-15 = -17

A = [(a1,1,a1,2,a1,3),(a2,1,a2,2,a2,3),(a3,1,a3,2,a3,3)], it is a square matrix 3x3; det(A) = a1,1|(a2,2,a2,3),(a3,2,a3,3)|-a1,2|(a2,1,a2,3),(a3,1,a3,3)|+a1,3|(a2,1,a2,2),(a3,1,a3,2)|

We consider a square matrix of order n, A = [(a1,1,a1,2,...,a1,n),(a2,1,a2,2,...,a2,n),...,(an,1,an,2,...,an,n)], and for each element of the matrix we define its algebraic complement; the algebraic complement of an element is the determinant of the matrix which is obtained by deleting the row and the column that cross in that element, taken with a sign; we calculate the algebraic complement of a2,1 obtaining a submatrix which has one row less and one column less than the matrix A, in this case we delete the second row and the first column obtaining the submatrix [(a1,2,...,a1,n),(a3,2,...,a3,n),...,(an,2,...,an,n)]; the determinant of this submatrix, taken with the appropriate sign, is the algebraic complement of the element a2,1; the sign is + when the row number of the element added to the column number of the element is an even number, while the sign is - when the row number of the element added to the column number of the element is an odd number; in this case the determinant is obtained by deleting row 2 and column 1, 2+1 = 3 which is an odd number, therefore the determinant is preceded by the sign -; the algebraic complement of the element a2,2 takes the sign + because 2+2 = 4 which is an even number

A = [(1,3,5,-1),(2,2,0,1),(1,0,1,4),(1,1,-1,0)]; calculate the algebraic complement of a2,3, the element in the second row and third column, which is 0; we delete the second row and the third column and we get the determinant |(1,3,-1),(1,0,4),(1,1,0)| of a 3x3 submatrix; the element is in row 2 and column 3, 2+3 = 5 which is an odd number, so the algebraic complement of a2,3 is -|(1,3,-1),(1,0,4),(1,1,0)|

The algebraic complement of an element is the determinant of the matrix that we get by deleting the row and column of the element, and the sign is positive if the row number of the element added to the column number of the element is an even number, while the sign is negative if the row number of the element added to the column number of the element is an odd number

To define the determinant of a 4x4 matrix, we take the elements of a row, for example the first, a1,1, a1,2, a1,3, a1,4, and we multiply each element of the row by its algebraic complement which is a 3x3 determinant; the algebraic complement of a1,1 is A1,1, the algebraic complement of a1,2 is A1,2, the algebraic complement of a1,3 is A1,3, the algebraic complement of a1,4 is A1,4; the determinant of the matrix A is therefore det(A) = a1,1⋅A1,1+a1,2⋅A1,2+a1,3⋅A1,3+a1,4⋅A1,4; to define the determinant of a 4x4 matrix we could also take the elements of a column, for example the first column, a1,1, a2,1, a3,1, a4,1, and we multiply each element of the column by its algebraic complement which is a determinant 3x3; the algebraic complement of a1,1 is A1,1, the algebraic complement of a2,1 is A2,1, the algebraic complement of a3,1 is A3,1, the algebraic complement of a4,1 is A4,1; the determinant of matrix A is therefore det(A) = a1,1⋅A1,1+a2,1⋅A2,1+a3,1⋅A3,1+a4,1⋅A4,1; this same rule is also used for larger square matrices, the elements of a row or a column are taken and multiplied by the algebraic complements and then the sum is made, in this way we can calculate the determinant of a square matrix of any order

In a 1x1 matrix A = [(a)], det (A) = a

Laplace's rule: the determinant of a square matrix A is obtained by multiplying the elements of a row, or column, by their algebraic complements and adding the results; det(A) = ai,1⋅Ai,1+ai,2⋅Ai,2+...+ai,n⋅Ai,n = a1,j⋅A1,j+a2,j⋅A2,j+...+an,j⋅An,j

Using Laplace's rule, calculate the determinant of the 3x3 square matrix A = [(1,0,1),(2,1,-1),(0,0,2)]; expanding along the second row, det[(1,0,1),(2,1,-1),(0,0,2)] = -2|(0,1),(0,2)|+1|(1,1),(0,2)|-(-1)|(1,0),(0,0)| = -2(0⋅2-1⋅0)+1(1⋅2-1⋅0)+1(1⋅0-0⋅0) = -2(0-0)+1(2-0)+1(0-0) = -2(0)+1(2)+1(0) = 0+2+0 = 2
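
Laplace's rule translates directly into a recursive procedure. The Python sketch below is an illustration only; the helper names minor, det and det_along_row are assumptions, and the recursion is far too slow for large matrices.

```python
def minor(m, i, j):
    # submatrix obtained by deleting row i and column j (indices start at 0)
    return [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]

def det_along_row(m, i):
    # Laplace's rule on row i: sum of each element times its algebraic complement
    return sum((-1) ** (i + j) * m[i][j] * det(minor(m, i, j))
               for j in range(len(m)))

def det(m):
    if len(m) == 1:
        return m[0][0]  # 1x1 matrix: the determinant is the only element
    return det_along_row(m, 0)

A = [[1, 0, 1], [2, 1, -1], [0, 0, 2]]
print(det_along_row(A, 1), det(A))
```

Expanding along the second row (index 1) or along the first row gives the same value, 2, as in the calculation above.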

Properties of the determinants: A is a square matrix of order n, and det(A) is the determinant of matrix A which is obtained using Laplace's rule; when a row of matrix A is null, or a column of matrix A is null, det(A) = 0; when there are two equal rows Ri = Rj, or when there are two equal columns Ci = Cj, det(A) = 0; if we multiply each element of the matrix A by a number a, then det(A') = det(aA) = aⁿ⋅det(A); if we multiply the elements of a single row of matrix A by a number a, then det(A') = a⋅det(A); if in a matrix A a row is the sum of two rows, that is each element of a row is the sum of two elements that we call a and b, the determinant of the matrix is equal to the determinant of the matrix that contains only the elements a in that row, added to the determinant of the matrix that contains only the elements b in that row, that is det[(a1,1+b1,1,...,a1,n+b1,n),(...)] = det[(a1,1,...,a1,n),(...)]+det[(b1,1,...,b1,n),(...)]

If A is a square matrix of order n, and if we apply elementary transformations to the rows or columns of matrix A, the determinant follows some rules; if we exchange 2 rows between them, the new matrix A' has opposite determinant to the determinant of matrix A, Ri ↔ Rj, det(A') = -det(A); if we multiply a row by a number other than 0, Ri → aRi, A → A', then det(A') = a⋅det(A); if Ri → Ri+aRj, the determinant of the new matrix does not change, A → A', det(A') = det(A)
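
The three rules for elementary transformations can be tried on a concrete matrix. The Python sketch below uses an arbitrary 3x3 example matrix and a small recursive determinant; both the matrix and the helper names are chosen here for illustration and are not taken from the notes.

```python
def det(m):
    # determinant by Laplace's rule on the first row
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j]
               * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

A = [[1, 2, 1], [2, 0, 1], [0, 0, -1]]

swapped = [A[1], A[0], A[2]]                                   # R1 <-> R2
scaled = [[3 * x for x in A[0]], A[1], A[2]]                   # R1 -> 3R1
added = [A[0], [a + 5 * b for a, b in zip(A[1], A[0])], A[2]]  # R2 -> R2+5R1

print(det(A), det(swapped), det(scaled), det(added))
```

The exchange changes the sign, the multiplication of one row by 3 multiplies the determinant by 3, and adding a multiple of another row leaves it unchanged.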

Matrix A is a 4x4 square matrix, A = [(1,1,1,1),(1,-1,0,2),(0,1,0,0),(-1,4,2,1)], we can calculate the determinant of this matrix using Laplace's rule on the third row, where the only non-null element is the second one, which is 1; we have to delete row 3 and column 2, and the determinant is preceded by the sign - because the element chosen is from row 3 and column 2 and 3+2 = 5 which is an odd number, det(A) = -1|(1,1,1),(1,0,2),(-1,2,1)|; C1 → C1-(1/2)C3, det(A) = -1|(1/2,1,1),(0,0,2),(-3/2,2,1)|, with this operation the determinant remains unchanged; now we develop the calculation using the second row, where the only non-null element is 2 which is in row 2 and column 3, 2+3 = 5 which is an odd number, so the sign is -, det(A) = (-1)⋅2⋅(-1)|(1/2,1),(-3/2,2)| = 2|(1/2,1),(-3/2,2)| = 2((1/2)2-1(-3/2)) = 2(1+3/2) = 2(5/2) = 5; we computed the determinant in a simpler way by making zeroes appear with the elementary transformations

Calculate the determinant of the matrix A = [(1,1,1,2),(1,3,2,1),(4,3,2,1),(-1,-1,2,5)]

The determinants are used to understand some properties of matrices; A is a square matrix of order n and we want to know when the matrix is invertible; if det(A) = 0, then matrix A is not invertible; if det(A) ≠ 0, then matrix A is invertible; A = [(1,2),(3,-1)], det(A) = 1(-1)-2(3) = -1-6 = -7, matrix A is invertible; B = [(1,4),(2,8)], det(B) = 1⋅8-4⋅2 = 8-8 = 0, matrix B is not invertible; invertible matrices have the highest possible rank, matrix A is invertible when det(A) ≠ 0, so ρ(A) = n; non-invertible matrices have rank less than n, matrix A is not invertible when det(A) = 0, so ρ(A) < n

The determinant is used to find the inverse matrix; if A is an invertible matrix because det(A) ≠ 0, the inverse matrix A-1 is equal to the reciprocal of the determinant of A multiplied by the matrix obtained by writing instead of the elements their algebraic complements, after having swapped the rows with the columns, A-1 = (1/det(A))[(A1,1,A2,1,...,An,1),...,(A1,n,A2,n,...,An,n)]; det(A-1) = 1/det(A)

A = [(1,3),(-1,5)]; det(A) = 1⋅5-3(-1) = 5+3 = 8, 8 ≠ 0, so matrix A is invertible; the algebraic complement of a1,1 = 1 is A1,1 = 5; the algebraic complement of a1,2 = 3 is A1,2 = 1; the algebraic complement of a2,1 = -1 is A2,1 = -3; the algebraic complement of a2,2 = 5 is A2,2 = 1; A-1 = (1/8)[(5,-3),(1,1)] = [(5/8,-3/8),(1/8,1/8)]; A⋅A-1 = I, [(1,3),(-1,5)][(5/8,-3/8),(1/8,1/8)] = [(1,0),(0,1)], i1,1 = 1(5/8)+3(1/8) = 5/8+3/8 = 8/8 = 1, i1,2 = 1(-3/8)+3(1/8) = -3/8+3/8 = 0, i2,1 = -1(5/8)+5(1/8) = -5/8+5/8 = 0, i2,2 = -1(-3/8)+5(1/8) = 3/8+5/8 = 8/8 = 1
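
The rule of algebraic complements can be written as a short program. The following Python sketch assumes a matrix of order at least 2 with integer entries; the helper names minor, det, cofactor and inverse are illustrative.

```python
from fractions import Fraction

def minor(m, i, j):
    # submatrix obtained by deleting row i and column j (indices start at 0)
    return [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]

def det(m):
    # determinant by Laplace's rule on the first row
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det(minor(m, 0, j)) for j in range(len(m)))

def cofactor(m, i, j):
    # algebraic complement of the element in row i and column j
    return (-1) ** (i + j) * det(minor(m, i, j))

def inverse(m):
    """Inverse by algebraic complements (assumes order >= 2 and integer entries)."""
    d = det(m)
    if d == 0:
        return None  # det(A) = 0: the matrix is not invertible
    n = len(m)
    # rows and columns are swapped: entry (i,j) uses the complement of a(j,i)
    return [[Fraction(cofactor(m, j, i), d) for j in range(n)] for i in range(n)]

print(inverse([[1, 3], [-1, 5]]))
```

For A = [(1,3),(-1,5)] the sketch returns the rows (5/8,-3/8) and (1/8,1/8), as obtained above.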

A square matrix A of order n is orthogonal if the transpose of A is equal to the inverse of A, that is if A is an orthogonal matrix then AT = A-1, therefore orthogonal matrices are invertible matrices; the determinant of an orthogonal matrix A can be 1 or -1, that is if A is an orthogonal matrix then det(A) = 1, or det(A) = -1; there are orthogonal matrices with determinant equal to 1, and there are orthogonal matrices with determinant equal to -1; an orthogonal matrix is called special when it has determinant equal to 1; a matrix is square when the number of rows n equals the number of columns m, that is, when n = m; a square matrix is generically A = [(a1,1,...,a1,n),...,(an,1,...,an,n)]; if A is an orthogonal square matrix then A⋅AT = I ⇔ AT = A-1; if A is an orthogonal square matrix then det(A) = ±1; if det(A) = 1 then the orthogonal square matrix A is special

A = [(cos(α),-sin(α)),(sin(α),cos(α))], this is a special orthogonal matrix because det(A) = cos(α)cos(α)-(-sin(α))sin(α) = cos(α)cos(α)+sin(α)sin(α) = (cos(α))²+(sin(α))² = 1
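
A numerical check of this example is sketched below in Python. Since cos and sin are computed in floating point, the comparisons use a small tolerance; the helper names and the tolerance 1e-12 are assumptions made for this illustration.

```python
import math

def rotation(alpha):
    c, s = math.cos(alpha), math.sin(alpha)
    return [[c, -s], [s, c]]

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def det2(m):
    # product of the main diagonal minus product of the secondary diagonal
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def is_special_orthogonal(m, tol=1e-12):
    p = matmul(m, transpose(m))  # should be the identity matrix
    identity_ok = all(math.isclose(p[i][j], float(i == j), abs_tol=tol)
                      for i in range(2) for j in range(2))
    return identity_ok and math.isclose(det2(m), 1.0, abs_tol=tol)

print(is_special_orthogonal(rotation(0.7)))
```

A matrix such as [(0,1),(1,0)] is orthogonal too, but its determinant is -1, so the same check reports that it is not special.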

Laplace's rule: det(A) = ai,1⋅Ai,1+ai,2⋅Ai,2+...+ai,n⋅Ai,n = a1,j⋅A1,j+a2,j⋅A2,j+...+an,j⋅An,j; A is invertible if and only if det(A) ≠ 0; det(A-1) = 1/det(A); if det(A) ≠ 0, then A-1 = (1/det(A))[(A1,1,...,An,1),...,(A1,n,...,An,n)]

det(A⋅B) = det(A)⋅det(B)


29 - CRAMER'S RULE

Cramer's rule is used to solve linear systems that have one and only one solution; Cramer's rule must be modified when a solvable linear system has infinite solutions that is when it has free unknowns

The determinant of a matrix with a row and a column is equal to the only element present in the matrix, this is a special case of Laplace's rule

Considering a linear system AX = B, A is the matrix of the coefficients, X is the column of unknowns, B is the column of known terms; suppose that our system is solvable, and therefore ρ(A) = ρ(A|B), that is the matrix of the coefficients and the complete matrix must have the same rank; if this system has m equations and n unknowns, there is one and only one solution when number of free unknowns = n-ρ(A) = 0, that is the matrix A and the complete matrix (A|B) must have rank equal to the number of unknowns; we are not interested in the number of equations m, but in the number of independent equations, which equals the rank, that is the number of independent rows of the matrices A and (A|B); the equations that are not interesting, which are linear combinations of others, must be discarded, and after discarding them the rank of A is also equal to the number of remaining equations, ρ(A) = m; therefore ρ(A) is equal to the number of unknowns and is equal to the number of equations after having canceled the superfluous equations, so n = m, so the system is square; the first condition for using Cramer's rule is that the system must have n equations and n unknowns; if the system is solvable and ρ(A) = n, the system has one and only one solution; to find the solutions of a solvable linear system, the complete matrix (A|B) must be reduced by rows

A = [(2,1),(1,3)], B = [(0),(1)], 2x+y = 0, x+3y = 1; it is a system with 2 equations and 2 unknowns; we want to solve this system not with the method of reducing the rows, but using the determinant; in matrix form this system is written AX = B, the matrix A is a square matrix of order n and rank n, ρ(A) = n = 2; det(A) = 2⋅3-1⋅1 = 6-1 = 5 ≠ 0, it means that the matrix has maximum rank ρ(A) = 2 and is invertible so the inverse matrix A-1 exists; A⋅X = B, A-1⋅A⋅X = A-1⋅B, I⋅X = A-1⋅B, X = A-1⋅B, this is the only solution of the linear system; we write the inverse matrix A-1 with the rule of algebraic complements, A-1 = (1/5)[(3,-1),(-1,2)] = [(3/5,-1/5),(-1/5,2/5)]; X = A-1⋅B = [(3/5,-1/5),(-1/5,2/5)][(0),(1)], x1,1 = (3/5)0+(-1/5)1 = 0-1/5 = -1/5, x2,1 = (-1/5)0+(2/5)1 = 0+2/5 = 2/5, X = [(-1/5),(2/5)]; x1,1 = -1/5 = x, x2,1 = 2/5 = y, so x = -1/5 and y = 2/5; this is a way of finding the solution using the inverse matrix; A = [(2,1),(1,3)], B = [(0),(1)], we replace the first column of matrix A with column B of the constant terms obtaining [(0,1),(1,3)], and from this new matrix we get the determinant |(0,1),(1,3)| = 0⋅3-1⋅1 = 0-1 = -1, and we divide this number by the determinant of the matrix A which is 5, and we get -1/5 which is the value of the unknown x; A = [(2,1),(1,3)], B = [(0),(1)], we replace the second column of matrix A with column B of the constant terms, obtaining [(2,0),(1,1)], and from this new matrix we get the determinant |(2,0),(1,1)| = 2⋅1-0⋅1 = 2-0 = 2, and we divide this number by the determinant of the matrix A which is 5, and we get 2/5 which is the value of the unknown y; when a square linear system has non-zero determinant, and therefore only one solution, this unique solution can be obtained with this method

Cramer's rule: A is an invertible square matrix of order n, A⋅X = B is a linear system of n equations in n unknowns that admits one and only one solution, the solution is unique because the unknowns are n and A is a matrix of rank n, so there are no free unknowns; the only solution is equal to the product of the inverse matrix of A by the matrix of constant terms which is a column, following the formula X = A-1⋅B = (1/det(A))[(A1,1,...,An,1),...,(A1,n,...,An,n)][(b1),...,(bn)], where the capital letters A indicate the algebraic complements of the elements of the matrix A after having swapped the rows with the columns, and matrix B is a column that contains the constant terms; the unknowns contained in the matrix X = [(x1),...,(xn)], which is a column, are obtained with the formula xi = det(αi)/det(A), where by αi we mean the matrix A with column i replaced by the column of constant terms, the first unknown is obtained by replacing the first column, the second unknown is obtained by replacing the second column, and so on

AX = B, A = [(1,2,-1),(0,4,2),(0,0,3)], B = [(0),(1),(0)]; this is a non-homogeneous system with 3 equations and 3 unknowns; the matrix A has 3 rows and 3 columns, it is reduced by rows, there are no null rows, therefore ρ(A) = 3 = ρ(A|B) and therefore the system is solvable, and since the unknowns are 3 and the rank is 3, 3-3 = 0, there are no free unknowns and the system has one and only one solution; the only solution X could be obtained by multiplying the inverse matrix of A by the column of known terms B, X = A-1⋅B, but this would require calculating the inverse matrix A-1, so we use Cramer's rule instead; calculate the determinant using row 3, det(A) = 3|(1,2),(0,4)| = 3(1⋅4-2⋅0) = 3(4-0) = 3⋅4 = 12; x = |(0,2,-1),(1,4,2),(0,0,3)|/12 = 3|(0,2),(1,4)|/12 = 3(0⋅4-2⋅1)/12 = 3(0-2)/12 = 3(-2)/12 = -6/12 = -1/2, determinant calculated using row 3, x = -1/2; y = |(1,0,-1),(0,1,2),(0,0,3)|/12 = 1|(1,2),(0,3)|/12 = 1(1⋅3-2⋅0)/12 = 1(3-0)/12 = 1(3)/12 = 3/12 = 1/4, determinant calculated using column 1, y = 1/4; z = |(1,2,0),(0,4,1),(0,0,0)|/12 = 0/12 = 0, since row 3 is null, the determinant calculated using row 3 is 0, therefore z = 0; the solution of the system is x = -1/2, y = 1/4, z = 0; this system has one and only one solution and it is the triad (-1/2,1/4,0)
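
Cramer's rule for a square system with non-zero determinant can be sketched in a few lines of Python. The function names det and cramer are illustrative, integer coefficients are assumed, and the determinant is computed with Laplace's rule on the first row.

```python
from fractions import Fraction

def det(m):
    # determinant by Laplace's rule on the first row
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j]
               * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def cramer(A, B):
    """Unique solution of AX = B for a square A with det(A) != 0."""
    d = det(A)
    if d == 0:
        raise ValueError("det(A) = 0: Cramer's rule cannot be applied directly")
    solution = []
    for j in range(len(A)):
        # replace column j of A with the column of constant terms
        Aj = [row[:j] + [b] + row[j + 1:] for row, b in zip(A, B)]
        solution.append(Fraction(det(Aj), d))
    return solution

print(cramer([[1, 2, -1], [0, 4, 2], [0, 0, 3]], [0, 1, 0]))
print(cramer([[2, 1], [1, 3]], [0, 1]))
```

The two calls give (-1/2,1/4,0) and (-1/5,2/5), the solutions found by hand in the two examples.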

A = [(1,2,0),(3,1,-1)], B = [(0),(2)]; this system has 2 equations and 3 unknowns, ρ(A) = 2, there are 2 rows, the system is solvable, and since the unknowns are 3, 3-2 = 1, therefore there is a free unknown; it is not a system with one and only one solution, it is a system that has more than one solution, ∞¹ solutions dependent on a free unknown, but it is possible to solve this system using Cramer's rule; it is necessary to identify the free unknown, therefore we take inside the matrix A a submatrix formed by 2 rows and 2 columns in such a way that the matrix we obtain has the maximum possible rank; the unknown x is column 1, the unknown y is column 2, the unknown z is column 3; deleting column 3 of matrix A we obtain matrix A' = [(1,2),(3,1)] which is a square matrix, det(A') = 1⋅1-2⋅3 = 1-6 = -5 ≠ 0, therefore A' is an invertible square matrix; we move the unknown z, that is column 3, from matrix A to matrix B of the constant terms, considering that the system is {x+2y = 0, 3x+y-z = 2}, that is {x+2y = 0, 3x+y = z+2}, therefore the new column of constant terms is B' = [(0),(z+2)], so we can assign to z any value of the field of real numbers; the new system A'⋅[(x),(y)] = B' has 2 equations and 2 unknowns, and the constant term depends on the free unknown z, and this new system can be solved with Cramer's rule; x = |(0,2),(z+2,1)|/-5 = (0⋅1-2(z+2))/-5 = (0-2z-4)/-5 = (-2z-4)/-5 = -(2z+4)/-5 = (2z+4)/5, x = (2z+4)/5; y = |(1,0),(3,z+2)|/-5 = (1(z+2)-0⋅3)/-5 = (z+2-0)/-5 = (z+2)/-5, y = -(z+2)/5; we have found x and y as a function of the free unknown z, so the system has infinite solutions; we used Cramer's rule, after having moved the free unknown z into matrix B which is the column of constant terms
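
The formulas for x and y as functions of the free unknown z can be checked by substituting them back into the two equations. The Python sketch below does this for integer values of z; the names det2 and solve are illustrative.

```python
from fractions import Fraction

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def solve(z):
    """x and y from Cramer's rule on A' = [(1,2),(3,1)], B' = [(0),(z+2)]."""
    d = det2([[1, 2], [3, 1]])                    # det(A') = -5
    x = Fraction(det2([[0, 2], [z + 2, 1]]), d)   # first column replaced by B'
    y = Fraction(det2([[1, 0], [3, z + 2]]), d)   # second column replaced by B'
    return x, y

for z in range(-3, 4):
    x, y = solve(z)
    # both original equations must hold for every value of the free unknown
    assert x + 2 * y == 0
    assert 3 * x + y - z == 2

print(solve(0), solve(3))
```

Each value of z gives a different solution (x,y,z), which is what is meant by infinite solutions dependent on one free unknown.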

If we want to use Cramer's rule, the system AX = B must be solvable, that is ρ(A) = ρ(A|B), where with ρ(A) we indicate the rank of the matrix A and with ρ(A|B) we indicate the rank of the complete matrix (A|B); when a system does not have only one solution, then it has infinite solutions, and this depends on the presence of free unknowns; number of free unknowns = n-ρ(A), where n means the number of unknowns and ρ(A) is the rank of matrix A; we cancel the superfluous equations or the linearly dependent equations, and we obtain only linearly independent equations; a free unknown is identified by deleting columns and obtaining a square submatrix of maximum rank or with a determinant other than zero; and the unknowns outside the square matrix are moved to the column of constant terms; at this point we can apply Cramer's rule

{x+y+z+t = 0, x-y-z-t = 0, 2x+y = 0} is a homogeneous system of 3 equations and 4 unknowns; the coefficient matrix is A = [(1,1,1,1),(1,-1,-1,-1),(2,1,0,0)]; to calculate the rank we have to reduce the matrix, A = [(1,1,1,1),(1,-1,-1,-1),(2,1,0,0)], R2 → R2+R1, [(1,1,1,1),(2,0,0,0),(2,1,0,0)], R2 ↔ R3, [(1,1,1,1),(2,1,0,0),(2,0,0,0)], therefore the rank of this matrix is 3, ρ(A) = 3; number of free unknowns = n-ρ(A), where n indicates the number of unknowns and ρ(A) the rank of the matrix, n-ρ(A) = 4-3 = 1, therefore there is a free unknown; to identify the free unknown we must look for a 3x3 submatrix with a determinant other than 0; we want to understand if the unknown t can be chosen as a free unknown, and then we take the submatrix consisting of rows 1, 2, 3 and columns 1, 2, 3, and calculate the determinant using row 3, |(1,1,1),(1,-1,-1),(2,1,0)| = 2|(1,1),(-1,-1)|-1|(1,1),(1,-1)| = 2(1(-1)-1(-1))-1(1(-1)-1⋅1) = 2(-1+1)-1(-1-1) = 2(0)-1(-2) = 0+2 = 2, the determinant is 2, so t can be chosen as a free unknown; the homogeneous system can be rewritten as a non-homogeneous system, by moving the unknown t into the column of constant terms, {x+y+z = -t, x-y-z = t, 2x+y = 0}; the unknown t is the free unknown, that is the unknown to which we can assign a value at will, and now we can solve the system using Cramer's rule, so we can find the unknowns x, y, z, as functions of the free unknown t
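
The notes stop before carrying out the last step. As a hedged illustration, the Python sketch below applies Cramer's rule to the rewritten system {x+y+z = -t, x-y-z = t, 2x+y = 0} for integer values of t; the names det3 and solve are assumptions made for this example.

```python
from fractions import Fraction

def det3(m):
    # 3x3 determinant by Laplace's rule on the first row
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def solve(t):
    """(x, y, z) as functions of the free unknown t, by Cramer's rule."""
    A1 = [[1, 1, 1], [1, -1, -1], [2, 1, 0]]   # columns of x, y, z
    B1 = [-t, t, 0]                            # column of constant terms
    d = det3(A1)                               # equal to 2, as computed above
    solution = []
    for j in range(3):
        Aj = [row[:j] + [b] + row[j + 1:] for row, b in zip(A1, B1)]
        solution.append(Fraction(det3(Aj), d))
    return tuple(solution)

for t in range(-3, 4):
    x, y, z = solve(t)
    # check the three equations of the original homogeneous system
    assert x + y + z + t == 0 and x - y - z - t == 0 and 2 * x + y == 0

print(solve(1), solve(5))
```

In this sketch the result is x = 0, y = 0, z = -t for every t, so the solutions of the homogeneous system are the quadruples (0,0,-t,t).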


30 - COMPLEX NUMBERS - PART 1

ax²+bx+c = 0 is a quadratic equation, therefore non-linear, with a, b, c, 3 real numbers and a ≠ 0; the solutions of this equation can be obtained with the quadratic formula x = (-b±√(b²-4ac))/(2a); it is possible that a second degree equation with real coefficients has no real solutions, because if b²-4ac is a negative number its square root cannot be computed in the field of real numbers; 3x²+10 = 0, b²-4ac = 0-4⋅3⋅10 = 0-120 = -120, the square root of a negative number cannot be calculated in the field of real numbers, so there are no solutions in the field of real numbers; x²+1 = 0, x² = -1, no real number squared can result in -1; no real number squared can result in a negative number; in the field of real numbers not all second degree equations can be solved; x⁴+20 = 0, there is no solution in the field of real numbers, the fourth root of -20 does not mean anything; in the field of real numbers not all equations of degree higher than the first can be solved; complex numbers have been introduced to obtain solutions that do not exist in the field of real numbers; in the field of real numbers √(-1) does not make sense

√(-1) does not exist in the field of real numbers, so we introduce the symbol i such that i² = -1

Complex numbers are formal expressions of the type a+bi or a+ib, where a and b are real numbers and i is a symbol such that i² = -1; a is the real part of the complex number, b is the imaginary coefficient, bi is the imaginary part of the complex number; a+0i is a real number; 0+bi is a pure imaginary number; a+bi is a complex number; a-bi is the complex conjugate of a+bi; z = a+bi, z̄ = a-bi, z usually indicates a complex number, z̄ usually indicates its complex conjugate

3+2i, 3 is the real part, 2i is the imaginary part, 2 is the imaginary coefficient

Complex numbers a+bi, with a and b real numbers and i² = -1, can be added together

(a+bi)+(a'+b'i) = (a+a')+i(b+b'), sum of 2 complex numbers; (3+2i)+(5-7i) = 8-5i

(a+bi)+(a'+b'i) = (a'+b'i)+(a+bi), commutative property of the sum of complex numbers

((a+bi)+(a'+b'i))+(a''+b''i) = (a+bi)+((a'+b'i)+(a''+b''i)), associative property of the sum of complex numbers

Existence of the zero, 0+0i; (a+bi)+(0+0i) = a+bi, zero is the number that, added to any other, leaves it unchanged

Existence of the opposite, (-a)+(-b)i = -a-bi; (a+bi)+(-a-bi) = 0, a number added to its opposite gives 0 as result

Complex numbers a+bi, with a and b real numbers and i² = -1, can be multiplied with each other

(a+bi)(a'+b'i) = aa'+ab'i+a'bi+bb'i² = aa'+ab'i+a'bi-bb' = (aa'-bb')+i(ab'+a'b), product of 2 complex numbers

(a+bi)(a'+b'i) = (a'+b'i)(a+bi), commutative property of the product of complex numbers

((a+bi)(a'+b'i))(a''+b''i) = (a+bi)((a'+b'i)(a''+b''i)), associative property of the product of complex numbers

Existence of the neutral element, 1+0i; (1+0i)(a+bi) = a+bi, the complex number 1+0i is a neutral element because multiplied by any other leaves it unchanged

The difference between complex numbers and real numbers is given by the number i; in the field of real numbers there is no number that multiplied by itself gives -1; the complex number i, also called imaginary unit, multiplied by itself gives as a result -1

(0+1i)(0+1i) = 0⋅0+0⋅1i+1i⋅0+1i⋅1i = i² = -1

z = a+bi, z̄ = a-bi, zz̄ = (a+bi)(a-bi) = aa-abi+abi-bbii = a²-b²i² = a²-b²(-1) = a²+b², the product of a complex number by its conjugate is the real number a²+b², which is the square of the modulus of z, |z| = |z̄| = √(a²+b²)

If z = 0 then a = 0 and b = 0, so |z| = |z̄| = 0; if z ≠ 0, then a ≠ 0 or b ≠ 0, so |z| = |z̄| ≠ 0

r ∈ ℝ, r ≠ 0, ∃ r⁻¹ = 1/r, called inverse or reciprocal of r, such that r(1/r) = 1; z ∈ ℂ, z = a+bi, z ≠ 0, 1/z = 1/(a+bi); z = a+bi ≠ 0, 1/z = z⁻¹ = (a/(a²+b²))-(b/(a²+b²))i = (a/|z|²)-(b/|z|²)i = z̄/|z|²; (a+bi)((a/(a²+b²))-(b/(a²+b²))i) = (a²/(a²+b²))-(ab/(a²+b²))i+(ab/(a²+b²))i-(b²/(a²+b²))i² = (a²/(a²+b²))+(b²/(a²+b²)) = (a²+b²)/(a²+b²) = 1

Calculate the inverse of the complex number z = 1+2i; z = 1+2i ≠ 0, a = 1, b = 2, |z| = √(a²+b²) = √(1²+2²) = √(1+4) = √5; 1/z = z⁻¹ = (a/(a²+b²))-(b/(a²+b²))i = (a/|z|²)-(b/|z|²)i = (1/(√5)²)-(2/(√5)²)i = (1/5)-(2/5)i = (1-2i)/5; 1/z = z̄/|z|²
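
The inverse formula can be checked numerically with Python's built-in complex type. This is a minimal sketch: the function name inverse is an assumption made for this example, and floating-point results are only approximate.

```python
def inverse(z):
    """Inverse of a non-zero complex number, computed as conjugate(z) / |z|^2."""
    a, b = z.real, z.imag
    modulus_squared = a * a + b * b
    if modulus_squared == 0:
        raise ZeroDivisionError("0 has no inverse")
    return complex(a / modulus_squared, -b / modulus_squared)

z = complex(1, 2)      # z = 1+2i
w = inverse(z)
print(w)               # approximately 1/5 - (2/5)i
print(z * w)           # approximately 1
```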

i² = -1; i³ = i²i = -1i = -i; i⁴ = i²i² = (-1)(-1) = 1

Calculate 1/i; z = i, a = 0, b = 1; 1/z = z̄/|z|²; the conjugate of i = 0+1i is ī = 0-1i = -i; |i|² = a²+b² = 0²+1² = 1; 1/i = ī/|i|² = -i/1 = -i; i(-i) = -i² = 1, so the reciprocal of i is -i

To calculate the quotient of two complex numbers, we multiply the numerator by the inverse of the denominator; (2+i)/i = (2+i)(1/i) = (2+i)(-i) = -2i+i(-i) = -2i-i² = -2i-(-1) = -2i+1 = 1-2i

In the field of complex numbers we can solve x²+1 = 0: x² = -1, x = ±√(-1), so the two solutions are x1 = i and x2 = -i

ax²+bx+c = 0, x = (-b±√(b²-4ac))/(2a), ∆ = b²-4ac; a second degree equation has no solutions in the field of real numbers when the discriminant ∆ = b²-4ac < 0; for example if ∆ = -20, ±√∆ = ±√(-20) = ±√(-1⋅20) = ±i√20; therefore when a second degree equation has a negative discriminant, the solutions can be found in the field of complex numbers

ax²+bx+c = 0 is a second degree equation with real coefficients, but we can also solve second degree equations with complex coefficients, such as x²+2ix+5 = 0; x = (-b±√(b²-4ac))/(2a) = (-2i±√((2i)²-4⋅1⋅5))/(2⋅1) = (-2i±√(-4-20))/2 = (-2i±√(-24))/2 = (-2i±i√24)/2 = -i±(i√24)/2 = i(-1±(√24)/2) = i(-1±√6)
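
The quadratic formula with complex coefficients can be tried out with the standard cmath module. This is a sketch under the assumption that a ≠ 0; the name solve_quadratic is chosen only for this example.

```python
import cmath

def solve_quadratic(a, b, c):
    """Both roots of a*x^2 + b*x + c = 0; a, b, c may be real or complex."""
    if a == 0:
        raise ValueError("a must be non-zero for a second degree equation")
    # cmath.sqrt returns one complex square root of the discriminant;
    # the other square root is its opposite, which the ± takes care of.
    root = cmath.sqrt(b * b - 4 * a * c)
    return (-b + root) / (2 * a), (-b - root) / (2 * a)

# x^2 + 2ix + 5 = 0, the example above
x1, x2 = solve_quadratic(1, 2j, 5)
print(x1, x2)

# x^2 + 1 = 0 has the two roots i and -i
print(solve_quadratic(1, 0, 1))
```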

In the field of real numbers ℝ it is possible to define vector spaces, but also in the field of complex numbers ℂ it is possible to define vector spaces

ℂ² is the set of pairs of complex numbers (z1,z2), (z1,z2)+(z1',z2') = (z1+z1',z2+z2'); α(z1,z2) = (αz1,αz2); vector spaces in the field of complex numbers have the same properties as vector spaces in the field of real numbers

V is a vector space in the field of complex numbers ℂ if the sum of two vectors and the product of a vector by a complex number are defined, with the same properties as in the real case; ℝ, ℝ², ℝ³,...,ℝⁿ are examples of real vector spaces; ℂ, ℂ², ℂ³,...,ℂⁿ are examples of complex vector spaces; the elements of ℂⁿ are the n-tuples of complex numbers; in complex vector spaces the generators, the bases, the independence, the dimension, exactly all that is valid in the field of real numbers, can be defined; a base of ℂ² is formed by the 2 pairs (1,0), (0,1), that is each pair of complex numbers can be written as a combination of them; the pair of complex numbers (z,z') can be written as the linear combination z(1,0)+z'(0,1); matrices with complex elements can be defined

We want to know if the vectors (i,0,1), (0,i,0), (0,0,2), which are 3 triples, elements of ℂ³, are linearly independent in ℂ³; we have to reduce by rows the matrix A = [(i,0,1),(0,i,0),(0,0,2)]; the matrix is already reduced by rows, therefore ρ(A) = 3, so these 3 vectors of ℂ³ are linearly independent; ℂ³ has dimension 3, so these 3 linearly independent vectors are a base of ℂ³

Complex vector spaces have a double nature of complex vector space and real vector space because, if we multiply a real number by the complex numbers of a tuple, we obtain complex numbers; real vector spaces do not have the double nature of complex vector space and real vector space because, if we multiply a complex number by the real numbers of a tuple, we do not obtain real numbers; α(z,z') = (αz,αz'), αz and αz' are complex numbers; i(x,y) = (ix,iy), ix and iy are not real numbers

If we multiply a real number by a tuple of complex numbers we get a tuple of complex numbers; 3(i,2i) = (3i,6i); ℂ2 also has the nature of a real vector space; every complex vector space also has the nature of a real vector space

ℂ² has dimension 2 over ℂ, but has dimension 4 over ℝ; in ℂ² we have the 4 vectors (1,0), (0,1), (i,0), (0,i), and these 4 pairs form a base of ℂ² over ℝ, that is each pair (a+bi,a'+b'i) can be written as a linear combination with real coefficients of the 4 vectors (1,0), (0,1), (i,0), (0,i), (a+bi,a'+b'i) = a(1,0)+a'(0,1)+b(i,0)+b'(0,i); therefore ℂ² has dimension 4 over the field of real numbers ℝ

If the vector space V has dimension n on ℂ, then V has dimension 2n on ℝ

{ix+y = 0, x+2iy = 1} is a linear system with complex coefficients and the associated matrix is [(i,1)|(0),(1,2i)|(1)]; to solve this system we must first reduce it by rows
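
As a quick check of this system, here is a sketch that uses Cramer's rule for a 2x2 system instead of the row reduction mentioned above (the swap is only for brevity). The function name solve_2x2 is an assumption made for this example.

```python
def solve_2x2(a11, a12, a21, a22, b1, b2):
    """Solve {a11*x + a12*y = b1, a21*x + a22*y = b2} with Cramer's rule."""
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("the coefficient matrix must have a non-zero determinant")
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - b1 * a21) / det
    return x, y

# {ix + y = 0, x + 2iy = 1}
x, y = solve_2x2(1j, 1, 1, 2j, 0, 1)
print(x, y)                      # approximately x = 1/3, y = -(1/3)i
print(1j * x + y, x + 2j * y)    # approximately 0 and 1
```

The determinant of the coefficient matrix is i⋅2i-1⋅1 = -3, so the system has the single solution x = 1/3, y = -i/3.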


31 - COMPLEX NUMBERS - PART 2

The complex plane, or Argand-Gauss plane, is the plane formed by the complex numbers, with a Cartesian coordinate system such that the x-axis, called real axis, is formed by the real numbers, and the y-axis, called imaginary-axis, is formed by the imaginary numbers

z = x+yi, x is the real part and y is the coefficient of the imaginary part; in the Argand-Gauss plane the complex number z is represented by a point which has the real part as x coordinate and the coefficient of the imaginary part as y coordinate

In the Argand-Gauss plane we can trace a vector that starts from the origin o and arrives at the point z, called the vector oz; the vector oz has a length, that is the modulus, which we denote by ρ; the modulus ρ is never negative, and it is 0 only if z coincides with the origin, that is when z = 0+0i; with θ we denote the angle that the vector oz forms with the positive x-axis; the modulus ρ can vary from 0 to +∞, the angle θ can vary from 0 to 2π; x = ρ·cos(θ), y = ρ·sin(θ); z = x+yi = ρ·cos(θ)+ρ·sin(θ)·i = ρ(cos(θ)+i·sin(θ)), this is the trigonometric form of a complex number; ρ is the modulus or the length of the vector oz, and θ is the argument or the angle formed by the vector oz with the x-axis; for the same value of modulus ρ, different angles θ can lead to the same point z, when θ' = θ±2π, in fact the same point is obtained again with a periodicity of 2π; x = ρ·cos(θ), x² = (ρ·cos(θ))² = ρ²(cos(θ))², y = ρ·sin(θ), y² = (ρ·sin(θ))² = ρ²(sin(θ))², x²+y² = ρ²(cos(θ))²+ρ²(sin(θ))² = ρ²((cos(θ))²+(sin(θ))²) = ρ², ρ² = x²+y², ρ = √(x²+y²) = |z|; if x ≠ 0, y/x = (ρ·sin(θ))/(ρ·cos(θ)) = sin(θ)/cos(θ) = tan(θ); the angle θ is an arc whose tangent is equal to y/x, chosen in the quadrant that contains z; cos(θ) = x/ρ, sin(θ) = y/ρ

x = ρ·cos(θ); y = ρ·sin(θ); x+iy = ρ(cos(θ)+i·sin(θ)); ρ = √(x²+y²) = |z|; cos(θ) = x/ρ = x/√(x²+y²) = x/|z|; sin(θ) = y/ρ = y/√(x²+y²) = y/|z|

z = x+iy, x = 1, y = 0, z = 1+i·0 = 1, the complex number z coincides with the real number 1, ρ = 1, θ = 0; positive real numbers have θ = 0, negative real numbers have θ = π, pure imaginary numbers with positive imaginary coefficient have θ = π/2, pure imaginary numbers with negative imaginary coefficients have θ = (3/2)π

z = 1+i, x = 1, y = 1, θ = π/4, ρ = √(x²+y²) = √(1²+1²) = √(1+1) = √2, z = x+iy = ρ(cos(θ)+i·sin(θ)), z = 1+i = √2(cos(π/4)+i·sin(π/4)); this is the trigonometric form of z = 1+i, because √2(cos(π/4)+i·sin(π/4)) = √2(√2/2+i√2/2) = √2(√2/2)(1+i) = (2/2)(1+i) = 1(1+i) = 1+i

z = 0+1i = i, x = 0, y = 1, θ = π/2, ρ = √(x²+y²) = √(0²+1²) = √(0+1) = √1 = 1, z = x+iy = ρ(cos(θ)+i·sin(θ)), z = 0+i = 1(cos(π/2)+i·sin(π/2)), in fact z = 1(cos(π/2)+i·sin(π/2)) = 1(0+i·1) = 1(i) = i

The trigonometric form of complex numbers is useful for simplifying the multiplication operation between complex numbers; z1 = 1+i, z2 = 1-i; z2 is the symmetrical of z1 with respect to the x-axis; z1 = 1+i = √2(cos(π/4)+i·sin(π/4)), z2 = 1-i = √2(cos(7π/4)+i·sin(7π/4)); z1·z2 = 2(cos(π/4)·cos(7π/4)-sin(π/4)·sin(7π/4)+i(cos(π/4)·sin(7π/4)+sin(π/4)·cos(7π/4))) = 2(cos(2π)+i·sin(2π)) = 2(1+i·0) = 2(1+0) = 2(1) = 2, and this is confirmed by z1·z2 = (1+i)(1-i) = 1·1+1(-i)+i·1+i(-i) = 1-i+i-i² = 1-(-1) = 1+1 = 2
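
The conversion between the form x+iy and the trigonometric form can be sketched with the standard cmath and math modules. The function names to_trigonometric and from_trigonometric are assumptions made for this example; cmath.phase returns an angle in (-π, π], so it is shifted here to the range from 0 to 2π used in these notes.

```python
import cmath
import math

def to_trigonometric(z):
    """Return (rho, theta) with rho = |z| and theta in [0, 2*pi)."""
    rho = abs(z)
    theta = cmath.phase(z) % (2 * math.pi)
    return rho, theta

def from_trigonometric(rho, theta):
    """Return rho*(cos(theta) + i*sin(theta)) as a complex number."""
    return complex(rho * math.cos(theta), rho * math.sin(theta))

print(to_trigonometric(1 + 1j))   # approximately (sqrt(2), pi/4)
print(to_trigonometric(1j))       # approximately (1, pi/2)
print(from_trigonometric(math.sqrt(2), math.pi / 4))   # approximately 1+i
```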

sin(α+β) = sin(α)·cos(β)+cos(α)·sin(β)

sin(α-β) = sin(α)·cos(β)-cos(α)·sin(β)

cos(α+β) = cos(α)·cos(β)-sin(α)·sin(β)

cos(α-β) = cos(α)·cos(β)+sin(α)·sin(β)

If we have 2 complex numbers, z1 of modulus ρ1 and argument θ1, and z2 of modulus ρ2 and argument θ2, then the product z1·z2 has as modulus the product of the moduli ρ1·ρ2 and as argument the sum of the arguments θ1+θ2; in particular, if z has modulus ρ and argument θ: z1 = z², ρ1 = ρ², θ1 = 2θ; z1 = z³, ρ1 = ρ³, θ1 = 3θ
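
The product rule can be sketched as follows, with each number stored as a pair (modulus, argument). The function name multiply_trigonometric is an assumption made for this example; cmath.rect converts the result back to the form x+iy.

```python
import cmath
import math

def multiply_trigonometric(p, q):
    """Multiply two numbers given as (modulus, argument) pairs."""
    (rho1, theta1), (rho2, theta2) = p, q
    # Moduli are multiplied, arguments are added (reduced modulo 2*pi).
    return rho1 * rho2, (theta1 + theta2) % (2 * math.pi)

z1 = (math.sqrt(2), math.pi / 4)        # 1+i
z2 = (math.sqrt(2), 7 * math.pi / 4)    # 1-i
rho, theta = multiply_trigonometric(z1, z2)
product = cmath.rect(rho, theta)
print(product)   # approximately 2, as (1+i)(1-i) = 2
```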

z is a complex number, n > 0, w is a complex number, ⁿ√z = w, wⁿ = z; the complex number z has modulus ρ and argument θ, the complex number w has modulus σ and argument φ; σⁿ = ρ, nφ = θ+2kπ, σ = ⁿ√ρ, φ = (θ+2kπ)/n; to find the nth roots we have to vary k from 0 to n-1; to find the nth roots of a complex number it is better to use the trigonometric form

Nth roots of a complex number z: wⁿ = z, σ = ⁿ√ρ, φ = (θ+2kπ)/n, ⁿ√ρ is the nth arithmetic root of ρ, and k is an integer between 0 and n-1

For n = 2, there are 2 nth roots of 1, both real, 1 and -1, so the square roots of the real number 1 are the real numbers 1 and -1; √1 = ±1, 1² = 1, (-1)² = 1

For n = 3, there are 3 nth roots of 1, 1 root is real and 2 roots are complex; the real root is 1 because in the field of real numbers ³√1 = 1, in fact 1³ = 1; z = a+bi, 1 = 1+0i, a = 1, b = 0, ρ = √(a²+b²) = √(1²+0²) = √(1+0) = √1 = 1, θ = 0, 1 = 1(cos(0)+i⋅sin(0)); ρ' = ³√ρ = ³√1 = 1; θ' = (θ+2kπ)/n with n = 3 and k from 0 to n-1, θ1' = (0+2·0π)/3 = 0, θ2' = (0+2·1π)/3 = (2/3)π, θ3' = (0+2·2π)/3 = (4/3)π; the first cube root of the real number 1 is 1(cos(0)+i·sin(0)) = 1(1+i0) = 1(1) = 1, that we had already found as it is the real root; the second cube root of the real number 1 is 1(cos(2π/3)+i·sin(2π/3)) = 1(-1/2+i(√3/2)) = -1/2+i(√3/2); the third cube root of the real number 1 is 1(cos(4π/3)+i·sin(4π/3)) = 1(-1/2-i(√3/2)) = -1/2-i(√3/2)

Find the square roots of i; there are 2 square roots of i, which are 2 complex numbers that we can find using the trigonometric form; z = a+bi, z = i, a = 0, b = 1, i = 0+1i, ρ = 1, θ = π/2, z = ρ(cos(θ)+i·sin(θ)), i = 1(cos(π/2)+i·sin(π/2)) = cos(π/2)+i·sin(π/2); ρ' = √ρ = √1 = 1; θ' = (θ+2kπ)/n with n = 2 and k from 0 to n-1, θ1' = ((π/2)+2·0π)/2 = (π/2)/2 = π/4, θ2' = ((π/2)+2·1π)/2 = ((π/2)+2π)/2 = ((5/2)π)/2 = (5/4)π; the first square root of the complex number i is 1(cos(π/4)+i·sin(π/4)) = 1(√2/2+i√2/2) = √2/2+i√2/2 = (√2/2)(1+i); the second square root of the complex number i is 1(cos(5π/4)+i·sin(5π/4)) = 1(-√2/2-i√2/2) = -√2/2-i√2/2 = -(√2/2)(1+i)
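
The rule σ = ⁿ√ρ, φ = (θ+2kπ)/n with k from 0 to n-1 can be sketched in Python. The function name nth_roots is an assumption made for this example; cmath.polar returns the argument in (-π, π] rather than between 0 and 2π, which yields the same set of roots.

```python
import cmath
import math

def nth_roots(z, n):
    """Return the list of the nth roots of the complex number z."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    if z == 0:
        return [0j]                     # 0 is the only nth root of 0
    rho, theta = cmath.polar(z)         # modulus and argument of z
    sigma = rho ** (1.0 / n)            # arithmetic nth root of the modulus
    return [cmath.rect(sigma, (theta + 2 * k * math.pi) / n) for k in range(n)]

for w in nth_roots(1, 3):               # cube roots of 1
    print(w)
for w in nth_roots(1j, 2):              # square roots of i
    print(w)
```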

The nth roots of a complex number z are n distinct numbers if z ≠ 0, if z = 0 the only root is 0; ⁿ√z has n distinct values if z ≠ 0, and only the value 0 if z = 0; the modulus of z is ρ, |z| = ρ, then the modulus of all nth roots is the nth root of ρ, ρ' = ⁿ√ρ; in the Argand-Gauss plane the roots lie on the circle centered at the origin with radius equal to the nth root of ρ, and the nth roots are the vertices of a regular polygon with n sides inscribed in the circle with radius ⁿ√ρ; if n = 2 it is a degenerate polygon, two points diametrically opposite with respect to the origin, if n = 3 it is a triangle, if n = 4 it is a square, and so on

All complex numbers other than 0 have n distinct nth roots, that is the equation xⁿ = z has n solutions in the field of complex numbers; more generally, for an equation a0xⁿ+a1xⁿ⁻¹+...+an = 0, where a0 ≠ 0 and each coefficient is a complex number, the fundamental theorem of algebra says that an equation of this type always has n roots; to be precise, the fundamental theorem of algebra says that an equation of this type has at least 1 root, but if there is 1 root then it can be shown that there are n roots; the equation (x-i)³ = 0 apparently has only one root, x = i, but the roots must be counted with their multiplicity; we consider the polynomial P(x) and the number z is its root, then the polynomial P(x) is divisible by (x-z), P(x) = (x-z)⋅Q(x), and it is possible that also Q(x) is divisible by (x-z), P(x) = (x-z)²⋅Q1(x), and so on until we reach P(x) = (x-z)ᵐ⋅Q(x) with a quotient Q(x) that is no longer divisible by (x-z); m is the multiplicity of the root z; the equation (x-i)³ = 0 has the root x = i of multiplicity 3; therefore if we consider the multiplicity of the roots, then the number of the roots is equal to n

Fundamental theorem of algebra: let p(x) ∈ ℂ[x] be a polynomial of degree n > 0, then p(x) is a product of complex polynomials of first degree, in particular p(x) always has at least one root in ℂ; the fundamental theorem of algebra concerns complex polynomials, if p(x) is a polynomial of positive degree n with complex coefficients, then p(x) is a product of complex polynomials of degree 1, and consequently p(x) always has at least one complex root; the fundamental theorem of algebra is true in the field of complex numbers, it is not true in the field of real numbers because a real polynomial may not be a product of first degree real polynomials and may have no real root; it is important to remember that the roots must be considered with their multiplicity, for example the polynomial (x-i)³ has only the root x = i, but this root is triple, that is it has multiplicity equal to 3

A consequence of the fundamental theorem of algebra is the identity principle, that is if 2 complex or real polynomials assume the same values for each value of the variable x, then they have the same coefficients


32 - EIGENVALUES AND EIGENVECTORS OF AN ENDOMORPHISM

An endomorphism is a linear application of a vector space ℝn in itself, ℝn → ℝn

Consider a linear application of ℝ² in itself, f: ℝ² → ℝ², that is an endomorphism of ℝ², defined as f(x,y) = (x+y,-y); this linear application is associated with the matrix A = [(1,1),(0,-1)]; we look for a vector v = (x,y) ≠ (0,0) such that f(x,y) = λ(x,y), so we look for a pair (x,y) different from (0,0) which transformed by f gives the same pair multiplied by some real number λ; {x+y = λx, -y = λy}, {x+y-λx = 0, -y-λy = 0}, {(1-λ)x+y = 0, (-1-λ)y = 0}, this is a homogeneous system of 2 equations in 2 unknowns, therefore it admits solutions other than (0,0) only if there is a free unknown; Aλ = [(1-λ,1),(0,-1-λ)], this matrix must have a rank lower than the maximum; λI = [(λ,0),(0,λ)], Aλ = A-λI; a square matrix has rank less than the maximum when its determinant is equal to 0, |A-λI| = 0, (1-λ)(-1-λ) = 0, this equation reveals for which real numbers λ it is possible to find a pair (x,y) different from (0,0) which through f is transformed into its own multiple according to λ, 1-λ1 = 0, λ1 = 1, -1-λ2 = 0, λ2 = -1; to find the pairs (x,y) that satisfy this condition we must use the system {(1-λ)x+y = 0, (-1-λ)y = 0}, where instead of λ we substitute λ1 and λ2; λ1 = 1, {(1-λ)x+y = 0, (-1-λ)y = 0}, {(1-1)x+y = 0, (-1-1)y = 0}, {0x+y = 0, -2y = 0}, {y = 0, y = 0}; λ2 = -1, {(1-λ)x+y = 0, (-1-λ)y = 0}, {(1-(-1))x+y = 0, (-1-(-1))y = 0}, {(1+1)x+y = 0, (-1+1)y = 0}, {2x+y = 0, 0y = 0}, {y = -2x, 0 = 0}; if λ1 = 1, the solutions are all pairs with y = 0, all pairs (x,0); if λ2 = -1, the solutions are all pairs with y = -2x, all pairs (x,-2x); the real numbers λ1 = 1 and λ2 = -1 are the eigenvalues of the endomorphism f and are the solutions of the equation |A-λI| = 0 or det(A-λI) = 0; the eigenvectors, that are the non-zero pairs (x,y) such that f(x,y) = λ(x,y), are the non-zero solutions of the system {(1-λ)x+y = 0, (-1-λ)y = 0} where instead of λ we put λ1 and λ2

f: ℝ² → ℝ², A = [(1,1),(-1,1)] is the square matrix of an endomorphism of ℝ², f(x,y) = (x+y,-x+y); we look for the numbers λ such that f(x,y) = λ(x,y), A-λI = [(1-λ,1),(-1,1-λ)], |A-λI| = det(A-λI) = (1-λ)(1-λ)-(1)(-1) = (1-λ)²+1 = 0, this is the equation that gives the eigenvalues of the endomorphism f, and if λ is a real number the left side cannot be equal to zero because it is always > 0; we must consider f: ℂ² → ℂ², because in the field of complex numbers the equation (1-λ)²+1 = 0 has roots, (1-λ)² = -1, 1-λ = ±√(-1), 1-λ = ±i, λ = 1±i, so there are 2 roots, λ1 = 1+i, λ2 = 1-i; therefore in the complex field there are 2 eigenvalues λ1 = 1+i and λ2 = 1-i, and it is possible to find the eigenvectors, which f transforms into their own multiples according to these eigenvalues; this example shows that sometimes these numbers λ are not real numbers, but are complex numbers
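
For a 2x2 matrix A = [(a,b),(c,d)] the equation det(A-λI) = 0 expands to λ²-(a+d)λ+(ad-bc) = 0, so the two examples above can be checked with the quadratic formula. This is a sketch only for the 2x2 case; the function name eigenvalues_2x2 is an assumption made for this example.

```python
import cmath

def eigenvalues_2x2(a, b, c, d):
    """Roots of det(A - λI) = λ^2 - (a+d)λ + (ad - bc) for A = [(a,b),(c,d)]."""
    trace = a + d
    det = a * d - b * c
    # cmath.sqrt also handles a negative discriminant (complex eigenvalues).
    root = cmath.sqrt(trace * trace - 4 * det)
    return (trace + root) / 2, (trace - root) / 2

print(eigenvalues_2x2(1, 1, 0, -1))   # A = [(1,1),(0,-1)]: 1 and -1
print(eigenvalues_2x2(1, 1, -1, 1))   # A = [(1,1),(-1,1)]: 1+i and 1-i
```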

f: ℝⁿ → ℝⁿ, n ≥ 1, is a linear application of ℝⁿ in itself, or an endomorphism; λ is an eigenvalue of f if there is a non-zero vector v such that f(v) = λv; any non-zero v such that f(v) = λv is called an eigenvector of f relative to λ

With λ we indicate an eigenvalue, with v we indicate the eigenvectors such that f(v) = λv, with v0 we indicate the null vector, with Vλ we indicate the eigenspace that is a subset of ℝn consisting of the eigenvectors and the null vector, Vλ ⊆ ℝn

We have to find the eigenvalues and consequently the eigenvectors of an endomorphism; f: ℝⁿ → ℝⁿ, f is an endomorphism of ℝⁿ, we want to find v such that f(v) = λv; v is a tuple, v = (x1,...,xn) = X, where X is written as a column; A is the matrix associated with our linear application, therefore A is a square matrix of order n; to find the vector of ℝⁿ that is the image of the column vector X, we make the product AX, then AX = λX = λIX, therefore AX-(λI)X = 0, (A-λI)X = 0; we want this linear system in the unknowns X to have non-zero solutions; a homogeneous linear system has non-zero solutions when the matrix A-λI has rank < n, that is when det(A-λI) = 0

A-λI = [(a1,1-λ,...,a1,n),...,(an,1,...,an,n-λ)]; det(A-λI) = 0; we expand the determinant of the square matrix of order n with Laplace's rule and we obtain a polynomial in the unknown λ, P(λ) = (-1)ⁿλⁿ+..., a polynomial in λ of degree n, and this is the characteristic polynomial of the matrix A or of the endomorphism f; to find the eigenvalues we look for the roots of the characteristic polynomial, P(λ) = 0; the characteristic polynomial, if we are dealing with an endomorphism of ℝⁿ, is a polynomial of degree n with real coefficients; in the field of real numbers, the characteristic polynomial could also have no roots or have fewer than n, for example P(λ) = λ²+1 has no real roots; in the field of complex numbers, the characteristic polynomial has n roots and each of these must be counted with its multiplicity, for example in the polynomial P(λ) = (λ-λ1)³(λ-λ2)⁴ there are multiple roots

If we want to find the eigenvalues of a linear application of ℝn in ℝn, we must look for the characteristic polynomial obtained from det(A-λI) = 0, and this is an equation of degree n in λ

A = [(0,0,1),(0,0,0),(0,0,3)], this matrix corresponds to an endomorphism of ℝ³, f(x,y,z) = (z,0,3z); we look for the eigenvalues of this matrix, that is the eigenvalues of the endomorphism of ℝ³; A-λI = [(-λ,0,1),(0,-λ,0),(0,0,3-λ)] and the characteristic polynomial is det(A-λI) = |A-λI|; if a matrix has all the null elements below or above the diagonal, it is called a triangular matrix, and its determinant is the product of the elements of the diagonal, so in this case det(A-λI) = |A-λI| = P(λ) = (-λ)(-λ)(3-λ) = λ²(3-λ) = 0, λ1 = 3 with multiplicity 1 or simple root, λ2 = 0 with multiplicity 2 or double root, therefore this matrix A or the endomorphism f has 3 real eigenvalues counted with multiplicity, λ1 = 3 with multiplicity 1 and λ2 = 0 with multiplicity 2; to find the eigenvectors we use λ1 and λ2 separately; λ1 = 3, we rewrite the linear system corresponding to the matrix A-λI = [(-λ,0,1),(0,-λ,0),(0,0,3-λ)] where instead of λ we substitute the value 3, {-3x+z = 0, -3y = 0, 0 = 0}, there are 2 independent equations in 3 unknowns so there are non-zero solutions, y = 0, z = 3x, the most general solution is formed by the vectors of the form (x,0,3x), there is one free unknown, and a base of the eigenspace Vλ1 is formed by the vector (1,0,3), which is obtained by setting the only free unknown x equal to 1, so Vλ1 is a vector space of dimension 1; λ2 = 0, we rewrite the linear system corresponding to the matrix A-λI = [(-λ,0,1),(0,-λ,0),(0,0,3-λ)] where instead of λ we substitute the value 0, {z = 0, 0 = 0, 3z = 0}, there is only one independent equation, the solution is (x,y,0) where x and y are free unknowns, therefore a base of the eigenspace Vλ2 is formed by 2 linearly independent vectors because 2 are the free unknowns, (1,0,0), (0,1,0), therefore Vλ2 is a subspace of dimension 2; the eigenspace Vλ1 has a base formed by the vector (1,0,3), the eigenspace Vλ2 has a base formed by the 2 vectors (1,0,0), (0,1,0); in this example an eigenvalue of multiplicity 1 has generated an eigenspace of dimension 1, and an eigenvalue of multiplicity 2 has generated an eigenspace of dimension 2, but the dimension does not always equal the multiplicity

A = [(0,0,2),(0,0,0),(0,0,0)], this matrix corresponds to the linear application of ℝ³ in ℝ³, f(x,y,z) = (2z,0,0); we must find the characteristic polynomial; A-λI = [(-λ,0,2),(0,-λ,0),(0,0,-λ)], det(A-λI) = |A-λI| = P(λ) = (-λ)(-λ)(-λ) = -λ³ = 0, the only solution is λ1 = 0 with multiplicity m1 = 3; we look for the eigenvectors relative to the eigenvalue λ1 = 0, in the matrix A-λI we replace λ with 0 and we get again the matrix A with the system {2z = 0, 0 = 0, 0 = 0}, therefore we must study the system AX = 0; that is, if the eigenvalue is λ = 0, the eigenspace relative to λ is the nucleus of the linear application, ker(f); the equation 2z = 0 has as solutions the vectors (x,y,0), and a base of the eigenspace is formed by the vectors (1,0,0) and (0,1,0); therefore the eigenspace, which coincides with ker(f), has dimension 2, the number of vectors of the base; in this example the eigenspace has dimension 2, but it originates from λ1 = 0 with multiplicity 3; therefore the dimension of the eigenspace is not necessarily equal to the multiplicity of the eigenvalue as root of the characteristic polynomial; in this example there is only 1 eigenvalue, its multiplicity is 3, equal to the dimension of the vector space ℝ³

By definition, an eigenvalue is a number λ such that there are non-zero vectors and f(v) = λv, so by definition the eigenspace of a number λ cannot be the only zero, because this negates the definitions of eigenvalue and eigenvector, therefore it is impossible that Vλ = {0n}

A = [(3,0,0),(0,5,0),(0,0,-2)] we look for the eigenvalues of this matrix; A-λI = [(3-λ,0,0),(0,5-λ,0),(0,0,-2-λ)], matrix A-λI, like matrix A, is a diagonal matrix, that is a matrix where only the elements of the main diagonal are non-zero; the determinant of the matrix A-λI, which is its characteristic polynomial, is the product of the elements of the diagonal, det(A-λI) = |A-λI| = P(λ) = (3-λ)(5-λ)(-2-λ), the roots of this characteristic polynomial are λ1 = 3, λ2 = 5, λ3 = -2; if a matrix is diagonal the eigenvalues are equal to the n elements of the diagonal, and if in the diagonal there are elements equal to each other we find eigenvalues with multiplicity > 1

A = [(3,0,0),(0,5,0),(0,0,3)], A-λI = [(3-λ,0,0),(0,5-λ,0),(0,0,3-λ)], det(A-λI) = |A-λI| = P(λ) = (3-λ)(5-λ)(3-λ) = (3-λ)²(5-λ), λ1 = 3 with multiplicity m1 = 2, λ2 = 5 with multiplicity m2 = 1; we calculate the eigenvectors relative to λ1 = 3, therefore in the matrix A-λI we replace λ with the value 3, A-λI = [(3-λ,0,0),(0,5-λ,0),(0,0,3-λ)] = [(3-3,0,0),(0,5-3,0),(0,0,3-3)] = [(0,0,0),(0,2,0),(0,0,0)], therefore the only equation is 2y = 0, and x and z are free unknowns; the equation 2y = 0 has as solutions the vectors (x,0,z), and a base of the eigenspace is formed by the vectors (1,0,0) and (0,0,1), therefore the eigenspace relative to the eigenvalue λ1 = 3 has dimension 2, which in this case is equal to the multiplicity m1 = 2 of the eigenvalue

To find the eigenvalues we use the characteristic polynomial P(λ) which is obtained by calculating the determinant of the matrix A-λI and setting it equal to 0, P(λ) = det(A-λI) = |A-λI| = 0; these eigenvalues can be in the real field, and some can have multiplicity > 1; the corresponding eigenspaces always have dimension at least 1, their dimension is never 0; to find the eigenvectors that form the eigenspace we have to solve the homogeneous linear system (A-λI)X = 0, putting the value of the eigenvalue in the place of λ


33 - THE DIAGONALIZATION OF SQUARE MATRICES

Considering an endomorphism of ℝⁿ, we can calculate the dimension of the eigenspaces; an eigenspace always has dimension ≥ 1; A = [(0,0,2),(0,0,0),(0,0,0)], this is a square matrix of order 3, so it represents an endomorphism of ℝ³, and to find eigenvalues and eigenspaces we set the determinant of the matrix A-λI = [(-λ,0,2),(0,-λ,0),(0,0,-λ)] equal to 0, (-λ)³ = 0, λ1 = 0 with multiplicity m1 = 3; the only eigenspace of this endomorphism is calculated by replacing λ in the matrix A-λI with the value λ1 = 0, so A-λI = [(0,0,2),(0,0,0),(0,0,0)], and therefore the homogeneous linear system obtained is {2z = 0, 0 = 0, 0 = 0}, x and y are free unknowns and the solution of the system is (x,y,0); Vλ1 is the eigenspace generated by the vectors (1,0,0) and (0,1,0), Vλ1 = L((1,0,0),(0,1,0)); to the eigenvalue λ1 = 0 with multiplicity m1 = 3 is related the eigenspace with dimension d1 = 2 and d1 < m1; the dimension of the eigenspace must always be ≥ 1

A = [(0,2,0),(0,0,1),(0,0,0)], we need to find the eigenvalues and eigenvectors of this matrix; A-λI = [(-λ,2,0),(0,-λ,1),(0,0,-λ)] is a triangular matrix, so the determinant is calculated by multiplying the elements of the main diagonal, det(A-λI) = (-λ)(-λ)(-λ) = -λ³, -λ³ = 0, so λ1 = 0 with multiplicity m1 = 3; to find the eigenspace corresponding to λ1 = 0, in the matrix A-λI = [(-λ,2,0),(0,-λ,1),(0,0,-λ)] we substitute λ with 0 and we get the matrix [(0,2,0),(0,0,1),(0,0,0)] which is equal to matrix A, and from this matrix we write the corresponding system of equations {2y = 0, z = 0, 0 = 0}, therefore x is a free unknown and the solutions are of the form (x,0,0), so a base of Vλ1 is formed by the only vector (1,0,0), from which we deduce that dim(Vλ1) = d1 = 1; the value 1 is the minimum possible, and it is different from the multiplicity m1 = 3 of λ1 = 0

To calculate the dimension of the eigenspace we can look for the solutions of the homogeneous system of the eigenspace and count the elements of a base, but there is also another method; looking for the eigenspace means looking for the solutions of the homogeneous system (A-λI)X = 0, that is looking for the nucleus of the linear application associated with the matrix A-λI; this is a new linear application that we denote by fλ: ℝⁿ → ℝⁿ, and fλ is f-λ⋅id, that is the application f associated with the matrix A minus λ times the identity application of ℝⁿ, which for n = 3 associates to each element (x,y,z) the element f(x,y,z)-λ(x,y,z); the eigenspace is the nucleus of fλ, and we want to calculate the dimension of the nucleus without finding a base for it; fλ is a linear application between ℝⁿ and ℝⁿ, and the dimension of its nucleus is equal to n minus the dimension of its image, dim(Ker(fλ)) = n-ρ(A-λI); to calculate the dimension of the nucleus, that is of the eigenspace relative to the eigenvalue λ, we can use the formula dim(Ker(fλ)) = n-ρ(A-λI)

Calculate the dimension of the nucleus of A = [(0,2,0),(0,0,1),(0,0,0)]; n = 3, A-λI = [(-λ,2,0),(0,-λ,1),(0,0,-λ)], λ = 0, A-λI = A = [(0,2,0),(0,0,1),(0,0,0)], ρ(A-λI) = 2, because matrix A is reduced by rows and has 2 non-zero rows, dim(Ker(f)) = n-ρ(A-λI) = 3-2 = 1, which is the dimension of the nucleus

Calculate the dimension of the nucleus of A = [(0,0,2),(0,0,0),(0,0,0)]; A-λI = [(-λ,0,2),(0,-λ,0),(0,0,-λ)], det(A-λI) = (-λ)(-λ)(-λ) = -λ³, -λ³ = 0, λ1 = 0 with m1 = 3, the only eigenvalue is the null eigenvalue of multiplicity 3; for λ = 0 the matrix A-λI = A has only 1 non-zero row, so ρ(A-λI) = 1 and dim(Ker(f)) = dim(Vλ1) = n-ρ(A-λI) = 3-1 = 2

In general the eigenspace is the nucleus of the linear application associated with the matrix A-λI, therefore the dimension of the eigenspace relative to the eigenvalue λ is dim(Vλ) = n-ρ(A-λI)
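
The formula dim(Vλ) = n-ρ(A-λI) can be sketched in Python for matrices with integer or rational entries. The function names rank and eigenspace_dimension are assumptions made for this example, and the rank is computed by a plain Gaussian elimination with exact fractions.

```python
from fractions import Fraction

def rank(matrix):
    """Rank of a matrix, computed by Gaussian elimination with exact fractions."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_cols = len(rows[0]) if rows else 0
    r = 0                                   # number of pivots found so far
    for col in range(n_cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue                        # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r

def eigenspace_dimension(a, lam):
    """dim(V_λ) = n - rank(A - λI) for a square matrix a and an eigenvalue lam."""
    n = len(a)
    shifted = [[a[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]
    return n - rank(shifted)

print(eigenspace_dimension([[0, 2, 0], [0, 0, 1], [0, 0, 0]], 0))   # 1
print(eigenspace_dimension([[0, 0, 2], [0, 0, 0], [0, 0, 0]], 0))   # 2
```

If the number passed as lam is not an eigenvalue, the matrix A-λI has rank n and the function returns 0.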

The eigenspace relating to an eigenvalue can have a variable dimension; the dimension of the eigenspace relative to an eigenvalue is always ≥ 1 and is always ≤ the multiplicity of the eigenvalue as the root of the characteristic polynomial, and this is a theorem

λ is an eigenvalue of multiplicity m, Vλ is the eigenspace associated with λ, then 1 ≤ dim(Vλ) ≤ m

λ is an eigenvalue of multiplicity m of the endomorphism f: ℝn → ℝn, Vλ is the eigenspace associated with λ or the set of eigenvectors relative to the eigenvalue λ, then 1 ≤ dim(Vλ) ≤ m, the dimension of the eigenspace associated with the eigenvalue λ is always between 1 and the multiplicity of the eigenvalue λ as the root of the characteristic polynomial

We need to understand how we can diagonalize a matrix; we have a linear application f: ℝn → ℝn with the associated matrix A; and this matrix A is diagonalizable if we can transform it with a standard operation into a diagonal matrix; a matrix is called a diagonal when on the main diagonal there are any elements and all the other elements are 0; an example of a diagonal matrix is D = [(1,0,0),(0,2,0),(0,0,-1)]; an example of a non-diagonal matrix is A = [(1,1),(0,1)]; only some matrices can be diagonalized, because only some matrices, after a suitable transformation, can become diagonal

Considering an endomorphism f: ℝn → ℝn with the associated matrix A, and with eigenvalues λ1,...,λr, with multiplicity m1,...,mr, as roots of the characteristic polynomial P(λ) = (λ-λ1)m1⋅...; the characteristic polynomial can have all real roots, or some real roots and some complex roots, or all complex roots; the characteristic polynomial λ2+1 = 0 has no real roots; the characteristic polynomial (λ-1)(λ+1)(λ2+2) has real and complex roots; we consider a characteristic polynomial with all real roots, that is when λ1,...,λr are all real roots with their multiplicities m1,...,mr, and the corresponding eigenspaces are Vλ1,...,Vλr with their dimensions d1,...,dr, and 1 ≤ dim(Vλ) ≤ m, if d1 = m1, ..., dr = mr then f is a simple endomorphism; if all the eigenspaces have the dimension equal to the maximum possible then f is a simple endomorphism

If all the eigenvalues are real and all the dimensions of the eigenspaces coincide with the multiplicities of the eigenvalues as roots of the characteristic polynomial, then f is a simple endomorphism; diagonalization is related to the condition of simple endomorphism; an endomorphism is simple when all the eigenvalues are real and the dimensions of the eigenspaces are as high as possible

Endomorphism f(x,y) = (x+y,-y) with associated matrix A = [(1,1),(0,-1)]; A-λI = [(1-λ,1),(0,-1-λ)], det(A-λI) = |(1-λ,1),(0,-1-λ)| = P(λ) = (1-λ)(-1-λ), (1-λ)(-1-λ) = 0, λ1 = 1, λ2 = -1; λ1 = 1 and λ2 = -1 are two eigenvalues, and an endomorphism of ℝ^n has at most n eigenvalues counted with their multiplicity, here n = 2, therefore all the eigenvalues of this endomorphism are real numbers; the dimension of the eigenspace is n minus the rank of the matrix A-λI after having replaced λ with its value; λ1 = 1, A-λ1I = [(1-λ1,1),(0,-1-λ1)] = [(1-1,1),(0,-1-1)] = [(0,1),(0,-2)], the rank of the matrix A-λ1I is 1, so dim(Vλ1) = 2-1 = 1 = multiplicity of λ1; λ2 = -1, A-λ2I = [(1-λ2,1),(0,-1-λ2)] = [(1-(-1),1),(0,-1-(-1))] = [(1+1,1),(0,-1+1)] = [(2,1),(0,0)], the rank of the matrix A-λ2I is 1, so dim(Vλ2) = 2-1 = 1 = multiplicity of λ2; this is a simple endomorphism

If an endomorphism ℝ^n → ℝ^n has n eigenvalues λ1,...,λn all real and distinct, each of them has multiplicity 1, m1 = 1, ..., mn = 1, and since 1 ≤ dim(Vλ) ≤ m the corresponding eigenspaces all have dimension 1, dim(Vλ1) = 1, ..., dim(Vλn) = 1; so when all the eigenvalues are real and distinct every eigenspace has dimension equal to the multiplicity of its eigenvalue, and the endomorphism is simple

A = [(0,0,2),(0,0,0),(0,0,0)]; A-λI = [(-λ,0,2),(0,-λ,0),(0,0,-λ)], det(A-λI) = (-λ)(-λ)(-λ) = -λ^3, -λ^3 = 0, λ1 = 0 with multiplicity m1 = 3; dim(Vλ1) = n-ρ(A-λ1I) = 3-1 = 2; this is an example of an endomorphism which is not simple, because the dimension of the eigenspace Vλ1 is less than the multiplicity of λ1
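The rule dim(Vλ) = n-ρ(A-λI) used in the examples above can be checked with a short Python sketch. The function names rank and eigenspace_dim are illustrative choices, not part of these notes, and the rank is computed by plain Gaussian elimination on exact fractions.

```python
from fractions import Fraction


def rank(matrix):
    """Rank of a matrix, by Gaussian elimination on exact fractions."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_rows, n_cols = len(rows), len(rows[0])
    r = 0  # number of pivots found so far
    for c in range(n_cols):
        # look for a row, at or below row r, with a nonzero entry in column c
        pivot = next((i for i in range(r, n_rows) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, n_rows):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == n_rows:
            break
    return r


def eigenspace_dim(matrix, lam):
    """dim(V_lam) = n - rank(A - lam*I) for a square matrix A of order n."""
    n = len(matrix)
    shifted = [[matrix[i][j] - (lam if i == j else 0) for j in range(n)]
               for i in range(n)]
    return n - rank(shifted)


print(eigenspace_dim([[1, 1], [0, -1]], 1))                   # simple endomorphism
print(eigenspace_dim([[0, 0, 2], [0, 0, 0], [0, 0, 0]], 0))   # non-simple example
```

For the 3×3 matrix the dimension found is smaller than the multiplicity 3 of the eigenvalue 0, which is exactly why that endomorphism is not simple.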

A matrix A is diagonalizable if it can be transformed into a diagonal matrix; suppose that there exists a square matrix P of the same order n as matrix A, and P is an invertible matrix so it admits the inverse matrix P^(-1); P^(-1)⋅A⋅P = D, the matrices A, P, P^(-1) are 3 square matrices of order n, therefore also D is a square matrix of order n, and the matrix D is said to be similar to A; if we can find an invertible matrix P such that matrix D is diagonal, then A is a diagonalizable matrix

There are matrices that are already diagonal and therefore are certainly diagonalizable; for example, A = [(1,0),(0,2)] is a diagonal matrix, and to diagonalize a matrix that is already diagonal we can take P = I, since I^(-1)⋅A⋅I = A because I^(-1) = I, so diagonal matrices are diagonalizable

f(x,y) = (x+y,-y) is a simple endomorphism and we want to check that its matrix is diagonalizable; from the components x+y and -y we read the matrix A = [(1,1),(0,-1)]; A-λI = [(1-λ,1),(0,-1-λ)], det(A-λI) = P(λ) = (1-λ)(-1-λ), (1-λ)(-1-λ) = 0, 1-λ = 0, λ1 = 1, -1-λ = 0, λ2 = -1; eigenvalue λ1 = 1, A-λ1I = [(1-λ1,1),(0,-1-λ1)] = [(1-1,1),(0,-1-1)] = [(0,1),(0,-2)], {y = 0, -2y = 0}, y = 0, eigenvector v1 = (1,0); eigenvalue λ2 = -1, A-λ2I = [(1-λ2,1),(0,-1-λ2)] = [(1-(-1),1),(0,-1-(-1))] = [(1+1,1),(0,-1+1)] = [(2,1),(0,0)], {2x+y = 0, 0 = 0}, y = -2x, eigenvector v2 = (1,-2); P = [(1,1),(0,-2)], the columns of this matrix P are the eigenvectors v1 and v2, and this matrix P is certainly invertible because the determinant of P is different from 0, det(P) = 1(-2)-1⋅0 = -2 ≠ 0; to calculate the matrix P^(-1), that is the inverse matrix of the matrix P, we use the method of algebraic complements, P^(-1) = -1/2[(-2,-1),(0,1)] = [(1,1/2),(0,-1/2)]; as a check, P⋅P^(-1) = [(1,1),(0,-2)][(1,1/2),(0,-1/2)], a1,1 = 1⋅1+1⋅0 = 1+0 = 1, a1,2 = 1(1/2)+1(-1/2) = (1/2)-(1/2) = 0, a2,1 = 0⋅1+(-2)0 = 0+0 = 0, a2,2 = 0(1/2)+(-2)(-1/2) = 0+1 = 1, P⋅P^(-1) = [(1,0),(0,1)] = I; P^(-1)⋅A = [(1,1/2),(0,-1/2)][(1,1),(0,-1)], a1,1 = 1⋅1+(1/2)0 = 1+0 = 1, a1,2 = 1⋅1+(1/2)(-1) = 1-(1/2) = 1/2, a2,1 = 0⋅1+(-1/2)0 = 0+0 = 0, a2,2 = 0⋅1+(-1/2)(-1) = 0+1/2 = 1/2, P^(-1)⋅A = [(1,1/2),(0,1/2)]; P^(-1)⋅A⋅P = [(1,1/2),(0,1/2)][(1,1),(0,-2)], a1,1 = 1⋅1+(1/2)0 = 1+0 = 1, a1,2 = 1⋅1+(1/2)(-2) = 1-1 = 0, a2,1 = 0⋅1+(1/2)0 = 0+0 = 0, a2,2 = 0⋅1+(1/2)(-2) = 0-1 = -1, P^(-1)⋅A⋅P = [(1,0),(0,-1)] = D; the result is the matrix D, which is a diagonal matrix, and the elements on the diagonal of matrix D are the eigenvalues of matrix A, which are λ1 = 1 and λ2 = -1
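The hand computation above can be repeated mechanically. The following Python sketch uses exact fractions to rebuild P^(-1) and the product P^(-1)⋅A⋅P for the same A and P; the helper names matmul and inverse_2x2 are illustrative, and the inverse formula is the usual one for 2×2 matrices.

```python
from fractions import Fraction


def matmul(X, Y):
    """Row-by-column product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def inverse_2x2(M):
    """Inverse of a 2x2 matrix via its algebraic complements."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]


A = [[1, 1], [0, -1]]
P = [[1, 1], [0, -2]]          # columns are the eigenvectors (1,0) and (1,-2)
P_inv = inverse_2x2(P)
D = matmul(matmul(P_inv, A), P)
print(P_inv)
print(D)
```

The printed D has the eigenvalues 1 and -1 on the diagonal and 0 elsewhere, in the same order as the eigenvectors were placed in the columns of P.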

If f: ℝ^n → ℝ^n is a simple endomorphism, then there are real eigenvalues λ1,...,λr with multiplicities m1,...,mr, the eigenspaces Vλ1,...,Vλr have dim(Vλ1) = m1, ..., dim(Vλr) = mr, and m1+...+mr = n; each of these eigenspaces has a basis, v1,...,vn is the set of all these basis vectors, and these vectors are the columns of the matrix P; the vectors v1,...,vn are linearly independent because they come from bases of different eigenspaces; the matrix P has columns formed by n linearly independent vectors, so P has maximum rank, ρ(P) = n, therefore det(P) ≠ 0 and P is invertible, and we can use the formula P^(-1)⋅A⋅P = D, where D is a diagonal matrix having on the main diagonal the eigenvalues of matrix A and 0 elsewhere; D = P^(-1)⋅A⋅P = [(λ1,0,...,0,0),(0,λ2,...,0,0),...,(0,0,...,0,λn)], where here λ1,...,λn are the eigenvalues listed with repetition, so the number of times that an eigenvalue appears on the main diagonal of D is its multiplicity as a root of the characteristic polynomial; if f is a simple endomorphism, that is if all its eigenvalues are real, the sum of their multiplicities is equal to n, which is the dimension of ℝ^n, and the dimension of each eigenspace is equal to the multiplicity of the corresponding eigenvalue, then the matrix A associated with f is a diagonalizable matrix, so there is an invertible matrix P such that the product P^(-1)⋅A⋅P is the diagonal matrix D; the matrix P is obtained by putting in the first m1 columns the vectors of a basis of Vλ1, and so on up to the last mr columns, which contain the vectors of a basis of Vλr, and on the diagonal of the matrix D we obtain the eigenvalues of the matrix A, each repeated according to its multiplicity

Not all endomorphisms are simple, therefore not all matrices are diagonalizable; for example the matrix A = [(0,0,2),(0,0,0),(0,0,0)] is not diagonalizable, that is the product P^(-1)⋅A⋅P is not a diagonal matrix for any invertible matrix P; this matrix cannot be diagonalized because there are too few eigenvectors: the eigenspace associated with the only eigenvalue has dimension 2, so there are only 2 linearly independent eigenvectors and it is not possible to fill column 3 of the matrix P

The characteristic polynomial λ^2+1 admits in ℂ the 2 roots i and -i; these roots are distinct, so an endomorphism with this characteristic polynomial is simple over ℂ and its matrix is diagonalizable in ℂ, but it is not diagonalizable in ℝ, where it has no eigenvalues

Eigenvalue: ∃ v ≠ 0 so that f(v) = λv, and det(A-λI) = 0; the eigenvalues are calculated by solving the equation det(A-λI) = 0

Eigenspace: (A-λI)X = 0; the eigenspaces are computed by solving the system (A-λI)X = 0

D = P^(-1)AP = [(λ1,0,...,0,0),(0,λ2,...,0,0),...,(0,0,...,0,λn)], this holds when the matrix A is diagonalizable, with λ1,...,λn the eigenvalues of A, each repeated according to its multiplicity


34 - CONCEPT OF DERIVATIVE

The function f(x) = x^2 has a parabola as its graph; we consider two points with abscissa x0 and x, and we consider the angular coefficient of the secant joining the points (x0,f(x0)) and (x,f(x)); the slope is given by the ratio between the variation of the function and the variation of the independent variable, r(x) = (f(x)-f(x0))/(x-x0) = (x^2-x0^2)/(x-x0); this ratio is a function defined for all x except x = x0, and we are interested in its limit as x approaches x0, because the ratio represents the angular coefficient, that is the slope, of the secant to the graph of f through the points of abscissa x0 and x, and its limit represents the slope of the tangent to the graph at the point of abscissa x0; factoring the numerator, (x^2-x0^2)/(x-x0) = ((x-x0)(x+x0))/(x-x0) = x+x0, that is for all x ≠ x0 the ratio coincides with the function x+x0, which is a polynomial function of first degree, is defined on all ℝ, is continuous, and tends to 2x0 as x approaches x0; so the slope of the tangent to the parabola y = x^2 at the point of abscissa x0 is 2x0; when x0 is positive, the slope is positive; when x0 = 0, the slope is zero; when x0 is negative, the slope is negative; in short, r(x) = (x^2-x0^2)/(x-x0) = x+x0 → 2x0

Galileo Galilei, born in Pisa in 1564 and died in Arcetri in 1642, studied falling bodies and their naturally accelerated motion using inclined planes, and discovered that the distances covered by falling bodies are proportional to the squares of the times

s(t) = (g/2)t^2, s(t) is the space traveled by a falling body after the time t; we want to calculate the average speed in the time interval between t0 and t, and then the limit of this average speed as t tends to t0, obtaining the instantaneous speed; (s(t)-s(t0))/(t-t0) = mean velocity relative to the time interval between t0 and t; (s(t)-s(t0))/(t-t0) = ((g/2)t^2-(g/2)t0^2)/(t-t0) = (g/2)(t^2-t0^2)/(t-t0) = (g/2)(t-t0)(t+t0)/(t-t0) = (g/2)(t+t0); when t tends to t0, (g/2)(t+t0) tends to (g/2)(t0+t0) = (g/2)(2t0) = g⋅t0, therefore the limit exists and is g⋅t0; the instantaneous speed, which is the limit of the average speed, exists at every instant, is well defined and is g⋅t0

s(t) = (g/2)t^2, (s(t+h)-s(t))/h = ((g/2)(t+h)^2-(g/2)t^2)/h = (g/2)((t+h)^2-t^2)/h = (g/2)(t^2+2th+h^2-t^2)/h = (g/2)(2th+h^2)/h = (g/2)(2t+h), the limit for h tending to 0 is (g/2)(2t) = gt
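A small Python sketch can show the average speed approaching the instantaneous speed g⋅t as the increment h shrinks. The value g = 9.81 and the instant t = 2 are assumptions chosen only for illustration, and the names s and average_speed are not from these notes; exact fractions are used so that the result matches (g/2)(2t+h) without rounding.

```python
from fractions import Fraction

G = Fraction(981, 100)  # assumed value of g, for illustration only


def s(t):
    """Space traveled by the falling body after the time t: (g/2) t^2."""
    return G / 2 * t * t


def average_speed(t, h):
    """Incremental ratio (s(t+h) - s(t)) / h over the interval [t, t+h]."""
    return (s(t + h) - s(t)) / h


t = Fraction(2)
for k in range(1, 6):
    h = Fraction(1, 10 ** k)
    # the gap between the average speed and g*t is exactly (g/2)*h
    print(float(h), float(average_speed(t, h)), float(G * t))
```

Each printed average speed differs from g⋅t by exactly (g/2)h, so the difference goes to zero together with h.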

f: I → ℝ, x0 ∈ I, x ≠ x0, (f(x)-f(x0))/(x-x0), this incremental ratio is the angular coefficient of the secant to the graph passing through the points of abscissa x0 and x; this function, which is an incremental ratio, is defined for all x of the interval I except the point x0, and we are interested in its limit as x approaches x0; limx→x0((f(x)-f(x0))/(x-x0)) = f'(x0), also written Df(x0) = Df(x)|x=x0, this is the derivative of the function f at the point x0

f: I → ℝ, x0 ∈ I, x ≠ x0, limx→x0((f(x)-f(x0))/(x-x0)) = f'(x0)

The derivative is the limit of the incremental ratio: limx→x0((f(x)-f(x0))/(x-x0)) = f'(x0)

The incremental ratio is (f(x)-f(x0))/(x-x0), ratio between the variation of the function and the variation of the independent variable; incremental in this context means variation

f(x) = mx+q, is a polynomial function of first degree; we take 2 points on a straight line, the straight line joining these 2 points is the starting line; for polynomial functions of degree ≤ 1 the incremental ratio must be constant; f(x) = mx+q, f(x0) = mx0+q, (f(x)-f(x0))/(x-x0) = (mx+q-mx0-q)/(x-x0) = m(x-x0)/(x-x0) = m

We compute the derivative of x^n, where n is a natural number; f(x) = x^n, (f(x+h)-f(x))/h = ((x+h)^n-x^n)/h, we apply the binomial formula, ((x+h)^n-x^n)/h = (x^n+nx^(n-1)h+C(n,2)x^(n-2)h^2+...+h^n-x^n)/h = nx^(n-1)+C(n,2)x^(n-2)h+...+h^(n-1); calculating the limit for h tending to zero, all the addends containing h vanish and only nx^(n-1) remains, so the derivative of x^n is nx^(n-1)

f(x) = c, f'(x) = 0; f(x) = x, f'(x) = 1; f(x) = x^2, f'(x) = 2x; f(x) = x^n, f'(x) = nx^(n-1)

f(x) = x^0 = 1, f'(x) = 0, because x^0 = 1 = c is a constant

A function is differentiable at a point when the derivative exists there and is a real number; if the limit of the incremental ratio is +∞ or -∞ the function is not called differentiable at that point, but we say that its derivative there is +∞ or -∞

f(x) = c, f'(x) = 0; f(x) = x, f'(x) = 1; f(x) = x^2, f'(x) = 2x; f(x) = x^n, f'(x) = nx^(n-1); these functions are all differentiable at every point of their natural domain, which is all ℝ; the derivative function is the function which, to each x of the domain of the starting function, associates the value of the derivative calculated at that point

The function f is said to be differentiable at a point if its derivative exists and is finite at the same point

Continuity is a necessary but not sufficient condition for differentiability; a function that is differentiable at a point is certainly continuous at that point, but a function that is continuous at a point may not be differentiable at that point

f: I → ℝ, in x0 the function is differentiable and we must prove that it is also continuous; we must show that the limit of f(x) for x tending to x0 is f(x0), that is the difference f(x)-f(x0) tends to zero as x approaches x0; f(x)-f(x0) = ((f(x)-f(x0))/(x-x0))(x-x0), calculating the limit for x tending to x0 we obtain f'(x0)⋅0, therefore by the product limit theorem everything tends to 0

A function can be continuous at a point but not differentiable at that point; f(x) = |x|, the graph is the union of the bisectors of the first and second quadrant, and this function is continuous everywhere but cannot be differentiated in the origin; this curve which is actually a broken line does not have a tangent in the origin, in the origin the derivative does not exist; the incremental ratio of the function f(x) = |x| in the origin is (|x|-0)/(x-0) = |x|/x, this incremental ratio is 1 when x is positive, because the absolute value of x coincides with x, but it is -1 for negative x, therefore this incremental ratio does not admit a limit, but admits a limit on the right which is 1, and in fact in the first quadrant the graph is a ray with angular coefficient 1, and admits the limit to the left which is -1, and in fact in the second quadrant the graph is a ray with angular coefficient -1; therefore in these cases we must consider the derivative on the right and the derivative on the left
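The behaviour of |x| at the origin can be observed numerically. The sketch below evaluates the incremental ratio on the left and on the right with a small fixed step; the function name one_sided_ratios and the step h = 1e-6 are illustrative assumptions, and a fixed step only approximates the limits discussed above.

```python
def one_sided_ratios(f, x0, h=1e-6):
    """Incremental ratios of f at x0 taken from the left and from the right,
    with a small fixed step h (an approximation, not a true limit)."""
    left = (f(x0 - h) - f(x0)) / (-h)
    right = (f(x0 + h) - f(x0)) / h
    return left, right


print(one_sided_ratios(abs, 0.0))               # the two values differ
print(one_sided_ratios(lambda x: x * x, 0.0))   # both values are close to 0
```

For |x| the two ratios stay at -1 and 1 however small h is taken, while for x^2 both shrink towards the same value 0, in agreement with the fact that x^2 is differentiable at the origin and |x| is not.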

The derivative on the right is the limit of the incremental ratio when x approaches x0+ or from the right; the derivative on the left is the limit of the incremental ratio when x approaches x0- or from the left

The derivative at a point exists if and only if at the same point the derivative on the right and the derivative on the left exist and coincide

We know how to calculate the derivative of monomial functions, but now we are interested in calculating the derivative of the circular functions, sine and cosine

Using a prosthaphaeresis (sum-to-product) formula, sin(x)-sin(x0) = 2(sin((x-x0)/2))(cos((x+x0)/2)), we get the incremental ratio of the sine function, (sin(x)-sin(x0))/(x-x0) = (2/(x-x0))(sin((x-x0)/2))(cos((x+x0)/2)); we have to calculate the limit for x approaching x0; setting (x-x0)/2 = t, the ratio becomes (sin(t)/t)(cos((x+x0)/2)), and the limit for x tending to x0 is the limit for t tending to 0; we know that limt→0(sin(t)/t) = 1, and limx→x0(cos((x+x0)/2)) = cos((x0+x0)/2) = cos(2(x0)/2) = cos(x0), therefore the derivative of the sine function calculated at the point x0 is the cosine of x0; (sin(x))' = cos(x), D(sin(x)) = cos(x)

The cosine function is differentiable at any point of its domain, which is all ℝ; (cos(x))' = -sin(x), D(cos(x)) = -sin(x)

We know how to differentiate monomials and the circular functions sine and cosine; we will study how to differentiate the exponential function and the logarithm function; we will also study how to differentiate integer rational functions, also called polynomial functions

f(x) = a^x, a > 1; if the base is a = e, the tangent to the exponential curve at the point (0,1) has angular coefficient 1, so it is parallel to the bisector of the first and third quadrant; if we choose as the base a Euler's number, denoted by the letter e, the derivative of the function e^x calculated at x = 0 is 1, and this is the reason that makes Euler's number important in the context of exponential functions; we calculate the incremental ratio for a generic exponential function f(x) = a^x, (f(x)-f(x0))/(x-x0) = (a^x-a^(x0))/(x-x0) = (a^(x-x0+x0)-a^(x0))/(x-x0) = (a^(x0)⋅a^(x-x0)-a^(x0))/(x-x0) = a^(x0)((a^(x-x0)-1)/(x-x0)) = a^(x0)((a^h-1)/h), with h = x-x0; since 1 = a^0, the factor (a^h-1)/h is the incremental ratio of the exponential function at the origin; its limit for x tending to x0, that is for h tending to 0, does not depend on x0 but is a constant c(a) that depends only on a, and it is the derivative of a^x calculated at x = 0, c(a) = D(a^x)|x=0; limh→0(a^(x0)((a^h-1)/h)) = a^(x0)⋅c(a); D(a^x) = c(a)⋅a^x = D(a^x)|x=0⋅a^x, therefore the derivative of a^x is equal to a^x multiplied by the derivative of a^x at the point x = 0; for a = e, c(e) = 1 and D(e^x) = e^x; the derivative of e^x is equal to e^x, that is on the curve y = e^x the ordinate of each point is equal to the angular coefficient of the tangent line at that point
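The constant c(a) can be estimated numerically from the incremental ratio at the origin. This Python sketch uses a small fixed step (h = 1e-6, an assumption) instead of a true limit, and the function names are illustrative. It also compares the estimate with ln(a): the identity c(a) = ln(a) is a standard fact that these notes do not derive at this point, so it appears here only as a cross-check.

```python
import math


def exp_ratio_at_zero(a, h=1e-6):
    """Incremental ratio (a^h - 1)/h of a^x at the origin, for a small fixed h."""
    return (a ** h - 1) / h


def exp_ratio(a, x0, h=1e-6):
    """Incremental ratio (a^(x0+h) - a^x0)/h of a^x at the point x0."""
    return (a ** (x0 + h) - a ** x0) / h


for a in (2.0, math.e, 10.0):
    c = exp_ratio_at_zero(a)
    # columns: base, estimate of c(a), ln(a), ratio at x0 = 1.5, c(a) * a^1.5
    print(a, c, math.log(a), exp_ratio(a, 1.5), c * a ** 1.5)
```

The last two columns agree, which is the statement D(a^x) = c(a)⋅a^x, and for a = e the estimate of c(a) is close to 1.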

The inverse function of the exponential function is the logarithmic function; f(x) = loga(x), the function is defined for x > 0, and the incremental ratio is (loga(x+h)-loga(x))/h = (1/h)(loga((x+h)/x)) = (1/h)(loga(1+h/x)); setting h/x = t, (1/h)(loga(1+h/x)) = (1/x)(x/h)loga(1+t) = (1/x)(1/t)loga(1+t) = (1/x)loga((1+t)^(1/t)); when h tends to 0 also t tends to 0, so we need to calculate limt→0((1/x)loga((1+t)^(1/t))); setting t = 1/u, (1+t)^(1/t) = (1+1/u)^u, and from limu→±∞((1+1/u)^u) = e we get limt→0±((1+t)^(1/t)) = e; therefore D(loga(x)) = (1/x)loga(e), and in particular the derivative of the logarithm with base e is 1/x


35 - DERIVATIVE THEOREMS

Calculate the derivative of the function loga(x), a > 1; (loga(x+h)-loga(x))/h = (1/x)loga((1+t)^(1/t)), t := h/x; x is a fixed positive number, h is the increment that tends to zero, therefore t tends to zero; limu→±∞((1+(1/u))^u) = e; limt→0±((1+t)^(1/t)) = e; by virtue of the continuity of the logarithm function, D(loga(x)) = (1/x)loga(e); a = e, D(loge(x)) = D(ln(x)) = 1/x; D(e^x) = e^x; the curve of the function ln(x), at the point (1,0), has a tangent with angular coefficient 1, since the derivative is 1/x and 1/1 = 1, therefore the tangent at the point (1,0) of the function ln(x) is parallel to the bisector of the first and third quadrant; the tangent at the point (0,1) of the function e^x has angular coefficient 1, because D(e^x) calculated for x = 0 is 1, so it is parallel to the bisector of the first and third quadrant; the tangent at the point (0,1) of the function e^x is parallel to the tangent at the point (1,0) of the function ln(x), and they are both parallel to the bisector of the first and third quadrant, as they have an angular coefficient of 1; the graphs of e^x and ln(x) are symmetrical with respect to the bisector of the first and third quadrant, as happens for any two functions that are the inverse of each other

When the limits of two functions exist and are finite, the limit of the sum is equal to the sum of the limits; the limit of the difference is equal to the difference of the limits; the limit of the product is equal to the product of the limits; the limit of the quotient is equal to the quotient of the limits, provided that the limit of the denominator is not 0

The derivative of the sum of two functions is equal to the sum of their respective derivatives: (f(x)+g(x))' = f'(x)+g'(x); the incremental ratio of the sum function is equal to the sum of the incremental ratios; limx→x0((f(x)+g(x)-(f(x0)+g(x0)))/(x-x0)) = limx→x0((f(x)-f(x0))/(x-x0))+limx→x0((g(x)-g(x0))/(x-x0)) = f'(x0)+g'(x0), and therefore the theorem is proved

(c⋅f(x))' = c⋅(f(x))', limx→x0((c⋅f(x)-c⋅f(x0))/(x-x0)) = c⋅limx→x0((f(x)-f(x0))/(x-x0))

To differentiate a polynomial it is enough to know how to differentiate monomials, because by the rules for the sum and for the product by a constant the derivative of a polynomial is obtained simply by differentiating term by term

p(x) = x^3-5x^2+4x+3; the first derivative of p(x) is p'(x) = 3x^2-10x+4; the second derivative of p(x) is p''(x) = 6x-10; the third derivative of p(x) is p'''(x) = 6; the fourth derivative of p(x) is p''''(x) = 0; starting from a polynomial of third degree and differentiating 4 times, that is once more than its degree, we have found the constant 0; starting from a polynomial of degree n, its nth derivative is a constant, and the (n+1)th derivative is zero
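Term-by-term differentiation is easy to sketch in Python if a polynomial is stored as its list of coefficients. The representation (coeffs[k] is the coefficient of x^k) and the name differentiate are choices made for this illustration only.

```python
def differentiate(coeffs):
    """Derivative of a polynomial given by its coefficients,
    where coeffs[k] is the coefficient of x^k."""
    # each monomial c*x^k becomes k*c*x^(k-1); the constant term disappears
    result = [k * c for k, c in enumerate(coeffs)][1:]
    return result if result else [0]


p = [3, 4, -5, 1]          # p(x) = x^3 - 5x^2 + 4x + 3
derivatives = [p]
for _ in range(4):
    derivatives.append(differentiate(derivatives[-1]))

for d in derivatives:
    print(d)
```

The successive lists correspond to p, p', p'', p''' and p'''' from the example above, ending with the constant 6 and then with 0.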

The derivative of the product is not the product of derivatives; the derivative of the product of two functions is not the product of the derivatives of the functions; x^2 = x⋅x, if it were true that the derivative of the product is the product of the derivatives, then D(x^2) = D(x)⋅D(x) = 1⋅1 = 1, but it is false because D(x^2) = 2x

The derivative of the product of 2 functions is equal to the derivative of the first function times the second function, plus the first function times the derivative of the second function: (f(x)⋅g(x))' = f'(x)⋅g(x)+f(x)⋅g'(x); D(x^2) = D(x⋅x) = 1⋅x+x⋅1 = x+x = 2x; limx→x0((f(x)g(x)-f(x0)g(x0))/(x-x0)) = limx→x0((f(x)g(x)-f(x0)g(x0)-f(x0)g(x)+f(x0)g(x))/(x-x0)) = limx→x0(g(x)((f(x)-f(x0))/(x-x0))+f(x0)((g(x)-g(x0))/(x-x0))) = f'(x0)g(x0)+f(x0)g'(x0), where g(x) tends to g(x0) because g, being differentiable at x0, is continuous there

Differentiability implies continuity

D((sin(x))^2) = D(sin(x)sin(x)) = cos(x)sin(x)+sin(x)cos(x) = 2sin(x)cos(x)

D(x⋅e^x) = 1⋅e^x+x⋅e^x = e^x(1+x)

The derivative of the reciprocal function is equal to minus the ratio between the derivative of the function and the square of the function: (1/g(x))' = -(g'(x)/(g(x))^2)

((1/g(x))-(1/g(x0)))/(x-x0) = (1/(x-x0))((1/g(x))-(1/g(x0))) = (1/(x-x0))((g(x0)-g(x))/(g(x)g(x0))) = (-1/(x-x0))((g(x)-g(x0))/(g(x)g(x0))) = -((g(x)-g(x0))/(x-x0))(1/(g(x)g(x0))); limx→x0(-((g(x)-g(x0))/(x-x0))(1/(g(x)g(x0)))) = -(g'(x0)/(g(x0))^2); (1/g(x))' = -g'(x)/(g(x))^2

D(1/x) = -1/x^2

D(1/x^n) = D(x^(-n)) = -nx^(n-1)/x^(2n) = (-nx^(n-1))(x^(-2n)) = -nx^(-2n+n-1) = -nx^(-n-1)

D(1/x^3) = D(x^(-3)) = -3x^(-4)

D(1/x^4) = D(x^(-4)) = -4x^(-5)

The derivative of the quotient of two functions is equal to the derivative of the numerator times the denominator, minus the numerator times the derivative of the denominator, all divided by the square of the denominator: (f(x)/g(x))' = (f'(x)g(x)-f(x)g'(x))/(g(x))^2

f(x)/g(x) = f(x)(1/g(x)); D(f(x)/g(x)) = D(f(x)(1/g(x))) = f'(x)/g(x)-f(x)g'(x)/(g(x))^2 = (f'(x)g(x)-f(x)g'(x))/(g(x))^2

D(tan(x)) = D(sin(x)/cos(x)) = (cos(x)cos(x)-sin(x)(-sin(x)))/(cos(x))^2 = ((cos(x))^2+(sin(x))^2)/(cos(x))^2 = 1+(tan(x))^2 = 1/(cos(x))^2; the tangent function is sin(x)/cos(x), and is defined where the cosine is not null, that is for x ≠ (π/2)+kπ with k ∈ ℤ

D(cot(x)) = D(cos(x)/sin(x)) = (-sin(x)sin(x)-cos(x)cos(x))/(sin(x))^2 = (-(sin(x))^2-(cos(x))^2)/(sin(x))^2 = -((sin(x))^2+(cos(x))^2)/(sin(x))^2 = -1/(sin(x))^2; the cotangent function is cos(x)/sin(x), and it is defined where the sine is not null, that is for x ≠ kπ with k ∈ ℤ

We can calculate the derivative of any function that is a ratio between polynomials

(1-x^2)/(1+x^2), the denominator has no real zeros, it is not null for any real value of x, so this function is defined on the whole real line; D((1-x^2)/(1+x^2)) = (-2x(1+x^2)-(1-x^2)2x)/(1+x^2)^2 = (-2x-2x^3-2x+2x^3)/(1+x^2)^2 = -4x/(1+x^2)^2
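The simplification above can be double-checked by evaluating the quotient rule and the final formula at a few rational points. In this Python sketch the names quotient_rule and closed_form are illustrative, and exact fractions are used so that the two sides can be compared with equality rather than with a tolerance.

```python
from fractions import Fraction


def quotient_rule(f, df, g, dg, x):
    """(f/g)'(x) = (f'(x) g(x) - f(x) g'(x)) / g(x)^2."""
    return (df(x) * g(x) - f(x) * dg(x)) / g(x) ** 2


def f(x):
    return 1 - x ** 2          # numerator


def df(x):
    return -2 * x              # derivative of the numerator


def g(x):
    return 1 + x ** 2          # denominator, never zero on the real line


def dg(x):
    return 2 * x               # derivative of the denominator


def closed_form(x):
    """Simplified derivative -4x / (1 + x^2)^2."""
    return -4 * x / (1 + x ** 2) ** 2


for x in (Fraction(-3), Fraction(0), Fraction(1, 2), Fraction(7, 3)):
    print(x, quotient_rule(f, df, g, dg, x), closed_form(x))
```

Agreement at a handful of points is not a proof, but a disagreement would immediately reveal a sign or algebra slip in the simplification.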

The sine function is an odd function, changing x to -x, then f becomes -f; an odd function, such as the sine function, is symmetrical with respect to the origin; in an odd function such as the sine function, the tangent at a point x and the tangent at a point -x are parallel, therefore in an odd function the derivative calculated at the point x is equal to the derivative calculated at the point -x; the derivative of an odd function is an even function, in fact the derivative of the odd sine function is the even cosine function; an even function is symmetrical with respect to the ordinate axis and the tangents at the point x and -x are symmetrical with respect to the ordinate axis and the tangents have opposite angular coefficients; deriving an even function we obtain an odd function, and deriving an odd function we obtain an even function; the derivative of the even function cos(x) is the odd function -sin(x)

The function x^n is even when n is even; the function x^n is odd when n is odd; D(x^n) = nx^(n-1), the derivative of an even function is an odd function, and the derivative of an odd function is an even function

The tangent function is an odd function, its graph is symmetrical with respect to the origin; D(tan(x)) = 1+(tan(x))^2 = 1/(cos(x))^2, 1+(tan(x))^2 and 1/(cos(x))^2 are even functions

The derivative of an even function is an odd function, and the derivative of an odd function is an even function

D((1+e^x)/(1+x^2)) = (e^x(1+x^2)-(1+e^x)2x)/(1+x^2)^2 = (e^x(1+x^2)-2x(1+e^x))/(1+x^2)^2

The graph f(x) and the graph f(-x) are symmetrical with respect to the ordinate axis; the graph f(x) and the graph -f(x) are symmetrical with respect to the abscissa axis

D(e^(-x)) = D(1/e^x); the graphs e^x and e^(-x) = 1/e^x are symmetrical with respect to the ordinate axis; D(e^(-x)) = D(1/e^x) = -e^x/e^(2x) = -e^x⋅e^(-2x) = -e^(-x); the graphs e^(-x) and -e^(-x) are symmetrical with respect to the abscissa axis

The function e^x is increasing in every point, therefore in every point the tangent has a positive angular coefficient, in fact the derivative of e^x is e^x which is a positive function; the function e^(-x) is decreasing in every point, therefore in every point the tangent has negative angular coefficient, in fact the derivative of e^(-x) is -e^(-x) which is a negative function

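If a differentiable function is increasing the derivative is ≥ 0; if it is strictly increasing the derivative is still ≥ 0 and can vanish at some points, for example f(x) = x^3 is strictly increasing but f'(0) = 0

If a differentiable function is decreasing the derivative is ≤ 0; if it is strictly decreasing the derivative is still ≤ 0 and can vanish at some points, for example f(x) = -x^3 is strictly decreasing but f'(0) = 0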

If in an interval the derivative is positive, then in that interval the function is increasing

If in an interval the derivative is negative, then in that interval the function is decreasing

f(x) = √x, x ≥ 0, x ∈ [0,+∞); the function f(x) = √x is increasing, therefore where the derivative exists it is not negative, but at the origin the tangent coincides with the ordinate axis, therefore for x = 0 the function is not differentiable because the derivative is +∞; a function is differentiable at a point when the derivative exists and its value is a finite number; for x > 0, (f(x+h)-f(x))/h = (√(x+h)-√x)/h = ((√(x+h)-√x)/h)((√(x+h)+√x)/(√(x+h)+√x)) = (x+h-x)/(h(√(x+h)+√x)) = h/(h(√(x+h)+√x)) = 1/(√(x+h)+√x); limh→0(1/(√(x+h)+√x)) = 1/(√x+√x) = 1/(2√x); D(√x) = 1/(2√x); when x approaches 0+, 1/(2√x) tends to +∞; calculating the derivative at the origin, (f(x)-f(0))/(x-0) = (√x-0)/(x-0) = √x/x = 1/√x, when x tends to 0+, 1/√x tends to +∞; D(√x) = {1/(2√x) for x > 0; +∞ for x = 0}; the function √x is strictly increasing and its derivative 1/(2√x) is strictly positive for x > 0

There is a close link between the sign of the derivative and the monotonic property of the function


36 - DERIVATION OF COMPOUND FUNCTIONS

f1: I1 → ℝ, f2: I2 → ℝ, suppose that the image of f1, f1(I1), is contained in I2, x → f1 → f1(x) → f2 → f2(f1(x)), that is (f2∘f1)(x), or composite function f2 after f1; if f1 is differentiable at a point x0, x0 → f1(x0), and if f2 is differentiable in f1(x0), it is possible to calculate the derivative of the composite function (f2∘f1)'(x0)

The derivative of a compound function is equal to the product of the derivatives of the component functions, the outer one evaluated at the inner function: (f2(f1(x)))' = (f2'(f1(x)))(f1'(x)); we can demonstrate this with the incremental ratio (f2(f1(x))-f2(f1(x0)))/(x-x0); suppose that x ≠ x0 implies f1(x) ≠ f1(x0), as happens for example when f1 is an injective function, so that we can multiply and divide by f1(x)-f1(x0); ((f2(f1(x))-f2(f1(x0)))/(f1(x)-f1(x0)))((f1(x)-f1(x0))/(x-x0)), f1(x) = y, f1(x0) = y0, ((f2(y)-f2(y0))/(y-y0))((f1(x)-f1(x0))/(x-x0)); limx→x0((f1(x)-f1(x0))/(x-x0)) = f1'(x0); since f1 is continuous at x0, y tends to y0 when x tends to x0, and limy→y0((f2(y)-f2(y0))/(y-y0)) = f2'(y0) = f2'(f1(x0)); (f2(f1(x0)))' = (f2'(f1(x0)))(f1'(x0))

Calculate the derivative of f(x) = sin(x^2); x → f1 → x^2 := t → f2 → sin(t) = sin(x^2) := y; to calculate the derivative of y with respect to x, we must differentiate y with respect to t, and t with respect to x, and multiply the results, and the order of the two factors is not important as the product has the commutative property; (sin(t))' = cos(t) = cos(x^2) = f2'(f1(x)); (t)' = (x^2)' = 2x = f1'(x); (sin(x^2))' = 2x⋅cos(x^2)
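The result 2x⋅cos(x^2) can be compared with a numerical estimate of the derivative. The Python sketch below builds the derivative of sin(x^2) from the chain rule and checks it against a symmetric incremental ratio; the helper names and the step h = 1e-6 are assumptions of this illustration, and the symmetric ratio is used only because it is a more accurate approximation than the one-sided one.

```python
import math


def central_difference(f, x, h=1e-6):
    """Symmetric incremental ratio (f(x+h) - f(x-h)) / (2h)."""
    return (f(x + h) - f(x - h)) / (2 * h)


def chain_rule(outer_prime, inner, inner_prime, x):
    """(f2(f1(x)))' = f2'(f1(x)) * f1'(x)."""
    return outer_prime(inner(x)) * inner_prime(x)


def d_sin_x2(x):
    """Derivative of sin(x^2): outer sin(t), inner t = x^2."""
    return chain_rule(math.cos, lambda t: t * t, lambda t: 2 * t, x)


for x in (0.3, 1.0, -2.0):
    numeric = central_difference(lambda t: math.sin(t * t), x)
    print(x, d_sin_x2(x), numeric)
```

Swapping the roles of the two functions gives sin^2(x) instead, whose derivative 2sin(x)cos(x) is different, which is why the order of composition matters even though the order of the two factors in the product does not.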

Calculate the derivative of f(x) = sin^2(x) = (sin(x))^2; x → f1 → sin(x) := t → f2 → t^2 = sin^2(x) := y; we must differentiate the variable y with respect to x using the chain rule, that is, we differentiate y with respect to t and t with respect to x, and we make the product, and the order of the two factors is not important because the product has the commutative property; (t^2)' = 2t = 2sin(x); (t)' = (sin(x))' = cos(x); (sin^2(x))' = 2sin(x)cos(x); this derivative can also be calculated with the product rule because sin^2(x) = sin(x)sin(x), (sin^2(x))' = (sin(x)sin(x))' = cos(x)sin(x)+sin(x)cos(x) = 2sin(x)cos(x)

In general f2(f1(x)) ≠ f1(f2(x)) and (f2(f1(x)))' ≠ (f1(f2(x)))'; in a compound function, the order of the functions is fundamental; the composition of functions is not always possible; f2 after f1 may be possible, but f1 after f2 may not be possible

Usually the independent variable is denoted by x and the dependent variable is denoted by y, but we can also use other symbols

f(t) = A⋅cos(ωt+γ); A, ω, γ are constants, t is the independent variable; t → ωt+γ := y → A⋅cos(y) := z; to derive z with respect to t, we must derive z with respect to y, and y with respect to t, and make the product of the 2 derivatives, that is dz/dt = (dz/dy)(dy/dt); (ωt+γ)' = ω, ωt+γ is a first degree polynomial function and its derivative is ω; (A⋅cos(y))' = -A⋅sin(y) = -A⋅sin(ωt+γ); (A⋅cos(ωt+γ))' = (A⋅cos(y))'(ωt+γ)' = -A⋅sin(ωt+γ)ω

Symbols to indicate the derivative: (f(x))' = f'(x), the prime notation, introduced by Lagrange; D(f(x)), used by Cauchy; if y = f(x) the derivative of y with respect to x is dy/dx, used by Leibniz; Newton instead marked the derivative with a dot over the variable

The derivative of the inverse function is equal to the reciprocal of the derivative of the direct function: if y = f(x) and f'(x) ≠ 0, then (f^(-1))'(y) = 1/f'(x), that is the derivative of f^(-1) calculated at the point f(x) is 1/D(f(x))

To be invertible, a function must be injective; function means that only one value of the dependent variable is associated with each value of the independent variable, so only one output corresponds to an input, therefore a vertical line meets the graph of a function at no more than one point; injective function means that distinct values of the independent variable are associated with distinct values of the dependent variable, so an output can be obtained from a single input, therefore a horizontal line meets the graph of an injective function at no more than one point

An inverse function allows us to return to the value of x; x → f → f(x) → f^(-1) → f^(-1)(f(x)) = x; f^(-1)(f(x)) = x, so f^(-1) after f is the identity function, whose derivative is 1; suppose that f'(x) ≠ 0 and that f^(-1) is differentiable at the point f(x), then by the chain rule (f^(-1))'(f(x))⋅f'(x) = 1, so (f^(-1))'(f(x)) and f'(x) must be the reciprocal of one another

An increasing function is certainly injective; the derivative is the angular coefficient of the tangent line at a point; the inverse function is symmetrical with respect to the bisector of the first and third quadrant; the tangent line at the symmetrical point of the inverse function has the angular coefficient which is the reciprocal of the angular coefficient of the tangent line of the direct function, that is D(f-1(f(x))) = 1/D(f(x))

The derivative at a point of a function is zero when the tangent at the point is parallel to the x-axis; if the derivative at a point of an increasing function is zero, then the derivative of the inverse function, in the symmetrical point with respect to the bisector of the first and third quadrant, is +∞, therefore the inverse function is not differentiable at that point; if the derivative at a point of a decreasing function is zero, then the derivative of the inverse function, in the symmetrical point with respect to the bisector of the first and third quadrant, is -∞, therefore the inverse function is not differentiable at that point

If f-1 is the inverse function of f, then f is the inverse function of f-1

y = f(x) = √x, x ≥ 0, y^2 = x, x = f^(-1)(y) = y^2, y ≥ 0, dy/dx = 1/(dx/dy) = 1/(2y) = 1/(2√x) for x > 0; limx→0+(1/(2√x)) = +∞

If an increasing function has null derivative at a point, the inverse function has derivative +∞ at the corresponding point; if a decreasing function has null derivative at a point, the inverse function has derivative -∞ at the corresponding point

If a function is increasing, its inverse function is also increasing; if a function is decreasing, its inverse function is also decreasing

The arctangent function is the inverse function of the tangent function restricted to the open interval (-π/2,π/2); y = arctan(x), x ∈ ℝ ⇔ x = tan(y), -π/2 < y < π/2; the derivative of the inverse function is equal to the reciprocal of the derivative of the direct function, dy/dx = 1/(dx/dy) = 1/(1+tan²(y)) = 1/(1+x²) > 0

If a function is increasing the derivative is ≥ 0, and if the derivative is ≥ 0 the function is increasing; if a function is decreasing the derivative is ≤ 0, and if the derivative is ≤ 0 the function is decreasing

We can get the derivative of the exponential function from the derivative of the logarithm function, and we can get the derivative of the logarithm function from the derivative of the exponential function

The exponential function e^x has a derivative equal to 1 at the point (0,1), so at the point (0,1) the tangent is parallel to the bisector of the first and third quadrant; the logarithm function ln(x) has a derivative equal to 1 at the point (1,0), so at the point (1,0) the tangent is parallel to the bisector of the first and third quadrant; the tangent of the exponential function e^x at the point (0,1) is parallel to the tangent of the logarithm function ln(x) at the point (1,0), therefore they have the same slope, and the two tangents are symmetrical to each other with respect to the bisector of the first and third quadrant

D(e^x) = e^x; y = e^x, x ∈ ℝ, x = ln(y), y > 0; the derivative of the exponential function is dy/dx = e^x; the derivative of the inverse function is equal to the reciprocal of the derivative of the direct function, dx/dy = 1/(dy/dx) = 1/e^x = 1/y, the derivative of the natural logarithm of y with respect to y is 1/y, D(ln(y)) = 1/y, therefore the derivative of the natural logarithm of x with respect to x is 1/x, D(ln(x)) = 1/x
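
The rule that the derivative of the inverse function is the reciprocal of the derivative of the direct function can be checked numerically for the square root, the arctangent and the natural logarithm; the following Python lines are only a sketch, the helper names numeric_derivative and inverse_rule are illustrative assumptions and not part of these notes, and the central difference is only an approximation of the derivative

```python
import math

def numeric_derivative(func, x, h=1e-6):
    """Central-difference approximation of func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)

def inverse_rule(direct_derivative, inverse, x):
    """Derivative of the inverse function at x, computed as 1 / f'(y) with y = inverse(x)."""
    return 1 / direct_derivative(inverse(x))

# name: (derivative f' of the direct function, inverse function, closed form of its derivative)
cases = {
    "sqrt": (lambda y: 2 * y, math.sqrt, lambda x: 1 / (2 * math.sqrt(x))),
    "arctan": (lambda y: 1 + math.tan(y) ** 2, math.atan, lambda x: 1 / (1 + x * x)),
    "ln": (math.exp, math.log, lambda x: 1 / x),
}

for name, (direct_derivative, inverse, closed_form) in cases.items():
    x = 2.0  # arbitrary sample point, chosen only for illustration
    print(name, inverse_rule(direct_derivative, inverse, x),
          closed_form(x), numeric_derivative(inverse, x))
```

For each function the three printed numbers should agree up to the error of the finite difference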

The inverse function of sin(x) in the closed interval [-π/2,π/2] is arcsin(x); the inverse function of cos(x) in the closed interval [0,π] is arccos(x); the inverse function of tan(x) in the open interval (-π/2,π/2) is arctan(x)

When we talk about the maximum, minimum, supremum, infimum of a function, we are referring to the image of the function, that is the set of values it assumes; f: I → ℝ, if x₀ is an absolute maximum point, then f(x₀) is the maximum of the image of the function, and this maximum does not always exist; Weierstrass's theorem states that if f is continuous and I is a compact interval, that is a bounded and closed interval, then the maximum and the minimum exist, that is there are points where the function assumes its maximum value and its minimum value; x₀ can also indicate a point where the function assumes its maximum or minimum only in a neighborhood of x₀, therefore not an absolute maximum or minimum, but a relative maximum or minimum, so in the neighborhood [x₀-δ, x₀+δ], f(x₀) is the maximum or minimum that the function assumes

f(x) = x⁴-x² = x²(x²-1), the graph of the function meets the abscissa axis at the points (-1,0), (0,0), (1,0); for -1 < x < 1 and x ≠ 0, f(x) < 0; this function is unbounded above, therefore it has no maximum; it is continuous and tends to +∞ as x → ±∞, therefore it has an absolute minimum, which is attained at x = ±1/√2 where f(x) = -1/4; the point (0,0) is a point of relative maximum in any neighborhood of radius < 1
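
The behavior of f(x) = x⁴-x² can be checked with a few lines of Python; this is only a sketch, the grid on [-2,2] and its step are arbitrary choices made for illustration, and the critical points are those obtained by hand from f'(x) = 4x³-2x = 2x(2x²-1)

```python
import math

def f(x):
    return x**4 - x**2

def f_prime(x):
    return 4 * x**3 - 2 * x

# f'(x) = 2x(2x^2 - 1) vanishes at x = 0 and at x = ±1/sqrt(2).
critical_points = [-1 / math.sqrt(2), 0.0, 1 / math.sqrt(2)]
for x in critical_points:
    print(f"x = {x:+.4f}, f'(x) = {f_prime(x):+.1e}, f(x) = {f(x):+.4f}")

# Coarse grid search on [-2, 2] as an independent check of the absolute minimum value.
grid = [-2 + k * 0.001 for k in range(4001)]
grid_min = min(f(x) for x in grid)
print(grid_min)
```

The value at x = 0 is a relative maximum only, because f is negative nearby but grows without bound for large |x|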

At a relative maximum the left derivative is ≥ 0, and the right derivative is ≤ 0

At a relative minimum the left derivative is ≤ 0, and the right derivative is ≥ 0

If a point is an absolute maximum it is also a relative maximum, but not vice versa; if a point is an absolute minimum it is also a relative minimum, but not vice versa

Consider the graph of a function with maximum point x₀; the left derivative is the slope of the tangent ray on the left; the right derivative is the slope of the tangent ray on the right; to the left of the maximum point x₀ we have x-x₀ < 0 and f(x)-f(x₀) ≤ 0, so lim_{x→x₀⁻}((f(x)-f(x₀))/(x-x₀)) = f_l'(x₀) ≥ 0; to the right of the maximum point x₀ we have x-x₀ > 0 and f(x)-f(x₀) ≤ 0, so lim_{x→x₀⁺}((f(x)-f(x₀))/(x-x₀)) = f_r'(x₀) ≤ 0

At an interior point of relative maximum, or relative minimum, where the function is differentiable, the left and right derivatives coincide, so the derivative is zero and the tangent is parallel to the x-axis

If f is differentiable in an interval and admits a relative maximum or minimum point, inside this interval, the derivative at that point is zero

The zeros of the first derivative are called critical points or stationary points of the function

The points of relative maximum or minimum of a differentiable function, in the interior of its interval of definition, are to be found among the critical points, that are the points where the derivative vanishes

This criterion applies only to relative maximum or minimum points in the interior of the interval of definition; the function √x has an absolute minimum point, and therefore also a relative one, at x = 0, but this point is not in the interior of the interval of definition, which is [0,+∞); the derivative of the function √x at the minimum point is +∞, so it is not 0

The relative maximum and minimum points are to be found among the critical points within the function definition interval


37 - MAXIMUM AND MINIMUM POINTS

If a differentiable function admits a point of relative maximum or minimum inside the definition interval, at this point the derivative of the function is zero

If f is differentiable in an interval, and admits a relative maximum or minimum point within that interval, the derivative at that point is zero

The points where the first derivative is equal to 0 are called critical points or points of stationarity

The relative maximum or minimum points of a differentiable function, within its definition range, are critical points, that are points where the derivative is equal to 0

A point of relative maximum or minimum, within the interval, has derivative 0, but the opposite is not always true, because a point that has derivative 0 may not be a point of relative maximum or minimum

f(x) = x³; it is an odd function, so the graph is symmetrical with respect to the origin and it passes through the points (-1,-1), (0,0), (1,1); the function is strictly increasing; the first derivative f'(x) = 3x² is zero for x = 0, therefore the point (0,0) is a critical point but it is not a point of relative maximum or minimum, and in this case the point (0,0) is an inflection point, that is a point where the curve passes from one side of its tangent to the other

Usually, considering the graph of a function, in a point with derivative 0 there is either a maximum point, or a minimum point, or an inflection point, but there are more complicated functions where a point with derivative equal to 0 may not be a maximum, minimum, or inflection point

Weierstrass's theorem says that a continuous function on a compact interval, that is a bounded and closed interval, has a maximum and a minimum

Combining Weierstrass's theorem with the fact that the derivative is zero at an interior point of relative maximum or minimum, we can study the behavior of a function

We have a square sheet of paper with side L and we want to obtain an open container in the shape of a parallelepiped by cutting 4 squares of side x at the corners of the sheet and folding up the sides, and we must find the value of x that gives the parallelepiped of maximum volume; the parallelepiped has a square base of side L-2x and height x, therefore the volume as a function of x is the product of the area of the base by the height, V(x) = (L-2x)²x ≥ 0, 0 ≤ x ≤ L/2, therefore this function is 0 at the endpoints and positive in the interior of the interval; Weierstrass's theorem assures us that there is at least one maximum point, this maximum point is interior because at the endpoints the function is 0, and at this interior maximum point the first derivative must be equal to 0; the function is a third degree polynomial, and if we find that there is only one critical point inside the interval with endpoints 0 and L/2, then that critical point is necessarily the maximum point; V(x) = (L-2x)²x, V'(x) = 2(L-2x)(-2)x+(L-2x)² = -4x(L-2x)+(L-2x)² = (L-2x)(-4x+L-2x) = (L-2x)(L-6x); we must find the critical points, that are the points where the first derivative is 0; the first zero of the first derivative is L-2x = 0, L = 2x, x = L/2, which is an endpoint of the interval and not an interior point; the second zero of the first derivative is L-6x = 0, L = 6x, x = L/6, which is a point inside the interval, therefore it is the critical point that interests us and it is the maximum point; the function V(x) = x(L-2x)² reaches its maximum when x = L/6; to obtain the parallelepiped of maximum volume we have to cut at the corners of the large square 4 squares whose side is 1/6 of the side of the large square; the maximum volume is V(L/6) = (L-2(L/6))²(L/6) = (L-(L/3))²(L/6) = ((3L-L)/3)²(L/6) = (2L/3)²(L/6) = (4L²/9)(L/6) = (2L²/9)(L/3) = 2L³/27 = (2/27)L³; in summary, the function is cubic, one zero of the derivative is x = L/2 but it is not inside the interval, and in the interval [0,L/2] there is a single interior critical point x₀ = L/6
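
The result x = L/6 with maximum volume (2/27)L³ can be checked numerically; the following Python lines are a sketch under the assumption of a sheet of side L = 6, a value chosen only for illustration, and the names volume and volume_derivative are hypothetical helpers

```python
def volume(x, side):
    """Volume of the open box: square base of side (side - 2x), height x."""
    return (side - 2 * x) ** 2 * x

def volume_derivative(x, side):
    """V'(x) = (L - 2x)(L - 6x), in the factored form obtained above."""
    return (side - 2 * x) * (side - 6 * x)

side = 6.0  # hypothetical sheet side L, chosen only for illustration
x_star = side / 6

# Independent check: evaluate V on a grid of the interval [0, L/2].
samples = [k * (side / 2) / 1000 for k in range(1001)]
best_x = max(samples, key=lambda x: volume(x, side))

print(x_star, volume(x_star, side), 2 * side**3 / 27)
print(best_x, volume_derivative(x_star, side))
```

The grid maximizer should fall within one grid step of L/6, and the derivative should vanish there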

Two variables x and y are ≥ 0 and their sum s ≥ 0 is fixed, and we must find for which values of x and y the product xy is maximum; x ≥ 0, y ≥ 0, s ≥ 0, x+y = s ⇔ y = s-x; we have to find the maximum of xy = x(s-x) = f(x), 0 ≤ x ≤ s; f(x) is a second degree polynomial function, f(x) = xy = x(s-x) = -x²+sx, the graph is a parabola and since the coefficient of x² is negative the parabola turns its concavity downwards; the function is equal to 0 for x = 0 and for x = s, and by symmetry the point of maximum is x = s/2, which is the vertex of the parabola; the first derivative of the function f(x) = -x²+sx is f'(x) = -2x+s; setting the first derivative equal to 0 we find the maximum point, -2x+s = 0, 2x = s, x = s/2; the product xy has maximum value when x = s/2; the maximum value is f(s/2) = -(s/2)²+s(s/2) = (-s²/4)+(s²/2) = (-s²+2s²)/4 = s²/4; xy ≤ s²/4 = (x+y)²/4, and taking the square root √(xy) ≤ (x+y)/2, therefore the geometric mean of two non-negative numbers is less than or equal to their arithmetic mean, and the two means are equal when the product reaches its maximum value, that is when the numbers x and y are equal to each other
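
The maximum of the product with fixed sum and the inequality between the geometric mean and the arithmetic mean can be illustrated with a few numbers; this is only a sketch, the sum s = 10 and the sample pairs are arbitrary assumptions, and the helper names are hypothetical

```python
import math

def product_with_fixed_sum(x, s):
    """f(x) = x(s - x), the product xy when x + y = s."""
    return x * (s - x)

def means(x, y):
    """Return (geometric mean, arithmetic mean) of two non-negative numbers."""
    return math.sqrt(x * y), (x + y) / 2

s = 10.0  # hypothetical fixed sum
grid = [k * s / 100 for k in range(101)]  # grid on [0, s]
best_x = max(grid, key=lambda x: product_with_fixed_sum(x, s))
print(best_x, product_with_fixed_sum(best_x, s), s * s / 4)

for x, y in [(4.0, 9.0), (1.0, 1.0), (0.0, 5.0)]:
    gm, am = means(x, y)
    print(x, y, gm, am, gm <= am)
```

Equality between the two means appears only for the pair with x = y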

A light beam starts from point A and reaches point B, moving through two media with different refractive indices; in the first medium it has speed v₁ and in the second medium it has speed v₂ ≠ v₁; in the first medium the light beam travels the straight path from A to P with speed v₁, and in the second medium it travels the straight path from P to B with speed v₂; so the light does not follow a straight path from A to B, but a broken path from A to P and from P to B; minimizing the distance, the path would be straight from A to B, but the light beam moves minimizing the travel time, and for this reason it follows the broken path; in the Cartesian plane we consider the points A(0,a), B(c,b), P(x,0); we want to minimize time, and time is a function of x; distance = velocity⋅time, s = v⋅t; time = distance/velocity, t = s/v; s₁ is the distance between point A and point P and is obtained using the Pythagorean theorem, s₁² = x²+a², s₁ = √(x²+a²); t₁ = s₁/v₁ = √(x²+a²)/v₁; s₂ is the distance between point P and point B and is obtained using the Pythagorean theorem, s₂² = (c-x)²+b², s₂ = √((c-x)²+b²); t₂ = s₂/v₂ = √((c-x)²+b²)/v₂; t(x) = t₁+t₂ = √(x²+a²)/v₁+√((c-x)²+b²)/v₂; we must find the minimum of the function t(x) with 0 ≤ x ≤ c, a minimum that exists by Weierstrass's theorem, so we calculate the first derivative of t(x), set it equal to 0, and look for a critical point; t'(x) = (1/v₁)(x/√(x²+a²))+(1/v₂)((x-c)/√((c-x)²+b²)); setting t'(x) = 0 gives (1/v₁)(x/√(x²+a²)) = (1/v₂)((c-x)/√((c-x)²+b²)); we call x₀ the position of the critical point, (1/v₁)(x₀/√(x₀²+a²)) = (1/v₂)((c-x₀)/√((c-x₀)²+b²)); we have to understand what this equality means geometrically; the critical point is P = P₀ and its x coordinate is x₀; consider the angle α that the segment AP = s₁ forms with the y-axis, which is equal to the angle that the segment AP forms with the perpendicular to the x-axis passing through the point P = P₀; α = angle of incidence, sin(α) = (opposite cathetus)/(hypotenuse) = x₀/√(x₀²+a²); consider the angle β that the segment PB = s₂ forms with the perpendicular to the x-axis passing through point B, which is equal to the angle that the segment PB forms with the perpendicular to the x-axis passing through the point P = P₀; β = angle of refraction, sin(β) = (opposite cathetus)/(hypotenuse) = (c-x₀)/√((c-x₀)²+b²); (1/v₁)sin(α) = (1/v₂)sin(β), sin(α)/sin(β) = v₁/v₂, and this is Snell's law: when a light ray passes through 2 media with different refractive indices, the optimal path of the light ray is such that the ratio between the sine of the angle of incidence and the sine of the angle of refraction is equal to the ratio between the speed in the first medium and the speed in the second medium; since t(x) has a minimum, it remains to prove the uniqueness of the critical point of t(x), therefore we consider the equation (1/v₁)(x/√(x²+a²)) = (1/v₂)((c-x)/√((c-x)²+b²)); the function x/√(x²+a²) is 0 for x = 0, is positive for x > 0, and always has positive derivative, therefore it is increasing; the function (c-x)/√((c-x)²+b²) is positive when x < c, is 0 for x = c, and its derivative is always negative, so it is decreasing; the point where the equality holds is the intersection point of the graph of the function on the left of the equal sign and the graph of the function on the right of the equal sign; a strictly positive derivative implies that the function is strictly increasing, and a strictly negative derivative implies that the function is strictly decreasing; a strictly increasing function and a strictly decreasing function can intersect at only one point, so there is a single point of intersection; the derivatives are computed with the quotient rule, f(x) = g(x)/h(x), f'(x) = (g'(x)h(x)-g(x)h'(x))/h(x)²; for f(x) = x/√(x²+a²), f'(x) = (√(x²+a²)-x(2x/(2√(x²+a²))))/(x²+a²) = (√(x²+a²)-x²/√(x²+a²))/(x²+a²) = ((x²+a²-x²)/√(x²+a²))/(x²+a²) = a²/(√(x²+a²)(x²+a²)) = a²/(x²+a²)^(3/2); for f(x) = (c-x)/√((c-x)²+b²), f'(x) = (-√((c-x)²+b²)-(c-x)((2x-2c)/(2√((c-x)²+b²))))/((c-x)²+b²) = (-((c-x)²+b²)-(c-x)(x-c))/(√((c-x)²+b²)((c-x)²+b²)) = (-(c²-2cx+x²+b²)-(cx-x²-c²+cx))/(√((c-x)²+b²)((c-x)²+b²)) = (-c²+2cx-x²-b²+x²+c²-2cx)/((c-x)²+b²)^(3/2) = -b²/((c-x)²+b²)^(3/2); we have proved Snell's law
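
The critical point of the travel time can be located numerically and compared with Snell's law; the following Python lines are a sketch, the values of a, b, c, v1, v2 are assumptions chosen only for illustration, and bisection is used here as a simple way to find the zero of t'(x), which is unique because t'(x) is increasing

```python
import math

def travel_time_derivative(x, a, b, c, v1, v2):
    """t'(x) for the broken path A(0,a) -> P(x,0) -> B(c,b)."""
    return x / (v1 * math.hypot(x, a)) + (x - c) / (v2 * math.hypot(c - x, b))

def critical_point(a, b, c, v1, v2, tol=1e-12):
    """Bisection on t'(x) over [0, c]; t' is increasing with t'(0) < 0 < t'(c)."""
    lo, hi = 0.0, c
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if travel_time_derivative(mid, a, b, c, v1, v2) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical geometry and speeds, chosen only for illustration.
a, b, c, v1, v2 = 1.0, 1.0, 2.0, 1.0, 0.75
x0 = critical_point(a, b, c, v1, v2)
sin_alpha = x0 / math.hypot(x0, a)           # sine of the angle of incidence
sin_beta = (c - x0) / math.hypot(c - x0, b)  # sine of the angle of refraction
print(x0, sin_alpha / sin_beta, v1 / v2)
```

The last two printed numbers should coincide up to the tolerance of the bisection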

Descartes used the first letters of the alphabet for the known quantities and the last letters of the alphabet for the unknowns, also called variables

We have n pairs (x,y) and we represent them as points in a Cartesian plane, and a generic point is (xₖ,yₖ); these points are approximately aligned, therefore we can hypothesize a relationship between x and y of the type y = mx+q, that is a linear relationship, also called affine; the equation y = mx+q is represented by a straight line in the Cartesian plane; the points (xₖ,yₖ) are not exactly aligned, otherwise we would take two of these points and find the line passing through those two points and through all the others; we have a cloud of points (xₖ,yₖ) which are approximately aligned, which means that for any choice of m and q the straight line y = mx+q does not pass exactly through all the points; the equation y = mx+q is a mathematical model, which we now denote by ŷₖ = mxₖ+q, where k is the index of the point we are considering, xₖ is the abscissa of the point, and ŷₖ is the estimated ordinate of the point, which is different from the experimental ordinate yₖ; the difference ŷₖ-yₖ between the estimated value and the experimental value can be positive or negative; to treat the positive and negative differences in the same way, we square these differences and then determine m and q so as to minimize the sum of the squares of these differences, and the line obtained is the least squares line; the experimental ordinate is yₖ, the estimated ordinate is ŷₖ = mxₖ+q, so ŷₖ-yₖ = mxₖ+q-yₖ, (ŷₖ-yₖ)² = (mxₖ+q-yₖ)², and we consider the function of the 2 variables m and q, f(m,q) = Σₖ₌₁ⁿ(mxₖ+q-yₖ)², where n is the number of points; first we fix m and think of q as the only variable; expanding the squares we obtain a second degree trinomial in the variable q, with second degree term nq², a first degree term in q, and a constant term without q; therefore, if we put q on the abscissa, the graph is a parabola, and since the coefficient n of the second degree term is positive, the parabola has its concavity facing upwards, and its vertex is the only minimum point, obtained by setting the first derivative with respect to q equal to 0; fixing m means choosing the slope of the line, or its angular coefficient, and varying q means varying the y-intercept, that is the point where the line meets the y-axis, so we are looking, among all the lines with slope m, for the one that minimizes the sum of the squares of the differences; the first derivative of f with respect to q is D_q f = 2Σₖ₌₁ⁿ(mxₖ+q-yₖ), and we set it equal to 0 to find the minimum, 2Σₖ₌₁ⁿ(mxₖ+q-yₖ) = 0, Σₖ₌₁ⁿ(mxₖ+q-yₖ) = 0, mΣₖ₌₁ⁿxₖ+nq-Σₖ₌₁ⁿyₖ = 0, mΣₖ₌₁ⁿxₖ+nq = Σₖ₌₁ⁿyₖ, m(1/n)Σₖ₌₁ⁿxₖ+q = (1/n)Σₖ₌₁ⁿyₖ; the sum of n numbers divided by n is the arithmetic mean, so denoting by x̄ and ȳ the means of the xₖ and of the yₖ we get mx̄+q = ȳ, and for fixed m, q = ȳ-mx̄; the line we are looking for passes through the point (x̄,ȳ), so among all the straight lines with a given slope m, the one that minimizes the sum of the squares of the differences is the one that passes through the center of gravity of the cloud of points; the center of gravity is the point whose coordinates are the mean of the x-values and the mean of the y-values; now we must find the optimal line among all the lines that pass through the center of gravity, f(m,q) = Σₖ₌₁ⁿ(mxₖ+q-yₖ)² = Σₖ₌₁ⁿ(mxₖ+ȳ-mx̄-yₖ)² = Σₖ₌₁ⁿ(m(xₖ-x̄)-(yₖ-ȳ))², f(m) = Σₖ₌₁ⁿ(m(xₖ-x̄)-(yₖ-ȳ))²; this is a second degree polynomial in m and the coefficient of m² is positive, so the graph of this function in the variable m is a parabola with concavity upward; we calculate the derivative of f with respect to m and set it equal to 0, 2Σₖ₌₁ⁿ(m(xₖ-x̄)-(yₖ-ȳ))(xₖ-x̄) = 0, m = (Σₖ₌₁ⁿ(xₖ-x̄)(yₖ-ȳ))/(Σₖ₌₁ⁿ(xₖ-x̄)²), and m is the slope of the least squares line; we have found the linear model, also called affine model, which gives the line that best approximates a distribution of points on the plane
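
The two formulas obtained for the least squares line, the slope m as a ratio of sums of deviations from the means and the intercept q that makes the line pass through the center of gravity, can be written directly in Python; this is a minimal sketch, the data are made up for illustration (roughly y = 2x+1), and the function names are hypothetical

```python
def least_squares_line(xs, ys):
    """Return (m, q) of the least squares line y = m*x + q."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    m = sxy / sxx            # slope: ratio of the two sums of deviations
    q = y_mean - m * x_mean  # the line passes through the center of gravity
    return m, q

def sum_of_squares(m, q, xs, ys):
    """f(m, q): sum of the squared differences between estimated and experimental ordinates."""
    return sum((m * x + q - y) ** 2 for x, y in zip(xs, ys))

# Made-up experimental points, approximately aligned.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]
m, q = least_squares_line(xs, ys)
print(m, q, sum_of_squares(m, q, xs, ys))
```

Changing m or q slightly should never decrease the sum of the squares, which is a quick way to check the result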


38 - MEAN VALUE THEOREM

We have a function f defined in a closed interval [a,b] with real values, and suppose that the function is continuous in the interval [a,b], that it is differentiable at the interior points, and that f(a) = f(b); then there is at least one point inside the interval (a,b), which we indicate with the Greek letter ξ (xi), where the first derivative is 0; if f: [a,b] → ℝ, f is continuous in [a,b], ∃ f'(x) ∀ x ∈ (a,b), f(a) = f(b), then ∃ ξ ∈ (a,b) such that f'(ξ) = 0; this is Rolle's theorem; Rolle's theorem was originally stated for polynomial functions, but this restriction is inessential; Weierstrass's theorem assures us that this function has a maximum and a minimum, so there is a point x₁ where the function reaches the minimum m, f(x₁) = m, and there is a point x₂ where the function reaches the maximum M, f(x₂) = M; if both x₁ and x₂ are endpoints of the interval, then, because f(a) = f(b), m = M, therefore the function is constant, and at all points ξ of the interval (a,b) the first derivative is zero, in fact the derivative of a constant is 0; otherwise at least one of the two points is interior, suppose it is x₂; the function is differentiable there, and x₂ is an absolute maximum point and therefore also a relative maximum point; the extreme points inside the interval, that are the maximum and minimum points, are necessarily critical, so the first derivative at these points is 0; if we drop the hypothesis that f(a) = f(b), then the thesis fails, for example for the function f(x) = x, or for any affine function f(x) = mx+q with m ≠ 0, which is strictly increasing or strictly decreasing: the function is continuous on the compact interval, the first derivative is continuous, but there are no points where the first derivative is zero

Rolle's theorem: if f is continuous in the interval [a,b] and differentiable at least at the points inside this interval, and f(a) = f(b), then there is a point in the open interval (a,b) where the derivative of f is 0

In Rolle's theorem f(a) = f(b), that is at the endpoints of the interval the graph of the function has the same ordinate, and within this interval there is at least one point where the first derivative is zero, that is where the tangent line is horizontal, so it is parallel to the x-axis and it is also parallel to the secant that passes through the points (a,f(a)) and (b,f(b))

The function f is defined and continuous in the interval [a,b] with real values, f: [a,b] → ℝ, and it is differentiable at least at the interior points, ∃ f'(x) ∀ x ∈ (a,b); then there is at least one point ξ ∈ (a,b) at which the first derivative is equal to the incremental ratio, f'(ξ) = (f(b)-f(a))/(b-a); here f(a) need not equal f(b); f'(ξ) is the slope of the tangent to the graph at the point (ξ,f(ξ)), (f(b)-f(a))/(b-a) is the slope of the secant passing through the points (a,f(a)) and (b,f(b)), therefore the tangent to the graph at the point (ξ,f(ξ)) is parallel to the secant passing through the endpoints, and this is Lagrange's theorem or mean value theorem

Lagrange's theorem or mean value theorem: if f is continuous in the interval [a,b] and differentiable at least at the points inside this interval, then there is a point in the open interval (a,b) at which the derivative of f is (f(b)-f(a))/(b-a)

To prove Lagrange's theorem, or mean value theorem, we use Rolle's theorem; we use an auxiliary function F(x) which is the difference between f(x) and a function g(x) which we choose to be a simple function like a first degree polynomial, so g(x) is a linear or affine function and its graph is a straight line; F(x) := f(x)-g(x), f(a) = g(a), f(b) = g(b), the function g(x) coincides with the function f(x) at the endpoints of the interval, therefore F(x) is 0 in a and in b, that is we choose g(x) in such a way that F(x) is zero at the endpoints a and b; if the function g(x) is a first degree polynomial and coincides with the function f(x) at the endpoints a and b, then the graph of g(x) is the secant passing through the points (a,f(a)) and (b,f(b)); g(x) = f(a)+((f(b)-f(a))/(b-a))(x-a), where (f(b)-f(a))/(b-a) is the slope of the secant and it is a constant that we could indicate with m; if x = a then g(x) = f(a), if x = b then g(x) = f(b); F(x) = f(x)-f(a)-((f(b)-f(a))/(b-a))(x-a), this function F(x) is continuous in [a,b], differentiable in (a,b) and equal to 0 at the endpoints, so it verifies the hypotheses of Rolle's theorem and therefore also the thesis of Rolle's theorem, that there is a point ξ at which the first derivative is 0; F'(x) = f'(x)-(f(b)-f(a))/(b-a), and F'(ξ) = 0 means f'(ξ) = (f(b)-f(a))/(b-a), which is precisely the thesis of Lagrange's theorem that we have therefore proved
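
The construction used in the proof can be followed on a concrete case; the following Python lines are a sketch for the assumed example f(x) = x³ on [0,2], where the point ξ is found by bisection on F'(x) = f'(x)-(f(b)-f(a))/(b-a); the bisection relies on the extra assumption that F' changes sign between the endpoints, which holds in this example but is not guaranteed in general

```python
import math

def f(x):
    return x ** 3

def f_prime(x):
    return 3 * x ** 2

a, b = 0.0, 2.0  # hypothetical interval, chosen only for illustration
slope = (f(b) - f(a)) / (b - a)  # slope of the secant through (a,f(a)) and (b,f(b))

def auxiliary(x):
    """F(x) = f(x) - g(x), with g the secant line, as in the proof."""
    return f(x) - f(a) - slope * (x - a)

def find_xi(lo, hi, tol=1e-12):
    """Bisection on F'(x) = f'(x) - slope; assumes F' changes sign between lo and hi."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if (f_prime(lo) - slope) * (f_prime(mid) - slope) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

xi = find_xi(a, b)
print(auxiliary(a), auxiliary(b))  # F vanishes at both endpoints
print(xi, f_prime(xi), slope, 2 / math.sqrt(3))
```

Here 3ξ² = 4 gives ξ = 2/√3 by hand, which is the value the bisection should approach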

If a function is constant, its derivative is null everywhere; if a function has a null derivative in all points of an interval, then all points in the interval are critical points, so in all points of the interval the tangent line is horizontal, therefore the function is constant

If a function has a null derivative at all points of an interval, it is constant over that interval

Suppose that a function has zero derivative at all points of an interval, f'(x) = 0, ∀ x ∈ I; we consider two generic points x₀ and x of the interval and apply Lagrange's theorem to the interval with endpoints x₀ and x, so there is a point ξ between x₀ and x such that f'(ξ) = (f(x)-f(x₀))/(x-x₀); the derivative is zero at all points and in particular at the point ξ, so the ratio (f(x)-f(x₀))/(x-x₀) is zero and f(x) = f(x₀); since x is any point, f(x) is a constant function

f'(x) ≥ 0, f'(ξ) = (f(x₂)-f(x₁))/(x₂-x₁) ≥ 0, if x₂ ≥ x₁ then f(x₂) ≥ f(x₁), and if x₂ ≤ x₁ then f(x₂) ≤ f(x₁), the function is increasing

f'(x) > 0, f'(ξ) = (f(x₂)-f(x₁))/(x₂-x₁) > 0, if x₂ > x₁ then f(x₂) > f(x₁), and if x₂ < x₁ then f(x₂) < f(x₁), the function is strictly increasing

f'(x) ≤ 0, f'(ξ) = (f(x₂)-f(x₁))/(x₂-x₁) ≤ 0, if x₂ ≥ x₁ then f(x₂) ≤ f(x₁), and if x₂ ≤ x₁ then f(x₂) ≥ f(x₁), the function is decreasing

f'(x) < 0, f'(ξ) = (f(x₂)-f(x₁))/(x₂-x₁) < 0, if x₂ > x₁ then f(x₂) < f(x₁), and if x₂ < x₁ then f(x₂) > f(x₁), the function is strictly decreasing

A function is increasing in an interval if it has derivative ≥ 0 at all points in the interval

A function is strictly increasing in an interval if it has derivative > 0 at all points in the interval

A function is decreasing in an interval if it has derivative ≤ 0 at all points in the interval

A function is strictly decreasing in an interval if it has derivative < 0 at all points in the interval

Cauchy's theorem: if f and g are continuous in the interval [a,b] and differentiable at least in (a,b) and g(b) ≠ g(a), then there is a point ξ in (a,b) where either f'(ξ) = g'(ξ) = 0, or f'(ξ)/g'(ξ) = (f(b)-f(a))/(g(b)-g(a))

f,g: [a,b] → ℝ continuous, g(a) ≠ g(b), ∃ f'(x) ∀ x ∈ (a,b), ∃ g'(x) ∀ x ∈ (a,b), then ∃ ξ ∈ (a,b) such that either f'(ξ) = g'(ξ) = 0, or f'(ξ)/g'(ξ) = (f(b)-f(a))/(g(b)-g(a)); if g'(x) ≠ 0 ∀ x ∈ (a,b), then the first alternative f'(ξ) = g'(ξ) = 0 cannot occur; if g'(x) ≠ 0 ∀ x ∈ (a,b), then automatically g(a) ≠ g(b), because by Rolle's theorem if g(a) = g(b) then there is a point where g'(x) = 0

If in Cauchy's theorem we introduce the hypothesis g(x) = x, then Cauchy's theorem coincides with Lagrange's theorem because g(b) = b and g(a) = a and the derivative of g(x) = x is 1 at every point, therefore f'(ξ) = (f(b)-f(a))/(b-a) which is Lagrange's theorem; if in Lagrange's mean value theorem we introduce the hypothesis f(b) = f(a), then we find Rolle's theorem, because if f(b) = f(a) the numerator of the incremental ratio (f(b)-f(a))/(b-a) is 0 and so f'(ξ) = 0; Cauchy's theorem is a generalization of Lagrange's theorem and Lagrange's theorem is a generalization of Rolle's theorem

If the derivative is > 0 at all points of an interval, then the function is strictly increasing, but the converse is not true; if the derivative is < 0 at all points of an interval, then the function is strictly decreasing, but the converse is not true; the function x³ is strictly increasing on all ℝ but at the point x = 0 the derivative is zero; so if a differentiable function is increasing or strictly increasing its derivative is ≥ 0, because it is the limit of a non-negative incremental ratio, and a positive function can tend to a positive or zero limit; if a differentiable function is strictly decreasing then the derivative is ≤ 0, but we cannot exclude that at some point this derivative is 0

The proof of Cauchy's theorem is similar to the proof of Lagrange's theorem; we construct an auxiliary function F(x) = f(x)-f(a)-((f(b)-f(a))/(g(b)-g(a)))(g(x)-g(a)), F(x) is continuous and differentiable like f(x) and g(x), if x = a then F(x) = 0, if x = b then F(x) = 0; this auxiliary function verifies the hypotheses of Rolle's theorem and therefore also verifies the thesis of Rolle's theorem; F'(ξ) = f'(ξ)-((f(b)-f(a))/(g(b)-g(a)))(g'(ξ)) = 0, so f'(ξ) = ((f(b)-f(a))/(g(b)-g(a)))(g'(ξ)); if g'(ξ) = 0 then also f'(ξ) = 0, which is the first alternative; otherwise, dividing by g'(ξ), f'(ξ)/g'(ξ) = (f(b)-f(a))/(g(b)-g(a)), which is the second alternative; we have proved Cauchy's theorem

f'(ξ)/g'(ξ) = (f(b)-f(a))/(g(b)-g(a)), f'(ξ)/(f(b)-f(a)) = g'(ξ)/(g(b)-g(a))

A geometric interpretation can be given to Cauchy's theorem; f'(ξ)/g'(ξ) = (f(b)-f(a))/(g(b)-g(a)), f'(ξ)/(f(b)-f(a)) = g'(ξ)/(g(b)-g(a)); we consider time as the independent variable, we indicate it with the letter t, and we have two functions x = f(t) and y = g(t) which are the parametric representation of a curve in the xy plane; when t goes from a to b, the curve starts from the point with coordinates (f(a),g(a)) and arrives at the point with coordinates (f(b),g(b)), and it is a continuous curve because f(t) and g(t) are continuous functions, and at the interior points of the interval [a,b] the derivatives f'(t) and g'(t) exist; the pair (f'(t),g'(t)) represents the tangent vector, which is the velocity vector, so (f'(t),g'(t)) are the components of the velocity vector at instant t; (f(b)-f(a),g(b)-g(a)) are the components of the displacement vector, that is the vector that joins the initial point with the final point of the displacement; f'(ξ)/(f(b)-f(a)) = g'(ξ)/(g(b)-g(a)), so there is an instant ξ at which the components of the velocity vector (f'(ξ),g'(ξ)) are proportional to the components of the displacement vector (f(b)-f(a),g(b)-g(a)), that is there exists an instant ξ at which the velocity vector is parallel to the displacement vector, where parallel means same direction or opposite direction
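
The parallelism between the velocity vector and the displacement vector can be verified on a simple curve; the following Python lines are a sketch for the assumed example x = f(t) = t², y = g(t) = t³ on [0,1], where the instant c = 2/3 is obtained by hand from Cauchy's condition

```python
def f(t):
    return t ** 2

def g(t):
    return t ** 3

def f_prime(t):
    return 2 * t

def g_prime(t):
    return 3 * t ** 2

a, b = 0.0, 1.0  # hypothetical time interval, chosen only for illustration
displacement = (f(b) - f(a), g(b) - g(a))

# Cauchy's condition (f(b)-f(a)) g'(c) = (g(b)-g(a)) f'(c) reads 3c^2 = 2c here,
# whose only solution in the open interval (0,1) is c = 2/3.
c = 2 / 3
velocity = (f_prime(c), g_prime(c))

# Two plane vectors are parallel exactly when this 2x2 determinant is zero.
cross = velocity[0] * displacement[1] - velocity[1] * displacement[0]
print(velocity, displacement, cross)
```

The determinant should vanish up to rounding, which means that the two vectors have proportional components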

Cauchy's theorem is also called the finite growth theorem, or finite increments theorem, where with finite growth or finite increment we indicate the variation f(b)-f(a) and g(b)-g(a); at the time of Cauchy the finite term was opposed to the infinitesimal term; Cauchy's theorem concerns the variations of function f and function g in the passage from point a to point b

Cauchy's mean value theorem, also known as the extended mean value theorem, is a generalization of the mean value theorem; it states that if the functions f and g are both continuous on the closed interval [a,b] and differentiable on the open interval (a,b), then there exists some c ∈ (a,b), such that (f(b)-f(a))g'(c) = (g(b)-g(a))f'(c); if g(a) ≠ g(b) and g'(c) ≠ 0, then f'(c)/g'(c) = (f(b)-f(a))/(g(b)-g(a)); geometrically this means that there is some tangent to the graph of the curve {[a,b] → ℝ², t → (f(t),g(t))} which is parallel to the line defined by the points (f(a),g(a)) and (f(b),g(b))

When we compute the limit of a quotient we must assume that the function in the denominator is different from 0 and tends towards a limit other than 0; when we compute the limit of a sum or a difference we must assume that the limits both exist and that they are finite; limx→∞(f(x)) = ∞, limx→∞(g(x)) = ∞, limx→∞(f(x)-g(x)) = ∞-∞, which is not 0 but an indeterminate form; limx→+∞(x²-x) = ∞-∞ is an indeterminate form, limx→+∞(x(x-1)) = +∞·+∞ = +∞, in fact ∞-∞ ≠ 0; limx→0(f(x)) = 0, limx→0(g(x)) = 0, limx→0(f(x)/g(x)) = 0/0, which is an indeterminate form; limx→0(sin(x)) = 0, limx→0(x) = 0, limx→0(sin(x)/x) = 0/0, which is an indeterminate form, and limx→0(sin(x)/x) = 1; the derivative is the limit of an incremental ratio, limx→x0((f(x)-f(x0))/(x-x0)) = 0/0 when f is continuous, and 0/0 is an indeterminate form; a limit can be finite, infinite, null, non-null, and it may not exist when the function is irregular or oscillating; limits of the type 0/0, ∞-∞, 0·∞ are indeterminate forms, or forms of indecision; this does not mean that the limit does not exist, but that the information we have is insufficient to determine whether the limit exists and, when it exists, what its value is; knowing that the numerator tends to 0 and the denominator tends to 0 is not sufficient to decide the behavior at the limit of this ratio, it is therefore a form of indecision or an indeterminate form

Guillaume de l'Hôpital was a French mathematician, a friend of the Swiss mathematician Johann Bernoulli; de l'Hôpital's theorem is used to compute limits of the type f(x)/g(x) when f(x) and g(x) are both infinitesimal functions, that are functions converging to 0


39 - L'HOSPITAL'S RULE

The rule of De l'Hôpital, or L'Hospital's theorem, allows to compute limits of quotients of real functions of a real variable that result in indeterminate forms 0/0 and ∞/∞, stating that the limit of the quotient of two functions is equal to the limit of the quotient of their derivatives

f(x) and g(x) are 2 continuous functions defined in an interval I of the real line, and at a point x0 inside the interval both functions are 0, so limx→x0(f(x)) = 0 and limx→x0(g(x)) = 0, therefore f(x) and g(x) are infinitesimal functions, that is functions that tend to 0, as x tends to x0; f,g: I → ℝ, x0 ∈ I, f(x0) = g(x0) = 0, limx→x0(f(x)) = 0, limx→x0(g(x)) = 0; we want to calculate limx→x0(f(x)/g(x)); suppose that g(x) ≠ 0 for x ≠ x0, that f(x) and g(x) are differentiable in the interval I, and that g'(x) ≠ 0 for x ≠ x0, then L'Hospital's theorem states that limx→x0(f'(x)/g'(x)) = L ⇒ limx→x0(f(x)/g(x)) = L; it is important to note that the ratio of the derivatives is different from the derivative of the ratio, that is f'(x)/g'(x) ≠ (f(x)/g(x))'; if the limit of the ratio of the derivatives exists, finite or infinite, then the limit of the ratio of the functions also exists and has the same value, but not vice versa: there are situations in which limx→x0(f(x)/g(x)) exists but limx→x0(f'(x)/g'(x)) does not exist, and in these cases L'Hospital's rule cannot be used

L'Hôpital's rule: if f(x) and g(x) are continuous in the interval [a,b], null at a point x0 of the interval, and differentiable for x ≠ x0 with g'(x) ≠ 0, then if the limit limx→x0(f'(x)/g'(x)) exists, the limit limx→x0(f(x)/g(x)) also exists and has the same value

To prove de l'Hospital's theorem we use Cauchy's theorem, applying it to the interval with endpoints x0 and x; we assumed that g'(x) ≠ 0, therefore of the two alternatives of the thesis of Cauchy's theorem, the first cannot occur, namely that there is a point ξ at which f'(ξ) and g'(ξ) are both 0; the second alternative therefore holds: on the interval with endpoints x0 and x all the hypotheses of Cauchy's theorem are satisfied, so there is a point ξ = ξ(x) between x0 and x where f'(ξ)/g'(ξ) = (f(x)-f(x0))/(g(x)-g(x0)), and since f(x0) = 0 and g(x0) = 0, f'(ξ)/g'(ξ) = f(x)/g(x); suppose that limx→x0(f'(x)/g'(x)) exists and is L; given a tolerance ε > 0, we can find a δε > 0, dependent on ε, so that L-ε < f'(x)/g'(x) < L+ε for x ≠ x0 and |x-x0| < δε; since ξ lies between x0 and x, |ξ-x0| < |x-x0| < δε and so L-ε < f'(ξ)/g'(ξ) < L+ε; therefore if x ≠ x0 and |x-x0| < δε then L-ε < f(x)/g(x) < L+ε, and this is the definition of limit, so limx→x0(f(x)/g(x)) = limx→x0(f'(x)/g'(x)) = L; the proof also works for limx→x0(f'(x)/g'(x)) = +∞ considering f'(x)/g'(x) > M; the proof also works for limx→x0(f'(x)/g'(x)) = -∞ considering f'(x)/g'(x) < -M; L'Hospital's theorem is also true when the functions f(x) and g(x) tend to ∞; the cases 0/0 and ∞/∞ are related to each other because f(x)/g(x) = (1/g(x))/(1/f(x)), so if f(x) and g(x) tend to 0, then 1/f(x) and 1/g(x) tend to +∞ or -∞ depending on the sign of f(x) and g(x), and vice versa, if f(x) and g(x) tend to ∞ then 1/f(x) and 1/g(x) tend to 0

The implication contained in de l'Hospital's theorem cannot be reversed: limx→x0(f(x)/g(x)) can exist even when limx→x0(f'(x)/g'(x)) does not exist

limx→+∞(sin(x)/x); |sin(x)/x| ≤ 1/|x|, limx→+∞(1/|x|) = 0, therefore limx→+∞(|sin(x)/x|) = 0; in this case L'Hospital's theorem cannot be used because D(sin(x))/D(x) = cos(x)/1, limx→+∞(cos(x)) = ∄, the function oscillates between -1 and 1 and therefore has no limit

f(x) = x⋅sin(1/x), x ≠ 0, calculate limx→0(x⋅sin(1/x)); |sin(1/x)| ≤ 1, limx→0(x) = 0, |x⋅sin(1/x)| ≤ |x|, therefore limx→0(x⋅sin(1/x)) = 0

f(x) = x²⋅sin(1/x), x ≠ 0, g(x) = x, f(x)/g(x) = x²⋅sin(1/x)/x = x⋅sin(1/x), |sin(1/x)| ≤ 1, limx→0(x) = 0, |x⋅sin(1/x)| ≤ |x|, limx→0(x⋅sin(1/x)) = 0; g'(x) = 1, so f'(x)/g'(x) = f'(x) = 2x⋅sin(1/x)+x²⋅cos(1/x)⋅(-1/x²) = 2x⋅sin(1/x)-cos(1/x); limx→0(2x⋅sin(1/x)) = 0, while limx→0(cos(1/x)) = ∄ because cos(1/x) keeps oscillating between -1 and 1 as x tends to 0; the sum of a function that has a finite limit and a function that has no limit has no limit, therefore limx→0(2x⋅sin(1/x)-cos(1/x)) = ∄; this example shows that the limit of f(x)/g(x) can exist even though the limit of f'(x)/g'(x) does not, so in this situation L'Hospital's theorem cannot be applied
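The oscillation of the ratio of the derivatives can be observed numerically. This is a small sketch, assuming two hand-picked sequences of points tending to 0 (the names even_points and odd_points are illustrative): along the first sequence 1/x is an even multiple of π, along the second an odd multiple of π.

```python
import math

def ratio(x):
    """f(x)/g(x) for f(x) = x**2 * sin(1/x) and g(x) = x, with x != 0."""
    return x * math.sin(1 / x)

def derivative_ratio(x):
    """f'(x)/g'(x) = 2x*sin(1/x) - cos(1/x), with x != 0."""
    return 2 * x * math.sin(1 / x) - math.cos(1 / x)

# Two sequences tending to 0: 1/x is an even or an odd multiple of pi.
even_points = [1 / (2 * k * math.pi) for k in (10, 100, 1000)]
odd_points = [1 / ((2 * k + 1) * math.pi) for k in (10, 100, 1000)]

even_values = [derivative_ratio(x) for x in even_points]      # stay close to -1
odd_values = [derivative_ratio(x) for x in odd_points]        # stay close to +1
ratio_values = [ratio(x) for x in even_points + odd_points]   # stay close to 0
```

The ratio of the derivatives stays near -1 on one sequence and near +1 on the other, so it cannot have a limit, while the ratio of the functions is bounded by |x| and shrinks to 0.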

limx→0((1-cos(x))/x²) =H limx→0(sin(x)/(2x)) = 1/2, because limx→0(sin(x)/x) = 1; the letter H indicates that we apply L'Hospital's theorem, it is a conditional equality: once we show that the second member exists, finite or infinite, we can say that the first member also exists and that they are equal; 1-cos(x) tends to zero with the same speed as x² because this ratio produces a finite limit other than 0

limx→0((1-cos(x))/x) =H limx→0(sin(x)/1) = limx→0(sin(x)) = 0, that is 1-cos(x) tends to 0 faster than x, in fact 1-cos(x) tends to 0 with the same speed as x²

limx→0((x-sin(x))/x³) =H limx→0((1-cos(x))/(3x²)) =H limx→0(sin(x)/(6x)) = 1/6, because limx→0(sin(x)/x) = 1
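The three limits obtained with L'Hospital's rule can be sanity-checked by evaluating each quotient at a point close to 0. This is only a numerical sketch (the helper ratio_near and the step h = 0.001 are illustrative choices), not a proof.

```python
import math

def ratio_near(f, g, x0, h):
    """Evaluate f(x)/g(x) at x = x0 + h, a point close to x0 but different from it."""
    x = x0 + h
    return f(x) / g(x)

h = 1e-3
# (1 - cos(x)) / x**2, expected to be close to 1/2
r1 = ratio_near(lambda x: 1 - math.cos(x), lambda x: x ** 2, 0.0, h)
# (1 - cos(x)) / x, expected to be close to 0
r2 = ratio_near(lambda x: 1 - math.cos(x), lambda x: x, 0.0, h)
# (x - sin(x)) / x**3, expected to be close to 1/6
r3 = ratio_near(lambda x: x - math.sin(x), lambda x: x ** 3, 0.0, h)
```

A much smaller h would not give a better check, because the subtractions 1-cos(x) and x-sin(x) lose precision in floating point when x is very close to 0.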

When the ratio of 2 functions tends to 1, the 2 functions are asymptotically equal, they behave asymptotically in the same way; limx→x0(f(x)/g(x)) = 1, f(x) ~ g(x); the symbol ~ is called tilde, and means asymptotically equal; sin(x) ~ x, x→0; 1-cos(x) ~ x²/2, x→0; x-sin(x) ~ x³/6, x→0

limx→+∞(√(1+x²)/x) =H limx→+∞(2x/(2√(1+x²))) = limx→+∞(x/√(1+x²)), and applying the rule again gives back limx→+∞(√(1+x²)/x), so in this case L'Hospital's rule is ineffective; limx→+∞(√(1+x²)/x) = limx→+∞(√(1+x²)/√(x²)) = limx→+∞(√((1/x²)+1)) = 1; √(1+x²) ~ x, x→+∞; moreover √(1+x²)-x = 1/(√(1+x²)+x) tends to 0, so y = x is the asymptote of the function y = √(1+x²) as x tends to +∞
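Since L'Hospital's rule does not help here, a direct numerical look at the quotient and at the distance from the line y = x is a reasonable cross-check. A minimal sketch (the function names are illustrative):

```python
import math

def sqrt_ratio(x):
    """The quotient sqrt(1 + x**2) / x for x > 0."""
    return math.sqrt(1 + x ** 2) / x

def asymptote_gap(x):
    """Vertical distance between y = sqrt(1 + x**2) and the line y = x."""
    return math.sqrt(1 + x ** 2) - x

ratios = [sqrt_ratio(x) for x in (10.0, 100.0, 1000.0)]
gaps = [asymptote_gap(x) for x in (10.0, 100.0, 1000.0)]
```

The quotients approach 1 from above and the gaps shrink towards 0, in agreement with the limit computed by algebraic manipulation.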

limx→0+(x⋅ln(x)) = 0⋅(-∞), limx→0+(x) = 0, limx→0+(ln(x)) = -∞, the function x⋅ln(x) is negative for 0 < x < 1; the product can be written as a quotient in two ways, limx→0+(x⋅ln(x)) = limx→0+(ln(x)/(1/x)) = -∞/+∞, or limx→0+(x⋅ln(x)) = limx→0+(x/(1/ln(x))) = 0/0; limx→0+(x⋅ln(x)) = limx→0+(ln(x)/(1/x)) =H limx→0+((1/x)/(-1/x²)) = limx→0+((1/x)(-x²)) = limx→0+(-x) = 0; x dominates ln(x) because x tends to 0 faster than ln(x) tends to -∞, so the limit is 0; limx→0+(x⋅ln(x)) = limx→0+(x/(1/ln(x))) =H limx→0+(1/(-1/(x⋅ln²(x)))) = limx→0+(-x⋅ln²(x)), which is more complicated than the starting limit, so it is not the easiest way to proceed

limx→+∞(ln(x)/x) = +∞/+∞; the tangent to the curve of the function ln(x) at the point (1,0) has angular coefficient 1, and therefore is parallel to the bisector of the first and third quadrant; the function ln(x) grows slower than the linear function x, therefore we expect the limit of their ratio to be equal to 0; limx→+∞(ln(x)/x) =H limx→+∞(1/x) = 0

limx→+∞(e^x/x) = +∞/+∞; the exponential function is the inverse function of the logarithm function; e^x is the inverse function of ln(x) and their graphs are symmetrical with respect to the bisector of the first and third quadrant; the tangent to the curve of the function e^x at the point (0,1) has angular coefficient 1, and therefore is parallel to the bisector of the first and third quadrant; the exponential function e^x grows faster than the linear function x, so we expect the limit of their ratio to be equal to +∞; limx→+∞(e^x/x) =H limx→+∞(e^x) = +∞

limx→+∞(e^x/x²) = +∞/+∞; we study the trend of the exponential function e^x with respect to the parabola function x²; limx→+∞(e^x/x²) =H limx→+∞(e^x/(2x)) =H limx→+∞(e^x/2) = +∞, therefore the exponential function e^x grows faster than the parabola function x²

We guess that e^x goes to infinity faster than x, faster than x², faster than x^n, whatever the positive integer exponent n; n ≥ 1, limx→+∞(e^x/x^n) =H limx→+∞(e^x/(n⋅x^(n-1))) =H limx→+∞(e^x/(n(n-1)⋅x^(n-2))) =H limx→+∞(e^x/(n(n-1)(n-2)⋅x^(n-3))) =H ...; applying the rule of L'Hospital the numerator remains unchanged and the denominator is replaced by the successive derivatives of x^n, and after differentiating n times, in the numerator we have e^x, and in the denominator we have the product of the numbers from n to 1, which is n!, that is a constant, so this ratio tends to +∞, and in conclusion e^x grows faster than x^n whatever the positive exponent n is; e^x is an infinity of higher order than x^n
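The claim that n differentiations turn x^n into the constant n! can be illustrated with a tiny polynomial differentiator. This is a sketch under the assumption that a polynomial is stored as a list of coefficients [c0, c1, c2, ...] (a representation chosen here for illustration); the last function simply evaluates the quotient e^x/x^n to show its growth.

```python
import math

def derivative(coeffs):
    """Differentiate the polynomial c0 + c1*x + c2*x**2 + ... given as [c0, c1, c2, ...]."""
    return [k * c for k, c in enumerate(coeffs)][1:]

def nth_derivative_of_power(n):
    """Differentiate x**n exactly n times (n >= 1) and return the remaining constant."""
    coeffs = [0] * n + [1]          # the polynomial x**n
    for _ in range(n):
        coeffs = derivative(coeffs)
    return coeffs[0]

def exp_over_power(x, n):
    """The quotient e**x / x**n for x > 0."""
    return math.exp(x) / x ** n
```

For n = 5 the remaining constant is 120 = 5!, and exp_over_power(x, 5) keeps increasing as x goes through 20, 50, 100.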

We must understand the link between the sign of the second derivative at a critical point and the behavior of the function; if a function is defined in an interval I, and x0 is a relative maximum or minimum point inside the interval I, and if the function is differentiable, then x0 is a critical point, therefore the first derivative at this point is zero; f: I → ℝ, x0, f'(x0) = 0; suppose that this function admits the second derivative, so the first derivative exists in the interval I and is itself differentiable, since the second derivative is the derivative of the first derivative; f''(x0) = limx→x0((f'(x)-f'(x0))/(x-x0)), x0 is a critical point therefore f'(x0) = 0; we assume that x0 is a critical point and that the second derivative exists, limx→x0((f(x)-f(x0))/(x-x0)²) =H limx→x0((f'(x)-f'(x0))/(2(x-x0))) = (1/2)f''(x0), where f'(x0) = 0 has been subtracted in the numerator to recognize the incremental ratio of f'; suppose we know that the second derivative at the point x0 exists and is positive, f'(x0) = 0, f''(x0) > 0, and we know that if a function tends towards a positive limit, the function is also strictly positive for all x sufficiently close to x0; so the ratio (f(x)-f(x0))/(x-x0)² is positive for x close to x0, the denominator is a square and it is certainly positive, therefore the numerator is positive for x close to x0, so f(x) > f(x0), therefore x0 is a relative minimum point; if instead f''(x0) < 0, the ratio (f(x)-f(x0))/(x-x0)² is negative for x close to x0, the denominator is a square and it is certainly positive, therefore the numerator is negative for x close to x0, so f(x) < f(x0), therefore x0 is a relative maximum point

If f is twice differentiable in an interval I, and at a critical point inside I the second derivative is positive, then this point is a proper relative minimum point, and if the second derivative is negative, then this point is a proper relative maximum point; proper means that, in the case of a minimum, f(x) is strictly greater than f(x0) for x sufficiently close to x0 and distinct from it, and in the case of a maximum, f(x) is strictly less than f(x0)

At a critical point the first derivative is zero, and if the second derivative is negative then it is a relative maximum point, if the second derivative is positive then it is a relative minimum point

f(x) = sin(x), we must verify that x = π/2 is a critical point, an absolute maximum point and therefore also a relative maximum point, in fact sin(π/2) = 1; f'(x) = cos(x), if x = π/2 then cos(π/2) = 0, so x = π/2 is a critical point; f''(x) = -sin(x), if x = π/2 then -sin(π/2) = -1, so x = π/2 is a maximum point; the function sin(x) at the point x = π/2 has the first derivative equal to zero and the second derivative is negative, therefore x = π/2 is a maximum point of the function sin(x)

f(x) = sin(x), we must verify that x = -π/2 is a critical point, an absolute minimum point and therefore also a relative minimum point, in fact sin(-π/2) = -1; f'(x) = cos(x), if x = -π/2 then cos(-π/2) = 0, so x = -π/2 is a critical point; f''(x) = -sin(x), if x = -π/2 then -sin(-π/2) = 1, so x = -π/2 is a minimum point; the function sin(x) at the point x = -π/2 has the first derivative equal to zero and the second derivative is positive, therefore x = -π/2 is a minimum point of the function sin(x)

At a critical point, that is a point with first derivative equal to 0, limx→x0((f(x)-f(x0))/(x-x0)²) =H limx→x0((f'(x)-f'(x0))/(2(x-x0))) = (1/2)f''(x0); if x0 is a relative minimum point then f(x) ≥ f(x0) for x close to x0, f(x)-f(x0) ≥ 0, (x-x0)² > 0, so f''(x0) ≥ 0; if x0 is a relative maximum point then f(x) ≤ f(x0) for x close to x0, f(x)-f(x0) ≤ 0, (x-x0)² > 0, so f''(x0) ≤ 0; if we know that the second derivative is positive or negative, we know whether the critical point is a relative minimum or a relative maximum, while if we know the behavior of the function, we do not have precise information on the sign of the second derivative: at a minimum point we can only exclude that it is negative, and at a maximum point we can only exclude that it is positive

f(x) = x⁴, the curve of this function looks like a parabola but it is not, it has concavity pointing upwards but the bottom is flatter; the function is strictly positive for all x other than 0, and is 0 for x equal to 0, f(x) > 0 for x ≠ 0, f(x) = 0 for x = 0; x = 0 is the absolute minimum point and therefore also a relative minimum point; f'(x) = 4x³; f''(x) = 12x²; if x = 0 then f'(x) = 4x³ = 0, at the point x = 0 the first derivative is zero; if x = 0 then f''(x) = 12x² = 0, at the point x = 0 the second derivative is zero; x = 0 is a point of absolute minimum and therefore also a point of relative minimum, the first derivative is zero and the second derivative is greater than or equal to 0, therefore it is not less than 0, but we cannot exclude that it is equal to 0

If f is two times differentiable in an interval I, in a point of relative minimum inside I the first derivative is 0 and the second derivative is greater than or equal to 0, in a point of relative maximum inside I the first derivative is 0 and the second derivative is less than or equal to 0

If the second derivative is greater than or equal to 0, then we can exclude that it is less than 0; if the second derivative is less than or equal to 0, then we can exclude that it is greater than 0; it may happen that the second derivative is zero, and in that case we can examine the sign of the successive derivatives
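The second derivative test, including the case in which it gives no answer, can be sketched in a few lines. The code below is an illustration under stated assumptions: the second derivative is estimated with a central finite difference (step h and tolerance tol are arbitrary choices), and the point x0 is assumed to be critical, which the code does not verify.

```python
import math

def second_derivative(f, x0, h=1e-4):
    """Central finite-difference estimate of f''(x0)."""
    return (f(x0 + h) - 2 * f(x0) + f(x0 - h)) / h ** 2

def classify_critical_point(f, x0, tol=1e-6):
    """Apply the second derivative test at a point x0 that is assumed to be critical."""
    d2 = second_derivative(f, x0)
    if d2 > tol:
        return "relative minimum"
    if d2 < -tol:
        return "relative maximum"
    return "inconclusive"   # second derivative (numerically) zero: the test does not decide
```

With f = math.sin the point π/2 is classified as a relative maximum and -π/2 as a relative minimum, while for x⁴ at 0 the test is inconclusive even though 0 is a minimum point.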


40 - CONCAVITY AND CONVEXITY

The second derivative is the derivative of the first derivative; it is important to study the sign of the second derivative to understand the trend of the function; the analysis of the sign of the second derivative in a critical point of the function, or in a point where the first derivative is 0, allows us to identify whether this point is a relative maximum or a relative minimum

A second degree polynomial function has a parabola as its graph; f(x) = ax²+bx+c, a ≠ 0; f'(x) = 2ax+b; f''(x) = 2a; the second derivative is constant and has as its sign the sign of the coefficient a, that is the coefficient of the second degree term of the polynomial; if the coefficient a is positive, the parabola has upward concavity, if the coefficient a is negative, the parabola has downward concavity; a = 1, b = 0, c = 0, f(x) = x², the parabola has its vertex at the point (0,0), is symmetrical with respect to the y-axis, has concavity upwards; a = -1, b = 0, c = 0, f(x) = -x², the parabola has its vertex at the point (0,0), is symmetrical with respect to the y-axis, has concavity downwards

Let us consider the simplest third degree polynomial function, f(x) = x³; f(x) = x³ is an odd function, that is f(-x) = -f(x), so the graph of the function is symmetrical with respect to the origin of the Cartesian axes, the point (0,0); each point (x,y) of the graph of an odd function has its symmetrical with respect to the origin, that is the point (-x,-y), on the graph; f(x) = x³; f'(x) = 3x²; f''(x) = 6x; the second derivative, f''(x) = 6x, has the same sign as x: f''(x) < 0 if x < 0, f''(x) = 0 if x = 0, f''(x) > 0 if x > 0; the function f(x) = x³ is positive for x > 0, is 0 for x = 0, is negative for x < 0, and its graph passes through the points (-1,-1), (0,0), (1,1); f'(x) = 3x², the first derivative is positive for x ≠ 0 and is 0 for x = 0, so the tangent at the point (0,0) coincides with the x-axis; the half of the graph for x > 0 has concavity upwards, while the half of the graph for x < 0 has concavity downwards; at the point (0,0) the graph changes curvature, the graph of the function passes from one side of its tangent to the other, the graph crosses the tangent

A set of the plane is said to be convex if for each pair of points belonging to it the entire segment that joins them belongs to the same set

A half-plane is a convex set, because a segment that joins any two points of the half-plane is entirely contained in the half-plane; a triangle is a convex set, because a segment joining any two points of the triangle is entirely contained in the triangle; a regular polygon is a convex set, because a segment that joins any two points of the regular polygon is entirely contained in the regular polygon; a quadrilateral can be convex or concave; a quadrilateral is convex when the segment joining any two of its points is entirely contained in the quadrilateral; a quadrilateral is concave when there are two of its points whose joining segment is not entirely contained in the quadrilateral; a circular crown, that is the area of the plane contained between 2 concentric circumferences, is concave, because the segment joining two diametrically opposite points crosses the inner hole and is not entirely contained in the circular crown; a half moon is concave, because there are pairs of its points, for example the two tips, whose joining segment is not entirely contained in the half moon
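For polygons the definition can be turned into a simple test. The sketch below uses a standard criterion that is not stated in the text: a simple (non self-intersecting) polygon is convex exactly when, walking along its boundary, all turns have the same orientation. The function name and the two sample quadrilaterals are illustrative.

```python
def is_convex_polygon(vertices):
    """Return True if the simple polygon with the given vertices (listed in order) is convex.

    The cross product of two consecutive edge vectors tells the orientation of the turn;
    the polygon is convex when these cross products never change sign.
    """
    n = len(vertices)
    sign = 0
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        x2, y2 = vertices[(i + 2) % n]
        cross = (x1 - x0) * (y2 - y1) - (y1 - y0) * (x2 - x1)
        if cross != 0:
            if sign == 0:
                sign = 1 if cross > 0 else -1
            elif (cross > 0) != (sign > 0):
                return False    # a turn in the opposite direction: the polygon is concave
    return True

square = [(0, 0), (2, 0), (2, 2), (0, 2)]   # convex quadrilateral
dart = [(0, 0), (2, 1), (4, 0), (2, 4)]     # concave quadrilateral: the vertex (2, 1) points inward
```

In the dart the segment joining (0,0) and (4,0) passes below the vertex (2,1) and therefore leaves the quadrilateral, which is exactly the situation described for concave quadrilaterals.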

The graph of a function has upward concavity if the set of points above the graph is convex

Let us consider the parabola corresponding to the second degree polynomial f(x) = ax²+bx+c, with a > 0; the set of points above this parabola is {(x,y), x ∈ ℝ, y ≥ f(x)}

The set of points above the graph of a function is indicated with f: I → ℝ, G+(f) := {(x,y), x ∈ I, y ≥ f(x)}

The set of points under the graph of a function is indicated with f: I → ℝ, G-(f) := {(x,y), x ∈ I, y ≤ f(x)}

When a function, in a certain interval, has concavity upwards, then the set of points above the graph is a convex set

The graph of a function has downward concavity if the set of points below the graph is convex

The graph of the function f(x) = x³ has upward concavity for x ≥ 0 and has downward concavity for x ≤ 0; the point (0,0) is an inflection point because it separates the interval of x ≤ 0 in which the function has concavity downwards, from the interval of x ≥ 0 in which the function has concavity upwards

An inflection point, or flex, is a point on a smooth plane curve at which the curvature changes sign; it is a point where the function changes from being concave to convex, or vice versa

f(x) = sin(x), in the interval [0,π] the graph of the function sin(x) has downward concavity, so the set of points under the graph is convex, and in the interval [π,2π] the graph of the function sin(x) has upward concavity, so the set of points above the graph is convex; all points kπ with k ∈ ℤ are inflection points or points where the function inverts its concavity

A function f is convex in an interval I if it has upward concavity; a function f is concave in an interval I if it has downward concavity; this convention names the function after the set of points above the graph: if the set of points above the graph is convex then the function is called convex, and if instead the set of points below the graph is convex then the function is called concave; with this terminology the function sin(x) is concave in the interval [0,π] and convex in the interval [π,2π]

A parabola, ax²+bx+c, is convex when a > 0, and is concave when a < 0; f(x) = ax²+bx+c, f'(x) = 2ax+b, f''(x) = 2a; the second derivative has the same sign as the coefficient a, that is the coefficient of the second degree term

If the second derivative is positive then the function is convex or has upward concavity; if the second derivative is negative then the function is concave or has downward concavity

f(x) = sin(x), f'(x) = cos(x), f''(x) = -sin(x); f''(x) = -sin(x) = -f(x), the second derivative of the function sin(x) is the opposite of the function sin(x); in the open interval (0,π), f(x) = sin(x) > 0, f''(x) = -sin(x) < 0, the second derivative is negative and the curve of the function sin(x) is concave, it has downward concavity; in the open interval (π,2π), f(x) = sin(x) < 0, f''(x) = -sin(x) > 0, the second derivative is positive and the curve of the function sin(x) is convex, it has upward concavity; at the endpoints 0, π, 2π both sin(x) and its second derivative are 0

The first derivative is the angular coefficient of the tangent at a point on the curve of the function; the second derivative is the derivative of the first derivative, that is the angular coefficient of the tangent at a point on the curve of the first derivative; if the second derivative is negative the first derivative is decreasing, if the second derivative is positive the first derivative is increasing; x = 0, f'(x) = cos(x) = cos(0) = 1; x = π/2, f'(x) = cos(x) = cos(π/2) = 0; x = π, f'(x) = cos(x) = cos(π) = -1; x = 3π/2, f'(x) = cos(x) = cos(3π/2) = 0; x = 2π, f'(x) = cos(x) = cos(2π) = 1; the first derivative is 1 in x = 0 and decreases to -1 in x = π, so the angular coefficient of the tangent to the function sin(x) decreases from x = 0 to x = π, in fact the second derivative is negative in the interval [0,π]; the first derivative is -1 in x = π and increases to 1 in x = 2π, so the angular coefficient of the tangent to the function sin(x) increases from x = π to x = 2π, in fact the second derivative is positive in the interval [π,2π]

The second derivative is the derivative of the first derivative; the second derivative is negative in an interval when the first derivative is decreasing, that is the angular coefficient of the tangent line to the function is decreasing; the second derivative is positive in an interval when the first derivative is increasing, that is, the angular coefficient of the tangent line to the function is increasing

If the second derivative is positive or null, the function is convex; if the second derivative is negative or null, the function is concave

If a function f is twice differentiable in an interval I and its second derivative is at any point greater than or equal to 0, then f is convex, and if the second derivative is less than or equal to 0, then f is concave

f: I → ℝ, f''(x) ≥ 0, we must show that the set of points above the graph is convex; to do this we show that the segment that joins any two points of the graph is entirely contained in the set of points above the graph; we take 2 points of the graph, the point (x1,y1) and the point (x2,y2) with x2 > x1, y1 = f(x1), y2 = f(x2), which are the extreme points of the segment; for x between x1 and x2, let r(x) be the ordinate of the point of the segment with coordinates (x,r(x)) and f(x) the ordinate of the point of the curve with coordinates (x,f(x)); we have to find the equation of the line containing the segment, and show that r(x)-f(x) ≥ 0 for every x between x1 and x2, assuming that f''(x) ≥ 0

The equation of the bundle of straight lines passing through the point (0,0) is y = mx, the equation of the bundle of straight lines passing through the point (x1,y1) is y-y1 = m(x-x1), and the angular coefficient m of the straight line through the 2 points is given by the ratio (y2-y1)/(x2-x1), so y-y1 = ((y2-y1)/(x2-x1))(x-x1), r(x) = y1+((y2-y1)/(x2-x1))(x-x1) = (y1(x2-x1)+(y2-y1)(x-x1))/(x2-x1) = (y1x2-y1x1+y2x-y1x-y2x1+y1x1)/(x2-x1) = (y1(x2-x)+y2(x-x1))/(x2-x1); r(x)-f(x) = (y1(x2-x)+y2(x-x1)-f(x)(x2-x1))/(x2-x1); since y1 = f(x1), y2 = f(x2) and x2-x1 = (x2-x)+(x-x1), r(x)-f(x) = (f(x1)(x2-x)+f(x2)(x-x1)-f(x)((x2-x)+(x-x1)))/(x2-x1) = ((f(x1)-f(x))(x2-x)+(f(x2)-f(x))(x-x1))/(x2-x1), where we used the identity aα+bβ-c(α+β) = aα+bβ-cα-cβ = (a-c)α+(b-c)β

According to Lagrange's mean value theorem (f(b)-f(a))/(b-a) = f'(ξ) where ξ is an intermediate point between point a and point b, therefore f(b)-f(a) = (b-a)f'(ξ); applying it on [x1,x] and on [x,x2] we find ξ1 between x1 and x and ξ2 between x and x2 such that f(x1)-f(x) = f'(ξ1)(x1-x) and f(x2)-f(x) = f'(ξ2)(x2-x), so r(x)-f(x) = (f'(ξ1)(x1-x)(x2-x)+f'(ξ2)(x2-x)(x-x1))/(x2-x1) = (-f'(ξ1)(x-x1)(x2-x)+f'(ξ2)(x2-x)(x-x1))/(x2-x1) = ((f'(ξ2)-f'(ξ1))(x-x1)(x2-x))/(x2-x1); applying Lagrange's theorem again to f' on [ξ1,ξ2] we find ξ between ξ1 and ξ2 such that f'(ξ2)-f'(ξ1) = f''(ξ)(ξ2-ξ1), so r(x)-f(x) = (f''(ξ)(ξ2-ξ1)(x-x1)(x2-x))/(x2-x1); for x1 < x < x2 we have ξ1 < ξ2 and ((ξ2-ξ1)(x-x1)(x2-x))/(x2-x1) > 0, so if f''(ξ) ≥ 0 then r(x) ≥ f(x), therefore the function f(x) is convex; the function is convex because the segment, which has as extreme points 2 points of the graph of f(x), is above the graph of f(x); if r(x) > f(x) for every x strictly between x1 and x2, whatever the 2 points, the function f(x) is strictly convex and we can exclude that the graph contains straight segments
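The quantity r(x)-f(x) studied in the proof is easy to evaluate numerically. The following sketch (function names and sample count are illustrative choices) computes the chord through two points of the graph with the same formula used above and samples the difference between chord and function on [x1,x2].

```python
import math

def chord_gap(f, x1, x2, x):
    """Return r(x) - f(x), where r is the chord through (x1, f(x1)) and (x2, f(x2))."""
    r = (f(x1) * (x2 - x) + f(x2) * (x - x1)) / (x2 - x1)
    return r - f(x)

def min_chord_gap(f, x1, x2, samples=1000):
    """Smallest sampled value of r(x) - f(x) on [x1, x2]."""
    step = (x2 - x1) / samples
    return min(chord_gap(f, x1, x2, x1 + i * step) for i in range(samples + 1))
```

For a convex function such as e^x the sampled minimum is (up to rounding) not negative, while for sin(x) on [0,π], where the function is concave, the chord lies below the graph and the minimum is close to -1. A sampled check of this kind can expose a failure of convexity but cannot prove convexity.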

If the second derivative is ≥ 0 the function is convex; if the second derivative is > 0 the function is strictly convex; if the second derivative is ≤ 0 the function is concave; if the second derivative is < 0 the function is strictly concave

The function sin(x) is strictly concave in the interval [0,π] and strictly convex in the interval [π,2π]

The function x³ is strictly concave for x < 0 and strictly convex for x > 0

A linear or affine function has a straight line as its graph and can be considered either convex or concave

The graph of a function, such as the graph of a line, can be convex and concave at the same time, but it cannot be strictly convex and strictly concave at the same time

If f(x) is concave then -f(x) is convex and vice versa

The sign of the first derivative indicates whether the function increases or decreases in an interval, and the sign of the second derivative indicates whether the function is concave or convex in the interval

An inflection point is a point that separates an interval of concavity from an interval of convexity, so if a function is differentiable twice, an inflection point is a point where the second derivative is zero

At an inflection point of a twice differentiable function the second derivative is necessarily equal to 0, but the fact that the second derivative is equal to 0 is not sufficient to establish that the point is an inflection point

We must not confuse a necessary condition with a sufficient condition; if a function is twice differentiable, at an inflection point the second derivative is necessarily zero, but there are situations in which the second derivative is zero at a point that is not an inflection point

f(x) = x⁴ is an even function, it is strictly positive for x ≠ 0 and it is null only for x = 0; f'(x) = 4x³; f''(x) = 12x²; at the point (0,0) the first derivative is equal to 0 and the point is a point of absolute and relative minimum; the second derivative is also equal to 0 there, but the point is not an inflection point; an inflection point separates an interval where the second derivative is ≥ 0 from an interval where the second derivative is ≤ 0, but in this case the point (0,0) separates 2 intervals in which the second derivative is > 0 both to the left and to the right; the function f(x) = x⁴ is strictly convex; the condition f''(x) > 0 is sufficient for the function to be strictly convex, but as this example shows, it is not necessary: a function can be strictly convex and have a point where the second derivative is equal to 0

The zeros of the first derivative are called critical points; not all critical points are points of relative maximum and relative minimum, but the points of relative maximum and relative minimum must be sought among the critical points; being a critical point is a necessary condition for an interior point to be a point of relative minimum or relative maximum, but it is not a sufficient condition

The zeros of the second derivative are possible inflection points; the inflection points are to be found among the zeros of the second derivative, but then it is necessary to verify whether the considered point is really an inflection point

For the circular functions sin(x) and tan(x) the points kπ, with k ∈ ℤ, are points of inflection; for cos(x) the points of inflection are the points π/2+kπ

The exponential and logarithmic functions have no inflection points

f(x) = e^x > 0; f'(x) = e^x > 0, the function is strictly increasing; f''(x) = e^x > 0, the function is strictly convex

f(x) = ln(x), x > 0; f'(x) = 1/x > 0, because the domain of the logarithm is the set of points x > 0, so the function is strictly increasing; f''(x) = -1/x² < 0, the function is strictly concave


41 - GRAPHS OF FUNCTIONS - PART 1

The sign of the first derivative gives us information on the monotonicity of a function, that is if the first derivative is ≥ 0 in an interval the function is increasing, if the first derivative is ≤ 0 the function is decreasing, and a point where the first derivative is 0 can be a point of minimum or maximum

The sign of the second derivative gives us information on the concavity or convexity of the function, that is if the second derivative is ≥ 0 the function is convex, if the second derivative is ≤ 0 the function is concave, if in a point the second derivative is 0 it can be an inflection point

The cubic function f(x) = x³ has an inflection point in the origin of the Cartesian axes; we need to understand how many inflection points a generic third degree polynomial function can have; f(x) = ax³+bx²+cx+d, a ≠ 0; f'(x) = 3ax²+2bx+c; f''(x) = 6ax+2b; at an inflection point the second derivative is 0, f''(x) = 6ax+2b = 0, 3ax+b = 0, 3ax = -b, x = -b/(3a) is the x coordinate of the inflection point; at the point with coordinate x = -b/(3a), depending on the sign of the coefficient a, the second derivative passes from negative to positive values or vice versa, so the point is really an inflection point; each cubic polynomial function has only one inflection point, which has coordinate x = -b/(3a) and is the symmetry point of the graph of the function
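Both the formula for the inflection point and the symmetry of the graph about it can be verified on a sample cubic. In this sketch the coefficients are hypothetical values chosen for illustration; symmetry about the point (x0,f(x0)) means that f(x0+h)+f(x0-h) = 2f(x0) for every h.

```python
def cubic(a, b, c, d):
    """Return the function x -> a*x**3 + b*x**2 + c*x + d."""
    return lambda x: a * x ** 3 + b * x ** 2 + c * x + d

def inflection_abscissa(a, b):
    """x coordinate of the inflection point, the zero of f''(x) = 6*a*x + 2*b."""
    if a == 0:
        raise ValueError("a must be different from 0 for a cubic")
    return -b / (3 * a)

# Hypothetical coefficients chosen for illustration.
a, b, c, d = 2.0, -6.0, 1.0, 5.0
f = cubic(a, b, c, d)
x0 = inflection_abscissa(a, b)
# Point symmetry about (x0, f(x0)): each entry should be zero up to rounding.
symmetry_defects = [f(x0 + h) + f(x0 - h) - 2 * f(x0) for h in (0.5, 1.0, 2.5)]
```

Here x0 = 1 and f(x0) = 2; the symmetry defects vanish because the even powers of h cancel exactly when b = -3a⋅x0.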

f(x) = x³-x is an odd function, that is f(-x) = -f(x), therefore the graph is symmetrical with respect to the origin; f'(x) = 3x²-1; f''(x) = 6x; f''(x) < 0 when x < 0, f''(x) = 0 when x = 0, f''(x) > 0 when x > 0, therefore the point (0,0) is an inflection point, for x < 0 the function is strictly concave and for x > 0 the function is strictly convex; f(x) = x³-x = x(x²-1) = x(x+1)(x-1), therefore the graph of the function intersects the x-axis at the points x = -1, x = 0, x = 1, so the equation x³-x = 0 has 3 distinct real solutions; f(x) < 0 when x < -1 and when 0 < x < 1, f(x) > 0 when -1 < x < 0 and when x > 1; f'(x) = 3x²-1, the first derivative at the point x = 0 is -1, the tangent to the graph of the function at the origin has angular coefficient -1 and is the bisector of the second and fourth quadrant; to locate the points of relative maximum and minimum we must examine the critical points, that are the points where the first derivative is 0, 3x²-1 = 0, 3x² = 1, x² = 1/3, x = ±√(1/3) = ±1/√3 = ±√3/3; f''(√3/3) > 0 and f''(-√3/3) < 0, therefore the relative minimum point is x = √3/3 and the relative maximum point is x = -√3/3; the function f(x) = x³-x is odd and therefore the first derivative is even; the function f(x) = x³-x has as domain all ℝ, it tends to -∞ as x tends to -∞ and tends to +∞ as x tends to +∞, therefore it is unbounded below and unbounded above, it has no absolute minimum and no absolute maximum, the infimum is by convention -∞ and the supremum is by convention +∞; continuous functions transform intervals into intervals, therefore the image is also all ℝ; in summary, the function has an inflection point at x = 0, is concave for x < 0 and convex for x > 0, has a relative minimum point at x = √3/3 and a relative maximum point at x = -√3/3
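The main facts of this study can be cross-checked with a few evaluations. A minimal sketch, with the derivatives written out by hand as in the text (the names df and d2f are illustrative):

```python
import math

def f(x):
    return x ** 3 - x

def df(x):
    """First derivative of x**3 - x."""
    return 3 * x ** 2 - 1

def d2f(x):
    """Second derivative of x**3 - x."""
    return 6 * x

x_min = math.sqrt(3) / 3     # critical point where the second derivative is positive
x_max = -math.sqrt(3) / 3    # critical point where the second derivative is negative
```

The value of the function at the relative minimum is f(√3/3) = -2√3/9 and, by oddness, the value at the relative maximum is f(-√3/3) = 2√3/9.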

Euler found a constant, denoted by the symbol e, as the limit of the sequence (1+1/n)^n because he wanted to find an exponential function of the type a^x whose graph passes through the point (0,1) with an angular coefficient equal to 1, therefore the tangent line to the function f(x) = e^x at the point (0,1) is parallel to the bisector of the first and third quadrant
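A quick numerical look at the sequence (1+1/n)^n shows how slowly it approaches e. This is a minimal sketch; the name `euler_sequence` is illustrative.

```python
import math

def euler_sequence(n):
    """n-th term of the sequence (1 + 1/n)^n."""
    return (1 + 1 / n) ** n

for n in (1, 10, 1000, 1_000_000):
    term = euler_sequence(n)
    print(f"n = {n:>9}  term = {term:.10f}  e - term = {math.e - term:.3e}")
```

The printed gap shrinks roughly like 1/n, so this limit is a definition rather than a practical way to compute e.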

f(x) = a^x, a > 0; 0 < a^x = e^(ln(a^x)) = e^(x⋅ln(a)) = e^((ln(a))x) = e^(λx), λ = ln(a); e^λ is the ordinate of the exponential function e^(λx) at the abscissa x = 1; if a > 1, then ln(a) > 0 and e^(ln(a)⋅x) is a strictly increasing and strictly convex function, it has no inflection points; if 0 < a < 1, then ln(a) < 0 and e^(ln(a)⋅x) is a strictly decreasing and strictly convex function, it has no inflection points

The graphs of f(x) and -f(x) are symmetric with respect to the x-axis

The graphs of f(x) and f(-x) are symmetric with respect to the y-axis

The graphs of f(x) and -f(-x) are symmetric with respect to the origin of the x and y axes

f(x) = e^(-x) = 1/e^x is a strictly decreasing and strictly convex function, and the tangent at the point (0,1) has angular coefficient -1, so it is parallel to the bisector of the second and fourth quadrant; the graph of e^(-x) is the reflection of the graph of e^x with respect to the y-axis

f(x) = -e^x is a strictly decreasing and strictly concave function, and the tangent at the point (0,-1) has angular coefficient -1, so it is parallel to the bisector of the second and fourth quadrant; the graph of -e^x is the reflection of the graph of e^x with respect to the x-axis

f(x) = -e^(-x) is a strictly increasing and strictly concave function, and the tangent at the point (0,-1) has angular coefficient 1, so it is parallel to the bisector of the first and third quadrant; the graph of -e^(-x) is the reflection of the graph of e^x with respect to the origin of the x and y axes
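The angular coefficients at x = 0 of the exponential function and of its three reflections can be estimated with a central difference. This is a rough numerical sketch; the helper `slope` and the dictionary `curves` are illustrative names.

```python
import math

def slope(g, x, h=1e-6):
    """Central-difference estimate of g'(x)."""
    return (g(x + h) - g(x - h)) / (2 * h)

curves = {
    "e^x": lambda x: math.exp(x),
    "e^(-x)": lambda x: math.exp(-x),      # reflection in the y-axis
    "-e^x": lambda x: -math.exp(x),        # reflection in the x-axis
    "-e^(-x)": lambda x: -math.exp(-x),    # reflection in the origin
}

slopes_at_zero = {name: slope(g, 0.0) for name, g in curves.items()}
for name, m in slopes_at_zero.items():
    print(f"{name:>8}: value at 0 = {curves[name](0.0):+.1f}, slope at 0 is about {m:+.6f}")
```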

Hyperbolic functions are analogues of the ordinary trigonometric functions, but defined using the hyperbola rather than the circle; just as the points (cos(t),sin(t)) form a circle with unit radius, the points (cosh(t),sinh(t)) form the right half of the unit hyperbola; the derivatives of sin(t) and cos(t) are cos(t) and -sin(t), the derivatives of sinh(t) and cosh(t) are cosh(t) and sinh(t)

cosh(x) := (e^x+e^(-x))/2, this is the hyperbolic cosine

sinh(x) := (e^x-e^(-x))/2, this is the hyperbolic sine

cosh(x) is an even function because cosh(-x) = cosh(x), therefore the graph of the hyperbolic cosine is symmetric with respect to the y-axis; cosh(x) is always positive because the numerator and the denominator are always positive; the graph of the hyperbolic cosine is entirely contained in the first and second quadrant; (cosh(x))' = (e^x-e^(-x))/2 = sinh(x), because the derivative of e^x is e^x, and by the chain rule the derivative of e^(-x) is e^(-x) multiplied by the derivative of -x, which is -1; (cosh(x))' < 0 for x < 0, (cosh(x))' = 0 for x = 0, (cosh(x))' > 0 for x > 0; the hyperbolic cosine is decreasing in the second quadrant, it intersects the y-axis at the point (0,1), which is the absolute minimum point, and it is increasing in the first quadrant, therefore cosh(x) ≥ 1; (cosh(x))' = sinh(x), (sinh(x))' = cosh(x), (cosh(x))'' = cosh(x), the second derivative of the hyperbolic cosine is equal to the function itself and so it is always > 0, therefore the hyperbolic cosine is a strictly convex function

sinh(x) is an odd function because sinh(-x) = -sinh(x), therefore the graph of the hyperbolic sine is symmetric with respect to the origin of the x and y axes; sinh(x) is positive for x > 0 and negative for x < 0; the graph of the hyperbolic sine is entirely contained in the first and third quadrant; (sinh(x))' = (e^x+e^(-x))/2 = cosh(x), because the derivative of e^x is e^x, and by the chain rule the derivative of e^(-x) is e^(-x) multiplied by the derivative of -x, which is -1; (sinh(x))' > 1 for x < 0, (sinh(x))' = 1 for x = 0, (sinh(x))' > 1 for x > 0, so the first derivative is always positive; the hyperbolic sine is increasing in the third quadrant, passes through the origin of the x and y axes, which is the inflection point, and is increasing in the first quadrant; (sinh(x))' = cosh(x), (cosh(x))' = sinh(x), (sinh(x))'' = sinh(x), the second derivative of the hyperbolic sine is equal to the function itself and therefore (sinh(x))'' < 0 for x < 0, so the function is strictly concave for x < 0, (sinh(x))'' = 0 for x = 0, so the point (0,0) is an inflection point, (sinh(x))'' > 0 for x > 0, so the function is strictly convex for x > 0; the first derivative of the hyperbolic sine is the hyperbolic cosine and cosh(0) = 1, therefore the tangent to the curve sinh(x) at the point (0,0) has angular coefficient 1 and is the bisector of the first and third quadrant

(sin(x))' = cos(x), (cos(x))' = -sin(x); (sinh(x))' = cosh(x), (cosh(x))' = sinh(x)

(cosh(x)+sinh(x))/2 = (((e^x+e^(-x))/2)+((e^x-e^(-x))/2))/2 = ((e^x+e^(-x)+e^x-e^(-x))/2)/2 = (2e^x/2)/2 = e^x/2, the arithmetic mean of the hyperbolic cosine and the hyperbolic sine is e^x/2

The hyperbolic cosine is an even function, is ≥ 1, and is unbounded above; the hyperbolic sine is an odd function and is unbounded below and unbounded above; the arithmetic mean of cosh(x) and sinh(x) is the function e^x/2; the graph of the function e^x/2 meets the y-axis at the point (0,1/2)

The fundamental identity of circular functions is cos^2(x)+sin^2(x) = 1, while for the hyperbolic cosine and the hyperbolic sine the relation is cosh^2(x)-sinh^2(x) = 1; cosh^2(x) = ((e^x+e^(-x))/2)^2 = (e^(2x)+2e^x⋅e^(-x)+e^(-2x))/4; sinh^2(x) = ((e^x-e^(-x))/2)^2 = (e^(2x)-2e^x⋅e^(-x)+e^(-2x))/4; cosh^2(x)-sinh^2(x) = ((e^x+e^(-x))/2)^2-((e^x-e^(-x))/2)^2 = ((e^(2x)+2e^x⋅e^(-x)+e^(-2x))/4)-((e^(2x)-2e^x⋅e^(-x)+e^(-2x))/4) = (e^(2x)+2e^x⋅e^(-x)+e^(-2x)-e^(2x)+2e^x⋅e^(-x)-e^(-2x))/4 = 4e^x⋅e^(-x)/4 = e^x⋅e^(-x) = e^x(1/e^x) = e^x/e^x = 1, so cosh^2(x)-sinh^2(x) = 1
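The identity can be spot-checked by coding the two definitions directly from the exponential. This is only a sketch; the local functions `cosh` and `sinh` are written out here for illustration and are compared against the standard-library versions.

```python
import math

def cosh(x):
    """Hyperbolic cosine from its definition (e^x + e^(-x))/2."""
    return (math.exp(x) + math.exp(-x)) / 2

def sinh(x):
    """Hyperbolic sine from its definition (e^x - e^(-x))/2."""
    return (math.exp(x) - math.exp(-x)) / 2

for x in (-2.0, 0.0, 0.5, 3.0):
    identity = cosh(x) ** 2 - sinh(x) ** 2        # should be close to 1
    mean = (cosh(x) + sinh(x)) / 2                # should be close to e^x/2
    print(f"x = {x:+.1f}  cosh^2 - sinh^2 = {identity:.12f}  mean = {mean:.6f}")
```

For large |x| the subtraction of two nearly equal squares loses precision in floating point, so the check is only meaningful for moderate x.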

{x = cos(t), y = sin(t)}, t is the independent variable, and the curve represented by these parametric equations is the circle with center (0,0) and radius 1; cos^2(t)+sin^2(t) = 1, so x^2+y^2 = 1, which is the equation of the circle with center (0,0) and radius 1

{x = cosh(t), y = sinh(t)}, t is the independent variable, and the curve represented by these parametric equations is a branch of an equilateral hyperbola; cosh(t) ≥ 1, and for t = 0 the curve intersects the x-axis at the point (1,0); the hyperbolic cosine is an even function and the hyperbolic sine is an odd function, therefore the curve is symmetric with respect to the x-axis; cosh^2(t)-sinh^2(t) = 1, so x^2-y^2 = 1, which is the equation of an equilateral hyperbola, and the parametric curve is precisely the branch in the half-plane x > 0 since cosh(t) ≥ 1; since the hyperbola is equilateral, its asymptotes are perpendicular to each other: they are the lines y = x and y = -x, and the branch with x > 0 approaches the bisector of the first quadrant and the bisector of the fourth quadrant; a generic equation of a hyperbola is (x^2/a^2)-(y^2/b^2) = 1, with a ≠ 0 and b ≠ 0, and if a = b, then the hyperbola is equilateral, so the asymptotes are perpendicular to each other; this is why cosh(x) and sinh(x) are called hyperbolic functions
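Sampling the parametric curve gives a quick check that every point lies on the right branch of x^2-y^2 = 1 and that the curve is symmetric with respect to the x-axis. This is a sketch; `hyperbola_point` is an illustrative name.

```python
import math

def hyperbola_point(t):
    """Point (cosh(t), sinh(t)) of the parametric curve."""
    return math.cosh(t), math.sinh(t)

for t in (-2.0, -0.5, 0.0, 0.5, 2.0):
    x, y = hyperbola_point(t)
    xm, ym = hyperbola_point(-t)
    assert x >= 1.0                                        # right branch only
    assert abs(x**2 - y**2 - 1.0) < 1e-9                   # x^2 - y^2 = 1
    assert abs(xm - x) < 1e-12 and abs(ym + y) < 1e-12     # mirror image in the x-axis
    print(f"t = {t:+.1f}  (x, y) = ({x:.6f}, {y:+.6f})")
```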

f(x) = a⋅e^(-(x-b)^2/c^2), Gaussian function

f(x) = e^(-x^2), simplified version of the Gaussian function; f(x) is an even and positive function, since e^x > 0 for every x implies e^(-x^2) > 0; the graph of f(x) is in the first and second quadrant and intersects the y-axis at the point (0,1); f(x) = e^(-x^2) = 1/e^(x^2), when x tends to +∞ or -∞ f(x) tends to 0 while always remaining above the x-axis, which is the horizontal asymptote, lim_{x→+∞}(1/e^(x^2)) = 0, lim_{x→-∞}(1/e^(x^2)) = 0; by the chain rule, the first derivative of f(x) is the derivative of -x^2 with respect to x, which is -2x, multiplied by the derivative of e^(-x^2) with respect to -x^2, which is e^(-x^2), so f'(x) = -2x⋅e^(-x^2); f'(x) > 0 for x < 0, so the function is increasing for x < 0; f'(x) < 0 for x > 0, so the function is decreasing for x > 0; f''(x) = -2e^(-x^2)+(-2x)(-2x⋅e^(-x^2)) = -2e^(-x^2)+4x^2⋅e^(-x^2) = 2e^(-x^2)(-1+2x^2) = 2e^(-x^2)(2x^2-1); f''(x) = 0 when 2x^2-1 = 0, 2x^2 = 1, x^2 = 1/2, x = ±√(1/2) = ±(1/√2) = ±(√2/2), so f(x) has 2 inflection points, at x = -√2/2 and at x = √2/2; f''(x) > 0 in the intervals (-∞,-√2/2) and (√2/2,+∞), so f(x) is strictly convex there; f''(x) < 0 in the interval (-√2/2,√2/2), so f(x) is strictly concave there; the graph of the function f(x) = e^(-x^2) is a bell-shaped curve, also called Gaussian
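The closed form of the second derivative, 2e^(-x^2)(2x^2-1), can be compared with a finite-difference estimate to confirm the sign pattern around ±√2/2. This is a rough sketch; `f2_numeric` and the step h are illustrative choices.

```python
import math

def f(x):
    return math.exp(-x**2)

def f2(x):
    """Closed form of the second derivative: 2e^(-x^2)(2x^2 - 1)."""
    return 2 * math.exp(-x**2) * (2 * x**2 - 1)

def f2_numeric(x, h=1e-4):
    """Central finite-difference approximation of f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h**2

r = math.sqrt(2) / 2          # candidate inflection abscissa
for x in (-1.5, -r, 0.0, r, 1.5):
    print(f"x = {x:+.4f}  closed form = {f2(x):+.6f}  numeric = {f2_numeric(x):+.6f}")
```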

A horizontal line y = q is a horizontal asymptote of a function when lim_{x→+∞}(f(x)-q) = 0, or when lim_{x→-∞}(f(x)-q) = 0, so the graph of the function approaches the straight line when x tends to +∞ or -∞; a horizontal asymptote can be left or right, or simultaneously left and right; there are also vertical asymptotes and oblique asymptotes

f(x) = 1/(1+x^2), this rational function is the derivative of the arctangent function, it is a positive and even function; the derivative of the function can be calculated using the formula (1/f(x))' = -f'(x)/(f(x))^2, or the formula (f(x)/g(x))' = (f'(x)g(x)-f(x)g'(x))/(g(x))^2, or the chain rule, considering that f(x) = 1/(1+x^2) = (1+x^2)^(-1); f'(x) = -2x/(1+x^2)^2; f'(x) > 0 for x < 0, so f(x) is increasing for x < 0; f'(x) < 0 for x > 0, so f(x) is decreasing for x > 0; f'(x) = 0 for x = 0, so the point (0,1) is a critical point, a relative and also an absolute maximum point; by calculating the second derivative we can find the inflection points


42 - GRAPHS OF FUNCTIONS - PART 2

f(x) = 1/(1+x^2) = (1+x^2)^(-1); f'(x) = -2x/(1+x^2)^2 = -2x(1+x^2)^(-2); f''(x) = -2((1+x^2)^(-2)-2x(1+x^2)^(-3)⋅2x) = -2(1/(1+x^2)^2-4x^2/(1+x^2)^3) = -2((1+x^2-4x^2)/(1+x^2)^3) = -2((1-3x^2)/(1+x^2)^3) = 2((3x^2-1)/(1+x^2)^3); f''(x) = 0 when 3x^2-1 = 0, 3x^2 = 1, x^2 = 1/3, x = ±√(1/3) = ±(1/√3) = ±(√3/3); the inflection point I_1 has coordinate x_1 = -1/√3, the inflection point I_2 has coordinate x_2 = 1/√3; from y = 1/(1+x^2), y_1 = 1/(1+(-1/√3)^2) = 1/(1+1/3) = 1/(4/3) = 3/4, y_2 = 1/(1+(1/√3)^2) = 1/(1+1/3) = 1/(4/3) = 3/4, so I_1 = (-1/√3,3/4) and I_2 = (1/√3,3/4); at the inflection points the second derivative is equal to 0 and the concavity changes; the second derivative of this function is negative between -1/√3 and 1/√3, and therefore the graph of the function is strictly concave in the interval between the two inflection points; the second derivative of this function is positive for x < -1/√3 and for x > 1/√3, and therefore the graph of the function is strictly convex for x < -1/√3 and for x > 1/√3; lim_{x→+∞}(1/(1+x^2)) = 0, lim_{x→-∞}(1/(1+x^2)) = 0, therefore this function has the x-axis as its horizontal asymptote
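As a cross-check of x = 1/√3 and y = 3/4, the zero of the second derivative on [0,1] can be located numerically. The sketch below uses plain bisection, which is a technique swapped in here for illustration and not part of the derivation above; `bisect_zero` is an illustrative name.

```python
import math

def f2(x):
    """Second derivative of 1/(1+x^2): 2(3x^2 - 1)/(1+x^2)^3."""
    return 2 * (3 * x**2 - 1) / (1 + x**2) ** 3

def bisect_zero(g, lo, hi, steps=60):
    """Bisection; assumes g(lo) and g(hi) have opposite signs."""
    for _ in range(steps):
        mid = (lo + hi) / 2
        if (g(lo) < 0) == (g(mid) < 0):
            lo = mid      # sign change is in the right half
        else:
            hi = mid      # sign change is in the left half
    return (lo + hi) / 2

x_infl = bisect_zero(f2, 0.0, 1.0)     # f2(0) < 0 and f2(1) > 0
y_infl = 1 / (1 + x_infl**2)
print(x_infl, 1 / math.sqrt(3), y_infl)
```

Because the function is even, the second inflection point is the mirror image at -x_infl with the same ordinate.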

A rational function is a ratio of polynomials

When in a rational function f(x) = p(x)/q(x) the polynomial in the numerator has a lower degree than the polynomial in the denominator, then lim_{x→±∞}(f(x)) = 0, therefore the x-axis is a horizontal asymptote

When in a rational function f(x) = p(x)/q(x) the polynomial in the numerator and the polynomial in the denominator have the same degree n, then f(x) = (a_n⋅x^n+...+a_0)/(b_n⋅x^n+...+b_0) with a_n ≠ 0 and b_n ≠ 0; dividing numerator and denominator by x^n, f(x) = (a_n+a_(n-1)/x+...+a_0/x^n)/(b_n+b_(n-1)/x+...+b_0/x^n), so lim_{x→±∞}((a_n+a_(n-1)/x+...+a_0/x^n)/(b_n+b_(n-1)/x+...+b_0/x^n)) = a_n/b_n; y = a_n/b_n is the equation of the horizontal asymptote, and in this case the horizontal asymptote on the right, for x tending to +∞, is equal to the horizontal asymptote on the left, for x tending to -∞
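The two degree rules can be turned into a small helper. This is a sketch under the assumption that polynomials are given as coefficient lists from the highest degree down with a nonzero leading coefficient; `polyval` and `horizontal_asymptote` are illustrative names.

```python
def polyval(coeffs, x):
    """Evaluate a polynomial (coefficients from highest degree down) by Horner's rule."""
    result = 0.0
    for c in coeffs:
        result = result * x + c
    return result

def horizontal_asymptote(p, q):
    """Horizontal asymptote y of p(x)/q(x), or None when deg p > deg q.

    Assumes p[0] and q[0] are the nonzero leading coefficients.
    """
    if len(p) < len(q):
        return 0.0              # lower degree in the numerator: y = 0
    if len(p) == len(q):
        return p[0] / q[0]      # same degree: y = a_n / b_n
    return None                 # higher degree in the numerator: no horizontal asymptote

p = [3.0, -1.0, 2.0]            # 3x^2 - x + 2
q = [2.0, 0.0, 5.0]             # 2x^2 + 5
y = horizontal_asymptote(p, q)  # 1.5
for x in (1e3, 1e6, -1e6):
    print(x, polyval(p, x) / polyval(q, x), y)
```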

In some cases the horizontal asymptote on the left, for x tending to -∞, is equal to the horizontal asymptote on the right, for x tending to +∞, but in other cases the horizontal asymptote on the left, for x tending to -∞, is different from the horizontal asymptote on the right, for x tending to +∞

f(x) = 1/(1+e^(-x)) = 1/(1+1/e^x) = e^x/(e^x+1); 0 < f(x) < 1, the image of the function is contained in the open interval (0,1); lim_{x→+∞}(f(x)) = 1; lim_{x→-∞}(f(x)) = 0; using (1/f(x))' = -f'(x)/(f(x))^2, (1/(1+e^(-x)))' = -(1+e^(-x))'/(1+e^(-x))^2 = -(-e^(-x))/(1+e^(-x))^2 = e^(-x)/(1+e^(-x))^2 = e^(-x)/(e^(-x)+1)^2 = 1/(e^x(1/e^x+1)^2) = 1/(e^x((1+e^x)/e^x)^2) = 1/(e^x((1+e^x)^2/e^(2x))) = 1/((1+e^x)^2/e^x) = e^x/(1+e^x)^2, so f'(x) > 0 and the function is increasing; the graph of the function intersects the y-axis at the point (0,1/2), the x-axis is the horizontal asymptote on the left, y = 1 is the horizontal asymptote on the right; im(f) = (0,1), the image of the function is the open interval (0,1), for the following reasons: continuous functions transform intervals into intervals; the function is defined on the whole set ℝ, which is an interval; the image of the function is therefore an interval contained in the open interval (0,1); the function tends to 1 when x tends to +∞, therefore the supremum of the image of the function is 1; the function tends to 0 when x tends to -∞, therefore the infimum of the image of the function is 0; an interval contained in (0,1) with infimum 0 and supremum 1 is the open interval (0,1) itself
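The limits, the value at 0, and the derivative formula e^x/(1+e^x)^2 can be checked numerically. This is a minimal sketch; the names `sigmoid` and `sigmoid_prime` are illustrative and do not appear in the text.

```python
import math

def sigmoid(x):
    """f(x) = 1/(1+e^(-x)); math.exp(-x) overflows for x below about -709."""
    return 1 / (1 + math.exp(-x))

def sigmoid_prime(x):
    """Closed form of the derivative: e^x/(1+e^x)^2."""
    return math.exp(x) / (1 + math.exp(x)) ** 2

h = 1e-6
for x in (-30.0, -2.0, 0.0, 2.0, 30.0):
    numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)   # central difference
    print(f"x = {x:+5.1f}  f = {sigmoid(x):.12f}  f' = {sigmoid_prime(x):.3e}  numeric f' = {numeric:.3e}")
```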

The graph of the function f(x) = 1/x is an equilateral hyperbola with a branch in the first quadrant and a branch in the third quadrant; lim_{x→0+}(1/x) = +∞; lim_{x→0-}(1/x) = -∞; the line of equation y = 0, that is the x-axis, is the horizontal asymptote; the line of equation x = 0, that is the y-axis, is the vertical asymptote

The graph of the function f(x) = 1/x^2 is contained in the first and second quadrant; lim_{x→0+}(1/x^2) = +∞; lim_{x→0-}(1/x^2) = +∞; the line of equation y = 0, that is the x-axis, is the horizontal asymptote; the line of equation x = 0, that is the y-axis, is the vertical asymptote

When in a rational function f(x) = p(x)/q(x), which is the ratio of two polynomials, the denominator is equal to 0 at a point x_0 but the numerator is different from 0 at that point, then the left and right limits of the function at x_0 are each +∞ or -∞, and the graph of the function has a vertical asymptote passing through this point; q(x_0) = 0, p(x_0) ≠ 0, x = x_0 is a vertical asymptote

Considering any oblique line with equation y = mx+q, if lim_{x→+∞}(f(x)-(mx+q)) = 0 then the line is an oblique asymptote to the right, if lim_{x→-∞}(f(x)-(mx+q)) = 0 then the line is an oblique asymptote to the left

f(x) = √(1+x^2) is an even function, so the behavior for x tending to -∞ mirrors the behavior for x tending to +∞; f(x) = √(1+x^2) ~ √(x^2) = |x|; we claim that lim_{x→+∞}(√(1+x^2)-x) = 0; the limit has the indeterminate form ∞-∞, so we multiply and divide by √(1+x^2)+x; lim_{x→+∞}(√(1+x^2)-x) = lim_{x→+∞}((√(1+x^2)-x)((√(1+x^2)+x)/(√(1+x^2)+x))) = lim_{x→+∞}(((√(1+x^2)-x)(√(1+x^2)+x))/(√(1+x^2)+x)) = lim_{x→+∞}((1+x^2-x^2)/(√(1+x^2)+x)) = lim_{x→+∞}(1/(√(1+x^2)+x)) = 0, because the denominator tends to +∞; therefore y = x, that is the bisector of the first and third quadrant, is an oblique asymptote to the right; by symmetry y = -x, that is the bisector of the second and fourth quadrant, is an oblique asymptote to the left
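The multiplication by the conjugate also matters numerically: the direct difference √(1+x^2)-x loses significant digits for large x in floating point, while the rewritten form 1/(√(1+x^2)+x) does not. The sketch below compares the two; `gap_naive` and `gap_conjugate` are illustrative names.

```python
import math

def gap_naive(x):
    """Direct evaluation of sqrt(1+x^2) - x (prone to cancellation for large x)."""
    return math.sqrt(1 + x**2) - x

def gap_conjugate(x):
    """Same quantity after multiplying by the conjugate: 1/(sqrt(1+x^2) + x)."""
    return 1 / (math.sqrt(1 + x**2) + x)

for x in (1.0, 1e2, 1e4, 1e8):
    print(f"x = {x:.0e}  naive = {gap_naive(x):.6e}  conjugate = {gap_conjugate(x):.6e}")
```

Both columns agree for small x; for large x the conjugate form keeps tracking 1/(2x), which is the rate at which the graph approaches the line y = x.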

If the graph of a function f(x) has an oblique asymptote y = mx+q to the right, then lim_{x→+∞}(f(x)-mx-q) = 0; dividing by x we get lim_{x→+∞}(f(x)/x-mx/x-q/x) = lim_{x→+∞}(f(x)/x-m-q/x) = 0, and since q/x tends to 0, the angular coefficient is m = lim_{x→+∞}(f(x)/x); if m ∈ ℝ then q = lim_{x→+∞}(f(x)-mx); if m = 0 and q ∈ ℝ the asymptote is horizontal; if either limit does not exist or is not finite, then there is no asymptote of this kind; the same reasoning with x tending to -∞ applies to asymptotes to the left
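The two limits m = lim f(x)/x and q = lim (f(x)-mx) can be probed numerically by evaluating the corresponding expressions at large |x|. This is only a heuristic sketch, not a proof of convergence; the helpers `slope_ratio` and `intercept_gap` and the test function √(1+x^2) are chosen for illustration.

```python
import math

def slope_ratio(f, x):
    """f(x)/x, whose limit (if finite) is the angular coefficient m."""
    return f(x) / x

def intercept_gap(f, m, x):
    """f(x) - m*x, whose limit (if finite) is the intercept q."""
    return f(x) - m * x

def f(x):
    return math.sqrt(1 + x**2)

for x in (1e2, 1e4, 1e6):
    # Right side: candidates m = 1, q = 0; left side: candidates m = -1, q = 0
    print(f"x = +{x:.0e}: f/x = {slope_ratio(f, x):+.9f}, f - x = {intercept_gap(f, 1.0, x):.3e}")
    print(f"x = -{x:.0e}: f/x = {slope_ratio(f, -x):+.9f}, f + x = {intercept_gap(f, -1.0, -x):.3e}")
```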

f(x) = e^x; lim_{x→-∞}(e^x) = 0, the x-axis is a horizontal asymptote to the left; the angular coefficient of a possible oblique asymptote to the right would be m = lim_{x→+∞}(f(x)/x) = lim_{x→+∞}(e^x/x) = lim_{x→+∞}(e^x) = +∞ (by L'Hôpital's rule), therefore there is no oblique asymptote; the exponential function f(x) = e^x has a horizontal asymptote on the left which is the x-axis, it has no oblique asymptotes, and it has no vertical asymptotes because the function is defined and continuous on the whole line of real numbers

The graph of the exponential function e^x and the graph of the logarithmic function ln(x) are symmetric with respect to the bisector of the first and third quadrant

f(x) = ln(x); lim_{x→0+}(ln(x)) = -∞, the y-axis is the vertical asymptote, in fact the logarithm function f(x) = ln(x) is defined for x > 0; the angular coefficient of a possible oblique asymptote would be m = lim_{x→+∞}(ln(x)/x) = lim_{x→+∞}(1/x) = 0 (by L'Hôpital's rule), but then q = lim_{x→+∞}(ln(x)-0⋅x) = +∞ is not finite, therefore the logarithm function f(x) = ln(x) has no oblique asymptotes and no horizontal asymptotes; when x tends to +∞ the logarithm function diverges positively, it tends to +∞
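A few sample values illustrate the two facts used above: ln(x)/x shrinks toward 0 while ln(x) itself keeps growing. This is a small numerical sketch; `log_ratio` and `samples` are illustrative names.

```python
import math

def log_ratio(x):
    """ln(x)/x, the candidate angular coefficient for large x."""
    return math.log(x) / x

samples = [10.0**k for k in (1, 2, 4, 8, 16)]
for x in samples:
    print(f"x = {x:.0e}  ln(x)/x = {log_ratio(x):.3e}  ln(x) = {math.log(x):.3f}")
```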

The domain of the exponential function is all ℝ; the image of the logarithm function is all ℝ

ln(e^n) = n, the logarithm assumes every natural value n, at the point x = e^n

The image of the logarithm function is unbounded above; the image of the logarithm function is the domain of the exponential function