Covariance Matrix
A covariance matrix is a matrix that contains the covariance between each pair of elements of a random vector. It is also called the variance covariance matrix because the variance of each element appears along the main diagonal of the matrix.
A covariance matrix is always a square matrix. Furthermore, it is positive semi-definite and symmetric. This matrix is very useful in stochastic modeling and principal component analysis. In this article, we will learn about the variance covariance matrix, its formula, examples, and various important properties associated with it.
1. What is Covariance Matrix?
2. Covariance Matrix Formula
3. How To Calculate Covariance Matrix?
4. Properties of Covariance Matrix
5. FAQs on Covariance Matrix
What is Covariance Matrix?
Covariance matrix is a square matrix that displays the variance exhibited by elements of datasets and the covariance between a pair of datasets. Variance is a measure of dispersion and can be defined as the spread of data from the mean of the given dataset. Covariance is calculated between two variables and is used to measure how the two variables vary together.
Covariance Matrix Definition
Variance covariance matrix is defined as a square matrix where the diagonal elements represent the variance and the off-diagonal elements represent the covariance. The covariance between two variables can be positive, negative, or zero. A positive covariance indicates that the two variables tend to increase or decrease together, whereas a negative covariance indicates that one tends to decrease as the other increases. If two variables have no linear relationship, their covariance is zero.
Covariance Matrix Example
Suppose there are two data sets X = {3, 2} and Y = {7, 4}. The sample variance of dataset X = 0.5, and Y = 4.5. The covariance between X and Y is 1.5. The covariance matrix is expressed as follows:
\(\begin{bmatrix} 0.5 & 1.5\\ 1.5& 4.5 \end{bmatrix}\)
A detailed description of how to find the variance covariance matrix will be covered in the upcoming sections.
Covariance Matrix Formula
To determine the covariance matrix, the formulas for variance and covariance are required. Depending upon the type of data available, the variance and covariance can be found for both sample data and population data. These formulas are given below.
Population Variance: var(x) = \(\frac{\sum_{1}^{n}\left ( x_{i} -\mu\right )^{2} }{n}\)
Population Covariance: cov(x, y) = \(\frac{\sum_{1}^{n}\left ( x_{i} -\mu_{x}\right )\left ( y_{i}-\mu_{y} \right ) }{n}\)
Sample Variance: var(x) = \(\frac{\sum_{1}^{n}\left ( x_{i} -\overline{x}\right )^{2} }{n-1}\)
Sample Covariance: cov(x, y) = \(\frac{\sum_{1}^{n}\left ( x_{i} -\overline{x}\right )\left ( y_{i}-\overline{y} \right ) }{n-1}\)
\(\mu\) = mean of population data.
\(\overline{x}\) = mean of sample data.
n = number of observations in the dataset.
\(x_{i}\) = observations in dataset x.
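The population and sample formulas differ only in the divisor. As a minimal sketch in Python (the function names `variance` and `covariance` and the `sample` flag are illustrative choices, not part of any library), they can be written as follows and compared with the standard library's `statistics.variance` and `statistics.pvariance`:

```python
import statistics

def variance(x, sample=True):
    """Variance of a list; divides by n - 1 for a sample, by n for a population."""
    n = len(x)
    mean = sum(x) / n
    squared_deviations = sum((xi - mean) ** 2 for xi in x)
    return squared_deviations / (n - 1 if sample else n)

def covariance(x, y, sample=True):
    """Covariance of two equal-length lists, with the same choice of divisor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    products = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return products / (n - 1 if sample else n)

# The two small datasets X = {3, 2} and Y = {7, 4} from the earlier example.
X = [3, 2]
Y = [7, 4]

print(variance(X), variance(Y), covariance(X, Y))  # sample values: 0.5 4.5 1.5
print(variance(X, sample=False))                   # population value: 0.25

# Standard-library results for comparison.
print(statistics.variance(X), statistics.pvariance(X))
```

With only two observations the sample variance is twice the population variance, which shows how much the choice of divisor can matter for small datasets.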
Using these formulas, the general form of a variance covariance matrix for the variables \(x_{1}, ..., x_{n}\) is given as follows (here n counts the variables, not the observations):
\(\begin{bmatrix} Var(x_{1}) & \cdots & Cov(x_{1},x_{n})\\ \vdots & \ddots & \vdots\\ Cov(x_{n},x_{1}) & \cdots & Var(x_{n}) \end{bmatrix}\)
Covariance Matrix 2 × 2
A 2 × 2 matrix is one which has 2 rows and 2 columns. The formula for a 2 × 2 covariance matrix is given as follows:
\(\begin{bmatrix} var(x) & cov(x,y) \\ cov(x,y)& var(y) \end{bmatrix}\)
Covariance Matrix 3 × 3
If there are 3 datasets, x, y, and z, then the formula to find the 3 × 3 covariance matrix is given below:
\(\begin{bmatrix} var(x) & cov(x,y) &cov(x,z)\\ cov(x,y)& var(y)&cov(y,z)\\ cov(x,z)& cov(y,z)&var(z) \end{bmatrix}\)
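The 2 × 2 and 3 × 3 layouts follow one pattern: entry (i, j) holds the covariance of variable i with variable j, and the covariance of a variable with itself is its variance. The sketch below assumes sample data; the helper names `sample_cov` and `covariance_matrix` and the three small datasets are hypothetical, chosen only so the arithmetic is easy to check by hand.

```python
def sample_cov(a, b):
    """Sample covariance of two equal-length lists (divides by n - 1)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (n - 1)

def covariance_matrix(variables):
    """Entry (i, j) is cov(variables[i], variables[j]); the diagonal holds the variances."""
    return [[sample_cov(a, b) for b in variables] for a in variables]

# Hypothetical datasets, not taken from the article.
x = [1, 2, 3]
y = [2, 4, 6]
z = [3, 2, 1]

for row in covariance_matrix([x, y, z]):
    print(row)
# [1.0, 2.0, -1.0]
# [2.0, 4.0, -2.0]
# [-1.0, -2.0, 1.0]
```

Passing two lists instead of three gives the 2 × 2 case with no change to the code.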
How To Calculate Covariance Matrix?
The number of variables determines the dimension of a variance-covariance matrix. For example, if there are two variables (or datasets), the covariance matrix will be a 2 × 2 matrix. Suppose the math and science scores of 3 students are given as follows:
Student | Math (X) | Science (Y) |
---|---|---|
1 | 92 | 80 |
2 | 60 | 30 |
3 | 100 | 70 |
The steps to calculate the covariance matrix for the sample are given below:
- Step 1: Find the mean of one variable (X). This can be done by dividing the sum of all observations by the number of observations. Thus, (92 + 60 + 100) / 3 = 84
- Step 2: Subtract the mean from all observations; (92 - 84), (60 - 84), (100 - 84)
- Step 3: Take the sum of the squares of the differences obtained in the previous step. (92 - 84)² + (60 - 84)² + (100 - 84)².
- Step 4: Divide this value by one less than the number of observations to get the sample variance of the first variable (X). var(X) = [(92 - 84)² + (60 - 84)² + (100 - 84)²] / (3 - 1) = 448
- Step 5: Repeat steps 1 to 4 to find the variances of all variables. Using these steps, var(Y) = 700.
- Step 6: Choose a pair of variables (X and Y).
- Step 7: Subtract the mean of the first variable (X) from all observations; (92 - 84), (60 - 84), (100 - 84).
- Step 8: Repeat step 7 for the second variable (Y); (80 - 60), (30 - 60), (70 - 60).
- Step 9: Multiply the corresponding observations. (92 - 84)(80 - 60), (60 - 84)(30 - 60), (100 - 84)(70 - 60).
- Step 10: Add these values and divide them by (n - 1) to get the covariance. cov(x, y) = cov(y, x) = [(92 - 84)(80 - 60) + (60 - 84)(30 - 60) + (100 - 84)(70 - 60)] / (3 - 1) = 520.
- Step 11: Repeat steps 6 to 10 for different pairs of variables.
- Step 12: Now using the general formula for covariance matrix arrange these values in matrix form. Thus, the variance covariance matrix for the example is given as \(\begin{bmatrix} 448 & 520\\ 520& 700 \end{bmatrix}\).
The same steps can be followed while calculating the covariance matrix for a population. The only difference is that the population variance and covariance formulas will be applied.
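The steps above can be traced in a few lines of Python. This is a minimal sketch that mirrors the hand calculation for the three students; the variable names are illustrative.

```python
math_scores = [92, 60, 100]    # X
science_scores = [80, 30, 70]  # Y
n = len(math_scores)

# Step 1: mean of each variable.
mean_x = sum(math_scores) / n     # 84.0
mean_y = sum(science_scores) / n  # 60.0

# Steps 2, 7 and 8: subtract the mean from every observation.
dev_x = [x - mean_x for x in math_scores]
dev_y = [y - mean_y for y in science_scores]

# Steps 3 to 5: sum the squared deviations and divide by n - 1.
var_x = sum(d * d for d in dev_x) / (n - 1)
var_y = sum(d * d for d in dev_y) / (n - 1)

# Steps 9 and 10: sum the products of paired deviations and divide by n - 1.
cov_xy = sum(dx * dy for dx, dy in zip(dev_x, dev_y)) / (n - 1)

# Step 12: arrange the values in matrix form.
matrix = [[var_x, cov_xy],
          [cov_xy, var_y]]
print(matrix)  # [[448.0, 520.0], [520.0, 700.0]]
```

Replacing `(n - 1)` with `n` in the three divisions gives the population version of the same matrix.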
Properties of Covariance Matrix
Covariance matrix is a very important tool used by data scientists to understand and analyze multivariate data. Listed below are the various properties of this matrix that make it extremely useful.
- A covariance matrix is always a square matrix. This means that the number of rows of the matrix will be equal to the number of columns.
- The matrix is symmetric. If M is the covariance matrix, then \(M^{T} = M\).
- It is positive semi-definite. If M is the covariance matrix and u is any column vector with transpose \(u^{T}\), then \(u^{T}Mu \geq 0\).
- All eigenvalues of the variance covariance matrix are real and non-negative.
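These properties can be checked numerically on the matrix from the student scores example. The sketch below is an illustration under stated assumptions: the test vectors are arbitrary choices, and the eigenvalues come from the standard closed form for a symmetric 2 × 2 matrix, so it does not extend to larger matrices. Evaluating \(u^{T}Mu\) for a handful of vectors does not prove positive semi-definiteness; the non-negative eigenvalues are what confirm it here.

```python
import math

M = [[448, 520],
     [520, 700]]

# Symmetry: M equals its transpose.
transpose = [list(column) for column in zip(*M)]
is_symmetric = transpose == M

def quadratic_form(u, M):
    """Compute u^T M u for a vector u given as a list."""
    n = len(u)
    return sum(u[i] * M[i][j] * u[j] for i in range(n) for j in range(n))

# A few arbitrary test vectors; each result should be non-negative.
test_vectors = [[1, 0], [0, 1], [1, -1], [-3, 2]]
forms = [quadratic_form(u, M) for u in test_vectors]

# Eigenvalues of a symmetric 2 x 2 matrix [[a, b], [b, d]] in closed form.
a, b, d = M[0][0], M[0][1], M[1][1]
half_trace = (a + d) / 2
radius = math.sqrt(((a - d) / 2) ** 2 + b ** 2)
eigenvalues = (half_trace - radius, half_trace + radius)

print(is_symmetric)  # True
print(forms)         # [448, 700, 108, 592]
print(eigenvalues)   # both values are real and non-negative
```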
Important Notes on Covariance Matrix
- The covariance matrix depicts the variance of datasets and covariance of a pair of datasets in matrix format.
- The diagonal elements represent the variance of a dataset and the off-diagonal terms give the covariance between a pair of datasets.
- The variance covariance matrix is always square, symmetric, and positive semi-definite.
- The general formula to represent a covariance matrix is \(\begin{bmatrix} Var(x_{1}) & \cdots & Cov(x_{1},x_{n})\\ \vdots & \ddots & \vdots\\ Cov(x_{n},x_{1}) & \cdots & Var(x_{n}) \end{bmatrix}\).
Examples on Covariance Matrix
Example 1: Find the population covariance matrix for the following table.
Score | Age |
---|---|
68 | 29 |
60 | 26 |
58 | 30 |
40 | 35 |
Solution: The formula for population variance is \(\frac{\sum_{1}^{n}\left ( x_{i} -\mu\right )^{2} }{n}\).
\(\mu_{x}\) = 56.5, n = 4, where x denotes Score
var(x) = [(68 - 56.5)² + (60 - 56.5)² + (58 - 56.5)² + (40 - 56.5)²] / 4 = 104.75
\(\mu_{y}\) = 30, n = 4, where y denotes Age
var(y) = [(29 - 30)² + (26 - 30)² + (30 - 30)² + (35 - 30)²] / 4 = 10.5
cov(x, y) = \(\frac{\sum_{1}^{4}\left ( x_{i} -\mu_{x}\right )\left ( y_{i}-\mu_{y} \right ) }{4}\)
cov(x, y) = -27
The variance covariance matrix is given as follows:
\(\begin{bmatrix} 104.75 &-27 \\ -27& 10.5 \end{bmatrix}\)
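The result of Example 1 can be reproduced with a short Python check. This sketch uses `statistics.pvariance` from the standard library for the population variances and computes the population covariance directly, dividing by n; the variable names are illustrative.

```python
import statistics

score = [68, 60, 58, 40]  # x
age = [29, 26, 30, 35]    # y
n = len(score)

mean_x = sum(score) / n   # 56.5
mean_y = sum(age) / n     # 30.0

# Population variances (divisor n).
var_x = statistics.pvariance(score)
var_y = statistics.pvariance(age)

# Population covariance: sum of paired deviation products divided by n.
cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(score, age)) / n

matrix = [[var_x, cov_xy],
          [cov_xy, var_y]]
print(matrix)  # [[104.75, -27.0], [-27.0, 10.5]]
```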
Example 2: Find the covariance matrix for the following sample data.
X | Y | Z |
---|---|---|
15 | 12.5 | 50 |
35 | 15.8 | 55 |
20 | 9.3 | 70 |
14 | 20.1 | 65 |
28 | 5.2 | 80 |
Solution: The sample variance formula is \(\frac{\sum_{1}^{n}\left ( x_{i} -\overline{x}\right )^{2} }{n-1}\).
Substituting the values of observations for each variable in this formula we get,
n = 5, \(\overline{x}\) = 22.4, var(X) = 321.2 / (5 - 1) = 80.3
\(\overline{y}\) = 12.58, var(Y) = 132.148 / 4 = 33.037
\(\overline{z}\) = 64, var(Z) = 570 / 4 = 142.5
cov(X, Y) = \(\frac{\sum_{1}^{5}\left ( x_{i} -22.4\right )\left ( y_{i}-12.58\right ) }{5-1}\) = -13.865
cov(X, Z) = \(\frac{\sum_{1}^{5}\left ( x_{i} -22.4\right )\left ( z_{i}-64 \right ) }{5-1}\) = 14.25
cov(Y, Z) = \(\frac{\sum_{1}^{5}\left ( y_{i} -12.58\right )\left ( z_{i}-64 \right ) }{5-1}\) = -39.525
The covariance matrix is
\(\begin{bmatrix} 80.3 & -13.865 &14.25 \\ -13.865 & 33.037 & -39.525\\ 14.25 & -39.525 & 142.5 \end{bmatrix}\)
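With three variables it is convenient to compute every entry at once. The sketch below takes a different but equivalent route to the one used in the solution: it centers each column of the data table by subtracting the column mean, then forms every sum of paired deviation products and divides by n - 1. The names `data`, `centered`, and `cov` are illustrative.

```python
# Rows are observations; columns are the variables X, Y, Z.
data = [
    [15, 12.5, 50],
    [35, 15.8, 55],
    [20, 9.3, 70],
    [14, 20.1, 65],
    [28, 5.2, 80],
]
n = len(data)     # number of observations
p = len(data[0])  # number of variables

# Column means, then the centered data (each value minus its column mean).
means = [sum(row[j] for row in data) / n for j in range(p)]
centered = [[row[j] - means[j] for j in range(p)] for row in data]

# Entry (j, k) sums the deviation products of variables j and k over all rows.
cov = [
    [sum(r[j] * r[k] for r in centered) / (n - 1) for k in range(p)]
    for j in range(p)
]

for row in cov:
    print([round(value, 3) for value in row])
```

Rounded to three decimals, the printed rows should agree with the matrix obtained in the solution.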
Example 3: How will you interpret the covariance matrix given below?
\(\begin{bmatrix} & X & Y & Z\\ X & 500 & 320 & -40\\ Y & 320 & 340 & 0\\ Z & -40 & 0 & 800 \end{bmatrix}\)
Solution: The variance covariance matrix can be interpreted as follows:
1) The diagonal elements 500, 340 and 800 indicate the variance in data sets X, Y and Z respectively. Y shows the lowest variance whereas Z displays the highest variance.
2) The covariance for X and Y is 320. As this is a positive number, it means that when X increases (or decreases), Y also tends to increase (or decrease).
3) The covariance for X and Z is -40. As it is a negative number, it implies that when X increases, Z tends to decrease, and vice versa.
4) The covariance for Y and Z is 0. This means that there is no linear relationship between the two data sets.
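The same reading of the matrix can be automated. This is a small sketch that only inspects the diagonal and the signs of the off-diagonal entries; the `describe` helper and its wording are illustrative, not a standard routine.

```python
labels = ["X", "Y", "Z"]
M = [[500, 320, -40],
     [320, 340, 0],
     [-40, 0, 800]]

def describe(cov):
    """Translate the sign of a covariance into a plain-language statement."""
    if cov > 0:
        return "tend to increase or decrease together"
    if cov < 0:
        return "tend to move in opposite directions"
    return "show no linear relationship"

# Diagonal entries are the variances.
variances = {labels[i]: M[i][i] for i in range(len(labels))}
lowest = min(variances, key=variances.get)
highest = max(variances, key=variances.get)
print("lowest variance:", lowest, "highest variance:", highest)

# Each off-diagonal pair is read once, from the upper triangle.
findings = {}
for i in range(len(labels)):
    for j in range(i + 1, len(labels)):
        findings[(labels[i], labels[j])] = describe(M[i][j])
        print(labels[i], "and", labels[j], findings[(labels[i], labels[j])])
```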
FAQs on Covariance Matrix
What is Covariance Matrix in Math?
A covariance matrix is a square matrix that denotes the variance of variables (or datasets) as well as the covariance between a pair of variables. It is symmetric and positive semi-definite.
What is the Formula for Covariance Matrix?
The general formula for the variance covariance matrix is given as follows:
\(\begin{bmatrix} Var(x_{1}) & \cdots & Cov(x_{1},x_{n})\\ \vdots & \ddots & \vdots\\ Cov(x_{n},x_{1}) & \cdots & Var(x_{n}) \end{bmatrix}\)
What is the 2 × 2 Covariance Matrix Formula?
Suppose X and Y are two data sets then the 2 × 2 covariance matrix formula is given as \(\begin{bmatrix} var(x) & cov(x,y) \\ cov(x,y)& var(y) \end{bmatrix}\)
How to Find Covariance Matrix?
The steps to find the covariance matrix for a sample are as follows:
- Find the sample variance for all datasets using the formula \(\frac{\sum_{1}^{n}\left ( x_{i} -\overline{x}\right )^{2} }{n-1}\).
- Find the sample covariance between all pairs of datasets given by \(\frac{\sum_{1}^{n}\left ( x_{i} -\overline{x}\right )\left ( y_{i}-\overline{y} \right ) }{n-1}\).
- Substitute the values in the matrix \(\begin{bmatrix} Var(x_{1}) & \cdots & Cov(x_{1},x_{n})\\ \vdots & \ddots & \vdots\\ Cov(x_{n},x_{1}) & \cdots & Var(x_{n}) \end{bmatrix}\).
The same steps apply to population data, using the population formulas, which divide by n instead of n - 1.
What are the Properties of Covariance Matrix?
The main properties of a covariance matrix are listed as follows:
- It is a square matrix.
- It is symmetric.
- It is positive semi-definite.
- The eigenvalues are real and non-negative.
Is the Variance Covariance Matrix Symmetric?
Yes, the variance covariance matrix is symmetric. Thus, if the transpose of a covariance matrix is taken, it will result in the original matrix. In other words, \(M^{T} = M\), where M is the covariance matrix.
What are the Applications of Covariance Matrix?
The covariance matrix is widely used in economics, financial engineering, and machine learning. For example, in Monte Carlo simulation, the Cholesky decomposition of a covariance matrix is used to generate correlated random variables, which then serve as inputs to various mathematical models.
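As a rough illustration of the Monte Carlo use, the sketch below factors the covariance matrix from the student scores example as \(M = LL^{T}\) and uses L to turn independent standard normal draws into correlated ones. It assumes M is positive definite (true for this matrix, but not for every covariance matrix). The helper names, the zero means, the seed, and the number of draws are all arbitrary choices made for the illustration.

```python
import math
import random

def cholesky(M):
    """Lower-triangular L with L L^T = M; assumes M is symmetric positive definite."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(M[i][i] - s)
            else:
                L[i][j] = (M[i][j] - s) / L[j][j]
    return L

def correlated_draw(L, rng):
    """Multiply L by a vector of independent standard normal values."""
    z = [rng.gauss(0, 1) for _ in L]
    return [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(len(L))]

M = [[448, 520],
     [520, 700]]
L = cholesky(M)

rng = random.Random(0)  # fixed seed so repeated runs give the same draws
draws = [correlated_draw(L, rng) for _ in range(10000)]

# Sample covariance of the simulated pairs; it should land roughly near 520.
n = len(draws)
mean_0 = sum(d[0] for d in draws) / n
mean_1 = sum(d[1] for d in draws) / n
estimated_cov = sum((d[0] - mean_0) * (d[1] - mean_1) for d in draws) / (n - 1)
print(round(estimated_cov, 1))
```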