Pearson Correlation Formula
The correlation coefficient measures the strength and direction of the relationship between two variables. The Pearson correlation formula is used to see how closely two sets of data are linearly related. It is also known as the Pearson product-moment correlation coefficient. The value of the Pearson correlation coefficient lies between -1 and +1. If the correlation coefficient is zero, the data show no linear relationship, although a nonlinear relationship may still exist. A value of +1 indicates a perfect positive linear correlation and a value of -1 indicates a perfect negative linear correlation.
What Is Pearson Correlation Formula?
The Pearson correlation coefficient is symbolised by the letter “r”. The Pearson correlation formula for the coefficient r is given by:
\(r=\frac{n\left(\sum x y\right)-\left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n \sum x^{2}-\left(\sum x\right)^{2}\right]\left[n \sum y^{2}-\left(\sum y\right)^{2}\right]}}\)
Where,
\(r=\) Pearson correlation coefficient
\(x=\) Values in the first set of data
\(y=\) Values in the second set of data
\(n=\) Total number of values
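As a minimal sketch, the formula above can be written as a short Python function. The function name `pearson_r` and the toy data are illustrative assumptions, not part of the formula itself; the code uses only the standard library.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient via the raw-sum formula (illustrative helper)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long data sets with at least 2 values")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when either data set is constant")
    return numerator / denominator

# Toy data (made up for illustration)
r_up = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])             # y rises linearly with x
r_down = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])           # y falls linearly with x
r_curve = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])  # y = x**2, not linear
print(r_up, r_down, r_curve)
```

The three calls give 1, -1 and 0 respectively. The last case shows that r = 0 only rules out a linear relationship: here y is completely determined by x, yet the coefficient is zero.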
Let's work through a few examples based on the Pearson correlation formula.
Solved Examples Using Pearson Correlation Formula
Example 1: A survey was conducted in your city. The following sample data gives each person's age and corresponding income. Using the correlation coefficient formula, find out whether income tends to increase with age. (Use \(\frac{1}{\sqrt{181}}\) as 0.074 and \(\frac{1}{\sqrt{209}}\) as 0.07)
Age | 25 | 30 | 36 | 43 |
Income | 30000 | 44000 | 52000 | 70000 |
Solution:
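To simplify the calculation, we divide each income by 1000 and call the result \(y_i\). This does not change r, because the correlation coefficient is unaffected by rescaling a variable by a positive constant.
Age (\(x_i\)) | Income/1000 (\(y_i\)) | \(x_i - \bar{x}\) | \(y_i - \bar{y}\) | \((x_i - \bar{x})^2\) | \((y_i - \bar{y})^2\) | \((x_i - \bar{x})(y_i - \bar{y})\) |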
25 | 30 | -8.5 | -19 | 72.25 | 361 | 161.5 |
30 | 44 | -3.5 | -5 | 12.25 | 25 | 17.5 |
36 | 52 | 2.5 | 3 | 6.25 | 9 | 7.5 |
43 | 70 | 9.5 | 21 | 90.25 | 441 | 199.5 |
\(\bar{x} = 33.5\) | \(\bar{y} = 49\) | | | \(\Sigma (x_i - \bar{x})^2 = 181\) | \(\Sigma (y_i - \bar{y})^2 = 836\) | \(\Sigma(x_i-\bar{x})(y_i - \bar{y}) = 386\) |
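Written in terms of deviations from the means, which is algebraically equivalent to the formula above, the Pearson correlation coefficient for the sample is \(r = \dfrac{\Sigma (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\Sigma (x_i - \bar{x})^2 \, \Sigma (y_i - \bar{y})^2 }}\) = \(\dfrac{386}{\sqrt{181}\sqrt{836}}\) = \(\dfrac{193}{\sqrt{181}\sqrt{209}}\) \(\approx 0.99\)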
Answer: Yes, in this sample income increases with age. The Pearson correlation coefficient between age and income is very close to 1, which indicates a strong positive linear relationship.
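The sums in the solution table can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the names `deviation_sums`, `r_scaled` and `r_raw` are illustrative, and the fourth income is taken as 70000 to match the value 70 used in the solution table. It also checks that dividing income by 1000 leaves r unchanged.

```python
import math

ages = [25, 30, 36, 43]
incomes = [30000, 44000, 52000, 70000]  # fourth value assumed 70000, matching the table's 70

def deviation_sums(xs, ys):
    """Return (S_xx, S_yy, S_xy): sums of squared and cross deviations from the means."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return s_xx, s_yy, s_xy

# Income in thousands, as in the worked solution
s_xx, s_yy, s_xy = deviation_sums(ages, [y / 1000 for y in incomes])
r_scaled = s_xy / math.sqrt(s_xx * s_yy)

# Income in original units
raw_xx, raw_yy, raw_xy = deviation_sums(ages, incomes)
r_raw = raw_xy / math.sqrt(raw_xx * raw_yy)

print(s_xx, s_yy, s_xy)                      # the three sums from the table
print(round(r_scaled, 4), round(r_raw, 4))   # r with and without rescaling
```

Both versions give the same coefficient, about 0.99, so the rescaling is purely a convenience for hand calculation.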
Example 2: The marks obtained by 5 students in science and geometry are given below:
\(\begin{array}{|c|c|c|c|c|c|} \hline \text { Science } & 16 & 15 & 12 & 10 & 8 \\ \hline \text { Geometry } & 11 & 18 & 10 & 20 & 17 \\ \hline \end{array}\)
Calculate the Pearson correlation coefficient.
Solution:
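Construct the following table, where x is the science mark and y is the geometry mark:
\(x\) | \(y\) | \(xy\) | \(x^2\) | \(y^2\) |
16 | 11 | 176 | 256 | 121 |
15 | 18 | 270 | 225 | 324 |
12 | 10 | 120 | 144 | 100 |
10 | 20 | 200 | 100 | 400 |
8 | 17 | 136 | 64 | 289 |
\(\sum x = 61\) | \(\sum y = 76\) | \(\sum xy = 902\) | \(\sum x^2 = 789\) | \(\sum y^2 = 1234\) |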
The formula for Pearson correlation coefficient is:
\(r=\frac{n\left(\sum x y\right)-\left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n \sum x^{2}-\left(\sum x\right)^{2}\right]\left[n \sum y^{2}-\left(\sum y\right)^{2}\right]}}\)
\(r=\frac{5 \times 902-61 \times 76}{\sqrt{\left[5 \times 789-(61)^{2}\right]\left[5 \times 1234-(76)^{2}\right]}}\)
\(r=\frac{-126}{\sqrt{224 \times 394}}\)
\(r \approx -0.424\)
Answer: r = -0.424
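The column sums used in this example can be double-checked with a short script. This is only a sketch: the variable names are illustrative, and the marks are copied from the table in the problem statement.

```python
import math

science = [16, 15, 12, 10, 8]     # x values
geometry = [11, 18, 10, 20, 17]   # y values
n = len(science)

# One row per student: x, y, xy, x^2, y^2
rows = [(x, y, x * y, x * x, y * y) for x, y in zip(science, geometry)]
for row in rows:
    print(*row, sep=" | ")

# Column totals: sum x, sum y, sum xy, sum x^2, sum y^2
sum_x, sum_y, sum_xy, sum_x2, sum_y2 = (sum(col) for col in zip(*rows))
print(sum_x, sum_y, sum_xy, sum_x2, sum_y2, sep=" | ")

# Raw-sum form of the Pearson formula
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator
print(round(r, 3))
```

The totals come out as 61, 76, 902, 789 and 1234, and r rounds to -0.424. A negative value of this size suggests a weak to moderate tendency, in this small sample, for higher science marks to go with lower geometry marks.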