Outlier Formula
The extreme values in the data are called outliers. The outlier formula helps us find outliers in a data set. Outside statistics, the word outlier is also used for the best and the brightest people. Malcolm Gladwell's non-fiction book 'Outliers' debuted at number one on the New York Times best-seller list. In it, Gladwell describes outliers as people with exceptional intelligence or large fortunes, who are different from the usual set of people. In this lesson, we shall explore the outlier formula by answering questions such as what an outlier is and how to find outliers using the Tukey method, and by solving examples at the end.
What Is Outlier Formula?
The extreme values in the data are called outliers.
Example: Consider a data set containing 2, 19, 25, 32, 36, 38, 31, 42, 57, 45, and 84.
When these values are placed on a number line, we can observe that the numbers 2 and 84 lie at the extremes and are thus the outliers. The outliers are a part of the group but are far away from the other members of the group. The problem with outliers: Outliers create an imbalance in the data set and hence are generally removed from the data. Also, an outlier sometimes occurs in the data set due to an error.
Consider the data: 70, 73, 77, 71, 7, 73, 72, and 78
Let's calculate the mean to understand how the outlier affects the results.
Here, the data point 7 is an outlier.
Mean (with outlier) = (70 + 73 + 77 + 71 + 7 + 73 + 72 + 78)/8 = 521/8 ≈ 65.1
Mean (without outlier) = (70 + 73 + 77 + 71 + 73 + 72 + 78)/7 = 514/7 ≈ 73.4
We can now observe that a single outlier pulls the mean down from about 73.4 to about 65.1.
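The two means above can be reproduced with a few lines of Python. This is only a minimal sketch: the variable names are illustrative, and the value 7 is treated as the outlier because the text identifies it as one.

```python
from statistics import mean

scores = [70, 73, 77, 71, 7, 73, 72, 78]
suspect = 7  # the value the text identifies as the outlier

# Mean of every value, outlier included.
mean_with = mean(scores)

# Mean after dropping the suspect value.
trimmed = [x for x in scores if x != suspect]
mean_without = mean(trimmed)

print(round(mean_with, 1))     # 65.1
print(round(mean_without, 1))  # 73.4
```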
Before we learn how to find outliers, let's review the quartiles and the interquartile range. The data must first be arranged in ascending order.
- First Quartile Q1: The mid-value of the first half of the data represents the first quartile.
- Second Quartile Q2: The mid-value or the median of the data represents the second quartile.
- Third Quartile Q3: The mid-value of the second half of the data represents the third quartile.
- Interquartile Range IQR: The difference between the third and the first quartile, IQR = Q3 - Q1.
Outlier Formula (Tukey Method)
Tukey's method is a mathematical method to find outliers. As per the Tukey method, the outliers are the points lying above the upper boundary of Q3 + 1.5 IQR or below the lower boundary of Q1 - 1.5 IQR, where IQR = Q3 - Q1 is the interquartile range. These boundaries are referred to as outlier fences.
Upper Fence = Q3 + 1.5 IQR
Lower Fence = Q1 - 1.5 IQR
In a box plot, the data points beyond the upper and the lower fence are referred to as outliers.
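As a rough illustration, the fences can be computed and applied in Python as shown here. The function names tukey_fences and outside_fences and the parameter k are hypothetical. The quartiles Q1 = 70.5 and Q3 = 75 are worked out by hand for the earlier data set 70, 73, 77, 71, 7, 73, 72, 78, taking the mid-values of the halves 7, 70, 71, 72 and 73, 73, 77, 78.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower_fence, upper_fence) for the given quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def outside_fences(data, q1, q3, k=1.5):
    """Return the values of data lying strictly beyond either fence."""
    lower, upper = tukey_fences(q1, q3, k)
    return [x for x in data if x < lower or x > upper]


data = [70, 73, 77, 71, 7, 73, 72, 78]
print(tukey_fences(70.5, 75.0))          # (63.75, 81.75)
print(outside_fences(data, 70.5, 75.0))  # [7]
```

Only the value 7 falls outside the fences, which agrees with the earlier observation that 7 is the outlier in this data.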
How Does Removing the Outlier Affect the Mean?
Removing an outlier changes the value of the mean. Let us understand this with the sample data 10, 11, 14, 15, and 55.
Mean (with the outlier) = (10 + 11 + 14 + 15 + 55)/5 = 105/5 = 21
Mean (without the outlier) = (10 + 11 + 14 + 15)/4 = 50/4 = 12.5
Here, on removing the outlier 55 from the sample data, the mean changes from 21 to 12.5.
Solved Examples Using Outlier Formula
Example 1: Sam has a set of numbers: 4, 8, 12, 16, 20, 22, 24, 28, 32, 36, 40, 44, 48, and 52. Help Sam find the first quartile and the third quartile along with the outlier(s) of this data. Solve this by using the outlier formula.
Solution: The given data is 4, 8, 12, 16, 20, 22, 24, 28, 32, 36, 40, 44, 48, and 52
The data has 14 values, so the median is the average of the 7th and 8th values.
Median = (24 + 28)/2 = 26
The first half of the data is 4, 8, 12, 16, 20, 22, 24 and its mid-value is 16
Q1 = 16
The second half of the data is 28, 32, 36, 40, 44, 48, 52 and its mid-value is 40
Q3 = 40
Interquartile range IQR = Q3 - Q1 = 40 - 16 = 24
1.5 IQR = 1.5 × 24 = 36
Upper Boundary = Q3 +1.5 IQR = 40 + 36 = 76
Lower Boundary = Q1 - 1.5 IQR = 16 - 36 = -20
The outlier boundaries are -20 and 76, and no number lies beyond the upper and lower boundaries.
Answer: The first quartile is 16 and the third quartile is 40. There are no outliers.
Example 2: John has made a note of the scores of his classmates in a drawing assignment as 12, 19, 36, 33, 27, 19, 9, 66, 55, 44, 42, 71, 37, 39, 28, and 25. Help John find the interquartile range and outlier(s) for this set of marks. Solve this by using the outlier formula.
Solution:
The given data is 12, 19, 36, 33, 27, 19, 9, 66, 55, 44, 42, 71, 37, 39, 28, and 25
Arranging the data in an ascending order, we will have: 9, 12, 19, 19, 25, 27, 28, 33, 36, 37, 39, 42, 44, 55, 66, and 71
The data has 16 values, so the median is the average of the 8th and 9th values.
Median = (33 + 36)/2 = 34.5
The first half of the data is 9, 12, 19, 19, 25, 27, 28, 33
Q1 = (19 + 25)/2 = 44/2 = 22
The second half of the data is 36, 37, 39, 42, 44, 55, 66, 71
Q3 = (42 + 44)/2 = 86/2 = 43
Interquartile Range IQR = Q3 - Q1 = 43 - 22 = 21
1.5 IQR = 1.5 × 21 = 31.5
Upper Boundary = Q3 +1.5 IQR = 43 + 31.5 = 74.5
Lower Boundary = Q1 - 1.5 IQR = 22 - 31.5 = -9.5
The outlier boundaries are 74.5 and -9.5, and no number lies beyond the upper and lower boundaries.
Answer: Interquartile Range is 21. There are no outliers.
Example 3: Dan has got the data of runs scored by a batsman as 21, 14, 26, 8, 12, 12, 14, 76, 28, 20, 32, and 38. Can you help Dan find the outlier using the outlier formula?
Solution: The given data is 21, 14, 26, 8, 12, 12, 14, 76, 28, 20, 32, and 38
Arranging this in ascending order, we have: 8, 12, 12, 14, 14, 20, 21, 26, 28, 32, 38, and 76
From observation, the number 76 appears to be the outlier.
Further, let us apply the Tukey rule to confirm the outlier.
The first half of the data is 8, 12, 12, 14, 14, 20
Q1 = (12 + 14)/2 = (26)/2 = 13
The second half of the data is 21, 26, 28, 32, 38, 76
Q3 = (28 + 32)/2 = (60)/2 = 30
Interquartile range IQR = Q3 - Q1 = 30 - 13 = 17
1.5 IQR = 1.5 × 17 = 25.5
Upper Boundary = Q3 +1.5 IQR = 30 + 25.5 = 55.5
Lower Boundary = Q1 - 1.5 IQR = 13 - 25.5 = -12.5
The outlier boundaries are -12.5 and 55.5, and the number 76 lies beyond the upper boundary.
Answer: The outlier is 76.
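The three solved examples can be cross-checked with a short Python sketch. The function name find_outliers is hypothetical, and the sketch assumes the same convention as the solutions: Q1 and Q3 are the mid-values of the lower and upper halves of the sorted data. The quartile functions built into statistical libraries may follow other conventions and return slightly different values.

```python
from statistics import median


def find_outliers(data, k=1.5):
    """Return (Q1, Q3, outliers) using median-of-halves quartiles."""
    s = sorted(data)
    n = len(s)
    q1 = median(s[: n // 2])        # mid-value of the lower half
    q3 = median(s[(n + 1) // 2:])   # mid-value of the upper half
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return q1, q3, [x for x in s if x < lower or x > upper]


example_1 = [4, 8, 12, 16, 20, 22, 24, 28, 32, 36, 40, 44, 48, 52]
example_2 = [12, 19, 36, 33, 27, 19, 9, 66, 55, 44, 42, 71, 37, 39, 28, 25]
example_3 = [21, 14, 26, 8, 12, 12, 14, 76, 28, 20, 32, 38]

print(find_outliers(example_1))  # (16, 40, [])
print(find_outliers(example_2))  # (22.0, 43.0, [])
print(find_outliers(example_3))  # (13.0, 30.0, [76])
```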
FAQs on Outlier Formula
What Is Outlier Formula?
The extreme values in the data are called outliers. Tukey's method is a mathematical method to find outliers. As per the Tukey method, the outliers are the points lying above the upper boundary of Q3 + 1.5 IQR or below the lower boundary of Q1 - 1.5 IQR, where IQR = Q3 - Q1. These boundaries are referred to as outlier fences. Q1, Q2, and Q3 are the first, second, and third quartiles respectively. First Quartile Q1: The mid-value of the first half of the data represents the first quartile. Second Quartile Q2: The mid-value or the median of the data represents the second quartile. Third Quartile Q3: The mid-value of the second half of the data represents the third quartile.
When Should we Remove Outliers?
Errors in data entry or a flawed data collection process can result in an outlier. In such instances, the outlier is removed from the data before the data is analyzed further.
Sometimes, however, the outliers rightly belong to the dataset and should not be removed. An example is the marks scored by students, in which a student scoring 100 marks (full marks) is an outlier that cannot be removed from the dataset.
Can Normal Distribution Have Outliers?
A normal distribution also has outliers. The Z-value helps to identify the outliers.
Z = (x - μ)/σ, where x is a data point, μ is the mean of the data, and σ is the standard deviation of the data.
The data points with Z-values above 3 or below -3 are considered outliers.
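A minimal Python sketch of the Z-value rule is given here. The function name z_score_outliers and the sample readings are hypothetical, and the sketch assumes σ is the population standard deviation of the data (statistics.pstdev). It also assumes the values are not all identical, since σ would then be zero. Note that very small data sets cannot produce a Z-value beyond 3, so the illustration uses 21 values.

```python
from statistics import mean, pstdev


def z_score_outliers(data, threshold=3.0):
    """Return the values whose Z-value lies beyond +/- threshold."""
    mu = mean(data)
    sigma = pstdev(data)  # population standard deviation (an assumption)
    return [x for x in data if abs((x - mu) / sigma) > threshold]


# Hypothetical readings: twenty values close to 50 and one far away.
readings = [48, 49, 50, 51, 52] * 4 + [90]
print(z_score_outliers(readings))  # [90]
```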
What Percent of a Normal Distribution Are Outliers?
About 0.3% of a normal distribution are outliers.
About 68%, 95%, and 99.7% of the data lie within Z-values of 1, 2, and 3 respectively. The data beyond a Z-value of 3 represent the outliers. Since 99.7% of the data is within a Z-value of 3, the remaining 0.3% of the data are the outliers.
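These percentages can be checked numerically. For a normal distribution, the share of data within z standard deviations of the mean equals erf(z/√2), which Python provides as math.erf. The function name share_within is hypothetical.

```python
import math


def share_within(z):
    """Fraction of a normal distribution within z standard deviations of the mean."""
    return math.erf(z / math.sqrt(2))


for z in (1, 2, 3):
    print(z, round(100 * share_within(z), 1))
# prints:
# 1 68.3
# 2 95.4
# 3 99.7

# Share lying beyond a Z-value of 3 in either direction, in percent.
print(round(100 * (1 - share_within(3)), 2))  # 0.27
```

The exact share beyond a Z-value of 3 is about 0.27%, which rounds to the 0.3% quoted above.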