Scatter Plot
Scatter plots are often described as one of the most useful inventions in statistical graphics. The scatter plot was first presented by the English scientist John Frederick W. Herschel in 1833. Herschel used it to study the orbits of double stars, plotting the position angle of a double star against the year of measurement. The scatter plot helped him understand the underlying relationship between the two measurements. Even though bar charts and line plots are frequently used, the scatter plot remains one of the most widely used charts in science and business, because it is easy for people to look at points on a scale and understand their relationship.
Let us explore this topic to understand more about scatter plots.
1. | What is a Scatter Plot? |
2. | How to Construct a Scatter Plot? |
3. | Types of Scatter Plot |
4. | What Is Scatter Plot Analysis? |
5. | FAQs on Scatter Plot |
What Is a Scatter Plot?
A scatter plot is a means of representing data in a graphical format. A simple scatter plot uses the coordinate axes to plot points based on their values. For example, the following data for the age of a child (in years) and the height of the child (in feet) can be represented as a scatter plot.
Age of the Child | Height |
3 | 2.3 |
4 | 2.7 |
5 | 3.1 |
6 | 3.6 |
7 | 3.8 |
8 | 4 |
9 | 4.3 |
10 | 4.5 |
Scatter Plot Application
Here, let us look at a real-life application represented by a scatter plot.
Example: Days of the week and the sales
How to Construct a Scatter Plot?
There are three simple steps to plot a scatter plot.
- STEP I: Identify the x-axis and y-axis for the scatter plot.
- STEP II: Define the scale for each of the axes.
- STEP III: Plot the points based on their values.
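The three steps above can be sketched in code. The following is a minimal illustration that uses only the Python standard library and draws a rough text-mode plot of the age and height table from earlier in this article. The function name `text_scatter` and the grid size are assumptions made for this sketch, not part of any plotting library.

```python
# Age (years) and height (feet) pairs from the table earlier in this article.
ages = [3, 4, 5, 6, 7, 8, 9, 10]
heights = [2.3, 2.7, 3.1, 3.6, 3.8, 4.0, 4.3, 4.5]


def text_scatter(xs, ys, width=40, height=12):
    """Return a rough text-mode scatter plot as a list of strings.

    `text_scatter`, `width` and `height` are hypothetical names chosen
    for this sketch.
    """
    # STEP I: xs go on the horizontal axis, ys on the vertical axis.
    # STEP II: define the scale of each axis from the range of the data.
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    x_span = (x_max - x_min) or 1  # avoid dividing by zero for constant data
    y_span = (y_max - y_min) or 1

    grid = [[" "] * width for _ in range(height)]

    # STEP III: plot each point in the grid cell its values scale to.
    for x, y in zip(xs, ys):
        col = round((x - x_min) / x_span * (width - 1))
        row = round((y - y_min) / y_span * (height - 1))
        grid[height - 1 - row][col] = "*"  # grid row 0 is the top of the plot

    # Add a simple vertical axis on the left and a horizontal axis below.
    return ["|" + "".join(row) for row in grid] + ["+" + "-" * width]


plot_lines = text_scatter(ages, heights)
print("\n".join(plot_lines))
```

The printed points rise from the bottom-left corner to the top-right corner, which is the pattern expected when height increases with age.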
Types of Scatter Plot
A scatter plot helps find the relationship between two variables. This relationship is referred to as a correlation. Based on the correlation, scatter plots can be classified as follows.
- Scatter Plot for Positive Correlation
- Scatter Plot for Negative Correlation
- Scatter Plot for Null Correlation
Scatter Plot for Positive Correlation
A scatter plot with increasing values of both variables can be said to have a positive correlation. The scatter plot for the relationship between the time spent studying for an examination and the marks scored can be referred to as having a positive correlation.
Scatter Plot for Negative Correlation
A scatter plot in which one variable increases while the other decreases can be said to have a negative correlation. Observe the image below of a scatter plot with negative correlation, depicting the amount of wheat produced against the respective price of wheat.
Scatter Plot for Null Correlation
A scatter plot with no clear increasing or decreasing trend in the values of the variables is said to have no correlation. Here the points are distributed randomly across the graph. For example, the data for the number of birds on a tree at different times of the day does not show any correlation. Observe the below scatter plot showing the number of birds on a tree versus time of the day.
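The three types of correlation can also be checked numerically. The sketch below computes the Pearson correlation coefficient, a common measure of linear correlation that ranges from -1 to 1, for two data sets that appear in this article: the age and height table, and the temperature and humidity table from Example 2. The `classify` helper and its 0.3 cutoff are assumptions made purely for illustration; there is no single standard threshold.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def classify(r, threshold=0.3):
    """Label a coefficient; the 0.3 cutoff is an arbitrary illustrative choice."""
    if r > threshold:
        return "positive"
    if r < -threshold:
        return "negative"
    return "none"


# Age (years) and height (feet) from the table in this article.
ages = [3, 4, 5, 6, 7, 8, 9, 10]
heights = [2.3, 2.7, 3.1, 3.6, 3.8, 4.0, 4.3, 4.5]

# Temperature (degrees Fahrenheit) and humidity (%) from Example 2.
temperatures = [45, 62, 77, 97, 118, 122]
humidities = [60, 48, 40, 30, 20, 18]

r_height = pearson_r(ages, heights)
r_humidity = pearson_r(temperatures, humidities)
print(classify(r_height), classify(r_humidity))
```

A coefficient close to 1 corresponds to a positive correlation, a coefficient close to -1 to a negative correlation, and a coefficient near 0 to no linear correlation.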
What is Scatter Plot Analysis?
Analysis of a scatter plot helps us understand the following aspects of the data.
- The type and strength of correlation among the data points show how the variables in the data are related.
- A line of best fit can be drawn for the given data and used to further predict new data values.
- Data points that lie far away from the overall pattern of the data can be easily identified as outliers.
- The grouping of data points in a scatter plot can be identified as different clusters within the data.
Scatter Plot Examples
Example 1: Laurell had visited a zoo recently and had collected the following data. How can Laurell use a scatter plot to represent this data?
Type of Animal | Number of Animals in the Zoo |
Zebra | 25 |
Lions | 5 |
Monkeys | 50 |
Elephants | 10 |
Ostriches | 20 |
Solution:
The aim is to present the above data in a scatter plot.
- Step 1: Mark points on the x-axis and write the name of an animal beside each marking.
- Step 2: Mark the points 10, 20, 30, 40, 50, 60 on the y-axis to represent the number of animals.
- Step 3: For each animal marked on the x-axis, mark a point above it based on the number given in the table. Refer to the y-axis to measure and mark the points.
Therefore the points representing the number of animals have been plotted on the scatter plot.
Example 2: The meteorological department has collected the following data about the temperature and humidity in their town. Refer to the table given below and indicate the method to find the humidity at a temperature of 60 degrees Fahrenheit.
Temperature (Degrees Fahrenheit) | Humidity (%) |
45 | 60 |
62 | 48 |
77 | 40 |
97 | 30 |
118 | 20 |
122 | 18 |
Solution:
The collected data of the temperature and humidity can be presented in the form of a scatter plot.
Temperature is marked on the x-axis and humidity is on the y-axis.
To calculate the humidity at a temperature of 60 degrees Fahrenheit, we need to first draw a line of best fit.
A line of "Best Fit" is a straight line drawn so that it passes as close as possible to all of these data points.
Now draw a vertical line from the mark of 60 degrees Fahrenheit on the x-axis, so that it cuts the line of "Best Fit".
At the point where this line cuts the line of "Best Fit", the corresponding marking on the y-axis represents the humidity at 60 degrees Fahrenheit.
Therefore, the humidity at a temperature of 60 degrees Fahrenheit is approximately 50%.
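The graphical reading can be cross-checked numerically. The sketch below fits a line of best fit using ordinary least squares, which is one common way to define such a line; the example itself draws the line by eye, so this method is an assumption rather than the procedure used above. The function name `fit_line` is hypothetical.

```python
# Temperature (degrees Fahrenheit) and humidity (%) from the table in Example 2.
temperatures = [45, 62, 77, 97, 118, 122]
humidities = [60, 48, 40, 30, 20, 18]


def fit_line(xs, ys):
    """Least-squares slope and intercept of the line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    # The least-squares line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(temperatures, humidities)

# Read the fitted line at 60 degrees Fahrenheit.
humidity_at_60 = slope * 60 + intercept
print(round(humidity_at_60, 1))
```

The slope is negative, matching the negative correlation between temperature and humidity, and the value of the fitted line at 60 degrees Fahrenheit comes out close to 50%, in agreement with the reading from the graph.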
Example 3: In a school, a teacher has prepared a scatter plot on her computer to show the marks of 8 students and the time spent in preparation for the examination. How can we help the teacher find the outlier?
Solution:
The data in the scatter plot shows a positive correlation: the marks increase with the time spent on preparation. However, the data point for the student who spent 2.5 hours on preparation and secured 40% marks does not follow this trend and can therefore be identified as an outlier.
Therefore, the data point of the student with 40% marks and time of 2.5 hours is the outlier.
FAQs on Scatter Plot
What is Scatter Plot in Data?
In data, a scatter (XY) plot is a chart used to show the relationship between two sets of data. It represents the data as a set of points plotted in two-dimensional or three-dimensional space.
What is Scatter Plot Used For?
Scatter plots are used to observe and display relationships between two numeric variables graphically with the help of dots. The dots in a scatter plot show the values of individual data points.
What are Interpolation and Extrapolation in a Scatter Plot?
Interpolation or extrapolation helps in predicting the values of the new data using scatter plots.
- Interpolation helps to predict the new values for data points, within the range of the given set of data.
- Extrapolation helps to predict the new values for the data points, which are beyond the given set of data.
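The distinction depends only on whether the new x-value falls inside the range of the observed data. A minimal sketch, using the temperature values from Example 2 and a hypothetical helper named `prediction_kind`:

```python
# Temperatures (degrees Fahrenheit) observed in Example 2.
temperatures = [45, 62, 77, 97, 118, 122]


def prediction_kind(x_new, xs):
    """Say whether predicting at x_new interpolates or extrapolates.

    `prediction_kind` is a hypothetical helper written for this sketch.
    """
    if min(xs) <= x_new <= max(xs):
        return "interpolation"  # inside the observed range
    return "extrapolation"  # beyond the observed range


print(prediction_kind(60, temperatures))   # 60 lies between 45 and 122
print(prediction_kind(130, temperatures))  # 130 is above the largest value, 122
```

Predictions made by extrapolation are generally less reliable, because nothing in the data confirms that the trend continues outside the observed range.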
What are the Three Types of Scatter Plot?
The relationship between the different variables in data is referred to as a correlation. Scatter plots help find the correlation within the data. There are three types of correlation:
- Positive Correlation
- Negative Correlation
- No Correlation (None)
When Should You Use Scatter Plot?
You can use a scatter plot when you have at least two variables that can be paired well together. Plotting the variables on a scatter diagram is a systematic way to view the relationship between the variables and see if it's a positive or negative correlation.
How Can You Differentiate Between a Positive and Negative Correlation on a Scatter Plot?
In a positive correlation, both variables increase or decrease together. The line of best fit for data points with a positive correlation has a positive slope. In a negative correlation, one variable increases while the other decreases. The line of best fit for data with a negative correlation has a negative slope.
What is a Linear Scatter Plot?
When the points of a scatter plot fall along a straight line, it is termed a linear scatter plot. In a nonlinear scatter plot, the points follow a curve instead.