When we have a dataset, we often want to describe it to others without showing them every individual value it contains. This is especially true for very large datasets.
Measures of location/central tendency
These values tell us where the data is centered. They include:
- Mean ($\bar{x}$) - the average of all of the data points. It is affected by outliers (extreme values)
- Median ($Q_2$) - the middle value when the data is sorted. It is largely unaffected by outliers
- Mode - the most frequent value
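The three measures above can be sketched with Python's standard `statistics` module. The dataset here is made up for illustration, and the variable names are assumptions rather than anything from these notes. The second dataset adds a single extreme value to show how the mean shifts while the median barely moves.

```python
from statistics import mean, median, multimode

# Hypothetical sample data, chosen only for illustration.
data = [2, 3, 3, 5, 7]

data_mean = mean(data)        # sum of values divided by the count
data_median = median(data)    # middle value of the sorted data
data_modes = multimode(data)  # list of the most frequent value(s)

# Add one outlier and recompute to compare the two measures.
with_outlier = data + [100]
outlier_mean = mean(with_outlier)
outlier_median = median(with_outlier)

print("mean:", data_mean, "median:", data_median, "mode(s):", data_modes)
print("with outlier -> mean:", outlier_mean, "median:", outlier_median)
```

In this sample the outlier moves the mean from 4 to 20, while the median only moves from 3 to 4. `multimode` is used instead of `mode` because a dataset can have more than one most frequent value.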
Measures of spread
These values tell us how widely the data is spread out. They are important because the measures of location alone do not describe how far apart the data points are: two datasets can share the same mean yet differ greatly in spread. The measures of spread are:
- Range - the difference between the highest and lowest values in a dataset
- Variance ($\sigma^2$) - the sum of the squares of the distances between each data point ($x$) and the mean ($\bar{x}$), divided by the number of data points ($n$), i.e. $\sigma^2 = \frac{\sum (x - \bar{x})^2}{n}$
- Standard deviation ($\sigma$) - the square root of the variance
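As a rough sketch, the measures of spread can be computed directly from their definitions. The dataset and variable names below are assumptions made for illustration. The variance is calculated by hand, dividing by $n$ as in the definition above, and then compared with `statistics.pvariance`, which uses the same divisor.

```python
from math import sqrt
from statistics import pstdev, pvariance

# Hypothetical sample data, chosen only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(data)
x_bar = sum(data) / n  # mean of the data

# Range: highest value minus lowest value.
data_range = max(data) - min(data)

# Variance: sum of squared distances from the mean, divided by n.
variance = sum((x - x_bar) ** 2 for x in data) / n

# Standard deviation: square root of the variance.
std_dev = sqrt(variance)

print("range:", data_range)
print("variance:", variance, "(library:", pvariance(data), ")")
print("standard deviation:", std_dev, "(library:", pstdev(data), ")")
```

For this sample the range is 7, the variance is 4 and the standard deviation is 2. Note that `statistics.variance` and `statistics.stdev` divide by $n - 1$ instead of $n$, so they give slightly larger values than the definition used here.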
Research Questions
- What is meant by the term ‘measure of central tendency’?
- What is a measure of spread?
- What graphs/diagrams can we use to represent data?
- What are the strengths and weaknesses of these graphs?