Data handling: summarising and interpreting data – Week 3 focus
Subject: Mathematical Literacy
Class: Grade 11
Term: Term 4
Week: 3
Theme: General lesson support
Data handling is a crucial skill in the 21st century, enabling us to make sense of the vast amounts of information we encounter daily. In South Africa, understanding and interpreting data is essential for informed decision-making in various contexts, from understanding election results and evaluating government policies to managing personal finances and making informed consumer choices. This week, we will focus on summarising and interpreting data effectively, going beyond simply calculating values to understanding what the data means in context.
Measures of Central Tendency: These values describe the "typical" or "average" value in a dataset.
Mean (Average): The sum of all values divided by the number of values.
Formula: Mean = (Sum of all values) / (Number of values)
Example: The ages of 5 students are 16, 17, 16, 18, and 17.
Mean = (16 + 17 + 16 + 18 + 17) / 5 = 84 / 5 = 16.8 years
Why it matters: The mean is sensitive to outliers (extreme values).
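The mean calculation above can be checked with a few lines of Python. This is a minimal sketch for illustration, not part of the lesson note itself; the variable names are assumptions chosen for readability.

```python
# Ages of the 5 students from the example above.
ages = [16, 17, 16, 18, 17]

# Mean = (sum of all values) / (number of values)
mean_age = sum(ages) / len(ages)

print(mean_age)  # 16.8
```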
Median (Middle Value): The middle value when the data is arranged in ascending order. If there are an even number of values, the median is the average of the two middle values.
Example 1 (Odd number of values): The salaries (in Rand) of 7 workers are 5000, 6000, 7000, 8000, 9000, 10000, 11000.
Median = 8000 (the middle value)
Example 2 (Even number of values): The ages of 6 siblings are 5, 7, 9, 11, 13, 15.
Median = (9 + 11) / 2 = 10 years
Why it matters: The median is resistant to outliers. It provides a better measure of the "centre" when outliers are present. For instance, consider the salaries of employees at a small business: R5000, R6000, R7000, R8000, and R100 000 (the CEO's salary). The mean (R25 200) is greatly inflated by the CEO's salary, while the median (R7000) gives a more representative picture of the typical employee's salary.
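The two median examples and the small-business comparison can be reproduced as follows. This is a sketch that assumes Python's standard `statistics` module; the variable names are illustrative only.

```python
from statistics import median

# Example 1: odd number of values, so the median is the single middle value.
salaries = [5000, 6000, 7000, 8000, 9000, 10000, 11000]
print(median(salaries))  # 8000

# Example 2: even number of values, so the median is the
# average of the two middle values (9 and 11).
sibling_ages = [5, 7, 9, 11, 13, 15]
print(median(sibling_ages))  # 10.0

# Small-business example: one very large salary pulls the mean
# upwards but leaves the median unchanged.
business_salaries = [5000, 6000, 7000, 8000, 100000]
business_mean = sum(business_salaries) / len(business_salaries)
print(business_mean)              # 25200.0
print(median(business_salaries))  # 7000
```

`median` sorts the data internally, so the values do not need to be arranged in ascending order first.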
Mode (Most Frequent Value): The value that appears most often in the dataset. A dataset can have no mode (if all values appear only once), one mode (unimodal), or multiple modes (bimodal, trimodal, etc.).
Example: The shoe sizes of 10 learners are 6, 7, 7, 8, 8, 8, 9, 9, 10, 10.
Mode = 8 (appears 3 times)
Why it matters: The mode is useful for categorical data and can indicate the most popular choice or category. For example, if we are analysing the colours of cars sold in a dealership, the mode would tell us the most popular colour.
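One way to find the mode by counting is sketched below. The helper name `modes` is hypothetical. It returns an empty list when every value appears only once, to match the "no mode" case described above, and it returns more than one value for bimodal or multimodal data.

```python
from collections import Counter

def modes(values):
    """Return a sorted list of the most frequent value(s), or [] if there is no mode."""
    counts = Counter(values)
    if not counts:
        return []  # empty dataset
    top = max(counts.values())
    if top == 1:
        return []  # every value appears once, so there is no mode
    return sorted(v for v, c in counts.items() if c == top)

# Shoe sizes of the 10 learners from the example above.
shoe_sizes = [6, 7, 7, 8, 8, 8, 9, 9, 10, 10]
print(modes(shoe_sizes))  # [8]
```

Because the function only counts how often each value occurs, it should also work for categorical data such as colour names, as long as the values in the list can be sorted.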
Measures of Spread (Variability): These values describe how spread out or dispersed the data is.
Range: The difference between the highest and lowest values in the dataset.
Formula: Range = Highest Value - Lowest Value
Example: The marks of 10 students in a test are 50, 60, 70, 75, 80, 85, 90, 92, 95, 100.
Range = 100 - 50 = 50
Why it matters: The range is a simple measure of spread, but it's very sensitive to outliers.
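The range formula translates directly into code. The snippet below is a minimal sketch using the test marks from the example; the variable names are illustrative.

```python
# Marks of the 10 students from the example above.
marks = [50, 60, 70, 75, 80, 85, 90, 92, 95, 100]

# Range = highest value - lowest value
data_range = max(marks) - min(marks)

print(data_range)  # 50
```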
Interquartile Range (IQR): The difference between the upper quartile (Q3) and the lower quartile (Q1).
Quartiles: Divide the data into four equal parts.
Q1 (Lower Quartile): The median of the lower half of the data. 25% of the data falls below Q1.
Q2 (Median): The median of the entire dataset. 50% of the data falls below Q2.
Q3 (Upper Quartile): The median of the upper half of the data. 75% of the data falls below Q3.
Formula: IQR = Q3 - Q1
Example: Consider the following data: 12, 15, 18, 20, 22, 25, 28, 30, 32, 35, 38.
First, find the median (Q2): 25
Lower half: 12, 15, 18, 20, 22. Q1 = 18
Upper half: 28, 30, 32, 35, 38. Q3 = 32
IQR = 32 - 18 = 14
Why it matters: The IQR is a robust measure of spread, meaning it is not heavily influenced by outliers. It represents the spread of the middle 50% of the data.
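The quartile method used in this example (take the median, then take the median of each half) can be sketched as below. The function name `quartiles` is hypothetical. The sketch assumes that, when the number of values is odd, the overall median is left out of both halves, as in the worked example. Spreadsheet and library functions may use slightly different quartile conventions, so their results can differ from this method on some datasets.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # For an odd number of values, skip the middle value itself.
    upper = data[half + 1:] if n % 2 else data[half:]
    return median(lower), median(data), median(upper)

data = [12, 15, 18, 20, 22, 25, 28, 30, 32, 35, 38]
q1, q2, q3 = quartiles(data)
iqr = q3 - q1

print(q1, q2, q3)  # 18 25 32
print(iqr)         # 14
```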
Percentiles: Divide the data into 100 equal parts. The nth percentile is the value below which n% of the data falls.
Example: A student scores in the 80th percentile on a test. This means that the student scored higher than 80% of the students who took the test.
Why it matters: Percentiles are useful for comparing individual values to the rest of the dataset. They are commonly used in education, healthcare, and other fields.
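To make the percentile idea concrete, the sketch below counts the percentage of values that fall below a given score, using the ten test marks from the range example. The helper name `percent_below` is hypothetical, and other percentile conventions exist, so treat this as an illustration of the definition rather than a standard formula.

```python
def percent_below(values, score):
    """Return the percentage of values that are strictly below the given score."""
    below = sum(1 for v in values if v < score)
    return 100 * below / len(values)

# Test marks of 10 students (from the range example).
marks = [50, 60, 70, 75, 80, 85, 90, 92, 95, 100]

# 8 of the 10 marks are below 95, so a mark of 95 sits at the 80th percentile.
print(percent_below(marks, 95))  # 80.0
```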
Outliers: Outliers are extreme values that are significantly different from the other values in the dataset. They can have a significant impact on the mean and range, but less impact on the median and IQR. When summarising data, it's important to identify and consider the impact of outliers. Sometimes outliers are genuine data points, and sometimes they are the result of errors.
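The effect of a single outlier can be seen by summarising the small-business salaries from the median example with and without the CEO's salary. This is a sketch only; the helper name `summarise` is hypothetical.

```python
from statistics import median

def summarise(values):
    """Return (mean, median, range) for a list of numbers."""
    mean_value = sum(values) / len(values)
    return mean_value, median(values), max(values) - min(values)

salaries = [5000, 6000, 7000, 8000, 100000]  # includes the CEO's salary
without_outlier = [5000, 6000, 7000, 8000]    # CEO's salary left out

print(summarise(salaries))         # (25200.0, 7000, 95000)
print(summarise(without_outlier))  # (6500.0, 6500.0, 3000)
```

Removing the outlier changes the mean from R25 200 to R6500 and the range from R95 000 to R3000, while the median only moves from R7000 to R6500.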
Data Interpretation: It's crucial to not just calculate these measures, but to interpret what they mean in the context of the data.
For example: "The average income in this community is R5000 per month, but the median income is R3000 per month. This suggests that there are some high earners who are skewing the average upwards, and that the majority of people in the community earn closer to R3000 per month."
Guided Practice (With Solutions)
Question 1: The following data represents the number of learners absent from a school each day for two weeks: 2, 3, 1, 0, 4, 2, 3, 1, 2, 5. Calculate the mean, median, and mode of the number of absences.
Solution:
Mean: (2 + 3 + 1 + 0 + 4 + 2 + 3 + 1 + 2 + 5) / 10 = 23 / 10 = 2.3 absences.
Commentary: The average number of absences per day is 2.3.
Median: First, arrange the data in ascending order: 0, 1, 1, 2, 2, 2, 3, 3, 4, 5. Since there are 10 values (an even number), the median is the average of the 5th and 6th values: (2 + 2) / 2 = 2 absences.
Commentary: On at least half of the days there were 2 or fewer absences.
Mode: The number 2 appears most frequently (3 times).
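The answers to Question 1 can be checked with Python's standard `statistics` module. This is a sketch for checking the working, with illustrative variable names.

```python
from statistics import median, mode

# Number of learners absent on each of the 10 school days.
absences = [2, 3, 1, 0, 4, 2, 3, 1, 2, 5]

mean_absences = sum(absences) / len(absences)
median_absences = median(absences)
mode_absences = mode(absences)

print(mean_absences)    # 2.3
print(median_absences)  # 2.0
print(mode_absences)    # 2
```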