Statistics – Week 2 focus
Subject: Mathematics
Class: Grade 11
Term: Term 4
Week: 2
Theme: General lesson support
This page supports the lesson note with a companion video and a short classroom-ready summary.
This week we look more closely at Statistics, focusing on measures of dispersion and interpreting data. Understanding how data spreads around its centre is crucial. This isn't just about memorizing formulas; it's about understanding variability, which is everywhere. Imagine analyzing rainfall patterns in the Western Cape to predict drought, studying the spread of COVID-19 cases to inform public health strategies, or analyzing income distribution to understand inequality in South Africa. These examples show why dispersion matters for informed decision-making: it helps us move beyond simple averages and get a more complete picture.
This week's focus is on Measures of Dispersion. These measures tell us how spread out or varied the data is. A small dispersion indicates that the data points are clustered closely around the mean, while a large dispersion indicates that the data is more spread out.
Range: The range is the simplest measure of dispersion. It is calculated by subtracting the smallest value from the largest value in the dataset.
Range = Maximum value - Minimum value
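As a quick illustration of the range formula, here is a minimal Python sketch. It uses the Grade 11 class ages from the example that follows; the variable names `ages` and `data_range` are illustrative choices, not part of the lesson.

```python
# Ages of learners in a Grade 11 class (taken from the example in this lesson).
ages = [16, 17, 15, 18, 17, 16, 17]

# Range = Maximum value - Minimum value
data_range = max(ages) - min(ages)

print("Maximum:", max(ages))
print("Minimum:", min(ages))
print("Range:", data_range)
```

Only the largest and smallest values enter the calculation, which is why a single extreme value can change the range so much.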
Example: Consider the ages of learners in a Grade 11 class: 16, 17, 15, 18, 17, 16, 17. The range is 18 - 15 = 3. A range of 3 years tells us the age spread in the class.
Why? The range is quick to calculate, but it only considers two values and is heavily influenced by outliers.
Interquartile Range (IQR) and Semi-Interquartile Range: Quartiles divide a dataset into four equal parts.
Q1 (First Quartile): The value below which 25% of the data falls.
Q2 (Second Quartile): The median (50% of the data falls below it).
Q3 (Third Quartile): The value below which 75% of the data falls.
The Interquartile Range (IQR) is the difference between the third quartile (Q3) and the first quartile (Q1).
IQR = Q3 - Q1
The Semi-Interquartile Range is half of the interquartile range.
Semi-IQR = IQR / 2 = (Q3 - Q1) / 2
Example: Consider the following data set representing test scores: 40, 50, 55, 60, 65, 70, 75, 80, 85, 90.
Order the data: 40, 50, 55, 60, 65, 70, 75, 80, 85, 90 (already in ascending order).
Q2 (Median): (65 + 70) / 2 = 67.5
Q1: The median of the lower half (40, 50, 55, 60, 65) is 55.
Q3: The median of the upper half (70, 75, 80, 85, 90) is 80.
IQR = 80 - 55 = 25
Semi-IQR = 25 / 2 = 12.5
Why? The IQR is less sensitive to outliers than the range because it focuses on the middle 50% of the data. The semi-IQR is half of that spread, which is the average distance of Q1 and Q3 from the median.
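The quartile steps in this example can be sketched in Python. The helper `quartiles` below is a hypothetical function written for this note; it follows the median-of-halves method used above. For datasets with an odd number of values it leaves the middle value out of both halves, which is an assumed convention that the lesson does not state. Library routines such as `statistics.quantiles` use interpolation and may return slightly different quartiles.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # Assumption: for an odd count, the middle value belongs to neither half.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)

scores = [40, 50, 55, 60, 65, 70, 75, 80, 85, 90]
q1, q2, q3 = quartiles(scores)
iqr = q3 - q1          # IQR = Q3 - Q1
semi_iqr = iqr / 2     # Semi-IQR = IQR / 2

print("Q1:", q1, "Q2:", q2, "Q3:", q3)
print("IQR:", iqr, "Semi-IQR:", semi_iqr)
```

For the test scores this reproduces Q1 = 55, Q2 = 67.5, Q3 = 80, IQR = 25 and Semi-IQR = 12.5.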
Variance and Standard Deviation: Variance and standard deviation are the most common and useful measures of dispersion.
Variance: The average of the squared differences from the mean.
Population Variance (σ²): σ² = Σ(xᵢ - μ)² / N, where xᵢ is each data point, μ is the population mean, and N is the population size.
Sample Variance (s²): s² = Σ(xᵢ - x̄)² / (n - 1), where xᵢ is each data point, x̄ is the sample mean, and n is the sample size. Dividing by (n - 1) instead of n gives an unbiased estimate of the population variance when using a sample.
Standard Deviation: The square root of the variance. It is a measure of how much the data deviates from the mean.
Population Standard Deviation (σ): σ = √σ²
Sample Standard Deviation (s): s = √s²
Example: Consider the following sample data representing the number of hours learners spend studying per week: 5, 7, 9, 11, 13.
Calculate the sample mean (x̄): (5 + 7 + 9 + 11 + 13) / 5 = 9
Calculate the squared differences from the mean: (5 - 9)² = 16, (7 - 9)² = 4, (9 - 9)² = 0, (11 - 9)² = 4, (13 - 9)² = 16
Calculate the sum of the squared differences: 16 + 4 + 0 + 4 + 16 = 40
Calculate the sample variance (s²): 40 / (5 - 1) = 40 / 4 = 10
Calculate the sample standard deviation (s): √10 ≈ 3.16
Interpretation: The standard deviation of approximately 3.16 hours indicates that learners' study hours typically differ from the mean (9 hours) by roughly 3.16 hours.
Why? Standard deviation is a powerful measure because it considers all data points and gives a clear indication of the data's spread. A lower standard deviation means the data is more concentrated around the mean, while a higher standard deviation means the data is more spread out.
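The same calculation can be checked step by step in Python. This is a minimal sketch using the study-hours sample from the example; the variable names are illustrative. The last lines compare the hand calculation with the standard library's `statistics.variance`, `statistics.stdev` and `statistics.pvariance`, and show how the population formula (dividing by N) would give a smaller value for the same numbers.

```python
from math import sqrt, isclose
from statistics import mean, variance, stdev, pvariance

hours = [5, 7, 9, 11, 13]

# Step 1: sample mean
x_bar = mean(hours)

# Step 2: squared differences from the mean
squared_diffs = [(x - x_bar) ** 2 for x in hours]

# Step 3: sum of squared differences, divided by (n - 1) for a sample
sample_var = sum(squared_diffs) / (len(hours) - 1)

# Step 4: the standard deviation is the square root of the variance
sample_sd = sqrt(sample_var)

print("Mean:", x_bar)
print("Squared differences:", squared_diffs)
print("Sample variance:", sample_var)
print("Sample standard deviation:", round(sample_sd, 2))

# Cross-check against the standard library (sample formulas)
print(isclose(sample_var, variance(hours)), isclose(sample_sd, stdev(hours)))

# Population variance divides by N instead of (n - 1): 40 / 5
print("Population variance:", pvariance(hours))
```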
Outliers: Outliers are data points that are significantly different from other data points in a dataset. They can skew the results of statistical analysis. One common way to identify outliers is by using the IQR:
Lower Bound: Q1 - 1.5 × IQR
Upper Bound: Q3 + 1.5 × IQR
Data points below the lower bound or above the upper bound are considered outliers.
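The bounds rule can be sketched in Python as follows. The functions `iqr_fences` and `find_outliers` are hypothetical helpers written for this note. The quartiles Q1 = 55 and Q3 = 80 come from the test scores example earlier in the lesson, and the extra score of 10 is an invented value used only to show what an outlier looks like.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return (lower bound, upper bound) for the k x IQR outlier rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def find_outliers(values, q1, q3, k=1.5):
    """Return the values that fall outside the bounds."""
    lower, upper = iqr_fences(q1, q3, k)
    return [v for v in values if v < lower or v > upper]

scores = [40, 50, 55, 60, 65, 70, 75, 80, 85, 90]
q1, q3 = 55, 80  # quartiles from the earlier test scores example

lower, upper = iqr_fences(q1, q3)
print("Lower bound:", lower)
print("Upper bound:", upper)
print("Outliers:", find_outliers(scores, q1, q3))

# Hypothetical extra score of 10, checked against the same bounds
# (the quartiles are not recomputed here).
print("Outliers with an extra score of 10:", find_outliers(scores + [10], q1, q3))
```

Passing the quartiles in as arguments keeps the sketch independent of any particular quartile method, since different methods can give slightly different values of Q1 and Q3.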
Example: Using the test scores example from before (40, 50, 55, 60, 65, 70, 75, 80, 85, 90) where Q1 = 55, Q3 = 80 and IQR = 25:
Lower Bound: 55 - 1.5 × 25 = 55 - 37.5 = 17.5
Upper Bound: 80 + 1.5 × 25 = 80 + 37.5 = 117.5
The data point 40 is not an outlier by this measure because it is between the two bounds. If we had a score of, say, 10, it would be an outlier.
Why? Identifying outliers is crucial to ensure the validity of your analysis. You should investigate outliers to understand why they are different and decide whether to keep them in the dataset, correct them, or remove them. Removing them should be done with careful justification.
Guided Practice (With Solutions)
Question 1: The following data represents the number of learners absent from school due to illness in a week: 2, 4, 3, 5, 1. Calculate the range.
Solution:
Identify the maximum and minimum values: Maximum = 5, Minimum = 1
Calculate the range: Range = Maximum - Minimum = 5 - 1 = 4
Answer: The range is 4. This simple calculation tells us the spread of absences is 4 learners.
Question 2: The following are the ages of 10 students: 15, 16, 15, 17, 18, 16, 15, 17, 16, 17.