Lesson Notes By Weeks and Term v5 - Grade 12

Statistics: regression and correlation – Week 4 focus

Subject: Mathematics

Class: Grade 12

Term: 3rd Term

Week: 4

Theme: General lesson support

Performance objectives

Calculate and interpret the correlation coefficient (r).
Determine the equation of the least-squares regression line (y = a + bx).
Use the regression line to make predictions.
Differentiate between correlation and causation.
Represent data using scatter plots.

Lesson summary

This week covers regression and correlation, two statistical tools for describing and modelling the relationship between two variables. They matter because they let us make predictions from observed data. In South Africa, they can be used to analyze relationships such as the one between household income and access to education, or between fertilizer use and crop yields. These skills support informed decision-making in many fields.

Lesson notes

Correlation: Correlation measures the strength and direction of a linear relationship between two variables. The correlation coefficient, denoted by r, is a number between -1 and +1.

r = +1: Perfect positive correlation. As one variable increases, the other increases proportionally.
r = -1: Perfect negative correlation. As one variable increases, the other decreases proportionally.
r = 0: No linear correlation. The variables are not linearly related.
0 < r < 1: Positive correlation.
-1 < r < 0: Negative correlation.

The closer r is to +1 or -1, the stronger the linear relationship.

Calculating the Correlation Coefficient (r): The formula for the Pearson product-moment correlation coefficient is:

r = Σ[(xᵢ - x̄)(yᵢ - ȳ)] / √[Σ(xᵢ - x̄)² · Σ(yᵢ - ȳ)²]

Where:
xᵢ and yᵢ are the individual data points.
x̄ and ȳ are the means of the x and y values, respectively.
Σ represents the sum.
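As a rough illustration of the formula and of the extreme values of r, the following Python sketch computes r directly from the definition. The function name `pearson_r` and the two small data sets are made up for this illustration and are not part of the lesson data.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient, computed term by term from the formula."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data, chosen so that the points lie exactly on a straight line.
xs = [1, 2, 3, 4]
rising = [3, 5, 7, 9]     # y = 2x + 1  (increasing)
falling = [9, 7, 5, 3]    # y = 11 - 2x (decreasing)

r_rising = pearson_r(xs, rising)
r_falling = pearson_r(xs, falling)
print(r_rising, r_falling)
```

Points that lie exactly on an increasing line give r = +1, and points on a decreasing line give r = -1, whatever the steepness of the line.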

Example 1: A study investigates the relationship between the number of hours students spend studying per week (x) and their final exam scores (y).

The data is as follows:

| Student | Hours Studied (x) | Exam Score (y) |
|---|---|---|
| A | 5 | 60 |
| B | 10 | 75 |
| C | 15 | 85 |
| D | 20 | 90 |
| E | 25 | 95 |

Calculate the means: x̄ = (5+10+15+20+25)/5 = 15; ȳ = (60+75+85+90+95)/5 = 81

Calculate (xᵢ - x̄)(yᵢ - ȳ), (xᵢ - x̄)², and (yᵢ - ȳ)² for each student:

| Student | xᵢ - x̄ | yᵢ - ȳ | (xᵢ - x̄)(yᵢ - ȳ) | (xᵢ - x̄)² | (yᵢ - ȳ)² |
|---|---|---|---|---|---|
| A | -10 | -21 | 210 | 100 | 441 |
| B | -5 | -6 | 30 | 25 | 36 |
| C | 0 | 4 | 0 | 0 | 16 |
| D | 5 | 9 | 45 | 25 | 81 |
| E | 10 | 14 | 140 | 100 | 196 |

Sum the columns: Σ(xᵢ - x̄)(yᵢ - ȳ) = 425; Σ(xᵢ - x̄)² = 250; Σ(yᵢ - ȳ)² = 770

Calculate r: r = 425 / √(250 * 770) = 425 / √192500 ≈ 425 / 438.75 ≈ 0.97

Interpretation: The correlation coefficient is approximately 0.97, indicating a strong positive correlation between hours studied and exam scores.
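The hand calculation in Example 1 can be checked with a short Python sketch. It rebuilds the deviation columns, sums them, and then computes r. The variable names are chosen for this illustration only.

```python
from math import sqrt

hours = [5, 10, 15, 20, 25]      # x values from Example 1
scores = [60, 75, 85, 90, 95]    # y values from Example 1

x_bar = sum(hours) / len(hours)
y_bar = sum(scores) / len(scores)

# One row per student: (x - x̄, y - ȳ, product, (x - x̄)², (y - ȳ)²)
rows = []
for x, y in zip(hours, scores):
    dx, dy = x - x_bar, y - y_bar
    rows.append((dx, dy, dx * dy, dx ** 2, dy ** 2))

# Column totals used in the formula for r
sum_xy = sum(row[2] for row in rows)
sum_xx = sum(row[3] for row in rows)
sum_yy = sum(row[4] for row in rows)

r = sum_xy / sqrt(sum_xx * sum_yy)

for row in rows:
    print(row)
print("sums:", sum_xy, sum_xx, sum_yy)
print("r ≈", round(r, 2))
```

The three sums should match the totals of 425, 250 and 770 found by hand, and r should round to 0.97.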

Regression: Regression analysis involves finding the equation of a line that best fits the data points in a scatter plot. This line is called the least-squares regression line.

The equation of the line is:

y = a + bx

Where:
y is the dependent variable (the variable being predicted).
x is the independent variable (the variable used to make the prediction).
a is the y-intercept (the value of y when x = 0).
b is the slope (the change in y for every one unit change in x).

Calculating the Equation of the Regression Line:

b = Σ[(xᵢ - x̄)(yᵢ - ȳ)] / Σ(xᵢ - x̄)² = Cov(x,y) / Var(x)
a = ȳ - b x̄

Where:
Cov(x,y) represents the covariance of x and y.
Var(x) represents the variance of x.

Example 2: Using the data from Example 1, we already calculated Σ[(xᵢ - x̄)(yᵢ - ȳ)] = 425 and Σ(xᵢ - x̄)² = 250.

Calculate b: b = 425 / 250 = 1.7
Calculate a: a = 81 - (1.7 * 15) = 81 - 25.5 = 55.5

Therefore, the equation of the regression line is: y = 55.5 + 1.7x

Interpretation:
Slope (b = 1.7): For every additional hour of studying, the exam score is predicted to increase by 1.7 points.
Y-intercept (a = 55.5): A student who studies 0 hours is predicted to score 55.5 on the exam. (This might not be realistic in practice but is the mathematical interpretation.)

Making Predictions (Interpolation and Extrapolation): We can use the regression equation to predict the value of y for a given value of x.
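The regression equation used for these predictions comes from the slope and intercept calculation in Example 2, which can be reproduced with a minimal Python sketch. It applies the two formulas for b and a directly to the Example 1 data. The variable names are illustrative.

```python
hours = [5, 10, 15, 20, 25]      # x values from Example 1
scores = [60, 75, 85, 90, 95]    # y values from Example 1

n = len(hours)
x_bar = sum(hours) / n
y_bar = sum(scores) / n

# Numerator and denominator of the slope formula
sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, scores))
sum_xx = sum((x - x_bar) ** 2 for x in hours)

b = sum_xy / sum_xx        # slope
a = y_bar - b * x_bar      # y-intercept

print(f"y = {a:.1f} + {b:.1f}x")
```

Because a is computed from the means, the fitted line always passes through the point (x̄, ȳ), here (15, 81).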

Interpolation: Making predictions within the range of the original x values.

Extrapolation: Making predictions outside the range of the original x values. Extrapolation should be done with caution, as the relationship may not hold true outside the observed range.
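The difference between the two kinds of prediction can be sketched in Python. This assumes the regression line y = 55.5 + 1.7x from Example 2 and the observed range of 5 to 25 hours from Example 1. The function name `predict` and the test inputs of 12 and 40 hours are chosen for illustration.

```python
A, B = 55.5, 1.7          # intercept and slope from Example 2
X_MIN, X_MAX = 5, 25      # smallest and largest x observed in Example 1

def predict(x):
    """Return the predicted y and whether x lies inside the observed range."""
    y_hat = A + B * x
    kind = "interpolation" if X_MIN <= x <= X_MAX else "extrapolation"
    return y_hat, kind

for x in (12, 40):
    y_hat, kind = predict(x)
    print(f"x = {x}: predicted y = {y_hat:.1f} ({kind})")
```

Labelling each prediction this way makes it easy to spot values that rest on extrapolation and therefore need extra caution.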

Example 3: Using the regression equation from Example 2 (y = 55.5 + 1.7x):

Predict the exam score for a student who studies 12 hours (interpolation): y = 55.5 + (1.7 * 12) = 55.5 + 20.4 = 75.9
Predict the exam score for a student who studies 40 hours (extrapolation): y = 55.5 + (1.7 * 40) = 55.5 + 68 = 123.5

While mathematically correct, the prediction of 123.5 is unrealistic as the exam score cannot exceed 100. This highlights the dangers of extrapolation.

Correlation vs. Causation: It is crucial to understand that correlation does not imply causation. Just because two variables are correlated does not mean that one variable causes the other. There may be other factors involved, or the relationship may be coincidental.

Example: There might be a correlation between ice cream sales and crime rates. However, it's unlikely that eating ice cream causes crime. A more likely explanation is that both ice cream sales and crime rates increase during warmer months.

Guided Practice (With Solutions)

Question 1: The following data represents the number of unemployed individuals (in thousands) and the number of job vacancies (in thousands) in South Africa over a period of six months.

| Month | Unemployed (x) | Vacancies (y) |
|---|---|---|
| 1 | 2500 | 50 |
| 2 | 2450 | 55 |
| 3 | 2400 | 60 |
| 4 | 2350 | 65 |
| 5 | 2300 | 70 |
| 6 | 2250 | 75 |

Calculate the correlation coefficient (r) and interpret its meaning.