lm3: Simple Linear Regression (Cloze with Theory, Application, Essay, and File Upload)

Exercise template about simple linear regression, based on a randomly generated CSV file, that combines theory and applied questions with an interpretation essay and a code upload.

Name:

lm3

Type:

cloze

Related:

lm, lm2, gaussmarkov

Preview:

Theory: Consider a linear regression of y on x. It is usually estimated with which estimation technique (three-letter abbreviation)?

This estimator yields the best linear unbiased estimator (BLUE) under the assumptions of the Gauss-Markov theorem. Which of the following properties are required for the errors of the linear regression model under these assumptions?

Application: Using the data provided in linreg.csv, estimate a linear regression of y on x. What are the estimated parameters?

Intercept:

Slope:

In terms of significance at the 5% level:

Interpretation: Consider various diagnostic plots for the fitted linear regression model. Do you think the assumptions of the Gauss-Markov theorem are fulfilled? What are the consequences?

Code: Please upload your code script that reads the data, fits the regression model, extracts the quantities of interest, and generates the diagnostic plots.

Theory: Linear regression models are typically estimated by ordinary least squares (OLS). The Gauss-Markov theorem establishes certain optimality properties: namely, if the errors have expectation zero, constant variance (homoscedasticity), and no autocorrelation, and if the regressors are exogenous and not linearly dependent, then the OLS estimator is the best linear unbiased estimator (BLUE).
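To make the OLS estimator concrete, the following minimal sketch computes intercept and slope in closed form for a simple regression and compares them with `lm()`. The data are simulated as a stand-in for linreg.csv; the sample size, true slope, and error standard deviation are assumptions chosen for illustration only.

```r
## simulated stand-in for linreg.csv (assumption: 100 observations of x and y)
set.seed(1)
n <- 100
x <- runif(n, -1, 1)
y <- 0.5 * x + rnorm(n, sd = 0.25)

## OLS in closed form for a single regressor
slope <- cov(x, y) / var(x)
intercept <- mean(y) - slope * mean(x)

## the same estimates from lm()
m <- lm(y ~ x)
rbind(closed_form = c(intercept, slope), lm = coef(m))
```

Both rows should agree up to floating-point error, because `lm()` solves the same least-squares problem.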

Application: The estimated coefficients along with their significance are reported in the summary of the fitted regression model, which shows that x and y are not significantly correlated (at the 5% level).


Call:
lm(formula = y ~ x, data = d)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.55258 -0.15907 -0.02757  0.15782  0.74504 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.007988   0.024256  -0.329    0.743
x           -0.031263   0.045420  -0.688    0.493

Residual standard error: 0.2425 on 98 degrees of freedom
Multiple R-squared:  0.004811,  Adjusted R-squared:  -0.005344 
F-statistic: 0.4738 on 1 and 98 DF,  p-value: 0.4929
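Rather than reading the numbers off the printed summary, the quantities of interest can also be extracted programmatically. The sketch below uses simulated data as a stand-in for linreg.csv (an assumption, so the numbers differ from the output above), and the wording of the "increases" answer is likewise assumed.

```r
## simulated stand-in for linreg.csv (assumption, not the original data)
set.seed(2)
d <- data.frame(x = runif(100, -1, 1))
d$y <- -0.8 * d$x + rnorm(100, sd = 0.25)

m <- lm(y ~ x, data = d)

## coefficient table with columns Estimate, Std. Error, t value, Pr(>|t|)
ct <- coef(summary(m))
intercept <- ct["(Intercept)", "Estimate"]
slope <- ct["x", "Estimate"]
pval <- ct["x", "Pr(>|t|)"]

## classify the slope at the 5% level
verdict <- if (pval >= 0.05) {
  "x and y are not significantly correlated"
} else if (slope > 0) {
  "y increases significantly with x"
} else {
  "y decreases significantly with x"
}
verdict
```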

Interpretation: The visualization of the data along with the diagnostic plots suggests that the assumptions of the Gauss-Markov theorem are reasonably well fulfilled.
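As a complement to the visual assessment, the sketch below arranges the default diagnostic plots in one layout and adds an informal numeric check for non-constant variance: an auxiliary regression of the squared residuals on x, in the spirit of a studentized Breusch-Pagan test. This check is not part of the exercise itself, and the data are simulated as a stand-in for linreg.csv (an assumption).

```r
## simulated stand-in for linreg.csv (assumption, not the original data)
set.seed(3)
d <- data.frame(x = runif(100, -1, 1))
d$y <- 0.3 * d$x + rnorm(100, sd = 0.25)
m <- lm(y ~ x, data = d)

## the four default diagnostic plots in a 2 x 2 layout
op <- par(mfrow = c(2, 2))
plot(m)
par(op)

## informal check for heteroscedasticity:
## regress squared residuals on x and use n * R^2 as a chi-squared statistic (1 df)
aux <- lm(residuals(m)^2 ~ x, data = d)
stat <- nobs(aux) * summary(aux)$r.squared
pval <- pchisq(stat, df = 1, lower.tail = FALSE)
c(statistic = stat, p.value = pval)
```

A large p-value here is compatible with constant error variance; it does not prove it, so the plots remain the primary tool.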

Code: The analysis can be replicated in R using the following code.

## data
d <- read.csv("linreg.csv")
## regression
m <- lm(y ~ x, data = d)
summary(m)
## visualization
plot(y ~ x, data = d)
abline(m)
## diagnostic plots
plot(m)

Theory: Consider a linear regression of y on x. It is usually estimated with which estimation technique (three-letter abbreviation)?

Application: Using the data provided in linreg.csv, estimate a linear regression of y on x. What are the estimated parameters?

Intercept:

Slope:

In terms of significance at the 5% level:

Interpretation: Consider various diagnostic plots for the fitted linear regression model. Do you think the assumptions of the Gauss-Markov theorem are fulfilled? What are the consequences?

Code: Please upload your code script that reads the data, fits the regression model, extracts the quantities of interest, and generates the diagnostic plots.


Call:
lm(formula = y ~ x, data = d)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.48440 -0.14690  0.00744  0.14566  0.60770 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.01374    0.02329   0.590    0.556
x            0.01099    0.03948   0.278    0.781

Residual standard error: 0.2317 on 98 degrees of freedom
Multiple R-squared:  0.0007899, Adjusted R-squared:  -0.009406 
F-statistic: 0.07748 on 1 and 98 DF,  p-value: 0.7813

Interpretation: The visualization of the data along with the diagnostic plots suggests that the assumptions of the Gauss-Markov theorem are reasonably well fulfilled.

Code: The analysis can be replicated in R using the following code.

## data
d <- read.csv("linreg.csv")
## regression
m <- lm(y ~ x, data = d)
summary(m)
## visualization
plot(y ~ x, data = d)
abline(m)
## diagnostic plots
plot(m)

Theory: Consider a linear regression of y on x. It is usually estimated with which estimation technique (three-letter abbreviation)?

Application: Using the data provided in linreg.csv, estimate a linear regression of y on x. What are the estimated parameters?

Intercept:

Slope:

In terms of significance at the 5% level:

Interpretation: Consider various diagnostic plots for the fitted linear regression model. Do you think the assumptions of the Gauss-Markov theorem are fulfilled? What are the consequences?

Code: Please upload your code script that reads the data, fits the regression model, extracts the quantities of interest, and generates the diagnostic plots.

Application: The estimated coefficients along with their significance are reported in the summary of the fitted regression model, which shows that y decreases significantly with x (at the 5% level).


Call:
lm(formula = y ~ x, data = d)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.62441 -0.14064 -0.00358  0.13978  0.54380 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.00383    0.02331  -0.164     0.87
x           -0.76385    0.03945 -19.362   <2e-16

Residual standard error: 0.2315 on 98 degrees of freedom
Multiple R-squared:  0.7928,    Adjusted R-squared:  0.7907 
F-statistic: 374.9 on 1 and 98 DF,  p-value: < 2.2e-16
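The entries of this summary are linked by simple identities, which can be checked by hand. The sketch below recomputes the slope's t statistic, its two-sided p-value, the F statistic, and R-squared from the printed estimate, standard error, and residual degrees of freedom; because the inputs are rounded, the results agree with the printed values only up to rounding.

```r
## values copied from the printed summary above
est <- -0.76385
se <- 0.03945
df <- 98

t_val <- est / se                 # t statistic of the slope
p_val <- 2 * pt(-abs(t_val), df)  # two-sided p-value
f_val <- t_val^2                  # with a single regressor, F equals t^2
r2 <- f_val / (f_val + df)        # R-squared implied by F and the residual df

round(c(t = t_val, F = f_val, R2 = r2), 4)
```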

Interpretation: The visualization of the data along with the diagnostic plots suggests that the assumptions of the Gauss-Markov theorem are reasonably well fulfilled.

Code: The analysis can be replicated in R using the following code.

## data
d <- read.csv("linreg.csv")
## regression
m <- lm(y ~ x, data = d)
summary(m)
## visualization
plot(y ~ x, data = d)
abline(m)
## diagnostic plots
plot(m)

Description:

Cloze with theory and applied questions about linear regression. The theory part uses knowledge questions in "string" and "mchoice" format. The applied part is based on bivariate numeric data for download in a CSV file (comma-separated values) and uses two "num" and one "schoice" item. Additionally, for interpretation, there is an open-ended "essay" element and a "file" upload for the R script used by the participants. This type of extended cloze question is currently supported in QTI 2.1 (OpenOlat in particular).

Solution feedback:

Yes

Randomization:

Random numbers, data file, and graphics

Mathematical notation:

Verbatim R input/output:

Yes

Images:

Yes

Other supplements:

linreg.csv

Template:

lm3.Rmd

lm3.Rnw

Raw: (1 random version)

lm3.md

lm3.tex

Demo code:

library("exams")

set.seed(403)
exams2html("lm3.Rmd")
set.seed(403)
exams2pdf("lm3.Rmd")

set.seed(403)
exams2html("lm3.Rnw")
set.seed(403)
exams2pdf("lm3.Rnw")

Achim Zeileis 2022-11-21 TEMPLATES
cloze regression significance slope statistics