lm3: Simple Linear Regression (Cloze with Theory, Application, Essay, and File Upload)

Exercise template with both theory and applied questions, as well as interpretation and code upload, about simple linear regression based on a randomly generated CSV file.

Name:
lm3
Type:
Related:
Preview:

Theory: Consider a linear regression of y on x. It is usually estimated with which estimation technique (three-letter abbreviation)?

This estimator yields the best linear unbiased estimator (BLUE) under the assumptions of the Gauss-Markov theorem. Which of the following properties are required for the errors of the linear regression model under these assumptions?

Application: Using the data provided in linreg.csv, estimate a linear regression of y on x. What are the estimated parameters?

Intercept:

Slope:

In terms of significance at the 5% level:

Interpretation: Consider various diagnostic plots for the fitted linear regression model. Do you think the assumptions of the Gauss-Markov theorem are fulfilled? What are the consequences?

Code: Please upload your code script that reads the data, fits the regression model, extracts the quantities of interest, and generates the diagnostic plots.

Theory: Linear regression models are typically estimated by ordinary least squares (OLS). The Gauss-Markov theorem establishes an optimality property of this estimator: if the errors have zero expectation, constant variance (homoscedasticity), and no autocorrelation, and if the regressors are exogenous and not linearly dependent, then the OLS estimator is the best linear unbiased estimator (BLUE).
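As a minimal sketch of what the OLS estimator computes, the following R code solves the normal equations (X'X) b = X'y directly and compares the result with lm(). The data are simulated purely for illustration; the true coefficients 0.5 and 2, the noise level, and the seed are arbitrary assumptions and not the contents of linreg.csv.

```r
## Hypothetical simulated data (not linreg.csv): arbitrary true intercept 0.5, slope 2
set.seed(1)
n <- 100
x <- runif(n, min = -1, max = 1)
y <- 0.5 + 2 * x + rnorm(n, sd = 0.3)

## Design matrix with an intercept column; names chosen to match lm()'s coefficients
X <- cbind("(Intercept)" = 1, x = x)

## OLS estimate from the normal equations: solve (X'X) b = X'y
beta_hat <- setNames(drop(solve(crossprod(X), crossprod(X, y))), colnames(X))

## Reference fit via lm(), which uses a QR decomposition internally
m <- lm(y ~ x)
rbind(normal_equations = beta_hat, lm = coef(m))
```

Both rows should agree up to rounding error. In practice lm() is preferable because its QR decomposition is numerically more stable than forming X'X explicitly.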

Application: The estimated coefficients and their significance are reported in the summary of the fitted regression model, which shows that x and y are not significantly correlated (at the 5% level).


Call:
lm(formula = y ~ x, data = d)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.55258 -0.15907 -0.02757  0.15782  0.74504 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.007988   0.024256  -0.329    0.743
x           -0.031263   0.045420  -0.688    0.493

Residual standard error: 0.2425 on 98 degrees of freedom
Multiple R-squared:  0.004811,  Adjusted R-squared:  -0.005344 
F-statistic: 0.4738 on 1 and 98 DF,  p-value: 0.4929
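The intercept, slope, and significance asked for in the cloze items can also be extracted programmatically from the coefficient table returned by coef(summary(m)). The sketch below assumes hypothetical simulated data, written to a temporary CSV file as a stand-in for linreg.csv. The decision strings mirror the wording used in the solutions; the "decreases" case is an assumption about the remaining answer option.

```r
## Hypothetical stand-in for linreg.csv: simulated data written to a temporary file
set.seed(2)
sim <- data.frame(x = runif(100, min = -1, max = 1))
sim$y <- -0.1 + 0.8 * sim$x + rnorm(100, sd = 0.3)
csv <- tempfile(fileext = ".csv")
write.csv(sim, csv, row.names = FALSE)

## Read the data and fit the regression of y on x
d <- read.csv(csv)
m <- lm(y ~ x, data = d)

## Coefficient table with columns Estimate, Std. Error, t value, Pr(>|t|)
ct <- coef(summary(m))
intercept <- ct["(Intercept)", "Estimate"]
slope     <- ct["x", "Estimate"]
p_slope   <- ct["x", "Pr(>|t|)"]

## Two-sided test of the slope at the 5% level, phrased like the solutions
decision <- if (p_slope >= 0.05) {
  "x and y are not significantly correlated"
} else if (slope > 0) {
  "y increases significantly with x"
} else {
  "y decreases significantly with x"   # assumed wording of the remaining option
}

round(c(intercept = intercept, slope = slope), 3)
decision
```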

Interpretation: The visualization of the data along with the diagnostic plots suggests that the assumptions of the Gauss-Markov theorem are reasonably well fulfilled.
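The diagnostic plots can be complemented by simple numeric checks. For example, the Gauss-Markov requirement of no autocorrelation can be gauged with the Durbin-Watson statistic, computed here by hand from the residuals. This is a sketch on hypothetical simulated data in which y does not depend on x. Note that the statistic is only informative when the observations have a meaningful order (for example, over time), which need not be the case for linreg.csv.

```r
## Hypothetical simulated data in which y does not depend on x (errors i.i.d.)
set.seed(3)
n <- 100
x <- runif(n, min = -1, max = 1)
y <- rnorm(n, sd = 0.25)
m <- lm(y ~ x)
e <- residuals(m)

## Durbin-Watson statistic: near 2 if there is no first-order autocorrelation
dw <- sum(diff(e)^2) / sum(e^2)

## Lag-1 correlation of the residuals; dw is approximately 2 * (1 - r1)
r1 <- cor(e[-1], e[-n])
round(c(dw = dw, r1 = r1), 3)
```

Values close to 2 are consistent with uncorrelated errors; if the lmtest package is available, lmtest::dwtest() provides a formal test.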

Code: The analysis can be replicated in R using the following code.

## data
d <- read.csv("linreg.csv")
## regression
m <- lm(y ~ x, data = d)
summary(m)
## visualization
plot(y ~ x, data = d)
abline(m)
## diagnostic plots
plot(m)

Theory: Consider a linear regression of y on x. It is usually estimated with which estimation technique (three-letter abbreviation)?

This estimator yields the best linear unbiased estimator (BLUE) under the assumptions of the Gauss-Markov theorem. Which of the following properties are required for the errors of the linear regression model under these assumptions?

Application: Using the data provided in linreg.csv, estimate a linear regression of y on x. What are the estimated parameters?

Intercept:

Slope:

In terms of significance at the 5% level:

Interpretation: Consider various diagnostic plots for the fitted linear regression model. Do you think the assumptions of the Gauss-Markov theorem are fulfilled? What are the consequences?

Code: Please upload your code script that reads the data, fits the regression model, extracts the quantities of interest, and generates the diagnostic plots.

Theory: Linear regression models are typically estimated by ordinary least squares (OLS). The Gauss-Markov theorem establishes an optimality property of this estimator: if the errors have zero expectation, constant variance (homoscedasticity), and no autocorrelation, and if the regressors are exogenous and not linearly dependent, then the OLS estimator is the best linear unbiased estimator (BLUE).

Application: The estimated coefficients and their significance are reported in the summary of the fitted regression model, which shows that y increases significantly with x (at the 5% level).


Call:
lm(formula = y ~ x, data = d)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.71650 -0.17244 -0.00028  0.17813  0.77574 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.14599    0.02877  -5.075 1.84e-06
x            0.85960    0.04804  17.895  < 2e-16

Residual standard error: 0.287 on 98 degrees of freedom
Multiple R-squared:  0.7657,    Adjusted R-squared:  0.7633 
F-statistic: 320.2 on 1 and 98 DF,  p-value: < 2.2e-16

Interpretation: The visualization of the data along with the diagnostic plots suggests that the true relationship between y and x is not linear but quadratic (and hence the errors do not have zero expectation).
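One way to back up this visual impression is to add a squared regressor and compare the two fits with an F-test via anova(). The sketch below also averages the residuals of the straight-line fit within three bins of x; a U-shaped pattern indicates that the errors do not have zero expectation. The data are hypothetical (simulated as y = x + x^2 plus noise) and not the contents of linreg.csv.

```r
## Hypothetical simulated data with a quadratic relationship (not linreg.csv)
set.seed(4)
n <- 100
x <- runif(n, min = -1, max = 1)
y <- x + x^2 + rnorm(n, sd = 0.2)

m1 <- lm(y ~ x)            # straight line, misspecified here
m2 <- lm(y ~ x + I(x^2))   # adds the squared regressor

## Mean residual of the straight-line fit in three equal-width bins of x:
## positive at both ends and negative in the middle if the true curve is U-shaped
bins <- tapply(residuals(m1), cut(x, 3), mean)
round(bins, 3)

## F-test of the linear model against the quadratic extension
a <- anova(m1, m2)
a
```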

Code: The analysis can be replicated in R using the following code.

## data
d <- read.csv("linreg.csv")
## regression
m <- lm(y ~ x, data = d)
summary(m)
## visualization
plot(y ~ x, data = d)
abline(m)
## diagnostic plots
plot(m)

Theory: Consider a linear regression of y on x. It is usually estimated with which estimation technique (three-letter abbreviation)?

This estimator yields the best linear unbiased estimator (BLUE) under the assumptions of the Gauss-Markov theorem. Which of the following properties are required for the errors of the linear regression model under these assumptions?

Application: Using the data provided in linreg.csv, estimate a linear regression of y on x. What are the estimated parameters?

Intercept:

Slope:

In terms of significance at the 5% level:

Interpretation: Consider various diagnostic plots for the fitted linear regression model. Do you think the assumptions of the Gauss-Markov theorem are fulfilled? What are the consequences?

Code: Please upload your code script that reads the data, fits the regression model, extracts the quantities of interest, and generates the diagnostic plots.

Theory: Linear regression models are typically estimated by ordinary least squares (OLS). The Gauss-Markov theorem establishes an optimality property of this estimator: if the errors have zero expectation, constant variance (homoscedasticity), and no autocorrelation, and if the regressors are exogenous and not linearly dependent, then the OLS estimator is the best linear unbiased estimator (BLUE).

Application: The estimated coefficients and their significance are reported in the summary of the fitted regression model, which shows that y increases significantly with x (at the 5% level).


Call:
lm(formula = y ~ x, data = d)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.17761 -0.11645 -0.02677  0.11443  0.90074 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.01586    0.02625  -0.604    0.547
x            0.63255    0.04412  14.337   <2e-16

Residual standard error: 0.262 on 98 degrees of freedom
Multiple R-squared:  0.6772,    Adjusted R-squared:  0.6739 
F-statistic: 205.5 on 1 and 98 DF,  p-value: < 2.2e-16

Interpretation: The visualization of the data along with the diagnostic plots suggests that the errors are heteroscedastic, with the variance increasing along with the mean.
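Heteroscedasticity can also be checked numerically. The sketch below computes the studentized (Koenker) version of the Breusch-Pagan statistic by hand: n times the R-squared of a regression of the squared OLS residuals on x, compared with a chi-squared distribution with 1 degree of freedom. The data are hypothetical, simulated with an error standard deviation proportional to x.

```r
## Hypothetical simulated data: error standard deviation proportional to x,
## so the variance increases along with the mean (the slope is positive)
set.seed(5)
n <- 100
x <- runif(n, min = 0, max = 1)
y <- 0.6 * x + rnorm(n, sd = 0.5 * x)
m <- lm(y ~ x)

## Auxiliary regression of the squared OLS residuals on x
e2 <- residuals(m)^2
aux <- lm(e2 ~ x)

## Studentized Breusch-Pagan statistic, approx. chi^2(1) under homoscedasticity
bp <- n * summary(aux)$r.squared
p_bp <- pchisq(bp, df = 1, lower.tail = FALSE)
round(c(statistic = bp, p.value = p_bp), 4)
```

If the lmtest package is installed, lmtest::bptest(m), whose default is studentize = TRUE, should report the same statistic.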

Code: The analysis can be replicated in R using the following code.

## data
d <- read.csv("linreg.csv")
## regression
m <- lm(y ~ x, data = d)
summary(m)
## visualization
plot(y ~ x, data = d)
abline(m)
## diagnostic plots
plot(m)
Description:
Cloze with theory and applied questions about linear regression. The theory part uses knowledge questions in "string" and "mchoice" format. The applied part is based on bivariate numeric data for download in a CSV file (comma-separated values) and uses two "num" items and one "schoice" item. Additionally, for interpretation, there is an open-ended "essay" element and a "file" upload for the R script used by the participants. This type of extended cloze question is currently supported in QTI 2.1 (OpenOlat in particular).
Solution feedback:
Yes
Randomization:
Random numbers, data file, and graphics
Mathematical notation:
No
Verbatim R input/output:
Yes
Images:
Yes
Other supplements:
linreg.csv
Template:
Raw: (1 random version)
PDF:
lm3-Rmd-pdf
lm3-Rnw-pdf
HTML:
lm3-Rmd-html
lm3-Rnw-html

Demo code:

library("exams")

set.seed(403)
exams2html("lm3.Rmd")
set.seed(403)
exams2pdf("lm3.Rmd")

set.seed(403)
exams2html("lm3.Rnw")
set.seed(403)
exams2pdf("lm3.Rnw")