Econometrics Practice Exercises

Classical Linear Regression Model (CLRM)


Practice Exercise 1: Hypothesis Testing with t-statistics

Problem Statement

You are analyzing the relationship between years of education and hourly wages using a simple linear regression model. A researcher collected data from 45 randomly selected workers and estimated the following regression equation:

$$\text{Wage}_i = \beta_0 + \beta_1 \text{Education}_i + u_i$$

Estimated Results:

| Coefficient | Estimate | Standard Error |
|---|---|---|
| $\hat{\beta}_0$ (Intercept) | 3.25 | 1.84 |
| $\hat{\beta}_1$ (Education) | 1.78 | 0.42 |
| $R^2$ | 0.62 | - |
| Sample size ($n$) | 45 | - |

Additional Information:

  • Wage is measured in dollars per hour
  • Education is measured in years of schooling completed
  • The classical assumptions of the CLRM hold (homoskedasticity, no autocorrelation, normality of errors)

Questions

Part A: Two-Tailed Test for Slope Coefficient

Test whether education has a statistically significant effect on wages at the 5% significance level.

  1. State the null and alternative hypotheses.
  2. Calculate the t-statistic.
  3. Determine the critical value(s).
  4. State your conclusion in statistical terms.
  5. Interpret your conclusion in the context of the wage-education relationship.

Calculation Space:

H₀: ________________________________________________
H₁: ________________________________________________

t-statistic formula: t = (β̂₁ - β₁₀) / SE(β̂₁), where β₁₀ is the value of β₁ under H₀

t = ________________________________________________
t = ________________________________________________
t = ________________________________________________

degrees of freedom = _______________________________

critical values (α = 0.05, two-tailed): ____________

Decision: __________________________________________

Interpretation: ____________________________________
____________________________________________________

Part B: One-Tailed Test for Slope Coefficient

Test whether each additional year of education increases wages by more than $1.50 per hour at the 1% significance level.

  1. State the null and alternative hypotheses.
  2. Calculate the t-statistic.
  3. Determine the critical value.
  4. State your conclusion.

Calculation Space:

H₀: ________________________________________________
H₁: ________________________________________________

t = ________________________________________________
t = ________________________________________________

critical value (α = 0.01, one-tailed): _____________

Decision: __________________________________________

Interpretation: ____________________________________
____________________________________________________

Part C: Test for Intercept

Test whether the intercept is significantly different from zero at the 10% significance level.

  1. State the hypotheses.
  2. Calculate the t-statistic.
  3. Make your decision and interpret.

Calculation Space:

H₀: ________________________________________________
H₁: ________________________________________________

t = ________________________________________________
t = ________________________________________________

degrees of freedom = _______________________________

critical values (α = 0.10, two-tailed): ____________

Decision: __________________________________________

Interpretation: ____________________________________
____________________________________________________

Part D: Economic Interpretation

Explain what the coefficient $\hat{\beta}_1 = 1.78$ means in practical terms. If someone completes an additional 4 years of college education, what would this model predict as their wage increase, assuming all else equal?


Practice Exercise 2: Confidence Intervals and Joint Hypothesis Testing

Problem Statement

A regional transportation authority wants to understand factors affecting monthly public transit ridership across 35 cities. They estimate the following multiple regression model:

$$\text{Ridership}_i = \beta_0 + \beta_1 \text{Fare}_i + \beta_2 \text{Income}_i + \beta_3 \text{PopDensity}_i + u_i$$

Where:

  • Ridership: Monthly transit trips per 1,000 residents
  • Fare: Average one-way fare in dollars
  • Income: Median household income in thousands of dollars
  • PopDensity: Population density (thousands of people per square km)

Estimated Results:

| Variable | Coefficient | Standard Error |
|---|---|---|
| Intercept ($\hat{\beta}_0$) | 48.6 | 12.3 |
| Fare ($\hat{\beta}_1$) | -4.20 | 1.15 |
| Income ($\hat{\beta}_2$) | 0.85 | 0.32 |
| PopDensity ($\hat{\beta}_3$) | 3.40 | 1.08 |
| Sample size ($n$) | 35 | - |
| $R^2$ | 0.71 | - |
| Adjusted $R^2$ | 0.68 | - |

Questions

Part A: 95% Confidence Interval for Fare Coefficient

Construct and interpret a 95% confidence interval for $\beta_1$ (the effect of fare on ridership).

Calculation Space:

Confidence interval formula: β̂₁ ± t(α/2, df) × SE(β̂₁)

degrees of freedom = n - k - 1 = ____________________
                     = ____________________

t-critical for 95% CI: ____________________________

Margin of error = __________________________________
                 = __________________________________

Lower bound = ______________________________________
Upper bound = ______________________________________

95% CI for β₁: [ _______ , _______ ]

Interpretation: What does this confidence interval tell us about the relationship between fares and ridership?

Part B: Hypothesis Test Using Confidence Interval

Using the confidence interval from Part A, test H₀: β₁ = -2.5 vs. H₁: β₁ ≠ -2.5 at the 5% significance level.

Decision Rule: Does -2.5 fall inside or outside the confidence interval?

Conclusion: _______________________________________________

Part C: 90% Confidence Interval for Income Coefficient

Construct a 90% confidence interval for $\beta_2$ and interpret its meaning.

Calculation Space:

t-critical for 90% CI: ____________________________

Margin of error = __________________________________

90% CI for β₂: [ _______ , _______ ]

Interpretation: What does this tell us about the relationship between income and transit ridership?

Part D: Testing a Specific Hypothesis

Test whether population density has a positive effect on ridership at the 5% significance level.

  1. State the hypotheses.
  2. Calculate the t-statistic.
  3. Determine the p-value range using the t-distribution table.
  4. Make your decision and interpret.

Calculation Space:

H₀: ________________________________________________
H₁: ________________________________________________

t-statistic = _______________________________________
            = _______________________________________
            = _______________________________________

One-tailed critical value (α = 0.05): ______________

t-statistic > critical value? _______________________

p-value is: (circle one)
    p < 0.01    0.01 < p < 0.025    0.025 < p < 0.05
    0.05 < p < 0.10    p > 0.10

Decision: __________________________________________

Interpretation: ____________________________________
____________________________________________________

Part E: Joint Interpretation

Suppose a city is considering two policies:

  1. Policy X: Reduce fare by $0.50
  2. Policy Y: Increase population density by 0.2 thousand people per square km, i.e. 200 people per square km (through zoning changes)

Based on your regression results, calculate the expected change in ridership per 1,000 residents for each policy. Which policy would be predicted to have a larger impact on ridership?

Calculation Space:

Policy X (Fare reduction):
Expected ΔRidership = ______________________________
                    = ______________________________

Policy Y (Density increase):
Expected ΔRidership = ______________________________
                    = ______________________________

Larger predicted impact: ____________________________

ANSWER KEY


Exercise 1 Answers

Part A: Two-Tailed Test for Slope

  1. Hypotheses:

    • H₀: β₁ = 0 (Education has no effect on wages)
    • H₁: β₁ ≠ 0 (Education has an effect on wages)
  2. t-statistic:

    $$t = \frac{1.78 - 0}{0.42} \approx 4.238$$
  3. Critical values:

    • df = 45 - 2 = 43
    • t-critical (two-tailed, α=0.05) = ±2.017
  4. Decision: Reject H₀ because |4.238| > 2.017

  5. Conclusion: There is statistically significant evidence at the 5% level that education affects wages. The p-value is approximately 0.0001 (much less than 0.05).
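The steps in Part A can be reproduced with a few lines of Python. This is a minimal sketch using only the numbers from the results table; the critical value is hard-coded from a t-table (an assumption, since the standard library has no t-distribution quantile function), and the variable names are illustrative.

```python
# Two-tailed t-test for H0: beta_1 = 0, using the Exercise 1 estimates.
beta1_hat = 1.78     # estimated slope (results table)
se_beta1 = 0.42      # standard error of the slope (results table)
beta1_null = 0.0     # value of beta_1 under H0
n, k = 45, 1         # sample size, number of slope coefficients

df = n - k - 1                                   # residual degrees of freedom
t_stat = (beta1_hat - beta1_null) / se_beta1     # test statistic

# Assumption: critical value read from a t-table (df = 43, alpha = 0.05, two-tailed).
t_crit = 2.017

# Two-tailed rule: reject H0 when |t| exceeds the critical value.
reject_h0 = abs(t_stat) > t_crit
print(f"df = {df}, t = {t_stat:.3f}, reject H0: {reject_h0}")
```

Changing `beta1_null` and `t_crit` adapts the same sketch to other two-tailed tests, such as the intercept test in Part C.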


Part B: One-Tailed Test

  1. Hypotheses:

    • H₀: β₁ ≤ 1.50
    • H₁: β₁ > 1.50
  2. t-statistic:

    $$t = \frac{1.78 - 1.50}{0.42} = \frac{0.28}{0.42} \approx 0.667$$
  3. Critical value:

    • t-critical (one-tailed, α=0.01, df=43) = 2.416
  4. Decision: Fail to reject H₀ because 0.667 < 2.416

  5. Conclusion: At the 1% significance level, we do NOT have sufficient evidence to conclude that each additional year of education increases wages by more than $1.50 per hour.
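A one-tailed test against a non-zero null differs from Part A in two places: the hypothesized value is subtracted in the numerator, and the rejection rule looks at only one tail. The sketch below illustrates this with a hypothetical helper `t_statistic`; the critical value is again taken from a t-table rather than computed.

```python
def t_statistic(estimate, null_value, std_error):
    """Return the t-statistic (estimate - null_value) / std_error."""
    return (estimate - null_value) / std_error

# H0: beta_1 <= 1.50 versus H1: beta_1 > 1.50 (right-tailed test).
t_stat = t_statistic(estimate=1.78, null_value=1.50, std_error=0.42)

# Assumption: critical value from a t-table (df = 43, alpha = 0.01, one-tailed).
t_crit = 2.416

# Right-tailed rule: reject H0 only for a large positive t (no absolute value).
reject_h0 = t_stat > t_crit
print(f"t = {t_stat:.3f}, reject H0: {reject_h0}")
```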


Part C: Test for Intercept

  1. Hypotheses:

    • H₀: β₀ = 0
    • H₁: β₀ ≠ 0
  2. t-statistic:

    $$t = \frac{3.25 - 0}{1.84} \approx 1.766$$
  3. Critical values:

    • t-critical (two-tailed, α=0.10, df=43) = ±1.681
  4. Decision: Reject H₀ because |1.766| > 1.681

  5. Conclusion: The intercept is statistically significant at the 10% level, so the predicted wage for a worker with zero years of education differs significantly from zero. It would not be significant at the 5% level, since 1.766 < 2.017. (Note: The intercept may not be economically meaningful. If zero years of education lies outside the range of the sample, the intercept is an extrapolation.)


Part D: Economic Interpretation

  • β̂₁ = 1.78 means: Each additional year of education is associated with an increase of $1.78 per hour in wages, holding all else constant.

  • For 4 years of college:

    • Predicted wage increase = 4 × $1.78 = $7.12 per hour
    • If working 2,000 hours/year, this translates to approximately $14,240 additional annual income
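The arithmetic in Part D follows from the fitted line: the difference between two predictions depends only on the slope. A short sketch, where the function name `predicted_wage` and the comparison of 12 versus 16 years of schooling are illustrative assumptions rather than part of the exercise:

```python
# Fitted line from Exercise 1: predicted wage = 3.25 + 1.78 * education.
BETA0_HAT = 3.25   # intercept estimate (dollars per hour)
BETA1_HAT = 1.78   # slope estimate (dollars per hour per year of education)

def predicted_wage(education_years):
    """Predicted hourly wage for a given number of years of education."""
    return BETA0_HAT + BETA1_HAT * education_years

# Hypothetical comparison: 12 versus 16 years of schooling (4 extra years).
# The intercept cancels, so the difference equals 4 * BETA1_HAT.
hourly_increase = predicted_wage(16) - predicted_wage(12)

# Assumption carried over from the answer key: 2,000 working hours per year.
annual_increase = hourly_increase * 2000

print(f"hourly: {hourly_increase:.2f}, annual: {annual_increase:.0f}")
```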

Exercise 2 Answers

Part A: 95% Confidence Interval for Fare Coefficient

  • df = 35 - 3 - 1 = 31 (k = 3 regressors)

  • t-critical (two-tailed, α=0.05, df=31) = 2.040

  • Margin of error = 2.040 × 1.15 = 2.346

  • Lower bound = -4.20 - 2.346 = -6.546

  • Upper bound = -4.20 + 2.346 = -1.854

95% CI for β₁: [ -6.55 , -1.85 ]

Interpretation: We are 95% confident that, holding income and population density constant, a $1 increase in fare is associated with a decrease in ridership of between 1.85 and 6.55 trips per 1,000 residents per month. Since the entire interval is negative, there is strong evidence of an inverse relationship.
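The interval construction can be written as a small function. This is a sketch under the same assumption as the hand calculation: the critical value 2.040 comes from a t-table (df = 31, two-tailed, α = 0.05). The function name `confidence_interval` is hypothetical.

```python
def confidence_interval(estimate, std_error, t_crit):
    """Return (lower, upper) bounds: estimate -/+ t_crit * std_error."""
    margin = t_crit * std_error   # margin of error
    return estimate - margin, estimate + margin

# Fare coefficient from the Exercise 2 results table.
# Assumption: t_crit = 2.040 is read from a t-table (df = 31, 95% CI).
lower, upper = confidence_interval(estimate=-4.20, std_error=1.15, t_crit=2.040)
print(f"95% CI for beta_1: [{lower:.3f}, {upper:.3f}]")
```

Passing `estimate=0.85`, `std_error=0.32`, and the 90% critical value gives the Income interval from Part C in the same way.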


Part B: Hypothesis Test Using Confidence Interval

H₀: β₁ = -2.5 vs. H₁: β₁ ≠ -2.5

  • Decision: Since -2.5 falls WITHIN the 95% CI [-6.55, -1.85], we fail to reject H₀

  • Conclusion: At the 5% significance level, we do not have sufficient evidence to reject the claim that the true effect of fare on ridership is -2.5 trips per 1,000 residents for each $1 increase in fare.
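The confidence-interval decision rule is equivalent to a two-tailed t-test at the same significance level. The sketch below checks both on the Part B numbers; the critical value is taken from a t-table and the variable names are illustrative.

```python
# Fare coefficient and its standard error (Exercise 2 results table).
beta1_hat, se_beta1 = -4.20, 1.15
null_value = -2.5    # hypothesized value under H0
t_crit = 2.040       # assumption: t-table value, df = 31, alpha = 0.05, two-tailed

# Route 1: is the hypothesized value inside the 95% confidence interval?
lower = beta1_hat - t_crit * se_beta1
upper = beta1_hat + t_crit * se_beta1
inside_ci = lower <= null_value <= upper

# Route 2: ordinary two-tailed t-test of H0: beta_1 = -2.5.
t_stat = (beta1_hat - null_value) / se_beta1
fail_to_reject = abs(t_stat) <= t_crit

# The two routes always agree for a two-tailed test at the matching level.
print(f"inside CI: {inside_ci}, t = {t_stat:.3f}, fail to reject: {fail_to_reject}")
```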


Part C: 90% Confidence Interval for Income Coefficient

  • t-critical (two-tailed, α=0.10, df=31) = 1.696

  • Margin of error = 1.696 × 0.32 = 0.543

  • Lower bound = 0.85 - 0.543 = 0.307

  • Upper bound = 0.85 + 0.543 = 1.393

90% CI for β₂: [ 0.31 , 1.39 ]

Interpretation: We are 90% confident that a $1,000 increase in median household income is associated with an increase in transit ridership of between 0.31 and 1.39 trips per 1,000 residents per month. The positive relationship suggests higher-income cities use transit more (perhaps due to downtown employment).


Part D: Testing Population Density Effect

  1. Hypotheses:

    • H₀: β₃ ≤ 0 (Population density has no positive effect)
    • H₁: β₃ > 0 (Population density has a positive effect)
  2. t-statistic:

    $$t = \frac{3.40 - 0}{1.08} \approx 3.148$$
  3. Critical value:

    • t-critical (one-tailed, α=0.05, df=31) = 1.696
  4. Decision: Reject H₀ because 3.148 > 1.696

  5. p-value range: p < 0.01 (actually p ≈ 0.002)

  6. Conclusion: There is strong statistical evidence that higher population density increases transit ridership. Cities with greater density have significantly more transit usage per capita.
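The p-value range in Part D can be checked without a t-table by integrating the t density numerically. The sketch below uses only the standard library and Simpson's rule; it is an approximation chosen for illustration, not the method a statistics package would use, and the function names are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def upper_tail_p(t_stat, df, steps=10_000):
    """Approximate P(T > t_stat) for t_stat >= 0 using Simpson's rule.

    Integrates the density over [0, t_stat] and subtracts the area from 0.5,
    which is P(T > 0) by symmetry. `steps` must be even.
    """
    h = t_stat / steps
    total = t_pdf(0.0, df) + t_pdf(t_stat, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2   # Simpson weights: 4 for odd, 2 for even nodes
        total += weight * t_pdf(i * h, df)
    return 0.5 - total * h / 3

t_stat = 3.40 / 1.08                          # PopDensity t-statistic
p_one_tailed = upper_tail_p(t_stat, df=31)    # right-tailed p-value
print(f"t = {t_stat:.3f}, one-tailed p = {p_one_tailed:.4f}")
```

A useful sanity check is that `upper_tail_p(1.696, 31)` comes out close to 0.05, matching the one-tailed critical value used above.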


Part E: Policy Comparison

Policy X (Fare reduction of $0.50):

$$\Delta \text{Ridership} = (-4.20) \times (-0.50) = +2.10 \text{ trips per 1,000 residents}$$

Policy Y (Density increase of 0.2):

$$\Delta \text{Ridership} = 3.40 \times 0.2 = +0.68 \text{ trips per 1,000 residents}$$

Larger predicted impact: Policy X (fare reduction)

The fare reduction is predicted to increase monthly ridership by about three times as much as the density increase (2.10 / 0.68 ≈ 3.1), based on these coefficient estimates.
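The comparison is a matter of multiplying each coefficient by the size of the proposed change. A minimal sketch, using point estimates only (it ignores the sampling uncertainty in both coefficients); the dictionary layout and variable names are illustrative.

```python
# Point estimates from the Exercise 2 results table.
coef = {"Fare": -4.20, "PopDensity": 3.40}

# Policy X: fare falls by $0.50, so the change in Fare is -0.50.
delta_x = coef["Fare"] * (-0.50)

# Policy Y: density rises by 0.2 (thousand people per square km).
delta_y = coef["PopDensity"] * 0.2

# Ratio of predicted effects (trips per 1,000 residents per month).
ratio = delta_x / delta_y
larger = "Policy X" if delta_x > delta_y else "Policy Y"
print(f"X: {delta_x:+.2f}, Y: {delta_y:+.2f}, ratio: {ratio:.2f}, larger: {larger}")
```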


Common Mistakes to Avoid

  1. Degrees of freedom: Remember df = n - k - 1 for multiple regression (where k = number of slope coefficients). For simple regression, df = n - 2.

  2. One-tailed vs two-tailed: Always check whether the alternative hypothesis uses ≠ (two-tailed) or < / > (one-tailed). This affects your critical value.

  3. Sign interpretation: When interpreting coefficients, always explain both the magnitude AND the direction (positive/negative).

  4. Confidence interval for hypothesis testing: For a two-tailed test, if the hypothesized value falls within the 100(1-α)% confidence interval, you fail to reject H₀ at significance level α.

  5. Practical vs statistical significance: A coefficient can be statistically significant (large t-statistic) but economically small, or vice versa. Always consider both!
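The degrees-of-freedom rule in item 1 covers both exercises with one formula, since simple regression is the case k = 1. A short sketch with a hypothetical helper:

```python
def degrees_of_freedom(n, k):
    """Residual df for OLS with an intercept and k slope coefficients."""
    return n - k - 1

df_exercise_1 = degrees_of_freedom(n=45, k=1)   # simple regression: n - 2
df_exercise_2 = degrees_of_freedom(n=35, k=3)   # three regressors
print(df_exercise_1, df_exercise_2)
```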


End of Practice Exercises