# Econometrics Practice Exercises
## Classical Linear Regression Model (CLRM)

---

# Practice Exercise 1: Hypothesis Testing with t-statistics

## Problem Statement

You are analyzing the relationship between years of education and hourly wages using a simple linear regression model. A researcher collected data from 45 randomly selected workers and estimated the following regression equation:

$$\text{Wage}_i = \beta_0 + \beta_1 \text{Education}_i + u_i$$

### Estimated Results:
| Coefficient | Estimate | Standard Error |
|-------------|----------|----------------|
| $\hat{\beta}_0$ (Intercept) | 3.25 | 1.84 |
| $\hat{\beta}_1$ (Education) | 1.78 | 0.42 |
| $R^2$ | 0.29 | - |
| Sample size ($n$) | 45 | - |

### Additional Information:
- Wage is measured in dollars per hour
- Education is measured in years of schooling completed
- The classical assumptions of the CLRM hold (homoskedasticity, no autocorrelation, normality of errors)

---

## Questions

### Part A: Two-Tailed Test for Slope Coefficient
**Test whether education has a statistically significant effect on wages at the 5% significance level.**

1. State the null and alternative hypotheses.
2. Calculate the t-statistic.
3. Determine the critical value(s).
4. State your conclusion in statistical terms.
5. Interpret your conclusion in the context of the wage-education relationship.

**Calculation Space:**
```
H₀: ________________________________________________
H₁: ________________________________________________

t-statistic formula: t = (β̂₁ - β₁₀) / SE(β̂₁), where β₁₀ is the value of β₁ under H₀

t = ________________________________________________
t = ________________________________________________
t = ________________________________________________

degrees of freedom = _______________________________

critical values (α = 0.05, two-tailed): ____________

Decision: __________________________________________

Interpretation: ____________________________________
____________________________________________________
```

### Part B: One-Tailed Test for Slope Coefficient
**Test whether each additional year of education increases wages by more than $1.50 per hour at the 1% significance level.**

1. State the null and alternative hypotheses.
2. Calculate the t-statistic.
3. Determine the critical value.
4. State your conclusion.

**Calculation Space:**
```
H₀: ________________________________________________
H₁: ________________________________________________

t = ________________________________________________
t = ________________________________________________

critical value (α = 0.01, one-tailed): _____________

Decision: __________________________________________

Interpretation: ____________________________________
____________________________________________________
```

### Part C: Test for Intercept
**Test whether the intercept is significantly different from zero at the 10% significance level.**

1. State the hypotheses.
2. Calculate the t-statistic.
3. Make your decision and interpret.

**Calculation Space:**
```
H₀: ________________________________________________
H₁: ________________________________________________

t = ________________________________________________
t = ________________________________________________

degrees of freedom = _______________________________

critical values (α = 0.10, two-tailed): ____________

Decision: __________________________________________

Interpretation: ____________________________________
____________________________________________________
```

### Part D: Economic Interpretation
Explain what the coefficient $\hat{\beta}_1 = 1.78$ means in practical terms. If someone completes an additional 4 years of college education, what would this model predict as their wage increase, assuming all else equal?

---

# Practice Exercise 2: Confidence Intervals and Hypothesis Testing in Multiple Regression

## Problem Statement

A regional transportation authority wants to understand factors affecting monthly public transit ridership across 35 cities. They estimate the following multiple regression model:

$$\text{Ridership}_i = \beta_0 + \beta_1 \text{Fare}_i + \beta_2 \text{Income}_i + \beta_3 \text{PopDensity}_i + u_i$$

Where:
- **Ridership**: Monthly ridership per 1,000 residents (number of trips)
- **Fare**: Average one-way fare in dollars
- **Income**: Median household income in thousands of dollars
- **PopDensity**: Population density (thousands of people per square km)

### Estimated Results:
| Variable | Coefficient | Standard Error |
|----------|-------------|----------------|
| Intercept ($\hat{\beta}_0$) | 48.6 | 12.3 |
| Fare ($\hat{\beta}_1$) | -4.20 | 1.15 |
| Income ($\hat{\beta}_2$) | 0.85 | 0.32 |
| PopDensity ($\hat{\beta}_3$) | 3.40 | 1.08 |
| Sample size ($n$) | 35 | - |
| $R^2$ | 0.71 | - |
| Adjusted $R^2$ | 0.68 | - |
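
As a quick consistency check on the table (not one of the questions), the reported adjusted R² can be recovered from R², n and k with the usual formula $\bar{R}^2 = 1 - (1 - R^2)\frac{n-1}{n-k-1}$. The Python sketch below is illustrative only; `adjusted_r2` is a hypothetical helper name, not part of any library.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R-squared for n observations and k slope coefficients."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


adj = adjusted_r2(0.71, n=35, k=3)
print(f"adjusted R^2 = {adj:.3f}")  # prints: adjusted R^2 = 0.682
```

This rounds to the 0.68 reported above.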

---

## Questions

### Part A: 95% Confidence Interval for Fare Coefficient
**Construct and interpret a 95% confidence interval for $\beta_1$ (the effect of fare on ridership).**

**Calculation Space:**
```
Confidence interval formula: β̂₁ ± t(α/2, df) × SE(β̂₁)

degrees of freedom = n - k - 1 = ____________________
                     = ____________________

t-critical for 95% CI: ____________________________

Margin of error = __________________________________
                 = __________________________________

Lower bound = ______________________________________
Upper bound = ______________________________________

95% CI for β₁: [ _______ , _______ ]
```

**Interpretation:** What does this confidence interval tell us about the relationship between fares and ridership?

### Part B: Hypothesis Test Using Confidence Interval
**Using the confidence interval from Part A, test H₀: β₁ = -2.5 vs. H₁: β₁ ≠ -2.5 at the 5% significance level.**

**Decision Rule:** Does -2.5 fall inside or outside the confidence interval?

**Conclusion:** _______________________________________________

### Part C: 90% Confidence Interval for Income Coefficient
**Construct a 90% confidence interval for $\beta_2$ and interpret its meaning.**

**Calculation Space:**
```
t-critical for 90% CI: ____________________________

Margin of error = __________________________________

90% CI for β₂: [ _______ , _______ ]
```

**Interpretation:** What does this tell us about the relationship between income and transit ridership?

### Part D: Testing a Specific Hypothesis
**Test whether population density has a positive effect on ridership at the 5% significance level.**

1. State the hypotheses.
2. Calculate the t-statistic.
3. Determine the p-value range using the t-distribution table.
4. Make your decision and interpret.

**Calculation Space:**
```
H₀: ________________________________________________
H₁: ________________________________________________

t-statistic = _______________________________________
            = _______________________________________
            = _______________________________________

One-tailed critical value (α = 0.05): ______________

t-statistic > critical value? _______________________

p-value is: (circle one)
    p < 0.01    0.01 < p < 0.025    0.025 < p < 0.05
    0.05 < p < 0.10    p > 0.10

Decision: __________________________________________

Interpretation: ____________________________________
____________________________________________________
```

### Part E: Joint Interpretation
Suppose a city is considering two policies:
1. **Policy X:** Reduce the fare by $0.50
2. **Policy Y:** Increase population density by 0.2 (i.e., 200 more people per square km, through zoning changes)

Based on your regression results, calculate the **expected change in ridership per 1,000 residents** for each policy, holding the other regressors constant. Which policy is predicted to have the larger impact on ridership?

**Calculation Space:**
```
Policy X (Fare reduction):
Expected ΔRidership = ______________________________
                    = ______________________________

Policy Y (Density increase):
Expected ΔRidership = ______________________________
                    = ______________________________

Larger predicted impact: ____________________________
```

---

# ANSWER KEY

---

## Exercise 1 Answers

### Part A: Two-Tailed Test for Slope

1. **Hypotheses:**
   - H₀: β₁ = 0 (Education has no effect on wages)
   - H₁: β₁ ≠ 0 (Education has an effect on wages)

2. **t-statistic:**
   $$t = \frac{1.78 - 0}{0.42} = 4.238$$

3. **Critical values:**
   - df = 45 - 2 = 43
   - t-critical (two-tailed, α=0.05) = ±2.017

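4. **Decision:** Reject H₀ because |4.238| > 2.017.

5. **Conclusion:** There is statistically significant evidence at the 5% level that education affects wages (two-tailed p ≈ 0.0001, far below 0.05). In context, each additional year of schooling is associated with a higher hourly wage, and the estimated effect of about $1.78 per hour is clearly distinguishable from zero.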
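
The critical value and p-value above come from a t-table, but they can also be reproduced numerically. The sketch below is a stdlib-only Python illustration, assuming no statistics library is available: it integrates the Student's t density with Simpson's rule and inverts the CDF by bisection. The helper names `t_cdf` and `t_quantile` are hypothetical; in practice a library routine such as SciPy's `scipy.stats.t` would normally be used instead.

```python
import math


def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T <= x) for Student's t: Simpson's rule on [0, |x|], then symmetry about 0."""
    const = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(u: float) -> float:
        return const * (1 + u * u / df) ** (-(df + 1) / 2)

    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3  # P(0 <= T <= |x|)
    return 0.5 + area if x >= 0 else 0.5 - area


def t_quantile(p: float, df: int) -> float:
    """Inverse CDF for 0.5 < p < 1, located by bisection on [0, 100]."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Exercise 1, Part A: H0: beta1 = 0 vs H1: beta1 != 0, alpha = 0.05
b1, se_b1, n = 1.78, 0.42, 45
df = n - 2                             # simple regression: n - 2
t_stat = (b1 - 0) / se_b1
t_crit = t_quantile(1 - 0.05 / 2, df)  # alpha/2 in each tail
p_value = 2 * (1 - t_cdf(abs(t_stat), df))
reject = abs(t_stat) > t_crit

print(f"t = {t_stat:.3f}, critical = ±{t_crit:.3f}, p = {p_value:.5f}, reject H0: {reject}")
```

The same `t_quantile` call reproduces the other table values used in this key, for example `t_quantile(0.99, 43)` ≈ 2.416 and `t_quantile(0.975, 31)` ≈ 2.040.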

---

### Part B: One-Tailed Test

1. **Hypotheses:**
   - H₀: β₁ ≤ 1.50
   - H₁: β₁ > 1.50

2. **t-statistic:**
   $$t = \frac{1.78 - 1.50}{0.42} = \frac{0.28}{0.42} = 0.667$$

3. **Critical value:**
   - t-critical (one-tailed, α=0.01, df=43) = 2.416

4. **Decision:** Fail to reject H₀ because 0.667 < 2.416

5. **Conclusion:** At the 1% significance level, we do NOT have sufficient evidence to conclude that each year of education increases wages by more than $1.50.
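
The one-tailed logic can be written out in a few lines of Python. This minimal sketch copies the critical value 2.416 from the key rather than computing it, and, as an illustrative extra, computes the smallest estimate of β₁ that would have led to rejection with the same standard error.

```python
# Exercise 1, Part B: H0: beta1 <= 1.50 vs H1: beta1 > 1.50 (right-tailed), alpha = 0.01
b1, se_b1 = 1.78, 0.42
beta1_null = 1.50
t_crit = 2.416  # one-tailed critical value, alpha = 0.01, df = 43 (from the key)

t_stat = (b1 - beta1_null) / se_b1
reject = t_stat > t_crit  # right-tailed: only a large positive t counts against H0

# Smallest estimate of beta1 that would have rejected H0 with this standard error
threshold = beta1_null + t_crit * se_b1

print(f"t = {t_stat:.3f}, reject H0: {reject}, estimate needed: {threshold:.3f}")
# prints: t = 0.667, reject H0: False, estimate needed: 2.515
```

The estimate of 1.78 falls well short of the roughly 2.515 that this test would require.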

---

### Part C: Test for Intercept

1. **Hypotheses:**
   - H₀: β₀ = 0
   - H₁: β₀ ≠ 0

2. **t-statistic:**
   $$t = \frac{3.25 - 0}{1.84} = 1.766$$

3. **Critical values:**
   - t-critical (two-tailed, α=0.10, df=43) = ±1.681

4. **Decision:** Reject H₀ because |1.766| > 1.681

5. **Conclusion:** The intercept is statistically significant at the 10% level: the predicted wage for a worker with zero years of education differs significantly from zero. (Note: This has little economic meaning, because zero years of schooling likely lies far outside the range of education levels in the sample, so the intercept is an extrapolation.)
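
A short Python check of the intercept test, using the two-tailed critical values for df = 43 that appear in this key (1.681 at 10%, 2.017 at 5%). It illustrates that the decision depends on α: the same t-statistic rejects at 10% but not at 5%.

```python
# Exercise 1, Part C: H0: beta0 = 0 vs H1: beta0 != 0
b0, se_b0 = 3.25, 1.84
t_stat = b0 / se_b0

# Two-tailed critical values for df = 43, copied from the answer key
crit = {0.10: 1.681, 0.05: 2.017}
decisions = {alpha: abs(t_stat) > c for alpha, c in crit.items()}

print(f"t = {t_stat:.3f}", decisions)
# prints: t = 1.766 {0.1: True, 0.05: False}
```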

---

### Part D: Economic Interpretation

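- **β̂₁ = 1.78** means: each additional year of education is associated with an increase of **$1.78 per hour** in predicted wages. Because this is a simple regression with no other controls, "all else equal" is an assumption here rather than something the model enforces.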

- **For 4 years of college:**
  - Predicted wage increase = 4 × \$1.78 = **\$7.12 per hour**
  - If working 2,000 hours/year, this translates to approximately **\$14,240 additional annual income**
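
Because the fitted line is linear, the predicted gain from four extra years does not depend on the starting level of education; the intercept cancels. The sketch below shows this with two illustrative starting points (12 and 16 years are hypothetical choices, not from the exercise) and the 2,000 hours/year assumption used above.

```python
# Fitted line from Exercise 1: Wage = 3.25 + 1.78 * Education
b0, b1 = 3.25, 1.78


def predicted_wage(years: float) -> float:
    """Predicted hourly wage (dollars) for a given number of years of schooling."""
    return b0 + b1 * years


# Hypothetical starting points: 12 years (high school) vs 16 years (four-year degree)
gain_hourly = predicted_wage(16) - predicted_wage(12)  # intercept cancels: 4 * b1
gain_annual = gain_hourly * 2000                       # 2,000 hours/year, as assumed above

print(f"hourly gain = {gain_hourly:.2f}, annual gain = {gain_annual:,.0f}")
# prints: hourly gain = 7.12, annual gain = 14,240
```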

---

## Exercise 2 Answers

### Part A: 95% Confidence Interval for Fare Coefficient

- **df** = 35 - 3 - 1 = **31** (k = 3 regressors)
- **t-critical** (two-tailed, α=0.05, df=31) = **2.040**

- **Margin of error** = 2.040 × 1.15 = **2.346**
- **Lower bound** = -4.20 - 2.346 = **-6.546**
- **Upper bound** = -4.20 + 2.346 = **-1.854**

**95% CI for β₁: [ -6.55 , -1.85 ]**

**Interpretation:** We are 95% confident that, holding income and population density constant, a $1 increase in the fare is associated with a decrease in ridership of between 1.85 and 6.55 trips per 1,000 residents per month. Because the entire interval lies below zero, there is strong evidence of an inverse relationship between fares and ridership.
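
The interval arithmetic above can be sketched in Python as follows. The critical value 2.040 is copied from the key (df = 31) rather than computed.

```python
# Exercise 2, Part A: 95% CI for the fare coefficient
n, k = 35, 3            # observations, slope coefficients
df = n - k - 1          # 31
b1, se_b1 = -4.20, 1.15
t_crit = 2.040          # two-tailed 5% critical value, df = 31 (from the key)

margin = t_crit * se_b1
ci = (b1 - margin, b1 + margin)
excludes_zero = ci[1] < 0 or ci[0] > 0

print(f"df = {df}, margin = {margin:.3f}, 95% CI = [{ci[0]:.3f}, {ci[1]:.3f}]")
# prints: df = 31, margin = 2.346, 95% CI = [-6.546, -1.854]
```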

---

### Part B: Hypothesis Test Using Confidence Interval

**H₀: β₁ = -2.5 vs. H₁: β₁ ≠ -2.5**

- **Decision:** Since -2.5 falls **WITHIN** the 95% CI [-6.55, -1.85], we **fail to reject H₀**

- **Conclusion:** At the 5% significance level, we do not have sufficient evidence to reject the claim that the true effect of a $1 fare increase is a decline of 2.5 trips per 1,000 residents per month (β₁ = -2.5).
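
The confidence-interval decision can be cross-checked with the equivalent two-sided t-test: a hypothesized value lies inside the 95% interval exactly when |t| is below the 5% two-tailed critical value. A minimal Python sketch using the numbers from this exercise (here t = (-4.20 + 2.5) / 1.15 ≈ -1.478):

```python
# Exercise 2, Part B: H0: beta1 = -2.5 vs H1: beta1 != -2.5, alpha = 0.05
b1, se_b1, t_crit = -4.20, 1.15, 2.040
beta1_null = -2.5

lower, upper = b1 - t_crit * se_b1, b1 + t_crit * se_b1
inside_ci = lower <= beta1_null <= upper

t_stat = (b1 - beta1_null) / se_b1
reject_by_t = abs(t_stat) > t_crit

# The two routes must agree: value inside the 95% CI <=> fail to reject at 5%
assert inside_ci == (not reject_by_t)
print(f"t = {t_stat:.3f}, inside CI: {inside_ci}, reject H0: {reject_by_t}")
# prints: t = -1.478, inside CI: True, reject H0: False
```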

---

### Part C: 90% Confidence Interval for Income Coefficient

- **t-critical** (two-tailed, α=0.10, df=31) = **1.696**

- **Margin of error** = 1.696 × 0.32 = **0.543**
- **Lower bound** = 0.85 - 0.543 = **0.307**
- **Upper bound** = 0.85 + 0.543 = **1.393**

**90% CI for β₂: [ 0.31 , 1.39 ]**

**Interpretation:** We are 90% confident that, holding fare and population density constant, a $1,000 increase in median household income is associated with an increase in transit ridership of between 0.31 and 1.39 trips per 1,000 residents per month. Because the interval lies entirely above zero, higher-income cities tend to have higher ridership in this sample; one possible (untested) explanation is concentrated downtown employment.
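
To see how the confidence level affects the interval, the sketch below computes the 90% interval above and, for comparison only, a 95% interval for β₂ (the 95% interval is not asked for in the exercise). Both critical values for df = 31 are taken from this key.

```python
# Exercise 2, Part C: income coefficient at two confidence levels (df = 31)
b2, se_b2 = 0.85, 0.32
t_crits = {0.90: 1.696, 0.95: 2.040}  # two-tailed critical values from the key

intervals = {}
for level, t_crit in t_crits.items():
    margin = t_crit * se_b2
    intervals[level] = (b2 - margin, b2 + margin)
    print(f"{level:.0%} CI: [{b2 - margin:.3f}, {b2 + margin:.3f}], width {2 * margin:.3f}")
# prints:
# 90% CI: [0.307, 1.393], width 1.085
# 95% CI: [0.197, 1.503], width 1.306
```

The higher confidence level buys a wider interval; here the 95% interval still excludes zero, so the income coefficient is also significant at the 5% level (two-sided).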

---

### Part D: Testing Population Density Effect

1. **Hypotheses:**
   - H₀: β₃ ≤ 0 (Population density has no positive effect)
   - H₁: β₃ > 0 (Population density has a positive effect)

2. **t-statistic:**
   $$t = \frac{3.40 - 0}{1.08} = 3.148$$

3. **Critical value:**
   - t-critical (one-tailed, α=0.05, df=31) = **1.696**

4. **Decision:** **Reject H₀** because 3.148 > 1.696

5. **p-value range:** **p < 0.01** (more precisely, the one-tailed p-value is about 0.002)

6. **Conclusion:** There is strong statistical evidence that, holding fare and income constant, higher population density is associated with higher transit ridership: denser cities have significantly more trips per 1,000 residents.
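
The p-value bracket can be found mechanically by comparing t with successive one-tailed critical values. In this sketch the entries for α = 0.10 (1.309) and α = 0.01 (2.453) are standard t-table values for df = 31 that do not appear elsewhere in this document; check them against your own table.

```python
# Exercise 2, Part D: H0: beta3 <= 0 vs H1: beta3 > 0 (right-tailed)
b3, se_b3 = 3.40, 1.08
t_stat = b3 / se_b3

# One-tailed critical values for df = 31 (standard t-table entries, keyed by alpha)
one_tailed_df31 = {0.10: 1.309, 0.05: 1.696, 0.025: 2.040, 0.01: 2.453}

# If t exceeds the critical value for alpha, the one-tailed p-value is below alpha
exceeded = [alpha for alpha, crit in one_tailed_df31.items() if t_stat > crit]
p_upper = min(exceeded) if exceeded else None

if p_upper is None:
    print(f"t = {t_stat:.3f}; p > 0.10")
else:
    print(f"t = {t_stat:.3f}; p < {p_upper}")  # prints: t = 3.148; p < 0.01
```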

---

### Part E: Policy Comparison

**Policy X (Fare reduction of $0.50):**
$$\Delta \text{Ridership} = (-4.20) \times (-0.50) = +2.10 \text{ trips per 1,000 residents}$$

**Policy Y (Density increase of 0.2):**
$$\Delta \text{Ridership} = 3.40 \times 0.2 = +0.68 \text{ trips per 1,000 residents}$$

**Larger predicted impact: Policy X (fare reduction)**

The fare reduction is predicted to raise ridership by about **3 times as much** as the density increase (2.10 / 0.68 ≈ 3.1), based on these coefficient estimates and assuming they capture causal effects.
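
Each predicted change is the coefficient times the change in its regressor, holding the other regressors fixed. A minimal Python sketch of the comparison (the dictionary layout is an illustrative choice, not part of the exercise):

```python
# Exercise 2, Part E: predicted change = coefficient x change in regressor (others fixed)
coefs = {"Fare": -4.20, "PopDensity": 3.40}
policies = {
    "Policy X": ("Fare", -0.50),      # fare cut of $0.50
    "Policy Y": ("PopDensity", 0.2),  # +0.2 thousand people per sq km
}

effects = {name: coefs[var] * change for name, (var, change) in policies.items()}
larger = max(effects, key=effects.get)
ratio = effects["Policy X"] / effects["Policy Y"]

for name, effect in effects.items():
    print(f"{name}: {effect:+.2f} trips per 1,000 residents")
print(f"Larger predicted impact: {larger} (about {ratio:.1f}x)")
# prints:
# Policy X: +2.10 trips per 1,000 residents
# Policy Y: +0.68 trips per 1,000 residents
# Larger predicted impact: Policy X (about 3.1x)
```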

---

## Common Mistakes to Avoid

1. **Degrees of freedom:** Remember df = n - k - 1 for multiple regression (where k = number of slope coefficients). For simple regression, df = n - 2.

2. **One-tailed vs two-tailed:** Always check whether the alternative hypothesis uses ≠ (two-tailed) or < / > (one-tailed). This affects your critical value.

3. **Sign interpretation:** When interpreting coefficients, always explain both the magnitude AND the direction (positive/negative).

4. **Confidence interval for hypothesis testing:** For a two-sided test, if the hypothesized value falls within the 100(1-α)% confidence interval, you fail to reject H₀ at significance level α; if it falls outside, you reject H₀.

5. **Practical vs statistical significance:** A coefficient can be statistically significant (large t-statistic) but economically small, or vice versa. Always consider both!

---

*End of Practice Exercises*