# Formative Test Exercise — New Scenario
## Practice for Exam: t-Tests, CIs, Hypothesis Testing

---

# Exercise 1: The Retail Store Manager's Dilemma

## Scenario

Emma is a regional manager for a retail clothing chain with 40 stores across the Netherlands. She suspects that **store location characteristics** affect monthly sales revenue. Emma hires a data analyst to investigate.

The analyst collects data from all 40 stores and estimates the following regression:

$$
\text{Sales}_i = \beta_0 + \beta_1 \text{FootTraffic}_i + \beta_2 \text{Competitors}_i + \beta_3 \text{ParkingSpots}_i + \mu_i
$$

Where:
- **Sales**: Monthly revenue (in thousands of euros)
- **FootTraffic**: Estimated daily pedestrian count near the store (in hundreds)
- **Competitors**: Number of competing clothing stores within 500m radius
- **ParkingSpots**: Number of parking spots within 200m of the store

---

## Regression Output

| Variable | Coefficient | Standard Error | t-statistic | p-value |
|----------|-------------|----------------|-------------|---------|
| Intercept | 12.5 | 8.3 | 1.51 | 0.140 |
| FootTraffic | **2.8** | **0.95** | 2.95 | 0.006 |
| Competitors | -1.4 | 0.82 | -1.71 | 0.096 |
| ParkingSpots | 0.6 | 0.38 | 1.58 | 0.124 |

**Model statistics:** $n = 40$, $R^2 = 0.64$

Emma knows that all CLRM assumptions (A1–A5) hold for this data.

---

## Questions

### Question 1 (2 points)
The analyst tells Emma: *"Foot traffic has a statistically significant effect on sales at the 5% level."*

**a)** Verify this claim by performing a hypothesis test. Show:
- The null and alternative hypotheses
- The calculation of the t-statistic (verify it matches the table)
- Your conclusion using the critical value approach

**b)** What is the economic interpretation of $\hat{\beta}_1 = 2.8$? Be specific about units.

---

### Question 2 (2 points)
Emma asks: *"But does 100 extra pedestrians per day really matter? Could the effect be as small as 1.5 thousand euros?"*

**a)** Construct a 95% confidence interval for the FootTraffic coefficient.

**b)** Use your confidence interval to test whether $\beta_1 = 1.5$ is plausible. Show your reasoning.

**c)** Interpret the confidence interval in language Emma would understand.

---

### Question 3 (2 points)
The marketing director claims: *"Every competitor nearby costs us exactly €1,000 in monthly revenue."*

**a)** Translate this claim into a hypothesis test (state $H_0$ and $H_1$).

**b)** Test this claim at the 10% significance level. Show your work.

**c)** Based on your result, can Emma confidently tell the marketing director they're wrong?

---

### Question 4 (2 points)

**a)** Calculate the predicted monthly sales for a store with:
- FootTraffic = 50 (i.e., 5,000 pedestrians/day)
- Competitors = 3
- ParkingSpots = 20

**b)** Emma wants to open a new store. She can choose between:
- **Location A**: FootTraffic = 60, Competitors = 5, Parking = 15
- **Location B**: FootTraffic = 40, Competitors = 2, Parking = 30

Which location is predicted to generate higher monthly sales? Show your calculation.

---

### Question 5 (2 points)
The analyst mentions that $R^2 = 0.64$.

**a)** Explain what this means in the context of Emma's stores.

**b)** Does a high $R^2$ mean the model is "correct" or that we have proven causation? Explain why or why not.

---

# Exercise 2: The Restaurant Owner's Question

## Scenario

David owns 32 independent restaurants across Belgium. He wonders whether **customer ratings on Google Maps** affect monthly profit. He suspects that higher-rated restaurants earn more, but he's unsure if the effect is meaningful.

David's accountant runs a regression:

$$
\text{Profit}_i = \beta_0 + \beta_1 \text{Rating}_i + \beta_2 \text{Seats}_i + \mu_i
$$

Where:
- **Profit**: Monthly profit (in thousands of euros)
- **Rating**: Average Google rating (scale 1.0–5.0)
- **Seats**: Number of seats in the restaurant

---

## Regression Output

| Variable | Coefficient | Std. Error |
|----------|-------------|------------|
| Intercept | -8.5 | 12.3 |
| Rating | **4.2** | **1.8** |
| Seats | 0.15 | 0.08 |

**Model statistics:** $n = 32$, $R^2 = 0.51$

---

## Questions

### Question 1 (3 points)

**a)** Test whether Rating has a significant effect on Profit at the 5% level using a two-tailed test. Show all steps.

**b)** David's friend claims: *"Each extra star on Google Maps brings at least €5,000 extra profit per month."* Test whether the data supports this claim using an appropriate hypothesis test.

**c)** Calculate a 90% confidence interval for the Rating coefficient and interpret it.

---

### Question 2 (2 points)

**a)** Calculate the t-statistic for Seats and determine if it is significant at the 10% level.

**b)** David thinks about removing "Seats" from the model. Based on your statistical test, what would you advise? Justify your answer.

---

### Question 3 (2 points)

David has two restaurants:
- **Restaurant X**: Rating = 4.2, Seats = 80
- **Restaurant Y**: Rating = 4.8, Seats = 60

**a)** Calculate the predicted profit for each restaurant.

**b)** Which restaurant is predicted to be more profitable? By how much?

---

### Question 4 (3 points)

**Critical thinking:** David notices that restaurants with higher ratings tend to have more seats (correlation = 0.6). 

**a)** Why might this correlation be a problem for interpreting the coefficient on Rating?

**b)** If you had to advise David on improving his model, what additional variable might you suggest collecting data on? Explain why.

---

# Answer Key

<details>
<summary>Click to reveal answers (try first!)</summary>

## Exercise 1 Answers

### Question 1

**a)** 
- $H_0: \beta_1 = 0$ vs $H_1: \beta_1 \neq 0$
- $t = \frac{2.8 - 0}{0.95} = 2.947 \approx 2.95$ ✓ (matches table)
- $df = 40 - 3 - 1 = 36$, critical value (two-tailed, 5%) ≈ 2.028
- $|2.95| > 2.028$ → **Reject $H_0$**
- **Conclusion:** Foot traffic has a statistically significant effect on sales at the 5% level.

**b)** Each additional 100 pedestrians per day is associated with €2,800 higher monthly sales, holding competitors and parking constant.
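
The test in part a) can be reproduced with a few lines of Python. This is a minimal sketch: the helper name `t_statistic` is an illustrative choice, and the critical value is copied from the answer above (a t-table lookup) rather than computed.

```python
def t_statistic(estimate: float, std_error: float, null_value: float = 0.0) -> float:
    """Return (estimate - hypothesised value) / standard error."""
    return (estimate - null_value) / std_error


# FootTraffic row of the regression table
beta_hat = 2.8
se = 0.95

# Two-tailed 5% critical value for df = 36, taken from the answer key
# (looked up in a t-table, not computed here).
t_crit = 2.028

t_stat = t_statistic(beta_hat, se)
reject_h0 = abs(t_stat) > t_crit

print(f"t = {t_stat:.3f}, reject H0: {reject_h0}")
# prints: t = 2.947, reject H0: True
```

The `null_value` argument defaults to 0, so the same helper also covers tests against a non-zero hypothesised value.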

---

### Question 2

**a)** 
- $df = 36$, $t_{0.025} = 2.028$
- Margin = $2.028 \times 0.95 = 1.927$
- 95% CI: $[2.8 - 1.93, 2.8 + 1.93] = [0.87, 4.73]$

**b)** Since 1.5 lies **inside** the CI [0.87, 4.73], we fail to reject $H_0: \beta_1 = 1.5$ at the 5% level. An effect as small as 1.5 (€1,500 per 100 extra daily pedestrians) is plausible.

**c)** *"Emma, we're 95% confident that every extra 100 daily pedestrians is associated with between €870 and €4,730 of extra monthly revenue. The effect is very likely positive, but it could be anywhere from modest to substantial."*
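
As a cross-check, the interval and the containment test for 1.5 can be computed directly. A minimal sketch, assuming the critical value 2.028 from part a); the function name `confidence_interval` is hypothetical. A hypothesised value inside the interval cannot be rejected at the 5% level.

```python
def confidence_interval(estimate: float, std_error: float, t_crit: float) -> tuple[float, float]:
    """Return (lower, upper) = estimate -/+ t_crit * std_error."""
    margin = t_crit * std_error
    return estimate - margin, estimate + margin


# FootTraffic coefficient and standard error from the regression table;
# 2.028 is the two-tailed 5% critical value for df = 36 from the answer key.
lower, upper = confidence_interval(2.8, 0.95, 2.028)

hypothesised = 1.5
inside = lower <= hypothesised <= upper

print(f"95% CI: [{lower:.2f}, {upper:.2f}]")
print(f"{hypothesised} inside interval: {inside}")
# prints: 95% CI: [0.87, 4.73]
# prints: 1.5 inside interval: True
```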

---

### Question 3

**a)** 
- $H_0: \beta_2 = -1.0$ (effect is exactly -€1,000)
- $H_1: \beta_2 \neq -1.0$

**b)** 
- $t = \frac{-1.4 - (-1.0)}{0.82} = \frac{-0.4}{0.82} = -0.488$
- $|t| = 0.488$, $df = 36$, critical value (10%, two-tailed) ≈ 1.69
- $0.488 < 1.69$ → **Fail to reject $H_0$**

**c)** **No!** The data does NOT provide sufficient evidence to reject the marketing director's claim. Emma cannot confidently say they're wrong — an effect of -€1,000 per competitor is statistically plausible.
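
The same conclusion follows from a 90% confidence interval for $\beta_2$, sketched here with the rounded critical value 1.69 used above:

$$
-1.4 \pm 1.69 \times 0.82 = -1.4 \pm 1.39 \quad\Rightarrow\quad [-2.79,\ -0.01]
$$

The claimed value $-1.0$ lies inside this interval, which agrees with failing to reject $H_0$ at the 10% level. Zero lies just outside it, consistent with the p-value of 0.096 in the regression table.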

---

### Question 4

**a)** 
$\widehat{\text{Sales}} = 12.5 + 2.8(50) + (-1.4)(3) + 0.6(20)$
$= 12.5 + 140 - 4.2 + 12 = 160.3$

**Predicted monthly sales: €160,300**

**b)**
- Location A: $12.5 + 2.8(60) + (-1.4)(5) + 0.6(15) = 12.5 + 168 - 7 + 9 = 182.5$
- Location B: $12.5 + 2.8(40) + (-1.4)(2) + 0.6(30) = 12.5 + 112 - 2.8 + 18 = 139.7$

**Location A is predicted to generate higher monthly sales**, by €42,800/month. Note that the model predicts sales revenue, not profit.
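
The predictions in parts a) and b) can be checked with a short script. A minimal sketch: the dictionary `COEFFICIENTS` and the function `predict_sales` are illustrative names, and the values are the point estimates from the regression table.

```python
# Point estimates from the regression output (sales in thousands of euros).
COEFFICIENTS = {
    "Intercept": 12.5,
    "FootTraffic": 2.8,
    "Competitors": -1.4,
    "ParkingSpots": 0.6,
}


def predict_sales(foot_traffic: float, competitors: float, parking_spots: float) -> float:
    """Predicted monthly sales in thousands of euros."""
    return (
        COEFFICIENTS["Intercept"]
        + COEFFICIENTS["FootTraffic"] * foot_traffic
        + COEFFICIENTS["Competitors"] * competitors
        + COEFFICIENTS["ParkingSpots"] * parking_spots
    )


part_a = predict_sales(50, 3, 20)
location_a = predict_sales(60, 5, 15)
location_b = predict_sales(40, 2, 30)

print(f"Part a): {part_a:.1f}")
print(f"Location A: {location_a:.1f}, Location B: {location_b:.1f}")
print(f"Difference (A - B): {location_a - location_b:.1f}")
# prints: Part a): 160.3
# prints: Location A: 182.5, Location B: 139.7
# prints: Difference (A - B): 42.8
```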

---

### Question 5

**a)** 64% of the variation in monthly sales across stores is explained by foot traffic, competitors, and parking spots.

**b)** **No.** High $R^2$ means good fit, not causation. The model shows association, not proof that changing these variables would cause sales to change. There could be omitted variables, reverse causality, or other issues.

---

## Exercise 2 Answers

### Question 1

**a)**
- $H_0: \beta_1 = 0$ vs $H_1: \beta_1 \neq 0$
- $t = 4.2 / 1.8 = 2.333$
- $df = 32 - 2 - 1 = 29$, critical (5%) = 2.045
- $2.333 > 2.045$ → **Significant** ✓

**b)** Friend claims $\beta_1 \geq 5$ (at least €5,000)
- $H_0: \beta_1 \geq 5$ vs $H_1: \beta_1 < 5$
- $t = \frac{4.2 - 5}{1.8} = \frac{-0.8}{1.8} = -0.444$
- One-tailed critical (5%, left tail) = -1.699
- $-0.444 > -1.699$ → **Fail to reject $H_0$**
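- The data do not contradict the friend's claim: an effect of €5,000 or more is still plausible. Failing to reject $H_0$ is not proof that the claim is true, only that the data cannot rule it out.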
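
The left-tailed decision rule in part b) differs from the two-tailed case only in the comparison step. A minimal sketch, with the critical value copied from the answer above rather than computed; the variable names are illustrative.

```python
# Rating row of the regression table (profit in thousands of euros)
estimate = 4.2
std_error = 1.8

# Boundary of the friend's claim: beta_1 >= 5
claimed_minimum = 5.0

# Left-tail 5% critical value for df = 29, taken from the answer key.
t_crit_left = -1.699

t_stat = (estimate - claimed_minimum) / std_error

# Left-tailed test: reject H0 only if t falls below the negative critical value.
reject_h0 = t_stat < t_crit_left

print(f"t = {t_stat:.3f}, reject H0: {reject_h0}")
# prints: t = -0.444, reject H0: False
```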

**c)** 90% CI:
- $t_{0.05, 29} = 1.699$
- Margin = $1.699 \times 1.8 = 3.058$
- CI: $[4.2 - 3.06, 4.2 + 3.06] = [1.14, 7.26]$
- *"We're 90% confident each rating point brings €1,140 to €7,260 extra monthly profit."*

---

### Question 2

**a)**
- $t = 0.15 / 0.08 = 1.875$
- $df = 29$, critical (10%, two-tailed) = 1.699
- $1.875 > 1.699$ → **Significant at 10% level** ✓

**b)** **Keep Seats.** Despite the smaller coefficient, it's statistically significant at 10% and practically meaningful (each seat adds €150/month). Removing it would lose explanatory power.

---

### Question 3

**a)**
- Restaurant X: $\widehat{\text{Profit}} = -8.5 + 4.2(4.2) + 0.15(80) = -8.5 + 17.64 + 12 = 21.14$
  - **€21,140 predicted profit**
- Restaurant Y: $\widehat{\text{Profit}} = -8.5 + 4.2(4.8) + 0.15(60) = -8.5 + 20.16 + 9 = 20.66$
  - **€20,660 predicted profit**

**b)** **Restaurant X is more profitable** by €480/month, despite the lower rating, because it has more seats.
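
The gap can also be computed without the intercept, since it cancels when the two predictions are subtracted. A short worked check using the coefficients from the table:

$$
\widehat{\text{Profit}}_X - \widehat{\text{Profit}}_Y = 4.2\,(4.2 - 4.8) + 0.15\,(80 - 60) = -2.52 + 3.00 = 0.48
$$

The lower rating costs Restaurant X about €2,520, while its 20 extra seats add €3,000, which leaves the net advantage of €480.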

---

### Question 4

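**a)** This is **multicollinearity**. When Rating and Seats move together, the data contain less independent variation in each variable, so it is harder to separate their individual effects. Because Seats is included in the model, the Rating coefficient is still estimated holding Seats constant, but its standard error is inflated, which makes the estimate less precise and the t-test less likely to detect a real effect.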

**b)** Suggest collecting:
- **Price level** (expensive restaurants might rate higher AND have fewer seats)
- **Location/neighborhood** (upscale areas might have both higher ratings and larger spaces)
- **Staff quality** or **chef experience** (could drive ratings independently of size)

Each of these plausibly affects profit and is correlated with Rating or Seats. Including such a variable reduces the risk that the Rating coefficient picks up its effect (omitted variable bias).

</details>

---

**Total points: 20** | **Time recommended: 45–60 minutes**

Good luck practicing! 📝