chore: auto-commit uncommitted changes

This commit is contained in:
James 2026-03-27 06:03:39 -04:00
parent 4771188ccd
commit 43967b813f
6 changed files with 1342 additions and 5 deletions


@ -0,0 +1,366 @@
# Practice Exercises — Story Format
## Preparation for the Real Estate Research Exam
---
# 🏢 Case 1: HR Manager Sophie's Doubts
## The Story
Sophie works as an HR manager at a mid-sized technology company in Amsterdam. For years the department has struggled with high turnover: on average, 22% of employees leave each year. Management wants to know: **does it pay to invest more in education?**
Sophie analyzes data on 48 employees hired over the past 3 years. For each of them she has recorded:
- Years of education after secondary school
- Annual salary (in thousands of euros)
- Age at hiring
- Years of work experience before this job
With the help of a colleague who is an economist, Sophie has run a regression:
$$
\text{Salary}_i = \beta_0 + \beta_1 \text{Education}_i + \beta_2 \text{Age}_i + \beta_3 \text{Experience}_i + u_i
$$
## The Results
| Variable | Coefficient | Std. Error | t-value | p-value |
|-----------|-------------|------------|----------|----------|
| Intercept | 8.50 | 4.20 | 2.02 | 0.049 |
| Education (years) | **2.15** | **0.55** | 3.91 | < 0.001 |
| Age | 0.35 | 0.28 | 1.25 | 0.216 |
| Work experience | 0.80 | 0.42 | 1.90 | 0.063 |
- $R^2 = 0.58$
- $n = 48$
- The colleague assures Sophie that all CLRM assumptions (A1-A5) hold
---
## The Questions
### Part A: The Director Asks for an Explanation
The director wants to know: *"Is this really significant, or might we just as well roll dice?"*
**Answer the following questions for the director:**
1. **The research question:** What exactly are you testing when you ask whether education has an effect on salary?
- Write down the null hypothesis ($H_0$) and the alternative hypothesis ($H_1$).
- Is this a one-tailed or a two-tailed test? Explain why.
2. **The calculation:** Show how the t-value of 3.91 was calculated. Give the formula and fill in the numbers.
3. **The verdict:**
- What is the critical value at $\alpha = 0.05$ (two-tailed)?
- Compare your calculated t-value with this critical value.
- What is your conclusion, in plain language, for the director?
4. **The uncertainty:** Suppose the director says: *"But couldn't it just as well be 1.5 or 0.8?"*
- Construct a 95% confidence interval for the education effect.
- What does this interval mean for the director?
---
### Part B: The Skeptical CFO
The CFO thinks: *"Maybe age matters more than education. And if we hire someone who already has experience, surely they need less education?"*
5. **Age vs. education:** Test whether age has a significant effect on salary at the 5% level. What do you conclude?
6. **Work experience:** The CFO notices that work experience has a p-value of 0.063. Explain what this means in the context of the company.
7. **The bonus question:** Suppose the CFO asks: *"If I have two candidates, Anouk (5 years of education, 2 years of experience) and Bas (3 years of education, 4 years of experience), who earns more according to this model?"*
- Calculate the predicted salary for both.
- What assumption are you making about age?
---
### Part C: The Presentation
Sophie has to present her findings to the Supervisory Board.
8. **Economic interpretation:** What does $\hat{\beta}_1 = 2.15$ mean in concrete terms? If an employee decides to study for 2 more years (for example, a master's degree), what salary difference do you expect?
9. **Good or bad model?** The director asks: *"How sure are we that this model is right?"*
- What does $R^2 = 0.58$ say about this model?
- Explain why this does or does not mean that the model is "good".
---
## Answer Sheet (try it yourself first!)
<details>
<summary>Click here for the answers after you have tried it yourself</summary>
### Part A
**1. Hypotheses**
- $H_0: \beta_1 = 0$ (education has no effect on salary)
- $H_1: \beta_1 \neq 0$ (education does have an effect)
- **Two-tailed:** we test for an effect in either direction, positive or negative, without committing to a sign in advance
**2. Calculating the t-value**
$$t = \frac{2.15 - 0}{0.55} = \frac{2.15}{0.55} = 3.909 \approx 3.91$$
**3. Verdict**
- $df = 48 - 4 = 44$, critical value ≈ 2.015 (a table that only lists $df = 40$ gives the slightly more conservative 2.021)
- $|3.91| > 2.015$ → **Reject $H_0$**
- *"Director, education has a statistically significant effect on salary. It is very unlikely (p < 0.001) that we would see a result like this if there were in reality no effect."*
**4. 95% confidence interval**
- $t_{0.025, 44} = 2.015$
- Margin = $2.015 \times 0.55 = 1.108$
- Interval: $[2.15 - 1.11, 2.15 + 1.11] = [1.04, 3.26]$
- *"We are 95% confident that each additional year of education is associated with a salary increase of €1,040 to €3,260 per year."*
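The t-value and the interval above can be reproduced with a few lines of Python. This is a minimal sketch: the function names are illustrative, and the critical value 2.015 is typed in from the answer above rather than computed.

```python
def t_statistic(beta_hat, se, beta_null=0.0):
    """t = (estimate - hypothesized value) / standard error."""
    return (beta_hat - beta_null) / se


def confidence_interval(beta_hat, se, t_crit):
    """Two-sided interval: estimate +/- critical value * standard error."""
    margin = t_crit * se
    return beta_hat - margin, beta_hat + margin


# Education coefficient from Sophie's regression (salary in thousands of euros).
beta_hat, se = 2.15, 0.55
t_crit = 2.015  # two-tailed 5% critical value for df = 44, taken from the answer above

t = t_statistic(beta_hat, se)
low, high = confidence_interval(beta_hat, se, t_crit)
reject = abs(t) > t_crit

print(f"t = {t:.2f}, reject H0: {reject}")
print(f"95% CI = [{low:.2f}, {high:.2f}]")
```

Because the interval excludes zero, it leads to the same decision as the t-test at the 5% level.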
---
### Part B
**5. Age**
- $t = 1.25$, critical value = 2.015
- $|1.25| < 2.015$ → **Do not reject $H_0$**
- Conclusion: at the 5% significance level, age has no statistically demonstrable effect on salary.
**6. Work experience (p = 0.063)**
- At the 5% level: **not significant** (0.063 > 0.05)
- At the 10% level: **significant** (0.063 < 0.10)
- *"CFO, the effect of work experience just misses the conventional 5% threshold, but it is suggestive. With more data we might get a stronger signal."*
**7. Anouk vs. Bas**
Anouk: $\widehat{\text{Salary}} = 8.50 + 2.15(5) + 0.35(A) + 0.80(2) = 8.50 + 10.75 + 0.35A + 1.60 = 20.85 + 0.35A$
Bas: $\widehat{\text{Salary}} = 8.50 + 2.15(3) + 0.35(A) + 0.80(4) = 8.50 + 6.45 + 0.35A + 3.20 = 18.15 + 0.35A$
- **Anouk is predicted to earn €2,700 more** (at equal age)
- Assumption: age $A$ is the same for both
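A short sketch of the same comparison in Python shows why the shared age drops out of the difference. The age of 30 is a hypothetical value chosen only for illustration; any common age gives the same gap.

```python
# Coefficients from Sophie's regression (salary in thousands of euros).
COEF = {"intercept": 8.50, "education": 2.15, "age": 0.35, "experience": 0.80}


def predicted_salary(education, age, experience):
    """Fitted value of the regression for one candidate."""
    return (COEF["intercept"]
            + COEF["education"] * education
            + COEF["age"] * age
            + COEF["experience"] * experience)


age = 30  # hypothetical: the exercise only requires both candidates to share the same age
anouk = predicted_salary(education=5, age=age, experience=2)
bas = predicted_salary(education=3, age=age, experience=4)
difference = anouk - bas

print(f"Anouk - Bas = {difference:.2f} thousand euros")
```

The difference equals 2.15 × (5 − 3) + 0.80 × (2 − 4) = 2.70, so the age term cancels as long as both candidates have the same age.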
---
### Part C
**8. Economic interpretation**
- $\hat{\beta}_1 = 2.15$: each additional year of education is associated with a €2,150 higher annual salary, ceteris paribus
- 2 extra years for a master's: 2 × €2,150 = €4,300 more per year
- Over a 40-year career: potentially €172,000 in extra earnings (undiscounted)
**9. Model quality**
- $R^2 = 0.58$: 58% of the variation in salary is explained by these three factors
- The remaining 42% comes from other factors (talent, negotiation, sector, luck, etc.)
- This is reasonable but not spectacular: the model explains more than half of the variation, but much remains unexplained
- $R^2$ only measures fit; whether the coefficients can be trusted depends on the CLRM assumptions, not on how high $R^2$ is
</details>
---
# 🏠 Case 2: Real Estate Agent Marco and the Asking Price
## The Story
Marco is a real estate agent in Den Haag who wants to give clients better advice on asking prices. He analyzes 35 recently sold apartments in the same neighborhood.
For each apartment he has:
- **Asking price** (in units of €10,000): the price the seller asks
- **Floor area** (in square meters)
- **Year built** (a higher value means a newer building)
- **Distance to the station** (in km)
Marco wants to know: **which factors really determine the asking price?**
## The Results
$$
\text{AskingPrice}_i = \beta_0 + \beta_1 \text{FloorArea}_i + \beta_2 \text{YearBuilt}_i + \beta_3 \text{Distance}_i + u_i
$$
| Variable | Coefficient | Std. Error |
|-----------|-------------|------------|
| Intercept | -145.0 | 42.5 |
| Floor area | **0.35** | **0.08** |
| Year built | 0.12 | 0.09 |
| Distance to station | -2.80 | 1.15 |
- $n = 35$, $R^2 = 0.72$
Marco's assistant has prepared a table of two-tailed critical values:
| Degrees of freedom | $\alpha = 0.10$ | $\alpha = 0.05$ | $\alpha = 0.01$ |
|-----------------|-----------------|------------------|------------------|
| 30 | 1.697 | 2.042 | 2.750 |
| 31 | 1.696 | 2.040 | 2.744 |
| 32 | 1.694 | 2.037 | 2.738 |
---
## The Questions
### Part A: The Hypothesis Tests
Marco wants to know for each variable: is this really significant, or could it be chance?
**Question 1:** Calculate the t-value for each variable (except the intercept). Show your calculation.
**Question 2:** Determine the degrees of freedom for this regression. Explain how you calculate this.
**Question 3:** Test for each variable whether it is significant at the 5% level. Use the table above.
**Question 4:** Marco thinks: *"Newer homes are always more expensive, aren't they?"* Your analysis shows something different. Explain what is going on. Could this be a sampling issue, or something else?
---
### Part B: The Confidence Intervals
A client asks Marco: *"If I have 10 square meters more floor area, how much more asking price can I expect?"*
**Question 5:** Construct a 95% confidence interval for the floor-area effect ($\beta_1$).
**Question 6:** Marco wants to know whether the price drop per kilometer of distance from the station differs significantly from 1.5. Use a 95% confidence interval to test whether $H_0: \beta_3 = -1.5$ is rejected.
---
### Part C: In Practice
**Question 7:** A client has an 80 m² apartment, built in 2010, 2 km from the station. What is the predicted asking price according to this model?
**Question 8:** The same apartment, but built in 1990 (instead of 2010). What is the price difference?
**Question 9:** Marco is considering adding "floor level" as an extra variable. He expects higher-floor apartments to be more expensive (better view). Explain why adding a variable always seems to improve the model ($R^2$ does not fall), even though this does not necessarily mean the model actually predicts better.
---
## Answer Sheet
<details>
<summary>Click here for the answers</summary>
### Part A
**1. t-values**
- Floor area: $t = 0.35 / 0.08 = 4.375$
- Year built: $t = 0.12 / 0.09 = 1.333$
- Distance: $t = -2.80 / 1.15 = -2.435$ → $|t| = 2.435$
**2. Degrees of freedom**
- $k = 3$ (explanatory variables)
- $df = n - k - 1 = 35 - 3 - 1 = 31$
**3. Testing at the 5% level**
- Critical value (table): 2.040
- **Floor area**: 4.375 > 2.040 → **Significant**
- **Year built**: 1.333 < 2.040 → **Not significant**
- **Distance**: 2.435 > 2.040 → **Significant**
**4. Year-built analysis**
- The coefficient is positive (0.12) but not significant
- Possible explanations:
- In this neighborhood, year built may correlate strongly with other characteristics (e.g. floor area: newer apartments are often larger)
- Multicollinearity: if floor area and year built move together, their separate effects are hard to tell apart, which inflates the standard error of the year-built coefficient
- The sample may be too small to detect this subtle effect
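The three tests in Part A follow the same recipe, so they can be run in one loop. The sketch below uses English dictionary keys of my own choosing for the three regressors and takes the critical value 2.040 from the assistant's table.

```python
# (coefficient, standard error) per regressor in Marco's regression.
results = {
    "area": (0.35, 0.08),        # floor area
    "year_built": (0.12, 0.09),  # year built
    "distance": (-2.80, 1.15),   # distance to station
}

n, k = 35, 3
df = n - k - 1   # one degree of freedom is used by each slope and one by the intercept
t_crit = 2.040   # two-tailed 5% critical value for df = 31, from the table

verdicts = {}
for name, (coef, se) in results.items():
    t = coef / se
    verdicts[name] = abs(t) > t_crit  # compare the absolute value in a two-tailed test
    print(f"{name}: t = {t:.3f}, significant at 5%: {verdicts[name]}")
```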
---
### Part B
**5. 95% CI for floor area**
- $t_{0.025, 31} = 2.040$
- Margin = $2.040 \times 0.08 = 0.163$
- CI: $[0.35 - 0.163, 0.35 + 0.163] = [0.187, 0.513]$
*Interpretation: we are 95% confident that each additional m² is associated with an asking price that is €1,870 to €5,130 higher. For the client's 10 extra m², that is roughly €18,700 to €51,300.*
**6. Testing $\beta_3 = -1.5$ with a CI**
First, the 95% CI for distance:
- For reference, the t-value against zero is $|-2.80| / 1.15 = 2.435$
- Margin = $2.040 \times 1.15 = 2.346$
- CI: $[-2.80 - 2.35, -2.80 + 2.35] = [-5.15, -0.45]$
- **-1.5 lies INSIDE the interval** [-5.15, -0.45]
- Conclusion: we **cannot reject** $H_0: \beta_3 = -1.5$ at the 5% level
- Marco cannot conclude that the effect differs from -1.5
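Checking whether a hypothesized value lies inside the interval is equivalent to a two-tailed t-test against that value. The sketch below illustrates this for the distance coefficient; `ci_test` is a hypothetical helper name.

```python
def ci_test(beta_hat, se, t_crit, beta_null):
    """Return the interval and whether H0: beta = beta_null is rejected."""
    low = beta_hat - t_crit * se
    high = beta_hat + t_crit * se
    reject = not (low <= beta_null <= high)
    return (low, high), reject


(low, high), reject = ci_test(beta_hat=-2.80, se=1.15, t_crit=2.040, beta_null=-1.5)

# The equivalent t-test measures the distance from the hypothesized value, not from zero.
t_against_null = (-2.80 - (-1.5)) / 1.15

print(f"95% CI = [{low:.2f}, {high:.2f}], reject H0: {reject}")
print(f"t against -1.5 = {t_against_null:.3f}")
```

Both routes agree: the interval contains -1.5, and the absolute t-value against -1.5 stays below 2.040.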
---
### Part C
**7. Predicted price**
$$\hat{Y} = -145.0 + 0.35(80) + 0.12(2010) + (-2.80)(2)$$
$$= -145.0 + 28.0 + 241.2 - 5.6$$
$$= 118.6$$
→ Predicted asking price: **€1,186,000** (note: price is in units of €10,000!)
**8. Price difference, 2010 vs. 1990**
Difference in year built: 20 years
Effect: $20 \times 0.12 = 2.4$ (in €10,000)
→ The newer apartment is predicted to be **€24,000** more expensive (bear in mind that the year-built coefficient was not statistically significant)
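The two predictions can be checked with a small function. This is only a sketch of the arithmetic: the coefficients are the point estimates from the table, and prices are converted from units of €10,000 to euros at the end.

```python
def predicted_price(area_m2, year_built, distance_km):
    """Fitted asking price in units of EUR 10,000."""
    return -145.0 + 0.35 * area_m2 + 0.12 * year_built - 2.80 * distance_km


price_2010 = predicted_price(80, 2010, 2)
price_1990 = predicted_price(80, 1990, 2)
gap = price_2010 - price_1990  # only the year-built term differs between the two

print(f"Built 2010: EUR {price_2010 * 10_000:,.0f}")
print(f"Built 1990: EUR {price_1990 * 10_000:,.0f}")
print(f"Difference: EUR {gap * 10_000:,.0f}")
```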
**9. Adding a variable and $R^2$**
- $R^2$ measures explained variance: it never decreases when you add a variable (even a meaningless one)
- The risk is **overfitting**: the model fits the sample too closely and does worse on new data
- Better: look at adjusted $R^2$, or test out-of-sample
- "Floor level" may well be relevant, but adding variables "because you can" is not a good idea
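Adjusted $R^2$ penalizes each extra regressor through the degrees of freedom. The sketch below uses the standard formula $1 - (1 - R^2)(n - 1)/(n - k - 1)$. The second scenario, where adding a fourth variable lifts $R^2$ from 0.72 to 0.725, is a made-up number for illustration and not a result from Marco's data.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for a regression with k slope coefficients and an intercept."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


current = adjusted_r2(r2=0.72, n=35, k=3)        # Marco's model as estimated
with_extra = adjusted_r2(r2=0.725, n=35, k=4)    # hypothetical: tiny R^2 gain from a 4th variable

print(f"Adjusted R^2, 3 variables: {current:.3f}")
print(f"Adjusted R^2, 4 variables: {with_extra:.3f}")
```

In this hypothetical case $R^2$ rises but adjusted $R^2$ falls, which is the signal that the extra variable does not earn its degree of freedom.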
</details>
---
# 📝 Summary Test Questions (5-minute drills)
## Drill 1: Quick t-test
**Scenario:** A regression with $n = 52$ gives $\hat{\beta} = 1.8$, $SE = 0.6$. Test $H_0: \beta = 0$ vs $H_1: \beta \neq 0$ at the 5% level.
**What do you need to do?**
1. Calculate the t-value
2. Determine df (assume 1 explanatory variable)
3. Compare with the critical value
4. Draw a conclusion
*(Answer: t = 3.0, df = 50, critical value ≈ 2.01, reject H₀)*
---
## Drill 2: Interpreting a confidence interval
**Scenario:** You have a 95% CI for an effect: [2.3, 7.8]. Your colleague says: *"That means there is a 95% probability that the true effect lies between 2.3 and 7.8."*
**Is this correct?** Explain why or why not.
*(Answer: No! It is not a probability statement about the true effect, which is fixed. Rather: if we drew samples over and over again, 95% of the resulting CIs would contain the true effect.)*
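The repeated-sampling reading of a confidence interval can be illustrated with a small simulation. To keep it short, the sketch uses the simpler case of a sample mean rather than a regression coefficient; the true mean, standard deviation, sample size, and seed are all arbitrary assumptions, and 2.042 is the two-tailed 5% critical value for df = 30.

```python
import random
import statistics


def ci_coverage(true_mean=5.0, sigma=2.0, n=31, reps=2000, t_crit=2.042, seed=0):
    """Share of simulated 95% intervals that contain the fixed true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        mean = statistics.mean(sample)
        se = statistics.stdev(sample) / n ** 0.5
        # The true mean never moves; only the interval changes from sample to sample.
        if mean - t_crit * se <= true_mean <= mean + t_crit * se:
            hits += 1
    return hits / reps


share = ci_coverage()
print(f"Share of intervals containing the true mean: {share:.3f}")
```

The share should come out close to 0.95, which is a statement about the procedure across many samples and not about any single interval.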
---
## Drill 3: Interpreting a p-value
**Scenario:** Your regression gives p = 0.047 for a variable.
**Which statements are correct?**
- [ ] There is a 4.7% probability that the null hypothesis is true
- [ ] If the null hypothesis is true, 4.7% is the probability of data this extreme or more extreme
- [ ] We are 95.3% certain that there is an effect
- [ ] At the 5% level we reject the null hypothesis
*(Answer: only the second and fourth are correct)*
---
## Common Mistakes: Checklist
| Mistake | Why it is wrong | How to do it right |
|------|-------------|----------|
| Forgetting $\beta_0$ in the df calculation | $df = n - k - 1$; the $-1$ is for the intercept! | ✓ |
| Mixing up one-tailed and two-tailed | Two-tailed: $\alpha/2$ in each tail | ✓ |
| Wrong confidence level | 95% CI → $\alpha = 0.05$, but use $t_{0.025}$ | ✓ |
| P-value as the probability of $H_0$ | The p-value is $P(\text{data this extreme} \mid H_0)$, not $P(H_0 \mid \text{data})$ | ✓ |
| Forgetting $\lvert t \rvert$ | In a two-tailed test, always compare the absolute value with the critical value | ✓ |
---
*Good luck practicing! 💪*


@ -0,0 +1,401 @@
# Econometrics Practice Exercises
## Classical Linear Regression Model (CLRM)
---
# Practice Exercise 1: Hypothesis Testing with t-statistics
## Problem Statement
You are analyzing the relationship between years of education and hourly wages using a simple linear regression model. A researcher collected data from 45 randomly selected workers and estimated the following regression equation:
$$\text{Wage}_i = \beta_0 + \beta_1 \text{Education}_i + u_i$$
### Estimated Results:
| Coefficient | Estimate | Standard Error |
|-------------|----------|----------------|
| $\hat{\beta}_0$ (Intercept) | 3.25 | 1.84 |
| $\hat{\beta}_1$ (Education) | 1.78 | 0.42 |
| $R^2$ | 0.62 | - |
| Sample size ($n$) | 45 | - |
### Additional Information:
- Wage is measured in dollars per hour
- Education is measured in years of schooling completed
- The classical assumptions of the CLRM hold (homoskedasticity, no autocorrelation, normality of errors)
---
## Questions
### Part A: Two-Tailed Test for Slope Coefficient
**Test whether education has a statistically significant effect on wages at the 5% significance level.**
1. State the null and alternative hypotheses.
2. Calculate the t-statistic.
3. Determine the critical value(s).
4. State your conclusion in statistical terms.
5. Interpret your conclusion in the context of the wage-education relationship.
**Calculation Space:**
```
H₀: ________________________________________________
H₁: ________________________________________________
t-statistic formula: t = (β̂₁ - β₁₀) / SE(β̂₁)
t = ________________________________________________
t = ________________________________________________
t = ________________________________________________
degrees of freedom = _______________________________
critical values (α = 0.05, two-tailed): ____________
Decision: __________________________________________
Interpretation: ____________________________________
____________________________________________________
```
### Part B: One-Tailed Test for Slope Coefficient
**Test whether each additional year of education increases wages by more than $1.50 per hour at the 1% significance level.**
1. State the null and alternative hypotheses.
2. Calculate the t-statistic.
3. Determine the critical value.
4. State your conclusion.
**Calculation Space:**
```
H₀: ________________________________________________
H₁: ________________________________________________
t = ________________________________________________
t = ________________________________________________
critical value (α = 0.01, one-tailed): _____________
Decision: __________________________________________
Interpretation: ____________________________________
____________________________________________________
```
### Part C: Test for Intercept
**Test whether the intercept is significantly different from zero at the 10% significance level.**
1. State the hypotheses.
2. Calculate the t-statistic.
3. Make your decision and interpret.
**Calculation Space:**
```
H₀: ________________________________________________
H₁: ________________________________________________
t = ________________________________________________
t = ________________________________________________
degrees of freedom = _______________________________
critical values (α = 0.10, two-tailed): ____________
Decision: __________________________________________
Interpretation: ____________________________________
____________________________________________________
```
### Part D: Economic Interpretation
Explain what the coefficient $\hat{\beta}_1 = 1.78$ means in practical terms. If someone completes an additional 4 years of college education, what would this model predict as their wage increase, assuming all else equal?
---
# Practice Exercise 2: Confidence Intervals and Joint Hypothesis Testing
## Problem Statement
A regional transportation authority wants to understand factors affecting monthly public transit ridership across 35 cities. They estimate the following multiple regression model:
$$\text{Ridership}_i = \beta_0 + \beta_1 \text{Fare}_i + \beta_2 \text{Income}_i + \beta_3 \text{PopDensity}_i + u_i$$
Where:
- **Ridership**: Monthly ridership per 1,000 residents (number of trips)
- **Fare**: Average one-way fare in dollars
- **Income**: Median household income in thousands of dollars
- **PopDensity**: Population density (thousands of people per square km)
### Estimated Results:
| Variable | Coefficient | Standard Error |
|----------|-------------|----------------|
| Intercept ($\hat{\beta}_0$) | 48.6 | 12.3 |
| Fare ($\hat{\beta}_1$) | -4.20 | 1.15 |
| Income ($\hat{\beta}_2$) | 0.85 | 0.32 |
| PopDensity ($\hat{\beta}_3$) | 3.40 | 1.08 |
| Sample size ($n$) | 35 | - |
| $R^2$ | 0.71 | - |
| Adjusted $R^2$ | 0.68 | - |
---
## Questions
### Part A: 95% Confidence Interval for Fare Coefficient
**Construct and interpret a 95% confidence interval for $\beta_1$ (the effect of fare on ridership).**
**Calculation Space:**
```
Confidence interval formula: β̂₁ ± t(α/2, df) × SE(β̂₁)
degrees of freedom = n - k - 1 = ____________________
= ____________________
t-critical for 95% CI: ____________________________
Margin of error = __________________________________
= __________________________________
Lower bound = ______________________________________
Upper bound = ______________________________________
95% CI for β₁: [ _______ , _______ ]
```
**Interpretation:** What does this confidence interval tell us about the relationship between fares and ridership?
### Part B: Hypothesis Test Using Confidence Interval
**Using the confidence interval from Part A, test H₀: β₁ = -2.5 vs. H₁: β₁ ≠ -2.5 at the 5% significance level.**
**Decision Rule:** Does -2.5 fall inside or outside the confidence interval?
**Conclusion:** _______________________________________________
### Part C: 90% Confidence Interval for Income Coefficient
**Construct a 90% confidence interval for $\beta_2$ and interpret its meaning.**
**Calculation Space:**
```
t-critical for 90% CI: ____________________________
Margin of error = __________________________________
90% CI for β₂: [ _______ , _______ ]
```
**Interpretation:** What does this tell us about the relationship between income and transit ridership?
### Part D: Testing a Specific Hypothesis
**Test whether population density has a positive effect on ridership at the 5% significance level.**
1. State the hypotheses.
2. Calculate the t-statistic.
3. Determine the p-value range using the t-distribution table.
4. Make your decision and interpret.
**Calculation Space:**
```
H₀: ________________________________________________
H₁: ________________________________________________
t-statistic = _______________________________________
= _______________________________________
= _______________________________________
One-tailed critical value (α = 0.05): ______________
t-statistic > critical value? _______________________
p-value is: (circle one)
p < 0.01 0.01 < p < 0.025 0.025 < p < 0.05
0.05 < p < 0.10 p > 0.10
Decision: __________________________________________
Interpretation: ____________________________________
____________________________________________________
```
### Part E: Joint Interpretation
Suppose a city is considering two policies:
1. **Policy X:** Reduce fare by $0.50
2. **Policy Y:** Increase population density by 0.2 (through zoning changes)
Based on your regression results, calculate the **expected change in ridership per 1,000 residents** for each policy. Which policy would be predicted to have a larger impact on ridership?
**Calculation Space:**
```
Policy X (Fare reduction):
Expected ΔRidership = ______________________________
= ______________________________
Policy Y (Density increase):
Expected ΔRidership = ______________________________
= ______________________________
Larger predicted impact: ____________________________
```
---
# ANSWER KEY
---
## Exercise 1 Answers
### Part A: Two-Tailed Test for Slope
1. **Hypotheses:**
- H₀: β₁ = 0 (Education has no effect on wages)
- H₁: β₁ ≠ 0 (Education has an effect on wages)
2. **t-statistic:**
$$t = \frac{1.78 - 0}{0.42} = 4.238$$
3. **Critical values:**
- df = 45 - 2 = 43
- t-critical (two-tailed, α=0.05) = ±2.017
4. **Decision:** Reject H₀ because |4.238| > 2.017
5. **Conclusion:** There is statistically significant evidence at the 5% level that education affects wages. The p-value is approximately 0.0001 (much less than 0.05).
---
### Part B: One-Tailed Test
1. **Hypotheses:**
- H₀: β₁ ≤ 1.50
- H₁: β₁ > 1.50
2. **t-statistic:**
$$t = \frac{1.78 - 1.50}{0.42} = \frac{0.28}{0.42} = 0.667$$
3. **Critical value:**
- t-critical (one-tailed, α=0.01, df=43) = 2.416
4. **Decision:** Fail to reject H₀ because 0.667 < 2.416
5. **Conclusion:** At the 1% significance level, we do NOT have sufficient evidence to conclude that each year of education increases wages by more than $1.50.
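The key difference from Part A is that the hypothesized value is 1.50 rather than 0. A minimal sketch, with a hypothetical helper name and the critical values taken from the answers above:

```python
def upper_tailed_test(beta_hat, se, beta_null, t_crit):
    """Test H0: beta <= beta_null against H1: beta > beta_null."""
    t = (beta_hat - beta_null) / se
    return t, t > t_crit  # reject only for large positive t


# Part B: is the return to education greater than 1.50 per year? (alpha = 0.01, df = 43)
t_b, reject_b = upper_tailed_test(beta_hat=1.78, se=0.42, beta_null=1.50, t_crit=2.416)

# For contrast, Part A's two-tailed test against zero (alpha = 0.05, df = 43).
t_a = (1.78 - 0.0) / 0.42
reject_a = abs(t_a) > 2.017

print(f"Part B: t = {t_b:.3f}, reject H0: {reject_b}")
print(f"Part A: t = {t_a:.3f}, reject H0: {reject_a}")
```

The same estimate is highly significant against zero yet not distinguishable from 1.50, because the t-statistic measures the distance from the hypothesized value in standard-error units.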
---
### Part C: Test for Intercept
1. **Hypotheses:**
- H₀: β₀ = 0
- H₁: β₀ ≠ 0
2. **t-statistic:**
$$t = \frac{3.25 - 0}{1.84} = 1.766$$
3. **Critical values:**
- t-critical (two-tailed, α=0.10, df=43) = ±1.681
4. **Decision:** Reject H₀ because |1.766| > 1.681
5. **Conclusion:** The intercept is statistically significant at the 10% level: the predicted wage at zero years of education differs significantly from zero. (Note: this may not be economically meaningful, since zero years of education probably lies outside the range of the sample, which makes the intercept an extrapolation.)
---
### Part D: Economic Interpretation
- **β̂₁ = 1.78** means: Each additional year of education is associated with an increase of **$1.78 per hour** in wages, holding all else constant.
- **For 4 years of college:**
- Predicted wage increase = 4 × $1.78 = **$7.12 per hour**
- If working 2,000 hours/year, this translates to approximately **$14,240 additional annual income**
---
## Exercise 2 Answers
### Part A: 95% Confidence Interval for Fare Coefficient
- **df** = 35 - 3 - 1 = **31** (k = 3 regressors)
- **t-critical** (two-tailed, α=0.05, df=31) = **2.040**
- **Margin of error** = 2.040 × 1.15 = **2.346**
- **Lower bound** = -4.20 - 2.346 = **-6.546**
- **Upper bound** = -4.20 + 2.346 = **-1.854**
**95% CI for β₁: [ -6.55 , -1.85 ]**
**Interpretation:** We are 95% confident that a $1 increase in fare is associated with a decrease in ridership of between 1.85 and 6.55 trips per 1,000 residents per month. Since the entire interval is negative, there is strong evidence of an inverse relationship.
---
### Part B: Hypothesis Test Using Confidence Interval
**H₀: β₁ = -2.5 vs. H₁: β₁ ≠ -2.5**
- **Decision:** Since -2.5 falls **WITHIN** the 95% CI [-6.55, -1.85], we **fail to reject H₀**
- **Conclusion:** At the 5% significance level, we do not have sufficient evidence to reject the claim that the true effect of fare on ridership is -2.5 trips per dollar increase.
---
### Part C: 90% Confidence Interval for Income Coefficient
- **t-critical** (two-tailed, α=0.10, df=31) = **1.696**
- **Margin of error** = 1.696 × 0.32 = **0.543**
- **Lower bound** = 0.85 - 0.543 = **0.307**
- **Upper bound** = 0.85 + 0.543 = **1.393**
**90% CI for β₂: [ 0.31 , 1.39 ]**
**Interpretation:** We are 90% confident that a $1,000 increase in median household income is associated with an increase in transit ridership of between 0.31 and 1.39 trips per 1,000 residents per month. The positive relationship suggests higher-income cities use transit more (perhaps due to downtown employment).
---
### Part D: Testing Population Density Effect
1. **Hypotheses:**
- H₀: β₃ ≤ 0 (Population density has no positive effect)
- H₁: β₃ > 0 (Population density has a positive effect)
2. **t-statistic:**
$$t = \frac{3.40 - 0}{1.08} = 3.148$$
3. **Critical value:**
- t-critical (one-tailed, α=0.05, df=31) = **1.696**
4. **Decision:** **Reject H₀** because 3.148 > 1.696
5. **p-value range:** **p < 0.01** (actually p ≈ 0.002)
6. **Conclusion:** There is strong statistical evidence that higher population density increases transit ridership. Cities with greater density have significantly more transit usage per capita.
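Bracketing a p-value from a printed t-table amounts to finding the two neighboring critical values around the t-statistic. The sketch below does this for an upper-tailed test with df = 31. The values 1.696, 2.040, and 2.744 appear elsewhere in these exercises; 1.309 and 2.453 are the usual table entries for the 0.10 and 0.01 one-tailed levels and are added here as an assumption.

```python
# (one-tailed alpha, critical value) for df = 31, ordered from largest alpha to smallest.
ONE_TAILED_DF31 = [(0.10, 1.309), (0.05, 1.696), (0.025, 2.040), (0.01, 2.453), (0.005, 2.744)]


def p_value_bracket(t_stat, table):
    """Return (lower, upper) bounds on the one-tailed p-value for a positive t."""
    if t_stat <= 0:
        raise ValueError("this sketch only handles a positive t in an upper-tailed test")
    lower, upper = 0.0, 0.5  # a positive t always has a one-tailed p below 0.5
    for alpha, crit in table:
        if t_stat > crit:
            upper = alpha   # t beats this critical value, so p < alpha
        else:
            lower = alpha   # t falls short, so p > alpha; stop here
            break
    return lower, upper


bracket = p_value_bracket(3.148, ONE_TAILED_DF31)
print(f"{bracket[0]} < p < {bracket[1]}")
```

For t = 3.148 the statistic exceeds every tabulated value, so the table can only say that p is below its smallest level, which is consistent with the answer p < 0.01.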
---
### Part E: Policy Comparison
**Policy X (Fare reduction of $0.50):**
$$\Delta \text{Ridership} = (-4.20) \times (-0.50) = +2.10 \text{ trips per 1,000 residents}$$
**Policy Y (Density increase of 0.2):**
$$\Delta \text{Ridership} = 3.40 \times 0.2 = +0.68 \text{ trips per 1,000 residents}$$
**Larger predicted impact: Policy X (fare reduction)**
The fare reduction is predicted to increase ridership by about **3 times as much** as the density increase, based on these coefficient estimates.
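Each predicted change is simply the coefficient times the change in its regressor, holding the other regressors fixed. A short sketch of the comparison, using the point estimates only and ignoring their standard errors:

```python
# Point estimates from the ridership regression.
BETA_FARE = -4.20     # trips per 1,000 residents per dollar of fare
BETA_DENSITY = 3.40   # trips per 1,000 residents per unit of density

delta_x = BETA_FARE * (-0.50)   # Policy X: fare falls by 0.50
delta_y = BETA_DENSITY * 0.2    # Policy Y: density rises by 0.2
ratio = delta_x / delta_y

print(f"Policy X: {delta_x:+.2f}, Policy Y: {delta_y:+.2f}, ratio: {ratio:.2f}")
```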
---
## Common Mistakes to Avoid
1. **Degrees of freedom:** Remember df = n - k - 1 for multiple regression (where k = number of slope coefficients). For simple regression, df = n - 2.
2. **One-tailed vs two-tailed:** Always check whether the alternative hypothesis uses ≠ (two-tailed) or < / > (one-tailed). This affects your critical value.
3. **Sign interpretation:** When interpreting coefficients, always explain both the magnitude AND the direction (positive/negative).
4. **Confidence interval for hypothesis testing:** If the hypothesized value falls within the 100(1-α)% confidence interval, you fail to reject H₀ at significance level α in a two-tailed test.
5. **Practical vs statistical significance:** A coefficient can be statistically significant (large t-statistic) but economically small, or vice versa. Always consider both!
---
*End of Practice Exercises*


@ -0,0 +1,259 @@
# Quick Practice Drills — English
## t-Tests, Confidence Intervals, Hypothesis Testing
---
## Drill 1: Simple t-Test
**Given:**
- Sample size: n = 38
- Estimated coefficient: β̂ = 2.4
- Standard error: SE = 0.9
**Test:** H₀: β = 0 vs H₁: β ≠ 0 at α = 0.05
**Your tasks:**
1. Calculate the t-statistic
2. Find degrees of freedom
3. Find critical value (two-tailed, α = 0.05)
4. Make your decision: Reject or fail to reject H₀?
5. Calculate the p-value range using t-table
<details>
<summary>Answers</summary>
1. t = (2.4 - 0) / 0.9 = 2.667
2. df = 38 - 2 = 36 (simple regression)
3. Critical value ≈ 2.028 (or use 2.042 for df=30, 2.021 for df=40)
4. |2.667| > 2.028 → **Reject H₀**
5. From t-table (df = 36, two-tailed): 2.434 < 2.667 < 2.719 → **0.01 < p < 0.02**
</details>
---
## Drill 2: One-Tailed Test
**Given:**
- n = 55
- β̂ = -1.8, SE = 0.7
**Test:** H₀: β ≥ 0 vs H₁: β < 0 at α = 0.01
**Your tasks:**
1. Calculate t-statistic
2. Find critical value (one-tailed, α = 0.01)
3. Decision?
<details>
<summary>Answers</summary>
1. t = (-1.8 - 0) / 0.7 = -2.571
2. df = 55 - 2 = 53 (simple regression), one-tailed critical value at α=0.01: ≈ -2.40 (left tail)
3. -2.571 < -2.40 → **Reject H₀** (evidence that β < 0)
</details>
---
## Drill 3: 95% Confidence Interval
**Given:**
- n = 42
- β̂ = 3.6, SE = 1.2
**Your tasks:**
1. Construct 95% confidence interval
2. Interpret the interval
3. Does this interval contain 2.0? What does that tell you?
<details>
<summary>Answers</summary>
1. df = 42 - 2 = 40 (simple regression), t₀.₀₂₅ = 2.021
Margin = 2.021 × 1.2 = 2.425
CI: [3.6 - 2.425, 3.6 + 2.425] = **[1.175, 6.025]**
2. We are 95% confident the true β lies between 1.175 and 6.025
3. Yes, 2.0 is in the interval → We **cannot reject** H₀: β = 2.0 at α=0.05
</details>
---
## Drill 4: Multiple Regression Test
**Regression output:**
| Variable | Coefficient | Std. Error |
|----------|-------------|------------|
| Intercept | 5.2 | 2.1 |
| X₁ | **1.5** | **0.4** |
| X₂ | -0.8 | 0.6 |
| X₃ | 2.1 | 0.9 |
n = 65
**Your tasks:**
1. Test each slope coefficient at α = 0.05
2. Which variables are significant?
3. Construct 90% CI for X₁
<details>
<summary>Answers</summary>
1. df = 65 - 3 - 1 = 61, critical value ≈ 2.000 (table row for df=60; the large-sample value 1.96 is slightly too small here)
- X₁: t = 1.5/0.4 = 3.75 → **Significant**
- X₂: t = -0.8/0.6 = -1.33 → |1.33| < 2.0 → **Not significant**
- X₃: t = 2.1/0.9 = 2.33 → **Significant**
2. X₁ and X₃ are significant at 5% level
3. 90% CI for X₁: t₀.₀₅,₆₁ ≈ 1.671
Margin = 1.671 × 0.4 = 0.668
CI: [1.5 - 0.668, 1.5 + 0.668] = **[0.832, 2.168]**
</details>
---
## Drill 5: CI for Hypothesis Test
**Given:**
- 95% CI for β: [0.5, 3.2]
**Test:** H₀: β = 4.0 vs H₁: β ≠ 4.0 at α = 0.05
**Your task:** Use the CI method to test this hypothesis.
<details>
<summary>Answer</summary>
4.0 lies **outside** the 95% CI [0.5, 3.2]
**Reject H₀** at α = 0.05
The hypothesized value 4.0 is not a plausible value for β.
</details>
---
## Drill 6: Interpretation Check
**Given:** β̂ = 2.3, p-value = 0.08
**Which statements are correct?**
- [ ] The effect is not statistically significant at 5% level
- [ ] There is an 8% probability that β = 0
- [ ] If H₀ were true, there would be an 8% chance of seeing a result at least this extreme
- [ ] We are 92% confident there is an effect
- [ ] At 10% level, the effect would be significant
<details>
<summary>Answers</summary>
Correct: ✓✗✓✗✓
- ✓ Not significant at 5% (0.08 > 0.05)
- ✗ P-value is NOT probability H₀ is true
- ✓ Correct interpretation of p-value
- ✗ Confidence and significance are different concepts
- ✓ Would be significant at 10% (0.08 < 0.10)
</details>
---
## Drill 7: Full Problem — Coffee Shop Prices
**Scenario:** A researcher studies coffee shop prices in 28 cities.
**Model:** Priceᵢ = β₀ + β₁Rentᵢ + β₂Wageᵢ + uᵢ
**Output:**
| Variable | Coefficient | Std. Error |
|----------|-------------|------------|
| Intercept | 1.50 | 0.80 |
| Rent (€100/m²) | **0.25** | **0.08** |
| Wage (€/hour) | 0.15 | 0.12 |
**Questions:**
1. Test if rent affects price at 5% level
2. Test if wage affects price at 5% level
3. Construct 95% CI for rent coefficient
4. A politician claims each €100 rent increase raises price by €0.40. Test this claim.
<details>
<summary>Answers</summary>
1. Rent: t = 0.25/0.08 = 3.125, df = 25, critical ≈ 2.060
3.125 > 2.060 → **Significant**
2. Wage: t = 0.15/0.12 = 1.25, |1.25| < 2.060 → **Not significant**
3. 95% CI for rent: t₀.₀₂₅,₂₅ = 2.060
Margin = 2.060 × 0.08 = 0.165
CI: [0.25 - 0.165, 0.25 + 0.165] = **[0.085, 0.415]**
4. Politician claims β₁ = 0.40. Is 0.40 in the CI [0.085, 0.415]?
Yes! (0.40 is inside the interval)
**Cannot reject** the politician's claim at 5% level.
</details>
---
## Drill 8: Quick Calculations
**Calculate mentally or with scratch paper:**
| β̂ | SE | n | k | t-stat | Significant at 5%? |
|----|----|---|---|--------|-------------------|
| 4.0 | 1.5 | 30 | 1 | ? | ? |
| -2.5 | 1.0 | 50 | 2 | ? | ? |
| 0.8 | 0.3 | 100 | 3 | ? | ? |
| 1.2 | 0.9 | 25 | 1 | ? | ? |
<details>
<summary>Answers</summary>
| β̂ | SE | n | k | t-stat | df | critical | Significant? |
|----|----|---|---|--------|-----|----------|--------------|
| 4.0 | 1.5 | 30 | 1 | 2.67 | 28 | 2.048 | **Yes** |
| -2.5 | 1.0 | 50 | 2 | -2.50 | 47 | 2.012 | **Yes** |
| 0.8 | 0.3 | 100 | 3 | 2.67 | 96 | 1.985 | **Yes** |
| 1.2 | 0.9 | 25 | 1 | 1.33 | 23 | 2.069 | **No** |
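The answer table can be rebuilt row by row with the same three steps: t-statistic, degrees of freedom, comparison. In this sketch the critical values are copied from the table above into a small lookup rather than computed.

```python
# Two-tailed 5% critical values, copied from the answer table (keyed by df).
CRIT_5PCT = {28: 2.048, 47: 2.012, 96: 1.985, 23: 2.069}

# (beta_hat, se, n, k) for each row of the drill.
rows = [(4.0, 1.5, 30, 1), (-2.5, 1.0, 50, 2), (0.8, 0.3, 100, 3), (1.2, 0.9, 25, 1)]

answers = []
for beta_hat, se, n, k in rows:
    t = beta_hat / se
    df = n - k - 1
    significant = abs(t) > CRIT_5PCT[df]  # the sign of t does not matter in a two-tailed test
    answers.append((round(t, 2), df, significant))
    print(f"t = {t:.2f}, df = {df}, significant at 5%: {significant}")
```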
</details>
---
## Formula Sheet
**t-statistic** (here $\beta_0$ is the value of the coefficient under $H_0$, not the intercept):
$$t = \frac{\hat{\beta} - \beta_0}{SE(\hat{\beta})}$$
**Confidence Interval:**
$$CI = \hat{\beta} \pm t_{\alpha/2, df} \times SE(\hat{\beta})$$
**Degrees of freedom:**
- Simple regression: df = n - 2
- Multiple regression: df = n - k - 1 (where k = number of X variables)
**Common critical values (two-tailed):**
| df | α = 0.10 | α = 0.05 | α = 0.01 |
|----|----------|----------|----------|
| 25 | 1.708 | 2.060 | 2.787 |
| 30 | 1.697 | 2.042 | 2.750 |
| 40 | 1.684 | 2.021 | 2.704 |
| 60 | 1.671 | 2.000 | 2.660 |
| 120 | 1.658 | 1.980 | 2.617 |
| ∞ | 1.645 | 1.960 | 2.576 |
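
When the exact df is not in the table, one common convention (an assumption here, since this sheet does not state a rule) is to use the nearest lower tabulated df, which gives a slightly larger and therefore more conservative critical value. A minimal Python sketch of that lookup, with the hypothetical helper `critical_value` and the finite-df rows copied from the table above:

```python
from bisect import bisect_right

# Finite-df rows of the table above (two-tailed critical values).
DF_ROWS = [25, 30, 40, 60, 120]
CRITICAL = {
    0.10: [1.708, 1.697, 1.684, 1.671, 1.658],
    0.05: [2.060, 2.042, 2.021, 2.000, 1.980],
    0.01: [2.787, 2.750, 2.704, 2.660, 2.617],
}


def critical_value(df, alpha=0.05):
    """Return the tabulated value for the largest table df that does not exceed df."""
    if df < DF_ROWS[0]:
        raise ValueError("df is below the smallest tabulated row")
    index = bisect_right(DF_ROWS, df) - 1
    return CRITICAL[alpha][index]


print(critical_value(25))        # exact row
print(critical_value(36))        # falls back to the df = 30 row
print(critical_value(47, 0.10))  # falls back to the df = 40 row
```

For exam work, an exact value from a full t-table or statistical software is preferable when available; the fallback only guards against rejecting too easily.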
---
*Practice these until you can do them in your sleep! 🎯*


@ -0,0 +1,311 @@
# Formative Test Exercise — New Scenario
## Practice for Exam: t-Tests, CIs, Hypothesis Testing
---
# Exercise 1: The Retail Store Manager's Dilemma
## Scenario
Emma is a regional manager for a retail clothing chain with 40 stores across the Netherlands. She suspects that **store location characteristics** affect monthly sales revenue. Emma hires a data analyst to investigate.
The analyst collects data from all 40 stores and estimates the following regression:
$$
\text{Sales}_i = \beta_0 + \beta_1 \text{FootTraffic}_i + \beta_2 \text{Competitors}_i + \beta_3 \text{ParkingSpots}_i + u_i
$$
Where:
- **Sales**: Monthly revenue (in thousands of euros)
- **FootTraffic**: Estimated daily pedestrian count near the store (in hundreds)
- **Competitors**: Number of competing clothing stores within 500m radius
- **ParkingSpots**: Number of parking spots within 200m of the store
---
## Regression Output
| Variable | Coefficient | Standard Error | t-statistic | p-value |
|----------|-------------|----------------|-------------|---------|
| Intercept | 12.5 | 8.3 | 1.51 | 0.140 |
| FootTraffic | **2.8** | **0.95** | 2.95 | 0.006 |
| Competitors | -1.4 | 0.82 | -1.71 | 0.096 |
| ParkingSpots | 0.6 | 0.38 | 1.58 | 0.124 |
**Model statistics:** $n = 40$, $R^2 = 0.64$
Emma knows that all CLRM assumptions (A1–A5) hold for this data.
---
## Questions
### Question 1 (2 points)
The analyst tells Emma: *"Foot traffic has a statistically significant effect on sales at the 5% level."*
**a)** Verify this claim by performing a hypothesis test. Show:
- The null and alternative hypotheses
- The calculation of the t-statistic (verify it matches the table)
- Your conclusion using the critical value approach
**b)** What is the economic interpretation of $\hat{\beta}_1 = 2.8$? Be specific about units.
---
### Question 2 (2 points)
Emma asks: *"But does 100 extra pedestrians per day really matter? Could the effect be as small as 1.5 thousand euros?"*
**a)** Construct a 95% confidence interval for the FootTraffic coefficient.
**b)** Use your confidence interval to test whether $\beta_1 = 1.5$ is plausible. Show your reasoning.
**c)** Interpret the confidence interval in language Emma would understand.
---
### Question 3 (2 points)
The marketing director claims: *"Every competitor nearby costs us exactly €1,000 in monthly revenue."*
**a)** Translate this claim into a hypothesis test (state $H_0$ and $H_1$).
**b)** Test this claim at the 10% significance level. Show your work.
**c)** Based on your result, can Emma confidently tell the marketing director they're wrong?
---
### Question 4 (2 points)
**a)** Calculate the predicted monthly sales for a store with:
- FootTraffic = 50 (i.e., 5,000 pedestrians/day)
- Competitors = 3
- ParkingSpots = 20
**b)** Emma wants to open a new store. She can choose between:
- **Location A**: FootTraffic = 60, Competitors = 5, Parking = 15
- **Location B**: FootTraffic = 40, Competitors = 2, Parking = 30
Which location is predicted to generate higher monthly sales? Show your calculation.
---
### Question 5 (2 points)
The analyst mentions that $R^2 = 0.64$.
**a)** Explain what this means in the context of Emma's stores.
**b)** Does a high $R^2$ mean the model is "correct" or that we have proven causation? Explain why or why not.
---
# Exercise 2: The Restaurant Owner's Question
## Scenario
David owns 32 independent restaurants across Belgium. He wonders whether **customer ratings on Google Maps** affect monthly profit. He suspects that higher-rated restaurants earn more, but he's unsure if the effect is meaningful.
David's accountant runs a regression:
$$
\text{Profit}_i = \beta_0 + \beta_1 \text{Rating}_i + \beta_2 \text{Seats}_i + u_i
$$
Where:
- **Profit**: Monthly profit (in thousands of euros)
- **Rating**: Average Google rating (scale 1.0–5.0)
- **Seats**: Number of seats in the restaurant
---
## Regression Output
| Variable | Coefficient | Std. Error |
|----------|-------------|------------|
| Intercept | -8.5 | 12.3 |
| Rating | **4.2** | **1.8** |
| Seats | 0.15 | 0.08 |
**Model statistics:** $n = 32$, $R^2 = 0.51$
---
## Questions
### Question 1 (3 points)
**a)** Test whether Rating has a significant effect on Profit at the 5% level using a two-tailed test. Show all steps.
**b)** David's friend claims: *"Each extra star on Google Maps brings at least €5,000 extra profit per month."* Test whether the data supports this claim using an appropriate hypothesis test.
**c)** Calculate a 90% confidence interval for the Rating coefficient and interpret it.
---
### Question 2 (2 points)
**a)** Calculate the t-statistic for Seats and determine if it is significant at the 10% level.
**b)** David thinks about removing "Seats" from the model. Based on your statistical test, what would you advise? Justify your answer.
---
### Question 3 (2 points)
David has two restaurants:
- **Restaurant X**: Rating = 4.2, Seats = 80
- **Restaurant Y**: Rating = 4.8, Seats = 60
**a)** Calculate the predicted profit for each restaurant.
**b)** Which restaurant is predicted to be more profitable? By how much?
---
### Question 4 (3 points)
**Critical thinking:** David notices that restaurants with higher ratings tend to have more seats (correlation = 0.6).
**a)** Why might this correlation be a problem for interpreting the coefficient on Rating?
**b)** If you had to advise David on improving his model, what additional variable might you suggest collecting data on? Explain why.
---
# Answer Key
<details>
<summary>Click to reveal answers (try first!)</summary>
## Exercise 1 Answers
### Question 1
**a)**
- $H_0: \beta_1 = 0$ vs $H_1: \beta_1 \neq 0$
- $t = \frac{2.8 - 0}{0.95} = 2.947 \approx 2.95$ ✓ (matches table)
- $df = 40 - 3 - 1 = 36$, critical value (two-tailed, 5%) ≈ 2.028
- $|2.95| > 2.028$ → **Reject $H_0$**
- **Conclusion:** Foot traffic has a statistically significant effect on sales at the 5% level.
**b)** Each additional 100 pedestrians per day is associated with €2,800 higher monthly sales, holding competitors and parking constant.
---
### Question 2
**a)**
- $df = 36$, $t_{0.025} = 2.028$
- Margin = $2.028 \times 0.95 = 1.927$
- 95% CI: $[2.8 - 1.93, 2.8 + 1.93] = [0.87, 4.73]$
**b)** Since 1.5 lies **inside** the CI [0.87, 4.73], we cannot reject $H_0: \beta_1 = 1.5$ at the 5% level (equivalently, $t = \frac{2.8 - 1.5}{0.95} \approx 1.37 < 2.028$). An effect as small as 1.5 is plausible.
**c)** *"Emma, we're 95% confident that each 100 extra daily pedestrians brings between €870 and €4,730 extra monthly revenue. The whole interval lies above zero, so the effect is very likely positive, but it could range from modest to quite substantial."*
---
### Question 3
**a)**
- $H_0: \beta_2 = -1.0$ (effect is exactly -€1,000)
- $H_1: \beta_2 \neq -1.0$
**b)**
- $t = \frac{-1.4 - (-1.0)}{0.82} = \frac{-0.4}{0.82} = -0.488$
- $|t| = 0.488$, critical (10%, two-tailed) = 1.69
- $0.488 < 1.69$ → **Fail to reject $H_0$**
**c)** **No!** The data does NOT provide sufficient evidence to reject the marketing director's claim. Emma cannot confidently say they're wrong — an effect of -€1,000 per competitor is statistically plausible.
---
### Question 4
**a)**
$\hat{Sales} = 12.5 + 2.8(50) + (-1.4)(3) + 0.6(20)$
$= 12.5 + 140 - 4.2 + 12 = 160.3$
**Predicted monthly sales: €160,300**
**b)**
- Location A: $12.5 + 2.8(60) + (-1.4)(5) + 0.6(15) = 12.5 + 168 - 7 + 9 = 182.5$
- Location B: $12.5 + 2.8(40) + (-1.4)(2) + 0.6(30) = 12.5 + 112 - 2.8 + 18 = 139.7$
**Location A is predicted to generate higher monthly sales**, by €42,800/month (182.5 − 139.7 = 42.8 thousand).
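
The prediction is just the fitted equation evaluated at each location's values. A minimal Python sketch, assuming the coefficients from the regression output; the names `COEFFICIENTS` and `predict_sales` are invented for this illustration:

```python
# Estimated coefficients from the regression output (sales in thousands of euros).
COEFFICIENTS = {
    "Intercept": 12.5,
    "FootTraffic": 2.8,
    "Competitors": -1.4,
    "ParkingSpots": 0.6,
}


def predict_sales(foot_traffic, competitors, parking_spots):
    """Predicted monthly sales (thousands of euros) from the fitted equation."""
    return (
        COEFFICIENTS["Intercept"]
        + COEFFICIENTS["FootTraffic"] * foot_traffic
        + COEFFICIENTS["Competitors"] * competitors
        + COEFFICIENTS["ParkingSpots"] * parking_spots
    )


baseline = predict_sales(50, 3, 20)
location_a = predict_sales(60, 5, 15)
location_b = predict_sales(40, 2, 30)

print(f"Baseline store: {baseline:.1f}")
print(f"Location A:     {location_a:.1f}")
print(f"Location B:     {location_b:.1f}")
print(f"A minus B:      {location_a - location_b:.1f}")
```

These are point predictions only; they say nothing about the uncertainty around each predicted value.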
---
### Question 5
**a)** 64% of the variation in monthly sales across stores is explained by foot traffic, competitors, and parking spots.
**b)** **No.** High $R^2$ means good fit, not causation. The model shows association, not proof that changing these variables would cause sales to change. There could be omitted variables, reverse causality, or other issues.
---
## Exercise 2 Answers
### Question 1
**a)**
- $H_0: \beta_1 = 0$ vs $H_1: \beta_1 \neq 0$
- $t = 4.2 / 1.8 = 2.333$
- $df = 32 - 2 - 1 = 29$, critical (5%) = 2.045
- $2.333 > 2.045$ → **Significant**
**b)** Friend claims $\beta_1 \geq 5$ (at least €5,000)
- $H_0: \beta_1 \geq 5$ vs $H_1: \beta_1 < 5$
- $t = \frac{4.2 - 5}{1.8} = \frac{-0.8}{1.8} = -0.444$
- One-tailed critical (5%, left tail) = -1.699
- $-0.444 > -1.699$ → **Fail to reject $H_0$**
- Data does NOT support rejecting the friend's claim — an effect of €5,000+ is still plausible.
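
A left-tailed test differs from the two-tailed case only in the rejection rule: reject when the t-statistic falls below the negative critical value. A minimal Python sketch, where `left_tailed_test` is a hypothetical helper and 1.699 is the one-tailed 5% critical value for df = 29 used above:

```python
def left_tailed_test(beta_hat, se, beta_null, t_crit):
    """Test H0: beta >= beta_null against H1: beta < beta_null.

    t_crit is the positive one-tailed critical value; reject when t < -t_crit.
    """
    t_stat = (beta_hat - beta_null) / se
    return t_stat, t_stat < -t_crit


t_stat, reject = left_tailed_test(beta_hat=4.2, se=1.8, beta_null=5.0, t_crit=1.699)
print(f"t = {t_stat:.3f}, reject H0: {reject}")
```

Failing to reject means the data are compatible with the claim, not that the claim has been confirmed.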
**c)** 90% CI:
- $t_{0.05, 29} = 1.699$
- Margin = $1.699 \times 1.8 = 3.058$
- CI: $[4.2 - 3.06, 4.2 + 3.06] = [1.14, 7.26]$
- *"We're 90% confident each rating point brings €1,140 to €7,260 extra monthly profit."*
---
### Question 2
**a)**
- $t = 0.15 / 0.08 = 1.875$
- $df = 29$, critical (10%, two-tailed) = 1.699
- $1.875 > 1.699$ → **Significant at 10% level**
**b)** **Keep Seats.** Despite the smaller coefficient, it's statistically significant at 10% and practically meaningful (each seat adds €150/month). Removing it would lose explanatory power.
---
### Question 3
**a)**
- Restaurant X: $\hat{Profit} = -8.5 + 4.2(4.2) + 0.15(80) = -8.5 + 17.64 + 12 = 21.14$
- **€21,140 predicted profit**
- Restaurant Y: $\hat{Profit} = -8.5 + 4.2(4.8) + 0.15(60) = -8.5 + 20.16 + 9 = 20.66$
- **€20,660 predicted profit**
**b)** **Restaurant X is predicted to be more profitable** by €480/month (21.14 − 20.66 = 0.48 thousand), despite the lower rating, because it has more seats.
---
### Question 4
**a)** This is **multicollinearity**. When Rating and Seats move together, there is less independent variation in each, so it is harder to separate their individual effects. Because Seats is included in the model, the Rating coefficient is still estimated holding Seats constant, but its standard error is inflated and the estimate is less precise.
**b)** Suggest collecting:
- **Price level** (expensive restaurants might rate higher AND have fewer seats)
- **Location/neighborhood** (upscale areas might have both higher ratings and larger spaces)
- **Staff quality** or **chef experience** (could drive ratings independently of size)
Any variable that affects Rating but not Seats, or vice versa, would help disentangle the effects.
</details>
---
**Total points: 20** | **Time recommended: 45–60 minutes**
Good luck practicing! 📝

Binary file not shown.


@ -1,9 +1,9 @@
  {
- "last_updated": "2026-03-27T04:00:01.337857Z",
+ "last_updated": "2026-03-27T10:03:24.554188Z",
  "source": "api",
- "session_percent": 4,
+ "session_percent": 0,
- "session_resets": "2026-03-27T08:00:00.292526+00:00",
+ "session_resets": "2026-03-27T13:00:00.507330+00:00",
- "weekly_percent": 0,
+ "weekly_percent": 2,
- "weekly_resets": "2026-04-03T03:00:00.292545+00:00",
+ "weekly_resets": "2026-04-03T03:00:00.507351+00:00",
  "sonnet_percent": 0
  }