Scarcity Benchmark 03: Economic Simulation and Ablations

13. S4 — Economic Simulation: Direction Validation and Uncertainty

Engine trained on Kenya 1990–2023 (34 annual observations). Three shocks are propagated 5 steps forward from the 2023 state. Results are validated for directional coherence against relationships documented by the IMF and World Bank. Magnitudes are not validated: 34 observations are too few to estimate magnitudes precisely.

Shock S1: Electricity access +20 pp (50% → 70%)

Variable	Direction	IMF/WB expectation	Match
labor_force_part	+1.53%	+ (electrification raises female LFP)	YES
gov_expense_gdp	+1.11%	+ (maintenance and operations spending)	YES
real_interest_rate	+0.65%	+ (infrastructure investment pressure)	YES
dom_credit_pvt	−1.39%	ambiguous	N/A

S1 direction score: 3/3 unambiguous (100%)

Shock S2: Government debt +15 pp GDP (~55% → ~70%)

Variable	Direction	IMF/WB expectation	Match
gdp_usd / gdp_per_capita	+1.67% / +1.15%	+ (fiscal multiplier)	YES
unemployment	−1.82%	− (Okun's law)	YES
real_interest_rate	−2.12%	+ (crowding-out)	NO

S2 direction score: 2/3 unambiguous (67%)

Anomaly note: The negative interest rate response contradicts crowding-out theory but is consistent with Kenya's documented financial repression — the CBK used administered rates during fiscal expansions (IMF Art. IV 2019, 2022). This is empirically grounded even if it violates textbook expectation.

Shock S3: Inflation +5 pp (7.7% → 12.7%)

Variable	Direction	IMF/WB expectation	Match
gdp_per_capita	−1.26%	− (real income erosion)	YES
dom_credit_pvt	−1.36%	− (real credit tightening)	YES
labor_force_part	−1.31%	− (discouraged workers)	YES
money_broad_gdp	+0.86%	+ (Fisher: nominal money demand)	YES
inflation_cpi	+65% relative	+ (AR persistence)	YES

S3 direction score: 5/5 unambiguous (100%)

Overall

Shock	Unambiguous tested	Match	Score
S1 Electricity	3	3/3	100%
S2 Govt debt	3	2/3	67%
S3 Inflation	5	5/5	100%
Overall	11	10/11	91%
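
The tally behind these scores can be reproduced from the three shock tables. The snippet below is a minimal sketch: the row tuples are transcribed from the tables above, and the function names are illustrative rather than part of Scarcity. Rows whose expected sign is ambiguous are excluded from the denominator.

```python
# Rows transcribed from the shock tables: (shock, variable, simulated sign, expected sign).
# An expected sign of None marks a relationship listed as ambiguous.
ROWS = [
    ("S1", "labor_force_part", +1, +1),
    ("S1", "gov_expense_gdp", +1, +1),
    ("S1", "real_interest_rate", +1, +1),
    ("S1", "dom_credit_pvt", -1, None),
    ("S2", "gdp_usd / gdp_per_capita", +1, +1),
    ("S2", "unemployment", -1, -1),
    ("S2", "real_interest_rate", -1, +1),
    ("S3", "gdp_per_capita", -1, -1),
    ("S3", "dom_credit_pvt", -1, -1),
    ("S3", "labor_force_part", -1, -1),
    ("S3", "money_broad_gdp", +1, +1),
    ("S3", "inflation_cpi", +1, +1),
]


def direction_score(rows):
    """Return (matches, tested), counting only rows with an unambiguous expectation."""
    tested = [r for r in rows if r[3] is not None]
    matches = sum(1 for _, _, simulated, expected in tested if simulated == expected)
    return matches, len(tested)


def score_by_shock(rows):
    """Apply direction_score separately to each shock label."""
    shocks = sorted({r[0] for r in rows})
    return {s: direction_score([r for r in rows if r[0] == s]) for s in shocks}


per_shock = score_by_shock(ROWS)
overall = direction_score(ROWS)
for shock, (m, n) in per_shock.items():
    print(f"{shock}: {m}/{n} = {m / n:.0%}")
print(f"Overall: {overall[0]}/{overall[1]} = {overall[0] / overall[1]:.0%}")
```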

Simulation uncertainty

Multi-seed discovery stability (5 seeds, federated KEN+TZA+UGA):

Metric	Mean	± std
avg_conf (federated)	0.442	0.007
n_active hypotheses	18–19	—

Discovery quality is highly stable across seeds: coefficient of variation = 1.5% on avg_conf. The hypothesis graph that drives simulation is therefore reproducible.

Binomial 95% CI on the 10/11 direction-match (Clopper-Pearson): The 91% direction-match is based on 11 unambiguous relationships. The binomial 95% CI is approximately [59%, 100%]. This wide interval reflects the small sample of testable relationships, not instability in the engine. External replication on a larger set of economic shocks is the right path to a tighter estimate — this benchmark validates the direction of the contribution, not its precision.
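
The interval quoted above can be checked without a statistics library. The following is a sketch of the Clopper-Pearson (exact) interval using only the standard library; it solves the two binomial tail equations by bisection. The helper names are illustrative and not part of the benchmark code.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect(f, lo=0.0, hi=1.0, iters=100):
    """Root of a function that is positive at lo and decreasing on [lo, hi]."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) interval for k successes out of n trials."""
    # Lower bound: the p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else _bisect(
        lambda p: alpha / 2 - (1 - binom_cdf(k - 1, n, p))
    )
    # Upper bound: the p at which P(X <= k) equals alpha / 2.
    upper = 1.0 if k == n else _bisect(
        lambda p: binom_cdf(k, n, p) - alpha / 2
    )
    return lower, upper


lo, hi = clopper_pearson(10, 11)
print(f"[{lo:.0%}, {hi:.0%}]")  # roughly 59% to 100%, as quoted above
```

The width of the interval is driven almost entirely by n = 11, which is why adding more testable relationships is the direct route to a tighter estimate.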

Synthetic data simulation (direction match on random data): Running the same shock tests on synthetic random data (5 seeds) produces ~20% direction match (near-chance). This confirms the 91% on real Kenya data is not an artefact of the simulation machinery: random data → random directions, real economic data → coherent directions.

Comparison to no-discovery baseline: Without Scarcity, the PolicySimulator has no hypothesis graph and returns zero propagation for all shocks. The 91% match is a comparison to no model at all, not to a weaker model.

14. Confidence: External Anchoring

Confidence level	External meaning
< 0.10	Fewer than 5 consistent observations. No external correlate.
0.10 – 0.25	Pattern tentative. Pearson |r| points in the same direction but is not significant at N<10.
0.25	Simulation gate. Below: PolicySimulator returns empty output.
0.25 – 0.50	Active. On average, 91% direction match vs textbook relationships (this benchmark).
> 0.50	Not observed on annual data; expected in high-frequency physical systems with N>1000.

Critical fact: Local-only final confidence = 0.205 (below 0.25). Federated final confidence = 0.298 (above 0.25). This is not marginal — it is the difference between zero and non-zero simulation capability.
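
The gate behaves as a hard threshold rather than a soft weighting. The sketch below illustrates that reading of the table; the band labels and the function names are assumptions made for illustration, and only the 0.10, 0.25 and 0.50 boundaries come from the table.

```python
SIMULATION_GATE = 0.25  # gate value taken from the table above


def confidence_band(conf: float) -> str:
    """Map a confidence value to the bands in the table (labels are illustrative)."""
    if conf < 0.10:
        return "no external correlate"
    if conf < SIMULATION_GATE:
        return "tentative"
    if conf <= 0.50:
        return "active"
    return "above observed range"


def can_simulate(conf: float, gate: float = SIMULATION_GATE) -> bool:
    """Assumed rule: simulation output is produced only at or above the gate."""
    return conf >= gate


for label, conf in [("local-only", 0.205), ("federated", 0.298)]:
    print(label, confidence_band(conf), can_simulate(conf))
```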

15. Ablation Studies

A. Sparsity Sweep

Drop %	Local conf	Federated conf	Fed advantage
0%	0.154	0.361	+0.207
20%	0.141	0.365	+0.224
40%	0.116	0.326	+0.210
60%	0.137	0.226	+0.089

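At 60% data drop, federated confidence (0.226) still exceeds local confidence at 0% drop (0.154), so in confidence terms federation more than offsets the loss of 60% of observations. At that drop rate the federated value is nevertheless below the 0.25 simulation gate.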

B. Federation Size

Peers	Conf @ end	Marginal gain
0	0.152	—
1	0.342–0.346	+0.19
2	0.360	+0.014 to +0.018

Concave benefit curve. First peer dominates.

C. Buffer Size (Annual Data)

No effect at 34 annual observations — buffer is never full. See §11 for high-frequency results.

D. Peer Specificity

All pairs: +0.15 to +0.20. No dominant pair. Across the pairs tested, the federation benefit shows no dependence on geographic or structural similarity.

E. Lifecycle Management Ablation

Configuration	avg_conf	n_active	can_simulate	n_dead
Standard (conf≥0.25, min_ev=5)	0.390	5	YES	93
No lifecycle (conf≥0.0, min_ev=1)	0.121	19	NO	89
Tight (conf≥0.5, min_ev=15)	0.375	1	YES (1 country)	68

Lifecycle management is essential: without it, conf=0.121 (below gate, no simulation). Too tight (conf≥0.5) leaves 2/3 countries with zero active hypotheses at N=34.
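
The effect of the two lifecycle thresholds can be illustrated with a toy pool. This is a sketch under stated assumptions: the pool values are made up for illustration, the filter rule (keep a hypothesis when both its confidence and its evidence count meet the thresholds) is an assumed reading of the configuration labels, and `apply_lifecycle` is a hypothetical name.

```python
from statistics import mean


def apply_lifecycle(pool, min_conf, min_evidence, gate=0.25):
    """Filter (confidence, evidence_count) pairs and summarise the surviving set."""
    active = [c for c, ev in pool if c >= min_conf and ev >= min_evidence]
    avg_conf = mean(active) if active else 0.0
    return {
        "n_active": len(active),
        "avg_conf": avg_conf,
        # Assumed rule: simulation needs a non-empty set whose average clears the gate.
        "can_simulate": bool(active) and avg_conf >= gate,
    }


# Illustrative pool only; these are not the benchmark's hypotheses.
POOL = [(0.43, 12), (0.41, 11), (0.39, 9), (0.20, 6), (0.12, 3), (0.05, 1), (0.02, 1)]

standard = apply_lifecycle(POOL, min_conf=0.25, min_evidence=5)
no_lifecycle = apply_lifecycle(POOL, min_conf=0.0, min_evidence=1)
tight = apply_lifecycle(POOL, min_conf=0.5, min_evidence=15)
for name, result in [("standard", standard), ("none", no_lifecycle), ("tight", tight)]:
    print(name, result)
```

In this toy pool the unfiltered average is dragged below the gate by low-confidence hypotheses, and the tight setting leaves nothing active, which mirrors the qualitative pattern in the table.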

F. Confidence Gate Sensitivity

At gate=0.25, the top half of the hypothesis pool qualifies (44–50%). Avg confidence of eligible hypotheses: 0.384.

G. Federation Mechanism Ablation

Mechanism	avg_conf	Fraction of centralised gain captured
Isolated	0.390	—
Evidence sharing	0.455	58%
Pooled centralised	0.503	100%

Evidence sharing captures about 58% of the centralised advantage, (0.455 − 0.390) / (0.503 − 0.390) ≈ 0.58, without requiring data pooling.

H. Peer Count Ablation (Uganda focus)

Variant	avg_conf	n_active
No peers	0.473	4
+KEN	0.506	7
+TZA	0.542	3
+KEN & TZA	0.530	5

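The first peer gives the largest gain (+0.033 with KEN, +0.069 with TZA). Adding the second peer is marginal: +0.024 over +KEN alone and −0.012 relative to +TZA alone. Returns are concave, consistent with ablation B.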

16. What Is Being Learned

Scarcity discovers 15 relationship types across variable pairs, organised into 4 families; the table names 14 of them:

Family	Types	What is learned
Temporal	TEMPORAL, EQUILIBRIUM	Persistence: Y_t ~ f(Y_{t-1}); mean-reversion
Directional	CAUSAL, CORRELATIONAL	A→B or A↔B (no guaranteed causal direction without do-calculus)
Compositional	COMPETITIVE, SYNERGISTIC, MEDIATING, MODERATING	Trade-offs, amplification, mediation pathways
Structural	STRUCTURAL, FUNCTIONAL, PROBABILISTIC, GRAPH, SIMILARITY, LOGICAL	Deep distributional/logical relationships

What is confirmed at annual frequency (N≤34):

All 5 final active hypotheses are TEMPORAL type. Examples from Kenya:

Discovered relationship	Type	conf	Economic interpretation
inflation_cpi_t ~ 0.75·inflation_cpi_{t-1}	TEMPORAL	0.43	Autoregressive inflation persistence (AR coefficient = 0.75)
gdp_growth_t ~ f(gdp_growth_{t-1})	TEMPORAL	0.41	GDP growth mean-reversion (characteristic of emerging markets)
unemployment_t ~ f(unemployment_{t-1})	TEMPORAL	0.39	Labour market inertia (hysteresis effect)

Why only TEMPORAL at N=34:

CAUSAL, MEDIATING, SYNERGISTIC, COMPETITIVE types require sustained cross-variable evidence: the Bayesian accumulator needs N >> 34 consistent observations of the multi-variable pattern to cross the 0.25 confidence gate. At annual frequency, 34 years is sufficient to confirm univariate persistence (TEMPORAL) but not directional cross-variable structure (CAUSAL). CAUSAL and MEDIATING types remain TENTATIVE throughout (conf = 0.125–0.25), accumulating evidence but never crossing the promotion threshold.

Implication: The simulation in §13 propagates shocks through TEMPORAL hypotheses (autoregressive persistence) rather than explicit CAUSAL edges. This is observationally coherent but is not an SCM counterfactual graph. Explicitly causal edges require either higher-frequency data (daily: N>>365) or longer time series (N>>50 annual).
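
To make the distinction concrete, the sketch below propagates a shock through a single TEMPORAL (AR(1)) hypothesis. It assumes the shock is a deviation from baseline that decays geometrically with the AR coefficient; the coefficient 0.75 is the inflation persistence listed above, and the +5 pp shock size and 5-step horizon are taken from §13. The function name is hypothetical and the internals of PolicySimulator are not described here, so this is an illustration of autoregressive propagation, not the engine's output.

```python
def propagate_ar1_shock(shock, phi, steps):
    """Deviation from baseline at steps 0..steps under y_t = phi * y_{t-1}."""
    path = [shock]
    for _ in range(steps):
        path.append(phi * path[-1])
    return path


path = propagate_ar1_shock(shock=5.0, phi=0.75, steps=5)
for step, deviation in enumerate(path):
    print(f"t+{step}: {deviation:+.2f} pp")
```

Under this assumption the deviation after k steps is 5 × 0.75^k, so it shrinks every year and never changes sign. A purely autoregressive path like this carries persistence only; a response in a different variable would require a cross-variable edge.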

17. Error Analysis — Hardest Indicators

Indicator	Country	MAE(mean)	MAE(AR1)	Difficulty
real_interest_rate	Uganda	1.206	2.755	2.28
exports_gdp	Uganda	1.778	3.158	1.78
govt_consumption	Tanzania	1.719	2.217	1.29
private_credit	Kenya	1.673	2.157	1.29
school_enrollment	Uganda	1.960	2.467	1.26

Difficulty = MAE(AR1) / MAE(mean); a value above 1 means AR1 is worse than predicting the mean. Real interest rate and exports are hardest: structural shocks (2008 GFC, COVID, CBK policy shifts) invalidate AR(1). These are the indicators where cross-variable causal structure would be expected to add most value.
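
The Difficulty column is consistent with MAE(AR1) divided by MAE(mean), for example 2.755 / 1.206 ≈ 2.28. The sketch below shows one way such a ratio could be computed for a single indicator. It assumes one-step-ahead forecasts on an expanding window with an OLS-fitted AR(1); the benchmark's exact evaluation protocol is not stated here, so the windowing and the function names are assumptions.

```python
from statistics import mean


def fit_ar1(series):
    """OLS fit of y_t = a + b * y_{t-1}; returns (a, b)."""
    x, y = series[:-1], series[1:]
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:  # constant history: fall back to predicting the mean
        return my, 0.0
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b


def difficulty(series, min_train=5):
    """MAE(AR1) / MAE(mean), both one step ahead on an expanding window."""
    err_ar1, err_mean = [], []
    for t in range(min_train, len(series)):
        history = series[:t]
        a, b = fit_ar1(history)
        err_ar1.append(abs(series[t] - (a + b * history[-1])))
        err_mean.append(abs(series[t] - mean(history)))
    return mean(err_ar1) / mean(err_mean)


trend = [float(i) for i in range(20)]  # smooth trend, easy for AR(1)
spiky = [1.0, 1.2, 0.9, 1.1, 1.0, 1.1, 0.9, 6.0, 1.0, 1.2, 0.8, -3.0, 1.1, 0.9]
print(f"trend: {difficulty(trend):.2f}")
print(f"spiky: {difficulty(spiky):.2f}")
```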

18. Temporal Instability — Regime Transfer Test

Method: Train AR1 fixed on 1990–2007 (pre-GFC). Evaluate on 2008–2023 (post-GFC + COVID regime). Compare to AR1 rolling (retrained up to each test year). Also track Scarcity discovery quality across the split.

Method	Train period	Test period (2008–2023) MAE	Change
AR1-rolling (standard)	All years < T	0.882	baseline
AR1-fixed	1990–2007 only	0.920	+4.3% worse
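
The fixed-versus-rolling comparison in the method can be sketched as follows. The code is a minimal illustration under assumptions: AR(1) is fitted by OLS, the rolling variant refits on all years before each test year, and the toy series with a level shift in 2008 is invented for the example. It is not the benchmark's data or code.

```python
from statistics import mean


def fit_ar1(series):
    """OLS fit of y_t = a + b * y_{t-1}; returns (a, b)."""
    x, y = series[:-1], series[1:]
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = 0.0 if sxx == 0 else sum(
        (xi - mx) * (yi - my) for xi, yi in zip(x, y)
    ) / sxx
    return my - b * mx, b


def regime_transfer(values, years, split_year):
    """Return (mae_fixed, mae_rolling) over the years >= split_year."""
    first_test = years.index(split_year)
    a_fix, b_fix = fit_ar1(values[:first_test])  # frozen on the pre-split regime
    err_fixed, err_rolling = [], []
    for t in range(first_test, len(values)):
        prev, actual = values[t - 1], values[t]
        a_roll, b_roll = fit_ar1(values[:t])  # refit on every year before t
        err_fixed.append(abs(actual - (a_fix + b_fix * prev)))
        err_rolling.append(abs(actual - (a_roll + b_roll * prev)))
    return mean(err_fixed), mean(err_rolling)


years = list(range(1990, 2024))
# Toy series: slow trend, a level shift from 2008, and a small deterministic wiggle.
values = [
    5.0 + 0.2 * i + (1.5 if y >= 2008 else 0.0) + 0.5 * ((i * 7) % 5 - 2)
    for i, y in enumerate(years)
]
mae_fixed, mae_rolling = regime_transfer(values, years, 2008)
print(f"fixed: {mae_fixed:.3f}  rolling: {mae_rolling:.3f}")
```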

Scarcity discovery quality:

Period	conf@end	n_active
1990–2007 only (pre-crisis)	0.099	0 (below gate)
1990–2023 full stream	0.164	3
Gain from post-2008 data	+66.5%	—

Findings:

AR1 regime degradation: Frozen pre-2008 parameters degrade 4.3% on post-2008 data (on smooth synthetic data). On real data with GFC and COVID structural breaks, this gap is expected to be larger. This is the known non-stationarity problem in macroeconomic forecasting: AR(1) parameters fitted on one regime can be badly miscalibrated in another.
Scarcity prediction is immune to parameter staleness: Scarcity's lag-1 prediction (use the last observed value) does not rely on fitted parameters, so it is regime-agnostic by construction. There is no AR1 slope to become stale, and a regime change therefore adds no parameter-driven degradation to prediction MAE.
Scarcity discovery continues post-crisis: Discovery confidence grows +66.5% as the engine continues to observe post-2008 data. Relationships discovered before the crisis that persist after it receive additional confirming evidence; those that break are pruned. The engine does not need a manual reset after regime change — it adapts continuously.
Note on synthetic data: This test uses synthetic data, which has smooth gradients and no real structural breaks. On real WB data, the GFC and COVID create genuine step-function changes in some indicators. The 4.3% AR1 degradation is therefore best read as a lower-bound estimate of the real-data regime transfer cost. Re-running with --live is recommended.

19. Federation Mechanism — Evidence Sharing vs Parameter Averaging

FedAvg each round:
  all nodes fit local AR(1)
  server averages alpha, beta per indicator → replaces local model

Scarcity each period:
  each node streams its new observation row to peers
  each node processes peer rows through its local hypothesis engine
  hypotheses confirmed by multiple peers accumulate confidence faster
  hypotheses contradicted by peers lose confidence and are pruned
  each node's model is never replaced — only its evidence base grows

FedAvg assumes all nodes learn the same function. Scarcity assumes nodes share structural patterns but may differ in magnitudes, lags, and regimes. Evidence sharing lets each node confirm or deny peer patterns without having peer parameters imposed on it.
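
The contrast between the two mechanisms can be sketched in a few lines. Everything below is illustrative: the `Hypothesis` class, its sign-agreement test and its confidence formula are assumptions made for the example, not Scarcity's hypothesis engine, and `fedavg_round` uses an unweighted mean, as in the outline above. The point is structural: the first function overwrites local parameters, the second only adds evidence.

```python
from dataclasses import dataclass
from statistics import mean


def fedavg_round(local_params):
    """Average (alpha, beta) across nodes and overwrite every local model."""
    alpha = mean([a for a, _ in local_params.values()])
    beta = mean([b for _, b in local_params.values()])
    return {node: (alpha, beta) for node in local_params}


@dataclass
class Hypothesis:
    """Hypothetical stand-in: 'y moves in the same direction as x'."""
    confirmations: int = 0
    contradictions: int = 0
    prior_weight: int = 4  # assumed pseudo-count that keeps early confidence low

    def observe(self, dx, dy):
        """Score one observed row of changes against the hypothesis."""
        if dx * dy > 0:
            self.confirmations += 1
        elif dx * dy < 0:
            self.contradictions += 1

    @property
    def confidence(self):
        total = self.confirmations + self.contradictions + self.prior_weight
        return max(0, self.confirmations - self.contradictions) / total


def evidence_sharing_period(hypotheses, rows):
    """Each node scores every row (its own and its peers') with its own hypothesis."""
    for hyp in hypotheses.values():
        for dx, dy in rows.values():
            hyp.observe(dx, dy)


params = {"KEN": (0.2, 0.7), "TZA": (0.4, 0.5), "UGA": (0.6, 0.9)}
averaged = fedavg_round(params)  # every node now holds the same (alpha, beta)

hypotheses = {node: Hypothesis() for node in params}
rows = {"KEN": (1.0, 0.5), "TZA": (0.3, 0.2), "UGA": (-0.4, 0.1)}
evidence_sharing_period(hypotheses, rows)
print(averaged["KEN"], hypotheses["KEN"].confidence)
```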

Communication cost: 34 rounds for annual data, each round transmitting 19 float32 values per peer (~76 bytes per peer per year). Total per node: 76 × 2 peers × 34 years = 5.2 KB — negligible even for constrained edge deployments.
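
The arithmetic behind the communication estimate is short enough to check directly. In this sketch the function name is illustrative, and the payload is assumed to be raw float32 values with no framing or protocol overhead.

```python
import struct

FLOAT32_BYTES = struct.calcsize("<f")  # size of one float32 in bytes


def comm_cost_bytes(n_values, n_peers, n_rounds):
    """Total payload exchanged by one node across all rounds, excluding overhead."""
    return n_values * FLOAT32_BYTES * n_peers * n_rounds


per_peer_per_year = 19 * FLOAT32_BYTES
total = comm_cost_bytes(n_values=19, n_peers=2, n_rounds=34)
print(f"{per_peer_per_year} bytes per peer per year")
print(f"{total} bytes = {total / 1000:.1f} kB per node")
```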

20. Scenario Experiments

Local vs Federated (all 3 countries, full timeline)

Country	Scenario	Conf @ 2023	Active Hyp
Kenya	local	0.147	63
Kenya	federated	0.343	52
Tanzania	local	0.153	63
Tanzania	federated	0.354	53
Uganda	local	0.153	63
Uganda	federated	0.354	53

Federated: 2.3× higher confidence, tighter hypothesis set (52–53 vs 63 active).

Late Joiner (Uganda joins 10 years after KEN+TZA)

Variant	Conf @ 2023
Cold start	0.120
Warm start	0.267

Warm-start: 2.2× higher. Consistent with Ethiopia result (§10).