Python Applied Portfolio Trilogy
All three Python Applied portfolio projects — provider-variance analytics, loss-ratio + reserve-adequacy, parcel-network postcode-adjusted KPIs. Three written findings docs reviewed by the same instructor. Save £78 vs buying standalones individually.
Projects in this bundle
HealthPY Applied — Provider Variance Analytics
## The scenario

You have been promoted onto the **CubedNet Health analytics team** and asked to investigate why one of the three trusts (the South East acute) consistently runs ~14% above tariff on day-case ophthalmology spells over a rolling 12-month window. Finance wants to know whether this is a coding drift problem, a casemix problem, or a process problem before they raise it at the next contract review.

You receive twelve monthly SUS+ extracts, the rolling tariff history (some HRGs were re-priced mid-year), a provider reference table, and an audit-flag log from the Core report's outputs.

## Deliverables

A pandas-based analysis (`healthpy_applied.py` orchestrating notebooks or scripts) that produces:

1. `monthly_variance.csv`: per-provider, per-HRG-chapter monthly spend vs expected-on-current-tariff, with rolling 3-month and 12-month deltas.
2. `casemix_drift.csv`: month-on-month HRG distribution shift per provider (Wasserstein distance or KL-divergence is fine; justify your choice).
3. `coding_anomaly_flags.csv`: spells where the HRG looks inconsistent with the diagnosis/procedure pair (build a simple rule library or a frequency-based outlier rule).
4. `findings.md`: a 400-600 word note your manager could forward to Finance: where the variance is coming from, what's noise vs signal, and one recommended next step.

## Acceptance criteria (summary)

- pandas/numpy allowed
- charts saved to `./output/figures/`
- findings.md present and substantive
- rolling windows computed correctly
- tariff history respected (use the rate in force on the discharge date, not a single snapshot)
- reproducible: `python healthpy_applied.py` regenerates everything
- ≥10 conventional commits

Full brief, dataset orientation, and starter notebook appear inside the lesson once enrolled.
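The tariff-history criterion (price each spell at the rate in force on its discharge date) is the one most easily got wrong, so here is a minimal sketch of one way to do it with `pandas.merge_asof`. The column names (`spell_id`, `hrg`, `discharge_date`, `effective_from`, `tariff_gbp`), the HRG labels, and the prices are all illustrative assumptions; the real extract schema is only available in the full brief.

```python
import pandas as pd

# Hypothetical spell-level extract; names and values are made up for illustration.
spells = pd.DataFrame({
    "spell_id": [1, 2, 3],
    "hrg": ["HRG_A", "HRG_A", "HRG_B"],
    "discharge_date": pd.to_datetime(["2024-05-10", "2024-10-02", "2024-10-02"]),
})

# Hypothetical tariff history: one row per HRG per price change.
# HRG_A is re-priced mid-year, HRG_B is not.
tariff = pd.DataFrame({
    "hrg": ["HRG_A", "HRG_A", "HRG_B"],
    "effective_from": pd.to_datetime(["2024-04-01", "2024-09-01", "2024-04-01"]),
    "tariff_gbp": [800.0, 850.0, 1200.0],
})

# merge_asof needs both frames sorted on their time key.
# direction="backward" picks the latest effective_from that is on or
# before the discharge date, matched within each HRG via by="hrg".
priced = pd.merge_asof(
    spells.sort_values("discharge_date"),
    tariff.sort_values("effective_from"),
    left_on="discharge_date",
    right_on="effective_from",
    by="hrg",
    direction="backward",
)

# Spells discharged before the first known tariff row come back as NaN;
# surface them rather than silently dropping them.
unpriced = priced[priced["tariff_gbp"].isna()]

print(priced[["spell_id", "hrg", "discharge_date", "tariff_gbp"]])
```

In this toy data, spell 1 is priced at the pre-September rate and spell 2 at the re-priced rate, which is the behaviour a single tariff snapshot would miss.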
InsurancePY Applied — Loss Ratio & Reserve Adequacy
## The scenario

CubedNet Insurance's Q3 board pack flagged that the motor book's **loss ratio drifted from 62% to 71% over twelve months**, with broker-channel business contributing disproportionately. You have been asked to (a) decompose the loss-ratio movement by channel × cover type × incident category, (b) compare the case reserve set on day one with the eventual paid amount, and (c) identify any FNOL cohort where the reserve adequacy ratio is materially off.

You receive twelve months of FNOLs, payment transactions (initial + supplementary), reserve history (case reserve revisions), policy/channel reference, and the cover-rules history.

## Deliverables

1. `loss_ratio_decomposition.csv`: month × channel × cover × incident, with earned premium, paid + outstanding loss, and the loss ratio. Aggregate up the dimension tree (channel total, cover total, grand total) for sense-checking.
2. `reserve_adequacy.csv`: for FNOLs at least 9 months old: initial case reserve, ultimate paid, ratio, plus a triangle of paid-to-date by development month.
3. `cohort_alerts.csv`: cohorts whose reserve adequacy ratio is more than 1.5σ from the book mean. Require enough exposure per cohort (≥30 FNOLs) to avoid noise.
4. `findings.md`: 500-700 words explaining what's actually driving the drift and which assumptions you'd push back on.

## Acceptance criteria (summary)

- pandas/numpy allowed
- earned-premium calculation is time-weighted (a policy incepting mid-month earns ½ of that month's premium)
- supplementary payments handled
- ≥3 charts checked in
- findings.md substantive and specific
- reproducible
- ≥10 conventional commits

Full brief, dataset orientation, and starter notebook appear inside the lesson once enrolled.
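The time-weighted earned-premium criterion can be illustrated with a small pro-rata calculation. This is a sketch under stated assumptions, not the brief's prescribed method: the function name `earned_premium_for_month` is hypothetical, premium is assumed to earn evenly per day, and `cover_end` is treated as exclusive.

```python
import pandas as pd

def earned_premium_for_month(monthly_premium, cover_start, cover_end, month):
    """Pro-rata earned premium for one calendar month, by days on risk.

    Assumptions (illustrative only): premium earns evenly per day and
    cover_end is exclusive (the first day the policy is off risk).
    """
    period = pd.Period(month, freq="M")
    month_start = period.start_time
    next_month_start = (period + 1).start_time
    days_in_month = (next_month_start - month_start).days

    # Clip the cover window to the calendar month.
    start = max(pd.Timestamp(cover_start), month_start)
    end = min(pd.Timestamp(cover_end), next_month_start)

    # A negative overlap means the policy was not on risk this month.
    days_on_risk = max((end - start).days, 0)
    return monthly_premium * days_on_risk / days_in_month

# A policy incepting on 16 June is on risk for 15 of June's 30 days,
# so it earns half of that month's premium.
june = earned_premium_for_month(60.0, "2024-06-16", "2025-06-16", "2024-06")
may = earned_premium_for_month(60.0, "2024-06-16", "2025-06-16", "2024-05")
july = earned_premium_for_month(60.0, "2024-06-16", "2025-06-16", "2024-07")
print(may, june, july)
```

For a full book this would normally be vectorised over a policy × month grid, but the per-policy version makes the half-month case easy to check by hand.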
LogisticsPY Applied — Network Performance & Route Diagnostics
## The scenario

Network Performance at CubedNet Logistics suspects that two of the eight depots are running below the network-average first-attempt success rate by **more than the spread you'd expect from postcode mix alone**. Operations want this confirmed or ruled out before they reorganise the depot management structure.

You receive twelve months of attempt-level data, depot and route reference tables, postcode-area difficulty scores (a propensity-to-fail score from the operations team), public-holiday calendars, and the rolling SLA-rule history.

## Deliverables

1. `route_kpi_monthly.csv`: per-route monthly stats (first-attempt success rate, breach rate, average attempts-to-deliver, compensation paid).
2. `depot_postcode_adjusted.csv`: depot-level performance with and without postcode-mix adjustment (use direct standardisation against the network postcode distribution).
3. `route_anomalies.csv`: routes whose adjusted first-attempt-success-rate is materially below the network adjusted rate (define your threshold, justify it).
4. `findings.md`: 500-700 words: are the suspect depots genuinely underperforming, or is the spread inside what postcode mix explains? Include one concrete operational recommendation.

## Acceptance criteria (summary)

- pandas/numpy allowed
- timezone handling correct (DST transition weeks appear in the dataset)
- holiday weeks treated explicitly
- direct-standardisation arithmetic correct
- ≥3 charts
- findings.md substantive
- reproducible
- ≥10 conventional commits

Full brief, dataset orientation, and starter notebook appear inside the lesson once enrolled.
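Direct standardisation re-weights each depot's stratum-specific success rates by the network's postcode distribution, so depots are compared as if they all served the same mix. The sketch below uses two made-up depots and two made-up difficulty bands; the column names and counts are assumptions for illustration and do not reflect the course dataset.

```python
import pandas as pd

# Hypothetical first-attempt counts per depot and postcode-difficulty band.
attempts = pd.DataFrame({
    "depot": ["A", "A", "B", "B"],
    "band": ["easy", "hard", "easy", "hard"],
    "first_attempts": [800, 200, 200, 800],
    "successes": [720, 140, 180, 560],
})

# Crude rate: total successes over total attempts, ignoring postcode mix.
totals = attempts.groupby("depot")[["successes", "first_attempts"]].sum()
crude_rate = totals["successes"] / totals["first_attempts"]

# Stratum-specific rate for each depot x band cell.
attempts["rate"] = attempts["successes"] / attempts["first_attempts"]

# Reference weights: each band's share of network-wide attempts.
band_volume = attempts.groupby("band")["first_attempts"].sum()
weights = band_volume / band_volume.sum()

# Adjusted rate: sum over bands of (depot's band rate x network band weight).
attempts["weighted_rate"] = attempts["rate"] * attempts["band"].map(weights)
adjusted_rate = attempts.groupby("depot")["weighted_rate"].sum()

print(pd.DataFrame({"crude": crude_rate, "adjusted": adjusted_rate}))
```

In this toy data the two depots have identical success rates within each band, so the crude gap between them comes entirely from postcode mix and vanishes after adjustment. With real data, a depot that has no attempts in some band would silently lose that band's weight in the sum above, so empty strata need explicit handling, for example by coarsening the bands.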