Healthcare cloud data platform

Kafka · ADLS Gen2 · Delta Lake · Spark · Druid · Superset · Azure

Motivation

The team needed one trusted platform to combine wearable streams and operational batch feeds without splitting analytics across separate systems.
Thinking model
- Unify streaming and batch ingestion before optimizing downstream models.
- Treat data governance as part of architecture, not post-processing.
- Design for operational simplicity so product and analytics teams can move quickly.
Architecture
- Ingest: Wearable + app events → Kafka ingestion
- Storage: ADLS Gen2 + Delta Lake
- Process: Spark transforms
- Serve: Druid / Superset analytics
- Ops: Azure Monitor + Log Analytics
Flow edges
- event streams: Wearable + app events → Kafka ingestion
- raw landing: Kafka ingestion → ADLS Gen2 + Delta Lake
- curation: ADLS Gen2 + Delta Lake → Spark transforms
- serving models: Spark transforms → Druid / Superset analytics
- health signals: Spark transforms → Azure Monitor + Log Analytics
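The flow edges above form a small directed graph, which can be sketched as an adjacency map and sanity-checked for reachability. This is an illustrative sketch, not project code; stage names match the diagram, and `reachable` is a hypothetical helper.

```python
# The flow edges above as an adjacency map. Useful for checking that every
# serving and ops stage is reachable from the ingestion entry point.
FLOW_EDGES = {
    "Wearable + app events": ["Kafka ingestion"],
    "Kafka ingestion": ["ADLS Gen2 + Delta Lake"],
    "ADLS Gen2 + Delta Lake": ["Spark transforms"],
    "Spark transforms": ["Druid / Superset analytics",
                         "Azure Monitor + Log Analytics"],
}

def reachable(start: str, edges: dict[str, list[str]]) -> set[str]:
    """Walk the flow graph and return every stage reachable from `start`."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(edges.get(node, []))
    return seen
```

A check like `reachable("Wearable + app events", FLOW_EDGES)` should cover all six stages, confirming no stage is orphaned from the ingestion path.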
- Streaming and batch sources are normalized into one platform contract.
- Processing boundaries are explicit, so downstream ownership stays clear.
Build
Core components
- Established ingestion contracts for mixed source cadence (real-time and scheduled).
- Implemented Spark transformation layers on Delta-backed storage.
- Aligned application platform deployments across Azure App Service, Static Web Apps, Functions, and PostgreSQL.
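The "one platform contract" for mixed cadence can be pictured as a single envelope that both the Kafka stream and scheduled batch extracts are normalized into before landing in Delta-backed storage. The field names and helper functions below are hypothetical, chosen to illustrate the pattern rather than the project's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class IngestEnvelope:
    """One shared landing shape for real-time and scheduled sources."""
    source: str           # e.g. "wearable-stream" or "ops-batch"
    cadence: str          # "realtime" | "scheduled"
    event_time: datetime  # when the event occurred at the source
    payload: dict         # source-specific body, validated downstream

def from_stream(record: dict) -> IngestEnvelope:
    """Wrap a real-time record arriving via Kafka (epoch-seconds timestamp)."""
    return IngestEnvelope(
        source=record["source"],
        cadence="realtime",
        event_time=datetime.fromtimestamp(record["ts"], tz=timezone.utc),
        payload=record["body"],
    )

def from_batch(row: dict, batch_source: str) -> IngestEnvelope:
    """Wrap a row from a scheduled batch extract (ISO-8601 timestamp)."""
    return IngestEnvelope(
        source=batch_source,
        cadence="scheduled",
        event_time=datetime.fromisoformat(row["event_time"]),
        payload={k: v for k, v in row.items() if k != "event_time"},
    )
```

Because every downstream transform sees the same envelope, Spark jobs and curated tables never need to branch on where a record came from.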
Quality controls
- Schema and contract validation before curated table promotion.
- Data publishing gated by freshness and completeness checks.
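A promotion gate combining these checks might look like the following sketch. The column set and staleness threshold are illustrative assumptions, not the project's real values (per the confidentiality note, domain fields are abstracted).

```python
from datetime import datetime, timedelta, timezone

# Hypothetical promotion gate: a curated table is only published when the
# batch is both complete and fresh. Thresholds and columns are illustrative.
REQUIRED_COLUMNS = {"subject_ref", "metric", "value", "event_time"}
MAX_STALENESS = timedelta(hours=1)

def passes_gate(rows: list[dict], now: datetime) -> bool:
    """Return True only if the batch clears completeness and freshness checks."""
    if not rows:
        return False
    # Completeness: every row carries every required column, non-null.
    complete = all(
        all(row.get(col) is not None for col in REQUIRED_COLUMNS)
        for row in rows
    )
    # Freshness: the newest event must fall inside the staleness window.
    newest = max(row["event_time"] for row in rows)
    fresh = (now - newest) <= MAX_STALENESS
    return complete and fresh
```

Gating publication this way means a stale or partially populated batch simply fails promotion instead of silently skewing downstream models.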
Observability
- Pipeline and service alerts routed through Azure Monitor and Log Analytics.
- Failure triage runbooks shared across data and app teams.
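The shared-runbook convention can be sketched as a small routing table: each alert category maps to an owning team so failures land with data or app engineers directly. Categories, team names, and the `triage` helper are all hypothetical.

```python
# Hypothetical triage routing: alerts arriving from Azure Monitor carry a
# category, and the runbook owner is derived from it. Names are illustrative.
RUNBOOK_OWNERS = {
    "pipeline-failure": "data-team",
    "data-quality": "data-team",
    "service-error": "app-team",
    "deploy-failure": "app-team",
}

def triage(alert: dict) -> dict:
    """Attach an owning team and an escalation flag to a raw alert payload."""
    owner = RUNBOOK_OWNERS.get(alert.get("category"), "data-team")
    escalate = alert.get("severity", "info") in {"critical", "error"}
    return {**alert, "owner": owner, "escalate": escalate}
```

Keeping the routing table in one place makes the ownership boundaries from the architecture section explicit in the on-call tooling as well.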
Outcomes
Platform scope
One shared platform for streaming + batch analytics delivery.
Delivery alignment
Cross-functional standards coordinated across a 4-engineer team.
Decision speed
Real-time analytics access for product and operations reporting.
Tradeoffs
- Prioritized platform consistency over quick, one-off source-specific pipelines.
- Accepted stricter ingestion contracts to reduce long-term downstream model drift.
Confidentiality note
- Domain entities and internal naming are abstracted; architecture patterns are preserved.
Work with me
Building a data platform like this?
I work with teams building data systems that need to be reliable, governed, and fast to iterate.
Start a project