Healthcare cloud data platform

Kafka · ADLS Gen2 · Delta Lake · Spark · Druid · Superset · Azure

Motivation

The team needed one trusted platform to combine wearable streams and operational batch feeds without splitting analytics across separate systems.

Thinking model

  • Unify streaming and batch ingestion before optimizing downstream models.
  • Treat data governance as part of architecture, not post-processing.
  • Design for operational simplicity so product and analytics teams can move quickly.

Architecture

Ingest

Wearable + app events
Kafka ingestion

Storage

ADLS Gen2 + Delta Lake

Process

Spark transforms

Serve

Druid / Superset analytics

Ops

Azure Monitor + Log Analytics

Flow edges

  • event streams: Wearable + app events → Kafka ingestion
  • raw landing: Kafka ingestion → ADLS Gen2 + Delta Lake
  • curation: ADLS Gen2 + Delta Lake → Spark transforms
  • serving models: Spark transforms → Druid / Superset analytics
  • health signals: Spark transforms → Azure Monitor + Log Analytics
  • Streaming and batch sources are normalized into one platform contract.
  • Processing boundaries are explicit, so downstream ownership stays clear.
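The flow edges above can be sketched as a small adjacency map, which makes the "explicit processing boundaries" point concrete: every stage's downstream consumers are declared in one place. Stage names are paraphrased from the diagram and the structure is illustrative, not the platform's actual registry.

```python
# Flow edges from the architecture diagram, encoded as an adjacency map.
# Stage names mirror the diagram; the registry itself is illustrative.
FLOW_EDGES = {
    "wearable_app_events": ["kafka_ingestion"],               # event streams
    "kafka_ingestion": ["adls_delta_lake"],                   # raw landing
    "adls_delta_lake": ["spark_transforms"],                  # curation
    "spark_transforms": ["druid_superset", "azure_monitor"],  # serving + health
}

def downstream_of(stage: str) -> list[str]:
    """Return the explicitly declared downstream owners of a stage."""
    return FLOW_EDGES.get(stage, [])
```

Keeping the edges in one declared structure means a new consumer has to be added here first, so ownership of each boundary stays visible.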

Build

Core components

  • Established ingestion contracts for mixed source cadence (real-time and scheduled).
  • Implemented Spark transformation layers on Delta-backed storage.
  • Aligned application platform deployments across Azure App Service, Static Web Apps, Functions, and PostgreSQL.
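An ingestion contract of the kind described above can be sketched as a minimal field/type check applied to every incoming event, whether it arrives in real time or via a scheduled batch. The field names and types here are hypothetical placeholders, not the platform's actual schema.

```python
# Minimal sketch of an ingestion contract check. Field names and types
# are assumptions for illustration, not the platform's real schema.
CONTRACT = {
    "device_id": str,
    "event_type": str,
    "event_ts": float,  # epoch seconds
}

def contract_violations(event: dict) -> list[str]:
    """Return a list of contract violations for one incoming event."""
    errors = []
    for field, expected in CONTRACT.items():
        if field not in event:
            errors.append(f"missing field: {field}")
        elif not isinstance(event[field], expected):
            errors.append(f"bad type for {field}: {type(event[field]).__name__}")
    return errors
```

Running the same check on both streaming and batch sources is what lets mixed-cadence feeds land under one platform contract.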

Quality controls

  • Schema and contract validation before curated table promotion.
  • Data publishing gated by freshness and completeness checks.

Observability

  • Pipeline and service alerts routed through Azure Monitor and Log Analytics.
  • Failure triage runbooks shared across data and app teams.

Outcomes

Platform scope

One shared platform for streaming + batch analytics delivery.

Delivery alignment

Cross-functional standards coordinated across a 4-engineer team.

Decision speed

Real-time analytics access for product and operations reporting.

Tradeoffs

  • Prioritized platform consistency over quick, one-off source-specific pipelines.
  • Accepted stricter ingestion contracts to reduce long-term downstream model drift.

Confidentiality note

  • Domain entities and internal naming are abstracted; architecture patterns are preserved.

Work with me

Building a data platform like this?

I work with teams building data systems that need to be reliable, governed, and fast to iterate.

Start a project