How I Built Stackr's Shape-Aware SaaS Analytics Engine
Stackr started with a simple question:
Can we take messy SaaS exports from real companies and turn them into useful product, adoption, and cost signals without forcing every customer into a rigid template?
That sounds straightforward until you look at real data.
The first version of the system assumed something many internal analytics tools assume:
- one row equals one event
- one column holds the feature or object of interest
- the rest of the file is just supporting context
That works for tidy audit logs. It breaks quickly when you start ingesting real exports from tools like Zoom, Microsoft 365, Teams, HubSpot, and other SaaS platforms.
Real exports are not uniform. Some are row-based activity streams. Some are wide tables with dozens of feature columns. Some are metric tables. Some are date-pivoted matrices. Some are semantically obvious to a human and still awkward for software to interpret safely.
So Stackr evolved from a single-column analyser into a shape-aware analytics system.
This post explains how I built that system, what I changed in both the backend and frontend, and what I learned while making it production-ready.
The Original Problem
The original ingestion path was too narrow.
It expected a user to upload a CSV, pick one column, and run analysis.
That led to three structural problems:
- Wide feature exports were interpreted incorrectly.
- Activity logs with multiple meaningful dimensions were flattened too early.
- The UI pushed users into decisions they should not have to make without context.
For example:
- a Zoom feature export may contain columns like `Screen sharing`, `Poll`, `Whiteboard`, `Reaction`
- a HubSpot activity export may contain `Category`, `Subcategory`, and `Action`
- a Microsoft 365 report may represent usage through numeric metric columns instead of explicit feature labels
These files should not be treated the same way.
The system needed to understand the shape of the dataset first, and only then decide how to analyse it.
The Core Product Decision
The most important architectural decision was this:
Do not start with feature analysis. Start with dataset interpretation.
That led to a shape-first model with four supported dataset families:
- `event_log`
- `wide_feature_matrix`
- `wide_usage_matrix`
- `date_matrix`
In product language, those became:
- Activity log
- Feature table export
- Usage table export
- Date-based usage export
That sounds like a naming detail, but it changed the entire product.
It allowed Stackr to stop pretending every file should be configured the same way.
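The four families and their product-facing names can be sketched as a simple enum plus a label map. This is a minimal illustration; the identifiers come from the post, but the class and variable names here are my own.

```python
from enum import Enum

class DatasetShape(Enum):
    """The four dataset families a file can resolve into."""
    EVENT_LOG = "event_log"
    WIDE_FEATURE_MATRIX = "wide_feature_matrix"
    WIDE_USAGE_MATRIX = "wide_usage_matrix"
    DATE_MATRIX = "date_matrix"

# Product-facing labels shown in the UI instead of internal shape names
PRODUCT_LABELS = {
    DatasetShape.EVENT_LOG: "Activity log",
    DatasetShape.WIDE_FEATURE_MATRIX: "Feature table export",
    DatasetShape.WIDE_USAGE_MATRIX: "Usage table export",
    DatasetShape.DATE_MATRIX: "Date-based usage export",
}
```

Keeping the internal name and the product label as two views of the same value is what lets the backend and the UI disagree about wording without disagreeing about meaning.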
Building The Backend
The backend had to do more than parse CSVs. It had to make structured decisions and preserve those decisions in a way the rest of the product could understand.
1. Dataset profiling
The first layer profiles the incoming file:
- column types
- email candidates
- date candidates
- boolean feature-flag columns
- numeric usage columns
- likely context fields
This is not the final answer. It is the evidence layer.
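A profiling layer like this can be sketched with plain pattern checks over a sample of each column. The regexes and category names below are illustrative assumptions, not Stackr's actual rules.

```python
import re

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}")
BOOL_VALUES = {"yes", "no", "true", "false", "y", "n"}

def profile_column(values):
    """Classify one column from a sample of its values (evidence only)."""
    sample = [v.strip() for v in values if v and v.strip()]
    if not sample:
        return "empty"
    if all(EMAIL_RE.match(v) for v in sample):
        return "email_candidate"
    if all(DATE_RE.match(v) for v in sample):
        return "date_candidate"
    if {v.lower() for v in sample} <= BOOL_VALUES:
        return "boolean_flag"
    if all(v.replace(".", "", 1).isdigit() for v in sample):
        return "numeric_usage"
    return "context"  # likely supporting context, not product behaviour

def profile_dataset(rows, header):
    """Profile every column from a sample of parsed CSV rows."""
    columns = {name: [row.get(name, "") for row in rows] for name in header}
    return {name: profile_column(vals) for name, vals in columns.items()}
```

The output is deliberately a per-column verdict rather than a file-level one: later layers weigh the evidence, this layer only collects it.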
2. Shape resolution
The next layer resolves the most likely dataset shape and decides whether:
- the file can move forward directly
- the file needs an adapter
- the file is too ambiguous to trust
That last point mattered a lot.
I wanted the system to fail closed, not generate confident-looking but incorrect analytics.
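A fail-closed resolver over the profile above might look like this. The thresholds and routing labels are illustrative; the only load-bearing idea is the final branch, which refuses to guess.

```python
def resolve_shape(profile):
    """Resolve the most likely dataset shape from a column profile.

    Returns (shape, route) where route is "direct", "adapter", or
    "needs_review". Thresholds are illustrative, not the real ones.
    """
    types = list(profile.values())
    n = len(types) or 1
    flag_share = types.count("boolean_flag") / n
    usage_share = types.count("numeric_usage") / n
    date_share = types.count("date_candidate") / n

    if flag_share >= 0.5:
        return ("wide_feature_matrix", "direct")
    if usage_share >= 0.5:
        return ("wide_usage_matrix", "adapter")
    if date_share >= 0.5:
        return ("date_matrix", "adapter")
    if "date_candidate" in types and flag_share < 0.2:
        return ("event_log", "direct")
    # Fail closed: an ambiguous file gets review, not a confident guess
    return (None, "needs_review")
```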
3. Source-aware overrides
Real exports from known products often carry stable signatures.
So I added a curated source-override layer for known families such as:
- Zoom meeting feature exports
- Microsoft Teams activity reports
- M365 usage files
This gave the system deterministic shortcuts where the shape alone was not enough.
4. Manual overrides
Even with shape detection and source-aware rules, there are always exceptions.
So I added a manual override system with clear precedence:
- manual override
- source override
- validated AI mapping
- generic heuristics
That kept the runtime deterministic while still allowing correction when a dataset family needed explicit handling.
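The precedence chain itself is the simplest part of the system, which is exactly why it stays deterministic. A sketch, with my own function name:

```python
def resolve_mapping(manual, source, ai_validated, heuristic):
    """Apply override precedence: the first non-None answer wins.

    Order: manual override > source override > validated AI mapping
    > generic heuristics.
    """
    for candidate in (manual, source, ai_validated, heuristic):
        if candidate is not None:
            return candidate
    return None
```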
5. AI-assisted mapping
AI is useful when used carefully.
I did not use AI to compute final analytics. I used it to help interpret awkward files and propose mappings, which are then validated by deterministic rules.
That distinction matters.
The AI can help answer:
- which field looks like the main feature dimension
- whether this wide file is usage-oriented or feature-oriented
- which columns behave like context versus product behaviour
But all final reshaping, counting, and metric generation remain code-driven.
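The validation gate between the AI proposal and the runtime can be sketched like this. The checks shown are assumptions about what "validated by deterministic rules" could mean, not the actual rule set.

```python
def validate_ai_mapping(proposal, header, profile):
    """Accept an AI-proposed mapping only if deterministic checks pass.

    Returns the proposal unchanged, or None to fall through to the
    next layer in the precedence chain.
    """
    main = proposal.get("main_field")
    # The proposed column must actually exist in the file
    if main not in header:
        return None
    # A main feature dimension should not be an identity or date column
    if profile.get(main) in {"email_candidate", "date_candidate"}:
        return None
    return proposal
```

Because the gate returns `None` on failure, a rejected AI suggestion simply never enters the runtime; nothing downstream has to know the AI was ever consulted.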
6. Canonical result contract
Once a file is interpreted, the backend emits a canonical result contract that includes:
- dataset profile
- shape resolution
- source override
- manual override
- import setup used
- processing diagnostics
- performance diagnostics
- review state when needed
That made the rest of the product much easier to build because the UI no longer had to reverse-engineer backend decisions.
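The contract's shape, as listed above, could be carried by a single dataclass. Field names here mirror the list, but the exact structure is a sketch.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class AnalysisResult:
    """Canonical contract emitted after a file is interpreted."""
    dataset_profile: dict
    shape_resolution: str
    source_override: Optional[str] = None
    manual_override: Optional[dict] = None
    import_setup: dict = field(default_factory=dict)
    processing_diagnostics: list = field(default_factory=list)
    performance_diagnostics: dict = field(default_factory=dict)
    review_state: Optional[str] = None  # set only when the file needs review
```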
The Multi-Feature Problem
One of the biggest shifts was realising that “multi-feature” does not mean one thing.
There are at least two materially different cases.
Wide feature tables
These files contain many feature columns directly.
A single row might look like:
`Host email`, `Start time`, `Screen sharing = Yes`, `Poll = No`, `Whiteboard = Yes`
In this case, the backend must reshape the file into long usage rows.
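The wide-to-long reshape can be sketched per row: keep the identity columns as context, then emit one usage row per enabled feature column. The truthy-value set and field names are assumptions.

```python
def melt_feature_columns(row, id_fields, feature_fields):
    """Reshape one wide row into long usage rows, one per enabled feature."""
    context = {k: row[k] for k in id_fields}
    return [
        {**context, "feature": f, "used": True}
        for f in feature_fields
        if str(row.get(f, "")).strip().lower() in {"yes", "true", "1"}
    ]

row = {"Host email": "a@example.com", "Start time": "2024-05-01",
       "Screen sharing": "Yes", "Poll": "No", "Whiteboard": "Yes"}
long_rows = melt_feature_columns(
    row, ["Host email", "Start time"], ["Screen sharing", "Poll", "Whiteboard"])
# Screen sharing and Whiteboard produce usage rows; Poll does not
```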
Multi-dimension activity logs
These files are still row-based, but more than one column matters semantically.
A row might contain:
`Category`, `Subcategory`, `Action`
That is not a wide feature table. It is an event log with multiple useful dimensions.
The correct model is:
- `Subcategory` as the main field
- `Category` as a grouping field
- `Action` as an action field
That insight changed both the backend and the UI.
Building The Frontend Import Experience
Once the backend became shape-aware, the original upload UI became the weakest part of the product.
It still behaved as if every user should pick one column and hope for the best.
That was not acceptable.
So I rebuilt Step 4 of the upload flow around an import setup model.
The product now distinguishes between:
- the recommended configuration
- the adjusted configuration
- the saved template
Instead of showing internal jargon, the UI now answers three product questions:
- What did we recognise?
- What will we do with this file?
- How can the user adjust it if the interpretation is not right?
For wide feature files, the UI says things like:
- we found 36 feature columns
- examples include `Screen sharing`, `Poll`, `Whiteboard`
- we will use all of them in the analysis
That is a much better user model than talking about matrices and shape resolution.
For activity logs, the UI now exposes:
- main field
- grouping field
- action field
- people field
- date field
- supporting fields
That is still controlled, but it gives the user far more leverage than the original one-column selector.
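Put together, the field roles above amount to a small configuration object. This is a hypothetical example for a HubSpot-style activity log; the key names follow the list, the column names are invented.

```python
# A hypothetical import setup for a HubSpot-style activity log.
# Keys mirror the field roles the UI exposes; column names are examples.
import_setup = {
    "shape": "event_log",
    "main_field": "Subcategory",    # what is being used
    "grouping_field": "Category",   # how usage rolls up in reports
    "action_field": "Action",       # what the user did
    "people_field": "User email",
    "date_field": "Timestamp",
    "supporting_fields": ["Source", "Object type"],
}
```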
Preview Performance Became A Product Problem
Once users could adjust import settings, Step 4 started making more preview requests.
That exposed a real performance problem:
the preview path was still downloading a full remote blob for each refresh.
For large files, that was obviously not acceptable.
So I changed the preview route for import-setup requests to:
- skip the legacy full preview-analysis branch
- use lightweight sample-based preview logic
- use ranged, streamed CSV fetches capped to a preview window
- trim partial payloads to complete lines before parsing
That means the preview path no longer has to re-download a 100+ MB CSV just because the user changed a grouping field or deselected a few columns.
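The ranged fetch plus line-trimming can be sketched in a few lines, assuming the blob store honours HTTP `Range` requests. The window size and function names are mine.

```python
import urllib.request

PREVIEW_WINDOW_BYTES = 256 * 1024  # cap the preview fetch, not the whole file

def trim_to_complete_lines(text):
    """Drop everything after the last newline: the tail may be a half-written row."""
    cut = text.rfind("\n")
    return text[: cut + 1] if cut != -1 else ""

def fetch_preview_text(url, window=PREVIEW_WINDOW_BYTES):
    """Fetch only the first `window` bytes of a remote CSV via a Range request,
    then trim the partial tail so the parser never sees a truncated line."""
    req = urllib.request.Request(url, headers={"Range": f"bytes=0-{window - 1}"})
    with urllib.request.urlopen(req) as resp:
        chunk = resp.read()
    return trim_to_complete_lines(chunk.decode("utf-8", errors="replace"))
```

The trim step matters because a byte range almost never ends exactly on a row boundary; parsing the raw chunk would silently corrupt the last preview row.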
This is exactly the kind of detail that does not look glamorous in a demo, but it makes a product feel real.
Reuse And Import Templates
After the import setup became customer-controlled, the next obvious question was:
why should a user have to reconfigure the same monthly export every time?
So I added two layers of reuse:
1. Inferred saved setup suggestions
If a similar file had already been analysed with a non-default setup, the product could suggest reusing that setup.
2. Explicit import templates
Users can now deliberately save a named import template and apply it to future uploads.
That turned the import flow from a one-off correction experience into a reusable system.
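Template reuse can be sketched as save-and-reconcile: persist the setup under a name, and when applying it to a new upload, drop any field mapping whose column no longer exists. Storage location and function names are assumptions.

```python
import json
from pathlib import Path

TEMPLATE_DIR = Path("import_templates")  # hypothetical storage location

def save_template(name, import_setup):
    """Persist a named import template for reuse on future uploads."""
    TEMPLATE_DIR.mkdir(exist_ok=True)
    (TEMPLATE_DIR / f"{name}.json").write_text(json.dumps(import_setup))

def apply_template(name, header):
    """Load a template, dropping field mappings the new file does not contain."""
    setup = json.loads((TEMPLATE_DIR / f"{name}.json").read_text())
    cleaned = {}
    for key, value in setup.items():
        if key.endswith("_field") and value not in header:
            continue  # this column is missing from the new upload
        cleaned[key] = value
    return cleaned
```

The reconciliation step is what makes templates safe for recurring exports: a renamed or removed column degrades to "please pick this field again" instead of a broken analysis.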
Operational Safety Still Matters
Even with a strong self-serve flow, there are still ugly files.
So I built the backend to support:
- `needs_review` states
- explicit review reasons
- processing diagnostics
- performance diagnostics
- an internal review queue
- internal override management
That is not the main product path. It is the safety rail.
The main product should stay self-serve. But a serious product also needs an escalation path for files that are ambiguous, malformed, or genuinely new.
That balance mattered throughout the build.
What I Learned
1. “Multiple feature columns” is not the real problem
The real problem is dataset interpretation.
Once I framed it that way, the architecture became much cleaner.
2. Users should control product-shaped settings, not raw backend logic
Users do not want registry rules, override manifests, or AI mapping internals.
They want:
- choose the right people field
- choose the right date field
- choose the right main field
- keep the right supporting fields
- see a preview before running the analysis
That is the right abstraction layer.
3. You have to preserve the backend decision trail
If the product is making interpretation decisions, those decisions have to be inspectable.
Otherwise debugging becomes guesswork.
4. Good UX depends on backend honesty
A lot of UI confusion was really backend ambiguity leaking through.
The product became easier to explain once the backend could say, clearly:
- what kind of file this is
- what path it will use
- what the user can change
- when it should stop and ask for review
5. Performance is part of the product
Preview latency is not a backend detail. It changes whether the product feels trustworthy.
If a user changes one field and waits too long for feedback, the setup flow feels fragile.
That is why the ranged preview optimisation was worth doing.
Where Stackr Stands Now
Today, Stackr can:
- ingest multiple SaaS export shapes
- distinguish between row-based and wide-table files
- expose customer-safe import controls
- generate richer reporting dimensions
- reuse import logic across recurring uploads
- fail safely when a file should not be interpreted automatically
The system is stronger because it stopped pretending every SaaS export is the same.
That was the real build.
Not just adding more AI. Not just adding more dashboards.
The real work was teaching the product how to understand the shape of messy data, preserve that interpretation, and turn it into something users can trust.
Final Thought
Stackr became better when I stopped treating ingestion as a preprocessing chore and started treating it as a product surface.
That is the main lesson from building this.
If your analytics product depends on customer-uploaded data, the import layer is not a formality. It is the foundation of trust.