How I Built Stackr's Shape-Aware SaaS Analytics Engine
Stackr started with a simple question:
Can we take messy SaaS exports from real companies and turn them into useful product, adoption, and cost signals without forcing every customer into a rigid template?
That sounds straightforward until you look at real data.
The first version of the system assumed something many internal analytics tools assume:
- one row equals one event
- one column holds the feature or object of interest
- the rest of the file is just supporting context
That works for tidy audit logs. It breaks quickly when you start ingesting real exports from tools like Zoom, Microsoft 365, Teams, HubSpot, and other SaaS platforms.
Real exports are not uniform. Some are row-based activity streams. Some are wide tables with dozens of feature columns. Some are metric tables. Some are date-pivoted matrices. Some are semantically obvious to a human and still awkward for software to interpret safely.
So Stackr evolved from a single-column analyser into a shape-aware analytics system.
This post explains how I built that system, what I changed in both the backend and frontend, and what I learned while making it production-ready.
The Original Problem
The original ingestion path was too narrow.
It expected a user to upload a CSV, pick one column, and run analysis.
That led to three structural problems:
- Wide feature exports were interpreted incorrectly.
- Activity logs with multiple meaningful dimensions were flattened too early.
- The UI pushed users into decisions they should not have to make without context.
For example:
- a Zoom feature export may contain columns like `Screen sharing`, `Poll`, `Whiteboard`, `Reaction`
- a HubSpot activity export may contain `Category`, `Subcategory`, and `Action`
- a Microsoft 365 report may represent usage through numeric metric columns instead of explicit feature labels
These files should not be treated the same way.
The system needed to understand the shape of the dataset first, and only then decide how to analyse it.
The Core Product Decision
The most important architectural decision was this:
Do not start with feature analysis. Start with dataset interpretation.
That led to a shape-first model with four supported dataset families:
- `event_log`
- `wide_feature_matrix`
- `wide_usage_matrix`
- `date_matrix`
In product language, those became:
- Activity log
- Feature table export
- Usage table export
- Date-based usage export
That sounds like a naming detail, but it changed the entire product.
It allowed Stackr to stop pretending every file should be configured the same way.
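The four families and their product-facing names can be sketched as a simple enum plus a label map. This is a minimal illustration; the identifiers come from the post, but the class and variable names here are my own.

```python
from enum import Enum

class DatasetShape(Enum):
    """The four dataset families a file can resolve into."""
    EVENT_LOG = "event_log"
    WIDE_FEATURE_MATRIX = "wide_feature_matrix"
    WIDE_USAGE_MATRIX = "wide_usage_matrix"
    DATE_MATRIX = "date_matrix"

# Product-facing labels shown in the UI instead of internal shape names
PRODUCT_LABELS = {
    DatasetShape.EVENT_LOG: "Activity log",
    DatasetShape.WIDE_FEATURE_MATRIX: "Feature table export",
    DatasetShape.WIDE_USAGE_MATRIX: "Usage table export",
    DatasetShape.DATE_MATRIX: "Date-based usage export",
}
```

Keeping the internal name and the product label as two views of the same value is what lets the backend and the UI disagree about wording without disagreeing about meaning.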
Building The Backend
The backend had to do more than parse CSVs. It had to make structured decisions and preserve those decisions in a way the rest of the product could understand.
1. Dataset profiling
The first layer profiles the incoming file:
- column types
- email candidates
- date candidates
- boolean feature-flag columns
- numeric usage columns
- likely context fields
This is not the final answer. It is the evidence layer.
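A profiling layer like this can be sketched with plain pattern checks over a sample of each column. The regexes and category names below are illustrative assumptions, not Stackr's actual rules.

```python
import re

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}")
BOOL_VALUES = {"yes", "no", "true", "false", "y", "n"}

def profile_column(values):
    """Classify one column from a sample of its values (evidence only)."""
    sample = [v.strip() for v in values if v and v.strip()]
    if not sample:
        return "empty"
    if all(EMAIL_RE.match(v) for v in sample):
        return "email_candidate"
    if all(DATE_RE.match(v) for v in sample):
        return "date_candidate"
    if {v.lower() for v in sample} <= BOOL_VALUES:
        return "boolean_flag"
    if all(v.replace(".", "", 1).isdigit() for v in sample):
        return "numeric_usage"
    return "context"  # likely supporting context, not product behaviour

def profile_dataset(rows, header):
    """Profile every column from a sample of parsed CSV rows."""
    columns = {name: [row.get(name, "") for row in rows] for name in header}
    return {name: profile_column(vals) for name, vals in columns.items()}
```

The output is deliberately a per-column verdict rather than a file-level one: later layers weigh the evidence, this layer only collects it.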
2. Shape resolution
The next layer resolves the most likely dataset shape and decides whether:
- the file can move forward directly
- the file needs an adapter
- the file is too ambiguous to trust
That last point mattered a lot.
I wanted the system to fail closed, not generate confident-looking but incorrect analytics.
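A fail-closed resolver over the profile above might look like this. The thresholds and routing labels are illustrative; the only load-bearing idea is the final branch, which refuses to guess.

```python
def resolve_shape(profile):
    """Resolve the most likely dataset shape from a column profile.

    Returns (shape, route) where route is "direct", "adapter", or
    "needs_review". Thresholds are illustrative, not the real ones.
    """
    types = list(profile.values())
    n = len(types) or 1
    flag_share = types.count("boolean_flag") / n
    usage_share = types.count("numeric_usage") / n
    date_share = types.count("date_candidate") / n

    if flag_share >= 0.5:
        return ("wide_feature_matrix", "direct")
    if usage_share >= 0.5:
        return ("wide_usage_matrix", "adapter")
    if date_share >= 0.5:
        return ("date_matrix", "adapter")
    if "date_candidate" in types and flag_share < 0.2:
        return ("event_log", "direct")
    # Fail closed: an ambiguous file gets review, not a confident guess
    return (None, "needs_review")
```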
3. Source-aware overrides
Real exports from known products often carry stable signatures.
So I added a curated source-override layer for known families such as:
- Zoom meeting feature exports
- Microsoft Teams activity reports
- M365 usage files
This gave the system deterministic shortcuts where the shape alone was not enough.
4. Manual overrides
Even with shape detection and source-aware rules, there are always exceptions.
So I added a manual override system with clear precedence:
- manual override
- source override
- validated AI mapping
- generic heuristics
That kept the runtime deterministic while still allowing correction when a dataset family needed explicit handling.
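The precedence chain itself is the simplest part of the system, which is exactly why it stays deterministic. A sketch, with my own function name:

```python
def resolve_mapping(manual, source, ai_validated, heuristic):
    """Apply override precedence: the first non-None answer wins.

    Order: manual override > source override > validated AI mapping
    > generic heuristics.
    """
    for candidate in (manual, source, ai_validated, heuristic):
        if candidate is not None:
            return candidate
    return None
```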
5. AI-assisted mapping
AI is useful when used carefully.
I did not use AI to compute final analytics. I used it to help interpret awkward files and propose mappings, which are then validated by deterministic rules.
That distinction matters.
The AI can help answer:
- which field looks like the main feature dimension
- whether this wide file is usage-oriented or feature-oriented
- which columns behave like context versus product behaviour
But all final reshaping, counting, and metric generation remain code-driven.
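The validation gate between the AI proposal and the runtime can be sketched like this. The checks shown are assumptions about what "validated by deterministic rules" could mean, not the actual rule set.

```python
def validate_ai_mapping(proposal, header, profile):
    """Accept an AI-proposed mapping only if deterministic checks pass.

    Returns the proposal unchanged, or None to fall through to the
    next layer in the precedence chain.
    """
    main = proposal.get("main_field")
    # The proposed column must actually exist in the file
    if main not in header:
        return None
    # A main feature dimension should not be an identity or date column
    if profile.get(main) in {"email_candidate", "date_candidate"}:
        return None
    return proposal
```

Because the gate returns `None` on failure, a rejected AI suggestion simply never enters the runtime; nothing downstream has to know the AI was ever consulted.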
6. Canonical result contract
Once a file is interpreted, the backend emits a canonical result contract that includes:
- dataset profile
- shape resolution
- source override
- manual override
- import setup used
- processing diagnostics
- performance diagnostics
- review state when needed
That made the rest of the product much easier to build because the UI no longer had to reverse-engineer backend decisions.
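The contract's shape, as listed above, could be carried by a single dataclass. Field names here mirror the list, but the exact structure is a sketch.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class AnalysisResult:
    """Canonical contract emitted after a file is interpreted."""
    dataset_profile: dict
    shape_resolution: str
    source_override: Optional[str] = None
    manual_override: Optional[dict] = None
    import_setup: dict = field(default_factory=dict)
    processing_diagnostics: list = field(default_factory=list)
    performance_diagnostics: dict = field(default_factory=dict)
    review_state: Optional[str] = None  # set only when the file needs review
```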
The Multi-Feature Problem
One of the biggest shifts was realising that “multi-feature” does not mean one thing.
There are at least two materially different cases.
Wide feature tables
These files contain many feature columns directly.
A single row might look like:
`Host email`, `Start time`, `Screen sharing = Yes`, `Poll = No`, `Whiteboard = Yes`
In this case, the backend must reshape the file into long usage rows.
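The wide-to-long reshape can be sketched per row: keep the identity columns as context, then emit one usage row per enabled feature column. The truthy-value set and field names are assumptions.

```python
def melt_feature_columns(row, id_fields, feature_fields):
    """Reshape one wide row into long usage rows, one per enabled feature."""
    context = {k: row[k] for k in id_fields}
    return [
        {**context, "feature": f, "used": True}
        for f in feature_fields
        if str(row.get(f, "")).strip().lower() in {"yes", "true", "1"}
    ]

row = {"Host email": "a@example.com", "Start time": "2024-05-01",
       "Screen sharing": "Yes", "Poll": "No", "Whiteboard": "Yes"}
long_rows = melt_feature_columns(
    row, ["Host email", "Start time"], ["Screen sharing", "Poll", "Whiteboard"])
# Screen sharing and Whiteboard produce usage rows; Poll does not
```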
Multi-dimension activity logs
These files are still row-based, but more than one column matters semantically.
A row might contain:
`Category`, `Subcategory`, `Action`
That is not a wide feature table. It is an event log with multiple useful dimensions.
The correct model is:
- `Subcategory` as the main field
- `Category` as a grouping field
- `Action` as an action field
That insight changed both the backend and the UI.
Building The Frontend Import Experience
Once the backend became shape-aware, the original upload UI became the weakest part of the product.
It still behaved as if every user should pick one column and hope for the best.
That was not acceptable.
So I rebuilt Step 4 of the upload flow around an import setup model.
The product now distinguishes between:
- the recommended configuration
- the adjusted configuration
- the saved template
Instead of showing internal jargon, the UI now answers three product questions:
- What did we recognise?
- What will we do with this file?
- How can the user adjust it if the interpretation is not right?
For wide feature files, the UI says things like:
- we found 36 feature columns
- examples include `Screen sharing`, `Poll`, `Whiteboard`
- we will use all of them in the analysis
That is a much better user model than talking about matrices and shape resolution.
For activity logs, the UI now exposes:
- main field
- grouping field
- action field
- people field
- date field
- supporting fields
That is still controlled, but it gives the user far more leverage than the original one-column selector.
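Put together, the field roles above amount to a small configuration object. This is a hypothetical example for a HubSpot-style activity log; the key names follow the list, the column names are invented.

```python
# A hypothetical import setup for a HubSpot-style activity log.
# Keys mirror the field roles the UI exposes; column names are examples.
import_setup = {
    "shape": "event_log",
    "main_field": "Subcategory",    # what is being used
    "grouping_field": "Category",   # how usage rolls up in reports
    "action_field": "Action",       # what the user did
    "people_field": "User email",
    "date_field": "Timestamp",
    "supporting_fields": ["Source", "Object type"],
}
```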
Preview Performance Became A Product Problem
Once users could adjust import settings, Step 4 started making more preview requests.
That exposed a real performance problem:
the preview path was still downloading a full remote blob for each refresh.
For large files, that was obviously not acceptable.
So I changed the preview route for import-setup requests to:
- skip the legacy full preview-analysis branch
- use lightweight sample-based preview logic
- use ranged, streamed CSV fetches capped to a preview window
- trim partial payloads to complete lines before parsing
That means the preview path no longer has to re-download a 100+ MB CSV just because the user changed a grouping field or deselected a few columns.
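The ranged fetch plus line-trimming can be sketched in a few lines, assuming the blob store honours HTTP `Range` requests. The window size and function names are mine.

```python
import urllib.request

PREVIEW_WINDOW_BYTES = 256 * 1024  # cap the preview fetch, not the whole file

def trim_to_complete_lines(text):
    """Drop everything after the last newline: the tail may be a half-written row."""
    cut = text.rfind("\n")
    return text[: cut + 1] if cut != -1 else ""

def fetch_preview_text(url, window=PREVIEW_WINDOW_BYTES):
    """Fetch only the first `window` bytes of a remote CSV via a Range request,
    then trim the partial tail so the parser never sees a truncated line."""
    req = urllib.request.Request(url, headers={"Range": f"bytes=0-{window - 1}"})
    with urllib.request.urlopen(req) as resp:
        chunk = resp.read()
    return trim_to_complete_lines(chunk.decode("utf-8", errors="replace"))
```

The trim step matters because a byte range almost never ends exactly on a row boundary; parsing the raw chunk would silently corrupt the last preview row.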
This is exactly the kind of detail that does not look glamorous in a demo, but it makes a product feel real.
Reuse And Import Templates
After the import setup became customer-controlled, the next obvious question was:
why should a user have to reconfigure the same monthly export every time?
So I added two layers of reuse:
1. Inferred saved setup suggestions
If a similar file had already been analysed with a non-default setup, the product could suggest reusing that setup.
2. Explicit import templates
Users can now deliberately save a named import template and apply it to future uploads.
That turned the import flow from a one-off correction experience into a reusable system.
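Template reuse can be sketched as save-and-reconcile: persist the setup under a name, and when applying it to a new upload, drop any field mapping whose column no longer exists. Storage location and function names are assumptions.

```python
import json
from pathlib import Path

TEMPLATE_DIR = Path("import_templates")  # hypothetical storage location

def save_template(name, import_setup):
    """Persist a named import template for reuse on future uploads."""
    TEMPLATE_DIR.mkdir(exist_ok=True)
    (TEMPLATE_DIR / f"{name}.json").write_text(json.dumps(import_setup))

def apply_template(name, header):
    """Load a template, dropping field mappings the new file does not contain."""
    setup = json.loads((TEMPLATE_DIR / f"{name}.json").read_text())
    cleaned = {}
    for key, value in setup.items():
        if key.endswith("_field") and value not in header:
            continue  # this column is missing from the new upload
        cleaned[key] = value
    return cleaned
```

The reconciliation step is what makes templates safe for recurring exports: a renamed or removed column degrades to "please pick this field again" instead of a broken analysis.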
Operational Safety Still Matters
Even with a strong self-serve flow, there are still ugly files.
So I built the backend to support:
- `needs_review` states
- explicit review reasons
- processing diagnostics
- performance diagnostics
- an internal review queue
- internal override management
That is not the main product path. It is the safety rail.
The main product should stay self-serve. But a serious product also needs an escalation path for files that are ambiguous, malformed, or genuinely new.
That balance mattered throughout the build.
What I Learned
1. “Multiple feature columns” is not the real problem
The real problem is dataset interpretation.
Once I framed it that way, the architecture became much cleaner.
2. Users should control product-shaped settings, not raw backend logic
Users do not want registry rules, override manifests, or AI mapping internals.
They want:
- choose the right people field
- choose the right date field
- choose the right main field
- keep the right supporting fields
- see a preview before running the analysis
That is the right abstraction layer.
3. You have to preserve the backend decision trail
If the product is making interpretation decisions, those decisions have to be inspectable.
Otherwise debugging becomes guesswork.
4. Good UX depends on backend honesty
A lot of UI confusion was really backend ambiguity leaking through.
The product became easier to explain once the backend could say, clearly:
- what kind of file this is
- what path it will use
- what the user can change
- when it should stop and ask for review
5. Performance is part of the product
Preview latency is not a backend detail. It changes whether the product feels trustworthy.
If a user changes one field and waits too long for feedback, the setup flow feels fragile.
That is why the ranged preview optimisation was worth doing.
Where Stackr Stands Now
Today, Stackr can:
- ingest multiple SaaS export shapes
- distinguish between row-based and wide-table files
- expose customer-safe import controls
- generate richer reporting dimensions
- reuse import logic across recurring uploads
- fail safely when a file should not be interpreted automatically
The system is stronger because it stopped pretending every SaaS export is the same.
That was the real build.
Not just adding more AI. Not just adding more dashboards.
The real work was teaching the product how to understand the shape of messy data, preserve that interpretation, and turn it into something users can trust.
Final Thought
Stackr became better when I stopped treating ingestion as a preprocessing chore and started treating it as a product surface.
That is the main lesson from building this.
If your analytics product depends on customer-uploaded data, the import layer is not a formality. It is the foundation of trust.