White Paper
Health Scores

Why Customer Success Built the Wrong Thing, and Keeps Rebuilding It

Health scores promise churn prediction but mostly rely on lagging signals like usage, support tickets, and surveys. This creates false confidence and costly implementations. Customer health is multidimensional and contextual, not a single metric. Customer Success should shift from predictive scoring to diagnostic evaluation, combining structured account assessments, human insight, and AI-assisted synthesis to understand real customer conditions.


The Health Score Problem


The Promise

Somewhere along the way, Customer Success made a collective bet.

The bet was simple: if we could just get the right data into the right model, we could produce a single number (a health score) that would tell us whether a customer was going to renew or churn.

Green, yellow, red.
Safe, at-risk, in trouble.

It was an elegant idea. And it made intuitive sense. If credit scores could predict whether someone would repay a loan, surely we could predict whether a customer would renew a software contract.

So we built.

We connected product telemetry, support tickets, survey responses, engagement data, executive sponsor activity, contract terms, and dozens of other signals. We hired data scientists. We bought platforms. We spent, conservatively, hundreds of millions of dollars across the industry on health scoring infrastructure.

And then something uncomfortable happened.

Or rather, didn’t happen.

The scores didn’t work.

Not in the way they were supposed to. Not reliably. Not predictively. And the industry, rather than confronting that failure, largely chose to keep iterating on the same broken concept, convinced that the next version, the next model, the next data source would finally crack it.

This guide is an attempt to step back and ask a more foundational question:

What if health scores aren’t a data problem or a modeling problem?
What if they’re a conceptual problem, a misunderstanding of what health actually is?


What We Mean When We Say "Health"

Before we can evaluate whether health scores work, we have to be honest about what we're asking them to do.

When a CS leader says they want a health score, what they typically mean is:

Tell me which accounts are going to churn so I can intervene before it's too late.

That's a prediction problem.

Prediction problems require leading indicators, signals that change before the outcome they’re predicting.

But that’s not how most health scores actually function.

Most health scores are constructed from signals that are, by their nature, lagging indicators.

Consider the most common inputs:

Product Usage and Adoption

By the time usage has meaningfully declined, something has already gone wrong.

The customer has already started disengaging. They may have already made their renewal decision internally.

A drop in logins doesn’t predict churn.

It is churn, just in its early observable stages.

Support Ticket Volume and Sentiment

An angry support ticket means the customer is already frustrated.

A spike in escalations means trust has already eroded.

These signals don’t warn you that something is about to go wrong.

They tell you that something already has.

NPS and CSAT Responses

Survey scores reflect how a customer felt at a specific moment, often weeks or months in the past.

A detractor score doesn’t predict dissatisfaction.

It reports it.

Stakeholder Engagement

When an executive sponsor stops responding to emails, they haven’t just become busy.

They’ve deprioritized you.

That decision happened before the silence did.


The pattern is consistent:

The signals we feed into health scores are observations of problems that have already begun.

We have dressed up lagging indicators in the language of prediction.
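To make the critique concrete, here is a minimal sketch of how a conventional weighted health score combines exactly these lagging signals. The signal names, weights, and thresholds are illustrative assumptions, not a reference to any specific platform:

```python
# Illustrative sketch of a conventional weighted health score.
# Every input below is a lagging indicator: it records something
# that has already happened inside the account.

# Hypothetical weights; real implementations debate these endlessly.
WEIGHTS = {
    "usage_trend": 0.4,         # login/feature activity vs. prior period
    "support_sentiment": 0.25,  # inverse of negative-ticket ratio
    "survey_score": 0.2,        # latest NPS/CSAT, often weeks old
    "engagement": 0.15,         # exec-sponsor email/meeting activity
}

def health_score(signals: dict) -> str:
    """Collapse normalized (0..1) lagging signals into green/yellow/red."""
    score = sum(WEIGHTS[k] * signals[k] for k in WEIGHTS)
    if score >= 0.7:
        return "green"
    if score >= 0.4:
        return "yellow"
    return "red"

# An account that has quietly decided to churn can still look fine:
# usage is steady, no angry tickets, the last survey was positive.
quiet_churner = {
    "usage_trend": 0.8,
    "support_sentiment": 0.9,
    "survey_score": 0.75,
    "engagement": 0.6,
}
print(health_score(quiet_churner))  # "green" despite an undetected decision
```

Nothing in the inputs can register a decision that has not yet produced observable symptoms, which is the structural limit the rest of this paper examines.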


The Human Health Analogy

There’s a reason this problem is so persistent, and a useful analogy makes it clearer.

No doctor would ever try to reduce your health to a single number.

Your body is a complex system with:

  • cardiovascular function
  • metabolic health
  • neurological status
  • immune response
  • mental health
  • musculoskeletal integrity

Each has its own indicators, baselines, and interactions.

A resting heart rate of 80 might be perfectly healthy for one person and a warning sign for another.

A cholesterol level means something different at 25 than it does at 65.

Context is everything.

Medicine learned this lesson centuries ago.

The response wasn’t to build a better single score.

It was to build diagnostic frameworks.

Structured approaches to understanding:

  • what’s happening
  • why it’s happening
  • what to do about it

Medicine distinguishes between:

Screening

Broad checks to identify where to look more closely.

Diagnosis

Deep investigation into a specific area of concern.

Monitoring

Ongoing tracking of known conditions.

Prognosis

An informed estimation of outcomes based on the full clinical picture.

Each of these is a different activity requiring different tools, data, and expertise.

No one confuses a screening test with a prognosis.

No one expects a thermometer to tell them whether a patient will recover.

Yet in Customer Success, we have tried to collapse all four activities into a single number.

We asked health scores to simultaneously:

  • screen
  • diagnose
  • monitor
  • predict

And then we were surprised when they failed at all of them.


How We Got Here

The dominance of the health score wasn’t accidental.

It emerged from a real problem:

Customer Success has very little visibility into what is actually happening inside a customer's organization.

Sales has clear signals:

  • pipeline
  • forecast
  • close dates
  • signed contracts

Marketing has:

  • attribution
  • funnel metrics
  • conversion rates

The feedback loops are tight.

Customer Success has almost none of that.

The outcome (renewal or churn) happens once a year or once every three years.

Between those moments, teams operate largely in the dark, trying to infer the internal state of a customer relationship from external signals that are:

  • noisy
  • incomplete
  • misleading

Faced with this ambiguity, the industry made a reasonable but flawed choice.

Instead of investing in diagnostic visibility, the industry invested in predictive modeling.

If we couldn’t see clearly, we tried to calculate our way to clarity.

This was understandable.

Diagnostic visibility is expensive.

It requires:

  • deep relationships
  • skilled CSMs
  • structured discovery
  • organizational trust

It doesn’t scale easily.

It’s hard to measure.

It’s hard to put on a dashboard.

A score, on the other hand, scales beautifully.

It fits in a dashboard cell.

It can be color-coded.

It can be reported to the board.

It creates the appearance of control.

And that appearance is a major reason health scoring persists.


The False Sense of Security

This is the most damaging consequence of health scoring.

It creates confidence where none is warranted.

When a CS leader sees a dashboard showing 78% of accounts in green, it feels reassuring.

The business looks healthy.

The team appears to be managing the portfolio.

Executives see a clean summary.

Board decks get a slide.

But what does green actually mean?

In most implementations it means:

  • usage hasn’t dropped
  • no angry support tickets
  • surveys are above threshold
  • the contract isn’t expiring soon

Green does not mean the customer is healthy.

It means the score hasn’t yet observed symptoms.

That difference matters.

The most dangerous churn is the type health scores cannot detect:

  • the customer quietly evaluating competitors
  • the executive who already decided to consolidate vendors
  • the champion who left three months ago

These accounts remain green until the moment they churn.

Then someone says:

“The health score didn’t catch it.”

Of course it didn’t.

It was never designed to.

It reflects the observable past, and the observable past looked fine.


The Implementation Tax

Health scores also carry a significant implementation cost.

Building one requires connecting:

  • product analytics
  • CRM
  • support platforms
  • billing systems
  • survey tools
  • communication logs

Each integration introduces:

  • complexity
  • latency
  • maintenance overhead

Typical implementations take months. Some take over a year.

Teams debate endlessly:

  • which signals matter
  • how to weight them
  • how to normalize segments
  • how to handle missing data
  • how to set thresholds

These are real problems and they consume large amounts of:

  • engineering time
  • data science effort
  • CS operations resources

Then the score launches.

Within weeks someone says:

“This account is green, but I know they’re unhappy.”

The response is predictable:

  • adjust weights
  • clean data
  • add signals

The cycle repeats indefinitely:

build → launch → question → recalibrate → rebuild

The score is never finished.

The opportunity cost is rarely measured.


What AI Changes (And What It Doesn’t)

AI has brought a new wave of enthusiasm for health scoring.

The argument:

With better models and more data we can finally build the score that works.

AI does provide real capabilities:

  • NLP on support tickets
  • analysis of call transcripts
  • sentiment detection
  • pattern detection in usage

But AI does not solve the core problem.

If the underlying signals are still lagging indicators, a better model still analyzes lagging data.

AI can detect declining sentiment.

But sentiment shifts after the problem begins.

AI can detect usage decline.

But usage decline is still a symptom, not a cause.

Worse, AI can amplify the false confidence.

Because the models sound sophisticated, organizations trust the output more.

There is also risk of overfitting.

Models can become excellent at explaining historical churn while failing to predict future churn.

AI’s real value is elsewhere.

Not scoring.

Diagnosis.

AI can help:

  • summarize account history
  • surface anomalies
  • synthesize interactions across touchpoints

These are diagnostic tools, not predictive scores.


What Health Actually Is

Health in complex systems isn’t a number.

It is a state.

An emergent property across multiple dimensions.

A customer account is healthy when:

  • the product delivers measurable value
  • stakeholders understand that value
  • the relationship has organizational depth
  • the customer's strategy aligns with your product
  • friction points are resolved
  • the customer sees a future with you

None of these can be reliably inferred from telemetry.

They require different forms of visibility.

Some require conversation.

Some require organizational intelligence.

Some require understanding the customer’s business.

This is diagnostic work.

And it cannot be reduced to a single number.


Toward a Diagnostic Model

Moving forward requires shifting from prediction to diagnosis.

Replace the Single Score with Structured Assessment

Evaluate accounts across dimensions:

  • value realization
  • stakeholder alignment
  • product fit
  • relationship depth
  • strategic alignment

Each dimension has its own signals and response playbooks.
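One way to represent such a structured assessment is as an explicit record per dimension, rather than one blended number. This is a sketch; the dimension names come from the list above, while the three-level rating scale and field names are assumptions:

```python
from dataclasses import dataclass, field

# Hypothetical three-level rating per dimension; nothing is averaged away.
RATINGS = ("strong", "unclear", "weak")

@dataclass
class DimensionAssessment:
    dimension: str   # e.g. "value_realization"
    rating: str      # one of RATINGS
    evidence: str    # what the CSM actually observed
    playbook: str    # the response tied to this dimension

@dataclass
class AccountAssessment:
    account: str
    dimensions: list = field(default_factory=list)

    def areas_to_investigate(self) -> list:
        """Surface weak/unclear dimensions instead of blending them into a score."""
        return [d for d in self.dimensions if d.rating != "strong"]

# Illustrative example account and findings.
acme = AccountAssessment("Acme Corp", [
    DimensionAssessment("value_realization", "strong",
                        "Business review showed measurable cycle-time gains", "none"),
    DimensionAssessment("stakeholder_alignment", "weak",
                        "Champion left; successor unengaged", "re-map stakeholders"),
])
print([d.dimension for d in acme.areas_to_investigate()])  # ['stakeholder_alignment']
```

The design point is that a weak dimension stays visible alongside its evidence and playbook; a single score would let the strong dimension mask it.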

Separate Screening From Diagnosis

Use automated signals as screening.

Usage changes, support patterns, and engagement shifts flag accounts worth investigating.

But they do not declare health.

Diagnosis requires deeper investigation.
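Operationally, screening can be implemented as flags that route an account to a human for diagnosis, never as a verdict. A minimal sketch, with thresholds and signal names as illustrative assumptions:

```python
def screen_account(signals: dict) -> list:
    """Return reasons to investigate.

    An empty list means "no flags yet", NOT "healthy" --
    screening identifies where to look, it never declares health.
    """
    flags = []
    if signals.get("usage_change_pct", 0) <= -20:
        flags.append("usage dropped 20%+ vs. prior period")
    if signals.get("escalations_30d", 0) >= 2:
        flags.append("multiple escalations in last 30 days")
    if signals.get("days_since_sponsor_contact", 0) > 45:
        flags.append("executive sponsor silent for 45+ days")
    return flags

flags = screen_account({
    "usage_change_pct": -25,
    "escalations_30d": 0,
    "days_since_sponsor_contact": 60,
})
print(flags)  # two flags -> route to a CSM for diagnosis
```

Note the output is a list of reasons, not a color: each flag tells the CSM what to go investigate.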

Accept That Some Signals Cannot Be Measured

Key indicators often cannot be captured through product data.

For example:

  • whether the executive sponsor remains committed
  • whether competitors are gaining influence
  • whether long-term value is recognized

This information must come through relationships and structured discovery.

Shift From Prediction to Preparedness

Instead of predicting churn, understand why churn happens.

Identify recurring failure patterns.

Then build early warning systems around those patterns.

Use AI for Synthesis

AI should help CSMs see the full picture faster.

Summarizing context.

Highlighting changes.

Connecting signals across touchpoints.

AI should assist diagnosis, not attempt fortune telling.
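As a sketch of that synthesis role: the step that matters is assembling the full cross-channel history into one digest that a summarization model (or a human) consumes. The record shape and example events below are assumptions; the output is a briefing input, not a score:

```python
from datetime import date

# Illustrative records pulled from CRM, support, and call notes.
touchpoints = [
    {"when": date(2024, 3, 1), "channel": "call",
     "note": "Sponsor asked about contract flexibility"},
    {"when": date(2024, 3, 9), "channel": "support",
     "note": "Ticket: export feature blocked a reporting deadline"},
    {"when": date(2024, 4, 2), "channel": "email",
     "note": "Champion announced she is changing roles"},
]

def build_digest(events: list) -> str:
    """Chronological, channel-tagged history: the input to summarization."""
    lines = [f"{e['when'].isoformat()} [{e['channel']}] {e['note']}"
             for e in sorted(events, key=lambda e: e["when"])]
    return "\n".join(lines)

digest = build_digest(touchpoints)
# A summarization model (or a CSM) would then be prompted with `digest`
# to produce a briefing: what changed, why, and what to check next.
print(digest)
```

The value is in the cross-touchpoint view: any single channel above looks minor, but the sequence read together is a diagnostic story.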


The Honest Conversation

The hardest barrier is cultural.

Health scores persist because they serve an organizational need.

They give:

  • CS leaders something to report
  • executives a dashboard
  • boards a sense of control

Abandoning the score means giving up that comfort.

It requires saying:

We cannot produce a single number that predicts retention.

What we can provide instead:

  • a structured assessment of account strength
  • visibility into risk factors
  • clear action plans

This is a harder conversation.

It is also a more honest one.

The industry has spent years trying to make health scores work.

Each iteration adds new data, new models, and new tools.

But the same limitation remains.

At some point the right response to repeated failure is not a better version of the same approach.

It is a different approach entirely.

Health is not a score.

It never was.

The sooner we stop trying to make it one, the sooner we can start doing the real work of understanding our customers.


Final Note

This guide examines structural limitations of health scoring.

Organizations may still derive operational value from health scores as one signal among many.

The argument here is not that measurement is futile.

It is that a single predictive score is the wrong abstraction for a fundamentally diagnostic problem.
