ZeroTwo/ LABS · Case studies · 01 · Support Intelligence
CASE STUDY · 01 / 06 · Consumer services · 2025

Turning customer conversations into operational intelligence.

A mid-sized consumer services company handles roughly 18,000 support interactions a day: calls, chats, emails, surveys. We replaced random QA sampling and spreadsheet reporting with an AI system that reads every interaction, scores it, and routes the signal where it matters.

Client
Consumer services · mid-sized
Engagement
Build & operate · 14 weeks
Channels covered
Voice · chat · email · CRM · surveys · social
Production status
Live since Q1 2025
Data sources · 07 channels · ~18.4K interactions / day

Call transcripts

Speech-to-text from contact center recordings.

~9.2K / day

Chat logs

Live agent + bot conversation threads.

~5.1K / day

Email tickets

Support inbox messages and threads.

~2.4K / day

CRM data

Customer history, plan tier, ticket metadata.

All accounts

Surveys

CSAT, NPS, post-resolution feedback forms.

~640 / day

Agent notes

Internal resolution notes and wrap-up codes.

~1.1K / day

Social mentions

Public posts referencing the brand.

~310 / day
01 · The problem

Volume without visibility.

Support generates more data than any other function in the company, yet almost none of it makes it back into the operating loop. Random call sampling and manual tagging cover less than 2% of interactions; the remaining 98% or more goes unreviewed.

What the team lived with

  • Monthly interaction volume reviewed by spot-check sampling rather than across the full population.
  • QA scorecards filled in by hand — inconsistent across reviewers, weeks behind reality.
  • Recurring pain points buried in transcripts; only the loudest complaints reached leadership, and then only as anecdotes.
  • No real-time signal on customer sentiment; frustration surfaced only in the following month's CSAT report.
  • Agent coaching driven by a handful of recordings per quarter, not the actual distribution of behaviour.
  • Escalations triggered by the customer, not by the system. By that point retention had usually failed.

What we set out to build

  • Read 100% of interactions across every channel, in the language and accent they arrived in.
  • Classify intent and complaint type automatically, with a stable taxonomy the ops team trusts.
  • Score every interaction for sentiment, frustration, and escalation risk in near-real time.
  • Predict CSAT for the >90% of interactions that never get a survey response.
  • Evaluate agent behaviour on the population, not the sample — empathy, compliance, resolution.
  • Surface recurring root causes and act on them — route tickets, alert managers, trigger coaching.
02 · System architecture

A five-stage pipeline. End to end, no manual hops.

Conversations stream in from seven channels, get cleaned and normalised, pass through six AI analysis components, surface in dashboards, and trigger downstream actions — escalations, routing, coaching cues.

// PIPELINE · support-intel/v2.1 · flow: left → right
01 · Collect → 02 · Process → 03 · Analyze → 04 · Surface & act
/ Stage 01 · Collect
Call transcripts · Chat logs · Email tickets · CRM data · Surveys (CSAT/NPS) · Agent notes · Social mentions
/ Stage 02 · Data processing
→ Speech-to-text (Whisper)
→ Language detection
→ PII masking
→ Normalize & chunk
→ Embed (vector store)
/ Stage 03 · AI core · Analysis layer (six concurrent models; A, B and E transformer-based)
A · Intent detection → BERT
B · Sentiment + frustration → RoBERTa
C · Complaint categorization → multi-label
D · CSAT prediction → regressor
E · Agent performance eval suite → LLM judge
F · Conversational intelligence → interruptions, silences, holds
writes → features + labels → analytics warehouse
/ Stage 04a · Analytics dashboard
→ Real-time insights
→ Agent scorecards
→ Complaint trends
→ Sentiment + CSAT
/ Stage 04b · Action & automation
→ Auto-route tickets
→ Priority escalation
→ Coaching alerts
→ Manager notifications
Feedback loop: labels & corrections retrain models
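Stage 02 masks PII before any text reaches a model. The sketch below shows the idea with plain regular expressions; the patterns, the `[EMAIL]`-style placeholders and the `mask_pii` name are illustrative assumptions, not the production rule set. Rules this loose deliberately over-mask (a date such as 2025-01-15 also matches the phone pattern), and a production masker would usually add a learned entity recognizer on top.

```python
import re

# Illustrative patterns only (assumptions, not the production rule set).
# Order matters: card numbers are masked before the looser phone pattern runs.
PII_PATTERNS = [
    ("EMAIL", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    ("CARD", re.compile(r"\b(?:\d[ -]?){12,15}\d\b")),  # 13-16 digits
    ("PHONE", re.compile(r"\+?\d[\d ()-]{6,}\d")),       # 8+ digits/separators
]


def mask_pii(text: str) -> str:
    """Replace every PII match with a typed placeholder such as [EMAIL]."""
    for label, pattern in PII_PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    return text


print(mask_pii("Reach me at jane.doe@example.com or +1 415-555-0100."))
```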
03 · The AI core

Six models. One signal stream.

Every interaction passes through the same six analyses. The outputs land as structured labels in the warehouse — the dashboard and the routing engine read from there.
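A hypothetical shape for one of those warehouse rows is sketched below. The field names, types and the `to_row` helper are assumptions for illustration; the values are taken from call #CX-90412 in the worked example.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class InteractionLabels:
    """Hypothetical warehouse row: one interaction's model outputs."""
    interaction_id: str
    channel: str                      # e.g. "voice", "chat", "email"
    intent: str                       # leaf intent class
    intent_confidence: float          # 0..1
    sentiment: str                    # e.g. "NEGATIVE"
    frustration: float                # 0..1
    complaint_labels: tuple[str, ...] = ()
    csat_pred: float | None = None    # 1..5 scale; None if not yet scored


def to_row(labels: InteractionLabels) -> dict:
    """Flatten to a JSON-serialisable dict for loading into the warehouse."""
    row = asdict(labels)
    row["complaint_labels"] = list(labels.complaint_labels)
    return row


row = to_row(InteractionLabels(
    interaction_id="CX-90412", channel="voice",
    intent="service_outage", intent_confidence=0.96,
    sentiment="NEGATIVE", frustration=0.91,
    complaint_labels=("service_outage", "repeat_contact"), csat_pred=2.1,
))
print(json.dumps(row))
```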

/ 03.A · INTENT

Intent detection

Why is the customer here? Classifies each interaction into a stable taxonomy — billing, refund, defect, delay, technical, access, cancellation.

bert-base · 42 classes · multilingual
F1 (macro)
0.91
Inference
~84ms
Languages
EN · ES · FR
Coverage
100%
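Macro F1 averages per-class F1 without weighting by frequency, so across 42 intent classes a rare intent counts as much as a common one. The source doesn't say exactly how its 0.91 was computed; this is a sketch of the standard definition, with a hypothetical function name and toy labels.

```python
from collections import Counter


def macro_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Unweighted mean of per-class F1 over every class seen in either list."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("need two non-empty label lists of equal length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted this class, but it was wrong
            fn[truth] += 1  # missed the true class
    classes = set(y_true) | set(y_pred)
    # Per-class F1 = 2TP / (2TP + FP + FN); each class here has a nonzero count.
    f1s = [2 * tp[c] / (2 * tp[c] + fp[c] + fn[c]) for c in classes]
    return sum(f1s) / len(f1s)


truth = ["billing", "billing", "refund", "refund"]
preds = ["billing", "refund", "refund", "refund"]
print(macro_f1(truth, preds))  # mean of 2/3 (billing) and 4/5 (refund)
```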
/ 03.B · SENTIMENT

Sentiment & frustration

Tracks emotional tone turn-by-turn. Detects positive, neutral, negative, frustrated, angry — and flags the moment a conversation tilts.

roberta-large · 6 labels · turn-level
Accuracy
0.89
Granularity
per turn
Output
score + tag
Latency
~120ms
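The card doesn't say how a tilt is detected. One simple rule, assumed here: flag the first turn whose score sits at least `drop` below the best earlier turn. The [-1, 1] scale, the 0.5 default and `find_tilt` are all hypothetical.

```python
from __future__ import annotations


def find_tilt(scores: list[float], drop: float = 0.5) -> int | None:
    """Index of the first turn scoring at least `drop` below the best earlier
    turn, or None if the conversation never tilts.

    Assumes one sentiment score per turn on a [-1, 1] scale (higher = happier).
    """
    best = float("-inf")
    for i, score in enumerate(scores):
        if best - score >= drop:
            return i
        best = max(best, score)
    return None


print(find_tilt([0.2, 0.1, -0.4, -0.6]))  # turn 2 is where it tilts
```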
/ 03.C · COMPLAINT

Complaint categorization

Multi-label classifier over service quality, product, staff behaviour, refund delays, technical bugs, delivery. Topic modelling surfaces emergent themes.

multi-label · topic-cluster · root-cause
Categories
18 + topics
Recall
0.87
Refresh
weekly
Drift watch
active
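Multi-label means each category gets its own probability and its own threshold, so a single interaction can carry several labels (call #CX-90412 gets two). A sketch with hypothetical category names, probabilities and thresholds:

```python
def assign_labels(probs: dict[str, float],
                  thresholds: dict[str, float],
                  default: float = 0.5) -> list[str]:
    """Return every category whose probability clears its own threshold."""
    return sorted(label for label, p in probs.items()
                  if p >= thresholds.get(label, default))


probs = {"service_outage": 0.93, "repeat_contact": 0.46, "billing_dispute": 0.12}
# A lower, hypothetical threshold for repeat_contact lets it through.
print(assign_labels(probs, thresholds={"repeat_contact": 0.4}))
```

Lowering one category's threshold is the usual lever for buying recall on that category at the cost of precision.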
/ 03.D · CSAT PRED

CSAT prediction

Predicts customer satisfaction even when a survey is never submitted. Inputs: tone, resolution speed, repeat contact, escalation level, agent quality.

gradient-boost · 5-point scale · churn risk
RMSE
0.42
Coverage
100%
Survey base
~7%
Calibration
monthly
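RMSE can only be measured where a real survey score exists, i.e. on the ~7% survey base; the model's predictions then fill in the rest. The sketch computes RMSE on toy data after clipping raw regressor output onto the 5-point scale. The clipping step and the function names are assumptions; the source doesn't describe any post-processing.

```python
import math


def clip_to_scale(pred: float, lo: float = 1.0, hi: float = 5.0) -> float:
    """Pin a raw regressor output onto the 1-5 CSAT scale."""
    return min(hi, max(lo, pred))


def rmse(y_true: list[float], y_pred: list[float]) -> float:
    """Root-mean-square error over interactions that have a real survey score."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


surveyed = [5, 4, 2]  # actual survey scores (toy data)
predicted = [clip_to_scale(p) for p in (5.4, 4.0, 2.6)]
print(round(rmse(surveyed, predicted), 3))
```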
/ 03.E · AGENT EVAL

Agent performance

LLM-judged eval suite scoring empathy, resolution efficiency, compliance adherence, response time, escalation frequency, sentiment-improvement delta.

llm-as-judge · rubric-based · audited
Dimensions
6
Agreement
κ = 0.78
Sample
population
Cycle
daily
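The card reports agreement as κ = 0.78 but not between which raters. Assuming it compares LLM-judge ratings with human reviewers on the same interactions, unweighted Cohen's kappa corrects raw agreement for the agreement two raters would reach by chance. This sketch treats ratings as plain categories; ordinal rubric scores are often compared with a weighted kappa instead.

```python
from collections import Counter


def cohen_kappa(a: list[str], b: list[str]) -> float:
    """Unweighted Cohen's kappa: observed agreement corrected for chance."""
    if not a or len(a) != len(b):
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(a)
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: both raters independently pick the same category.
    p_expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (p_observed - p_expected) / (1 - p_expected)


llm_judge = ["pass", "pass", "fail", "fail"]
human = ["pass", "fail", "fail", "fail"]
print(cohen_kappa(llm_judge, human))
```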
/ 03.F · CONV INTEL

Conversational intelligence

Beyond the words: interruptions, long silences, frustration spikes after holds, compliance violations, escalation triggers.

diarization · prosody · rule + ml
Signals
11
Voice-only
yes
Compliance
tracked
Audit
per call
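Interruptions and long silences fall straight out of diarized timestamps. The sketch assumes segments of (speaker, start, end) in seconds, roughly what a diarization step such as pyannote yields, though `Segment` here is a hypothetical stand-in rather than pyannote's own output type. The 3-second silence threshold and the toy timings are assumptions; hold detection would need telephony events and isn't shown.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    speaker: str
    start: float  # seconds from call start
    end: float


def conversation_signals(segments: list[Segment], min_silence: float = 3.0) -> dict:
    """Count interruptions and long silences from diarized speech segments.

    Interruption: a different speaker starts before the current speaker ends.
    Long silence: a gap of at least `min_silence` seconds with nobody talking.
    """
    interruptions, silences = 0, []
    # End time and speaker of the segment that finishes last so far.
    floor_end, floor_speaker = None, None
    for seg in sorted(segments, key=lambda s: s.start):
        if floor_end is not None:
            gap = seg.start - floor_end
            if gap < 0 and seg.speaker != floor_speaker:
                interruptions += 1
            elif gap >= min_silence:
                silences.append(gap)
        if floor_end is None or seg.end > floor_end:
            floor_end, floor_speaker = seg.end, seg.speaker
    return {
        "interruptions": interruptions,
        "long_silences": len(silences),
        "max_silence_s": max(silences, default=0.0),
    }


call = [
    Segment("customer", 4.0, 10.5),
    Segment("agent", 11.0, 22.0),
    Segment("customer", 23.0, 41.0),
    Segment("agent", 45.0, 60.0),     # 4 s of dead air before this turn
    Segment("customer", 58.0, 62.0),  # starts before the agent finishes
]
print(conversation_signals(call))
```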
04 · Worked example

One call. Six analyses. Twelve seconds.

A representative inbound call. The transcript comes first, followed by the AI layer's output for the same conversation.

call · #CX-90412 · inbound voice · 02:14 · transcript
00:04
Customer

Hi. My internet has been down for 2 days and nobody is helping. I've called three times already.

00:11
Agent · Priya

I'm really sorry to hear that. Let me pull up your account — can you confirm the address on file?

00:23
Customer

Sure. But I want to know why I keep getting passed around. I work from home. This is costing me money.

00:42
Agent · Priya

Completely understand. I can see two earlier tickets — I'm escalating this to the field-ops team now so it stays with one owner. Expect a call within the hour.

AI Output · #CX-90412
Intent
technical_issue → service_outage · 0.96
Sentiment
NEGATIVE · 0.82
Frustration
HIGH · 0.91
Complaint
service_outage · repeat_contact · 2 labels
Escalation risk
HIGH · churn-prob 0.34
CSAT (pred.)
2.1 / 5
Agent
empathy ↑ 0.88 · compliance ↑ 0.95
Conv. signals
0 interruptions · hold 0:08 · sentiment Δ +0.21 by close
Action → Priority escalation queued. Field ops alerted. Retention check-in scheduled for T+24h.
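The action line reads naturally as rules over the label row. The sketch reproduces the three actions taken on #CX-90412; every threshold and action name is a hypothetical choice, not the client's actual routing logic.

```python
def plan_actions(labels: dict) -> list[str]:
    """Map one interaction's model outputs to downstream actions.

    All thresholds and action names are illustrative assumptions.
    """
    actions = []
    repeat_contact = "repeat_contact" in labels["complaint_labels"]
    at_risk = labels["churn_prob"] >= 0.3 or labels["csat_pred"] <= 2.5
    if labels["frustration"] >= 0.8 and (repeat_contact or at_risk):
        actions.append("priority_escalation")
    if labels["intent"] == "service_outage":
        actions.append("alert_field_ops")
    if at_risk:
        actions.append("schedule_retention_checkin_24h")
    return actions


cx_90412 = {
    "intent": "service_outage", "frustration": 0.91, "churn_prob": 0.34,
    "csat_pred": 2.1, "complaint_labels": ["service_outage", "repeat_contact"],
}
print(plan_actions(cx_90412))
```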
05 · Operator dashboard

What managers see at 9am.

The dashboard is built around the three questions ops leaders actually ask: what's happening right now, what are customers complaining about, and which agents need coaching.

support-intel · operator-view
Overview · Complaints · Agents · Sentiment
LIVE · last refresh 14s ago
Interactions · 24h · ▲ 6.4%
18,402
across 7 channels
First contact resolution · ▲ 4.1 pt
78.2%
target 75% · met
Avg handle time · ▼ 35%
4:18
vs 6:36 baseline
Escalation rate · ▼ 28%
3.4%
predictive flags · 41 today
Top complaint categories · 7d · vol. 12,840
Service outage
3,412
Billing dispute
2,488
Delivery delay
1,862
Refund timing
1,410
Plan change
1,082
Technical bug
812
Staff behaviour
432
Agent scorecard · live cohort · 42 agents · today
Agent      Tier  Empathy  Resolve  CSAT  Flags
Priya R.   A     0.91     0.88     4.7
Marco D.   A     0.85     0.92     4.6
Sara N.    B     0.79     0.74     4.1   2
James K.   B     0.72     0.81     4.0   1
Lina P.    C     0.58     0.62     3.2   5
Ahmed Y.   C     0.54     0.66     3.0   6
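The tile deltas can be checked by hand. Average handle time went from 6:36 (396 s) to 4:18 (258 s), a 34.8% drop that the tile rounds to 35%. The FCR tile marks point changes with "pt", which suggests the escalation tile's ▼ 28% is relative; if so (an inference, not stated), the baseline escalation rate was about 4.7%. The helper names are illustrative.

```python
def to_seconds(mm_ss: str) -> int:
    """Convert an 'm:ss' duration such as '4:18' to seconds."""
    minutes, seconds = mm_ss.split(":")
    return int(minutes) * 60 + int(seconds)


def pct_change(new: float, old: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100


def implied_baseline(current: float, relative_drop_pct: float) -> float:
    """Back out the pre-change value from a current value and a relative drop."""
    return current / (1 - relative_drop_pct / 100)


aht = pct_change(to_seconds("4:18"), to_seconds("6:36"))
escalation_baseline = implied_baseline(3.4, 28)
print(f"AHT change: {aht:.1f}%  implied escalation baseline: {escalation_baseline:.1f}%")
```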
06 · Business impact

Six months after rollout.

Measured against the 90-day pre-launch baseline. All deltas computed on the population, not the sample.

01 · CSAT
+22%
Customer satisfaction lift, predicted & verified.
02 · Speed
−35%
Average resolution time across all channels.
03 · Escalations
−28%
Complaint escalations, predictively pre-empted.
04 · QA cost
−60%
Manual quality-review hours, redirected to coaching.
05 · Productivity
+30%
Agent interactions handled per shift.
Before / baseline
  • Manual QA reviews on <2% of interactions.
  • Weekly reporting cycles, two-week lag.
  • Limited visibility into root causes.
  • Reactive support: customer drives escalation.
  • Coaching based on anecdote.
After / in production
  • 100% of interactions analysed automatically.
  • Real-time signal stream, 14-second refresh.
  • Complaint trends with named root causes.
  • Proactive: system flags risk pre-escalation.
  • Coaching cues per agent, per shift.
07 · Technologies used

Boring infrastructure. Interesting models.

Off-the-shelf where it earns its keep. Custom only where it actually matters — the eval rubrics, the topic-clustering, the agent scoring.

NLP & classification
BERT · RoBERTa · GPT (LLM-judge) · spaCy · HF Transformers
Speech
Whisper · Google Speech API · pyannote (diarization)
Data & analytics
Python · Spark · dbt · Snowflake · Postgres
Dashboards
Power BI · Tableau · internal (React)
Infra & MLOps
AWS · Kubernetes · MLflow · Airflow · Terraform
Governance
GDPR · PII masking · SOC 2 audit trail · PCI scope isolation
08 · Challenges

Where it's hard.

The interesting failure modes show up in production, not the notebook.

  • 01

    Data quality

    Noisy transcripts, mixed languages mid-call, incomplete CRM joins. We invest in normalisation more than in models.

  • 02

    Privacy & compliance

    GDPR; HIPAA where in scope; PCI DSS wherever payment details come up in a conversation. PII masking happens before anything reaches a model.

  • 03

    Model bias

    Emotion and accent are where misclassification concentrates. We audit by language and demographic monthly.

  • 04

    Integration surface

    CRM, telephony, ticketing, BI — five systems, three vendors. Most of the engineering work lives here.
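The monthly bias audit from challenge 03 can start as a per-slice accuracy comparison. A sketch, assuming labelled (group, true, predicted) records grouped by language code as on the intent card; the 5-point gap, the function name and the toy records are assumptions, and a real audit would need enough examples per slice for a gap to mean anything.

```python
from collections import defaultdict


def audit_by_group(records, max_gap: float = 0.05):
    """Per-group accuracy, plus groups trailing the best group by > max_gap.

    records: iterable of (group, true_label, predicted_label) tuples.
    """
    hits, totals = defaultdict(int), defaultdict(int)
    for group, y_true, y_pred in records:
        totals[group] += 1
        hits[group] += int(y_true == y_pred)
    if not totals:
        return {}, []
    accuracy = {g: hits[g] / totals[g] for g in totals}
    best = max(accuracy.values())
    flagged = sorted(g for g, acc in accuracy.items() if best - acc > max_gap)
    return accuracy, flagged


sample = [("EN", "angry", "angry"), ("EN", "neutral", "neutral"),
          ("ES", "angry", "neutral"), ("ES", "frustrated", "frustrated"),
          ("FR", "negative", "negative"), ("FR", "positive", "positive")]
print(audit_by_group(sample))
```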

09 · What's next

On the roadmap.

Currently in design or pilot. Shipping through 2025.

  • 01

    Real-time agent assist

    Recommended responses, next-best-action, contextual knowledge-base snippets while the call is live.

  • 02

    Generated summaries & coaching

    Auto-written call summaries; per-shift coaching memos with the three things to work on tomorrow.

  • 03

    Voice-emotion detection

    Prosody-aware stress and frustration scoring, layered on top of the text signal.

  • 04

    Autonomous resolution agents

    Closed-loop agents for the long tail of simple, high-volume intents — password resets, plan changes, status checks.

// LET'S BUILD

Run this on your support data.

If you're sitting on a year of call recordings, chat logs, and tickets you've never read, we'll help you turn them into a working system. Build & operate, or build-and-hand-off — your call.