WTWH Media
Automated NRR prediction pipeline — classifying 757K Salesforce emails across 9 dimensions to produce per-account health scores and risk tiers.
WTWH's salesforce_fivetran.task table contains 1.1M tasks, of which 757K are emails with meaningful content across 25K accounts. When joined against the sql_for_brian revenue VIEW, 3,307 accounts (96%) have both email activity and revenue data — enough signal density for reliable account health classification.
Brian Malon's proof-of-concept classifier validated the approach. We're proposing to productionize it as an automated pipeline: dbt Cloud for data modeling, a Cloud Run service for AI scoring, and BigQuery as the analytical layer — all orchestrated on a monthly cadence.
Fivetran syncs Salesforce data into BigQuery. dbt Cloud models prepare classification inputs. A Cloud Run service runs the AI scorer. Results flow back through dbt into a presentation layer.
sql_for_brian VIEW — UNION ALL of SF Closed Won + Pipeline + NetSuite combined with brand/tower mapping Each email is scored across nine complementary dimensions derived from Brian's classifier_config.py (411 lines). Together they create a multi-signal picture of account communication health.
Positive / Negative / Neutral — base-level tone classification
Multi-label: Sales, Support, Onboarding, Performance, Billing, etc.
Multi-label: Enthusiastic, Professional, Urgent, Concerned, Frustrated
Renewal/expansion vs. churn/contraction language signals
6 ordinal levels: Highly Engaged through Disengaged
5 levels: Strong Acknowledgment through Dissatisfaction
5 levels: Strong Partnership, Positive, Neutral, Strained, Deteriorating
Multi-label: Strategic priority, Budget constraints, Competition, Regulatory
5 levels: Strong Commitment through On Hold — forward momentum signal
Using the Anthropic Batch API for 50% savings over real-time. Incremental runs only score new emails — no re-processing existing classifications.
Start with all 9 dimensions or a focused subset? Full run is ~$100, so cost isn't a blocker — question is whether we want to validate a subset first.
The sql_for_brian VIEW is ~500 lines with complex UNION ALL + brand/tower mapping. Migrate fully into dbt now, or defer and reference the VIEW?
Target timeline for Phase 1 data pipeline vs. full four-phase rollout? Phase 2 depends on an anthropic-api-key secret in GCP Secret Manager.
Which teams get portal access — Sales, CS, leadership? How should risk tiers integrate with existing account management workflows?