Client Portal

WTWH Media

Technical Architecture

Salesforce Activity Intelligence

Automated NRR prediction pipeline — classifying 757K Salesforce emails across 9 dimensions to produce per-account health scores and risk tiers.

Untapped communication signals in Salesforce

WTWH's salesforce_fivetran.task table contains 1.1M tasks, of which 757K are emails with meaningful content across 25K accounts. When joined against the sql_for_brian revenue VIEW, 3,307 accounts (96%) have both email activity and revenue data — enough signal density for reliable account health classification.

Brian Malon's proof-of-concept classifier validated the approach. We're proposing to productionize it as an automated pipeline: dbt Cloud for data modeling, a Cloud Run service for AI scoring, and BigQuery as the analytical layer — all orchestrated on a monthly cadence.

End-to-end pipeline

Fivetran syncs Salesforce data into BigQuery. dbt Cloud models prepare classification inputs. A Cloud Run service runs the AI scorer. Results flow back through dbt into a presentation layer.

Data flows from Salesforce through Fivetran sync, dbt pre-scoring models, Cloud Run AI classification (9 dimensions via Claude), into BigQuery output tables, then through dbt post-scoring models into the Account Health Dashboard.

Phased technical rollout

Phase 1 — dbt Cloud: Pre-Scoring Models

Data Pipeline

stg_salesforce_fivetran__tasks Staging model — filter deleted records, rename columns, surface Outreach fields
int_salesforce__revenue_unified Migrate the ~500-line sql_for_brian VIEW — UNION ALL of SF Closed Won + Pipeline + NetSuite combined with brand/tower mapping
int_salesforce__activity_classification_input Join emails to revenue accounts — filter to type=Email, body > 10 chars, last 12 months. Inner join to revenue for 3,307 accounts
Phase 2 — Cloud Run: AI Classification Service

Scoring Engine

POST /classify Flask endpoint — reads from BigQuery, sanitizes PII, extracts email content, batches through Claude (100/batch), writes results back to BQ
Claude claude-sonnet-4-5 via Batch API 50% cost savings vs. real-time API. Temperature 0, max_tokens 2048. 9 dimensions per email.
Incremental strategy Track classified activity_ids to avoid re-scoring. New emails only on subsequent runs.
Infrastructure 2Gi memory, 1hr timeout, max 1 instance. Cloud Scheduler monthly on 1st. Reuses existing SA.
Phase 3 — dbt Cloud: Post-Scoring Models

Account Health Model

stg_activity_classification__content_fields Stage the aggregated classification output from BigQuery
int_salesforce__account_health Weighted composite health_score (0-100), risk_tier assignment (High Risk / At Risk / Stable / Healthy), volatility scores, NRR signals
salesforce__account_health_dashboard Presentation-layer model joining account health with revenue context for the dashboard
Phase 4 — Dashboard

Account Health Portal

Internal web portal Filterable by risk tier, revenue segment, trend direction. BQ query with Firestore cache layer. Sales and CS team access.

9 scoring dimensions per email

Each email is scored across nine complementary dimensions derived from Brian's classifier_config.py (411 lines). Together they create a multi-signal picture of account communication health.

1

Sentiment

Positive / Negative / Neutral — base-level tone classification

2

Theme

Multi-label: Sales, Support, Onboarding, Performance, Billing, etc.

3

Emotion & Tone

Multi-label: Enthusiastic, Professional, Urgent, Concerned, Frustrated

4

Growth Tone

Renewal/expansion vs. churn/contraction language signals

5

Engagement Quality

6 ordinal levels: Highly Engaged through Disengaged

6

Value Recognition

5 levels: Strong Acknowledgment through Dissatisfaction

7

Relationship Health

5 levels: Strong Partnership, Positive, Neutral, Strained, Deteriorating

8

Business Context

Multi-label: Strategic priority, Budget constraints, Competition, Regulatory

9

Action Orientation

5 levels: Strong Commitment through On Hold — forward momentum signal

757K Emails with Content
3,307 Revenue Accounts
96% Account Coverage
9 AI Dimensions
4 Risk Tiers

AI classification costs

Using the Anthropic Batch API for 50% savings over real-time. Incremental runs only score new emails — no re-processing existing classifications.

~$5
Validation batch
~500 emails, full API, quality check before full run
~$100
Initial full run
~15K emails via Batch API (50% off), all 9 dimensions
$10–15
Monthly incremental
~1-2K new emails/month, Batch API pricing
~$150/yr
Steady state
Fully automated, Cloud Scheduler on the 1st

Files to create / modify

File Action Phase
models/staged/salesforce/_salesforce_sources.yml Modify 1, 3
models/staged/salesforce/stg_salesforce_fivetran__tasks.sql Create 1
models/intermediate/.../int_salesforce__revenue_unified.sql Create 1
models/intermediate/.../int_salesforce__activity_classification_input.sql Create 1
cloudruns/activity_classifier/ (8 files) Create 2
models/staged/.../stg_activity_classification__content_fields.sql Create 3
models/intermediate/.../int_salesforce__account_health.sql Create 3
models/presentation/.../salesforce__account_health_dashboard.sql Create 3

Open decisions

?

Dimension scope

Start with all 9 dimensions or a focused subset? Full run is ~$100, so cost isn't a blocker — question is whether we want to validate a subset first.

?

Revenue model migration

The sql_for_brian VIEW is ~500 lines with complex UNION ALL + brand/tower mapping. Migrate fully into dbt now, or defer and reference the VIEW?

?

Timeline priority

Target timeline for Phase 1 data pipeline vs. full four-phase rollout? Phase 2 depends on an anthropic-api-key secret in GCP Secret Manager.

?

Dashboard access & integration

Which teams get portal access — Sales, CS, leadership? How should risk tiers integrate with existing account management workflows?