Salesforce Activity Intelligence

Untapped communication signals in Salesforce

WTWH's salesforce_fivetran.task table contains 1.1M tasks, of which 757K are emails with meaningful content across 25K accounts. When joined against the sql_for_brian revenue VIEW, 3,307 accounts (96%) have both email activity and revenue data — enough signal density for reliable account health classification.

Brian Malon's proof-of-concept classifier validated the approach. We're proposing to productionize it as an automated pipeline: dbt Cloud for data modeling, a Cloud Run service for AI scoring, and BigQuery as the analytical layer — all orchestrated on a monthly cadence.

End-to-end pipeline

Fivetran syncs Salesforce data into BigQuery. dbt Cloud models prepare classification inputs. A Cloud Run service runs the AI scorer. Results flow back through dbt into a presentation layer.

Data flows from Salesforce through Fivetran sync, dbt pre-scoring models, Cloud Run AI classification (9 dimensions via Claude), into BigQuery output tables, then through dbt post-scoring models into the Account Health Dashboard.

Phased technical rollout

Phase 1 — dbt Cloud: Pre-Scoring Models

Data Pipeline

stg_salesforce_fivetran__tasks Staging model — filter deleted records, rename columns, surface Outreach fields

int_salesforce__revenue_unified Migrate the ~500-line sql_for_brian VIEW — UNION ALL of SF Closed Won + Pipeline + NetSuite combined with brand/tower mapping

int_salesforce__activity_classification_input Join emails to revenue accounts — filter to type=Email, body > 10 chars, last 12 months. Inner join to revenue for 3,307 accounts

Phase 2 — Cloud Run: AI Classification Service

Scoring Engine

POST /classify Flask endpoint — reads from BigQuery, sanitizes PII, extracts email content, batches through Claude (100/batch), writes results back to BQ

Claude claude-sonnet-4-5 via Batch API 50% cost savings vs. real-time API. Temperature 0, max_tokens 2048. 9 dimensions per email.

Incremental strategy Track classified activity_ids to avoid re-scoring. New emails only on subsequent runs.

Infrastructure 2Gi memory, 1hr timeout, max 1 instance. Cloud Scheduler monthly on 1st. Reuses existing SA.

Phase 3 — dbt Cloud: Post-Scoring Models

Account Health Model

stg_activity_classification__content_fields Stage the aggregated classification output from BigQuery

int_salesforce__account_health Weighted composite health_score (0-100), risk_tier assignment (High Risk / At Risk / Stable / Healthy), volatility scores, NRR signals

salesforce__account_health_dashboard Presentation-layer model joining account health with revenue context for the dashboard

Phase 4 — Dashboard

Account Health Portal

Internal web portal Filterable by risk tier, revenue segment, trend direction. BQ query with Firestore cache layer. Sales and CS team access.

9 scoring dimensions per email

Each email is scored across nine complementary dimensions derived from Brian's classifier_config.py (411 lines). Together they create a multi-signal picture of account communication health.

Sentiment

Positive / Negative / Neutral — base-level tone classification

Theme

Multi-label: Sales, Support, Onboarding, Performance, Billing, etc.

Emotion & Tone

Multi-label: Enthusiastic, Professional, Urgent, Concerned, Frustrated

Growth Tone

Renewal/expansion vs. churn/contraction language signals

Engagement Quality

6 ordinal levels: Highly Engaged through Disengaged

Value Recognition

5 levels: Strong Acknowledgment through Dissatisfaction

Relationship Health

5 levels: Strong Partnership, Positive, Neutral, Strained, Deteriorating

Business Context

Multi-label: Strategic priority, Budget constraints, Competition, Regulatory

Action Orientation

5 levels: Strong Commitment through On Hold — forward momentum signal

AI classification costs

Using the Anthropic Batch API for 50% savings over real-time. Incremental runs only score new emails — no re-processing existing classifications.

~$5

Validation batch

~500 emails, full API, quality check before full run

~$100

Initial full run

~15K emails via Batch API (50% off), all 9 dimensions

$10–15

Monthly incremental

~1-2K new emails/month, Batch API pricing

~$150/yr

Steady state

Fully automated, Cloud Scheduler on the 1st

Files to create / modify

File Action Phase

models/staged/salesforce/_salesforce_sources.yml Modify 1, 3

models/staged/salesforce/stg_salesforce_fivetran__tasks.sql Create 1

models/intermediate/.../int_salesforce__revenue_unified.sql Create 1

models/intermediate/.../int_salesforce__activity_classification_input.sql Create 1

cloudruns/activity_classifier/ (8 files) Create 2

models/staged/.../stg_activity_classification__content_fields.sql Create 3

models/intermediate/.../int_salesforce__account_health.sql Create 3

models/presentation/.../salesforce__account_health_dashboard.sql Create 3

Open decisions

Dimension scope

Start with all 9 dimensions or a focused subset? Full run is ~$100, so cost isn't a blocker — question is whether we want to validate a subset first.

Revenue model migration

The sql_for_brian VIEW is ~500 lines with complex UNION ALL + brand/tower mapping. Migrate fully into dbt now, or defer and reference the VIEW?

Timeline priority

Target timeline for Phase 1 data pipeline vs. full four-phase rollout? Phase 2 depends on an anthropic-api-key secret in GCP Secret Manager.

Dashboard access & integration

Which teams get portal access — Sales, CS, leadership? How should risk tiers integrate with existing account management workflows?

Client Portal