What Happens When You Give Your Agency an AI Lead Scoring System (Real Numbers)
AI & Data

Written by Dream Code Labs
18 Mar 2025 · 9 min read

Key Takeaways

  • AI lead scoring for marketing agencies improved overall close rate from 12% to 19% — a 58% gain
  • High-score leads closed at 34% — nearly triple the baseline close rate of 12%
  • The model was trained on 18 months of historical CRM data covering 847 leads with known outcomes
  • Six data points drove the model: company size, website behaviour, content engagement, referral source, industry fit, and response time
  • AI lead scoring augments sales judgment — it does not replace it

Who Is This For?

This case study is for agency owners and sales leads who suspect their team is spending too much time on leads that will never close — and want to understand what AI lead scoring actually delivers in practice, based on six months of real data rather than vendor claims.

AI lead scoring for marketing agencies promises a compelling outcome: stop wasting sales effort on leads that will never close, focus your best people on the opportunities most likely to convert, and let data drive prioritisation rather than gut instinct. After building and deploying a lead scoring system for a 20-person digital marketing agency and tracking every metric for six months, we have real data on exactly what that outcome looks like — and what it does not look like — in a real agency environment.

The agency generates approximately 80 inbound leads per month across organic search, referrals, and paid advertising. Before the lead scoring system, their three-person sales team treated every lead with roughly equal priority — a quick qualification call, a proposal to anyone who expressed genuine interest, and follow-up sequences that were largely identical regardless of the lead's apparent quality. The overall close rate was 12%. Average deal size was £4,200 per month. The team consistently described the sales process as feeling like a lottery — lots of effort with unpredictable output.

In this post we document how we built the scoring model, what the model learned from the agency's historical data, the six-month performance results in detail, and — importantly — what the system did not improve, because that context is equally necessary for anyone evaluating whether AI lead scoring is the right investment for their agency.

How We Built the AI Lead Scoring Model

The first phase of any AI lead scoring build is data preparation, and it is always the most time-consuming phase. We extracted 18 months of historical lead data from the agency's HubSpot CRM — 847 leads with known outcomes (closed, lost, or disqualified) and as much associated metadata as the CRM had captured. The quality of this historical data determines the quality of the model. Leads with incomplete records, missing company information, or no engagement tracking were excluded from the training set, leaving 634 usable data points.
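
In code terms, the exclusion step is a simple completeness filter. The sketch below is illustrative only: the field names are hypothetical stand-ins, not the agency's actual HubSpot property names.

```python
# Minimal sketch of the training-set filter. Field names are
# illustrative assumptions, not the agency's real CRM schema.
REQUIRED_FIELDS = ["company_size", "referral_source", "outcome"]

def usable(lead: dict) -> bool:
    """A lead is usable only if every required field is present and non-empty."""
    return all(lead.get(f) not in (None, "") for f in REQUIRED_FIELDS)

leads = [
    {"company_size": 18, "referral_source": "organic", "outcome": "closed"},
    {"company_size": None, "referral_source": "paid", "outcome": "lost"},  # missing field: excluded
    {"company_size": 120, "referral_source": "referral", "outcome": ""},   # no outcome: excluded
]
training_set = [lead for lead in leads if usable(lead)]
```

The same principle applied to the real extract is what reduced 847 leads to 634 usable records.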

We identified six data dimensions with both sufficient data coverage and hypothesised predictive value: company size (employee count from LinkedIn or Companies House), website behaviour prior to form submission (pages visited, time on site, specific pages like case studies or pricing), content engagement (audit downloads, webinar attendance, email open rates), referral source (organic, paid, referral, direct), industry vertical alignment with the agency's stated target market, and response time to the first outreach email. Each dimension was normalised and converted into a numerical feature for the model.
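
To make the normalisation step concrete, here is a minimal sketch of two common transformations: min-max scaling for numeric dimensions and one-hot encoding for categorical ones such as referral source. The exact encoding choices in the production model may differ; this shows the shape of the step.

```python
def min_max(values: list[float]) -> list[float]:
    """Scale a numeric feature (e.g. time on site) to the 0-1 range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Categorical dimensions such as referral source become one column per value.
SOURCES = ["organic", "paid", "referral", "direct"]

def one_hot(source: str) -> list[int]:
    """Encode a referral source as a one-hot vector over the known sources."""
    return [1 if source == s else 0 for s in SOURCES]
```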

We used a gradient boosting classifier — specifically XGBoost — as the underlying model, trained on the 634 historical records with a 70/30 train/test split. The model was evaluated on its ability to correctly classify leads into high, medium, and low tiers based on conversion probability. The test set performance showed an AUC of 0.84, indicating strong discriminatory power — the model could meaningfully distinguish high-probability leads from low-probability ones far better than random chance. The scoring output was integrated directly into HubSpot as a custom property, visible on every lead record within 60 seconds of form submission.
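
An AUC of 0.84 can be read as: given one randomly chosen converting lead and one randomly chosen non-converting lead, the model ranks the converter higher 84% of the time. A minimal pure-Python sketch of that calculation on a held-out set (toy labels and scores, not the agency's data):

```python
def auc(labels: list[int], scores: list[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative.
    Ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))
```

A score of 0.5 would mean the model is no better than chance; the production model's 0.84 sits well above that.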

What the Model Learned From the Agency's Historical Data

The most instructive output of a lead scoring model is not the score itself — it is the feature importance rankings, which reveal which factors most strongly predicted conversion in the historical data. For this agency, the three most predictive features were, in order: industry vertical alignment (agencies and SaaS companies with 10–30 employees converted at nearly four times the rate of out-of-target industries); content engagement prior to enquiry (leads who had visited the case studies page or downloaded a resource before submitting the contact form converted at 2.8 times the rate of leads who had not); and response time to first email (leads responding within 24 hours converted at 3.1 times the rate of those taking 3+ days to respond).
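
The "converted at X times the rate" comparisons above come from segmenting historical leads by a feature value and comparing conversion rates. A stdlib sketch of that segmentation (toy records, illustrative field names):

```python
from collections import defaultdict

def conversion_by_segment(leads: list[dict], key: str) -> dict:
    """Conversion rate for each value of a feature, e.g. industry vertical."""
    won = defaultdict(int)
    total = defaultdict(int)
    for lead in leads:
        total[lead[key]] += 1
        if lead["outcome"] == "closed":
            won[lead[key]] += 1
    return {seg: won[seg] / total[seg] for seg in total}

history = [
    {"industry": "saas", "outcome": "closed"},
    {"industry": "saas", "outcome": "lost"},
    {"industry": "retail", "outcome": "lost"},
]
rates = conversion_by_segment(history, "industry")
```

Dividing one segment's rate by another's gives the multiples quoted above (4x, 2.8x, 3.1x).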

Company size was moderately predictive but not in the direction the team initially assumed. The assumption had been that larger companies (100+ employees) would be higher-value leads. The data showed the opposite: the agency's sweet spot was 10–30 employee companies, where the decision-maker was typically the founder or marketing director, decisions moved quickly, and the service scope aligned well with the agency's deliverables. This insight alone changed the team's manual qualification approach — they had been inadvertently deprioritising their best lead segment.

Referral source was the least predictive feature in the model, contradicting another assumption the team held going in. They believed referrals were consistently their best leads. The data showed that while referrals did convert slightly better than average, the difference was not statistically significant enough to justify the preferential treatment they had been receiving. Organic search leads who visited the case studies page before enquiring converted at a higher rate than average referrals — a finding that directly shaped the agency's subsequent SEO content strategy.

Interested in AI Lead Scoring for Your Agency?

We build custom AI lead scoring systems for UK marketing agencies — trained on your specific historical data and integrated directly into your CRM. Book a free discovery call to discuss what's possible.

Book a Free Discovery Call

Six-Month Performance Results

At the 60-day mark, the first meaningful performance data was available. Leads classified as high-score by the model (top 30% of scores) had closed at a 34% rate — nearly triple the pre-system baseline of 12%. Leads classified as medium-score were closing at 14%, slightly above baseline. Low-score leads were closing at 4%, confirming the model's ability to identify low-probability enquiries. The sales team had been instructed to prioritise high-score leads for same-day response and personal outreach, while medium and low-score leads received automated sequences.
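
The tiering itself is a percentile cut over the model's probability scores. A sketch, assuming the bottom 30% forms the low tier (the post states only that the top 30% is the high tier; the low-tier cut-off here is an illustrative assumption):

```python
def assign_tiers(scores: list[float], high_frac=0.30, low_frac=0.30) -> list[str]:
    """Top high_frac of scores -> high, bottom low_frac -> low, rest -> medium.
    low_frac is an assumed value for illustration."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n = len(scores)
    n_high = max(1, int(n * high_frac))
    n_low = max(1, int(n * low_frac))
    tiers = ["medium"] * n
    for i in ranked[:n_high]:
        tiers[i] = "high"
    for i in ranked[-n_low:]:
        tiers[i] = "low"
    return tiers
```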

At the six-month mark, the aggregate data told a clear commercial story. Overall close rate had improved from 12% to 19% — a 58% improvement. Average response time to high-score leads dropped from 4.2 hours to 47 minutes, driven by the Slack notification the system sends to the assigned account manager within 60 seconds of a high-score lead arriving. Monthly recurring revenue from new clients increased 34% over the six-month period, attributable primarily to the improved conversion rate on high-quality leads rather than any increase in lead volume.
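
The Slack alert is a standard incoming-webhook POST. A sketch of the payload builder — the message wording is illustrative; only the `{"text": ...}` envelope is Slack's documented webhook shape:

```python
import json

def high_score_alert(lead_name: str, company: str, score: int) -> str:
    """Build the JSON body for a Slack incoming-webhook notification.
    Message wording is an illustrative assumption."""
    return json.dumps({
        "text": f"High-score lead: {lead_name} ({company}), score {score}. "
                "Aim for same-day personal outreach."
    })
```

In production this body is POSTed to the team's webhook URL as soon as the score is written, which is what keeps the notification inside the 60-second window.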

The sales team's subjective experience of the change was as significant as the numerical results. In a feedback session at the six-month mark, all three account managers reported that the process felt more purposeful — they spent their most active sales effort on leads they genuinely believed would close, and the data validated that intuition rather than contradicting it. The psychological benefit of working a prioritised pipeline rather than treating every lead as equally uncertain had a measurable effect on sales energy and follow-up consistency.

What the System Did Not Improve (And Why That Matters)

Intellectual honesty about AI system limitations is as important as documenting the wins. The lead scoring system did not improve conversion rates on low-score leads. The assumption that automated sequences would convert a meaningful percentage of low-probability leads at a lower cost of sales effort was not validated. Low-score leads converted at 4% — better than zero, but not meaningfully different from the pre-system rate for leads given minimal attention. The model was good at identifying high-probability leads; it was not useful for converting low-probability ones.

The system also required more ongoing maintenance than the initial build suggested. The model needed retraining three times in six months as the agency's lead mix shifted — seasonal changes in traffic source, a new paid advertising campaign targeting a different industry, and a website redesign that changed the page engagement signals the model had been trained on. Each retraining took approximately 2–3 hours of data preparation and model evaluation. This maintenance overhead is a real cost that any agency evaluating AI lead scoring should factor into their business case.

The most important limitation is the data quality dependency. The system performs at the level of the historical data it was trained on. Agencies with CRM hygiene problems — missing fields, inconsistent data entry, short lead histories — will see significantly lower model performance. Before investing in AI lead scoring, the prerequisite is 12+ months of clean, consistent lead data with outcome tracking in the CRM. Without that foundation, the model has insufficient signal to produce reliable scores. For agencies ready to explore AI lead scoring, our AI development services include the full CRM data audit and model build.

Dream Code Labs

Web Development & Automation Agency · 7+ years experience

Dream Code Labs is a remote-first development and automation agency specialising in custom websites, AI-powered tools, and workflow automation for marketing agencies and growing SMEs across the UK, US, Canada, and Australia. We have delivered 50+ projects that produce measurable, real-world results.

Frequently Asked Questions

How does AI lead scoring work for marketing agencies?

AI lead scoring analyses historical CRM data to identify which characteristics most strongly predict a lead converting into a client. The model learns from past outcomes and applies that learning to score new leads as they arrive — ranking them by conversion probability. The scores surface in the CRM, allowing the sales team to prioritise their highest-probability leads for fastest response and best effort, while lower-probability leads receive automated sequences.

How much historical data do you need to build an AI lead scoring model?

A minimum of 12 months of historical lead data with known outcomes is required to train a reliable model. We recommend 18+ months and at least 500 leads with recorded outcomes for a model that performs consistently. Agencies with fewer than 500 historical leads with outcomes may see limited model accuracy and should consider rule-based scoring as an interim approach while accumulating more data.

How long does it take to see results from AI lead scoring?

Initial results are visible within 30–60 days of deployment, as the first cohort of scored leads progresses through the sales pipeline to known outcomes. Statistically meaningful performance data — enough to confirm the model's discriminatory power — typically requires 60–90 days. Full ROI realisation, including the compound effect of improved conversion rates on monthly recurring revenue, is typically measurable at the 90–120 day mark.

Can a small agency with limited CRM data use AI lead scoring?

If the agency has fewer than 12 months of clean lead data or fewer than 500 leads with recorded outcomes, a pure machine learning approach will underperform. A better starting point is a rule-based scoring system — manually defined rules based on known conversion signals — implemented in HubSpot or Pipedrive. As the lead history grows, the rule-based system can be augmented and eventually replaced by a trained model.
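
A rule-based interim scorer can be very small. The sketch below uses hypothetical point values and thresholds — these would be tuned against whatever conversion signals the agency already trusts, not taken as given:

```python
# Illustrative rule-based scorer. Point values and tier thresholds
# are assumptions to be tuned, not recommendations.
RULES = [
    ("target_industry", 30),        # agency/SaaS vertical match
    ("visited_case_studies", 20),
    ("replied_within_24h", 20),
    ("downloaded_resource", 15),
    ("company_size_10_to_30", 15),
]

def rule_score(lead: dict) -> int:
    """Sum the points for every signal flag that is set on the lead."""
    return sum(points for flag, points in RULES if lead.get(flag))

def rule_tier(score: int) -> str:
    if score >= 60:
        return "high"
    if score >= 30:
        return "medium"
    return "low"
```

Because the rules are explicit, the team can sanity-check and adjust them directly, and the flags double as the feature set for the eventual trained model.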

What CRM works best for AI lead scoring integration?

HubSpot is our primary recommendation for AI lead scoring integration due to its flexible custom properties, Workflows automation, and comprehensive API that allows external models to write scores directly to lead records. Pipedrive and Salesforce are strong alternatives with good API support. The minimum requirement is a CRM with custom field support, API write access, and a complete historical lead record including lead source and outcome tracking.
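
Writing a score back to HubSpot is a single PATCH against the CRM v3 contacts endpoint. A sketch of building that request — the `{"properties": {...}}` envelope is HubSpot's documented shape, while the `ai_lead_score` and `ai_lead_tier` property names are our own hypothetical custom properties:

```python
import json

def score_update_request(contact_id: str, score: int, tier: str):
    """Build the URL and JSON body for a HubSpot CRM v3 contact update.
    ai_lead_score / ai_lead_tier are illustrative custom property names."""
    url = f"https://api.hubapi.com/crm/v3/objects/contacts/{contact_id}"
    body = json.dumps({"properties": {
        "ai_lead_score": str(score),  # property values are sent as strings
        "ai_lead_tier": tier,
    }})
    return url, body
```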

Last updated: 20 Apr 2025
