Why duplicate contacts compound through your entire CRM
A duplicate contact looks like a single problem — two records, one person — but the downstream damage spreads:
- Activity history fragments across records. Half the emails are on Record A, the other half on Record B. Neither record has the full context.
- Deal associations split. One rep attached their deal to Record A; another rep attached theirs to Record B. The "all deals for this person" view shows neither rep the full picture.
- Marketing sends double-mail. Both records get the campaign. The recipient sees the same email twice and unsubscribes from both.
- Forecasting double-counts. Two deals on the same person look like two opportunities. Pipeline metrics inflate.
Common causes: form re-submissions where the user typed their email differently, manual contact creation by reps who didn't search first, chat-to-CRM sync creating new records when the contact was already in the system, and list imports without dedup checking.
Three normalized match dimensions to cluster on
Match contacts on three normalized dimensions:
- Email: case-folded, whitespace-trimmed. Watch for placeholder domains (test@example.com) and skip them, because they create false-positive clusters.
- Phone: digits-only, last 10 digits. Catches (415) 555-1234 matching +1-415-555-1234 matching 4155551234.
- Full name: first + last, lowercase, accents stripped, punctuation removed. Cap clusters at 5: a portal with 30 "John Smith"s has 30 unrelated people, not a dedup signal.
A contact appearing in any one cluster is a duplicate candidate. A contact appearing in two clusters (matched on email AND phone) is a stronger signal, but the two clusters can contain different records, so resolving the email match may not resolve the phone match.
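The three normalizations and the clustering step can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names, the `PLACEHOLDER_DOMAINS` set, and the contact dictionaries (keyed by `id`, `email`, `phone`, `firstname`, `lastname`) are assumptions made for the example. It also reads "cap clusters at 5" as "discard name clusters larger than 5", which is one reasonable interpretation.

```python
import re
import unicodedata
from collections import defaultdict

# Assumed placeholder domains to skip; extend this set for your own data.
PLACEHOLDER_DOMAINS = {"example.com", "example.org", "test.com"}
NAME_CLUSTER_CAP = 5


def norm_email(raw):
    """Case-fold and trim; return None for empty or placeholder addresses."""
    email = (raw or "").strip().casefold()
    if "@" not in email:
        return None
    domain = email.rsplit("@", 1)[1]
    return None if domain in PLACEHOLDER_DOMAINS else email


def norm_phone(raw):
    """Keep digits only, then the last 10; None if fewer than 10 digits."""
    digits = re.sub(r"\D", "", raw or "")
    return digits[-10:] if len(digits) >= 10 else None


def norm_name(first, last):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", f"{first or ''} {last or ''}")
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", "", text.lower())
    text = " ".join(text.split())
    return text or None


def cluster(contacts):
    """Return {dimension: {key: [contact ids]}} for keys shared by 2+ contacts."""
    buckets = {dim: defaultdict(list) for dim in ("email", "phone", "name")}
    for c in contacts:
        keys = {
            "email": norm_email(c.get("email")),
            "phone": norm_phone(c.get("phone")),
            "name": norm_name(c.get("firstname"), c.get("lastname")),
        }
        for dim, key in keys.items():
            if key:
                buckets[dim][key].append(c["id"])
    clusters = {}
    for dim, groups in buckets.items():
        clusters[dim] = {
            key: ids
            for key, ids in groups.items()
            # Oversized name clusters are treated as noise and dropped.
            if len(ids) >= 2 and not (dim == "name" and len(ids) > NAME_CLUSTER_CAP)
        }
    return clusters
```

Clustering each dimension separately keeps the email and phone matches visible as distinct findings, so a contact that appears under both keys can be reviewed once per cluster.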
Why merging is slower than finding
The deeper friction is that resolving duplicates requires deciding which record to keep. The contact with more activity? More recent updates? More complete fields? HubSpot's merge UI lets you pick field-by-field, which is correct but slow. For 1,000 duplicates that's a quarter-long project. Most teams clean up the obvious cases — same email, exact match — and leave the fuzzier matches alone.
The other layer of friction: dedup is recurring work. Every form re-fill, every list import, every chat-to-CRM sync without idempotency creates new duplicates. A one-time cleanup is meaningless if the source isn't fixed. The community recommendation: combine periodic dedup sweeps with form-side prevention (HubSpot's "Recognize returning visitor" feature, idempotent API integrations, required-email forms).
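To make "idempotent" concrete: an integration is idempotent when replaying the same contact payload updates the existing record instead of creating a second one. The sketch below shows the idea against a plain dictionary. `upsert_contact` and the `store` dictionary are hypothetical stand-ins for illustration and are not HubSpot's API; a real integration would key its create-or-update call on the same normalized email.

```python
def upsert_contact(store, payload):
    """Create or update a contact keyed on normalized email.

    `store` is a hypothetical in-memory stand-in for the CRM:
    {normalized email: record dict}.
    """
    key = (payload.get("email") or "").strip().casefold()
    if not key:
        raise ValueError("email is required to deduplicate at the source")
    record = store.setdefault(key, {"email": key})
    # Only overwrite with non-empty values, so a sparse re-submission
    # does not blank out fields captured earlier.
    for field, value in payload.items():
        if field != "email" and value not in (None, ""):
            record[field] = value
    return record
```

Calling it twice with `Ana@Acme.com` and ` ana@acme.com ` leaves one record in the store, which is the behavior a form handler or chat sync needs in order to stop producing new duplicates.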
The manual HubSpot recipe
HubSpot has a native duplicate detection tool that catches email-match cases. For phone-only and name-only matches, you need an export-and-spreadsheet workaround.
- Open Contacts → Actions. Navigate to Contacts → Contacts. Click 'Actions' in the top right and select 'Manage duplicates'.
- Review HubSpot's email-match clusters. HubSpot surfaces pairs it considers high-confidence email matches. Click into each pair to see the side-by-side comparison.
- Merge or dismiss each pair. Pick which record to keep, then choose field-by-field which value to preserve. Or dismiss if it's a false positive (rare with email-match).
- Export contacts for phone/name dedup. The native tool doesn't catch phone-only or name-only duplicates. Export Contacts → CSV with email, phone, firstname, lastname.
- Group in a spreadsheet. Sort by phone (last 10 digits) and full name (lowercased). Any cluster of 2+ rows is a candidate for review. Cap at 5 per name cluster; beyond that it's noise.
- Merge candidates back in HubSpot. For each cluster, open one of the contact records, click 'Actions → Merge', and select the duplicate. Same field-by-field comparison as the native tool.
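If the spreadsheet grouping step gets tedious, the same grouping can be scripted against the exported CSV. The sketch below is one way to do it, assuming the export's header row uses exactly `email`, `phone`, `firstname`, and `lastname` (rename the columns first if your export labels them differently). `review_clusters` is a hypothetical helper name, and rows are identified by their email value for readability.

```python
import csv
import io
import re
from collections import defaultdict


def review_clusters(csv_text, name_cap=5):
    """Group exported rows by phone (last 10 digits) and lowercased full name."""
    by_phone, by_name = defaultdict(list), defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        digits = re.sub(r"\D", "", row.get("phone") or "")
        if len(digits) >= 10:
            by_phone[digits[-10:]].append(row["email"])
        name = f"{row.get('firstname') or ''} {row.get('lastname') or ''}".lower()
        name = " ".join(name.split())
        if name:
            by_name[name].append(row["email"])
    # Any group of 2+ rows is a review candidate; name groups above the cap are noise.
    phones = {k: v for k, v in by_phone.items() if len(v) >= 2}
    names = {k: v for k, v in by_name.items() if 2 <= len(v) <= name_cap}
    return phones, names
```

To run it on a file, pass the file's contents, for example `review_clusters(open("contacts.csv", newline="", encoding="utf-8").read())`, where `contacts.csv` is a placeholder filename. The merge itself still happens in HubSpot.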
What Bloated does instead
Three match dimensions, clustered separately, ranked by merge confidence.
Bloated normalizes contacts across email, phone, and full name independently, so a duplicate that matches on email AND phone appears in both clusters with merge confidence stacked. The compound match is the highest-priority merge. Cluster size is capped at 5 for name-only matches (so a portal with 30 'John Smith's doesn't drown the queue).
email, phone, mobilephone, firstname, lastname · HubSpot contact properties
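One simple way to picture "stacked" confidence is to count how many dimensions a pair of contacts shares and sort the review queue by that count. The sketch below illustrates that idea only. It is not Bloated's actual scoring, and `rank_pairs` plus the `clusters` shape (dimension mapped to normalized key mapped to contact ids) are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def rank_pairs(clusters):
    """Count how many dimensions each contact pair shares, highest first.

    `clusters` maps dimension -> {normalized key: [contact ids]}.
    """
    shared = Counter()
    for groups in clusters.values():
        seen = set()
        for ids in groups.values():
            for pair in combinations(sorted(set(ids)), 2):
                seen.add(pair)
        # Count each pair at most once per dimension.
        shared.update(seen)
    return sorted(shared.items(), key=lambda item: (-item[1], item[0]))
```

A pair that matches on both email and phone scores 2 and sorts ahead of pairs that match on a single dimension, which mirrors the compound-match-first ordering described above.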