CRM Deduplication - Playbooks2 of 3

CRM Deduplication — Methodology

This document covers the core concepts, frameworks, and calculations behind CRM Deduplication. It provides the methodological foundation — the 'how it works' behind the execution steps.

1) Core Concepts

What Is a Duplicate Record?

What is it?

A duplicate record is two or more entries in a CRM that represent the same real-world person, company, or entity. Duplicates range from exact copies (same email, same name) to near-matches where data entry variations create records that look different but refer to the same contact -- "Robert Smith at Acme Inc." and "Bob Smith at ACME" are the same person.

Why does it matter?

Duplicate records erode trust in CRM data across sales, marketing, and CS. Sales reps waste time on leads already being worked. Marketing inflates audience counts and sends duplicate emails. Attribution models break when the same person exists across multiple records. 91% of CRM data is estimated to be incomplete, stale, or duplicated in any given year [1]. Bad data costs the average organization $12.9 million annually according to Gartner [2].

Key insight:

Duplicates are not a data problem -- they are a revenue problem. Every duplicate contact is a potential routing error, a broken automation, and a missed attribution. The CRM becomes unreliable not because of one bad record, but because thousands of small duplications compound into systemic distrust.

Examples:

Context	Example
Exact duplicate	Two contact records with identical email `[email protected]` created by different reps
Fuzzy duplicate	"Robert Smith" at "Acme Inc" and "Bob Smith" at "ACME Incorporated" -- same person, different data entry
Cross-object duplicate	A Lead record and a Contact record for the same person, never converted
Account-level duplicate	"Acme Inc.", "Acme, Inc.", and "ACME Incorporated" -- three account records for one company

Matching Theory: How Deduplication Identifies Duplicates

What is it?

Matching is the algorithmic process of comparing records to determine whether they represent the same entity. It operates on a spectrum from exact matching (field values are identical) to fuzzy matching (field values are similar within a defined tolerance). The matching layer is the core engine of any deduplication project -- every other decision flows from how well matching performs.

Why does it matter?

Matching accuracy directly determines whether you clean up legitimate duplicates (true positives) or accidentally merge distinct records (false positives). A 2% false positive rate on a 100,000-record database means 2,000 records incorrectly merged -- potentially destroying deal history, notes, and relationships.

The Framework:

Record A fields ──┐
                   ├── Matching Algorithm ──> Confidence Score (0-100%)
Record B fields ──┘
                                                    │
                                        ┌───────────┼───────────┐
                                        │           │           │
                                    Auto-Merge   Review Queue   No Match
                                    (95-100%)    (70-94%)       (<70%)

Matching methods ranked by precision:

Method	What It Compares	Best For	Limitation
Exact match	Character-for-character identity	Email addresses, phone numbers	Misses "Bob" vs "Robert"
Levenshtein distance	Minimum edits to transform string A into string B	Typos, minor spelling variations	Slow on large datasets
Jaro-Winkler	Character transposition tolerance weighted toward string start	Name matching	Less effective on short strings
Soundex / Metaphone	Phonetic encoding of words	Name pronunciation variants ("Smith" vs "Smyth")	Language-dependent
N-gram fingerprinting	Overlapping character sequences	Company name variations	Can match unrelated strings

Common misunderstandings:

Misconception: Exact email match catches all duplicates. Reality: Email match catches roughly 40-60% of duplicates. The rest require fuzzy matching on name + company combinations. People use multiple email addresses, and records created from different sources often have different email fields populated.
Misconception: More matching rules = better deduplication. Reality: Overly broad matching rules increase false positives faster than they reduce false negatives. Start tight (exact email), validate accuracy, then widen incrementally. Salesforce limits you to five active matching rules per object for this reason [3].

Data Survivorship: Building the Golden Record

What is it?

Data survivorship (also called "master record selection") is the set of rules that determine which field values survive when two or more duplicate records are merged into one. The surviving record -- the "golden record" -- should contain the most complete, most recent, and most accurate data from all duplicate copies.

Why does it matter?

A merge that keeps the wrong data is worse than no merge at all. If the surviving record keeps an outdated phone number, loses 6 months of activity history, or drops a critical integration ID, the merge has created a new data problem while solving the old one. Picking the right master record determines the effectiveness of any deduplication campaign [4].

The Framework:

Survivorship operates at two levels:

1. Master record selection -- which record becomes the surviving container:

Selection Criterion	When to Use	Example
Most recent activity	Active sales cycles	Record with last activity 2 days ago beats record untouched for 6 months
Most complete data	Data enrichment contexts	Record with 18/20 fields populated beats record with 8/20
Oldest created date	Preserving historical timeline	Original inbound lead from 2022 beats re-import from 2024
Lifecycle stage priority	Complex funnel tracking	"Customer" record wins over "Lead" record for same person
Integration ID presence	Multi-system environments	Record with Salesforce sync ID preserved over record without

2. Field-level survivorship -- for each field, which value wins:

Strategy	Logic	Best For
Most recent value	Latest non-null value across records	Phone, title, address
Longest value	Most characters (usually most complete)	Notes, descriptions
Concatenate	Combine values from all records	Tags, notes where each record may have unique info
Source priority	Prefer value from trusted source	Integration IDs, verified emails
Most frequent	Value appearing in the most records	Company name standardization

Common misunderstandings:

Misconception: The oldest record should always be the master. Reality: The oldest record often has the most stale data. Master selection should be context-dependent -- most recent activity for active accounts, most complete data for enrichment, lifecycle stage for funnel integrity.
Misconception: Field merge behavior is the same for every field. Reality: Different fields need different survivorship rules. Email should use source priority (verified over unverified). Phone should use most recent. Notes should concatenate. A single "keep master value" rule across all fields guarantees data loss.

Duplicate Prevention vs. Duplicate Cleanup

What is it?

Deduplication has two distinct workstreams: cleanup (finding and merging existing duplicates) and prevention (stopping new duplicates from being created). Most projects focus heavily on cleanup and underinvest in prevention, leading to a cycle where duplicates return within weeks.

Why does it matter?

92% of duplicate records are created during initial data entry phases when users create new records instead of searching for existing ones [5]. Without prevention rules, even a perfectly cleaned database will accumulate new duplicates at a rate proportional to record creation volume.

Key insight:

Cleanup without prevention is a subscription to repeat work. Prevention without cleanup is governance on a dirty foundation. Both must happen, and prevention must come with cleanup -- not "someday after."

Examples:

Context	Example
Prevention via alert	Rep types "John Smith" into new Contact form, CRM shows alert: "Potential match: John Smith at Acme (existing contact)"
Prevention via validation	Import CSV requires email column; rows without email are rejected to prevent unmatchable records
Prevention via standardization	Workflow auto-lowercases email, trims whitespace, and normalizes "Inc." / "Incorporated" / "Inc" at entry
Ongoing detection	Weekly scheduled scan finds 15 new potential duplicate pairs, queues for admin review

2) Decision Frameworks

Approach Selection Matrix

Situation	Recommended Approach	Why
Database <10,000 records, Salesforce or HubSpot	Native CRM tools only	Built-in duplicate management handles exact matches; volume is manageable for manual review of edge cases
Database 10,000-100,000 records, moderate duplicate rate (<15%)	Native CRM tools + lightweight third-party (Dedupely, Koalify)	Native handles prevention; third-party adds fuzzy matching for cleanup
Database >100,000 records or duplicate rate >20%	Full third-party tool (Insycle, Cloudingo, DemandTools)	Need batch processing, advanced fuzzy matching, and automated merge rules at scale
Multi-CRM environment (Salesforce + HubSpot sync)	Full third-party tool + integration-aware merge strategy	Must account for sync behavior; merging in one system can create orphans in the other
Post-migration cleanup	Full third-party tool with rollback capability	Migrations generate high duplicate volumes; need preview and undo capabilities

Scoping Factors

1. CRM Platform

Salesforce → Native Duplicate Management (matching rules + duplicate rules, up to 5 per object [3]) + third-party for fuzzy matching
HubSpot → Native deduplication (Operations Hub Professional+) + Koalify/Insycle/Dedupely for advanced matching [6]
Other CRM → Almost always requires third-party tool

2. Database Size

<10K records → Manual review feasible for edge cases, native tools sufficient
10K-50K records → Semi-automated approach, batch sizes of 500-1000
50K-500K records → Fully automated with human review of low-confidence matches only
500K+ records → Requires high-performance tool with parallel processing, strict batching

3. Integration Complexity

Standalone CRM → Standard deduplication workflow
CRM + MAP (Salesforce + HubSpot/Marketo) → Must coordinate merge timing; pause sync during extended operations
CRM + ERP + MAP → Integration ID preservation is critical; each system's unique identifiers must survive merges

4. Duplicate Rate Severity

<10% duplicate rate → Cleanup is a tune-up; focus on prevention rules
10-25% duplicate rate → Standard deduplication project; full cleanup + prevention
>25% duplicate rate → Major data quality remediation; consider phased approach starting with highest-value records

5. Data Governance Maturity

No existing governance → Must build governance alongside cleanup
Basic governance (some naming standards) → Extend existing framework with deduplication-specific rules
Mature governance → Focus on tool configuration and process integration

Native CRM Tools Approach

Best for:

Small databases (<10K records)
Exact-match-dominant duplicate problems
Teams with Salesforce Admin or HubSpot Operations Hub access
Prevention-focused projects (alerting, blocking)

Not recommended for:

Large-scale cleanup (>10K duplicates to merge)
Fuzzy matching requirements ("Bob" vs "Robert")
Multi-object deduplication in a single pass
Environments needing preview/undo capability

Key differences from third-party:

Aspect	Native CRM	Third-Party Tool
Matching	Exact + limited fuzzy	Advanced fuzzy, phonetic, N-gram
Merge preview	Limited or none	Full preview with field-by-field comparison
Batch processing	Manual, record-by-record	Automated batches of 500-5000+
Rollback	No native undo	Most offer undo/rollback window
Cost	Included in CRM license	$30-500+/month depending on scale
Custom survivorship	Basic (keep master value)	Field-level rules (most recent, longest, concatenate)

Third-Party Tool Approach

Best for:

Databases >10K records
Duplicate rates >15%
Fuzzy matching needs (name variants, company abbreviations)
Environments requiring audit trail and rollback
Multi-object deduplication

Not recommended for:

Tiny databases where manual review is faster than tool setup
One-time micro-cleanups (<500 duplicates)

Tool selection factors:

Factor	Insycle	Dedupely	Koalify	Cloudingo	DemandTools
Primary CRM	HubSpot, Salesforce	HubSpot, Salesforce, Pipedrive	HubSpot only	Salesforce only	Salesforce only
Fuzzy matching	Advanced (any field, custom rules)	Basic-moderate	Moderate (HubSpot-native)	Advanced	Advanced
Pricing entry	~$30/mo (30K records)	Free tier available	~50% cheaper than competitors for large portals	~$15/user/mo	~$20/user/mo
Rollback	Yes	Limited	Yes	Yes	Yes
Best fit	Complex rules, multi-object	Simple dedup, budget-conscious	HubSpot-native automation	Salesforce power users	Salesforce admins with complex needs

3) Benchmarks & Standards

How to Use Benchmarks

Benchmarks are guidelines, not rules. Always:

Start with benchmark as baseline
Adjust based on client-specific data
Validate against their actual numbers when available
Document deviations and rationale

Duplicate Rate Benchmarks

Metric	Low	Typical	High	Notes
Contact duplicate rate	5-10%	15-25%	30-50%	Uncleaned databases average 20-30% [7]
Lead duplicate rate	10-15%	20-30%	40-60%	Higher than contacts due to form submissions and imports
Account duplicate rate	3-5%	8-15%	20-30%	Lower volume but higher impact per duplicate
Post-cleanup target rate	<1%	1-3%	3-5%	1% is the achievable industry standard; 22% of organizations meet it [1]
New duplicate creation rate	<1%/month	2-5%/month	>5%/month	With prevention rules active, target <1%

Source: Landbase Duplicate Record Rate Statistics 2026 [1]

Interpretation:

Below low: Either very clean data governance already exists, or the audit methodology is too narrow (only checking exact email matches)
Above high: Indicates systemic data entry problems -- multiple uncontrolled entry points, no validation rules, extended imports without dedup checks

Matching Accuracy Benchmarks

Metric	Good	Warning	Red Flag
False positive rate (records flagged as duplicates that are not)	<2%	2-5%	>5%
False negative rate (true duplicates missed by matching rules)	<10%	10-20%	>20%
Match confidence threshold for auto-merge	95-100%	85-94%	<85%
Match confidence threshold for review queue	70-94%	60-69%	<60%

Source: Insycle deduplication best practices [4], Data Ladder fuzzy matching guide [8]

Interpretation:

False positive rate >5%: Matching rules are too loose. Tighten fuzzy thresholds, add secondary confirmation fields (e.g., require email OR company+name match, not just name alone)
False negative rate >20%: Matching rules are too tight. Consider adding fuzzy matching, phonetic matching, or lowering similarity thresholds

Data Decay Benchmarks

Metric	Rate	Impact
B2B contact data decay	~70% per year [9]	Even clean databases accumulate stale records that generate new duplicates when re-entered from fresh sources
Job title change frequency	~30% per year	Title changes on existing records create matching confusion
Company name/domain changes	~5-10% per year	Mergers, acquisitions, and rebrands create account-level duplicates
Email address changes	~20-30% per year	People change jobs, companies change domains

Source: Various B2B data quality reports [9]

Quick Reference Thresholds

Question	Good	Warning	Red Flag
What duplicate rate is acceptable post-cleanup?	<3%	3-5%	>5%
How many records should I review manually in test batch?	50+ pairs	25-50	<25
How long before duplicates return without prevention?	6+ months	2-6 months	<2 months
What batch size for extended merges?	500-1000	1000-3000	>3000
How often should recurring scans run?	Weekly	Monthly	Quarterly or never

4) Calculations & Scoring

Formula Quick Reference

Calculation	Formula	Example
Duplicate rate	`(duplicate records / total records) x 100`	5,000 duplicates / 25,000 total = 20%
Records after merge	`total records - (duplicate pairs x merge ratio)`	25,000 - (2,500 x 1) = 22,500
False positive rate	`(false matches / total flagged matches) x 100`	8 false / 200 flagged = 4%
Estimated time saved	`(duplicates resolved x minutes per manual resolution) / 60`	2,500 x 15 min / 60 = 625 hours
Cost of duplicates	`duplicate records x cost per bad record`	5,000 x $10 = $50,000

Duplicate Rate Calculation

Formula:

Duplicate Rate = (Number of Duplicate Records / Total Records in Object) x 100

Variables explained:

Number of Duplicate Records = Count of records identified as duplicates by matching rules (not the number of duplicate pairs -- count each extra record beyond the first)
Total Records in Object = Total contacts, leads, or accounts in the CRM object

Worked Example:

Scenario: Mid-market SaaS company, 30,000 contacts in Salesforce

Given:
- Total contacts: 30,000
- Duplicate scan identifies 3,200 duplicate pairs (6,400 records involved)
- Each pair merges 2 → 1, so 3,200 records to be removed

Calculate:
- Duplicate rate: (6,400 / 30,000) x 100 = 21.3%
- Records after merge: 30,000 - 3,200 = 26,800
- Reduction: 10.7% fewer records

Validation:

A 20-25% duplicate rate is typical for an uncleaned B2B database [7]
If the calculated rate is below 5%, the matching rules may be too strict
If above 40%, verify the matching rules are not producing excessive false positives

ROI Estimation

Formula:

Annual ROI = (Time Savings + Revenue Recovery + Tool Cost Savings) - Project Cost

Variables explained:

Time Savings = Hours saved per rep per week x hourly cost x number of reps x 52 weeks. Sales reps lose approximately 550 hours annually to inaccurate CRM data [10]
Revenue Recovery = Percentage of revenue lost to bad data x annual revenue. Companies lose an estimated 10-12% of revenue to poor CRM data quality [11]
Tool Cost Savings = Reduction in CRM license costs if deduplication reduces record count below a pricing tier
Project Cost = Consulting fees + tool licensing + internal time commitment

Worked Example:

Scenario: $15M ARR B2B SaaS, 20 sales reps, 50,000 CRM records, 22% duplicate rate

Given:
- Reps save 1 hour/week on duplicate-related confusion
- Fully loaded rep cost: $75/hour
- Conservative revenue recovery: 1% (not the full 10-12%, just direct duplicate impact)
- Dedup tool: $200/month
- Project cost: $15,000

Calculate:
- Time savings: 1 hr x $75 x 20 reps x 52 weeks = $78,000/year
- Revenue recovery: 1% x $15,000,000 = $150,000/year
- Tool cost: $2,400/year
- Net annual ROI: ($78,000 + $150,000 - $2,400) - $15,000 = $210,600

Validation:

If ROI exceeds $500K for a company under $10M ARR, the revenue recovery percentage is likely too aggressive
Time savings alone should justify the project for most mid-market companies

Match Confidence Scoring

Scoring Rubric:

Criterion	Points	Threshold
Exact email match	50 pts	Identical email address (case-insensitive)
Fuzzy name match (>90% Jaro-Winkler)	20 pts	First + last name within 90% similarity
Company name match (>85% similarity)	15 pts	After normalization (trim, lowercase, remove Inc/LLC)
Phone number match	10 pts	After standardization (remove spaces, dashes, country code)
Domain match	5 pts	Email domain or website domain matches
Total	100 pts

Tier Thresholds:

Auto-merge: 95+ points (exact email + at least one confirming field)
Review queue: 70-94 points (strong fuzzy match, needs human verification)
No action: Below 70 points (insufficient evidence of duplication)

Note: These weights are a starting point. Adjust based on client's data quality and matching patterns.

5) Edge Cases & Deep Dives

Edge Case 1: Free Email Domains (Gmail, Yahoo, Hotmail)

Scenario:

Two contacts share the email address format [email protected], or multiple contacts at different companies all use @gmail.com domains. Matching on email alone would incorrectly flag different people as duplicates, or conversely, a person using a personal Gmail across multiple form fills would not match their corporate email record.

Challenge:

Free email domains cannot be used as company-level grouping signals. "[email protected]" at Company A is a different context than "[email protected]" at Company B. But they may also be the same person who submitted two forms at different times.

Approach:

Exclude free email domains from domain-based matching rules
For records with free emails, require name + company match as secondary criteria
Never auto-merge records where both have free email domains -- route to review queue
Flag free email contacts for data enrichment (use ZoomInfo or Apollo to find corporate email)

Fallback assumptions:

Missing Data	Use This Instead	Source
Corporate email	Free email + company name + full name match at >90%	Requires 3-field match to compensate for missing anchor field
Company name on free-email record	IP-based enrichment or form field "Company"	Many form fills with free emails have company populated separately

Key validation:

Check the false positive rate specifically for free-email matches. It should be reviewed separately from the overall false positive rate, as it will be higher.

Edge Case 2: HubSpot-Salesforce Integration Active During Merge

Scenario:

Client runs both HubSpot (marketing) and Salesforce (sales) with bidirectional sync enabled. Merging contacts in HubSpot while the integration is active can create orphaned records in Salesforce, break association mappings, or trigger sync conflicts that re-create deleted records.

Challenge:

HubSpot cannot merge company records when the Salesforce integration is active [7]. Contact merges may propagate unpredictably depending on sync field mappings and conflict resolution settings. A merge in one system does not automatically merge the corresponding record in the other.

Approach:

Document the full sync field mapping before starting (which fields sync, which direction, conflict resolution)
For company/account merges: pause the integration, merge in the primary system, then re-enable and verify
For contact merges: test 5-10 merges with sync active and verify behavior in both systems
Merge in the system of record first (usually Salesforce for sales data, HubSpot for marketing data)
After extended merge in primary system, run duplicate scan in secondary system to catch orphans

Fallback assumptions:

Missing Data	Use This Instead	Source
Sync field mapping documentation	Export HubSpot Settings > Integrations > Salesforce field mappings	Available in HubSpot UI
Conflict resolution settings	Default is "most recent update wins"	Verify in HubSpot sync settings before proceeding

Key validation:

After merge, compare record counts in both systems. Salesforce contact count and HubSpot contact count should be within 5% of each other (exact match is unlikely due to different counting logic).

Edge Case 3: Bulk Import Creates Mass Duplicates

Scenario:

A list import (from event, purchased list, or partner) creates hundreds or thousands of duplicates in a single batch because the import was not deduplicated against existing records before upload.

Challenge:

The duplicates from a single import often have identical data quality (same source, same timestamp), making master record selection ambiguous. If the import included new data not in existing records (e.g., event attendance, specific campaign response), you need to preserve that data during merge.

Approach:

Isolate import-created records using created date and source field
Run deduplication scan scoped to "import records vs. existing records" (not import records vs. each other)
Set survivorship rule: existing record is always master (it has historical activity)
Set field-level rule: for fields populated in import but empty in existing record, take import value
After merge, verify campaign membership and list associations transferred correctly

Fallback assumptions:

Missing Data	Use This Instead	Source
Source field not populated on import	Use created date range to identify import batch	Filter for records created within the import time window
No campaign association on import records	Manually associate surviving records with original campaign	Preserves attribution for the event/list

Key validation:

Count records in the import campaign before and after merge. Campaign member count should remain the same (just pointing to surviving records instead of duplicates).

Edge Case 4: Duplicate Accounts with Different Ownership

Scenario:

Two account records exist for the same company, each owned by a different sales rep, each with different opportunities and activity history. Merging would reassign all opportunities to one rep.

Challenge:

This is not just a data problem -- it is a territory and compensation problem. Merging accounts without sales leadership alignment can cause rep conflicts and trust issues.

Approach:

Flag multi-owner account duplicates separately from standard duplicates
Route to sales leadership for ownership decision before merge
Document the merge's impact: which opportunities move, which rep loses account ownership
Consider timing: merge after quarter close if active deals exist on both accounts
Update territory assignments and routing rules post-merge to prevent re-creation

Fallback assumptions:

Missing Data	Use This Instead	Source
Territory rules not documented	Ask sales leadership which rep "should" own the account based on current rules	This is a business decision, not a data decision
Active opportunity impact unclear	Run report: opportunities by account for both duplicate accounts, show to sales manager	Let stakeholders see the full picture before deciding

Key validation:

After merge, verify all opportunities are associated with surviving account and correct owner. Run pipeline report pre- and post-merge to confirm no revenue dropped.

Edge Case 5: Records with Conflicting Data Across Fields

Scenario:

Two duplicate records have conflicting information: Record A has phone number X and title "VP Sales", Record B has phone number Y and title "Director of Sales". Both could be correct at different points in time, or one could be wrong.

Challenge:

Standard survivorship rules (most recent, longest value) may not pick the right answer. A title change from "Director" to "VP" is progression; "VP" to "Director" would be unusual. Phone numbers may both be valid (office vs. mobile).

Approach:

For title fields: prefer the value from the most recently active record (last activity date, not last modified)
For phone fields: if tool supports it, map to separate fields (Phone to Phone, other phone to Mobile) rather than overwriting
For address fields: prefer the record with verified/enriched data source
When neither value is clearly better: concatenate into a "needs review" note field and flag for manual cleanup

Key validation:

Spot-check 20-30 records where field conflicts existed. Verify the surviving value makes business sense. If more than 10% of spot-checked records have wrong surviving values, revisit survivorship rules.

References

[1] Landbase - Duplicate Record Rate Statistics: 32 Key Facts [2] Gartner - Data Quality Market Survey (via Plauti) [3] Salesforce - Things to Know About Duplicate Rules [4] Insycle - CRM Deduplication: Why Picking the Right Master Record is Critical [5] Landbase - 92% of Duplicates Created During Registration [6] Hubsessed - The HubSpot Deduplication Tool Comparison Guide [7] Insycle - HubSpot Deduplication & Integration Tools [8] Data Ladder - Fuzzy Matching 101: The Complete Guide [9] FindStack - CRM Statistics [10] Validity - How Poor Data Quality is Sabotaging Your Business [11] Grazitti Interactive - Bad Data Can Cost Over 12% of Revenue

1) Core Concepts​

What Is a Duplicate Record?​

Matching Theory: How Deduplication Identifies Duplicates​

Data Survivorship: Building the Golden Record​

Duplicate Prevention vs. Duplicate Cleanup​

2) Decision Frameworks​

Approach Selection Matrix​

Scoping Factors​

Native CRM Tools Approach​

Third-Party Tool Approach​

3) Benchmarks & Standards​

How to Use Benchmarks​

Duplicate Rate Benchmarks​

Matching Accuracy Benchmarks​

Data Decay Benchmarks​

Quick Reference Thresholds​

4) Calculations & Scoring​

Formula Quick Reference​

Duplicate Rate Calculation​

ROI Estimation​

Match Confidence Scoring​

5) Edge Cases & Deep Dives​

Edge Case 1: Free Email Domains (Gmail, Yahoo, Hotmail)​

Edge Case 2: HubSpot-Salesforce Integration Active During Merge​

Edge Case 3: Bulk Import Creates Mass Duplicates​

Edge Case 4: Duplicate Accounts with Different Ownership​

Edge Case 5: Records with Conflicting Data Across Fields​

References​

1) Core Concepts

What Is a Duplicate Record?

Matching Theory: How Deduplication Identifies Duplicates

Data Survivorship: Building the Golden Record

Duplicate Prevention vs. Duplicate Cleanup

2) Decision Frameworks

Approach Selection Matrix

Scoping Factors

Native CRM Tools Approach

Third-Party Tool Approach

3) Benchmarks & Standards

How to Use Benchmarks

Duplicate Rate Benchmarks

Matching Accuracy Benchmarks

Data Decay Benchmarks

Quick Reference Thresholds

4) Calculations & Scoring

Formula Quick Reference

Duplicate Rate Calculation

ROI Estimation

Match Confidence Scoring

5) Edge Cases & Deep Dives

Edge Case 1: Free Email Domains (Gmail, Yahoo, Hotmail)

Edge Case 2: HubSpot-Salesforce Integration Active During Merge

Edge Case 3: Bulk Import Creates Mass Duplicates

Edge Case 4: Duplicate Accounts with Different Ownership

Edge Case 5: Records with Conflicting Data Across Fields

References