Identifies mirror sites by analyzing link-graph connectivity patterns and IP-address co-location. A foundational mirror-host detection method that keeps mirrored copies from inflating the index and skewing rankings.
Patent Overview
- Inventor: Jeffrey Dean, others
- Assignee: Google LLC
- Filed: 1999
- Granted: 2002-11-26
The Challenge
Mirror hosts serve the same content from different domains. If treated as independent, mirrors inflate the index, dilute link signals, and let one entity occupy multiple result slots. Detection requires reading both link-graph structure and host metadata.
- Mirror Hosts Inflate Index — If each mirror appears independently, the index stores the same content many times. Storage and ranking both suffer.
- Link Signals Get Diluted — Links pointing at one mirror don't reinforce another. Mirror unification consolidates link signals correctly.
- Result Slots Get Captured — Without mirror detection, one entity captures multiple SERP slots through mirrored hosts. Result diversity suffers.
- Mirror Detection Must Scale — Web-scale mirror detection requires efficient algorithms. Comparing every host against every other host grows quadratically with the number of hosts, which is too slow at web scale.
- Pattern Variation Resists Detection — Mirrors vary by branding, ads, and minor content. Detection must focus on connectivity and IP signals, not byte-identical content.
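One common way to avoid all-pairs comparison is to never generate most pairs in the first place: build an inverted index from each linking source to the hosts it points at, and count only pairs that actually share a source. The sketch below assumes a simple `(source_host, target_host)` edge list. `candidate_pairs`, `min_shared`, and `max_fanout` are hypothetical names, and skipping high-fan-out hub pages is an assumed heuristic, not something this description of the patent specifies.

```python
from collections import defaultdict
from itertools import combinations


def candidate_pairs(links, min_shared=2, max_fanout=1000):
    """Return {(host_a, host_b): shared_source_count} for pairs sharing >= min_shared sources.

    links: iterable of (source_host, target_host) edges (hypothetical input format).
    """
    # Inverted index: each linking source -> the set of hosts it points to.
    targets_by_source = defaultdict(set)
    for src, dst in links:
        if src != dst:  # self-links say nothing about other hosts
            targets_by_source[src].add(dst)

    shared = defaultdict(int)
    for targets in targets_by_source.values():
        # Hub pages linking to huge numbers of hosts would emit a quadratic number
        # of pairs while carrying little evidence, so skip them (assumed heuristic).
        if len(targets) > max_fanout:
            continue
        # Sorting makes (a, b) and (b, a) collapse into one key.
        for a, b in combinations(sorted(targets), 2):
            shared[(a, b)] += 1

    return {pair: n for pair, n in shared.items() if n >= min_shared}


edges = [
    ("s1", "a.com"), ("s1", "b.com"),
    ("s2", "a.com"), ("s2", "b.com"),
    ("s3", "a.com"), ("s3", "c.com"),
]
print(candidate_pairs(edges))  # {('a.com', 'b.com'): 2}
```

Only `a.com` and `b.com` survive because they are the only pair co-cited by two distinct sources; the work done is proportional to co-citations, not to the square of the host count.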
Innovation
How The System Works
The system analyzes link-graph connectivity patterns between hosts, checks IP-address co-location, computes mirror-likelihood scores, clusters mirror hosts, selects a canonical host per cluster, and applies consolidation to ranking and index.
- Enumerate Host Pairs — From link graph, identify candidate host pairs that share many inbound link sources.
- Compare Connectivity Patterns — Per pair, compare inbound and outbound link patterns. Highly similar patterns suggest mirroring.
- Check IP-Address Co-Location — Hosts at the same or related IP addresses signal probable mirroring.
- Compute Mirror Likelihood — Combine connectivity similarity, IP signals, and content similarity into a mirror-likelihood score.
- Cluster Mirrors — Pairs above threshold grouped into mirror clusters. Transitive clustering handles multi-host mirrors.
- Select Canonical Host — Per cluster, select canonical by authority, age, or other criteria.
- Apply In Ranking And Index — Canonical host retained; non-canonicals consolidated. Link signals merged. Result diversity preserved.
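The connectivity comparison in the second step could be sketched with Jaccard similarity over each host's inbound sources and outbound links. One assumption to flag: internal links are reduced to their path, so two mirrors that link to their own copies of `/docs/x` still match even though the hostnames differ. Jaccard similarity and the equal-weight average are illustrative choices, and all function names here are hypothetical.

```python
from urllib.parse import urlsplit


def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def normalize_outlinks(host, urls):
    """Reduce internal links to their path so mirrors on different hosts compare equal.

    "http://a.com/docs/x" on a.com and "http://b.com/docs/x" on b.com both become
    "/docs/x"; external links keep the full URL.
    """
    keys = set()
    for url in urls:
        parts = urlsplit(url)
        if parts.netloc == host:
            keys.add(parts.path or "/")
        else:
            keys.add(url)
    return keys


def connectivity_similarity(host_a, host_b, inlinks, outlinks):
    """Mean of inbound and outbound Jaccard similarity for one candidate pair.

    inlinks: host -> set of linking source hosts
    outlinks: host -> list of outbound URLs
    """
    in_sim = jaccard(inlinks.get(host_a, set()), inlinks.get(host_b, set()))
    out_sim = jaccard(
        normalize_outlinks(host_a, outlinks.get(host_a, [])),
        normalize_outlinks(host_b, outlinks.get(host_b, [])),
    )
    return (in_sim + out_sim) / 2


inlinks = {"a.com": {"s1", "s2", "s3"}, "b.com": {"s1", "s2"}}
outlinks = {
    "a.com": ["http://a.com/docs/x", "http://ext.org/"],
    "b.com": ["http://b.com/docs/x", "http://ext.org/"],
}
print(round(connectivity_similarity("a.com", "b.com", inlinks, outlinks), 3))  # 0.833
```

Here the inbound overlap is 2/3 and the normalized outbound sets are identical, so the pair scores (2/3 + 1) / 2.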
Connectivity Plus IP
The patent's load-bearing idea is that mirror detection requires reading both link-graph structure and IP-address metadata. Combined signals catch mirrors that single signals would miss.
Multi-Signal Mirror Detection
Connectivity alone can flag legitimate co-citation as mirroring; IP alone can miss mirrors across providers. Combining signals yields robust detection.
- Link-Graph Connectivity — Inbound and outbound link pattern similarity flags candidate mirrors. Highly correlated patterns are strong signal.
- IP Co-Location — Hosts at same or related IP addresses signal mirroring. Hosting-pattern signal.
- Cluster And Canonicalize — Mirror clusters identified; canonical host selected. Index and ranking consolidate.
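IP co-location can be scored in tiers. This sketch assumes IPv4 addresses, treats an exact shared address as strong evidence and a shared /24 block as weaker evidence, and uses only Python's standard `ipaddress` module. The 1.0/0.5 scores, the /24 prefix, and the function name are hypothetical, not values from the patent.

```python
import ipaddress


def ip_colocation_score(ips_a, ips_b, prefix_len=24):
    """Return 1.0 for a shared IPv4 address, 0.5 for a shared /prefix_len network, else 0.0."""
    addrs_a = {ipaddress.IPv4Address(ip) for ip in ips_a}
    addrs_b = {ipaddress.IPv4Address(ip) for ip in ips_b}
    if addrs_a & addrs_b:
        return 1.0  # same machine or same virtual-hosting front end

    def networks(addrs):
        # strict=False builds the enclosing network from a host address.
        return {ipaddress.IPv4Network(f"{a}/{prefix_len}", strict=False) for a in addrs}

    if networks(addrs_a) & networks(addrs_b):
        return 0.5  # nearby addresses, e.g. the same hosting block
    return 0.0


print(ip_colocation_score(["192.0.2.10"], ["192.0.2.99"]))  # 0.5
```

A tiered score like this only raises or lowers suspicion; as the list above notes, it is meant to be combined with connectivity rather than used alone, since unrelated sites often share hosting blocks.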
Technical Foundation
The patent specifies the host-pair enumerator, connectivity comparator, IP analyzer, mirror-likelihood scorer, cluster builder, canonical selector, and consolidation engine.
- Host-Pair Enumerator — From link graph, identifies candidate host pairs with significant shared inbound link sources.
- Connectivity Comparator — Per pair, compares inbound and outbound link patterns. Outputs similarity score.
- IP Analyzer — Checks IP-address relatedness. Same or co-located IPs signal mirroring.
- Mirror-Likelihood Scorer — Combines connectivity, IP, and content similarity into per-pair mirror likelihood.
- Cluster Builder — Pairs above threshold grouped into mirror clusters. Transitive grouping applied.
- Canonical Selector — Per cluster, selects canonical host by authority, age, or other criteria.
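The Mirror-Likelihood Scorer is described only as combining connectivity, IP, and content similarity. A logistic combination is one plausible way to do that. The `PairSignals` type, the weights, and the bias below are invented for illustration; in practice they would be fit to labeled host pairs.

```python
import math
from dataclasses import dataclass


@dataclass
class PairSignals:
    connectivity: float  # 0..1, e.g. from a link-pattern comparison
    ip: float            # 0..1, e.g. from an IP co-location check
    content: float       # 0..1, e.g. from sampled-page similarity


# Hypothetical weights and bias; a real system would learn these from labeled pairs.
WEIGHTS = {"connectivity": 4.0, "ip": 2.0, "content": 3.0}
BIAS = -5.0


def mirror_likelihood(s: PairSignals) -> float:
    """Logistic combination of the three signals into a 0..1 mirror likelihood."""
    z = (BIAS
         + WEIGHTS["connectivity"] * s.connectivity
         + WEIGHTS["ip"] * s.ip
         + WEIGHTS["content"] * s.content)
    return 1.0 / (1.0 + math.exp(-z))


print(round(mirror_likelihood(PairSignals(1.0, 1.0, 1.0)), 3))  # 0.982
print(round(mirror_likelihood(PairSignals(0.9, 0.0, 0.2)), 3))  # 0.31
```

With these weights, strong connectivity alone (the second call) stays well below 0.5, which matches the document's point that one signal on its own should not decide the outcome.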
The Process
Mirror detection runs as a batch process over the link graph. Consolidation propagates to index and ranking.
- Enumerate Candidates — Host pairs with shared link sources identified.
- Compare Connectivity — Per pair, link-pattern similarity computed.
- Check IP Signals — Per pair, IP-address relatedness checked.
- Score Mirror Likelihood — Combined signal produces per-pair mirror likelihood.
- Cluster — Pairs above threshold clustered. Transitive grouping applied.
- Select Canonical — Canonical host per cluster selected.
- Apply Consolidation — Index reflects canonical hosts. Link signals merged across cluster.
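Transitive grouping is a natural fit for a union-find (disjoint-set) structure: every above-threshold pair merges two sets, so hosts connected through a chain of mirror pairs land in one cluster. This is a sketch; the `cluster_mirrors` name, the 0.5 threshold, and the input format are assumptions.

```python
from collections import defaultdict


def cluster_mirrors(scored_pairs, threshold=0.5):
    """Group hosts into mirror clusters via union-find over above-threshold pairs.

    scored_pairs: {(host_a, host_b): mirror_likelihood}
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for (a, b), score in scored_pairs.items():
        if score >= threshold:
            union(a, b)  # a~b and b~c put a, b, c in one cluster (transitive)

    clusters = defaultdict(set)
    for host in list(parent):
        clusters[find(host)].add(host)
    return list(clusters.values())


pairs = {("a", "b"): 0.9, ("b", "c"): 0.8, ("d", "e"): 0.2, ("f", "g"): 0.7}
print(sorted(sorted(c) for c in cluster_mirrors(pairs)))  # [['a', 'b', 'c'], ['f', 'g']]
```

Transitivity cuts both ways: one bad pair can chain two unrelated clusters together, which is one reason the per-pair threshold matters.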
Quality Control
Mirror detection must avoid false positives that consolidate legitimate-but-similar sites. The patent specifies safeguards.
- Multi-Signal Convergence — Connectivity, IP, and content similarity must converge to flag mirroring. Single-signal flags rejected to reduce false positives.
- Per-Signal Thresholds — Each signal carries a calibrated threshold. Sub-threshold signals don't contribute.
- Canonical Selection Criteria — Canonical selection weighs multiple criteria, because choosing the wrong canonical hurts the consolidated entity.
- Adversarial Robustness — Sites that mirror to evade detection may vary surface signals. Detection adapts to known evasion patterns.
- Continuous Recalibration — Per-signal weights and clustering thresholds recalibrate against fresh labeled data.
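The Multi-Signal Convergence and Per-Signal Thresholds safeguards can be expressed as a gate: each signal counts only if it clears its own threshold, and a pair is flagged only when at least two signals agree, so a single-signal flag is rejected. The threshold values, the function names, and the `min_agreeing=2` default are hypothetical; the source says the thresholds are calibrated but not what they are.

```python
# Hypothetical per-signal thresholds; recalibration would update these from labeled data.
THRESHOLDS = {"connectivity": 0.7, "ip": 0.5, "content": 0.8}


def passing_signals(signals, thresholds=THRESHOLDS):
    """Names of signals at or above their own threshold; weaker signals contribute nothing."""
    return {name for name, value in signals.items() if value >= thresholds[name]}


def flag_as_mirror(signals, thresholds=THRESHOLDS, min_agreeing=2):
    """Declare a mirror only when at least `min_agreeing` independent signals pass."""
    return len(passing_signals(signals, thresholds)) >= min_agreeing


print(flag_as_mirror({"connectivity": 0.9, "ip": 0.0, "content": 0.85}))  # True
print(flag_as_mirror({"connectivity": 0.95, "ip": 0.0, "content": 0.1}))  # False
```

Raising `min_agreeing` to 3 would require full convergence of connectivity, IP, and content, trading recall for fewer false positives.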
Real-World Application
Mirror-host detection of this kind underpins index consolidation in modern search engines. The connectivity-plus-IP pattern serves as a structural template for mirror identification.
- Primary Signal (connectivity-based) — Link-graph pattern similarity drives mirror candidate identification.
- Confirming Signal (IP-based) — IP-address co-location confirms mirror suspicion. Requiring multiple signals reduces false positives.
- Outcome (canonical-aware consolidation) — Canonical host retained; non-canonicals consolidated. Link signals merged correctly.
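Canonical selection and link-signal merging might look like the following. The sketch assumes each host carries an authority score and a first-seen year as a proxy for age (both hypothetical fields on a hypothetical `HostInfo` type), picks the highest-authority host with age as the tie-breaker, and merges in-link sources by set union so a source linking to several mirrors is counted once.

```python
from dataclasses import dataclass, field


@dataclass
class HostInfo:
    name: str
    authority: float      # hypothetical authority score
    first_seen_year: int  # stand-in for host age
    inlink_sources: set = field(default_factory=set)


def select_canonical(cluster):
    """Highest authority wins; ties go to the older host, then to the name for determinism."""
    return min(cluster, key=lambda h: (-h.authority, h.first_seen_year, h.name))


def consolidate(cluster):
    """Return (canonical name, merged in-link sources, non-canonical -> canonical map)."""
    canonical = select_canonical(cluster)
    # Set union: a source linking to two mirrors counts once after merging.
    merged = set().union(*(h.inlink_sources for h in cluster))
    alias_of = {h.name: canonical.name for h in cluster if h.name != canonical.name}
    return canonical.name, merged, alias_of


cluster = [
    HostInfo("a.com", 0.8, 1998, {"s1", "s2"}),
    HostInfo("b.com", 0.8, 1997, {"s2", "s3"}),
    HostInfo("c.com", 0.5, 1995, {"s4"}),
]
name, merged, alias_of = consolidate(cluster)
print(name, sorted(merged), alias_of)
# b.com ['s1', 's2', 's3', 's4'] {'a.com': 'b.com', 'c.com': 'b.com'}
```

Here `a.com` and `b.com` tie on authority, so the older `b.com` becomes canonical and inherits every in-link source from the cluster.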
Why One Canonical Domain Wins
Mirror consolidation selects one canonical per cluster. Operating from one canonical domain consolidates all link and content signal there, rather than distributing across mirrors that the system will collapse anyway.
Why Hosting Infrastructure Matters
IP co-location is a mirror-detection signal. Hosting genuinely independent sites on shared infrastructure can trigger false-mirror flags if connectivity patterns also align. Independent hosting reduces this risk.
What This Means for SEO
This patent detects mirror hosts by combining link-graph connectivity similarity with IP co-location, then consolidates mirrors to one canonical and merges their link signals. SEO implication: serving the same content across multiple domains fragments your signals, and the system collapses those mirrors anyway, so consolidate to one canonical domain.
- One Canonical Domain Wins — Mirror clusters consolidate to a single canonical host with merged link signals. Operating from one domain concentrates all your authority instead of spreading it across mirrors the system will unify.
- Mirrors Fragment Link Authority — Links to one mirror do not reinforce another until consolidation merges them. Running parallel domains dilutes the link signal you could have concentrated on one strong host.
- Connectivity Plus IP Catches Mirrors — Detection combines link-pattern similarity with IP co-location, so mirrors that vary branding and ads are still caught. Surface-level differences between mirrored domains do not hide them.
- Multiple Slots Will Not Stick — The system specifically prevents one entity from capturing multiple SERP slots via mirrors. Spinning up duplicate domains to dominate a results page does not work.
- Independent Hosting Avoids False Flags — IP co-location is a signal, so genuinely independent sites that share infrastructure and also have correlated link patterns risk a false mirror flag. Host distinct properties independently to reduce that risk.
- Multi-Signal Convergence Limits False Positives — Connectivity, IP, and content similarity must converge before mirroring is declared. Legitimate similar sites are usually safe, but the more signals align, the more likely consolidation.
- Canonical Selection Is Out Of Your Hands — The system picks the canonical for each cluster by authority, age, or other criteria. Rather than gambling on which mirror wins, make one domain unambiguously the authoritative source.