Identifies pages topically related to a given page by analyzing link-graph co-citation, anchor-text patterns, and topical content alignment. It is a foundational related-pages discovery method that powers 'related results' features and topical-cluster identification.
Patent Overview
- Inventor
- Jeffrey Dean, others
- Assignee
- Google LLC
- Filed
- 2000
- Granted
- 2003-12-16
The Challenge
Given a page, the system must surface other pages topically related to it. Pure link analysis misses topical context; pure content analysis misses authority signals. Combining link, anchor, and content signals yields robust related-page discovery.
- Link Co-Citation Reveals Topical Proximity — Pages frequently linked together by third parties tend to be topically related. Co-citation is a primary signal.
- Anchor Text Reveals Vocabulary Sharing — Pages linked with similar anchor-text vocabulary share topical context.
- Content Alignment Confirms Topical Match — Topical-model alignment between the target page and candidates confirms or rejects topical proximity.
- Authority Differences Matter — An authoritative page on a topic relates differently to other authoritative pages than to low-quality pages, so quality must factor into relatedness.
- Discovery Must Scale — Related pages must be found quickly for any given page across billions of candidates, so efficient candidate generation is a structural requirement.
Innovation
How The System Works
The system extracts per-target signals (co-citation, shared anchor vocabulary, topical model), generates related-page candidates from the link graph, scores candidates by multi-signal alignment, ranks them, and returns top related pages.
- Identify Target Page — The target page is identified from the query or the current page context.
- Extract Co-Citation Signals — From link graph, identify pages frequently linked alongside the target by third parties.
- Extract Anchor-Vocabulary Patterns — Anchors pointing at the target reveal vocabulary; pages receiving similar anchors are candidates.
- Compute Topical Models — Per target and per candidate, compute topical models from content. Model alignment quantifies topical proximity.
- Score Candidates — Per candidate, combine co-citation, anchor-vocabulary, and topical-model alignment into relatedness score.
- Apply Quality Filter — Per-candidate quality gate. Low-quality candidates filtered or down-weighted.
- Rank And Return — Top-N candidates by relatedness score returned. Supports related-results, topic-cluster identification, and discovery features.
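The co-citation step above can be sketched with an inverted link index. This is a minimal illustration that assumes co-citation is measured as the number of distinct third-party source pages linking to both the target and a candidate; the patent summary does not give the exact weighting, and `LINKS`, `build_inlinks`, `co_citation_counts`, and `ranked_co_citations` are hypothetical names.

```python
from collections import Counter, defaultdict

# Hypothetical toy link graph: source page -> pages it links out to.
LINKS = {
    "blog-a": ["target", "cand-1", "cand-2"],
    "blog-b": ["target", "cand-1"],
    "dir-c": ["target", "cand-3"],
    "news-d": ["cand-2", "cand-3"],  # never cites the target, so contributes nothing
}

def build_inlinks(links):
    """Invert the link graph: page -> set of pages that link to it."""
    inlinks = defaultdict(set)
    for source, targets in links.items():
        for t in targets:
            inlinks[t].add(source)
    return inlinks

def co_citation_counts(target, links, inlinks):
    """For each candidate, count third-party sources that cite it alongside the target."""
    counts = Counter()
    for source in inlinks.get(target, ()):
        # Each source citing the target votes once for every other page it cites.
        for cand in set(links[source]):
            if cand != target:
                counts[cand] += 1
    return counts

def ranked_co_citations(target, links):
    """Candidates sorted by co-citation count (descending), ties broken by page id."""
    counts = co_citation_counts(target, links, build_inlinks(links))
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

print(ranked_co_citations("target", LINKS))
# → [('cand-1', 2), ('cand-2', 1), ('cand-3', 1)]
```

Because the lookup starts from the target's inbound links, it only touches pages that actually cite the target, which is one way candidate generation can stay cheap at scale.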
Multi-Signal Relatedness
The patent's load-bearing idea is that relatedness requires multiple aligned signals. Co-citation, anchor vocabulary, and topical content alignment combine into a relatedness score that single signals cannot match.
Convergent Signals Beat Single Signals
Two pages co-cited but topically divergent aren't related; two pages topically aligned but unconnected aren't related either. Convergent signals across link, anchor, and content yield robust relatedness.
- Co-Citation Analysis — Third-party co-citation reveals topical proximity. Pages frequently linked together are likely related.
- Anchor-Vocabulary Sharing — Pages receiving similar anchor-text vocabulary share topical context. Anchor patterns are a related-page signal.
- Topical-Model Alignment — Content topical models compared between target and candidates. Alignment confirms or rejects relatedness.
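One way to make "convergence" concrete is a combiner that collapses when any signal is absent. The sketch below uses a weighted geometric mean over three signals assumed to be normalized to [0, 1]; the geometric mean, the weights, and the `relatedness` function are illustrative assumptions, not the patent's stated formula.

```python
import math

def relatedness(co_citation, anchor_sim, topic_sim, weights=(1.0, 1.0, 1.0)):
    """Weighted geometric mean of three signals, each assumed normalized to [0, 1]."""
    signals = (co_citation, anchor_sim, topic_sim)
    if any(s <= 0.0 for s in signals):
        # A zero signal vetoes relatedness: without convergence there is no score.
        return 0.0
    log_sum = sum(w * math.log(s) for w, s in zip(weights, signals))
    return math.exp(log_sum / sum(weights))

# Co-cited and anchor-matched, but topically divergent: vetoed.
print(relatedness(0.9, 0.6, 0.0))  # → 0.0
# All three signals agree: high combined score.
print(round(relatedness(0.8, 0.7, 0.9), 3))  # → 0.796
```

By contrast, an arithmetic mean of (0.9, 0.6, 0.0) would be 0.5, letting a co-cited but off-topic page look moderately related.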
Technical Foundation
The patent specifies the link-graph queries, co-citation analyzer, anchor-vocabulary matcher, topical-model builder, multi-signal combiner, and quality filter.
- Link-Graph Queries — For each target, retrieves inbound link sources, outbound link targets, and co-cited pages.
- Co-Citation Analyzer — Identifies pages frequently linked alongside target by third parties. Outputs co-citation frequency per candidate.
- Anchor-Vocabulary Matcher — Anchors pointing at target define vocabulary; candidates receiving similar vocabulary identified.
- Topical-Model Builder — Per page, builds topical content model. Models compared between target and candidates for alignment.
- Multi-Signal Combiner — Combines co-citation, anchor-vocabulary, topical-alignment scores into per-candidate relatedness score.
- Quality Filter — Per-candidate quality gate. Low-quality candidates filtered or down-weighted before final ranking.
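The anchor-vocabulary matcher and the topical-model builder both come down to comparing term distributions. As a stand-in, the sketch below uses bag-of-words vectors and cosine similarity; the patent summary does not say which topical model is used, so this technique and the names `term_vector` and `cosine` are assumptions, and a production system could use richer topic models or embeddings.

```python
import math
from collections import Counter

def term_vector(texts):
    """Bag-of-words term counts over a list of strings (anchor texts or page text)."""
    vec = Counter()
    for text in texts:
        vec.update(text.lower().split())
    return vec

def cosine(a, b):
    """Cosine similarity of two sparse term-count vectors; 0.0 if either is empty."""
    dot = sum(count * b[term] for term, count in a.items())  # Counter returns 0 for missing terms
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical anchors pointing at the target and at two candidates.
target_anchors = ["python tutorial", "learn python", "python guide"]
candidate_anchors = ["python guide", "python basics"]
unrelated_anchors = ["cheap flights", "flight deals"]

t = term_vector(target_anchors)
print(round(cosine(t, term_vector(candidate_anchors)), 3))  # → 0.825
print(cosine(t, term_vector(unrelated_anchors)))  # → 0.0
```

The same `cosine` call works for anchor vocabularies (the anchors pointing at each page) and for page content, which is why the two components can share one similarity primitive in this sketch.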
The Process
Related-page discovery runs at query time or as a precomputed batch. Per-target signals combine for fast candidate ranking.
- Identify Target — The target page is identified from the query or page context.
- Co-Citation Lookup — Link-graph query returns pages co-cited with target.
- Anchor-Vocabulary Match — Anchor patterns pointing at target define vocabulary; matching candidates identified.
- Topical-Model Compare — Per candidate, topical-model alignment computed against target.
- Combine Signals — Multi-signal combiner produces per-candidate relatedness score.
- Quality Filter — Quality gate filters or down-weights low-quality candidates.
- Return Top-N — Top-N related pages by combined score returned.
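The last two steps, quality gating and top-N selection, might look like the following. The two-tier gate (drop below a hard floor, halve the score below a soft floor), the thresholds, and `top_n_related` are hypothetical; the source says only that low-quality candidates are filtered or down-weighted.

```python
import heapq

def top_n_related(scores, quality, n=3, hard_floor=0.1, soft_floor=0.4, penalty=0.5):
    """Gate candidates by quality, then return the n highest (page, score) pairs.

    quality < hard_floor  -> candidate dropped (filtered)
    quality < soft_floor  -> relatedness multiplied by `penalty` (down-weighted)
    """
    gated = []
    for page, score in scores.items():
        q = quality.get(page, 0.0)  # assumption: unknown quality is treated as lowest
        if q < hard_floor:
            continue
        if q < soft_floor:
            score *= penalty
        gated.append((page, score))
    return heapq.nlargest(n, gated, key=lambda item: item[1])

# Hypothetical combined relatedness scores and quality estimates.
scores = {"cand-1": 0.80, "cand-2": 0.75, "cand-3": 0.70, "spam-x": 0.95}
quality = {"cand-1": 0.9, "cand-2": 0.3, "cand-3": 0.6, "spam-x": 0.05}
print(top_n_related(scores, quality, n=2))  # → [('cand-1', 0.8), ('cand-3', 0.7)]
```

Here `spam-x` has the highest raw relatedness but is removed by the gate, and `cand-2` drops behind `cand-3` once its score is down-weighted.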
Quality Control
Related-page discovery must avoid surfacing spam, off-topic, or manipulated candidates. The patent specifies safeguards.
- Quality Filter — Per-candidate quality gate. Low-quality candidates filtered before final ranking.
- Anchor-Spam Detection — Manipulated anchor patterns flagged. Anchor-vocabulary contribution adjusted accordingly.
- Topical-Alignment Threshold — Minimum topical alignment required for candidate to count as related. Loose alignment filtered.
- Co-Citation Diversity — Co-citation must come from diverse sources. Single-source co-citation flagged as low-signal.
- Continuous Calibration — Per-signal weights and quality filters recalibrate periodically against fresh labeled data.
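Co-citation diversity, anchor-spam detection, and the topical-alignment threshold can each be approximated with a simple heuristic. Counting distinct hosts, flagging a dominant exact-match anchor, every threshold below, and the function names are illustrative assumptions rather than the patent's specified checks.

```python
from collections import Counter
from urllib.parse import urlparse

def co_citation_diversity(co_citing_urls):
    """Count distinct hosts among the pages that co-cite the target and a candidate."""
    return len({urlparse(url).netloc.lower() for url in co_citing_urls})

def anchor_spam_suspect(anchors, max_share=0.6, min_anchors=5):
    """Flag anchor sets where one exact phrase dominates (illustrative heuristic)."""
    if len(anchors) < min_anchors:
        return False  # too few anchors to judge
    normalized = Counter(a.strip().lower() for a in anchors)
    top_count = normalized.most_common(1)[0][1]
    return top_count / len(anchors) > max_share

def candidate_passes(co_citing_urls, topic_sim, min_hosts=2, min_topic_sim=0.2):
    """Require co-citation from several hosts and a minimum topical alignment."""
    return (co_citation_diversity(co_citing_urls) >= min_hosts
            and topic_sim >= min_topic_sim)

single_source = ["https://site-a.com/p1", "https://site-a.com/p2", "https://site-a.com/p3"]
diverse = ["https://a.com/x", "https://b.org/y", "http://A.com/z"]
print(co_citation_diversity(single_source), co_citation_diversity(diverse))  # → 1 2
print(candidate_passes(single_source, 0.7), candidate_passes(diverse, 0.7))  # → False True

stuffed = ["best widgets"] * 8 + ["widgets", "widget review"]
print(anchor_spam_suspect(stuffed))  # → True
```

In line with the source, a flagged anchor set would reduce the anchor-vocabulary contribution rather than necessarily reject the candidate outright.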
Real-World Application
Related-page discovery underpins related-results, topical-cluster identification, and content-recommendation systems. The primitives apply across modern search and discovery infrastructure.
- Multi-Signal Relatedness — Co-citation, anchor vocabulary, and topical alignment combine; no single signal dominates.
- Graph-Aware Discovery — The link graph drives candidate generation; content signals confirm.
- Quality-Filtered Results — A per-candidate quality gate filters low-quality results, so final relatedness reflects both topic and quality.
Why Topical Clusters Matter
Related-page discovery surfaces topical neighbors. Building a content presence across the topical cluster (multiple related pages, internally linked) builds the kind of co-citation and anchor-vocabulary signals that drive related-page surfacing.
Why Earned Anchors Drive Discovery
Anchor-vocabulary matching is a discovery signal. Pages that earn diverse, topically aligned inbound anchors are more easily discovered as related pages by the system.
What This Means for SEO
This patent identifies topically related pages by combining link co-citation, shared anchor-text vocabulary, and topical-model alignment, gated by quality. SEO implication: build a coherent topical cluster of internally linked pages that earn diverse, on-topic anchors so the system reads you as a related authority.
- Convergent Signals Define Relatedness — Relatedness requires link co-citation, anchor-vocabulary, and topical alignment to converge; co-cited but off-topic pages do not count. Build genuine topical depth so all three signals point the same way.
- Co-Citation Reveals Topical Neighbors — Pages frequently linked together by third parties are read as related. Earning citations alongside established authorities in your topic positions you within that neighborhood.
- Topical Clusters Build The Signal — Creating multiple related, internally linked pages on a topic generates the co-citation and anchor-vocabulary patterns that drive related-page surfacing. Cluster your content rather than publishing isolated one-offs.
- Shared Anchor Vocabulary Aids Discovery — Pages receiving similar anchor-text vocabulary are identified as candidates. Earning diverse, topically aligned anchors makes your pages discoverable as related to the right neighbors.
- Quality Filters Apply — A per-candidate quality gate filters or down-weights low-quality pages before final ranking. Relatedness alone does not surface a thin page; it must also clear quality.
- Co-Citation Diversity Is Required — Single-source co-citation is treated as low signal. Being linked alongside authorities by many independent sources is far stronger than repeated co-citation from one place.
- Earned Anchors Drive Inclusion — Anchor-vocabulary matching is a discovery mechanism, and diverse earned anchors are hard to fake. Focus on being genuinely cited within your topic to be pulled into related-results surfaces.