Ranks pages by computing graph distance from a curated set of trusted seed pages. Authority flows along shortest paths in the link graph rather than through raw incoming-link counts, so resources close to seeds inherit credibility while distant or isolated resources do not.
Patent Overview
- Filed: 2012-08-09
- Granted: 2015-10-20
- Application Number: US 13/571,089
The Challenge
PageRank treats every link as a vote, but votes from unknown sources accumulate authority without proper grounding. The system needed a way to anchor authority in known-good pages and let it propagate outward by graph distance, so resources close to trust earn credit and resources far from it do not.
- Raw Link Counts Lack Trust Anchor — PageRank rewards heavily-linked pages regardless of whether their link sources are themselves trustworthy. A spam cluster can pump rank into a target page through many low-quality links. The system needs a trust anchor the spam graph cannot reach.
- Seed Pages Carry Curated Authority — A small set of known-good pages (Wikipedia, university homepages, major government sites) carry editorial-curated authority. Using them as seeds anchors the entire graph in trustworthy origin points.
- Graph Distance Is A Trust Decay Metric — Resources one hop from a seed are nearly as trusted as the seed itself. Two hops, somewhat trusted. Twenty hops, barely related. Distance is a natural trust-decay metric that maps cleanly onto how authority should flow.
- Need To Compute Distances At Web Scale — Computing shortest-path distances on a graph with hundreds of billions of edges is non-trivial. The patent must describe an algorithm that scales while preserving the distance interpretation.
- Distance Score Must Combine With Other Signals — Distance is one signal among many. The ranking system must blend it with text-match, freshness, and behavioral signals to produce a final ranking. The blend must be tuned so that distance influences the result without dominating it.
Innovation
How The System Works
The patent defines a set of curated seed pages, computes the shortest-path distance from any seed to every reachable page in the link graph, assigns each page a distance-based authority score, and uses the score as a ranking input that complements traditional PageRank.
- Curate A Set Of Seed Pages — A small but high-quality set of pages is chosen as seeds. Seeds are pages with editorial-vetted authority: Wikipedia, .edu and .gov homepages, established encyclopedic resources. The set is maintained and refreshed periodically.
- Build The Link Graph — From the crawl, build the directed link graph of the web. Nodes are pages, edges are hyperlinks. The graph is the same one PageRank uses but the algorithm reads it differently.
- Compute Shortest-Path Distances — For each page, compute the shortest-path distance from any seed in the seed set. Distance is measured in hops or in weighted-edge steps depending on the variant. Pages unreachable from any seed receive a maximum distance.
- Convert Distance Into Authority Score — The distance is transformed into a score where close-to-seed produces high authority and far-from-seed produces low. A typical transform is exponential decay so the score drops sharply with distance.
- Combine With Other Ranking Signals — The distance authority score is exposed as a feature in the ranker, alongside PageRank, text-match, freshness, and behavioral signals. The model weights it appropriately per query type.
- Update On Crawl Refresh — As the link graph evolves, distances change. New links can shorten the distance from a seed; deleted links can lengthen it. The signal updates each crawl cycle.
- Refine Seeds Over Time — The seed set is not static. Seeds that turn out to spread authority too widely or too narrowly are adjusted. The patent contemplates a feedback process for seed curation.
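The distance step above can be sketched as a multi-source breadth-first search, which yields each page's hop distance to its nearest seed in one pass. This is a minimal illustration on a toy graph, not the patent's implementation; the function name `seed_distances`, the `MAX_DISTANCE` sentinel, and the example hostnames are assumptions made for the example.

```python
from collections import deque

MAX_DISTANCE = 10**9  # hypothetical sentinel for pages unreachable from every seed


def seed_distances(graph, seeds):
    """Return the hop distance from the nearest seed to every page in `graph`.

    `graph` maps each page to the list of pages it links to (directed edges).
    """
    # Collect every page that appears as a source or a target of a link.
    pages = set(graph)
    for targets in graph.values():
        pages.update(targets)

    dist = {page: MAX_DISTANCE for page in pages}
    queue = deque()
    for seed in seeds:
        dist[seed] = 0
        queue.append(seed)

    # Standard BFS: the first time a page is reached is via a shortest path.
    while queue:
        page = queue.popleft()
        for target in graph.get(page, ()):
            if dist[target] == MAX_DISTANCE:
                dist[target] = dist[page] + 1
                queue.append(target)
    return dist


links = {
    "seed.example": ["a.example", "b.example"],
    "a.example": ["c.example"],
    "b.example": ["c.example"],
    "c.example": ["d.example"],
    "island.example": ["d.example"],
}
distances = seed_distances(links, seeds=["seed.example"])
```

Here `island.example` links out but is never linked to from the seed's side of the graph, so it keeps the maximum distance, while `d.example` sits three hops from the seed.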
Distance From Trust Anchors
The patent's load-bearing idea is to anchor authority in a small curated seed set and let it propagate outward by shortest-path distance. The signal cannot be inflated by spam clusters because those clusters cannot get close to the seeds without genuine editorial endorsement.
Trust Has A Source
Where PageRank treats trust as emerging from the graph itself, distance-based ranking treats trust as having a source: the seed set. Authority flows from the source, decaying with distance. The shift in interpretation rules out a whole class of graph-level manipulation.
- Curated Seeds As Trust Source — A small set of editorially-vetted pages anchors the entire authority distribution. Spam networks cannot self-elect into the seed set; they must earn editorial endorsement, which is structurally expensive.
- Shortest-Path As Trust Path — Distance is computed as the shortest path in the link graph. Pages close to seeds inherit strong authority; pages far from seeds inherit little. The metric is natural and intuitively interpretable.
- Exponential Decay With Distance — Authority drops sharply with each additional hop. A page two hops from a seed has substantially less authority than one hop, and a page twenty hops away has effectively none. This shape makes the signal manipulation-resistant.
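To make the decay shape concrete, here is a small sketch of an exponential transform. The decay rate `alpha=1.0` is an assumed value chosen for illustration; the source does not give one.

```python
import math


def exp_decay_score(distance, alpha=1.0):
    """Map a hop distance to a score in (0, 1]; alpha is an assumed decay rate."""
    return math.exp(-alpha * distance)


# Scores at the seed itself and at one, two, and twenty hops away.
decay_table = {hops: exp_decay_score(hops) for hops in (0, 1, 2, 20)}
```

With this assumed rate, one hop scores about 0.37, two hops about 0.14, and twenty hops about 2e-9, which is the "effectively none" end of the curve.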
Technical Foundation
The patent specifies the seed curation process, the distance computation algorithm, the storage architecture for distance values, and the integration with the ranking pipeline.
- Seed Set Curation — Seeds are chosen by editorial process and refreshed periodically. The set is bounded (thousands to low millions of pages) so distance computation stays tractable. Selection criteria balance authority, neutrality, and coverage across topics.
- Distance Algorithm — Distances are computed via breadth-first search from each seed, then per-page minimum across seeds. The algorithm parallelizes efficiently and runs incrementally as the link graph changes.
- Weighted-Edge Variant — An optional variant assigns weights to links (e.g., based on anchor text quality or link prominence) so distance becomes weighted shortest path rather than hop count. This variant is more expensive but more precise.
- Distance-To-Authority Transform — Distance is transformed to an authority score via a decay function. Common transforms include 1/(1+d), exp(-alpha*d), or piecewise functions tuned per query type.
- Per-Page Distance Storage — Each page's minimum-distance-to-seed is stored in the per-page feature record. The store is updated incrementally on each crawl cycle so the signal stays fresh.
- Ranker Integration — The authority score is exposed as one feature among many. The learned ranker decides its weight per query type. Navigational queries weight it less; informational queries weight it more.
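The weighted-edge variant can be sketched with Dijkstra's algorithm started from all seeds at once, followed by the 1/(1+d) transform named above. This is an illustrative sketch under assumptions: the function names, the edge weights, and the convention that a lower weight means a stronger link are all hypothetical.

```python
import heapq
import math


def weighted_seed_distances(graph, seeds):
    """Multi-source Dijkstra over `graph[page] = [(target, weight), ...]`.

    Weights must be non-negative; a lower weight stands for a stronger link.
    Pages that never appear in the result are unreachable from every seed.
    """
    dist = {seed: 0.0 for seed in seeds}
    heap = [(0.0, seed) for seed in dist]
    heapq.heapify(heap)
    while heap:
        d, page = heapq.heappop(heap)
        if d > dist.get(page, math.inf):
            continue  # stale heap entry; a shorter path was already found
        for target, weight in graph.get(page, ()):
            candidate = d + weight
            if candidate < dist.get(target, math.inf):
                dist[target] = candidate
                heapq.heappush(heap, (candidate, target))
    return dist


def reciprocal_score(distance):
    """The 1/(1+d) transform: 1.0 at a seed, falling toward 0 with distance."""
    return 1.0 / (1.0 + distance)


weighted_links = {
    "seed.example": [("a.example", 0.5), ("b.example", 2.0)],
    "a.example": [("b.example", 0.5)],
    "b.example": [("c.example", 1.0)],
}
weighted = weighted_seed_distances(weighted_links, ["seed.example"])
scores = {page: reciprocal_score(d) for page, d in weighted.items()}
```

In this toy graph the direct link to `b.example` costs 2.0, but the two-link route through `a.example` costs 1.0, so the weighted distance prefers the longer path in hops. That is the sense in which the weighted variant is more precise than hop counting.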
The Process
The pipeline runs as a periodic batch alongside PageRank. Each crawl refresh recomputes distances, updates per-page scores, and writes new feature values to the ranker's feature store.
- Refresh The Seed Set — Editorial review periodically updates the seed set: adding new authoritative resources, removing seeds that have drifted in quality. The refresh is infrequent (months) compared to the crawl.
- Crawl And Build The Link Graph — The crawler refreshes the link graph. Outbound links are extracted and the graph is normalized for the distance algorithm.
- Compute Per-Seed BFS — Breadth-first search runs from each seed in parallel. Each BFS produces a per-page distance map for that seed.
- Aggregate To Minimum Distance — Per-page distances across all seeds are aggregated to the minimum. Each page's final distance is the closest seed's hop count.
- Apply Authority Transform — The decay function converts distances to authority scores in a bounded range. Pages unreachable from any seed receive the minimum score.
- Publish To Feature Store — Per-page authority scores are written to the ranker's feature store. The next ranking refresh consumes them.
- Monitor And Iterate — Distribution monitoring catches anomalies. If a seed turns out to be spreading authority unexpectedly far or narrowly, the seed set is reconsidered. The process is incremental and self-correcting.
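The per-seed BFS, minimum aggregation, and transform steps can be strung together as follows. This is a rough sketch on an in-memory toy graph: the names `bfs_from` and `authority_features`, the `alpha=0.5` decay rate, and the plain dictionary standing in for a feature store are assumptions, and a production pipeline would run the searches in parallel rather than in a list comprehension.

```python
import math
from collections import deque


def bfs_from(graph, seed):
    """Hop distances from one seed; unreachable pages are simply absent."""
    dist = {seed: 0}
    queue = deque([seed])
    while queue:
        page = queue.popleft()
        for target in graph.get(page, ()):
            if target not in dist:
                dist[target] = dist[page] + 1
                queue.append(target)
    return dist


def authority_features(graph, seeds, all_pages, alpha=0.5):
    """Build a page -> authority score map (a stand-in for the feature store)."""
    per_seed = [bfs_from(graph, seed) for seed in seeds]  # one map per seed
    features = {}
    for page in all_pages:
        # Aggregate to the minimum distance over the seeds that reach the page.
        reachable = [m[page] for m in per_seed if page in m]
        if reachable:
            features[page] = math.exp(-alpha * min(reachable))
        else:
            features[page] = 0.0  # unreachable from every seed: minimum score
    return features


toy_graph = {"s1": ["a"], "s2": ["b"], "a": ["b"], "b": ["c"]}
toy_pages = ["s1", "s2", "a", "b", "c", "orphan"]
features = authority_features(toy_graph, ["s1", "s2"], toy_pages)
```

Page `b` is two hops from `s1` but one hop from `s2`, so the aggregation keeps the distance of 1 and `b` ends up with the same score as `a`.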
Quality Control
Distance-based authority is robust by design but still requires safeguards against seed manipulation, distance pathologies, and integration errors.
- Seed Curation Discipline — Seeds are chosen by editorial process, not by algorithm. This prevents seed-set manipulation, which would be a catastrophic single point of failure. The process is documented and audited.
- Distance Cap And Decay — Authority decays sharply with distance, and distance is capped at a maximum beyond which all pages are treated as equally distant. This prevents pathological long-tail distance values from distorting rankings.
- Multiple-Seed Aggregation — Each page's distance is the minimum across all seeds, not the average, so its authority reflects the closest seed. A page only needs to be close to one seed to earn authority, which makes the signal robust to seed-specific issues.
- Anomaly Monitoring — Sudden authority-score swings (per page or in aggregate) are flagged for investigation. Most are pipeline issues; a few reveal real shifts in the link graph that warrant attention.
- Combined With Other Signals — Distance authority is one feature, not the only one. Strong text match and behavioral signal can lift a page with mediocre distance authority, preventing the signal from becoming dictatorial.
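Two of these safeguards, the distance cap and the blend with other signals, can be illustrated together. The sketch below uses a fixed linear blend purely as a stand-in for a learned ranker; the cap of 10 hops, the weights, and the input scores are invented for the example.

```python
import math

DISTANCE_CAP = 10  # hypothetical: beyond this, all pages are equally distant


def capped_authority(distance, alpha=0.5):
    """Exponential decay applied to the capped distance."""
    return math.exp(-alpha * min(distance, DISTANCE_CAP))


def blended_score(text_match, behavior, authority, weights=(0.5, 0.3, 0.2)):
    """Fixed linear blend standing in for a learned ranker (weights are assumed)."""
    w_text, w_behavior, w_authority = weights
    return w_text * text_match + w_behavior * behavior + w_authority * authority


# A distant page with strong text match and behavioral signals...
far_but_relevant = blended_score(0.9, 0.8, capped_authority(15))
# ...versus a page one hop from a seed with weak signals elsewhere.
near_but_weak = blended_score(0.2, 0.1, capped_authority(1))
```

With these assumed weights the distant but relevant page outscores the nearby but weak one, and any distance past the cap produces the same authority value.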
Real-World Application
Distance-based authority is one of the load-bearing trust signals in Google's stack and is closely related to TrustRank-style ideas, which likewise propagate trust outward from a vetted seed set. Its primitives appear in Google's anti-spam framework and in publicly confirmed guidance about earning authority through editorial endorsement.
- Seed-anchored Authority Source — A small curated seed set anchors the authority distribution for the entire web. Spam networks cannot self-elect into seeds, structurally limiting their reach.
- Shortest-path Distance Metric — Authority decays with graph distance from the closest seed. Resources within a few hops of a seed inherit strong authority; distant resources do not.
- Per-page Score Granularity — Each page has its own distance-derived authority score, which is exposed as one feature among many to the learned ranker.
Why Editorial Endorsement Matters
Under this model, a single link from an editorial site close to the seed set can be worth far more than dozens of links from distant sites, because a page's score depends on its shortest path to a seed rather than on how many links point at it. The patent's primitives are the technical reason editorial press coverage and authoritative citations are so heavily rewarded by ranking.
Why Link Farm Distance Stays Far
Coordinated link networks build dense clusters of mutually-linking sites, but none of them are close to the seed set. The shortest-path distance from any seed to a farm node stays high regardless of how many links the farm internally builds. The patent's distance metric is why link farms cannot manufacture trust.
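A toy experiment shows the effect. In the sketch below, the graph, the page names, and the helper `hops_from_seed` are all made up for illustration: a farm entry page sits five hops from a seed, and the farm then links every member to every other member.

```python
from collections import deque


def hops_from_seed(graph, seed):
    """Plain BFS hop distances from a single seed."""
    dist = {seed: 0}
    queue = deque([seed])
    while queue:
        page = queue.popleft()
        for target in graph.get(page, ()):
            if target not in dist:
                dist[target] = dist[page] + 1
                queue.append(target)
    return dist


# A chain of five hops leads from the seed to the first farm page.
web = {"seed": ["p1"], "p1": ["p2"], "p2": ["p3"], "p3": ["p4"], "p4": ["farm0"]}
farm = [f"farm{i}" for i in range(50)]
before = hops_from_seed(web, "seed")["farm0"]

# The farm now links every member to every other member.
for page in farm:
    web[page] = [other for other in farm if other != page]
after = hops_from_seed(web, "seed")
```

After 2,450 internal links are added, `farm0` is still five hops from the seed and every other farm page is six hops away. Internal linking spreads the farm's single entry point around but cannot shorten the path back to the seed.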
What This Means for SEO
Graph-distance ranking rewards being close to authoritative seed nodes in the link graph, not just having a high raw link count.
- Distance From Seeds Beats Total Link Count — A page two hops from a top authority site beats a page with a hundred links from low-authority neighbors. The graph topology matters more than the count.
- Tight Topical Neighborhoods Reinforce Each Other — When authoritative sites in your niche link to each other, being in that neighborhood is a strong signal. Get cited within your cluster, even if the cites are not direct links.
- Avoid Toxic Neighborhoods — Being close in the link graph to spam clusters bleeds into your distance score. Audit your inbound link sources and disavow when needed.