Extends PageRank with a stochastic backoff model where unsatisfied users back off probabilistically through the link graph, capturing realistic navigation and producing more robust authority scores than a uniform random walk.
Patent Overview
- Inventor
- Prabhakar Raghavan
- Assignee
- Verity, Inc.
- Filed
- 2000-10-30
- Granted
- 2004-09-14
- Application Number
- US 09/702,140
The Challenge
The Uniform Random Walk Misrepresents Real Navigation
Classic PageRank models a web surfer as a uniform random walker who follows outbound links with fixed probability and teleports to a random page with the rest. Real users do not behave this way. They back off when content is unsatisfying, return to previous pages, retry from earlier points in their navigation. Authority computed under the uniform-walk model misses the back-pressure that real navigation patterns exert on the link graph.
- Users Back Off, Not Teleport — When users do not find what they want at a page, they typically go back to a previous page in their session rather than jumping to a random destination. Uniform teleportation misrepresents this behavior.
- Dead Ends Distort The Walk — Pages with no outbound links act as dead ends in PageRank, requiring artificial damping or teleportation to keep the walk well-defined. A backoff model handles dead ends naturally because the walker can simply return rather than teleport.
- Need A Walk That Matches Reality — The closer the walk model is to actual user behavior, the more meaningful the resulting authority scores become. The system needs an explicit backoff process built into the walk.
- Convergence Must Still Hold — Adding backoff complicates the math. The walk has to converge to a well-defined stationary distribution even with the backoff dynamics layered on top.
- Compute Cost Should Stay Reasonable — PageRank already runs at web scale. Adding backoff dynamics cannot blow up the iteration cost or the algorithm becomes impractical.
Innovation
Stochastic Backoff As The Walk Dynamics
The patent defines a random walk through the link graph where the walker follows outbound links from the current page most of the time and, with some probability, backs off to a previously visited page rather than teleporting uniformly at random. The stationary distribution of this stochastic-backoff process becomes the authority score, capturing realistic user navigation patterns.
- Build The Directed Graph — Construct a directed graph from the document crawl where nodes are documents and edges are hyperlinks.
- Initialize The Walker — Start the walker at a uniformly random page or at a designated seed page. Maintain a history of visited pages so backoff can target them.
- Forward Step — With probability p, follow an outbound link from the current page chosen uniformly at random. Add the destination to the visited history.
- Backoff Step — With probability (1-p), back off to a previously visited page in the walker's history. The backoff target can be the immediately preceding page or sampled from history with a decay schedule.
- Iterate Until Stable — Repeat forward and backoff steps for many iterations. The fraction of time the walker spends at each page converges to the stationary distribution.
- Derive Authority Score — The stationary distribution becomes the authority score per document. Pages visited frequently under the backoff walk are high-authority; pages rarely visited even with backoff are low-authority.
Realistic Walk Dynamics, Robust Authority
The contribution beyond classic PageRank is a walk model that matches how real users navigate. The backoff mechanism handles dead ends naturally and produces authority scores that better reflect actual page importance under real user behavior.
Backoff Replaces Uniform Teleportation
Where classic PageRank teleports the walker to a uniformly random page when navigation cannot continue or when damping fires, this patent has the walker back off to a previously visited page. The change is small in math but large in behavioral realism.
- Forward Step — Standard outbound-link traversal with probability p. Same as the forward part of classic PageRank.
- Backoff Step — Return to a previously visited page with probability (1-p). Replaces uniform teleportation and matches real navigation patterns.
The walk converges to authority that respects realistic user dynamics.
<\/section>Technical Foundation
The Backoff Walk
The walk is a Markov chain with state that includes both the current page and (potentially) the recent navigation history.
- Forward Probability p — Probability that the walker takes a forward outbound-link step. Typically the dominant operation; backoff is the alternative path.
- Backoff Probability (1-p) — Probability of returning to a previously visited page. Replaces classic PageRank's uniform teleportation.
- Backoff Distribution — How the backoff step chooses which previous page to return to. Can be deterministic (most recent), random (any visited), or weighted (decay by recency).
- Stationary Distribution — The long-run fraction of time the walker spends at each page. Computed by power iteration as in classic PageRank, with the modified transition operator.
Key Insight: Real user navigation is path-dependent. Where the user is now is influenced by where they have been. Classic PageRank's memoryless random walk discards that path dependence. The backoff process restores it cheaply by allowing the walker to return to recent history, which is exactly the behavior real users exhibit when they hit unsatisfying content.
<\/section>The Process
Computing Backoff PageRank
The computation parallels classic PageRank with a modified transition operator that encodes the backoff dynamics.
- Build Transition Operator — Construct the modified transition matrix that includes both forward outbound-link transitions and backoff transitions to recent history.
- Initialize Distribution — Start with a uniform distribution over pages. The walker is equally likely to be anywhere initially.
- Iterate — Apply the transition operator repeatedly to the distribution. After each iteration the distribution shifts closer to the stationary distribution.
- Check Convergence — Measure how much the distribution changes between iterations. Stop when the change drops below the convergence threshold.
- Output Authority Scores — The converged distribution gives per-page authority scores. Higher mass means the walker spends more time there under the backoff dynamics.
What This Means for SEO
What This Means for SEO
Backoff PageRank is one of the variants Google has researched in its quest to model real user behavior in authority computation. The implications for SEO are about what kinds of pages benefit from realistic navigation modeling.
- Pages That Receive Return Visits Score Higher — Backoff dynamics reward pages that users return to after unsuccessful navigation elsewhere. Content that users keep coming back to becomes a recurring waypoint in the walk and accumulates authority.
- Strong Outbound Links Don't Bleed Authority As Much — Under classic PageRank, every outbound link dilutes a page's exported authority. Under backoff, users return to your page after following an outbound link to unsatisfying content, partially restoring the authority. Linking to high-quality external resources is less costly than it appears.
- Hub Pages Benefit — Pages that aggregate links to many resources (well-built hub pages) are natural backoff destinations. Users explore outbound links, get partial satisfaction, and return. The hub captures the return-visit signal repeatedly.
- Dead-End Pages Are Less Penalized — Pages with few or no outbound links are handled naturally by backoff (users just return) rather than penalized as in classic PageRank. Short focused pages that satisfy a specific intent remain viable authority targets.
- Session Depth Compounds Authority — Pages encountered later in a session, where the user has built up navigation history to back off into, contribute more to the authority distribution than uniform-walk theory predicts. Deep-funnel content rewards from session-level engagement.