This Microsoft patent gives an explicit mathematical formulation of topical authority at the site level: a site's authority for a topic equals its global site rank multiplied by its click signal multiplied by the square of its per-topic page share.
Patent Overview
- Inventors: Li Xiong, Chuan Hu, Arnold Overwijk, Junaid Ahmed
- Assignee: Microsoft Technology Licensing LLC
- Filed: 2019-07-02
- Published: January 7, 2021
The Challenge
Document-level ranking treats every page in isolation and ignores the site it lives on. A high-quality article on a site that covers a hundred unrelated topics gets the same site-level credit as the same article on a site that has built deep, coherent coverage of one topic. The challenge: quantify, at the site level, how authoritative a site is for a specific topic, then use that quantity to re-rank documents from that site for queries about that topic.
- Site-Level Topic Signal Is Missing — Per page, traditional ranking reads the document and its inlinks. The site's overall depth on the topic is not represented as a measurable quantity.
- Global Authority Conflates Topics — Per site, PageRank-style site rank is one number for the whole site. It cannot tell whether the site's authority comes from sports, finance, or cooking.
- Shallow Coverage Looks The Same As Deep — Per topic, when only document-level signals are read, a site with a single article on the topic earns the same topical relevance as a site with five hundred articles on it.
- Key Phrases Are Noisy — Per document, naive phrase extraction produces near-duplicates, fragments, and boilerplate. Without dedup, the system cannot identify which topics a site actually covers.
- Click Signals Alone Reward Brand — Per query, clicks favor large brands across every topic regardless of depth. A signal that combines clicks with topical depth is needed to surface true topic specialists.
Innovation
How The System Works
The system extracts candidate key phrases from each document with a machine-learned model, deduplicates them across the site to identify the topics the site actually covers, computes per-topic page share, and combines page share with site rank and click signal to produce a per-(site, topic) authority score that re-ranks documents at query time.
- Extract Key Phrase Candidates — Per document, an ML-trained model extracts candidate key phrases that describe what the document is about.
- Deduplicate Across The Site — Per site, near-duplicate and overlapping phrases are merged so each distinct topic is counted once, not many times.
- Compute Per-Topic Page Share — Per (site, topic) pair, T(s,t) measures the fraction of the site's pages that cover topic t.
- Score Global Site Rank — Per site, SR(s) captures global link-based authority in a PageRank-style site-level rank.
- Score Click Signal — Per site, C(s) captures engagement strength from observed query and click behavior.
- Compute Topical Authority — Per (site, topic) pair, A(s,t) = SR(s) times C(s) times T(s,t) squared. The squaring strongly rewards topical depth.
- Boost Documents At Query Time — Per query, documents from sites with high A(s,t) for the query's topic are ranked higher than documents from sites with the same content but weaker topical authority.
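The scoring step in the list above can be sketched in a few lines. This is a minimal illustration, not the patent's implementation: the function name `topical_authority` and the example factor values are assumptions chosen only to show how the product behaves.

```python
def topical_authority(site_rank: float, click_signal: float, topic_share: float) -> float:
    """A(s,t) = SR(s) * C(s) * T(s,t)^2, as stated in the text."""
    if not 0.0 <= topic_share <= 1.0:
        raise ValueError("topic_share is a fraction and must lie in [0, 1]")
    return site_rank * click_signal * topic_share ** 2


# Hypothetical inputs: a modest specialist versus a strong generalist.
specialist = topical_authority(site_rank=0.3, click_signal=0.4, topic_share=0.50)
generalist = topical_authority(site_rank=0.9, click_signal=0.8, topic_share=0.05)

print(f"specialist A(s,t) = {specialist:.4f}")
print(f"generalist A(s,t) = {generalist:.4f}")
```

With these made-up numbers the specialist scores higher even though both of its site-level factors are weaker, because its topic share enters the product squared.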
Topical Authority Has A Formula
The patent's load-bearing contribution is that topical authority is no longer a rhetorical concept. It is a specific product of three measurable factors, with the topic-share factor squared to punish shallow coverage and reward depth.
A(s,t) = SR(s) * C(s) * T(s,t)^2
Per (site, topic) pair, topical authority is the product of global site rank, click signal, and the square of per-topic page share. All three factors must be non-trivial for the product to be meaningful.
- SR(s) Global Site Rank — Per site, PageRank-style global authority from the link graph.
- C(s) Click Signal — Per site, engagement strength from observed user behavior.
- T(s,t) Squared Topic Share — Per (site, topic) pair, the fraction of pages on topic t, squared.
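Written out as a worked equation, the three factors above combine as follows. The symbols n_t(s) (pages on site s that cover topic t) and N(s) (total pages on site s) are notation introduced here for illustration; the text only describes T(s,t) as a page fraction.

```latex
\[
A(s,t) \;=\; SR(s)\,\cdot\,C(s)\,\cdot\,T(s,t)^{2},
\qquad
T(s,t) \;=\; \frac{n_t(s)}{N(s)} \in [0,1].
\]
% Scaling property of the squared term: with SR(s) and C(s) held fixed,
% multiplying topic share by k multiplies authority by k^2.
\[
T(s,t) \to k\,T(s,t)
\quad\Longrightarrow\quad
A(s,t) \to k^{2}\,A(s,t).
\]
```

So doubling a site's share of pages on a topic quadruples its authority for that topic, provided the other two factors stay the same.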
Technical Foundation
The patent specifies key phrase candidate extraction, ML-trained deduplication, per-topic page-share computation, site rank aggregation, click-signal aggregation, and the multiplicative authority function with squared topic-share.
- Key Phrase Candidate Extractor — Per document, a machine-learned model scans document text and emits ranked candidate key phrases that summarize what the document is about.
- Phrase Deduplication Model — Per site, an ML-trained dedup step merges near-duplicate, overlapping, and morphologically related candidates so each distinct topic is represented once.
- Per-Topic Page Share T(s,t) — Per (site, topic) pair, the number of pages on the site that cover topic t is divided by the total page count, producing a fraction in [0,1].
- Site Rank SR(s) — Per site, a PageRank-style aggregation of the link graph at site granularity captures global authority.
- Click Signal C(s) — Per site, observed click-through and engagement statistics across queries aggregate into a single site-level signal.
- Topical Authority Function — Per (site, topic) pair, A(s,t) = SR(s) times C(s) times T(s,t) squared combines the three factors with the squared topic-share term that rewards depth.
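The deduplication and page-share components can be sketched as below. The patent describes ML-trained models for both extraction and dedup; here `normalize_phrase` is a deliberately crude, hypothetical stand-in (lowercasing, whitespace collapsing, and stripping a trailing "s"), and the input is assumed to be a mapping from page URL to already-extracted key phrases.

```python
from collections import defaultdict


def normalize_phrase(phrase: str) -> str:
    """Hypothetical stand-in for the ML dedup model: lowercase, collapse
    whitespace, and strip a trailing plural 's' from longer words."""
    words = phrase.lower().split()
    words = [w[:-1] if len(w) > 3 and w.endswith("s") else w for w in words]
    return " ".join(words)


def topic_page_share(pages: dict[str, list[str]]) -> dict[str, float]:
    """Return T(s,t) for one site: pages covering topic t / total pages."""
    if not pages:
        return {}
    pages_per_topic = defaultdict(set)
    for url, phrases in pages.items():
        for phrase in phrases:
            # A set per topic ensures each page counts once per topic,
            # even if it carries several near-duplicate phrases.
            pages_per_topic[normalize_phrase(phrase)].add(url)
    total = len(pages)
    return {topic: len(urls) / total for topic, urls in pages_per_topic.items()}


# Made-up site with four pages and noisy phrase variants.
site_pages = {
    "/a": ["Sourdough Starters", "bread baking"],
    "/b": ["sourdough starter"],
    "/c": ["bread  baking", "Oven Thermometers"],
    "/d": ["pizza dough"],
}
shares = topic_page_share(site_pages)
for topic, share in sorted(shares.items()):
    print(f"{topic}: {share:.2f}")
```

Counting distinct pages per merged topic, not raw phrase occurrences, is what keeps near-duplicate phrases from inflating the share.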
The Process
From the crawl, the system builds per-site key phrase inventories, deduplicates them into topics, computes SR(s) and C(s) per site and T(s,t) per (site, topic) pair, multiplies them into A(s,t), and applies the score at query time to boost documents from sites with strong topical authority for the query topic.
- Crawl And Index Documents — Per document, content is ingested and prepared for phrase extraction.
- Extract Candidate Phrases — Per document, the ML extractor emits ranked candidate key phrases.
- Deduplicate Into Topics — Per site, candidates are merged so the inventory reflects distinct topics, not noise.
- Compute Per-Topic Page Share — Per (site, topic) pair, T(s,t) is the fraction of the site's pages on that topic.
- Aggregate Site Rank And Clicks — Per site, SR(s) and C(s) are computed and stored.
- Compute Topical Authority — Per (site, topic) pair, A(s,t) is computed as the product with T(s,t) squared.
- Boost At Query Time — Per query, documents from sites with high A(s,t) for the query topic are promoted in the ranking.
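The final step, boosting at query time, might look like the sketch below. The text does not say how A(s,t) is folded into the ranking, so the multiplicative boost `base_score * (1 + weight * A)` is purely an assumption, as are the `Candidate` type, the `rerank` function, and the example sites and scores.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    url: str
    site: str
    base_score: float  # document-level relevance from the underlying ranker


def rerank(candidates, authority, topic, weight=1.0):
    """Order candidates by base score boosted by the site's A(s,t).

    `authority` maps (site, topic) -> precomputed A(s,t); sites with no
    entry for the topic receive no boost. The boost form is an assumption.
    """
    def boosted(c):
        a = authority.get((c.site, topic), 0.0)
        return c.base_score * (1.0 + weight * a)

    return sorted(candidates, key=boosted, reverse=True)


# Hypothetical precomputed scores and two near-identical documents.
authority = {
    ("breadblog.example", "sourdough"): 0.03,
    ("bignews.example", "sourdough"): 0.0018,
}
candidates = [
    Candidate("https://bignews.example/sourdough", "bignews.example", 0.81),
    Candidate("https://breadblog.example/starter", "breadblog.example", 0.80),
]
ranked = rerank(candidates, authority, topic="sourdough")
for c in ranked:
    print(c.url)
```

Because A(s,t) is computed offline per (site, topic) pair, the query-time cost in this sketch is one dictionary lookup per candidate.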
Quality Control
A multiplicative score with a squared term and an ML-extracted topic inventory needs floors, dedup discipline, and sanity checks to avoid noise dominating the signal.
- Minimum Page-Count Floor — Per site, T(s,t) is computed only on sites with enough pages that page-share is statistically meaningful, not on tiny sites where one page produces a high share.
- Phrase Dedup Quality — Per site, the dedup model must merge near-duplicates aggressively, otherwise T(s,t) inflates artificially by counting the same topic many times.
- Three-Factor Presence — Per (site, topic) pair, all three factors must clear floors. A site with high T(s,t) but near-zero SR(s) or C(s) produces a near-zero product and does not surface.
- Click Signal Normalization — Per site, C(s) is normalized so that pure traffic volume does not let a single huge brand dominate every topic with weak depth.
- Topic Granularity Check — Per topic, granularity is kept consistent so that a site is not credited at both the broad topic and the narrow sub-topic for the same pages.
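Several of the checks above can be expressed as guards around the score. The thresholds `MIN_PAGES` and `MIN_FACTOR`, the log damping in `normalize_clicks`, and both function names are assumptions for illustration; the text names the safeguards but gives no concrete values or normalization method.

```python
import math
from typing import Optional

MIN_PAGES = 50      # hypothetical page-count floor
MIN_FACTOR = 1e-3   # hypothetical per-factor floor


def normalize_clicks(site_clicks: float, max_clicks: float) -> float:
    """Log-damp raw click volume into [0, 1] so that sheer traffic
    volume grows the signal slowly (one possible normalization)."""
    if site_clicks < 0 or max_clicks <= 0:
        raise ValueError("site_clicks must be >= 0 and max_clicks > 0")
    return min(1.0, math.log1p(site_clicks) / math.log1p(max_clicks))


def guarded_authority(site_rank: float, click_signal: float,
                      pages_on_topic: int, total_pages: int,
                      min_pages: int = MIN_PAGES,
                      min_factor: float = MIN_FACTOR) -> Optional[float]:
    """Return A(s,t), 0.0 if any factor misses its floor, or None when
    the site is too small for page share to be meaningful."""
    if pages_on_topic > total_pages:
        raise ValueError("pages_on_topic cannot exceed total_pages")
    if total_pages < min_pages:
        return None  # minimum page-count floor: do not score tiny sites
    topic_share = pages_on_topic / total_pages
    if min(site_rank, click_signal, topic_share) < min_factor:
        return 0.0   # three-factor presence: every factor must clear its floor
    return site_rank * click_signal * topic_share ** 2


print(guarded_authority(0.5, 0.5, pages_on_topic=1, total_pages=1))     # too small to score
print(guarded_authority(0.5, 0.0, pages_on_topic=80, total_pages=100))  # no click signal
print(guarded_authority(0.5, 0.5, pages_on_topic=50, total_pages=100))
```

Returning `None` for unscored sites, as opposed to `0.0`, keeps "not enough data" distinguishable from "scored and found weak".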
Real-World Application
Site-topical authority ranking is the mechanical reason topic-specialist sites outrank generalist sites on per-topic queries even when the generalist has stronger global authority. The squaring of T(s,t) is the lever that makes depth dominate breadth.
- Topical Authority Formula — A(s,t) = SR(s) * C(s) * T(s,t)^2: three multiplicative factors, with topic share squared.
- Score Granularity — Per (site, topic): every site carries its own authority score for every topic it covers.
- Depth Lever — Squared term: a site with 50% topic share scores one hundred times the topic-share factor of a site with 5%.
Why Specialist Sites Beat Generalists Per Topic
Per topic, the squared T(s,t) term means a specialist with 50% page share on the topic scores one hundred times the topic-share factor of a generalist with 5% share. Even with stronger SR and C, the generalist cannot close that gap without dedicating substantially more of its content to the topic.
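The hundredfold figure follows directly from the squared term. In the worked arithmetic below, s_1 denotes the specialist and s_2 the generalist; the subscripts are notation added here for illustration.

```latex
\[
\frac{T(s_1,t)^{2}}{T(s_2,t)^{2}}
\;=\; \frac{0.50^{2}}{0.05^{2}}
\;=\; \frac{0.25}{0.0025}
\;=\; 100.
\]
% For the generalist to match the specialist, A(s_2,t) >= A(s_1,t),
% its combined site-level factors must make up the whole gap:
\[
A(s_2,t) \,\ge\, A(s_1,t)
\quad\Longleftrightarrow\quad
SR(s_2)\,C(s_2) \;\ge\; 100\;SR(s_1)\,C(s_1).
\]
```

Under the formula as stated, the generalist would need a site-rank-times-click product one hundred times larger than the specialist's just to tie.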
Why The Patent Made The Concept Quantitative
Per (site, topic) pair, the patent moved topical authority from a marketing concept to a measurable function. SEO discussions that reference 'topical authority' have a specific mathematical anchor here, even if production rankers use variants and refinements.
What This Means for SEO
Microsoft's patent makes topical authority a specific quantity with a formula. Strategy that targets the three factors directly outperforms strategy built on vague topical-authority intuition.
- Topical Authority Is Quantified — The patent gives a specific formula: A(s,t) = SR(s) times C(s) times T(s,t) squared. SEO discussions of topical authority have mathematical backing here, not just intuition. Strategy that names the three factors and works each one directly is grounded in the mechanism.
- The Squared Term Punishes Shallow Coverage Severely — A site with 5% of its pages on a topic scores one one-hundredth of the topic-share factor of a site with 50% coverage. Depth matters far more than breadth in the score, which means publishing many on-topic pages on a focused site beats publishing the same pages on a sprawling generalist site.
- All Three Factors Must Be Present — A(s,t) is a product, not a sum. Missing global authority tanks the score even with deep coverage and strong clicks. Missing engagement tanks the score even with deep coverage and strong links. Strategy must address SR(s), C(s), and T(s,t) together rather than maxing one and ignoring the others.
- New Sites Can Earn Authority Through Topic Share — Even with low SR(s), a new site can drive A(s,t) upward by concentrating its content tightly enough that T(s,t) becomes meaningfully large. Because the term is squared, doubling topic share quadruples the topic-share factor, so a focused launch on a narrow topic gains ground quickly. The product still stays small until SR(s) and C(s) grow, and SR catches up as earned links accumulate.
- Specialists Beat Generalists Per Topic — By design, the squared T(s,t) term lets topic-specialist sites outrank generalists on per-topic queries even when the generalist has stronger global authority. A high-quality topic blog can outrank a national news brand on the topic the blog specializes in.
- Auto-Discovered Topics Demand Coherence — Key-phrase extraction means the system reads the site's content and infers the topics by itself. Off-topic content dilutes T(s,t) for the topics the site is trying to rank for. Coherent editorial scope produces a clean topic inventory and a sharper topic-share factor.
- Microsoft Wrote It Down. The Insight Is Industry-Wide — The patent is Microsoft's, filed in 2019, with an inventor list that includes Arnold Overwijk. Whether Google uses this exact formula is unknown, but the structural insight, that site topical authority is global authority times engagement times topic-share squared, gives the topical-authority playbook a concrete reference point across search engines.
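The dilution point in the list above can be made concrete with a small calculation. The page counts are invented for illustration, and the helper `topic_share_factor` is a hypothetical name; it computes only the squared topic-share term, holding SR(s) and C(s) fixed.

```python
def topic_share_factor(pages_on_topic: int, total_pages: int) -> float:
    """T(s,t)^2 for a site with the given page counts."""
    if total_pages <= 0 or not 0 <= pages_on_topic <= total_pages:
        raise ValueError("need 0 <= pages_on_topic <= total_pages and total_pages > 0")
    return (pages_on_topic / total_pages) ** 2


# A focused site: 40 of 50 pages cover the target topic (T = 0.80).
before = topic_share_factor(40, 50)

# The same site after publishing 150 off-topic pages (T = 40/200 = 0.20).
after = topic_share_factor(40, 200)

print(f"factor before: {before:.2f}")
print(f"factor after:  {after:.2f}")
print(f"drop:          {before / after:.0f}x")
```

In this example the on-topic page count never changes, yet the topic-share factor falls by roughly a factor of sixteen, which is why editorial scope matters as much as on-topic volume under this formula.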