The Orion algorithm mines candidate refinements from the document corpus itself, ranking phrase patterns that appear in pages satisfying the query. It is the mechanical ancestor of Searches Related To, People Also Ask, and the modern refinement surfaces that turn one query into a chain of long-tail entry points.
Patent Overview
- Inventors
- Ori Allon, Ugo Di Girolamo, Tomer Shmiel, Alexandre Petcherski, Tzvika Hartman
- Assignee
- Google LLC
- Filed
- August 29, 2008
- Granted
- March 5, 2013
The Challenge
Query refinement built only on past search logs cannot help when the parent query has thin or no log history, and it cannot surface refinements that are topically valid but never co-searched before. The challenge is to produce ranked, high-quality refinements from the corpus itself, so that the system suggests sub-topics that actually exist in the literature of the topic rather than only what other users happened to type.
- Log-Driven Refinement Misses New Topics — Per query, behavioral refinements need accumulated co-search evidence. Fresh or low-volume queries get no useful suggestions.
- Stop-Word And Vague Phrasing Limits Suggestions — Per refinement, raw n-grams pulled from logs include junk variants. Without corpus grounding, the ranked list is noisy.
- Sub-Topic Structure Is Invisible To Logs — Per topic, the natural sub-topic hierarchy lives inside documents. A log-only view cannot see how experts actually decompose the topic.
- Refinement Quality Drives Downstream Surfaces — Per SERP, suggestion blocks and related-search modules amplify or waste user attention. Weak refinements degrade every surface they feed.
- Long-Tail Discovery Stalls Without Good Refinements — Per session, users iterate through refinements to reach the answer. If the first refinement set is poor, the chain breaks.
Innovation
How The System Works
The system runs the parent query, takes the documents that satisfy it, extracts candidate refinement phrases from those documents, scores each candidate against corpus and topical signals, and returns a ranked list of refinements that represent real sub-topics of the parent query.
- Issue The Parent Query — Per query, the parent query is executed against the index and a candidate set of relevant documents is retrieved.
- Extract Phrase Candidates From Documents — Per document in the candidate set, phrase patterns near the query terms are extracted as refinement candidates.
- Filter Candidates By Linguistic Quality — Per candidate, stop-word, length, and grammaticality filters drop low-quality fragments before scoring.
- Score Candidates Against The Corpus — Per candidate, frequency in the relevant set is compared to frequency in the broader corpus to detect topical specificity.
- Rank Refinements By Topical Strength — Per query, candidates are ordered so the strongest sub-topics rise to the top of the refinement list.
- Attach Refinements To Result Presentation — Per result page, top refinements feed Searches Related To, expanded snippets, and downstream PAA-style surfaces.
- Recycle Refinements As Follow-On Queries — Per refinement click, the chosen refinement becomes a new query and the cycle produces the next layer of sub-topic exploration.
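The candidate-extraction step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented method. The lowercase word tokenizer, the `window` of tokens around each query-term hit, the `max_n` n-gram length, and the name `extract_candidates` are all hypothetical. The function records which documents contain each candidate phrase, because later quality checks need per-document support rather than raw counts.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer: lowercase alphanumeric runs only.
    return re.findall(r"[a-z0-9]+", text.lower())


def extract_candidates(docs: dict[str, str], query: str,
                       window: int = 4, max_n: int = 3) -> dict[str, set[str]]:
    """Map each candidate phrase to the ids of the documents it appears in.

    A candidate is any n-gram (1 <= n <= max_n) that starts within `window`
    tokens of a query-term occurrence and contains no query term itself.
    """
    query_terms = set(tokenize(query))
    support: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        tokens = tokenize(text)
        hits = [i for i, tok in enumerate(tokens) if tok in query_terms]
        for i in hits:
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for start in range(lo, hi):
                for n in range(1, max_n + 1):
                    gram = tokens[start:start + n]
                    # Skip n-grams cut short by the end of the document or containing the query.
                    if len(gram) < n or any(t in query_terms for t in gram):
                        continue
                    support[" ".join(gram)].add(doc_id)
    return dict(support)
```

For example, with `docs = {"d1": "Jaguar habitat loss in the Amazon", "d2": "jaguar diet and jaguar habitat range"}`, calling `extract_candidates(docs, "jaguar", window=2, max_n=2)` maps `"habitat"` to `{"d1", "d2"}` and `"habitat loss"` to `{"d1"}`, while any phrase containing "jaguar" is skipped. Broken fragments such as `"loss in"` still survive at this stage; removing them is the job of the linguistic filters.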
Refinements Come From What Documents Say, Not From What Users Typed
The load-bearing idea of Orion is that the corpus already encodes the structure of a topic. Documents that satisfy a query carry the vocabulary of that topic's sub-areas. Mining and ranking those phrases produces refinements that are topically valid even when no behavioral evidence exists.
Corpus-Mined Refinement
Per query, refinements are derived from the language of the documents that answer the query. Per refinement, the score reflects how strongly the phrase represents a real sub-topic, not how often other users typed it.
- Document-Side Extraction — Per candidate doc, phrases near query terms become refinement candidates.
- Topical Specificity Scoring — Per candidate, in-set frequency versus corpus frequency drives the score.
- Ranked Refinement Output — Per query, the top refinements feed every downstream suggestion surface.
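The article describes the scoring step only as comparing in-set frequency with broader corpus frequency. One common way to express that comparison is a smoothed log-lift, sketched below as an assumption rather than the patent's formula; the add-one smoothing and the names `specificity` and `rank_by_specificity` are illustrative.

```python
import math


def specificity(df_set: int, n_set: int, df_corpus: int, n_corpus: int) -> float:
    """Smoothed log-lift of a phrase in the relevant set versus the background corpus.

    Assumed formula: log(P(phrase | relevant set) / P(phrase | corpus)), where each
    probability is a document frequency with add-one smoothing so that a phrase
    unseen in the corpus does not cause a division by zero.
    """
    p_set = (df_set + 1) / (n_set + 2)
    p_corpus = (df_corpus + 1) / (n_corpus + 2)
    return math.log(p_set / p_corpus)


def rank_by_specificity(set_df: dict[str, int], n_set: int,
                        corpus_df: dict[str, int], n_corpus: int) -> list[tuple[str, float]]:
    """Score every candidate found in the relevant set and sort best-first."""
    scored = [(phrase, specificity(df, n_set, corpus_df.get(phrase, 0), n_corpus))
              for phrase, df in set_df.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

With a 100-document relevant set and a 1,000,000-document background corpus, `"habitat"` (40 in-set documents, 2,000 corpus documents) scores far above `"the"` (98 in-set, 990,000 corpus), whose score drops below zero because it is no more common in the relevant set than everywhere else. Document frequency is used instead of raw term counts so that a single page repeating a phrase cannot inflate its score.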
Technical Foundation
The patent specifies candidate generation from the relevant document set, linguistic filtering, statistical scoring against the broader corpus, ranking, and presentation as refinement suggestions.
- Relevant Set Construction — Per query, the parent query retrieves a candidate document set used as the substrate for refinement mining.
- Phrase Candidate Extraction — Per document, contiguous phrase patterns near the query terms are pulled as candidate refinements.
- Linguistic Filtering — Per candidate, filters drop stop-word fragments, ungrammatical strings, and overlong or trivially short phrases.
- Topical Specificity Scoring — Per candidate, frequency inside the relevant set is compared to frequency in the broader corpus to compute a topicality score.
- Diversity And Deduplication — Per refinement list, near-duplicate candidates are merged and remaining picks are diversified across sub-topics.
- Presentation Hooks — Per result page, the ranked refinements feed Searches Related To, extended snippets, and follow-on suggestion modules.
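The linguistic filters can be approximated with cheap surface heuristics. The sketch below assumes a small hand-picked `STOP_WORDS` set and hypothetical `min_chars` and `max_tokens` bounds. Real grammaticality checking would need a parser or a language model, so rejecting phrases that begin or end on a function word is only a stand-in for it.

```python
# Hypothetical, deliberately tiny stop-word list for illustration.
STOP_WORDS = frozenset({"a", "an", "and", "the", "of", "in", "on", "for", "to", "with", "or", "is"})


def passes_linguistic_filters(phrase: str, min_chars: int = 3, max_tokens: int = 4) -> bool:
    """Return True if a candidate looks like a viable refinement string."""
    tokens = phrase.lower().split()
    # Length checks: drop empty, overlong, and trivially short phrases.
    if not tokens or len(tokens) > max_tokens:
        return False
    if len(phrase) < min_chars:
        return False
    # Fragments that begin or end on a function word ("loss in", "the diet") read as broken.
    if tokens[0] in STOP_WORDS or tokens[-1] in STOP_WORDS:
        return False
    # Bare numbers are rarely useful sub-topics on their own.
    if all(t.isdigit() for t in tokens):
        return False
    return True
```

Given `["habitat", "loss in", "the diet", "habitat loss", "2008", "a"]`, only `"habitat"` and `"habitat loss"` pass.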
The Process
From a parent query, the system retrieves the relevant document set, mines phrase candidates, filters and scores them, ranks the survivors, and emits refinements that drive every downstream suggestion surface.
- Receive Parent Query — Per query, the parent query string arrives at the refinement pipeline.
- Retrieve Relevant Document Set — Per query, the top documents that satisfy the parent query form the mining substrate.
- Extract Phrase Candidates — Per document, phrases adjacent to or near the query terms are collected as candidates.
- Apply Linguistic Filters — Per candidate, structural filters drop fragments that are not viable refinement strings.
- Score Topical Specificity — Per candidate, in-set frequency versus corpus frequency produces a specificity score.
- Rank And Diversify — Per query, candidates are ordered by score and diversified across sub-topics.
- Surface To Downstream Modules — Per result page, the ranked refinements feed Searches Related To, PAA-style modules, and extended snippets.
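The rank-and-diversify step can be sketched as a greedy pass: walk the candidates best-first and skip any that overlaps too heavily with one already chosen. Token-set Jaccard similarity and the `max_overlap` threshold are assumptions chosen for brevity. They catch reorderings such as "loss of habitat" versus "habitat loss" but not true synonyms, which would need a semantic similarity measure.

```python
def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity; two empty phrases count as identical."""
    sa, sb = set(a.split()), set(b.split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def rank_and_diversify(scored: list[tuple[str, float]], k: int = 5,
                       max_overlap: float = 0.5) -> list[str]:
    """Greedy selection: best-first, skipping near-duplicates of earlier picks."""
    picked: list[str] = []
    for phrase, _score in sorted(scored, key=lambda pair: pair[1], reverse=True):
        if all(jaccard(phrase, kept) < max_overlap for kept in picked):
            picked.append(phrase)
        if len(picked) == k:
            break
    return picked
```

With scores of 5.3 for "habitat loss", 5.0 for "loss of habitat", 4.0 for "habitat destruction", 3.2 for "conservation status", and 2.1 for "diet", `rank_and_diversify(scored, k=3)` returns `["habitat loss", "habitat destruction", "conservation status"]`. "loss of habitat" shares two of its three distinct tokens with the top pick and is dropped as a restatement, while "habitat destruction" survives, which shows the limit of a purely token-based measure.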
Quality Control
Corpus-mined refinement risks promoting noisy phrases, single-source artifacts, or spammy boilerplate. The patent specifies safeguards that keep the refinement list trustworthy.
- Minimum Document Support — Per candidate, a refinement must appear across enough distinct documents in the relevant set to count as a real sub-topic.
- Corpus-Comparison Floor — Per candidate, the in-set frequency must rise meaningfully above background corpus frequency before the phrase qualifies.
- Boilerplate And Template Filtering — Per candidate, phrases recognized as site templates, navigation chrome, or boilerplate are excluded from the refinement pool.
- Diversity Across Sub-Topics — Per refinement list, near-duplicate phrases are merged so the user sees distinct sub-topic suggestions instead of one phrase restated.
- Linguistic Sanity Filters — Per candidate, grammaticality and length checks drop fragments that would look broken inside a suggestion surface.
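The document-support, corpus-floor, and boilerplate safeguards could be combined into a single gate, sketched below. All thresholds are hypothetical, and `lift` stands for whatever in-set versus corpus ratio the scoring step produced. The boilerplate check is a proxy assumed here, not the patent's method: it rejects a phrase when most of its supporting documents come from one host, on the theory that such text is more likely site template or navigation chrome than a real sub-topic.

```python
from collections import Counter
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class Candidate:
    phrase: str
    doc_urls: set[str]  # distinct relevant-set documents containing the phrase
    lift: float         # in-set frequency relative to background corpus frequency


def passes_quality_gates(c: Candidate, min_docs: int = 3, min_lift: float = 2.0,
                         max_host_share: float = 0.6) -> bool:
    # Minimum document support: one or two pages are not evidence of a sub-topic.
    if len(c.doc_urls) < min_docs:
        return False
    # Corpus-comparison floor: the phrase must be notably more common in the relevant set.
    if c.lift < min_lift:
        return False
    # Boilerplate proxy: reject phrases whose support is dominated by a single host.
    hosts = Counter(urlparse(u).netloc for u in c.doc_urls)
    top_host_share = hosts.most_common(1)[0][1] / len(c.doc_urls)
    return top_host_share <= max_host_share
```

A phrase found on three pages from three different hosts with a lift of 8.0 passes. A phrase like "sign in to comment" found on four pages, three of them on the same host, is rejected as probable template text even though its lift clears the floor.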
Real-World Application
Orion is the mechanical ancestor of every modern Google refinement surface. Searches Related To, People Also Ask, expanded snippets, and contextual SERP previews all rest on the same idea: mine the corpus for the sub-topic structure of a query, rank the phrases, and present them as next-step entry points.
- Corpus-Driven Refinement Source — Refinements are mined from documents that answer the query.
- Log-Independent Coverage Mode — Refinements work even for queries with no behavioral history.
- Cascading Discovery Pattern — Each refinement becomes a new query, multiplying long-tail entry points.
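The cascading discovery pattern amounts to a breadth-first walk in which each surfaced refinement is issued again as a query. In the sketch below, `refine` is a hypothetical stand-in for the entire mining pipeline, `max_depth` and `per_level` are illustrative limits, and the toy refinement table in the example is invented data.

```python
from collections import deque
from typing import Callable


def refinement_chain(root: str, refine: Callable[[str], list[str]],
                     max_depth: int = 2, per_level: int = 3) -> dict[str, int]:
    """Breadth-first expansion: every surfaced refinement is re-issued as a query.

    Returns each query reached, mapped to its depth (0 is the parent query).
    """
    depth = {root: 0}
    queue = deque([root])
    while queue:
        query = queue.popleft()
        if depth[query] == max_depth:
            continue  # stop expanding once the depth limit is reached
        for child in refine(query)[:per_level]:
            if child not in depth:  # do not revisit a query reached by another path
                depth[child] = depth[query] + 1
                queue.append(child)
    return depth
```

With a toy table in which "jaguar" refines to "jaguar habitat" and "jaguar diet", "jaguar habitat" refines to "jaguar habitat loss" and "jaguar habitat range", and "jaguar diet" refines to "jaguar diet prey", `refinement_chain("jaguar", lambda q: TOY.get(q, []), max_depth=2)` reaches six distinct queries from a single parent query.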
Why Content Vocabulary Shapes Refinement Surfaces
Per refinement, the phrase that surfaces in Searches Related To or PAA was extracted from documents that satisfied the parent query. The vocabulary used in those documents directly shapes the suggestion module. Pages that name sub-topics clearly are the pages that feed the suggestion engine.
Why Sub-Topic Coverage Compounds
Per topic, covering the parent query and its natural sub-topics teaches the system that those sub-topics are valid refinements. The page becomes part of the substrate from which Google mines the SERP's own follow-on suggestions, multiplying the entry points back to the same site.
What This Means for SEO
Orion connects the words inside a page to the refinement surfaces that drive Google's long-tail discovery loop. SEO strategy should treat the language of sub-topic coverage as a direct input into Searches Related To, People Also Ask, and every downstream suggestion module.
- Content Vocabulary Drives Refinement Suggestions — Refinements are mined from the documents that answer the parent query. The exact phrasing in headings, subheadings, and body copy on pages that satisfy the query teaches Google which refinements to surface. Treat sub-topic phrasing as part of the suggestion-engine input layer.
- Topic Breadth Becomes Discoverable — Pages that genuinely cover a topic and its natural sub-topics feed the refinement pipeline with the vocabulary of those sub-topics. The system learns that the sub-topic is a valid refinement of the parent query, which expands the surfaces through which the page can be reached.
- Searches Related To And People Also Ask Are Downstream — These surfaces are not separate features bolted onto search. They consume the ranked refinements produced by the corpus-mining pipeline. Content patterns that match real informational structure feed these modules. Keyword-stuffed content that lacks coherent sub-topic phrasing does not.
- Low-Volume Queries Still Get Refinements — Because the mechanism is corpus-driven rather than log-driven, even queries with thin search history can produce useful refinement surfaces if the underlying document set is rich. Well-structured content on emerging topics can drive refinement surface area before user logs catch up.
- Long-Tail Entry Points Multiply Through Refinement Chains — Each surfaced refinement is a new query through which users can reach a site. A page that contributes the vocabulary of multiple refinements compounds its entry-point count, because every refinement click becomes another opportunity for the same content to satisfy the next step.
- Topical Authority Compounds Mechanically — Covering a topic plus its sub-topics is rewarded directly by this pipeline. Specialist depth on a topic teaches the refinement engine the full sub-topic structure, which surfaces the site repeatedly across refinement modules. Shallow generalists do not feed the substrate in the same way.
- Orion's Descendants Still Shape Modern SERPs — The patent was filed in 2008 and granted in 2013, and its mechanism shaped Google's suggestion stack for over a decade. PAA, Knowledge Graph follow-on cards, and modern related-search modules all inherit the corpus-mining lineage. Writing for clear sub-topic structure is writing for the surfaces these descendants drive today.