Identifies candidate answer passages within retrieved documents by detecting language patterns that signal the presence of an answer, isolating those spans for downstream scoring and direct surfacing as featured snippets or generative-answer grounding.
Patent Overview
- Inventor: Srinivasan Venkatachary
- Assignee: Google LLC
- Filed: 2015-09-29
- Granted: 2019-01-15
- Application Number: US 14/870,121
The Challenge
When a document contains an answer, the answer usually lives in one paragraph or sentence, not the whole document. Surfacing the right span as a direct answer requires identifying it. Treating the entire document as the answer is too coarse; word-level retrieval is too fine.
- Document-Level Granularity Is Too Coarse — A 5000-word article contains the answer in one paragraph. Pointing users at the whole article forces unnecessary scanning when a single span would suffice.
- Word-Level Granularity Loses Context — Extracting individual words or short phrases strips context needed for the answer to be meaningful. Passages are the right unit.
- Answer Language Has Recognizable Patterns — Definitional sentences, factoid statements, and list openers all have characteristic linguistic patterns. A detector can use these patterns to isolate candidate passages.
- Passages Must Be Self-Contained — A passage with unresolved pronouns or implicit references fails as a standalone answer. The candidate must be readable without the surrounding document.
- Candidate Set Must Be Bounded — Too many candidates per document overwhelm downstream scoring. The detector must produce a small, high-precision candidate set.
Innovation
How The System Works
The system scans retrieved documents for language patterns indicating answer presence (definitional sentences, factoid claims, list openers), extracts candidate passages with appropriate context windows, filters for self-containment and quality, and produces a bounded candidate set for downstream scoring.
- Retrieve Candidate Documents — Standard retrieval surfaces top documents likely to contain the answer. Documents enter the candidate-passage pipeline.
- Scan For Answer-Pattern Sentences — Per document, scan sentences for language patterns indicating answers: definitional structures, factoid statements, numeric facts, list items, named-entity assertions.
- Extract Passage With Context Window — Each pattern-matched sentence becomes the center of a candidate passage. Context windows (preceding and following sentences) make the passage self-contained.
- Filter Self-Containment — Passages with unresolved pronouns or implicit references that depend on more context are filtered. Surviving candidates read cleanly as standalone.
- Filter Quality And Length — Passages outside acceptable length bounds or with quality issues (broken text, code blocks, ad fragments) are dropped.
- Bound Candidate Set — Per document, only top candidates by pattern strength enter the final set. The set is small enough for downstream scoring to evaluate carefully.
- Output For Scoring — Candidates feed the answer-scoring stage (separate patents in the family). Each candidate carries metadata: source, span, surrounding context, pattern type.
Passage As Answer Granularity
The patent's load-bearing decision is to make the passage the unit of granularity for direct answers. Documents are too coarse and individual words are too fine; passages with appropriate context sit in between.
Right-Sized Atoms For Direct Answers
Featured snippets and generative-answer grounding need extractable, self-contained spans. Passages with bounded context windows are the right atoms for both.
- Pattern-Based Detection — Answer-bearing sentences have recognizable patterns. The detector uses these patterns to isolate candidates without exhaustively scoring every sentence.
- Context Window Sizing — Each candidate gets a context window that makes it self-contained. Pronouns and implicit references resolve within the window.
- Bounded Candidate Set — Per document, the candidate set is small. Downstream scoring can evaluate candidates carefully without exhausting compute.
Technical Foundation
The patent specifies the pattern detection model, the passage extraction logic, the self-containment filter, the quality filter, and the candidate-set bounding.
- Pattern Detection Model — Learned classifier identifies sentences likely to be answers based on syntactic, semantic, and structural features. Trained on labeled examples of answer-bearing sentences.
- Passage Extractor — Per pattern-matched sentence, extracts the surrounding passage with calibrated context window. Window size depends on pattern type and surrounding text structure.
- Self-Containment Filter — Detects unresolved pronouns, implicit references, or context-dependent claims that would make the passage incomprehensible standalone. Filters those out.
- Quality Filter — Excludes passages with quality issues: broken text, code blocks, ad fragments, navigation chrome, table captions without context.
- Pattern-Strength Scorer — Per candidate, scores the strength of the answer-pattern match. Top-scoring candidates enter the bounded final set.
- Candidate Set Bounder — Caps the per-document candidate count. The bound balances coverage (enough candidates for scoring to work with) and compute (not too many to evaluate).
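The scorer and bounder at the end of this list amount to a top-k selection per document. The sketch below is one plausible reading, assuming each candidate already carries a numeric `pattern_strength`. The `ScoredCandidate` type, the `min_strength` floor, and the default cap of three are illustrative assumptions, not values from the patent.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredCandidate:
    source: str
    span: tuple[int, int]    # character offsets within the document (assumed)
    pattern_type: str
    pattern_strength: float  # output of a hypothetical pattern-strength scorer


def bound_candidates(
    candidates: list[ScoredCandidate],
    max_per_document: int = 3,
    min_strength: float = 0.5,
) -> list[ScoredCandidate]:
    # Drop weak matches first, then keep the strongest few, best first.
    eligible = [c for c in candidates if c.pattern_strength >= min_strength]
    return heapq.nlargest(max_per_document, eligible, key=lambda c: c.pattern_strength)


pool = [
    ScoredCandidate("doc-1", (0, 80), "definition", 0.90),
    ScoredCandidate("doc-1", (120, 210), "list_opener", 0.40),
    ScoredCandidate("doc-1", (300, 390), "numeric_fact", 0.70),
    ScoredCandidate("doc-1", (450, 520), "definition", 0.95),
    ScoredCandidate("doc-1", (600, 690), "numeric_fact", 0.60),
]
bounded = bound_candidates(pool)
```

Raising `max_per_document` trades compute for coverage. The strength floor keeps a document with only weak matches from filling its quota anyway.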
The Process
The pipeline runs as a downstream stage after document retrieval. Per retrieved document, it produces a small set of candidate passages that downstream scoring evaluates for surfacing.
- Retrieve Documents — Upstream retrieval surfaces top candidate documents. They feed the passage extraction pipeline.
- Sentence Segmentation — Each document is segmented into sentences. Sentences are the unit for pattern detection.
- Run Pattern Detector — The detector classifies each sentence on answer-pattern likelihood. High-likelihood sentences become passage anchors.
- Extract Passages — Around each anchor, extract the passage with appropriate context window. Output is raw candidate passages.
- Filter Self-Containment And Quality — Filter out non-self-contained passages and quality-issue passages. Survivors enter the bounded set.
- Score Pattern Strength — Each surviving candidate gets a pattern-strength score. Sort by score.
- Output Top Candidates — Top candidates per document feed downstream scoring. The set is bounded for compute efficiency.
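The ordering of these stages can be shown as a small orchestration function with each stage passed in as a callable. Everything in it is a sketch under assumptions: `run_passage_pipeline` and the `toy_*` stages are hypothetical names, the anchor likelihood is reused as the pattern-strength score for brevity, and the window here takes only following sentences.

```python
from typing import Callable


def run_passage_pipeline(
    text: str,
    segment: Callable[[str], list[str]],
    anchor_likelihood: Callable[[str], float],
    keep_passage: Callable[[str], bool],
    anchor_threshold: float = 0.5,
    window: int = 1,
    max_candidates: int = 2,
) -> list[tuple[float, str]]:
    # 1. Sentence segmentation.
    sentences = segment(text)
    # 2. Pattern detection: high-likelihood sentences become anchors.
    scored = [(anchor_likelihood(s), i) for i, s in enumerate(sentences)]
    anchors = [(score, i) for score, i in scored if score >= anchor_threshold]
    # 3. Passage extraction: the anchor plus `window` following sentences.
    passages = [(score, " ".join(sentences[i: i + window + 1])) for score, i in anchors]
    # 4. Self-containment and quality filtering.
    survivors = [(score, p) for score, p in passages if keep_passage(p)]
    # 5. Sort by score (anchor likelihood reused as pattern strength) and cap.
    survivors.sort(key=lambda pair: pair[0], reverse=True)
    return survivors[:max_candidates]


# Toy stages, for illustration only.
def toy_segment(text: str) -> list[str]:
    return [part.strip() + "." for part in text.split(".") if part.strip()]


def toy_likelihood(sentence: str) -> float:
    return 0.9 if " is " in sentence else 0.1


def toy_keep(passage: str) -> bool:
    return not passage.lower().startswith(("it ", "this ", "they "))


result = run_passage_pipeline(
    "It is widely used. Photosynthesis is how plants make sugar from light. "
    "Plants need water. Many gardens have them.",
    toy_segment,
    toy_likelihood,
    toy_keep,
)
```

Two sentences clear the anchor threshold in this example, but the passage that opens with "It" fails the toy self-containment check, so only the photosynthesis passage survives.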
Quality Control
Bad passage selection degrades direct-answer quality. The patent specifies safeguards across the pipeline.
- Pattern Detector Calibration — Detector precision is tuned conservatively. False positives produce wrong candidates; false negatives miss real answers. Calibration balances both.
- Self-Containment Strictness — Passages that depend on context outside the window are filtered strictly. A passage that leaves references unresolved is not a good answer.
- Quality Audit — Periodic audits sample candidates and verify quality. Patterns of bad candidates feed back into filter refinement.
- Context Window Tuning — Window size is tuned per pattern type, because some patterns need only a sentence while others need a paragraph.
- Bounded Set Size — Candidate count per document is capped. Too many candidates dilute downstream scoring; too few miss real answers. The cap is tuned empirically.
Real-World Application
Candidate answer passage detection underpins featured snippets, People Also Ask answer extraction, and grounding-passage retrieval for Search Generative Experience. The patent's primitives shape how Google identifies extractable answers across surfaces.
- Passage Answer Granularity — Passages with context windows are the unit. Coarser than sentences, finer than documents.
- Pattern-based Detection Method — Language patterns indicate answers. Detection uses patterns rather than exhaustive scoring.
- Bounded Candidate Count — Per-document candidate count is bounded. Downstream scoring evaluates a small set carefully.
Why Definition-Style Sentences Win Featured Snippets
Sentences in clear definitional form (X is Y, X means Y) are the easiest for the detector to identify. Pages structured with definitional openers earn featured-snippet visibility more reliably than pages that bury the answer in prose.
Why Self-Contained Paragraphs Help SGE Citations
Generative-answer grounding draws from extractable passages. Paragraphs that read cleanly standalone (no pronouns referencing earlier content, no implicit assumptions) get pulled as grounding sources more often than paragraphs requiring document context.
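A rough way to check a paragraph for this property is to look for dangling openers and explicit back-references. The heuristic below is a crude sketch. It is not how the patent or any search system is known to test self-containment, and real coreference resolution would be far more involved. The word lists and the name `is_self_contained` are assumptions chosen for illustration.

```python
import re

# Assumed signals of dependence on earlier text: a pronoun or demonstrative
# opening the passage, or an explicit pointer back to earlier content.
DANGLING_OPENERS = re.compile(
    r"^(it|this|that|these|those|they|he|she|such|the former|the latter)\b",
    re.IGNORECASE,
)
BACK_REFERENCES = re.compile(
    r"\b(as (mentioned|noted|described|discussed) (above|earlier|previously)"
    r"|see above|the aforementioned)\b",
    re.IGNORECASE,
)


def is_self_contained(passage: str) -> bool:
    text = passage.strip()
    if not text:
        return False
    if DANGLING_OPENERS.match(text):
        return False
    if BACK_REFERENCES.search(text):
        return False
    return True


checks = {
    "A fjord is a long, narrow inlet carved by a glacier.": is_self_contained(
        "A fjord is a long, narrow inlet carved by a glacier."
    ),
    "It is usually carved by a glacier.": is_self_contained(
        "It is usually carved by a glacier."
    ),
}
```

The check is deliberately conservative: it rejects any paragraph that opens with "This", including ones like "This guide explains..." that may read fine on their own. A writer could use it as a lint pass, not as a verdict.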
What This Means for SEO
The patent makes passages, not documents or words, the unit for direct answers by detecting answer-signaling language patterns and extracting self-contained spans. SEO implication: pages that present answers in clear, extractable passages win featured snippets and generative-answer grounding more reliably than pages that bury answers in prose.
- Definition-Style Sentences Win Snippets — Clear definitional forms (X is Y, X means Y) are easiest for the detector to identify. Open the relevant section with a direct definitional sentence so the system can isolate it as a candidate answer passage.
- Write Self-Contained Paragraphs — The extractor filters for self-containment. Paragraphs that read cleanly standalone, with no pronouns referencing earlier content and no implicit assumptions, get pulled as answer candidates and generative-answer grounding far more often than context-dependent prose.
- Answers Live In Spans, Not Whole Pages — The system isolates the paragraph or sentence carrying the answer, not the whole document. Place the actual answer in a discrete, identifiable span rather than diffusing it across the page, so the right atom is extractable.
- Context Windows Need Clean Boundaries — Passages are extracted with bounded context windows. Content organized so each answer sits in a coherent, well-bounded paragraph (rather than spanning multiple loosely-connected sentences) extracts more cleanly.
- List Openers Signal Answers — List openers are among the patterns the detector recognizes. Structuring how-to and enumeration answers with clear list-introducing sentences and ordered items improves candidate detection for list-style snippets.
- Quality Filtering Gates Candidates — Candidates are filtered for quality before scoring. Thin, vague, or poorly-written spans are discarded. Investing in clear, complete, well-formed answer passages keeps you in the candidate set rather than filtered out.
- Structure For Both Snippets And SGE — The same extractable passages serve featured snippets and generative-answer grounding. Writing extractable, self-contained spans is dual-purpose work that positions you for both classic snippet and AI-overview citation visibility.