Scores passages from indexed documents as answer candidates for question queries, combining query-term and answer-term similarity signals to pick the passage that wins the featured-snippet slot.
Patent Overview
- Inventor: Steven D. Baker
- Assignee: Google LLC
- Filed: 2014-06-04
- Granted: 2018-04-10
- Application Number: US 14/295,857
The Challenge
Picking The Right Passage From A Long Document
When the engine identifies a document that is responsive to a question query, it still has to choose which passage from that document is the answer. A long article may contain the right information in one paragraph and irrelevant content everywhere else. The passage, not the document, is the unit of work for featured snippets and direct answers.
- Document-Level Match Is Not Enough — A document that ranks well for the query may contain the answer in one paragraph among many. Returning the wrong paragraph as the answer is worse than returning no answer because it misleads the user.
- Query Terms Alone Mislead — Scoring passages purely on query-term match favors passages that repeat the question, not passages that contain the answer. Restated questions are not answers.
- Need Both Query And Answer Signals — A real answer passage matches the query terms and also matches the kinds of answer terms expected for that question type. Both signals together identify the answer; either alone is insufficient.
- Question Type Drives Answer Shape — Different question types expect different answer shapes. "How tall" expects a numeric height with units. "Who is" expects a person name with context. "When did" expects a date. The scoring must respect this type structure.
- Passages Compete Across Multiple Documents — The answer can come from any of many responsive documents. The scoring must rank passages globally across the document set, not just within one document.
Innovation
Score Each Candidate Passage Twice
For each candidate answer passage, the system computes two similarity scores: how well it matches the query terms, and how well it matches the answer terms expected for the question. Combining the two yields a query-dependent score that ranks the passages globally. The combination is the load-bearing piece because either score alone is insufficient.
- Identify Question Query — The query has been classified upstream as a question query seeking an answer response. Without this classification, the answer pipeline does not engage.
- Receive Responsive Resources — A set of documents determined to be responsive to the query is provided as input. The documents have already cleared standard retrieval relevance gates.
- Extract Candidate Passages — From each responsive resource, extract candidate answer passages (typically paragraphs, sentences, or structured blocks). Each candidate is a self-contained piece of text.
- Compute Query-Term Match Score — For each candidate, measure how similar the candidate is to the query terms. This rewards on-topic passages and acts as the necessary condition for being an answer.
- Compute Answer-Term Match Score — For each candidate, measure how well it matches the expected answer-term shape for this question type (numbers for "how tall", names for "who is", dates for "when did").
- Combine Into Query-Dependent Score — Combine the two scores into a single ranking score. The top-scoring passage is selected as the answer to surface in the featured-snippet slot or equivalent direct-answer surface.
- Surface The Winning Passage — The winning passage is rendered as the direct answer to the user, often with attribution back to its source document. The document itself may also appear in the standard search results.
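The question-classification and answer-term steps above can be sketched in a few lines. This is a minimal illustration, not the patented method: the `ANSWER_SHAPES` table, its regular expressions, and the function names `question_type` and `answer_shape_match` are all assumptions made for the example, and the "who is" pattern in particular is deliberately crude.

```python
import re
from typing import Optional

# Hypothetical table: question prefix -> regex for the expected answer shape.
# Both the prefixes and the patterns are illustrative assumptions.
ANSWER_SHAPES = {
    "how tall": re.compile(
        r"\b\d+(?:\.\d+)?\s*(?:meters?|metres?|m|feet|ft|cm|inches)\b", re.I
    ),
    "when did": re.compile(r"\b(?:1[0-9]{3}|20[0-9]{2})\b"),
    "who is": re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+\b"),
}


def question_type(query: str) -> Optional[str]:
    """Return the known question prefix the query starts with, else None."""
    q = query.lower().strip()
    for prefix in ANSWER_SHAPES:
        if q.startswith(prefix):
            return prefix
    return None


def answer_shape_match(passage: str, qtype: str) -> float:
    """Return 1.0 if the passage contains answer-shaped text for qtype, else 0.0."""
    return 1.0 if ANSWER_SHAPES[qtype].search(passage) else 0.0
```

With these assumptions, a query that starts with "how tall" is matched against the number-plus-unit pattern, so "The Eiffel Tower is 330 meters tall." matches while a passage that only restates the question does not. A query with no known prefix returns `None`, which mirrors the point that the answer pipeline does not engage without a question classification.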
Two Scores Are Better Than One
The whole point is that a query-term-only score and an answer-term-only score each fail in distinct ways. The query-term score loves passages that quote the question. The answer-term score loves passages that look like answers regardless of topic. Combining them filters out both failure modes simultaneously.
Topic AND Form
A real answer is both on-topic (query-term match) and answer-shaped (answer-term match). The conjunction is what selects answers; either condition alone admits too many failures.
- Query Term Match — Passages that contain or paraphrase the query terms are on-topic candidates. This is the necessary condition that ensures the passage addresses the user's question.
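- Answer Term Match — Passages whose terms match the expected answer shape for this question type (numbers, names, dates, definitions) are answer-shaped candidates. This is the second condition, the one that distinguishes answers from restatements; it is not sufficient on its own because an answer-shaped passage can still be off-topic.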
- Query-Dependent Combination — The combination of the two scores is itself query-dependent. Different question types weight the two component scores differently because the answer shape varies.
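One way to picture the "topic AND form" conjunction is a weighted geometric mean, which drops to zero whenever either component is zero. The source does not specify the combining function, so the formula, the `TYPE_WEIGHTS` table, and its numbers below are assumptions chosen only to illustrate type-dependent weighting.

```python
# Hypothetical per-type weights (w_query, w_answer); the values are assumptions.
TYPE_WEIGHTS = {
    "factual": (0.4, 0.6),
    "definitional": (0.6, 0.4),
}


def combine(qtm: float, atm: float, qtype: str) -> float:
    """Weighted geometric mean of two scores that each lie in [0, 1].

    A zero on either axis yields zero overall, so a passage must be both
    on-topic and answer-shaped to receive a positive score.
    """
    w_q, w_a = TYPE_WEIGHTS.get(qtype, (0.5, 0.5))
    return (qtm ** w_q) * (atm ** w_a)
```

Under this sketch, a passage that restates the question (high query-term match, zero answer-term match) and an off-topic passage that merely looks like an answer both score zero, while the weights shift how much each axis matters for a given question type.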
Technical Foundation
What The System Computes Per Passage
Each candidate answer passage is associated with two distinct similarity measurements relative to the query and its expected answer shape.
- Query Term Match Score — A measure of similarity between the query terms and the terms of the candidate passage. Higher values indicate the passage is more on-topic for the query.
- Answer Term Match Score — A measure of similarity between answer terms (derived from the question type) and the terms of the candidate passage. Higher values indicate the passage contains content shaped like an answer to this kind of question.
- Query-Dependent Score — A combined score over the two measurements, used to rank candidate passages and select the answer. The combination weights can vary by question type.
- Question Type Classification — The classification of the query (factual, definitional, comparative, etc.) that determines what answer terms to expect and how to combine the scores.
Quality Metrics
- Query Term Match — Standard term-based similarity measure. The passage must overlap meaningfully with the query terms to be on-topic.
qtm(P, Q) = sim(terms(P), terms(Q))
- Answer Term Match — Type-specific check. For "how tall" the expected terms include numeric tokens with length units. For "who is" the expected terms include person names.
atm(P, A) = sim(terms(P), expected(A))
- Query-Dependent Score — A function over the two component scores plus question type. The function may be a learned model or a hand-tuned combiner. Higher scores rank passages higher.
score(P) = f(qtm, atm, type)
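The three formulas can be made concrete with a small worked example. The source leaves `sim`, `expected`, and `f` unspecified, so this sketch assumes Jaccard similarity over token sets for `sim`, a hand-written `EXPECTED_TERMS` table for `expected`, and a plain product for `f`, with the question type entering through the choice of expected terms.

```python
import re

# Hypothetical expected answer terms per question type (an assumption).
EXPECTED_TERMS = {
    "how tall": {"meters", "metres", "feet", "cm"},
}


def terms(text: str) -> set[str]:
    """Lowercased alphanumeric tokens of the text, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def sim(a: set[str], b: set[str]) -> float:
    """Jaccard similarity: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def qtm(passage: str, query: str) -> float:
    """Query-term match: similarity of passage terms and query terms."""
    return sim(terms(passage), terms(query))


def atm(passage: str, qtype: str) -> float:
    """Answer-term match: similarity of passage terms and expected answer terms."""
    return sim(terms(passage), EXPECTED_TERMS[qtype])


def score(passage: str, query: str, qtype: str) -> float:
    """Assumed combiner f: the product of the two component scores."""
    return qtm(passage, query) * atm(passage, qtype)


query = "how tall is the eiffel tower"
answer = "The Eiffel Tower is 330 meters tall"
restatement = "How tall is the Eiffel Tower"
```

For the query above, the answer passage shares 5 of 8 distinct terms with the query (qtm = 0.625) and 1 of 10 with the expected terms (atm = 0.1). The restatement has qtm = 1.0 but atm = 0.0, so it scores zero despite the perfect query-term overlap.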
Key Insight: Passage scoring is the layer where featured snippets are actually decided. Standard retrieval gets a document into the response set; passage scoring picks the paragraph that becomes the answer. The two scoring axes (query terms and answer terms) are both necessary because each catches what the other misses.
The Process
Answer Selection Pipeline
End to end, the answer pipeline runs after standard retrieval has identified a set of responsive documents. The pipeline picks the winning passage from across those documents.
- Question Classification — Upstream classification identifies the query as a question and determines its type (factual, definitional, comparative).
- Retrieve Responsive Documents — Standard retrieval produces a set of documents that match the query. The set is the source pool for answer passages.
- Extract Candidate Passages — Each responsive document is segmented into candidate passages. Passages may be paragraphs, sentences, list items, or table rows depending on document structure.
- Score Each Passage — Compute both query-term-match and answer-term-match scores for every candidate. The two scores are computed independently and in parallel.
- Combine And Rank — Combine the two component scores per passage and rank globally across all candidates from all responsive documents.
- Select Winner — The top-scoring passage wins the answer slot and is rendered as the direct answer with source attribution.
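The extract, rank, and select steps above can be sketched as one global sort over passages drawn from every responsive document. Everything named here is hypothetical: `extract_passages` assumes blank-line paragraph boundaries, and `toy_score` is a stand-in for whatever combined query-dependent score the system actually computes.

```python
from typing import Callable, Dict, List, Tuple


def extract_passages(doc_text: str) -> List[str]:
    """Split a document into paragraph passages on blank lines (an assumption)."""
    return [p.strip() for p in doc_text.split("\n\n") if p.strip()]


def rank_passages(
    docs: Dict[str, str],
    score_fn: Callable[[str], float],
) -> List[Tuple[float, str, str]]:
    """Score every passage from every document and rank them in one list.

    Returns (score, source_id, passage) tuples, best first, so the winner
    keeps the attribution back to its source document.
    """
    scored = [
        (score_fn(passage), source_id, passage)
        for source_id, text in docs.items()
        for passage in extract_passages(text)
    ]
    return sorted(scored, key=lambda item: item[0], reverse=True)


def toy_score(passage: str) -> float:
    """Stand-in scorer: fraction of two hand-picked cues present in the passage."""
    return sum(cue in passage for cue in ("Eiffel", "330")) / 2


docs = {
    "page-a": "Visitors love Paris.\n\nThe Eiffel Tower is 330 meters tall.",
    "page-b": "The Eiffel Tower opened in 1889.\n\nTickets sell out fast.",
}
ranked = rank_passages(docs, toy_score)
winner_score, winner_source, winner_passage = ranked[0]
```

Because ranking is global, the second paragraph of `page-a` wins even though `page-b` opens with an on-topic sentence; passage position within a document plays no role in this sketch.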
What This Means for SEO
Featured snippets and direct answers are won at the passage level, not at the page level. The passage scoring rules tell you what to write and where to write it for any question-intent target.
- Lead With The Answer In Its Own Passage — If your page targets a question query, the first paragraph (or a clearly delineated answer block near the top) should contain both the query terms and a clean, answer-shaped expression of the answer. Burying the answer mid-article loses on both scoring axes.
- Match The Question Type's Answer Shape — For "how tall" questions, the answer passage should contain a numeric height with units. For "when did" questions, a date. For "who is" questions, a name plus context. Generic prose without the right shape loses on the answer-term score.
- Repeat The Question In The Passage — The query-term match score rewards passages that echo the query terms. A direct-answer passage that says "The Eiffel Tower is 330 meters tall" wins over one that says "It is 330 meters". Echoing the question terms is not redundant; it is signal.
- Use Structure To Isolate Answers — Bullet lists, definition tags, FAQ sections, and tables make answer passages easier to extract. A well-bounded passage with both query and answer terms beats the same content embedded in long prose because the boundaries help the extractor.
- One Passage Per Question Intent — If a page covers multiple related questions, each should have its own clearly bounded answer passage. Mixing answers to different questions in one paragraph dilutes the passage score for each.
- Definitional Passages Need Definition Shape — For "what is X" queries, the answer passage should start with "X is..." or use definition list markup. This is the canonical answer shape for the question type.
- Numeric And Date Answers Should Use Numerals — Write "330 meters" not "three hundred and thirty meters". The answer-term match favors numeric tokens with units because that is the expected shape for many factual question types.
- Source Attribution Is The Reward — Winning the answer slot includes attribution back to your page. Featured snippets can raise click-through rate (CTR), and that gain compounds with the brand exposure of being the named source.
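Two of the recommendations above, echoing the query terms and writing numerals with units, lend themselves to a quick self-check on a draft passage. The helper below is a rough editorial aid under stated assumptions, not a model of how any search engine scores content: the `STOPWORDS` set, the unit list in `NUMERAL_WITH_UNIT`, and the function name `audit_passage` are all invented for illustration.

```python
import re

# Hypothetical unit list and stopword set; both are assumptions.
NUMERAL_WITH_UNIT = re.compile(
    r"\b\d+(?:[.,]\d+)?\s*(?:meters?|metres?|feet|ft|km|kg|years?)\b", re.I
)
STOPWORDS = {"how", "what", "who", "when", "is", "the", "a", "an", "of", "did"}


def audit_passage(passage: str, query: str) -> dict:
    """Report query-term coverage and whether a numeral with a unit appears."""
    q_terms = set(re.findall(r"[a-z0-9]+", query.lower())) - STOPWORDS
    p_terms = set(re.findall(r"[a-z0-9]+", passage.lower()))
    echoed = q_terms & p_terms
    return {
        "query_term_coverage": len(echoed) / len(q_terms) if q_terms else 0.0,
        "has_numeral_with_unit": bool(NUMERAL_WITH_UNIT.search(passage)),
    }
```

For the query "how tall is the Eiffel Tower", the passage "The Eiffel Tower is 330 meters tall" passes both checks, "It is 330 meters" passes only the numeral check, and "It is three hundred and thirty meters" passes neither.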