Generates candidate answer passages from top-ranking documents by segmenting each resource into passage units and applying scoring criteria to identify which segments are candidate answers to the question query.
Patent Overview
- Inventor
- Nitin Gupta
- Assignee
- Google LLC
- Filed
- 2016-12-30
- Granted
- 2019-01-15
- Application Number
- US 15/394,840
The Challenge
Answer Surfaces Need The Right Passage, Not The Right Document
When a question query has top-ranking documents, the answer-passage system needs to find the specific passage within each document that actually contains the answer. The document as a whole may be long and only partially relevant; the right passage is a small subset. The system needs a passage-extraction step that segments candidates and scores them against the question.
- Documents Are Larger Than Answers — Top-ranking documents are often comprehensive pages where the actual answer is a few sentences. Returning the whole document misses the answer-surface opportunity.
- Need Per-Document Passage Extraction — Each document needs to contribute its best candidate passage(s) for evaluation. Segmenting the document into passage units and scoring each is the mechanism.
- Top-K Resources Define The Pool — The pool of candidate documents is the top-k from standard retrieval. Passages from documents outside this pool are not considered, which focuses the work on already-strong documents.
- Passage Units Must Be Self-Contained — A passage unit needs to stand on its own as a candidate answer. Mid-sentence cuts produce useless candidates; coherent passages produce usable ones.
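To make the self-contained requirement concrete, here is a minimal sketch of segmentation that cuts only at structural boundaries (blank lines and list markers), never mid-sentence. The function name `segment_passage_units` and the plain-text input format are assumptions for illustration; the patent summary does not specify how units are detected.

```python
import re


def segment_passage_units(text: str) -> list[str]:
    """Split plain text into passage units at blank lines and list markers.

    Hypothetical helper: paragraphs are kept whole and each "- " or "* "
    list item becomes its own unit, so no unit starts or ends mid-sentence.
    """
    units: list[str] = []
    for block in re.split(r"\n\s*\n", text):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        paragraph: list[str] = []
        for ln in lines:
            if ln.startswith(("- ", "* ")):
                # A list marker closes any open paragraph and starts a new unit.
                if paragraph:
                    units.append(" ".join(paragraph))
                    paragraph = []
                units.append(ln[2:].strip())
            else:
                paragraph.append(ln)
        if paragraph:
            units.append(" ".join(paragraph))
    return units


sample = (
    "Mount Everest is tall.\nIt is in Asia.\n\n"
    "- First item\n- Second item\n\n"
    "Last paragraph."
)
units = segment_passage_units(sample)
```

A real system would presumably work from the page's markup (headings, table rows, list elements) rather than plain text; the sketch only shows the principle that unit boundaries follow structure.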
Innovation
Per-Document Passage Segmentation And Scoring
The system receives a query determined to be a question query along with data identifying resources determined to be responsive to the query. For each resource in a top-ranked subset, the system identifies multiple passage units. A set of passage unit scoring criteria is applied to identify which passage units are candidate answer passages for the question query.
- Confirm Question Query — The query has been classified upstream as a question query. The answer-passage pipeline activates only on question queries.
- Receive Top-Ranked Resources — Standard retrieval has produced a set of responsive resources. The top subset becomes the source pool for passage extraction.
- Segment Each Resource Into Passage Units — For each resource, identify multiple passage units. Units can be paragraphs, sentences, list items, table rows, or structural blocks. Each unit is a coherent self-contained candidate.
- Apply Scoring Criteria Per Unit — Score each passage unit against multiple criteria: query-term coverage, expected-answer-shape alignment, factual density, and structural context. The per-criterion results are combined into one candidate-answer score per unit.
- Promote Candidate Answer Passages — Passage units that score above a threshold are designated candidate answer passages for the query. They become the input to downstream answer selection.
- Aggregate Across Resources — Combine candidate passages from all top-ranked resources into one candidate pool. Ranking across the pool drives final answer selection.
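The steps above can be sketched end to end as follows. This is an illustrative sketch, not the patented implementation: query-term coverage stands in for the full set of scoring criteria, and the names `Candidate`, `term_coverage`, `candidate_answer_passages`, `top_k`, and `threshold` are all hypothetical. The input is assumed to be already ranked and already segmented into passage units.

```python
import re
from dataclasses import dataclass


@dataclass
class Candidate:
    resource_id: str
    passage: str
    score: float


def term_coverage(query: str, passage: str) -> float:
    """Fraction of distinct query terms that also appear in the passage."""
    q_terms = set(re.findall(r"\w+", query.lower()))
    p_terms = set(re.findall(r"\w+", passage.lower()))
    return len(q_terms & p_terms) / len(q_terms) if q_terms else 0.0


def candidate_answer_passages(query, ranked_resources, top_k=3, threshold=0.5):
    """Score units from the top_k resources and pool those above threshold.

    ranked_resources: list of (resource_id, [passage units]) in rank order.
    """
    pool = []
    for resource_id, units in ranked_resources[:top_k]:
        for unit in units:
            score = term_coverage(query, unit)  # stand-in for all criteria
            if score > threshold:
                pool.append(Candidate(resource_id, unit, score))
    # Aggregate across resources: one pool, ranked by score.
    return sorted(pool, key=lambda c: c.score, reverse=True)


resources = [
    ("doc1", ["Everest is 8,849 metres tall.",
              "The region has many trekking routes."]),
    ("doc2", ["How tall is Everest? Roughly 29,032 feet."]),
    ("doc3", ["Everest is tall."]),
]
result = candidate_answer_passages("how tall is everest", resources, top_k=2)
```

With `top_k=2`, `doc3` is never examined even though its passage would pass the threshold, which mirrors the point that passages from documents outside the top-ranked subset are not evaluated.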
Per-Document Passage Segmentation
The passage-level work happens within each top-ranked resource. The document-level retrieval pre-selects the candidate pool; the passage segmentation finds the specific answer-bearing piece within each one.
Passage Inside The Right Document
The right answer passage lives inside a document that is already responsive to the query. Document retrieval filters the universe; passage extraction finds the answer within the filtered pool.
- Resource Set — Top-k documents from standard retrieval. Defines the pool of candidate sources.
- Passage Units — Coherent, self-contained segments of each resource. Paragraphs, sentences, list items, structural blocks.
- Scoring Criteria — Multi-factor evaluation per passage unit. Combines query-term coverage, answer-shape alignment, and structural context.
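One simple way to picture multi-factor evaluation is a weighted average of per-criterion scores. The summary does not say how the criteria are actually combined, so the weighted-average form, the function name `combine_scores`, and the weights below are assumptions chosen purely for illustration.

```python
def combine_scores(criteria: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-criterion scores (each assumed to be in 0..1).

    Criteria missing from `criteria` count as 0.0. Hypothetical combination
    rule; the source does not specify one.
    """
    total_weight = sum(weights.values())
    if total_weight == 0:
        return 0.0
    weighted = sum(w * criteria.get(name, 0.0) for name, w in weights.items())
    return weighted / total_weight


# Illustrative values only.
unit_criteria = {"term_coverage": 1.0, "answer_shape": 0.5, "structural_context": 0.0}
weights = {"term_coverage": 2.0, "answer_shape": 1.0, "structural_context": 1.0}
combined = combine_scores(unit_criteria, weights)
```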
What This Means for SEO
Per-document passage segmentation is the structural step that decides which piece of your page gets surfaced as a candidate answer. Knowing the mechanism informs how to structure content for passage extraction.
- Each Page Should Have A Best Passage — The system pulls one or a few candidate passages per page. The best passage on your page becomes your contribution to the answer pool. Identify it explicitly: which sentences are your canonical answer to the target query?
- Structural Boundaries Help Segmentation — Paragraphs, headings, list items, and structured blocks are natural passage units. Pages with clear structural boundaries produce cleaner passage units than wall-of-text pages.
- Get Into The Top-Ranked Resource Set First — Passage extraction runs only on top-ranked documents. If your page isn't in the top retrieval pool, your passages aren't even evaluated. Document-level ranking is a prerequisite to passage-level surfacing.
- Answer-Shape Alignment Per Passage — Each passage should align with the expected answer shape (number for 'how tall', date for 'when did', name for 'who is'). Generic prose without that alignment scores lower at the passage level.
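As a rough illustration of answer-shape alignment, the sketch below checks whether a passage contains the kind of token a question type expects. The pattern table `SHAPE_PATTERNS` and the function `matches_answer_shape` are hypothetical, cover only two of the question types mentioned above, and are far cruder than whatever a production system would use.

```python
import re

# Hypothetical mapping from question prefix to the expected answer shape.
SHAPE_PATTERNS = {
    # A number followed by a length unit, e.g. "8,849 metres".
    "how tall": re.compile(
        r"\b\d[\d,.]*\s*(?:m|metres|meters|ft|feet|cm|inches)\b", re.IGNORECASE
    ),
    # A four-digit year between 1000 and 2099, e.g. "1989".
    "when did": re.compile(r"\b(?:1[0-9]{3}|20[0-9]{2})\b"),
}


def matches_answer_shape(query: str, passage: str) -> bool:
    """Return True if the passage contains the shape the query type expects.

    Queries with no known expected shape return False (no shape credit).
    """
    q = query.lower()
    for prefix, pattern in SHAPE_PATTERNS.items():
        if q.startswith(prefix):
            return bool(pattern.search(passage))
    return False


specific = matches_answer_shape("How tall is Everest", "Everest is 8,849 metres tall.")
generic = matches_answer_shape("How tall is Everest", "Everest is a very tall mountain.")
```

Here the first passage states a measurement and matches, while the second is on-topic prose without one and does not, which is the gap the bullet above describes.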