Builds an inverted index entry from query constraint patterns sampled across documents associated with a query, enabling pattern-level matching that handles paraphrasing and word-order variation.
Patent Overview
- Inventor: Anand Shukla
- Assignee: Google LLC
- Filed: 2018-10-26
- Granted: 2021-06-01
- Application Number: US 16/172,008
The Challenge
Literal-Term Indexes Miss Pattern-Level Matches
Traditional inverted indexes match queries by literal term presence. This works for direct term hits but fails when the query expresses an intent in a different surface form than the indexed documents use. To close that gap, the system extracts query constraint patterns from documents and indexes them as patterns, so retrieval can match by pattern shape rather than only by literal term overlap.
- Literal Match Misses Paraphrases — Documents that express the answer in different phrasing than the query don't surface even when they're directly relevant. The literal-term index has no way to bridge the surface variation.
- Patterns Capture Intent Across Phrasings — A query constraint pattern is an abstraction over the query's structural form. Multiple surface phrasings produce the same pattern, so pattern-level matching unifies retrieval across paraphrasings.
- Sample-Based Pattern Extraction — Rather than parsing every document for patterns, the system samples a subset of documents associated with a query and extracts patterns from the sample. The sampling makes pattern indexing scalable.
- Index By Pattern, Not Just By Term — Inverted index entries can be keyed by pattern as well as by term. Pattern-keyed entries enable retrieval to match queries against documents that share the pattern shape.
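The gap described above can be illustrated with a minimal term-keyed index. This is a sketch, not the patent's implementation: the function names, the whitespace tokenizer, the AND semantics, and the two sample documents are all assumptions chosen for illustration.

```python
from collections import defaultdict

def build_term_index(docs):
    """Map each lowercase token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index

def literal_match(index, query):
    """Return ids of documents that contain every query token (AND semantics)."""
    postings = [index.get(tok, set()) for tok in query.lower().split()]
    return set.intersection(*postings) if postings else set()

# Two hypothetical documents that express the same intent in different words.
docs = {
    "d1": "cheap flights to paris",
    "d2": "low cost airfare paris",
}
term_index = build_term_index(docs)
hits = literal_match(term_index, "cheap flights paris")
print(hits)  # only d1 is returned; the paraphrased d2 is missed
```

Document d2 is relevant to the query but shares only one literal term with it, so a term-keyed lookup cannot surface it.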
Innovation
Sample Documents, Extract Patterns, Index By Pattern
The system determines a set of documents associated with a query. It samples a subset of those documents and identifies a corresponding query constraint pattern for each document in the subset. An entry of an inverted index is generated based on the patterns. Future queries can be matched against indexed patterns, retrieving documents that share the pattern shape even when surface terms differ.
- Determine Document Set — For a query, determine the set of documents associated with it. The association comes from standard retrieval or from related-query history.
- Sample Document Subset — Sample a subset of the documents. Sampling controls the cost of pattern extraction; the subset is representative but smaller than the full set.
- Identify Query Constraint Patterns — For each sampled document, extract the query constraint pattern it satisfies. Patterns capture structural and semantic constraints, not literal terms.
- Build Inverted Index Entry — Generate an inverted index entry keyed by the constraint patterns. The entry points to documents matching each pattern.
- Match New Queries Against Patterns — When a new query arrives, identify its constraint patterns and look them up in the inverted index. Matched documents include those with patterns aligned to the query, not only literal-term overlap.
- Combine Pattern-Match With Standard Retrieval — Pattern-matched documents merge with standard literal-match retrieval to produce the final candidate set. Both signals contribute.
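The steps above can be sketched end to end as follows. The source does not specify how a pattern is represented or extracted, so `toy_pattern` is a hypothetical stand-in (an order-invariant tuple of non-stopword tokens), and `build_pattern_entries`, `retrieve`, the stopword list, and the sample documents are likewise illustrative assumptions. `literal_hits` stands in for whatever the standard retriever returns.

```python
import random
from collections import defaultdict

STOPWORDS = {"a", "an", "the", "to", "in", "for", "of"}

def toy_pattern(text):
    """Hypothetical extractor: order-invariant tuple of non-stopword tokens."""
    tokens = {t for t in text.lower().split() if t not in STOPWORDS}
    return tuple(sorted(tokens))

def build_pattern_entries(query_docs, extract_pattern, sample_size, seed=0):
    """Sample documents associated with a query and key them by pattern."""
    rng = random.Random(seed)
    doc_ids = sorted(query_docs)
    # Sampling bounds the cost of pattern extraction.
    sampled = rng.sample(doc_ids, min(sample_size, len(doc_ids)))
    entries = defaultdict(set)
    for doc_id in sampled:
        entries[extract_pattern(query_docs[doc_id])].add(doc_id)
    return dict(entries)

def retrieve(query, entries, extract_pattern, literal_hits):
    """Union pattern-matched documents with literal-match results."""
    pattern_hits = entries.get(extract_pattern(query), set())
    return set(pattern_hits) | set(literal_hits)

# Hypothetical documents already associated with some query.
query_docs = {
    "d1": "cheap flights to paris",
    "d2": "flights to paris cheap",
    "d3": "paris hotel deals",
}
entries = build_pattern_entries(query_docs, toy_pattern, sample_size=3)
candidates = retrieve("paris cheap flights", entries, toy_pattern,
                      literal_hits={"d1"})
print(sorted(candidates))
```

Passing the extractor in as a parameter keeps the indexing and lookup logic independent of how patterns are actually defined, which this summary leaves open.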
Pattern Indexing As A Retrieval Substrate
The patent extends inverted indexing from term-keyed to pattern-keyed. Adding the pattern dimension lets retrieval bridge surface variation between queries and documents.
Patterns Bridge Paraphrasing
Two queries expressing the same intent in different surface forms share the same constraint pattern. Indexing by pattern unifies them at retrieval time.
- Query Constraint Pattern — Structural abstraction over a query's form. Captures intent without locking to literal terms.
- Sample-Based Extraction — Patterns are extracted from a sampled subset of associated documents. Sampling controls cost while preserving signal.
- Pattern-Keyed Index Entries — Inverted index entries keyed by pattern (not just term) enable pattern-level retrieval at query time.
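To make the paraphrase bridge concrete, here is a small sketch in which two differently worded queries reduce to the same pattern. The concept lexicon, the labels `PRICE_LOW` and `AIR_TRAVEL`, and the `constraint_pattern` function are hypothetical; the source does not say how patterns are derived, and a real system would need a far richer abstraction than a lookup table.

```python
# Hypothetical lexicon mapping surface terms to shared concept labels.
CONCEPTS = {
    "cheap": "PRICE_LOW",
    "budget": "PRICE_LOW",
    "flights": "AIR_TRAVEL",
    "airfare": "AIR_TRAVEL",
}
STOPWORDS = {"to", "the", "a", "for", "in"}

def constraint_pattern(text):
    """Reduce text to an unordered set of concept labels.

    Tokens without a lexicon entry are kept as literals, and using a
    frozenset makes the pattern insensitive to word order.
    """
    tokens = [t for t in text.lower().split() if t not in STOPWORDS]
    return frozenset(CONCEPTS.get(t, t) for t in tokens)

a = constraint_pattern("cheap flights to Paris")
b = constraint_pattern("Paris budget airfare")
c = constraint_pattern("Paris hotels")

print(a == b)  # True: same pattern despite different wording and order
print(a == c)  # False: different intent, different pattern
```

Because the pattern is hashable, it can serve directly as a key in a pattern-keyed index.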
Technical Foundation
Pattern Extraction And Indexing
Three components combine into the pattern-keyed inverted index.
- Document Set Per Query — Documents associated with a given query. Source for pattern sampling.
- Query Constraint Pattern — An abstraction over query form that captures structural and semantic constraints.
- Inverted Index Entry — Keyed by pattern, pointing to documents matching the pattern. Enables fast pattern-level retrieval at query time.
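One way to model the three components as data structures is sketched below. The class names, fields, and the choice of a frozenset as the pattern type are assumptions made for illustration, not structures described in the source.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Set

Pattern = FrozenSet[str]  # hypothetical pattern representation

@dataclass
class QueryDocumentSet:
    """Documents associated with one query; the source for sampling."""
    query: str
    doc_ids: List[str]

@dataclass
class PatternIndexEntry:
    """One inverted index entry: a pattern and the documents matching it."""
    pattern: Pattern
    doc_ids: Set[str] = field(default_factory=set)

@dataclass
class PatternIndex:
    """Pattern-keyed inverted index."""
    entries: Dict[Pattern, PatternIndexEntry] = field(default_factory=dict)

    def add(self, pattern: Pattern, doc_id: str) -> None:
        entry = self.entries.setdefault(pattern, PatternIndexEntry(pattern))
        entry.doc_ids.add(doc_id)

    def lookup(self, pattern: Pattern) -> Set[str]:
        entry = self.entries.get(pattern)
        return set(entry.doc_ids) if entry else set()

index = PatternIndex()
p = frozenset({"PRICE_LOW", "AIR_TRAVEL", "paris"})
index.add(p, "d1")
index.add(p, "d2")
print(sorted(index.lookup(p)))
```

Lookup is a single dictionary access per pattern, which is what makes pattern-level retrieval fast at query time.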
Key Insight: Pattern indexing lets retrieval handle paraphrasing without depending entirely on neural embeddings. The pattern abstraction is symbolic and computationally cheap, yet it bridges surface variation in ways that pure term matching cannot. It complements embeddings rather than replacing them.
What This Means for SEO
Pattern-keyed indexing means your content can rank for queries that don't share literal terms with it. Understanding the pattern bridge shapes how to think about variant phrasings.
- Pattern Match Captures Paraphrases — Your page can match queries that don't share literal terms with your content if the query constraint pattern matches. Optimization should target intent patterns, not just exact keyword matches.
- Structural Coverage Beats Literal Coverage — Pages that cover the structural shape of a topic (entity attributes, relationship patterns, common question shapes) participate in pattern-level retrieval more broadly than pages with just literal keyword matches.
- Pattern Indexing Compounds With Embeddings — Pattern-keyed retrieval and embedding-based retrieval run together. Your content benefits when both signals point at it. Combining clear structural patterns with semantic richness covers both.