Detects when two query templates that share the same entity collection identifier are semantically equivalent based on a similarity measure, so the suggestion and routing systems can treat them as a single intent.
Patent Overview
- Inventor: Nitin Gupta
- Assignee: Google LLC
- Filed: 2016-05-20
- Granted: 2018-09-11
- Application Number: US 15/160,540
The Challenge
Template Variants Express The Same Intent Differently
The system has many query templates that express the same underlying intent in different surface forms. "[movie] showtimes" and "showtimes for [movie]" are syntactically distinct templates but semantically identical. Treating them as separate templates wastes computation, fragments the suggestion index, and produces inconsistent ranking. The system needs to detect when templates are semantically equivalent so they can be merged or treated as one.
- Surface-Form Variation Hides Identity — Templates with the same intent often have different word orders, prepositions, or qualifying phrases. Syntactic comparison says they are different; semantic comparison says they are the same.
- Fragmentation Wastes Index Space — If each surface variant has its own template entry, the template index inflates with redundant entries that all carry the same information.
- Inconsistent Ranking Across Variants — Equivalent templates with separate entries can produce different ranks for the same conceptual suggestion depending on which variant gets queried. Equivalence detection unifies the ranking.
- Shared Entity Collection Is The Cue — Templates that share the same entity collection identifier are candidates for equivalence. The shared collection signals that they both target the same set of entities even if their surface forms differ.
- Need A Similarity Measure — The equivalence detection requires a computable similarity measure that respects entity overlap, term overlap, and structural similarity. The measure has to be calibrated against ground-truth labeled equivalences.
Innovation
Similarity Over Shared Entity Collections
The system identifies pairs of query templates that share the same entity collection identifier. For each such pair, it computes a similarity measure from multiple factors, including the number of entities in the collection that instantiate both templates. Pairs whose score exceeds a similarity threshold are designated semantically equivalent and merged in the suggestion and routing pipelines.
- Group Templates By Entity Collection — For each entity collection identifier in the system, find all templates that bind to that collection. Templates in the same group are candidates for equivalence checks.
- Generate Instantiation Sets — For each template in a group, enumerate the entities in the collection that instantiate the template. The set represents the template's actual coverage.
- Compute Pairwise Similarity — For each pair of templates in the group, compute a similarity measure. Factors include shared instantiation count, shared term count, and structural overlap.
- Apply Equivalence Threshold — Pairs above the similarity threshold are designated semantically equivalent. Below-threshold pairs are kept as distinct templates.
- Cluster Equivalent Templates — Group equivalent pairs into equivalence classes. Each class contains templates that all express the same intent.
- Pick Canonical Template — For each equivalence class, designate one template as canonical. The canonical is the version used in downstream ranking and suggestion.
- Route Equivalents To Canonical — When any equivalent template is matched at query time, route the match to the canonical for ranking and suggestion. The system serves equivalent intents consistently regardless of which surface variant was matched.
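The steps above can be sketched end to end in a few lines of Python. This is a minimal illustration under stated assumptions, not the patent's implementation: the toy `TEMPLATES` data, the Jaccard placeholder in `similarity`, the `0.8` threshold, and the "largest instantiation set" rule for choosing the canonical template are all hypothetical choices made for the example.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: template -> (entity collection id, entities that instantiate it).
TEMPLATES = {
    "[movie] showtimes": ("movies", {"inception", "arrival", "dune"}),
    "showtimes for [movie]": ("movies", {"inception", "arrival", "dune"}),
    "[movie] cast": ("movies", {"inception", "heat"}),
    "[city] weather": ("cities", {"paris", "oslo"}),
}


def similarity(templates, a, b):
    """Placeholder measure (an assumption): Jaccard overlap of instantiation sets."""
    ents_a, ents_b = templates[a][1], templates[b][1]
    union = ents_a | ents_b
    return len(ents_a & ents_b) / len(union) if union else 0.0


def build_routing(templates, threshold=0.8):
    # Group templates by entity collection identifier.
    groups = defaultdict(list)
    for name, (collection_id, _) in templates.items():
        groups[collection_id].append(name)

    # Union-find structure used to cluster equivalent pairs into classes.
    parent = {name: name for name in templates}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Compare only pairs inside the same group; merge pairs above the threshold.
    for members in groups.values():
        for a, b in combinations(members, 2):
            if similarity(templates, a, b) > threshold:
                parent[find(a)] = find(b)

    # Collect the equivalence classes.
    classes = defaultdict(list)
    for name in templates:
        classes[find(name)].append(name)

    # Pick a canonical template per class (assumed rule: largest instantiation
    # set, ties broken alphabetically) and route every member to it.
    routing = {}
    for members in classes.values():
        canonical = min(members, key=lambda m: (-len(templates[m][1]), m))
        for member in members:
            routing[member] = canonical
    return routing


routing = build_routing(TEMPLATES)
for template, canonical in sorted(routing.items()):
    print(f"{template!r} -> {canonical!r}")
```

Union-find is used here only because equivalence is established pair by pair while the routing step needs whole classes; any clustering over the above-threshold pairs would serve the same purpose.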
Entity Collection As The Equivalence Anchor
The patent uses the shared entity collection identifier as the gating cue for equivalence checks. Two templates that target different entity collections are clearly distinct; templates that share a collection are candidates for being the same intent. The shared collection focuses the equivalence detection on the right candidate set.
Same Entities, Same Intent
Templates that target the same entities under different surface forms are likely to express the same intent. The shared entity collection is the cue; the similarity measure is the confirmation.
- Entity Collection Identifier — The shared identifier that gates equivalence checks. Only templates with the same identifier are compared.
- Instantiation Overlap — How many entities of the collection instantiate both templates. High overlap means the templates cover the same conceptual space.
- Canonical Template — One template per equivalence class is designated canonical. Downstream systems consult the canonical, eliminating surface-form fragmentation.
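To make instantiation overlap concrete, the sketch below derives each template's instantiation set from a small list of queries and intersects the two sets. Reading instantiations off a query log is an assumption made for illustration; the helper names (`template_to_regex`, `instantiation_set`) and the sample data are hypothetical.

```python
import re


def template_to_regex(template, slot):
    """Turn a template such as '[movie] showtimes' into a regex with one capture group."""
    before, after = template.split(slot, 1)
    return re.compile("^" + re.escape(before) + "(.+)" + re.escape(after) + "$")


def instantiation_set(template, slot, collection, queries):
    """Return the entities of `collection` that fill the slot in at least one query."""
    pattern = template_to_regex(template, slot)
    found = set()
    for query in queries:
        match = pattern.match(query)
        if match and match.group(1) in collection:
            found.add(match.group(1))
    return found


# Hypothetical entity collection and query sample.
MOVIES = {"inception", "arrival", "dune"}
QUERIES = [
    "inception showtimes",
    "showtimes for inception",
    "arrival showtimes",
    "showtimes for dune",
    "showtimes for arrival",
    "pizza showtimes",  # matches the pattern, but "pizza" is not in the collection
]

set_a = instantiation_set("[movie] showtimes", "[movie]", MOVIES, QUERIES)
set_b = instantiation_set("showtimes for [movie]", "[movie]", MOVIES, QUERIES)
overlap = set_a & set_b
print(sorted(overlap))
```

The overlap is the set of entities that instantiate both templates; its size is the raw count that the similarity measure builds on.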
Technical Foundation
The Similarity Measure
Multiple factors combine into a single similarity score per template pair.
- Shared Entity Collection — The gating requirement. Templates with different collection identifiers cannot be equivalent.
- Instantiation Overlap — Number of entities that instantiate both templates. Larger overlap indicates stronger equivalence.
- Term Overlap — Shared terms between the two templates. Captures lexical similarity beyond entity-position equivalence.
- Structural Similarity — Similarity of the templates' structural patterns. Captures cases where two templates differ in word order or prepositions but match in shape.
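One way the four factors could combine into a single score is a gated weighted sum, sketched below. The source does not specify the formula, so the weights, the Jaccard normalization of each factor, the `FUNCTION_WORDS` list, and the notion of template "shape" used here are all assumptions for illustration.

```python
from dataclasses import dataclass

# Assumed values; the source gives neither weights nor a function-word list.
FUNCTION_WORDS = {"for", "of", "in", "the"}
WEIGHTS = {"instantiation": 0.6, "terms": 0.2, "structure": 0.2}


@dataclass(frozen=True)
class Template:
    text: str
    collection_id: str
    entities: frozenset


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def is_slot(token):
    return token.startswith("[") and token.endswith("]")


def literal_terms(template):
    """Non-slot tokens of the template."""
    return {tok for tok in template.text.split() if not is_slot(tok)}


def shape(template):
    """Tokens with slots normalized and function words dropped; order is ignored."""
    return {
        "<slot>" if is_slot(tok) else tok
        for tok in template.text.split()
        if tok not in FUNCTION_WORDS
    }


def similarity(a, b):
    # Gate: templates bound to different collections are never equivalent.
    if a.collection_id != b.collection_id:
        return 0.0
    return (
        WEIGHTS["instantiation"] * jaccard(a.entities, b.entities)
        + WEIGHTS["terms"] * jaccard(literal_terms(a), literal_terms(b))
        + WEIGHTS["structure"] * jaccard(shape(a), shape(b))
    )


t1 = Template("[movie] showtimes", "movies", frozenset({"inception", "arrival", "dune"}))
t2 = Template("showtimes for [movie]", "movies", frozenset({"inception", "arrival"}))
t3 = Template("[city] showtimes", "cities", frozenset({"paris"}))

print(round(similarity(t1, t2), 3))  # same collection: weighted sum of the three factors
print(similarity(t1, t3))            # different collection: gated to 0.0
```

In practice the weights and the threshold would be tuned against labeled equivalent and non-equivalent pairs rather than set by hand.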
Key Insight: Rather than scoring similarity across all template pairs, the patent gates the comparison by shared entity collection identifier. This shrinks the candidate set, keeps the comparison cheap, and improves precision, because templates without the shared anchor are almost never truly equivalent.
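The effect of gating on the candidate set is easy to quantify. The sketch below counts the comparisons needed with and without the gate for a hypothetical mix of templates; the collection names and group sizes are made up for the example.

```python
from collections import Counter
from math import comb


def all_pairs(collection_ids):
    """Comparisons needed if every template is compared with every other."""
    return comb(len(collection_ids), 2)


def gated_pairs(collection_ids):
    """Comparisons needed if only templates sharing a collection id are compared."""
    return sum(comb(n, 2) for n in Counter(collection_ids).values())


# One entry per template, giving the collection it binds to (hypothetical sizes).
ids = ["movies"] * 4 + ["cities"] * 3 + ["products"] * 3

print(all_pairs(ids), gated_pairs(ids))  # 45 12
```

The gap widens as the number of collections grows, since ungated comparisons scale with the square of the total template count while gated comparisons scale with the square of each group's size.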
What This Means for SEO
Detecting semantically equivalent templates underlies the consistent suggestions and result routing that users see across phrasing variations. Understanding it informs how to handle variant phrasings in content.
- Variant Phrasings Of An Intent Are Unified — If users search "[movie] showtimes" and "showtimes for [movie]", the system treats both as the same intent. Content optimized for one variant is likely to be considered for both, so parallel pages per phrasing variant of the same intent are unnecessary.
- Entity Membership Anchors The Equivalence — The equivalence detection requires the same entity collection identifier. Content with strong entity association participates in the equivalence relationships and benefits across all the canonical-equivalent phrasings.
- Don't Over-Differentiate For Surface Variants — Producing different content for "[product] reviews" versus "reviews of [product]" is wasted effort. The system already merges them, so one strong answer to any variant serves the whole equivalence class.
- Canonical Template Inheritance — The system picks one canonical template per equivalence class, and the canonical's ranking applies to all variants. Optimizing your content for the highest-volume variant of an equivalence class positions it for the traffic of the entire class.