Bootstraps cross-language related-term graphs by translating known related pairs from one language into others, projecting topical knowledge across language boundaries.
Patent Overview
- Inventor: Steven D. Baker
- Assignee: Google LLC
- Filed: 2010-08-06
- Granted: 2014-08-05
- Application Number: US 12/852,167
The Challenge
Synonym Graphs Need To Cross Languages
A high-quality related-term graph in English is great for English. It does not help Spanish, Japanese, or Hindi search. Building a comparable graph from scratch in each language is expensive and produces inconsistent quality. The system needs a way to project knowledge across languages without re-mining the entire corpus for each new locale.
- Per-Language Mining Is Expensive — Each language requires its own query logs, document corpus, and tuning. Smaller languages produce thinner signals and the cost-per-relation becomes uneconomical.
- Direct Translation Of Synonyms Often Fails — Translating an English synonym pair word-for-word produces target-language pairs that may not actually behave as synonyms in the target language. Lexical synonymy does not transfer reliably.
- Related Terms Are Easier To Translate Than Synonyms — Pairs that are related (not strictly synonymous) survive translation better because the relationship is conceptual rather than lexical. The conceptual relationship is stable across languages even when the surface words change.
- Quality Validation Per Language Is Still Needed — Translated pairs need verification against target-language signals before promotion. Pure translation without validation produces noise.
- Bidirectional Translation Helps — Translating in both directions and keeping pairs that survive the round-trip is a stronger gate than one-way translation. The round-trip catches translation artifacts.
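The round-trip gate in the last point can be sketched as follows. This is a minimal illustration, not the patent's implementation: the two dictionaries are hypothetical toy data standing in for a real machine-translation system, and `round_trip_pair` is an assumed helper name.

```python
from typing import Dict, Optional, Tuple

# Hypothetical toy dictionaries standing in for a real translation system.
EN_TO_ES: Dict[str, str] = {
    "doctor": "médico",
    "hospital": "hospital",
    "bank": "banco",
    "river": "río",
}
# "banco" maps back to "bench" here to simulate a translation artifact.
ES_TO_EN: Dict[str, str] = {
    "médico": "doctor",
    "hospital": "hospital",
    "banco": "bench",
    "río": "river",
}


def translate(term: str, table: Dict[str, str]) -> Optional[str]:
    """Look up one term; None means the toy translator has no entry."""
    return table.get(term)


def round_trip_pair(
    pair: Tuple[str, str],
    forward: Dict[str, str],
    backward: Dict[str, str],
) -> Optional[Tuple[str, str]]:
    """Return the translated pair only if both terms survive
    source -> target -> source unchanged; otherwise return None."""
    translated = []
    for term in pair:
        target = translate(term, forward)
        if target is None or translate(target, backward) != term:
            return None
        translated.append(target)
    return (translated[0], translated[1])


kept = round_trip_pair(("doctor", "hospital"), EN_TO_ES, ES_TO_EN)
dropped = round_trip_pair(("bank", "river"), EN_TO_ES, ES_TO_EN)
```

In this toy data, the "bank"/"river" pair is rejected because "banco" translates back as "bench", which is the kind of artifact a one-way translation would let through.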
Innovation
Translate Related Pairs, Not Synonym Pairs
Take a pair of terms that are known to be related (not necessarily synonyms). Translate both into a target language. Add the translated pair to the related-term graph for that language. This bootstraps the graph in the target language using the structural knowledge already encoded in the source language. Validation against local signals confirms or rejects each transferred pair.
- Receive Known Related Pair — Two non-synonym, related terms in a source language arrive as input. The pair has been validated in the source language graph.
- Translate Both Terms Into Target Language — Use machine translation to produce target-language forms of both terms. Each term is translated independently to preserve the conceptual relationship.
- Add To Target-Language Graph — The translated pair is added to the list of known related pairs for the target language with provenance metadata noting the source-language origin.
- Iterate Across Languages — Repeat for each supported target language, growing the cross-language related-term graph. The same source pair can seed many target-language pairs.
- Validate With Local Signals — Translated pairs that survive local validation (co-occurrence in target-language documents, query logs) are promoted to high-confidence entries in the target-language graph.
- Demote Pairs That Fail Locally — Translated pairs that fail local validation are kept at lower confidence or removed. The local check is what prevents translation noise from polluting the target graph.
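The translate-and-add steps above might look like the sketch below. Everything here is illustrative: `RelatedPair`, `project_pair`, and the toy translation table are assumed names and data, and skipping pairs whose two terms collapse into the same target word is an added assumption rather than something the patent summary states.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class RelatedPair:
    first: str
    second: str
    language: str
    source_language: Optional[str] = None  # provenance; None for natively mined pairs
    confidence: str = "provisional"


# (term, source language, target language) -> translation, or None if unavailable
Translator = Callable[[str, str, str], Optional[str]]

# Hypothetical toy table standing in for a machine-translation service.
TOY_TABLE: Dict[tuple, Dict[str, str]] = {
    ("en", "es"): {"doctor": "médico", "hospital": "hospital"},
    ("en", "de"): {"doctor": "Arzt", "hospital": "Krankenhaus"},
}


def toy_translate(term: str, src: str, tgt: str) -> Optional[str]:
    return TOY_TABLE.get((src, tgt), {}).get(term)


def project_pair(
    pair: RelatedPair,
    target_languages: List[str],
    translate: Translator,
    graphs: Dict[str, List[RelatedPair]],
) -> List[RelatedPair]:
    """Translate each term independently and add the result to every
    target-language graph, recording where the pair came from."""
    added = []
    for lang in target_languages:
        a = translate(pair.first, pair.language, lang)
        b = translate(pair.second, pair.language, lang)
        # Assumption: skip untranslatable terms and pairs that collapse to one word.
        if a is None or b is None or a == b:
            continue
        entry = RelatedPair(a, b, lang, source_language=pair.language)
        graphs.setdefault(lang, []).append(entry)
        added.append(entry)
    return added


graphs: Dict[str, List[RelatedPair]] = {}
source = RelatedPair("doctor", "hospital", "en", confidence="validated")
added = project_pair(source, ["es", "de", "ja"], toy_translate, graphs)
```

Each projected entry starts as provisional, so a later local-validation step can decide whether it stays.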
Conceptual Relationships Travel Across Languages
The patent's key observation is that conceptual relationships (related terms) translate more reliably than lexical relationships (synonyms). A graph built on conceptual relationships projects across languages cheaply, where a graph built on lexical equivalences would not.
Relatedness Is Conceptual; Synonymy Is Lexical
The conceptual link between "doctor" and "hospital" exists in every language. The lexical synonymy between "car" and "automobile" does not have a stable equivalent in many languages.
- Translation As A Bridge — Machine translation of each term independently into the target language. Cross-language transfer depends on the quality of the translation source but does not require manual curation.
- Local Validation Filters Noise — Translated pairs are checked against target-language co-occurrence and query log signals. Pairs that fail local validation are excluded from the high-confidence graph.
Build the graph once. Project it everywhere. Validate locally to keep it honest.
Technical Foundation
Why Relatedness Travels Better Than Synonymy
Synonymy is lexical; relatedness is conceptual. Conceptual relationships survive translation more reliably than lexical ones because the underlying concept is language-independent.
- Related Pair — Two terms whose underlying concepts are connected (e.g., "doctor" and "hospital") but which are not interchangeable. The relationship is topical or associative, not substitutional.
- Translation Bridge — Machine translation of each term independently into the target language. Quality of translation matters; ambiguous or low-resource translations produce noisier outputs.
- Validation In Target — Local target-language signals (query co-occurrence, document overlap) confirm or reject the translated pair. The local check converts the projection into a validated graph entry.
- Confidence Stratification — Pairs are stratified by whether they survived local validation. Validated pairs go into the high-confidence graph; unvalidated translations stay at lower confidence.
Key Insight: The patent distinguishes carefully between synonymy and relatedness. The technique works for related terms because the relationship is conceptual; trying the same trick on lexical synonyms produces brittle cross-language pairs because lexical structure varies dramatically across languages.
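As a rough illustration of validation in the target language and confidence stratification, the sketch below scores a translated pair by document overlap. The Jaccard measure, the 0.3 threshold, the sample documents, and the function names are all assumptions chosen for the example; the source does not specify how co-occurrence is measured. It also handles single-token terms only.

```python
import re
from typing import Dict, List, Set, Tuple


def tokenize(text: str) -> Set[str]:
    """Lowercased word tokens; \\w is Unicode-aware for Python str."""
    return set(re.findall(r"\w+", text.lower()))


def cooccurrence_score(a: str, b: str, documents: List[str]) -> float:
    """Jaccard overlap between the documents containing a and those containing b."""
    token_sets = [tokenize(doc) for doc in documents]
    docs_a = {i for i, tokens in enumerate(token_sets) if a.lower() in tokens}
    docs_b = {i for i, tokens in enumerate(token_sets) if b.lower() in tokens}
    union = docs_a | docs_b
    if not union:
        return 0.0
    return len(docs_a & docs_b) / len(union)


def stratify(
    pairs: List[Tuple[str, str]],
    documents: List[str],
    threshold: float = 0.3,  # hypothetical cutoff
) -> Dict[str, List[Tuple[str, str]]]:
    """Split translated pairs into high- and low-confidence tiers."""
    tiers: Dict[str, List[Tuple[str, str]]] = {"high": [], "low": []}
    for a, b in pairs:
        tier = "high" if cooccurrence_score(a, b, documents) >= threshold else "low"
        tiers[tier].append((a, b))
    return tiers


# Hypothetical target-language (Spanish) documents.
DOCS = [
    "El médico trabaja en el hospital central",
    "El hospital contrató a un médico nuevo",
    "El banco abre a las nueve",
    "Paseo junto al río",
    "El médico recomienda descanso",
]

tiers = stratify([("médico", "hospital"), ("banco", "río")], DOCS)
```

Here "médico" and "hospital" share two of the three documents that mention either term, so the pair lands in the high tier, while "banco" and "río" never co-occur and stay low.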
The Process
End-To-End Cross-Language Projection
The pipeline turns a source-language related-term graph into a multi-language graph by projection plus local validation.
- Source Graph Snapshot — Take a snapshot of the source-language related-term graph that has been validated in that language.
- Per-Pair Translation — For each related pair, translate both terms independently into each target language. The translation step uses standard machine translation.
- Provisional Entry — Add the translated pair to the target-language graph as a provisional entry with low confidence until validated.
- Local Validation — Check the provisional entry against local target-language signals: query log co-occurrence, document mining results, click patterns. Each signal contributes to validation.
- Promote Or Demote — Provisional entries that pass local validation are promoted to the high-confidence target graph. Entries that fail are kept at low confidence or removed.
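The promote-or-demote decision could be sketched as a weighted combination of the local signals named above. The weights, both thresholds, and the assumption that each signal is already normalized to the range 0 to 1 are hypothetical; the source only says that each signal contributes to validation.

```python
from typing import Dict, List, Tuple

# Hypothetical weights and thresholds; signals are assumed to lie in [0, 1].
SIGNAL_WEIGHTS: Dict[str, float] = {
    "query_cooccurrence": 0.5,
    "document_mining": 0.3,
    "click_patterns": 0.2,
}
PROMOTE_AT = 0.6
REMOVE_BELOW = 0.2


def validation_score(signals: Dict[str, float]) -> float:
    """Weighted sum of the available signals; missing signals count as 0."""
    return sum(w * signals.get(name, 0.0) for name, w in SIGNAL_WEIGHTS.items())


def decide(signals: Dict[str, float]) -> str:
    """Map a provisional entry's signals to one of three outcomes."""
    score = validation_score(signals)
    if score >= PROMOTE_AT:
        return "high_confidence"
    if score < REMOVE_BELOW:
        return "removed"
    return "low_confidence"


def run_validation(
    provisional: Dict[Tuple[str, str], Dict[str, float]],
) -> Dict[str, List[Tuple[str, str]]]:
    """Sort every provisional pair into its outcome bucket."""
    outcome: Dict[str, List[Tuple[str, str]]] = {
        "high_confidence": [],
        "low_confidence": [],
        "removed": [],
    }
    for pair, signals in provisional.items():
        outcome[decide(signals)].append(pair)
    return outcome


result = run_validation({
    ("médico", "hospital"): {
        "query_cooccurrence": 0.9,
        "document_mining": 0.8,
        "click_patterns": 0.5,
    },
    ("banco", "orilla"): {"query_cooccurrence": 0.4, "document_mining": 0.3},
    ("banco", "río"): {},
})
```

Keeping a middle "low_confidence" bucket mirrors the text's option of retaining failed entries at lower confidence instead of deleting them outright.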
What This Means for SEO
For multilingual sites, this patent describes how a search engine's understanding of related concepts can cross language boundaries. The implications for structuring international content are direct: they shape how topical authority may transfer across locales.
- Topical Authority Crosses Languages — If your English site has built strong topical authority on a concept, that concept's related-term network exists in other languages too via translation. Localized content benefits from the established relationships even before earning its own signals.
- Translate Concepts, Not Words — Localized content should translate the underlying concept and its related terms, not just translate the surface phrasing. This matches how the related-term graph crosses languages.
- Hreflang And Language-Aware Linking Matter — When you link related-concept pages across languages with hreflang, you reinforce the cross-language related-term structure the engine is already building. Missing or inconsistent hreflang denies that reinforcement and weakens the cross-language projection.
- Avoid Synonym-Level Translation Bets — Do not assume an English synonym pair translates into a target-language synonym pair. Target the underlying concept, then look up which target-language terms actually surface in that language's query logs and SERPs.
- Long-Tail Concepts Transfer Even With Thin Local Corpora — Languages with smaller search volume still inherit related-term knowledge from larger languages via projection. You can target long-tail intent in low-resource languages with less local linking volume than you would need in English.
- Local Co-Occurrence Strengthens The Transferred Pair — When your target-language content puts related terms close together (same paragraph, same heading hierarchy), you contribute to the local validation signal that promotes the transferred pair to high confidence.
- Inconsistent Localization Breaks The Bridge — If your English content treats two concepts as related but your localized version splits them across separate pages with no cross-linking, you weaken the local validation signal that would otherwise reinforce the projected pair.
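For the hreflang point above, a small helper like the one below can generate a consistent set of annotations for every language version of a page. The function name and the example.com URLs are hypothetical, and whether hreflang feeds the related-term graph at all is this article's interpretation, not something the sketch demonstrates.

```python
from html import escape
from typing import Dict, List, Optional


def hreflang_links(
    alternates: Dict[str, str],
    x_default: Optional[str] = None,
) -> List[str]:
    """Build <link rel="alternate"> tags for all language versions of one page.

    The same full list, including the page's own language, is meant to be
    placed in the <head> of every version so the annotations are reciprocal.
    """
    tags = [
        f'<link rel="alternate" hreflang="{escape(lang)}" href="{escape(url)}" />'
        for lang, url in sorted(alternates.items())
    ]
    if x_default:
        tags.append(
            f'<link rel="alternate" hreflang="x-default" href="{escape(x_default)}" />'
        )
    return tags


# Hypothetical URLs for one concept page in two languages.
tags = hreflang_links(
    {
        "en": "https://example.com/en/cardiology",
        "es": "https://example.com/es/cardiologia",
    },
    x_default="https://example.com/en/cardiology",
)
print("\n".join(tags))
```

Generating the tags from one mapping, instead of hand-editing each localized template, is one way to avoid the missing or inconsistent annotations the list warns about.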