Resolves which sense of an ambiguous word the user means by reading the surrounding entity context as a graph. Entities mentioned nearby anchor the word to whichever sense has a typical context that best matches the observed entity neighborhood.
Patent Overview
- Filed: 2020-08-26
- Granted: 2023-06-27
- Application Number: US 17/003,567
The Challenge
Many words have multiple senses. 'Apple' is a company, a fruit, a record label. 'Jaguar' is a car, an animal, an operating system. Pure dictionary disambiguation cannot scale, and surface context windows often lack enough signal. The system needs a richer disambiguator.
- Polysemy Is Pervasive — Most common words have multiple senses. Search queries and documents are full of polysemous terms whose right interpretation depends on surrounding context.
- Window-Based Context Is Often Too Thin — A few surrounding words may not disambiguate. 'Apple stock' might be about company shares or a recipe ingredient. The system needs to look further than a token window.
- Entities Provide Cleaner Context — Other entities mentioned nearby (people, places, organizations) anchor meaning more reliably than raw word context. They are denser signal.
- Knowledge Graph Encodes Sense Affinities — Each sense of a word has known entity affinities in the graph. The 'Apple company' sense links to Tim Cook, iPhone, Cupertino. The 'apple fruit' sense links to recipes, orchards, nutrition.
- Disambiguation Must Run Fast — Every query needs disambiguation in milliseconds. The graph lookup and scoring must be efficient enough for the online query path.
Innovation
How The System Works
The system identifies candidate senses for each ambiguous word, extracts the surrounding entity context as a graph neighborhood, scores each candidate sense by how well its known affinities match the observed neighborhood, and picks the highest-scoring sense.
- Detect Ambiguous Word — Lookup table identifies words with multiple senses. The pipeline only runs disambiguation when ambiguity is possible.
- Enumerate Candidate Senses — For each ambiguous word, retrieve the list of canonical senses from the knowledge graph. Each sense has its own entity ID and affinity profile.
- Extract Local Entity Context — Identify entities mentioned in the surrounding text (or query). These form a small graph neighborhood representing the local context.
- Score Each Sense Against Context — Each candidate sense has known affinities (entities it usually appears near). Compare the affinity set to the observed neighborhood. High overlap means high score.
- Pick The Best Sense — The sense with the highest affinity score wins. If no sense scores well, the system falls back to the popularity default.
- Propagate Resolved Sense Downstream — The resolved sense's entity ID is passed to retrieval, ranking, and answer extraction so downstream systems work with the correct interpretation.
- Log For Continuous Improvement — Resolved senses and downstream outcomes (clicks, follow-up queries) feed back into affinity profile refinement. Sense models improve with use.
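The first five steps above can be sketched end to end. This is a minimal illustration under stated assumptions, not the patented implementation: the sense IDs, affinity weights, popularity defaults, the `resolve` function, and the 0.5 threshold are all hypothetical.

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy data; a real system would load these from a knowledge graph.
SENSES: Dict[str, List[str]] = {
    "apple": ["apple_company", "apple_fruit"],
}
AFFINITY: Dict[str, Dict[str, float]] = {
    "apple_company": {"tim_cook": 0.9, "iphone": 0.8, "cupertino": 0.6},
    "apple_fruit": {"orchard": 0.8, "recipe": 0.7, "nutrition": 0.5},
}
POPULARITY_DEFAULT: Dict[str, str] = {"apple": "apple_company"}


def score(sense_id: str, neighborhood: Set[str]) -> float:
    """Sum the affinity weights of profile entities found in the context."""
    profile = AFFINITY.get(sense_id, {})
    return sum(w for entity, w in profile.items() if entity in neighborhood)


def resolve(word: str, neighborhood: Set[str], threshold: float = 0.5) -> Optional[str]:
    """Return a sense ID for `word`, or None if the word is not ambiguous."""
    key = word.lower()
    candidates = SENSES.get(key)
    if not candidates:
        return None  # ambiguity table miss: skip disambiguation entirely
    scored = {sense: score(sense, neighborhood) for sense in candidates}
    best = max(scored, key=scored.get)
    if scored[best] < threshold:
        return POPULARITY_DEFAULT[key]  # context too thin: use the default
    return best
```

Calling `resolve("Apple", {"orchard", "recipe"})` picks the fruit sense, while an empty neighborhood falls through to the popularity default.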
Entity Neighborhood Resolves Meaning
The patent's load-bearing idea is to use surrounding entities, not surrounding words, as the disambiguation signal. Entities are denser, less noisy, and tied to canonical knowledge-graph IDs whose relationships are already enumerated.
Sense Is What The Neighbors Are
An ambiguous word becomes unambiguous as soon as you see the entities it sits among. The patent operationalizes this intuition by reading the graph neighborhood and matching it to sense profiles.
- Entity-Anchored Context — Surrounding entities are the signal. They tie the local context to known knowledge-graph nodes with rich relational structure.
- Sense Affinity Profiles — Each sense of a word has a profile of typical entity co-occurrences. Profiles come from analyzing labeled training corpora.
- Score By Overlap — The sense whose affinity profile best overlaps with the observed neighborhood wins. The math is straightforward set-overlap with weighting.
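One way such affinity profiles could be derived from labeled data is shown below. The source only says that profiles come from labeled training corpora, so the weighting scheme here (the fraction of a sense's labeled examples that mention each entity) and the `build_profiles` name are assumptions made for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

# (sense_id, entities seen near the ambiguous word in one labeled example)
LabeledExample = Tuple[str, List[str]]


def build_profiles(examples: Iterable[LabeledExample]) -> Dict[str, Dict[str, float]]:
    """Turn labeled co-occurrences into per-sense weights in [0, 1]."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    totals: Counter = Counter()
    for sense_id, entities in examples:
        totals[sense_id] += 1
        counts[sense_id].update(set(entities))  # count each entity once per example
    return {
        sense: {entity: n / totals[sense] for entity, n in counter.items()}
        for sense, counter in counts.items()
    }
```

An entity that appears in every labeled example of a sense gets weight 1.0; one that appears in half of them gets 0.5.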
Technical Foundation
The patent specifies the ambiguity detection table, the sense enumeration, the affinity profile store, the scoring algorithm, and the integration with downstream retrieval.
- Ambiguity Detection Table — Precomputed dictionary of words with multiple senses. Lookup is O(1) and gates whether disambiguation runs for a given word.
- Sense Enumeration Store — For each ambiguous word, the senses are listed with their canonical entity IDs and affinity profiles. Senses come from the knowledge graph and curated dictionaries.
- Affinity Profile Format — Each profile is a weighted list of entity IDs the sense typically appears near. Weights reflect co-occurrence strength in training data.
- Context Entity Extractor — An entity recognizer identifies entities in the surrounding text or query. Output is the observed entity neighborhood.
- Scoring Algorithm — Weighted overlap between observed neighborhood and each sense's affinity profile. Implementation uses sparse vector dot products for speed.
- Default Fallback — When no sense scores above the threshold, the system uses the most-popular sense as the default. This handles cases where context is too thin to disambiguate.
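The sparse dot product mentioned for the scoring algorithm can be illustrated with plain dictionaries. This is a sketch that assumes the observed neighborhood is encoded as a 0/1 indicator vector; `SparseVec`, `sparse_dot`, and `observed_vector` are illustrative names, not terms from the patent.

```python
from typing import Dict, Iterable

SparseVec = Dict[str, float]  # entity ID -> weight; missing keys count as 0


def sparse_dot(a: SparseVec, b: SparseVec) -> float:
    """Dot product that only walks the non-zero entries of the smaller vector."""
    if len(a) > len(b):
        a, b = b, a
    return sum(weight * b[key] for key, weight in a.items() if key in b)


def observed_vector(entities: Iterable[str]) -> SparseVec:
    """Encode the observed neighborhood as a 0/1 indicator vector."""
    return {entity: 1.0 for entity in entities}
```

Because only shared keys contribute, the cost depends on the size of the smaller vector rather than on the full entity vocabulary, which is the property that keeps this kind of scoring cheap.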
The Process
The pipeline runs in the query and ranking paths. Each ambiguous word triggers a fast disambiguation lookup; resolved senses propagate to downstream systems.
- Receive Text Or Query — Disambiguation runs on incoming queries and on document content during indexing.
- Identify Ambiguous Words — Each token is checked against the ambiguity dictionary. Ambiguous tokens enter the disambiguation path.
- Enumerate Senses And Extract Context — For each ambiguous word, list candidate senses. Extract the entity neighborhood from surrounding text.
- Score Senses — Compute affinity overlap scores per sense. The scoring is fast even for many candidate senses.
- Pick Winner Or Fallback — Highest-scoring sense wins above threshold. Below threshold, fall back to popularity default.
- Annotate Output — The text or query is annotated with resolved sense IDs. Downstream systems read the annotations.
- Log And Learn — Decisions are logged. Affinity profiles are refined periodically based on feedback from downstream success metrics.
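The annotation step might look like the following sketch. The `SenseAnnotation` record and the `annotate` function are hypothetical; the source does not describe the annotation format, so token index plus sense ID is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class SenseAnnotation:
    token_index: int  # position of the ambiguous token in the input
    surface: str      # the token as written
    sense_id: str     # resolved entity ID (hypothetical format)


def annotate(
    tokens: List[str],
    resolver: Callable[[str], Optional[str]],
) -> List[SenseAnnotation]:
    """Attach a sense ID to every token the resolver treats as ambiguous."""
    annotations = []
    for i, token in enumerate(tokens):
        sense_id = resolver(token)  # None means the token is not ambiguous
        if sense_id is not None:
            annotations.append(SenseAnnotation(i, token, sense_id))
    return annotations
```

Downstream components would then read the annotation list instead of re-resolving the raw tokens, and the same records could be written to a log for later profile refinement.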
Quality Control
Disambiguation errors propagate to retrieval and ranking, so the system must be robust. The patent specifies the safeguards.
- Threshold Calibration — The minimum score for sense selection is calibrated to balance precision and recall. Too high and many cases fall to default; too low and weak matches win.
- Fallback To Popularity Default — When confidence is low, default to the most-popular sense. This is safer than picking a weak match.
- Profile Refinement — Affinity profiles refine as more data accumulates. Profiles drift with the world, so refresh keeps them current.
- Cross-Source Validation — When multiple disambiguation signals are available (entity recognizer, n-gram model, user history), the system can cross-validate to reduce error rates on important queries.
- Edge Case Auditing — Common ambiguous queries are audited periodically to verify resolution quality. Audits identify profile weaknesses for targeted improvement.
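Threshold calibration can be sketched as a sweep over a labeled development set. The code below is an illustration only: it reports precision alongside coverage (the share of cases that clear the threshold rather than falling to the default), which stands in here for the recall side of the trade-off. The `sweep` function and the example scores are hypothetical.

```python
from typing import List, Tuple

# Each dev example: (score of the top-ranked sense, whether that sense was correct)
Scored = Tuple[float, bool]


def sweep(examples: List[Scored], thresholds: List[float]) -> List[Tuple[float, float, float]]:
    """Return (threshold, precision, coverage) for each candidate threshold."""
    rows = []
    for t in thresholds:
        accepted = [correct for s, correct in examples if s >= t]
        coverage = len(accepted) / len(examples) if examples else 0.0
        # With nothing accepted there are no wrong picks; report 1.0 by convention.
        precision = sum(accepted) / len(accepted) if accepted else 1.0
        rows.append((t, precision, coverage))
    return rows
```

Raising the threshold pushes more cases to the default and tends to raise precision; lowering it accepts weaker matches and raises coverage.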
Real-World Application
Entity-graph-based word sense disambiguation runs across the search query pipeline, document indexing, and Knowledge Graph extraction. It is one of the layers that makes the same query produce sensible results across users with very different intents.
- Entity-anchored Context Source — Surrounding entities, not raw words, provide the disambiguation signal. Entity context is denser and tied to graph-level knowledge.
- Profile-based Sense Representation — Each sense carries an affinity profile of typical entity co-occurrences. Scoring is overlap between observed and expected neighborhoods.
- Online Inference Latency — Disambiguation runs in the query path in milliseconds. Sparse-vector dot products plus precomputed profiles keep latency low.
Why Topical Entity Density Matters
Content that mentions many entities tightly clustered around its target topic gives the disambiguator strong neighborhood signal. Pages thin on related entities can be misclassified when they use polysemous words, since the surrounding context is too weak to pick the right sense.
Why Entity Schema Helps Disambiguation
Pages with explicit entity markup (Schema.org Organization, Place, Person) give the disambiguation pipeline pre-resolved sense information. The page essentially tells the engine which sense is intended, eliminating ambiguity for content the markup covers.
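As an illustration of the kind of markup described, the sketch below builds a Schema.org Organization object and wraps it in a JSON-LD script tag. The `to_jsonld_script` helper is hypothetical, and whether any given engine uses this markup for sense resolution is the article's claim rather than something the code demonstrates.

```python
import json

# Schema.org Organization description; `sameAs` points at a page
# that identifies the same entity unambiguously.
organization = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Apple",
    "url": "https://www.apple.com/",
    "sameAs": ["https://en.wikipedia.org/wiki/Apple_Inc."],
}


def to_jsonld_script(data: dict) -> str:
    """Serialize a dict as a JSON-LD script tag for embedding in a page."""
    payload = json.dumps(data, indent=2)
    return f'<script type="application/ld+json">\n{payload}\n</script>'


print(to_jsonld_script(organization))
```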
What This Means for SEO
Word-sense disambiguation through entity graphs means the meaning the system picks for an ambiguous word depends on the entities it finds nearby.
- Surrounding Entities Disambiguate Keywords — A page about "Apple" with nearby entities like "iPhone", "Tim Cook", "Cupertino" resolves cleanly to the company. The same word in a recipe context resolves to the fruit. Plant disambiguating entities deliberately.
- Entity Co-Occurrence Builds Topical Identity — The set of entities your page mentions defines its topic in the graph. A coherent entity set wins disambiguation, a scattered one loses on every ambiguous term.
- Glossary And Definition Sections Help — A short definition block early on the page anchors which sense of an ambiguous word you mean. The model uses it the same way readers do.