Conversational voice-search framework that maintains keyword context across multi-turn voice queries, supporting follow-up questions without users having to restate prior context.
Patent Overview
- Inventor: Srinivasan Venkatachary
- Assignee: Google LLC
- Filed: 2013-05-24
- Granted: 2016-04-05
- Application Number: US 13/902,401
The Challenge
Voice queries are conversational. Users ask follow-ups that depend on prior turns. Treating each voice query as isolated forces users to re-state context every time. The system needed multi-turn handling that preserves keyword context across the conversation.
- Single-Turn Voice Misses Conversational Pattern — Real voice interactions are dialogues. A user asks about a restaurant, then asks 'what are its hours' — the 'its' depends on the prior turn.
- Keyword Context Must Persist — The subject entity from one turn carries into subsequent turns. Without persistence, the system cannot resolve pronouns or implicit references.
- Topic Shifts Must Be Detected — Not every consecutive query continues the prior thread. The system must detect topic shifts and reset context when conversation turns to a new subject.
- Voice Recognition Adds Noise — Speech-to-text introduces transcription errors. Conversational context handling must be robust to noisy transcripts.
- Latency Demands Are Tight — Voice users expect near-immediate responses. The context-maintenance plus retrieval pipeline must run within voice-experience latency budgets.
Innovation
How The System Works
The system maintains a keyword context across voice turns, augments each new query with persistent context entities, detects topic shifts to reset context, performs retrieval against the augmented query, and returns the answer formatted for voice.
- Transcribe Voice To Text — Speech-to-text produces the query transcript. Confidence signals accompany each token.
- Extract Keywords From Current Turn — Identify entities, topics, and key phrases in the current turn. These are candidate context updates.
- Detect Topic Shift Or Continuation — Compare current turn keywords to maintained context. Strong overlap means continuation; weak overlap signals topic shift and triggers context reset.
- Augment Query With Persistent Context — If continuation, augment the current query with context keywords. Pronouns and implicit references resolve against the persistent context.
- Retrieve Against Augmented Query — The augmented query goes to retrieval. Results reflect the conversational context, not just the literal current turn.
- Format For Voice — The answer is formatted for voice delivery: concise, conversational tone, optional follow-up cue. Format adapts to the voice interface.
- Update Context — Successful turns update the maintained context with the current turn's keywords. Context evolves as conversation progresses.
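The augmentation step can be illustrated with a minimal Python sketch. It is not the patent's implementation: the pronoun list, the `tokenize` helper, the `augment_query` name, and the made-up restaurant keywords are all assumptions for illustration, and a production system would likely use proper coreference resolution rather than a word list.

```python
import re

# Hypothetical pronoun list (an assumption, not taken from the patent).
PRONOUNS = {"it", "its", "they", "their", "them"}


def tokenize(text: str) -> list[str]:
    """Lowercase the transcript and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def augment_query(query: str, context_keywords: list[str]) -> str:
    """Append persistent context keywords when the turn contains a pronoun."""
    tokens = tokenize(query)
    if not any(tok in PRONOUNS for tok in tokens):
        return query  # no implicit reference; leave the query as spoken
    # Only add keywords the query does not already mention.
    missing = [kw for kw in context_keywords if kw.lower() not in tokens]
    if not missing:
        return query
    return f"{query} {' '.join(missing)}"


# Prior turn was about a made-up restaurant, so the follow-up inherits it.
print(augment_query("what are its hours", ["pizzeria", "napoli"]))
```

Appending keywords rather than rewriting the pronoun keeps the sketch simple; the retrieval step then sees both the literal follow-up and the entity it refers to.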
Persistent Keyword Context
The patent's load-bearing idea is to carry keyword context across voice turns, so each new query is interpreted in light of the conversation rather than as an isolated request.
Conversations Are Continuous
Treating each turn as isolated breaks the conversational flow. Persistent context makes voice search behave like a real conversation, with follow-ups resolving naturally.
- Keyword Context Persistence — Entities, topics, and key phrases from prior turns persist into subsequent turns. Pronouns and implicit references resolve against the persistent set.
- Topic-Shift Detection — When the user changes topic, context resets. Detection uses keyword overlap and explicit-shift cues.
- Query Augmentation — Continuation turns get augmented with context keywords. Retrieval works on the augmented query so results reflect conversation state.
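Topic-shift detection from keyword overlap and explicit cues might look like the following sketch. Everything here is an assumption made for illustration: the cue phrases, the reference-word list, the overlap coefficient as the similarity measure, and the 0.3 threshold. One detail worth noting is that a follow-up such as "what are its hours" shares no keywords with the prior turn, so this sketch treats a pronoun as evidence of continuation before falling back to overlap.

```python
# Hypothetical cue phrases and reference words (assumptions, not from the patent).
SHIFT_CUES = ("new topic", "something else", "different question", "start over")
REFERENCE_WORDS = {"it", "its", "they", "their", "them"}


def overlap_score(turn_keywords: set[str], context: set[str]) -> float:
    """Overlap coefficient: shared keywords divided by the smaller set's size."""
    if not turn_keywords or not context:
        return 0.0
    return len(turn_keywords & context) / min(len(turn_keywords), len(context))


def is_topic_shift(
    transcript: str,
    turn_keywords: set[str],
    context: set[str],
    threshold: float = 0.3,  # illustrative value; needs calibration in practice
) -> bool:
    """Return True when the turn should reset the maintained context."""
    text = transcript.lower()
    if any(cue in text for cue in SHIFT_CUES):
        return True  # an explicit cue overrides every other signal
    if not context:
        return True  # nothing to continue, so treat the turn as a fresh topic
    if REFERENCE_WORDS & set(text.split()):
        return False  # a pronoun implies the turn leans on prior context
    return overlap_score(turn_keywords, context) < threshold


context = {"pizzeria", "napoli"}
print(is_topic_shift("what are its hours", {"hours"}, context))
print(is_topic_shift("weather in boston tomorrow", {"weather", "boston", "tomorrow"}, context))
```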
Technical Foundation
The patent specifies the context-maintenance store, the keyword extractor, the topic-shift detector, the augmentation logic, and the voice formatting layer.
- Context Maintenance Store — Per session, maintains the keyword context across voice turns. Time-bounded so stale conversations expire.
- Keyword Extractor — Per turn, identifies entities, topics, and key phrases. Output feeds context update and topic-shift detection.
- Topic-Shift Detector — Compares current turn keywords to maintained context. Threshold-based decision distinguishes continuation from topic shift.
- Augmentation Logic — On continuation, augments current query with context keywords. Pronouns and implicit references resolve via the augmented query.
- Voice-Formatted Response — Retrieval results pass through a voice-formatting layer. Output is concise, conversational, and optionally cues follow-up.
- Session Expiry — Context expires after a configurable idle period. Stale context does not pollute new conversations.
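A per-session context store with idle expiry could be sketched as below. The class name `ContextStore`, its method names, the 300-second default, and the injectable `clock` parameter are hypothetical choices for illustration; the source only states that context is kept per session and expires after a configurable idle period.

```python
import time
from typing import Callable


class ContextStore:
    """Keeps keyword context per session and drops it after an idle period."""

    def __init__(
        self,
        idle_seconds: float = 300.0,  # illustrative default, not from the patent
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._idle_seconds = idle_seconds
        self._clock = clock
        # session_id -> (time of last update, ordered keyword list)
        self._sessions: dict[str, tuple[float, list[str]]] = {}

    def get(self, session_id: str) -> list[str]:
        """Return the live context, or an empty list if missing or expired."""
        entry = self._sessions.get(session_id)
        if entry is None:
            return []
        last_seen, keywords = entry
        if self._clock() - last_seen > self._idle_seconds:
            del self._sessions[session_id]  # stale conversation: start fresh
            return []
        return list(keywords)

    def update(self, session_id: str, new_keywords: list[str]) -> None:
        """Merge this turn's keywords into the context and refresh the timer."""
        merged = self.get(session_id)
        for kw in new_keywords:
            if kw not in merged:
                merged.append(kw)
        self._sessions[session_id] = (self._clock(), merged)

    def reset(self, session_id: str) -> None:
        """Drop the context, for example on a detected topic shift."""
        self._sessions.pop(session_id, None)
```

Passing the clock in as a parameter is a testing convenience: a fake clock lets expiry be exercised without waiting for real time to pass.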
The Process
The pipeline runs in the voice-query path. Latency is bounded so that voice users experience a natural conversational pace.
- User Speaks — Voice interface captures audio. Speech-to-text produces the transcript with confidence signals.
- Extract Keywords — Keyword extractor identifies entities, topics, and key phrases in the current turn.
- Check Topic Continuity — Topic-shift detector compares current keywords to maintained context. Output is continuation or shift.
- Augment Or Reset — On continuation, augment the query with context keywords. On shift, reset context and start fresh.
- Retrieve — The query (augmented or not) goes to retrieval. Standard ranking produces candidates.
- Format And Speak — Voice-formatted response is generated and spoken to the user.
- Update Context — Successful turn updates the maintained context. Conversation continues with evolving state.
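The steps above can be tied together in one turn handler. This is a sketch under stated assumptions, not the patented pipeline: the stopword and pronoun lists, the `Session` class, the `handle_turn` signature, and the word-count cap used as a stand-in for voice formatting are all hypothetical, and `retrieve` is a caller-supplied function standing in for a real retrieval backend.

```python
import re
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical word lists (assumptions for illustration only).
STOPWORDS = {"what", "are", "is", "the", "a", "an", "of", "in", "for", "how", "where", "when"}
PRONOUNS = {"it", "its", "they", "their", "them"}


@dataclass
class Session:
    context: list[str] = field(default_factory=list)


def extract_keywords(transcript: str) -> list[str]:
    """Crude keyword extraction: drop stopwords and pronouns."""
    tokens = re.findall(r"[a-z0-9']+", transcript.lower())
    return [t for t in tokens if t not in STOPWORDS and t not in PRONOUNS]


def handle_turn(
    session: Session,
    transcript: str,
    retrieve: Callable[[str], str],
    max_words: int = 30,
) -> tuple[str, str]:
    """Run one voice turn and return (query sent to retrieval, spoken answer)."""
    tokens = set(re.findall(r"[a-z0-9']+", transcript.lower()))
    keywords = extract_keywords(transcript)

    # Continuation if the turn uses a pronoun or shares a keyword with context.
    continues = bool(session.context) and (
        bool(tokens & PRONOUNS) or bool(set(keywords) & set(session.context))
    )
    if continues:
        extra = [kw for kw in session.context if kw not in keywords]
        query = " ".join([transcript] + extra)  # augment
    else:
        session.context = []  # reset on topic shift
        query = transcript

    answer = retrieve(query)
    spoken = " ".join(answer.split()[:max_words])  # keep the spoken reply short

    # Update context with this turn's keywords, preserving order.
    for kw in keywords:
        if kw not in session.context:
            session.context.append(kw)
    return query, spoken
```

A caller would create one `Session` per conversation and pass each transcript through `handle_turn`, supplying whatever retrieval function is available.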
Quality Control
Mishandled context produces baffling voice responses, so the patent specifies safeguards.
- Topic-Shift Threshold Calibration — The overlap threshold is tuned to balance continuation accuracy against shift detection. Set too low, stale context leaks into unrelated queries; set too high, genuine follow-ups lose their context and the conversation breaks.
- Confidence-Weighted Updates — Low-confidence transcripts contribute less to context updates. Voice recognition errors do not pollute the maintained state heavily.
- Idle Expiry — Context expires after an idle period. Conversations that pause and resume hours later start fresh rather than reusing stale state.
- Explicit User Reset — Users can verbally reset context ('start over'). Explicit reset overrides automatic detection.
- Voice Response Conciseness — Responses are bounded in length. Voice users prefer short responses; long ones lose engagement.
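Confidence-weighted updates can be sketched as a keyword-to-weight map. The additive weighting scheme, the `min_weight` cutoff of 0.5, and the function names are assumptions chosen for illustration; the source says only that low-confidence transcripts contribute less to the maintained context.

```python
def update_context(
    context: dict[str, float],
    recognized: list[tuple[str, float]],
) -> dict[str, float]:
    """Return a new keyword -> weight map after adding one turn's keywords.

    `recognized` pairs each keyword with its speech-recognition confidence
    in the range 0.0 to 1.0. Weights accumulate, so a keyword heard twice
    with modest confidence can still become trusted.
    """
    updated = dict(context)  # copy so the caller's map is not mutated
    for keyword, confidence in recognized:
        updated[keyword] = updated.get(keyword, 0.0) + confidence
    return updated


def trusted_keywords(context: dict[str, float], min_weight: float = 0.5) -> list[str]:
    """Keywords with enough accumulated confidence to use for augmentation."""
    return sorted(kw for kw, weight in context.items() if weight >= min_weight)


# "napoli" was recognized with low confidence, so it is held back at first.
ctx = update_context({}, [("pizzeria", 0.9), ("napoli", 0.3)])
print(trusted_keywords(ctx))
```

Under this scheme a likely misrecognition stays in the map but does not influence query augmentation until later turns corroborate it.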
Real-World Application
Conversational voice context underpins Google Assistant's multi-turn dialog handling, the voice interfaces in Chrome and Android, and the conversational layers feeding into Search Generative Experience.
- Multi-turn Conversation Model — Context persists across turns. Follow-ups resolve naturally without re-stating subject.
- Shift-aware State Management — Topic shifts reset context. Users can change subject without polluting state.
- Voice-formatted Output Style — Responses bounded in length and conversational in tone. Format adapts to voice interfaces.
Why Conversational Search Inherits These Primitives
Search Generative Experience follow-up handling and Assistant multi-turn dialogue both build on the keyword-context-persistence primitives this patent describes. Each turn carries state from prior turns, supporting natural conversation.
Why Voice Queries Reward Entity-First Content
Voice queries often invoke entities by name. Content with strong entity coverage and clear definitional structure surfaces well in voice answers. Voice-first SEO emphasizes entity clarity over keyword density.
What This Means for SEO
The patent maintains keyword context across multi-turn voice queries so follow-ups resolve without users restating prior context, resetting on detected topic shifts. SEO implication: conversational and assistant search rewards entity-clear, definitional content that answers follow-ups carried from prior turns.
- Voice Rewards Entity-First Content — Voice queries often invoke entities by name. Content with strong entity coverage and clear definitional structure surfaces well in voice answers. Voice-first SEO emphasizes entity clarity over keyword density.
- Conversational Surfaces Inherit This — Generative-search follow-up handling and Assistant multi-turn dialogue build on this keyword-context persistence. Content that answers a topic and its natural follow-ups positions you across the whole conversational thread, not just the opening query.
- Anticipate Follow-Up Questions — Each new turn is interpreted in light of prior turns. Pages that cover a primary question plus its likely follow-ups (the next things a user would ask) align with how context carries forward across turns.
- Format Answers For Voice — Answers are returned formatted for voice. Concise, spoken-friendly, self-contained answers are favored over long prose that does not read aloud well. Provide a crisp answer first, detail after.
- Topic Shifts Reset Context — The system resets context on detected topic shifts. Clearly delineated topics on a page help the system understand when a query belongs to a new context, so unambiguous topical boundaries aid correct routing.
- Persistent Context Favors Coherent Coverage — Because context persists, a single page that coherently covers an entity and its related sub-questions can serve multiple conversational turns. Coherent, connected coverage outperforms scattered single-answer pages here.
- Definitional Structure Wins Spoken Answers — Clear definitional language is easiest to surface as a voice answer. Leading with direct, factual statements about the entity makes your content the natural pick for the spoken response.