Noam Shazeer, Google Transformer, MoE & Search Patents

By NizamUdDeen · Updated January 1, 2026 · Reviewed by the Nizam SEO War Room editorial team.

First, the short version: a summary of the Noam Shazeer, Google Transformer, MoE & Search Patents track, followed by the patent list grouped by research area.

This track covers roughly 46 captured Google patents by Noam Shazeer; his full portfolio is estimated at 120-150. He is a co-inventor on the foundational Transformer attention architecture (US 10,452,978, with Vaswani, Polosukhin, Uszkoreit, Jones, Gomez, Kaiser, and Parmar), the Sparsely-Gated Mixture-of-Experts scaling approach (US 11,769,055, with Dean, Hinton, Le, and Mirhoseini), the Switch Transformer (US 12,093,829), and the Navboost implicit-feedback ranking family (US 8,661,029, with Kim, Tong, and Diligenti; cross-listed). The track also covers LLM-in-assistant response generation, distributed tensor computations (GSPMD/Pathways), and the 2008-2015 pre-Transformer large-scale ML ranking infrastructure. It spans 2008 to 2025.

About the Noam Shazeer, Google Transformer, MoE & Search Patents track

Transformer & Attention

Attention-Based Sequence Transduction Neural Networks (Transformer) (US 10,452,978 · October 22, 2019)
Attention-Based Sequence Transduction (2020 continuation) (US 10,719,764 · July 21, 2020)
Attention-Based Sequence Transduction (2021) (US 11,113,602 · September 7, 2021)
Attention-Based Sequence Transduction (2025) (US 12,217,173 · February 4, 2025)
Decoder-Only Transformer Architecture (US 12,354,005 · July 8, 2025)
Attention-Based Image Generation (US 12,142,034 · November 12, 2024)
Speech-Recognition Attention RNN (US 12,100,391 · September 24, 2024)
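
For readers who want a concrete picture of what "attention-based" means in the titles above, here is a minimal sketch of scaled dot-product attention as it is described in the public Transformer literature. It is an illustration only, not the claim language of any patent listed here; the function names and the toy inputs are assumptions made for this example.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1 (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(queries, keys, values):
    """For each query, return a weighted average of the value vectors.

    Weights come from the query-key dot products, scaled by sqrt(d_k).
    """
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        mixed = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(mixed)
    return outputs

# Toy inputs (hypothetical): two positions, two dimensions.
queries = [[1.0, 0.0], [0.0, 1.0]]
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
attended = scaled_dot_product_attention(queries, keys, values)
```

Each output row leans toward the value whose key best matches the query, which is the core idea the continuations in this group build on.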

Mixture of Experts & Sparse Scaling

Mixture of Experts Neural Networks (US 11,769,055 · September 26, 2023)
Mixture of Experts (2024) (US 12,067,476 · August 20, 2024)
Neural Networks with Switch Layers (Switch Transformer) (US 12,093,829 · September 17, 2024)
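
As a rough illustration of the sparse-scaling idea behind this group, the sketch below routes an input to only the top-k experts chosen by a gating function, as described in the public Mixture-of-Experts literature. It leaves out the noise and load-balancing terms from the published work, and every name, weight, and toy expert here is a hypothetical stand-in rather than anything taken from the patents.

```python
import math

def top_k_gate(logits, k):
    """Keep the k largest gate logits and softmax over just those."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in top)
    exps = {i: math.exp(logits[i] - peak) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}

def moe_forward(x, gate_weights, experts, k=2):
    """Run only the selected experts and mix their outputs by gate weight."""
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    gates = top_k_gate(logits, k)
    output = [0.0] * len(x)
    for index, gate in gates.items():
        expert_out = experts[index](x)
        output = [o + gate * y for o, y in zip(output, expert_out)]
    return output, gates

# Hypothetical experts: each is a simple function on a 2-d vector.
experts = [
    lambda x: [2.0 * v for v in x],
    lambda x: [-1.0 * v for v in x],
    lambda x: [v + 1.0 for v in x],
    lambda x: [0.0 for _ in x],
]
# Hypothetical gating matrix: one row of weights per expert.
gate_weights = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, -1.0]]

x = [3.0, 1.0]
output, gates = moe_forward(x, gate_weights, experts, k=2)
switch_output, switch_gates = moe_forward(x, gate_weights, experts, k=1)
```

With k=2 only two of the four experts run for this input; setting k=1 sends each input to a single expert, which is the routing style associated with the Switch Transformer.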

LLM-Driven Search & Assistant

Using Large Language Models in Generating Automated Assistant Responses (US 12,148,421 · November 19, 2024)
Evaluating Output Sequences Using an Auto-Regressive Language Model (US 12,086,713 · September 10, 2024)
Distributing Tensor Computations (Pathways / GSPMD) (US 12,265,903 · April 1, 2025)

Pre-Transformer Ranking & Navboost (cross-list)

Modifying Search Result Ranking Based on Implicit User Feedback (Navboost) (US 8,661,029 · February 25, 2014)
Ranking Documents Based on Large Data Sets (US 9,116,976 · August 25, 2015)
Model Generation for Ranking Documents (US 7,743,050 · June 22, 2010)
Equivalent Descriptions for an Information Need (US 7,392,244 · June 24, 2008)
Determining Geographical Relevance of Web Documents (US 8,086,690 · December 27, 2011)

Why this inventor matters

Each inventor track inside the Nizam SEO War Room patents archive isolates one engineer's research arc — typically a decade or more of continuations, divisionals, and follow-up patents on a coherent research thread. Reading by inventor (rather than by topic) recovers the narrative: how the original disclosure evolved, what the continuations added, which claims got carved out into divisional applications, and how the thread eventually intersected with other research lines at Google or Microsoft. This is how working SEOs build durable intuition about search-engine internals — not by memorizing claim language, but by following the research bibliography that shipped the algorithms we now optimize against.

How to read this track

Start with the earliest filing — it sets the foundational disclosure. Continuations refine the claims; divisional applications split out separable inventions; the follow-up patents tend to introduce performance optimizations, edge-case handling, or downstream integration with other systems. Each patent on this site is annotated with the ranking surface it touches — query understanding, document retrieval, ranking, behavioral signals, knowledge graph, or AI search — so the practitioner can map the research back to the algorithm output observed on live SERPs.
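
The reading order described above can be sketched as a small sort over part of the track. This is a minimal sketch that assumes the dates printed beside each patent number on this page are issue dates, which only approximate filing order; the `Patent` class and `reading_order` helper are hypothetical names introduced for this example, and the list holds only a subset of the track.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Patent:
    number: str
    title: str
    group: str
    issued: date  # assumed to be the issue date shown on this page

ATTENTION = "Transformer & Attention"
RANKING = "Pre-Transformer Ranking & Navboost"

# Subset of the track, copied from the lists on this page.
TRACK = [
    Patent("US 12,217,173", "Attention-Based Sequence Transduction (2025)", ATTENTION, date(2025, 2, 4)),
    Patent("US 10,452,978", "Attention-Based Sequence Transduction Neural Networks", ATTENTION, date(2019, 10, 22)),
    Patent("US 11,113,602", "Attention-Based Sequence Transduction (2021)", ATTENTION, date(2021, 9, 7)),
    Patent("US 10,719,764", "Attention-Based Sequence Transduction (2020 continuation)", ATTENTION, date(2020, 7, 21)),
    Patent("US 8,661,029", "Modifying Search Result Ranking Based on Implicit User Feedback", RANKING, date(2014, 2, 25)),
    Patent("US 9,116,976", "Ranking Documents Based on Large Data Sets", RANKING, date(2015, 8, 25)),
    Patent("US 7,743,050", "Model Generation for Ranking Documents", RANKING, date(2010, 6, 22)),
    Patent("US 7,392,244", "Equivalent Descriptions for an Information Need", RANKING, date(2008, 6, 24)),
    Patent("US 8,086,690", "Determining Geographical Relevance of Web Documents", RANKING, date(2011, 12, 27)),
]

def reading_order(patents, group=None):
    """Return patents oldest first, optionally limited to one group."""
    selected = [p for p in patents if group is None or p.group == group]
    return sorted(selected, key=lambda p: p.issued)

for patent in reading_order(TRACK, ATTENTION):
    print(patent.issued.isoformat(), patent.number, patent.title)
```

Sorting within a single group keeps a continuation chain together, so the original disclosure comes first and the later refinements follow it.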

For example, a working SEO consultant can draw on the Noam Shazeer patent track when diagnosing a ranking drop, planning a content calendar, or briefing a client on why a tactic's results shifted. The track is most useful when read alongside the surrounding entries in the encyclopedia and patents archive. The platform also connects it to live SERP data so the theory carries through to execution.

To summarize: the Noam Shazeer, Google Transformer, MoE & Search Patents track matters because these patents intersect directly with the signals that search engines and AI answer engines use to rank and surface results. The full article above covers the mechanisms, the patents behind them, and the related encyclopedia entries to read next.
