A graph-placement policy learns to lay out chip floorplans in hours instead of weeks. Seen from Search, it is the compute-economics mechanism that lets Google afford Transformer-scale ranking at web-search volume.
Patent Overview
- Inventors: Azalia Mirhoseini, Anna Goldie, others
- Assignee: Google LLC
- Filed: April 22, 2020
- Granted: October 18, 2022
The Challenge
Chip floorplanning is the step where macro blocks like memory and compute units are arranged on a die. Human engineers iterate for weeks, and every layout decision ripples through power, performance, and area. The challenge: produce floorplans of human quality or better in hours, so each new accelerator generation arrives sooner and lets the search stack absorb heavier ranking models per query.
- Floorplanning Bottlenecks New Silicon — Per design cycle, human macro placement takes weeks and gates the entire tape-out schedule for accelerator generations.
- Search Space Is Combinatorial — Per chip, the placement search space explodes faster than classical solvers can prune, so heuristics leave performance on the table.
- Quality Metrics Are Multi-Objective — Per layout, wirelength, congestion, and timing must all be optimized at once, and they trade off against each other in non-obvious ways.
- Transfer Across Designs Is Weak — Per new chip, traditional tools restart from scratch and cannot reuse what was learned from prior floorplans.
- Compute Cost Caps Model Size In Production — Per query served, slower silicon means fewer Transformer parameters can run in the ranking budget, which holds back ML-driven relevance.
Innovation
How The System Works
The system frames chip floorplanning as a reinforcement learning problem on a graph. A policy network reads the netlist as a graph, places macros sequentially on a grid, and receives a reward based on wirelength, congestion, and density. The policy learns from prior chips and transfers to new designs.
- Represent Netlist As Graph — Per chip, the netlist is encoded as a graph where nodes are macros and edges encode connectivity.
- Encode Graph With GNN — Per state, a graph neural network produces an embedding that captures placement-relevant structure.
- Policy Selects Next Macro Position — Per step, the policy network reads the embedding and chooses a grid cell for the next macro.
- Place Macros Sequentially — Per episode, macros are placed one after another until the floorplan is complete.
- Score With Reward Function — Per layout, the reward blends wirelength, congestion, and density into a single scalar signal.
- Update Policy By Gradient — Per training step, policy parameters are updated to favor placement sequences that produced higher reward.
- Transfer Across Chip Families — Per new design, the pre-trained policy generalizes from prior chips, so new accelerators ship faster.
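The steps above can be sketched in a few lines of Python to show the episode structure. This is a minimal sketch, not the patent's implementation: every name here (`build_graph`, `place_episode`, `greedy_policy`) is hypothetical, each macro is assumed to occupy exactly one grid cell, and a greedy nearest-neighbor heuristic stands in for the learned policy network so the loop runs without any training.

```python
import random


def build_graph(macros, nets):
    """Encode a netlist as an undirected weighted graph: macros are nodes, nets are edges.

    `nets` is a list of (src, dst, weight) tuples; parallel nets accumulate weight.
    """
    graph = {m: {} for m in macros}
    for src, dst, weight in nets:
        graph[src][dst] = graph[src].get(dst, 0.0) + weight
        graph[dst][src] = graph[dst].get(src, 0.0) + weight
    return graph


def greedy_policy(macro, graph, placement, free_cells, rng):
    """Hypothetical stand-in for the learned policy: pick the free cell that minimizes
    weighted Manhattan distance to neighbors that are already placed."""
    placed = [n for n in graph[macro] if n in placement]
    if not placed:
        return rng.choice(free_cells)  # nothing to anchor to yet

    def cost(cell):
        return sum(
            graph[macro][n] * (abs(cell[0] - placement[n][0]) + abs(cell[1] - placement[n][1]))
            for n in placed
        )

    return min(free_cells, key=cost)


def place_episode(graph, grid_size, policy, rng):
    """One episode: place every macro in turn on a free cell of a grid_size x grid_size grid."""
    free = {(r, c) for r in range(grid_size) for c in range(grid_size)}
    placement = {}
    # Assumed ordering for this sketch: most heavily connected macros first.
    order = sorted(graph, key=lambda m: -sum(graph[m].values()))
    for macro in order:
        cell = policy(macro, graph, placement, sorted(free), rng)
        placement[macro] = cell
        free.remove(cell)  # one macro per cell in this simplified model
    return placement


macros = ["sram0", "sram1", "mac_array", "dma"]
nets = [("sram0", "mac_array", 2.0), ("sram1", "mac_array", 2.0), ("dma", "sram0", 1.0)]
graph = build_graph(macros, nets)
layout = place_episode(graph, grid_size=4, policy=greedy_policy, rng=random.Random(0))
print(layout)
```

Swapping `greedy_policy` for a function that samples from a trained network's output is the only change the loop itself would need; the graph, the grid, and the one-macro-per-step structure stay the same.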
Layout As A Learned Policy, Not A Hand Craft
The patent's central idea is that chip placement is a sequential decision problem that a learned policy can solve. Once a policy has seen enough netlists, it produces layouts of human quality or better in a fraction of the time manual floorplanning takes.
Graph-Conditioned Placement Policy
Per chip, the netlist graph conditions a placement policy. Per macro, the policy chooses a location that respects connectivity, density, and timing.
- Graph Encoding — Per netlist, a graph neural network captures structural context.
- Reinforcement Learning — Per episode, policy parameters update by reward gradient.
- Cross-Design Transfer — Per new chip, the pre-trained policy starts from learned weights instead of from scratch.
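The reinforcement-learning bullet can be made concrete with REINFORCE, the textbook policy-gradient method. This is a hedged sketch, not the patent's training setup: it assumes a tabular policy (one logit per macro and grid cell) in place of the GNN-conditioned network, two-pin nets, wirelength as the only reward term, and a moving-average baseline. All function names are hypothetical.

```python
import math
import random


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def wirelength(placement, nets):
    """Weighted Manhattan length of two-pin nets, in grid units."""
    return sum(
        w * (abs(placement[a][0] - placement[b][0]) + abs(placement[a][1] - placement[b][1]))
        for a, b, w in nets
    )


def reinforce(macros, nets, grid_size, episodes=2000, lr=0.1, seed=0):
    rng = random.Random(seed)
    cells = [(r, c) for r in range(grid_size) for c in range(grid_size)]
    # Tabular logits for this sketch; in the patent's framing they would come from the graph encoder.
    logits = {m: [0.0] * len(cells) for m in macros}
    baseline = 0.0
    for _ in range(episodes):
        taken, placement, trajectory = set(), {}, []
        for m in macros:
            valid = [i for i in range(len(cells)) if i not in taken]  # mask occupied cells
            probs = softmax([logits[m][i] for i in valid])
            k = rng.choices(range(len(valid)), weights=probs)[0]
            trajectory.append((m, valid, probs, k))
            taken.add(valid[k])
            placement[m] = cells[valid[k]]
        reward = -wirelength(placement, nets)
        advantage = reward - baseline
        baseline += 0.05 * advantage  # moving-average baseline lowers gradient variance
        for m, valid, probs, k in trajectory:
            for j, i in enumerate(valid):
                # d log pi(action) / d logit_i = 1[i is the action] - pi_i over unmasked cells
                logits[m][i] += lr * advantage * ((1.0 if j == k else 0.0) - probs[j])
    return logits, cells


logits, cells = reinforce(["a", "b", "c"], [("a", "b", 1.0), ("b", "c", 1.0)], grid_size=3)
```

Because the per-cell gradient terms sum to zero over the legal cells, each update only shifts probability mass between cells; placements that scored above the baseline gain mass, and placements below it lose mass.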
Technical Foundation
The patent specifies graph representation, GNN encoding, sequential placement, reward computation, policy gradient training, and transfer learning across chip families.
- Graph Representation Of Netlists — Per chip, nodes carry macro attributes and edges carry connectivity weights.
- Graph Neural Network Encoder — Per state, message passing produces an embedding that summarizes placement context.
- Sequential Action Space — Per step, the policy selects a discrete grid cell for the next macro.
- Multi-Objective Reward — Per layout, wirelength, congestion, and density combine into a scalar reward.
- Policy Gradient Training — Per batch, gradients update the policy to favor higher-reward placement sequences.
- Cross-Design Transfer Learning — Per new chip family, prior policy weights initialize the new training run so convergence is faster.
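The graph neural network encoder can be approximated with a generic message-passing layer. This sketch swaps in a simple mean-aggregation layer, not the specific architecture the patent claims: each round, a node's vector is updated from its own state plus the connectivity-weighted mean of its neighbors' vectors, and the node vectors are then mean-pooled into one netlist embedding. The feature layout (normalized area, is-placed flag), the weight matrices, and the function names are hypothetical; a trained system would learn the weights.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def relu(v):
    return [max(0.0, x) for x in v]


def message_passing(graph, features, W_self, W_neigh, rounds=2):
    """h_v <- relu(W_self h_v + W_neigh * weighted_mean(h_u for u in N(v))), repeated `rounds` times.

    W_self and W_neigh are square (d x d), so the embedding size stays d every round.
    Edge weights are assumed positive.
    """
    h = dict(features)
    for _ in range(rounds):
        new_h = {}
        for v, nbrs in graph.items():
            dim = len(h[v])
            agg = [0.0] * dim
            total = sum(nbrs.values())
            for u, w in nbrs.items():
                for i in range(dim):
                    agg[i] += w * h[u][i] / total  # only reached when nbrs is non-empty
            new_h[v] = relu([a + b for a, b in zip(matvec(W_self, h[v]), matvec(W_neigh, agg))])
        h = new_h
    return h


def pool(h):
    """Mean-pool node embeddings into a single vector describing the whole netlist."""
    vecs = list(h.values())
    return [sum(col) / len(vecs) for col in zip(*vecs)]


graph = {
    "sram0": {"mac_array": 2.0, "dma": 1.0},
    "sram1": {"mac_array": 2.0},
    "mac_array": {"sram0": 2.0, "sram1": 2.0},
    "dma": {"sram0": 1.0},
}
# Hypothetical features per macro: [normalized area, 1.0 if already placed else 0.0]
features = {"sram0": [0.4, 0.0], "sram1": [0.4, 0.0], "mac_array": [0.9, 1.0], "dma": [0.1, 0.0]}
W_self = [[0.6, 0.0], [0.0, 0.6]]
W_neigh = [[0.4, 0.1], [0.0, 0.4]]
emb = pool(message_passing(graph, features, W_self, W_neigh))
print(emb)
```

Each extra round lets information travel one more hop through the netlist, so after two rounds a macro's vector reflects its neighbors' neighbors; that is the "placement context" the policy reads.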
The Process
When a new netlist reaches the placement stage, the system encodes it as a graph, runs the trained policy to place macros, scores the floorplan, and either ships it to downstream tooling or refines it further.
- Receive Netlist — Per chip, a netlist arrives with macros, ports, and connectivity defined.
- Build Graph Representation — Per netlist, nodes and edges are constructed and attributes are attached.
- Run Graph Encoder — Per state, the GNN produces an embedding for the policy.
- Sequential Macro Placement — Per step, the policy selects a grid cell and the macro is placed.
- Score Completed Layout — Per floorplan, wirelength, congestion, and density are computed.
- Hand Off To Downstream EDA — Per accepted layout, standard tools handle detailed routing and signoff.
- Feed Back Into Training — Per shipped chip, the layout and reward extend the dataset for future policy updates.
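The scoring step can be sketched as a weighted sum of three cheap proxies. This is an illustrative sketch under stated assumptions, not the patent's exact formula: wirelength is half-perimeter wirelength (HPWL) over each net's bounding box, congestion is a RUDY-style estimate that spreads each net's wirelength evenly over the grid cells its bounding box covers, density is the macro area that overflows a per-cell capacity, and the weights `lam_cong` and `lam_dens` are hypothetical tuning knobs. Coordinates are grid cells.

```python
from collections import defaultdict


def net_bbox(placement, pins):
    rows = [placement[p][0] for p in pins]
    cols = [placement[p][1] for p in pins]
    return min(rows), max(rows), min(cols), max(cols)


def wirelength(placement, nets):
    """Half-perimeter wirelength (HPWL) in grid units, summed over all nets."""
    total = 0
    for pins in nets:
        r0, r1, c0, c1 = net_bbox(placement, pins)
        total += (r1 - r0) + (c1 - c0)
    return total


def congestion(placement, nets):
    """RUDY-style proxy: spread each net's HPWL over its bounding box; return the peak cell demand."""
    demand = defaultdict(float)
    for pins in nets:
        r0, r1, c0, c1 = net_bbox(placement, pins)
        cells = (r1 - r0 + 1) * (c1 - c0 + 1)
        per_cell = ((r1 - r0) + (c1 - c0)) / cells
        for r in range(r0, r1 + 1):
            for c in range(c0, c1 + 1):
                demand[(r, c)] += per_cell
    return max(demand.values(), default=0.0)


def density_overflow(placement, areas, capacity):
    """Total macro area exceeding the per-cell capacity, summed over occupied cells."""
    used = defaultdict(float)
    for macro, cell in placement.items():
        used[cell] += areas[macro]
    return sum(max(0.0, a - capacity) for a in used.values())


def reward(placement, nets, areas, capacity, lam_cong=0.5, lam_dens=1.0):
    """Scalar reward: negative weighted sum of the three objectives (weights are hypothetical)."""
    return -(wirelength(placement, nets)
             + lam_cong * congestion(placement, nets)
             + lam_dens * density_overflow(placement, areas, capacity))


placement = {"sram0": (0, 0), "mac_array": (0, 1), "sram1": (0, 2), "dma": (1, 0)}
nets = [("sram0", "mac_array"), ("sram1", "mac_array"), ("dma", "sram0")]
areas = {m: 1.0 for m in placement}
print(reward(placement, nets, areas, capacity=1.0))  # → -3.5
```

Here HPWL is 3, the peak congestion is 1.0 (cells (0, 0) and (0, 1) each carry two half-nets), and no cell overflows, so the reward is -(3 + 0.5 × 1.0 + 0). Stacking `dma` onto `sram0`'s cell would add 1.0 of density overflow, which is how the penalty discourages infeasible clusters.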
Quality Control
Learned floorplanning introduces risks around timing closure, congestion hot spots, and overfitting to seen designs. The patent specifies safeguards to keep layouts production-ready.
- Multi-Objective Reward Balance — Per layout, no single objective dominates, so the policy cannot game one metric at the expense of another.
- Density Constraints — Per grid cell, density limits prevent the policy from clustering macros into infeasible regions.
- Congestion Estimation — Per layout, routing congestion is estimated and penalized so downstream routing remains feasible.
- Generalization Holdouts — Per training cycle, held-out chip families verify the policy did not overfit to specific designs.
- Human Review Hook — Per shipped floorplan, engineers can inspect, override, and refine before signoff.
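One way to enforce the density safeguard is an action mask: before the policy samples a cell, any cell that the next macro would push past the density limit is removed from the distribution. A minimal sketch, assuming each grid cell tracks used area as a fraction of a hypothetical `capacity` and that the policy emits one logit per cell; `legal_mask` and `masked_softmax` are illustrative names, not taken from the patent.

```python
import math


def legal_mask(used, macro_area, capacity):
    """True for each cell that can absorb the macro without exceeding the density limit."""
    return [u + macro_area <= capacity for u in used]


def masked_softmax(logits, mask):
    """Softmax over legal cells only; illegal cells get probability exactly 0."""
    legal = [x for x, ok in zip(logits, mask) if ok]
    if not legal:
        raise ValueError("no legal cell: the density limit cannot be met")
    m = max(legal)
    exps = [math.exp(x - m) if ok else 0.0 for x, ok in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


used = [0.8, 0.2, 0.0, 0.6]   # fraction of each cell already occupied
logits = [5.0, 0.0, 0.0, 1.0]  # the policy's raw preference per cell
mask = legal_mask(used, macro_area=0.5, capacity=1.0)
probs = masked_softmax(logits, mask)
print(mask, probs)  # → [False, True, True, False] [0.0, 0.5, 0.5, 0.0]
```

Even though the policy strongly prefers cell 0, the mask makes that choice impossible, so the constraint holds by construction rather than relying on the reward penalty alone.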
Real-World Application
The work shipped in production TPU design at Google. Each accelerator generation that arrives faster expands the ML budget that ranking, retrieval, and language understanding can spend per query. The same compute lift that lets a recommender run a larger model also lets Search run a heavier Transformer in the ranking stack.
- Hours, Not Weeks — RL placement compresses floorplan latency for each new chip generation.
- Multi-Objective Reward — Wirelength, congestion, and density combine into one signal.
- Cross-Design Transfer — Pre-trained policies start ahead on new chips.
Why Chip Design Is A Search Problem
Per accelerator generation, faster TPUs translate directly into more model parameters per query in production. The ranking stack absorbs the lift, which means smarter relevance reaches the SERP without raising serving cost.
Why Compute Economics Shape The Ranking Era
Per query, the model size that can be run inside the ranking budget is gated by how cheap the underlying hardware is. Better floorplans lower the cost per inference, which is the lever that lets Transformer-scale ranking become the default rather than the exception.
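The budget argument can be written as one inequality. This is a back-of-the-envelope sketch, not a figure from the patent: it assumes a fixed per-query compute budget $B$, a hardware cost $c$ per floating-point operation, $T$ tokens scored per query, and the common rule of thumb that a Transformer forward pass costs about $2N$ FLOPs per token for a model with $N$ parameters.

$$
2\,N\,T\,c \;\le\; B \quad\Longrightarrow\quad N_{\max} = \frac{B}{2\,T\,c}
$$

Under these assumptions the affordable model size scales as $1/c$: halving the cost per FLOP doubles $N_{\max}$ at the same budget and the same number of scored tokens.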
What This Means for SEO
Chip placement is not where SEO normally looks, but it is upstream of every ranking model that runs at Google scale. The economics of inference set the ceiling on how much ML can be applied per query, which sets the ceiling on how nuanced ranking can be.
- Compute Cost Sets The Ranking Model Size — Per query served, the ranking stack runs only as much model as the inference budget allows. Faster, cheaper TPUs raise that budget, which is why each accelerator generation widens the gap between simple retrieval and full neural relevance. Plan content for an ML-heavy ranker that will only grow heavier.
- Cheaper Inference Means More Pages Get Neural Treatment — When ranking is expensive, only the head queries and high-value documents receive the heaviest treatment. As inference gets cheaper, the long tail of queries and documents starts receiving the same neural scoring. Tail-content quality matters because the long tail is now in scope for neural ranking, not only lexical matching.
- Embedding-Based Retrieval Spreads Down The Stack — Lower inference cost lets embedding-based retrieval and cross-encoder rerankers handle more queries per second. Pages that read cleanly to a language model, not only to a lexical index, are favored because the lexical-only fast path shrinks as compute gets cheaper.
- Multimodal Ranking Becomes Affordable — Image, video, and audio understanding require heavy compute per query. As chip generations compress that cost, multimodal signals enter mainstream ranking. Pages with first-class images, captions, and video transcripts gain leverage that pure text pages do not.
- Generative SERP Features Run On The Same Substrate — AI Overviews, generative answers, and on-SERP summarization run on TPU inference. Every floorplan improvement that lowers cost per token raises the volume of queries that trigger generative surfaces, which changes the click landscape on those queries. Content that answers and earns citation inside the generated block keeps visibility.
- Personalization Latency Drops — Personalization per user requires running ranking models with extra context. Cheaper inference reduces the latency penalty of that extra context, which means more personalization arrives per query. Audience-fit and persona signals carry more weight when the system can afford to apply them at scale.
- The Frontier Of Relevance Is Set By Hardware Economics — Per ranking generation, what becomes the default is gated by what becomes affordable. The neural ranking era is downstream of the chip design era. SEO strategy should assume the ceiling keeps rising because the substrate keeps getting cheaper, not assume it has plateaued.