Identifies canonical resources by combining organic-content signals with structured-data markup. The patent brings Schema.org markup into resource identification and puts structured-data signals to work at the retrieval layer.
Patent Overview
- Inventors
- Trystan G. Upstill, Jack W. Menzel
- Assignee
- Google LLC
- Filed
- 2013
- Granted
- 2017-03-07
The Challenge
Resource identification (matching queries to canonical pages) has traditionally relied on organic content signals. Structured-data markup (Schema.org, microdata, RDFa) adds a further signal because it carries explicit entity and content-type assertions. Combining organic and structured signals improves identification.
- Organic-Only Identification Misses Markup Signal — Pages with rich structured-data markup carry stronger entity claims than their organic content alone reveals.
- Structured-Only Identification Misses Coverage — Many pages lack structured-data markup, so identification based on markup alone has poor coverage.
- Combination Improves Both Coverage And Precision — For each resource, organic and structured signals together identify it better than either does alone.
- Schema.org Becomes A First-Class Signal — Structured markup is a structural input to identification, not just metadata.
- Manipulation Resistance Required — Misleading or manipulated markup must be detected.
Innovation
How The System Works
For each resource, the system extracts organic-content signals and structured-data markup. It then validates the markup, combines the signals into a resource-identification score, and applies that score in retrieval and ranking.
- Extract Organic Signals — Organic content signals are extracted for each resource.
- Extract Structured Markup — Schema.org, microdata, and RDFa markup is extracted for each resource.
- Validate Markup — Each markup instance is validated against its schema and checked for consistency with the page content.
- Combine Signals — Organic and structured signals are combined into a per-resource identification score.
- Apply In Retrieval — For each query, retrieval uses the combined identification signal.
- Apply In Ranking — The combined score modulates each resource's ranking.
- Detect Manipulation — Markup-manipulation patterns are flagged per resource.
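The two extraction steps above can be illustrated with a minimal Python sketch. It makes two assumptions. First, structured markup arrives as JSON-LD inside `<script type="application/ld+json">` blocks; microdata and RDFa parsing are omitted. Second, "organic signals" reduce to a bag of visible-text terms. `PageSignals` and `extract_signals` are hypothetical names, and the patent summary does not describe its extractors at this level of detail.

```python
import json
import re
from collections import Counter
from dataclasses import dataclass, field
from html.parser import HTMLParser


@dataclass
class PageSignals:
    """Hypothetical container for one resource's extracted signals."""
    organic_terms: Counter = field(default_factory=Counter)
    markup: list = field(default_factory=list)  # parsed JSON-LD objects


class _SignalParser(HTMLParser):
    """Separates visible text from JSON-LD script bodies."""

    def __init__(self) -> None:
        super().__init__()
        self.text_parts: list[str] = []
        self.jsonld_parts: list[str] = []
        self._in_jsonld = False
        self._in_other_script = False  # <script>/<style> that is not JSON-LD

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self.jsonld_parts.append("")
        elif tag in ("script", "style"):
            self._in_other_script = True

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._in_jsonld = False
            self._in_other_script = False

    def handle_data(self, data):
        if self._in_jsonld:
            self.jsonld_parts[-1] += data
        elif not self._in_other_script:
            self.text_parts.append(data)


def extract_signals(html_text: str) -> PageSignals:
    """Extract a bag of visible-text terms and any parseable JSON-LD objects."""
    parser = _SignalParser()
    parser.feed(html_text)
    parser.close()
    text = " ".join(parser.text_parts).lower()
    terms = Counter(re.findall(r"[a-z0-9]+", text))
    markup = []
    for raw in parser.jsonld_parts:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed markup contributes nothing
        # A JSON-LD block may hold one object or a list of objects.
        markup.extend(data if isinstance(data, list) else [data])
    return PageSignals(organic_terms=terms, markup=markup)


page = """<html><head>
<script type="application/ld+json">{"@type": "Product", "name": "Acme Anvil"}</script>
</head><body><h1>Acme Anvil</h1><p>A heavy anvil.</p></body></html>"""
signals = extract_signals(page)
print(signals.markup)                  # [{'@type': 'Product', 'name': 'Acme Anvil'}]
print(signals.organic_terms["anvil"])  # 2
```

The JSON-LD body is kept out of the organic terms, so a word that appears only in the markup (here `Product`) does not count as organic content.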
Structured Markup Joins Organic At The Retrieval Layer
The patent's central idea is that Schema.org and other structured-data markup join organic content signals at the resource-identification layer. The combination yields richer identification than either signal alone.
Combined Signals Improve Identification
For each resource, organic and structured signals are combined. Schema.org becomes a first-class retrieval signal, not just metadata for SERP features.
- Organic Signal Extraction — Organic-content signals are extracted for each resource.
- Structured Markup Extraction — Schema.org, microdata, and RDFa markup is extracted for each resource.
- Combined Identification — The combined signals produce a richer resource-identification score.
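The combination step can be illustrated with a minimal linear blend. This is a sketch under stated assumptions, not the patent's formula. Both scores are assumed to be normalized to [0, 1]. `structured` is `None` when a resource has no markup or its markup failed validation. The 0.3 weight and the function name are hypothetical.

```python
from typing import Optional


def combined_identification_score(
    organic: float,
    structured: Optional[float],
    structured_weight: float = 0.3,  # hypothetical weight, not from the patent
) -> float:
    """Blend organic and structured scores, both assumed to lie in [0, 1].

    A resource without usable markup keeps its organic score unchanged,
    so pages that lack markup are still identifiable (coverage).
    """
    if not 0.0 <= organic <= 1.0:
        raise ValueError("organic score must be in [0, 1]")
    if structured is None:
        return organic
    if not 0.0 <= structured <= 1.0:
        raise ValueError("structured score must be in [0, 1]")
    return (1.0 - structured_weight) * organic + structured_weight * structured


print(combined_identification_score(0.6, None))            # 0.6 (organic only)
print(round(combined_identification_score(0.6, 1.0), 2))   # 0.72 (markup lifts it)
```

Falling back to the organic score keeps pages without markup on the same scale as pages that have it, which reflects the coverage problem noted under The Challenge. A low structured score still pulls the result down, which is one way inaccurate markup becomes a liability rather than a neutral addition.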
Technical Foundation
The patent specifies the organic extractor, markup extractor, validator, combiner, retrieval integrator, and manipulation detector.
- Organic Extractor — Extracts organic content signals for each resource.
- Markup Extractor — Extracts Schema.org, microdata, and RDFa markup for each resource.
- Validator — Validates each markup instance against its schema and checks its consistency with the content.
- Combiner — Combines organic and structured signals into a per-resource identification score.
- Retrieval Integrator — Makes retrieval use the combined signal for each query.
- Manipulation Detector — Flags markup manipulation per resource.
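The Validator's schema check can be sketched as a per-type lookup of properties that must be present and non-empty. Schema.org defines types and properties without marking any property as strictly required, so the `REQUIRED_PROPERTIES` table below is a hypothetical policy and not part of the vocabulary or the patent. For simplicity, multi-valued `@type` arrays are rejected. The content-consistency half of validation is not shown.

```python
from typing import Any

# Hypothetical validation policy: properties a markup instance of each type
# must carry before it is trusted. Not taken from the patent or Schema.org.
REQUIRED_PROPERTIES: dict[str, set[str]] = {
    "Product": {"name"},
    "Article": {"headline"},
    "Organization": {"name"},
}


def validate_markup(item: Any) -> list[str]:
    """Return a list of problems; an empty list means the instance passes."""
    if not isinstance(item, dict):
        return ["markup instance is not an object"]
    item_type = item.get("@type")
    if not isinstance(item_type, str):
        return ["missing or non-string @type"]
    required = REQUIRED_PROPERTIES.get(item_type)
    if required is None:
        return [f"unsupported @type: {item_type}"]
    problems = []
    for prop in sorted(required):
        value = item.get(prop)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"{item_type} is missing required property '{prop}'")
    return problems


print(validate_markup({"@type": "Product", "name": "Acme Anvil"}))  # []
print(validate_markup({"@type": "Product"}))
# ["Product is missing required property 'name'"]
```

Returning a list of problems rather than a bare boolean leaves room for a confidence score later, for example by down-weighting instances with minor problems instead of discarding them.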
The Process
Extraction and validation run at indexing time. Retrieval and ranking use the combined signal at query time.
- Index Resource — Organic signals and markup are extracted for each resource.
- Validate Markup — Validation runs on each markup instance.
- Combine Signals — A combined identification score is produced for each resource.
- Receive Query — A query arrives.
- Retrieve With Combined Signal — Retrieval uses the combined signal.
- Rank — Ranking uses the combined signal.
- Filter Manipulation — Manipulated markup is flagged.
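The split between indexing time and query time can be sketched with a toy in-memory index. The score is fixed once per resource when it is indexed, and a query only reads it. `ToyIndex`, `index_resource`, and the term-overlap retrieval are hypothetical simplifications rather than the patent's data structures. Handling flagged markup by falling back to the organic score is an assumed policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexedResource:
    url: str
    terms: frozenset  # lowercased visible-text terms
    score: float      # identification score fixed at indexing time


def index_resource(url: str, text: str, organic_score: float,
                   combined_score: float, markup_flagged: bool) -> IndexedResource:
    """Indexing time: choose which score to store.

    If the markup was flagged as manipulated, it is ignored and the
    resource keeps only its organic score (an assumed policy).
    """
    score = organic_score if markup_flagged else combined_score
    return IndexedResource(url, frozenset(text.lower().split()), score)


class ToyIndex:
    """In-memory stand-in for an index; not a real retrieval system."""

    def __init__(self) -> None:
        self._resources: list[IndexedResource] = []

    def add(self, resource: IndexedResource) -> None:
        self._resources.append(resource)

    def search(self, query: str, k: int = 10) -> list[str]:
        """Query time: retrieve by term overlap, then rank by overlap and
        the stored identification score. Nothing is recomputed here."""
        query_terms = set(query.lower().split())
        hits = []
        for r in self._resources:
            overlap = len(query_terms & r.terms)
            if overlap:
                hits.append((overlap, r.score, r.url))
        hits.sort(key=lambda h: (h[0], h[1]), reverse=True)
        return [url for _, _, url in hits[:k]]


index = ToyIndex()
index.add(index_resource("https://a.example/anvil", "acme anvil steel", 0.6, 0.69, False))
index.add(index_resource("https://b.example/anvil", "cheap anvil", 0.6, 0.72, True))
print(index.search("anvil"))  # ['https://a.example/anvil', 'https://b.example/anvil']
```

Without the manipulation flag, the second page would rank first on its higher combined score. Once its markup is ignored, it drops back to its organic score.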
Quality Control
Incorrect markup corrupts identification, so the patent specifies safeguards.
- Markup Validation — Each markup instance is validated against its schema and the page content.
- Consistency Check — Markup must align with the organic content.
- Manipulation Detection — Misleading markup is flagged.
- Confidence-Weighted Combination — Each signal's contribution to the combination is weighted by its confidence.
- Continuous Recalibration — Combination weights are refreshed over time.
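Confidence-weighted combination can be sketched as a weighted average in which each signal's influence is proportional to its confidence. The following are assumptions for illustration: the `Signal` and `blend` names, the [0, 1] ranges, and the choice of a normalized average. The summary does not give the patent's actual weighting. In this sketch, recalibration would amount to updating the confidence values, which is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    score: float       # assumed normalized to [0, 1]
    confidence: float  # assumed in [0, 1]; 0 means "ignore this signal"


def blend(signals: list[Signal]) -> float:
    """Confidence-weighted average of signal scores.

    A low-confidence markup signal barely moves the result, and a
    zero-confidence one (e.g. markup flagged as manipulated) has no effect.
    """
    total_weight = sum(s.confidence for s in signals)
    if total_weight == 0:
        return 0.0
    return sum(s.score * s.confidence for s in signals) / total_weight


organic = Signal(score=0.5, confidence=1.0)
print(blend([organic, Signal(score=1.0, confidence=1.0)]))  # 0.75 (trusted markup)
print(blend([organic, Signal(score=1.0, confidence=0.0)]))  # 0.5 (flagged markup)
```

Normalizing by total confidence keeps the result in [0, 1] no matter how many signals take part.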
Real-World Application
Combined organic-and-structured identification underpins Schema.org integration in Google Search. The pattern of treating structured markup as a first-class retrieval signal also informs entity-aware retrieval systems more broadly.
- Combined Signal Pattern — Organic and structured signals are combined.
- Validated Markup Quality — Markup is validated against its schema and the page content.
- Retrieval And Ranking Scope — The combined signal applies in both retrieval and ranking.
Why Schema.org Markup Compounds Discovery
For each resource, structured markup contributes to the combined identification signal. Pages with comprehensive, validated Schema.org markup surface more reliably and gain identification-layer benefits that organic content alone cannot provide.
Why Consistency Between Markup And Content Matters
Validation requires each markup instance to be consistent with the page content. Markup that claims things the content does not support fails validation and contributes nothing. Worse, it may be flagged as manipulation. Accurate markup is therefore the structural requirement.
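One simple way to approximate the consistency requirement is to check whether each string value asserted in the markup also appears in the page's visible text. The sketch below does this and treats the fraction of supported values as a consistency ratio. The whole-word phrase test, the ASCII-only normalization, the 0.8 threshold, and the function names are assumptions for illustration. A real system would need far more robust matching of numbers, dates, and paraphrases.

```python
import re


def _normalize(text: str) -> str:
    """Lowercase and keep only ASCII alphanumeric runs, joined by single spaces."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def consistency_ratio(markup: dict, visible_text: str) -> float:
    """Fraction of the markup's top-level string values (ignoring "@" keys)
    that appear as whole-word phrases in the visible text.

    Returns 1.0 when the markup asserts no checkable string values.
    """
    page = f" {_normalize(visible_text)} "
    values = [
        _normalize(v)
        for k, v in markup.items()
        if not k.startswith("@") and isinstance(v, str) and _normalize(v)
    ]
    if not values:
        return 1.0
    supported = sum(1 for v in values if f" {v} " in page)
    return supported / len(values)


def is_consistent(markup: dict, visible_text: str, threshold: float = 0.8) -> bool:
    """Hypothetical pass/fail rule: enough asserted values must be supported."""
    return consistency_ratio(markup, visible_text) >= threshold


page_text = "Acme Anvil. A 50 kg steel anvil, in stock."
honest = {"@type": "Product", "name": "Acme Anvil", "material": "steel"}
inflated = {"@type": "Product", "name": "Acme Anvil", "award": "Best Anvil 2024"}
print(consistency_ratio(honest, page_text))    # 1.0
print(consistency_ratio(inflated, page_text))  # 0.5
```

The inflated markup claims an award the page never mentions, so half of its assertions are unsupported and it fails the threshold.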
What This Means for SEO
Resource identification combines organic-content signals with structured-data markup (Schema.org, microdata, RDFa), with markup validated against content for consistency. SEO implication: comprehensive, accurate Schema.org markup is a first-class retrieval signal, but only when it matches the page.
- Schema.org Markup Is First-Class Signal — Structured markup joins organic content at the identification layer, not as mere metadata. Comprehensive, validated Schema.org markup helps your pages be identified and surfaced more reliably than organic content alone. Mark up your key entities.
- Markup Must Match Content — Validation requires content consistency, and markup claiming things the content does not support contributes nothing or is flagged as manipulation. Only mark up what your page genuinely contains. Accuracy is the requirement, not coverage for its own sake.
- Combination Improves Coverage And Precision — Organic-plus-structured signals together beat either alone. Pages with both rich content and accurate markup get the most reliable identification. Pair strong content with structured data rather than relying on one.
- Markup Strengthens Entity Claims — Structured data carries explicit entity and content-type assertions that are stronger than what organic text alone reveals. For entities, products, and articles, markup lets you state claims the system can use directly. Make your entity identity explicit.
- Pure Markup Without Content Lacks Coverage — Structured-only identification has poor coverage because many pages lack markup and markup alone is thin. Markup amplifies good content; it does not substitute for it. Keep the underlying content strong.
- Manipulation Resistance Penalizes Overreach — Markup manipulation is detected. Aggressive or false markup is a liability, not a hack. Stay within accurate, supported assertions to keep your structured signal counting in your favor.
- Validated Markup Compounds Discovery — Pages with comprehensive, validated markup gain identification-layer benefits that organic content alone cannot match. Systematically marking up your important content types is a durable discoverability investment as search becomes more entity-aware.
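One practical way to keep markup matched to content is to render the visible page and its JSON-LD from the same record, so the markup cannot assert anything the page does not show. The sketch below uses only Python's standard library. `ArticleRecord` and `render_article` are hypothetical names, while `headline`, `author`, and `datePublished` are genuine Schema.org `Article` properties. Which properties a given page needs is a separate decision.

```python
import html
import json
from dataclasses import dataclass


@dataclass
class ArticleRecord:
    """Hypothetical CMS record that drives both the visible page and its markup."""
    headline: str
    author: str
    date_published: str  # ISO 8601 date string, e.g. "2024-05-01"
    body: str


def render_article(article: ArticleRecord) -> str:
    """Render the visible HTML and the JSON-LD from the same fields."""
    jsonld = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": article.headline,
        "author": {"@type": "Person", "name": article.author},
        "datePublished": article.date_published,
    }
    # Escape "<" so a value such as "</script>" cannot close the script block early.
    payload = json.dumps(jsonld).replace("<", "\\u003c")
    return "\n".join([
        f'<script type="application/ld+json">{payload}</script>',
        f"<h1>{html.escape(article.headline)}</h1>",
        f"<p>By {html.escape(article.author)}, {html.escape(article.date_published)}</p>",
        f"<p>{html.escape(article.body)}</p>",
    ])


record = ArticleRecord("Choosing an Anvil", "Ada Smith", "2024-05-01", "Weight matters most.")
print(render_article(record))
```

Because the headline, author, and date appear in both outputs from one source, an edit to the record updates the page and its markup together, and the two cannot drift apart.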