Assesses web page staleness by examining internal date references and other temporal indicators within page content. This IBM patent adds a per-page decay signal that complements the Dean/Haahr freshness patent families.
Patent Overview
- Inventor
- Andrei Broder
- Assignee
- International Business Machines Corporation
- Filed
- 2007
- Published
- 2008-04-24
The Challenge
Pages decay over time. Update velocity alone misses the per-page decay signal: pages that haven't been updated may still be evergreen, while pages with recent timestamps may carry stale content behind cosmetic refreshes. The system needs to read decay directly from page content.
- Update Velocity Doesn't Equal Freshness — Pages that update timestamps without changing content look fresh by velocity even though their content is stale.
- Internal Date References Carry Signal — Pages reference dates internally: when described events occurred, when cited sources were published. These references reveal content age.
- Some Topics Decay Faster Than Others — News decays in days; reference content decays in years. Per-topic decay rates differ.
- Decay Assessment Must Scale — Decay assessment runs for every page at indexing time, so extraction and scoring must be fast.
- Freshness Plus Decay Together — Update velocity (Dean's content-update family) plus per-page decay (this patent) together produce a richer per-document freshness signal.
Innovation
How The System Works
The system extracts internal date references from page content, identifies their temporal context (event dates, cited-source dates, currency markers), computes a per-page decay score, and feeds that score into ranking alongside update-velocity signals.
- Extract Internal Date References — Per page, NLP identifies dates in text (event dates, citation dates, currency markers).
- Classify Temporal Context — Per date, classify context: event reference, citation date, page-publish date, currency marker.
- Compute Decay Score — Per page, aggregate the classified dates into a decay score. Older internal references increase decay.
- Apply Per-Topic Decay Rate — Per topic, apply a decay-rate multiplier. News decays fast; reference content decays slowly.
- Combine With Update-Velocity Signal — Per page, the decay score is combined with the update-velocity signal into a composite freshness score.
- Apply In Ranking — The composite freshness signal modulates the ranking score.
- Refresh At Crawl — The decay score is recomputed on each crawl.
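The Compute Decay Score and Apply Per-Topic Decay Rate steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's method. The description gives no formula, so the exponential decay curve, the base rate, the topic multipliers, and the per-context weights are all hypothetical values chosen for readability.

```python
import math
from dataclasses import dataclass
from datetime import date

# Hypothetical parameters: the patent description gives no numeric values.
BASE_RATE_PER_YEAR = 1.0
TOPIC_MULTIPLIER = {"news": 50.0, "reference": 0.2}  # per-topic decay-rate multiplier
CONTEXT_WEIGHT = {"event": 1.0, "citation": 0.8, "publish": 0.5, "currency": 1.2}


@dataclass
class DateRef:
    when: date
    context: str  # "event", "citation", "publish" or "currency"


def decay_score(refs: list[DateRef], topic: str, today: date) -> float:
    """Context-weighted mean decay in [0, 1); 0 = fresh, near 1 = stale."""
    if not refs:
        return 0.0  # no internal dates: no content-side evidence of age
    rate = BASE_RATE_PER_YEAR * TOPIC_MULTIPLIER[topic]  # apply the topic multiplier
    weighted = total_weight = 0.0
    for ref in refs:
        age_years = max((today - ref.when).days, 0) / 365.25
        # Older internal references increase decay, saturating toward 1.
        decay = 1.0 - math.exp(-rate * age_years)
        weight = CONTEXT_WEIGHT[ref.context]
        weighted += weight * decay
        total_weight += weight
    return weighted / total_weight
```

With these placeholder numbers, a one-year-old citation pushes a news page's decay close to 1.0 but leaves a reference page at roughly 0.18. That gap is the per-topic asymmetry the steps describe.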
Content Reveals Its Own Age
The patent's load-bearing idea is that page content itself reveals temporal context. Internal date references, citation dates, and currency markers expose decay independent of update-velocity signals.
Per-Page Decay Independent Of Update Velocity
Update velocity captures publisher-side refresh patterns. Per-page decay captures content-side age signals. Both are needed for complete freshness assessment.
- Internal Date Reference Extraction — Per page, NLP extracts dates from content.
- Temporal Context Classification — Each date's context (event, citation, publish, currency) is classified.
- Per-Topic Decay Rate — A per-topic decay-rate multiplier is applied: fast for news, slow for reference content.
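The extraction and classification steps can be illustrated with a crude stand-in. The patent attributes this work to NLP. The sketch below substitutes a regular expression for date detection and a keyword lookup over the preceding text for context classification. The cue lists, the 30-character window, and the function name `extract_date_refs` are hypothetical.

```python
import re
from datetime import date

# Matches ISO dates (2023-05-17) or bare years (1998, 2023).
DATE_RE = re.compile(r"\b(?P<y>(?:19|20)\d{2})(?:-(?P<m>\d{2})-(?P<d>\d{2}))?\b")

# Hypothetical cue words, checked in order against the text just before a date.
# A keyword lookup like this is only a stand-in for real NLP classification.
CUES = {
    "currency": ("as of", "updated"),
    "publish": ("published", "posted"),
    "citation": ("according to", "source:", "et al.", "cited"),
}


def extract_date_refs(text: str, window: int = 30) -> list[tuple[date, str]]:
    """Return (date, context) pairs for every date-like token in text."""
    refs = []
    for m in DATE_RE.finditer(text):
        try:
            # Bare years default to January 1st.
            when = date(int(m["y"]), int(m["m"] or 1), int(m["d"] or 1))
        except ValueError:
            continue  # e.g. "2023-13-40" is not a real calendar date
        before = text[max(0, m.start() - window):m.start()].lower()
        context = "event"  # default when no cue precedes the date
        for label, cues in CUES.items():
            if any(cue in before for cue in cues):
                context = label
                break
        refs.append((when, context))
    return refs
```

Keyword cues misfire easily. For example, a cue that belongs to an earlier date can fall inside a later date's window. This is why a production system would more plausibly rely on a trained classifier than on string matching.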
Technical Foundation
The patent specifies the date extractor, context classifier, decay scorer, per-topic adjuster, composite freshness combiner, and refresh path.
- Date Extractor — Per page, NLP identifies internal date references.
- Context Classifier — Per date, classifies temporal context type.
- Decay Scorer — Per page, aggregates the classified dates into a decay score.
- Per-Topic Adjuster — Applies topic-specific decay-rate multipliers.
- Composite Combiner — Per page, combines the decay score with the update-velocity signal.
- Refresh Path — Recomputes the decay score on each crawl.
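The description says the composite combiner merges decay with update velocity, but it does not say how. A linear blend is one plausible assumption. The velocity measure below (updates in the last year, counted against a hypothetical target of twelve) and the default 0.5 weight are also illustrative.

```python
def velocity_freshness(update_ages_days: list[int], horizon: int = 365, target: int = 12) -> float:
    """Hypothetical velocity signal: updates within `horizon` days over `target`, capped at 1."""
    recent = sum(1 for age in update_ages_days if 0 <= age <= horizon)
    return min(recent / target, 1.0)


def composite_freshness(decay: float, velocity: float, decay_weight: float = 0.5) -> float:
    """Linear blend of content-side freshness (1 - decay) and publisher-side velocity.

    Both inputs are expected in [0, 1]; decay_weight sets the content-side share.
    """
    if not 0.0 <= decay_weight <= 1.0:
        raise ValueError("decay_weight must be in [0, 1]")
    return decay_weight * (1.0 - decay) + (1.0 - decay_weight) * velocity
```

Under this blend, a page touched monthly (velocity 1.0) whose internal references are stale (decay 0.9) lands at 0.55. A page with the same velocity and decay 0.1 scores 0.95. This matches the point that update velocity alone does not equal freshness.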
The Process
Decay assessment runs at indexing time; the resulting signal feeds ranking.
- Crawl Page — Page content fetched.
- Extract Dates — Internal dates identified.
- Classify Contexts — Per date, context classified.
- Score Decay — Per-page decay score computed.
- Apply Topic Rate — Per-topic multiplier applied.
- Combine With Velocity — Composite freshness produced.
- Feed Ranking — Signal modulates ranking.
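For the Feed Ranking step, the summary says only that the signal modulates ranking. One hedged reading is a bounded multiplicative adjustment, sketched here with a hypothetical `strength` parameter. Freshness can scale a candidate's relevance down by at most that fraction, and `strength=0` disables the signal entirely.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    url: str
    relevance: float  # query-dependent score from the rest of the ranker
    freshness: float  # composite freshness in [0, 1]


def rerank(cands: list[Candidate], strength: float = 0.3) -> list[tuple[str, float]]:
    """Scale relevance by a factor in [1 - strength, 1] and sort descending."""
    scored = [
        (c.url, c.relevance * (1.0 - strength + strength * c.freshness))
        for c in cands
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Keeping the factor bounded means freshness reorders near-ties without overriding relevance. With the default strength, a fully stale page at relevance 1.0 (adjusted to 0.7) falls behind a fully fresh one at 0.9.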
Quality Control
Misclassified decay corrupts ranking, so the patent specifies safeguards.
- Date-Extraction Accuracy — Per page, date extraction validated.
- Context-Classification Validation — Per date, context classification validated.
- Per-Topic Calibration — Topic-specific decay rates calibrated against labeled data.
- Evergreen-Content Recognition — Some content is genuinely evergreen despite old internal references. Recognition prevents false-decay flagging.
- Continuous Recalibration — Models refresh against fresh data.
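Two of these safeguards lend themselves to a short sketch: fitting a topic's decay rate against labeled data, and damping decay for pages recognized as evergreen. Both are assumptions about form. The grid search, the squared-error loss, the exponential curve, the `evergreen_prob` input from some upstream classifier, and the 0.8 threshold are all hypothetical.

```python
import math


def predicted_decay(rate: float, age_years: float) -> float:
    """Assumed exponential decay curve: 0 when new, approaching 1 with age."""
    return 1.0 - math.exp(-rate * age_years)


def calibrate_rate(samples: list[tuple[float, float]], grid: list[float]) -> float:
    """Return the rate in `grid` with the lowest squared error.

    samples: (age_years, labeled_staleness) pairs, labels in [0, 1].
    """
    def loss(rate: float) -> float:
        return sum((predicted_decay(rate, age) - label) ** 2 for age, label in samples)

    return min(grid, key=loss)


def guard_evergreen(decay: float, evergreen_prob: float, threshold: float = 0.8) -> float:
    """Damp decay when a (hypothetical) classifier is confident the page is evergreen."""
    if evergreen_prob >= threshold:
        return decay * (1.0 - evergreen_prob)
    return decay
```

In this framing, continuous recalibration amounts to rerunning `calibrate_rate` per topic as fresh labels arrive.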
Real-World Application
Per-page decay assessment complements update-velocity freshness across modern search. The pattern of content-side temporal extraction adds richness that publisher-side update tracking alone cannot achieve.
- Content-driven Signal Source — Per page, internal date references drive decay score.
- Topic-aware Decay Rate — Per topic, decay rates differ. News fast; reference slow.
- Composite-freshness Integration — Combines with update-velocity signal for richer freshness assessment.
Why Citing Current Sources Wins For Time-Sensitive Topics
The decay signal reads internal date references. Pages citing recent sources signal lower decay; pages anchored to old citations signal higher decay. Citing current sources is the structural way to signal content currency.
Why Updating Examples And Statistics Matters
Per-topic decay favors content whose internal references stay current. Updating examples, statistics, and time-sensitive references reduces the decay signal, even when the core argument and structure stay unchanged.
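The point about timestamps and statistics can be made concrete with a toy content-side score. In this hypothetical sketch, decay is averaged over every year mentioned in the body, using an assumed rate of 0.5 per year. The page's modified date is deliberately not an input, so bumping it cannot change the score. Replacing old statistics with current ones does change it.

```python
import math
import re
from datetime import date

# Bare four-digit years from 1900 to 2099.
YEAR_RE = re.compile(r"\b(?:19|20)\d{2}\b")


def content_decay(body: str, today: date, rate_per_year: float = 0.5) -> float:
    """Mean decay over years mentioned in the body; a modified timestamp plays no part."""
    years = [int(y) for y in YEAR_RE.findall(body)]
    if not years:
        return 0.0
    decays = [1.0 - math.exp(-rate_per_year * max(today.year - y, 0)) for y in years]
    return sum(decays) / len(decays)
```

For a page scored in mid-2024, replacing a 2018 survey and a 2017 source with 2024 and 2023 equivalents drops this toy score from about 0.96 to about 0.20.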
What This Means for SEO
This patent reads a page's staleness from its own content, extracting internal date references and citation dates to compute a per-page decay score independent of update velocity. SEO implication: refreshing a timestamp is not freshness; the dates and sources inside your content signal age.
- Timestamp Refreshes Do Not Fool Decay — The system explicitly separates publisher-side update velocity from content-side decay. Changing the modified date without updating the body leaves the internal date references stale, so the decay signal still fires.
- Cite Current Sources For Time-Sensitive Topics — Decay reads citation dates. Pages anchored to recent sources signal low decay; pages leaning on years-old citations signal high decay. Refreshing the sources you reference is a direct freshness lever.
- Update Examples And Statistics, Not Just Prose — Internal references like figures, dates, and statistics drive the score. Updating these time-sensitive elements reduces decay even when the core argument and structure stay the same.
- Decay Rates Are Topic-Specific — News content decays in days while reference content decays in years. The same age means different decay depending on topic, so evergreen reference pages are not penalized merely for being old.
- Genuinely Evergreen Content Is Recognized — The system has explicit evergreen-recognition safeguards so old internal references on truly timeless topics do not falsely trigger decay. You do not need to fake updates on stable reference material.
- Currency Markers Are Read Directly — Phrases that anchor content to a moment, like as-of dates and year references, are extracted and classified. Keeping these markers accurate signals current content rather than abandoned content.
- Pair Decay Control With Real Updates — Because decay combines with update velocity for a composite freshness signal, the strongest position is both genuinely updating the page and keeping its internal date references current.