Elicits structure from a collection of information entities by analyzing explicit and implicit relations between them, producing notions of quality, relevance, similarity, and authority used for filtering.
Patent Overview
- Inventor: Prabhakar Raghavan
- Assignee: IBM Corporation
- Filed: 1997-10-08
- Granted: 2006-02-07
- Application Number: US 08/947,178
The Challenge
Information Filtering Needs Structure It Cannot Directly See
Filtering information entities (documents, web pages, news items) against user interest profiles requires more than keyword matching. The system needs to read structure from the collection itself: which entities are authoritative, which are relevant to specific information needs, which are similar to others, which are definitive sources. This structure is implicit in the relationships among entities; the system has to elicit it.
- Keyword Match Misses Quality — A document can match a user's interest keywords yet be low quality, derivative, or off-topic. Pure keyword filtering returns relevant-looking but bad results.
- Authority Is Not An Explicit Label — No document is labeled 'authoritative'. Authority has to be inferred from how the entity relates to others: who cites it, who refers to it, how it sits in the collection's graph.
- Relations Can Be Explicit Or Implicit — Some relations are explicit (hyperlinks, citations, tags). Others are implicit (co-mention, similar wording, shared sources). The filter has to read both kinds.
- Relations Can Be Static Or Dynamic — Some relations are fixed (citation in a paper). Others change over time (user clicks, mentions in current news). The filter must handle both temporal modes.
- Affinity Is The Right Abstraction — All these relation types can be unified under the abstraction of 'affinity' between entities. The filter operates on the affinity graph to derive quality, relevance, similarity, and authority signals.
Innovation
Affinity Graph As The Filter Substrate
The patent defines an 'affinity' between every pair of entities, where affinity captures the strength of relation regardless of explicit/implicit and static/dynamic nature. From the affinity graph, the system elicits four kinds of structure: quality (authority), relevance to information needs, similarity among retrieved items, and definitiveness of sources. These structural readings drive the filtering decision.
- Identify Entity Collection — Define the set of information entities to be filtered. The collection can be documents, web pages, news items, products, or any indexable resources.
- Compute Pairwise Affinities — For each pair of entities, compute an affinity value combining explicit relations (links, citations, tags) and implicit relations (co-mention, content similarity, shared audience).
- Distinguish Static And Dynamic Components — Separate the affinity into a static base (stable relations like citations) and a dynamic component (recent activity like clicks). Each contributes to the affinity score with appropriate weighting.
- Elicit Quality Structure — Use the affinity graph to compute authority and definitiveness scores per entity. High-authority entities are those that many others are affiliated with via strong affinity edges.
- Elicit Relevance Structure — Compute relevance of each entity to user-specified information needs by walking affinity edges from need-defining entities outward.
- Elicit Similarity Structure — Compute pairwise similarity for retrieved entities. Similar entities are clustered together; the user can see which results are near-duplicates and which add new information.
- Apply To Filter — Combine the quality, relevance, similarity, and definitiveness readings into a filtering decision. Entities above the threshold pass; the rest are filtered out before reaching the user.
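The affinity step above can be illustrated with a minimal Python sketch. It assumes one explicit relation (a link or citation between two entities) and one implicit relation (term overlap measured with Jaccard similarity). The function names, the choice of Jaccard, and the weights `w_explicit` and `w_implicit` are illustrative assumptions, not details taken from the patent.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Overlap of two term sets, in [0, 1]."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def pairwise_affinities(terms, links, w_explicit=0.6, w_implicit=0.4):
    """Return {(a, b): affinity} for every unordered pair of entity ids.

    terms: {entity_id: set of terms}, used for the implicit relation
    links: set of (source, target) pairs, used for the explicit relation
    The two weights are hypothetical values chosen for illustration.
    """
    affinities = {}
    for a, b in combinations(sorted(terms), 2):
        # Explicit relation: a link in either direction counts once.
        explicit = 1.0 if (a, b) in links or (b, a) in links else 0.0
        # Implicit relation: how much vocabulary the two entities share.
        implicit = jaccard(terms[a], terms[b])
        affinities[(a, b)] = w_explicit * explicit + w_implicit * implicit
    return affinities


terms = {
    "doc1": {"graph", "ranking", "links"},
    "doc2": {"graph", "ranking", "search"},
    "doc3": {"cooking", "recipes"},
}
links = {("doc1", "doc2")}
aff = pairwise_affinities(terms, links)
```

Here `doc1` and `doc2` are linked and share half of their combined vocabulary, so they receive the strongest affinity, while `doc3` shares nothing with either and ends up with zero affinity to both. Scoring every pair is quadratic in the collection size, so a real system would likely restrict the computation to candidate pairs.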
One Graph, Four Structural Readings
Quality, relevance, similarity, and definitiveness are all derived from a single underlying affinity graph. The graph contains the structural information; the readings extract different views of it for different filtering decisions.
Affinity Subsumes Many Relation Types
Hyperlinks, citations, co-mentions, tag overlap, click correlations, and content similarity all become affinity edges with appropriate weights. The unification lets one graph algorithm serve many filtering goals.
- Quality / Authority — Derived by analyzing which entities are densely affiliated. High in-affinity entities are authoritative; low in-affinity entities are noise.
- Relevance — Computed by walking affinity edges from user-specified need-defining entities outward. Entities reachable with high affinity weight are relevant.
- Similarity — Pairwise affinity between retrieved entities. Used to cluster results and reduce redundancy.
- Definitiveness — Authority readings filtered for definitiveness signals (canonical-source patterns). Definitive entities are the preferred choice when multiple authority candidates compete.
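Two of the readings above, authority and relevance, can be sketched on a small directed affinity graph. This is a simplified illustration, not the patent's algorithm: authority is taken as normalized weighted in-affinity, and relevance as the best product of edge weights along any path of at most `max_hops` edges from a need-defining seed. Both function names, the hop limit, and the edge direction ("source has affinity toward target") are assumptions.

```python
from collections import defaultdict


def authority_scores(edges):
    """Weighted in-affinity per entity, scaled so the top entity scores 1.0.

    edges: {(source, target): weight}, read as "source has affinity
    toward target" (an assumed convention).
    """
    totals = defaultdict(float)
    for (source, target), weight in edges.items():
        totals[target] += weight
        totals.setdefault(source, 0.0)  # make sure sources appear too
    top = max(totals.values(), default=0.0)
    return {e: (s / top if top else 0.0) for e, s in totals.items()}


def relevance_scores(edges, seeds, max_hops=2):
    """Best product of edge weights on a path from any seed, up to max_hops."""
    outgoing = defaultdict(list)
    for (source, target), weight in edges.items():
        outgoing[source].append((target, weight))
    best = {seed: 1.0 for seed in seeds}
    frontier = dict(best)
    for _ in range(max_hops):
        next_frontier = {}
        for node, score in frontier.items():
            for target, weight in outgoing.get(node, ()):
                candidate = score * weight
                if candidate > best.get(target, 0.0):
                    best[target] = candidate
                    next_frontier[target] = candidate
        frontier = next_frontier
    return best


edges = {
    ("a", "hub"): 0.9,
    ("b", "hub"): 0.8,
    ("c", "hub"): 0.7,
    ("hub", "d"): 0.5,
    ("a", "b"): 0.4,
}
auth = authority_scores(edges)
rel = relevance_scores(edges, seeds=["a"])
```

In this toy graph `hub` collects the most in-affinity and becomes the top authority. Starting from seed `a`, the entity `d` is reached through `hub` and gets a lower relevance than `hub` itself, while `c` is never reached and does not appear in the relevance result.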
Technical Foundation
What The Filter Reads
The filter operates on an affinity graph that subsumes explicit, implicit, static, and dynamic relations.
- Entity — A resource being filtered: document, page, news item, product. Granularity is task-dependent.
- Affinity Edge — A weighted relation between two entities. Weights combine explicit and implicit relation sources.
- Static Component — The stable portion of an affinity edge derived from long-lived relations (citations, hyperlinks, tags).
- Dynamic Component — The time-varying portion of an affinity edge derived from recent activity (clicks, mentions, co-views).
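One way to model these four terms is sketched below. The class and field names are hypothetical. The exponential decay with a `half_life`, the squashing of the dynamic sum into [0, 1), and the `static_share` mixing weight are all assumptions added for illustration; the source only states that static and dynamic components are weighted appropriately.

```python
import math
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    entity_id: str
    kind: str  # e.g. "document", "page", "news item", "product"


@dataclass
class AffinityEdge:
    source: str
    target: str
    # Static component: long-lived relations such as citations or tags.
    static_weight: float = 0.0
    # Dynamic component: timestamps (in days) of recent activity events.
    events: list = field(default_factory=list)

    def dynamic_weight(self, now: float, half_life: float = 7.0) -> float:
        """Sum of events, each halved in value every half_life days (assumed)."""
        return sum(0.5 ** ((now - t) / half_life) for t in self.events if t <= now)

    def weight(self, now: float, static_share: float = 0.7,
               half_life: float = 7.0) -> float:
        """Mix the static weight with a bounded version of the dynamic weight."""
        dynamic = 1.0 - math.exp(-self.dynamic_weight(now, half_life))
        return static_share * self.static_weight + (1.0 - static_share) * dynamic


doc1 = Entity("doc1", "document")
doc2 = Entity("doc2", "document")
edge = AffinityEdge(doc1.entity_id, doc2.entity_id,
                    static_weight=0.8, events=[93.0, 100.0])
combined = edge.weight(now=100.0)
```

Keeping the raw event timestamps on the edge, rather than a precomputed number, lets the dynamic component be re-evaluated at any later time without touching the static part.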
Key Insight: The patent's lasting contribution is the affinity abstraction. By unifying disparate relation types under a single edge-weight framework, the filter can use one set of graph algorithms to derive multiple structural readings. Modern personalization and recommendation systems use the same abstraction (sometimes under different names like 'preference graph' or 'interaction graph'), and the multi-reading approach (quality, relevance, similarity together) maps directly to how modern feed and search systems combine signals.
The Process
End-To-End Filtering
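The filter runs in two phases: offline, where the affinity graph is built and quality and definitiveness scores are precomputed, and online, where relevance and similarity are computed per user or query.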
- Build Affinity Graph — Compute pairwise affinities across the entity collection. Combine explicit and implicit relations; separate static and dynamic components.
- Pre-Compute Quality And Definitiveness — Offline graph algorithms produce authority and definitiveness scores per entity.
- Receive User Need — User specifies information need (query, profile, interest set). The need defines starting points in the affinity graph.
- Compute Relevance Per Entity — Walk affinity edges from the need-defining entities outward. Entities with high weighted reachability are relevant to this need.
- Compute Similarity Among Candidates — For the relevant candidate set, compute pairwise similarity. Cluster near-duplicates.
- Filtering Decision — Combine quality, relevance, similarity, and definitiveness into a per-entity score. Filter entities below the threshold; return the rest in ranked order.
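The final step can be sketched as a weighted combination followed by a threshold. The sketch below assumes the per-entity readings are already available as numbers in [0, 1]. The weights, the `threshold`, the `dup_threshold`, and the choice to collapse near-duplicates onto their highest-scoring member are hypothetical; the source says only that the readings are combined and that similarity is used to reduce redundancy.

```python
def filter_entities(candidates, similarity, weights=None,
                    threshold=0.5, dup_threshold=0.9):
    """Score, de-duplicate, and rank candidate entities.

    candidates: {entity_id: {"quality": q, "relevance": r, "definitiveness": d}}
    similarity: function (a, b) -> float in [0, 1]
    Returns a list of (entity_id, score), best first.
    """
    weights = weights or {"quality": 0.3, "relevance": 0.5, "definitiveness": 0.2}
    scored = {
        eid: sum(weights[name] * readings.get(name, 0.0) for name in weights)
        for eid, readings in candidates.items()
    }
    # Rank by score, breaking ties by id so the output is deterministic.
    ranked = sorted(scored, key=lambda eid: (-scored[eid], eid))
    kept = []
    for eid in ranked:
        if scored[eid] < threshold:
            break  # everything after this point scores lower still
        if any(similarity(eid, other) >= dup_threshold for other in kept):
            continue  # near-duplicate of a higher-scoring entity
        kept.append(eid)
    return [(eid, scored[eid]) for eid in kept]


candidates = {
    "original": {"quality": 0.9, "relevance": 0.8, "definitiveness": 1.0},
    "copy": {"quality": 0.6, "relevance": 0.8, "definitiveness": 0.2},
    "related": {"quality": 0.7, "relevance": 0.6, "definitiveness": 0.5},
    "noise": {"quality": 0.1, "relevance": 0.3, "definitiveness": 0.0},
}
known_pairs = {frozenset({"original", "copy"}): 0.95}


def sim(a, b):
    return known_pairs.get(frozenset((a, b)), 0.0)


result = filter_entities(candidates, sim)
```

With these inputs, `copy` clears the score threshold but is dropped as a near-duplicate of `original`, `noise` falls below the threshold, and the result contains `original` followed by `related`.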
What This Means for SEO
Information filtering via affinity graphs is a foundation under modern recommendation and personalization. Understanding the four-reading structure changes how to think about authority signals, relevance evidence, and the role of explicit versus implicit relations.
- Multiple Relation Types Compound Affinity — Hyperlinks, citations, co-mentions, anchor patterns, and shared audience signals all become affinity edges. Pages with many relation types reinforcing their position score higher than pages with one strong signal alone.
- Static And Dynamic Both Matter — Long-lived relations (citations, in-content links) form the static component; recent engagement (clicks, mentions) forms the dynamic component. SEO that produces both is sturdier than SEO that produces only one.
- Definitiveness Is A Distinct Signal — Beyond raw authority, the system reads which entities are definitive sources. Canonical, original, source-of-record content earns the definitiveness signal that distinguishes it from authority-without-uniqueness.