Case Study – NewsIndexer

The Need:

As print newspapers are replaced by or supplemented with online news publications, people who have traditionally relied on the delivery of their morning paper are turning more and more to newspaper websites for local and world news. Now, in addition to current updates, it is possible to find out what happened in Paris (France or Texas) over the past week, the past month, or even the past year, simply by going online.

This vast potential for article storage, however, carries its own set of complications. Online searches for newspaper articles often yield a flood of erroneous hits and useless links. Discouraged readers who look to newspapers for information begin turning elsewhere for the news.

To retain the faith of readers, online searches must yield more useful results. News publishers have needed better ways of indexing articles and have sought indexing software to make the process easier.

The Solution:

Taxonomy experts at Access Innovations have created a controlled vocabulary, a specialized group of terms built with the newspaper industry’s indexing needs in mind, together with associated indexing rules that reflect the language and reporting practices of the news media.

As with any sizable vocabulary, there are numerous terms in the NewsIndexer taxonomy that may look alike to a machine but carry very different meanings. For example, does “bears” refer to “Chicago Bears” or “Grizzly Bears”? While such terms may cause problems for simpler indexing systems, NewsIndexer has the capability to distinguish between them.

One of NewsIndexer’s primary features is a simple and remarkable programming system called rulebuilding, which teaches the system to bring up terms by context. NewsIndexer will scan an article about the Chicago Bears and, drawing on its knowledge base, locate terms related to the football team, but will not suggest zoological terms pertaining to wild bears.
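
As a rough illustration of context-based term suggestion, the sketch below resolves the "bears" example with a small rule table. It is not NewsIndexer's actual rule syntax, which this case study does not describe; the RULES table, the context word lists, and the suggest_terms function are all hypothetical names chosen for the example.

```python
import re

# Hypothetical rule table (an assumption for illustration, not NewsIndexer's format):
# an ambiguous trigger word maps to candidate taxonomy terms, each paired with
# context words that support that reading of the trigger.
RULES = {
    "bears": [
        ("Chicago Bears", {"chicago", "nfl", "quarterback", "touchdown", "football"}),
        ("Grizzly Bears", {"grizzly", "wildlife", "habitat", "forest", "zoo"}),
    ],
}


def suggest_terms(text, rules=RULES):
    """Suggest a term only when its trigger and at least one context word both appear."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    suggestions = []
    for trigger, candidates in rules.items():
        if trigger not in words:
            continue
        for term, context in candidates:
            if words & context:  # at least one supporting context word is present
                suggestions.append(term)
    return suggestions


article = "The Bears quarterback threw a late touchdown in Chicago on Sunday."
print(suggest_terms(article))  # prints ['Chicago Bears']
```

A production rule base would presumably weigh many more signals than single context words, but the principle is the same: the trigger word alone never produces a suggestion.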

NewsIndexer’s ability to distinguish between such similar terms is a tremendous time saver, both for the human indexer and for the person who is searching for articles through a newspaper’s website. As a result, readers can spend their time reading articles instead of searching for them.

The process does not end there. NewsIndexer keeps statistics on what is and is not selected by the human indexer. Terms suggested by NewsIndexer and approved by the indexer are classified as “hits.” Suggested terms that the indexer did not select are called “noise,” and terms that should have been suggested, but were not, are “misses.” Having statistics for hits, noise, and misses makes it easy to measure NewsIndexer’s accuracy rate, and to systematically reduce noise and misses. Statistics also make it easier to tailor NewsIndexer to the specific data of every client.
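
The bookkeeping for hits, noise, and misses can be sketched with plain set arithmetic. This is a minimal example and not NewsIndexer's own reporting code: the indexing_stats function and its field names are hypothetical, and expressing accuracy as precision and recall is an assumption, since the case study does not say how the accuracy rate is calculated.

```python
def indexing_stats(suggested, final_terms):
    """Compare machine-suggested terms with the terms the human indexer finally assigned.

    suggested:   terms proposed by the software
    final_terms: terms on the finished record, including any the indexer added by hand
    """
    suggested, final_terms = set(suggested), set(final_terms)
    hits = suggested & final_terms    # suggested and approved
    noise = suggested - final_terms   # suggested but not selected
    misses = final_terms - suggested  # should have been suggested but were not
    return {
        "hits": hits,
        "noise": noise,
        "misses": misses,
        # Assumed accuracy measures: share of suggestions that were approved,
        # and share of the final terms that the software found.
        "precision": len(hits) / len(suggested) if suggested else 0.0,
        "recall": len(hits) / len(final_terms) if final_terms else 0.0,
    }


stats = indexing_stats(
    suggested=["Chicago Bears", "Football", "Wildlife"],
    final_terms=["Chicago Bears", "Football", "National Football League"],
)
print(sorted(stats["hits"]), sorted(stats["noise"]), sorted(stats["misses"]))
```

In this example "Wildlife" counts as noise and "National Football League" as a miss, so each points to a rule that could be tightened or added.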

The Results:

A revolutionary indexing system that caters to the newspaper industry, NewsIndexer is the ultimate news indexing tool, a jump start to the human indexer’s brain.

NewsIndexer scans an article, then suggests indexing terms to be approved by the human indexer. This union between the machine’s ability to store and retrieve knowledge and the human’s capacity for judgment and reason results in articles being indexed quickly and accurately. Searchers find just what they want, efficiently and without significant noise. News can also be personalized for delivery to handheld devices. The end result is searcher satisfaction at lower cost to the publisher.