Case Study: IEEE Indexing Group


The IEEE Indexing Group is responsible for processing 200 to 3,000 new research articles each day for the IEEE Xplore online delivery platform, generating the metadata needed for each article.


To manage the indexing vocabulary and categorization for its wealth of electronic documents, and to raise productivity in the Indexing Group, IEEE uses Access Innovations’ Data Harmony MAIstro™.

IEEE maintains its thesaurus of approximately 5,000 terms and nearly 2,000 synonyms in Thesaurus Master. This taxonomy tool is synchronized and integrated with the categorization rulebase in M.A.I., whose rules govern how taxonomy terms are applied both to the stream of incoming documents and to IEEE’s large store of legacy documents. The rules are initially generated programmatically, and IEEE editors can then fine-tune them to capture new expressions of taxonomy concepts.
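The programmatic rule generation described above can be illustrated with a minimal Python sketch. It assumes a deliberately simplified model in which each preferred thesaurus term becomes one rule that fires when the term or any of its synonyms appears as a whole phrase. The thesaurus entries and the functions `generate_rules` and `suggest_terms` are hypothetical; this is not M.A.I.’s actual rule format.

```python
import re

# Hypothetical thesaurus entries: preferred term -> synonyms.
# Illustrative only; not taken from the IEEE thesaurus.
THESAURUS: dict[str, list[str]] = {
    "Neural networks": ["neural net", "artificial neural network"],
    "Wireless LAN": ["WLAN", "Wi-Fi"],
}


def generate_rules(thesaurus: dict[str, list[str]]) -> dict[str, re.Pattern[str]]:
    """Build one rule per preferred term: a case-insensitive pattern that fires
    on the term itself or any of its synonyms, matched as whole words."""
    rules = {}
    for term, synonyms in thesaurus.items():
        phrases = [term, *synonyms]
        # Longest phrases first so the alternation tries the most specific match first.
        alternation = "|".join(
            re.escape(p) for p in sorted(phrases, key=len, reverse=True)
        )
        rules[term] = re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)
    return rules


def suggest_terms(rules: dict[str, re.Pattern[str]], text: str) -> list[str]:
    """Return, in sorted order, the preferred terms whose rule fires on the text."""
    return sorted(term for term, pattern in rules.items() if pattern.search(text))


if __name__ == "__main__":
    rules = generate_rules(THESAURUS)
    print(suggest_terms(rules, "We train an artificial neural network over a WLAN link."))
    # prints ['Neural networks', 'Wireless LAN']
```

Because synonyms map back to the preferred term, a text that says only “Wi-Fi” is still indexed under “Wireless LAN”; editorial fine-tuning would then refine these generated rules rather than write them from scratch.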

In the IEEE indexing workflow, content streams into IEEE’s Digital Asset Management System to start processing. Programmatic checks filter the records and automatically apply certain metadata entries as material passes through. For automatic subject indexing against the taxonomy, M.A.I. processes selected fields of each XML article: the publication title, the abstract, and the first paragraph. PDF documents are first preprocessed to extract their text and are then indexed.

Of the 180,000 articles auto-indexed each year, the editorial team performs a quality review on about 15,000 to 20,000, reviewing between 10 and 200 on a given day. Reviewers occasionally catch errors such as the phrase “the gold standard” triggering the taxonomy term “Gold” (the chemical element). When an editor discovers a miscategorization or a significant concept that was missed, the editor fine-tunes the corresponding rule in M.A.I., adding or modifying a condition so that the indexing term is applied more precisely.
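The “gold standard” example can be made concrete with a minimal Python sketch of adding one condition to a rule. The `Rule` class, its `excluded_contexts` condition, and `index_article` are hypothetical and do not reflect M.A.I.’s actual rule syntax; the sketch only shows how an exclusion condition narrows when a term is applied, using the title, abstract, and first paragraph as input.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Rule:
    """Hypothetical simplified rule: suggest `term` when `trigger` occurs as a
    whole word outside every phrase listed in `excluded_contexts`."""
    term: str
    trigger: str
    excluded_contexts: list[str] = field(default_factory=list)

    def fires(self, text: str) -> bool:
        # Blank out the excluded phrases first, then look for the trigger in what remains.
        remaining = text
        for phrase in self.excluded_contexts:
            remaining = re.sub(
                rf"\b{re.escape(phrase)}\b", " ", remaining, flags=re.IGNORECASE
            )
        pattern = rf"\b{re.escape(self.trigger)}\b"
        return re.search(pattern, remaining, re.IGNORECASE) is not None


def index_article(rule: Rule, title: str, abstract: str, first_paragraph: str) -> bool:
    """Apply a rule to the combined text of the fields used for indexing."""
    return rule.fires(" ".join([title, abstract, first_paragraph]))


if __name__ == "__main__":
    naive = Rule(term="Gold", trigger="gold")
    refined = Rule(term="Gold", trigger="gold", excluded_contexts=["gold standard"])
    article = ("A New Gold Standard for Codec Benchmarks",
               "We propose a benchmark.", "Image codecs are compared.")
    print(index_article(naive, *article))    # True: the false positive
    print(index_article(refined, *article))  # False: the condition suppresses it
```

The refined rule still fires for an article that mentions gold in its own right (for example, “Gold Nanoparticle Electrodes”), even if the same text also contains “the gold standard”, because only the excluded phrase is masked.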


Indexed documents then move on to publication in the IEEE Xplore database, where the subject metatags on the documents support precise search and retrieval by topic. Besides subject metadata, subscribers can find content by searching on authors, terms from additional taxonomies, and other metadata.