Case Study – Institute of Electrical and Electronics Engineers (IEEE)
IEEE needed to create a five-year update to the thesaurus used both to index their publications and to enable enhanced search on IEEE Xplore, their publications portal. The ultimate goal was improved searching of IEEE’s vast repository of published research and other documentation. The new taxonomy would also be used to structure the content for improved site navigation and data repurposing, designed to improve membership retention and increase revenues.
IEEE is the world’s largest engineering association and publishes across all facets of engineering. The organization includes 38 specialized societies and has more than 375,000 members. IEEE publishes over 1,300 standards and has more than two million documents in its electronic library. More than 850 conferences are sponsored by IEEE each year.
Using Data Harmony’s MAIstro, Access Innovations taxonomists reviewed the IEEE uncontrolled term lists, search logs, and other term collections, using an initial sample set of their corpus covering the past six years. Our team also performed extensive hierarchical restructuring of the thesaurus to reflect the technological changes in the engineering fields since the initial thesaurus construction. The editors identified appropriate new terms and placed them into the thesaurus.
The new taxonomy reflects the professional terminology used by IEEE’s more than 40 separate interest groups. It also enables IEEE to create and leverage additional semantically-enriched metadata for guided navigation of the content body, in addition to the INSPEC terms and additional controlled terms presently used. The screenshot shows part of the thesaurus as it appears in Access Innovations’ Data Harmony software.
MAIstro is a powerful semantic reasoner that automatically generates semantic rules for each thesaurus term. Access Innovations refined the machine-generated rules in the M.A.I. rulebase to enable precise subject indexing. The screenshot illustrates a rule developed by Access Innovations editorial staff to refine the accuracy and expand the completeness of automated indexing based on the thesaurus.
Elisabeth Moscara of IEEE has commented, “I want to take this time to thank Margie’s and Jay’s staff at Access Innovations for their hard work and professionalism in providing IEEE with an updated thesaurus. They were very helpful in answering any questions I had and provided me with timely e-mails and reports. They also delivered the initial draft of the 2006 IEEE Thesaurus one week earlier than projected!”
IEEE uses the thesaurus to increase findability within the more than two million documents in their Xplore database. The subject metatags on the documents increase both precision and recall, improving the user experience. The immediate benefit is increased productivity for the indexing staff, who face an ever-increasing load of information to process into the database. Adding metadata to the documents will enable faceted searches for users and efficient sorts for repurposing document subsets.
IEEE Indexing Group
The IEEE Indexing Group is responsible for processing 200 to 3,000 new research articles each day for the Xplore online delivery platform, generating the metadata needed for each article.
IEEE maintains its thesaurus of approximately 5,000 terms and nearly 2,000 synonyms in Thesaurus Master. The taxonomy tool is synchronized and integrated with the categorization rulebase in M.A.I. The rules in M.A.I. govern application of the taxonomy terms to the stream of incoming documents as well as to IEEE’s vast store of legacy documents. The rules are initially generated programmatically, and IEEE editors can fine-tune rules to capture new expressions of taxonomy concepts.
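To make the relationship between thesaurus terms, synonyms, and programmatically generated rules concrete, here is a minimal Python sketch. It is not Thesaurus Master or M.A.I. code, and it does not use their data formats; the class name, the function name, and the sample term are assumptions chosen for illustration only. It shows one simple way a starter rule could be derived for every preferred term and synonym.

```python
from dataclasses import dataclass, field


@dataclass
class Thesaurus:
    """Hypothetical in-memory thesaurus: preferred term -> set of synonyms."""
    synonyms: dict[str, set[str]] = field(default_factory=dict)

    def add_term(self, preferred: str, synonyms=()) -> None:
        # Register a preferred term and any non-preferred entry terms.
        self.synonyms.setdefault(preferred, set()).update(synonyms)

    def lookup(self) -> dict[str, str]:
        # Map every lower-cased entry term (preferred or synonym)
        # to the preferred term that should be applied as the index term.
        table: dict[str, str] = {}
        for preferred, syns in self.synonyms.items():
            table[preferred.lower()] = preferred
            for syn in syns:
                table[syn.lower()] = preferred
        return table


def generate_rules(thesaurus: Thesaurus) -> list[tuple[str, str]]:
    """Generate one simple (phrase, index term) match rule per entry term."""
    return sorted(thesaurus.lookup().items())


# Illustrative entry only; not taken from the IEEE Thesaurus.
thesaurus = Thesaurus()
thesaurus.add_term("Neural networks", {"Neural nets"})
rules = generate_rules(thesaurus)
```

In this sketch, both the preferred term and its synonym produce a rule that assigns the same preferred term, which is the starting point an editor would then fine-tune.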
To manage the indexing vocabulary and categorization for the wealth of electronic documents and to promote productivity in the Indexing Group, IEEE uses Access Innovations’ Data Harmony MAIstro™.
In the IEEE indexing workflow, content streams into IEEE’s Digital Asset Management System to start processing. Programmatic checks filter records to pass through material with certain metadata entries automatically applied. For automatic subject indexing by taxonomy categories, selected metadata are processed by M.A.I., including the publication title, abstract and first paragraph of XML articles. PDF documents are preprocessed to extract text and then indexed.
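The field selection described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not IEEE's actual pipeline: the element names (`title`, `abstract`, `body/p`) and the function name are hypothetical, since the case study does not describe the XML schema. The sketch gathers the title, the abstract, and only the first paragraph of the body as the text handed to the indexer.

```python
import xml.etree.ElementTree as ET


def text_for_indexing(article_xml: str) -> str:
    """Collect title, abstract, and first body paragraph from an article.

    The element paths are assumptions; a real schema will differ.
    """
    root = ET.fromstring(article_xml)
    parts = []
    for path in ("title", "abstract", "body/p"):
        # find() returns only the first match, so "body/p" yields
        # the first paragraph and ignores the rest of the body.
        node = root.find(path)
        if node is not None:
            parts.append("".join(node.itertext()).strip())
    return "\n".join(part for part in parts if part)


sample = (
    "<article><title>Thin gold films</title>"
    "<abstract>We study deposition.</abstract>"
    "<body><p>First paragraph.</p><p>Second paragraph.</p></body></article>"
)
selected = text_for_indexing(sample)
```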
Of the 180,000 articles auto-indexed annually, the editorial team performs a quality review on about 15,000 to 20,000, or 10 to 200 per day. At this point, they may catch the odd error, such as the taxonomy term “Gold” (the chemical element) being suggested by the phrase “the gold standard.” Discovery of a categorization error or a missed significant concept prompts an editor to fine-tune a rule in M.A.I., adding or modifying a condition for more precise application of indexing terms.
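The “Gold” example can be illustrated with a small Python sketch of a rule that carries an exclusion condition. This does not reproduce M.A.I. rule syntax, which the case study does not show; the `Rule` class and its `unless` field are hypothetical names used only to demonstrate the idea of adding a condition to a machine-generated rule.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    """Hypothetical indexing rule: suggest `term` when `trigger` appears,
    unless it appears only inside one of the `unless` phrases."""
    trigger: str
    term: str
    unless: tuple[str, ...] = ()

    def applies(self, text: str) -> bool:
        lowered = text.lower()
        # Blank out excluded phrases first, so a trigger that occurs
        # only inside them no longer counts as a match.
        for phrase in self.unless:
            lowered = lowered.replace(phrase.lower(), " ")
        pattern = rf"\b{re.escape(self.trigger.lower())}\b"
        return re.search(pattern, lowered) is not None


# Machine-generated starting point: fires on any occurrence of "gold".
generated = Rule(trigger="gold", term="Gold")

# Editor-refined version: ignores the idiom "gold standard".
refined = Rule(trigger="gold", term="Gold", unless=("gold standard",))

idiom = "This method is the gold standard for calibration."
element = "Thin gold films were deposited on silicon."

print(generated.applies(idiom))   # True: the false hit an editor would catch
print(refined.applies(idiom))     # False: the added condition blocks it
print(refined.applies(element))   # True: genuine mentions still match
```

A document that mentions both the idiom and the metal still matches the refined rule, because only the excluded phrase is ignored.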
Indexed documents stream forward to publication on IEEE’s Xplore database, where the subject metatags on the documents enable precision in search and retrieval by topic. In addition to subject metadata, subscribers can access content by searching authors, terminology from additional taxonomies, and other metadata.
A diagram, available as a full-sized PDF, shows the workflow of IEEE’s nightly automated M.A.I. processing.