Case Study – Institute of Electrical and Electronics Engineers (IEEE)
IEEE needed a five-year update to its thesaurus. The thesaurus is used both to index IEEE publications and to improve searching of IEEE’s vast repository of published research and other documentation. The new taxonomy would also be used to structure the content for improved navigation and repurposing, with the aim of improving membership retention and increasing revenues.
IEEE is the world’s largest engineering association and publishes across all facets of engineering. The organization includes 38 specialized societies and has more than 375,000 members. IEEE publishes over 1,300 standards and has more than two million documents in its electronic library. IEEE sponsors more than 850 conferences each year.
Using Data Harmony’s MAIstro, which combines Thesaurus Master and M.A.I., Access Innovations taxonomists reviewed IEEE’s uncontrolled term lists, six years of content from all IEEE journals, search logs, and other term collections. They also extensively restructured the thesaurus hierarchy to reflect the many changes in engineering fields over the past few years. The editors identified appropriate new terms and placed them in the thesaurus.
The new taxonomy, which reflects the professional terminology used by IEEE’s more than 40 separate interest groups, provides an additional basis for guided navigation of the content pool, alongside the INSPEC controlled terms currently in use. The screenshot shows part of the thesaurus as it appears in Access Innovations’ Data Harmony software.
So that the new thesaurus could be deployed to increase the productivity of IEEE’s editorial staff, Access Innovations refined the default rules in the M.A.I. rulebase associated with the thesaurus and added new rules. The screenshot illustrates a rule developed by Access Innovations editorial staff to improve the accuracy and completeness of automated indexing based on the thesaurus.
Elisabeth Moscara of IEEE has commented, “I want to take this time to thank Margie’s and Jay’s staff at Access Innovations for their hard work and professionalism in providing IEEE with an updated thesaurus. They were very helpful in answering any questions I had and provided me with timely e-mails and reports. They also delivered the initial draft of the 2006 IEEE Thesaurus one week earlier than projected!”
IEEE uses the thesaurus to improve findability across the more than two million documents in its Xplore database. Subject metatags on the documents increase both precision and recall, which improves the user experience. The immediate benefit is increased productivity for the indexing staff, who face an ever-increasing volume of information to process into the database. Adding metadata to the documents will also enable faceted search for users and efficient sorting of document subsets for repurposing.
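To make the faceted-search idea concrete, here is a minimal Python sketch, assuming each document record carries a set of subject metatags drawn from the thesaurus. The record layout, the sample terms, and the functions `facet_counts` and `filter_by_facets` are hypothetical illustrations, not part of Xplore or Data Harmony.

```python
from collections import Counter

# Hypothetical records: each document carries subject metatags from a thesaurus.
# Sample terms are illustrative, not taken from the IEEE Thesaurus.
DOCUMENTS = [
    {"id": "doc-1", "subjects": {"Photovoltaic cells", "Power grids"}},
    {"id": "doc-2", "subjects": {"Neural networks"}},
    {"id": "doc-3", "subjects": {"Photovoltaic cells", "Neural networks"}},
]


def facet_counts(docs):
    """Count how many documents carry each subject tag (e.g. for a facet sidebar)."""
    return Counter(tag for doc in docs for tag in doc["subjects"])


def filter_by_facets(docs, selected):
    """Keep the IDs of documents tagged with every selected subject (AND semantics)."""
    wanted = set(selected)
    return [doc["id"] for doc in docs if wanted <= doc["subjects"]]


print(facet_counts(DOCUMENTS)["Photovoltaic cells"])  # 2
print(filter_by_facets(DOCUMENTS, ["Photovoltaic cells", "Neural networks"]))  # ['doc-3']
```

AND semantics narrows results as users select more facets; an OR variant would instead test whether the two sets intersect.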
IEEE Indexing Group
The IEEE Indexing Group is responsible for processing 200 to 3,000 new research articles each day for the Xplore online delivery platform, generating the metadata needed for each article.
IEEE maintains its thesaurus of approximately 5,000 terms and nearly 2,000 synonyms in Thesaurus Master. The taxonomy tool is synchronized and integrated with the categorization rulebase in M.A.I. The rules in M.A.I. govern application of the taxonomy terms to the stream of incoming documents as well as to IEEE’s vast store of legacy documents. The rules are initially generated programmatically, and IEEE editors can fine-tune rules to capture new expressions of taxonomy concepts.
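The relationship between thesaurus entries and programmatically generated rules can be sketched as follows. This is a simplified Python illustration, assuming a default rule simply fires when a preferred term or one of its synonyms appears as a whole phrase; the `Rule` class, `generate_default_rules`, and the sample terms are hypothetical and do not reproduce M.A.I.’s actual rule syntax.

```python
import re
from dataclasses import dataclass, field

# Hypothetical thesaurus: preferred term -> synonyms (non-preferred terms).
# These entries are illustrative, not taken from the IEEE Thesaurus.
THESAURUS = {
    "Photovoltaic cells": ["solar cells", "PV cells"],
    "Neural networks": ["artificial neural networks", "ANN"],
}


@dataclass
class Rule:
    """A default indexing rule: any trigger phrase suggests the preferred term."""
    term: str
    triggers: list = field(default_factory=list)

    def matches(self, text: str) -> bool:
        # Whole-word, case-insensitive match of any trigger phrase.
        return any(
            re.search(r"\b" + re.escape(t) + r"\b", text, re.IGNORECASE)
            for t in self.triggers
        )


def generate_default_rules(thesaurus: dict) -> list:
    # One rule per preferred term; the term itself and its synonyms are triggers.
    return [Rule(term, [term, *synonyms]) for term, synonyms in thesaurus.items()]


def suggest_terms(text: str, rules: list) -> list:
    # Return the preferred terms whose rules fire on the text.
    return [rule.term for rule in rules if rule.matches(text)]


rules = generate_default_rules(THESAURUS)
print(suggest_terms("Efficiency gains in PV cells under partial shading", rules))
# ['Photovoltaic cells']
```

Because every synonym maps back to a single preferred term, documents using different wording receive the same subject tag; editors would then fine-tune these generated rules where plain phrase matching is too broad or too narrow.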
To manage the indexing vocabulary and categorization for the wealth of electronic documents and to promote productivity in the Indexing Group, IEEE uses Access Innovations’ Data Harmony MAIstro™.
In the IEEE indexing workflow, content streams into IEEE’s Digital Asset Management System to begin processing. Programmatic checks filter the records, passing material through with certain metadata entries applied automatically. For automatic subject indexing with taxonomy categories, M.A.I. processes selected metadata, including the publication title, the abstract, and the first paragraph of XML articles. PDF documents are preprocessed to extract their text and are then indexed.
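The field-selection step for XML articles can be sketched in Python with the standard library’s `xml.etree.ElementTree`. The element names `title`, `abstract`, `body`, and `p`, the sample article, and the function `indexing_text` are assumptions made for illustration; IEEE’s actual XML schema and its Digital Asset Management System interface are not described in this case study.

```python
import xml.etree.ElementTree as ET

# Hypothetical article markup; real IEEE XML element names may differ.
ARTICLE_XML = """
<article>
  <title>Adaptive Beamforming for 5G Antenna Arrays</title>
  <abstract>We study beamforming in phased array antennas.</abstract>
  <body>
    <p>Millimeter-wave links require narrow beams.</p>
    <p>Later paragraphs are not sent to the indexer.</p>
  </body>
</article>
"""


def indexing_text(xml_source: str) -> str:
    """Join the fields selected for subject indexing:
    title, abstract, and first body paragraph (hypothetical element names)."""
    root = ET.fromstring(xml_source.strip())
    parts = [
        root.findtext("title", default=""),
        root.findtext("abstract", default=""),
        root.findtext("body/p", default=""),  # findtext returns the first match only
    ]
    # Drop empty fields so missing elements do not leave blank lines.
    return "\n".join(p.strip() for p in parts if p and p.strip())


print(indexing_text(ARTICLE_XML))
```

Restricting input to these high-signal fields keeps later paragraphs, which may mention topics only in passing, from generating spurious subject terms.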
Of the 180,000 articles auto-indexed annually, the editorial team performs a quality review on about 15,000 to 20,000, or 10 to 200 per day. At this stage, reviewers may catch the occasional error, such as the taxonomy term "Gold" (the chemical element) being suggested by the phrase "the gold standard." Discovering a categorization error or a missed significant concept prompts an editor to fine-tune the corresponding rule in M.A.I., adding or modifying a condition so that indexing terms are applied more precisely.
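A refinement of the kind described for the "Gold" false positive can be sketched as an exclusion condition. In this hypothetical Python model (not M.A.I.’s rule language), excluded phrases are blanked out before trigger matching, so "gold" inside "the gold standard" no longer suggests the term. The `Rule` class and its `exclusions` field are illustrative assumptions.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Rule:
    term: str
    triggers: list
    # Hypothetical refinement: phrases whose occurrences must not fire the rule.
    exclusions: list = field(default_factory=list)

    def applies(self, text: str) -> bool:
        # Blank out excluded phrases first so their words cannot match a trigger.
        cleaned = text
        for phrase in self.exclusions:
            cleaned = re.sub(re.escape(phrase), " ", cleaned, flags=re.IGNORECASE)
        # Then look for any trigger as a whole word, case-insensitively.
        return any(
            re.search(r"\b" + re.escape(t) + r"\b", cleaned, re.IGNORECASE)
            for t in self.triggers
        )


default_gold = Rule("Gold", ["gold"])
refined_gold = Rule("Gold", ["gold"], exclusions=["gold standard"])

sentence = "This method is the gold standard for benchmarking."
print(default_gold.applies(sentence))  # True: the false positive
print(refined_gold.applies(sentence))  # False: the exclusion suppresses it
print(refined_gold.applies("Gold nanoparticles were deposited."))  # True
```

Blanking only the excluded phrase, rather than rejecting the whole document, means a text that says "the gold standard" and also discusses gold films would still receive the term.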
Indexed documents stream forward to publication on IEEE’s Xplore database, where the subject metatags on the documents enable precision in search and retrieval by topic. In addition to subject metadata, subscribers can access content by searching authors, terminology from additional taxonomies, and other metadata.
A diagram accompanying this case study shows the workflow of IEEE’s nightly automated M.A.I. processing.