The American Society of Civil Engineers (ASCE) hired Access Innovations to construct a thesaurus and rule base for the manual indexing of the Society’s document corpus; the controlled vocabulary contained 2,406 preferred terms. Over time, many subject terms were added as civil engineering disciplines evolved. Eventually, the rule base needed corresponding refinements to reflect the expanding terminology captured in these new terms.
Client Information Needs
One major challenge in making the thesaurus refinements was working with an active knowledge base for the project’s duration. ASCE and its membership would continue to rely on both the thesaurus and the rule base to support the organization’s daily activities. For this reason, taking the rule base offline to make detailed adjustments was out of the question; the Access team would have to complete the update while maintaining continuity of the knowledge base for ongoing document indexing.
Further, since their subject matter experts and document curation team had vetted and thoroughly discussed all subject terms added since initial thesaurus development, ASCE didn’t want Access Innovations to make any changes to the terms.
Initially, this restriction was an obstacle, because a commonly used, effective way to adjust indexing rules is to modify the relevant thesaurus term records and then use Data Harmony software to convert the batch of term adjustments into rule base modifications. Without the authority to refine thesaurus terms, that method was not available.
Input documents often contained atypical syntax. The language of ASCE’s content is highly specialized, and many authors contributing to the collection are non-native English speakers. In a large percentage of the articles, these two factors meant that the normal sentence structures that Access Innovations taxonomists exploit to refine rules were not present.
On a related note, concepts in the literature often appeared in the passive voice in running text, rather than in the active voice that thesaurus term names use.
Another characteristic of the input data: the adjectives used in rule conditions to disambiguate key indexing words (‘text-to-matches’) often appeared much farther from the keyword than would normally be expected, sometimes several paragraphs or even pages away. Access Innovations had to find a way to increase indexing accuracy without the tools and tricks normally deployed to refine rule bases, and to complete the job in eight weeks.
Access Innovations overcame these challenges and met the client’s information needs by focusing on high-frequency words in the thesaurus. If a certain word appeared in several thesaurus terms, that word was targeted for intensive disambiguation in the rule base.
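The selection step described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the procedure or tooling Access Innovations used: the function name `shared_words`, the stopword list, the `min_terms` threshold, and the sample terms are all assumptions made for the example, and the terms are invented rather than taken from the ASCE thesaurus.

```python
from collections import Counter

# Hypothetical stopword list; a real project would tune this.
STOPWORDS = frozenset({"of", "and", "the", "in", "for"})

def shared_words(terms, min_terms=2):
    """Return (word, count) pairs for words that occur in at least
    `min_terms` distinct thesaurus terms, most frequent first."""
    counts = Counter()
    for term in terms:
        # Use a set so a word is counted once per term.
        words = {w for w in term.lower().split() if w not in STOPWORDS}
        counts.update(words)
    return [(w, n) for w, n in counts.most_common() if n >= min_terms]

# Invented sample terms, for illustration only.
sample_terms = [
    "Bridge design",
    "Bridge maintenance",
    "Bridge loads",
    "Soil mechanics",
    "Soil stabilization",
    "Wind loads",
]

candidates = shared_words(sample_terms)
```

Here `candidates` lists "bridge" first (it occurs in three terms), followed by "soil" and "loads" (two terms each). Words like these are the ones that would be queued for closer disambiguation work in the rule base.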
The Access team applied proximity rule conditions flexibly. Editors found new patterns in the data to inform their use of proximity indicators, and they devised creative strategies to refine the rules accordingly.
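To make the idea of a proximity condition concrete, here is a minimal Python sketch. It is not Data Harmony or MAIstro rule syntax; the function `within_proximity`, its token-distance measure, and the sample text are assumptions chosen for illustration. It checks whether a qualifying word occurs within a given number of tokens of a keyword, and shows why a wider window may be needed when the qualifier sits far from the keyword.

```python
import re

def within_proximity(text, keyword, qualifier, max_distance):
    """Return True if `qualifier` occurs within `max_distance` tokens
    of `keyword` anywhere in `text` (case-insensitive)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    key_positions = [i for i, t in enumerate(tokens) if t == keyword.lower()]
    qual_positions = [i for i, t in enumerate(tokens) if t == qualifier.lower()]
    return any(
        abs(k - q) <= max_distance
        for k in key_positions
        for q in qual_positions
    )

# Invented sample text: the qualifier "steel" is in a different
# sentence from the keyword "bridge".
sample_text = (
    "The fatigue behaviour of the steel girders was monitored. "
    "After several seasons, cracks appeared in the bridge."
)

narrow = within_proximity(sample_text, "bridge", "steel", max_distance=5)
wide = within_proximity(sample_text, "bridge", "steel", max_distance=20)
```

With the narrow window the condition fails, because "steel" is eleven tokens away from "bridge"; with the wider window it succeeds. Adjusting the window to match patterns observed in the documents is the kind of adaptation the paragraph above describes.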
Because ASCE staff members were adding and editing rules at the same time, this project required daily coordination and constant feedback between Access Innovations staff and their contacts at ASCE. This level of collaboration proved fruitful for both teams: the ASCE team was able to point out very specific issues within their literature for targeted treatment, and Access Innovations staff members taught them how to use MAIstro creatively to solve these kinds of issues within their production environment.
- The Access Innovations team zeroed in on problem terms to maximize production efficiency and generate the desired indexing results
- The ASCE team learned how to exploit the power of the rule base in combination with their finely crafted thesaurus terms