Access Innovations White Paper

The Notation Module in Thesaurus Design
An innovative view of Thesaurus Hierarchies

by Scott Denning

Traditional thesaurus design has necessarily been a matter of alphabetization of the terms. While different views – permuted, hierarchical, rotated – offer different insights into the relationships of the terms in the thesaurus, the structure within these views is still ultimately alphabetical.

Though alphabetization is at the core of organization in language, the necessity to hold to an alphabetical structure may lead to reluctant and sometimes unwieldy compromises on the part of thesaurus designers. Non-alphabetical expressions and uses of a thesaurus have often necessitated the application of a separate term management program, and thesaurus designers have had to be mindful of the program(s) which might be used to output the thesaurus.

With the creation of the Notation Module for thesaurus construction, Access Innovations' Thesaurus Master™ presents a new way to approach thesaurus design. The Notation Module allows user-assigned notations, either alphabetical or numerical, to be prepended to thesaurus terms, while still maintaining the ability to view the thesaurus in a traditional alphabetized view. Thus, a thesaurus may be structured in Notation View for the unique requirements of a user, while still being available in a Hierarchy View consistent with ISO/NISO and ANSI standards. Based upon customer requests, the Notation Module is designed to work as part of Access Innovations' Data Harmony suite of thesaurus management products.

The ability to annotate thesaurus terms opens up new possibilities in thesaurus design. While notation of thesaurus terms is not a new concept, the Notation Module allows annotated forms to exist in parallel with traditional structures, so that a thesaurus can more accurately reflect the structures found in business taxonomies and processes.

For example, a thesaurus structured alphabetically

might be annotated numerically to give “Physics” order primacy in the thesaurus:

Note that the order of the above hierarchy is based upon the prepended numerical notation, and not alphabetized by the initial letters of the terms. Also note that the second level of terms is similarly freed from alphabetization; “Nuclear Physics” is given primacy over “Astrophysics”.
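As a rough illustration of ordering by notation rather than by initial letter, the following Python sketch sorts a handful of terms both ways. It is not the Notation Module's implementation. The specific notation values and the terms “Biology” and “Chemistry” are hypothetical; only “Physics”, “Nuclear Physics”, and “Astrophysics” and their relative order come from the text above.

```python
def notation_key(notation: str) -> tuple[int, ...]:
    """Turn '1.2' into (1, 2) so that '1.10' sorts after '1.2'."""
    return tuple(int(part) for part in notation.split("."))


# Hypothetical notation assignments, for illustration only.
terms = {
    "1": "Physics",
    "1.1": "Nuclear Physics",
    "1.2": "Astrophysics",
    "2": "Biology",
    "3": "Chemistry",
}


def notation_view(terms: dict[str, str]) -> list[str]:
    """List terms in notation order, each prefixed with its notation."""
    ordered = sorted(terms, key=notation_key)
    return [f"{n}: {terms[n]}" for n in ordered]


def alphabetical_top_terms(terms: dict[str, str]) -> list[str]:
    """List top terms (notations without a dot) in plain alphabetical order."""
    return sorted(name for n, name in terms.items() if "." not in n)


if __name__ == "__main__":
    print("\n".join(notation_view(terms)))
    print(alphabetical_top_terms(terms))
```

Comparing notations as tuples of integers, rather than as strings, keeps a notation such as 1.10 after 1.2 once a level has more than nine terms.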

The Notation Module also allows the notations to be alphabetical rather than numeric:

An advantage of alphabetical notation is its familiarity to the user: an inherent A-Z structure is understood. Using the alphabet does, however, necessarily imply a limitation of 26 Top Terms. For larger thesauri, having only combinations of 26 notation elements available may be limiting, and notation for the secondary and narrower levels may become cumbersome and non-intuitive, so numerical notation may be preferable. Alphanumeric blendings for notations may be enabled on a project-specific basis.

Prepended notations allow great flexibility in the structure of a thesaurus, allowing it to reflect:

Other advantages of notation

A very important advantage of notation is the intuitive positional relationship it conveys when a term record is viewed in isolation, which is especially valuable when considering terms from very large thesauri. The notation contains information which gives the user insight into the hierarchical position of the term: “2.2.3: Chemical reactions” immediately indicates that this is a third-level term, under the top term annotated with “2”.

Notation facilitates a thesaurus being dynamically responsive. As priorities change within a process, system, or organization, the thesaurus may easily be changed in kind. As designed to work with Data Harmony, the Notation Module supports these changes by automatically renumbering notations: when a branch in the thesaurus is moved, the notations for the term and all its children are changed to reflect the first available open number (or a numbering system may be assigned by the user). This simplifies restructuring a thesaurus to reflect the changing needs of a business or industry.

Notations may be used to reflect national or regional application or sources of thesauri. A company with branches around the world may have a version or section of a thesaurus for North American usage, one for the U.K., etc. Where this is the case, notation can be used to differentiate from which of the thesauri (or section of a thesaurus) a particular term originates. This also suggests utility in multilingual thesauri.
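The renumbering described in this section, where a moved branch and all its children take the first available open number, can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the Data Harmony implementation: the function name `move_branch`, the dictionary representation, and the sample terms other than “Chemical reactions” are hypothetical, and the sketch assumes the new parent lies outside the branch being moved.

```python
def move_branch(thesaurus: dict[str, str], old_root: str, new_parent: str) -> dict[str, str]:
    """Return a copy of the thesaurus with the branch at old_root placed
    under new_parent, using the first open child number (hypothetical helper)."""
    prefix = new_parent + "."
    # Child numbers already taken directly under the new parent.
    used = {
        int(n[len(prefix):])
        for n in thesaurus
        if n.startswith(prefix) and "." not in n[len(prefix):]
    }
    slot = 1
    while slot in used:
        slot += 1
    new_root = f"{prefix}{slot}"

    moved = {}
    for notation, term in thesaurus.items():
        # Rewrite the branch root and every notation beneath it.
        if notation == old_root or notation.startswith(old_root + "."):
            notation = new_root + notation[len(old_root):]
        moved[notation] = term
    return moved


# Hypothetical sample data, for illustration only.
thesaurus = {
    "1": "Physics",
    "1.1": "Nuclear Physics",
    "1.2": "Astrophysics",
    "2": "Chemistry",
    "2.1": "Chemical reactions",
    "2.1.1": "Combustion",
}

if __name__ == "__main__":
    for notation, term in sorted(move_branch(thesaurus, "2.1", "1").items()):
        print(f"{notation}: {term}")
```

In this sample, “Chemical reactions” moves under “Physics”, where 1.1 and 1.2 are taken, so it becomes 1.3 and its child becomes 1.3.1. A user-assigned numbering system, which the text also allows, would replace the first-open-number search.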

It should be noted that, while notation allows freedom from strict alphabetical structure, a thesaurus can be both annotated and alphabetized, to exploit the strengths of both systems, as below:

Addressing the confusion for non-taxonomists

One common confusion experienced by non-taxonomists is trying to come to grips with the fact that the structure of a classical thesaurus may bear little resemblance to the final form the average user sees. “Translated” through a variety of programs, thesauri typically exist in the background; most users rarely if ever see the hierarchy tree. However, personnel not specifically trained as taxonomists are sometimes called upon to examine or maintain taxonomies, and the structures which taxonomists utilize daily may not be readily grasped by the untrained. An advantage of the Notation Module is that it allows a View of the thesaurus which can directly correspond with the output view. Thus, the Notation View for a thesaurus may be designed as the “User View”, while still making available a classic hierarchical view for experienced taxonomists.

Prepended notations are particularly valuable for users who have entered a thesaurus or retrieved a thesaurus term by way of a keyword search. Depending upon the search engine used, the retrieved term record may have information regarding broader and narrower terms, relationships, etc. But the term record viewed in isolation may not provide sufficient information for the average user to gain insight into the term's relative position in a large thesaurus. Notation such as “4.1.3.6: Forensic science equipment” immediately informs the user that this term is fourth-level. In effect, such notation reflects the “thread” of the term's branching back to the Top Term, a structure readily familiar to many computer users. The numerical structure is also familiar to many from reports, catalog listings, etc., and therefore may be more readily intuitive.
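To make the “thread” idea concrete, the short Python sketch below reads a term's level and its chain of broader notations directly from a dotted numerical notation. The function names `level` and `thread` are hypothetical, and the sketch assumes notations are dot-separated as in the examples quoted in this paper.

```python
def level(notation: str) -> int:
    """Hierarchy level implied by a dotted notation: '4.1.3.6' is level 4."""
    return len(notation.split("."))


def thread(notation: str) -> list[str]:
    """Notations of the broader terms, from the Top Term down to the parent."""
    parts = notation.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts))]


if __name__ == "__main__":
    print(level("4.1.3.6"))   # level of "Forensic science equipment"
    print(thread("4.1.3.6"))  # broader notations back to the Top Term
```

No lookup in the thesaurus is needed for either function; the position is recoverable from the notation alone, which is the point the paragraph above makes about term records viewed in isolation.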

Non-taxonomists exposed to thesaurus hierarchies are sometimes perplexed by the equivalence given to top terms, and the lack of primacy of the subject of the thesaurus: “If this thesaurus is about mammals, why isn't it at the beginning of the top term list?” Similarly, a thesaurus on the steps in a process, while detailing equivalent steps and their narrower steps, may in traditional view be quite bewildering.

For example, in a simple thesaurus for the steps involved in operating a lawn mower, the steps are all of equal importance to the process, but when alphabetized they do not reflect the process flow. An annotated version of the same thesaurus constitutes step-by-step instructions, while still maintaining the equivalency of the terms.

  Classic, alphabetical view     Notation view
  Fuel check                     1: Fuel check
  Lubricant check                2: Lubricant check
  Mower propulsion               3: Safety gear check
  Obstacle check                      3.1: Ear protection
  Pull cord operation                 3.2: Eye protection
  Safety bar engagement          4: Obstacle check
  Safety gear check              5: Safety bar engagement
       Ear protection            6: Pull cord operation
       Eye protection            7: Mower propulsion
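The two views of the lawn mower thesaurus can be generated from a single set of annotated terms, as the Python sketch below suggests. It is an illustration only, not how Thesaurus Master produces its views. The dictionary layout and function names are hypothetical, and “Eye protection” is numbered 3.2 here on the assumption that sibling notations are unique.

```python
def notation_key(notation: str) -> tuple[int, ...]:
    """Turn '3.1' into (3, 1) for numeric ordering."""
    return tuple(int(part) for part in notation.split("."))


def parent(notation: str) -> str:
    """Notation of the broader term; empty string for a top term."""
    head, _, _ = notation.rpartition(".")
    return head


def notation_view(terms: dict[str, str]) -> list[str]:
    """Terms in notation order, indented by level, with notation prefix."""
    return [
        "    " * n.count(".") + f"{n}: {terms[n]}"
        for n in sorted(terms, key=notation_key)
    ]


def classic_view(terms: dict[str, str], root: str = "", depth: int = 0) -> list[str]:
    """Terms alphabetized at each level, without notations."""
    children = [n for n in terms if parent(n) == root]
    lines = []
    for n in sorted(children, key=lambda n: terms[n].lower()):
        lines.append("    " * depth + terms[n])
        lines.extend(classic_view(terms, n, depth + 1))
    return lines


mower = {
    "1": "Fuel check",
    "2": "Lubricant check",
    "3": "Safety gear check",
    "3.1": "Ear protection",
    "3.2": "Eye protection",  # assumed unique sibling number
    "4": "Obstacle check",
    "5": "Safety bar engagement",
    "6": "Pull cord operation",
    "7": "Mower propulsion",
}

if __name__ == "__main__":
    print("\n".join(classic_view(mower)))
    print()
    print("\n".join(notation_view(mower)))
```

Because both views are derived from the same records, changing a notation reorders the Notation View without touching the alphabetical hierarchy.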