1
Taxonomies
and Metadata
for Information
Architecture
  • Alice Redmond-Neal
  • Thesaurus Development Manager
  • Access Innovations, Inc. - Booth 217
  • ared@accessinn.com


  • Internet Librarian 2005
2
What we’ll cover
  • Key definitions
    • Taxonomies, metadata, information architecture
  • How taxonomies and metadata influence information architecture
  • Using taxonomies to enhance retrieval




3
Key points
  • A taxonomy provides both a browsable outline and descriptive metadata.
  • Metadata provide efficient searchable handles for content.
  • Taxonomy-based subject metadata yield the most precise retrieval.
  • Taxonomy is the basis for information architecture.
  • Information architecture that takes full advantage of taxonomy and subject metadata supports findability.


4
What’s a taxonomy?
  • Words
    • Controlled vocabulary for a subject area
  • Descriptive labels
  • Hierarchy
    • Simple hierarchical view of a thesaurus
  • Knowledge organization system
  • Storage and retrieval aid
5
Info retrieval starts with a
knowledge organization system
  • Uncontrolled list
  • Name authority file
  • Synonym set/ring
  • Controlled vocabulary
  • Taxonomy
  • Thesaurus
  • Ontology
  • Semantic network
6
Structure of
controlled vocabularies
7
Taxonomy? Thesaurus?
  • Often used interchangeably
  • Thesaurus is a taxonomy with extras
    • Related Terms
    • Nonpreferred Terms (USE/Used for)
    • Scope Notes
    • more
  • Use the word your audience understands
    • Avoid confusion with Roget’s thesaurus


8
 
9
Basic taxonomy / thesaurus features
  • Hierarchy structure
    • Broader Terms = more general concepts
    • Narrower Terms = more specific concepts
  • Related Terms = conceptual cousins
  • Term equivalents
  • Facets
  • Scope notes
  • Other elements as needed
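
The features listed above can be pictured as fields on a single term record. The following is a minimal sketch, not any particular tool's data model: the class name, field names, and sample values are illustrative assumptions (only the germs / Microorganisms pairing comes from a later slide).

```python
from dataclasses import dataclass, field


@dataclass
class TermRecord:
    """One taxonomy/thesaurus term with its relationships (illustrative fields)."""
    preferred: str
    broader: list[str] = field(default_factory=list)    # BT: more general concepts
    narrower: list[str] = field(default_factory=list)   # NT: more specific concepts
    related: list[str] = field(default_factory=list)    # RT: conceptual cousins
    used_for: list[str] = field(default_factory=list)   # UF: nonpreferred equivalents
    scope_note: str = ""                                 # SN: how the term is meant


def format_record(rec: TermRecord) -> str:
    """Render a record in a conventional SN/UF/BT/NT/RT display."""
    lines = [rec.preferred]
    if rec.scope_note:
        lines.append(f"  SN {rec.scope_note}")
    for label, terms in (("UF", rec.used_for), ("BT", rec.broader),
                         ("NT", rec.narrower), ("RT", rec.related)):
        lines.extend(f"  {label} {term}" for term in terms)
    return "\n".join(lines)


# Hypothetical sample record; every value except "germs" is invented.
record = TermRecord(
    preferred="Microorganisms",
    broader=["Organisms"],
    narrower=["Bacteria", "Viruses"],
    related=["Infectious diseases"],
    used_for=["germs"],
    scope_note="Hypothetical scope note for illustration.",
)
print(format_record(record))
```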


10
Perspectives on taxonomies
  • Taxonomist (aka Lexicographer, Thesaurus builder)
  • Indexer
  • Information architect
  • Searcher


  • Each has a different view and need for words in retrieving information.
  • Each need relates to using a taxonomy for indexing / categorizing content.


11
Taxonomies for
information retrieval online
  • Conceptual framework for web content – reflects organization of knowledge in a domain
  • Foundation for information architecture
  • Term records contain valuable info
  • Often 3 levels deep – depends on domain
  • May be displayed in full or part, modified, or hidden
12
Taxonomy display
depends on purpose
  • Descriptive taxonomy
    • Includes term variants, synonyms, nonpreferred terms
    • Query term expansion links synonyms to valid taxonomy term
    • Supports discovery through hierarchy, Related Terms
    • Used primarily at indexing stage of content management workflow to categorize documents
  • Navigational taxonomy
    • Reflects user’s mental model
    • Reflects user’s vernacular
    • Supports discovery through browsing
    • May be modified version of full taxonomy


13
"Taxonomy"
  • Taxonomy provides a way to describe the content: the basis for subject metadata
  • Metadata provide a way for that description to be captured for a website
14
What’s metadata?
  • “When it comes to definitions, metadata is a slippery fish.”
  • Data about data
    • Tags used to describe documents, pages, images, software, video and audio files, and other content objects for the purposes of improved navigation and retrieval
  • Finding tool
    • Keywords not displayed to the viewer but available to search engines
  • Viewable in HTML keyword meta tag field of most web sites


15
Data about data - like what?
  • Title
  • Author name
  • Date of creation
  • Language used in the creation
  • Publisher
  • Subject of the creation
  • Keywords... our focus re: taxonomies
  • Other stuff, depending on need
16
How does metadata work?
  • Search engine / web crawler looks at the HTML header on a web page
    • View → Page source
  • Subject Metadata is one part of the HTML header


  • <META NAME="KEYWORDS" CONTENT= … >
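
As a rough illustration of what a crawler does with that header field, the sketch below pulls the keyword list out of a page using Python's standard `html.parser`. The sample page and the keyword values in it are assumptions made up for the example; real crawlers differ in how, or whether, they use this tag.

```python
from html.parser import HTMLParser


class KeywordMetaParser(HTMLParser):
    """Collects the comma-separated values of <meta name="keywords"> tags."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <META NAME=...> matches.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "keywords":
            content = attr.get("content") or ""
            self.keywords.extend(k.strip() for k in content.split(",") if k.strip())


def extract_keywords(html_text):
    parser = KeywordMetaParser()
    parser.feed(html_text)
    return parser.keywords


# Hypothetical page; the keyword values are invented for illustration.
SAMPLE_PAGE = """<html><head>
<title>Sample page</title>
<META NAME="KEYWORDS" CONTENT="Microorganisms, Pharmaceutical drugs">
</head><body>Body text goes here.</body></html>"""

print(extract_keywords(SAMPLE_PAGE))
```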


17
 
18
 
19
Taxonomy terms as metadata
  • Most precise topic identifiers –
  • 100% relevant
  • Searchable as metadata
    • Gives more precise results than free text search – if you know what you’re looking for
    • Prevents hit on random occurrence of your query word
20
What’s Information Architecture?
  • The art and science of structuring and classifying web sites and intranets to help people find and manage information


21
 
22
What’s an Information Architect?
23
What IA is not
  • Graphic / visual design
  • Software development
  • Content management
  • Knowledge management
  • Coding (HTML, etc.)
  • Usability engineering
  • Library science
24
Information Architecture –
major components
  • Taxonomies
  • Metadata


  • Organization
  • Search
  • Labeling
  • Navigation
25
1–Taxonomies aid site organization
  • Taxonomy provides
  • Framework for content organization
  • Hierarchical outline of your content by subject categories
  • Basis for faceted browsing
26
 
27
Value of Category search
  • Searchers find info 50% faster using browsable categories than using the list returned by a free text search
    • Advantage is even greater when the sought result is not among the top 20 returns
  • Searchers prefer browsable category search
  • Chen, H., and Dumais, S.
28
MediaSleuth – displaying
taxonomy categories improves IA
  • MediaSleuth is:
  • Online source of educational media
    • Videos, software, audio, etc.
    • Over 96,000 products, nearly 64,000 titles
  • Based on NICEM database (National Information Center for Educational Media)


29
 
30
 
31
 
32
 
33
Taxonomy terms on documents help sort and organize the content
  • M.A.I. suggests the correct terms from the taxonomy as descriptors
  • M.A.I. rulebase recognizes term equivalents
    • germs → Microorganisms
    • vaccin* → Pharmaceutical drugs
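
To make the idea of term-equivalence rules concrete, here is a minimal sketch that maps words in a text to taxonomy descriptors using the two rules shown above. It is not M.A.I.'s actual rule syntax or matching logic, which the slides do not describe; the `RULES` table, the function name, and the wildcard handling via `fnmatch` are all assumptions for illustration.

```python
import re
from fnmatch import fnmatchcase

# Pattern -> taxonomy descriptor. "*" is treated as a trailing wildcard.
RULES = {
    "germs": "Microorganisms",
    "vaccin*": "Pharmaceutical drugs",
}


def suggest_descriptors(text, rules=RULES):
    """Return descriptors whose patterns match any word in the text, in first-seen order."""
    words = re.findall(r"[a-z]+", text.lower())
    suggestions = []
    for word in words:
        for pattern, descriptor in rules.items():
            if fnmatchcase(word, pattern) and descriptor not in suggestions:
                suggestions.append(descriptor)
    return suggestions


print(suggest_descriptors("Vaccination protects children from germs."))
```

A real rulebase would also need context conditions so that a word is mapped only when it is used in the intended sense; this sketch matches on the word alone.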



34
Taxonomy descriptors
become subject metadata
  • Selected descriptors are XML-tagged and stored with document
  • Descriptors available as webpage metadata
  • Metatags enable precise document retrieval
  • Term equivalence enables query expansion in search (coming)
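
One way to picture descriptors becoming webpage metadata is a small helper that writes the selected terms into a keywords meta tag. This is a sketch under assumptions: the function name is hypothetical, and the slides do not say how the stored XML-tagged descriptors are actually published to the page.

```python
from html import escape


def keywords_meta_tag(descriptors):
    """Build a keywords meta tag from a list of taxonomy descriptors."""
    content = ", ".join(descriptors)
    # Escape &, <, > and quotes so the attribute value stays well-formed.
    return f'<META NAME="KEYWORDS" CONTENT="{escape(content, quote=True)}">'


print(keywords_meta_tag(["Microorganisms", "Pharmaceutical drugs"]))
```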
35
Search: body growth
  • 1,100 document sample
  • Category search results
    • 3 hits

  • Complete database
  • Free text search
    • 8 hits — some irrelevant
  • Free text search on titles
    • 6 hits — limited recall
  • Search by taxonomy descriptor (AKA category)
    • 470 hits
      • 100% relevant
      • 100% recall
36
Sidebar: Recall, Precision,
and Relevance
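
For reference, precision is the share of retrieved documents that are relevant, and recall is the share of relevant documents that were retrieved. The sketch below computes both from two sets of document IDs; the IDs are invented, and the function name is an assumption.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved and a set of relevant IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: 4 documents returned, 3 documents actually relevant.
precision, recall = precision_recall(
    retrieved={"d1", "d2", "d3", "d4"},
    relevant={"d1", "d2", "d5"},
)
print(f"precision={precision:.2f} recall={recall:.2f}")
```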
37
 
38
 
39
 
40
 
41
Facets offer finer organization
  • Add details about any term
    • Pre-established aspects that pertain to each item
  • Cross-cut a taxonomic hierarchy
  • Basis for fine-tuning search results
    • Market group / audience
    • Price
    • Color
    • Size range
    • Source / company
    • Other attributes, varying by domain and need
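
Facet-based refinement can be sketched as filtering a result list on attribute values. The product records, facet names, and function name below are hypothetical, chosen only to mirror the kinds of facets listed above.

```python
# Hypothetical catalog records; each key other than "title" is a facet.
PRODUCTS = [
    {"title": "Cell Biology Basics", "audience": "High school", "price_band": "Under $50", "source": "Acme Media"},
    {"title": "Microbes at Work", "audience": "College", "price_band": "Under $50", "source": "Acme Media"},
    {"title": "Vaccines Explained", "audience": "High school", "price_band": "$50 and up", "source": "Beta Films"},
]


def filter_by_facets(items, **facets):
    """Keep only items whose facet values match every requested facet."""
    return [
        item for item in items
        if all(item.get(name) == value for name, value in facets.items())
    ]


matches = filter_by_facets(PRODUCTS, audience="High school", price_band="Under $50")
print([item["title"] for item in matches])
```

Because each facet is independent of the subject hierarchy, the same filter applies whichever taxonomy category the searcher started from.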


42
 
43
 
44
Alternative ways to display
content organization
  • Alphabetically
  • Chronologically
  • Geographically
  • Permuted list of taxonomy terms
    • Content management system
    • management system, Content
    • system, Content management
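
The permuted list above can be generated by rotating each term so that every word leads one entry. A minimal sketch, assuming terms are split on whitespace and the function name is our own:

```python
def permute_term(term):
    """Return one entry per word, with that word rotated to the front."""
    words = term.split()
    entries = []
    for i in range(len(words)):
        lead = " ".join(words[i:])
        tail = " ".join(words[:i])
        entries.append(f"{lead}, {tail}" if tail else lead)
    return entries


for entry in permute_term("Content management system"):
    print(entry)
# Content management system
# management system, Content
# system, Content management
```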
45
2–Taxonomies aid search
  • Taxonomy provides
  • Authority terms of a controlled vocabulary
  • Synonyms and other alternative expressions
    • Typos (lathes, laiths, laths, layth…)
    • Obsolete names (Cooper’s plane / Lamb’s tongue)
    • → Query expansion
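
Query expansion can be sketched as a lookup from any known variant to its authority term, followed by a search on the authority term plus all of its variants. In the example below the variant spellings come from the slide, but the preferred label "Lath" is an assumption (the slide does not say which form is the authority term), and the function names are hypothetical.

```python
# Authority term -> variant spellings. The label "Lath" is assumed for illustration.
EQUIVALENTS = {
    "Lath": ["lathes", "laiths", "laths", "layth"],
}


def build_lookup(equivalents):
    """Map every lowercase variant (and the authority term itself) to the authority term."""
    lookup = {}
    for preferred, variants in equivalents.items():
        lookup[preferred.lower()] = preferred
        for variant in variants:
            lookup[variant.lower()] = preferred
    return lookup


def expand_query(query, equivalents=EQUIVALENTS):
    """Return the authority term plus its variants, or the query unchanged if unknown."""
    preferred = build_lookup(equivalents).get(query.strip().lower())
    if preferred is None:
        return [query]
    return [preferred] + list(equivalents[preferred])


print(expand_query("laiths"))
```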
46
 
47
 
48
 
49
 
50
 
51
 
52
 
53
3–Taxonomies aid labeling
  • Taxonomy provides
  • Basis for labels on site/portal
  • Concepts that can be re-worded for audience
54
 
55
Adapt taxonomy terms for labeling
  • What words do users use? Gather variants from
    • Search logs
    • User focus groups
    • Subject matter experts
  • Tailor site/portal labels to typical users
  • Include variants as Nonpreferred terms (USE/Used for equivalents) in taxonomy
    • M.A.I. can also capture variants as rules without formalizing them as Nonpreferred terms


56
4–Taxonomies aid navigation
  • Taxonomy provides
  • Major categories
  • Expansion to Narrower Terms
  • Additional term information
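
Expanding a major category to its Narrower Terms amounts to walking a parent-to-children mapping. The sketch below renders such a mapping as an indented outline with an optional depth limit; the category names and function name are hypothetical.

```python
# Hypothetical hierarchy: each term maps to its Narrower Terms.
NARROWER = {
    "Science": ["Biology", "Chemistry"],
    "Biology": ["Microorganisms"],
}


def outline(term, narrower, depth=0, max_depth=None):
    """Return indented outline lines for a term and its Narrower Terms."""
    lines = ["  " * depth + term]
    if max_depth is None or depth < max_depth:
        for child in narrower.get(term, []):
            lines.extend(outline(child, narrower, depth + 1, max_depth))
    return lines


# Top-level view shows only the first level of Narrower Terms.
print("\n".join(outline("Science", NARROWER, max_depth=1)))
# Expanded view shows the full hierarchy.
print("\n".join(outline("Science", NARROWER)))
```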
57
 
58
 
59
Integrate taxonomy
to enhance findability
  • Browsable categories of a directory
    • Browsable faceted navigation
  • Smart search for term equivalents
  • Taxonomy terms (original or modified) as labels
  • Navigation aids incorporate taxonomy terms and relationships
60
Use software tools to support IA
  • Thesaurus creation / management tools
    • ANSI/NISO standards compliant
    • Support features you need
      • Customizable fields
      • Import ability
  • Categorization tools
    • Human / automatic / hybrid categorizer
  • Content management systems
61
"Foundation of information"
  • Foundation of information architecture


  •  Source of subject metadata


  •  Path to portal usability


62
Recap
  • Taxonomies and metadata are cornerstones of information architecture
  • Taxonomies are the basis for content organization
  • Taxonomies provide a browsable outline of your content
  • Subject metadata using taxonomy terms yield 100% relevant retrieval
  • Taxonomies are the basis for search, labeling, and navigation in information architecture
  • Tools that recognize synonyms (query expansion) improve taxonomy implementation
63
References
  • Aitchison, J., Gilchrist, A., and Bawden, D. Thesaurus Construction and Use: A Practical Manual (4th edition). Aslib, 2000.
  • Chen, H., and Dumais, S. Bringing order to the web: automatically categorizing search results. Proceedings of the ACM SIGCHI Conference on Human Factors in Computing Systems (CHI '00). ACM, 2000, 145-152.
  • Rosenfeld, L., and Morville, P. Information Architecture for the World Wide Web. O'Reilly, 1998.
  • Sullivan, D. Proven Portals: Best Practices for Planning, Designing, and Developing Enterprise Portals. Addison Wesley, 2003.


64
Thank you!
 
Questions?