2
NewsIndexer: a case study in filtering
  • Filters / categorizes / tags news content
  • Manages massive information flow
  • Based on Thesaurus Master and M.A.I.
    • Specialized thesaurus
    • Specialized rulebase
3
NewsIndexer’s vocabulary
  • Broad and general subject matter
  • Reflects coverage of typical news publications
  • Over 5200 terms, nine levels deep
    • Six top level categories
    • Geographic terms
  • Starter vocabulary
  • Easily adapted and customized
5
NewsIndexer’s brain
  • M.A.I. rulebase customized for news topics
  • Words in text trigger M.A.I. rules
  • Conditions in rules determine precise taxonomy term(s) to apply
    • Rules capture human knowledge and analysis
    • Rules use context to distinguish between homographs
      • Chicago Bears
      • Bear market
      • Bears in the woods
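
The trigger-and-condition idea above can be sketched in a few lines. This is only an illustration under stated assumptions: the rule table, the context words, and the taxonomy term names are all hypothetical and do not reflect the actual M.A.I. rule syntax or the NewsIndexer thesaurus.

```python
import re

# Hypothetical rules: trigger words plus context conditions.
# Neither the structure nor the term names come from M.A.I. itself.
RULES = [
    {
        "triggers": {"bear", "bears"},
        "conditions": [
            ({"chicago", "nfl", "quarterback"}, "Football teams"),
            ({"market", "stocks", "investors"}, "Bear markets"),
            ({"woods", "forest", "wildlife"}, "Bears (animals)"),
        ],
    },
]


def suggest_terms(text):
    """Return taxonomy terms whose trigger and context condition both match."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    suggestions = []
    for rule in RULES:
        # A rule is considered only if one of its trigger words is in the text.
        if not (rule["triggers"] & words):
            continue
        # Each condition checks surrounding context to pick the right sense.
        for context, term in rule["conditions"]:
            if words & context:
                suggestions.append(term)
    return suggestions


print(suggest_terms("The Chicago Bears signed a new quarterback."))
print(suggest_terms("Investors fear a bear market."))
print(suggest_terms("We saw bears in the woods."))
```

Each sentence contains the same trigger word, but the context words decide which term is suggested.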
8
Filtering data: a matter of degree
  • M.A.I. suggests terms as directed by rules
  • Index with most specific appropriate terms
9
I want it all!
  • No filtering yields ALL terms that meet conditions of M.A.I. rules
  • Editor has option to select/reject and add terms
  • Most specific appropriate term – as chosen by editor – is saved with the document
    • Subject metadata
    • XML format
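
As a rough sketch of this workflow, the editor's select/reject/add step and the saving of subject metadata as XML might look like the following. The element names (`document`, `subjects`, `subject`) and both function names are assumptions made for illustration; the slides do not specify the actual XML schema.

```python
import xml.etree.ElementTree as ET


def apply_editor_choices(suggested, rejected=(), added=()):
    """Keep suggested terms the editor did not reject, then append added terms."""
    kept = [t for t in suggested if t not in rejected]
    return kept + [t for t in added if t not in kept]


def subject_metadata_xml(doc_id, terms):
    """Serialize the chosen terms as subject metadata (hypothetical schema)."""
    root = ET.Element("document", {"id": doc_id})
    subjects = ET.SubElement(root, "subjects")
    for term in terms:
        ET.SubElement(subjects, "subject").text = term
    return ET.tostring(root, encoding="unicode")


final_terms = apply_editor_choices(
    suggested=["Bear markets", "Football teams"],
    rejected=["Football teams"],
    added=["Stock exchanges"],
)
print(subject_metadata_xml("a1", final_terms))
```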
12
Middle-level filtering
  • Roll up terms to the second or third level in your taxonomy
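
Rolling up can be pictured as walking from a specific term toward the top of the hierarchy and stopping at the chosen level. The sketch below assumes a simple child-to-parent map; the `PARENT` table, its term names, and the `roll_up` helper are hypothetical and stand in for a real thesaurus.

```python
# Hypothetical four-level branch of a taxonomy, stored as child -> parent.
PARENT = {
    "Business": None,
    "Financial markets": "Business",
    "Stock markets": "Financial markets",
    "Bear markets": "Stock markets",
}


def path_from_top(term):
    """Return the chain of terms from the top-level category down to `term`."""
    path = []
    while term is not None:
        path.append(term)
        term = PARENT[term]
    return path[::-1]


def roll_up(term, max_level=2):
    """Replace `term` with its ancestor at `max_level` (1 = top level).

    Terms already at or above that level are returned unchanged.
    """
    path = path_from_top(term)
    return path[min(max_level, len(path)) - 1]


print(roll_up("Bear markets", max_level=2))  # second-level ancestor
print(roll_up("Bear markets", max_level=3))  # third-level ancestor
```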
17
No details: just the big picture
  • Index comprehensively and retain details, but display only general terms for the end user
20
Filtering to disambiguate
  • Common words used with very different meanings in different contexts
    • Utilities:
      • electricity / water / sewer?
      • utility software?
    • Architecture:
      • of buildings?
      • of computer systems?
  • M.A.I. rule conditions differentiate concepts
    • Information Architect doesn’t want to retrieve building blueprints
21
Filtering by user profile
  • User expresses interest in general topics
    • e.g., Technology, Environment, Law
  • Materials indexed with those topics or any of their Narrower Terms are forwarded
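
One way to picture profile-based forwarding: expand each topic in the profile to include all of its Narrower Terms, then forward any document whose index terms overlap that expanded set. The `NARROWER` table and function names below are hypothetical, chosen only to illustrate the idea.

```python
# Hypothetical broader-term -> narrower-terms map.
NARROWER = {
    "Technology": ["Software", "Computer hardware"],
    "Software": ["Utility software"],
    "Environment": ["Wildlife"],
}


def expand(topic):
    """Return the topic together with all of its narrower terms, at any depth."""
    terms = {topic}
    for child in NARROWER.get(topic, []):
        terms |= expand(child)
    return terms


def should_forward(doc_terms, profile_topics):
    """True if the document is indexed with a profile topic or a narrower term."""
    wanted = set()
    for topic in profile_topics:
        wanted |= expand(topic)
    return bool(wanted & set(doc_terms))


print(should_forward(["Utility software"], ["Technology"]))
print(should_forward(["Wildlife"], ["Technology", "Law"]))
```

A document indexed only with a very specific term still reaches a user who registered interest in the general topic above it.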
22
Filtering IPTC terms through NewsIndexer
  • International Press Telecommunications Council (IPTC) proposal for NewsCodes
  • Part of News Industry Text Format (NITF)
  • ~1300 terms describe topics of news articles
  • Broad coverage (heavy on sports)
  • NewsIndexer rulebase can apply detailed NewsIndexer terms and/or IPTC NewsCodes
  • Comply with growing news standards
  • Achieve greater detail for news indexing
23
Filtering advantages
  • For the End User
    • Simpler, more manageable presentation of concepts
    • Consistent with typical user’s search strategy
    • Differentiated concepts associated with homographs
    • Targeted information according to user profile
  • For the Internal User
    • Documents retain subject metadata reflecting granular indexing
    • Precision search gets precision results