Semantic Fingerprinting

Semantic Fingerprinting disambiguates author names and affiliations by leveraging semantic metadata. It also serves as an indexing tool that captures authors’ research profiles, improving the descriptive metadata attached to their documents.

What the software does 

Semantic Fingerprinting suggests the most appropriate subject terms from a controlled vocabulary to describe a contributing author, their affiliated institutions and other relevant information – based on analysis of document text.

It then enriches the keyword metadata of documents moving through the content management system (CMS) with the author’s research profile: the author’s ‘semantic fingerprint.’ The Data Harmony semantic fingerprint captures specific scientific and academic research topics, accurately reflecting the subject areas covered by the author’s publications.
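To make the idea of a fingerprint concrete, here is a minimal sketch that aggregates the subject terms assigned to one author's documents into a weighted term profile. It is an illustration only, not the Data Harmony implementation: the function name `build_fingerprint`, the input shape (one list of controlled-vocabulary terms per document) and the weighting scheme are all assumptions.

```python
from collections import Counter


def build_fingerprint(indexed_docs):
    """Aggregate subject terms from one author's documents into term weights.

    `indexed_docs` is a hypothetical input: one list of controlled-vocabulary
    terms per document, as an indexer might have assigned them.
    """
    counts = Counter()
    for terms in indexed_docs:
        # Count each term at most once per document so that a single
        # heavily indexed paper does not dominate the profile.
        counts.update(set(terms))
    total = sum(counts.values())
    if total == 0:
        return {}
    # Normalize so that the weights sum to 1.
    return {term: n / total for term, n in counts.items()}


# Example: two documents by the same author.
profile = build_fingerprint([["Optics", "Lasers"], ["Optics"]])
```

In this sketch, a term that recurs across many of the author's documents ends up with a larger weight, which is one simple way to represent the subject areas an author publishes in.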

Semantic Fingerprinting interface

Powered by M.A.I.™ (Machine Aided Indexer) and driven by the source files

The process is powered by M.A.I.™ and driven by a publisher’s source files – research articles, journals, conference proceedings, thesis papers or other scholarly/academic documents.

The Semantic Fingerprinting Web service mines a publisher’s document collection, builds a database of named authors and affiliated institutions, and expands those authority lists over time. The author/affiliation database determines the semantic algorithms deployed by M.A.I. for matching names in incoming content objects.
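The authority-list idea described above can be sketched as a small lookup structure. This is a simplified illustration under stated assumptions, not the product's matching logic: the class `AuthorityList`, the helper `name_key`, and the choice to match on surname plus first initial are all hypothetical.

```python
from collections import defaultdict


def name_key(name):
    """Reduce 'Smith, John A.' or 'J. Smith' to a (surname, initial) key."""
    name = name.strip()
    if "," in name:
        surname, given = [part.strip() for part in name.split(",", 1)]
    else:
        parts = name.split()
        if not parts:
            return ("", "")
        surname, given = parts[-1], " ".join(parts[:-1])
    return (surname.lower(), given[:1].lower())


class AuthorityList:
    """Hypothetical author/affiliation database that grows over time."""

    def __init__(self):
        self._by_key = defaultdict(set)

    def add(self, name, affiliation):
        # Each mined document contributes (name, affiliation) pairs.
        self._by_key[name_key(name)].add((name, affiliation))

    def candidates(self, name):
        # An incoming name can map to zero, one, or several known authors.
        return sorted(self._by_key.get(name_key(name), set()))


authors = AuthorityList()
authors.add("Smith, John A.", "University of X")
authors.add("Smith, Jane", "Institute Y")
matches = authors.candidates("J. Smith")  # ambiguous: two candidates
```

A key this coarse deliberately returns several candidates for a name like "J. Smith"; narrowing them down is where the subject-term evidence and, when needed, a human reviewer come in.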

Software features – leveraging semantic patterns to optimize metadata information

After the Semantic Fingerprinting application completes its initial entity disambiguation pass on a document, any author names that remain unresolved require human review. The user interface supports an interactive approach, aided by clues extracted from the input text.

How the process operates: The Semantic Fingerprinting interface presents a list of unresolved entities that Data Harmony identified as probable names but could not match correctly in the author/affiliation database. A human reviewer works through the list, using the ‘Search authors’ pane to find the correct person’s name. Several kinds of search parameters are available.

Click a questionable entity to display the corresponding information for that entity. Semantic Fingerprinting captures related information about questionable entities so that you can use those clues to resolve names effectively.

When the M.A.I. Concept Extractor finds clues during a disambiguation pass, the Semantic Fingerprinting application retains the related information, which improves the chances of correctly resolving that author’s name in a future search. Technically, when a questionable entity is resolved, the semantic fingerprint attaches the pertinent subject terms to a single author name entity.
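One way to picture how retained clues could help a later search is to rank candidate authors by how closely a document's subject terms match each stored fingerprint, and to fold the document's terms into the fingerprint once a reviewer confirms the match. The sketch below uses cosine similarity for the ranking; that choice, along with the function names `rank_candidates` and `retain_clues`, is an assumption made for illustration and is not taken from the product.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(doc_terms, fingerprints):
    """Order candidate authors by similarity to the document's subject terms."""
    doc_vector = {term: 1.0 for term in doc_terms}
    scored = [(cosine(doc_vector, fp), author) for author, fp in fingerprints.items()]
    # Highest score first; ties are broken alphabetically by author name.
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


def retain_clues(fingerprint, doc_terms):
    """After a reviewer confirms the author, add the document's terms."""
    for term in set(doc_terms):
        fingerprint[term] = fingerprint.get(term, 0.0) + 1.0
    return fingerprint


fingerprints = {
    "Smith, John A.": {"Optics": 2.0, "Lasers": 1.0},
    "Smith, Jane": {"Genomics": 3.0},
}
ranked = rank_candidates(["Optics", "Photonics"], fingerprints)
best_author = ranked[0][1]
retain_clues(fingerprints[best_author], ["Optics", "Photonics"])
```

In this sketch the ranking is only a suggestion for the reviewer. The fingerprint is updated after the match is confirmed, so each resolved entity adds evidence for the next search on that author.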

Moving beyond metadata!

Organizations can use the semantic fingerprints to build a knowledge base for many uses, including:

  • Supporting smarter search and retrieval
  • CMS applications
  • Identifying research communities
  • Marketing campaigns

Access Innovations customizes implementation of the Web service extension

Access Innovations provides customization and administration services during configuration for the Semantic Fingerprinting Web service extension.

The graphical user interfaces (GUIs) and entity-matching algorithms are adjustable, because every data set requires a targeted approach. Regular monitoring of the output is important to maintain an optimal accuracy level for name entity disambiguation. As new semantic patterns appear in the data stream, results will change.

Author disambiguation flowchart