Number: 2005-049-1-700
Title: Prototype analysis of glossary terms to establish biological
context by text data mining
Task Group
Chairman: Michael Liebman
Members: C. Robin Ganellin, Thomas Perun, and Paul Erhardt
Objective:
To extend the usefulness and applicability of the glossaries by
exploring methods for identifying the various contexts in which
their terms appear in the scientific literature.
Description:
A prototype project using a text data mining tool, LexiMine,
from LexiQuest, an SPSS company, will evaluate the ability to analyze
downloaded journal articles automatically, objectively and exhaustively
in terms of their syntactical construction. This analysis will generate
a concept map of all concepts within the analyzed articles. The concept
map will be compared with the list of terms from the glossaries to establish
the presence of those terms within the literature and their interactions
and relationships, both among themselves and with other concepts, and to
link each occurrence to the original citation in the text. In this manner
it will be possible to identify how the contexts in which the glossary terms
are used extend their definitions, and to evaluate those extensions. The
results can be used either to develop a parallel and complementary glossary,
which may be published directly or as a web-enabled product, or to augment
the existing glossaries and compendium.
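
As a rough illustration of the comparison step only, the following Python
sketch looks up glossary terms in article text, records each sentence in
which a term occurs together with its citation, and counts how often two
terms share a sentence. It is not LexiMine and does not reproduce its
syntactic analysis; the function names (split_sentences, map_terms), the
input layout (a dict from citation to text), the naive sentence splitter,
and the use of sentence-level co-occurrence as a stand-in for
"relationships" are all assumptions made for this example.

```python
import re
from collections import defaultdict
from itertools import combinations


def split_sentences(text):
    # Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def map_terms(articles, glossary_terms):
    """Locate glossary terms in articles.

    articles: dict mapping a citation string to the article text.
    Returns (contexts, cooccurrence), where contexts maps each term to a list
    of (citation, sentence) pairs and cooccurrence maps an alphabetically
    ordered term pair to the number of sentences containing both terms.
    """
    # Whole-word, case-insensitive match for each glossary term.
    patterns = {
        term: re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for term in glossary_terms
    }
    contexts = defaultdict(list)
    cooccurrence = defaultdict(int)
    for citation, text in articles.items():
        for sentence in split_sentences(text):
            found = sorted(t for t, p in patterns.items() if p.search(sentence))
            for term in found:
                contexts[term].append((citation, sentence))
            for a, b in combinations(found, 2):
                cooccurrence[(a, b)] += 1
    return dict(contexts), dict(cooccurrence)
```

Whole-word matching is used here so that a term such as "agonist" is not
counted inside "antagonist"; a real study would also need to handle plurals,
synonyms and multi-word variants, which this sketch does not attempt.
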
The activity proposed for this prototype study will involve the selection
of one of the three problems listed below, access to and use of
any related glossaries, and analysis, as described above, of two ACS
journals, namely Biochemistry and Journal of Medicinal
Chemistry, for the years 1998-2003, inclusive.