Part of Advances in Neural Information Processing Systems 17 (NIPS 2004)
Rion Snow, Daniel Jurafsky, Andrew Ng
Semantic taxonomies such as WordNet provide a rich source of knowl- edge for natural language processing applications, but are expensive to build, maintain, and extend. Motivated by the problem of automatically constructing and extending such taxonomies, in this paper we present a new algorithm for automatically learning hypernym (is-a) relations from text. Our method generalizes earlier work that had relied on using small numbers of hand-crafted regular expression patterns to identify hyper- nym pairs. Using "dependency path" features extracted from parse trees, we introduce a general-purpose formalization and generalization of these patterns. Given a training set of text containing known hypernym pairs, our algorithm automatically extracts useful dependency paths and applies them to new corpora to identify novel pairs. On our evaluation task (de- termining whether two nouns in a news article participate in a hypernym relationship), our automatically extracted database of hypernyms attains both higher precision and higher recall than WordNet.