{"title": "Kernels for Structured Natural Language Data", "book": "Advances in Neural Information Processing Systems", "page_first": 643, "page_last": 650, "abstract": "", "full_text": "Kernels for Structured Natural Language Data\n\nJun Suzuki, Yutaka Sasaki, and Eisaku Maeda\n\nNTT Communication Science Laboratories, NTT Corp.\n\n2-4 Hikaridai, Seika-cho, Soraku-gun, Kyoto, 619-0237 Japan\n{jun, sasaki, maeda}@cslab.kecl.ntt.co.jp\n\nAbstract\n\nThis paper devises a novel kernel function for structured natural language data. In the field of Natural Language Processing, feature extraction consists of the following two steps: (1) syntactically and semantically analyzing raw data, i.e., character strings, then representing the results as discrete structures, such as parse trees and dependency graphs with part-of-speech tags; (2) creating (possibly high-dimensional) numerical feature vectors from the discrete structures. The new kernels, called Hierarchical Directed Acyclic Graph (HDAG) kernels, directly accept DAGs whose nodes can contain DAGs. HDAG data structures are needed to fully reflect the syntactic and semantic structures that natural language data inherently have. In this paper, we define the kernel function and show how it permits efficient calculation. Experiments demonstrate that the proposed kernels are superior to existing kernel functions, e.g., sequence kernels, tree kernels, and bag-of-words kernels.\n\n1 Introduction\n\nRecent developments in kernel technology enable us to handle discrete structures, such as sequences, trees, and graphs. Kernel functions suitable for Natural Language Processing (NLP) have recently been proposed. Convolution Kernels [4, 12] demonstrate how to build kernels over discrete structures. 
Since texts can be analyzed as discrete structures, these discrete kernels have been applied to NLP tasks, for example, sequence kernels [8, 9] for text categorization and tree kernels [1, 2] for (shallow) parsing.\n\nIn this paper, we focus on tasks in the application areas of NLP, such as Machine Translation, Text Summarization, Text Categorization, and Question Answering. In these tasks, richer types of information within texts, such as syntactic and semantic information, are required for higher performance. However, syntactic and semantic information is formed by very complex structures that cannot be represented by simple structures such as sequences and trees. The motivation of this paper is to propose kernels specifically suited to structured natural language data. The proposed kernels can handle several of the structures found within texts and calculate kernels over these structures at a practical computational cost. Accordingly, these kernels can be efficiently applied to learning and clustering problems in NLP applications.\n\nFigure 1: Examples of structures within texts as determined by basic NLP tools\n\n2 Structured Natural Language Data for Application Tasks in NLP\n\nIn general, natural language data contain many kinds of syntactic and semantic structures. For example, texts have several levels of syntactic and semantic chunks, such as part-of-speech (POS) chunks, named entities (NEs), noun phrase (NP) chunks, sentences, and discourse segments, and these are bound by relation structures, such as dependency structures, anaphora, discourse relations, and coreference. These syntactic and semantic structures can provide important information for understanding natural language and, moreover, for tackling real tasks in application areas of NLP. The accuracies of basic NLP tools such as POS taggers, NP chunkers, NE taggers, and dependency structure analyzers have improved to the point that they can help to develop real applications.\n\nThis paper proposes a method to handle these syntactic and semantic structures in a single framework: we combine the results of basic NLP tools into one hierarchically structured data set. Figure 1 shows an example of structures within texts analyzed by basic NLP tools that are currently available and that offer easy use and high performance. As shown in Figure 1, structures in texts can be hierarchical or recursive \u201cgraphs in graphs\u201d: a node can be constructed or characterized by other graphs. 
Nodes usually have several kinds of attributes, such as words, POS tags, semantic information such as WordNet [3], and classes of the named entities. Moreover, the relations between nodes are usually directed. Therefore, we should employ a (1) directed, (2) multi-labeled, and (3) hierarchically structured graph to model structured natural language data.\n\nLet $V$ be a set of vertices (or nodes) and $E$ be a set of edges (or links). Then, a graph $G = (V, E)$ is called a directed graph if $E$ is a set of directed links $E \\subset V \\times V$.\n\nDefinition 1 (Multi-Labeled Graph) Let $\\Gamma$ be a set of labels (or attributes) and $M \\subset V \\times \\Gamma$ be label allocations. Then, $G = (V, E, M)$ is called a multi-labeled graph.\n\nDefinition 2 (Hierarchically Structured Graph) Let $G_i = (V_i, E_i)$ be a subgraph in $G = (V, E)$ where $V_i \\subseteq V$ and $E_i \\subseteq E$, and let $\\mathcal{G} = \\{G_1, \\ldots, G_n\\}$ be a set of subgraphs in $G$. $F \\subset V \\times \\mathcal{G}$ represents a set of vertical links from a node $v \\in V$ to a subgraph $G_i \\in \\mathcal{G}$. Then, $G = (V, E, \\mathcal{G}, F)$ is called a hierarchically structured graph if each node has at most one vertical link. 
Intuitively, a vertical link $f_{i,G_j} \\in F$ from node $v_i$ to graph $G_j$ indicates that node $v_i$ contains graph $G_j$.\n\nFinally, in this paper, we represent structured natural language data by using a multi-labeled hierarchical directed graph.\n\nDefinition 3 (Multi-Labeled Hierarchical Directed Graph) $G = (V, E, M, \\mathcal{G}, F)$ is a multi-labeled hierarchical directed graph.\n\nFigure 2: Examples of Hierarchical Directed Graph structures (these are also HDAGs): each letter represents an attribute\n\nFigure 2 shows examples of multi-labeled hierarchical directed graphs. In this paper, we call a multi-labeled hierarchical directed graph simply a hierarchical directed graph.\n\n3 Kernels on Hierarchical Directed Acyclic Graph\n\nTo calculate kernels efficiently, we add one constraint: the hierarchical directed graph has no cyclic paths. First, we define a path on a hierarchical directed graph. If a node has no vertical link, then the node is called a terminal node, and the set of terminal nodes is denoted as $T \\subset V$; otherwise it is a non-terminal node, and the set of non-terminal nodes is denoted as $\\bar{T} \\subset V$.\n\nDefinition 4 (Hierarchical Path (HiP)) Let $p = \\langle v_i, e_{i,j}, v_j, \\ldots, v_k, e_{k,l}, v_l \\rangle$ be a path. Let $\\Upsilon(v)$ be a function that returns the subgraph $G_i$ that is linked with $v$ by a vertical link if $v \\in \\bar{T}$. Let $P(G)$ be a function that returns the set of all HiPs in $G$, where links between $v \\in G$ and $v \\notin G$ are ignored. Then, $p^h = \\langle h(v_i), e_{i,j}, h(v_j), \\ldots, h(v_k), e_{k,l}, h(v_l) \\rangle$ is defined as a HiP, where $h(v)$ returns $v\\,p^h_x$ with $p^h_x \\in P(G_x)$ s.t. $G_x = \\Upsilon(v)$ if $v \\in \\bar{T}$, and otherwise returns $v$. Intuitively, a HiP is constructed by a path in the path structure, e.g., $p^h = \\langle v_i, e_{i,j}, v_j \\langle v_m, e_{m,n}, v_n \\rangle, \\ldots, v_k, e_{k,l}, v_l \\rangle$.\n\nDefinition 5 (Hierarchical Directed Acyclic Graph (HDAG)) A hierarchical directed graph $G = (V, E, M, \\mathcal{G}, F)$ is an HDAG if there is no HiP from any node $v$ to the same node $v$.\n\nA primitive feature for defining kernels on HDAGs is a hierarchical attribute subsequence.\n\nDefinition 6 (Hierarchical Attribute Subsequence (HiAS)) A HiAS is defined as a list of attributes with hierarchical information extracted from nodes on HiPs.\n\nFor example, let $p^h = \\langle v_i, e_{i,j}, v_j \\langle v_m, e_{m,n}, v_n \\rangle, \\ldots, v_k, e_{k,l}, v_l \\rangle$ be a HiP; then the HiASs in $p^h$ are written as $\\tau(p^h) = \\langle a_i, a_j \\langle a_m, a_n \\rangle, \\ldots, a_k, a_l \\rangle$, covering all combinations for all $a_i \\in \\tau(v_i)$, where $\\tau(v)$ of node $v$ is a function that returns the set of attributes allocated to node $v$, and $\\tau(p^h)$ of HiP $p^h$ is a function that returns all possible HiASs extracted from HiP $p^h$.\n\n$\\Gamma^*$ denotes all possible HiASs constructed by the attributes in $\\Gamma$, and $\\gamma_i \\in \\Gamma^*$ denotes the $i$-th HiAS. An explicit representation of a feature vector of an HDAG kernel is defined as $\\phi(G) = (\\phi_1(G), \\ldots, \\phi_{|\\Gamma^*|}(G))$, where $\\phi$ represents the explicit feature mapping from HDAGs to the numerical feature space. The value of $\\phi_i(G)$ becomes the weighted number of occurrences of $\\gamma_i$ in $G$. 
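To make Definitions 1 to 3 and the split into terminal nodes T and non-terminal nodes T̄ concrete, the sketch below stores V, E, M, 𝒢 and F as plain Python sets and dicts. It is a minimal illustration under stated assumptions, not the authors' data structure: the class and method names are hypothetical, a subgraph is stored only as its node set (its links are taken to be the links of E among those nodes), and the check covers Definition 2 only, not the acyclicity condition of Definition 5.

```python
from dataclasses import dataclass, field


@dataclass
class HierarchicalDigraph:
    # Hypothetical container for G = (V, E, M, calG, F) of Definition 3.
    nodes: set                                     # V
    links: set                                     # E: directed pairs (u, v)
    labels: dict = field(default_factory=dict)     # M: node -> set of attributes
    subgraphs: dict = field(default_factory=dict)  # calG: name -> set of member nodes
    vertical: set = field(default_factory=set)     # F: pairs (node, subgraph name)

    def validate(self):
        # E must be a subset of V x V (directed graph)
        for u, v in self.links:
            if u not in self.nodes or v not in self.nodes:
                raise ValueError('link endpoint is not in V')
        # Definition 2: every node has at most one vertical link
        owners = [v for v, _ in self.vertical]
        if len(owners) != len(set(owners)):
            raise ValueError('a node has more than one vertical link')

    def terminal_split(self):
        # Returns (T, T-bar): nodes without / with a vertical link
        owners = {v for v, _ in self.vertical}
        return self.nodes - owners, self.nodes & owners

    def upsilon(self, v):
        # Upsilon(v): member nodes of the subgraph that v contains, None if v is in T
        for u, name in self.vertical:
            if u == v:
                return self.subgraphs[name]
        return None


# Usage: node 1 (an NP chunk, say) contains the word nodes 2 -> 3.
g = HierarchicalDigraph(
    nodes={1, 2, 3, 4},
    links={(2, 3), (1, 4)},
    labels={1: {'NP'}, 2: {'the'}, 3: {'dog'}, 4: {'barks'}},
    subgraphs={'np': {2, 3}},
    vertical={(1, 'np')},
)
g.validate()
T, T_bar = g.terminal_split()
print(sorted(T), sorted(T_bar), sorted(g.upsilon(1)))  # [2, 3, 4] [1] [2, 3]
```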
According to this approach, the HDAG kernel, $K(G_1, G_2) = \\sum_{i=1}^{|\\Gamma^*|} \\phi_i(G_1) \\cdot \\phi_i(G_2)$, calculates the inner product of the weighted common HiASs in two HDAGs, $G_1$ and $G_2$. In this paper, we use $\\mid$ to stand for \u201csuch that,\u201d for brevity.\n\n$$K_{\\mathrm{HDAG}}(G_1, G_2) = \\sum_{\\gamma_i \\in \\Gamma^*} \\; \\sum_{\\gamma_i \\in \\tau(p^h_1) \\mid p^h_1 \\in P(G_1)} \\; \\sum_{\\gamma_i \\in \\tau(p^h_2) \\mid p^h_2 \\in P(G_2)} W_{\\gamma_i}(p^h_1)\\, W_{\\gamma_i}(p^h_2), \\tag{1}$$\n\nwhere $W_{\\gamma_i}(p^h)$ represents the weight value of HiAS $\\gamma_i$ in HiP $p^h$. The weight of HiAS $\\gamma_i$ in HiP $p^h$ is determined by\n\n$$W_{\\gamma_i}(p^h) = \\prod_{v \\in V(p^h)} W_V(v) \\prod_{e_{i,j} \\in E(p^h)} W_E(v_i, v_j) \\prod_{f_{i,G_j} \\in F(p^h)} W_F(v_i, G_j) \\prod_{a \\in \\tau(\\gamma_i)} W_\\Gamma(a), \\tag{2}$$\n\nwhere $W_V(v)$, $W_E(v_i, v_j)$, $W_F(v_i, G_j)$, and $W_\\Gamma(a)$ represent the weight of node $v$, the link from $v_i$ to $v_j$, the vertical link from $v_i$ to subgraph $G_j$, and attribute $a$, respectively.\n\nFigure 3: An Example of Hierarchical Directed Graph \u201cG\u201d with weight factors\n\nAn example of how each weight factor is given is shown in Figure 3. 
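Equations (1) and (2) can be read almost directly as code. The sketch below is hypothetical: it assumes the HiASs of each graph have already been enumerated into a sparse dictionary that maps a HiAS (here a tuple of attributes) to φ_i(G). It also assumes the weights met along one HiP occurrence are passed in as plain lists. Keeping φ sparse means the sum over Γ* in equation (1) only visits HiASs that occur in both graphs.

```python
from math import prod


def hias_weight(node_w, link_w, vertical_w, attr_w):
    # Equation (2): product of W_V over the nodes of the HiP, W_E over its links,
    # W_F over its vertical links, and W_Gamma over the attributes of the HiAS.
    return prod(node_w) * prod(link_w) * prod(vertical_w) * prod(attr_w)


def explicit_hdag_kernel(phi1, phi2):
    # Equation (1) as an inner product: phi[h] is the summed weight of HiAS h
    # over all HiPs of the graph; only HiASs present in both graphs contribute.
    return sum(w * phi2[h] for h, w in phi1.items() if h in phi2)


# One occurrence of <a, b> on a two-node HiP: W_V = (1.0, 0.8), W_E = 0.5,
# no vertical links, W_Gamma(a) = W_Gamma(b) = 1.
w_ab = hias_weight([1.0, 0.8], [0.5], [], [1.0, 1.0])
phi1 = {('a',): 1.0, ('a', 'b'): w_ab}
phi2 = {('a',): 2.0, ('b',): 1.0, ('a', 'b'): 0.5}
print(w_ab, explicit_hdag_kernel(phi1, phi2))
```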
In the case of NL data, for example, $W_\\Gamma(a)$ might be given by the $\\mathrm{tf} \\cdot \\mathrm{idf}$ score computed from a large document collection, $W_V(v)$ by the type of chunk such as word, phrase, or named entity, $W_E(v_i, v_j)$ by the type of relation between $v_i$ and $v_j$, and $W_F(v_i, G_j)$ by the number of nodes in $G_j$.\n\nSoft Structural Matching Frameworks\n\nSince HDAG kernels permit not only the exact matching of substructures but also approximate matching, we add two frameworks: node skip and relaxation of hierarchical information.\n\nFirst, we discuss the framework of the node skip. We introduce a decay function $\\Lambda_V(v)$ $(0 < \\Lambda_V(v) \\leq 1)$, which represents the cost of skipping node $v$ when extracting HiASs from the HiPs; this is almost the same architecture as [8]. For example, a HiAS under node skips is written as $\\langle \\ast\\langle a_2, a_3 \\rangle, \\ast, \\langle a_5 \\rangle \\rangle$ from HiP $\\langle v_1 \\langle v_2, v_3 \\rangle, v_4, \\langle v_5 \\rangle \\rangle$, where $\\ast$ is the explicit representation of a node that is skipped.\n\nNext, for the relaxation of hierarchical information, we perform two processes: (1) we form one hierarchy if there is multiple hierarchy information at the same point; for example, $\\langle\\langle\\langle a_i, a_j \\rangle\\rangle, a_k \\rangle$ becomes $\\langle\\langle a_i, a_j \\rangle, a_k \\rangle$; and (2) we delete hierarchical information if it contains only one node; for example, $\\langle\\langle a_i \\rangle, a_j, a_k \\rangle$ becomes $\\langle a_i, a_j, a_k \\rangle$.\n\nThese two frameworks achieve approximate substructure matching automatically. Table 1 shows an explicit representation of the common HiASs (features) of $G_1$ and $G_2$ in Figure 2. For the sake of simplicity, all the weights $W_V(v)$, $W_E(v_i, v_j)$, $W_F(v_i, G_j)$, and $W_\\Gamma(a)$ are taken as 1, and for all $v$, $\\Lambda_V(v) = \\lambda$ if $v$ has at least one attribute; otherwise $\\Lambda_V(v) = 1$.\n\nEfficient Recursive Computation\n\nIn general, when the dimension of the feature space $|\\Gamma^*|$ becomes very high, it is computationally infeasible to generate the feature vector $\\phi(G)$ explicitly. 
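The node skip, and the reason explicit feature vectors soon become infeasible, can both be seen on the simplest HDAG: a single chain of terminal nodes that each carry one attribute. This is a hypothetical sketch, and the function name is ours. It assumes a HiAS starts and ends on kept nodes, that every skipped interior node multiplies the weight by Λ_V(v) = λ, and that all other weights are 1. The skip marker ∗ is dropped from the feature key, so occurrences that agree on their attributes share one feature. With n distinct attributes, the chain alone already produces 2^n - 1 distinct features.

```python
from itertools import combinations


def chain_skip_features(chain, lam):
    # Hypothetical sketch: node-skip HiASs of a chain v_1 -> ... -> v_n of terminal
    # nodes, one attribute each. Assumes a HiAS starts and ends on kept nodes and
    # each skipped interior node contributes Lambda_V(v) = lam; other weights are 1.
    feats = {}
    n = len(chain)
    for i in range(n):
        key = (chain[i],)                      # single-node HiAS
        feats[key] = feats.get(key, 0.0) + 1.0
        for j in range(i + 1, n):
            interior = range(i + 1, j)         # nodes strictly between v_i and v_j
            for k in range(len(interior) + 1):
                for kept in combinations(interior, k):
                    key = tuple(chain[m] for m in (i, *kept, j))
                    skipped = len(interior) - k
                    feats[key] = feats.get(key, 0.0) + lam ** skipped
    return feats


f = chain_skip_features(['a', 'b', 'c'], 0.5)
print(f[('a', 'c')])  # 0.5: <a, *, c> with b skipped
print(len(f))         # 7 distinct features for 3 distinct attributes
# With n distinct attributes there are 2**n - 1 features, e.g. 1023 for n = 10.
print(len(chain_skip_features(list('abcdefghij'), 0.5)))
```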
We define an efficient calculation formula between HDAGs $G_1$ and $G_2$, which is written as:\n\n$$K_{\\mathrm{HDAG}}(G_1, G_2) = \\sum_{q \\in Q} \\sum_{r \\in R} K(q, r), \\tag{3}$$\n\nwhere $Q = \\{q_1, \\ldots, q_{|Q|}\\}$ and $R = \\{r_1, \\ldots, r_{|R|}\\}$ represent the nodes in $G_1$ and $G_2$, respectively. $K(q, r)$ represents the sum of the weighted common HiASs that are extracted from the HiPs whose sink nodes are $q$ and $r$:\n\n$$K(q, r) = J''_{G_1,G_2}(q, r)\\, H(q, r) + \\hat{H}(q, r)\\, I(q, r) + I(q, r). \\tag{4}$$\n\nFunction $I(q, r)$ returns the weighted number of common attributes of nodes $q$ and $r$:\n\n$$I(q, r) = W_V(q)\\, W_V(r) \\sum_{a_1 \\in \\tau(q)} \\sum_{a_2 \\in \\tau(r)} W_\\Gamma(a_1)\\, W_\\Gamma(a_2)\\, \\delta(a_1, a_2), \\tag{5}$$\n\nwhere $\\delta(a_1, a_2) = 1$ if $a_1 = a_2$, and 0 otherwise.\n\nTable 1: Common HiASs of $G_1$ and $G_2$ in Figure 2 (N.S. represents the node skip, H.R. represents the relaxation of hierarchical information). Columns: $G_1$ (HiAS, HiAS with $\\ast$, value); $G_2$ (HiAS, HiAS with $\\ast$, value); N.S. (common HiAS, value); N.S.+H.R. (common HiAS, value).\n\n
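Because δ(a_1, a_2) keeps only equal pairs, the double sum in equation (5) collapses to a single sum over the attributes shared by q and r. Each shared attribute contributes W_Γ(a)². The minimal sketch below assumes that function name, and it gives any attribute missing from the weight table a weight of 1.

```python
def common_attribute_weight(tau_q, tau_r, w_v_q=1.0, w_v_r=1.0, w_gamma=None):
    # Equation (5): delta(a1, a2) keeps only equal attribute pairs, so the double
    # sum reduces to the shared attributes, each weighted by W_Gamma(a) ** 2.
    w_gamma = w_gamma or {}
    shared = set(tau_q) & set(tau_r)
    return w_v_q * w_v_r * sum(w_gamma.get(a, 1.0) ** 2 for a in shared)


print(common_attribute_weight({'NP', 'dog'}, {'NP', 'dog', 'cat'}))                  # 2.0
print(common_attribute_weight({'NP', 'dog'}, {'NP', 'cat'}, 2.0, 1.0, {'NP': 0.5}))  # 0.5
```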
Let $H(q,r)$ be a function that returns the sum of the weighted common HiASs between $q$ and $r$, including $\\Upsilon(q)$ and $\\Upsilon(r)$:\n\n$$H(q,r) = \\begin{cases} I(q,r) + \\bigl(I(q,r) + \\Lambda_V(q)\\Lambda_V(r)\\bigr)\\hat{H}(q,r), & \\text{if } q,r \\in \\bar{T}\\\\ I(q,r), & \\text{otherwise} \\end{cases} \\qquad (6)$$\n\n$$\\hat{H}(q,r) = \\sum_{s \\in G^1_i \\mid G^1_i = \\Upsilon(q)} \\; \\sum_{t \\in G^2_j \\mid G^2_j = \\Upsilon(r)} W_F(q,G^1_i)\\,W_F(r,G^2_j)\\,J_{G^1_i,G^2_j}(s,t) \\qquad (7)$$\n\nLet $J_{x,y}(q,r)$, $J'_{x,y}(q,r)$, and $J''_{x,y}(q,r)$, where $x, y$ are (sub)graphs, be recursive functions used to calculate $H(q,r)$ and $K(q,r)$:\n\n$$J_{x,y}(q,r) = J''_{x,y}(q,r)\\,H(q,r) + H(q,r) \\qquad (8)$$\n\n$$J'_{x,y}(q,r) = \\begin{cases} \\sum_{t \\in \\{\\psi(r) \\cap V(y)\\}} W_E(q,t)\\bigl(\\Lambda'_V(t)\\,J'_{x,y}(q,t) + J_{x,y}(q,t)\\bigr), & \\text{if } \\psi(r) \\neq \\emptyset\\\\ 0, & \\text{otherwise} \\end{cases} \\qquad (9)$$\n\n$$J''_{x,y}(q,r) = \\begin{cases} \\sum_{s \\in \\{\\psi(q) \\cap V(x)\\}} W_E(s,r)\\bigl(\\Lambda'_V(s)\\,J''_{x,y}(s,r) + J'_{x,y}(s,r)\\bigr), & \\text{if } \\psi(q) \\neq \\emptyset\\\\ 0, & \\text{otherwise} \\end{cases} \\qquad (10)$$\n\nwhere $\\Lambda'_V(v) = \\Lambda_V(v) \\prod_{t \\in G_i \\mid G_i = \\Upsilon(v)} \\Lambda_V(t)$ if $v \\in \\bar{T}$, and $\\Lambda'_V(v) = \\Lambda_V(v)$ otherwise. Function $\\psi(q)$ returns the set of nodes that have direct links to node $q$; $\\psi(q) = \\emptyset$ means that no node has a direct link to $q$.\n\nNext, we show the formulas used under the framework of relaxation of hierarchical information. The functions have the same meanings as in the previous formulas. We denote $\\tilde{H}(q,r) = H(q,r) + H'(q,r)$.\n\n$$K(q,r) = J''_{G^1,G^2}(q,r)\\,\\tilde{H}(q,r) + \\bigl(H'(q,r) + H''(q,r)\\bigr)I(q,r) + I(q,r) \\qquad (11)$$\n\n$$H(q,r) = \\bigl(H'(q,r) + H''(q,r)\\bigr)I(q,r) + H''(q,r) + I(q,r) \\qquad (12)$$\n\n$$H'(q,r) = \\begin{cases} \\sum_{t \\in G^2_j \\mid G^2_j = \\Upsilon(r)} W_F(r,G^2_j)\\,\\tilde{H}(q,t), & \\text{if } r \\in \\bar{T}\\\\ 0, & \\text{otherwise} \\end{cases} \\qquad (13)$$\n\n$$H''(q,r) = \\begin{cases} \\sum_{s \\in G^1_i \\mid G^1_i = \\Upsilon(q)} W_F(q,G^1_i)\\,H(s,r) + \\hat{H}(q,r), & \\text{if } q,r \\in \\bar{T}\\\\ \\sum_{s \\in G^1_i \\mid G^1_i = \\Upsilon(q)} W_F(q,G^1_i)\\,H(s,r), & \\text{if } q \\in \\bar{T}\\\\ 0, & \\text{otherwise} \\end{cases} \\qquad (14)$$\n\n$$J_{x,y}(q,r) = J''_{x,y}(q,r)\\,\\tilde{H}(q,r) \\qquad (15)$$\n\n$$J'_{x,y}(q,r) = \\begin{cases} \\sum_{t \\in \\{\\psi(r) \\cap V(y)\\}} W_E(q,t)\\bigl(\\Lambda'_V(t)\\,J'_{x,y}(q,t) + J_{x,y}(q,t) + \\tilde{H}(q,t)\\bigr), & \\text{if } \\psi(r) \\neq \\emptyset\\\\ 0, & \\text{otherwise} \\end{cases} \\qquad (16)$$\n\nFunctions $I(q,r)$, $J''_{x,y}(q,r)$, and $\\hat{H}(q,r)$ are the same as those shown above.\n\nAccording to equation (3), given the recursive definition of $K(q,r)$, the value between two HDAGs can be calculated in time $O(|Q||R|)$. In actual use, we may want to evaluate only the subset of all HiASs whose sizes are under $n$ when determining the kernel value, because of the problem discussed in [1]. This can be realized simply by not calculating those HiASs whose size exceeds $n$ when calculating $K(q,r)$; the calculation cost then becomes $O(n|Q||R|)$.\n\nFinally, we normalize the values of the HDAG kernels to remove any bias introduced by the number of nodes in the graphs. This normalization corresponds to the standard unit norm normalization of examples in the feature space corresponding to the kernel space, $\\hat{K}(x,y) = K(x,y) \\cdot \\bigl(K(x,x)K(y,y)\\bigr)^{-1/2}$ [4].\n\nWe will now elucidate an efficient processing algorithm. First, as a pre-process, the nodes are sorted under two conditions, $V(\\Upsilon(v)) \\prec v$ and $\\Psi(v) \\prec v$, where $\\Psi(v)$ represents all nodes that have a path to $v$. 
The dynamic programming technique can be used to compute HDAG kernels very efficiently: by following the sorted order, the values that are needed to calculate $K(q,r)$ have already been calculated at earlier steps.\n\n4 Experiments\n\nOur aim was to test the effectiveness of using the richer syntactic and semantic structures available within texts, which our proposed method can now treat for the first time. We evaluated the performance of the proposed method in the actual NLP task of Question Classification. This task is similar to Text Classification, except that it requires many more semantic features within texts [7, 10]. We used three different QA data sets written in Japanese [10].\n\n[Figure 4: Examples of input data of comparison methods]\n\nTable 2: Results of question classification by SVM with comparison kernel functions, evaluated by F-measure\n\nMethod   | TIME TOP            | LOCATION            | ORGANIZATION        | NUMEX\nn        |  1    2    3    4   |  1    2    3    4   |  1    2    3    4   |  1    2    3    4\nHDAG-K   |  -   .951 .942 .926 |  -   .802 .813 .784 |  -   .716 .712 .697 |  -   .916 .922 .874\nDAG-K    |  -   .946 .913 .869 |  -   .803 .774 .729 |  -   .704 .671 .610 |  -   .912 .880 .813\nDS-K     |  -   .615 .564 .403 |  -   .544 .507 .466 |  -   .535 .509 .419 |  -   .602 .504 .424\nSeq-K    |  -   .946 .910 .866 |  -   .792 .774 .733 |  -   .706 .668 .595 |  -   .913 .885 .815\nBOW-K    | .899 .906 .885 .853 | .748 .772 .757 .745 | .638 .690 .633 .571 | .841 .846 .804 .719\n\nWe compared the performance of the proposed kernel, the HDAG kernel (HDAG-K), with DAG kernels (DAG-K), Dependency Structure kernels (DS-K) [2], and sequence kernels (Seq-K) [9]. We also evaluated the bag-of-words kernel (BOW-K) [6], that is, bag-of-words with polynomial kernels, as the baseline method. The methods differ mainly in their ability to treat syntactic and semantic information within texts. Figure 4 shows how the input objects differ across the methods; for clarity, these examples are shown in English. We used words, named entity tags, and semantic information [5] as attributes. Seq-K only treats word order, DS-K and DAG-K treat dependency structures, and HDAG-K treats the NP and NE chunks together with their dependency structures. For DAG-K, we used the same formula as for our proposed method. Comparing HDAG-K to DAG-K therefore shows the difference in performance between handling the hierarchical structures and not handling them. We extended Seq-K and DS-K to improve their overall performance and to establish a fairer evaluation against our proposed method under the same conditions. Note that although DAG-K and DS-K handle input objects of the same form, their kernel calculation methods differ, as do their return values. We used the node skip parameter $\\Lambda_V(v) = 0.5$ for all nodes $v$ in each comparison.\n\nWe used SVM [11] as the kernel-based machine learning algorithm. 
We evaluated the performance of the comparison methods on the question types TIME TOP, ORGANIZATION, LOCATION, and NUMEX, which are defined in the CRL QA-data\u00b9. Table 2 shows the average F-measure as evaluated by 5-fold cross validation. $n$ in Table 2 is the threshold on the number of attributes; that is, for each kernel calculation we evaluated only those HiASs that contain fewer than $n$ attributes. As shown in this table, HDAG-K achieved the best F-measure for every question type. The experiments in this paper were designed to investigate how to improve performance by using the richer syntactic and semantic structures within texts. In the task of Question Classification, a given question is classified into a Question Type, which reflects the intention of the question. These results indicate that our approach, which incorporates richer structural features within texts, is well suited to tasks in NLP applications.\n\n\u00b9 http://www.cs.nyu.edu/~sekine/PROJECT/CRLQA/\n\nThe original DS-K requires exact matching of the tree structure, even when it is extended for more flexible matching. This is why DS-K showed the worst performance in our experiments. The sequence, DAG, and HDAG kernels offer approximate matching through the framework of node skip, which produces better performance in tasks that evaluate the intention of the texts.\n\nThe structure of an HDAG reduces to that of a DAG if we do not consider the hierarchical structure. In addition, the structures of sequences and trees are entirely included in that of a DAG. Thus, the HDAG kernel subsumes some of the discrete kernels, such as sequence, tree, and graph kernels.\n\n5 Conclusions\n\nThis paper proposed HDAG kernels, which can handle more of the rich syntactic and semantic information present within texts. Our proposed method is a very general framework for handling structured natural language data. We evaluated the performance of HDAG kernels on the real NLP task of question classification. Our experiments showed that HDAG kernels offer better performance than sequence kernels, tree kernels, and the baseline bag-of-words kernels when the target task requires the richer information within texts.\n\nReferences\n\n[1] M. Collins and N. Duffy. Convolution Kernels for Natural Language. In Proc. of Neural Information Processing Systems (NIPS 2001), 2001.\n\n[2] M. Collins and N. Duffy. Parsing with a Single Neuron: Convolution Kernels for Natural Language Problems. Technical Report UCS-CRL-01-10, UC Santa Cruz, 2001.\n\n[3] C. Fellbaum. WordNet: An Electronic Lexical Database. MIT Press, 1998.\n\n[4] D. Haussler. 
Convolution Kernels on Discrete Structures. Technical Report UCS-CRL-99-10, UC Santa Cruz, 1999.\n\n[5] S. Ikehara, M. Miyazaki, S. Shirai, A. Yokoo, H. Nakaiwa, K. Ogura, Y. Oyama, and Y. Hayashi, editors. The Semantic Attribute System, Goi-Taikei \u2014 A Japanese Lexicon, volume 1. Iwanami Publishing, 1997. (in Japanese).\n\n[6] T. Joachims. Text Categorization with Support Vector Machines: Learning with Many Relevant Features. In Proc. of European Conference on Machine Learning (ECML \u201998), pages 137\u2013142, 1998.\n\n[7] X. Li and D. Roth. Learning Question Classifiers. In Proc. of the 19th International Conference on Computational Linguistics (COLING 2002), pages 556\u2013562, 2002.\n\n[8] H. Lodhi, C. Saunders, J. Shawe-Taylor, N. Cristianini, and C. Watkins. Text Classification Using String Kernels. Journal of Machine Learning Research, 2:419\u2013444, 2002.\n\n[9] N. Cancedda, E. Gaussier, C. Goutte, and J.-M. Renders. Word-Sequence Kernels. Journal of Machine Learning Research, 3:1059\u20131082, 2003.\n\n[10] J. Suzuki, H. Taira, Y. Sasaki, and E. Maeda. Question Classification using HDAG Kernel. In Workshop on Multilingual Summarization and Question Answering (2003), pages 61\u201368, 2003.\n\n[11] V. N. Vapnik. The Nature of Statistical Learning Theory. Springer, 1995.\n\n[12] C. Watkins. Dynamic Alignment Kernels. Technical Report CSD-TR-98-11, Royal Holloway, University of London, Computer Science Department, 1999.\n", "award": [], "sourceid": 2536, "authors": [{"given_name": "Jun", "family_name": "Suzuki", "institution": null}, {"given_name": "Yutaka", "family_name": "Sasaki", "institution": null}, {"given_name": "Eisaku", "family_name": "Maeda", "institution": null}]}