Internet Definition Mining: A Multi-feature N-gram Plus Classification Method

Yu Jie (于洁)

Article Abstract

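Yu Jie. Internet Definition Mining: A Multi-feature N-gram Plus Classification Method [J]. Journal of Hainan Normal University (Natural Science Edition), 2017, 30(3): 253-260.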


Internet Definition Mining: Multi-feature N-gram Plus Classification Method

Submitted: 2017-03-25

DOI: 10.12051/j.issn.1674-4942.2017.03.003

Chinese keywords: knowledge base; text classification; definition extraction; N-gram model; Wikipedia

English keywords: knowledge base; text classification; definition extraction; N-gram model; Wikipedia


Author	Affiliation
Yu Jie (于洁)	Department of Computer Engineering, Fujian Polytechnic of Information Technology, Fuzhou, Fujian 350003


Chinese abstract (translated):

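The rapid growth of Internet big data has created an urgent need for the automatic construction of knowledge bases, and Internet definition mining is a foundation of knowledge discovery research. Building on the N-gram language model, this paper proposes an improved N-gram Plus language model that combines multiple features: words, parts of speech, grammatical dependency relations, and the linguistic patterns of definitions. An Internet corpus is generated through a definition mining framework. For definition extraction, the N-gram Plus feature set and each sentence's maximum definition membership degree are introduced to convert sentences into multi-feature vectors, and several classifiers are compared for learning and classification. The method achieved a relatively good F2-measure in the experiments.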
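The abstract does not spell out how the N-gram Plus features are built. As a rough illustration only, the sketch below turns one tokenized, POS-tagged sentence into counted word n-grams, POS n-grams, and a mixed word/POS feature, then projects the counts onto a fixed vocabulary to get a vector a classifier could consume. The feature prefixes (`W:`, `P:`, `WP:`), the function names, the mixed feature, and the Penn Treebank-style tags are hypothetical; the dependency-relation and definition-pattern features and the maximum definition membership degree are omitted because their exact definitions are not given here.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def multi_feature_ngrams(words, pos_tags, max_n=2):
    """Count word n-grams, POS n-grams and word/next-POS pairs.

    Hypothetical feature scheme for illustration; not the paper's
    exact N-gram Plus definition.
    """
    if len(words) != len(pos_tags):
        raise ValueError("words and pos_tags must have the same length")
    feats = Counter()
    for n in range(1, max_n + 1):
        for g in ngrams(words, n):
            feats["W:" + " ".join(g)] += 1  # lexical n-gram
        for g in ngrams(pos_tags, n):
            feats["P:" + " ".join(g)] += 1  # part-of-speech n-gram
    # Mixed feature: each word paired with the POS tag of the word after it
    for i in range(len(words) - 1):
        feats["WP:" + words[i] + " " + pos_tags[i + 1]] += 1
    return feats

def to_vector(feats, vocab):
    """Project a feature Counter onto an ordered vocabulary (missing -> 0)."""
    return [feats[name] for name in vocab]

if __name__ == "__main__":
    words = ["a", "gene", "is", "a", "unit"]
    tags = ["DT", "NN", "VBZ", "DT", "NN"]
    feats = multi_feature_ngrams(words, tags)
    print(to_vector(feats, ["W:a", "P:DT NN", "WP:a NN", "W:missing"]))
```

Running it prints `[2, 2, 2, 0]`. In practice the vocabulary would be collected once from the training corpus, so every sentence maps to a vector of the same length regardless of which features it contains.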

English abstract:

The rapid development of Internet big data creates an urgent need for the automatic construction of knowledge bases. Internet definition mining is the foundation of knowledge discovery research. Based on the N-gram language model, this paper proposes an improved N-gram Plus language model, which combines various features such as words, parts of speech, grammatical dependencies, and linguistic patterns. An Internet corpus is generated by a definition mining framework. The N-gram Plus feature set and the maximum definition membership degree of sentences are introduced in the definition extraction study. Sentences are transformed into multi-feature vectors, and several classifiers are used for learning and classification. The method obtains a good F2-measure result in the experiments.
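The F2-measure reported above is the F-beta score with β = 2, F_β = (1 + β²)·P·R / (β²·P + R), where P is precision and R is recall; choosing β = 2 weights recall more heavily than precision. The minimal sketch below computes it from a classifier's counts of true positives, false positives, and false negatives. The function names are illustrative, not taken from the paper.

```python
def f_beta(precision, recall, beta=2.0):
    """F-beta score; beta > 1 weights recall more than precision."""
    b2 = beta * beta
    denom = b2 * precision + recall
    if denom == 0:
        return 0.0  # both precision and recall are zero
    return (1 + b2) * precision * recall / denom

def f2_from_counts(tp, fp, fn):
    """F2 from true positives, false positives and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return f_beta(precision, recall, beta=2.0)

if __name__ == "__main__":
    # Precision 0.5, recall 1.0: F2 = 5 * 0.5 / (4 * 0.5 + 1) = 2.5 / 3
    print(round(f_beta(0.5, 1.0), 3))
```

This prints `0.833`. For the same precision and recall, F1 would be about 0.667, which shows how F2 rewards a classifier that misses few true definitions even at some cost in precision.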
