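Article Abstract
Yu Jie. Internet Definition Mining: Multi-feature N-gram Plus Classification Method [J]. Journal of Hainan Normal University (Natural Science), 2017, 30(3): 253-260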
Internet Definition Mining: Multi-feature N-gram Plus Classification Method
Submitted: 2017-03-25
DOI:10.12051/j.issn.1674-4942.2017.03.003
Keywords: knowledge base; text classification; definition extraction; N-gram model; Wikipedia
Funding:
Author: Yu Jie
Affiliation: Department of Computer Engineering, Fujian Polytechnic of Information Technology, Fuzhou, Fujian 350003
Abstract:
The rapid growth of internet big data creates an urgent need for the automatic construction of knowledge bases, and internet definition mining is a foundation of knowledge discovery research. Building on the N-gram language model, this paper proposes an improved N-gram Plus language model that combines several kinds of features: words, parts of speech, grammatical dependency relations, and the linguistic patterns of definitions. An internet corpus is generated with a definition mining framework. For definition extraction, the N-gram Plus feature set and the maximum definition membership degree of each sentence are introduced, each sentence is converted into a multi-feature vector, and several classifiers are compared for learning and classification. The method achieves a good F2-measure in the experiments.
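The abstract does not give the exact feature layout or scoring code, so the following is only a minimal sketch of two ingredients it names: n-gram features over words and part-of-speech tags, and the F2-measure. The function names `ngrams`, `sentence_features`, and `f_beta` are hypothetical, the dependency and definition-pattern features are omitted, and the part-of-speech tags are assumed to come from an external tagger.

```python
from collections import Counter


def ngrams(seq, n):
    """Return all contiguous n-grams of seq as tuples."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


def sentence_features(tokens, pos_tags, n=2):
    """Count word n-grams and POS n-grams for one sentence.

    Assumption: this flat layout is illustrative only; the paper's
    N-gram Plus feature set also uses dependency relations and
    definition patterns, which are not modeled here.
    """
    feats = Counter()
    for gram in ngrams(tokens, n):
        feats[("word",) + gram] += 1
    for gram in ngrams(pos_tags, n):
        feats[("pos",) + gram] += 1
    return feats


def f_beta(tp, fp, fn, beta=2.0):
    """F-beta score from counts; beta=2 weights recall above precision."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

With beta set to 2, a missed definition costs more than a false alarm, which suits mining tasks where recall of candidate definitions matters more than precision.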