Yu Jie. Internet Definition Mining: Multi-feature N-gram Plus Classification Method[J]. Journal of Hainan Normal University (Natural Science), 2017, 30(3): 253-260 |
Internet Definition Mining: Multi-feature N-gram Plus Classification Method |
Submitted: 2017-03-25 |
DOI:10.12051/j.issn.1674-4942.2017.03.003 |
Keywords: knowledge base; text classification; definition extraction; N-gram model; Wikipedia |
Abstract: |
The rapid growth of big data on the Internet has created an urgent need for the automatic construction of knowledge bases, and Internet definition mining is a foundation of knowledge discovery research. Building on the N-gram language model, this paper proposes an improved N-gram Plus language model that combines multiple features: words, parts of speech, grammatical dependencies, and linguistic patterns of definitions. An Internet corpus is generated with a definition mining framework. The N-gram Plus feature set and the maximum definition membership degree of each sentence are introduced into definition extraction, so that each sentence is transformed into a multi-feature vector; several classifiers are then trained on these vectors and their classification results are compared. The method achieves a good F2-measure in the experiments. |
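The abstract names the feature layers and the evaluation metric but gives no formulas, so the following is only a minimal sketch under stated assumptions. It turns a POS-tagged sentence into a sparse vector of word n-grams, POS n-grams and cue-phrase hits, and scores binary predictions with the standard F-beta measure at beta = 2, which weights recall more heavily than precision. The cue list `DEFINITION_CUES`, the Penn-style tags and the feature-name prefixes are hypothetical stand-ins; dependency features and the sentence's maximum definition membership degree are omitted because the abstract does not define them.

```python
from collections import Counter
from typing import List, Sequence, Tuple

# (word, part-of-speech tag); Penn-style tags are an assumption for illustration
Token = Tuple[str, str]

# Hypothetical cue phrases standing in for "linguistic patterns of definitions"
DEFINITION_CUES = [("is", "a"), ("is", "an"), ("refers", "to"), ("is", "defined", "as")]


def ngrams(seq: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous n-grams of length n."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


def sentence_features(tokens: List[Token], max_n: int = 2) -> Counter:
    """Sparse multi-feature vector: word n-grams, POS n-grams and cue-phrase hits."""
    words = [w.lower() for w, _ in tokens]
    tags = [t for _, t in tokens]
    feats: Counter = Counter()
    for n in range(1, max_n + 1):
        for g in ngrams(words, n):
            feats["W:" + " ".join(g)] += 1   # lexical layer
        for g in ngrams(tags, n):
            feats["P:" + " ".join(g)] += 1   # part-of-speech layer
    for cue in DEFINITION_CUES:
        if cue in ngrams(words, len(cue)):
            feats["CUE:" + " ".join(cue)] = 1  # binary pattern feature
    return feats


def f_beta(y_true: List[int], y_pred: List[int], beta: float = 2.0) -> float:
    """F-beta for the positive (definition) class; beta=2 favours recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    sent = [("Wikipedia", "NNP"), ("is", "VBZ"), ("a", "DT"),
            ("free", "JJ"), ("encyclopedia", "NN")]
    feats = sentence_features(sent)
    print(feats["CUE:is a"], feats["W:is a"], feats["P:VBZ DT"])  # 1 1 1
    # Precision 1.0, recall 0.5: F2 is 0.5556 while F1 would be 0.6667
    print(round(f_beta([1, 1], [1, 0]), 4))
```

In a setup like this, the `Counter` vectors would still need a shared feature index before being passed to any classifier, and the classifier choice itself is left open here since the abstract does not name the classifiers compared.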