STIKOM DB Digital Library

Journal: Telkomnika, 2015, Vol 13, No 4: December

Automatically generation and evaluation of Stop words list for Chinese Patents

Journal from gdlhub / 2016-11-16 08:40:12
By: Deng Na, Chen Xu, Telkomnika
Created: 2015-12-01, with 1 file

Keywords: stop word; patent; statistics; information retrieval; word frequency
Url : http://journal.uad.ac.id/index.php/TELKOMNIKA/article/view/2389

Stop word elimination is an important preprocessing step in information retrieval and information processing, and its accuracy directly influences the final results of retrieval and mining. In information retrieval, stop word elimination can reduce the storage space of the index; in text mining, it can greatly reduce the dimension of the vector space, saving storage and speeding up computation. However, Chinese patents are legal documents containing technical information, and the general Chinese stop word list is not applicable to them. This paper proposes two methodologies for generating a stop word list for Chinese patents: one based on word frequency and the other on statistics. Through experiments on real patent data, the accuracy of the two methodologies is compared across several corpora of different scales and against a general stop word list. The experimental results indicate that both methodologies can extract stop words suitable for Chinese patents, and that the accuracy of the statistics-based methodology is slightly higher than that of the word-frequency-based one.
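
The abstract does not describe either methodology in detail, so the following is only a generic sketch of frequency-based stop word extraction, not the authors' algorithm. It assumes documents are already tokenized (Chinese text would first need a word segmenter, not shown), and the function names, the `min_doc_ratio` threshold, and the toy English data are all hypothetical. Tokens are ranked by total frequency after keeping only those that appear in a large share of documents, and the result is scored against a reference list.

```python
from collections import Counter


def candidate_stop_words(docs, top_n=3, min_doc_ratio=0.8):
    """Return the top_n most frequent tokens that occur in at least
    min_doc_ratio of the documents (hypothetical helper)."""
    term_freq = Counter()  # total occurrences of each token
    doc_freq = Counter()   # number of documents containing each token
    for tokens in docs:
        term_freq.update(tokens)
        doc_freq.update(set(tokens))
    min_docs = min_doc_ratio * len(docs)
    widespread = [t for t in term_freq if doc_freq[t] >= min_docs]
    # Sort by frequency (descending), then alphabetically for determinism.
    widespread.sort(key=lambda t: (-term_freq[t], t))
    return widespread[:top_n]


def precision(candidates, reference):
    """Fraction of extracted candidates that appear in a reference list."""
    if not candidates:
        return 0.0
    return sum(1 for t in candidates if t in reference) / len(candidates)


# Toy, pre-segmented stand-ins for patent text (illustrative data only).
docs = [
    "the device comprises a housing and a sensor".split(),
    "the method comprises heating the mixture".split(),
    "a sensor is mounted on the housing".split(),
]
stop_words = candidate_stop_words(docs, top_n=3, min_doc_ratio=0.6)
general_list = {"the", "a", "and", "is", "on"}  # hypothetical general list
print(stop_words, precision(stop_words, general_list))
```

In this toy corpus the domain word "comprises" is extracted alongside "the" and "a" but is absent from the hypothetical general list, which illustrates why a general-purpose stop word list may fit patent text poorly.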

Property	Value
Publisher ID	gdlhub
Organization	Telkomnika
Contact Name	Herti Yani, S.Kom
Address	Jln. Jenderal Sudirman
City	Jambi
Region	Jambi
Country	Indonesia
Telephone	0741-35095
Fax	0741-35093
Administrator E-mail	elibrarystikom@gmail.com
CKO E-mail	elibrarystikom@gmail.com

Editor: sukadi

Download (members only):
File: 2389-6180-1-PB.pdf (204524 bytes)
