
Effective Unsupervised Arabic Word Stemming: Towards an Unsupervised Radicals Extraction

2010
Journal from gdlhub / 2017-08-14 11:52:32
By : Ahmed Khorsi, IAJIT
Created : 2012-06-23, with 1 file

Keyword : Computational morphology, machine learning, natural language processing, classical Arabic, and Semitic languages.
Subject : Effective Unsupervised Arabic Word Stemming: Towards an Unsupervised Radicals Extraction
Url : http://www.ccis2k.org/iajit/PDF/vol.9,no.6/4047-10.pdf
Document source : Internet

This paper presents a new, fully unsupervised stemming approach for classical Arabic that is about 90% effective. The stemming is meant as a preparatory step toward unsupervised root (i.e., radicals) extraction. As learning input, our stemming system requires no linguistic knowledge, only plain classical Arabic text. Once the learning input has been analyzed, the system can extract the strongest segment of a given length, namely the stem. We start with a definition of the targeted stem, then show how our system achieves about 90% true positives after learning from fewer than 15,000 words. Unlike other unsupervised approaches, ours does not assume that the input text is perfect, and it deals efficiently with possible (in practice very frequent) misspellings. The test corpus we used is a definitive reference for classical Arabic, and its labeling was done rigorously by a team of experts.
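
The abstract does not say how the "strongest segment of a given length" is scored, so the following is only a minimal sketch under a stated assumption: a segment's strength is taken to be its raw frequency across the words of the learning text. The function names `learn_segment_counts` and `strongest_segment`, and the transliterated toy corpus, are hypothetical and do not come from the paper.

```python
from collections import Counter


def learn_segment_counts(words, length):
    """Count every contiguous segment of `length` characters in the corpus.

    Assumption: raw frequency stands in for the paper's (unspecified)
    notion of segment strength.
    """
    counts = Counter()
    for word in words:
        for start in range(len(word) - length + 1):
            counts[word[start:start + length]] += 1
    return counts


def strongest_segment(word, counts, length):
    """Return the most frequent segment of `length` characters in `word`.

    Ties go to the leftmost segment; a word no longer than `length`
    is returned unchanged.
    """
    if len(word) <= length:
        return word
    candidates = [word[i:i + length] for i in range(len(word) - length + 1)]
    return max(candidates, key=lambda seg: counts[seg])


if __name__ == "__main__":
    # Hypothetical toy corpus, transliterated into Latin letters.
    corpus = ["kataba", "katabat", "yaktubu", "katabna", "wakataba"]
    counts = learn_segment_counts(corpus, 5)
    print(strongest_segment("wakataba", counts, 5))  # prints "katab"
```

Because only plain text is counted, a sketch like this needs no linguistic resources, which matches the unsupervised setting the abstract describes. It does not attempt the misspelling handling the paper claims.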

Property : Property Value
Publisher ID : gdlhub
Organization : IAJIT
Contact Name : Herti Yani, S.Kom
Address : Jln. Jenderal Sudirman
City : Jambi
Region : Jambi
Country : Indonesia
Phone : 0741-35095
Fax : 0741-35093
Administrator E-mail : elibrarystikom@gmail.com
CKO E-mail : elibrarystikom@gmail.com

Contributor : fachruddin (Editor)

File : 23.47.PDF (638760 bytes), download for members only