Published in: International Journal, King Saud University, 2014, Volume 26, Issue 4, December
Transliteration normalization for Information Extraction and Machine Translation
By: Yuval Marton, Imed Zitouni, King Saud University
Created: 2014-12-16, with 1 file
Keywords: Arabic Named Entity Recognition; Transliteration; Name normalization; Information Extraction; Machine Translation
Url : http://www.sciencedirect.com/science/article/pii/S1319157814000354
Document source: web
Foreign name transliterations typically include multiple spelling variants. These variants cause data sparseness and inconsistency problems, increase the Out-of-Vocabulary (OOV) rate, and present challenges for Machine Translation, Information Extraction and other natural language processing (NLP) tasks. This work aims to identify and cluster name spelling variants using a Statistical Machine Translation method: word alignment. Variants are identified by their alignment to the same pivot name in another language (the source language in Machine Translation settings). Based on word-to-word translation and transliteration probabilities, as well as the string edit distance metric, names with similar spellings in the target language are clustered and then normalized to a canonical form. With this approach, tens of thousands of high-precision name transliteration spelling variants are extracted from sentence-aligned bilingual Arabic-English corpora, in both languages. When these normalized name spelling variants are applied to Information Extraction tasks, improvements over strong baseline systems are observed. When applied to Machine Translation tasks, a large improvement potential is shown.
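The clustering step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes word alignment has already produced counts of (pivot name, target spelling) pairs, it uses only a length-normalized edit distance (the translation and transliteration probabilities mentioned above are left out), and it picks the most frequent spelling in each cluster as the canonical form. The function names, the threshold `max_norm_dist`, the pivot label `mHmd`, and the toy counts are all hypothetical.

```python
from collections import defaultdict


def edit_distance(a, b):
    """Levenshtein distance between strings a and b (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def normalize_variants(aligned_counts, max_norm_dist=0.3):
    """Map each (pivot, variant) pair to a canonical spelling.

    aligned_counts: dict {(pivot, target_spelling): alignment count}.
    max_norm_dist: assumed threshold on edit distance divided by the
    longer string's length; not a value taken from the paper.
    """
    by_pivot = defaultdict(dict)
    for (pivot, target), count in aligned_counts.items():
        by_pivot[pivot][target] = by_pivot[pivot].get(target, 0) + count

    canonical = {}
    for pivot, targets in by_pivot.items():
        # Visit spellings from most to least frequent (ties alphabetical),
        # so each cluster head is the most frequent member of its cluster.
        ranked = sorted(targets, key=lambda t: (-targets[t], t))
        heads = []
        for t in ranked:
            for h in heads:
                dist = edit_distance(t.lower(), h.lower()) / max(len(t), len(h))
                if dist <= max_norm_dist:
                    canonical[(pivot, t)] = h
                    break
            else:
                # No close head found: start a new cluster.
                heads.append(t)
                canonical[(pivot, t)] = t
    return canonical


# Toy alignment counts; "mHmd" is a hypothetical romanized pivot label.
aligned = {
    ("mHmd", "Mohammed"): 50,
    ("mHmd", "Muhammad"): 30,
    ("mHmd", "Mohamed"): 20,
    ("mHmd", "said"): 2,  # a spurious alignment, kept in its own cluster
}
canonical = normalize_variants(aligned)
```

Grouping by pivot first means that only spellings aligned to the same source name are ever compared, so similar-looking but unrelated names are not merged. The edit distance threshold then separates genuine variants from alignment noise within each group.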
Property | Property Value |
---|---|
Publisher ID | gdlhub |
Organization | King Saud University |
Contact Name | Herti Yani, S.Kom |
Address | Jln. Jenderal Sudirman |
City | Jambi |
Region | Jambi |
Country | Indonesia |
Phone | 0741-35095 |
Fax | 0741-35093 |
Administrator E-mail | elibrarystikom@gmail.com |
CKO E-mail | elibrarystikom@gmail.com |
Contributors...
- Editor: sukadi
Download...
File : 1-s2.0-S1319157814000354-main.pdf
(560919 bytes)