Journal: Jurnal ITB, 2017, Vol. 11, No. 2
A Printed PAW Image Database of Arabic Language for Document Analysis and Recognition
By: Bilal Bataineh, ITB
Created: 2017-11-06, with 1 file
Keywords: Arabic language; database; document images; information retrieval; OCR; PAWs
URL: http://journals.itb.ac.id/index.php/jictra/article/view/3271
Document source: WEB
Document image analysis and recognition are important topics in artificial intelligence. In this context, a database with good script samples is an essential requirement for machine-learning processes. Many suitable databases exist for Latin and Asian languages, but databases of Arabic samples are scarce. This work introduces a new database of printed Arabic text. Instead of collecting whole words or individual characters, it adopts the new concept of collecting sub-words (PAWs); together, these PAWs make up all words in the Arabic language. The database consists of 83,056 PAW images extracted from approximately 550,000 different words. Each sample is presented in five fonts: Thuluth, Naskh, Andalusi, Typing Machine, and Kufi, giving a total of 415,280 images (83,056 × 5). In addition, each PAW image is accompanied by ground-truth information describing its number of occurrences, its occurrence frequency, and the positions and shapes of its characters. The paper also presents a statistical analysis of the frequency of each PAW in the Arabic language.
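The abstract does not say how words were divided into PAWs. As an illustration only, the Python sketch below splits undiacritized words into PAWs at letters that do not join to the following letter, then counts each PAW's occurrences and relative frequency, the kind of statistics the ground truth is described as recording. The joining-letter sets and the function names `split_paws`, `paw_frequencies`, and `relative_frequencies` are hypothetical assumptions based on standard Arabic orthography, not the author's published method.

```python
from collections import Counter

# ASSUMPTION: letters that join only to the preceding letter, never to the
# next one (alef and its hamza forms, dal, thal, ra, zay, waw, waw-hamza,
# ta marbuta). A PAW ends right after any of these letters.
RIGHT_JOINING = set("اأإآدذرزوؤة")

# ASSUMPTION: the on-line hamza joins to neither neighbour, so it always
# forms a PAW by itself.
NON_JOINING = {"ء"}


def split_paws(word: str) -> list[str]:
    """Split an undiacritized Arabic word into its PAWs (connected letter groups)."""
    paws: list[str] = []
    current = ""
    for ch in word:
        if ch in NON_JOINING:
            # Close any open PAW, then emit the isolated letter on its own.
            if current:
                paws.append(current)
                current = ""
            paws.append(ch)
            continue
        current += ch
        if ch in RIGHT_JOINING:
            # The next letter cannot connect to this one, so the PAW ends here.
            paws.append(current)
            current = ""
    if current:
        paws.append(current)
    return paws


def paw_frequencies(words: list[str]) -> Counter:
    """Count how many times each PAW occurs across a word list."""
    counts: Counter = Counter()
    for word in words:
        counts.update(split_paws(word))
    return counts


def relative_frequencies(counts: Counter) -> dict[str, float]:
    """Convert raw PAW counts into occurrence frequencies that sum to 1."""
    total = sum(counts.values())
    return {paw: n / total for paw, n in counts.items()}


if __name__ == "__main__":
    sample = ["كتاب", "مدرسة", "كتاب"]
    counts = paw_frequencies(sample)
    print(counts.most_common())
    print(relative_frequencies(counts))
```

With these assumptions, "مدرسة" splits into "مد", "ر", and "سة". Diacritics, tatweel, and non-Arabic characters are not handled; a pipeline working from images might instead locate PAWs as connected components of the rendered glyphs.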
Property | Value |
---|---|
Publisher ID | gdlhub |
Organization | ITB |
Contact Name | Herti Yani, S.Kom |
Address | Jln. Jenderal Sudirman |
City | Jambi |
Region | Jambi |
Country | Indonesia |
Telephone | 0741-35095 |
Fax | 0741-35093 |
Administrator Email | elibrarystikom@gmail.com |
CKO Email | elibrarystikom@gmail.com |
Contributors:
- Editor: sukadi
Download: available to members only.
File: 3271-18846-3-PB.pdf (452,056 bytes)