Path: Top -> Journal -> International Journals -> King Saud University -> 2020 -> Volume 32, Issue 5, June

A new hybrid method for Arabic multi-font text segmentation, and a reference corpus construction

Journal from gdlhub / 2021-08-24 11:54:42
By: Abdelhay Zoizou, Arsalane Zarghili, Ilham Chaker, King Saud University
Created: 2021-08-04, with 0 files

Keywords: Arabic text segmentation, Word segmentation, Template matching, Contour, Reference corpus
Url : http://www.sciencedirect.com/science/article/pii/S1319157818301769
Document source: Web

In analytical Arabic text recognition systems, segmentation is a critical and decisive stage. In this stage, words extracted from the text document are segmented into individual characters so that feature extraction and classification can follow. The absence of a standard corpus for printed Arabic text prevents a fair and objective comparison between segmentation systems. This paper makes two contributions. The first is a multi-font reference corpus of printed Arabic text, in which we grouped all segmentation problems and which can serve as a reference for comparing segmentation systems. The second is a hybrid method for segmenting printed multi-font Arabic text. The proposed method combines two of the best-known techniques in the field: contour-based segmentation and template matching. It is insensitive to font variability and to overlapping characters. To evaluate the method, we studied, implemented, and tested other methods from the literature on the proposed corpus. The experimental results show that our method achieves better segmentation rates.
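
The abstract names template matching as one of the two techniques the hybrid method builds on, but this record gives no implementation details. As a rough illustration of the general idea only (not the authors' method), the sketch below slides a small binary template across a binary word image and reports the horizontal offset where the pixels agree best. The function names, the list-of-lists image representation, and the pixel-agreement score are all assumptions made for this example.

```python
# Illustrative sketch of generic template matching on binary images.
# This is NOT the method from the paper; names and scoring are assumptions.

def match_score(window, template):
    """Return the fraction of pixels on which two equally sized binary images agree."""
    total = len(template) * len(template[0])
    agree = sum(
        1
        for row_w, row_t in zip(window, template)
        for pw, pt in zip(row_w, row_t)
        if pw == pt
    )
    return agree / total


def best_match_column(word, template):
    """Slide `template` horizontally over `word`; return (best_offset, best_score).

    Both images are lists of rows of 0/1 pixels with the same height.
    Ties are resolved in favor of the leftmost offset.
    """
    height, width = len(word), len(word[0])
    t_height, t_width = len(template), len(template[0])
    if t_height != height or t_width > width:
        raise ValueError("template must match the word height and fit its width")
    best_offset, best_score = 0, -1.0
    for offset in range(width - t_width + 1):
        # Cut out the window of the word image under the template.
        window = [row[offset:offset + t_width] for row in word]
        score = match_score(window, template)
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset, best_score
```

A full system would typically use a more robust similarity measure, such as normalized cross-correlation, and templates drawn from several fonts; the pixel-agreement score here is chosen only to keep the example short.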

Property: Value
Publisher ID: gdlhub
Organization: King Saud University
Contact name: Herti Yani, S.Kom
Address: Jln. Jenderal Sudirman
City: Jambi
Region: Jambi
Country: Indonesia
Telephone: 0741-35095
Fax: 0741-35093
Administrator e-mail: elibrarystikom@gmail.com
CKO e-mail: elibrarystikom@gmail.com

Contributors:

  • Editor: Calvin