Path: Top -> Journal -> International Journal -> King Saud University -> 2020 -> Volume 32, Issue 5, June
Classifying protein-protein interaction articles from biomedical literature using many relevant features and context-free grammar
By: Sabenabanu Abdulkadhar, Gurusamy Murugesan, Jeyakumar Natarajan, King Saud University
Created: 2021-08-04, with 0 files
Keywords: Article classification task, Protein-protein interaction, Named entity recognition, Boosting classifier, Latent semantic analysis, Context-free grammar
URL: http://www.sciencedirect.com/science/article/pii/S1319157817301829
Document source: Web
Detecting articles that contain protein-protein interactions (PPIs) is a significant step in biological information extraction. In this paper, we present a hybrid text classification (TC) method to identify protein-protein interaction articles. Our methodology comprises four modules: i) feature extraction, ii) semantic similarity-based feature selection, iii) ensemble learning, and iv) context-free grammar (CFG)-based post-processing, which together classify PPI-relevant articles. In the first module, we extracted many linguistic and domain-specific features, such as protein names and interaction cues, to classify the documents. The second module used similarity-based feature selection to retain the most relevant and effective features. In the third module, we employed AdaBoost-based ensemble learning to improve the performance of weak classifiers. The final module applies CFG-based pattern matching to correct errors made by the classifiers. Our hybrid TC method was trained and tested on the BioCreative III corpus, where it attained a precision of 0.5813 and a recall of 0.6582. The overall F-score of the system was 0.6228, and our hybrid approach, which combines an ensemble classifier with CFG post-processing, outperforms most state-of-the-art systems.
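The abstract names AdaBoost over weak classifiers as the ensemble step. The sketch below is a minimal, generic discrete AdaBoost with one-feature decision stumps, shown only to illustrate the technique; it is not the authors' implementation. It assumes a toy setup of our own: three binary features (a protein mention, an interaction cue, two or more protein mentions) computed from hypothetical keyword lists (`PROTEIN_NAMES`, `INTERACTION_CUES`) by plain whitespace tokenization rather than named entity recognition, plus six invented training sentences labeled +1 (PPI-relevant) or -1. The function names `extract_features`, `stump`, `train_adaboost`, and `predict` are likewise hypothetical, and the feature-selection and CFG post-processing modules are omitted.

```python
import math

# Hypothetical keyword lists for illustration only; not the paper's lexicons.
PROTEIN_NAMES = {"p53", "mdm2", "brca1", "ubiquitin"}
INTERACTION_CUES = {"binds", "interacts", "complex", "phosphorylates"}


def extract_features(text):
    """Map a sentence to three binary features using plain tokenization."""
    tokens = set(text.lower().replace(",", " ").replace(".", " ").split())
    n_proteins = len(tokens & PROTEIN_NAMES)
    return [
        1 if n_proteins >= 1 else 0,            # mentions a protein
        1 if tokens & INTERACTION_CUES else 0,  # contains an interaction cue
        1 if n_proteins >= 2 else 0,            # mentions two or more proteins
    ]


def stump(x, j, polarity):
    """Weak learner: vote `polarity` if feature j is present, else -polarity."""
    return polarity if x[j] == 1 else -polarity


def train_adaboost(X, y, rounds=3):
    """Discrete AdaBoost over one-feature stumps; labels in y are +1 or -1."""
    n = len(X)
    weights = [1.0 / n] * n
    model = []  # list of (alpha, feature_index, polarity)
    for _ in range(rounds):
        # Pick the stump with the lowest weighted training error.
        best = None
        for j in range(len(X[0])):
            for polarity in (1, -1):
                err = sum(w for w, x, t in zip(weights, X, y)
                          if stump(x, j, polarity) != t)
                if best is None or err < best[0]:
                    best = (err, j, polarity)
        err, j, polarity = best
        if err >= 0.5:
            break  # no stump beats chance on the current weights
        err = max(err, 1e-10)  # avoid division by zero for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, j, polarity))
        # Misclassified documents gain weight; correct ones lose it.
        weights = [w * math.exp(-alpha * t * stump(x, j, polarity))
                   for w, x, t in zip(weights, X, y)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model


def predict(model, x):
    """Sign of the alpha-weighted vote of the selected stumps."""
    score = sum(alpha * stump(x, j, polarity) for alpha, j, polarity in model)
    return 1 if score >= 0 else -1


# Invented toy sentences: +1 = PPI-relevant, -1 = not relevant.
TRAIN_DOCS = [
    ("p53 binds mdm2 in the nucleus.", 1),
    ("BRCA1 interacts with ubiquitin ligases.", 1),
    ("p53 binds DNA.", 1),
    ("p53 expression in tumor samples.", -1),
    ("A survey of sequencing methods.", -1),
    ("The complex was purified.", -1),
]

X = [extract_features(text) for text, _ in TRAIN_DOCS]
y = [label for _, label in TRAIN_DOCS]
model = train_adaboost(X, y, rounds=3)

print(predict(model, extract_features("mdm2 binds p53 in vitro.")))         # 1
print(predict(model, extract_features("Ubiquitin levels were measured.")))  # -1
```

Each boosting round re-weights the training sentences so that the next stump concentrates on the ones misclassified so far, and the final label is the sign of the alpha-weighted vote. On this toy data, no single stump separates the classes, but three rounds (one stump per feature) fit all six sentences.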
Property | Value |
---|---|
Publisher ID | gdlhub |
Organization | King Saud University |
Contact name | Herti Yani, S.Kom |
Address | Jln. Jenderal Sudirman |
City | Jambi |
Region | Jambi |
Country | Indonesia |
Phone | 0741-35095 |
Fax | 0741-35093 |
Administrator e-mail | elibrarystikom@gmail.com |
CKO e-mail | elibrarystikom@gmail.com |
Contributors:
- Editor: Calvin