Path: Top -> Journal -> Telkomnika -> 2021 -> Vol 19, No 1, February

WEIDJ: Development of a new algorithm for semi-structured web data extraction

Journal from gdlhub / 2021-09-04 15:47:59
Oleh : Ily Amalina Ahmad Sabri, Mustafa Man, Telkomnika
Dibuat : 2021-02-06, dengan 1 file

Keyword : document object model; JavaScript object notation; web data extraction; Wrapper extraction of image
Url : http://journal.uad.ac.id/index.php/TELKOMNIKA/article/view/16205
Sumber pengambilan dokumen : web

In the era of industrial digitalization, people are increasingly investing in solutions that allow their process for data collection, data analysis and performance improvement. In this paper, advancing web scale knowledge extraction and alignment by integrating few sources by exploring different methods of aggregation and attention is considered in order focusing on image information. The main aim of data extraction with regards to semi-structured data is to retrieve beneficial information from the web. The data from web also known as deep web is retrievable but it requires request through form submission because it cannot be performed by any search engines. As the HTML documents start to grow larger, it has been found that the process of data extraction has been plagued with lengthy processing time. In this research work, we propose an improved model namely wrapper extraction of image using document object model (DOM) and JavaScript object notation data (JSON) (WEIDJ) in response to the promising results of mining in a higher volume of image from a various type of format. To observe the efficiency of WEIDJ, we compare the performance of data extraction by different level of page extraction with VIBS, MDR, DEPTA and VIDE. It has yielded the best results in Precision with 100, Recall with 97.93103 and F-measure with 98.9547.

Beri Komentar ?#(0) | Bookmark

PropertiNilai Properti
ID Publishergdlhub
OrganisasiTelkomnika
Nama KontakHerti Yani, S.Kom
AlamatJln. Jenderal Sudirman
KotaJambi
DaerahJambi
NegaraIndonesia
Telepon0741-35095
Fax0741-35093
E-mail Administratorelibrarystikom@gmail.com
E-mail CKOelibrarystikom@gmail.com

Print ...

Kontributor...

  • , Editor: sukadi

Download...