Telkomnika, 2013, Vol 11, No 4: December
Streamed Sampling on Dynamic data as Support for Classification Model
Journal from gdlhub / 2016-11-08 04:28:08
By: Astried Silvanie, Taufik Djatna, Heru Sukoco, Telkomnika
Created: 2013-12-01, with 1 file
Keywords: random sample, relative entropy, skewness, Kullback-Leibler divergence, dynamic classification
URL: http://journal.uad.ac.id/index.php/TELKOMNIKA/article/view/1210
Data mining on dynamically changing data faces several problems, such as unknown data size and a changing class distribution. Random sampling is commonly applied to extract a general synopsis from a very large database. In this research, Vitter's reservoir algorithm is used to retrieve k records from the database and place them into the sample. The sample is used as input for the classification task in data mining. The sample is a backing sample, stored as a table containing the values of id, priority, and timestamp. The priority indicates the probability of how long a record is retained in the sample. Kullback-Leibler divergence is applied to measure the similarity between the database and sample distributions. The results show that samples can be drawn randomly and continuously as transactions occur. A Kullback-Leibler divergence in the interval from 0 to 0.0001 is a very good measure for maintaining a similar class distribution between the database and the sample. The sample is always up to date with new transactions and keeps a similar class distribution. A classifier built from a balanced class distribution showed better performance than one built from an imbalanced distribution.
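As an illustration of the two ideas in the abstract, the following is a minimal sketch of reservoir sampling followed by a Kullback-Leibler check on the class distribution. It uses the simple "Algorithm R" form of reservoir sampling, which may differ from the variant of Vitter's algorithm used in the paper, and it omits the priority and timestamp columns of the backing sample. The function names, the toy data, and the direction of the divergence (database relative to sample) are assumptions made for this example only.

```python
import math
import random
from collections import Counter


def reservoir_sample(stream, k, rng):
    """Keep a uniform random sample of k records from a stream of unknown length."""
    sample = []
    for i, record in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k records.
            sample.append(record)
        else:
            # Record i replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = record
    return sample


def class_distribution(labels, classes):
    """Relative frequency of each class, in the order given by `classes`."""
    counts = Counter(labels)
    total = len(labels)
    return [counts[c] / total for c in classes]


def kl_divergence(p, q, eps=1e-12):
    """D_KL(P || Q) with natural log; eps guards against a zero in q."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical toy "database": (id, class label) pairs, 25% positive.
rng = random.Random(0)
database = [(i, "pos" if i % 4 == 0 else "neg") for i in range(10_000)]
sample = reservoir_sample(database, 500, rng)

classes = ["pos", "neg"]
p = class_distribution([label for _, label in database], classes)
q = class_distribution([label for _, label in sample], classes)
divergence = kl_divergence(p, q)
print(f"KL(database || sample) = {divergence:.6f}")
```

A divergence near zero indicates that the sample's class distribution is close to that of the database. In a setting like the one described, a value above the chosen threshold (the abstract mentions the interval from 0 to 0.0001) could be used as a signal to refresh the sample.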
Property | Property Value |
---|---|
Publisher ID | gdlhub |
Organization | Telkomnika |
Contact Name | Herti Yani, S.Kom |
Address | Jln. Jenderal Sudirman |
City | Jambi |
Region | Jambi |
Country | Indonesia |
Telephone | 0741-35095 |
Fax | 0741-35093 |
Administrator E-mail | elibrarystikom@gmail.com |
CKO E-mail | elibrarystikom@gmail.com |
Contributors:
- Editor: sukadi
Download (members only)
File: 1210-2007-1-SM.pdf (113824 bytes)