An Advanced Approach to Inspect the Influence of Dataset Size on the Enactment of Data Mining Processes

Category

Articles

Authors

Dhanashree Toradmalle

Publisher

Scopus

Abstract

We propose a new method for organising potential donors into distinct groups based on their eligibility and level of interest. To this end, information extraction and categorization methods have been developed; these correspond to learning that leads to a definitive categorization, based on an assessment of the relevant true values. Typically, the same large-scale clustering algorithms are employed for this task. We define advanced clustering methods, of which partitioning around medoids is the approach most commonly used to construct clusters. With each iteration, a clearer and more compact set of cluster objects is produced in parallel with the donor search. To make the system more resilient to noise and structure, it is defined in a way that simplifies the process of establishing clusters. The study also takes outliers into account. We evaluate the efficiency of classification algorithms by varying the number of records in the dataset from 500 to 4000, using a mix of classification algorithms and the Bayesian-D pre-processing technique implemented in the KEEL tool. We examine how dataset size affects training and testing classification accuracy. Experimental results show that C4.5-C outperformed the other algorithms: as the sample size is varied from 500 to 4000, the global classification error is on average 0.00185 with a standard deviation of 0.00421, and the rate of correctly classified samples is 0.996.
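To make the medoid-based partitioning mentioned in the abstract concrete, the following is a minimal sketch of a greedy swap search in the spirit of partitioning around medoids. It is not the authors' implementation: the function names (`manhattan`, `total_cost`, `k_medoids`), the choice of Manhattan distance, the random initialisation, and the toy points are all assumptions made for illustration, and none of the data comes from the study.

```python
import random

def manhattan(a, b):
    """Manhattan distance between two equal-length numeric tuples (assumed metric)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def total_cost(points, medoids, dist):
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(dist(p, points[m]) for m in medoids) for p in points)

def k_medoids(points, k, dist=manhattan, seed=0):
    """Greedy swap search: accept any medoid/non-medoid swap that lowers the cost."""
    rng = random.Random(seed)
    medoids = rng.sample(range(len(points)), k)  # indices of the initial medoids
    best = total_cost(points, medoids, dist)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for cand in range(len(points)):
                if cand in medoids:
                    continue
                # Try replacing the i-th medoid with the candidate point.
                trial = medoids[:i] + [cand] + medoids[i + 1:]
                cost = total_cost(points, trial, dist)
                if cost < best:
                    medoids, best, improved = trial, cost, True
    # Assign each point to the position of its nearest medoid.
    labels = [min(range(k), key=lambda j: dist(p, points[medoids[j]]))
              for p in points]
    return medoids, labels, best

# Hypothetical toy data: two well-separated groups of three points each.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
medoids, labels, cost = k_medoids(points, k=2)
print(sorted(medoids), labels, cost)
```

Because each medoid is an actual data point rather than a mean, a single extreme record shifts the cluster representative less than it would in a centroid-based method, which is one common reason for choosing medoids when outliers are a concern. Each pass recomputes the full cost for every candidate swap, so this sketch is only practical for small inputs.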