On Studying the Effect of Data Quality on Classification Performance

Roxane Jouseau, Sébastien Salva and Chafik Samir

September 2022

Abstract

During the last decade, data have played a key role in learning and decision-making models. Unfortunately, the quality of data has been ignored or only partially investigated as a pre-processing step. Motivated by applications in various fields, we propose to study data quality and its impact on the performance of several learning models. In this work, we first introduce a list of elementary repairing tasks, ordered by increasing difficulty from easy to complex. Then, we form categories from the state-of-the-art cleaning and repairing methods. We also investigate whether it is always efficient to repair data. By including standard classification models and public datasets, our work enables their use in different contexts and can be extended to other machine learning applications.

Publication

23rd International Conference on Intelligent Data Engineering and Automated Learning (IDEAL), 24-26 November 2022, Manchester, UK