A New Metric for Measuring the Intrinsic Quality in Data Collected for Quantitative Classification

Abstract

Learning an optimal classification model intrinsically depends on data quality. Despite many efforts to characterize it, existing methods have often limited quality measures to specific criteria, leading to a lack of comprehensive definitions and rigorous formulations. Moreover, evaluating data quality depends on context and often requires external elements, which makes the process long and error-prone. Therefore, there is still a strong need for solutions that enable effective data quality assessment. This paper addresses the resulting scientific challenges and introduces a new metric specifically designed for numerical classification problems. Unlike existing measures, the proposed solution is based on the correlated evolution between classification performance and data deterioration. It therefore offers three main advantages: it is model independent, it does not require external reference data, and it is easy to adapt to several real-world scenarios. Additionally, we provide a comprehensive interpretation of the quality scores and illustrate the main evaluation levels with use cases. We demonstrate its effectiveness through extensive experiments and comparisons with the state of the art.

Type
Publication
In Book Agents and Artificial Intelligence - 16th International Conference, ICAART 2024, Rome, Italy, February 24-26, 2024, Revised Selected Papers, Part I