A New Metric for Measuring the Intrinsic Quality in Data Collected for Quantitative Classification

Roxane Jouseau and Sébastien Salva and Chafik Samir

May, 2025 Data quality

Abstract

Learning an optimal classification model intrinsically depends on data quality. Despite many efforts for its characterization, existing methods have often limited quality measures to specific criteria, leading to the lack of comprehensive definitions and rigorous formulations. Indeed, its evaluation is related to the context and often requires external elements, which implies a process that is long and prone to errors. Therefore, there is still a strong need for solutions that enable effective data quality assessment. This paper addresses the resulting scientific challenges and introduces a new metric, specifically designed for numerical classification problems. Unlike existing measures, the proposed solution is based on the correlated evolution between classification performance and data deterioration. Therefore, it offers three main advantages: Being model independent, not requiring the use of external reference data while offering a solution that is easy to adapt for several real-world scenarios. Additionally, we provide a comprehensive interpretation of the quality scores and illustrate the main evaluation levels with use cases. We demonstrate its effectiveness through extensive experiments and comparisons with the state of the art.
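
The abstract's central idea, that classification performance evolves together with data deterioration, can be illustrated with a small experiment. The sketch below is not the metric proposed in the paper. It only tracks how the accuracy of a simple classifier changes as a dataset is progressively corrupted. The synthetic data, the nearest-centroid classifier, the uniform-noise corruption, and all function names are assumptions chosen for illustration.

```python
import random


def make_data(n, rng):
    """Synthetic numerical data: two Gaussian classes centred at (0, 0) and (3, 3)."""
    data = []
    for i in range(n):
        label = i % 2
        centre = 3.0 * label
        data.append(([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)], label))
    return data


def deteriorate(data, rate, rng):
    """Replace each feature value with uniform noise with probability `rate`.

    This is one hypothetical form of deterioration, not the paper's procedure.
    """
    return [
        ([rng.uniform(-3.0, 6.0) if rng.random() < rate else v for v in x], y)
        for x, y in data
    ]


def fit_centroids(train):
    """Nearest-centroid classifier: store the mean feature vector of each class."""
    sums = {0: [0.0, 0.0], 1: [0.0, 0.0]}
    counts = {0: 0, 1: 0}
    for x, y in train:
        counts[y] += 1
        for j, v in enumerate(x):
            sums[y][j] += v
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def accuracy(centroids, test):
    """Fraction of test points whose closest centroid matches their label."""
    correct = 0
    for x, y in test:
        pred = min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
        )
        correct += pred == y
    return correct / len(test)


def accuracy_curve(rates, seed=0, n=400):
    """Accuracy of the classifier at each deterioration rate."""
    rng = random.Random(seed)
    data = make_data(n, rng)
    curve = []
    for rate in rates:
        noisy = deteriorate(data, rate, rng)
        rng.shuffle(noisy)
        split = len(noisy) // 2
        train, test = noisy[:split], noisy[split:]
        curve.append(accuracy(fit_centroids(train), test))
    return curve


if __name__ == "__main__":
    rates = [0.0, 0.25, 0.5, 0.75, 1.0]
    for rate, acc in zip(rates, accuracy_curve(rates)):
        print(f"deterioration rate {rate:.2f}: accuracy {acc:.3f}")
```

In this toy setting the accuracy falls toward chance level as the deterioration rate approaches 1. How the paper turns such a relationship into a quality score is not specified in the abstract and is not reproduced here.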

Publication

In Book Agents and Artificial Intelligence - 16th International Conference, ICAART 2024, Rome, Italy, February 24-26, 2024, Revised Selected Papers, Part I