Cost-Sensitive Evaluation for Binary Classifiers

Selecting an appropriate evaluation metric for classifiers is crucial for model comparison, parameter optimization, and deployment decisions, yet there is no broadly accepted evaluation paradigm explicitly aligned with Total Classification Cost (TCC) minimization. At the same time, class imbalance is often treated as a problem to be corrected \emph{per se}, potentially causing misalignment with TCC minimization. To address these limitations, (\emph{i}) we define Weighted Accuracy (WA), an evaluation metric for binary classifiers with a straightforward interpretation as a weighted version of accuracy, and (\emph{ii}) we propose a general reweighting framework for handling class imbalance in cost-sensitive scenarios, providing an alternative to resampling techniques. This framework applies to any evaluation metric or loss function that can be expressed as a linear combination of example-dependent quantities. It enables meaningful comparison of evaluation results obtained on different datasets and accounts for discrepancies between the \emph{development} dataset, used for training, validation, and testing, and the \emph{target} dataset, where the model will be deployed. Within this framework, we derive the conditions under which standard rebalancing techniques remain coherent with TCC minimization and the conditions under which they instead become misleading. We prove that, under example-independent Unit Classification Costs, maximizing WA is equivalent to minimizing TCC. Finally, we analyze the robustness of WA in realistic example-dependent cost scenarios by studying its correlation with TCC across a broad range of class imbalance and cost regimes. The results show that WA remains well aligned with TCC in almost all examined scenarios.
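
The equivalence between maximizing WA and minimizing TCC can be illustrated with a short worked derivation. This is only a sketch under assumptions that the text above does not state: correct predictions are taken to cost nothing, every false negative costs $c_{\mathrm{FN}} > 0$, every false positive costs $c_{\mathrm{FP}} > 0$, and WA is assumed to weight each example by the cost of misclassifying it. The symbols $y_i$, $\hat{y}_i$, $w_i$, $c_{\mathrm{FN}}$, and $c_{\mathrm{FP}}$ are introduced here for illustration, and the form of WA below is one plausible reading of "a weighted version of accuracy", not necessarily the definition used in the paper.

\[
w_i =
\begin{cases}
c_{\mathrm{FN}} & \text{if } y_i = 1,\\
c_{\mathrm{FP}} & \text{if } y_i = 0,
\end{cases}
\qquad
\mathrm{WA} = \frac{\sum_i w_i \,\mathbf{1}[\hat{y}_i = y_i]}{\sum_i w_i}.
\]

Under these assumptions, the total cost is the weighted count of misclassified examples:

\[
\mathrm{TCC} = \sum_i w_i \,\mathbf{1}[\hat{y}_i \neq y_i]
= \sum_i w_i - \sum_i w_i \,\mathbf{1}[\hat{y}_i = y_i]
= \Bigl(\sum_i w_i\Bigr)\,(1 - \mathrm{WA}).
\]

Because $\sum_i w_i$ depends only on the labels and the unit costs, not on the classifier, TCC is a strictly decreasing affine function of WA on a fixed dataset. Ranking classifiers by increasing WA is therefore the same as ranking them by decreasing TCC in this simplified setting.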