Anglia Ruskin Research Online (ARRO)

AI to protect AI: A modular pipeline for detecting label-flipping poisoning attacks

journal contribution
Posted on 2025-10-27, 15:50, authored by Hossein Abroshan
Modern machine learning models are vulnerable to data poisoning attacks that compromise the integrity of their training data, with label flipping being a particularly insidious variant. In a label-flipping attack, an adversary maliciously alters a fraction of the training labels to mislead the model, which can significantly degrade performance or cause targeted misclassifications while often evading simple detection. In this work, we address this threat by introducing a modular, attack-agnostic detection framework (“AI to Protect AI”) that monitors model behaviour for poisoning indicators without requiring internal access or changes to the target model. A Behaviour Monitoring Module (BMM) continuously observes the model’s outputs, extracting telltale features such as prediction probabilities, entropy, and margins for each input. These features are analysed by an ensemble of detector models, including supervised classifiers and unsupervised anomaly detectors, that collaboratively flag suspicious training samples indicative of label tampering. The proposed framework is dataset-agnostic and model-agnostic, as demonstrated across diverse image classification tasks using the MNIST (handwritten digits), CIFAR-10 (natural images), and ChestXray14 (medical X-rays) datasets. Experimental results indicate that the system reliably detects poisoned data (e.g., an area under the ROC curve exceeding 0.95 on MNIST, above 0.90 on CIFAR-10, and up to 0.85 on ChestXray14) while maintaining low false alarm rates. This work highlights a novel “AI to protect AI” approach, leveraging multiple lightweight detectors in concert to safeguard learning processes across different domains and thereby enhance the security and trustworthiness of AI systems.
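The detection idea summarised above can be illustrated end to end with a small, dependency-free sketch. Everything in it is a hypothetical stand-in chosen for illustration, not the author's implementation: three synthetic 2-D Gaussian blobs replace the image datasets, a nearest-centroid classifier with softmax outputs replaces the target model, and the feature names (`p_assigned`, `max_prob`, `entropy`, `margin`), the two detectors (a rule score `1 - p_assigned` and a low-tail z-score anomaly detector) and their equal-weight average are all assumptions. The supervised detectors mentioned in the abstract are omitted because they would need labelled poisoned examples to train on. The ROC AUC is computed as the Mann-Whitney probability that a flipped sample outscores a clean one.

```python
import math
import random

# Hypothetical toy setup: three 2-D Gaussian blobs stand in for an image dataset.
CENTERS = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]

def make_blobs(n_per_class, rng):
    """Synthetic, cleanly labelled training data (illustration only)."""
    points, labels = [], []
    for label, (cx, cy) in enumerate(CENTERS):
        for _ in range(n_per_class):
            points.append((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)))
            labels.append(label)
    return points, labels

def flip_labels(labels, num_classes, fraction, rng):
    """Label-flipping attack: move a random fraction of labels to a different class."""
    poisoned = list(labels)
    flipped = rng.sample(range(len(labels)), round(fraction * len(labels)))
    for i in flipped:
        poisoned[i] = rng.choice([c for c in range(num_classes) if c != labels[i]])
    return poisoned, set(flipped)

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fit_centroids(points, labels, num_classes):
    """Stand-in target model: nearest-centroid classifier fitted on the (poisoned) labels."""
    sums = [[0.0, 0.0, 0] for _ in range(num_classes)]
    for (x, y), c in zip(points, labels):
        sums[c][0] += x
        sums[c][1] += y
        sums[c][2] += 1
    return [(sx / max(n, 1), sy / max(n, 1)) for sx, sy, n in sums]

def predict_proba(centroids, point):
    """Black-box output: softmax over negative squared distances to each centroid."""
    x, y = point
    return softmax([-((x - cx) ** 2 + (y - cy) ** 2) for cx, cy in centroids])

def behaviour_features(probs, assigned):
    """Monitoring features from one output vector and the sample's recorded label."""
    best_other = max(p for c, p in enumerate(probs) if c != assigned)
    return {
        "p_assigned": probs[assigned],
        "max_prob": max(probs),
        "entropy": -sum(p * math.log(p) for p in probs if p > 0),
        "margin": probs[assigned] - best_other,  # negative if another class is preferred
    }

def zscore_low_tail(rows, keys):
    """Unsupervised detector: mean number of std-devs each sample sits below the mean."""
    scores = [0.0] * len(rows)
    for k in keys:
        vals = [r[k] for r in rows]
        mu = sum(vals) / len(vals)
        sd = math.sqrt(sum((v - mu) ** 2 for v in vals) / len(vals)) or 1.0
        for i, v in enumerate(vals):
            scores[i] += max(0.0, (mu - v) / sd) / len(keys)
    return scores

def minmax(xs):
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in xs]

def roc_auc(scores, positives):
    """AUC as P(poisoned score > clean score), counting ties as 0.5 (Mann-Whitney)."""
    pos = [s for i, s in enumerate(scores) if i in positives]
    neg = [s for i, s in enumerate(scores) if i not in positives]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def run_demo(seed=0, fraction=0.1):
    rng = random.Random(seed)
    points, clean = make_blobs(100, rng)
    poisoned, flipped = flip_labels(clean, len(CENTERS), fraction, rng)
    model = fit_centroids(points, poisoned, len(CENTERS))
    rows = [behaviour_features(predict_proba(model, p), y) for p, y in zip(points, poisoned)]
    # Ensemble: equal-weight average of two normalised detector scores.
    rule = minmax([1.0 - r["p_assigned"] for r in rows])
    anomaly = minmax(zscore_low_tail(rows, ["p_assigned", "margin"]))
    ensemble = [(a + b) / 2 for a, b in zip(rule, anomaly)]
    return roc_auc(ensemble, flipped)

if __name__ == "__main__":
    print(f"Toy ensemble ROC AUC: {run_demo():.3f}")
```

Calling `run_demo()` with a different `seed` or `fraction` shows how this toy detector responds to the poisoning rate. Because the synthetic blobs are well separated, the toy AUC is likely to be optimistic and should not be compared with the MNIST, CIFAR-10 or ChestXray14 figures reported in the abstract.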


Item sub-type: Article
Refereed: Yes
Volume: 22
Publication title: Machine Learning with Applications
ISSN: 2666-8270
Publisher: Elsevier
File version: Published version
Affiliated with: School of Computing and Information Science Outputs
