Anglia Ruskin Research Online (ARRO)

File(s) under permanent embargo

A novel machine learning pipeline to detect malicious anomalies for the Internet of Things

journal contribution
posted on 2023-09-04, 10:49 authored by Raj Mani Shukla, Shamik Sengupta
Anomaly detection is an important problem in the Internet of Things (IoT). Anomalies are samples that do not follow a normal pattern and differ significantly from expected values. IoT sensor data can be anomalous for many reasons, for example abnormal events, IoT sensor faults, or malicious manipulation of the data generated by IoT devices. Anomaly detection in general, i.e., finding the samples in data that differ significantly from expected values, has been researched widely. However, limited work has addressed the underlying cause of anomalies in IoT sensor data. In particular, once an abnormal data sample has been observed, the challenge of determining whether the anomaly is due to an abnormal event or to manipulation of IoT sensor data by an attacker has not been explored in detail. In this paper, rather than finding typical anomalies, we propose a method to detect malicious anomalies. The paper puts forward the idea that anomalies in IoT can be categorized into different types. Consequently, rather than flagging every anomalous sample point, our method filters only the malicious anomalies in the measured IoT data. First, we provide an attack model for IoT sensor data and show how it can affect the decision-making abilities of IoT-based applications by introducing malicious anomalies. We then design a novel Machine Learning (ML) based method to detect these malicious anomalies. Our ML method is inspired by ensemble machine learning and uses threshold and aggregation methods rather than the traditional methods of output aggregation in ensemble learning. The proposed ML architecture is tested using pollutant, telemetry, and vehicular traffic data obtained from the state of California. Simulation results show that our architecture performs with reasonable accuracy for various sizes of malicious anomalies.
In particular, with suitable settings of the anomaly detector's parameters, precision, recall, and F-score values of 93%, 94%, and 93% are obtained, i.e., a good balance between all three metrics. By varying the model parameters, either precision or recall can be increased further at the cost of the other, showing that the model is tunable to meet the application requirements.
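The trade-off between precision and recall described in the abstract can be illustrated with a small, generic sketch. This is not the authors' pipeline: the scores, labels, threshold values, and the function names `flag_malicious`, `f_score`, and `precision_recall_f1` are all hypothetical, chosen only to show how moving a single decision threshold shifts the two metrics, and how the F-score follows from precision and recall.

```python
# Illustrative sketch only: toy data and a single score threshold,
# NOT the ensemble method proposed in the paper.

def flag_malicious(scores, threshold):
    """Flag a sample as malicious when its anomaly score reaches the threshold."""
    return [s >= threshold for s in scores]

def f_score(precision, recall):
    """Harmonic mean of precision and recall (the F1 score)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0

def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 from boolean labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f_score(precision, recall)

# Hypothetical anomaly scores and ground-truth labels (True = malicious).
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9]
labels = [False, False, True, True, False, True]

# A lower threshold favours recall; a higher one favours precision.
for threshold in (0.3, 0.5, 0.75):
    p, r, f = precision_recall_f1(labels, flag_malicious(scores, threshold))
    print(f"threshold={threshold}: precision={p:.2f} recall={r:.2f} F1={f:.2f}")
```

On this toy data the lowest threshold catches every malicious sample at the cost of false alarms, while the highest threshold raises no false alarms but misses one malicious sample. As a consistency check on the reported figures, `f_score(0.93, 0.94)` rounds to 0.93.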







Publication title

Internet of Things




Publisher

Elsevier BV

File version

  • Accepted version


Language

  • eng


Legacy Faculty/School/Department

Faculty of Science & Engineering
