Researchers: Liangwei Zhang, Ramin Karim, Janet Lin
Big Data analytics has attracted intense interest recently for its attempt to extract information, knowledge and wisdom from Big Data. In industry, with the development of sensor technology and Information & Communication Technologies (ICT), reams of high-dimensional, streaming, and nonlinear data are being collected and curated to support decision-making. The detection of faults in these data is an important application in eMaintenance solutions, as it can facilitate maintenance decision-making. Early discovery of system faults may ensure the reliability and safety of industrial systems and reduce the risk of unplanned breakdowns.
Complexities in the data, including high dimensionality, fast-flowing data streams, and high nonlinearity, impose stringent challenges on fault detection applications. From the data modelling perspective, high dimensionality may cause the notorious “curse of dimensionality” and lead to deterioration in the accuracy of fault detection algorithms. Fast-flowing data streams require algorithms to give real-time or near real-time responses upon the arrival of new samples. High nonlinearity requires fault detection approaches to have sufficiently expressive power and to avoid overfitting or underfitting problems.
Most existing fault detection approaches work in relatively low-dimensional spaces. Theoretical studies on high-dimensional fault detection mainly focus on detecting anomalies on subspace projections. However, these models are either arbitrary in selecting subspaces or computationally intensive. To meet the requirements of fast-flowing data streams, several strategies have been proposed to adapt existing models to an online mode to make them applicable in stream data mining. But few studies have simultaneously tackled the challenges associated with high dimensionality and data streams. Existing nonlinear fault detection approaches cannot provide satisfactory performance in terms of smoothness, effectiveness, robustness and interpretability. New approaches are needed to address this issue.
This research develops an Angle-based Subspace Anomaly Detection (ABSAD) approach to fault detection in high-dimensional data. The efficacy of the approach is demonstrated in analytical studies and numerical illustrations. Based on the sliding window strategy, the approach is extended to an online mode to detect faults in high-dimensional data streams. Experiments on synthetic datasets show the online extension can adapt to the time-varying behaviour of the monitored system and, hence, is applicable to dynamic fault detection. To deal with highly nonlinear data, the research proposes an Adaptive Kernel Density-based (Adaptive-KD) anomaly detection approach. Numerical illustrations show the approach’s superiority in terms of smoothness, effectiveness and robustness.