In this second technical deep dive we cover Fault Classification, which is an extension of Anomaly Detection. Anomaly Detection finds abnormalities within your data and separates them from the rest. As explained in the previous post, the separation is based on how many samples have similar values and form a group. Samples which are part of a small group and differ from normal operation are often caused by a malfunction of some kind. Although finding abnormalities provides very useful information for system operators, it does not explain their cause.

Fault Classification adds this step: it labels each abnormality with the corresponding malfunction. We predict the label by comparing a sample with the known groups and finding the group it resembles most. If a data sample closely resembles the samples from ‘Malfunction 2’, it most likely belongs to that group, and we predict so. To label abnormalities accurately, the model compares a sample with many examples of every malfunction. This requires that each example records the status of the asset, e.g. ‘Healthy’, ‘Fault 1’, ‘Malfunction 2’, etc.
The more examples we have to compare our data with, the more statistically significant our predictions become; we gain confidence. However, some data is very noisy, for example due to physical uncertainty or sensor noise. In this case, it is difficult to predict the faults accurately and with confidence. Generally, different sensors or additional physical knowledge are then required.
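
To make the idea of "finding the most similar group" concrete, here is a minimal sketch using a nearest-neighbour vote. This is only one possible way to measure similarity, not necessarily the method used in practice. The function name `predict_label`, the choice of `k=3`, and the two normalized sensor readings are assumptions made up for illustration.

```python
import math
from collections import Counter

def predict_label(sample, examples, k=3):
    """Label `sample` by majority vote among its k nearest labeled examples."""
    # Sort the labeled examples by Euclidean distance to the sample
    # and keep the k closest ones.
    nearest = sorted(examples, key=lambda ex: math.dist(sample, ex[0]))[:k]
    votes = Counter(label for _, label in nearest)
    label, count = votes.most_common(1)[0]
    # The vote share is a crude confidence: 1.0 means all neighbours agree.
    return label, count / k

# Made-up, normalized (temperature, vibration) readings with their asset status.
examples = [
    ((0.20, 0.10), "Healthy"),
    ((0.25, 0.12), "Healthy"),
    ((0.18, 0.09), "Healthy"),
    ((0.80, 0.11), "Fault 1"),
    ((0.85, 0.13), "Fault 1"),
    ((0.78, 0.10), "Fault 1"),
    ((0.22, 0.90), "Malfunction 2"),
    ((0.24, 0.85), "Malfunction 2"),
    ((0.19, 0.95), "Malfunction 2"),
]

label, confidence = predict_label((0.21, 0.88), examples)
print(label, confidence)
```

With more labeled examples per status, the vote rests on more evidence. With noisy data, the neighbours of a sample tend to carry mixed labels and the vote share drops.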

Fault Classification can be implemented with a wide range of mathematical methods for separating the abnormalities, from very simple to very complex. The simple methods generally need few training examples and are easily optimized. However, they can only predict faults that follow simple logic. For example, a leaking tube is easily detected with a simple model if you measure the pressure inside, but you need a more complex model to predict leakages when you only measure the vibrations.

A widely chosen model for fault detection is a Support Vector Machine (SVM) with a Gaussian kernel. A single model only separates the data of one particular fault from healthy data. However, by combining multiple SVM models in parallel, it is possible to predict multiple faults.
An SVM fits a scaled normal distribution (the dotted bell-shaped curves in the figure) on every datapoint. An optimization scheme finds the scales that minimize the prediction error.
When we add all these individual normal distributions, we get the overall prediction curve. We predict the group of an unknown sample by checking whether this curve is positive or negative at that sample. The confidence of a prediction grows with the magnitude of the curve. In the example below, the curve is negative around the two healthy examples on the left. In the middle, we cannot confidently predict the group. Around the faulty examples on the right, we confidently predict that a sample belongs to this particular fault group.
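
The prediction curve can be sketched in a few lines of code. The example below is a simplified, one-dimensional illustration: the scales are picked by hand, whereas a real SVM would find them with its optimization scheme. The names `decision_curve` and `classify`, the kernel width `gamma`, the `margin` threshold and the data points are all assumptions made up for this sketch.

```python
import math

def decision_curve(x, points, scales, gamma=1.0):
    """Sum of scaled Gaussian bumps, one centered on each training point."""
    return sum(a * math.exp(-gamma * (x - p) ** 2) for p, a in zip(points, scales))

def classify(x, points, scales, margin=0.5):
    """Use the sign of the curve as the group and its magnitude as confidence."""
    value = decision_curve(x, points, scales)
    if abs(value) < margin:
        # The curve is close to zero: no confident prediction possible.
        return "uncertain", value
    return ("fault" if value > 0 else "healthy"), value

# Hypothetical 1-D examples: two healthy points on the left, two faulty on the right.
points = [1.0, 2.0, 6.0, 7.0]
# Hand-picked scales (an assumption, not an optimized result):
# negative for the healthy examples, positive for the faulty ones.
scales = [-1.0, -1.0, 1.0, 1.0]

for x in (1.5, 4.0, 6.5):
    group, value = classify(x, points, scales)
    print(f"x={x}: curve={value:+.3f} -> {group}")
```

To cover several faults, one such curve per fault could be evaluated side by side. How the individual results are then combined into a single label is a design choice this sketch leaves open.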