Biotechnology and Machine Learning with SVM and LSS: Least Similar Spheres (LSS)

How LSS interprets SVM

LSS machine design requires data, which should be loaded in to memory as lists of feature vectors with their class labels. It needs two types of data, one with assigned class labels for learning phase, another with unknown class labels to predict them. Then for each class label there should be a pair of lists representing each “one to all”. To evaluate prediction precision it should use same data for learning as for predicting, but reset class labels to unknowns and then predict them. Once it is done predicting it can then compare it with learning data. The learning phase needs to find closest vectors to opposite class and then use these to find the furthest vectors. Predicting phase has to use these vectors and calculate distances which represent each class multiple radius spheres of each class and then check if new unknown value is within the radius. The inseparable problem is then attempted to be solved by using different formulas for measuring the distances and using a slack value to reduce or increase each radius size. LSS could also try to learn and predict multiple times to find best prediction, by changing the settings. Additional ideas like creating the averaged vector clusters, finding and removing outliers could be used to try to improve prediction precision. The outlier could be furthest vector which is far from its closest vectors but close to opposite class closest vectors. The averaging could allow finding smaller averaged cluster which has higher abstraction than original one, but still has information describing it. The formula for finding closest vectors could be different from finding furthest and different from the prediction formula. Its success is mentioned further. The design of LSS idea will be shown in following pictures and comments.

Learning phase, finding furthest and closest vectors.

Once closest/support points/vectors are found, by measuring each vector of one class and checking how far away from other class it is, then choosing closest one, these points are then used to find the furthest ones. Furthest points could help finding even further ones, but author started from this point and maybe will come up with idea of how to do it later.

Figure 6. Finding closest and furthest vectors. (Source: Ubaby, 2016)

Predict phase

For each “one to all” class label feature vector pairs iterate through furthest vectors. For each furthest vector find closest opposite class vector from already found closest ones. Find closest opposite class vector, which this time is its own closest vector. Find mid-point and then calculate the distance from furthest vector to mid-point, then use this distance as sphere radius which represents this class and check if unknown vector distance from this furthest vector is less or equal to that radius, indicating that it is inside or outside this class. In case closest vectors themselves are out of range of the radius, they also have their own radius. It is calculated by finding closest opposite class vector and then using radius to midpoint between them.

Figure 7. How spheres and their radiuses are built. (Source: Ubaby, 2016)

The Big Picture of Least Similar Spheres

It can be seen that some circles overlap, that is where slack variable helps to reduce the original size of spheres.

Figure 8. How LSS predicts. (Source: Ubaby, 2016)

Least Similar Spheres (LSS)

How LSS interprets SVM

Learning phase, finding furthest and closest vectors.

Predict phase

The Big Picture of Least Similar Spheres

No comments:

Post a Comment