Abstract of the Research on Biotechnology and Machine Learning with SVM

The advancement of science has given rise to more cross-disciplinary and specialized fields, allowing new topics to be covered and producing experts in various areas. One of these fields grew out of biology and, with the rise of computers, became bioinformatics. Because this field requires knowledge of both biotechnology and computing, the knowledge gap between experts and beginners has kept widening. This project report offers a partial solution to the problem by teaching the basics of biotechnology and machine learning and by implementing an interpretation of Support Vector Machines (SVM) in Java.

The Introduction describes bioinformatics topics, such as how difficult it is to find the cause of a disease when protein structure, biochemical pathways and many other factors have to be taken into consideration. It also explains how big data gave rise to new techniques for managing it, as the required mathematical statistics became more and more complicated.

The Literature review surveys several bioinformatics papers on biological data analysis with various machine learning techniques, starting with older techniques and ending with newer ones such as Bayesian networks and Support Vector Machines. This chapter discusses how big data needs to be approximated, reduced, modelled, discretized, normalized or otherwise abstracted, and gives examples of ideas taken from nature to create evolutionary algorithms and artificial neural networks. The literature review suggests that SVM is an intuitive, high-precision algorithm. SVM is then explained further with math and simple diagrams.
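For readers new to SVM, the standard linear (hard-margin) formulation is summarised below as a reference point. The notation is the common textbook one and is an assumption here; the report's own math and diagrams may use different symbols.

```latex
\begin{aligned}
&\text{Separating hyperplane:} && \mathbf{w}\cdot\mathbf{x} + b = 0\\
&\text{Margin hyperplanes:} && \mathbf{w}\cdot\mathbf{x} + b = \pm 1,
  \qquad \text{margin width } \frac{2}{\lVert\mathbf{w}\rVert}\\
&\text{Training problem:} && \min_{\mathbf{w},\,b}\ \tfrac{1}{2}\lVert\mathbf{w}\rVert^{2}
  \quad\text{s.t.}\quad y_i(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1,\quad y_i \in \{-1,+1\}\\
&\text{Classifier:} && f(\mathbf{x}) = \operatorname{sign}(\mathbf{w}\cdot\mathbf{x} + b)
\end{aligned}
```

The support vectors are the training points that lie exactly on the margin hyperplanes, i.e. those with $y_i(\mathbf{w}\cdot\mathbf{x}_i + b) = 1$.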

In the first posts, the author explains part of SVM through a distinct interpretation implemented in Java, in the hope that this makes it easier to understand. The interpretation asks the reader to look at the SVM hyperplane in 2D and see that it is a line separating two sides, then to imagine that this line can be two parallel lines very close to each other, and that these lines could simply be the edges of two very large circles whose centers lie very far apart on opposite sides. That is how the idea of Least Similar Spheres (LSS) was discovered. Further paragraphs explain how this interpretation was designed, how it was applied to the Iris flower dataset by Marshall M. (2016) and to a large bioassay dataset by A. Schierz (2009), and how its results compared with Weka SVM and R SVM.
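The geometric idea above can be checked numerically. The following is a minimal, hypothetical sketch, not the report's LSS code: the class and method names (`SphereLineSketch`, `sideByCenters`, `sideByHyperplane`, `lowerEdge`) are invented for illustration. It shows two things under these assumptions: deciding which of two far-apart centers a point is closer to gives exactly the same answer as a linear hyperplane test, and the edge of a circle becomes almost a straight line as its radius grows.

```java
/**
 * Hypothetical sketch, not the report's LSS implementation: illustrates why a
 * separating line in 2D can be pictured as the edges of two very large circles
 * whose centers lie far apart on opposite sides.
 */
public final class SphereLineSketch {

    /** Squared Euclidean distance between two 2D points. */
    static double dist2(double[] a, double[] b) {
        double dx = a[0] - b[0];
        double dy = a[1] - b[1];
        return dx * dx + dy * dy;
    }

    /** +1 if p is closer to center c1, -1 otherwise (ties go to c2). */
    static int sideByCenters(double[] p, double[] c1, double[] c2) {
        return dist2(p, c1) < dist2(p, c2) ? 1 : -1;
    }

    /**
     * The same decision written as a linear hyperplane w.x + b, using
     * |p - c2|^2 - |p - c1|^2 = 2 p.(c1 - c2) + |c2|^2 - |c1|^2.
     */
    static int sideByHyperplane(double[] p, double[] c1, double[] c2) {
        double w0 = 2 * (c1[0] - c2[0]);
        double w1 = 2 * (c1[1] - c2[1]);
        double b = (c2[0] * c2[0] + c2[1] * c2[1]) - (c1[0] * c1[0] + c1[1] * c1[1]);
        return w0 * p[0] + w1 * p[1] + b > 0 ? 1 : -1;
    }

    /**
     * Height of the lower edge of a circle with radius r centered at (0, gap + r),
     * at horizontal position x (|x| <= r). Equals gap + r - sqrt(r^2 - x^2), written
     * as x^2 / (r + sqrt(r^2 - x^2)) to avoid cancellation when r is huge.
     * As r grows, the edge approaches the straight line y = gap.
     */
    static double lowerEdge(double x, double r, double gap) {
        return gap + (x * x) / (r + Math.sqrt(r * r - x * x));
    }

    public static void main(String[] args) {
        // Two far-apart centers on opposite sides of the line y = 0.
        double[] c1 = {0.0, 1e6};
        double[] c2 = {0.0, -1e6};
        double[][] points = {{3.0, 0.5}, {-2.0, -0.1}, {10.0, 2.0}, {-7.0, -4.0}};
        for (double[] p : points) {
            System.out.printf("(%.1f, %.1f): centers=%d hyperplane=%d%n",
                    p[0], p[1], sideByCenters(p, c1, c2), sideByHyperplane(p, c1, c2));
        }
        // The edge of a huge circle is almost flat near x = 0.
        for (double r : new double[] {10, 1_000, 1_000_000}) {
            System.out.printf("r=%.0f: edge height at x=5 is %.6f%n", r, lowerEdge(5.0, r, 0.1));
        }
    }
}
```

In this sketch the two tests agree for every point, and the printed edge height approaches 0.1 as the radius increases, which is the sense in which a very large circle "is" a line near the data.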

The Conclusions note that although LSS produced unexpectedly good results, its main point was to show that mathematical ideas can be interpreted differently while still being used as they are, and that machine learning might not be as difficult as it looks, because you can form your own interpretations. Examples of such reinterpretation in LSS include viewing the distance measure as similarity, support vectors as the most similar points, and spaces separated by hyperplanes as spheres with radii. The Conclusions also mention that the literature review showed SVM to be one of the best techniques. From the implementation and explanation of the SVM interpretation, it can be concluded that it does not really matter whether the data is biological or not, and that the method can be useful even when precision is not high, because it can reduce the search space for further confirmation.
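Two of those reinterpretations, distance as similarity and support vectors as the most similar points, can be illustrated in a few lines. This is a hypothetical sketch, not the report's method: the similarity function `1 / (1 + distance)` and the names `SimilaritySketch`, `similarity`, `maxSimilarityTo` and `mostSimilar` are assumptions chosen for illustration. For each class it picks the points most similar to the opposite class; these lie nearest the boundary, loosely analogous to SVM support vectors.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/**
 * Hypothetical sketch, not the report's LSS code: treats distance as similarity
 * and selects the points of one class that are most similar to the other class.
 */
public final class SimilaritySketch {

    /** Similarity in (0, 1]: 1 for identical points, falling towards 0 with Euclidean distance. */
    static double similarity(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return 1.0 / (1.0 + Math.sqrt(sum));
    }

    /** Highest similarity between point p and any point of the other class. */
    static double maxSimilarityTo(double[] p, List<double[]> other) {
        double best = 0;
        for (double[] q : other) {
            best = Math.max(best, similarity(p, q));
        }
        return best;
    }

    /** The k points of `own` that are most similar to the class `other` (boundary points). */
    static List<double[]> mostSimilar(List<double[]> own, List<double[]> other, int k) {
        List<double[]> sorted = new ArrayList<>(own);
        sorted.sort(Comparator.comparingDouble((double[] p) -> maxSimilarityTo(p, other)).reversed());
        return sorted.subList(0, Math.min(k, sorted.size()));
    }

    public static void main(String[] args) {
        List<double[]> classA = List.of(new double[] {0, 0}, new double[] {1, 1}, new double[] {2, 1.5});
        List<double[]> classB = List.of(new double[] {4, 4}, new double[] {3, 2.5}, new double[] {6, 5});
        for (double[] p : mostSimilar(classA, classB, 1)) {
            System.out.println("A boundary point: " + Arrays.toString(p));
        }
        for (double[] p : mostSimilar(classB, classA, 1)) {
            System.out.println("B boundary point: " + Arrays.toString(p));
        }
    }
}
```

With this toy data the selected boundary points are `[2.0, 1.5]` for class A and `[3.0, 2.5]` for class B, the pair closest across the gap between the classes. A real SVM finds support vectors by solving an optimization problem rather than by this nearest-neighbour heuristic.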
