The advancement of science has given rise to more cross-disciplinary and
narrowly specialised fields, so that more new topics can be covered and experts
can emerge in each area. One of these fields started from biology and, with the
rise of computers, became bioinformatics. Because it requires knowledge of both
biotechnology and computing, the knowledge gap between experts and beginners
has kept widening. This project report offers a partial solution to the problem
by teaching the basics of biotechnology and machine learning, together with an
implementation of an SVM interpretation in Java.
The introduction describes bioinformatics topics and the problems they cover,
such as the complexity of finding the cause of a disease, because protein
structure, biochemical pathways and more have to be taken into consideration.
It also describes how big data gave rise to new techniques to manage it,
because mathematical statistics was becoming more and more complicated.
The literature review covers several bioinformatics papers on biological data
analysis with various machine learning techniques, starting with older
techniques and ending with newer ones such as Bayesian networks and Support
Vector Machines (SVM). This chapter touches on how big data needs to be
approximated, reduced, modelled, discretized, normalized or abstracted to some
level, and gives examples of what has been taken from nature to create
evolutionary algorithms and artificial neural networks. From the literature
review it becomes clearer that SVM has been shown to be an intuitive,
high-precision algorithm. SVM is then explained further with math and simple
diagrams.
In the first posts, the author explained part of SVM through a distinct
interpretation of it in Java, in the hope that this would make it easier to
understand. The interpretation asks the reader to look at an SVM hyperplane in
2D and see that it is a line separating two sides, then to imagine that this
line can be two parallel lines very close to each other, and that these lines
could be just the edges of two very large circles whose centers lie on opposite
sides, very far away from each other. That is how the Least Similar Spheres
(LSS) idea was discovered. Further paragraphs explain how this interpretation
was designed, that it was applied to the Iris flower dataset by Marshall M.
(2016) and a large bioassay dataset by A. Schierz (2009), and that its results
were compared with Weka SVM and R SVM.
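
The geometric picture behind this idea can be checked numerically. The sketch
below is not the LSS implementation from the report; it only illustrates, under
assumptions of my own, that the edge of a very large circle behaves like a
straight line near the point where they touch, and that picking the nearer of
two opposite, far-away centers gives the same answer as asking which side of
the line a point is on. The class name LargeCircleDemo, the method names, the
sample points and the radii are all hypothetical.

```java
public class LargeCircleDemo {

    // Signed distance from x to the line through p0 with unit normal n
    // (the 2D "hyperplane"). Positive on the side the normal points to.
    static double lineDistance(double[] x, double[] p0, double[] n) {
        return (x[0] - p0[0]) * n[0] + (x[1] - p0[1]) * n[1];
    }

    // Signed distance from x to the edge of a circle of radius r whose center
    // is at p0 - r*n, so that the circle touches the line at p0.
    // Positive means x lies outside the circle.
    static double circleDistance(double[] x, double[] p0, double[] n, double r) {
        double cx = p0[0] - r * n[0];
        double cy = p0[1] - r * n[1];
        return Math.hypot(x[0] - cx, x[1] - cy) - r;
    }

    // Classify x by the nearer of two opposite centers, p0 + r*n and p0 - r*n.
    // Returns +1 if the center on the normal's side is strictly nearer, else -1.
    static int classify(double[] x, double[] p0, double[] n, double r) {
        double dPos = Math.hypot(x[0] - (p0[0] + r * n[0]), x[1] - (p0[1] + r * n[1]));
        double dNeg = Math.hypot(x[0] - (p0[0] - r * n[0]), x[1] - (p0[1] - r * n[1]));
        return dPos < dNeg ? 1 : -1;
    }

    public static void main(String[] args) {
        double[] p0 = {0.0, 0.0};   // point on the separating line (assumed)
        double[] n = {0.6, 0.8};    // unit normal of the line (assumed)
        double[] x = {4.0, 3.0};    // sample point to test (assumed)

        System.out.println("line distance:     " + lineDistance(x, p0, n));
        for (double r : new double[] {10.0, 1e3, 1e6}) {
            System.out.println("circle, r = " + r + ": " + circleDistance(x, p0, n, r));
        }
        System.out.println("class of x:        " + classify(x, p0, n, 1e6));
    }
}
```

As the radius grows, the distance to the circle's edge approaches the distance
to the line, which is the sense in which a hyperplane can be read as the edge
of a very large sphere. How LSS actually chooses its centers and radii from the
training data is described in the report itself and is not reproduced here.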
The conclusions mention that although LSS had unexpectedly good results, its
main point was to show that mathematical ideas can be interpreted differently
while still being used as they are, and that machine learning might not be as
difficult as it looks, because you can form your own interpretations. Examples
of this in LSS were the distance measure viewed as similarity, support vectors
as the most similar, and the spaces separated by hyperplanes as spheres with
radii. The conclusions further note that the literature review showed SVM to be
one of the best techniques. From the implementation and explanation of the SVM
interpretation it could be concluded that it does not really matter whether the
data is biological or not, and that the technique can be useful even if its
precision is not high, because it can reduce the search space for further
confirmation.