Literature Review (Machine Learning examples taken from biology)

Evolutionary algorithms

Even though evolution has taken an enormous amount of time, it has been remarkably successful, which is why computer science has also borrowed ideas from nature. Examples include evolutionary algorithms and artificial neural networks. Evolutionary algorithms use ideas such as natural selection and random mutation. “The evolutionary algorithm is one of the major methods used to investigate protein folding” (J. Tsay et al., 2012, p2). This is done by treating the parameters of the molecule, its atoms and the surrounding physics as mathematical representations, although these parameters or attributes need not be mathematical representations, because “for every application of a genetic algorithm one has to decide the representation formalism for the genes” (M. Smitha et al., 2008, p5). Each attribute is randomly changed, and the new combination, or fold, is checked to see whether it is more successful than the previous one. The process is repeated for the most successful candidate folds; it rarely leads to a perfect solution and might never finish searching, so it is stopped at some point and the result is then tested.

Protein folding is studied in order to determine a protein's function, but there are different ways to do that. The protein can be compared with other known proteins; in other words, “protein function can also be determined via sequence comparison with other species” (W. Noble, 2003, p10). Function can also be determined by finding the 3D structure that the chain of amino acids folds into, because “it is the tertiary structure of a protein that determines its function” (J. Tsay et al., 2012, p156).
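
As a rough sketch of the mutate-evaluate-select loop described above, the following Python fragment shows a generic evolutionary algorithm. The candidate representation (a list of torsion angles), the mutation step and the placeholder fitness function are assumptions chosen only for illustration; they are not the encodings used by Tsay et al. (2012) or Smitha et al. (2008), which would evaluate candidates with a real force field or knowledge-based score instead.

    import random

    def evolve(fitness, init, mutate, pop_size=50, keep=10, generations=200):
        # Generic evolutionary loop: mutate, evaluate, select, repeat.
        population = [init() for _ in range(pop_size)]
        for _ in range(generations):              # stopped after a fixed budget
            ranked = sorted(population, key=fitness, reverse=True)
            parents = ranked[:keep]               # keep the most successful folds
            population = parents + [mutate(random.choice(parents))
                                    for _ in range(pop_size - keep)]
        return max(population, key=fitness)

    # Toy candidate: a "fold" is just a list of torsion angles in degrees.
    def init():
        return [random.uniform(-180.0, 180.0) for _ in range(10)]

    def mutate(candidate):
        child = candidate[:]
        i = random.randrange(len(child))
        child[i] += random.gauss(0.0, 15.0)       # randomly tweak one attribute
        return child

    def fitness(candidate):
        # Placeholder score; a real application would evaluate the fold with a
        # force field or a knowledge-based potential instead.
        return -sum(angle ** 2 for angle in candidate)

    best = evolve(fitness, init, mutate)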

Representing proteins for evolutionary algorithm folding

Protein folding techniques may use laws of physics, chemistry or quantum mechanics. “Physics-based folding is far from routine for general protein structure prediction of normal size proteins, mainly because of the prohibitive computing demand and the best current free-modeling results come from those which combine both knowledge-based and physics-based approaches” (S. Wu et al., 2009, p234). Because of this, the data have to be represented in abstracted and simplified ways. There have been various publications from different authors, each offering its own interpretation of how the data could be abstracted.
Tsay et al. (2012) used evolutionary algorithms to predict protein structure and described 2D or 3D transformations as the tuneable parameters that the algorithm randomly tweaks until the best structure is found. Smith et al. (2014) computed the interaction energy of each atom pair to evaluate how well a protein and a ligand bind in a given pose or alignment. Representation can also rely on a “fitness function based on a simple force field” (M. Smitha et al., 2008, p2). Because of the high computational cost, aggressive optimization and minimal use of computer resources have also been considered: Smith et al. (2014) used genetic algorithms to determine the best alignment of a protein and a ligand, optimized the encoding to the point where each atom description fits into just 16 bytes, and then compared the computational power of different hardware. In ASCII terms, 16 bytes is the equivalent of two short words per atom, which is impressive given that each atom has multiple properties such as its mass, proton count and electron count. Sixteen bytes is also a common alignment size in modern hardware, so the representation aligns “much more efficiently with most hardware’s memory interfaces” (S. Smith et al., 2014, p124).
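
To make the 16-byte figure concrete, the sketch below packs one hypothetical atom record into exactly 16 bytes with Python's struct module: three 32-bit coordinates, an 8-bit element number, an 8-bit scaled partial charge and two spare bytes. This layout is an assumption made only for illustration, not the actual encoding used by Smith et al. (2014).

    import struct

    # Hypothetical 16-byte atom record (illustrative, not Smith et al.'s layout):
    # 3 x float32 coordinates (12 bytes) + uint8 element number (1 byte)
    # + int8 scaled partial charge (1 byte) + 2 padding bytes = 16 bytes.
    ATOM_FORMAT = "<3fBb2x"
    assert struct.calcsize(ATOM_FORMAT) == 16

    def pack_atom(x, y, z, element, charge):
        # Encode one atom into a fixed-size 16-byte record.
        return struct.pack(ATOM_FORMAT, x, y, z, element, int(charge * 100))

    def unpack_atom(record):
        x, y, z, element, scaled_charge = struct.unpack(ATOM_FORMAT, record)
        return x, y, z, element, scaled_charge / 100.0

    record = pack_atom(1.25, -0.4, 3.7, 6, -0.18)   # e.g. a carbon atom
    print(len(record))                              # 16
    print(unpack_atom(record))

Fixed-size records like this can be read straight into contiguous arrays, which is one reason such an encoding maps well onto hardware memory interfaces; a docking fitness function would then iterate over protein-ligand atom pairs decoded from such records and sum their interaction energies.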

Artificial neural networks

Another example taken from nature is the brain and its ability to recognize patterns. For a long time this seemed like an impenetrably complicated process, until it was reframed as a machine: sensory data are the inputs, the brain is a machine built from many repeated units called neurons, and the result of recognition is the output. The machine takes inputs, is taught that those inputs represent a certain value or meaning, and can then recognize a new, similar situation when it occurs. Each repeated unit is a neuron, and each neuron is itself a miniature version of the same input-machine-output structure, interconnected with the others. A neuron learns by comparing new input with what it has seen before and then strengthening or weakening the connections that carry the most informative data. The trained network is then used to evaluate how similar new data are to the patterns encoded in its strongest connections.

The interconnections are mixed and crossed, inputs are summed and passed through functions, and this leads to some problems, because “what they learn is difficult to understand” (J. Fox et al., 1994, p292) and they “make many non-computational experts very wary and distrustful of the results” (J. Fox et al., 1994, p292). On the other hand, “neural networks can learn very complicated relationships between their inputs and outputs” (N. Srivastava et al., 2014, p1929) and “deep neural nets with a large number of parameters are very powerful machine learning systems” (N. Srivastava et al., 2014, p1929). Ding et al. (2000) mention that neural network accuracy drops significantly at higher noise levels. Noise is a major problem for most machine learning algorithms because it can skew the results towards bad values, which is why noise reduction methods are usually added alongside the main learning algorithm.
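
As a minimal, generic illustration of the input-weights-output picture described above (and not the architecture used by any of the cited papers), the sketch below trains a single artificial neuron in Python: weighted inputs are summed, passed through a sigmoid function, and the connections are strengthened or weakened in proportion to the error.

    import math
    import random

    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    def train_neuron(samples, epochs=2000, lr=0.5):
        # One neuron: weighted sum of inputs -> sigmoid -> output.
        n_inputs = len(samples[0][0])
        weights = [random.uniform(-1.0, 1.0) for _ in range(n_inputs)]
        bias = 0.0
        for _ in range(epochs):
            for inputs, target in samples:
                output = sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
                # Strengthen or weaken each connection according to how much it
                # contributed to the error (gradient of the squared error).
                grad = (target - output) * output * (1.0 - output)
                weights = [w + lr * grad * x for w, x in zip(weights, inputs)]
                bias += lr * grad
        return weights, bias

    # Toy example: learn the logical OR function from four labelled inputs.
    data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
    weights, bias = train_neuron(data)
    for inputs, target in data:
        out = sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
        print(inputs, target, round(out, 2))

In a setup like this, noise would appear as mislabelled or perturbed training samples, which is the kind of degradation in accuracy that Ding et al. (2000) report.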
