Introduction
Curiosity and the desire to discover new things are human traits. The unknowns
we pursue can be framed as problems that need solutions. The more problems are solved,
the more can be achieved, and technology helps save time and effort along the way. Preventing
sickness and disease would allow more minds to join the progress of discovery.
“Drugs are essential for the prevention
and treatment of disease” (S. Mandal et al., 2009, p90), and their
development is one of the most time- and resource-consuming problems. As more
sciences became involved in this problem, new hybrid fields arose, such as biophysics,
biochemistry, biotechnology and, more recently, bioinformatics, which emerged because
“the enormous amount of data gathered by
biologists—and the need to interpret it— requires tools that are in the realm
of computer science” (J. Cohen, 2004, p123). Bioinformatics is one of the
fields where different sciences collaborate to solve problems and to cope with
the increasing amount of data. “This
area has arisen from the needs of biologists to utilize and help interpret the
vast amounts of data that are constantly being gathered in genomic research—and
its more recent counterparts, proteomics and functional genomics” (J. Cohen,
2004, p122). The field uses computer science and technology to automate parts
of biological research and speeds up discovery by reducing the time and
resources it requires. The rise of bioinformatics and big data led to
increasingly sophisticated, and increasingly complicated, statistics. Data
mining and machine learning emerged as a response to the difficulties statistics
faces in high-dimensional spaces, where each record has a large number of attributes. Machine
learning grew out of combining programming with mathematical ideas ranging
from regression (fitting a function that describes points on a graph) and
clustering (finding distinct groups) to ideas modelled on the way humans
reason (decision trees, the Apriori algorithm) and on how thinking
works in biology (neural networks). Further improvements and
combinations led to Support Vector Machines and Bayesian networks. The most
recent methods combine multiple techniques. In machine
learning “each area involves one or more
reasoning problems for which significant expertise exists in the AI community,
such as simulation, planning, redesign, diagnosis, and learning” (D. Karp
et al., 1994, p8). Machine learning tools are used in bioinformatics, and their use
and improvement “will have numerous
benefits such as efficiency, cost effectiveness, time saving” (S. Mandal et
al., 2009, p90).
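To make two of the ideas named above more concrete, here is a minimal Python sketch of regression (fitting a straight line to points) and clustering (splitting values into two groups). It is only an illustration: the function names `fit_line` and `two_means` and the sample numbers are invented for this post, the clustering is a bare-bones k-means with k = 2, and it assumes the input contains at least two distinct values.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def two_means(values, iterations=10):
    """Split 1-D values into two groups with a basic k-means loop (k = 2)."""
    low, high = min(values), max(values)  # initial centres: the two extremes
    for _ in range(iterations):
        # Assign every value to the nearer centre, then move the centres.
        group_low = [v for v in values if abs(v - low) <= abs(v - high)]
        group_high = [v for v in values if abs(v - low) > abs(v - high)]
        low, high = mean(group_low), mean(group_high)
    return group_low, group_high


# Made-up sample data, for illustration only.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])  # slope 2.0, intercept 1.0
small, large = two_means([1.0, 1.2, 0.8, 5.0, 5.3, 4.9])
```

Real bioinformatics data has many attributes per record rather than one, which is exactly where these simple versions stop being enough.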
Drug Development
Drug engineering, just like software engineering, has its methodologies. The top-level
stages are called the “‘Discovery’, ‘Development’ and
‘Registration’ phases. The ‘Discovery’ phase, routinely three to four years,
involves identification of new therapeutic targets, lead finding and
prioritisation, lead optimisation and nomination of new chemical entities
(NCEs)” (J. Wang et al., 2004, p73). Discovering new drugs is a very large
and costly process, and “to address this
issue, several multidisciplinary approaches are required for the process of
drug development, including structural biology, computational chemistry, and
information technology, which collectively form the basis of rational drug
design” (Y. Wang et al., 2015, p489). The discovery phase is where
biotechnology and bioinformatics are used most. Candidate drugs in this phase
are assessed on a set of properties known as ADME, which stands for “absorption,
distribution, metabolism, and excretion” (S. K. Balani et al., 2005,
p1). Another technique in drug discovery focuses on the 3D structures of
biomolecules and is called structure-based drug design, or SBDD. “SBDD provides insight in the interaction of
a specific protein-ligand pair, allowing medicinal chemists to devise highly
accurate chemical modifications around the ligand scaffold” (V. Lounnas et
al., 2013, p1). Together these approaches improve drug discovery, because it is
not enough for a drug to be effective; it must also have low toxicity (the T in
ADME/T stands for toxicity). In practice, a drug can be effective yet fail to
satisfy all the ADME/T requirements and therefore never be released for medical
use. “Investigation of terminated projects
revealed that the primary cause for drug failure in the development phase was
the poor pharmacokinetic and ADMET (ADME+Toxicity) properties rather than
unsatisfactory efficacy” (J. Wang et al., 2004, p73). There are various drug
design techniques, including structure-based drug design, target-based
drug design and the more recent rational drug design, which involves multiple
disciplines and techniques and “can be
applied to develop drugs to treat a wide variety of diseases and can also be
used for designing drugs for disease prevention” (S. Mandal et al., 2009,
p90). Drug discovery is the beginning of drug development and involves
identifying the problem, or drug target, which “is
a biomolecule which is involved in signaling or metabolic pathways that are
specific to a disease process” (S. Mandal et al., 2009, p90). Identifying
which molecules or ligands can bind to the receptors of the target is one of
the most complicated parts of drug discovery.
Cell Biology
Life forms are made of cells, and “cells are
complex molecular machines contained within phospholipid membranes that isolate
a unique chemical environment” (E. Yoruk et al., 2011, p1). These machines
are built from large proteins, mid-sized peptides, smaller amino acids and very small
molecules such as H2O. Each large molecule is made of atoms held together by interatomic
forces. “The atomic force field model
describes physical systems as collections of atoms kept together by interatomic
forces” (J. Meller, 2001, p2). Proteins are also machines and “are made of long chains of amino acids
which in their natural environment (in solution) fold up into simple
"secondary" structures, like helices, and then by further folding
into higher-order structures” (J. Fox et al., 1994, p290). While large
molecules with more than 50 amino acids are called proteins, there are also “short strings of amino acids, called
peptides” (W. Noble, 2003, p23). “Peptides
can be considered to be up to 50 amino acids in length, with proteins being
larger than this” (C. Walle, 2011, p4). In total, “there are twenty common amino acids” (F. Altschul,
2011, p8). Cells can be divided into two communication domains, the inside
and the outside. Both involve sequences of bioreactions. “The metabolism of a cell is the set of
bioreactions that its enzymes can catalyse and such a sequence of reactions is called a path way” (D. Karp et
al., 1994, p7). These pathways resemble constantly moving, floating queues
of large and small molecules interacting with each other. They are called
biochemical pathways “and, in reality,
metabolic, signaling, and regulatory pathways interact and intersect in the
course of cellular growth and activity” (R. Gostner et al., 2014, p16:2).
The inside of the cell is organised around biochemical pathways, which allow communication between
its smaller organelles. “Living cells
are complex systems whose growth and existence depends on thousands of
biochemical reactions” (D. Karp et al., 1994, p2). Many biochemical
reactions need the help of catalysts in order to happen; these catalysts are called enzymes.
Enzymes “facilitate the association of
several molecules to form a complex, and it lowers the energy barrier required
for the bond rearrangements that constitute a reaction” (D. Karp et al.,
1994, p5). On the outside, receptors on the membrane’s surface and ligands are used to
communicate with other cells. How strongly a ligand molecule binds to a receptor
is called binding affinity. “Binding
affinity represents the strength of association between the ligand and its
receptor protein” (M. Ashtawy et al., 2012, p1301). That is why “the recognition of signal peptides is
important for the development of new drugs” (W. Noble, 2003, p12) and “finding the structure and the fold of a
protein is very important since it helps to understand the functions” (A.
Chinnasamy et al., 2003, p1).
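The length rule quoted above (peptides up to 50 amino acids, proteins larger than that) and the set of twenty common amino acids can be written as a small Python sketch. The function name `classify_chain` and its `peptide_max_length` parameter are made up for this illustration; the one-letter codes are the standard ones for the twenty common amino acids, and the 50-residue cut-off is the convention cited in the text, not a hard biological boundary.

```python
# One-letter codes of the twenty common amino acids.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def classify_chain(sequence, peptide_max_length=50):
    """Label an amino acid chain as 'peptide' or 'protein' by its length."""
    sequence = sequence.upper()
    unknown = set(sequence) - AMINO_ACIDS
    if not sequence or unknown:
        raise ValueError(
            f"not a sequence of common amino acids: {sorted(unknown)}"
        )
    # Up to the cut-off the chain counts as a peptide, beyond it as a protein.
    return "peptide" if len(sequence) <= peptide_max_length else "protein"


# Hypothetical sequences, for illustration only.
short_label = classify_chain("GAVLI")     # a 5-residue chain
long_label = classify_chain("M" * 120)    # a 120-residue chain
```

Sequence length alone says nothing about structure or function, so this is only a labelling convention and not an analysis.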
All of this, and more, has to be taken into consideration when trying to find a
disease or a problem with cells. Because of this complexity, even today’s
computing power is not enough. Finding similar protein structures can
reduce the problem, and machine learning can help recognize them.
Machine Learning
A multidisciplinary approach may increase the reliability and effectiveness of problem solving, but it
becomes harder and harder to grasp the whole picture. The increasing amount
of data pushes the limits of how much must be done in the same
amount of time in order to expand the knowledge that could lead to
solutions. “In the era of big
data, a necessary goal is the ability to use rapidly accumulating data to
pinpoint potential ADME/T issues before entering late-stage development.”
(Y. Wang et al., 2015, p508). Big data also gave rise to new techniques for
managing it, because classical statistical methods were becoming too complicated to apply.
That led to various machine learning approaches, which learn from large amounts of
data and then predict outcomes for new, unseen data. “Computational analysis of biological data
obtained in genome sequencing and other projects is essential for understanding
cellular function and the discovery of new drugs and therapies” (C. Ding et
al., 2000, p349). Hope remains high because of how successful machine
learning has been and continues to be. In the early years scientists already trusted that computers
would help with large problems, saying that “it is now generally accepted that modern molecular biology research
needs many different types of software to support the management, analysis and
interpretation of data” (J. Fox et al., 1994, p287), and in later years
they still hoped that “progress in reliable
computational methods is greatly anticipated” (M. Blaszczyk et al., 2015).
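The "learn from data, then predict for new data" pattern described in this section can be sketched with one of the simplest possible learners, a 1-nearest-neighbour classifier. This is a swapped-in toy technique chosen for brevity, not a method taken from any of the cited papers. The names `train` and `predict`, the two numeric features and the labels are all hypothetical.

```python
import math


def train(examples):
    """'Learning' for nearest-neighbour is simply storing labelled examples."""
    return list(examples)


def predict(model, features):
    """Return the label of the stored example closest to `features`."""
    nearest = min(
        model,
        key=lambda example: math.dist(example[0], features),  # Euclidean distance
    )
    return nearest[1]


# Made-up training data: (feature vector, label). The features stand in for
# any numeric descriptors of a molecule; they are not real measurements.
model = train([
    ((0.1, 0.2), "inactive"),
    ((0.2, 0.1), "inactive"),
    ((0.9, 0.8), "active"),
    ((0.8, 0.9), "active"),
])
label = predict(model, (0.85, 0.75))  # label for a new, unseen point
```

With thousands of attributes and millions of records, storing and scanning every example becomes impractical, which is one reason more elaborate methods such as the ones named in the introduction exist.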