COMPUTATIONAL GENOMICS
THE "HOW" OF SCIENCE
PIONEERS
ALAN TURING
At the heart of computational genomics is the idea that automated computational methods can perform tasks once done by people. Few people advanced this idea further than Alan Turing, a mathematician and computer scientist who worked in British code-breaking during World War II [1].
Alan Turing's work on algorithms, sets of logical instructions that can be followed mechanically to solve mathematical problems, forms the basis of modern theories of computation. He imagined a machine that could perform tasks based on a supplied algorithmic program. This abstract computer, now known as a "Turing machine," is a computational device that performs automated tasks by following a clearly defined algorithm.
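The idea of a machine driven entirely by a supplied program can be illustrated with a very small simulator. The sketch below is only an illustration under stated assumptions: the function name `run_turing_machine`, the rule table `INCREMENT`, and the choice of task (adding one to a binary number) are hypothetical and are not taken from Turing's own work.

```python
from typing import Dict, Tuple

BLANK = "_"

# A program maps (state, symbol read) to (symbol to write, head move, next state).
Rules = Dict[Tuple[str, str], Tuple[str, int, str]]


def run_turing_machine(rules: Rules, tape: str, state: str = "start",
                       head: int = 0, max_steps: int = 10_000) -> str:
    """Run the rule table until the machine reaches the 'halt' state."""
    cells = dict(enumerate(tape))  # sparse tape: position -> symbol
    steps = 0
    while state != "halt":
        if steps >= max_steps:
            raise RuntimeError("step limit reached without halting")
        symbol = cells.get(head, BLANK)
        write, move, state = rules[(state, symbol)]
        cells[head] = write
        head += move
        steps += 1
    if not cells:
        return ""
    lo, hi = min(cells), max(cells)
    return "".join(cells.get(i, BLANK) for i in range(lo, hi + 1)).strip(BLANK)


# Hypothetical example program: add 1 to a binary number.
INCREMENT: Rules = {
    ("start", "0"): ("0", +1, "start"),    # scan right to the end of the number
    ("start", "1"): ("1", +1, "start"),
    ("start", BLANK): (BLANK, -1, "carry"),
    ("carry", "1"): ("0", -1, "carry"),    # 1 + 1 = 0, carry moves left
    ("carry", "0"): ("1", 0, "halt"),      # absorb the carry and stop
    ("carry", BLANK): ("1", 0, "halt"),    # number grew by one digit
}

print(run_turing_machine(INCREMENT, "1011"))  # 1100
```

The machine itself never changes; only the rule table does. Swapping in a different table makes the same simulator perform a different task, which is the sense in which the device is guided by its program.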
Turing applied his theories of computation to cryptanalytic work at Bletchley Park during WWII. With the help of the electromechanical code-breaking machines he helped design, Turing was able to crack the complex Enigma code used in German naval communications [2].
By the end of WWII, Turing had begun to conceptualize a universal machine that could perform automated tasks quickly and reliably. His ideas led to the development of early hardware as well as the use of arithmetic in computer programming. He later became deputy director of the computing laboratory at Manchester University.
Many of Turing's ideas about using algorithms to drive computers underpin modern-day computational genomics. His principle that computers can perform automated tasks when guided by an algorithmic program is applied in the computational analysis of gene expression, DNA sequencing, and gene annotation [2].
This is one of Alan Turing's most famous quotes about his belief in the power and intelligence of machines. These theories eventually contributed to the development of the computer [1].
This picture and quote refer to Alan Turing's "Turing Machine" that he developed during WWII [2].
MARGARET OAKLEY DAYHOFF
Margaret Oakley Dayhoff is pictured here with an early version of the computer. She is regarded by many as the "mother of bioinformatics" [3].
This is one of the books that Margaret O. Dayhoff wrote to provide curated information about proteins to the research community [4].
RICHARD PEARSON
This image shows the processing speeds of the FASTA Algorithm developed by Richard Pearson [5].
DAVID LIPMAN
After working with Richard Pearson to develop FASTA, David Lipman partnered with Stephen Altschul, Warren Gish, Webb Miller, and Gene Myers to create a faster, more accurate, and more specific program. Their product, the Basic Local Alignment Search Tool, or BLAST, is one of the most widely used computer programs in bioinformatics. The algorithm developed for BLAST emphasizes speed, which makes the program very practical for the analysis of large banks of data [6].
BLAST was designed to be more time-efficient than FASTA by focusing on the most significant patterns in the sequences. The technology operates similarly to the algorithmic approach of FASTA: the software searches for regions of local similarity between a query sequence and the sequences in a database. Like FASTA before it, BLAST uses statistical information and a sequence database to calculate the significance of each nucleotide or protein alignment. BLAST can therefore be used to infer evolutionary relationships between sequences.
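The search for short regions of local similarity can be sketched as a "seed and extend" procedure: find short exact word matches between two sequences, then grow each match outward while the score stays high. The code below is a heavily simplified illustration, not NCBI's implementation. The function names, the word size `k`, the scoring values, and the `x_drop` cutoff are all assumptions chosen for the example, and the statistical significance calculation is omitted entirely.

```python
from collections import defaultdict


def find_seeds(query: str, subject: str, k: int):
    """Return (query_pos, subject_pos) pairs where a k-letter word matches exactly."""
    index = defaultdict(list)
    for j in range(len(subject) - k + 1):
        index[subject[j:j + k]].append(j)
    return [(i, j)
            for i in range(len(query) - k + 1)
            for j in index.get(query[i:i + k], [])]


def extend_seed(query, subject, qi, sj, k, match=1, mismatch=-2, x_drop=3):
    """Extend one seed in both directions without gaps.

    Returns (score, query_start, subject_start, length).
    """
    def grow(step, q, s):
        # Walk away from the seed, remembering the best score and how far it reached.
        score = best = reach = n = 0
        while 0 <= q < len(query) and 0 <= s < len(subject):
            score += match if query[q] == subject[s] else mismatch
            n += 1
            if score > best:
                best, reach = score, n
            elif best - score > x_drop:
                break  # score fell too far below the best seen so far
            q += step
            s += step
        return best, reach

    right_score, right = grow(+1, qi + k, sj + k)
    left_score, left = grow(-1, qi - 1, sj - 1)
    total = k * match + left_score + right_score
    return total, qi - left, sj - left, k + left + right


def best_local_hit(query: str, subject: str, k: int = 4):
    """Highest-scoring ungapped local match, or None if no word is shared."""
    hits = [extend_seed(query, subject, i, j, k)
            for i, j in find_seeds(query, subject, k)]
    return max(hits, default=None)


print(best_local_hit("TTGACCGTAA", "CCCGACCGTCC"))  # (6, 2, 3, 6)
```

In this example the shared region is `GACCGT`, which starts at position 2 in the first sequence and position 3 in the second. Restricting the expensive comparison to the neighborhoods of short word matches is what makes this style of search fast on large databases.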
The tremendous power and popularity of BLAST may stem from the ease with which it can find accurate similarities between sequences. To maximize accuracy, several different types of BLAST algorithms exist, each searching a particular kind of database to help identify genes, proteins, and specific regions [6].
Although BLAST has existed since its original release in 1990, it has continued to evolve and improve over the years. Through user input, the BLAST databases have grown in size, contributing to the overall accuracy of the program. The NCBI has also increased the computing power behind the tool so that it performs more efficiently. They hope to incorporate the program into the cloud in coming years so that the information is available and accessible to all.
This picture shows some of the many functions that the BLAST bioinformatics tool has [6].
Margaret Oakley Dayhoff was one of the first people to apply computational and mathematical techniques to biochemistry. She spent her career developing and implementing computational techniques to make advancements in biology and medicine. One of her most notable achievements in this field is her creation of protein and nucleic acid databases, along with the tools that allowed researchers to navigate and use them [1] [3].
Beginning in the late 1950s, Margaret Dayhoff collaborated with Robert Ledley, and the two coauthored a paper titled "COMPROTEIN: A computer program to aid primary protein structure determination." The paper detailed a computer program that could automatically produce information about peptides [3].
Dayhoff also collaborated with Ellis Lippincott and Carl Sagan in the early 1960s to develop a computer program that could calculate the equilibrium concentrations of gases in planetary atmospheres. This computer program was immensely helpful in the study of the atmospheres of Earth as well as Venus, Jupiter, and Mars [4].
Perhaps Dayhoff's greatest contribution to computational genomics was her work in applying computational techniques to study evolutionary trees (or phylogeny). She used computational algorithms to analyze the molecular relationships between two genomes in order to better understand their divergence from a common ancestor. She also worked to create methods for using computers to compare and measure the evolutionary distance between genomes using an automated alignment matrix.
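To make the notion of a computed evolutionary distance concrete, the sketch below measures the fraction of differing sites between two already-aligned nucleotide sequences and applies the standard Jukes-Cantor correction for repeated substitutions at the same site. This is not Dayhoff's method, which was based on substitution matrices for proteins; it is a simpler, widely taught measure swapped in for illustration, and the function names are hypothetical.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of aligned, ungapped sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Ignore columns where either sequence has a gap character.
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor_distance(seq_a: str, seq_b: str) -> float:
    """Estimated substitutions per site: d = -(3/4) * ln(1 - (4/3) * p)."""
    p = p_distance(seq_a, seq_b)
    if p >= 0.75:
        raise ValueError("sequences too divergent for the Jukes-Cantor model")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


print(p_distance("ACGTACGT", "ACGTACGA"))  # 0.125
print(round(jukes_cantor_distance("ACGTACGT", "ACGTACGA"), 4))  # 0.1367
```

The corrected distance is slightly larger than the raw fraction of differences because some sites may have changed more than once since the two sequences diverged from their common ancestor.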
Dayhoff's development of the Atlas of Protein Sequence and Structure led to the creation of the Protein Information Resource database. Both resources revolutionized how data were stored and made available to the research community.
In 1985, Richard Pearson partnered with David Lipman to create revolutionary protein analysis software. The two scientists recognized how rapidly the volume of genetic sequence data was expanding and how limited the existing methods for analyzing it were in terms of speed and memory. The original FASTP program was designed to perform sequence searches, analyze alignments between sequences, and assess the significance of each result [5].
The emergence of the FASTA tool set was revolutionary because it gave scientists algorithms that could assess the significance, function, and likelihood of alignments using statistical methods [5].
FASTA was the first of many bioinformatics and computational tools that gave researchers an automated way to perform analysis tasks easily and efficiently. The introduction of FASTA into the culture of the lab changed the way researchers approach gathering and analyzing their data. Many tools built on similar concepts followed in the coming years.
For example, BLAST, one of the leading computational tools for data analysis in a lab setting, uses the file format that was developed by David Lipman and Richard Pearson for FASTA [5].
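In the FASTA file format, each record begins with a header line that starts with ">" and is followed by one or more lines of sequence. A minimal reader might look like the sketch below. The function name `parse_fasta` and the sample records are made up for illustration, and the choice to key each record by the first word of its header is an assumption, not part of the format.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Read FASTA-formatted lines into a {record name: sequence} dictionary."""
    records: Dict[str, str] = {}
    name = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)  # finish the previous record
            parts = line[1:].split()
            name = parts[0] if parts else ""  # first word of the header
            chunks = []
        elif name is not None:
            chunks.append(line)  # sequence may span several lines
    if name is not None:
        records[name] = "".join(chunks)
    return records


# Made-up sample data, for illustration only.
example = """>seq1 hypothetical example record
ACGTAC
GTTAGC
>seq2
MKVLA
"""

print(parse_fasta(example.splitlines()))
# {'seq1': 'ACGTACGTTAGC', 'seq2': 'MKVLA'}
```

Because the format is plain text with a single marker character, the same file can hold nucleotide or protein sequences and can be passed between tools without conversion.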