Unveiling the Hidden World of Bioinformatics: A Journey into the Heart of Data-Driven Discovery
Picture a realm where biology and technology interlace, where scientists and programmers unite to decode the secrets of life itself. This world is none other than bioinformatics, a captivating scientific subdiscipline that harnesses the power of computers to unlock, preserve, and share the profound intricacies of biological information. It’s a place where the DNA double helix dances with lines of code, where proteins and algorithms form an unlikely alliance. Welcome to bioinformatics, a multidisciplinary tapestry woven by biologists, computer scientists, mathematicians, statisticians, and even physicists.1,2
Embarking on this adventure, we journey through time, tracing the evolution of bioinformatics from its humble beginnings as a data handler to its current role as a decipherer of life’s mysteries. Once, it was all about taming the torrents of data, but now, it’s about understanding, interpreting, and predicting. The boundaries of bioinformatics stretch beyond biology’s horizons, offering a bridge to new revelations about human health and the natural world.2
Let’s delve into “Introducing Bioinformatics,” a captivating series of articles that will immerse you in the history, real-world tales, and endless applications of this enchanting field. For our inaugural chapter, we embark on a thrilling journey through history, exploring the roots and early milestones of bioinformatics.
A Flashback to Bioinformatics’ Genesis
Imagine the late 1950s, a time of fervent exploration into the structures of proteins. The world was captivated by crystallography’s revelations. But, amid the excitement, a challenge loomed: deciphering protein sequences. The Edman degradation method shone as a pioneer, yet it had limitations, only unveiling about 60 amino acids in one go.3,4
Enter Margaret Dayhoff, a trailblazing American physical chemist who orchestrated the integration of computational marvels into the realm of biochemistry during the 1960s.
Together with Robert Ledley, Dayhoff developed COMPROTEIN, an ingenious computer program that assembled a protein’s primary structure from the jigsaw of peptide fragments produced by Edman sequencing. It is often described as “the first bioinformatics software”.5 Imagine punch cards, machines like the IBM 7090, and Dayhoff’s relentless determination.
Both the program’s input and its output represented each amino acid with a three-letter abbreviation.
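The assembly idea can be illustrated with a minimal sketch. This is not COMPROTEIN’s actual algorithm; it is a simple greedy overlap merge written for illustration, and the function names `overlap` and `assemble` as well as the example fragments are hypothetical. It assumes the fragments overlap end to end and that no fragment lies entirely inside another.

```python
def overlap(left, right):
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    for size in range(min(len(left), len(right)), 0, -1):
        if left[-size:] == right[:size]:
            return size
    return 0


def assemble(fragments):
    """Greedily merge the two fragments with the largest overlap until none overlap.

    Each fragment is a sequence of three-letter residue names, mirroring the
    three-letter input and output described in the text.
    """
    pool = [tuple(fragment) for fragment in fragments]
    while len(pool) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, left in enumerate(pool):
            for j, right in enumerate(pool):
                if i == j:
                    continue
                size = overlap(left, right)
                if size > best_size:
                    best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # nothing left to merge; return the remaining pieces as they are
        merged = pool[best_i] + pool[best_j][best_size:]
        pool = [p for k, p in enumerate(pool) if k not in (best_i, best_j)]
        pool.append(merged)
    return pool


# Hypothetical peptide fragments, deliberately given out of order.
fragments = [
    ("Tyr", "Ile", "Ala", "Lys"),
    ("Met", "Lys", "Thr", "Ala"),
    ("Thr", "Ala", "Tyr", "Ile"),
]
print("-".join(assemble(fragments)[0]))
# prints Met-Lys-Thr-Ala-Tyr-Ile-Ala-Lys
```

A greedy merge like this can be misled by repeated stretches of residues, so it should be read as a picture of the jigsaw idea rather than a robust assembler.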
Dayhoff’s ingenuity didn’t stop there. She condensed the three-letter abbreviations into a one-letter code for the amino acids, a code that is still used today.6 The shorter code greatly reduced the amount of information that had to be entered and produced,7 which made it more practical to handle the many short fragments that the Edman method’s length limit required.
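To make the saving concrete, here is a small sketch that converts three-letter abbreviations to the one-letter code. The mapping covers the 20 standard amino acids; the function name `to_one_letter` and the example peptide are illustrative assumptions, not taken from Dayhoff’s software.

```python
# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(residues):
    """Convert a list of three-letter residue names to a one-letter string."""
    return "".join(THREE_TO_ONE[name.capitalize()] for name in residues)


# Hypothetical peptide used only for illustration.
peptide = ["Met", "Lys", "Thr", "Ala", "Tyr", "Ile", "Ala", "Lys"]
compact = to_one_letter(peptide)
print(compact)                                  # prints MKTAYIAK
print(len("".join(peptide)), "->", len(compact))  # prints 24 -> 8
```

The same eight residues take 24 characters in three-letter form and 8 in one-letter form, a threefold reduction that mattered when every character had to be punched onto a card.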
In 1965, Dayhoff and her colleagues published the “Atlas of Protein Sequence and Structure”, which contained 65 protein sequences and is widely regarded as the first comprehensive protein sequence database. This further helped establish the roots of bioinformatics as an up-and-coming field.8,9
Excerpts of the journal are seen below, while the full PDF copy can be accessed here.
Dayhoff’s innovations led to the development of a new field, one that continues to adapt and change to this day. Her contributions and ideas earned her the title of both “the Mother and the Father of Bioinformatics.”
Tracing Bioinformatics’ Footprints in Modern Times
With a protein-focused invention at the foundation of the field, it is not surprising that the same kinds of methods became prominent in the study of COVID-19, the disease caused by the SARS-CoV-2 virus that has plagued the world since 2020. From the very beginning of the pandemic, COVID-19 has been intensively studied, and numerous studies have been published, many of which use in silico methods.
One example of bioinformatics applied to the study of the disease is assessing a protein’s role by examining its expression in infected patients, generating protein-protein interaction networks, and conducting enrichment analyses. Zamanian Azodi et al. (2020) performed such an analysis, focusing on the APOA1 protein. Their results suggested that its irregular expression was indicative of the disease’s severity, and that it possibly contributes to how COVID-19 affects the nervous system.11 In this way, the researchers used a proteomic approach to identify significant proteins and their roles in disease progression.
Despite starting with a focus on protein sequences, the field has adapted to analyzing DNA sequences generated through various sequencing methods. In one example, researchers employed a genome-wide association study to assess over 8.5 million single nucleotide polymorphisms for a possible gene, or group of genes, involved in the increasing severity of COVID-19 in infected patients. Their study showed that a locus on chromosome 3 (3p21.31) and the six genes it contains (SLC6A20, LZTFL1, CCR9, FYCO1, CXCR6, and XCR1) may have a significant role in patients experiencing respiratory failure due to COVID-19.12
These are just a couple of the numerous ways that bioinformatics has been used to map out the structure of the virus behind COVID-19, understand its mechanisms, and develop biologics to keep it from causing severe disease.
Adaptation, Innovation, and the Dance with Machines
As mentioned, bioinformatics initially focused primarily on handling large quantities of data. Over the years, interest in extracting meaningful findings from the collected data has steadily grown, hence the observable (but slow) integration of machine learning into the field.
Machine learning is a branch of computer science that gives machines the ability to learn from data without being explicitly programmed.13 It studies the use of computers to mimic human learning by examining patterns in the given data and continuously self-optimizing for improved performance. Its algorithms fall into two main divisions: supervised learning, where a model learns to map input data to known outputs (labels), and unsupervised learning, which involves detecting and identifying patterns in unlabeled data.14
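The difference between the two divisions can be sketched with a toy example. Everything here is an assumption made for illustration: the single feature (GC content of a DNA sequence), the labels `gc_rich` and `at_rich`, the example sequences, and the function names. The supervised part is a nearest-centroid classifier and the unsupervised part is a two-cluster k-means on one feature; real bioinformatics models are far richer.

```python
def gc_content(seq):
    """Fraction of G and C bases in a DNA sequence (the only feature used here)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def train_centroids(labeled):
    """Supervised: average the feature for each known label."""
    sums, counts = {}, {}
    for seq, label in labeled:
        sums[label] = sums.get(label, 0.0) + gc_content(seq)
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(centroids, seq):
    """Assign the label whose centroid is closest to the sequence's GC content."""
    value = gc_content(seq)
    return min(centroids, key=lambda label: abs(centroids[label] - value))


def two_means(seqs, rounds=10):
    """Unsupervised: split unlabeled sequences into two groups by GC content."""
    values = [gc_content(seq) for seq in seqs]
    low, high = min(values), max(values)  # initial cluster centers
    groups = [0] * len(values)
    for _ in range(rounds):
        groups = [0 if abs(v - low) <= abs(v - high) else 1 for v in values]
        lows = [v for v, g in zip(values, groups) if g == 0]
        highs = [v for v, g in zip(values, groups) if g == 1]
        if lows:
            low = sum(lows) / len(lows)
        if highs:
            high = sum(highs) / len(highs)
    return groups


# Supervised: labels are given, and the model learns an input-to-label mapping.
labeled = [
    ("GCGCGGCC", "gc_rich"), ("GGCCGCAT", "gc_rich"),
    ("ATATTAAT", "at_rich"), ("ATTAGATA", "at_rich"),
]
centroids = train_centroids(labeled)
print(predict(centroids, "GGGCATCC"))  # prints gc_rich

# Unsupervised: no labels, the algorithm only finds the grouping.
print(two_means(["GCGCGGCC", "ATATTAAT", "GGCCGCAT", "ATTAGATA"]))  # prints [1, 0, 1, 0]
```

Note that the unsupervised result is only a set of group numbers; deciding what the groups mean is left to the analyst, whereas the supervised model returns the labels it was trained on.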
There are four applications of machine learning.
One example is pysster, a Python package that employs convolutional neural network (CNN) classifiers to assess biological sequences. It classifies sequences by learning sequence and structure motifs and offers an “automated hyper-parameter optimization procedure and options to visualize learned motifs along with positional and class enrichment information.”15 It is available here, along with its workflows and tutorials.
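To give a feel for what a convolutional filter does with a sequence, here is a plain-Python sketch. It does not use pysster or reproduce its API: the function names are hypothetical, the sequence is assumed to be at least as long as the motif, and the filter weights are set by hand to match one motif, whereas a CNN would learn them from training data. The sequence is one-hot encoded, the filter slides along it, and the highest score is kept, in the manner of a global max-pooling step.

```python
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA sequence as a list of 4-element 0/1 vectors (A, C, G, T)."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def motif_filter(motif):
    """Hand-built filter that adds 1 to the score for every position matching the motif."""
    return one_hot(motif)


def convolve(encoded, filt):
    """Slide the filter along the encoded sequence; return one score per window."""
    width = len(filt)
    scores = []
    for start in range(len(encoded) - width + 1):
        score = 0.0
        for offset in range(width):
            for channel in range(len(BASES)):
                score += encoded[start + offset][channel] * filt[offset][channel]
        scores.append(score)
    return scores


def best_hit(seq, motif):
    """Return (position, score) of the strongest filter response."""
    scores = convolve(one_hot(seq), motif_filter(motif))
    top = max(scores)
    return scores.index(top), top


print(best_hit("TTGACGTCAA", "ACGT"))  # prints (3, 4.0)
```

A trained network stacks many such filters with learned, non-binary weights, which is why visualizing what each filter responds to can reveal the motifs a model has picked up.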
These applications only scratch the surface of bioinformatics’ capabilities, of the symbiotic dance between data and algorithms. It’s a journey that doesn’t just end with COVID-19. Bioinformatics is a lifeline in modern medicine, an artist’s brushstroke on the canvas of discovery, and a guide to understanding life’s intricacies in ways once thought impossible.
Join us in our upcoming articles as we dive deeper into the myriad applications of bioinformatics, where it intertwines with other domains and leads us through the labyrinth of science’s uncharted territories. As bioinformatics evolves, so does our perspective on the living world. It’s a journey that blurs the lines between disciplines, forging new paths toward understanding and enlightenment.
- Adams, D. (2023, March 7). Bioinformatics. From: https://www.genome.gov/genetics-glossary/Bioinformatics
- What is Bioinformatics? Swiss Institute of Bioinformatics. From: https://www.sib.swiss/what-is-bioinformatics
- Edman, P., & Begg, G. (1967). A Protein Sequenator. European Journal of Biochemistry, 1(1), 80–91. https://doi.org/10.1111/j.1432-1033.1967.tb00047.x
- Gauthier, J., Vincent, A. T., Charette, S. J., & Derome, N. (2019). A brief history of bioinformatics. Briefings in Bioinformatics, 20(6), 1981–1996. https://doi.org/10.1093/bib/bby063
- Margaret Oakley Dayhoff. From: https://en.wikipedia.org/wiki/Margaret_Oakley_Dayhoff
- Sarah Moore. (2021, March 23). History of Bioinformatics. From: https://www.azolifesciences.com/article/History-of-Bioinformatics.aspx
- Jennifer Levine. (2017, May 5). More mothers of science. From: https://crosstalk.cell.com/blog/more-mothers-of-science
- Leila McNeill. (2019, April 9). How Margaret Dayhoff Brought Modern Computing to Biology. From: https://www.smithsonianmag.com/science-nature/how-margaret-dayhoff-helped-bring-computing-scientific-research-180971904/#:~:text=She%20ran%20a%20computer%20analysis,reconstruction%20of%20a%20phylogenetic%20tree.
- Professor Margaret Dayhoff. What is Biotechnology? From: https://www.whatisbiotechnology.org/index.php/people/summary/Dayhoff
- Houtman, J., Shultz, L., Rivera, J. M., Gilmour, J., Luo, D., Diamond, M., & Bright, R. A. (2022). Tracking SARS-CoV-2 and Its Variants in Wastewater: An Old Technique Is Yielding Powerful New Insights in the Covid-19 Pandemic. The Rockefeller Foundation. From: https://www.rockefellerfoundation.org/case-study/tracking-sars-cov-2-and-its-variants-in-wastewater-an-old-technique-is-yielding-powerful-new-insights-in-the-covid-19-pandemic/
- Zamanian Azodi, M., Arjmand, B., Zali, A., & Razzaghi, M. (2020). Introducing APOA1 as a key protein in COVID-19 infection: a bioinformatics approach. Gastroenterology and Hepatology from Bed to Bench, 13(4), 367–373. From: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7682979/
- Ellinghaus, D., Degenhardt, F., Bujanda, L., Buti, M., Albillos, A., Invernizzi, P., Fernández, J., Prati, D., Baselli, G., Asselta, R., Grimsrud, M. M., Milani, C., Aziz, F., Kässens, J., May, S., Wendorff, M., Wienbrandt, L., Uellendahl-Werth, F., Zheng, T., … Karlsen, T. H. (2020). Genomewide Association Study of Severe Covid-19 with Respiratory Failure. The New England Journal of Medicine, 383(16), 1522–1534. https://doi.org/10.1056/NEJMoa2020283
- Srinivasa, K. G., Siddesh, G. M., & Manisekhar, S. R. (Eds.). (2020). Statistical Modelling and Machine Learning Principles for Bioinformatics Techniques, Tools, and Applications (1st ed.). Springer Singapore. https://doi.org/10.1007/978-981-15-2445-5
- Auslander, N., Gussow, A. B., & Koonin, E. V. (2021). Incorporating Machine Learning into Established Bioinformatics Frameworks. International Journal of Molecular Sciences, 22(6), 2903. https://doi.org/10.3390/ijms22062903
- Budach, S., & Marsico, A. (2018). pysster: classification of biological sequences by learning sequence and structure motifs with convolutional neural networks. Bioinformatics, 34(17), 3035–3037. https://doi.org/10.1093/bioinformatics/bty222