Data Diving for Genomics Treasure

Laboratories around the world and here at Brandeis are generating a tsunami of deep-sequencing data from organisms large and small, past and present. These sequencing data range from genomes to segments of chromatin to RNA transcripts. To explore this “big data” ocean, one can navigate the portals of two signature repositories of the National Center for Biotechnology Information (NCBI): the Sequence Read Archive (SRA) and the Gene Expression Omnibus (GEO). With the right bioinformatics tools, scientists can explore freely available data and discover new biological insights.

Nelson Lau’s lab in the Department of Biology at Brandeis has recently completed two such successful voyages of genomics data mining, with studies published in the Open Access journals Nucleic Acids Research (NAR) and Public Library of Science Genetics (PLoSGen). Publication of both studies was supported by the Brandeis University LTS Open Access Fund for Scholarly Communications.

In this scientific journey, the Lau lab drew on important collaborations from across the globe. The NAR study employed openly shared genomics data from the United Kingdom (Casey Bergman’s lab) and Germany (Björn Brembs’s lab). The PLoSGen study employed contributions from Austria (Daniel Gerlach), Australia (Benjamin Kile’s lab), Nebraska (Mayumi Naramura’s lab), and next-door neighbors (Bonnie Berger’s lab at MIT). This collaborative effort has been noted on the blog of Björn Brembs, who has been a vocal advocate for Open Access and Open Data Sharing to improve the speed and accessibility of communicating scientific research.


In the NAR study, postdoctoral fellow Reazur Rahman and the Lau team devised a program called TIDAL (Transposon Insertion and Depletion AnaLyzer) that scoured over 360 fly genome sequences publicly accessible in the SRA portal. Their study discovered that transposons (jumping genetic parasites) formed different genome patterns in every fly strain. Common fly strains with the same name but living in different laboratories turn out to have very different patterns of transposons. Simply noting that a “Canton-S” or “Oregon-R” strain was used may not be enough to fully characterize a strain. The Lau lab hopes to use the TIDAL tool to study how expanding transposon patterns might alter genomes in aging fly brains.

The piRNAs from these animals were compared in the PLoS Genetics story

In the PLoSGen study, visiting scientist Gung-wei Chirn and the Lau team developed a novel small RNA tracking program that discovered expression patterns of Piwi-interacting RNA (piRNA) loci in many mammalian datasets extracted from the GEO portal. Coupling these datasets with other small RNA datasets created in the Lau lab at Brandeis, the Lau group discovered a remarkable diversity of these RNA loci in each species. For example, the piRNA genomic loci in humans were quite distinct from those of other primates like the macaque monkey and the marmoset. However, a special set of these genomic loci has been conserved in its piRNA expression patterns, extending across humans, through primates, to rodents, and even to dogs, horses and pigs.

These conserved piRNA expression patterns span nearly 100 million years of evolution, a long time for loci of this type to be maintained, which suggests they serve some important function in mammals. To test the hypothesis that evolution preserved these piRNAs for their utility, the Lau lab analyzed two existing mouse mutations in these loci. They showed that the mutations indeed affected the generation of the piRNAs, and that these mice were less fertile because their sperm count was reduced. Future studies from the Lau lab will explore how infertility diseases may be linked to these specific piRNA loci.
