Emu, Species-Level Microbial Community Profiling for Full-Length Nanopore 16S Reads



16S rRNA-based analysis is the established standard for elucidating microbial community composition. Because short-read 16S analyses sequence only a portion of the gene, they are largely confined to genus-level resolution at best, whereas full-length 16S sequences have the potential to provide species-level accuracy. However, existing taxonomic identification algorithms are not optimized for the increased read length and error rate of long-read data. Here we present Emu, a novel approach that employs an expectation-maximization (EM) algorithm to generate taxonomic abundance profiles from full-length 16S rRNA reads. Results on one simulated data set and two mock communities show that Emu produces accurate microbial community profiles with fewer false positives and false negatives than alternative methods. Additionally, we illustrate a real-world application of our new software by comparing composition estimates for clinical samples generated by an established whole-genome shotgun sequencing workflow with those obtained from full-length 16S sequences processed with Emu.
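The abstract states only that Emu uses an expectation-maximization algorithm, so the following is a generic sketch of how EM can turn per-read evidence into a relative abundance profile, not Emu's actual implementation. It assumes a hypothetical, precomputed matrix `likelihoods[r][t]` giving P(read r | taxon t); in practice such values would have to be derived from aligning reads against a 16S reference database. The function name `em_abundance` and its parameters are illustrative.

```python
from typing import List


def em_abundance(likelihoods: List[List[float]],
                 n_iter: int = 100,
                 tol: float = 1e-8) -> List[float]:
    """Estimate relative taxon abundances with a generic mixture-model EM.

    likelihoods[r][t] is an ASSUMED precomputed P(read r | taxon t);
    this input format is hypothetical and not taken from Emu itself.
    """
    n_taxa = len(likelihoods[0])
    # Start from a uniform abundance profile over all candidate taxa.
    f = [1.0 / n_taxa] * n_taxa
    for _ in range(n_iter):
        counts = [0.0] * n_taxa
        # E-step: posterior probability that each read came from each taxon,
        # given the current abundance estimate f.
        for row in likelihoods:
            weighted = [f[t] * row[t] for t in range(n_taxa)]
            total = sum(weighted)
            if total == 0.0:
                continue  # read not explained by any taxon; ignore it
            for t in range(n_taxa):
                counts[t] += weighted[t] / total
        # M-step: new abundance = expected read count per taxon, normalized.
        assigned = sum(counts)
        if assigned == 0.0:
            raise ValueError("no read is explained by any taxon")
        new_f = [c / assigned for c in counts]
        converged = max(abs(a - b) for a, b in zip(new_f, f)) < tol
        f = new_f
        if converged:
            break
    return f


# Two reads map only to taxon A; a third read is equally compatible with A and B.
profile = em_abundance([[1.0, 0.0], [1.0, 0.0], [0.5, 0.5]])
print(profile)
```

In this toy input, the ambiguous third read is progressively attributed to taxon A, whose presence is supported by two unambiguous reads, and the estimated abundance of taxon B shrinks toward zero over the iterations. This illustrates, in simplified form, why re-weighting ambiguous reads by estimated abundance can reduce spurious taxon calls compared with splitting every ambiguous read evenly.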


  • Alona Tyshaieva (Univ of Dusseldorf)
  • Dr. Alex Dilthey (Univ of Dusseldorf)
  • Dr. Sonia Villapol (Houston Methodist)
  • Dr. Tor Savidge (Texas Children's)
  • Dr. Qinglong Wu (Texas Children's)
Dr. Kristen Curry
Former PhD student

Dr. Curry received her Ph.D. in Computer Science from Rice University in May 2024. She obtained a bachelor's in computer science from the University of California, Berkeley. Prior to enrolling at Rice, she worked for a private biotech company focused on generating personalized health information from blood protein levels. Her primary research interests are microbial interactions and their impact on host health. Outside of the lab, she enjoys backpacking, running, and practicing yoga.

Dr. Michael Nute
Research Scientist

Mike received his Ph.D. in Statistics in 2019 from the University of Illinois at Urbana-Champaign, where he was advised by Dr. Tandy Warnow in the Department of Computer Science and worked on algorithms for multiple sequence alignment and phylogenetic tree estimation, in particular applying these methods to the study of microbial communities. He was co-advised by Dr. Rebecca Stumpf in the Department of Anthropology, where he and other lab members developed novel methods to compare the microbiomes of human and non-human primates. His research interest is in discovering new applications for our understanding of microbial communities.

Dr. Qi Wang
PhD student from September 2018 through January 2022 (currently Sr. Bioinformatics Scientist at Illumina)

Dr. Wang is a Bioinformatics Scientist at Illumina and finished her PhD in the Treangen Lab in December 2021. Previously, Dr. Wang obtained a B.S. in Biotechnology from Hong Kong Baptist University and an M.S. in Biotechnology from Northwestern University. During her undergraduate studies, she did research at the University of Chinese Academy of Sciences, Beijing University of Chemical Technology, and Capital Medical University, using bioinformatics and experimental approaches to solve a range of life science problems in synthetic biology, developmental biology, oncology, and drug discovery. Her goal is to improve human health and the environment by understanding complex biological data.

Todd J. Treangen
Associate Professor of Computer Science, Bioengineering

My research interests include algorithms and data structures for the efficient analysis of microbial genomes and metagenomes.