Emu: species-level microbial community profiling of full-length 16S rRNA Oxford Nanopore sequencing data


16S ribosomal RNA-based analysis is the established standard for elucidating the composition of microbial communities. While short-read 16S rRNA analyses are largely confined to genus-level resolution at best, given that only a portion of the gene is sequenced, full-length 16S rRNA gene amplicon sequences have the potential to provide species-level accuracy. However, existing taxonomic identification algorithms are not optimized for the increased read length and error rate often observed in long-read data. Here we present Emu, an approach that uses an expectation–maximization algorithm to generate taxonomic abundance profiles from full-length 16S rRNA reads. Results produced from simulated datasets and mock communities show that Emu is capable of accurate microbial community profiling while obtaining fewer false positives and false negatives than alternative methods. Additionally, we illustrate a real-world application of Emu by comparing clinical sample composition estimates generated by an established whole-genome shotgun sequencing workflow with those returned by full-length 16S rRNA gene sequences processed with Emu.
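To make the expectation-maximization idea concrete, the following is a minimal sketch of how EM can turn per-read taxon likelihoods into an abundance profile. It is not Emu's implementation: the function name `em_abundance`, the precomputed `likelihoods` matrix (rows are reads, columns are taxa), the uniform starting point, and the stopping rule are all assumptions made for illustration, and every read is assumed to have a nonzero likelihood for at least one taxon.

```python
import numpy as np


def em_abundance(likelihoods, n_iter=200, tol=1e-9):
    """Estimate relative taxon abundances with a basic EM loop.

    likelihoods[r, t] is an assumed, precomputed P(read r | taxon t).
    Hypothetical helper for illustration only; not Emu's actual code.
    """
    L = np.asarray(likelihoods, dtype=float)
    n_taxa = L.shape[1]
    abundance = np.full(n_taxa, 1.0 / n_taxa)  # start from a uniform profile

    for _ in range(n_iter):
        # E-step: posterior probability that each read came from each taxon,
        # given the current abundance estimate.
        weighted = L * abundance
        posterior = weighted / weighted.sum(axis=1, keepdims=True)

        # M-step: the new abundance of a taxon is its mean posterior over reads.
        updated = posterior.mean(axis=0)

        # Stop once the profile no longer changes appreciably.
        converged = np.abs(updated - abundance).sum() < tol
        abundance = updated
        if converged:
            break

    return abundance


if __name__ == "__main__":
    # Toy input: two reads match only taxon 0, one read matches only taxon 1.
    toy = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    print(em_abundance(toy))
```

With ambiguous reads (nonzero likelihood for several taxa), the same loop shifts each read's weight toward taxa that the other reads already support, which is the general mechanism by which an EM approach can reduce spurious low-abundance calls.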

Nature Methods
Qi Wang
PhD student from September 2018 through January 2022

Dr. Wang is a Bioinformatics Scientist at Illumina and finished her PhD in the Treangen Lab in December 2021. Previously, Dr. Wang obtained a B.S. in Biotechnology from Hong Kong Baptist University and an M.S. in Biotechnology from Northwestern University. During her undergraduate studies, she conducted research at the University of Chinese Academy of Sciences, Beijing University of Chemical Technology, and Capital Medical University, using bioinformatics and experimental approaches to address a range of life science problems, including synthetic biology, developmental biology, oncology, and drug discovery. Her interest is in improving human health and the environment by understanding complex biological data.