Squeegee: de-novo identification of reagent and laboratory induced microbial contaminants in low biomass microbiomes

Abstract

Computational analysis of host-associated microbiomes has opened the door to numerous discoveries relevant to human health and disease. However, contaminant sequences in metagenomic samples can affect the interpretation of findings reported in microbiome studies, especially in low biomass environments. Our hypothesis is that contamination from DNA extraction kits or sampling lab environments will leave taxonomic “bread crumbs” across multiple distinct sample types, allowing for the detection of microbial contaminants when negative controls are unavailable. To test this hypothesis, we implemented Squeegee, a de novo contamination detection tool. We tested Squeegee on simulated and real low biomass metagenomic datasets. On the low biomass samples, we compared Squeegee predictions to experimental negative control data and show that Squeegee accurately recovers known contaminants. We also analyzed 749 metagenomic datasets from the Human Microbiome Project and identified likely kit contamination that had not been previously reported. Collectively, our results highlight that Squeegee can identify microbial contaminants with high precision. Squeegee is open-source and available at: https://gitlab.com/treangenlab/squeegee
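
The "bread crumbs" hypothesis can be illustrated with a minimal Python sketch. This is not Squeegee's actual algorithm or interface; it only shows the underlying intuition that a taxon appearing across several distinct sample types is a candidate contaminant. The function name `candidate_contaminants`, the thresholds `min_types` and `min_prevalence`, and the toy taxa are all assumptions made for illustration.

```python
from collections import defaultdict


def candidate_contaminants(samples, min_types=2, min_prevalence=0.5):
    """Flag taxa shared across distinct sample types (illustrative only).

    samples: list of (sample_type, set_of_taxa) pairs.
    min_types, min_prevalence: hypothetical thresholds, not Squeegee parameters.
    """
    types_seen = defaultdict(set)    # taxon -> sample types it was observed in
    sample_count = defaultdict(int)  # taxon -> number of samples containing it
    for sample_type, taxa in samples:
        for taxon in taxa:
            types_seen[taxon].add(sample_type)
            sample_count[taxon] += 1

    total = len(samples)
    # Keep taxa seen in enough distinct sample types and enough samples overall.
    return sorted(
        taxon
        for taxon in types_seen
        if len(types_seen[taxon]) >= min_types
        and sample_count[taxon] / total >= min_prevalence
    )


# Toy data: one taxon recurs in every sample regardless of body site.
samples = [
    ("oral", {"Streptococcus mitis", "Ralstonia pickettii"}),
    ("oral", {"Streptococcus mitis", "Ralstonia pickettii"}),
    ("skin", {"Cutibacterium acnes", "Ralstonia pickettii"}),
    ("stool", {"Bacteroides fragilis", "Ralstonia pickettii"}),
]
print(candidate_contaminants(samples))
```

In this toy input, only the taxon present in all three sample types passes both thresholds. A real tool would also need to handle taxa that genuinely occur at multiple body sites, which this sketch does not attempt.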

Publication
Yunxi Liu
PhD student

Louis (3rd-year PhD student) obtained a B.S. degree in Computer Science from the University of Houston and a B.S. degree in Pharmacology from China Pharmaceutical University. During his undergraduate studies at UH, he did research in the Pattern Analysis Laboratory on image feature extraction. His current research interests include computational biology, metagenomics, and data science.

R.A. Leo Elworth
NLM Postdoctoral Fellow

Leo (NLM Postdoctoral Fellow, primary mentor Prof. Lauren Stadler, secondary mentor Prof. Todd Treangen) received his PhD in Computer Science from Rice University in 2019, working on statistical modeling of DNA sequence evolution. He was advised by Dr. Luay Nakhleh, the J.S. Abercrombie Professor and Chair of the Department of Computer Science at Rice. Since joining Rice, Leo has been awarded a graduate research fellowship from the National Library of Medicine, published work in computational biology in journals such as Bioinformatics, presented research at scientific conferences such as RECOMB-CG in Barcelona and WABI in Helsinki, and contributed to a soon-to-be-released book on computational modeling of the evolutionary histories of genomes.
