Background: Understanding the microbiomes of different organisms is crucial across many scientific fields. To this end, specialized computer programs called bioinformatic pipelines have been developed to convert raw data into meaningful biological results. However, the rapid growth of this field in dairy cow research has outpaced the availability of guidelines on how these pipelines differ and what they can do.
Methods: This study benchmarks four software tools (MICCA, QIIME versions 1.9 and 2, and VSEARCH) by examining their features and evaluating their outputs with four different databases, two of which were used in two different releases. The performance of these combinations was assessed using two mock communities and four dairy cow-related datasets: milk, skin, rumen, and feces.
Results: The mock community analysis evaluates the pipelines' ability to accurately predict the composition of bacterial communities previously documented in the literature. The real-life datasets, in turn, make it possible to investigate the influence of pipelines and databases on dataset analysis in three stages. First, a pilot experiment with one dataset (milk), one pipeline (QIIME version 1.9), and one database (SILVA version 132) served as the baseline for comparison with the results of the same pipeline on the same dataset using three other databases (one of them in two releases) and the updated release of the database used in the pilot. Second, four pipelines (two of them run with two different approaches) and four databases (two of them in two releases) were applied to the same dataset, for a total of 36 combinations (6 pipeline configurations × 6 database releases). Third, the same 36 combinations were benchmarked on the other datasets, giving a broader view of the effects of the analysis pipelines and microbial databases available for dairy cow 16S rRNA gene sequencing data.
Conclusions: Ultimately, this study aimed to establish a standardized approach and support decision-making in 16S sequencing analysis, thereby improving the scientific rigor of microbiome studies in dairy cows. MICCA and QIIME version 2, combined with the SILVA version 138 database, gave the most promising results toward this goal.
Context: Understanding the microbiomes of different organisms is crucial in various scientific fields. To this end, specialized computer programs called bioinformatic pipelines have been developed to convert raw data into meaningful biological results. However, the rapid growth of this field in livestock research has led to a lack of guidelines on how these pipelines differ from one another and how those differences can affect the final results.
Methods: This study compares four software tools (MICCA, QIIME versions 1.9 and 2, and VSEARCH) by examining their features and evaluating their results with four different databases (16S-ITGDB, RDP, SILVA, and Greengenes), two of which were used in two different releases (SILVA version 132 and version 138, Greengenes version 13.9 and version 2). The performance of these combinations is tested using two mock communities and four cattle-related datasets: milk, skin, rumen, and feces.
Results: The analysis of the mock communities focuses on evaluating the pipelines' ability to accurately predict the composition of bacterial communities previously documented in the literature. The real datasets, in turn, make it possible to investigate the influence of pipelines and databases on data analysis in three stages: first, a pilot experiment with one dataset (milk), one pipeline, and one database, used as the baseline for comparing the results of the same pipeline on the same dataset with the other databases; next, the application of 6 pipeline configurations and 6 database releases (36 combinations in total) to the same dataset (milk); and finally, the comparison of the same 36 combinations on the other datasets, to obtain a broader view of the effects of the analysis pipelines and microbial databases available for cattle 16S rRNA gene sequencing data.
Conclusions: Ultimately, this study aims to establish a standard and improve decision-making in 16S sequencing analysis, increasing the scientific rigor of microbiome studies in dairy cows. MICCA and QIIME version 2, combined with the SILVA version 138 database, showed the most promising results toward this goal.
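The figure of 36 combinations can be checked by counting: four pipelines, two of them run with two approaches, give 6 pipeline configurations, and four databases, two of them in two releases, give 6 database releases. The Python sketch below enumerates them. It is only an illustration under stated assumptions: the abstract does not say which two pipelines were run with two approaches or what those approaches were, so the `dual_approach` set and the `approaches` labels are hypothetical placeholders, not the thesis's actual design.

```python
from itertools import product

# Pipelines named in the abstract.
pipelines = ["MICCA", "QIIME 1.9", "QIIME 2", "VSEARCH"]

# ASSUMPTION: the abstract does not state which two pipelines were run with
# two approaches, nor what the approaches were. Placeholders only.
dual_approach = {"MICCA", "VSEARCH"}
approaches = ("approach A", "approach B")

# Expand each pipeline into one or two configurations.
pipeline_configs = []
for name in pipelines:
    if name in dual_approach:
        pipeline_configs.extend(f"{name} ({a})" for a in approaches)
    else:
        pipeline_configs.append(name)

# Database releases as listed in the abstract's Methods.
database_releases = [
    "16S-ITGDB",
    "RDP",
    "SILVA 132",
    "SILVA 138",
    "Greengenes 13.9",
    "Greengenes 2",
]

# Every pipeline configuration paired with every database release.
combinations = list(product(pipeline_configs, database_releases))

print(len(pipeline_configs), len(database_releases), len(combinations))
# prints: 6 6 36
```

Whichever two pipelines are actually run in two modes, the count stays the same, since only the number of configurations enters the product.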
BIOINFORMATIC PIPELINES AND MICROBIAL DATABASES FOR THE ANALYSIS OF DAIRY COWS 16S RRNA-GENE SEQUENCING DATA: A BENCHMARKING STUDY / C. Gini ; tutor: F. Ceciliani ; coordinator: F. Ceciliani. - Lodi. Dipartimento di Medicina Veterinaria e Scienze Animali, 2024 Sep 16. 36th cycle, academic year 2022/2023.
BIOINFORMATIC PIPELINES AND MICROBIAL DATABASES FOR THE ANALYSIS OF DAIRY COWS 16S RRNA-GENE SEQUENCING DATA: A BENCHMARKING STUDY.
C. Gini
2024
File | Description | Type | Size | Format
---|---|---|---|---
phd_unimi_R12929.pdf (embargoed until 27/02/2026) | PhD thesis | Other | 6.72 MB | Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.