Tutorial: How to Remove Chimeric 16S rRNA Reads Using Mothur with a Graphical User Interface (GUI)
Tutorial video on how to remove chimeric sequences from 16S rRNA reads using Mothur within the DocMind Analyst software
Working with a Mothur GUI to remove chimeric reads
If you want a reliable taxonomic classification based on your 16S rRNA sequence reads, it is not enough to trim the reads and discard the bad ones. You also need to find and remove the chimeric reads, because they will otherwise lead to a false classification. Look here for an introduction to chimeric reads. With the DocMind Analyst, this is very easy. You just need to provide sequence reads with the “_HQ” tag (e.g. ID1_HQ_1.fastq(.gz) and ID1_HQ_2.fastq(.gz)) in the current working directory. These reads are free of adapters and contain high-quality sequences. You get them by running the first step of the pipeline, the Quality Trimming. Click here for a Trimmomatic tutorial on that topic. To tell the DocMind Analyst to use this module, check the “Chimera Removal” checkbox in the “Pipeline Options” panel.
The DM-Analyst uses the chimera.vsearch command included in the mothur package. There are no options you need to set. After the analysis, you will find the output files in a subfolder of your current working directory called “HQ_reads”. Don’t be surprised to find only one file per sample although you had two files before (forward and reverse reads). The reads are now merged into one file, and this file is a FASTA file, meaning that it no longer contains quality information (unlike FASTQ files). The file name has the structure “YourPrefix_HQ.fasta” (e.g. ID1_HQ.fasta). These FASTA output files are the input for the next step, the RDP classifier, which performs the taxonomic classification.
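The naming convention described above can be checked with a few lines of Python before and after a run. The following is only a minimal sketch and not part of the DocMind Analyst: the function names (find_hq_pairs, expected_fasta, report) are made up for illustration, and the sketch assumes nothing beyond the file names given in this tutorial (“*_HQ_1.fastq(.gz)”, “*_HQ_2.fastq(.gz)” and “HQ_reads/*_HQ.fasta”).

```python
import re
from pathlib import Path

# Matches e.g. "ID1_HQ_1.fastq" or "ID1_HQ_2.fastq.gz" and captures the prefix.
HQ_PATTERN = re.compile(r"^(?P<prefix>.+)_HQ_(?P<mate>[12])\.fastq(?:\.gz)?$")


def find_hq_pairs(workdir):
    """Group *_HQ_1 / *_HQ_2 FASTQ files in workdir by their shared prefix."""
    pairs = {}
    for path in sorted(Path(workdir).iterdir()):
        match = HQ_PATTERN.match(path.name)
        if match and path.is_file():
            pairs.setdefault(match["prefix"], {})[match["mate"]] = path
    return pairs


def expected_fasta(workdir, prefix):
    """Location where, according to the tutorial, the merged FASTA appears."""
    return Path(workdir) / "HQ_reads" / f"{prefix}_HQ.fasta"


def report(workdir="."):
    """Return one status line per sample prefix found in workdir."""
    lines = []
    for prefix, mates in find_hq_pairs(workdir).items():
        if set(mates) != {"1", "2"}:
            status = "incomplete pair (forward or reverse file missing)"
        elif expected_fasta(workdir, prefix).is_file():
            status = "output present"
        else:
            status = "output not found yet"
        lines.append(f"{prefix}: {status}")
    return lines


if __name__ == "__main__":
    for line in report("."):
        print(line)
```

Run from the current working directory, this prints one line per sample prefix. Before the analysis, it shows whether every forward file has a matching reverse file. After the analysis, it shows whether the corresponding FASTA file exists in “HQ_reads”.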
Checklist for Chimeric Read Removal
Input Files: FASTQ files, either uncompressed (“*.fastq”) or compressed (“*.fastq.gz”). Forward reads are indicated by “*_HQ_1.fastq(.gz)”, reverse reads by “*_HQ_2.fastq(.gz)”. The “*” stands for the file name prefix. The forward and reverse read files of a sample need to have the same prefix. Save the files in your current working directory.
Output Files: These are FASTA files with the extension “*.fasta”. You will find them in the subfolder named “HQ_reads” in your current working directory.
Log-File: “chimera.log”. You will find this file in your current working directory. Always check this file for potential error messages.
Storage: Allow enough space for the output files. These files will require at least as much storage space as your input files. We recommend that you provide at least twice as much disk space as your input files occupy. This is very important. If you fail to provide enough storage, your analysis will be canceled at some point. You can change the volume size of your instance by selecting the appropriate volume in your AWS console (under Volumes) and increasing it. Note that you cannot decrease it.
Recommended instance type: At least m5.2xlarge/m4.2xlarge. When your instance is stopped, you can change your instance type by clicking on Actions -> Instance Settings -> Change Instance Type. You might need to request the use of faster instances from AWS Support.
Timeframe: Approx. 3 to 4 minutes per read pair (depending on the AWS instance and the sequencing coverage).
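Some items of this checklist can be verified with a small script. The sketch below is an illustration only and not part of the DocMind Analyst. It applies the recommendation of twice the input size as free disk space, multiplies the 3 to 4 minutes per read pair by the number of pairs, and lists suspicious lines in “chimera.log”. The function names are hypothetical, and the keywords “error” and “warning” are an assumption about how problems are reported in the log; adjust them to what your log actually contains.

```python
import shutil
from pathlib import Path


def input_bytes(workdir):
    """Total size of all *_HQ_1 / *_HQ_2 FASTQ files (plain or gzipped)."""
    files = [p for pattern in ("*_HQ_[12].fastq", "*_HQ_[12].fastq.gz")
             for p in Path(workdir).glob(pattern)]
    return sum(p.stat().st_size for p in files)


def enough_space(workdir, factor=2.0):
    """True if the free disk space is at least `factor` times the input size."""
    free = shutil.disk_usage(workdir).free
    return free >= factor * input_bytes(workdir)


def estimated_minutes(n_pairs, low=3.0, high=4.0):
    """Runtime range in minutes, based on 3 to 4 minutes per read pair."""
    return n_pairs * low, n_pairs * high


def log_problems(log_path, keywords=("error", "warning")):
    """Log lines that mention one of the keywords (assumed, case-insensitive)."""
    path = Path(log_path)
    if not path.is_file():
        return []
    with path.open(errors="replace") as handle:
        return [line.rstrip("\n") for line in handle
                if any(word in line.lower() for word in keywords)]


if __name__ == "__main__":
    # Count forward-read files to get the number of read pairs.
    n_pairs = sum(1 for pattern in ("*_HQ_1.fastq", "*_HQ_1.fastq.gz")
                  for _ in Path(".").glob(pattern))
    low, high = estimated_minutes(n_pairs)
    print(f"Read pairs: {n_pairs}, estimated runtime: {low:.0f} to {high:.0f} min")
    print("Enough disk space:", enough_space("."))
    for line in log_problems("chimera.log"):
        print("Check log line:", line)
```

The estimate is only a rough guide, because the actual runtime depends on the instance type and the sequencing coverage. An empty result from log_problems does not replace reading the log file yourself.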