Tutorial on How-to-Do an A5 Assembly with a Graphical User Interface (GUI)
Tutorial video on how to assemble your Illumina sequence reads using the A5 assembler with a GUI
Whole Genome Sequence (WGS) A5 Assembler GUI within the DocMind Analyst Software Suite
Instead of SPAdes, you can use the A5-MiSeq assembler: choose it in the “Assembler Choice” panel and select “A5_miseq” in the “Pipeline Options” panel. This is a sensible choice if your FASTQ files contain reads generated by a MiSeq sequencer. These reads are generally ≥ 250 base pairs long, and at this read length the A5 assembler might give you even better results than SPAdes. As with SPAdes, this A5 Assembler GUI module works with paired-end reads, so please make sure you provide both files per sample. If you want to assemble unpaired single reads, contact support.
Input File Requirements
Input files are trimmed, high-quality FASTQ files. Both FASTQ files of a pair must have the same prefix and an “_HQ” tag to indicate that they contain high-quality reads. The forward read (R1) FASTQ file must carry the “_1” tag, the reverse read (R2) file the “_2” tag. For example, the files for a sample named “ID1” would be named “ID1_HQ_1.fastq(.gz)” and “ID1_HQ_2.fastq(.gz)”. If you are running the assembly pipeline, you don’t need to worry about that, since these files were produced during the trimming step.
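The naming rule above can be checked before a job is submitted. The following is a minimal Python sketch and is not part of the A5 Assembler GUI module; the function name find_read_pairs and the regular expression are illustrative assumptions derived only from the naming convention described above.

```python
import re
from pathlib import Path

# Assumed pattern, derived from the naming rule above:
# <prefix>_HQ_1.fastq(.gz) for forward reads, <prefix>_HQ_2.fastq(.gz) for reverse reads.
PATTERN = re.compile(r"^(?P<prefix>.+)_HQ_(?P<read>[12])\.fastq(?:\.gz)?$")


def find_read_pairs(filenames):
    """Group file names by sample prefix.

    Returns a tuple (pairs, unpaired):
      pairs    maps each prefix to (forward_file, reverse_file)
      unpaired is a sorted list of prefixes with only one of the two files
    Names that do not match the pattern are ignored.
    """
    found = {}
    for name in filenames:
        match = PATTERN.match(Path(name).name)
        if match:
            found.setdefault(match.group("prefix"), {})[match.group("read")] = name
    pairs = {p: (r["1"], r["2"]) for p, r in found.items() if len(r) == 2}
    unpaired = sorted(p for p, r in found.items() if len(r) != 2)
    return pairs, unpaired
```

To check a working directory, pass the names of its files, for example find_read_pairs(p.name for p in Path(".").iterdir()). Any prefix reported as unpaired is missing its forward or reverse file.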
A clear advantage of the A5 Assembler GUI module is its lack of options, which makes it easy for beginners to handle. The disadvantage is that fine-tuning for special situations is difficult, and you need to accept the output as it is. Additionally, A5 is a bit slower than SPAdes, particularly for sequence data with high coverage (e.g. output of HiSeq2500, rapid mode, 2x250bp).
When you are happy with all settings, check that you agree to the Terms of Service and press the “Submit Job” button. Once the job has started, you will see new files being generated in the folder that contains your reads. The A5 Assembler GUI module is designed so that all read pairs in your current working directory are processed, so you don’t need to do anything but wait until it has finished. You can check the status of your job in the monitor panel (Home -> Monitor) by pressing the “Refresh” button. An “R” stands for a running job, while a “C” means that the job has been completed. In your working folder, you will find a file called “Assembly_parameter.txt”, which documents all your settings for that run.
Module Output Files
A5 will generate several files and write them into your current working directory. Please check the A5 documentation if you want to know more about these files. Particularly worth mentioning are the files with the ending “assembly_stats.csv”. They contain statistics for each assembly and deserve a look. If you are only interested in your assembly files, you can find them in a folder named “A5_Final_Assemblies”. The assemblies will have the same prefix you used in your FASTQ filenames (e.g. ID1) and the ending “.fasta”. Congratulations! At this point you have finished the assembly of your sequence reads.
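If you would like a quick, independent sanity check of a finished assembly, basic figures can be computed directly from the FASTA file. The Python sketch below is an illustration only: it does not read or reproduce the “assembly_stats.csv” files, the function names are hypothetical, and the file path in the usage note is just an example built from the folder and prefix mentioned above.

```python
def contig_lengths(fasta_lines):
    """Return the length of every sequence in an iterable of FASTA lines."""
    lengths = []
    current = None  # length of the sequence being read; None before the first header
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length L such that contigs of length >= L cover at least half of the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty assembly


def summarize(fasta_path):
    """Contig count, total size in base pairs, and N50 for one FASTA file."""
    with open(fasta_path) as handle:
        lengths = contig_lengths(handle)
    return {"contigs": len(lengths), "total_bp": sum(lengths), "n50": n50(lengths)}
```

For the example sample, a call such as summarize("A5_Final_Assemblies/ID1.fasta") would return the three values as a dictionary. A very large number of contigs or a very small N50 is usually a reason to look at the log file and the read quality.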
Checklist for A5
Input Files: FASTQ files, either uncompressed (“*.fastq”) or compressed (“*.fastq.gz”). Forward reads are indicated by “*_HQ_1.fastq(.gz)”, reverse reads by “*_HQ_2.fastq(.gz)”. The “*” stands for the sample prefix, which must be identical for the forward and reverse read files.
Output Files: These are FASTA files with the extension “*.fasta”. You will find them in the subfolder named “A5_Final_Assemblies” in your current working directory.
Log-File: “A5.log”. You will find this file in your current working directory. Always check this file for potential error messages.
Storage: A5 needs some free disk space for temporary files. When using compressed FASTQ files (.gz ending), plan for 10 times the combined size of both FASTQ files. E.g. if you have 2 x 200 MB FASTQ, allow for 4 GB of free disk space. For uncompressed FASTQ files, plan for 2 times the combined size. E.g. if you have 2 x 1000 MB FASTQ, allow for 4 GB of free disk space. This is very important. If you fail to provide enough storage, your assembly will be canceled at some point. You can change the volume size of your instance by selecting the appropriate volume in your AWS console (under Volumes) and increasing it. Note that you cannot decrease it afterwards.
Recommended instance type: At least m5.4xlarge/m4.4xlarge. When your instance is stopped, you can change your instance type by clicking on Actions -> Instance Settings -> Change Instance Type. You might need to request the use of faster instances from AWS support.
Timeframe: Approx. 20 – 30 minutes per assembly (depending on the AWS instance and the sequencing coverage).
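The storage rule of thumb from the checklist (10 times the size of compressed FASTQ files, 2 times the size of uncompressed ones) can be turned into a quick pre-flight check. This is a minimal Python sketch with hypothetical function names. It sums the requirement over all FASTQ files in the folder, which is a deliberately cautious assumption, since the checklist states the rule per read pair and does not say whether temporary files of several samples exist at the same time.

```python
import shutil
from pathlib import Path


def required_free_bytes(fastq_sizes):
    """Estimate the free space needed, given a mapping of file name -> size in bytes.

    Applies the checklist rule: factor 10 for ".gz" files, factor 2 otherwise.
    """
    total = 0
    for name, size in fastq_sizes.items():
        factor = 10 if name.endswith(".gz") else 2
        total += factor * size
    return total


def check_disk_space(directory="."):
    """Return (needed, free, ok) for all FASTQ files found in the given directory."""
    sizes = {
        path.name: path.stat().st_size
        for path in Path(directory).iterdir()
        if path.is_file() and path.name.endswith((".fastq", ".fastq.gz"))
    }
    needed = required_free_bytes(sizes)
    free = shutil.disk_usage(directory).free
    return needed, free, free >= needed
```

Calling check_disk_space() in the working directory before pressing “Submit Job” shows whether the volume should be enlarged first. With the two checklist examples, 2 x 200 MB compressed and 2 x 1000 MB uncompressed, required_free_bytes gives 4 GB in both cases.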