Tutorial: How to Run a Skesa Assembly with a Graphical User Interface (GUI)
Tutorial video on how to assemble your Illumina sequence reads using the Skesa assembler with a GUI
Whole Genome Sequence (WGS) Assembly using a Skesa GUI within the DocMind Analyst Software Suite
The last assembler option in the “Assembler Choice” panel is “Skesa”. Skesa is a recent assembler designed for speed, and it can indeed assemble genomes from short-read Illumina data in minutes. As with the other assemblers, this Skesa GUI module works with paired-end reads, so please make sure you provide both files per sample. If you want to assemble unpaired single reads, contact support.
Input File Requirements
Input files are trimmed, high-quality FASTQ files. Both FASTQ files of a pair must have the same prefix and an “_HQ” tag to indicate that these are high-quality reads. The forward read (R1) file must carry the “_1” tag, the reverse read (R2) file the “_2” tag. For example, the files for a sample named “ID1” would be named “ID1_HQ_1.fastq(.gz)” and “ID1_HQ_2.fastq(.gz)”. If you are running the assembly pipeline, you don’t need to worry about this, since these files are produced during the trimming step.
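If you prepare the read files yourself, it can help to check the naming before submitting a job. The following is a minimal Python sketch, not part of the GUI module, that groups file names by prefix according to the convention described above and reports samples with a missing mate. The function names find_read_pairs and scan_directory are hypothetical and chosen for illustration only.

```python
import re
from pathlib import Path

# Naming convention from the tutorial:
# <prefix>_HQ_1.fastq(.gz) for forward reads, <prefix>_HQ_2.fastq(.gz) for reverse reads.
READ_PATTERN = re.compile(r"^(?P<prefix>.+)_HQ_(?P<mate>[12])\.fastq(\.gz)?$")


def find_read_pairs(filenames):
    """Group file names by prefix (hypothetical helper).

    Returns a dict of complete pairs {prefix: (forward, reverse)} and a
    sorted list of prefixes for which only one mate was found.
    """
    mates = {}
    for name in filenames:
        match = READ_PATTERN.match(name)
        if match is None:
            continue  # not a trimmed high-quality read file, ignore it
        mates.setdefault(match.group("prefix"), {})[match.group("mate")] = name
    complete = {p: (m["1"], m["2"]) for p, m in mates.items() if len(m) == 2}
    incomplete = sorted(p for p, m in mates.items() if len(m) != 2)
    return complete, incomplete


def scan_directory(folder):
    """Apply find_read_pairs to all regular files in a working directory."""
    return find_read_pairs(p.name for p in Path(folder).iterdir() if p.is_file())
```

Calling scan_directory on your working directory returns the complete pairs and the incomplete prefixes; any prefix in the second list needs its missing forward or reverse file before the assembly is started.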
As with A5, an advantage of the Skesa pipeline is that you don’t need to worry about options: the module is set up to run Skesa with its default settings. In our experience, these give good results, although we recommend running the post-processing step when using Skesa (click here for the postprocessing tutorial). If you want to use settings other than the defaults, have a look at the available options in the Skesa documentation, then contact us and we can set up an individual analysis for you.
When you are happy with all settings, check that you agree to the Terms of Service and press the “Submit Job” button. Once the job has started, you will see new files being generated in the folder that contains your reads. The Skesa GUI module is designed so that all read pairs in your current working directory are processed, so you don’t need to do anything but wait until it is finished. You can check the status of your job in the monitor panel (home -> Monitor) by pressing the “Refresh” button. An “R” stands for a running job, while a “C” means that the job has been completed. In your working folder, you will find a file called “Assembly_parameter.txt”, which documents all settings used for that run.
Module Output Files
Skesa will create FASTA assembly files as output. You will find these files in a folder named “Skesa_Final_Assemblies” within your current working directory. The assemblies have the same prefix you used in your FASTQ filenames (e.g. ID1) and the ending “.fasta”. Congratulations! At this point you have finished the assembly of your sequence reads.
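To confirm that every sample produced an assembly, you can compare your sample prefixes against the contents of the output folder. The sketch below assumes only what is described above (the folder name “Skesa_Final_Assemblies” and the “<prefix>.fasta” naming); the helper names are hypothetical. It also counts contigs by counting FASTA header lines, which is a generic FASTA property rather than anything specific to this module.

```python
from pathlib import Path

OUTPUT_FOLDER = "Skesa_Final_Assemblies"  # folder name as given in the tutorial


def expected_assembly(working_dir, prefix):
    """Path where the assembly for a sample prefix should appear."""
    return Path(working_dir) / OUTPUT_FOLDER / f"{prefix}.fasta"


def missing_assemblies(working_dir, prefixes):
    """Return the prefixes for which no FASTA output file exists yet."""
    return [p for p in prefixes if not expected_assembly(working_dir, p).is_file()]


def count_contigs(fasta_path):
    """Count sequences in a FASTA file by counting header lines ('>')."""
    with open(fasta_path, "r", encoding="utf-8") as handle:
        return sum(1 for line in handle if line.startswith(">"))
```

An empty list from missing_assemblies means all listed samples have an output file; a contig count of zero for an existing file would be a reason to look at the log file.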
Check list for Skesa
Input Files: FASTQ files, either uncompressed (“*.fastq”) or compressed (“*.fastq.gz”). Forward reads are indicated by “*_HQ_1.fastq(.gz)”, reverse reads by “*_HQ_2.fastq(.gz)”. The “*” stands for the sample prefix, which must be identical for the forward and reverse read files.
Output Files: These are FASTA files with the extension “*.fasta”. You will find them in the subfolder named “Skesa_Final_Assemblies” in your current working directory.
Log-File: “skesa.log”. You will find this file in your current working directory. Always check this file for potential error messages.
Storage: Skesa will just produce your FASTA output files. Be prepared to provide space for these (approx. 6 MB per bacterial genome, depending on its actual size). You can change the volume size of your instance by selecting the appropriate volume in your AWS console (under Volumes) and increasing it. Note that you cannot decrease it.
Recommended instance type: At least m5.4xlarge/m4.4xlarge. When your instance is stopped, you can change your instance type by clicking on Actions -> Instance Settings -> Change Instance Type. You might need to request usage of faster instances from AWS support.
Timeframe: Approx. 3 – 7 minutes per assembly (depending on the AWS instance and the sequencing coverage).
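For planning a larger batch, the storage and timeframe figures from the check list can be turned into a rough estimate. The sketch below simply multiplies the approximate values given above (6 MB and 3 to 7 minutes per assembly) by the number of samples. It assumes that samples are processed one after another, which this tutorial does not state, so treat the result as an order-of-magnitude guide only. The function name estimate_run is hypothetical.

```python
def estimate_run(n_samples, mb_per_genome=6.0, min_minutes=3.0, max_minutes=7.0):
    """Rough storage and runtime estimate for a batch of assemblies.

    Defaults are the approximate figures from the check list. Sequential
    processing of samples is an assumption, not a documented behaviour.
    """
    if n_samples < 0:
        raise ValueError("n_samples must not be negative")
    return {
        "storage_mb": n_samples * mb_per_genome,
        "min_minutes": n_samples * min_minutes,
        "max_minutes": n_samples * max_minutes,
    }


if __name__ == "__main__":
    # Example: a batch of 10 bacterial genomes
    print(estimate_run(10))
```

Larger genomes, higher sequencing coverage, or a smaller instance type will shift these numbers, so it is worth leaving some headroom on the volume.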