Tutorial: How to Reconstruct a Phylogeny Using RAxML with a Graphical User Interface (GUI)

Tutorial video on how to use RAxML for constructing a phylogenetic tree with the DocMind Analyst 

Working with a RAxML GUI within the DocMind Analyst Software Suite

The final tutorial on maximum likelihood phylogenetic reconstruction covers RAxML. It has been a standard tool all over the world for years: when you read a scientific publication, there is a good chance that RAxML was used for the phylogenetic reconstruction. Because we don’t want to miss it and because it is still considered a kind of gold standard, it is part of the DocMind Analyst.

Input File Requirements

Running the program is as easy as for the other tools. Create a subfolder named “Alignment” in your current working directory and place your FASTA alignment file, named “Core_Alignment.faa”, in it. Although this can be any alignment file you have generated, we recommend using only alignment files generated by this pipeline. As usual, if you run the RAxML phylogenetic reconstruction as part of the pipeline, you don’t need to worry about these input requirements, since the previous steps have already set them up. To submit the job, choose “RAxML” from the combo box in the “Variant calling and Phylogeny Options” panel and check the “Phylogeny” box in the “Pipeline Options” panel. You also need to agree to the DocMind Analyst “Terms of Service” by checking the respective box. Then press the “Submit Job” button to start the job.
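If you prepare the input yourself rather than through the pipeline, a quick sanity check before submitting can save a failed run. The following is a minimal sketch, not part of the DocMind Analyst: the function names are hypothetical, and the checks (file present, at least one sequence, all sequences of equal length) are only assumptions about what a usable alignment looks like.

```python
from pathlib import Path


def read_fasta(path):
    """Parse a FASTA file into a dict mapping sequence name -> sequence."""
    chunks = {}
    name = None
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line)
    return {n: "".join(parts) for n, parts in chunks.items()}


def check_alignment(path="Alignment/Core_Alignment.faa"):
    """Return a list of problems; an empty list means no problem was found."""
    path = Path(path)
    if not path.is_file():
        return [f"missing file: {path}"]
    sequences = read_fasta(path)
    problems = []
    if not sequences:
        problems.append("no sequences found")
    # In an alignment, every sequence must have the same length.
    lengths = {len(seq) for seq in sequences.values()}
    if len(lengths) > 1:
        problems.append(f"sequences differ in length: {sorted(lengths)}")
    return problems
```

Calling check_alignment() from your current working directory returns an empty list when the file is where the tutorial expects it and all sequences have the same length.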

Module Options

There are no basic options to set, so you could simply start. However, if you use RAxML, you might want to check out the possibilities you have. The “Standard RAxML Advanced Options” panel provides two options. If you have read the chapters for the other tools, you will already be familiar with their meaning. “Model” selects the DNA substitution model. Only the GTR (general time-reversible) model is available. It allows unequal frequencies of the four nucleotides and a distinct rate for each of the six pairwise nucleotide substitutions. Unlike simpler models, it does not constrain any of these parameters to be equal.

You can select the GTR model with a CAT or Gamma correction for rate heterogeneity, optionally combined with the invariable-sites model (I). The models and their background are described in the IQTree tutorial. The original program offers many more models. If you are interested in more choices, you can contact us.

You can set the number of bootstrap replicates with the “Number of replicates” option. This follows the same principles as described in the IQTree tutorial.
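For readers who want to relate the two GUI options to the original command-line program, the sketch below assembles a comparable RAxML call. It is an illustration only: how the DocMind Analyst actually invokes RAxML is not documented here, and the binary names, the seed, and the run name “RAxML” (inferred from the output file name “RAxML_bipartitions.RAxML”) are assumptions. The flags themselves (-f a, -m, -p, -x, -#, -s, -n, -T) are standard options of RAxML version 8.

```python
def build_raxml_command(model="GTRGAMMA", replicates=100, threads=None,
                        alignment="Alignment/Core_Alignment.faa",
                        run_name="RAxML", seed=12345):
    """Return an argument list for a rapid-bootstrap RAxML run."""
    # GTR with CAT or Gamma rate heterogeneity, optionally plus invariable sites (I).
    allowed = {"GTRCAT", "GTRGAMMA", "GTRCATI", "GTRGAMMAI"}
    if model not in allowed:
        raise ValueError(f"unsupported model: {model}")
    if replicates < 1:
        raise ValueError("replicates must be at least 1")
    # Binary names vary between installations; these two are assumptions.
    binary = "raxmlHPC-PTHREADS" if threads else "raxmlHPC"
    command = [
        binary,
        "-f", "a",               # rapid bootstrapping plus best-scoring ML tree search
        "-m", model,             # substitution model
        "-p", str(seed),         # random seed for the parsimony starting trees
        "-x", str(seed),         # random seed for rapid bootstrapping
        "-#", str(replicates),   # number of bootstrap replicates
        "-s", alignment,         # input alignment
        "-n", run_name,          # suffix of all output file names
    ]
    if threads:
        command += ["-T", str(threads)]  # thread count, PTHREADS build only
    return command
```

For example, " ".join(build_raxml_command(model="GTRCAT", replicates=200, threads=8)) gives a command string you could compare with the settings chosen in the GUI.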

Module Output Files

After the analysis has completed, the output files can be found in the subfolder “RAxML_Results” in your current working directory. The most important file is the tree file “RAxML_bipartitions.RAxML”. You can import it into FigTree or a similar tree visualization tool to display the tree, including the bootstrap values.
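If you only want a quick look at the bootstrap values without opening a tree viewer, they can be pulled out of the tree file directly. This is a minimal sketch that assumes the file is in Newick format with the support values written as integer labels right after each closing parenthesis; the function names are hypothetical, and a dedicated tree library is the better choice for anything beyond a quick check.

```python
import re


def bootstrap_values(newick):
    """Return the integer support labels that follow closing parentheses."""
    return [int(value) for value in re.findall(r"\)(\d+)(?=[:,);])", newick)]


def weak_branches(newick, threshold=70):
    """Return the support values below the given threshold."""
    return [value for value in bootstrap_values(newick) if value < threshold]


# Small made-up tree, for illustration only.
example = "((A:0.1,B:0.2)95:0.05,(C:0.1,D:0.1)60:0.02,E:0.3);"
print(bootstrap_values(example))  # prints [95, 60]
print(weak_branches(example))     # prints [60]
```

The threshold of 70 is only a placeholder here; choose the cut-off according to the principles described in the IQTree tutorial.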

Finally, a word regarding computational resources. RAxML is a hardware eater. For large alignments with many sequences, you need many cores and a lot of RAM. It is highly advisable to estimate the memory requirements before starting the job. You can do this on the RAxML website.

One example: If you have a core genome with 5 million positions and 600 samples in your alignment, you will need approx. 365 GB RAM. The m5.24xlarge would be an appropriate instance in this case. For even higher requirements, check out the memory optimized instances. Since these instances are quite expensive, consider either reducing the number of alignment positions by removing the invariable sites (for instance, using the Gubbins output file) or using faster, less computationally demanding but still accurate tools like IQTree or FastTree in the first place.
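To get a feeling for how the memory requirement scales, you can also approximate it yourself. The sketch below uses the rule of thumb given in the RAxML version 8 manual for DNA data, (n - 2) x m x 8 bytes x 4 for CAT and x 16 for Gamma, where n is the number of sequences and m the number of distinct alignment patterns. Treat it as a rough estimate under these assumptions: using the number of alignment positions for m gives an upper bound, and the result will not match the calculator on the RAxML website exactly.

```python
# Bytes per alignment pattern and inner node for DNA data:
# 8-byte doubles x 4 states (CAT) or 4 states x 4 rate categories (Gamma).
BYTES_PER_PATTERN = {"GTRCAT": 4 * 8, "GTRGAMMA": 16 * 8}


def estimate_memory_gib(n_sequences, n_patterns, model="GTRGAMMA"):
    """Rough RAxML memory estimate in GiB for a DNA alignment."""
    if n_sequences < 3:
        raise ValueError("need at least 3 sequences")
    bytes_needed = (n_sequences - 2) * n_patterns * BYTES_PER_PATTERN[model]
    return bytes_needed / 2**30


# The example from the text: 600 samples, 5 million positions.
print(round(estimate_memory_gib(600, 5_000_000, "GTRGAMMA")))  # prints 356
print(round(estimate_memory_gib(600, 5_000_000, "GTRCAT")))    # prints 89
```

Under these assumptions the Gamma estimate is in the same range as the approx. 365 GB quoted above, and it shows why reducing the number of alignment positions lowers the requirement proportionally.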

Input Files: Any alignment file in FASTA format. The file needs to be located in a folder called “Alignment” in your current working directory and needs to be named “Core_Alignment.faa”.

Output Files: The most important one is the tree file called “RAxML_bipartitions.RAxML”. You will find it in the subfolder named “RAxML_Results” in your current working directory.

Log-File: “RAxML.log”. You will find this file in your current working directory. Always check this file for potential error messages.   

Storage: RAxML needs very little free disk space for temporary files. You are safe if you allow for 1 GB of free disk space.

Recommended instance type: At least m5.4xlarge/m4.4xlarge and up to m5.24xlarge or memory optimized instances. When your instance is stopped, you can change your instance type by clicking on Actions -> Instance Settings -> Change Instance Type. You might need to request usage of larger instances from AWS support.

Timeframe: Approx. 3 minutes for a 10-strains alignment (depending on the AWS instance and the alignment length).
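The advice to always check “RAxML.log” for potential error messages can be partly automated. The following is a minimal sketch; the keywords it searches for are assumptions, since the exact wording of the messages in this log file is not documented here, so an empty result does not replace reading the log yourself.

```python
from pathlib import Path


def find_error_lines(log_path="RAxML.log",
                     keywords=("error", "exception", "killed")):
    """Return (line number, text) pairs for log lines containing a keyword."""
    path = Path(log_path)
    if not path.is_file():
        raise FileNotFoundError(f"log file not found: {path}")
    hits = []
    lines = path.read_text(errors="replace").splitlines()
    for number, line in enumerate(lines, start=1):
        lowered = line.lower()
        # Case-insensitive match against any of the assumed keywords.
        if any(keyword in lowered for keyword in keywords):
            hits.append((number, line.strip()))
    return hits
```

Run find_error_lines() from your current working directory after the job has finished and look at every line it reports.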
