Tutorial on how to produce a hierarchically clustered heatmap from your 16S rRNA data with a Graphical User Interface (GUI)
Tutorial video on how to generate a hierarchically clustered heatmap from your 16S rRNA classification results using the DocMind Analyst GUI
Microbiome heatmap based on 16S rRNA within the DocMind Analyst Software Suite
The last module of the 16S rRNA pipeline performs a hierarchical cluster analysis to investigate how closely your samples are related in their taxonomic composition, and visualizes the result as a heatmap.
Input File Requirements
To run the analysis, your current working directory must contain a subfolder called “Results_sample_set”. This folder must contain the output files of the previous pipeline module (RDP classification; see the RDP classification tutorial), which end in “_merged_final.csv”. You can include files for several taxonomic ranks; in that case, the analysis is performed for each rank. You also need to check the “Hierarchically-clustered heatmap” box in the “Pipeline Options” panel. If you run this analysis as part of the full pipeline, you don’t need to worry about these requirements, since all files and folders are generated automatically in the previous steps of the pipeline.
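If you prepare the inputs by hand, a quick check of the folder layout can save a failed run. The Python sketch below is not part of DocMind Analyst; the function name find_heatmap_inputs is hypothetical, and the sketch relies only on the folder name and file ending stated above.

```python
from pathlib import Path


def find_heatmap_inputs(workdir="."):
    """List the classification tables the heatmap step would need.

    Hypothetical helper: assumes only the documented layout, i.e. a
    "Results_sample_set" subfolder holding files ending in "_merged_final.csv".
    """
    results_dir = Path(workdir) / "Results_sample_set"
    if not results_dir.is_dir():
        raise FileNotFoundError(f"Missing input folder: {results_dir}")
    # Typically one file per taxonomic rank; sorted for a stable order
    return sorted(results_dir.glob("*_merged_final.csv"))
```

Run from your working directory, `find_heatmap_inputs()` returns the matching files, or raises an error if the “Results_sample_set” folder is missing.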
Module Options and Output Files
If you are a beginner, you can simply run the analysis with the default settings; it will most likely produce a useful heatmap. The resulting PNG files (ending “.png”) are saved in the “Results_sample_set” folder and can be opened with any image viewer or image processing software.
You can also customize the heatmap by changing the parameters in the “Hierarchically-clustered heatmap” panel. The first is “Image Resolution (dpi)”. The default is 100 dots per inch (dpi), which you can increase if you need the heatmap as a figure in a publication (300 to 600 dpi is usually recommended; check the journal’s instructions). Next, you can choose among several linkage criteria (“Linkage Method”) and distance metrics (“Distance Metric”). This is a complex topic, and we recommend Wikipedia as a starting point for an introduction.
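To see what the two clustering settings control, the sketch below clusters a small samples x taxa table with SciPy, whose `pdist` and `linkage` functions are what seaborn’s clustermap uses for its `metric` and `method` options. That the GUI fields are passed on to these functions is an assumption. The three rows are invented toy values, not the data behind the example figure, and the function name cluster_order is hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import pdist


def cluster_order(abundance, metric="euclidean", method="average"):
    """Return the sample order produced by hierarchical clustering.

    abundance: one row per sample, one column per taxon.
    metric:    any distance accepted by scipy's pdist, e.g. "braycurtis".
    method:    linkage criterion, e.g. "single", "complete", "average"
               ("ward" is only meaningful with Euclidean distances).
    """
    data = np.asarray(abundance, dtype=float)
    distances = pdist(data, metric=metric)  # condensed pairwise distances
    tree = linkage(distances, method=method)  # agglomerative clustering
    # no_plot=True returns the dendrogram's leaf order without drawing it
    return dendrogram(tree, no_plot=True)["leaves"]


# Invented toy data: rows = samples, columns = relative abundance of 4 taxa
samples = [
    [0.50, 0.30, 0.15, 0.05],
    [0.35, 0.30, 0.20, 0.15],
    [0.30, 0.25, 0.20, 0.25],
]
order = cluster_order(samples, metric="braycurtis", method="average")
```

Changing `metric` (Bray-Curtis is a common choice for microbiome data) or `method` can change which samples are joined first, so it is worth comparing a few combinations on your own data.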
Checking the “Standardization” box makes seaborn standardize the relative abundances: for each taxon, it subtracts the taxon’s minimum and divides by its range (maximum minus minimum), so that all values lie between 0 and 1. This option was used for the example figure above, where the color bar shows the standardized relative abundance from 0 (minimum) to 1 (maximum). The figure shows three samples: a microbiome from a healthy person, one from a person with a colonic adenoma and one from a person with colon cancer. The clustering reveals that the “cancer” and “adenoma” microbiomes are somewhat more closely related to each other than to the “healthy” microbiome.
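The scaling can be written out in a few lines of plain Python. This sketch applies the (x - min) / (max - min) formula to each taxon; the function name standardize_per_taxon is hypothetical, and mapping a taxon with identical values in all samples to 0 is a choice of this sketch, not documented behavior of the module.

```python
def standardize_per_taxon(table):
    """Min-max scale each taxon (column) of a samples x taxa table to [0, 1].

    A taxon with the same value in every sample would divide by zero;
    this sketch maps such a taxon to 0.0 instead.
    """
    scaled_columns = []
    for values in zip(*table):  # iterate over columns (taxa)
        low, high = min(values), max(values)
        span = high - low
        scaled_columns.append([(v - low) / span if span else 0.0 for v in values])
    # Transpose back to one row per sample
    return [list(row) for row in zip(*scaled_columns)]
```

For each taxon, the sample with the highest abundance becomes 1 and the sample with the lowest becomes 0, matching the color bar range described above.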
The samples are displayed on the y-axis. To rotate their labels, enter a value in the “Y Label Rotation” field. The x-axis lists the bacterial taxa (here: phyla), and the heatmap shows the standardized relative abundance of each phylum. For example, Fusobacteria are much more common in the “cancer” microbiome than in the others, a phenomenon that has been described previously. The rotation angle of the x-axis labels can be changed in the “X Label Rotation” field.
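The module’s internal code is not shown in this tutorial. Since standardization is done with seaborn, a plausible equivalent of the panel options is a call to `seaborn.clustermap`. The mapping below (dpi to `savefig`, “Linkage Method” to `method`, “Distance Metric” to `metric`, “Standardization” to `standard_scale`, width and height to `figsize`, label rotation to the heatmap’s tick labels) is an assumption, and the function name plot_heatmap is hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # render to files without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def plot_heatmap(table: pd.DataFrame, out_png, dpi=100, method="average",
                 metric="euclidean", standardize=True, x_rotation=90,
                 y_rotation=0, width=12, height=12):
    """Draw a clustered heatmap of a DataFrame (rows = samples, columns = taxa).

    Hypothetical sketch: dpi, width and height defaults follow the tutorial;
    the other defaults are assumptions, not the module's documented values.
    """
    grid = sns.clustermap(
        table,
        method=method,  # linkage criterion
        metric=metric,  # distance metric
        standard_scale=1 if standardize else None,  # 1 = scale each column (taxon)
        figsize=(width, height),
    )
    # Rotate the taxon labels (x-axis) and sample labels (y-axis)
    plt.setp(grid.ax_heatmap.get_xticklabels(), rotation=x_rotation)
    plt.setp(grid.ax_heatmap.get_yticklabels(), rotation=y_rotation)
    grid.savefig(out_png, dpi=dpi)
    plt.close(grid.fig)
    return grid
```

Because the rows of the DataFrame are samples and the columns are taxa, the samples end up on the y-axis and the taxa on the x-axis, as in the example figure.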
The final options are the figure width and height. The size of your figure can be very important: the default size (width 12, height 12) is usually too small to display all taxa on the x-axis or all samples on the y-axis when there are more than 50 labels on one of these axes. In that case, the heatmap itself is still correct, but only a fraction of the labels is displayed. You can fix this by changing the width and/or height of the figure. The best approach is usually to try a certain width and height and check whether all labels are shown on the axes. However, it is a good idea to start with the default values, since they give you a good overview of the structure of your data set.
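As a first guess for the trial-and-error described above, the size can be scaled with the number of labels. The sketch below is a hypothetical rule of thumb: it assumes width and height are in inches (as in matplotlib) and reserves 0.2 per label plus a margin of 2, chosen so that 50 labels map to the default size of 12, in line with the 50-label guideline above. These constants are assumptions, not values used by the software.

```python
def suggest_figure_size(n_taxa, n_samples, per_label=0.2,
                        margin=2.0, minimum=12.0):
    """Suggest (width, height) so each axis label gets room.

    Hypothetical heuristic: per_label units per label plus a fixed margin
    for dendrograms and color bar, never below the default size of 12.
    """
    width = max(minimum, n_taxa * per_label + margin)  # taxa on the x-axis
    height = max(minimum, n_samples * per_label + margin)  # samples on the y-axis
    return width, height
```

For example, 100 taxa and 3 samples give a width of 22 and the default height of 12; if labels still overlap, increase `per_label`.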
The output files are images with the ending “.png”. You can open them with any image viewer and import them into Microsoft Word or other word processors. Most journals accept them directly, and they can easily be converted to other formats such as “.tif” or “.jpeg”.
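If a journal asks for TIFF or JPEG, the conversion can be scripted, for example with the Pillow library. This is a minimal sketch, not a feature of DocMind Analyst; the function name convert_png is hypothetical. It keeps the stored resolution when the PNG contains one, and drops transparency for JPEG, which has no alpha channel.

```python
from pathlib import Path

from PIL import Image


def convert_png(png_path, suffix=".tif"):
    """Save a copy of a PNG heatmap in another format, e.g. ".tif" or ".jpeg"."""
    src = Path(png_path)
    dst = src.with_suffix(suffix)
    with Image.open(src) as img:
        dpi = img.info.get("dpi")  # resolution stored in the PNG, if any
        if suffix.lower() in (".jpg", ".jpeg"):
            img = img.convert("RGB")  # JPEG cannot store an alpha channel
        save_kwargs = {"dpi": tuple(round(v) for v in dpi)} if dpi else {}
        img.save(dst, **save_kwargs)
    return dst
```

The copy is written next to the original, so `convert_png("Results_sample_set/example.png", ".jpeg")` (hypothetical file name) would create “example.jpeg” in the same folder.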
Input Files: Taxonomic classification files produced by the RDP module. Files must have the ending “_merged_final.csv” and be stored in a folder named “Results_sample_set” in your current working directory.
Output Files: PNG files with the ending “.png”. You will find them in the subfolder named “Results_sample_set” in your current working directory.
Log-File: “Heatmap.log”. You will find this file in your current working directory. Always check this file for potential error messages.
Storage: Only enough space for the PNG files, usually a few MB.
Recommended instance type: At least m5.large/m4.large. When your instance is stopped, you can change the instance type by clicking on Actions -> Instance Settings -> Change Instance Type. You might need to request access to faster instance types from AWS Support.
Timeframe: Approximately 1 minute for the entire analysis, even with multiple taxonomic input CSV files.
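The console steps for changing the instance type can also be scripted with the AWS SDK for Python (boto3). This is a hedged sketch, not part of the DocMind Analyst suite: the function name change_instance_type is hypothetical, and it stops the instance, so save your work first. The client is passed in as a parameter, so the function itself does not import boto3.

```python
def change_instance_type(ec2, instance_id, new_type="m5.large"):
    """Stop an EC2 instance, change its type and start it again.

    `ec2` is a boto3 EC2 client, e.g. boto3.client("ec2"). The instance
    type can only be changed while the instance is stopped.
    """
    ec2.stop_instances(InstanceIds=[instance_id])
    # Block until AWS reports the instance as stopped
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])
    ec2.modify_instance_attribute(InstanceId=instance_id,
                                  InstanceType={"Value": new_type})
    ec2.start_instances(InstanceIds=[instance_id])
```

Usage: `change_instance_type(boto3.client("ec2"), "i-...")` with your own instance ID. If the request fails with a limit error, the note above about contacting AWS Support applies.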