Problem with inconsistent lines output from K-means clustering #1253

magnusdottir · 2023-09-07T13:58:50Z

Hi,
I'm having problems with plotHeatmap and K-means clustering. I can't plot heatmaps reproducibly: the number of lines in them varies between runs, even when I give the exact same command. The initial runs seem to have the expected line density but subsequent runs don't, which is strange since I'm running the scripts on SLURM. Unclustered heatmaps do show a high number of lines, and fewer clusters tend to perform "better" in terms of including the full number of lines, but this is still erratic.

For example, I computed a matrix for two data points and plotted a heatmap from it. It looked good without clustering, and also with two and with three clusters, but four clusters gave me a heatmap with what looks like a much lower line density.

I was outputting .pdf files and started to think this might have something to do with how the program plots PDFs. I therefore ran the exact same script with .png output, and it gave the denser heatmap (i.e. what appears to be the same total line density as the original heatmap and the 2-cluster heatmap). But increasing to 6 clusters gave me a coarser heatmap again, and going back to 4 clusters, still with .png output, then gave me the less dense heatmap again. The only things I changed in the script below between runs are the --kmeans cluster number and the output file name.

Python: Python/3.9.6-GCCcore-11.2.0
deepTools: deepTools/3.5.1-foss-2021b

This is my code (with the file names changed), which gave different results when run twice on the same matrix:

plotHeatmap     -m $outPath/Matrix/Matrix_TSS_2Kb \
                -out $outPath/Plots/TSS_2Kb_4Clusters.png \
                --colorMap RdBu \
                --whatToShow 'heatmap and colorbar' \
                --zMin -3 --zMax 3 \
                --kmeans 4

This is the top of the clustered heatmap that I get from the two runs. The right-hand plot seems to be a lot sparser in terms of number of data points:

Has anyone had a similar problem?
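
One way to tell whether rows are actually missing, or whether the same rows are just being drawn differently, might be to count the regions per cluster in each run. The sketch below assumes each run also writes its regions with plotHeatmap's --outFileSortedRegions option, and that the cluster label is the last tab-separated column of that BED file (I believe it is called deepTools_group, but that is an assumption). The function name and the run1.bed/run2.bed file names are made up for illustration.

```shell
#!/bin/sh
# Count regions per cluster in a sorted-regions BED file.
# Assumption: the cluster label is the LAST tab-separated column and
# header/comment lines start with '#'.
count_rows_per_cluster() {
    awk -F'\t' '
        !/^#/ && NF > 0 { n[$NF]++ }                         # tally rows per label
        END { for (g in n) printf "%s\t%d\n", g, n[g] }      # label <TAB> count
    ' "$1" | sort
}

# Hypothetical usage, after adding e.g.
#   --outFileSortedRegions $outPath/Plots/run1.bed
# to the plotHeatmap command for each run:
#   count_rows_per_cluster "$outPath/Plots/run1.bed"
#   count_rows_per_cluster "$outPath/Plots/run2.bed"
```

If the per-cluster counts add up to the same total in the dense and the sparse run, the regions are all there and the difference is more likely in how the image is rendered. If the totals differ, rows really are being dropped.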
