[WIP] Add --csv output to 'sourmash compare' #217

Merged: 3 commits merged on May 16, 2017

@ctb (Contributor) commented May 14, 2017

Fixes #167.

@taylorreiter @brooksph could one of you try this out and provide a small snippet of R code for reading in the resulting matrix? thx!

  • Is it mergeable?
  • Did it pass the tests? (make test)
  • Is the new code covered? (make coverage)
  • Did it change the command-line interface? Only additions are allowed
    without a major version increment. Changing file formats also requires a
    major version number increment.
  • Was a spellchecker run on the source code and documentation after
    changes were made?

@ctb changed the title from "Add --csv output to 'sourmash compare'" to "[WIP] Add --csv output to 'sourmash compare'" on May 14, 2017.

@betatim (Contributor) left a comment:

LGTM, waiting for R snippet.

@taylorreiter (Contributor) commented:

Is this the functionality you wanted in the snippet of R code? Or did you want a plot as well?

# Read in data
sourmash_comp_matrix <- read.csv("tara_dummy_comp.csv")
# Label the rows
rownames(sourmash_comp_matrix) <- colnames(sourmash_comp_matrix)
# Check data
head(sourmash_comp_matrix)
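A slightly more defensive reader could look like the sketch below. Like the snippet above, it assumes the CSV has one header row of signature names and no row-name column; the function name read_sourmash_compare_csv is hypothetical. It passes check.names = FALSE so read.csv does not rewrite names such as ecoli-1.sig into syntactic R names (ecoli.1.sig), and it checks that the matrix is square before reusing the column names as row names.

```r
# Hypothetical helper: read a `sourmash compare --csv` matrix into a named numeric matrix.
# Assumes one header row of signature names and no row-name column.
read_sourmash_compare_csv <- function(path) {
  # check.names = FALSE keeps names like "ecoli-1.sig" instead of "ecoli.1.sig"
  df <- read.csv(path, check.names = FALSE)
  m <- as.matrix(df)
  # Check the shape before reusing column names as row names
  if (nrow(m) != ncol(m)) {
    stop("expected a square matrix, got ", nrow(m), " x ", ncol(m))
  }
  rownames(m) <- colnames(m)
  # A pairwise comparison matrix should be symmetric; warn if it is not
  if (!isSymmetric(unname(m))) {
    warning("compare matrix is not symmetric")
  }
  m
}
```

Usage would be m <- read_sourmash_compare_csv("tara_dummy_comp.csv"), after which head(m) shows the same data with the original signature names.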

@ctb (Contributor, Author) commented May 16, 2017 via email

@taylorreiter (Contributor) commented:

The R code works. You have to point R to your own compare matrix output file, and either run it from the working directory where that file resides or pass the file's full path.
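One way to avoid depending on the working directory is to take the CSV path on the command line, e.g. Rscript plot_compare.R /path/to/ecoli.comp.csv. This is a sketch only; the script name plot_compare.R and the helper get_compare_csv_path are hypothetical, and the default file name is simply the one used in the snippet above.

```r
# Hypothetical helper: take the compare CSV path from the command line
# (Rscript plot_compare.R path/to/file.csv), falling back to a default name.
get_compare_csv_path <- function(args = commandArgs(trailingOnly = TRUE),
                                 default = "tara_dummy_comp.csv") {
  path <- if (length(args) >= 1) args[[1]] else default
  if (!file.exists(path)) {
    stop("sourmash compare CSV not found: ", path, call. = FALSE)
  }
  # Return an absolute path so later code does not depend on getwd()
  normalizePath(path)
}
```

In a script, sourmash_comp_matrix <- read.csv(get_compare_csv_path()) would then replace the hard-coded file name.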

For plotting:

read_and_plot_sourmash_csv <- function(sourmash_comp_csv_name) {
  # Load plotting packages
  # (plot modified from http://latticeextra.r-forge.r-project.org/#dendrogramGrob&theme=default)
  library(lattice)
  library(latticeExtra)

  # Read in data
  sourmash_comp_matrix <- read.csv(sourmash_comp_csv_name)
  # Label the rows
  rownames(sourmash_comp_matrix) <- colnames(sourmash_comp_matrix)

  # Format for plotting: cluster rows and columns and keep the dendrogram leaf order
  x <- as.matrix(sourmash_comp_matrix)
  dd.row <- as.dendrogram(hclust(dist(x)))
  row.ord <- order.dendrogram(dd.row)
  dd.col <- as.dendrogram(hclust(dist(t(x))))
  col.ord <- order.dendrogram(dd.col)

  # Plot the reordered matrix; levelplot puts matrix rows on the x axis,
  # so the row dendrogram goes on top and the column dendrogram on the right
  levelplot(x[row.ord, col.ord],
            aspect = "fill",
            scales = list(x = list(rot = 90)),
            colorkey = list(space = "left"),
            legend = list(
              right = list(fun = dendrogramGrob,
                           args = list(x = dd.col, ord = col.ord,
                                       side = "right", size = 5)),
              top = list(fun = dendrogramGrob,
                         args = list(x = dd.row, ord = row.ord,
                                     side = "top", size = 5))))
}

# Usage
read_and_plot_sourmash_csv("tara_dummy_comp.csv")
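The function above clusters on Euclidean distances between rows of the similarity matrix. If the entries are similarities in [0, 1] (an assumption worth checking for your own output), an alternative is to cluster on 1 - similarity directly; since the compare matrix is symmetric, one dendrogram can then order both rows and columns. The helper name similarity_dendrogram and the average-linkage default are illustrative choices, not something sourmash prescribes.

```r
# Hypothetical helper (assumption: entries are similarities in [0, 1], 1 on the diagonal).
# Clusters on 1 - similarity instead of Euclidean distances between rows.
similarity_dendrogram <- function(sim, method = "average") {
  d <- as.dist(1 - sim)               # treat 1 - similarity as a distance
  hc <- hclust(d, method = method)    # agglomerative clustering on that distance
  as.dendrogram(hc)
}
```

With dd <- similarity_dendrogram(x) and ord <- order.dendrogram(dd), the matrix x[ord, ord] could be passed to levelplot with dd used for both the top and right dendrograms.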

@ctb (Contributor, Author) commented May 16, 2017

thank you @taylorreiter!

@codecov-io commented May 16, 2017

Codecov Report

Merging #217 into master will increase coverage by 0.06%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #217      +/-   ##
==========================================
+ Coverage    85.6%   85.67%   +0.06%     
==========================================
  Files          13       13              
  Lines        1897     1906       +9     
  Branches       52       52              
==========================================
+ Hits         1624     1633       +9     
  Misses        262      262              
  Partials       11       11

Impacted Files              Coverage Δ
sourmash_lib/commands.py    90.09% <100%> (+0.13%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6f6effd...8abd957.

@ctb mentioned this pull request May 16, 2017
@ctb merged commit 3cd0fe2 into master May 16, 2017
@ctb deleted the output/compare_matrix branch May 16, 2017 16:04
@taylorreiter (Contributor) commented:

@ctb I can write these as functions or as documentation. Everything here uses base R except Rtsne.

# mkdir ecoli_many_sigs && cd ecoli_many_sigs
# 
# curl -O -L https://github.com/dib-lab/sourmash/raw/master/data/eschericia-sigs.tar.gz
# 
# tar xzf eschericia-sigs.tar.gz
# rm eschericia-sigs.tar.gz
# 
# sourmash compare --csv ecoli.comp.csv *sig



sourmash_comp_matrix <- read.csv("~/Desktop/ecoli_many_sigs/ecoli.comp.csv")

# Label the rows
rownames(sourmash_comp_matrix) <- colnames(sourmash_comp_matrix)

# Transform for plotting
sourmash_comp_matrix <- as.matrix(sourmash_comp_matrix)


# make an MDS plot
fit <- cmdscale(dist(sourmash_comp_matrix))
x <- fit[, 1]
y <- fit[, 2]
plot(x, y, xlab = "Dimension 1", ylab = "Dimension 2")
text(x, y, pos = 4, labels = row.names(fit))
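```r
# Alternative MDS sketch (assumption: entries are similarities in [0, 1]).
# cmdscale() expects distances, so this feeds it 1 - similarity rather than
# Euclidean distances between rows; mds_from_similarity is a hypothetical name.
mds_from_similarity <- function(sim, k = 2) {
  coords <- cmdscale(as.dist(1 - sim), k = k)
  # cmdscale returns fewer than k columns if fewer eigenvalues are positive
  colnames(coords) <- paste0("Dim", seq_len(ncol(coords)))
  coords
}
# Usage: coords <- mds_from_similarity(sourmash_comp_matrix)
#        plot(coords, xlab = "Dimension 1", ylab = "Dimension 2")
#        text(coords, labels = rownames(coords), pos = 4)
```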

# make a tsne plot

library(Rtsne)
tsne_model <- Rtsne(sourmash_comp_matrix, check_duplicates=FALSE, pca=TRUE, perplexity=5, theta=0.5, dims=2)
d_tsne = as.data.frame(tsne_model$Y) 
plot(d_tsne$V1, d_tsne$V2)
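```r
# Sketch: pick a perplexity Rtsne will accept for small sets of signatures.
# Assumption: Rtsne refuses perplexity values where 3 * perplexity > n - 1
# (n = number of signatures); the helper names below are hypothetical.
max_tsne_perplexity <- function(n) floor((n - 1) / 3)

choose_perplexity <- function(n, preferred = 5) {
  perp <- min(preferred, max_tsne_perplexity(n))
  if (perp < 1) stop("too few signatures for t-SNE: ", n)
  perp
}

run_tsne <- function(m, preferred_perplexity = 5, seed = 1) {
  if (!requireNamespace("Rtsne", quietly = TRUE)) stop("please install the Rtsne package")
  set.seed(seed)  # t-SNE is stochastic; a fixed seed makes the layout reproducible
  fit <- Rtsne::Rtsne(m, check_duplicates = FALSE, pca = TRUE, theta = 0.5, dims = 2,
                      perplexity = choose_perplexity(nrow(m), preferred_perplexity))
  data.frame(label = rownames(m), x = fit$Y[, 1], y = fit$Y[, 2])
}
# Usage: d_tsne <- run_tsne(sourmash_comp_matrix)
#        plot(d_tsne$x, d_tsne$y); text(d_tsne$x, d_tsne$y, labels = d_tsne$label, pos = 4)
```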

# unclustered heatmap (Rowv = NA and Colv = NA suppress both dendrograms and any reordering)
heatmap(sourmash_comp_matrix, Rowv = NA, Colv = NA, scale = "none")

# clustered heatmap of the signatures in the first of two row clusters
hc.rows <- hclust(dist(sourmash_comp_matrix))
hc.cols <- hclust(dist(t(sourmash_comp_matrix)))
heatmap(sourmash_comp_matrix[cutree(hc.rows, k = 2) == 1, ], Colv = as.dendrogram(hc.cols), scale = "none")
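To see which signatures land in each row cluster, rather than only plotting the first one, the cutree() result can be turned into a table. This is a minimal sketch that reuses the hclust(dist(...)) clustering from the heatmap code; the name cluster_membership and the default k = 2 are illustrative.

```r
# Hypothetical helper: list the cluster assignment of every signature (row),
# using the same hclust(dist(...)) clustering as the heatmap code.
cluster_membership <- function(m, k = 2) {
  hc <- hclust(dist(m))
  groups <- cutree(hc, k = k)   # named integer vector: signature -> cluster id
  data.frame(signature = names(groups), cluster = unname(groups),
             row.names = NULL)
}
```

For example, cluster_membership(sourmash_comp_matrix) returns one row per signature, and subset(..., cluster == 1) gives the rows shown in the clustered heatmap.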

@ctb (Contributor, Author) commented Mar 10, 2018

thank you!

4 participants