Plotting issue #8

Closed
itsdfish opened this issue May 18, 2019 · 6 comments

Comments

@itsdfish
Collaborator

Hi Rob-

I am implementing plotdensity and plotsummary functions to automate the generation of the plots we currently use (see the plotting branch). However, I am experiencing a technical problem when I pass a variable to the group keyword of density. I haven't found any help on Discourse yet, so I'm wondering whether you might have a solution. Here is the plotdensity function and its helper functions:

using StatsPlots,DataFrames

"""
`df`: dataframe of results
`metric`: name of metric, such as :ess for effective sample size
`group`: a tuple of grouping factors, e.g. (:sampler,:Nd)
`save`: save=true saves each plot
`figfmt`: figure format
"""
function plotdensity(df::DataFrame,metric::Symbol,group=(:sampler,);save=false,
    figfmt="pdf",options...)
    plots = Plots.Plot[]
    layout = SetLayout(df,group)
    for c in names(df)
        !isin(metric,c) ? (continue) : nothing
        xlabel = string(c)
        p=@df df density(cols(c),group=cols(group),grid=false,xlabel=xlabel,
            ylabel="Density",layout=layout,fill=(0,.5),width=1.5,options...)
        push!(plots,p)
        save ? savefig(p,string(c,".",figfmt)) : nothing
    end
    return plots
end

"""
Test whether column is a metric, e.g. mu_ess for ess.
"""
function isin(metric,col)
    occursin(string(metric),string(col))
end

"""
Creates a layout = (n,1) where n is the number of factors
in the last grouping variable.
"""
function SetLayout(df,group)
    isempty(group) ? (return (1,1)) : nothing
    length(group) == 1 ? (return (1,1)) : nothing
    col = group[end]
    n = length(unique(df[col]))
    return (n,1)
end

Here is an example that produces the problem:


df = DataFrame(mu_ess=rand(100),sigma_ess=rand(100),a=rand(1:2,100),b=rand(1:2,100))
plotdensity(df,:ess,(:a,:b))

Everything works if I replace group = cols(group) with the hardcoded value group=(:a,:b), so I can say confidently that group = cols(group) is the only problem. Any ideas?

@goedman
Member

goedman commented May 18, 2019

Chris, in plotdensity, group is a positional argument rather than a keyword. Shouldn't it be:

using StatsPlots,DataFrames

"""
`df`: dataframe of results
`metric`: name of metric, such as :ess for effective sample size
`group`: a tuple of grouping factors, e.g. (:sampler,:Nd)
`save`: save=true saves each plot
`figfmt`: figure format
"""
function plotdensity(df::DataFrame,metric::Symbol,group=(:sampler,);save=false,
    figfmt="pdf",options...)
    plots = Plots.Plot[]
    layout = SetLayout(df,group)
    for c in names(df)
        !isin(metric,c) ? (continue) : nothing
        xlabel = string(c)
        p=@df df density(cols(c),cols(group),grid=false,xlabel=xlabel,
            ylabel="Density",layout=layout,fill=(0,.5),width=1.5,options...)
        push!(plots,p)
        save ? savefig(p,string(c,".",figfmt)) : nothing
    end
    return plots
end

"""
Test whether column is a metric, e.g. mu_ess for ess.
"""
function isin(metric,col)
    occursin(string(metric),string(col))
end

"""
Creates a layout = (n,1) where n is the number of factors
in the last grouping variable.
"""
function SetLayout(df,group)
    isempty(group) ? (return (1,1)) : nothing
    length(group) == 1 ? (return (1,1)) : nothing
    col = group[end]
    n = length(unique(df[col]))
    return (n,1)
end

df = DataFrame(mu_ess=rand(100),sigma_ess=rand(100),a=rand(1:2,100),b=rand(1:2,100))
plotdensity(df,:ess,(:a,:b))

test_cols.pdf

@itsdfish
Collaborator Author

itsdfish commented May 18, 2019

Thanks Rob. It looks like what you suggested creates a different graph. Here is the desired result for one variable using:

@df df density(:mu_ess,group=(:a,:b),layout=(2,1))

This will create a density for each combination of values for factors a and b, resulting in four density lines displayed in two subplots.

test_keyword.pdf

@itsdfish
Collaborator Author

Here is what happens in the simplest case:

group = (:a,:b)

p = @df df density(:mu_ess,group=cols(group),layout=(2,1))

Error:

ERROR: MethodError: no method matching extractGroupArgs(::Array{Int64,2}, ::Array{Float64,1})
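
The MethodError reports that group arrived as an Array{Int64,2}, which suggests cols(group) combined the two columns into a single matrix, whereas a literal group=(:a,:b) becomes a tuple of two column vectors once @df substitutes the symbols. One possible way around this, sketched below, is to build that tuple by hand and call density without the macro. This is not necessarily the workaround adopted later in the thread; groupcolumns is a hypothetical helper name, and the df[!, g] indexing assumes a recent DataFrames release.

```julia
using DataFrames, StatsPlots

df = DataFrame(mu_ess=rand(100), a=rand(1:2, 100), b=rand(1:2, 100))
group = (:a, :b)

# Hypothetical helper (not from the thread): collect the grouping columns
# into a tuple of vectors, the same shape that a literal group=(:a,:b)
# takes after @df replaces each symbol with its column.
groupcolumns(df, group) = Tuple(df[!, g] for g in group)

g = groupcolumns(df, group)   # a tuple of two vectors, not a matrix

# Call density directly instead of through @df, so cols() is not needed.
p = density(df.mu_ess, group=g, layout=(2, 1))
```

Because the tuple is built from whatever symbols group holds, the same call should work when the grouping factors are only known at run time.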

@itsdfish
Collaborator Author

I found a workaround for the limitations of the macro. I'll be pushing the improved plotting functionality soon.

@itsdfish
Collaborator Author

The plotting functionality is now available on master. The Gaussian example shows how it will work. Note that I did not rerun the benchmarks, so the file names will differ. Let me know if you want me to rerun them. Here is what it looks like:

dir = "results/"
#Plot mean run time as a function of number of data points (Nd) for each sampler
summaryPlots = plotsummary(results,:Nd,:time,(:sampler,);save=true,dir=dir)

#Plot density of effective sample size as function of number of data points (Nd) for each sampler
essPlots = plotdensity(results,:ess,(:sampler,:Nd);save=true,dir=dir)

#Plot density of rhat as function of number of data points (Nd) for each sampler
rhatPlots = plotdensity(results,:r_hat,(:sampler,:Nd);save=true,dir=dir)

#Plot density of time as function of number of data points (Nd) for each sampler
timePlots = plotdensity(results,:time,(:sampler,:Nd);save=true,dir=dir)

#Scatter plot of epsilon and effective sample size as function of number of data points (Nd) for each sampler
scatterPlots = plotscatter(results,:epsilon,:ess,(:sampler,:Nd);save=true,dir=dir)

The functions plotdensity, plotsummary and plotscatter programmatically generate the plots associated with each parameter, which should prove useful for models with many parameters.
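
As a rough illustration of how one plot per parameter can fall out of a single metric name, the sketch below applies the same substring test as the isin helper earlier in the thread to a list of column names. metriccolumns is a hypothetical name used only for this example, and the column names are taken from the toy DataFrame above rather than from the real benchmark results.

```julia
# Hypothetical helper: keep the columns whose name contains the metric name,
# using the same occursin test as isin earlier in the thread.
metriccolumns(metric, columns) =
    filter(c -> occursin(string(metric), string(c)), columns)

columns = [:mu_ess, :sigma_ess, :a, :b]

# Both parameter-specific ess columns are selected; a and b are skipped.
selected = metriccolumns(:ess, columns)
```

Since the match is a plain substring test, any column whose name happens to contain the metric string would be picked up as well, so metric names need to be chosen with that in mind.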

@goedman
Member

goedman commented May 19, 2019

Very nice! I’ll rerun the Gaussian example on my other machine, and LBA.

itsdfish added a commit that referenced this issue Mar 25, 2020
…23-23-07-02-616-1160984258

CompatHelper: bump compat for "StatsPlots" to "0.14"