Use Outlie in outlier page #87

Open
chfleming opened this issue May 22, 2019 · 15 comments

@chfleming
Contributor

@jmcalabrese @NoonanM

Mike and I both prefer selecting outliers just from the plot of the output of outlie, rather than the histograms.

For instance:

OUT <- outlie(DATA)
plot(OUT)

It seems easier to see the outliers in plot(OUT) than in the histograms of OUT$speed and OUT$distance.

@xhdong-umd
Contributor

That would be a big challenge to the current structure. If we use the outlie plot, do you mean:

  • speed and distance will be handled on one page instead of separately?
  • the histogram is not needed, and the scatter plot is not needed either?
  • the detailed path for selected points is also shown in the outlie plot?

So just a plot of outlie results, and a table showing selected points, that will be all?

@chfleming
Contributor Author

outlie() has a plot side effect and you have a similar plot. I think that's fine.

outlie() also returns a 2-column data.frame, with columns speed and distance. Personally, I would replace the 2 histogram (+ selector) plots with 1 data.frame (+ selector) plot. I think it's easier/cleaner to just look at the 2D data rather than at 2 histogram estimates of the marginal distributions of that data.

With your plot similar to the side effect of outlie(), I know that you use different "distance" statistics with a sliding window. I don't have a preference for which "distance" estimates are used, just that the selection can be made on the speed-versus-distance plot rather than the speed & distance histograms.
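
A minimal sketch of what selecting directly on the speed-versus-distance plane could look like, using base R only. The helper `select_in_brush`, the brush coordinates, and the stand-in table are all hypothetical; a real app would take the rectangle from its plot selector and the table from outlie().

```r
# Hypothetical helper: rows of `out` falling inside a rectangular brush
# drawn on the distance (x) versus speed (y) plane.
select_in_brush <- function(out, xmin, xmax, ymin, ymax) {
  which(out$distance >= xmin & out$distance <= xmax &
        out$speed    >= ymin & out$speed    <= ymax)
}

# Stand-in table (NOT real outlie() output): row 4 is far away and fast.
out <- data.frame(
  distance = c(10, 12, 11, 400, 9),
  speed    = c(0.2, 0.3, 0.1, 6.0, 0.2)
)

plot(out$distance, out$speed, xlab = "distance", ylab = "speed")
rect(300, 5, 500, 7, border = "red")   # the brushed region
picked <- select_in_brush(out, 300, 500, 5, 7)
points(out$distance[picked], out$speed[picked], col = "red", pch = 19)
picked  # row indices of the selected points
```

Because the selection returns row indices, it can be applied to the original data as long as the rows of the two tables line up.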

@xhdong-umd changed the title from "Outlier histograms" to "Use Outlie in outlier page" on Oct 19, 2019

@xhdong-umd
Contributor

@chfleming What's your typical workflow for using outlie?
There is a side-effect scatter plot, and the plot of the outlie result. How do you read the plot? Do you read the result plot, pick some threshold, then filter the result data frame? How do you actually filter the data? Since the result data.frame only has t, distance, etc., do you map these points back to the original data, or do you filter the original data with some other process?

image

image

And do you have to deal with each animal individually, or can you filter them at the same time?

@chfleming
Contributor Author

@xhdong-umd The side-effect plot is just for a quick visual check to see if anything abnormal is sticking out. Threshold based filtering can be done with the data frame output, usually based on speed (or the vertical deviation of terrestrial animals, when that information is present). The rows of the outlie() data frame correspond to the rows of the telemetry data frame.

I've applied a threshold to multiple individuals in a loop.
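
A minimal sketch of that kind of loop, relying on the row correspondence described above (row i of each outlie() result describes row i of the matching telemetry object). The helper name `drop_fast_rows` and the cutoff of 1 are hypothetical, and plain data frames stand in for the telemetry and outlie() objects so the snippet runs without ctmm.

```r
# Hypothetical helper: drop rows whose estimated speed exceeds a cutoff.
# data_list: list of per-individual location tables (telemetry objects in ctmm)
# out_list:  list of matching outlie()-style tables with a `speed` column
drop_fast_rows <- function(data_list, out_list, max_speed) {
  stopifnot(length(data_list) == length(out_list))
  Map(function(dat, out) {
    stopifnot(nrow(dat) == nrow(out))  # rows must line up one-to-one
    bad <- out$speed > max_speed
    dat[!bad, ]
  }, data_list, out_list)
}

# Stand-in data (NOT real tracking data): two individuals, one spike each.
data_list <- list(
  A = data.frame(t = 1:5, x = c(0, 1, 50, 2, 3), y = c(0, 1, 50, 2, 3)),
  B = data.frame(t = 1:4, x = c(0, 1, 2, 90),    y = c(0, 0, 1, 90))
)
out_list <- list(
  A = data.frame(speed = c(0.1, 0.2, 9.0, 0.2, 0.1), distance = c(1, 1, 60, 2, 3)),
  B = data.frame(speed = c(0.1, 0.1, 0.2, 12.0),     distance = c(1, 1, 2, 110))
)

cleaned <- drop_fast_rows(data_list, out_list, max_speed = 1)
sapply(cleaned, nrow)  # rows kept per individual
```

With ctmm loaded, the same helper could presumably be called as drop_fast_rows(DATA, outlie(DATA), max_speed = ...) for a list DATA of telemetry objects. That call is untested here, and the cutoff has to be in whatever units the speed column uses.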

@xhdong-umd
Contributor

@chfleming I'm trying to show the side-effect plot but have run into some difficulties.

  • I want to deal with multiple individuals, so I need to show the side-effect plot for multiple individuals at the same time.
  • I have to run the outlie function, record the base plot for each individual, then plot them together. The result is less than ideal:
    image

I used cowplot to arrange them, and I have to convert each base plot with as_grob, as arranging recorded plots directly doesn't work.

  1. I have to adjust the label size to make sure the axes and labels show up. Even with adjustment, the axes still don't show up completely, and the canvas box shows up differently.
  2. The plot can hit a "margin too big" error and not show up at all, depending on label size or plot size.
  3. I'd like to add a title with the animal name. I can add a title to each plot right before recording it, but somehow the final plot becomes much, much slower: it took about 20 seconds to generate, compared to about 2-3 seconds.

I think there are multiple approaches to deal with these:

  1. The ideal solution would be to change the plot to ggplot, which is easier to arrange and less likely to have the margin/label size problem. However, I guess that would need too much work, and you would need an extra dependency on ggplot.
  2. The simplest way to solve the problem is to only allow the user to check one individual at a time. The problem is that the user will need multiple steps to check a dataset with multiple individuals. This would be a step back compared to the ideal workflow.
  3. Or we can skip the title and ignore the imperfections in the plots, if that's OK ...

@chfleming
Contributor Author

The way I have coded this for multiple individuals in a list argument is to plot one-by-one with individual titles on each plot. If run after par(mfrow=c(x,y)) this will produce a table of plots like you want. The ... argument is only passed to segments and points used to make the plot. Is there not an easy way to fix this with base plot if ... is not doing what you want at the moment?
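
The par(mfrow=...) idea can be sketched with base graphics alone. The helper `grid_dims` and the stand-in speed/distance tables below are hypothetical; with ctmm, the explicit loop would be replaced by a single outlie() call on the list after setting par().

```r
# Hypothetical helper: choose a near-square rows x columns layout for n panels.
grid_dims <- function(n) {
  nc <- ceiling(sqrt(n))
  nr <- ceiling(n / nc)
  c(nr, nc)
}

# Stand-in per-individual tables (NOT real outlie() output).
set.seed(1)
out_list <- lapply(1:6, function(i) {
  data.frame(distance = rexp(50), speed = rexp(50))
})
names(out_list) <- paste0("animal_", 1:6)

# Set up the grid, remembering the previous graphics settings.
old_par <- par(mfrow = grid_dims(length(out_list)), mar = c(4, 4, 2, 1))
for (id in names(out_list)) {
  # one titled panel per individual, drawn one by one into the grid
  plot(out_list[[id]]$distance, out_list[[id]]$speed,
       xlab = "distance", ylab = "speed", main = id)
}
par(old_par)  # restore the previous layout
```

Since the panels are ordinary base plots drawn into one device, there is no need to record and re-arrange them afterwards.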

@xhdong-umd
Contributor

Oh I forgot outlie can take a list. I have been running it one by one. Let me try again with the list input.

@xhdong-umd
Contributor

List input worked and it's much better. Do you also need the histogram plot?

@chfleming
Contributor Author

I think the histogram-like plot can be useful for selecting a threshold. I can update the plot method to plot those all together from a list.

@xhdong-umd
Contributor

That will be great! Thanks!

@chfleming
Contributor Author

I just pushed an update that makes plot.outlie work on lists of outlie objects.

@xhdong-umd
Contributor

xhdong-umd commented Apr 20, 2022

I'm having this error:

>   res_list <- outlie(buffalo)
DOP values missing. Assuming DOP=1.
DOP values missing. Assuming DOP=1.
DOP values missing. Assuming DOP=1.
DOP values missing. Assuming DOP=1.
DOP values missing. Assuming DOP=1.
DOP values missing. Assuming DOP=1.
>   scatter_p <- recordPlot()
> plot(res_list)
Error in 1:n : argument of length 0

@chfleming
Contributor Author

I found an error with plot=FALSE (not relevant here), but I'm not getting this error?

@xhdong-umd
Contributor

After restarting the R session I don't have this error anymore. Now the histogram is just one plot for all the individuals in the list, right?

@chfleming
Contributor Author

Yes, that plot is compiling all of the outputs into one plot.
