
Default to grouped-distribution scale? (instead of equal buckets) #7

Open
daguar opened this issue Jan 17, 2014 · 5 comments

Comments

@daguar (Owner) commented Jan 17, 2014

@lyzidiamond raised this issue. Right now the data is scaled very simply: given the min and max, break the range into N equal-width buckets and put each value in a bucket (the quantize scale in D3).

This is analytically problematic for values that are close to the bucket boundaries, since otherwise-similar values may land in different buckets. @lyzidiamond mentioned that ArcGIS defaults to a grouped distribution, so it might be worth using that instead.
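To make the boundary problem concrete, here's a minimal sketch in plain JavaScript (no D3, made-up numbers) of equal-interval bucketing splitting two nearly identical values:

```javascript
// Equal-interval ("quantize") bucketing: split [min, max] into n equal ranges.
function quantizeBucket(value, min, max, n) {
  const i = Math.floor(((value - min) / (max - min)) * n);
  return Math.min(i, n - 1); // clamp the max value into the last bucket
}

// Two nearly identical values straddling a boundary land in different buckets.
const min = 0, max = 10, buckets = 5; // boundaries at 2, 4, 6, 8
console.log(quantizeBucket(3.99, min, max, buckets)); // 1
console.log(quantizeBucket(4.01, min, max, buckets)); // 2
```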

Other options:

  • Warn user when a quantize scale has many boundary problems
  • Make scaling approach a configurable parameter

cc @itsthomson, in case you have an opinion

@itsthomson

Is your goal to have each bucket equally sized? If so, +1 to grouping by quantiles; it's more robust against any kind of distribution.

@daguar (Owner, Author) commented Jan 17, 2014

@itsthomson I should clarify. Right now, the interval size is equal across buckets (for example, a 5-bucket scale over 0-10 gives 0-2, >2-4, etc.)

So you're saying a quantile scale (equal number of data points per bucket) is more robust for an arbitrary distro, yes?
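For contrast with equal intervals, here's a minimal sketch (plain JavaScript, illustrative data) of quantile bucketing, which puts roughly the same number of data points in each bucket regardless of how skewed the values are:

```javascript
// Quantile bucketing: sort the data and split it into n groups of
// (roughly) equal size, so each bucket holds the same share of points.
function quantileThresholds(values, n) {
  const sorted = [...values].sort((a, b) => a - b);
  const thresholds = [];
  for (let k = 1; k < n; k++) {
    thresholds.push(sorted[Math.floor((k * sorted.length) / n)]);
  }
  return thresholds;
}

// A value v falls in bucket i where thresholds[i-1] <= v < thresholds[i].
function quantileBucket(value, thresholds) {
  let i = 0;
  while (i < thresholds.length && value >= thresholds[i]) i++;
  return i;
}

// A skewed dataset: equal intervals would cram most of these points into
// one bucket, but quantiles spread them evenly (2 points per bucket here).
const data = [1, 1, 2, 2, 3, 3, 4, 5, 50, 100];
const t = quantileThresholds(data, 5); // [2, 3, 4, 50]
console.log(quantileBucket(1, t));   // 0
console.log(quantileBucket(100, t)); // 4
```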

@daguar (Owner, Author) commented Jan 17, 2014

Also, the high-level goal is really "most clearly shows gradients of an arbitrary distribution."

If there's an intermediate metric to instrument the distro (variance?), I'd even be into using that, and then conditioning the scaling on it.
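One hypothetical way to condition the scale choice on such a metric: measure the skewness of the data and fall back to quantiles when the distribution is far from symmetric. Everything here is an assumption for illustration, including the 1.0 cutoff, which is arbitrary:

```javascript
// Sample skewness via the method of moments: m3 / m2^(3/2).
function skewness(values) {
  const n = values.length;
  const mean = values.reduce((s, v) => s + v, 0) / n;
  const m2 = values.reduce((s, v) => s + (v - mean) ** 2, 0) / n;
  const m3 = values.reduce((s, v) => s + (v - mean) ** 3, 0) / n;
  return m3 / Math.pow(m2, 1.5);
}

// Hypothetical policy: strongly skewed data gets a quantile scale,
// otherwise keep the simpler equal-interval quantize scale.
function chooseScale(values) {
  return Math.abs(skewness(values)) > 1.0 ? "quantile" : "quantize";
}

console.log(chooseScale([1, 2, 3, 4, 5]));         // "quantize" (symmetric)
console.log(chooseScale([1, 1, 1, 2, 2, 3, 100])); // "quantile" (heavy right tail)
```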

@Mr0grog commented Jan 18, 2014

This is potentially a really naive question, but would just using gradients instead of buckets be the simple solution here?

@lyzidiamond

Maybe I wasn't understanding how it was being grouped before. I believed that with 5 buckets and, say, 50 states, the lowest 10 FEATURES would go in the first group, the next lowest 10 FEATURES in the second, and so on. If the values are instead broken into equal intervals as you describe (say, 50 values ranging from 0 to 20 and 5 buckets: the first bucket is 0-4 with a variable number of features, the next is 5-8 with a variable number of features, etc.), it's not as big a deal as I had stated previously.
