Default to grouped-distribution scale? (instead of equal buckets) #7
Comments
Is your goal to have each bucket equally sized? If so, +1 to grouping by quantiles; it's more robust against any kind of distribution.
@itsthomson I should clarify. Right now, the interval size is equal across buckets (for example, a 5-bucket 1-10 scale is 0-2, >2-4, etc.). So you're saying a quantile scale (equal number of data points per bucket) is more robust for an arbitrary distro, yes?
Also, the high-level goal is really "most clearly shows gradients of an arbitrary distribution." If there's an intermediate metric to characterize the distro (variance?), I'd even be into using that and then conditioning the scaling on it.
This is potentially a really naive question, but would just using gradients instead of buckets be the simple solution here?
Maybe I wasn't understanding how it was being grouped before. I believed that with 5 buckets, assuming 50 states, the lowest 10 FEATURES would go in the first group, the next lowest 10 FEATURES in the second, etc. If they are being broken up by equal intervals as you describe (assuming 50 values ranging from 0 to 20 and 5 buckets, the first bucket is 0-4 with a variable number of features, the next is 5-8 with a variable number of features, etc.), it's not as big a deal as I stated previously.
@lyzidiamond raised this issue. Right now the data is scaled really simply: given min and max, break the range into N equal-width buckets and put each value in a bucket (this is called the quantize scale in D3).
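For reference, here is a rough Python sketch of the equal-width bucketing described above. It is not the project's actual code: the function name `equal_width_bucket` and the sample values are made up for illustration, and the choice to put a value that sits exactly on a boundary into the upper bucket is an assumption.

```python
from bisect import bisect_right


def equal_width_bucket(value, lo, hi, n_buckets):
    """Return the 0-based index of the equal-width bucket for `value`.

    Hypothetical helper: the range [lo, hi] is cut into `n_buckets`
    intervals of equal width. Values outside the range fall into the
    first or last bucket.
    """
    if n_buckets < 1 or hi <= lo:
        raise ValueError("need n_buckets >= 1 and hi > lo")
    # The n_buckets - 1 interior boundaries between adjacent buckets.
    thresholds = [lo + (hi - lo) * i / n_buckets for i in range(1, n_buckets)]
    # Assumption: a value exactly on a boundary goes to the upper bucket.
    return bisect_right(thresholds, value)


# 5 buckets over 0..10 have boundaries at 2, 4, 6 and 8.
values = [0, 1.9, 2.1, 9.5, 10]
buckets = [equal_width_bucket(v, 0, 10, 5) for v in values]
print(buckets)  # [0, 0, 1, 4, 4]
```

Note that 1.9 and 2.1 end up in different buckets even though they are nearly identical.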
This is analytically problematic for values that are close to the bucket boundaries, since otherwise-similar values may be split into different buckets. @lyzidiamond mentioned that ArcGIS defaults to a grouped-distribution scale, so it might be worth using that instead.
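To make the alternative concrete, here is a minimal Python sketch of a quantile-style scale, assuming "grouped-distribution" means roughly "equal number of data points per bucket" as discussed in the comments. The function names, the threshold rule, and the sample data are all hypothetical; a real implementation (D3's quantile scale, for instance) may compute its thresholds differently.

```python
from bisect import bisect_right


def quantile_thresholds(values, n_buckets):
    """Return n_buckets - 1 cut points taken from the sorted data.

    Assumption: each cut point is simply the data value at rank
    i * len(values) // n_buckets, with no interpolation.
    """
    if n_buckets < 1 or not values:
        raise ValueError("need n_buckets >= 1 and at least one value")
    ordered = sorted(values)
    return [ordered[i * len(ordered) // n_buckets] for i in range(1, n_buckets)]


def quantile_bucket(value, thresholds):
    """Return the 0-based bucket index; equal values always share a bucket."""
    return bisect_right(thresholds, value)


# Made-up, heavily skewed sample data.
data = [1, 1, 2, 2, 3, 3, 4, 50, 100, 1000]
cuts = quantile_thresholds(data, 5)
print(cuts)  # [2, 3, 4, 100]
print([quantile_bucket(v, cuts) for v in data])  # [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
```

With this sample, every bucket gets two values. Equal-width buckets over the same 1..1000 range would put nine of the ten values in the first bucket.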
Other options:
cc @itsthomson, in case you have an opinion