Why restricted to 2014 onwards? #37
By disabling the check, I was able to download data going back to 2007 without any extra modification. For 2006 and earlier, the API endpoint seems to change.
Older PUMS data is available under this endpoint: https://www2.census.gov/programs-surveys/acs/data/pums/2003/
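A minimal sketch of what relaxing the check might look like, assuming the year guard is a simple lower bound. All names here (`check_survey_year`, `pums_year_dir`, `uses_current_layout`) are hypothetical and not part of the package. The 2007 cut-off only encodes the observation in this thread that 2007 onwards downloads unchanged, while 2006 and earlier appear to use a different layout under the base URL above.

```python
# Hypothetical helpers, not part of folktables.
PUMS_BASE_URL = "https://www2.census.gov/programs-surveys/acs/data/pums"

# Assumption: the current hard threshold discussed in this issue.
DEFAULT_MIN_YEAR = 2014
# Assumption: earliest year reported in this thread to download unchanged.
EARLIEST_COMPATIBLE_YEAR = 2007


def check_survey_year(year: int, min_year: int = DEFAULT_MIN_YEAR) -> int:
    """Raise ValueError if `year` is below the configured lower bound."""
    if int(year) < min_year:
        raise ValueError(f"Year must be >= {min_year}, got {year}")
    return int(year)


def pums_year_dir(year: int) -> str:
    """Return the directory URL that holds the PUMS files for `year`."""
    return f"{PUMS_BASE_URL}/{int(year)}/"


def uses_current_layout(year: int) -> bool:
    """True if the year is expected to work with the existing download logic."""
    return int(year) >= EARLIEST_COMPATIBLE_YEAR


if __name__ == "__main__":
    # With the default bound, 2010 is rejected; lowering the bound accepts it.
    try:
        check_survey_year(2010)
    except ValueError as err:
        print(err)
    print(check_survey_year(2010, min_year=EARLIEST_COMPATIBLE_YEAR))
    print(pums_year_dir(2003), uses_current_layout(2003))
```

Making the bound a parameter instead of deleting the check keeps the default behaviour unchanged while letting users opt in to older years.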
OK, my understanding is that 2014 was excluded (the issue only really affects 2014) because the PINCP column contains empty strings, which make the string-to-float conversion fail. I will submit a fix soon.
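The failure itself is easy to reproduce: `float("")` raises `ValueError`. A minimal sketch of a tolerant conversion, assuming empty or blank fields should become NaN rather than abort the load; `to_float_or_nan` and `parse_column` are hypothetical names, and the sample values are made up.

```python
import math
from typing import Iterable, List, Optional


def to_float_or_nan(value: Optional[str]) -> float:
    """Convert a CSV field to float, mapping None/empty/blank strings to NaN.

    A plain float("") raises ValueError, which is the failure described
    for empty PINCP entries.
    """
    if value is None or value.strip() == "":
        return math.nan
    return float(value)


def parse_column(values: Iterable[Optional[str]]) -> List[float]:
    """Apply to_float_or_nan to every entry of a column."""
    return [to_float_or_nan(v) for v in values]


if __name__ == "__main__":
    raw_pincp = ["25000", "", "41000.0", " "]  # made-up sample values
    try:
        [float(v) for v in raw_pincp]
    except ValueError as err:
        print("plain float() fails:", err)
    print(parse_column(raw_pincp))
```

In pandas, `pd.to_numeric(series, errors="coerce")` does the same column-wise by turning unparseable entries into NaN. Whether rows with a missing income should then be dropped or kept is a separate modelling decision.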
Hi @sehoffmann, I've just come across this repo and I'm also interested in looking at longer time horizons. I wanted to quickly check whether you've been able to use this modified code successfully in your own work. I can see that your PR hasn't been merged yet, but if it's working for you, I may just adopt it in my local copy of the code.
Sorry for the slow response. Similar requests have come up in the past. The reason we didn't implement this at first is that some of the attribute encodings change over time. So while nothing may break loudly, you'd still have to worry about harmonizing feature encodings across different years. This was a task we didn't have sufficient resources to take on. See, for example, the discussion in #22. Please let me know if you believe you have a general solution to this problem. I think this would certainly be nice to have in the package.
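One possible shape for such harmonization, offered only as a sketch: a table of recoding rules, each valid for a range of survey years, that maps year-specific codes of a variable onto one common target encoding. The variable name `FEAT` and all codes below are made up for illustration. Real encoding changes would have to be taken from the PUMS data dictionaries for each year.

```python
from dataclasses import dataclass
from typing import Dict, List, Mapping, Optional


@dataclass(frozen=True)
class RecodeRule:
    """Maps one variable's codes onto a common encoding for a range of years."""
    variable: str
    first_year: int
    last_year: Optional[int]  # None means "no upper bound"
    mapping: Mapping[int, int]

    def applies_to(self, variable: str, year: int) -> bool:
        if variable != self.variable or year < self.first_year:
            return False
        return self.last_year is None or year <= self.last_year


def harmonize_record(record: Dict[str, int], year: int,
                     rules: List[RecodeRule]) -> Dict[str, int]:
    """Return a copy of `record` with year-specific codes mapped to the common encoding.

    Variables without an applicable rule are passed through unchanged.
    """
    out = dict(record)
    for variable, code in record.items():
        for rule in rules:
            if rule.applies_to(variable, year):
                # A KeyError here flags a code the rule does not cover.
                out[variable] = rule.mapping[code]
                break
    return out


if __name__ == "__main__":
    # Hypothetical: FEAT used 3 codes up to 2007 and 5 finer codes afterwards;
    # both are collapsed onto a common 3-level encoding.
    rules = [
        RecodeRule("FEAT", 2000, 2007, {1: 1, 2: 2, 3: 3}),
        RecodeRule("FEAT", 2008, None, {1: 1, 2: 1, 3: 2, 4: 3, 5: 3}),
    ]
    print(harmonize_record({"FEAT": 2, "AGEP": 40}, 2005, rules))
    print(harmonize_record({"FEAT": 2, "AGEP": 40}, 2015, rules))
```

Raising on unmapped codes instead of passing them through is deliberate: silently mixing encodings is exactly the quiet failure mode described above.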
Dear Authors,
Thanks for curating this amazing dataset. We plan to use it to research continuous distribution shifts across time. For that, having long time horizons available is very beneficial, both to highlight the shift and to have enough time points for extrapolation.
2014 is currently enforced as a hard threshold in the code, and I was wondering what the reason for that is. A quick test showed me that older years are still accessible at the same API endpoint. Are there any bigger differences in format or data quality, for instance missing variables?
If so, I would be willing to submit a PR to make this older historical data available as well, provided it can be adapted to the current formats. I would be very glad if you could point me in the right direction.
Best regards from Tübingen
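On the question of missing variables, a quick first check is to compare each year's CSV header against the columns a task needs. A minimal sketch using only the standard library; `header_columns` and `missing_by_year` are hypothetical names, and the column names and headers in the usage example are invented placeholders, not real PUMS headers.

```python
import csv
import io
from typing import Dict, Iterable, List, Set


def header_columns(csv_text: str) -> Set[str]:
    """Return the column names from the first row of CSV text."""
    reader = csv.reader(io.StringIO(csv_text))
    return set(next(reader))


def missing_by_year(headers: Dict[int, Set[str]],
                    required: Iterable[str]) -> Dict[int, List[str]]:
    """For each year, list the required columns absent from that year's header."""
    required = list(required)
    return {year: sorted(c for c in required if c not in cols)
            for year, cols in sorted(headers.items())}


if __name__ == "__main__":
    # Placeholder headers; in practice, pass the first line of each year's file.
    headers = {
        2006: header_columns("COL_A,COL_B\n1,2\n"),
        2015: header_columns("COL_A,COL_B,COL_C\n1,2,3\n"),
    }
    print(missing_by_year(headers, ["COL_A", "COL_C"]))
```

A column that is present under the same name can still have a different encoding, so this only catches the loud case; the harmonization question raised in the thread remains.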