Why restricted to 2014 onwards? #37

Open
sehoffmann opened this issue Aug 24, 2023 · 5 comments

Comments

@sehoffmann

Dear Authors,

Thanks for curating this amazing dataset. We plan to use it to research continuous distribution shifts across time. For that, having long time horizons available is very beneficial, both to highlight the shift and to have enough time points for extrapolation.

2014 is currently enforced as a hard threshold in the code, and I was wondering what the reason for that is. A quick test showed that older years are still accessible at the same API endpoint. Are there any major differences in format or data quality, for instance missing variables?

If so, I would be willing to submit a PR to make this older historical data available as well, provided it can be adapted to the current formats. I would be very glad if you could point me in the right direction.

Best Regards from Tübingen

@sehoffmann
Author

By disabling the check, I was able to download data going back to 2007 without any further modification. For 2006 and earlier, the API endpoint seems to change.

@sehoffmann
Author

Older PUMS data is available under this endpoint: https://www2.census.gov/programs-surveys/acs/data/pums/2003/
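A minimal sketch of building that directory URL for a given year. It assumes other early years follow the same path pattern as the 2003 link above, which is not verified here; `PUMS_BASE` and `pums_directory_url` are hypothetical names, not part of the package.

```python
# Base path taken from the 2003 link above. Reusing it for other years
# is an assumption, not something confirmed in this thread.
PUMS_BASE = "https://www2.census.gov/programs-surveys/acs/data/pums"


def pums_directory_url(year):
    """Return the assumed PUMS directory URL for one survey year."""
    year = int(year)  # accepts 2003 or "2003"; raises ValueError otherwise
    return f"{PUMS_BASE}/{year}/"


url_2003 = pums_directory_url(2003)
```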

@sehoffmann
Author

OK, my understanding is that 2014 was excluded (the issue only really affects 2014) because the PINCP column contains empty strings, which cause the string -> float conversion to fail. I will submit a fix soon.
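To illustrate the failure mode, here is a rough sketch of a tolerant conversion using only the standard library. It is not the package's actual loading code: `parse_optional_float` is a hypothetical helper, and the sample rows are made up.

```python
import csv
import io
import math


def parse_optional_float(value):
    """Convert a CSV field to float, mapping empty or blank strings to NaN."""
    text = value.strip()
    return float(text) if text else math.nan


# Tiny in-memory stand-in for a person file; the values are invented.
sample_csv = "SERIALNO,PINCP\n1,52000\n2,\n3,-1200\n"

rows = list(csv.DictReader(io.StringIO(sample_csv)))

# A plain float("") would raise ValueError on the second row.
incomes = [parse_optional_float(row["PINCP"]) for row in rows]
```

Mapping blanks to NaN keeps the row count intact, so whether to drop or impute those records is left to later processing.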

@tombewley

Hi @sehoffmann, I've just come across this repo and I'm also interested in looking at longer time horizons. I just thought I'd quickly check whether you've been able to use this modified code successfully in your own work? I can see that your PR hasn't yet been merged, but if it's working for you then I may just adopt it in my local copy of the code.

@mrtzh
Member

mrtzh commented May 1, 2024

Sorry for the slow response. Similar requests have come up in the past. The reason we didn't implement this at first is that some of the attribute encodings change between years. So while nothing may break loudly, you'd still have to worry about harmonizing feature encodings across different years. This was a task we didn't have sufficient resources to take on.

See, for example, the discussion in #22
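As a rough illustration of what harmonizing encodings could involve, the sketch below translates year-specific raw codes into shared labels through a per-year lookup. The codebooks are invented for the example and are not real ACS encodings; `CODEBOOKS` and `harmonize` are hypothetical names.

```python
# Hypothetical per-year code tables mapping a raw code to a shared label.
# These values are made up and do not reflect any real attribute.
CODEBOOKS = {
    2010: {1: "category_a", 2: "category_b", 3: "category_c"},
    2018: {1: "category_a", 2: "category_c", 3: "category_b", 4: "category_d"},
}


def harmonize(year, raw_code, codebooks=CODEBOOKS):
    """Translate a year-specific raw code into a year-independent label.

    Unknown years or codes raise KeyError, so gaps in the codebooks fail
    loudly instead of silently mixing incompatible values.
    """
    return codebooks[year][raw_code]


label_2010 = harmonize(2010, 2)
label_2018 = harmonize(2018, 2)
```

In this toy setup the same raw code 2 means different things in the two years, which is exactly the kind of silent mismatch that loading older data unchanged would not surface.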

Please let me know if you believe you have a general solution to this problem. I think this would certainly be nice to have in the package.


3 participants