Fixes for python-us #68

Closed
25 commits
cdb6b0d
add additional shapefile urls
Jul 21, 2022
6e71179
Pin jellyfish to anything under 1.0
dgilmanAIDENTIFIED Sep 16, 2022
ce8b2a5
Add classifiers for python 3.9 and 3.10
dgilmanAIDENTIFIED Sep 16, 2022
dffd60f
Bump versions in github workflows
dgilmanAIDENTIFIED Sep 16, 2022
1458552
Bump pipfile versions
dgilmanAIDENTIFIED Sep 16, 2022
4880400
Black formatting
dgilmanAIDENTIFIED Sep 16, 2022
898be6a
Merge remote-tracking branch 'johnseekins/master'
dgilmanAIDENTIFIED Sep 16, 2022
af638ca
Remove unused fstring
dgilmanAIDENTIFIED Sep 16, 2022
8da70cd
Actually use the cache.
dgilmanAIDENTIFIED Sep 17, 2022
67ebc27
Add Midway Islands
dgilmanAIDENTIFIED Sep 17, 2022
c3fec65
Fix timezone lists, add test to sync with IANA timezone file
dgilmanAIDENTIFIED Sep 17, 2022
7bf2705
Delete redundant test_dc
dgilmanAIDENTIFIED Sep 17, 2022
6106066
Add tests for DC statehood and non-DC statehood
dgilmanAIDENTIFIED Sep 17, 2022
ff7bdad
Add birthday to constants
dgilmanAIDENTIFIED Sep 17, 2022
248df67
Install older pipenv on older pythons
dgilmanAIDENTIFIED Sep 17, 2022
c98bf84
Remove importlib_metadata dev dependency
dgilmanAIDENTIFIED Sep 17, 2022
5b2a2e6
Skip the lockfile in CI, which is a sin, but seemingly necessary
dgilmanAIDENTIFIED Sep 17, 2022
ec6899d
Black formatting
dgilmanAIDENTIFIED Sep 17, 2022
6c000b9
CLI typo
dgilmanAIDENTIFIED Sep 17, 2022
3237d98
Initial release notes
dgilmanAIDENTIFIED Sep 17, 2022
892cd52
Support customizable metaphone matching in lookup()
dgilmanAIDENTIFIED Sep 22, 2022
7a74025
Midway Islands not a territory :)
dgilmanAIDENTIFIED Sep 22, 2022
3f41a6b
Add additional metaphones for the US territories
dgilmanAIDENTIFIED Sep 22, 2022
5e85e6d
Black formatting
dgilmanAIDENTIFIED Sep 22, 2022
eccb429
Fix tests for Midway Islands
dgilmanAIDENTIFIED Sep 22, 2022
19 changes: 13 additions & 6 deletions .github/workflows/pythonpackage.yml
@@ -9,19 +9,25 @@ jobs:
strategy:
max-parallel: 4
matrix:
python-version: [3.6, 3.7, 3.8]
python-version: [3.6, 3.7, 3.8, 3.9, '3.10']

steps:
- uses: actions/checkout@v1
- uses: actions/checkout@v3
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v1
uses: actions/setup-python@v4
with:
python-version: ${{ matrix.python-version }}
- name: Install pipenv
- name: Install pipenv for python 3.6
uses: dschep/install-pipenv-action@v1
if: ${{ matrix.python-version == '3.6' }}
with:
version: 2022.4.20
- name: Install pipenv for modern python
uses: dschep/install-pipenv-action@v1
if: ${{ matrix.python-version != '3.6' }}
- name: Install dependencies
run: |
pipenv install --dev --python `which python`
pipenv install --skip-lock --dev --python `which python`
- name: Linting and formatting
run: |
# stop the build if there are Python syntax errors or undefined names
@@ -32,4 +38,5 @@ jobs:
pipenv run black --check us
- name: Test with pytest
run: |
pipenv run pytest
pipenv run pytest us/tests --timezone
DC_STATEHOOD=yes pipenv run pytest us/tests --dc-statehood
10 changes: 6 additions & 4 deletions Pipfile
@@ -4,16 +4,18 @@ url = "https://pypi.org/simple"
verify_ssl = true

[dev-packages]
black = "==19.10b0"
black = "22.8.0"
flake8 = "*"
importlib_metadata = {version = "*", markers = "python_version < '3.8'"}
pytest = "*"
pytz = "*"
requests = "<3.0"
geopandas = "*"
rtree = "*"
iso6709 = "*"

[packages]
jellyfish = "==0.7.2"
jellyfish = "<1.0"


[pipenv]
allow_prereleases = true
allow_prereleases = true
520 changes: 373 additions & 147 deletions Pipfile.lock

Large diffs are not rendered by default.

10 changes: 10 additions & 0 deletions README.rst
@@ -213,6 +213,16 @@ commits to the repo. To run these tests yourself: ::
Changelog
---------

3.0.1
~~~~~

* Relax constraint on jellyfish dependency
* Add the Midway Islands as a territory
* Add the 2020 TIGER URLs to shapefile_urls() where possible
* Sync all states with the latest timezone information
* Fix bug with lookup() caching logic


3.0.0
~~~~~

Expand Down
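Note on the "2020 TIGER URLs" changelog entry: a minimal sketch of how the mixed 2020/2010 URL set is composed with `urljoin`. The path fragments are copied from the `us/states.py` diff in this PR and have not been checked against the Census server; `shapefile_urls_sketch` and the two `BASE_*` names are hypothetical stand-ins, not the library's API.

```python
from typing import Dict
from urllib.parse import urljoin

# Both bases end with "/" so urljoin appends the relative path
# instead of replacing the last path segment.
BASE_2020 = "https://www2.census.gov/geo/tiger/TIGER2020/"
BASE_2010 = "https://www2.census.gov/geo/tiger/TIGER2010/"


def shapefile_urls_sketch(fips: str, abbr: str) -> Dict[str, str]:
    """Hypothetical helper mirroring the shape of State.shapefile_urls()."""
    urls = {
        # layer that the diff points at the 2020 tree
        "tract": urljoin(BASE_2020, f"TRACT/2020/tl_2020_{fips}_tract.zip"),
        # layer that the diff keeps on the 2010 tree
        "county": urljoin(BASE_2010, f"COUNTY/2010/tl_2010_{fips}_county10.zip"),
    }
    # the diff skips the lower chamber for DC and NE (unicameral)
    if abbr not in ("DC", "NE"):
        urls["lowerchamber"] = urljoin(
            BASE_2020, f"SLDL/2020/tl_2020_{fips}_sldl.zip"
        )
    return urls
```

The trailing slash on the base matters: without it, `urljoin` would drop the final `TIGER2020` segment before appending the relative path.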
4 changes: 3 additions & 1 deletion setup.py
@@ -13,12 +13,14 @@
license="BSD",
packages=find_packages(),
include_package_data=True,
install_requires=["jellyfish==0.7.2"],
install_requires=["jellyfish<1.0"],
entry_points={"console_scripts": ["states = us.cli.states:main"]},
platforms=["any"],
classifiers=[
"Programming Language :: Python :: 3.6",
"Programming Language :: Python :: 3.7",
"Programming Language :: Python :: 3.8",
"Programming Language :: Python :: 3.9",
"Programming Language :: Python :: 3.10",
],
)
127 changes: 97 additions & 30 deletions us/states.py
@@ -42,46 +42,78 @@ def __str__(self) -> str:
return self.name

def shapefile_urls(self) -> Optional[Dict[str, str]]:
""" Shapefiles are available directly from the US Census Bureau:
https://www.census.gov/cgi-bin/geo/shapefiles/index.php
"""Shapefiles are available directly from the US Census Bureau:
https://www.census.gov/cgi-bin/geo/shapefiles/index.php
"""

fips = self.fips

if not fips:
return None

base = f"https://www2.census.gov/geo/tiger/TIGER2010/"
base = "https://www2.census.gov/geo/tiger/TIGER2020/"
base_2010 = "https://www2.census.gov/geo/tiger/TIGER2010/"
urls = {
"tract": urljoin(base, f"TRACT/2010/tl_2010_{fips}_tract10.zip"),
"cd": urljoin(base, f"CD/111/tl_2010_{fips}_cd111.zip"),
"county": urljoin(base, f"COUNTY/2010/tl_2010_{fips}_county10.zip"),
"state": urljoin(base, f"STATE/2010/tl_2010_{fips}_state10.zip"),
"zcta": urljoin(base, f"ZCTA5/2010/tl_2010_{fips}_zcta510.zip"),
"block": urljoin(base, f"TABBLOCK/2010/tl_2010_{fips}_tabblock10.zip"),
"blockgroup": urljoin(base, f"BG/2010/tl_2010_{fips}_bg10.zip"),
"tract": urljoin(base, f"TRACT/2020/tl_2020_{fips}_tract.zip"),
"block": urljoin(base, f"TABBLOCK/2020/tl_2020_{fips}_tabblock10.zip"),
"blockgroup": urljoin(base, f"BG/2020/tl_2020_{fips}_bg.zip"),
"upperchamber": urljoin(base, f"SLDU/2020/tl_2020_{fips}_sldu.zip"),
# following don't have 2020 directories yet
"cd": urljoin(base_2010, f"CD/111/tl_2010_{fips}_cd111.zip"),
"county": urljoin(base_2010, f"COUNTY/2010/tl_2010_{fips}_county10.zip"),
"state": urljoin(base_2010, f"STATE/2010/tl_2010_{fips}_state10.zip"),
"zcta": urljoin(base_2010, f"ZCTA5/2010/tl_2010_{fips}_zcta510.zip"),
}
# unicameral legislatures don't have a lower chamber
if self.abbr not in ["DC", "NE"]:
urls["lowerchamber"] = urljoin(base, f"SLDL/2020/tl_2020_{fips}_sldl.zip")

return urls


def lookup(val, field: Optional[str] = None, use_cache: bool = True) -> Optional[State]:
""" Semi-fuzzy state lookup. This method will make a best effort
attempt at finding the state based on the lookup value provided.

* two digits will search for FIPS code
* two letters will search for state abbreviation
* anything else will try to match the metaphone of state names

Metaphone is used to allow for incorrect, but phonetically accurate,
spelling of state names.

Exact matches can be done on any attribute on State objects by passing
the `field` argument. This skips the fuzzy-ish matching and does an
exact, case-sensitive comparison against the specified field.

This method caches non-None results, but the cache can be bypassed
with the `use_cache=False` argument.
DEFAULT_ADDITIONAL_METAPHONES = {
"N HMXR": "N HMPXR", # New Hamshire -> New Hampshire
"WXNKTN TK": "TSTRKT OF KLMB", # Washington, D.C. -> District of Columbia
"WXNKTN STT": "WXNKTN", # Washington State -> Washington
"KMNWL0 OF KNTK": "KNTK", # Commonwealth of Kentucky -> Kentucky
"KMNWL0 OF MSXSTS": "MSXSTS", # Commonwealth of Massachusetts -> Massachusetts
"KMNWL0 OF PNSLFN": "PNSLFN", # Commonwealth of Pennsylvania -> Pennsylvania
"KMNWL0 OF FRJN": "FRJN", # Commonwealth of Virginia -> Virginia
"KMNWL0 OF 0 NR0RN MRN ISLNTS": "NR0RN MRN ISLNTS", # Commonwealth of the Northern Mariana Islands -> Northern Mariana Islands
"MRN ISLNTS": "NR0RN MRN ISLNTS", # Mariana Islands -> Northern Mariana Islands
"MRN ISLNT": "NR0RN MRN ISLNTS", # Mariana Island -> Northern Mariana Islands
"KMNWL0 OF PRT RK": "PRT RK", # Commonwealth of Puerto Rico -> Puerto Rico
"UNTT STTS FRJN ISLNTS": "FRJN ISLNTS", # United States Virgin Islands -> Virgin Islands
"FRJN ISLNTS OF 0 UNTT STTS": "FRJN ISLNTS", # Virgin Islands of the United States -> Virgin Islands
}


def lookup(
val,
field: Optional[str] = None,
use_cache: bool = True,
additional_metaphones: Dict[str, str] = DEFAULT_ADDITIONAL_METAPHONES,
) -> Optional[State]:
"""Semi-fuzzy state lookup. This method will make a best effort
attempt at finding the state based on the lookup value provided.

* two digits will search for FIPS code
* two letters will search for state abbreviation
* anything else will try to match the metaphone of state names

Metaphone is used to allow for incorrect, but phonetically accurate,
spelling of state names.

Exact matches can be done on any attribute on State objects by passing
the `field` argument. This skips the fuzzy-ish matching and does an
exact, case-sensitive comparison against the specified field.

This method caches non-None results, but the cache can be bypassed
with the `use_cache=False` argument.

You can pass extra metaphones via the `additional_metaphones` argument.
Use this to catch typos or alternate names for states that defeat the
metaphone algorithm. A default set of alternatives is provided.
"""

matched_state = None
@@ -96,10 +128,13 @@ def lookup(val, field: Optional[str] = None, use_cache: bool = True) -> Optional
val = jellyfish.metaphone(val)
field = "name_metaphone"

val = additional_metaphones.get(val, val)

# see if result is in cache
cache_key = f"{field}:{val}"
if use_cache and cache_key in _lookup_cache:
matched_state = _lookup_cache[cache_key]
return matched_state

for state in STATES_AND_TERRITORIES:
if val == getattr(state, field):
@@ -150,7 +185,15 @@ def mapping(
"capital": "Juneau",
"capital_tz": "America/Anchorage",
"ap_abbr": "Alaska",
"time_zones": ["America/Anchorage", "America/Adak"],
"time_zones": [
"America/Anchorage",
"America/Adak",
"America/Juneau",
"America/Sitka",
"America/Metlakatla",
"America/Yakutat",
"America/Nome",
],
"name_metaphone": "ALSK",
}
)
@@ -416,7 +459,7 @@ def mapping(
"capital": "Boise",
"capital_tz": "America/Denver",
"ap_abbr": "Idaho",
"time_zones": ["America/Denver", "America/Los_Angeles"],
"time_zones": ["America/Denver", "America/Los_Angeles", "America/Boise"],
"name_metaphone": "ITH",
}
)
@@ -611,6 +654,25 @@ def mapping(
)


UM = State(
**{
"fips": "74",
"name": "Midway Islands",
"abbr": "UM",
"is_territory": True,
"is_obsolete": False,
"is_contiguous": False,
"is_continental": False,
"statehood_year": None,
"capital": None,
"capital_tz": "Pacific/Pago_Pago",
"ap_abbr": None,
"time_zones": ["Pacific/Pago_Pago"],
"name_metaphone": "MTW ISLNTS",
}
)


MI = State(
**{
"fips": "26",
@@ -624,7 +686,12 @@ def mapping(
"capital": "Lansing",
"capital_tz": "America/New_York",
"ap_abbr": "Mich.",
"time_zones": ["America/New_York", "America/Chicago"],
"time_zones": [
"America/New_York",
"America/Chicago",
"America/Detroit",
"America/Menominee",
],
"name_metaphone": "MXKN",
}
)
Expand Down
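Note on the `lookup()` changes: the diff does two things, it rewrites the metaphone key through an alias table before matching, and it returns as soon as the cache hits. The following is a rough, self-contained sketch of that flow. It skips the `jellyfish.metaphone` step and takes an already-computed key; `State`, `STATES`, `ALIASES`, `stats`, and `lookup_by_metaphone` are hypothetical stand-ins, and the cache-write behaviour is an assumption since that part of the function is not shown in the diff.

```python
from typing import Dict, NamedTuple, Optional


class State(NamedTuple):
    name: str
    name_metaphone: str


# Hypothetical two-entry stand-in for STATES_AND_TERRITORIES.
STATES = [State("New Hampshire", "N HMPXR"), State("Washington", "WXNKTN")]

# Two entries copied from DEFAULT_ADDITIONAL_METAPHONES in the diff.
ALIASES = {"N HMXR": "N HMPXR", "WXNKTN STT": "WXNKTN"}

_cache: Dict[str, State] = {}
stats = {"scans": 0}  # counts how often the full list is scanned


def lookup_by_metaphone(
    key: str,
    aliases: Dict[str, str] = ALIASES,
    use_cache: bool = True,
) -> Optional[State]:
    # Rewrite known alternates to the canonical key first, so that
    # aliases and canonical spellings share one cache entry.
    key = aliases.get(key, key)

    cache_key = f"name_metaphone:{key}"
    if use_cache and cache_key in _cache:
        # Early return: this is the step the PR adds.
        return _cache[cache_key]

    stats["scans"] += 1
    for state in STATES:
        if state.name_metaphone == key:
            _cache[cache_key] = state  # assumption: only hits are cached
            return state
    return None
```

Calling `lookup_by_metaphone("N HMXR")` twice scans the list once; the second call is served from the cache. The default alias dict is shared between calls, which is safe here only because the function never mutates it.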
15 changes: 15 additions & 0 deletions us/tests/conftest.py
@@ -0,0 +1,15 @@
def pytest_addoption(parser):
    parser.addoption(
        "--timezone",
        action="store_true",
        dest="timezone",
        default=False,
        help="enable checking timezone data against IANA database",
    )
    parser.addoption(
        "--dc-statehood",
        action="store_true",
        dest="dc_statehood",
        default=False,
        help="enable DC statehood tests (you must export DC_STATEHOOD envvar)",
    )
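Note on `--dc-statehood`: the help text says the flag only makes sense when the `DC_STATEHOOD` environment variable is also exported, and the workflow sets both (`DC_STATEHOOD=yes ... --dc-statehood`). A small sketch of that double gate follows. `dc_statehood_tests_enabled` is a hypothetical helper, and how the real tests consult the flag and the variable is an assumption, since those test files are not part of this diff.

```python
import os
from typing import Mapping


def dc_statehood_tests_enabled(
    flag_given: bool,
    environ: Mapping[str, str] = os.environ,
) -> bool:
    """Hypothetical gate: require both the CLI flag and a non-empty envvar."""
    return flag_given and bool(environ.get("DC_STATEHOOD"))
```

Passing the environment as a mapping keeps the check testable without touching the real process environment.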
57 changes: 57 additions & 0 deletions us/tests/test_timezones.py
@@ -0,0 +1,57 @@
import io
import gzip
import tarfile

import pytest


STATES_SHAPEFILE = (
    "https://www2.census.gov/geo/tiger/TIGER2020/STATE/tl_2020_us_state.zip"
)
IANA_TIMEZONES = "https://data.iana.org/time-zones/releases/tzdata2022c.tar.gz"


def timezones():
    import iso6709
    import requests

    timezone_gz = io.BytesIO()
    for chunk in requests.get(IANA_TIMEZONES).iter_content():
        timezone_gz.write(chunk)
    timezone_gz.seek(0)

    gz_fd = gzip.open(timezone_gz)
    tar_fd = tarfile.open(fileobj=gz_fd, format=tarfile.GNU_FORMAT)

    for line in tar_fd.extractfile("zone.tab"):
        if not line.startswith(b"US"):
            continue
        _, coords, tz_name, _ = line.split(b"\t")
        coords = iso6709.Location(coords.decode("ASCII"))
        tz_name = tz_name.decode("ASCII")

        yield (tz_name, coords.lat.decimal, coords.lng.decimal)


@pytest.mark.skipif("not config.getoption('timezone')")
def test_timezone():
    import us.states

    import geopandas as gpd

    state_df = gpd.read_file(STATES_SHAPEFILE)

    timezone_df = gpd.GeoDataFrame().from_records(
        timezones(), columns=["timezone", "lat", "lng"], coerce_float=True
    )
    timezone_df.geometry = gpd.points_from_xy(timezone_df.lng, timezone_df.lat)
    timezone_df: gpd.GeoDataFrame = timezone_df.drop(columns=["lat", "lng"])
    timezone_df = timezone_df.set_crs(crs="EPSG:4326")  # probably?
    timezone_df = timezone_df.to_crs(state_df.crs)

    joined_df = gpd.sjoin(timezone_df, state_df, how="inner", op="within")

    for row in joined_df[["timezone", "STATEFP"]].itertuples(index=False, name=None):
        timezone_name, state_fips = row
        state_obj = us.states.lookup(state_fips, field="fips")
        assert timezone_name in state_obj.time_zones
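
Note on the `zone.tab` parsing in `timezones()`: each row is tab-separated as country code, ISO 6709 coordinates, zone name, and a comment. The sketch below does the same filtering with the standard library only, as a way to see what the `iso6709` dependency contributes. `parse_coords` and `us_zones` are hypothetical names, and the regex covers only the two coordinate layouts that `zone.tab` documents (`±DDMM±DDDMM` and `±DDMMSS±DDDMMSS`).

```python
import re
from typing import Iterable, Iterator, Tuple

# Sign, degrees, minutes, optional seconds for latitude, then the same
# for longitude (which has three degree digits).
_COORD_RE = re.compile(
    r"^([+-])(\d{2})(\d{2})(\d{2})?([+-])(\d{3})(\d{2})(\d{2})?$"
)


def parse_coords(text: str) -> Tuple[float, float]:
    """Convert a zone.tab coordinate string to (lat, lng) in decimal degrees."""
    match = _COORD_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognised coordinate string: {text!r}")
    lat_sign, lat_d, lat_m, lat_s, lng_sign, lng_d, lng_m, lng_s = match.groups()
    lat = int(lat_d) + int(lat_m) / 60 + int(lat_s or 0) / 3600
    lng = int(lng_d) + int(lng_m) / 60 + int(lng_s or 0) / 3600
    return (
        -lat if lat_sign == "-" else lat,
        -lng if lng_sign == "-" else lng,
    )


def us_zones(lines: Iterable[str]) -> Iterator[Tuple[str, float, float]]:
    """Yield (zone name, lat, lng) for rows whose country code is US."""
    for line in lines:
        if line.startswith("#"):
            continue  # zone.tab comment line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3 or fields[0] != "US":
            continue
        lat, lng = parse_coords(fields[1])
        yield fields[2], lat, lng
```

Unlike the four-way unpacking in the test, this version tolerates rows without the trailing comment column, which `zone.tab` marks as optional.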