Merge pull request #13 from NREL/pp/ords
ELM-based ordinance retrieval and extraction
ppinchuk authored May 1, 2024
2 parents d9d9791 + 8c32394 commit 42e9ed6
Showing 91 changed files with 17,864 additions and 134 deletions.
29 changes: 29 additions & 0 deletions .coveragerc
@@ -0,0 +1,29 @@
[run]
branch = True

[report]
# Regexes for lines to exclude from consideration
exclude_lines =
    # Have to re-enable the standard pragma
    pragma: no cover

    # Don't complain about missing debug-only code:
    def __repr__
    if self\.debug

    # Don't complain if tests don't hit defensive assertion code:
    raise AssertionError
    raise NotImplementedError

    # Don't complain if non-runnable code isn't run:
    if __name__ == .__main__.:

    # Don't complain about abstract methods, they aren't run:
    @(abc\.)?abstractmethod


omit =
    # omit test files
    tests/*
    # omit setup file
    setup.py
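
Reviewer note: the `exclude_lines` entries above are regular expressions, which is why `if __name__ == .__main__.:` uses `.` in place of the quote characters. The sketch below is only an approximation of that matching, assuming each pattern is searched for anywhere in a source line; `is_excluded` is a hypothetical helper and not part of coverage.py.

```python
import re

# Patterns copied from the exclude_lines list in the .coveragerc above.
EXCLUDE_PATTERNS = [
    r"pragma: no cover",
    r"def __repr__",
    r"if self\.debug",
    r"raise AssertionError",
    r"raise NotImplementedError",
    r"if __name__ == .__main__.:",
    r"@(abc\.)?abstractmethod",
]


def is_excluded(line):
    """Return True if any pattern matches somewhere in the source line.

    Assumption: patterns are applied with search semantics (partial match).
    """
    return any(re.search(pattern, line) for pattern in EXCLUDE_PATTERNS)


# Print the verdict for a few representative source lines.
for sample in [
    'if __name__ == "__main__":',
    "if __name__ == '__main__':",
    "    @abc.abstractmethod",
    "    return self.value",
]:
    print(f"{sample!r}: {is_excluded(sample)}")
```

Because `.` matches any single character, one pattern covers both the single-quoted and double-quoted forms of the `__main__` guard.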
18 changes: 4 additions & 14 deletions .github/workflows/pytest.yml
@@ -9,12 +9,12 @@ jobs:
       fail-fast: false
       matrix:
         os: [ubuntu-latest, macos-latest, windows-latest]
-        python-version: ['3.10']
+        python-version: [3.11]
         include:
           - os: ubuntu-latest
-            python-version: 3.9
+            python-version: '3.10'
           - os: ubuntu-latest
-            python-version: 3.8
+            python-version: 3.9

     steps:
     - uses: actions/checkout@v2
@@ -34,14 +34,4 @@ jobs:
         python -m pip install .
     - name: Run pytest and Generate coverage report
       run: |
-        python -m pytest -v --disable-warnings --cov=./ --cov-report=xml:coverage.xml
-    - name: Upload coverage to Codecov
-      uses: codecov/codecov-action@v1
-      with:
-        token: ${{ secrets.CODECOV_TOKEN }}
-        file: ./coverage.xml
-        flags: unittests
-        env_vars: OS,PYTHON
-        name: codecov-umbrella
-        fail_ci_if_error: false
-        verbose: true
+        python -m pytest --ignore=tests/ords --ignore=tests/utilities --ignore=tests/web -v --disable-warnings
49 changes: 49 additions & 0 deletions .github/workflows/pytest_ords.yml
@@ -0,0 +1,49 @@
name: pytests-ords

on: pull_request

jobs:
  build:
    runs-on: ${{ matrix.os }}
    strategy:
      fail-fast: false
      matrix:
        os: [ubuntu-latest, macos-latest, windows-latest]
        python-version: [3.11]

    steps:
    - uses: actions/checkout@v2
      with:
        ref: ${{ github.event.pull_request.head.ref }}
        fetch-depth: 1
    - name: Set up Python ${{ matrix.python-version }}
      uses: conda-incubator/setup-miniconda@v2
      with:
        auto-update-conda: true
        python-version: ${{ matrix.python-version }}
        miniconda-version: "latest"
    - name: Install dependencies
      shell: bash -l {0}
      run: |
        conda install -c conda-forge poppler
        python -m pip install --upgrade pip
        python -m pip install pdftotext
        python -m pip install pytest
        python -m pip install pytest-mock
        python -m pip install pytest-cov
        python -m pip install .
        playwright install
    - name: Run pytest and Generate coverage report
      shell: bash -l {0}
      run: |
        python -m pytest -v --disable-warnings --cov=./ --cov-report=xml:coverage.xml
    - name: Upload coverage to Codecov
      uses: codecov/codecov-action@v1
      with:
        token: ${{ secrets.CODECOV_TOKEN }}
        file: ./coverage.xml
        flags: unittests
        env_vars: OS,PYTHON
        name: codecov-umbrella
        fail_ci_if_error: false
        verbose: true
3 changes: 3 additions & 0 deletions README.rst
@@ -33,6 +33,9 @@ Installing ELM

 .. inclusion-install
+NOTE: If you are installing ELM to run ordinance scraping and extraction,
+see the `ordinance-specific installation instructions <https://github.com/NREL/elm/blob/main/elm/ords/README.md>`_.
+
 Option #1 (basic usage):

 #. ``pip install NREL-elm``
8 changes: 8 additions & 0 deletions docs/source/_cli/cli.rst
@@ -0,0 +1,8 @@
.. _cli-docs:

Command Line Interfaces (CLIs)
==============================

.. toctree::

   elm
3 changes: 3 additions & 0 deletions docs/source/_cli/elm.rst
@@ -0,0 +1,3 @@
.. click:: elm.cli:main
   :prog: elm
   :nested: full
2 changes: 2 additions & 0 deletions docs/source/examples.ordinance_gpt.rst
@@ -0,0 +1,2 @@
.. include:: ../../examples/ordinance_gpt/README.rst
   :start-line: 0
1 change: 1 addition & 0 deletions docs/source/examples.rst
@@ -3,3 +3,4 @@ Examples
 .. toctree::

    examples.energy_wizard.rst
+   examples.ordinance_gpt.rst
1 change: 1 addition & 0 deletions docs/source/index.rst
@@ -5,5 +5,6 @@
    Installation <installation.rst>
    Examples <examples.rst>
    API reference <_autosummary/elm>
+   CLI reference <_cli/cli>

 .. include:: ../../README.rst
47 changes: 47 additions & 0 deletions elm/cli.py
@@ -0,0 +1,47 @@
# -*- coding: utf-8 -*-
# fmt: off
"""ELM Ordinances CLI."""
import sys
import json
import click
import asyncio
import logging

from elm.version import __version__
from elm.ords.process import process_counties_with_openai


@click.group()
@click.version_option(version=__version__)
@click.pass_context
def main(ctx):
    """ELM ordinances command line interface."""
    ctx.ensure_object(dict)


@main.command()
@click.option("--config", "-c", required=True, type=click.Path(exists=True),
              help="Path to ordinance configuration JSON file. This file "
                   "should contain any/all the arguments to pass to "
                   ":func:`elm.ords.process.process_counties_with_openai`.")
@click.option("-v", "--verbose", is_flag=True,
              help="Flag to show logging on the terminal. Default is not "
                   "to show any logs on the terminal.")
def ords(config, verbose):
    """Download and extract ordinances for a list of counties."""
    with open(config, "r") as fh:
        config = json.load(fh)

    if verbose:
        logger = logging.getLogger("elm")
        logger.addHandler(logging.StreamHandler(stream=sys.stdout))
        logger.setLevel(config.get("log_level", "INFO"))

    # asyncio.run(...) doesn't throw exceptions correctly for some reason...
    loop = asyncio.get_event_loop()
    loop.run_until_complete(process_counties_with_openai(**config))


if __name__ == "__main__":
    # pylint: disable=no-value-for-parameter
    main(obj={})
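
Reviewer note: the `ords` command above reads a JSON file and unpacks it as keyword arguments into a coroutine that is run on an event loop. The sketch below illustrates only that pattern and is not the ELM implementation. `process_stub` is a hypothetical stand-in for `process_counties_with_openai`, the `out_dir` key is an assumed example argument, and the loop is created with `asyncio.new_event_loop()` rather than `asyncio.get_event_loop()` so the example stays self-contained.

```python
import asyncio
import json
import tempfile
from pathlib import Path


async def process_stub(out_dir, log_level="INFO"):
    """Hypothetical stand-in for the real coroutine; it only echoes its inputs."""
    await asyncio.sleep(0)
    return {"out_dir": out_dir, "log_level": log_level}


def run_from_config(config_path):
    """Load a JSON config and pass every key as a keyword argument."""
    with open(config_path, "r") as fh:
        config = json.load(fh)

    # Run the coroutine to completion on a dedicated loop, then clean up.
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(process_stub(**config))
    finally:
        loop.close()


# Write a throwaway config file and run the stub against it.
with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "config.json"
    config_path.write_text(
        json.dumps({"out_dir": "./outputs", "log_level": "DEBUG"})
    )
    result = run_from_config(config_path)

print(result)
```

One consequence of the `**config` unpacking is that any key in the JSON file that the target function does not accept raises a `TypeError`, so the config file has to mirror the function signature exactly.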
10 changes: 10 additions & 0 deletions elm/exceptions.py
@@ -0,0 +1,10 @@
# -*- coding: utf-8 -*-
"""Custom Exceptions and Errors for ELM. """


class ELMError(Exception):
    """Generic ELM Error."""


class ELMRuntimeError(ELMError, RuntimeError):
    """ELM RuntimeError."""
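
Reviewer note: because `ELMRuntimeError` inherits from both `ELMError` and `RuntimeError`, callers can catch it through either base class. The sketch below re-declares the two classes so it runs on its own; `classify` is a hypothetical helper added purely for illustration.

```python
class ELMError(Exception):
    """Generic ELM Error (re-declared here so the sketch is self-contained)."""


class ELMRuntimeError(ELMError, RuntimeError):
    """ELM RuntimeError (re-declared here so the sketch is self-contained)."""


def classify(exc_type):
    """Raise the given exception type and report which handler caught it."""
    try:
        raise exc_type("something went wrong")
    except ELMError:
        return "caught as ELMError"
    except RuntimeError:
        return "caught as RuntimeError"


print(classify(ELMRuntimeError))
print(classify(RuntimeError))
print(issubclass(ELMRuntimeError, RuntimeError))
```

Existing code that already handles `RuntimeError` keeps working, while new code can catch `ELMError` to handle only errors raised by ELM.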
27 changes: 27 additions & 0 deletions elm/ords/README.md
@@ -0,0 +1,27 @@
# Welcome to Energy Language Model - OrdinanceGPT

The ordinance web scraping and data extraction portion of this codebase requires a few extra dependencies that do not come out-of-the-box with the base ELM software.
To set up ELM for ordinances, first create a conda environment. Then, _before installing ELM_, install poppler:

    $ conda install -c conda-forge poppler

Then, install `pdftotext`:

    $ pip install pdftotext

(OPTIONAL) If you want access to Optical Character Recognition (OCR) for PDF parsing, also install `pytesseract` and `pdf2image` during this step:

    $ pip install pytesseract pdf2image

At this point, you can install ELM per the [front-page README](https://github.com/NREL/elm/blob/main/README.rst) instructions, e.g.:

    $ pip install -e .

After ELM installs successfully, you must download the browsers used by Playwright, the module that handles web scraping.
To do so, run:

    $ playwright install

Now you are ready to run ordinance retrieval and extraction. See the [example](https://github.com/NREL/elm/blob/main/examples/ordinance_gpt/README.rst) to get started. If you get additional import errors, install the missing packages as necessary, e.g.:

    $ pip install beautifulsoup4 html5lib
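
Reviewer note: the installation steps above can be checked before a first run with a short script along these lines. This is a minimal sketch, not part of ELM. `missing_modules` is a hypothetical helper, and the names listed are the import names of the packages mentioned in this README (`bs4` is the import name of `beautifulsoup4`).

```python
from importlib.util import find_spec

# Import names for the packages mentioned in the README above.
ORDINANCE_MODULES = [
    "pdftotext",
    "playwright",
    "bs4",
    "html5lib",
    # Optional, only needed for OCR:
    "pytesseract",
    "pdf2image",
]


def missing_modules(modules):
    """Return the top-level module names that cannot be found."""
    return [name for name in modules if find_spec(name) is None]


# Report what still needs to be installed in the current environment.
missing = missing_modules(ORDINANCE_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All ordinance dependencies were found.")
```

This only confirms that the Python packages are importable. It does not verify that poppler is installed or that `playwright install` has downloaded its browsers.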
1 change: 1 addition & 0 deletions elm/ords/__init__.py
@@ -0,0 +1 @@
"""ELM ordinance document download and structured data extraction. """