Merge pull request #13 from NREL/pp/ords
ELM-based ordinance retrieval and extraction
Showing 91 changed files with 17,864 additions and 134 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,29 @@ | ||
[run] | ||
branch = True | ||
|
||
[report] | ||
# Regexes for lines to exclude from consideration | ||
exclude_lines = | ||
    # Have to re-enable the standard pragma | ||
    pragma: no cover | ||
|
||
    # Don't complain about missing debug-only code: | ||
    def __repr__ | ||
    if self\.debug | ||
|
||
    # Don't complain if tests don't hit defensive assertion code: | ||
    raise AssertionError | ||
    raise NotImplementedError | ||
|
||
    # Don't complain if non-runnable code isn't run: | ||
    if __name__ == .__main__.: | ||
|
||
    # Don't complain about abstract methods, they aren't run: | ||
    @(abc\.)?abstractmethod | ||
|
||
|
||
omit = | ||
    # omit test files | ||
    tests/* | ||
    # omit setup file | ||
    setup.py |
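The `exclude_lines` entries above are regular expressions, and coverage.py excludes any source line that contains a match for one of them. The sketch below only approximates that behavior with plain `re.search`; it is not coverage.py's own code, and the helper name `is_excluded` and the sample lines are made up for illustration.

```python
import re

# Patterns copied from the exclude_lines setting in the config above.
EXCLUDE_PATTERNS = [
    r"pragma: no cover",
    r"def __repr__",
    r"if self\.debug",
    r"raise AssertionError",
    r"raise NotImplementedError",
    r"if __name__ == .__main__.:",
    r"@(abc\.)?abstractmethod",
]


def is_excluded(line: str) -> bool:
    """Return True if any exclusion regex matches somewhere in the line.

    Hypothetical helper: a simplified stand-in for coverage.py's matching.
    """
    return any(re.search(pattern, line) for pattern in EXCLUDE_PATTERNS)


# Illustrative source lines (not taken from the ELM codebase).
samples = [
    'if __name__ == "__main__":',
    "    @abc.abstractmethod",
    "    @abstractmethod",
    "    raise NotImplementedError",
    "    x = compute()  # pragma: no cover",
    "    x = compute()",
]

for sample in samples:
    print(f"{sample!r}: excluded={is_excluded(sample)}")
```

The `.` wildcards around `__main__` let one pattern cover both single- and double-quoted spellings of the guard.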
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,49 @@ | ||
name: pytests-ords | ||
|
||
on: pull_request | ||
|
||
jobs: | ||
  build: | ||
    runs-on: ${{ matrix.os }} | ||
    strategy: | ||
      fail-fast: false | ||
      matrix: | ||
        os: [ubuntu-latest, macos-latest, windows-latest] | ||
        python-version: [3.11] | ||
|
||
    steps: | ||
    - uses: actions/checkout@v2 | ||
      with: | ||
        ref: ${{ github.event.pull_request.head.ref }} | ||
        fetch-depth: 1 | ||
    - name: Set up Python ${{ matrix.python-version }} | ||
      uses: conda-incubator/setup-miniconda@v2 | ||
      with: | ||
        auto-update-conda: true | ||
        python-version: ${{ matrix.python-version }} | ||
        miniconda-version: "latest" | ||
    - name: Install dependencies | ||
      shell: bash -l {0} | ||
      run: | | ||
        conda install -c conda-forge poppler | ||
        python -m pip install --upgrade pip | ||
        python -m pip install pdftotext | ||
        python -m pip install pytest | ||
        python -m pip install pytest-mock | ||
        python -m pip install pytest-cov | ||
        python -m pip install . | ||
        playwright install | ||
    - name: Run pytest and Generate coverage report | ||
      shell: bash -l {0} | ||
      run: | | ||
        python -m pytest -v --disable-warnings --cov=./ --cov-report=xml:coverage.xml | ||
    - name: Upload coverage to Codecov | ||
      uses: codecov/codecov-action@v1 | ||
      with: | ||
        token: ${{ secrets.CODECOV_TOKEN }} | ||
        file: ./coverage.xml | ||
        flags: unittests | ||
        env_vars: OS,PYTHON | ||
        name: codecov-umbrella | ||
        fail_ci_if_error: false | ||
        verbose: true |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
.. _cli-docs: | ||
|
||
Command Line Interfaces (CLIs) | ||
============================== | ||
|
||
.. toctree:: | ||
|
||
   elm |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
.. click:: elm.cli:main | ||
   :prog: elm | ||
   :nested: full |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
.. include:: ../../examples/ordinance_gpt/README.rst | ||
   :start-line: 0 |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -3,3 +3,4 @@ Examples | |
.. toctree:: | ||
|
||
   examples.energy_wizard.rst | ||
   examples.ordinance_gpt.rst |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,47 @@ | ||
# -*- coding: utf-8 -*- | ||
# fmt: off | ||
"""ELM Ordinances CLI.""" | ||
import sys | ||
import json | ||
import click | ||
import asyncio | ||
import logging | ||
|
||
from elm.version import __version__ | ||
from elm.ords.process import process_counties_with_openai | ||
|
||
|
||
@click.group() | ||
@click.version_option(version=__version__) | ||
@click.pass_context | ||
def main(ctx): | ||
    """ELM ordinances command line interface.""" | ||
    ctx.ensure_object(dict) | ||
|
||
|
||
@main.command() | ||
@click.option("--config", "-c", required=True, type=click.Path(exists=True), | ||
              help="Path to ordinance configuration JSON file. This file " | ||
              "should contain any/all the arguments to pass to " | ||
              ":func:`elm.ords.process.process_counties_with_openai`.") | ||
@click.option("-v", "--verbose", is_flag=True, | ||
              help="Flag to show logging on the terminal. Default is not " | ||
              "to show any logs on the terminal.") | ||
def ords(config, verbose): | ||
    """Download and extract ordinances for a list of counties.""" | ||
    with open(config, "r") as fh: | ||
        config = json.load(fh) | ||
|
||
    if verbose: | ||
        logger = logging.getLogger("elm") | ||
        logger.addHandler(logging.StreamHandler(stream=sys.stdout)) | ||
        logger.setLevel(config.get("log_level", "INFO")) | ||
|
||
    # asyncio.run(...) doesn't throw exceptions correctly for some reason... | ||
    loop = asyncio.get_event_loop() | ||
    loop.run_until_complete(process_counties_with_openai(**config)) | ||
|
||
|
||
if __name__ == "__main__": | ||
    # pylint: disable=no-value-for-parameter | ||
    main(obj={}) |
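The `ords` command above loads a JSON file and unpacks it as keyword arguments into an async entry point, which it drives to completion on an event loop. The sketch below reproduces only that flow, using the standard library. `fake_process_counties` is a hypothetical stand-in for `process_counties_with_openai`, and the config keys `out_dir` and `county_fp` are invented for illustration; the real function's parameters are not shown in this diff. The sketch uses `asyncio.new_event_loop()` instead of `asyncio.get_event_loop()` only because calling the latter without a current loop is deprecated on recent Python versions.

```python
import asyncio
import json
import tempfile
from pathlib import Path


async def fake_process_counties(out_dir, log_level="INFO", **kwargs):
    """Hypothetical stand-in for process_counties_with_openai.

    It simply echoes its arguments so the unpacking is visible.
    """
    await asyncio.sleep(0)  # yield control once, like real async work would
    return {"out_dir": out_dir, "log_level": log_level, "extra": sorted(kwargs)}


def run_from_config(config_path):
    """Load a JSON config and pass every key as a keyword argument."""
    with open(config_path, "r", encoding="utf-8") as fh:
        config = json.load(fh)

    # Drive the coroutine on an explicitly created loop and always close it.
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(fake_process_counties(**config))
    finally:
        loop.close()


with tempfile.TemporaryDirectory() as tmp:
    config_file = Path(tmp) / "config.json"
    # All keys below are assumptions made for this example only.
    example_config = {
        "out_dir": "./outputs",
        "log_level": "DEBUG",
        "county_fp": "counties.csv",
    }
    config_file.write_text(json.dumps(example_config), encoding="utf-8")
    result = run_from_config(config_file)

print(result)
```

Because the whole file is unpacked with `**config`, any key the target function does not accept raises a `TypeError`, so the JSON keys have to match its parameter names.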
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
# -*- coding: utf-8 -*- | ||
"""Custom Exceptions and Errors for ELM.""" | ||
|
||
|
||
class ELMError(Exception): | ||
    """Generic ELM Error.""" | ||
|
||
|
||
class ELMRuntimeError(ELMError, RuntimeError): | ||
    """ELM RuntimeError.""" |
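Because `ELMRuntimeError` inherits from both `ELMError` and the built-in `RuntimeError`, callers can catch it through either base class. A minimal sketch follows, with the two classes re-declared so it runs on its own; the `risky` function and its message are hypothetical.

```python
class ELMError(Exception):
    """Generic ELM Error."""


class ELMRuntimeError(ELMError, RuntimeError):
    """ELM RuntimeError."""


def risky():
    """Hypothetical function that fails with the ELM-specific error."""
    raise ELMRuntimeError("hypothetical failure")


caught_as = []
for exc_type in (ELMError, RuntimeError):
    try:
        risky()
    except exc_type:
        # The same exception is caught via the ELM base and the builtin base.
        caught_as.append(exc_type.__name__)

print(caught_as)
```

This lets ELM-aware code handle every package error with `except ELMError`, while generic code that only knows `RuntimeError` still catches it.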
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,27 @@ | ||
# Welcome to Energy Language Model - OrdinanceGPT | ||
|
||
The ordinance web scraping and data extraction portion of this codebase requires a few extra dependencies that do not come out-of-the-box with the base ELM software. | ||
To set up ELM for ordinances, first create a conda environment. Then, _before installing ELM_, run the poppler installation: | ||
|
||
    $ conda install -c conda-forge poppler | ||
|
||
Then, install `pdftotext`: | ||
|
||
    $ pip install pdftotext | ||
|
||
(OPTIONAL) If you want Optical Character Recognition (OCR) for PDF parsing, also install pytesseract and pdf2image during this step: | ||
|
||
    $ pip install pytesseract pdf2image | ||
|
||
At this point, you can install ELM per the [front-page README](https://github.com/NREL/elm/blob/main/README.rst) instructions, e.g.: | ||
|
||
    $ pip install -e . | ||
|
||
After ELM installs successfully, you must set up Playwright, which is used for web scraping, by installing the browsers it drives. | ||
To do so, run: | ||
|
||
    $ playwright install | ||
|
||
Now you are ready to run ordinance retrieval and extraction. See the [example](https://github.com/NREL/elm/blob/main/examples/ordinance_gpt/README.rst) to get started. If you hit additional import errors, install the missing packages as needed, e.g.: | ||
|
||
    $ pip install beautifulsoup4 html5lib |
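Before running the example, it can help to check which of the packages named above are importable in the active environment. The sketch below is not part of ELM; the helper `missing_packages` is hypothetical and relies only on `importlib`. The mapping pairs each pip name from the instructions with its import name (for example, `beautifulsoup4` is imported as `bs4`).

```python
import importlib.util

# pip package name -> import name, for the packages mentioned above.
ORDINANCE_PACKAGES = {
    "pdftotext": "pdftotext",
    "pytesseract": "pytesseract",  # optional, only needed for OCR
    "pdf2image": "pdf2image",      # optional, only needed for OCR
    "playwright": "playwright",
    "beautifulsoup4": "bs4",
    "html5lib": "html5lib",
}


def missing_packages(packages=None):
    """Return the pip names whose import name cannot be found.

    Hypothetical helper: it only checks importability, not versions.
    """
    if packages is None:
        packages = ORDINANCE_PACKAGES
    return [
        pip_name
        for pip_name, module_name in packages.items()
        if importlib.util.find_spec(module_name) is None
    ]


missing = missing_packages()
if missing:
    print("Consider installing:", " ".join(missing))
else:
    print("All listed packages are importable.")
```

An importable `playwright` package does not mean its browsers are present; `playwright install` is still needed for that.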
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
"""ELM ordinance document download and structured data extraction.""" |