Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recursive search for custom.py can lead to errors #605

Open
kindofluke opened this issue Mar 25, 2022 · 1 comment
Open

Recursive search for custom.py can lead to errors #605

kindofluke opened this issue Mar 25, 2022 · 1 comment
Assignees

Comments

@kindofluke
Copy link

kindofluke commented Mar 25, 2022

What: Within the model adapter, DRUM asserts that there can be only one custom.py file within the code folder.

Here is the snippet at line 139:

    def load_custom_hooks(self):
        custom_file_paths = list(Path(self._model_dir).rglob("{}.py".format(CUSTOM_FILE_NAME)))
        assert len(custom_file_paths) <= 1

the problem lies with the rglob function on a path. The full recursive search for custom.py means the the system will search. all subdirectories in the code folder and find any custom.py file. Several major python packages include a custom.py for other reasons and those files are found within the virtual environment folder.

When multiple custom.py files are found, the assertion on line 139 will fail throwing an error.

Example

The openpyxl has several files named custom.py and those all get captured if your virtual environment is part of the code folder.

Here is a look at custom_file_paths in a debugger.
venv_custompy

The assertion will then fail.

I believe that I adopt a fairly common practice of having my virtual environment in my code folder and adding the environment to .gitignore

solution

I think changing rglob to glob will fix this issue. I have tested this locally but I am not able to install all the prerequisites for running the test suite so I can't submit a branch.

@drenfr01 drenfr01 self-assigned this Mar 25, 2022
@kindofluke
Copy link
Author

I was thinking about this more and there may one unintended consequence. I think DRUM was set-up so that users could have kind of a model mono-repo such that they could have a lot of different code and maybe have a subfolder that had a custom.py for datarobot. So consider this structure:

really_cool_predictor/
├─ webapp/
│  ├─ flask.py
├─ model_development.ipynb
├─ artifact.pkl
├─ datarobot-drum/
│  ├─ custom.py

In this case, DRUM won't find the custom.py without using rglob. A couple of ideas,

  1. look for a gitignore file and use the gitignore rules to find the file. This would be cool but would require some work
  2. use the first custom.py entry. The return from rglob is actually ordered. so a custom.py in the main code folder or its sub-folders would always appear before custom.py files buried deep in venv

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants