Tests passing, merge with devel_python3 #4

Open
wants to merge 34 commits into
base: master
from

Conversation

troycomi
Contributor

@troycomi troycomi commented Apr 9, 2019

Had to revert some of the argument parsing to get the unit tests passing faster. Will work on replacing the argument parsing with click/yaml.

But is ~10x slower, likely due to seeking through file
Changed the implementation of Region_Reader to yield headers and seqs.
This manages to cut the memory use (somehow from 60 to 1 MB) and the runtime
from 2 minutes to 10 seconds (last commit at 13 minutes).
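
As a rough illustration of why yielding records keeps memory flat, here is a minimal sketch of a generator-based reader. It is not the actual Region_Reader: the function name `iter_regions`, the `#` header prefix, and the file layout are assumptions made only for this example.

```python
import io
from typing import Iterator, List, TextIO, Tuple


def iter_regions(handle: TextIO, header_prefix: str = "#") -> Iterator[Tuple[str, List[str]]]:
    """Yield (header, sequence_lines) one region at a time.

    Assumes (hypothetically) that each region starts with a header line
    beginning with `header_prefix`, followed by its sequence lines.
    """
    header = None
    seqs: List[str] = []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(header_prefix):
            # A new header closes the previous region, if there was one.
            if header is not None:
                yield header, seqs
            header, seqs = line, []
        elif header is not None and line:
            seqs.append(line)
    # Emit the final region once the file is exhausted.
    if header is not None:
        yield header, seqs


# Only one region is held in memory at any time.
demo = io.StringIO("#r1\nACGT\nAC-T\n#r2\nGGGG\n")
regions = list(iter_regions(demo))
```

Because the caller consumes one region per iteration, peak memory depends on the largest region rather than on the file size.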
Changed filter2 to use numpy and the new yielded regions
Execution is ~10s and uses 1.5 MB memory
In addition to floating point precision differences, noticed a difference
in the sorting of alt ids when the values are equal going from Python 2 to
3.  Handled the differences in the comparison scripts, which format to 10
digits and sort ids when values are equal.
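
The normalization described above could look roughly like the sketch below. The function name `normalize`, the `(alt_id, value)` row shape, and the reading of "10 digits" as 10 decimal places are assumptions for illustration, not the actual comparison scripts.

```python
from typing import List, Tuple


def normalize(rows: List[Tuple[str, float]], digits: int = 10) -> List[Tuple[str, str]]:
    """Round values to `digits` decimal places and break ties by id.

    Rounding hides floating point noise; sorting on (value, id) makes the
    order deterministic when two values are equal.
    """
    rounded = [(round(value, digits), alt_id) for alt_id, value in rows]
    rounded.sort()
    return [(alt_id, f"{value:.{digits}f}") for value, alt_id in rounded]


# Two outputs that differ only in float noise and tie order compare equal.
old_rows = [("b", 0.1 + 0.2), ("a", 0.3)]
new_rows = [("a", 0.3), ("b", 0.3)]
same = normalize(old_rows) == normalize(new_rows)
```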
Changed operation of threshold scan to limit number of read throughs of
the region files.
Limited changes as original method was fairly fast
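
One way to limit read-throughs is to evaluate every threshold while iterating over the data once. The sketch below assumes a hypothetical `count_passing` helper and a simple "score >= threshold" rule; the real scan's filtering criteria are not described here.

```python
from typing import Dict, Iterable, Sequence


def count_passing(scores: Iterable[float], thresholds: Sequence[float]) -> Dict[float, int]:
    """Count how many scores meet each threshold in a single pass."""
    counts = {t: 0 for t in thresholds}
    for score in scores:  # `scores` may be a generator that reads the file once
        for t in thresholds:
            if score >= t:
                counts[t] += 1
    return counts


result = count_passing(iter([0.2, 0.95, 0.99]), [0.9, 0.98])
```

The input is consumed exactly once regardless of how many thresholds are scanned.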
Conflicts:
	code/analyze/filter_1_main.py
	code/analyze/filter_2_main.py
	code/analyze/filter_helpers.py
	code/analyze/id_regions_main.py
	code/analyze/predict.py
	code/analyze/predict_main.py
	code/analyze/summarize_region_quality.py

Unit tests running, but failing on new implementation of args/gp
Mocked and encoded constants enough to get things passing
Added a yaml version of the config to replace global params and setup
args files.  Helper script clean_config performs lookups of referenced
entries in config.  Have not added to main methods yet.
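
To make the idea of looking up referenced entries concrete, here is a minimal sketch. The `{name}` placeholder syntax, the function name `resolve_references`, and the flat string-valued dict (standing in for the parsed yaml) are all assumptions; the actual clean_config script may work differently.

```python
import re
from typing import Dict

# Hypothetical reference syntax: "{other_key}" inside a value.
_REF = re.compile(r"\{(\w+)\}")


def resolve_references(config: Dict[str, str], max_depth: int = 10) -> Dict[str, str]:
    """Expand {name} placeholders using other top-level entries.

    Repeats until nothing changes (or `max_depth` passes), so chained
    references resolve. Unknown names are left untouched.
    """
    resolved = dict(config)
    for _ in range(max_depth):
        changed = False
        for key, value in resolved.items():
            new_value = _REF.sub(
                lambda m: resolved.get(m.group(1), m.group(0)), value
            )
            if new_value != value:
                resolved[key] = new_value
                changed = True
        if not changed:
            break
    return resolved


paths = resolve_references({
    "root": "/data",
    "analysis": "{root}/analysis",
    "regions": "{analysis}/regions.txt",
})
```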
Started moving predict methods into a Predictor class to
clean up the long argument lists for performing a prediction.

Need to test new code and get old tests passing again with minimal
object
Changed the implementation of predict_main.py into click with support
for the new yaml configuration file.  Refactored predict into two main
objects to simplify the main code.

Added a README.md
Modified behavior with missing files to match previous implementation
(continue).  Currently matching original implementation on chromosome 1.
Added log file option and config.

When set, a progress bar is displayed on the console with click.
Added class for adding region ids and integrated with the main click
method.
Created new class to hold configuration and handle setting logic, as
that was heavily reused between main methods.
Move the validation code to the corresponding main objects
Seeing that positions is required for summarize, went back and made that
required.
Refactored summarize region quality main and supporting module with
cleaner implementation.  Added to main click method and all supporting
unit tests.
Combined two filter steps, helper functions, and threshold sweep into a
single file for further refactoring.
Continued refactoring of main methods, moving on to filtering.  Part of
the changes modified the configuration object to give the setting code a
more uniform interface.  Have started running flake8 on the entire
project, fixing issues occasionally.
Finished refactor and testing of summarize_strain_states
Finished formatting the code to be consistent with flake8