-
Notifications
You must be signed in to change notification settings - Fork 2
Updating PreparererStep, creating PreparerMapping. CR changes to DataCleaner. #111
Conversation
…adow internal workflow.
… DataPreparer; 3. DataCleaner functionality
…plementation testing its usage. Merge remote-tracking branch 'remotes/origin/development' into data_cleaner # Conflicts: # foreshadow/exceptions.py # foreshadow/transformers/base.py # pytest.ini
Merge in latest development.
Data Preparer and PreparerSteps base class and Data Cleaner
Merging in config from development. Merge branch 'development' into data_preparer # Conflicts: # foreshadow/core/serializers.py # foreshadow/tests/test_transformers/test_smart.py # pytest.ini
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few more comments.
data = pd.DataFrame({"financials": np.arange(10)}, columns=columns) | ||
cs = ColumnSharer() | ||
dc = DataCleaner(column_sharer=cs) | ||
transformed_data = dc.fit_transform(data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might you be able to test them in the same test if they require the same setup (I know I know it breaks my cardinal rule but creating a fixture isn't worth the overhead)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll look at it.
…hanges. # Conflicts: # foreshadow/cleaners/data_cleaner.py # foreshadow/cleaners/internals/__init__.py # foreshadow/core/__init__.py # foreshadow/core/preparerstep.py # foreshadow/core/smarttransformer.py # foreshadow/core/wrapper.py # foreshadow/foreshadow.py # foreshadow/intents/registry.py # foreshadow/preprocessor.py # foreshadow/tests/test_transformers/test_transformers.py # foreshadow/transformers/concrete/externals.py # foreshadow/transformers/concrete/internals/__init__.py # foreshadow/transformers/core/pipeline.py # foreshadow/transformers/smart/smart.py
Included: tests skipped or changed. Some left failing to change as we integrate DataPreparer. V1 components removed V2 file structure in place with proper import system (some small changes still to be made).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some comments lets chat.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few more comments.
foreshadow/intents/__init__.py
Outdated
@@ -0,0 +1,9 @@ | |||
"""Intents package used by IntentMapper PreparerStep.""" | |||
from foreshadow.intents.base import BaseIntent |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be relative
@@ -0,0 +1 @@ | |||
"""old stuff just kept for easier referencing.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sure to ignore this folder in coverage
foreshadow/preparer/preparer.py
Outdated
@@ -2,9 +2,9 @@ | |||
|
|||
from sklearn.pipeline import Pipeline | |||
|
|||
from foreshadow.intents import IntentMapper |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't this be in steps
foreshadow/preparer/preparerstep.py
Outdated
@@ -137,9 +137,12 @@ def _check_parallelizable_batch(group_process, group_number): | |||
""" | |||
if group_process.single_input is not None: | |||
inputs = group_process.single_input | |||
steps = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add the None option to the docs
@@ -7,12 +7,12 @@ def test_resolver_overall(): | |||
import numpy as np | |||
import pandas as pd | |||
from foreshadow.preparer.column_sharer import ColumnSharer | |||
from foreshadow.preparer.steps.resolver import ResolverMapper | |||
from foreshadow.preparer.steps.mapper import IntentMapper |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This just needs to be imported from preparer.steps
@@ -44,6 +44,8 @@ def test_data_cleaner_fit(): | |||
"financials", | |||
], | |||
) | |||
print(data.values) | |||
print(check.values) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove these prints
from foreshadow.preparer.steps import CleanerMapper | ||
from foreshadow.preparer.column_sharer import ColumnSharer | ||
from foreshadow.preparer import CleanerMapper | ||
from foreshadow.preparer import ColumnSharer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here
from foreshadow.preparer.steps import CleanerMapper | ||
from foreshadow.preparer.column_sharer import ColumnSharer | ||
from foreshadow.preparer import CleanerMapper | ||
from foreshadow.preparer import ColumnSharer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This entire file
from foreshadow.concrete.internals import FixedTfidfVectorizer | ||
from foreshadow.concrete.internals import HTMLRemover | ||
from foreshadow.concrete import FixedTfidfVectorizer | ||
from foreshadow.concrete import HTMLRemover |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same line
"""Get the cache path which is in the config directory. | ||
|
||
Note: | ||
This function also makes the directory if it does not already exist. | ||
|
||
Args: | ||
path (str): A path to override the cache save directory path. | ||
|
||
Returns: | ||
str; The path to the cache directory. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be a colon not semi-colon
Description
Fixes #109