Skip to content
This repository has been archived by the owner on Jan 9, 2024. It is now read-only.

Updating PreparererStep, creating PreparerMapping. CR changes to DataCleaner. #111

Merged
merged 57 commits into from
Aug 2, 2019

Conversation

cchoquette
Copy link
Contributor

@cchoquette cchoquette commented Jul 30, 2019

Description

Fixes #109

cchoquette and others added 30 commits July 11, 2019 13:16
…plementation testing its usage.

Merge remote-tracking branch 'remotes/origin/development' into data_cleaner

# Conflicts:
#	foreshadow/exceptions.py
#	foreshadow/transformers/base.py
#	pytest.ini
Merge in latest development.
Data Preparer and PreparerSteps base class and Data Cleaner
Merging in config from development.
Merge branch 'development' into data_preparer

# Conflicts:
#	foreshadow/core/serializers.py
#	foreshadow/tests/test_transformers/test_smart.py
#	pytest.ini
Copy link
Contributor

@adithyabsk adithyabsk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few more comments.

data = pd.DataFrame({"financials": np.arange(10)}, columns=columns)
cs = ColumnSharer()
dc = DataCleaner(column_sharer=cs)
transformed_data = dc.fit_transform(data)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might you be able to test them in the same test if they require the same setup (I know I know it breaks my cardinal rule but creating a fixture isn't worth the overhead)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll look at it.

foreshadow/transformers/internals/__init__.py Outdated Show resolved Hide resolved
…hanges.

# Conflicts:
#	foreshadow/cleaners/data_cleaner.py
#	foreshadow/cleaners/internals/__init__.py
#	foreshadow/core/__init__.py
#	foreshadow/core/preparerstep.py
#	foreshadow/core/smarttransformer.py
#	foreshadow/core/wrapper.py
#	foreshadow/foreshadow.py
#	foreshadow/intents/registry.py
#	foreshadow/preprocessor.py
#	foreshadow/tests/test_transformers/test_transformers.py
#	foreshadow/transformers/concrete/externals.py
#	foreshadow/transformers/concrete/internals/__init__.py
#	foreshadow/transformers/core/pipeline.py
#	foreshadow/transformers/smart/smart.py
Included: tests skipped or changed. Some left failing to change as we integrate DataPreparer.
V1 components removed
V2 file structure in place with proper import system (some small changes still to be made).
Copy link
Contributor

@adithyabsk adithyabsk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some comments lets chat.

foreshadow/console.py Show resolved Hide resolved
foreshadow/foreshadow.py Show resolved Hide resolved
foreshadow/metrics.py Show resolved Hide resolved
foreshadow/optimizers/__init__.py Outdated Show resolved Hide resolved
foreshadow/preparer/pipeline.py Outdated Show resolved Hide resolved
foreshadow/concrete/internals/__init__.py Show resolved Hide resolved
foreshadow/concrete/externals.py Show resolved Hide resolved
foreshadow/concrete/externals.py Outdated Show resolved Hide resolved
foreshadow/concrete/__init__.py Outdated Show resolved Hide resolved
Copy link
Contributor

@adithyabsk adithyabsk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few more comments.

@@ -0,0 +1,9 @@
"""Intents package used by IntentMapper PreparerStep."""
from foreshadow.intents.base import BaseIntent
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be relative

@@ -0,0 +1 @@
"""old stuff just kept for easier referencing."""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sure to ignore this folder in coverage

@@ -2,9 +2,9 @@

from sklearn.pipeline import Pipeline

from foreshadow.intents import IntentMapper
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this be in steps

@@ -137,9 +137,12 @@ def _check_parallelizable_batch(group_process, group_number):
"""
if group_process.single_input is not None:
inputs = group_process.single_input
steps = [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add the None option to the docs

@@ -7,12 +7,12 @@ def test_resolver_overall():
import numpy as np
import pandas as pd
from foreshadow.preparer.column_sharer import ColumnSharer
from foreshadow.preparer.steps.resolver import ResolverMapper
from foreshadow.preparer.steps.mapper import IntentMapper
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This just needs to be imported from preparer.steps

@@ -44,6 +44,8 @@ def test_data_cleaner_fit():
"financials",
],
)
print(data.values)
print(check.values)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove these prints

from foreshadow.preparer.steps import CleanerMapper
from foreshadow.preparer.column_sharer import ColumnSharer
from foreshadow.preparer import CleanerMapper
from foreshadow.preparer import ColumnSharer
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

from foreshadow.preparer.steps import CleanerMapper
from foreshadow.preparer.column_sharer import ColumnSharer
from foreshadow.preparer import CleanerMapper
from foreshadow.preparer import ColumnSharer
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This entire file

from foreshadow.concrete.internals import FixedTfidfVectorizer
from foreshadow.concrete.internals import HTMLRemover
from foreshadow.concrete import FixedTfidfVectorizer
from foreshadow.concrete import HTMLRemover
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same line

"""Get the cache path which is in the config directory.

Note:
This function also makes the directory if it does not already exist.

Args:
path (str): A path to override the cache save directory path.

Returns:
str; The path to the cache directory.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be a colon not semi-colon

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants