
create features on one dataset #11

Open
billy-odera opened this issue Dec 7, 2018 · 6 comments

Comments

@billy-odera

I have tried to create automated features using only one dataset, but it doesn't work. Does this mean I can only use Featuretools when I have two or more datasets? The code is below:

```python
import featuretools as ft

# Create the EntitySet
es = ft.EntitySet(id='clients')

# Create an entity from the dataset (`data` is the single pandas DataFrame being used)
es = es.entity_from_dataframe(entity_id='app', dataframe=data, index='customerid')

# Default primitives from featuretools
default_agg_primitives = ["sum", "std", "max", "skew", "min", "mean", "count", "percent_true", "num_unique", "mode"]
default_trans_primitives = ["day", "year", "month", "weekday", "haversine", "numwords", "characters"]

# DFS with specified primitives
feature_matrix, feature_names = ft.dfs(entityset=es, target_entity='app',
                                       trans_primitives=default_trans_primitives,
                                       agg_primitives=default_agg_primitives,
                                       max_depth=2, features_only=False, verbose=True)

print('%d Total Features' % len(feature_names))
```

This returns the same number of features as there are columns in the dataframe; no new features are created.

@kmax12
Contributor

kmax12 commented Dec 7, 2018

@billy-odera can you provide an example of a feature you would expect to get created using just that one table?

@billy-odera
Author

billy-odera commented Dec 9, 2018

@kmax12 This is the dataframe:

| customerid | age | outflows_amount | inflows_amount |
|---|---|---|---|
| 1 | 28.00 | 0 | 355.00 |
| 2 | 72.00 | 1 | 240.00 |
| 3 | 22.00 | 6 | nan |

I would expect to get features such as the count of outflows_amount, the mean, the skew, etc.

@kmax12
Contributor

kmax12 commented Dec 9, 2018

@billy-odera I'm not sure I follow your example. If you want to calculate the mean outflows_amount per customer, you would want to create a second entity for your customers that has a relationship to the entity with multiple rows per customer, each with a different outflows_amount. Let me know if that's helpful, or please provide a complete example of what you want to generate so I can better help.
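To make the one-to-many idea concrete, here is a minimal pandas sketch (not Featuretools itself) of what aggregation primitives such as COUNT, MEAN and SKEW compute once a child table has several rows per customer. The `transactions` table, its `transaction_id` column and all of its values are hypothetical; only `customerid` and `outflows_amount` come from this thread.

```python
import pandas as pd

# Hypothetical child table: several outflow transactions per customer.
transactions = pd.DataFrame({
    "transaction_id": [1, 2, 3, 4, 5, 6],
    "customerid": [1, 1, 2, 2, 2, 3],
    "outflows_amount": [10.0, 30.0, 5.0, 15.0, 40.0, 6.0],
})

# Parent table: one row per customer, analogous to a "customers" entity
# linked to the transactions by customerid. Each column aggregates the
# child rows that belong to that customer.
customers = (
    transactions.groupby("customerid")["outflows_amount"]
    .agg(["count", "mean", "skew"])  # skew is NaN for customers with fewer than 3 rows
    .add_prefix("outflows_amount_")
)

print(customers)
```

With only one row per customer, as in the table above, every per-customer count would be 1 and every per-customer mean would equal the row's own value, which is why these aggregations need a second table. In Featuretools itself, the same structure would come from loading the transactions as an entity and deriving a customers entity from it (for example with `EntitySet.normalize_entity` in the 0.x API used in this thread), then running `ft.dfs` with `target_entity` set to the customers entity.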

@shellwang

Yes, I'm encountering the same problem.

@bukosabino
Contributor

Hi @shellwang,

Can you give us more details about your goals?

As Max says, you need more related tables to extract these kinds of features.

@turkialjrees

turkialjrees commented Feb 3, 2019

I believe the issue here is understanding the fundamentals of automated feature engineering. A transformation acts on a single table (thinking in terms of Python, a table is just a pandas DataFrame) by creating new features out of one or more of the existing columns. Like many topics in machine learning, automated feature engineering is a complicated concept built on simple ideas.
Using the concepts of entity sets, entities, and relationships, Featuretools can perform deep feature synthesis to create new features.
Deep feature synthesis in turn stacks feature primitives (aggregations, which act across a one-to-many relationship between tables, and transformations, which are functions applied to one or more columns in a single table) to build new features from multiple tables.
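By contrast, a transformation needs only the one table. The following is a minimal pandas sketch (again not Featuretools itself) of row-wise features built from existing columns of the single table posted earlier in this thread; the derived columns `net_flow` and `inflows_missing` are hypothetical examples, not Featuretools primitives.

```python
import numpy as np
import pandas as pd

# The single table from the thread: one row per customer.
data = pd.DataFrame({
    "customerid": [1, 2, 3],
    "age": [28.0, 72.0, 22.0],
    "outflows_amount": [0, 1, 6],
    "inflows_amount": [355.0, 240.0, np.nan],
})

# Transformations: each new value is computed from other columns in the
# same row, so no second table or relationship is needed.
data["net_flow"] = data["inflows_amount"] - data["outflows_amount"]  # NaN propagates
data["inflows_missing"] = data["inflows_amount"].isna()

print(data)
```

In Featuretools terms, row-wise features like these come from transform primitives, and each transform primitive only applies to columns of a matching type. The primitives requested in the original post (`day`, `year`, `month` and `weekday` expect datetime columns, `haversine` expects latitude/longitude pairs, and `numwords`/`characters` expect text) have no matching input among purely numeric columns, which is consistent with DFS returning only the original columns.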

Read more, with a basic example, here:
https://towardsdatascience.com/automated-feature-engineering-in-python-99baf11cc219
