create features on one dataset #11
Comments
@billy-odera can you provide an example of a feature you would expect to get created using just that one table?
@kmax12 This is the dataframe.
I would expect to get count.outflow_amount, mean, skew, etc.
@billy-odera Not sure I follow your example. If you want to calculate the mean outflow_amount per customer, you would want to create a second entity for your customers that has a relationship to the entity with multiple rows per customer with different outflow_amounts. Let me know if that's helpful, or please provide a complete example of what you want to generate so I can better help.
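To make the suggestion above concrete, here is a minimal plain-Python sketch of what a per-customer aggregation computes. It is not Featuretools code and not how the library implements it. The column names `customerid` and `outflow_amount` follow the thread; the sample rows and the helper `aggregate_per_customer` are hypothetical, made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical child table: several rows per customer (values are invented).
transactions = [
    {"customerid": 1, "outflow_amount": 100.0},
    {"customerid": 1, "outflow_amount": 300.0},
    {"customerid": 2, "outflow_amount": 50.0},
]


def aggregate_per_customer(rows):
    """Group child rows by customerid, then compute a count and a mean per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["customerid"]].append(row["outflow_amount"])
    # One output row per customer: this plays the role of the parent entity.
    return {
        cid: {"count": len(values), "mean": mean(values)}
        for cid, values in groups.items()
    }


customers = aggregate_per_customer(transactions)
```

The result has one entry per customer, which is the role the second (parent) entity plays. If the table already has exactly one row per `customerid`, every group contains a single row, so an aggregation has nothing to summarize.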
Yes, I encountered the same problem.
Hi @shellwang, can you give us more details about your goals? As Max says, you need more related tables to extract these kinds of features.
I believe the issue here is understanding the fundamentals of automated ML methods: a transformation acts on a single table (thinking in terms of Python, a table is just a Pandas DataFrame) by creating new features out of one or more of the existing columns. Like many topics in machine learning, automated feature engineering is a complicated concept built on simple ideas. Read more, with a basic example, here.
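As a rough illustration of the transformation idea described above, the sketch below derives a new column from an existing one, row by row, using a single table. It is plain Python rather than Featuretools code; the `signup_date` column, the sample rows, and the helper `add_month` are hypothetical and do not come from the thread.

```python
from datetime import date

# Hypothetical single table with one row per customer (values are invented).
rows = [
    {"customerid": 1, "signup_date": date(2018, 3, 14)},
    {"customerid": 2, "signup_date": date(2018, 11, 2)},
]


def add_month(table, column):
    """Return a copy of the table with a MONTH(<column>) feature added to each row."""
    return [{**row, f"MONTH({column})": row[column].month} for row in table]


transformed = add_month(rows, "signup_date")
```

The row count is unchanged and only a new column appears. A transformation like this needs an input column of a suitable type (here, a date), so a table without such columns would likely gain no new features from date-based transformations.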
I have tried to create automated features using only one dataset, but it doesn't work. Does this mean I can only use Featuretools when I have two or more datasets? The code is below:
import featuretools as ft

# Create the entity set
es = ft.EntitySet(id='clients')

# Add the dataset as an entity (`data` is the pandas DataFrame loaded earlier)
es = es.entity_from_dataframe(entity_id='app', dataframe=data, index='customerid')

# Default primitives from featuretools
default_agg_primitives = ["sum", "std", "max", "skew", "min", "mean", "count", "percent_true", "num_unique", "mode"]
default_trans_primitives = ["day", "year", "month", "weekday", "haversine", "numwords", "characters"]

# DFS with specified primitives
feature_matrix, feature_names = ft.dfs(entityset=es, target_entity='app',
                                       trans_primitives=default_trans_primitives,
                                       agg_primitives=default_agg_primitives,
                                       max_depth=2, features_only=False, verbose=True)

print('%d Total Features' % len(feature_names))
This returns the same number of features as the original dataframe. No new features are created.