Skip to content

Latest commit

 

History

History
145 lines (54 loc) · 2.1 KB

特征选择.md

File metadata and controls

145 lines (54 loc) · 2.1 KB

特征选择

互信息

from sklearn.feature_selection import SelectKBest, f_regression,mutual_info_regression
from sklearn.datasets import load_boston

boston = load_boston()
print('Boston data shape: ', boston.data.shape)

selector = SelectKBest(mutual_info_regression)
#X_new = selector.fit_transform(boston.data, boston.target)
X_new = selector.fit_transform(boston_x, boston_y)
print('Filtered Boston data shape:', X_new.shape)
print('F-Scores:', selector.scores_)
selector.get_support()

使用shap计算特征贡献率

https://blog.csdn.net/jin_tmac/article/details/106099218

https://blog.csdn.net/demm868/article/details/109523717

https://zhuanlan.zhihu.com/p/101352812?utm_source=qq

架构好文【马东】

https://zhuanlan.zhihu.com/p/96420594

时间序列预测中的数据顺序和训练集数据完整性

对抗检验

https://blog.csdn.net/caoyuan666/article/details/106223344/

多种训练集和验证集划分方法

https://blog.csdn.net/whybehere/article/details/108192957

shap

http://sofasofa.io/tutorials/shap_xgboost/

1.单个样本中每个特征的贡献,shap values,是正向还是负向

2.在特征总体的分析

3.多个变量的交互作用

4.部分依赖图(Partial Dependence Plot、pdpbox)

PI 排序重要性

Permutation Importance

SHAP 值、

置换重要性(permutaion importance)、

删除和重新学习方法(drop-and-relearn approach)

ELI5库可以进行Permutation Importance的计算

https://wanpingdou.blog.csdn.net/article/details/106813825

import eli5
from eli5.sklearn import PermutationImportance

perm = PermutationImportance(my_model, random_state=1).fit(val_X, val_y)
eli5.show_weights(perm, feature_names = val_X.columns.tolist())

kaggle 特征重要性 featexp

https://blog.csdn.net/weixin_41814051/article/details/104300961

https://blog.csdn.net/Datawhale/article/details/103169719

特征泄露与数据泄露

数据穿越

kaggle shake up 分析

https://zhuanlan.zhihu.com/p/68381175

https://zhuanlan.zhihu.com/p/64473570

分类任务中的类别特征

高维稀疏