# Hands-on feature engineering for machine learning: practical experience and ready-to-use code

WYF_DATA

## Feature selection (two families: methods that use a feature's own statistics, and methods that rely on a model)

### 1. Selecting features by their own variance

```python
from sklearn.feature_selection import VarianceThreshold

threshold = 0.90

vt = VarianceThreshold().fit(X)

# Keep features whose variance exceeds that of a Bernoulli variable
# with p = 0.90, i.e. p * (1 - p)
feat_var_threshold = X.columns[vt.variances_ > threshold * (1 - threshold)]
print(feat_var_threshold)
```

### 2. Selecting features with a model, e.g. a random forest

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier()
model.fit(X, Y)

feature_imp = pd.DataFrame(model.feature_importances_,
                           index=X.columns, columns=["importance"])
# Keep the 20 most important features
feat_imp_20 = feature_imp.sort_values("importance", ascending=False).head(20).index.values
print(feat_imp_20)

# Sort the importances and plot them as a horizontal bar chart
names = list(X.columns)
indices = np.argsort(model.feature_importances_)
plt.barh(np.arange(len(names)), model.feature_importances_[indices])
plt.yticks(np.arange(len(names)) + 0.25, np.array(names)[indices])
plt.xlabel('Relative importance')
plt.show()
```

### 3. Selecting features with SelectKBest and the chi2 test (feature values must be non-negative)

```python
import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.preprocessing import MinMaxScaler

# chi2 requires non-negative inputs, so rescale every feature to [0, 1] first
X_minmax = MinMaxScaler(feature_range=(0, 1)).fit_transform(X)
X_scored = SelectKBest(score_func=chi2, k='all').fit(X_minmax, Y)

feature_scoring = pd.DataFrame({'feature': X.columns, 'score': X_scored.scores_})
# Keep the 20 highest-scoring features
feat_scored_20 = feature_scoring.sort_values('score', ascending=False).head(20)['feature'].values
print(feat_scored_20)
```

### 4. Selecting features with SelectFromModel

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LassoCV

X, y = boston['data'], boston['target']

# Use LassoCV as the base estimator, since the L1 norm promotes sparsity.
clf = LassoCV()

# Start with a minimum threshold of 0.25
sfm = SelectFromModel(clf, threshold=0.25)
sfm.fit(X, y)
n_features = sfm.transform(X).shape[1]

# Raise the threshold until only two features remain.
# Note that the attribute can be set directly instead of repeatedly
# fitting the metatransformer.
while n_features > 2:
    sfm.threshold += 0.1
    X_transform = sfm.transform(X)
    n_features = X_transform.shape[1]

# Plot the two selected features from X.
plt.title("Features selected from Boston using SelectFromModel with "
          "threshold %0.3f." % sfm.threshold)
feature1 = X_transform[:, 0]
feature2 = X_transform[:, 1]
plt.plot(feature1, feature2, 'r.')
plt.xlabel("Feature number 1")
plt.ylabel("Feature number 2")
plt.ylim([np.min(feature2), np.max(feature2)])
plt.show()
```

### 5. Recursive feature elimination (RFE) with a model

```python
import pandas as pd
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

rfe = RFE(LogisticRegression(), n_features_to_select=20)
rfe.fit(X, Y)

feature_rfe_scoring = pd.DataFrame({
    'feature': X.columns,
    'score': rfe.ranking_
})
# Features ranked 1 are the ones RFE keeps
feat_rfe_20 = feature_rfe_scoring[feature_rfe_scoring['score'] == 1]['feature'].values
print(feat_rfe_20)
```

### 6. Stability-based feature selection

Note: `RandomizedLogisticRegression` was deprecated in scikit-learn 0.19 and removed in 0.21, so this snippet only runs on older releases.

```python
from sklearn.linear_model import RandomizedLogisticRegression as RLR

model = RLR(C=1, scaling=0.5,
            sample_fraction=0.75,
            n_resampling=200, selection_threshold=0.25)
model.fit(X, Y)
print(model.get_support())
```
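Since `RandomizedLogisticRegression` is gone from recent scikit-learn releases, the same idea can be reproduced by hand: fit an L1-penalised logistic regression on many random subsamples and keep the features whose coefficient is non-zero in a sufficiently large fraction of runs. The `stability_selection` helper below is a minimal sketch of this (the function name and the synthetic data are my own; the parameter names mirror the old estimator):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

def stability_selection(X, y, n_resampling=200, sample_fraction=0.75,
                        selection_threshold=0.25, C=1.0, random_state=0):
    """Hand-rolled stand-in for the removed RandomizedLogisticRegression."""
    rng = np.random.RandomState(random_state)
    n_samples, n_features = X.shape
    counts = np.zeros(n_features)
    for _ in range(n_resampling):
        # Fit an L1 logistic regression on a random subsample...
        idx = rng.choice(n_samples, int(sample_fraction * n_samples), replace=False)
        clf = LogisticRegression(penalty="l1", solver="liblinear", C=C)
        clf.fit(X[idx], y[idx])
        # ...and count how often each feature survives the penalty
        counts += (np.abs(clf.coef_).ravel() > 1e-6)
    # Keep features selected in at least `selection_threshold` of the runs
    return counts / n_resampling >= selection_threshold

X_demo, y_demo = make_classification(n_samples=300, n_features=10,
                                     n_informative=3, random_state=42)
support = stability_selection(X_demo, y_demo)
print(support)
```

The selection frequency is a more robust signal than a single fit: a feature that only shows up because of one particular train split will have a low frequency and be dropped.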

### 7. Merging the features selected by each method

```python
import numpy as np

features = np.hstack([
    feat_var_threshold,
    feat_imp_20,
    feat_scored_20,
    feat_rfe_20
])
features = np.unique(features)
print(features)
```
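The merged name list can then be used to cut the design matrix down before fitting the final model. A minimal sketch with a toy DataFrame (the column names `f1`–`f3` and the final estimator are assumptions, standing in for the real data and model):

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Toy frame standing in for the real training data
X = pd.DataFrame({"f1": [0, 1, 0, 1],
                  "f2": [1, 1, 0, 0],
                  "f3": [0.2, 0.4, 0.6, 0.8]})
Y = [0, 1, 0, 1]
features = ["f1", "f3"]  # pretend this came from np.unique(...) above

# Keep only the selected columns and fit the final model on them
X_selected = X[features]
model = RandomForestClassifier(n_estimators=10, random_state=0).fit(X_selected, Y)
print(X_selected.shape)
```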

## Feature evaluation
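The article stops before filling this section in. One common way to evaluate a candidate feature subset is to compare the cross-validated score of a model trained on the subset against the same model trained on all features. A minimal sketch (the synthetic data, the subset indices, and the choice of logistic regression are assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# With shuffle=False the informative columns come first, so [0..4] is a
# plausible "selected" subset for this toy data.
X, y = make_classification(n_samples=400, n_features=20, n_informative=5,
                           shuffle=False, random_state=0)
subset = [0, 1, 2, 3, 4]

clf = LogisticRegression(max_iter=1000)
# 5-fold accuracy with all features vs. only the selected subset
score_all = cross_val_score(clf, X, y, cv=5).mean()
score_sub = cross_val_score(clf, X[:, subset], y, cv=5).mean()
print(score_all, score_sub)
```

If the subset's score matches (or beats) the full-feature score, the discarded features were carrying little signal and the smaller model is preferable.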
