# Python: Common Preprocessing Methods

2017/05/08 13:28

#### Standardization

    import numpy as np
    from sklearn.preprocessing import scale

    # Standardize each column to zero mean and unit variance
    X = np.array([[1., -1., 2.], [2., 0., 0.], [0., 1., -1.]])
    scale(X)
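To make the effect of `scale` concrete, here is a minimal sketch that checks the result against the formula written out by hand. The names `X_scaled` and `X_manual` are introduced only for illustration.

```python
import numpy as np
from sklearn.preprocessing import scale

X = np.array([[1., -1., 2.], [2., 0., 0.], [0., 1., -1.]])
X_scaled = scale(X)

# Manual equivalent: subtract each column's mean, then divide by
# that column's population standard deviation (ddof=0).
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_scaled.mean(axis=0))            # approximately 0 in every column
print(X_scaled.std(axis=0))             # approximately 1 in every column
print(np.allclose(X_scaled, X_manual))  # True
```

Note that the statistics are computed per column (per feature), not over the whole array.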

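To reuse the statistics learned from the training set on new data, use `StandardScaler`:

    from sklearn.preprocessing import StandardScaler

    # train and test are feature arrays; fit on the training data only,
    # then apply the same mean and scale to both sets
    scaler = StandardScaler().fit(train)
    scaler.transform(train)
    scaler.transform(test)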
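The snippet above leaves `train` and `test` undefined. A small runnable sketch, with made-up arrays standing in for real data, might look like this:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical data, invented purely for illustration
train = np.array([[1., 10.], [2., 20.], [3., 30.]])
test = np.array([[2., 40.]])

scaler = StandardScaler().fit(train)   # learns mean and std from train only
train_scaled = scaler.transform(train)
test_scaled = scaler.transform(test)   # reuses the training statistics

print(scaler.mean_)   # [ 2. 20.]
print(test_scaled)    # test values expressed in training-set standard deviations
```

Fitting on the training set alone keeps information from the test set out of the preprocessing step.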

#### Min-Max Scaling

    from sklearn.preprocessing import MinMaxScaler

    # Rescale each feature to the [0, 1] range (the default feature_range)
    min_max_scaler = MinMaxScaler()
    min_max_scaler.fit_transform(X_train)
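As a rough illustration, the sketch below compares `MinMaxScaler` with the formula `(x - min) / (max - min)` applied per column. The `X_train` and `X_test` values are assumptions chosen for the example.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical data for illustration
X_train = np.array([[1., -1., 2.], [2., 0., 0.], [0., 1., -1.]])
X_test = np.array([[-3., -1., 4.]])

min_max_scaler = MinMaxScaler()
X_train_minmax = min_max_scaler.fit_transform(X_train)

# Manual equivalent, column by column
col_min = X_train.min(axis=0)
col_max = X_train.max(axis=0)
X_manual = (X_train - col_min) / (col_max - col_min)
print(np.allclose(X_train_minmax, X_manual))  # True

# New data is scaled with the training min/max, so it can fall outside [0, 1]
X_test_minmax = min_max_scaler.transform(X_test)
print(X_test_minmax)
```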

#### Normalization

    from sklearn.preprocessing import normalize

    # Rescale each row (sample) to unit L2 norm
    X = [[1, -1, 2], [2, 0, 0], [0, 1, -1]]
    normalize(X, norm='l2')
    # array([[0.41, -0.41, 0.82], [1., 0., 0.], [0., 0.71, -0.71]])  (rounded)
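Unlike the scalers above, `normalize` works on rows rather than columns. A short sketch that checks this by dividing each row by its own L2 norm (`X_l2` and `X_manual` are illustrative names):

```python
import numpy as np
from sklearn.preprocessing import normalize

X = np.array([[1., -1., 2.], [2., 0., 0.], [0., 1., -1.]])
X_l2 = normalize(X, norm='l2')

# Manual equivalent: divide every row by its Euclidean length
X_manual = X / np.linalg.norm(X, axis=1, keepdims=True)

print(np.allclose(X_l2, X_manual))    # True
print(np.linalg.norm(X_l2, axis=1))   # every row now has length 1
```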

#### Feature Binarization

    from sklearn.preprocessing import Binarizer

    # Values greater than the threshold become 1; all others become 0
    binarizer = Binarizer(threshold=1.1)
    binarizer.transform(X)
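Here is a small worked example, assuming `X` is the same 3x3 array used in the earlier sections (the original snippet does not say which `X` it means):

```python
import numpy as np
from sklearn.preprocessing import Binarizer

# Assumed input: the array from the standardization example
X = np.array([[1., -1., 2.], [2., 0., 0.], [0., 1., -1.]])

binarizer = Binarizer(threshold=1.1)
X_bin = binarizer.fit_transform(X)  # fit is a no-op; Binarizer is stateless

print(X_bin)
# [[0. 0. 1.]
#  [1. 0. 0.]
#  [0. 0. 0.]]
```

Only the two entries equal to 2 exceed the threshold of 1.1, so only they map to 1.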

#### Label Binarization

    from sklearn import preprocessing

    lb = preprocessing.LabelBinarizer()
    lb.fit([1, 2, 6, 4, 2])
    lb.classes_
    # array([1, 2, 4, 6])
    lb.transform([1, 6])  # labels should be among those passed to fit
    # array([[1, 0, 0, 0],
    #        [0, 0, 0, 1]])
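The encoding can also be reversed. A minimal sketch using `inverse_transform`, with `encoded` and `decoded` as illustrative variable names:

```python
from sklearn.preprocessing import LabelBinarizer

lb = LabelBinarizer()
lb.fit([1, 2, 6, 4, 2])

encoded = lb.transform([1, 6])           # one indicator row per label
decoded = lb.inverse_transform(encoded)  # back to the original labels

print(encoded.tolist())  # [[1, 0, 0, 0], [0, 0, 0, 1]]
print(decoded.tolist())  # [1, 6]
```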

#### Encoding Categorical Features

    from sklearn import preprocessing

    enc = preprocessing.OneHotEncoder()
    enc.fit([[0, 0, 3], [1, 1, 0], [0, 2, 1], [1, 0, 2]])
    enc.transform([[0, 1, 3]]).toarray()
    # array([[1., 0., 0., 1., 0., 0., 0., 0., 1.]])
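To see where the nine output columns come from, the sketch below inspects the categories learned for each input column. It assumes a scikit-learn version that exposes `categories_` (0.20 or later) and additionally sets `handle_unknown='ignore'`, which the original snippet does not use.

```python
from sklearn.preprocessing import OneHotEncoder

# handle_unknown='ignore' is an added assumption: unseen categories
# are encoded as all zeros instead of raising an error.
enc = OneHotEncoder(handle_unknown='ignore')
enc.fit([[0, 0, 3], [1, 1, 0], [0, 2, 1], [1, 0, 2]])

# One array of categories per input column: 2 + 3 + 4 = 9 output columns
print([c.tolist() for c in enc.categories_])  # [[0, 1], [0, 1, 2], [0, 1, 2, 3]]

known = enc.transform([[0, 1, 3]]).toarray()
unknown = enc.transform([[0, 1, 7]]).toarray()  # 7 was never seen in column 3

print(known)    # [[1. 0. 0. 1. 0. 0. 0. 0. 1.]]
print(unknown)  # [[1. 0. 0. 1. 0. 0. 0. 0. 0.]]
```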

#### Label Encoding

    from sklearn.preprocessing import LabelEncoder

    le = LabelEncoder()
    le.fit([1, 2, 2, 6])
    le.transform([1, 1, 2, 6])  # array([0, 0, 1, 2])
    # Non-numeric labels are converted to integers as well
    le.fit(["paris", "paris", "tokyo", "amsterdam"])
    le.transform(["tokyo", "tokyo", "paris"])  # array([2, 2, 1])
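The integer codes follow the sorted order of the classes, which is why "tokyo" maps to 2. A short sketch that shows the mapping and the round trip (`codes` and `labels` are illustrative names):

```python
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
le.fit(["paris", "paris", "tokyo", "amsterdam"])

# Classes are stored sorted; a label's code is its index in this list
print(le.classes_.tolist())  # ['amsterdam', 'paris', 'tokyo']

codes = le.transform(["tokyo", "tokyo", "paris"])
labels = le.inverse_transform(codes)

print(codes.tolist())   # [2, 2, 1]
print(labels.tolist())  # ['tokyo', 'tokyo', 'paris']
```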

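#### Data with Outliers

When the data contain outliers, use `sklearn.preprocessing.robust_scale`, which centers each feature on its median and scales it by its interquartile range.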
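As an illustration, the sketch below applies `robust_scale` to a made-up column containing one extreme value and compares it with plain `scale`. The data are an assumption for the example.

```python
import numpy as np
from sklearn.preprocessing import robust_scale, scale

# Hypothetical single-feature data with one outlier (100)
X = np.array([[1.], [2.], [3.], [4.], [100.]])

# Default behaviour: subtract the median (3), divide by the IQR (4 - 2 = 2)
X_robust = robust_scale(X)
print(X_robust.ravel())  # [-1.  -0.5  0.   0.5 48.5]

# With mean/std scaling the outlier inflates the std,
# so the four ordinary points end up squeezed close together
X_standard = scale(X)
print(X_standard.ravel())
```

The ordinary points keep a sensible spread under `robust_scale`, while the outlier remains clearly separated.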

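#### Generating Polynomial Features

    from sklearn.preprocessing import PolynomialFeatures

    # Degree-2 expansion: bias term, original features, squares, and pairwise products
    poly = PolynomialFeatures(2)
    poly.fit_transform(X)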
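For a two-feature input `(a, b)`, a degree-2 expansion yields `(1, a, b, a², ab, b²)`. A small sketch with an assumed 3x2 input:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical input with two features per sample
X = np.arange(6).reshape(3, 2)   # [[0, 1], [2, 3], [4, 5]]

poly = PolynomialFeatures(2)
X_poly = poly.fit_transform(X)

print(X_poly)
# [[ 1.  0.  1.  0.  0.  1.]
#  [ 1.  2.  3.  4.  6.  9.]
#  [ 1.  4.  5. 16. 20. 25.]]
```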
