# Machine Learning with TensorFlow: Binned Features, Cross Features and Optimizer

2019/04/10 10:10

• Overview

• Bucketized (Binned) Features

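We have already covered a lot of feature engineering in earlier posts and used sklearn to demonstrate how each technique is applied and implemented. This post fills in one more topic: binned features. So what is a binned feature? The idea is simple. We sort the data by value and split it into n bins (you can also think of them as n quantiles). We record the boundaries between the bins in a list, and then map each data point onto these bins to see which bin it falls into. The result is the bin index, numbered 0, 1, 2, ... from the lowest bin to the highest. Note that k boundaries define k + 1 bins. Here is a simple example: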

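```python
boundaries = [0, 10, 100]
input_tensor = [[-5, 10000],
                [150,   10],
                [5,    100]]
```

The output is:

```python
output = [[0, 3],
          [3, 2],
          [1, 3]]
```

Each bucket includes its left boundary and excludes its right one, so the three boundaries define four buckets: (-inf, 0), [0, 10), [10, 100) and [100, +inf). That is why 10 maps to bucket 2 and 100 maps to bucket 3.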
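To check the mapping above without TensorFlow, here is a minimal sketch that uses NumPy's `np.digitize` as a stand-in. This is an assumption for illustration. It is not what `bucketized_column` calls internally, but with its default `right=False` it applies the same left-closed, right-open rule.

```python
import numpy as np

# Boundaries and input taken from the example above
boundaries = [0, 10, 100]
input_tensor = np.array([[-5, 10000],
                         [150, 10],
                         [5, 100]])

# With right=False, np.digitize returns i such that
# boundaries[i-1] <= x < boundaries[i], i.e. a bucket index in 0..len(boundaries)
output = np.digitize(input_tensor, boundaries)
print(output.tolist())  # [[0, 3], [3, 2], [1, 3]]
```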

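```python
import numpy as np
import tensorflow as tf

def get_quantile_based_boundaries(series, num_bucket):
    # num_bucket buckets need num_bucket - 1 boundaries: the 1/n, 2/n, ..., (n-1)/n quantiles
    quantiles = np.arange(1.0, num_bucket) / num_bucket
    boundaries = series.quantile(quantiles)  # pandas Series indexed by the quantile levels
    # bucketized_column expects a plain Python list (or tuple) of sorted values
    return boundaries.tolist()

house_median_age_numeric_column = tf.feature_column.numeric_column("housing_median_age")

# cali_housing_dataset_permutation: the California housing pandas DataFrame prepared earlier
bucketized_house_median_age = tf.feature_column.bucketized_column(
    source_column=house_median_age_numeric_column,
    boundaries=get_quantile_based_boundaries(
        series=cali_housing_dataset_permutation["housing_median_age"], num_bucket=10))

def demo(feature_column):
    # DenseFeatures turns the bucketized column into one one-hot vector per row
    feature_layer = tf.keras.layers.DenseFeatures(feature_column)
    print(feature_layer(dict(cali_housing_dataset_permutation)).numpy())

demo(bucketized_house_median_age)
```

The call prints one row per example, with a 1 in the column of the bucket that the row's `housing_median_age` falls into. NumPy abbreviates the middle columns with `...`: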

```
[[0. 0. 0. ... 0. 1. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 1. 0. 0.]
 ...
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 1. ... 0. 0. 0.]
 [0. 0. 0. ... 1. 0. 0.]]
```
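To see what this matrix encodes, here is a NumPy-only sketch of the same pipeline on synthetic data. The random ages are made up for illustration. `np.quantile`, `np.digitize` and `np.eye` stand in for the pandas and TensorFlow calls, so this is not how TensorFlow computes it internally. The sketch builds num_bucket - 1 quantile boundaries, assigns each value a bucket index, and one-hot encodes that index. The result has the same shape as the `DenseFeatures` output.

```python
import numpy as np

def quantile_boundaries(values, num_bucket):
    # Same idea as get_quantile_based_boundaries: the 1/n, ..., (n-1)/n quantiles
    quantiles = np.arange(1.0, num_bucket) / num_bucket
    boundaries = np.quantile(values, quantiles)
    # Drop repeated boundaries (possible when many values are tied);
    # bucketized_column rejects boundaries that are not strictly increasing
    return sorted(set(boundaries.tolist()))

def one_hot_buckets(values, boundaries):
    # Bucket index 0..len(boundaries), left-closed / right-open
    ids = np.digitize(values, boundaries)
    # One row per value, one column per bucket (len(boundaries) + 1 columns)
    return np.eye(len(boundaries) + 1)[ids]

rng = np.random.default_rng(0)
ages = rng.integers(1, 53, size=1000).astype(float)  # synthetic "housing_median_age" values

boundaries = quantile_boundaries(ages, num_bucket=10)
encoded = one_hot_buckets(ages, boundaries)
print(len(boundaries), encoded.shape)
print(encoded.sum(axis=0))  # rows per bucket: roughly equal, not exact, because of ties
```

Deduplicating the boundaries keeps the list valid for `bucketized_column` when neighbouring quantiles coincide. The trade-off is that you may get fewer buckets than requested.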

• Cross Features

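```python
# Cross the two bucketized columns: each (longitude bucket, latitude bucket) pair is hashed into one of 1000 categories
lon_x_lat = tf.feature_column.crossed_column(keys=[bucketized_longitude, bucketized_latitude],
                                             hash_bucket_size=1000)
```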
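A feature cross combines the bucket ids of several columns into a single categorical feature. This lets a linear model learn one weight per longitude/latitude grid cell, instead of only one weight per longitude band and one per latitude band. The following plain NumPy sketch illustrates the hashing step. The bucket ids are hypothetical. MD5 with the string key format `"{lon}_X_{lat}"` is an arbitrary stand-in for TensorFlow's internal fingerprint hash, so the category ids will not match what `crossed_column` produces.

```python
import hashlib
import numpy as np

def cross_bucket_ids(lon_ids, lat_ids, hash_bucket_size):
    """Map each (longitude bucket, latitude bucket) pair to one of
    hash_bucket_size categories.

    Illustration only: TensorFlow uses its own fingerprint hash,
    so its actual category ids differ from these.
    """
    crossed = []
    for lon, lat in zip(lon_ids, lat_ids):
        key = f"{lon}_X_{lat}".encode("utf-8")  # hypothetical key format
        # Deterministic hash of the pair, folded into hash_bucket_size categories
        digest = int(hashlib.md5(key).hexdigest(), 16)
        crossed.append(digest % hash_bucket_size)
    return np.array(crossed)

# Hypothetical bucket ids for four houses
lon_ids = np.array([0, 0, 9, 0])
lat_ids = np.array([3, 3, 3, 4])
ids = cross_bucket_ids(lon_ids, lat_ids, hash_bucket_size=1000)
print(ids)  # the first two houses share a grid cell, so they share a category
```

Identical pairs always land in the same category. Because the pairs are hashed, two different cells can occasionally collide, and a larger `hash_bucket_size` makes that less likely. In TensorFlow, `lon_x_lat` is a categorical column. Before passing it to `DenseFeatures` (for instance through the `demo` function above), you need to wrap it, e.g. with `tf.feature_column.indicator_column(lon_x_lat)`.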

• Optimizer
