tensorflow - How to use tensorflow feature_columns as input to a Keras model

Tags: tensorflow keras

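TensorFlow's feature_columns API is very useful for handling non-numeric features. However, the current API documentation is mostly about using feature_columns with TensorFlow Estimators. Is there a way to use feature_columns to represent categorical features and then build a model on top of tf.keras?

The only reference I found is the following tutorial. It shows how to feed feature columns to a Keras Sequential model: Link

The code snippet is as follows: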

from tensorflow.python.feature_column import feature_column_v2 as fc

feature_columns = [fc.embedding_column(ccv, dimension=3), ...]
feature_layer = fc.FeatureLayer(feature_columns)
model = tf.keras.Sequential([
    feature_layer,
    tf.keras.layers.Dense(128, activation=tf.nn.relu),
    tf.keras.layers.Dense(64, activation=tf.nn.relu),
    tf.keras.layers.Dense(1, activation=tf.nn.sigmoid)
])
...
model.fit(dataset, steps_per_epoch=8) # dataset is created from tensorflow Dataset API

The question is how to do this for a custom model built with the Keras functional API. I tried the following, but it did not work (TensorFlow 1.12):
feature_layer = fc.FeatureLayer(feature_columns)
dense_features = feature_layer(features) # features is a dict of ndarrays in dataset
layer1 = tf.keras.layers.Dense(128, activation=tf.nn.relu)(dense_features)
layer2 = tf.keras.layers.Dense(64, activation=tf.nn.relu)(layer1)
output = tf.keras.layers.Dense(1, activation=tf.nn.sigmoid)(layer2)
model = Model(inputs=dense_features, outputs=output)

Error log:
ValueError: Input tensors to a Model must come from `tf.layers.Input`. Received: Tensor("feature_layer/concat:0", shape=(4, 3), dtype=float32) (missing previous layer metadata).

I don't know how to turn feature columns into inputs for a Keras model.

Best answer

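The behavior you want is achievable: tf.feature_column can be combined with the Keras functional API. And, indeed, this is not mentioned in the TF docs.
This works at least in TF 2.0.0-beta1, but it may be changed or even simplified in later releases.
Please see the issue "Unable to use FeatureColumn with Keras Functional API #27416" in the TensorFlow GitHub repository. There you will find useful comments about tf.feature_column and the Keras functional API.
Since you asked about a general approach, I will copy the snippet with the example from the link above. Update: the code below should work.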
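
Before the full script, the core idea can be shown in isolation. The error in the question says a Model must be built from Keras input placeholders, so the sketch below creates one `tf.keras.Input` per raw feature, keyed by name, and only then merges them. This is a minimal sketch under assumptions, not the code from the linked issue: `Concatenate` stands in for the feature layer, the two feature names are borrowed from the heart dataset used in the script, and the numeric values are made up.

```python
import numpy as np
import tensorflow as tf

# Feature names used only for this sketch (assumption).
feature_names = ["age", "chol"]

# One symbolic placeholder per raw feature, keyed by feature name.
inputs = {name: tf.keras.Input(shape=(1,), name=name) for name in feature_names}

# Concatenate is a stand-in for the feature layer: it merges the per-feature
# tensors into one dense tensor that ordinary layers can consume.
merged = tf.keras.layers.Concatenate()([inputs[name] for name in feature_names])
hidden = tf.keras.layers.Dense(8, activation="relu")(merged)
output = tf.keras.layers.Dense(1, activation="sigmoid")(hidden)

# The model is wired from the Input placeholders, not from `merged`.
model = tf.keras.Model(inputs=inputs, outputs=output)

# Made-up values, shaped (batch, 1) to match the placeholders.
batch = {
    "age": np.array([[63.0], [37.0], [41.0]], dtype="float32"),
    "chol": np.array([[233.0], [250.0], [204.0]], dtype="float32"),
}
predictions = model.predict(batch, verbose=0)
print(predictions.shape)
```

Passing `merged` as `inputs` instead would mirror the failing code from the question, because `merged` is the result of a layer call rather than a placeholder.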

from __future__ import absolute_import, division, print_function

import numpy as np
import pandas as pd

#!pip install tensorflow==2.0.0-alpha0
import tensorflow as tf

from tensorflow import feature_column
from tensorflow import keras
from tensorflow.keras import layers
from sklearn.model_selection import train_test_split

csv_file = tf.keras.utils.get_file('heart.csv', 'https://storage.googleapis.com/download.tensorflow.org/data/heart.csv')
dataframe = pd.read_csv(csv_file, nrows = 10000)
dataframe.head()

train, test = train_test_split(dataframe, test_size=0.2)
train, val = train_test_split(train, test_size=0.2)
print(len(train), 'train examples')
print(len(val), 'validation examples')
print(len(test), 'test examples')

# Define method to create tf.data dataset from Pandas Dataframe
# This worked with tf 2.0 but does not work with tf 2.2
def df_to_dataset_tf_2_0(dataframe, label_column, shuffle=True, batch_size=32):
    dataframe = dataframe.copy()
    #labels = dataframe.pop(label_column)
    labels = dataframe[label_column]

    ds = tf.data.Dataset.from_tensor_slices((dict(dataframe), labels))
    if shuffle:
        ds = ds.shuffle(buffer_size=len(dataframe))
    ds = ds.batch(batch_size)
    return ds

def df_to_dataset(dataframe, label_column, shuffle=True, batch_size=32):
    dataframe = dataframe.copy()
    labels = dataframe.pop(label_column)
    #labels = dataframe[label_column]

    ds = tf.data.Dataset.from_tensor_slices((dataframe.to_dict(orient='list'), labels))
    if shuffle:
        ds = ds.shuffle(buffer_size=len(dataframe))
    ds = ds.batch(batch_size)
    return ds


batch_size = 5  # A small batch size is used for demonstration purposes
train_ds = df_to_dataset(train, label_column='target', batch_size=batch_size)
val_ds = df_to_dataset(val, label_column='target', shuffle=False, batch_size=batch_size)
test_ds = df_to_dataset(test, label_column='target', shuffle=False, batch_size=batch_size)

age = feature_column.numeric_column("age")

feature_columns = []
feature_layer_inputs = {}

# numeric cols: each raw feature gets a feature column and a matching Keras Input
for header in ['age', 'trestbps', 'chol', 'thalach', 'oldpeak', 'slope', 'ca']:
    feature_columns.append(feature_column.numeric_column(header))
    feature_layer_inputs[header] = tf.keras.Input(shape=(1,), name=header)

# bucketized cols
age_buckets = feature_column.bucketized_column(age, boundaries=[18, 25, 30, 35])
feature_columns.append(age_buckets)

# indicator cols
thal = feature_column.categorical_column_with_vocabulary_list(
      'thal', ['fixed', 'normal', 'reversible'])
thal_one_hot = feature_column.indicator_column(thal)
feature_columns.append(thal_one_hot)
feature_layer_inputs['thal'] = tf.keras.Input(shape=(1,), name='thal', dtype=tf.string)

# embedding cols
thal_embedding = feature_column.embedding_column(thal, dimension=8)
feature_columns.append(thal_embedding)

# crossed cols
crossed_feature = feature_column.crossed_column([age_buckets, thal], hash_bucket_size=1000)
crossed_feature = feature_column.indicator_column(crossed_feature)
feature_columns.append(crossed_feature)



feature_layer = tf.keras.layers.DenseFeatures(feature_columns)
feature_layer_outputs = feature_layer(feature_layer_inputs)

x = layers.Dense(128, activation='relu')(feature_layer_outputs)
x = layers.Dense(64, activation='relu')(x)

prediction = layers.Dense(1, activation='sigmoid')(x)

# Build the model from the Keras Input placeholders, not from the feature layer output
model = keras.Model(inputs=list(feature_layer_inputs.values()), outputs=prediction)

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

model.fit(train_ds)
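
To make the bucketized column in the script less opaque, here is a pure-Python sketch of the mapping it performs on `age`. It assumes the half-open bucket convention described in the `bucketized_column` documentation, where a value equal to a boundary falls into the bucket that starts at that boundary. The helpers `bucketize` and `one_hot` are hypothetical names for illustration only and are not part of TensorFlow.

```python
from bisect import bisect_right


def bucketize(value, boundaries):
    """Return the index of the half-open bucket that contains `value`."""
    # bisect_right counts how many boundaries are <= value, so a value equal
    # to a boundary lands in the bucket that starts at that boundary.
    return bisect_right(boundaries, value)


def one_hot(index, depth):
    """Return a list of length `depth` with a single 1.0 at position `index`."""
    return [1.0 if i == index else 0.0 for i in range(depth)]


boundaries = [18, 25, 30, 35]      # same boundaries as age_buckets in the script
num_buckets = len(boundaries) + 1  # 4 boundaries give 5 buckets

for age in [17, 18, 29, 35, 63]:
    bucket = bucketize(age, boundaries)
    print(age, bucket, one_hot(bucket, num_buckets))
```

With these boundaries, every age of 35 or more shares the last bucket, so the bucketized column adds five values per example to the dense tensor.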

For more on "tensorflow - How to use tensorflow feature_columns as input to a Keras model", see the similar question on Stack Overflow: https://stackoverflow.com/questions/54375298/
