python - TypeError: argument of type 'float' is not iterable

Tags: python pandas tensorflow

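I am new to Python and TensorFlow. I recently started working through the TensorFlow examples and ran into this one: https://www.tensorflow.org/versions/r0.10/tutorials/wide_and_deep/index.html

I get the error TypeError: argument of type 'float' is not iterable, and I believe the problem is in the following line of code: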

df_train[LABEL_COLUMN] = (df_train['income_bracket'].apply(lambda x: '>50K' in x)).astype(int) 

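(income_bracket is the label column of the census dataset; '>50K' is one of its possible label values and '<=50K' is the other. The dataset is read into df_train. The documentation explains the reason for this line as follows: "Since the task is a binary classification problem, we'll construct a label column named 'label' whose value is 1 if the income is over 50K, and 0 otherwise.")

It would be great if someone could explain exactly what is happening and how I should fix it. I have tried both Python 2.7 and Python 3.4, so I don't think the language version is the problem. Also, if anyone knows of good tutorials for newcomers to TensorFlow and pandas, please share the links.

Full program: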

import pandas as pd
import urllib
import tempfile
import tensorflow as tf

gender = tf.contrib.layers.sparse_column_with_keys(column_name="gender", keys=["female", "male"])
race = tf.contrib.layers.sparse_column_with_keys(column_name="race", keys=["Amer-Indian-Eskimo", "Asian-Pac-Islander", "Black", "Other", "White"])
education = tf.contrib.layers.sparse_column_with_hash_bucket("education", hash_bucket_size=1000)
marital_status = tf.contrib.layers.sparse_column_with_hash_bucket("marital_status", hash_bucket_size=100)
relationship = tf.contrib.layers.sparse_column_with_hash_bucket("relationship", hash_bucket_size=100)
workclass = tf.contrib.layers.sparse_column_with_hash_bucket("workclass", hash_bucket_size=100)
occupation = tf.contrib.layers.sparse_column_with_hash_bucket("occupation", hash_bucket_size=1000)
native_country = tf.contrib.layers.sparse_column_with_hash_bucket("native_country", hash_bucket_size=1000)


age = tf.contrib.layers.real_valued_column("age")
age_buckets = tf.contrib.layers.bucketized_column(age, boundaries=[18, 25, 30, 35, 40, 45, 50, 55, 60, 65])
education_num = tf.contrib.layers.real_valued_column("education_num")
capital_gain = tf.contrib.layers.real_valued_column("capital_gain")
capital_loss = tf.contrib.layers.real_valued_column("capital_loss")
hours_per_week = tf.contrib.layers.real_valued_column("hours_per_week")

wide_columns = [gender, native_country, education, occupation, workclass, marital_status, relationship, age_buckets, tf.contrib.layers.crossed_column([education, occupation], hash_bucket_size=int(1e4)), tf.contrib.layers.crossed_column([native_country, occupation], hash_bucket_size=int(1e4)), tf.contrib.layers.crossed_column([age_buckets, race, occupation], hash_bucket_size=int(1e6))]

deep_columns = [
  tf.contrib.layers.embedding_column(workclass, dimension=8),
  tf.contrib.layers.embedding_column(education, dimension=8),
  tf.contrib.layers.embedding_column(marital_status, dimension=8),
  tf.contrib.layers.embedding_column(gender, dimension=8),
  tf.contrib.layers.embedding_column(relationship, dimension=8),
  tf.contrib.layers.embedding_column(race, dimension=8),
  tf.contrib.layers.embedding_column(native_country, dimension=8),
  tf.contrib.layers.embedding_column(occupation, dimension=8),
  age, education_num, capital_gain, capital_loss, hours_per_week]

model_dir = tempfile.mkdtemp()
m = tf.contrib.learn.DNNLinearCombinedClassifier(
    model_dir=model_dir,
    linear_feature_columns=wide_columns,
    dnn_feature_columns=deep_columns,
    dnn_hidden_units=[100, 50])


COLUMNS = ["age", "workclass", "fnlwgt", "education", "education_num",
  "marital_status", "occupation", "relationship", "race", "gender",
  "capital_gain", "capital_loss", "hours_per_week", "native_country", "income_bracket"]
LABEL_COLUMN = 'label'
CATEGORICAL_COLUMNS = ["workclass", "education", "marital_status", "occupation", "relationship", "race", "gender", "native_country"]
CONTINUOUS_COLUMNS = ["age", "education_num", "capital_gain", "capital_loss", "hours_per_week"]


train_file = tempfile.NamedTemporaryFile()
test_file = tempfile.NamedTemporaryFile()
urllib.urlretrieve("https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data", train_file.name)
urllib.urlretrieve("https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.test", test_file.name)


df_train = pd.read_csv(train_file, names=COLUMNS, skipinitialspace=True)
df_test = pd.read_csv(test_file, names=COLUMNS, skipinitialspace=True, skiprows=1)
df_train[LABEL_COLUMN] = (df_train['income_bracket'].apply(lambda x: '>50K' in x)).astype(int)
df_test[LABEL_COLUMN] = (df_test['income_bracket'].apply(lambda x: '>50K' in x)).astype(int)


def input_fn(df):

  continuous_cols = {k: tf.constant(df[k].values)
                     for k in CONTINUOUS_COLUMNS}

  categorical_cols = {k: tf.SparseTensor(
      indices=[[i, 0] for i in range(df[k].size)],
      values=df[k].values,
      shape=[df[k].size, 1])
                      for k in CATEGORICAL_COLUMNS}

  feature_cols = dict(continuous_cols.items() + categorical_cols.items())
  label = tf.constant(df[LABEL_COLUMN].values)
  return feature_cols, label


def train_input_fn():
    return input_fn(df_train)


def eval_input_fn():
    return input_fn(df_test)

m.fit(input_fn=train_input_fn, steps=200)
results = m.evaluate(input_fn=eval_input_fn, steps=1)
for key in sorted(results):
    print("%s: %s" % (key, results[key]))

Thanks

PS: Full stack trace of the error

Traceback (most recent call last):
  File "/home/jaspreet/PycharmProjects/TicTacTensorFlow/census.py", line 73, in <module>
    df_train[LABEL_COLUMN] = (df_train['income_bracket'].apply(lambda x: '>50K' in x)).astype(int)
  File "/usr/lib/python2.7/dist-packages/pandas/core/series.py", line 2023, in apply
    mapped = lib.map_infer(values, f, convert=convert_dtype)
  File "inference.pyx", line 920, in pandas.lib.map_infer (pandas/lib.c:44780)
  File "/home/jaspreet/PycharmProjects/TicTacTensorFlow/census.py", line 73, in <lambda>
    df_train[LABEL_COLUMN] = (df_train['income_bracket'].apply(lambda x: '>50K' in x)).astype(int)
TypeError: argument of type 'float' is not iterable

Best answer

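If you inspect test.data, you will see that the first row of data has "NAN" in its income_bracket field. pandas stores such a missing value as a float, so the test '>50K' in x fails for that row with the TypeError shown in the question.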
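The failure can be reproduced without TensorFlow or the census files. The following is a minimal sketch, assuming a hypothetical three-value column in which one entry is missing; the helper name contains_gt_50k is made up for illustration and only mirrors the lambda from the question.

```python
import pandas as pd

# Hypothetical column: two string labels and one missing value.
# dtype=object keeps the NaN as a plain Python/NumPy float.
income = pd.Series([">50K", "<=50K", float("nan")], dtype=object)


def contains_gt_50k(value):
    # Same membership test as the lambda in the question.
    return ">50K" in value


try:
    income.apply(contains_gt_50k)
    error_message = None
except TypeError as exc:
    # The `in` operator needs a container on the right-hand side;
    # a float (here NaN) is not one, so Python raises TypeError.
    error_message = str(exc)

print(error_message)
```

The two string rows pass the test without trouble; only the NaN row triggers the exception, which is why a single bad row is enough to abort the whole apply call.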

I then checked that this is the only row containing "NAN":

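# Take the income_bracket column of the test set.
ib = df_test["income_bracket"]
# Report every row whose value is not a string (Python 2 print statement).
for idx, i in enumerate(ib):
    if type(i) != str:
        print idx, type(i)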

RESULT: 0 <type 'float'>
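The same check can be written with pandas' own missing-value helpers, which also works unchanged on Python 3. This is a sketch on a hypothetical three-row stand-in for df_test["income_bracket"], not the real file; the variable names missing_rows and non_string_rows are assumptions for illustration.

```python
import pandas as pd

# Hypothetical stand-in for df_test["income_bracket"]; row 0 is missing.
ib = pd.Series([float("nan"), "<=50K.", ">50K."], dtype=object)

# isna() marks missing entries (NaN/None), but not other non-string values.
missing_rows = ib.index[ib.isna()].tolist()

# Stricter check that flags anything that is not a str, like the loop above.
non_string_rows = [idx for idx, value in ib.items() if not isinstance(value, str)]

print(missing_rows, non_string_rows)
```

If both lists agree, the only non-string values in the column are missing values, which matches what the loop found.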

So you can skip this row with:

df_test = pd.read_csv(test_file, names=COLUMNS, skipinitialspace=True, skiprows=1)
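To see the effect of skiprows=1, and one possible way to make the label line tolerate missing values, here is a sketch on a made-up in-memory file. The file contents, the three-column layout, and the assumption that the first line has fewer fields than the data rows are all hypothetical; they only imitate the situation described above.

```python
import io

import pandas as pd

# Hypothetical miniature of the test file: a one-field first line, then data.
raw = "|header line\n39, Private, >50K.\n50, Self-emp, <=50K.\n"
cols = ["age", "workclass", "income_bracket"]

# Without skiprows, the short first line leaves income_bracket as NaN.
df_raw = pd.read_csv(io.StringIO(raw), names=cols, skipinitialspace=True)

# Option 1: skip the first line, as in the answer.
df_skip = pd.read_csv(io.StringIO(raw), names=cols, skipinitialspace=True, skiprows=1)

# Option 2: a vectorized substring test that maps missing values to False.
labels_raw = (
    df_raw["income_bracket"].str.contains(">50K", regex=False, na=False).astype(int)
)
labels_skip = (
    df_skip["income_bracket"].str.contains(">50K", regex=False, na=False).astype(int)
)

print(df_raw["income_bracket"].isna().tolist())
print(labels_raw.tolist(), labels_skip.tolist())
```

Note that option 2 silently labels a missing row as 0, which may not be what you want for training data. Dropping such rows first, for example with dropna(subset=["income_bracket"]), or skipping the bad line as in the answer, avoids inventing a label.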

For more on "python - TypeError: argument of type 'float' is not iterable", see the similar question on Stack Overflow: https://stackoverflow.com/questions/39238057/
