python - Pandas: DataFrame.unstack error

Tags: python pandas dataframe

I wrote the following function to convert several columns of a data frame to numeric values:

def factorizeMany(data, columns):
    """ Factorize a bunch of columns in a data frame"""
    data[columns] = data[columns].stack().rank(method='dense').unstack()
    return data

Calling it like this

trainDataPre = factorizeMany(trainDataMerged.fillna(0), columns=["char_{0}".format(i) for i in range(1,10)])

gives me an error. I don't know where to look for the cause. Could it be the input?

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-14-357f8a4b2ef8> in <module>()
      1 #trainDataPre = trainDataMerged.drop(["people_id", "activity_id", "date"], axis=1)
      2 #trainDataPre = trainDataMerged.fillna(0)
----> 3 trainDataPre = mininggear.factorizeMany(trainDataMerged.fillna(0), columns=["char_{0}".format(i) for i in range(1,10)])

/Users/cls/Dropbox/Datengräber/Kaggle/RedHat/mininggear.py in factorizeMany(data, columns)
     15 def factorizeMany(data, columns):
     16     """ Factorize a bunch of columns in a data frame"""
---> 17     data[columns] = data[columns].stack().rank(method='dense').unstack()
     18     return data
     19 

/usr/local/lib/python3.5/site-packages/pandas/core/series.py in unstack(self, level, fill_value)
   2041         """
   2042         from pandas.core.reshape import unstack
-> 2043         return unstack(self, level, fill_value)
   2044 
   2045     # ----------------------------------------------------------------------

/usr/local/lib/python3.5/site-packages/pandas/core/reshape.py in unstack(obj, level, fill_value)
    405     else:
    406         unstacker = _Unstacker(obj.values, obj.index, level=level,
--> 407                                fill_value=fill_value)
    408         return unstacker.get_result()
    409 

/usr/local/lib/python3.5/site-packages/pandas/core/reshape.py in __init__(self, values, index, level, value_columns, fill_value)
     90 
     91         # when index includes `nan`, need to lift levels/strides by 1
---> 92         self.lift = 1 if -1 in self.index.labels[self.level] else 0
     93 
     94         self.new_index_levels = list(index.levels)

AttributeError: 'Index' object has no attribute 'labels'

Best answer

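The error is raised because you are performing the rank operation on a subset of the data frame that contains a mix of numeric and categorical/string values. That mix comes from filling the NaN values in the data frame with 0 before calling the function.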

Consider this case:

import numpy as np
import pandas as pd

df = pd.DataFrame({'char_1': ['cat', 'dog', 'buffalo', 'cat'],
                   'char_2': ['mouse', 'tiger', 'lion', 'mouse'],
                   'char_3': ['giraffe', np.nan, 'cat', np.nan]})
df


df = df.fillna(0)
df[['char_3']].stack().rank()
Series([], dtype: float64)

So you are effectively calling unstack on an empty Series, which is not what you intended.
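One way to confirm that a column holds mixed types after NaN has been filled with 0 is to count the Python types of its values. The sketch below is only an illustration: `mixed` is a hypothetical Series built by hand to resemble `char_3` after `fillna(0)`, not data from the question.

```python
import pandas as pd

# Hypothetical column resembling char_3 after fillna(0):
# string labels mixed with integer zeros.
mixed = pd.Series(["giraffe", 0, "cat", 0], dtype=object, name="char_3")

# Count how many values of each Python type the column holds.
type_counts = mixed.map(lambda v: type(v).__name__).value_counts()
print(type_counts)

# More than one distinct type means the column cannot be ranked meaningfully.
has_mixed_types = len(type_counts) > 1
print(has_mixed_types)
```

If `has_mixed_types` is true for any of the columns passed to `factorizeMany`, the rank-based approach is likely to misbehave on them.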

A better approach, which avoids these complications:

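import pandas as pd

def factorizeMany(data, columns):
    """ Factorize a bunch of columns in a data frame"""
    # Keep NaN entries while stacking so the result unstacks to the original shape
    stacked = data[columns].stack(dropna=False)
    # factorize() returns (codes, uniques); reattach the codes to the stacked index
    data[columns] = pd.Series(stacked.factorize()[0], index=stacked.index).unstack()
    return data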
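As a rough illustration of the same idea, the sketch below encodes the example columns with integer codes shared across all of them. It is a variant, not the function above: the name `factorize_many` is hypothetical, it flattens the values with `to_numpy().ravel()` and calls `pd.factorize` directly instead of going through `stack`/`unstack`, and it leaves missing values as NaN rather than filling them with 0, so `pd.factorize` marks them with its default sentinel of -1.

```python
import numpy as np
import pandas as pd


def factorize_many(data, columns):
    """Return a copy of `data` with `columns` replaced by shared integer codes."""
    block = data[columns]
    # Flatten row by row; codes are assigned in order of first appearance,
    # and missing values receive the default sentinel -1.
    codes, uniques = pd.factorize(block.to_numpy(dtype=object).ravel())
    codes_2d = codes.reshape(block.shape)

    result = data.copy()
    for position, column in enumerate(columns):
        # Assigning one column at a time replaces it, including its dtype.
        result[column] = codes_2d[:, position]
    return result


df = pd.DataFrame({
    "char_1": ["cat", "dog", "buffalo", "cat"],
    "char_2": ["mouse", "tiger", "lion", "mouse"],
    "char_3": ["giraffe", np.nan, "cat", np.nan],
})

encoded = factorize_many(df, ["char_1", "char_2", "char_3"])
print(encoded)
```

Because the codes are shared, `cat` receives the same code in `char_1` and `char_3`. Returning a copy rather than modifying `data` in place is a design choice of this sketch, not something the original function does.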

For more on python - Pandas: DataFrame.unstack error, see the similar question we found on Stack Overflow: https://stackoverflow.com/questions/39546975/
