python - Applying np.histogram with pandas to reshape a DataFrame

Tags: python numpy pandas

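I want to get a normalized histogram of each column of a pandas DataFrame. np.histogram is what I'd like to use, but it returns a tuple and I only want its first item. Pandas doesn't seem to like that. For example, this works: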

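import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.uniform(size=20).reshape(5, 4))

bins = (0, 0.5, 1)
# normed=True is the spelling in use when this was asked;
# recent NumPy releases take density=True instead
df.apply(np.histogram, bins=bins, normed=True)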

and returns

0    ([0.8, 1.2], [0.0, 0.5, 1.0])
1    ([0.8, 1.2], [0.0, 0.5, 1.0])
2    ([0.8, 1.2], [0.0, 0.5, 1.0])
3    ([0.8, 1.2], [0.0, 0.5, 1.0])
dtype: object

But I only want the first item of the tuple, so I tried this:

df.apply(lambda x : np.histogram(x, bins=bins, normed=True)[0]) 

But it fails with an error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-51-3191795e120c> in <module>()
----> 1 df.apply(lambda x : np.histogram(x, bins=bins, normed=True)[0])

/usr/local/lib/python2.7/site-packages/pandas/core/frame.pyc in apply(self, func, axis, broadcast, raw, reduce, args, **kwds)
   3310                     if reduce is None:
   3311                         reduce = True
-> 3312                     return self._apply_standard(f, axis, reduce=reduce)
   3313             else:
   3314                 return self._apply_broadcast(f, axis)

/usr/local/lib/python2.7/site-packages/pandas/core/frame.pyc in _apply_standard(self, func, axis, ignore_failures, reduce)
   3415                 index = None
   3416 
-> 3417             result = self._constructor(data=results, index=index)
   3418             result.columns = res_index
   3419 

/usr/local/lib/python2.7/site-packages/pandas/core/frame.pyc in __init__(self, data, index, columns, dtype, copy)
    199                                  dtype=dtype, copy=copy)
    200         elif isinstance(data, dict):
--> 201             mgr = self._init_dict(data, index, columns, dtype=dtype)
    202         elif isinstance(data, ma.MaskedArray):
    203             import numpy.ma.mrecords as mrecords

/usr/local/lib/python2.7/site-packages/pandas/core/frame.pyc in _init_dict(self, data, index, columns, dtype)
    321 
    322         return _arrays_to_mgr(arrays, data_names, index, columns,
--> 323                               dtype=dtype)
    324 
    325     def _init_ndarray(self, values, index, columns, dtype=None,

/usr/local/lib/python2.7/site-packages/pandas/core/frame.pyc in _arrays_to_mgr(arrays, arr_names, index, columns, dtype)
   4471     axes = [_ensure_index(columns), _ensure_index(index)]
   4472 
-> 4473     return create_block_manager_from_arrays(arrays, arr_names, axes)
   4474 
   4475 

/usr/local/lib/python2.7/site-packages/pandas/core/internals.pyc in create_block_manager_from_arrays(arrays, names, axes)
   3757         return mgr
   3758     except (ValueError) as e:
-> 3759         construction_error(len(arrays), arrays[0].shape[1:], axes, e)
   3760 
   3761 

/usr/local/lib/python2.7/site-packages/pandas/core/internals.pyc in construction_error(tot_items, block_shape, axes, e)
   3729         raise e
   3730     raise ValueError("Shape of passed values is {0}, indices imply {1}".format(
-> 3731         passed,implied))
   3732 
   3733 def create_block_manager_from_blocks(blocks, axes):

ValueError: Shape of passed values is (4,), indices imply (4, 5)

> /usr/local/lib/python2.7/site-packages/pandas/core/internals.py(3731)construction_error()
   3730     raise ValueError("Shape of passed values is {0}, indices imply {1}".format(
-> 3731         passed,implied))
   3732 

Any ideas?

Best answer

You can do this, if you like: wrap the histogram counts in a Series.

In [26]: df.apply(lambda x : pd.Series(np.histogram(x, bins=bins, normed=True)[0]))
Out[26]: 
     0    1    2    3
0  0.4  1.6  0.8  1.6
1  1.6  0.4  1.2  0.4

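np.histogram is neither a reducer (which returns a single value) nor a transformer (which returns the same number of values as its input), so apply doesn't know how to map the return values. Here the function gets 5 values per column and returns 2, one per bin.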
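
To make the reducer/transformer distinction concrete, here is a minimal sketch of how the shape of the result from apply follows from what the function returns. It assumes a recent NumPy and pandas, so it uses density=True in place of normed=True; the variable names (reduced, transformed, hist) and the seeded generator are illustrative only.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable
df = pd.DataFrame(rng.uniform(size=(5, 4)))
bins = (0, 0.5, 1)

# Reducer: one scalar per column -> a Series with one entry per column
reduced = df.apply(lambda col: col.mean())

# Transformer: same length as the input column -> a DataFrame shaped like df
transformed = df.apply(lambda col: col - col.mean())

# Neither: 5 values in, 2 values out. Returning a Series tells apply to use
# that Series' index (one row per bin) as the row labels of the result.
hist = df.apply(
    lambda col: pd.Series(np.histogram(col, bins=bins, density=True)[0])
)

print(reduced.shape, transformed.shape, hist.shape)
```

Each column of hist holds the density in the two bins, so multiplying by the bin width (0.5) and summing gives 1 per column.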

Here's another way (and conceptually how to think about apply):

In [28]: f = lambda x : pd.Series(np.histogram(x, bins=bins, normed=True)[0])

In [31]: pd.concat([ f(col) for c, col in df.items() ],axis=1)
Out[31]: 
     0    1    2    3
0  0.4  1.6  0.8  1.6
1  1.6  0.4  1.2  0.4
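
For readers on current library versions, the two approaches above can be checked against each other. This is a sketch under a few assumptions: density=True stands in for normed=True, df.items() is used to iterate over columns, and the keys= argument to pd.concat is my addition to carry the original column labels across. The names via_apply and via_concat are hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)  # seeded only so the sketch is repeatable
df = pd.DataFrame(rng.uniform(size=(5, 4)))
bins = (0, 0.5, 1)


def f(col):
    """Return the per-bin densities of one column as a Series."""
    counts, _edges = np.histogram(col, bins=bins, density=True)
    return pd.Series(counts)


# Approach 1: let apply assemble the per-column Series
via_apply = df.apply(f)

# Approach 2: call f on each column ourselves and glue the results side by side
via_concat = pd.concat(
    [f(col) for _name, col in df.items()],
    axis=1,
    keys=df.columns,  # keep the original column labels
)

print(via_apply)
print(via_concat)
```

Both frames should have one row per bin and one column per original column, which is the sense in which apply can be thought of as "call the function on each column, then concatenate".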

This question and answer originally appeared on Stack Overflow: https://stackoverflow.com/questions/24542572/
