python - numpy.histogram2d throws an exception when passed a subset of a pandas.DataFrame

Tags: python numpy pandas

I am having trouble with the interaction between a pandas DataFrame and the numpy histogram2d function. Specifically, this code runs fine:

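    import numpy
    import pandas
    # 100 rows of standard-normal samples in two columns, "A" and "B"
    df = pandas.DataFrame(numpy.random.randn(100, 2), columns=list('AB'))
    # 2D histogram over the full DataFrame
    hist, xe, ye = numpy.histogram2d(df["A"], df["B"])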

but this code, where I build the histogram from a subset of the DataFrame, fails:

    import numpy
    import pandas
    df = pandas.DataFrame(numpy.random.randn(100, 2), columns=list('AB'))
    # keep only the rows where column "A" is negative
    dfSubset = pandas.DataFrame(df[df["A"] < 0])
    # this call raises the ValueError shown below
    hist, xe, ye = numpy.histogram2d(dfSubset["A"], dfSubset["B"])

with the following exception:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-6-763e2355a7e1> in <module>()
      1 dfSubset = pandas.DataFrame(df[df["A"] < 0])
----> 2 hist, xe, ye = numpy.histogram2d(dfSubset["A"], dfSubset["B"])

/home/mark/.virtualenvs/ipython/lib/python2.6/site-packages/numpy/lib/twodim_base.pyc in histogram2d(x, y, bins, range, normed, weights)
    651         xedges = yedges = asarray(bins, float)
    652         bins = [xedges, yedges]
--> 653     hist, edges = histogramdd([x, y], bins, range, normed, weights)
    654     return hist, edges[0], edges[1]
    655 

/home/mark/.virtualenvs/ipython/lib/python2.6/site-packages/numpy/lib/function_base.pyc in histogramdd(sample, bins, range, normed, weights)
    312             smax = ones(D)
    313         else:
--> 314             smin = atleast_1d(array(sample.min(0), float))
    315             smax = atleast_1d(array(sample.max(0), float))
    316     else:

/home/mark/.virtualenvs/ipython/lib/python2.6/site-packages/numpy/core/_methods.pyc in _amin(a, axis, out, keepdims)
     19 def _amin(a, axis=None, out=None, keepdims=False):
     20     return um.minimum.reduce(a, axis=axis,
---> 21                             out=out, keepdims=keepdims)
     22 
     23 def _sum(a, axis=None, dtype=None, out=None, keepdims=False):

/home/mark/.virtualenvs/ipython/lib/python2.6/site-packages/pandas/core/generic.pyc in __nonzero__(self)
    663         raise ValueError("The truth value of a {0} is ambiguous. "
    664                          "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
--> 665                          .format(self.__class__.__name__))
    666 
    667     __bool__ = __nonzero__

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

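From some searching I gather that what the truth value of a Python container should be is a contentious issue, and that the behaviour pandas and numpy expect does not match. I don't know how to resolve the actual problem.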
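The ambiguity named in the error message can be reproduced on its own. The following is a minimal sketch, not the code path numpy takes internally; the Series `s` and its values are made up for illustration.

```python
import pandas

# Hypothetical two-element Series, used only to trigger the error
s = pandas.Series([-1.0, 0.5])

try:
    bool(s)  # pandas refuses to collapse several values into one bool
    raised = False
    message = ""
except ValueError as exc:
    raised = True
    message = str(exc)

print(raised)
print(message)

# Explicit reductions state the intent and do not raise
any_negative = bool((s < 0).any())
all_negative = bool((s < 0).all())
print(any_negative, all_negative)
```

Any code that implicitly evaluates a Series in a boolean context hits the same ValueError, which is consistent with the last frame of the traceback above.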

Can anyone suggest a fix?

I am running Python 2.6.6 in an IPython notebook, with the following packages installed in the virtualenv I am using:

Babel==0.9.4
Beaker==1.3.1
Jinja2==2.2.1
Magic-file-extensions==0.1
Mako==0.3.4
MarkupSafe==0.9.2
OpenEye-python2.6-redhat-6-x64==2013.10.3
PIL==1.1.6
Pygments==1.1.1
SSSDConfig==1.9.2
Sphinx==0.6.6
argparse==1.2.1
backports.ssl-match-hostname==3.4.0.2
cas==0.15
cups==1.0
cupshelpers==1.0
decorator==3.0.1
docutils==0.6
ethtool==0.6
firstboot==1.110
freeipa==2.0.0.alpha.0
git-remote-helpers==0.1.0
iniparse==0.3.1
iotop==0.3.2
ipapython==3.0.0
ipython==1.1.0
iwlib==1.0
kerberos==1.0
lxml==2.2.3
matplotlib==1.1.1
netaddr==0.7.5
nose==0.10.4
numpy==1.8.0
pandas==0.13.0
paramiko==1.7.5
patsy==0.2.1
pyOpenSSL==0.10
pycrypto==2.0.1
pycurl==7.19.0
pygpgme==0.1
python-dateutil==2.2
python-default-encoding==0.1
python-ldap==2.3.10
python-meh==0.11
python-nss==0.11
pytz==2013.9
pyxdg==0.18
pyzmq==14.0.1
qpid-python==0.14
qpid-tools==0.14
scdate==1.9.60
scikit-learn==0.14.1
scipy==0.13.2
sckdump==2.0.5
scservices==0.99.45
scservices.dbus==0.99.45
six==1.5.2
slip==0.2.20
slip.dbus==0.2.20
slip.gtk==0.2.20
smbc==1.0
stevedore==0.13
sympy==0.7.4.1
tornado==3.2
urlgrabber==3.9.1
virtinst==0.600.0
virtualenv==1.11.1
virtualenv-clone==0.2.4
virtualenvwrapper==4.2
yum-metadata-parser==1.1.2

Thanks!

Best answer

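Change the line:

    hist, xe, ye = numpy.histogram2d(dfSubset["A"], dfSubset["B"])

to:

    hist, xe, ye = numpy.histogram2d(dfSubset["A"].values, dfSubset["B"].values)

This forces each Series to be converted to a numpy array, so histogram2d works on plain arrays rather than on pandas objects.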
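For reference, here is a self-contained sketch of the workaround. The fixed seed and the names `df_subset`, `x` and `y` are assumptions added for illustration. On recent numpy and pandas releases the original call may no longer fail, so this only shows the `.values` form running, not the failure itself.

```python
import numpy
import pandas

# Fixed seed (an assumption, for reproducibility only)
rng = numpy.random.RandomState(0)
df = pandas.DataFrame(rng.randn(100, 2), columns=list("AB"))

# Keep only the rows where column "A" is negative
df_subset = df[df["A"] < 0]

# .values exposes the underlying numpy.ndarray of each column
x = df_subset["A"].values
y = df_subset["B"].values

# Default binning is 10 x 10, with edges spanning the min and max of the data
hist, xe, ye = numpy.histogram2d(x, y)

print(type(x).__name__)
print(hist.shape)
print(int(hist.sum()) == len(df_subset))
```

Every row of the subset lands in exactly one bin, so the bin counts sum to the number of rows in `df_subset`.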

Source: the Stack Overflow question "numpy.histogram2d throws an exception when passed a subset of a pandas.DataFrame": https://stackoverflow.com/questions/21558545/
