python - Using pandas.Series.str.get: what is the correct way?

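I am reading Wes McKinney's excellent book to get up to speed on pandas. However, I can't figure out why pandas.Series.str.get does not work. I have looked through some GitHub issues and questions here, but nothing seemed to help.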

Data

data = pd.Series({'Dave': 'dave@google.com', 'Steve': 'steve@gmail.com', 'Rob': 'rob@yahoo.com', 'Wes': np.nan})

Code

import pandas as pd
import re
import numpy as np
pattern = '[a-zA-Z0-9]+@.*'
matches = data.str.match(pattern)
matches.str.get(1)

The code above should work and produce the following result:

Dave NaN
Rob  NaN
Steve NaN

I did use a different regular expression pattern from the one in the book, but I don't think that is the problem.

Error:

    raise AttributeError("Can only use .str accessor with string " "values!")
AttributeError: Can only use .str accessor with string values

What am I missing? I am using PyCharm Community with Python 3.6.6 and pandas 0.24.2, in case that matters.

Here is a screenshot from the book:

Best answer

The reason you get a Series of NaN is that matches is a boolean Series:

In[58]:
matches

Out[58]: 
Dave     True
Steve    True
Rob      True
Wes       NaN
dtype: object

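Its elements are booleans rather than strings or containers, so returning the element at an ordinal position makes no sense here, which is why you get NaN for the whole Series.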
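If the goal is to pull pieces out of each address rather than just test for a match, one possible approach is to use a pattern with capture groups together with str.findall or str.extract. The sketch below assumes a three-group pattern (user, domain, suffix) that is not the asker's original pattern; it is only an illustration of where str.get becomes meaningful.

```python
import numpy as np
import pandas as pd

data = pd.Series({
    "Dave": "dave@google.com",
    "Steve": "steve@gmail.com",
    "Rob": "rob@yahoo.com",
    "Wes": np.nan,
})

# Assumed pattern (not from the question): user, domain and suffix groups.
pattern = r"([A-Za-z0-9._%+-]+)@([A-Za-z0-9.-]+)\.([A-Za-z]{2,4})"

# str.match only reports whether each value matches; it holds no substrings.
matches = data.str.match(pattern)

# str.findall returns, per row, a list of tuples of captured groups,
# so positional access with str.get has something to index into.
found = data.str.findall(pattern)
first_match = found.str.get(0)     # first (and only) match tuple per row
domains = first_match.str.get(1)   # second capture group: the domain

# str.extract returns the groups directly as DataFrame columns 0, 1, 2.
parts = data.str.extract(pattern)

print(domains)
print(parts)
```

Missing values stay missing at every step, so the row for Wes comes back as NaN instead of raising.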

Compare the example in the documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.get.html#pandas.Series.str.get

In[61]:
s = pd.Series(["String",
...               (1, 2, 3),
...               ["a", "b", "c"],
...               123,
...               -456,
...               {1: "Hello", "2": "World"}])
s

Out[61]: 
0                        String
1                     (1, 2, 3)
2                     [a, b, c]
3                           123
4                          -456
5    {1: 'Hello', '2': 'World'}
dtype: object

In[62]:
s.str.get(1)

Out[62]: 
0        t
1        2
2        b
3      NaN
4      NaN
5    Hello
dtype: object

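So here it returns the element at the given ordinal position in each row. Some rows have no second element (the integers 123 and -456 cannot be indexed at all), so NaN is returned for them.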
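The per-element behaviour shown in the documentation example can be approximated in plain Python. The helper get_or_nan below is hypothetical and is not how pandas implements str.get internally; it is a rough sketch of the "index if possible, otherwise NaN" idea.

```python
import numpy as np
import pandas as pd


def get_or_nan(value, i):
    """Hypothetical helper: roughly what str.get does for one element."""
    if isinstance(value, dict):
        # Dictionaries are looked up by key, not by position.
        return value.get(i, np.nan)
    try:
        return value[i]
    except (TypeError, IndexError, KeyError):
        # Not indexable (e.g. an int) or too short: treat as missing.
        return np.nan


s = pd.Series(["String",
               (1, 2, 3),
               ["a", "b", "c"],
               123,
               -456,
               {1: "Hello", "2": "World"}])

approx = s.map(lambda v: get_or_nan(v, 1))  # hand-rolled approximation
builtin = s.str.get(1)                      # the real accessor, for comparison

print(approx)
print(builtin)
```

A boolean such as True falls into the same branch as the integers: it cannot be indexed, so there is nothing for a positional lookup to return.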

This question and answer come from Stack Overflow: https://stackoverflow.com/questions/57216251/
