I am running this code in a Jupyter notebook.
I am trying to convert the values of a DataFrame column from this:
column 1
#Green #Blue #Orange
#Green #Red
#Blue #Orange
#Orange
to this:
column 1
[Green, Blue, Orange]
[Green, Red]
[Blue, Orange]
[Orange]
When I apply string methods to a column of a single DataFrame with the following code, it works:
df1['column 1'] = df1['column 1'].str.replace('#', ' ')
df1['column 1'] = df1['column 1'].str.split(',')
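A separate point: the sample values are separated by spaces, not commas, so replacing '#' with ' ' and then splitting on ',' leaves each row as a one-element list such as [' Green  Blue  Orange'] rather than the list shown above. A minimal sketch that produces the target form, assuming every tag is a '#' followed by word characters (letters, digits, underscore), uses Series.str.findall with a single capture group:

```python
import pandas as pd

df1 = pd.DataFrame({"column 1": ["#Green #Blue #Orange", "#Green #Red",
                                 "#Blue #Orange", "#Orange"]})

# With exactly one capture group, findall returns only the captured tag names
df1["column 1"] = df1["column 1"].str.findall(r"#(\w+)")
print(df1["column 1"].tolist())
# [['Green', 'Blue', 'Orange'], ['Green', 'Red'], ['Blue', 'Orange'], ['Orange']]
```

Like the original two lines, this overwrites the column, so running it a second time on the same DataFrame no longer sees strings.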
But when I try to condense this process into a loop over the same column in several DataFrames, I get an AttributeError (AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas):
df_list = [df1, df2, df3, df4, df5]
for df in df_list:
    df['column 1'] = df['column 1'].str.replace('#', ' ')
    df['column 1'] = df['column 1'].str.split(',')
Why does this happen when the two are essentially the same process?
Here is the traceback from my Jupyter notebook:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-276-51c1ece881fa> in <module>
4 for df in df_list:
5 df['player_tags'] = df['player_tags'].str.replace('#', ' ')
----> 6 df['player_tags'] = df['player_tags'].str.split(',')
~/anaconda3/lib/python3.6/site-packages/pandas/core/generic.py in __getattr__(self, name)
3608 if (name in self._internal_names_set or name in self._metadata or
3609 name in self._accessors):
-> 3610 return object.__getattribute__(self, name)
3611 else:
3612 if name in self._info_axis:
~/anaconda3/lib/python3.6/site-packages/pandas/core/accessor.py in __get__(self, instance, owner)
52 # this ensures that Series.str.<method> is well defined
53 return self.accessor_cls
---> 54 return self.construct_accessor(instance)
55
56 def __set__(self, instance, value):
~/anaconda3/lib/python3.6/site-packages/pandas/core/strings.py in _make_accessor(cls, data)
1908 # (instead of test for object dtype), but that isn't practical for
1909 # performance reasons until we have a str dtype (GH 9343)
-> 1910 raise AttributeError("Can only use .str accessor with string "
1911 "values, which use np.object_ dtype in "
1912 "pandas")
AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas
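Best answer
Based on the code you provided, it should work (in fact, it must work, assuming all your values are valid strings).
You can always take this sample code and check it on your end: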
import pandas as pd

df = pd.DataFrame({"column 1":['#Green #Blue #Orange','#Green #Red','#Blue #Orange','#Orange']})
df1 = df.copy()
df2 = df.copy()
df3 = df.copy()
df_list = [df1, df2, df3]
for df in df_list:
    df['column 1'] = df['column 1'].str.replace('#', ' ')
    df['column 1'] = df['column 1'].str.split(',')
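This runs without any error. Conclusion: the df_list in your code contains the same DataFrame two (or more) times, for example because df2 is a reference to df1 rather than a copy. The loop then processes that one object twice, and that is what causes your problem. Note that in my code above, df1, df2 and df3 are copies (as opposed to references).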
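The difference between a reference and a copy can be checked directly. In this minimal sketch (the one-row frame is made up for illustration), plain assignment binds a second name to the same DataFrame, while .copy() builds an independent one; comparing with `is`, or counting distinct id() values, shows whether df_list repeats an object:

```python
import pandas as pd

df1 = pd.DataFrame({"column 1": ["#Green #Red"]})
df2 = df1          # plain assignment: a second name for the same object
df3 = df1.copy()   # .copy() builds an independent DataFrame

df_list = [df1, df2, df3]
print([d is df1 for d in df_list])  # [True, True, False]

# Fewer distinct ids than entries means some DataFrame appears more than once
has_repeats = len({id(d) for d in df_list}) < len(df_list)
print(has_repeats)  # True
```

A for loop over this df_list would therefore run the string methods on df1 twice.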
Check this (it will raise the error):
import pandas as pd
df = pd.DataFrame({"column 1":['#Green #Blue #Orange','#Green #Red','#Blue #Orange','#Orange']})
df['column 1'] = df['column 1'].str.replace('#', ' ')
df['column 1'] = df['column 1'].str.split(',')
Then run the same two lines again:
df['column 1'] = df['column 1'].str.replace('#', ' ')
df['column 1'] = df['column 1'].str.split(',')
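You will get the error. After the first pass the column holds lists rather than strings; on the second pass .str.replace cannot operate on those lists (it returns NaN for them), so the following .str.split call no longer finds string values and raises the AttributeError. This is consistent with the traceback pointing at the split line rather than the replace line.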
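If the loop needs to be safe to re-run (for example when a Jupyter cell is executed again), or the list might contain the same DataFrame twice, one option is to skip objects already processed and convert only columns that still hold strings. This is a sketch under stated assumptions, not the answer's method: convert_tags is a hypothetical helper, it assumes the column has no missing values, and it uses str.findall to build the lists.

```python
import pandas as pd

def convert_tags(frames, col="column 1"):
    """Hypothetical helper: convert '#A #B' strings to ['A', 'B'] at most once per DataFrame."""
    seen = set()
    for df in frames:
        if id(df) in seen:  # same object listed twice: already handled
            continue
        seen.add(id(df))
        # Assumes no missing values: convert only if every cell is still a str
        if df[col].map(lambda v: isinstance(v, str)).all():
            df[col] = df[col].str.findall(r"#(\w+)")

df1 = pd.DataFrame({"column 1": ["#Green #Blue #Orange", "#Orange"]})
df2 = df1                  # alias, reproducing the problem case
convert_tags([df1, df2])
convert_tags([df1, df2])   # a second run leaves the lists untouched
print(df1["column 1"].tolist())  # [['Green', 'Blue', 'Orange'], ['Orange']]
```

Deduplicating by id() is safe here because the list keeps every frame alive for the whole loop, so no id can be reused mid-iteration.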
Source: "python - Why do string methods stop working on an object column when used in a for loop?" on Stack Overflow: https://stackoverflow.com/questions/60085749/