python - Index out of bounds during iterrows(): how is this possible?

Tags: python pandas dataframe

I get this error:

5205
(5219, 25)
5221
(5219, 25)
Traceback (most recent call last):
  File "/Users/Chu/Documents/dssg2018/sa4.py", line 44, in <module>
    df.loc[idx,word]=len(df.iloc[indices[idx]][df[word]==1])/\
IndexError: index 5221 is out of bounds for axis 0 with size 5219

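I am iterating over the DataFrame, and the index comes from the iterator, so I don't see how this is possible. idx comes directly from the DataFrame.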

bt = BallTree(df[['lat','lng']], metric="haversine")
indices = bt.query_radius(df[['lat','lng']],r=(float(10)/40000)*360)

for idx,row in df.iterrows():
    for word in bag_of_words:
        if word in row['caption']:
            print(idx)
            print(df.shape)
            df.loc[idx,word]=len(df.iloc[indices[idx]][df[word]==1])/\
                             np.max([1,len(df.iloc[indices[idx]][df[word]!=1])])

Changing iloc to loc gives:

/Library/Frameworks/Python.framework/Versions/3.6/bin/python3.6 /Users/Chu/Documents/dssg2018/sa4.py
(-124.60334244261675, 49.36453144316216, -121.67106179949566, 50.863501888419826)
27
(5219, 25)
/Users/Chu/Documents/dssg2018/sa4.py:42: FutureWarning: 
Passing list-likes to .loc or [] with any missing label will raise
KeyError in the future, you can use .reindex() as an alternative.

See the documentation here:
http://pandas.pydata.org/pandas-docs/stable/indexing.html#deprecate-loc-reindex-listlike
  df.loc[idx,word]=len(df.loc[indices[idx]][df[word]==1])/\
/Users/Chu/Documents/dssg2018/sa4.py:42: UserWarning: Boolean Series key will be reindexed to match DataFrame index.
  df.loc[idx,word]=len(df.loc[indices[idx]][df[word]==1])/\
Traceback (most recent call last):
  File "/Users/Chu/Documents/dssg2018/sa4.py", line 42, in <module>
    df.loc[idx,word]=len(df.loc[indices[idx]][df[word]==1])/\
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/frame.py", line 2133, in __getitem__
    return self._getitem_array(key)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/frame.py", line 2173, in _getitem_array
    key = check_bool_indexer(self.index, key)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/indexing.py", line 2023, in check_bool_indexer
    raise IndexingError('Unalignable boolean Series provided as '
pandas.core.indexing.IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match

Best answer

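Your index does not run from 0 to len(df)-1, so df.iloc[idx] goes out of bounds: iterrows() yields index labels, while .iloc expects integer positions.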

For example:

import pandas as pd

df = pd.DataFrame({'a': [0, 1]}, index=[1, 100])

for idx,row in df.iterrows():
    print(idx)
    print(row)

1
a    0
Name: 1, dtype: int64
100
a    1
Name: 100, dtype: int64

Then when you do

df.iloc[100]

IndexError: single positional indexer is out-of-bounds

But with .loc you get the expected output.

df.loc[100]
Out[23]: 
a    1
Name: 100, dtype: int64

From the documentation:

.iloc: .iloc[] is primarily integer position based

.loc: .loc[] is primarily label based
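
To make the label/position distinction concrete, here is a minimal sketch on a made-up three-row frame (the data and the index values are assumptions chosen for illustration, with 5221 borrowed from the error message). It also uses Index.get_loc to translate a label into its position.

```python
import pandas as pd

# Hypothetical frame whose index labels are not 0..len(df)-1.
df = pd.DataFrame({"a": [10, 20, 30]}, index=[1, 100, 5221])

# .loc looks rows up by label: the row labelled 5221 exists.
by_label = df.loc[5221, "a"]

# .iloc looks rows up by position: that same row sits at position 2.
by_position = df.iloc[2]["a"]

# Index.get_loc converts a label into its integer position.
pos = df.index.get_loc(5221)

# Passing the label to .iloc fails, because the frame has only 3 rows.
try:
    df.iloc[5221]
    out_of_bounds = False
except IndexError:
    out_of_bounds = True
```

Here by_label and by_position refer to the same row, and the final .iloc call raises IndexError just as in the question.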

Solution:

Use .loc, or reset the index with df = df.reset_index(drop=True) so that labels and positions coincide.
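
Applied to the loop in the question, this could look like the sketch below. It is not the asker's actual script: the frame, the single "beach" column, and the hand-written indices list are stand-ins, with indices playing the role of the BallTree.query_radius output, which is assumed here to hold row positions rather than index labels. The counts are also taken on the neighbour sub-frame itself, which sidesteps the "Boolean Series key will be reindexed" warning, and the results go into a dict instead of being written back into the column being counted.

```python
import numpy as np
import pandas as pd

# Hypothetical data with gaps in the index, e.g. left behind by filtering rows.
df = pd.DataFrame(
    {"caption": ["beach day", "city", "beach walk", "park"],
     "beach": [1, 0, 1, 0]},
    index=[0, 2, 5, 9],
)

# Stand-in for the query_radius result: one array of row POSITIONS per row.
indices = [np.array([0, 1]), np.array([0, 1, 2]),
           np.array([1, 2, 3]), np.array([2, 3])]


def ratio(neighbours, word):
    """Rows where word == 1, divided by the remaining rows (at least 1)."""
    hits = int((neighbours[word] == 1).sum())
    return hits / max(1, len(neighbours) - hits)


# Option 1: keep the index and track the position explicitly.
ratios_by_label = {}
for pos, (label, row) in enumerate(df.iterrows()):
    ratios_by_label[int(label)] = ratio(df.iloc[indices[pos]], "beach")

# Option 2: reset the index so that labels equal positions.
df_reset = df.reset_index(drop=True)
ratios = {}
for idx, row in df_reset.iterrows():
    ratios[int(idx)] = ratio(df_reset.iloc[indices[idx]], "beach")
```

Both options give the same ratios. Option 1 keeps the original labels as keys, which matters if those labels are needed later.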

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/47665812/