Summary: This doesn't work:
df[df.key==1]['D'] = 1
But this does:
df.D[df.key==1] = 1
Why?
To reproduce:
In [1]: import pandas as pd
In [2]: from numpy.random import randn
In [4]: df = pd.DataFrame(randn(6,3),columns=list('ABC'))
In [5]: df
Out[5]:
A B C
0 1.438161 -0.210454 -1.983704
1 -0.283780 -0.371773 0.017580
2 0.552564 -0.610548 0.257276
3 1.931332 0.649179 -1.349062
4 1.656010 -1.373263 1.333079
5 0.944862 -0.657849 1.526811
In [6]: df['D']=0.0
In [7]: df['key']=3*[1]+3*[2]
In [8]: df
Out[8]:
A B C D key
0 1.438161 -0.210454 -1.983704 0 1
1 -0.283780 -0.371773 0.017580 0 1
2 0.552564 -0.610548 0.257276 0 1
3 1.931332 0.649179 -1.349062 0 2
4 1.656010 -1.373263 1.333079 0 2
5 0.944862 -0.657849 1.526811 0 2
This doesn't work:
In [9]: df[df.key==1]['D'] = 1
In [10]: df
Out[10]:
A B C D key
0 1.438161 -0.210454 -1.983704 0 1
1 -0.283780 -0.371773 0.017580 0 1
2 0.552564 -0.610548 0.257276 0 1
3 1.931332 0.649179 -1.349062 0 2
4 1.656010 -1.373263 1.333079 0 2
5 0.944862 -0.657849 1.526811 0 2
But this does:
In [11]: df.D[df.key==1] = 3.4
In [12]: df
Out[12]:
A B C D key
0 1.438161 -0.210454 -1.983704 3.4 1
1 -0.283780 -0.371773 0.017580 3.4 1
2 0.552564 -0.610548 0.257276 3.4 1
3 1.931332 0.649179 -1.349062 0.0 2
4 1.656010 -1.373263 1.333079 0.0 2
5 0.944862 -0.657849 1.526811 0.0 2
My question is:
Why does only the 2nd way work? I can't seem to see a difference in selection/indexing logic.
The pandas version is 0.10.0.
Edit: This should not be done like this anymore. Since version 0.11, there is .loc. See here: http://pandas.pydata.org/pandas-docs/stable/indexing.html
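As a minimal sketch of the .loc approach mentioned in the edit above: the row mask and the column label go into a single indexing call, so there is no intermediate object to assign into. The seeded random generator is an assumption made only to keep the example reproducible; the original session used numpy.random.randn.

```python
import numpy as np
import pandas as pd

# Rebuild a frame shaped like the one in the question (values are random).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((6, 3)), columns=list("ABC"))
df["D"] = 0.0
df["key"] = 3 * [1] + 3 * [2]

# One indexing operation: boolean row mask plus column label.
df.loc[df["key"] == 1, "D"] = 3.4

print(df["D"].tolist())
```

Only the rows where key equals 1 are updated; the other rows keep D at 0.0.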
Best answer
The pandas documentation says:
Returning a view versus a copy
The rules about when a view on the data is returned are entirely dependent on NumPy. Whenever an array of labels or a boolean vector are involved in the indexing operation, the result will be a copy. With single label / scalar indexing and slicing, e.g. df.ix[3:6] or df.ix[:, 'A'], a view will be returned.
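The copy made by boolean indexing can be checked directly. The following is a small sketch, not part of the original answer: it uses np.shares_memory to test whether the selected rows still point at the original column's buffer. The frame and variable names are illustrative. Note that the .ix indexer in the quoted passage has since been removed from pandas, so the sketch uses plain bracket indexing.

```python
import numpy as np
import pandas as pd

# Small illustrative frame with the same D/key layout as in the question.
df = pd.DataFrame({"D": [0.0] * 6, "key": 3 * [1] + 3 * [2]})

# Boolean-mask selection builds a new DataFrame holding the matching rows.
subset = df[df["key"] == 1]

# Compare the underlying buffers of the two D columns.
shares = np.shares_memory(subset["D"].to_numpy(), df["D"].to_numpy())
print(shares)
```

If the buffers are not shared, writing into subset cannot reach df, which is why the chained form in the question leaves df untouched.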
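In df[df.key==1]['D'], you first do boolean slicing (which produces a copy of the DataFrame), and then select the column ['D'].
In df.D[df.key==1] = 3.4, you first select a column, and then do boolean slicing on the resulting Series.
This seems to make the difference, although I must admit that it is a bit counterintuitive.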
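Edit: The difference was identified by Dougal, see his comment: in version 1, the copy is made because the boolean slicing goes through the __getitem__ method. In version 2, only the __setitem__ method is called, so no copy is returned and the values are simply assigned.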
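The call sequence Dougal describes can be illustrated without pandas. The Traced class below is a hypothetical stand-in, not a pandas type: it only records which special method Python invokes for each statement, and its __getitem__ returns a fresh object to mimic the copy made by boolean indexing.

```python
class Traced:
    """Hypothetical stand-in for a DataFrame/Series that logs indexing calls."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def __getitem__(self, key):
        self.log.append((self.name, "__getitem__", key))
        # Return a new object, mimicking the copy made by boolean indexing.
        return Traced(f"{self.name}[{key}]", self.log)

    def __setitem__(self, key, value):
        # Only record the call; a real container would store the value here.
        self.log.append((self.name, "__setitem__", key))


# Version 1: df[mask]['D'] = 1 is __getitem__ on df, then __setitem__ on the result.
log1 = []
df = Traced("df", log1)
df["mask"]["D"] = 1
print(log1)

# Version 2: df.D[mask] = 1 is a single __setitem__ on the column object.
log2 = []
column = Traced("df.D", log2)
column["mask"] = 1
print(log2)
```

In version 1 the assignment lands on the temporary object returned by __getitem__, which is discarded afterwards. In version 2 the assignment goes straight to the column object, which in the pandas version from the question still referred to the frame's own data.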
Regarding python - understanding pandas DataFrame indexing, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/14192741/