I have the following DataFrame:
start end days
0 2015-07-01 2015-07-07 (1, 2, 3, 4, 5, 6, 7)
1 2015-07-08 2015-07-14 (8, 9, 10, 11, 12, 13, 14)
2 2015-07-15 2015-07-21 (15, 16, 17, 18, 19, 20, 21)
3 2015-07-22 2015-07-28 (22, 23, 24, 25, 26, 27, 28)
4 2015-07-29 2015-08-04 (29, 30, 31, 1, 2, 3, 4)
5 2015-08-05 2015-08-11 (5, 6, 7, 8, 9, 10, 11)
6 2015-08-12 2015-08-18 (12, 13, 14, 15, 16, 17, 18)
7 2015-08-19 2015-08-25 (19, 20, 21, 22, 23, 24, 25)
8 2015-08-26 2015-09-01 (26, 27, 28, 29, 30, 31, 1)
9 2015-09-02 2015-09-08 (2, 3, 4, 5, 6, 7, 8)
10 2015-09-09 2015-09-15 (9, 10, 11, 12, 13, 14, 15)
11 2015-09-16 2015-09-22 (16, 17, 18, 19, 20, 21, 22)
12 2015-09-23 2015-09-29 (23, 24, 25, 26, 27, 28, 29)
I want to work with the days column, which contains tuples, but basic filtering with pandas syntax does not seem to work:
df[4 in df['days'] == True]
I expected the above to filter the DataFrame and return the following rows, i.e. the rows whose tuple contains 4:
start end days
0 2015-07-01 2015-07-07 (1, 2, 3, 4, 5, 6, 7)
4 2015-07-29 2015-08-04 (29, 30, 31, 1, 2, 3, 4)
9 2015-09-02 2015-09-08 (2, 3, 4, 5, 6, 7, 8)
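Instead, it returns an empty DataFrame.
I also tried creating a new column to hold True/False values based on a membership check, like this:
df['daysTF'] = 4 in df['days']
This returns a DataFrame in which the 'daysTF' column is True for every row, rather than True only for the rows whose tuple contains 4.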
Best answer
One way to do this is to use the Series.apply method, although it may not be very fast. Example:
df[df['days'].apply(lambda x: 4 in x)]
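The results described in the question can be explained by how `in` works on a Series: it tests the index labels, not the values. Since 4 is an index label here, `4 in df['days']` is a single `True`, which is then broadcast to every row. Python also reads `4 in df['days'] == True` as a chained comparison, `(4 in df['days']) and (df['days'] == True)`, which would explain the empty result. The sketch below rebuilds a frame shaped like the one in the question (the construction with `pd.date_range` is an assumption for illustration, not part of the original post) and contrasts the label check with the element-wise check.

```python
import pandas as pd

# Rebuild a frame like the one in the question (construction is illustrative).
starts = pd.date_range("2015-07-01", periods=13, freq="7D")
ends = starts + pd.Timedelta(days=6)
days = [tuple(d.day for d in pd.date_range(s, e)) for s, e in zip(starts, ends)]
df = pd.DataFrame({"start": starts, "end": ends, "days": days})

# `in` on a Series looks at the index labels (0..12), not the tuples.
label_hit = 4 in df["days"]     # 4 is an index label
label_miss = 100 in df["days"]  # 100 is not an index label

# Element-wise membership test: one boolean per row.
mask = df["days"].apply(lambda t: 4 in t)
filtered = df[mask]
print(filtered)
```

The boolean mask can also be stored as a column, for example `df['daysTF'] = mask`, which gives the per-row True/False values the question was after.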
Demo:
In [139]: df
Out[139]:
start end days
0 2015-07-01 2015-07-07 (1, 2, 3, 4, 5, 6, 7)
1 2015-07-08 2015-07-14 (8, 9, 10, 11, 12, 13, 14)
2 2015-07-15 2015-07-21 (15, 16, 17, 18, 19, 20, 21)
3 2015-07-22 2015-07-28 (22, 23, 24, 25, 26, 27, 28)
4 2015-07-29 2015-08-04 (29, 30, 31, 1, 2, 3, 4)
5 2015-08-05 2015-08-11 (5, 6, 7, 8, 9, 10, 11)
6 2015-08-12 2015-08-18 (12, 13, 14, 15, 16, 17, 18)
7 2015-08-19 2015-08-25 (19, 20, 21, 22, 23, 24, 25)
8 2015-08-26 2015-09-01 (26, 27, 28, 29, 30, 31, 1)
9 2015-09-02 2015-09-08 (2, 3, 4, 5, 6, 7, 8)
10 2015-09-09 2015-09-15 (9, 10, 11, 12, 13, 14, 15)
11 2015-09-16 2015-09-22 (16, 17, 18, 19, 20, 21, 22)
12 2015-09-23 2015-09-29 (23, 24, 25, 26, 27, 28, 29)
In [141]: df['days'][0]
Out[141]: (1, 2, 3, 4, 5, 6, 7)
In [142]: type(df['days'][0])
Out[142]: tuple
In [143]: df[df['days'].apply(lambda x: 4 in x)]
Out[143]:
start end days
0 2015-07-01 2015-07-07 (1, 2, 3, 4, 5, 6, 7)
4 2015-07-29 2015-08-04 (29, 30, 31, 1, 2, 3, 4)
9 2015-09-02 2015-09-08 (2, 3, 4, 5, 6, 7, 8)
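Because the answer notes that apply may not be very fast, here are two alternative ways to build the same mask, offered as a sketch rather than a benchmarked recommendation. The first uses a plain list comprehension; the second uses `Series.explode` (available in pandas 0.25 and later) followed by a group-wise `any`. The frame construction is again an illustrative assumption, and no timing claims are made here.

```python
import pandas as pd

# Illustrative reconstruction of the question's frame.
starts = pd.date_range("2015-07-01", periods=13, freq="7D")
ends = starts + pd.Timedelta(days=6)
days = [tuple(d.day for d in pd.date_range(s, e)) for s, e in zip(starts, ends)]
df = pd.DataFrame({"start": starts, "end": ends, "days": days})

# Option 1: list comprehension producing one boolean per row.
mask_list = [4 in t for t in df["days"]]
result_list = df[mask_list]

# Option 2: explode each tuple into one row per element, compare,
# then collapse back to one boolean per original index label.
mask_explode = df["days"].explode().eq(4).groupby(level=0).any()
result_explode = df[mask_explode]

print(result_list.index.tolist())
print(result_explode.index.tolist())
```

Both options select the same rows as the apply version; which one is faster depends on the data size and pandas version, so it is worth timing them on the real data.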
Source: "python - Operating on tuples held in a Pandas DataFrame column", a question on Stack Overflow: https://stackoverflow.com/questions/32872849/