python - Operating on tuples held in a Pandas DataFrame column

Tags: python pandas

I have the following DataFrame:

   start      end         days
0  2015-07-01 2015-07-07         (1, 2, 3, 4, 5, 6, 7)
1  2015-07-08 2015-07-14    (8, 9, 10, 11, 12, 13, 14)
2  2015-07-15 2015-07-21  (15, 16, 17, 18, 19, 20, 21)
3  2015-07-22 2015-07-28  (22, 23, 24, 25, 26, 27, 28)
4  2015-07-29 2015-08-04      (29, 30, 31, 1, 2, 3, 4)
5  2015-08-05 2015-08-11       (5, 6, 7, 8, 9, 10, 11)
6  2015-08-12 2015-08-18  (12, 13, 14, 15, 16, 17, 18)
7  2015-08-19 2015-08-25  (19, 20, 21, 22, 23, 24, 25)
8  2015-08-26 2015-09-01   (26, 27, 28, 29, 30, 31, 1)
9  2015-09-02 2015-09-08         (2, 3, 4, 5, 6, 7, 8)
10 2015-09-09 2015-09-15   (9, 10, 11, 12, 13, 14, 15)
11 2015-09-16 2015-09-22  (16, 17, 18, 19, 20, 21, 22)
12 2015-09-23 2015-09-29  (23, 24, 25, 26, 27, 28, 29)

I want to work with the days column, which holds tuples, but basic filtering with the usual Pandas syntax does not seem to work:

df[4 in df['days'] == True]

I expected this to filter the DataFrame and return the following rows, that is, those whose tuple contains 4:

       start      end             days
    0  2015-07-01 2015-07-07         (1, 2, 3, 4, 5, 6, 7)
    4  2015-07-29 2015-08-04      (29, 30, 31, 1, 2, 3, 4)
    9  2015-09-02 2015-09-08         (2, 3, 4, 5, 6, 7, 8)

Instead, it returns an empty DataFrame.

I also tried creating a new column to hold the True/False result of the membership check, like this:

df['daysTF'] = 4 in df['days']

This returns the DataFrame with the "daysTF" column set to True for every row, rather than True only where the tuple contains 4.

Best answer

One way to do this is to use the Series.apply method, although it may not be very fast. Example:

df[df['days'].apply(lambda x: 4 in x)]
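For anyone who wants to try this without the full data, here is a minimal self-contained sketch. The three-row DataFrame is an illustrative subset of the rows in the question, and the names mask, alt_mask and filtered are assumptions introduced here, not part of the original post. It also fills the daysTF column the question asked about, and shows a plain list comprehension as one possible alternative to apply (whether it is faster on your data is something to measure, not something claimed here).

```python
import pandas as pd

# Illustrative subset of the question's data (rows 0, 1 and 4 of the original).
df = pd.DataFrame({
    "start": ["2015-07-01", "2015-07-08", "2015-07-29"],
    "end": ["2015-07-07", "2015-07-14", "2015-08-04"],
    "days": [
        (1, 2, 3, 4, 5, 6, 7),
        (8, 9, 10, 11, 12, 13, 14),
        (29, 30, 31, 1, 2, 3, 4),
    ],
})

# Test each tuple individually; the result is a boolean Series aligned with df.
mask = df["days"].apply(lambda x: 4 in x)

# The True/False column the question tried to create.
df["daysTF"] = mask

# Boolean indexing keeps only the rows where the mask is True.
filtered = df[mask]

# Alternative (an assumption, not from the answer): build the mask with a
# list comprehension instead of apply.
alt_mask = [4 in x for x in df["days"]]
```

Here filtered keeps the first and third rows of the subset, since only their tuples contain 4.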

Demo:

In [139]: df
Out[139]:
         start         end                          days
0   2015-07-01  2015-07-07         (1, 2, 3, 4, 5, 6, 7)
1   2015-07-08  2015-07-14    (8, 9, 10, 11, 12, 13, 14)
2   2015-07-15  2015-07-21  (15, 16, 17, 18, 19, 20, 21)
3   2015-07-22  2015-07-28  (22, 23, 24, 25, 26, 27, 28)
4   2015-07-29  2015-08-04      (29, 30, 31, 1, 2, 3, 4)
5   2015-08-05  2015-08-11       (5, 6, 7, 8, 9, 10, 11)
6   2015-08-12  2015-08-18  (12, 13, 14, 15, 16, 17, 18)
7   2015-08-19  2015-08-25  (19, 20, 21, 22, 23, 24, 25)
8   2015-08-26  2015-09-01   (26, 27, 28, 29, 30, 31, 1)
9   2015-09-02  2015-09-08         (2, 3, 4, 5, 6, 7, 8)
10  2015-09-09  2015-09-15   (9, 10, 11, 12, 13, 14, 15)
11  2015-09-16  2015-09-22  (16, 17, 18, 19, 20, 21, 22)
12  2015-09-23  2015-09-29  (23, 24, 25, 26, 27, 28, 29)

In [141]: df['days'][0]
Out[141]: (1, 2, 3, 4, 5, 6, 7)

In [142]: type(df['days'][0])
Out[142]: tuple

In [143]: df[df['days'].apply(lambda x: 4 in x)]
Out[143]:
        start         end                      days
0  2015-07-01  2015-07-07     (1, 2, 3, 4, 5, 6, 7)
4  2015-07-29  2015-08-04  (29, 30, 31, 1, 2, 3, 4)
9  2015-09-02  2015-09-08     (2, 3, 4, 5, 6, 7, 8)
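
As for why the attempts in the question behave the way they do: the in operator applied to a Series tests the index labels, not the stored values. Because the DataFrame has a row labelled 4, 4 in df['days'] is a single True, which is then broadcast to every row of daysTF. In addition, Python parses 4 in df['days'] == True as a chained comparison, (4 in df['days']) and (df['days'] == True), which would explain the empty result in the first attempt. The sketch below illustrates the label lookup with a small hypothetical Series (the data and variable names are assumptions for illustration).

```python
import pandas as pd

# Hypothetical five-element Series with the default index labels 0..4.
s = pd.Series([(1, 2, 3), (8, 9), (29, 4), (5, 6), (7, 8)], name="days")

# `in` on a Series checks the index labels, not the values.
label_hit = 4 in s    # 4 is one of the index labels
label_miss = 9 in s   # 9 is not a label, even though one tuple contains 9

# Checking the contents of each tuple has to happen element by element.
mask = s.apply(lambda t: 4 in t)
```

With this data, label_hit is True and label_miss is False, while mask is True only for the third element, whose tuple actually contains 4.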

For more on python - Operating on tuples held in a Pandas DataFrame column, see the similar question on Stack Overflow: https://stackoverflow.com/questions/32872849/
