python - Pandas - Check whether two dates match under given conditions (two other variables) and perform an action

Tags: python pandas dataframe data-science

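In pandas, I'm having trouble checking values and performing several operations based on four variables (reception_date, end_date, status, id). The problem is shown in the table below: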

      id         user_email reception_date   end_date    status
0  42872  [email protected]      3/30/2022  3/30/2022  Accepted
1  42872  [email protected]       3/1/2022   3/4/2022  Returned
2  42872  [email protected]       3/7/2022  3/30/2022  In Study
3  99999  [email protected]       3/6/2022  3/28/2022  In Study
4  42872  [email protected]      3/23/2022  3/25/2022  In Study
5  99999  [email protected]      3/28/2022   4/5/2022  Accepted
6  78787  [email protected]      3/15/2022  3/16/2022  In Study

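The first requirement is to work within the same id (only a few rows are shown here, but the database has more than 50,000 records): check whether the status column contains "Accepted" for that id. If it does, check whether the end_date of an "In Study" row equals the reception_date of the "Accepted" row. If that condition holds, change the status from "In Study" to "Accepted". The expected output is: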

      id         user_email reception_date   end_date    status
0  42872  [email protected]      3/30/2022  3/30/2022  Accepted
1  42872  [email protected]       3/1/2022   3/4/2022  Returned
2  42872  [email protected]       3/7/2022  3/30/2022  Accepted
3  99999  [email protected]       3/6/2022  3/28/2022  Accepted
4  42872  [email protected]      3/23/2022  3/25/2022  In Study
5  99999  [email protected]      3/28/2022   4/5/2022  Accepted
6  78787  [email protected]      3/15/2022  3/16/2022  In Study

Since I'm fairly new to pandas, I tried several approaches. My latest attempt was:

Test=Test.merge(Test.loc[Test.status== 'Accepted'], how='left', left_on=['id'], right_on=['id'], suffixes=("", "_y"))\
.assign(status=lambda x:np.where((x.end_date_y==x.reception_date) & (x.status== 'In Study'), 'Accepted',x.status))

But the result is not the expected output. I hope you can help; this is driving me crazy.

Best answer

You can use:

# which rows are Accepted?
m1 = df['status'].eq('Accepted')

# which rows are In Study?
m2 = df['status'].eq('In Study')

# get In Study indices that also have an Accepted
# on the same date
to_change = (df[m2]
 .reset_index()
 .merge(df[m1],
        left_on=['id', 'end_date'],
        right_on=['id', 'reception_date'])
 ['index']
)
# [2, 3]

# update in place
df.loc[to_change, 'status'] = 'Accepted'
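
To try the snippet above end to end, here is a minimal self-contained sketch. The DataFrame is reconstructed from the question's table; the user_email column is left out because the rule does not use it, and the dates are assumed to be plain strings in one consistent format, which is why an exact-match merge on them works.

```python
import pandas as pd

# Sample data reconstructed from the question's table (user_email omitted;
# dates assumed to be strings in a single consistent format).
df = pd.DataFrame({
    "id": [42872, 42872, 42872, 99999, 42872, 99999, 78787],
    "reception_date": ["3/30/2022", "3/1/2022", "3/7/2022", "3/6/2022",
                       "3/23/2022", "3/28/2022", "3/15/2022"],
    "end_date": ["3/30/2022", "3/4/2022", "3/30/2022", "3/28/2022",
                 "3/25/2022", "4/5/2022", "3/16/2022"],
    "status": ["Accepted", "Returned", "In Study", "In Study",
               "In Study", "Accepted", "In Study"],
})

m1 = df["status"].eq("Accepted")   # rows that are Accepted
m2 = df["status"].eq("In Study")   # rows that are In Study

# Inner merge: an In Study row survives only if some Accepted row has the
# same id and a reception_date equal to the In Study row's end_date.
to_change = (
    df[m2]
    .reset_index()                 # keep original row labels in an 'index' column
    .merge(df[m1],
           left_on=["id", "end_date"],
           right_on=["id", "reception_date"])
    ["index"]
)

df.loc[to_change, "status"] = "Accepted"
print(df)
```

If the real data mixes formats (for example 3/30/2022 and 03/30/2022), the string comparison would miss matches; parsing both date columns with pd.to_datetime before the merge is one way to guard against that.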

Output:

      id         user_email reception_date   end_date    status
0  42872  [email protected]      3/30/2022  3/30/2022  Accepted
1  42872  [email protected]       3/1/2022   3/4/2022  Returned
2  42872  [email protected]       3/7/2022  3/30/2022  Accepted
3  99999  [email protected]       3/6/2022  3/28/2022  Accepted
4  42872  [email protected]      3/23/2022  3/25/2022  In Study
5  99999  [email protected]      3/28/2022   4/5/2022  Accepted
6  78787  [email protected]      3/15/2022  3/16/2022  In Study
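
As for the attempt in the question, it appears to change nothing on this sample because its condition compares the Accepted row's end_date (end_date_y) with the current row's reception_date, which is the reverse of the stated rule. A minimal sketch of the same one-liner with the two columns swapped is below; the sample frame is reconstructed from the question's table with user_email omitted, and the name `fixed` is only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question's table (user_email omitted).
Test = pd.DataFrame({
    "id": [42872, 42872, 42872, 99999, 42872, 99999, 78787],
    "reception_date": ["3/30/2022", "3/1/2022", "3/7/2022", "3/6/2022",
                       "3/23/2022", "3/28/2022", "3/15/2022"],
    "end_date": ["3/30/2022", "3/4/2022", "3/30/2022", "3/28/2022",
                 "3/25/2022", "4/5/2022", "3/16/2022"],
    "status": ["Accepted", "Returned", "In Study", "In Study",
               "In Study", "Accepted", "In Study"],
})

accepted = Test.loc[Test.status == "Accepted"]

fixed = (
    Test.merge(accepted, how="left", on="id", suffixes=("", "_y"))
        .assign(status=lambda x: np.where(
            # Compare the Accepted row's reception_date with this row's end_date
            (x.reception_date_y == x.end_date) & (x.status == "In Study"),
            "Accepted",
            x.status))
        [Test.columns]             # drop the helper *_y columns
)
print(fixed)
```

One caveat with this variant: the left merge joins on id alone, so an id with more than one Accepted row would duplicate that id's rows in the result. Merging on both id and the date, as in the answer above, avoids that.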

Regarding python - Pandas - Check whether two dates match under given conditions (two other variables) and perform an action, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/74230740/
