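As a newcomer to pandas, I'm having trouble checking values and performing several operations based on four variables (reception_date, end_date, status, id). The problem is illustrated by the following table: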
   id     user_email         reception_date  end_date   status
0  42872  [email protected]  3/30/2022       3/30/2022  Accepted
1  42872  [email protected]  3/1/2022        3/4/2022   Returned
2  42872  [email protected]  3/7/2022        3/30/2022  In Study
3  99999  [email protected]  3/6/2022        3/28/2022  In Study
4  42872  [email protected]  3/23/2022       3/25/2022  In Study
5  99999  [email protected]  3/28/2022       4/5/2022   Accepted
6  78787  [email protected]  3/15/2022       3/16/2022  In Study
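The first requirement is to work within each id (only a few rows are shown in this example, but the database has more than 50,000 records): check whether the status column contains "Accepted" for that id; if it does, check whether the end_date of an "In Study" row equals the reception_date of an "Accepted" row; if that condition holds, change the status from "In Study" to "Accepted". The expected output is: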
   id     user_email         reception_date  end_date   status
0  42872  [email protected]  3/30/2022       3/30/2022  Accepted
1  42872  [email protected]  3/1/2022        3/4/2022   Returned
2  42872  [email protected]  3/7/2022        3/30/2022  Accepted
3  99999  [email protected]  3/6/2022        3/28/2022  Accepted
4  42872  [email protected]  3/23/2022       3/25/2022  In Study
5  99999  [email protected]  3/28/2022       4/5/2022   Accepted
6  78787  [email protected]  3/15/2022       3/16/2022  In Study
Since I'm fairly new to pandas, I have tried several approaches. My most recent attempt was:
Test = Test.merge(Test.loc[Test.status == 'Accepted'], how='left', left_on=['id'], right_on=['id'], suffixes=("", "_y"))\
    .assign(status=lambda x: np.where((x.end_date_y == x.reception_date) & (x.status == 'In Study'), 'Accepted', x.status))
But the result is not the expected output. I hope you can help me; this is driving me crazy.
Best answer
You can use:
# which rows are Accepted?
m1 = df['status'].eq('Accepted')
# which rows are In Study?
m2 = df['status'].eq('In Study')
# get In Study indices that also have an Accepted
# on the same date
to_change = (df[m2]
             .reset_index()
             .merge(df[m1],
                    left_on=['id', 'end_date'],
                    right_on=['id', 'reception_date'])
             ['index']
            )
# [2, 3]
# update in place
df.loc[to_change, 'status'] = 'Accepted'
Output:
   id     user_email         reception_date  end_date   status
0  42872  [email protected]  3/30/2022       3/30/2022  Accepted
1  42872  [email protected]  3/1/2022        3/4/2022   Returned
2  42872  [email protected]  3/7/2022        3/30/2022  Accepted
3  99999  [email protected]  3/6/2022        3/28/2022  Accepted
4  42872  [email protected]  3/23/2022       3/25/2022  In Study
5  99999  [email protected]  3/28/2022       4/5/2022   Accepted
6  78787  [email protected]  3/15/2022       3/16/2022  In Study
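For anyone who wants to reproduce this, here is a minimal self-contained sketch of the same merge-based approach on the sample data from the question. The user_email column is left out because it plays no role in the logic, and parsing the two date columns with pd.to_datetime is my own assumption (the answer above compares the raw values as they are), added so that the same day compares equal even if it is formatted differently.

```python
import pandas as pd

# Sample data copied from the question (user_email omitted).
df = pd.DataFrame({
    "id": [42872, 42872, 42872, 99999, 42872, 99999, 78787],
    "reception_date": ["3/30/2022", "3/1/2022", "3/7/2022", "3/6/2022",
                       "3/23/2022", "3/28/2022", "3/15/2022"],
    "end_date": ["3/30/2022", "3/4/2022", "3/30/2022", "3/28/2022",
                 "3/25/2022", "4/5/2022", "3/16/2022"],
    "status": ["Accepted", "Returned", "In Study", "In Study",
               "In Study", "Accepted", "In Study"],
})

# Assumption (not in the answer above): parse the dates so that equal days
# compare equal regardless of formatting, e.g. "3/7/2022" vs "03/07/2022".
for col in ["reception_date", "end_date"]:
    df[col] = pd.to_datetime(df[col], format="%m/%d/%Y")

accepted = df["status"].eq("Accepted")
in_study = df["status"].eq("In Study")

# reset_index() keeps the original row labels in an "index" column.
# The inner merge keeps only In Study rows whose (id, end_date) matches
# the (id, reception_date) of some Accepted row.
to_change = (
    df[in_study]
    .reset_index()
    .merge(df.loc[accepted, ["id", "reception_date"]],
           left_on=["id", "end_date"],
           right_on=["id", "reception_date"])
    ["index"]
)

# Update the matching rows in place.
df.loc[to_change, "status"] = "Accepted"
print(df)
```

As a side note, the attempt in the question appears to fail because it compares end_date_y (the end_date of the Accepted row) with the reception_date of the current row, which is the reverse of the stated condition. Joining on both id and the date, as above, also avoids the row duplication that a left merge on id alone produces when an id has several Accepted rows.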
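Regarding "python - Pandas: check whether two dates are the same under a given condition (two other variables) and perform an operation", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/74230740/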