I have a pandas DataFrame that looks like this:
hotel_id  date        length_of_stay  clicks
A         2019-01-01  3               7
B         2019-01-06  2               11
C         2019-01-03  1               4
I would like the result to be:
hotel_id  date        clicks
A         2019-01-01  7
A         2019-01-02  7
A         2019-01-03  7
B         2019-01-06  11
B         2019-01-07  11
C         2019-01-03  4
That way we can see how many clicks we got for each night that someone stays at the hotel.
I can't think of an elegant way to do this. Can anyone help?
Best answer
Use numpy.repeat():
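# Repeat each row length_of_stay times.
m = pd.DataFrame(np.repeat(df.values, df.length_of_stay, axis=0), columns=df.columns)
# Within each hotel, replace the repeated start date with consecutive days.
m['date'] = m.groupby('hotel_id')['date'].transform(lambda x: pd.date_range(start=x.iloc[0], periods=len(x)))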
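To see what the np.repeat call in the snippet above does on its own, here is a minimal sketch on a toy array. The names rows and counts are made up for illustration; they stand in for df.values and df.length_of_stay.

```python
import numpy as np

# Toy 2-D array standing in for df.values (illustrative values only).
rows = np.array([["A", 7], ["B", 11], ["C", 4]], dtype=object)
counts = [3, 2, 1]  # plays the role of df.length_of_stay

# With axis=0, row i is repeated counts[i] times, in the original order.
repeated = np.repeat(rows, counts, axis=0)
print(repeated[:, 0].tolist())
```

The repeated array has one row per night, which is why the date column then only needs to be rewritten.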
Or:
newdf = pd.DataFrame(np.repeat(df.values, df.length_of_stay, axis=0), columns=df.columns)
# Expand each (start date, stay length) pair into consecutive days.
newdf['date'] = [i for day, n in zip(df.date, df.length_of_stay)
                 for i in pd.date_range(start=day, periods=n)]
Full example:
import io

import numpy as np
import pandas as pd

data = '''\
hotel_id date length_of_stay clicks
A 2019-01-01 3 7
B 2019-01-06 2 11
C 2019-01-03 1 4'''

# pd.compat.StringIO is not available in recent pandas versions; io.StringIO does the same job.
df = pd.read_csv(io.StringIO(data), parse_dates=['date'], sep=r'\s+')

# Repeat each row length_of_stay times, then rebuild the dates per hotel.
m = pd.DataFrame(np.repeat(df.values, df.length_of_stay, axis=0), columns=df.columns)
m['date'] = m.groupby('hotel_id')['date'].transform(
    lambda x: pd.date_range(start=x.iloc[0], periods=len(x)))
print(m)
  hotel_id       date length_of_stay clicks
0        A 2019-01-01              3      7
1        A 2019-01-02              3      7
2        A 2019-01-03              3      7
3        B 2019-01-06              2     11
4        B 2019-01-07              2     11
5        C 2019-01-03              1      4
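The output above still carries length_of_stay, and because df.values on mixed-type columns is an object array, the rebuilt frame starts out with object columns. As a possible alternative (not the approach from the answer, just a sketch under the same sample data), the rows can be repeated through the index so the column dtypes are kept, and the helper column dropped at the end. The names out and night are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "hotel_id": ["A", "B", "C"],
    "date": pd.to_datetime(["2019-01-01", "2019-01-06", "2019-01-03"]),
    "length_of_stay": [3, 2, 1],
    "clicks": [7, 11, 4],
})

# Repeat each row label length_of_stay times; .loc keeps the column dtypes.
out = df.loc[df.index.repeat(df["length_of_stay"])].copy()

# Night number within each original row: 0, 1, 2, ...
night = out.groupby(level=0).cumcount()

# Shift the start date by the night number, then drop the helper column.
out["date"] = out["date"] + pd.to_timedelta(night, unit="D")
out = out.drop(columns="length_of_stay").reset_index(drop=True)
print(out)
```

Grouping by the original index rather than by hotel_id also keeps two separate stays at the same hotel apart, assuming each input row is one stay.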
Regarding python - duplicating DataFrame rows based on a condition, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/54790780/