python - Populate a new column based on previous values in other columns

Tags: python pandas

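I have a dataset with users, visit_types (booking or search), and hotels. For each booking row, I need to fill a new column with the hotel that the same user has booked most often in the rows before it.

For example,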

   **user**   **visit_type**   **hotel_code**   **most_booked**
1    user1       search             1                NaN
2    user1       search             2                NaN
3    user1       booking            1                NaN
4    user1       search             8                NaN
5    user1       booking            8                1
6    user2       search             6                NaN
7    user2       booking            6                NaN
8    user2       search             4                NaN
9    user2       booking            4                6
10   user2       booking            6                4
11   user2       booking            4                6

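In this example:

For user1, the most booked hotel at row 3 is NaN, because the user had not booked any hotel before that row, and at row 5 it is hotel 1.

For user2, row 7 is NaN and row 9 is hotel 6. Row 10 is hotel 4: at that point hotels 6 and 4 have each been booked once, and 4 was booked most recently. For the last row, 11, the result is hotel 6, because it is the most booked hotel so far (not counting row 11 itself).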
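The rule described above can be sketched on a plain list of one user's booked hotel codes, in order. This is only an illustration of the stated rule, not code from the question: the function name `most_booked_before` is made up here, and the behavior when several hotels tie for the top count and none of them is the most recent booking (the first one seen wins) is an assumption, since the question does not cover that case.

```python
from collections import Counter


def most_booked_before(hotels):
    """For each booking, return the hotel booked most often before it.

    Ties go to the most recently booked hotel; the first booking gets None.
    """
    counts = Counter()   # bookings seen so far, per hotel
    last = None          # hotel of the previous booking
    result = []
    for hotel in hotels:
        if not counts:
            # No earlier booking exists for this user.
            result.append(None)
        else:
            top = max(counts.values())
            if counts[last] == top:
                # The most recent booking shares the top count, so it wins.
                result.append(last)
            else:
                # Assumption: otherwise take the first hotel with the top count.
                result.append(max(counts, key=counts.get))
        # Only now count the current booking, so it never affects its own row.
        counts[hotel] += 1
        last = hotel
    return result


print(most_booked_before([1, 8]))        # user1 bookings -> [None, 1]
print(most_booked_before([6, 4, 6, 4]))  # user2 bookings -> [None, 6, 4, 6]
```

The two calls reproduce the booking rows of the example table: rows 3 and 5 for user1, and rows 7, 9, 10, and 11 for user2.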

Best answer

This should do what you want:

import pandas as pd
import operator
from collections import defaultdict

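d = {
    "user": ["user1", "user1", "user1", "user1", "user1", "user2", "user2", "user2", "user2", "user2", "user2"],
    "visit_type": ["search", "search", "booking", "search", "booking", "search", "booking", "search", "booking", "booking", "booking"],
    "hotel_code": [1, 2, 1, 8, 8, 6, 6, 4, 4, 6, 4],
}

df = pd.DataFrame(data=d)
# Default to a real missing value (the string 'NaN' would not be detected by isna());
# object dtype keeps the hotel codes assigned below as integers
df['most_booked'] = pd.Series(float('nan'), index=df.index, dtype=object)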

for user in df.user.unique():
    #Ignoring searches, only considering bookings
    df_bookings = df.loc[(df["visit_type"] == "booking") & (df['user'] == user)]
    last_booked = ""
    booking_counts = defaultdict(int)

    for i, entry in df_bookings.iterrows():
        #Skipping first booking
        if last_booked != "":
            highest = max(booking_counts.values())
            #Prefers last booked if it equals max
            if booking_counts[last_booked] == highest:
                max_booked = last_booked
            #Otherwise chooses max
            else:
                max_booked = max(booking_counts.items(), key=operator.itemgetter(1))[0]
            df.loc[i, 'most_booked'] = max_booked

        #Update number of bookings in dictionary
        current_booking = entry["hotel_code"]
        booking_counts[current_booking] += 1
        last_booked = current_booking

print(df)

    hotel_code   user visit_type most_booked
0            1  user1     search         NaN
1            2  user1     search         NaN
2            1  user1    booking         NaN
3            8  user1     search         NaN
4            8  user1    booking           1
5            6  user2     search         NaN
6            6  user2    booking         NaN
7            4  user2     search         NaN
8            4  user2    booking           6
9            6  user2    booking           4
10           4  user2    booking           6
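As a possible variation on the loop above, the per-user logic can be moved into a helper and applied to each user's bookings via `groupby`, storing the result in pandas' nullable `Int64` dtype so that missing entries are real `<NA>` values rather than strings. This is a sketch under assumptions, not part of the original answer: the helper name `running_most_booked` is hypothetical, and it keeps the same tie rule (the most recent booking wins a tie, otherwise the first hotel with the top count).

```python
import pandas as pd

df = pd.DataFrame({
    "user": ["user1"] * 5 + ["user2"] * 6,
    "visit_type": ["search", "search", "booking", "search", "booking",
                   "search", "booking", "search", "booking", "booking", "booking"],
    "hotel_code": [1, 2, 1, 8, 8, 6, 6, 4, 4, 6, 4],
})


def running_most_booked(hotels):
    """Hypothetical helper: most booked hotel before each booking of one user."""
    counts, last, out = {}, None, []
    for hotel in hotels:
        if not counts:
            out.append(pd.NA)                        # no earlier booking
        elif counts[last] == max(counts.values()):
            out.append(last)                         # most recent booking wins ties
        else:
            out.append(max(counts, key=counts.get))  # otherwise the top count
        counts[hotel] = counts.get(hotel, 0) + 1
        last = hotel
    # Keep the original row labels so the result aligns with df.
    return pd.Series(out, index=hotels.index, dtype="Int64")


# Searches are ignored; only booking rows are passed to the helper.
bookings = df[df["visit_type"] == "booking"]
parts = [running_most_booked(group)
         for _, group in bookings.groupby("user")["hotel_code"]]

# Assignment aligns on the index, so search rows are left as <NA>.
df["most_booked"] = pd.concat(parts)
print(df)
```

With this data, `most_booked` is `<NA>` everywhere except at index 4, 8, 9, and 10, where it is 1, 6, 4, and 6. Because the column holds real missing values, `df["most_booked"].isna()` works as expected.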

Regarding "python - Populate a new column based on previous values in other columns", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/50795135/
