python - Compare a column in a dataframe with many items in a single list and pick out the common elements

I'm relatively new to Python and am really struggling with lists.

I have a dataframe (df1) with a column "cast" whose strings each contain several actors, and a separate dataframe (df2) that lists the actors who have received awards.

I want to add a column to df1 that indicates whether an actor in the cast has received an award, e.g. 1 = award, 0 = no award.

I'm trying to use a for loop, but it isn't iterating the way I want.

In my example, only "Betty" has received an award, so the "actors_with_awards" column should show 0 in the first row and 1 in the second, but instead both rows show 1.

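I suspect this is because it is looking at the whole string, e.g. "Alexander, Ann" in the list versus "Alexander" or "Ann". I thought splitting the string would solve this (maybe I did that step wrong?), so I don't know how to fix it.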
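A quick check of what split(',') returns for the second cast string shows the space that survives after each comma. This is a small sketch using the sample string from the question; strip() is the standard str method for removing surrounding whitespace.

```python
# Splitting on ',' alone keeps the space that follows each comma,
# so ' Betty' (with a leading space) is not equal to 'Betty'.
cast = 'Bob, Bill, Benedict, Betty'

raw_names = cast.split(',')
print(raw_names)              # ['Bob', ' Bill', ' Benedict', ' Betty']
print('Betty' in raw_names)   # False

# Stripping each piece removes the surrounding whitespace.
clean_names = [name.strip() for name in raw_names]
print(clean_names)            # ['Bob', 'Bill', 'Benedict', 'Betty']
print('Betty' in clean_names) # True
```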

My full code is as follows:

import pandas as pd

# Creating sample dataframes
df1 = pd.DataFrame()
df1['cast']=['Alexander, Ann','Bob, Bill, Benedict, Betty']

df2 = pd.DataFrame()
df2['awards']=['Betty']

# Creating lists of actors, and Splitting up the string  
actor_split=[]
for x in df1['cast']:
    actor_split.append(x.split(','))
    
# Creating a list of actors who have received an award
award=[]
for x in df2['awards']:
    award.append(x)

# Attempting to create a list of actors in Df1 who have received an award
actors_with_awards = []

for item in actor_split:
    if x in item not in award:
        actors_with_awards.append(0)
    else:
        actors_with_awards.append(1)  
        
df1['actors_with_awards']=actors_with_awards
df1

Current output df1

                         cast  actors_with_awards
0              Alexander, Ann                   1
1  Bob, Bill, Benedict, Betty                   1

Expected output df1

                         cast  actors_with_awards
0              Alexander, Ann                   0
1  Bob, Bill, Benedict, Betty                   1

Best answer

A couple of things came up when trying your program. The first is the comparison of "x" to see whether it is contained in the award list.

for item in actor_split:
    if x in item not in award:
        actors_with_awards.append(0)
    else:
        actors_with_awards.append(1)

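The problem here is that x still holds the value "Betty" left over from the loop that populated the award list. It is not an actor from each split cast list. In addition, "x in item not in award" is a chained comparison, which Python evaluates as "(x in item) and (item not in award)"; since "Betty" is never an element of either split list, the condition is always false and every row falls into the else branch, giving 1. The other problem is that, when checking whether a name exists in the award list, leading and/or trailing spaces on the actor names cause the comparison to fail.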
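A short sketch reproducing the question's loop with the sample data shows why every row ends up as 1. The variable names mirror the question's code; the split lists are written out by hand as they come from split(',').

```python
# Split lists as produced by x.split(',') in the question (note the leading spaces)
actor_split = [['Alexander', ' Ann'], ['Bob', ' Bill', ' Benedict', ' Betty']]
award = ['Betty']
x = 'Betty'  # stale value left over from the loop over df2['awards']

results = []
for item in actor_split:
    chained = x in item not in award
    # Python expands the chained comparison into two comparisons joined by 'and'
    expanded = (x in item) and (item not in award)
    assert chained == expanded
    # Mirrors the if/else in the question: a false condition appends 1
    results.append(0 if chained else 1)

print(results)  # [1, 1]
```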

With that in mind, I made a few adjustments to your code to address these issues, as shown in the following snippet.

import pandas as pd

# Creating sample dataframes
df1 = pd.DataFrame()
df1['cast']=['Alexander, Ann','Bob, Bill, Benedict, Betty']

df2 = pd.DataFrame()
df2['awards']=['Betty']

# Creating lists of actors, and Splitting up the string  
actor_split=[]
for x in df1['cast']:
    actor_split.append(x.split(','))
    
# Creating a list of actors who have received an award
award=[]
for x in df2['awards']:
    award.append(x.strip()) # Make sure no leading or trailing spaces exist for subsequent test

# Attempting to create a list of actors in Df1 who have received an award
actors_with_awards = []

for item in actor_split:
    y = 0
    for x in item:                    # Reworked this so that "x" is associated with the selected actor set
        if x.strip() in award:        # Again, make sure no leading or trailing spaces are in the comparison
            y = 1                     # Found an award winner; stop so a later name cannot reset y to 0
            break
    actors_with_awards.append(y)  
        
df1['actors_with_awards']=actors_with_awards
print(df1)  # Changed this so as to print out the data to a terminal

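To make sure leading or trailing spaces don't cause the comparison or list check to fail, I added the ".strip()" function wherever a name value is stored or compared. Second, to get the correct name into the variable "x", I added an inner for loop along with a work variable that receives the correct "0" or "1" value. With these adjustments, the program prints the following output to the terminal.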

                         cast  actors_with_awards
0              Alexander, Ann                   0
1  Bob, Bill, Benedict, Betty                   1
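The inner loop and the work variable can also be collapsed with Python's built-in any(), which stops at the first stripped name found in the award set. This is a sketch under the assumption that the names have already been split as above; the third row is a made-up example in which the winner is not the last name, to show that the result does not depend on name order.

```python
# Names already split as in the code above; the third row is hypothetical.
actor_split = [['Alexander', ' Ann'], ['Bob', ' Bill', ' Benedict', ' Betty'], ['Betty', ' Bob']]
award = {'Betty'}  # a set gives fast membership checks

# any() returns True as soon as one stripped name is an award winner
actors_with_awards = [
    int(any(name.strip() in award for name in item))
    for item in actor_split
]
print(actors_with_awards)  # [0, 1, 1]
```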

You might want to give it a try. Note that this is probably just one way to solve the problem.
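As an alternative that stays inside pandas, the same flag can be computed without explicit Python loops. This is a sketch of a different technique from the answer's: it assumes pandas 0.25 or later (for Series.explode) and that names in "cast" are comma-separated.

```python
import pandas as pd

# Sample data from the question
df1 = pd.DataFrame({'cast': ['Alexander, Ann', 'Bob, Bill, Benedict, Betty']})
df2 = pd.DataFrame({'awards': ['Betty']})

# One row per actor; explode() keeps the original row label on every piece
actors = df1['cast'].str.split(',').explode().str.strip()

# True where that actor appears in the awards list
is_winner = actors.isin(df2['awards'].str.strip())

# Collapse back to one value per original row: 1 if any actor in it won
df1['actors_with_awards'] = is_winner.groupby(level=0).any().astype(int)

print(df1)
```

Because explode() repeats the original index label for each actor, groupby(level=0) regroups the actors by their source row, and the assignment aligns back onto df1 by index.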

A similar question, "python - Compare a column in a dataframe with many items in a single list and pick out the common elements", can be found on Stack Overflow: https://stackoverflow.com/questions/72860574/
