Python, Pandas: match and find contents across two DataFrames

Tags: python pandas dataframe

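I want to check whether the contents of one DataFrame also appear in another DataFrame.

The original DataFrame has 2 columns, ID and its corresponding Fruit. There is another DataFrame of a different size (different numbers of rows and columns).

In the original DataFrame, if ID matches ID_1, and the Fruit for that ID appears in either the Content or the Content_1 for that ID_1, create a new column to indicate it. (The desired output is at the end of this question.)

I tried merging the two DataFrames for further processing. Here is what I have so far: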

import pandas as pd

data = {'ID': ["4589", "14805", "23591", "47089", "56251", "85964", "235225", "322624", "342225", "380689", "480562", "5623", "85624", "866278"], 
'Fruit' : ["Avocado", "Blackberry", "Black Sapote", "Fingered Citron", "Crab Apples", "Custard Apple", "Chico Fruit", "Coconut", "Damson", "Elderberry", "Goji Berry", "Grape", "Guava", "Huckleberry"]
}

data_1 = {'ID_1': ["488", "14805", "23591", "470995", "56251", "85964", "5268", "322624", "342225", "380689", "480562", "5623"], 
'Content' : ["Kalo Beruin", "this is Blackberry", "Khara Beruin", "Khato Dosh", "Lapha", "Loha Sura", "Matichak", "Miniket Rice", "Mou Beruin", "Moulata", "oh Goji Berry", "purple Grape"],
'Content_1' : ["Jook-sing noodles", "Kaomianjin", "Lai fun", "Lamian", "Liangpi", "who wants Custard Apple", "Misua", "nana Coconut", "Damson", "Paomo", "Ramen", "Rice vermicelli"]
}

df = pd.DataFrame(data)
df = df[['ID', 'Fruit']]

df_1 = pd.DataFrame(data_1)
df_1 = df_1[['ID_1', 'Content', 'Content_1']]

result = df.merge(df_1, left_on = 'ID', right_on = 'ID_1', how = 'outer')

for index, row in result.iterrows():
    if row["ID"] == row["ID_1"] and row["Fruit"] in row["Content"] or row["Fruit"] in row["Content_1"]:
        print row["ID"] + row["Fruit"]

It gives me TypeError: argument of type 'float' is not iterable

(The Pandas version I am using is v0.20.3.)

How can I achieve this? Thank you.

[Image: desired output]

Best answer

In some cases the value of row["Content"] or row["Content_1"] is NaN. NaN is a float, and a float is not iterable, which is why you get the error.
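The failure can be reproduced without pandas at all. The snippet below is a minimal sketch: it uses a plain Python `float("nan")` as a stand-in for the missing value that the merge leaves in the unmatched rows, and the variable names are made up for illustration.

```python
# NaN, the value pandas uses for missing entries, is an ordinary float.
nan = float("nan")
print(type(nan))

raised = False
try:
    # `in` needs a string or container on the right-hand side;
    # a float is neither, so this raises TypeError.
    "Avocado" in nan
except TypeError as exc:
    raised = True
    print("TypeError:", exc)
```

Any row where the outer merge found no partner carries such a value in Content and Content_1, so the substring test fails on exactly those rows.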

You can use try/except to catch these:

for index, row in result.iterrows():
    try:
        if row["ID"] == row["ID_1"] and row["Fruit"] in row["Content"] or row["Fruit"] in row["Content_1"]:
            print( str(row["ID"]) + row["Fruit"])
    except TypeError as e:
        print(e, "for:")
        print(row)
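Instead of catching the exception, the loop can skip non-string values up front. The sketch below is one possible way to do that, not part of the original answer: `fruit_in` is a hypothetical helper, and the small `result` frame is a stand-in for the merged data. It also parenthesizes the `or` part, since `and` binds tighter than `or` and the unparenthesized condition would test Content_1 even when the IDs differ.

```python
import pandas as pd

def fruit_in(text, fruit):
    """Return True only if `text` is a string that contains `fruit`."""
    return isinstance(text, str) and fruit in text

# Small stand-in for the merged frame (same column names as in the question).
result = pd.DataFrame({
    "ID": ["4589", "14805", "85964"],
    "Fruit": ["Avocado", "Blackberry", "Custard Apple"],
    "ID_1": [None, "14805", "85964"],
    "Content": [None, "this is Blackberry", "Loha Sura"],
    "Content_1": [None, "Kaomianjin", "who wants Custard Apple"],
})

matches = []
for index, row in result.iterrows():
    # Explicit grouping: the IDs must match AND the fruit must be in either column.
    if row["ID"] == row["ID_1"] and (
        fruit_in(row["Content"], row["Fruit"])
        or fruit_in(row["Content_1"], row["Fruit"])
    ):
        matches.append(row["ID"] + " " + row["Fruit"])

print(matches)
```

With this guard the missing values simply evaluate to "no match" and no exception handling is needed.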

I think your merge works fine. To get the output you specified, just add a Matched column that checks for NaN values:

import numpy as np

result = df.merge(df_1, left_on = 'ID', right_on = 'ID_1', how = 'outer')
# "N" if any column in the row is NaN (no partner row was found), otherwise "Y"
result["Matched"] = np.where(result.isnull().any(axis=1), "N", "Y")

result

        ID            Fruit    ID_1             Content  \
0     4589          Avocado     NaN                 NaN   
1    14805       Blackberry   14805  this is Blackberry   
2    23591     Black Sapote   23591        Khara Beruin   
3    47089  Fingered Citron     NaN                 NaN   
4    56251      Crab Apples   56251               Lapha   
5    85964    Custard Apple   85964           Loha Sura   

                  Content_1 Matched  
0                       NaN       N  
1                Kaomianjin       Y  
2                   Lai fun       Y  
3                       NaN       N  
4                   Liangpi       Y  
5   who wants Custard Apple       Y  
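Note that the Matched column above only records whether the merge found a row with the same ID; for example, Black Sapote is marked Y although the name appears in neither Content nor Content_1. If the intent is to also require the fruit name to appear in one of those columns, as the question describes, something like the following sketch could be used. It is an alternative, not the answer's method, and it assumes a left merge (so only the original rows are kept) and a reduced copy of the sample data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID": ["4589", "14805", "23591", "85964"],
    "Fruit": ["Avocado", "Blackberry", "Black Sapote", "Custard Apple"],
})
df_1 = pd.DataFrame({
    "ID_1": ["488", "14805", "23591", "85964"],
    "Content": ["Kalo Beruin", "this is Blackberry", "Khara Beruin", "Loha Sura"],
    "Content_1": ["Jook-sing noodles", "Kaomianjin", "Lai fun", "who wants Custard Apple"],
})

# A left merge keeps exactly the rows of the original frame.
merged = df.merge(df_1, left_on="ID", right_on="ID_1", how="left")

# Replace missing text with "" so the substring test never sees a float NaN.
content = merged["Content"].fillna("")
content_1 = merged["Content_1"].fillna("")

# Row-wise substring test: is the fruit name in either text column?
found = [
    fruit in a or fruit in b
    for fruit, a, b in zip(merged["Fruit"], content, content_1)
]
merged["Matched"] = np.where(found, "Y", "N")

print(merged[["ID", "Fruit", "Matched"]])
```

Rows whose ID has no partner end up with empty text in both columns, so they are marked N without any special handling.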

Regarding "Python, Pandas: match and find contents across two DataFrames", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/51551475/
