Check whether content in one DataFrame also appears in another DataFrame.
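The original DataFrame has two columns, ID and its corresponding Fruit. There is also a second DataFrame of a different size (different numbers of rows and columns).
In the original DataFrame, if ID matches ID_1 and the Fruit for that ID appears in either the Content or the Content_1 for that ID_1, I want to create a new column that flags it. (The desired output is at the end of this question.)
I tried merging the two DataFrames for further processing. Here is what I have so far: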
import pandas as pd
data = {'ID': ["4589", "14805", "23591", "47089", "56251", "85964", "235225", "322624", "342225", "380689", "480562", "5623", "85624", "866278"],
'Fruit' : ["Avocado", "Blackberry", "Black Sapote", "Fingered Citron", "Crab Apples", "Custard Apple", "Chico Fruit", "Coconut", "Damson", "Elderberry", "Goji Berry", "Grape", "Guava", "Huckleberry"]
}
data_1 = {'ID_1': ["488", "14805", "23591", "470995", "56251", "85964", "5268", "322624", "342225", "380689", "480562", "5623"],
'Content' : ["Kalo Beruin", "this is Blackberry", "Khara Beruin", "Khato Dosh", "Lapha", "Loha Sura", "Matichak", "Miniket Rice", "Mou Beruin", "Moulata", "oh Goji Berry", "purple Grape"],
'Content_1' : ["Jook-sing noodles", "Kaomianjin", "Lai fun", "Lamian", "Liangpi", "who wants Custard Apple", "Misua", "nana Coconut", "Damson", "Paomo", "Ramen", "Rice vermicelli"]
}
df = pd.DataFrame(data)
df = df[['ID', 'Fruit']]
df_1 = pd.DataFrame(data_1)
df_1 = df_1[['ID_1', 'Content', 'Content_1']]
result = df.merge(df_1, left_on = 'ID', right_on = 'ID_1', how = 'outer')
for index, row in result.iterrows():
    if row["ID"] == row["ID_1"] and row["Fruit"] in row["Content"] or row["Fruit"] in row["Content_1"]:
        print(row["ID"] + row["Fruit"])
It gives me TypeError: argument of type 'float' is not iterable
(I am using pandas v0.20.3.)
How can I achieve this? Thanks.
Best answer
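In some rows, row["Content"] and row["Content_1"] contain NaN (the outer merge fills it in wherever an ID has no counterpart in the other DataFrame). NaN is a float, and a float is not iterable, which is why you get the error.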
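The error can be reproduced without pandas at all. The snippet below is a minimal sketch in plain Python; it assumes only that the missing cells hold an ordinary float NaN, which is what pandas uses for missing values in object columns.

```python
import math

nan = float("nan")          # same kind of value pandas puts in the empty cells
print(type(nan).__name__)   # float
print(math.isnan(nan))      # True

try:
    # `in` needs a string or a container on its right-hand side
    "Avocado" in nan
except TypeError as exc:
    message = str(exc)
    print(message)
```

The exact wording of the message depends on the Python version, but it always names the float type.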
You can use try / except to catch these:
for index, row in result.iterrows():
    try:
        if row["ID"] == row["ID_1"] and row["Fruit"] in row["Content"] or row["Fruit"] in row["Content_1"]:
            print(str(row["ID"]) + row["Fruit"])
    except TypeError as e:
        print(e, "for:")
        print(row)
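A side note on the condition itself: in Python, and binds more tightly than or, so the test reads as (A and B) or C, and row["Content_1"] is still searched even when the IDs differ. The sketch below is one possible way to avoid the exception instead of catching it. The helpers fruit_in_text and row_matches are hypothetical names introduced here for illustration, and the two sample rows are plain dicts standing in for rows of the merged frame.

```python
def fruit_in_text(fruit, text):
    """Return True only when both values are strings and fruit occurs in text."""
    # NaN from the merge is a float, so it fails the isinstance check
    return isinstance(fruit, str) and isinstance(text, str) and fruit in text


def row_matches(row):
    """row is any mapping with the keys ID, ID_1, Fruit, Content, Content_1."""
    # Parentheses make the ID comparison apply to both content columns
    return row["ID"] == row["ID_1"] and (
        fruit_in_text(row["Fruit"], row["Content"])
        or fruit_in_text(row["Fruit"], row["Content_1"])
    )


nan = float("nan")
matched = {"ID": "85964", "ID_1": "85964", "Fruit": "Custard Apple",
           "Content": "Loha Sura", "Content_1": "who wants Custard Apple"}
unmatched = {"ID": "4589", "ID_1": nan, "Fruit": "Avocado",
             "Content": nan, "Content_1": nan}

print(row_matches(matched))    # True
print(row_matches(unmatched))  # False
```

Because a pandas row supports the same row["..."] lookups, row_matches could be called on each row from result.iterrows() in the same way.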
I think your merge works fine. To get the output you specified, just add a Matched column that checks for NaN values:
import numpy as np

result = df.merge(df_1, left_on='ID', right_on='ID_1', how='outer')
# "N" if any column in the row is NaN, otherwise "Y"
result["Matched"] = np.where(result.isnull().any(axis=1), "N", "Y")
result
      ID            Fruit   ID_1             Content                Content_1 Matched
0   4589          Avocado    NaN                 NaN                      NaN       N
1  14805       Blackberry  14805  this is Blackberry               Kaomianjin       Y
2  23591     Black Sapote  23591        Khara Beruin                  Lai fun       Y
3  47089  Fingered Citron    NaN                 NaN                      NaN       N
4  56251      Crab Apples  56251               Lapha                  Liangpi       Y
5  85964    Custard Apple  85964           Loha Sura  who wants Custard Apple       Y
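Note that the NaN check marks a row as Y whenever the merge found a counterpart, so Black Sapote (row 2) is flagged even though neither text column mentions it. If the goal is the substring test described in the question, something along the following lines may be closer. This is a sketch under assumptions rather than the answer's method: it uses a four-row subset of the question's data, a left merge instead of an outer one so that only rows of the original frame are kept, and a hypothetical helper named fruit_found.

```python
import pandas as pd

# Subset of the question's data, kept small for illustration
df = pd.DataFrame({"ID": ["4589", "14805", "23591", "85964"],
                   "Fruit": ["Avocado", "Blackberry", "Black Sapote", "Custard Apple"]})
df_1 = pd.DataFrame({"ID_1": ["14805", "23591", "85964"],
                     "Content": ["this is Blackberry", "Khara Beruin", "Loha Sura"],
                     "Content_1": ["Kaomianjin", "Lai fun", "who wants Custard Apple"]})

# A left merge keeps every row of df; IDs missing from df_1 get NaN
result = df.merge(df_1, left_on="ID", right_on="ID_1", how="left")


def fruit_found(row):
    # Only strings can be searched; NaN (a float) counts as no match
    texts = [row["Content"], row["Content_1"]]
    return any(isinstance(t, str) and row["Fruit"] in t for t in texts)


result["Matched"] = ["Y" if fruit_found(row) else "N" for _, row in result.iterrows()]
print(result[["ID", "Fruit", "Matched"]])
```

With this subset, Blackberry and Custard Apple come out as Y, while Avocado (no matching ID) and Black Sapote (matching ID, but the fruit is absent from both text columns) come out as N.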
This question about matching and finding content across two DataFrames with Python and pandas comes from Stack Overflow: https://stackoverflow.com/questions/51551475/