python - How to find another element in the same row when given an array element in a DataFrame

Tags: python pandas dataframe

I have a pandas DataFrame with 1000 rows. Part of the dataset looks like this:

{
"keywords": 
[
"joyride",
"motorway",
"140km/h",
"8-yr-old",
"Germany",
"child",
"police",
"Dortmund",
"lay-by",
"Wednesday",
"parents",
"warning",
"eight-year-old",
"mother",
"automatic",
"Golf",
"Soest",
"Volkswagen",
"high",
"The alarm",
"A44",
"go-karts",
"Volkswagen Golf",
"Berlin",
"IANS",
"triangle",
"bumper cars"
],

"hashtags": 
[
"#ControlledaccessHighway",
"#Germany",
"#RestArea",
"#Dortmund",
"#Berlin",
"#IANS",
"#IndoAsianNewsService",
"#A44Road",
"#Triangle",
"#KartRacing",
"#BumperCars",
"#VolkswagenGolf",
"#SoestGermany",
"#TheAlarm"
],

"label": "200"}
,

{"keywords": 
[
"Eight-year-old",
"140kph",
"parent",
"mother",
"police",
"lay-by",
"child",
"property",
"family",
"cars",
"autobahn",
"warning",
"night",
"late",
"87mph",
"eight-year-old",
"motorway",
"joyride",
"German",
"minutes",
"dash",
"Soest",
"journalism",
"private property",
"Facebook",
"A44",
"go-karts",
"Dortmund",
"VW Golf",
"free-thinking",
"Germany",
"triangle",
"bumper cars"
],

"hashtags": 
[
"#A44Road",
"#Autobahn",
"#RestArea",
"#Germany",
"#ControlledaccessHighway",
"#VolkswagenGolf",
"#SoestGermany",
"#Dortmund",
"#Triangle",
"#Facebook",
"#PrivateProperty",
"#BumperCars",
"#KartRacing",
"#Freethought",
"#Journalism"
], 

"label": "200"
}

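Given an array variable named "temp" that is identical to one of the elements in the "hashtags" column, I tried several ways to locate its "label" element, for example: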

label = df.loc[df['hashtags'] == temp, "label" ].iloc[0]
print(label)

It always raises this error:

ValueError: Arrays were different lengths: 1000 vs 21

But when I use "label" (which is a number) to look up the other columns, no error is raised. How can I solve this if I want to keep using "hashtags" as the input?

Best answer

IIUC, your DataFrame looks like this:

df 
                                        keywords  ... label
0  [joyride, motorway, 140km/h, 8-yr-old, Germany...  ...   200
1  [Eight-year-old, 140kph, parent, mother, polic...  ...   200

Example:

temp = '#Berlin'

You can build a boolean mask that marks the rows whose hashtags contain temp:

df.hashtags.apply(lambda v: temp in v)

# 0     True
# 1    False
# Name: hashtags, dtype: bool

So the corresponding label is

df.label.loc[df.hashtags.apply(lambda v: temp in v)]

# 0    200
# Name: label, dtype: object
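The original `df['hashtags'] == temp` fails because pandas treats a list on the right-hand side as one value per row, so a 21-item list is compared element-wise against a 1000-row column. A minimal, self-contained sketch of the membership approach follows, using a shortened toy frame (the data and the names `mask`, `temp_list`, and `whole_mask` are illustrative assumptions, not part of the question). It also covers the case where temp is a whole list equal to one cell, which is how the question describes it.

```python
import pandas as pd

# Toy data shaped like the question's frame (hashtag lists shortened).
df = pd.DataFrame({
    "hashtags": [
        ["#Germany", "#Berlin", "#BumperCars"],
        ["#Germany", "#BumperCars", "#Journalism"],
    ],
    "label": ["200", "200"],
})

# Case 1: temp is a single hashtag; test membership row by row.
temp = "#Berlin"
mask = df["hashtags"].apply(lambda tags: temp in tags)
labels = df.loc[mask, "label"]

# Case 2: temp is a whole list; compare each cell to it as one object,
# which avoids the element-wise comparison that raised the ValueError.
temp_list = ["#Germany", "#BumperCars", "#Journalism"]
whole_mask = df["hashtags"].apply(lambda tags: tags == temp_list)
whole_labels = df.loc[whole_mask, "label"]

print(labels.tolist())
print(whole_labels.tolist())
```

Appending `.iloc[0]` to either result returns the first matching label as a scalar, as in the question's attempt. It raises an IndexError when nothing matches, so checking `mask.any()` first may be worthwhile.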

Or for several temp values:

for temp in ['#Berlin', '#BumperCars', '#Journalism']:
    print(temp, '\n', df.label.loc[df.hashtags.apply(lambda v: temp in v)], '\n')

#Berlin 
# 0    200
# Name: label, dtype: object 

#BumperCars 
# 0    200
# 1    200
# Name: label, dtype: object 

#Journalism 
# 1    200
# Name: label, dtype: object 
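When many hashtags need to be looked up, the loop above re-scans the column once per value. One possible alternative, sketched here on the assumption that pandas 0.25 or later is available (for `DataFrame.explode`), is to flatten the lists once and filter with `isin`. The toy data and the names `wanted`, `long_df`, and `labels_by_tag` are illustrative only.

```python
import pandas as pd

# Toy data shaped like the question's frame (hashtag lists shortened).
df = pd.DataFrame({
    "hashtags": [
        ["#Germany", "#Berlin", "#BumperCars"],
        ["#Germany", "#BumperCars", "#Journalism"],
    ],
    "label": ["200", "200"],
})

wanted = ["#Berlin", "#BumperCars", "#Journalism"]

# One row per (original row, hashtag) pair; the original index is repeated.
long_df = df.explode("hashtags")

# Keep only the rows whose hashtag is one of the wanted values.
hits = long_df[long_df["hashtags"].isin(wanted)]

# Collect the labels found for each hashtag into a plain dict.
labels_by_tag = hits.groupby("hashtags")["label"].agg(list).to_dict()
print(labels_by_tag)
```

The index of `hits` still points back to the original rows of `df`, so it can be used to recover the other columns as well.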

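This post about "python - How to find another element in the same row when given an array element in a DataFrame" is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/58371590/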
