I have a pandas DataFrame with 1000 rows. Part of the dataset looks like this:
{
"keywords":
[
"joyride",
"motorway",
"140km/h",
"8-yr-old",
"Germany",
"child",
"police",
"Dortmund",
"lay-by",
"Wednesday",
"parents",
"warning",
"eight-year-old",
"mother",
"automatic",
"Golf",
"Soest",
"Volkswagen",
"high",
"The alarm",
"A44",
"go-karts",
"Volkswagen Golf",
"Berlin",
"IANS",
"triangle",
"bumper cars"
],
"hashtags":
[
"#ControlledaccessHighway",
"#Germany",
"#RestArea",
"#Dortmund",
"#Berlin",
"#IANS",
"#IndoAsianNewsService",
"#A44Road",
"#Triangle",
"#KartRacing",
"#BumperCars",
"#VolkswagenGolf",
"#SoestGermany",
"#TheAlarm"
],
"label": "200"}
,
{"keywords":
[
"Eight-year-old",
"140kph",
"parent",
"mother",
"police",
"lay-by",
"child",
"property",
"family",
"cars",
"autobahn",
"warning",
"night",
"late",
"87mph",
"eight-year-old",
"motorway",
"joyride",
"German",
"minutes",
"dash",
"Soest",
"journalism",
"private property",
"Facebook",
"A44",
"go-karts",
"Dortmund",
"VW Golf",
"free-thinking",
"Germany",
"triangle",
"bumper cars"
],
"hashtags":
[
"#A44Road",
"#Autobahn",
"#RestArea",
"#Germany",
"#ControlledaccessHighway",
"#VolkswagenGolf",
"#SoestGermany",
"#Dortmund",
"#Triangle",
"#Facebook",
"#PrivateProperty",
"#BumperCars",
"#KartRacing",
"#Freethought",
"#Journalism"
],
"label": "200"
}
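Given an array variable named temp that is identical to one of the elements in the "hashtags" column, I tried several ways to look up its "label" element, for example: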
label = df.loc[df['hashtags'] == temp, "label" ].iloc[0]
print(label)
It always raises the error:
ValueError: Arrays were different lengths: 1000 vs 21
But when I use "label" (which is a number) to look up the other columns, no error is raised. If I want to keep using "hashtags" as the input, how can I solve this?
Best answer
IIUC, your DataFrame looks like this:
df
keywords ... label
0 [joyride, motorway, 140km/h, 8-yr-old, Germany... ... 200
1 [Eight-year-old, 140kph, parent, mother, polic... ... 200
Example:
temp = '#Berlin'
You can get a boolean mask marking the rows in which temp is found:
df.hashtags.apply(lambda v: temp in v)
# 0 True
# 1 False
# Name: hashtags, dtype: bool
So the corresponding label is
df.label.loc[df.hashtags.apply(lambda v: temp in v)]
# 0 200
# Name: label, dtype: object
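For reference, the lookup above can be reproduced end to end with a self-contained sketch. The two-row frame below is a reduced, hypothetical stand-in for the 1000-row dataset, and the names mask, labels and first_label are illustrative only. It also shows one way to get a single scalar with .iloc[0], as the question attempted.

```python
import pandas as pd

# Reduced, hypothetical stand-in for the 1000-row frame in the question.
df = pd.DataFrame({
    "hashtags": [
        ["#Germany", "#Berlin", "#BumperCars"],
        ["#Germany", "#Journalism", "#BumperCars"],
    ],
    "label": ["200", "200"],
})

temp = "#Berlin"

# Boolean mask: True for each row whose hashtag list contains temp.
mask = df["hashtags"].apply(lambda tags: temp in tags)

# All matching labels as a Series, then the first match as a scalar
# (None when no row contains temp).
labels = df.loc[mask, "label"]
first_label = labels.iloc[0] if not labels.empty else None
print(first_label)
```

Guarding on labels.empty avoids the IndexError that .iloc[0] would raise when no row contains the hashtag.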
Or for several temp values:
for temp in ['#Berlin', '#BumperCars', '#Journalism']:
    print(temp, '\n', df.label.loc[df.hashtags.apply(lambda v: temp in v)], '\n')
#Berlin
# 0 200
# Name: label, dtype: object
#BumperCars
# 0 200
# 1 200
# Name: label, dtype: object
#Journalism
# 1 200
# Name: label, dtype: object
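The examples above search for a single hashtag, whereas the question describes temp as a whole array equal to one cell of the "hashtags" column. In that case the ValueError most likely arises because df['hashtags'] == temp treats temp as a sequence to be aligned row by row with the column, so its length (21) is checked against the number of rows (1000). A minimal sketch of a cell-by-cell comparison follows. It assumes the cells are lists or arrays of strings, and the frame and the labels "200" and "201" are hypothetical values chosen so the two rows can be told apart.

```python
import pandas as pd

# Hypothetical two-row frame; labels differ only to make the match visible.
df = pd.DataFrame({
    "hashtags": [
        ["#Germany", "#Berlin", "#BumperCars"],
        ["#Germany", "#Journalism", "#BumperCars"],
    ],
    "label": ["200", "201"],
})

# temp is a whole list, identical to one cell of the hashtags column.
temp = ["#Germany", "#Journalism", "#BumperCars"]

# df["hashtags"] == temp would try to pair row i with temp[i], which fails
# when len(temp) != len(df). Compare each cell with temp as a whole instead.
# list(...) lets the comparison work for both lists and NumPy arrays.
mask = df["hashtags"].apply(lambda tags: list(tags) == list(temp))

label = df.loc[mask, "label"].iloc[0]
print(label)
```

This comparison is order-sensitive. If the order of the hashtags should not matter, comparing set(tags) == set(temp) inside the lambda is one possible variation.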
Regarding "python - How to find another element on the same row when given an array element in a DataFrame", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/58371590/