Given the following DataFrame:
import pandas as pd
df = pd.DataFrame({'term': ['analys', 'applic', 'architectur', 'assess', 'item', 'methodolog', 'research', 'rs', 'studi', 'suggest', 'test', 'tool', 'viewer', 'work'],
                   'newValue': [0.810419, 0.631963, 0.687348, 0.810554, 0.725366, 0.742715, 0.799152, 0.599030, 0.652112, 0.683228, 0.711307, 0.625563, 0.604190, 0.724763]})
df = df.set_index('term')
print(df)
             newValue
term
analys       0.810419
applic       0.631963
architectur  0.687348
assess       0.810554
item         0.725366
methodolog   0.742715
research     0.799152
rs           0.599030
studi        0.652112
suggest      0.683228
test         0.711307
tool         0.625563
viewer       0.604190
work         0.724763
I am trying to use the values from the DataFrame to update the value after each "^" in this string:
(analysi analys^0.8046919107437134 studi^0.6034331321716309 framework methodolog^0.7360332608222961 architectur^0.6806665658950806)^0.0625 (recommend suggest^0.6603200435638428 rs^0.5923488140106201)^0.125 (system tool^0.6207902431488037 applic^0.610009491443634)^0.25 (evalu assess^0.7828741073608398 test^0.6444937586784363)^0.5
Each value should be updated for its corresponding word, so that I get:
(analysi analys^0.810419 studi^0.652112 framework methodolog^0.742715 architectur^0.687348)^0.0625 (recommend suggest^0.683228 rs^0.599030)^0.125 (system tool^0.625563 applic^0.631963)^0.25 (evalu assess^0.810554 test^0.711307)^0.5
Thanks in advance for your help!
Best answer
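The best approach I can think of is to do this in several stages.
First, take the old string and extract all the values to be replaced. This can be done with a regular expression.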
import re
old_string = "(analysi analys^0.8046919107437134 studi^0.6034331321716309 framework methodolog^0.7360332608222961 architectur^0.6806665658950806)^0.0625 (recommend suggest^0.6603200435638428 rs^0.5923488140106201)^0.125 (system tool^0.6207902431488037 applic^0.610009491443634)^0.25 (evalu assess^0.7828741073608398 test^0.6444937586784363)^0.5"
pattern = re.compile(r"(\w+\^(0|[1-9]\d*)(\.\d+)?)")
# pattern.findall(old_string) returns a list of tuples (one entry per capturing group),
# so we need to keep just the outer capturing group for each match.
matches = [m[0] for m in pattern.findall(old_string)]
print("Matches:", matches)
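As a side note on the extraction step, the tuples come from the capturing groups in the pattern. A minimal sketch of a variant, assuming non-capturing groups are acceptable here: with `(?:...)` there are no groups left, so `findall` returns the matched strings directly. The shortened `sample` string is only for illustration.

```python
import re

# Shortened sample taken from the string in the question.
sample = "(recommend suggest^0.6603200435638428 rs^0.5923488140106201)^0.125"

# (?:...) groups do not capture, so findall returns whole matches as strings.
pattern = re.compile(r"\w+\^(?:0|[1-9]\d*)(?:\.\d+)?")
matches = pattern.findall(sample)
print(matches)
# ['suggest^0.6603200435638428', 'rs^0.5923488140106201']
```

The group weight `)^0.125` is not matched, because `\w+` requires a word character immediately before the `^`.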
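In the next part, we build two dictionaries. The first maps the prefix of each value to be replaced (the word part, before the ^) to the whole matched value. We use it to create the second dictionary, which maps each value to be replaced to its new value (taken from the DataFrame).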
prefix_dict = {}
for m in matches:
    pre, post = m.split('^')
    prefix_dict[pre] = m
print("Prefixes:", prefix_dict)
matches_dict = {}
for i, row in df.iterrows():  # df is the dataframe from the question
    if i in prefix_dict:
        old_val = prefix_dict[i]
        new_val = "%s^%s" % (i, row.newValue)
        matches_dict[old_val] = new_val
print("Matches dict:", matches_dict)
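One formatting detail may matter when building the new values: `%s` renders a float without trailing zeros, so a value such as 0.599030 comes out as `0.59903`, not the six-decimal form shown in the question. A small sketch, using a plain Python float as a stand-in for `row.newValue`, and assuming six decimals are wanted:

```python
value = 0.599030  # stand-in for row.newValue

default_text = "%s^%s" % ("rs", value)    # trailing zero is dropped
fixed_text = "%s^%.6f" % ("rs", value)    # always six decimal places

print(default_text)  # rs^0.59903
print(fixed_text)    # rs^0.599030
```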
Once that is done, we can loop over the items in the old value -> new value dictionary and replace every old value in the input string.
new_string = old_string
for key, val in matches_dict.items():
    new_string = new_string.replace(key, val)
print("New string:", new_string)
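The stages above could also be collapsed into a single pass. This is an alternative sketch, not the method from the answer: `re.sub` accepts a function as the replacement, so each `word^number` match can be looked up and rewritten in place. The names `new_values` and `swap_value` are hypothetical, and `new_values` is a plain dict standing in for something like `df['newValue'].to_dict()`; only a few terms and a shortened string are used.

```python
import re

# Assumption: a plain dict standing in for df['newValue'].to_dict().
new_values = {"suggest": 0.683228, "rs": 0.599030,
              "tool": 0.625563, "applic": 0.631963}

old_string = ("(recommend suggest^0.6603200435638428 rs^0.5923488140106201)^0.125 "
              "(system tool^0.6207902431488037 applic^0.610009491443634)^0.25")

# Group 1 captures the word; the number part is matched but not captured.
pattern = re.compile(r"(\w+)\^(?:0|[1-9]\d*)(?:\.\d+)?")

def swap_value(match):
    """Return the word with its new value, or the match unchanged if unknown."""
    term = match.group(1)
    if term in new_values:
        return "%s^%.6f" % (term, new_values[term])
    return match.group(0)

new_string = pattern.sub(swap_value, old_string)
print(new_string)
# (recommend suggest^0.683228 rs^0.599030)^0.125 (system tool^0.625563 applic^0.631963)^0.25
```

Because the lookup is keyed on the whole captured word, this avoids a possible pitfall of repeated `str.replace` calls, where an old value could in principle also match as the tail of a longer token.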
Regarding python - updating values in a string based on values in a pandas DataFrame, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/55121749/