I have a DataFrame df1:
Questions                           Purpose
what is scientific name of <input>  scientific name
what is english name of <input>     english name
I also have the following two lists:
name1 = ['salt','water','sugar']
name2 = ['sodium chloride','dihydrogen monoxide','sucrose']
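I want to create a new DataFrame by replacing <input> with the values from one of the lists, depending on Purpose. If Purpose is "english name", <input> should be replaced with the values in name2; otherwise it should be replaced with the values in name1.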
Expected output DataFrame:
Questions                                    Purpose
what is scientific name of salt              scientific name
what is scientific name of water             scientific name
what is scientific name of sugar             scientific name
what is english name of sodium chloride      english name
what is english name of dihydrogen monoxide  english name
what is english name of sucrose              english name
My attempt:
questions = []
purposes = []
for i, row in df1.iterrows():
    if row['Purpose'] == 'scientific name':
        for name in name1:
            ques = row['Questions'].replace('<input>', name)
            questions.append(ques)
            purposes.append(row['Purpose'])
    else:
        for name in name2:
            ques = row['Questions'].replace('<input>', name)
            questions.append(ques)
            purposes.append(row['Purpose'])
df = pd.DataFrame({'Questions': questions, 'Purpose': purposes})
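The code above produces the expected output, but it is too slow because my original DataFrame has many questions. (I also have more than two purposes, but for now I am sticking with 2.)
I am looking for a more efficient solution that gets rid of the for loops.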
Best answer
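One approach is to iterate over Questions with a list comprehension and replace <input> with the corresponding name. To repeat each question as many times as there are entries in its list of names, you can use itertools.cycle: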
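from itertools import cycle
import pandas as pd

# One list of names per row of df1, in the same order as the rows
names = [name1, name2]
# cycle([row]) repeats the question once for every name in its list
new = [[i.replace('<input>', j), purpose]
       for row, purpose, name in zip(df1.Questions, df1.Purpose, names)
       for i, j in zip(cycle([row]), name)]
pd.DataFrame(new, columns=df1.columns)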
   Questions                                    Purpose
0  what is scientific name of salt              scientific name
1  what is scientific name of water             scientific name
2  what is scientific name of sugar             scientific name
3  what is english name of sodium chloride      english name
4  what is english name of dihydrogen monoxide  english name
5  what is english name of sucrose              english name
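The list comprehension above pairs rows with name lists by position, so it relies on df1 having exactly one row per list, in the same order. With many questions per purpose, a lookup keyed on Purpose may be easier to maintain. The following is only a sketch of that idea, not part of the original answer: the dict names_by_purpose and the temporary column name are assumptions introduced for illustration, and DataFrame.explode requires pandas 0.25 or later.

```python
import pandas as pd

# Sample data reconstructed from the question
df1 = pd.DataFrame({
    "Questions": ["what is scientific name of <input>",
                  "what is english name of <input>"],
    "Purpose": ["scientific name", "english name"],
})
name1 = ["salt", "water", "sugar"]
name2 = ["sodium chloride", "dihydrogen monoxide", "sucrose"]

# Hypothetical lookup: which list of names belongs to which purpose
names_by_purpose = {"scientific name": name1, "english name": name2}

# Attach the matching list to every row, then expand to one row per name
out = (df1.assign(name=df1["Purpose"].map(names_by_purpose))
          .explode("name")
          .reset_index(drop=True))

# Substitute the placeholder row by row, then drop the helper column
out["Questions"] = [q.replace("<input>", n)
                    for q, n in zip(out["Questions"], out["name"])]
out = out.drop(columns="name")
print(out)
```

Further purposes can be supported by adding entries to names_by_purpose. The string replacement itself is still a Python-level loop, so any speed-up comes mainly from avoiding iterrows and the repeated appends.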
This question, "python - How to replace values in a Pandas column multiple times?", comes from Stack Overflow: https://stackoverflow.com/questions/54626794/