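I have a dataframe with a column named "cleaned_tweet". This column consists of tweets containing several abbreviations, and I want to replace those abbreviations with proper English words. To do this, I prepared a dictionary named "slangs" in which the abbreviations are the keys and the desired English phrases/words are the values, and I want to replace every occurrence of these abbreviations with their values from the dictionary. I looked at several other solutions on Stack Overflow, but none of them seem to work. Below is what I tried. I am using a nested for loop and I believe I am very close to a solution, but I am doing something wrong that I can't figure out.
Here is the nested loop: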
for i in range(len(train_test_set)):
    for j in slangs:
        train_test_set['cleaned_tweet'][i] = train_test_set['cleaned_tweet'][i].replace(j, slangs[j])
When I run this code and print the first tweet with print(train_test_set['cleaned_tweet'][0]), I get unexpected output like this:
"#mopanthank whyour | hi | years oldwhyour | hi | years oldhesitationospecial editekissas insekissperience wall hacken whyour | hi | years oldunited statesing a hallwhyour | hi | years olducinogenic drwhyour | hi | years olduglwhyour | hi | years oldung ladye rainbowhwhy | would whyour | hi | years olduohesitationents | rapper from atalk later | ekissperience wall hacken whyour | hi | years oldunited statesing a hallwhyour | hi | years olducinogenic drwhyour | hi | years olduglwhyour | hi | years oldung ladye rainbowhwhy | would whyour | hi | years olduoue loversatileionwhyes | yeah | yes | your | hi | years oldu | team leaderantaonwhysomethingop it | somethingwhyour | hi | years oldupid idiotake careal edwhyour | hi | years olducatekissas insekissperience wall hacken whyour | hi | years oldunited statesing a hallwhyour | hi | years olducinogenic drwhyour | hi | years olduglwhyour | hi | years oldung ladye..."
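It looks like many unwanted values are being appended to the cell. The output is very large, so I can't copy all of it here. This is the structure of my dataset and dictionary before running the code:
Can someone tell me what I am doing wrong?
Best Answer
You can try using the dictionary with the map() function. Something like this: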
slangs = {'abbr1': 'word1', .........}
train_test_set['cleaned_tweet'] = train_test_set['cleaned_tweet'].map(slangs)
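Note that map() with a dictionary looks up each cell's entire value, so a tweet that is not itself a key comes back as NaN. It is also worth knowing that str.replace() in the nested loop matches substrings anywhere in the text, so a short key is replaced even inside longer words, and text inserted by one replacement can be matched again by a later key; that is a likely cause of the runaway output in the question. A minimal sketch of a word-level alternative follows, assuming the abbreviations are made of word characters (letters, digits, underscore) so that `\b` boundaries apply. The sample dictionary, the sample tweets, and the helper name `expand_slang` are made up for illustration, since the real data is not shown.

```python
import re

import pandas as pd

# Hypothetical sample data; the real dictionary and tweets are not shown in the question.
slangs = {"u": "you", "gr8": "great", "thx": "thanks"}
train_test_set = pd.DataFrame({"cleaned_tweet": ["thx u are gr8", "your hut is up"]})

# Build one pattern that matches any key as a whole word.
# Longer keys come first so they win over shorter keys that are their prefixes.
keys = sorted(slangs, key=len, reverse=True)
pattern = re.compile(r"\b(?:" + "|".join(re.escape(k) for k in keys) + r")\b")


def expand_slang(text):
    """Replace each whole-word abbreviation with its dictionary value in one pass."""
    return pattern.sub(lambda m: slangs[m.group(0)], text)


train_test_set["cleaned_tweet"] = train_test_set["cleaned_tweet"].apply(expand_slang)
print(train_test_set["cleaned_tweet"].tolist())
```

Because the substitution is a single pass over each tweet, text produced by one replacement is not scanned again, and the "u" inside "your", "hut", or "up" is left alone. If the tweets are not lowercased already, passing `re.IGNORECASE` and looking up `m.group(0).lower()` would be one way to handle that.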
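If the same word has several abbreviations, you can define the dictionary with the words as keys and lists of their abbreviations as values. Then swap the keys and values and follow the same approach. Something like this: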
# define the dictionary with the words as the keys and the lists of the respective abbreviations as the values
slangs = {'word1': ['abbr11', 'abbr12', ....], 'word2': ['abbr21', 'abbr22',..]}
#swap keys in slangs: http://stackoverflow.com/a/31674731/2901002
d = {k: oldk for oldk, oldv in slangs.items() for k in oldv}
# map with the inverted dictionary d (abbreviation -> word), not slangs
train_test_set['cleaned_tweet'] = train_test_set['cleaned_tweet'].map(d)
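To make the key/value swap concrete, here is a small sketch of what the inverted dictionary looks like. The words and abbreviations are invented for the example.

```python
# Hypothetical word -> abbreviations dictionary, for illustration only.
slangs = {"you": ["u", "ya"], "great": ["gr8", "grt"]}

# Invert it: every abbreviation in each list becomes a key pointing back to its word.
d = {abbr: word for word, abbrs in slangs.items() for abbr in abbrs}

print(d)
# {'u': 'you', 'ya': 'you', 'gr8': 'great', 'grt': 'great'}
```

The resulting flat dictionary has one entry per abbreviation, which is the shape a lookup by abbreviation needs. If the same abbreviation appears under two words, the later entry silently overwrites the earlier one.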
Regarding "python - Replace string values in a dataframe using a dictionary", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/58449192/