python - Code does not remove the desired values from a dictionary

Linked: Removing escaped entities from a String in Python

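My code reads a large CSV file of tweets and parses it into two dictionaries (depending on the sentiment of the tweet). I then create a new dictionary, unescape everything with an HTML parser, and use the translate() method to remove all punctuation from the text.
Finally, I try to keep only the words that are at least 3 characters long.
Here is my code: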

tweets = []
for (text, sentiment) in pos_tweets.items() + neg_tweets.items():
    text = HTMLParser.HTMLParser().unescape(text.decode('ascii'))
    remove_punctuation_map = dict((ord(char), None) for char in string.punctuation)
    shortenedText = [e.lower() and e.translate(remove_punctuation_map) for e in text.split() if len(e) >= 3 and not e.startswith(('http', '@')) ]
    print shortenedText

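However, although most of what I want gets done, I still get words of length two (but not of length one), and there are some blank entries in my dictionary.
For example: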

(: !!!!!! - so I wrote something last week
* enough said *
.... Do I need to say it?

produces:

[u'', u'wrote', u'something', u'last', u'week']
[u'enough', u'said']
[u'', u'need', u'even', u'say', u'it']

What is wrong with my code? How can I remove all words shorter than length 2, including the blank entries?

Best answer

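I think your problem is that when you test whether len(e) >= 3, e still contains its punctuation, so "it?" passes the length check and is not filtered out. Maybe do it in two steps? Clean out the punctuation first, then filter by size?

Something like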

cleanedText = [e.translate(remove_punctuation_map).lower() for e in text.split() if not e.startswith(('http', '@')) ]
shortenedText = [e for e in cleanedText if len(e) >= 3]
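
For readers on Python 3, the same two-step idea can be sketched as below. This is only an illustration under some assumptions: `clean_tweet`, `min_len`, and `REMOVE_PUNCTUATION` are hypothetical names that do not appear in the question, and `html.unescape` stands in for the Python 2 `HTMLParser().unescape` call.

```python
import html
import string

# Map every ASCII punctuation code point to None so translate() deletes it.
REMOVE_PUNCTUATION = {ord(ch): None for ch in string.punctuation}


def clean_tweet(text, min_len=3):
    """Return the lower-cased words of at least min_len characters.

    clean_tweet and min_len are hypothetical names used for illustration.
    """
    text = html.unescape(text)
    # Step 1: drop links and mentions, then strip punctuation from each token.
    cleaned = [
        tok.translate(REMOVE_PUNCTUATION).lower()
        for tok in text.split()
        if not tok.startswith(("http", "@"))
    ]
    # Step 2: filter on the cleaned length, which also drops empty strings.
    return [tok for tok in cleaned if len(tok) >= min_len]


for sample in [
    "(: !!!!!! - so I wrote something last week",
    "* enough said *",
    ".... Do I need to say it?",
]:
    print(clean_tweet(sample))
```

Because the length check runs after the punctuation is gone, tokens such as "(:" or "...." become empty strings and are dropped, and "it?" is measured as "it". Note also that in the question's expression `e.lower() and e.translate(...)`, the `and` operator returns its second operand whenever the first is truthy, so the lower-cased value is discarded; chaining `.translate(...).lower()` avoids that.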

Regarding "python - Code does not remove the desired values from a dictionary", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/18148953/
