python - Extract a set of tokens from a list of strings

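I have a list of strings and I want to extract all of the tokens into a single set of tokens, not a list of sets. I need the tokens from every sentence merged together.

My sentences are stored as a list of strings in sentences.

So if I try: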

words = set([])
a=set(sentences[1].split())
b=set(sentences[2].split())
a.union(b)

I get the tokens of a and b in a single set, as shown below. This is what I am looking for:

{',', '.', '2.252', '35-1/7', '37-year-old', 'B', 'Blood', 'Fred', 'G4', 'Grauman', 'O+', 'P3-5', 'pregnancy', 'product', 'rubella', 'surface', 'the', 'to', 'type', 'week', 'woman'}

But with a list comprehension

words = set()
[words.union(set(sent.split())) for sent in sentences]

the output is a list of sets, as shown below:

[{'.', 'Care', 'He', 'Intensive', 'Neonatal'}, {'.', '2.252', '35-1/7', '37-year-old', 'Fred', 'G4', 'Grauman'}]

Is there a way to get what I need with a compact line of code, such as a list comprehension?

====

OK, I just did this, with the result of the list comprehension stored in words:

a = set()
a.union(*words)
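
A minimal sketch of the approach in the snippet above, assuming a small, hypothetical sentences list (the original data is not shown in full). The names token_sets and all_tokens are illustrative. Note that set.union returns a new set and leaves the set it is called on unchanged, which is why calling it inside the list comprehension never filled words; the result has to be assigned, or set.update can be used to modify a set in place.

```python
# Hypothetical sample data; the original post does not show the full list.
sentences = ["A short sentence .", "A second sentence ."]

# One set of tokens per sentence (this is the "list of sets").
token_sets = [set(sent.split()) for sent in sentences]

# set.union returns a NEW set and does not modify the receiver,
# so the merged result must be bound to a name.
all_tokens = set().union(*token_sets)

# In-place alternative: set.update mutates the set it is called on.
words = set()
for sent in sentences:
    words.update(sent.split())

print(sorted(all_tokens))  # ['.', 'A', 'second', 'sentence', 'short']
```

Both all_tokens and words end up holding the same merged set of tokens.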

Best answer

If your sentences are strings, you can join them and split again.

set(" ".join(sentences).split())

This turns ['A short sentence', 'A second sentence'] into {'A', 'second', 'sentence', 'short'}
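
For comparison, here is a short sketch that runs the join-and-split one-liner next to two equivalent alternatives, a set comprehension and itertools.chain.from_iterable. The variable names are illustrative, and the alternatives are not part of the original answer. They avoid building one large intermediate string, which may matter for long lists.

```python
from itertools import chain

sentences = ["A short sentence", "A second sentence"]

# Join-and-split, as in the answer. The " " separator matters: joining
# with "" would glue the last token of one sentence to the first of the next.
joined = set(" ".join(sentences).split())

# Set comprehension: loop over sentences, then over each sentence's tokens.
by_comprehension = {token for sent in sentences for token in sent.split()}

# chain.from_iterable flattens the per-sentence token lists lazily.
by_chain = set(chain.from_iterable(sent.split() for sent in sentences))

print(sorted(joined))  # ['A', 'second', 'sentence', 'short']
```

All three expressions produce the same set for this input.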

Regarding python - Extract a set of tokens from a list of strings, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/57436133/
