Cleaning text data in Python

Tags: python text data-science data-cleaning

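Does anyone have tips for cleaning text data? My data is in a list (master_list), and I'm trying to write a loop or function that removes the extra [] brackets and the None, None entries, so that master_list ends up as just strings separated by commas.

Any help is much appreciated.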

master_list = [['the supply fan speed mean is over 90% like the fan isnt building static, mean value recorded is 94.3.', 'the supply fan is running, the VFD speed output mean value is 94.3.'], None, ['the supply fan speed mean is over 90% like the fan isnt building static, mean value recorded is 94.2.', 'the supply fan is running, the VFD speed output mean value is 94.2.'], None, ['the supply fan speed mean is over 90% like the fan isnt building static, mean value recorded is 94.1.', 'the supply fan is running, the VFD speed output mean value is 94.1.'], None, ['the supply fan speed mean is over 90% like the fan isnt building static, mean value recorded is 94.0.', 'the supply fan is running, the VFD speed output mean value is 94.0.'], None, ['the supply fan speed mean is over 90% like the fan isnt building static, mean value recorded is 93.9.', 'the supply fan is running, the VFD speed output mean value is 93.9.'], None]

Best answer

You want to flatten your list, so that [[1, 2], [3, 4]] becomes [1, 2, 3, 4]. One way to do that is with a list comprehension: [x for sublist in my_list for x in sublist]
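As a quick illustration of the flattening step on its own, here is a minimal sketch using the toy list from the sentence above (the name `flat` is just for this example):

```python
# Toy nested list, not the asker's data
my_list = [[1, 2], [3, 4]]

# The outer loop picks each sublist, the inner loop picks each item in it
flat = [x for sublist in my_list for x in sublist]
print(flat)  # [1, 2, 3, 4]
```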

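However, your data also contains None in places where a list is expected, so those need to be filtered out. On top of that, the sublists themselves may contain None values that should be removed as well. So [[1, 2], None, [None, 3, ""]] becomes [1, 2, 3].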

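To handle the first part (dropping None where a list is expected), we can use the or operator to effectively replace those None values with an empty list: sublist or []. We can't iterate over None, but we can iterate over an empty list.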
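A small sketch of that trick in isolation, on made-up data. The parentheses around `sublist or []` are optional and only there for readability:

```python
# Toy data with a None where a sublist is expected
my_list = [[1, 2], None, [3, 4]]

# `None or []` evaluates to [], so the inner loop simply runs zero times
flat = [x for sublist in my_list for x in (sublist or [])]
print(flat)  # [1, 2, 3, 4]
```

Without the `or []`, the same comprehension would raise a TypeError when it reaches the None entry.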

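To handle the second part (removing None values inside the sublists, along with other "falsy" values such as empty strings or zero), we add a condition at the end of the list comprehension: [... if x]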
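Because `if x` removes every falsy item, it is worth seeing what it does to zeros and empty strings. The sketch below uses toy data; the `is not None` variant is an alternative added here for comparison and is not part of the original answer:

```python
# Toy data: a None sublist, plus None, "" and 0 inside a sublist
my_list = [[1, 2], None, [None, 3, "", 0]]

# `if x` drops every falsy item: None, "" and also 0
truthy_only = [x for sublist in my_list for x in sublist or [] if x]
print(truthy_only)  # [1, 2, 3]

# Stricter variant: drop only None, keep "" and 0
not_none = [x for sublist in my_list for x in sublist or [] if x is not None]
print(not_none)  # [1, 2, 3, '', 0]
```

For a list of sentences like the one in the question, the plain `if x` form is probably what you want, since empty strings carry no information.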

So the final result is:

>>> [x for sublist in master_list for x in sublist or [] if x]
['the supply fan speed mean is over 90% like the fan isnt building static, mean value recorded is 94.3.',
 'the supply fan is running, the VFD speed output mean value is 94.3.',
 'the supply fan speed mean is over 90% like the fan isnt building static, mean value recorded is 94.2.',
 'the supply fan is running, the VFD speed output mean value is 94.2.',
 'the supply fan speed mean is over 90% like the fan isnt building static, mean value recorded is 94.1.',
 'the supply fan is running, the VFD speed output mean value is 94.1.',
 'the supply fan speed mean is over 90% like the fan isnt building static, mean value recorded is 94.0.',
 'the supply fan is running, the VFD speed output mean value is 94.0.',
 'the supply fan speed mean is over 90% like the fan isnt building static, mean value recorded is 93.9.',
 'the supply fan is running, the VFD speed output mean value is 93.9.']
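If you would rather have a reusable function, and a single comma-separated string as the question describes, something along these lines might work. The function name `flatten_clean` and the shortened sample data are hypothetical, chosen only for this sketch:

```python
def flatten_clean(nested):
    """Flatten a list of lists, skipping None sublists and falsy items."""
    return [x for sublist in nested for x in sublist or [] if x]


# Shortened stand-in for master_list (hypothetical sample data)
sample = [["first sentence.", "second sentence."], None, ["third sentence.", None]]

cleaned = flatten_clean(sample)
print(cleaned)  # ['first sentence.', 'second sentence.', 'third sentence.']

# Join into one string separated by commas
joined = ", ".join(cleaned)
print(joined)  # first sentence., second sentence., third sentence.
```

Whether to keep the result as a list or join it into one string depends on what the next processing step expects.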

Source: a similar question on Stack Overflow about cleaning text data in Python: https://stackoverflow.com/questions/57810946/
