Python script to join values by row and remove duplicate values

I'm using Python 2.7, and I have a text file that looks like this:

id     value
---    ----
1      x
2      a
1      z
1      y
2      b

I'm trying to get output that looks like this:

id     value
---    ----
1      x,z,y
2      a,b

Thanks a lot!

Best answer

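The simplest solution is to use collections.defaultdict together with collections.OrderedDict. The OrderedDict acts as an ordered set here: its keys are unique and keep their order of first appearance. If you don't care about order, you can use a set instead of the OrderedDict.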

from collections import defaultdict, OrderedDict

# Keeps all unique values for each id
dd = defaultdict(OrderedDict)
# Keeps the unique ids in order of appearance
ids = OrderedDict()

# yourfilename is a placeholder for the path to your input file
with open(yourfilename) as f:
    # skip the two header lines
    next(f)
    next(f)
    for line in f:
        fields = line.split()  # split() without arguments splits on whitespace and drops empty strings
        if not fields:
            continue  # ignore blank lines
        id_, value = fields
        dd[id_][value] = None  # only the keys matter; the stored value is irrelevant
        ids[id_] = None

print('id     value')
print('---    ----')
for id_ in ids:
    print('{}      {}'.format(id_, ','.join(dd[id_])))

Result:

id     value
---    ----
1      x,z,y
2      a,b
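
For the case where order doesn't matter, here is a minimal sketch of the set-based variant. The helper name group_unique and the inline sample list are my own illustration, not part of the original answer. Because sets are unordered, the sketch sorts both the ids and the values before printing so the output is reproducible. Note that ids are sorted as strings here.

```python
from collections import defaultdict


def group_unique(lines):
    """Map each id to the set of distinct values seen for it."""
    groups = defaultdict(set)
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        id_, value = fields
        groups[id_].add(value)  # adding an existing value to a set is a no-op
    return groups


# Data rows from the question (header lines omitted), plus one repeated row
sample = ["1      x", "2      a", "1      z", "1      y", "2      b", "1      x"]
groups = group_unique(sample)

print('id     value')
print('---    ----')
for id_ in sorted(groups):
    # sets have no defined order, so sort the values before joining
    print('{}      {}'.format(id_, ','.join(sorted(groups[id_]))))
```

With this variant the values for id 1 come out as x,y,z (sorted) rather than x,z,y (order of appearance).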

If you want to write this to another file instead, just join the lines that I printed with \n and write the resulting string to the file.
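
A rough sketch of that, assuming the data has already been parsed into (id, value) pairs. The function name format_report and the output file name are hypothetical, chosen only for illustration:

```python
import os
import tempfile
from collections import OrderedDict


def format_report(pairs):
    """Build the output lines from (id, value) pairs, dropping duplicate values."""
    grouped = OrderedDict()
    for id_, value in pairs:
        # the inner OrderedDict acts as an ordered set of values
        grouped.setdefault(id_, OrderedDict())[value] = None
    lines = ['id     value', '---    ----']
    for id_, values in grouped.items():
        lines.append('{}      {}'.format(id_, ','.join(values)))
    return lines


pairs = [('1', 'x'), ('2', 'a'), ('1', 'z'), ('1', 'y'), ('2', 'b')]
report = format_report(pairs)

# Hypothetical output location; replace with your own path
out_path = os.path.join(tempfile.gettempdir(), 'output.txt')
with open(out_path, 'w') as out:
    # join the lines with newlines and end the file with a trailing newline
    out.write('\n'.join(report) + '\n')
```

Collecting the lines in a list first keeps the formatting logic in one place, so the same list can be printed or written to a file.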

This question and answer come from a similar question on Stack Overflow: https://stackoverflow.com/questions/45822922/
