I want to merge about 8 *.csv files into one.
Example file:
ID, Average
34, 4.5
35, 5.6
36, 3.4
Another file might be:
ID, Max
34, 6
35, 7
36, 4
The output I need is:
ID, Average, Max
34, 4.5, 6
35, 5.6, 7
36, 3.4, 4
This only half works: it appends all of the data into the same two columns.
    import glob, string
    outfile = open('<directory>/<fileName>.csv','a')
    files = glob.glob(r"<directory>/*.csv")
    for y in files:
        newfile = open(y,'r+')
        data = newfile.read()
        newfile.close()
        outfile.writerow(y)
How can I append the data as new columns instead of repeating the "ID" field?
Best answer
There are three problems to solve here:
- Read in each csv file
- Merge on the common field
- Write the merged data to a new csv file
Code:
    #!/usr/bin/env python3
    import argparse, csv

    if __name__ == '__main__':
        parser = argparse.ArgumentParser(description='merge csv files on field')
        parser.add_argument('--version', action='version', version='%(prog)s 1.0')
        parser.add_argument('infile', nargs='+', type=str, help='list of input files')
        parser.add_argument('--out', type=str, default='temp.csv', help='name of output file')
        args = parser.parse_args()
        data = {}    # merged rows, keyed by ID
        fields = []  # column names in the order they were first seen

        for fname in args.infile:
            with open(fname, newline='') as df:
                # skipinitialspace drops the blank after each comma in the sample files
                reader = csv.DictReader(df, skipinitialspace=True)
                for line in reader:
                    # assuming the field is called ID
                    if line['ID'] not in data:
                        data[line['ID']] = line
                    else:
                        for k, v in line.items():
                            if k not in data[line['ID']]:
                                data[line['ID']][k] = v
                    for k in line.keys():
                        if k not in fields:
                            fields.append(k)

        with open(args.out, 'w', newline='') as outfile:
            writer = csv.DictWriter(outfile, fields, dialect='excel')
            # write the header at the top of the file
            writer.writeheader()
            # write the merged rows (the dict values, not the ID keys)
            writer.writerows(data.values())
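Note that when two files share a field name other than ID, only the first value read for a given ID is kept; later values for that field are ignored.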
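To make that note concrete, here is a minimal sketch of the same first-value-wins rule applied to two in-memory rows. The helper name `merge_row` and the column name `Score` are made up for illustration and are not part of the script above.

```python
def merge_row(data, row, key='ID'):
    """Merge row into data[row[key]], keeping any values already stored."""
    merged = data.setdefault(row[key], {})
    for k, v in row.items():
        # setdefault stores v only when k is missing, so the first file wins
        merged.setdefault(k, v)
    return data

data = {}
merge_row(data, {'ID': '34', 'Score': '4.5'})  # row from a first file
merge_row(data, {'ID': '34', 'Score': '6'})    # same field name in a second file
print(data['34'])  # {'ID': '34', 'Score': '4.5'}: the second value was dropped
```

If both values matter, one option is to rename the clashing columns before merging, for example by prefixing each with its file name.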
An alternative to the argument-parsing part, which picks up every csv file in the current directory instead:
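    #!/usr/bin/env python3
    import glob, csv, os

    if __name__ == '__main__':
        out = 'temp.csv'
        # skip an output file left over from an earlier run
        infiles = [f for f in glob.glob('./*.csv') if os.path.basename(f) != out]
        data = {}    # merged rows, keyed by ID
        fields = []  # column names in the order they were first seen

        for fname in infiles:
            with open(fname, newline='') as df:
                # skipinitialspace drops the blank after each comma in the sample files
                reader = csv.DictReader(df, skipinitialspace=True)
                for line in reader:
                    # assuming the field is called ID
                    if line['ID'] not in data:
                        data[line['ID']] = line
                    else:
                        for k, v in line.items():
                            if k not in data[line['ID']]:
                                data[line['ID']][k] = v
                    for k in line.keys():
                        if k not in fields:
                            fields.append(k)

        with open(out, 'w', newline='') as outfile:
            writer = csv.DictWriter(outfile, fields, dialect='excel')
            # write the header at the top of the file
            writer.writeheader()
            # write the merged rows (the dict values, not the ID keys)
            writer.writerows(data.values())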
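As a quick sanity check, the merge logic can be run against the two sample files from the question without touching the disk. This is only a sketch under a few assumptions: the function name `merge_csv` is hypothetical, `io.StringIO` stands in for real files, and `skipinitialspace=True` is used because the samples have a space after each comma.

```python
import csv
import io

def merge_csv(sources, key='ID'):
    """Merge CSV text streams on `key`; returns (fields, rows keyed by ID)."""
    data, fields = {}, []
    for src in sources:
        for row in csv.DictReader(src, skipinitialspace=True):
            merged = data.setdefault(row[key], {})
            for k, v in row.items():
                merged.setdefault(k, v)  # keep the first value seen
                if k not in fields:
                    fields.append(k)
    return fields, data

# The two sample files from the question, held in memory
averages = io.StringIO("ID, Average\n34, 4.5\n35, 5.6\n36, 3.4\n")
maxima = io.StringIO("ID, Max\n34, 6\n35, 7\n36, 4\n")
fields, data = merge_csv([averages, maxima])

out = io.StringIO()
writer = csv.DictWriter(out, fields, lineterminator='\n')
writer.writeheader()
writer.writerows(data.values())
print(out.getvalue())
```

The printed result has one row per ID with the columns ID, Average and Max, which is the layout asked for in the question, only without the space after each comma.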
Source: "python - Merging multiple *.csv, *.txt or *.ascii files on a common field with Python", a question on Stack Overflow: https://stackoverflow.com/questions/7519412/