Python not writing the header

Tags: python python-2.7 csv python-3.x

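I am using this Python program to split a very large CSV file based on its second column, "ParentID". Because of the number of files and the per-process limits, I recently switched the open mode to 'a' instead of 'w'. Since that change, the header is written on every write, not just the first time for each file.

I then added write_header = True and write_header = False, but now the header is only written to the first file... and I have over 29,000 files.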

#!/usr/bin/env python3
import binascii
import csv
import os.path
import sys
from tkinter.filedialog import askopenfilename, askdirectory
from tkinter.simpledialog import askinteger

def split_csv_file(f, dst_dir, keyfunc):
    csv_reader = csv.reader(f)
    header = next(csv_reader)
    write_header = True
    csv_writers = {}
    for row in csv_reader:
        k = keyfunc(row)
        with open(os.path.join(dst_dir, k), mode='a', newline='') as output:
            writer = csv.writer(output)
            while write_header:
                writer.writerow(header)
                write_header = False
            csv_writers[k] = writer
            csv_writers[k].writerow(row[0:1])

def get_args_from_cli():
    input_filename = sys.argv[1]
    column = int(sys.argv[2])
    dst_dir = sys.argv[3]
    return (input_filename, column, dst_dir)

def get_args_from_gui():
    input_filename = askopenfilename(
        filetypes=(('TXT','.txt'),('CSV', '.csv')),
        title='Select CSV Input File')
    column = askinteger('Choose Table Column', 'Table column')
    dst_dir = askdirectory(title='Select Destination Directory')
    return (input_filename, column, dst_dir)

if __name__ == '__main__':
    if len(sys.argv) == 1:
        input_filename, column, dst_dir = get_args_from_gui()
    elif len(sys.argv) == 4:
        input_filename, column, dst_dir = get_args_from_cli()
    else:
        raise Exception("Invalid number of arguments")
    with open(input_filename, mode='r', newline='') as f:
        split_csv_file(f, dst_dir, lambda r: r[column-1]+'.txt')
        # if the column has funky values resulting in invalid filenames
        # replace the line from above with:
        # split_csv_file(f, dst_dir, lambda r: binascii.b2a_hex(r[column-1].encode('utf-8')).decode('utf-8')+'.csv')

Here is a sample of the file being split:

<option value=''>Choose SubGroup</option>, ParentID
<option value='/1990-Accord-DX-Glass-s/37918.htm'>Glass</option>,Accord1990DX422F22A1BodyHardwareBackGlass
<option value='/1990-Accord-DX-Glass-s/37919.htm'>Glass</option>,Accord1990DX422F22A1BodyHardwareBackGlass
<option value='/1990-Accord-DX-Reveal-Moldings-s/69090.htm'>Reveal Moldings</option>,Accord1990DX422F22A1BodyHardwareBackGlass
<option value='/1990-Accord-DX-Reveal-Moldings-s/69091.htm'>Reveal Moldings</option>,Accord1990DX422F22A1BodyHardwareBackGlass
<option value='/1990-Accord-DX-Center-s/10331.htm'>Center</option>,Accord1990DX422F22A1BodyHardwareConsole
<option value='/1990-Accord-DX-Cowl-s/16006.htm'>Cowl</option>,Accord1990DX422F22A1BodyHardwareCowl
<option value='/1990-Accord-DX-Exterior-Trim-s/26889.htm'>Exterior Trim</option>,Accord1990DX422F22A1BodyHardwareFender
<option value='/1990-Accord-DX-Exterior-Trim-s/26890.htm'>Exterior Trim</option>,Accord1990DX422F22A1BodyHardwareFender

How can I get the header written only once in each output file?

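Best answer

You set write_header to False the first time you write the header, and it is never set back to True. As a result, only the first file you open gets the header.

Instead, keep a set that tracks which files have already received the header: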

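def split_csv_file(f, dst_dir, keyfunc):
    csv_reader = csv.reader(f)
    header = next(csv_reader)
    header_written = set()  # keys of output files that already got the header in this run
    for row in csv_reader:
        k = keyfunc(row)
        with open(os.path.join(dst_dir, k), mode='a', newline='') as output:
            writer = csv.writer(output)
            if k not in header_written:
                # first row seen for this key: write the header exactly once
                writer.writerow(header)
                header_written.add(k)
            # the row must be written inside the with block, while the file is open
            writer.writerow(row[0:1])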

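You may also want to look into keeping files open for longer, by tracking when you last wrote to each file and closing the ones that have gone unwritten the longest. That requires a custom class that tracks the files transparently as you request a file by key, which is more work than fits in an answer.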
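The idea in the previous paragraph can be sketched roughly as follows. This is only one possible approach, not the answer author's code: the class name WriterCache, its writer_for method, and the max_open parameter (with its default of 100) are all hypothetical names and values chosen for illustration. It keeps a bounded number of output files open in an OrderedDict and closes the least recently used one when the limit is reached.

```python
import csv
import os
from collections import OrderedDict


class WriterCache:
    """Hypothetical helper: keeps at most max_open output files open,
    closing the least recently used one when a new file is needed."""

    def __init__(self, dst_dir, header, max_open=100):
        self.dst_dir = dst_dir
        self.header = header
        self.max_open = max_open      # assumed limit; tune to your OS settings
        self._open = OrderedDict()    # key -> (file, writer), oldest entry first
        self._seen = set()            # keys that already got the header this run

    def writer_for(self, key):
        if key in self._open:
            self._open.move_to_end(key)   # mark as most recently used
            return self._open[key][1]
        if len(self._open) >= self.max_open:
            # evict and close the file that has gone unused the longest
            _, (old_file, _) = self._open.popitem(last=False)
            old_file.close()
        f = open(os.path.join(self.dst_dir, key), mode='a', newline='')
        writer = csv.writer(f)
        if key not in self._seen:
            writer.writerow(self.header)  # header only on first open in this run
            self._seen.add(key)
        self._open[key] = (f, writer)
        return writer

    def close(self):
        for f, _ in self._open.values():
            f.close()
        self._open.clear()


def split_csv_file(f, dst_dir, keyfunc, max_open=100):
    csv_reader = csv.reader(f)
    header = next(csv_reader)
    cache = WriterCache(dst_dir, header, max_open)
    try:
        for row in csv_reader:
            cache.writer_for(keyfunc(row)).writerow(row[0:1])
    finally:
        cache.close()  # make sure every remaining file is flushed and closed
```

Because the input in the question is grouped by ParentID, most rows would hit a file that is already open, so far fewer open/close calls should be needed than when reopening the file for every row. Files evicted from the cache are reopened in append mode, and the set of seen keys prevents a second header in that case.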

This question about Python not writing the header comes from a similar question found on Stack Overflow: https://stackoverflow.com/questions/35639181/
