python - Split a large csv file with python

Tags: python csv

I have a csv file with 5 million rows. I want to split it into several files, each containing a user-specified number of rows.

I have written the following code, but it takes far too long to run. Can anyone help me optimize it?

import csv
print "Please delete the previous created files. If any."

filepath = raw_input("Enter the File path: ")

line_count = 0
filenum = 1
try:
    in_file = raw_input("Enter Input File name: ")
    if in_file[-4:] == ".csv":
        split_size = int(raw_input("Enter size: "))
        print "Split Size ---", split_size
        print in_file, " will split into", split_size, "rows per file named as OutPut-file_*.csv (* = 1,2,3 and so on)"
        with open (in_file,'r') as file1:
            row_count = 0
            reader = csv.reader(file1)
            for line in file1:
                #print line
                with open(filepath + "\\OutPut-file_" +str(filenum) + ".csv", "a") as out_file:
                    if row_count < split_size:
                        out_file.write(line)
                        row_count = row_count +1
                    else:
                        filenum = filenum + 1
                        row_count = 0
                line_count = line_count+1
        print "Total Files Written --", filenum
    else:
        print "Please enter the Name of the file correctly."
except IOError as e:
    print "Oops..! Please Enter correct file path values", e
except  ValueError:
    print "Oops..! Please Enter correct values"

I have also tried it without using "with open".

Best answer

Ouch! You keep reopening the output file for every single line, which is an expensive operation... Your code could become:

    ...
    with open (in_file,'r') as file1:
        row_count = 0
        #reader = csv.reader(file1)   # unused here
        out_file = open(filepath + "\\OutPut-file_" +str(filenum) + ".csv", "a")
        for line in file1:
            #print line
            if row_count >= split_size:
                out_file.close()
                filenum = filenum + 1
                out_file = open(filepath + "\\OutPut-file_" +str(filenum) + ".csv", "a")
                row_count = 0
            out_file.write(line)
            row_count = row_count +1
            line_count = line_count+1
        out_file.close()   # close the last (possibly partial) output file
        ...

Ideally, you should even initialize out_file = None before the try block, and make sure the file is cleanly closed in the except blocks with: if out_file is not None: out_file.close()
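A minimal Python 3 sketch of the answer's approach combined with that cleanup advice: each output file is opened once and closed when it is full, and a finally clause (used here in place of the except blocks) closes the last file whether or not an error occurs. The function name split_by_lines, its parameters, and the use of os.path.join instead of the hard-coded "\\" separator are illustrative assumptions. It opens files with "w" rather than "a", so files from a previous run are overwritten instead of appended to. Like the code above, it splits on physical lines, not CSV records.

```python
import os

def split_by_lines(in_path, out_dir, split_size):
    """Hypothetical helper: copy every split_size lines of in_path into
    OutPut-file_1.csv, OutPut-file_2.csv, ... inside out_dir.
    Returns the number of output files written."""
    if split_size < 1:
        raise ValueError("split_size must be at least 1")
    filenum = 0
    row_count = 0
    out_file = None  # initialised before try so the finally clause can test it
    try:
        # newline="" keeps the original line endings untouched.
        with open(in_path, "r", newline="") as in_file:
            for line in in_file:
                # Open a new output file for the first line and whenever the current one is full.
                if out_file is None or row_count >= split_size:
                    if out_file is not None:
                        out_file.close()
                    filenum += 1
                    out_name = os.path.join(out_dir, "OutPut-file_%d.csv" % filenum)
                    out_file = open(out_name, "w", newline="")  # "w": overwrite old runs
                    row_count = 0
                out_file.write(line)
                row_count += 1
    finally:
        # Runs on success and on error: never leave the last file open.
        if out_file is not None:
            out_file.close()
    return filenum
```

For example, split_by_lines("big.csv", "out", 100000) (hypothetical paths) would turn a 5,000,000-line file into 50 output files.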

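Remark: this code only splits on line count (just as your code does). This means it will produce wrong output if the csv file can contain newlines inside quoted fields...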
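If quoted fields may contain newlines, one option (a sketch, not part of the original answer) is to count CSV records instead of physical lines, reading with csv.reader and writing with csv.writer. The function name split_by_records and its parameters are hypothetical. Because csv.writer re-serializes every record, the quoting and line terminators in the output may differ from the input, although the field values are preserved.

```python
import csv
import os

def split_by_records(in_path, out_dir, split_size):
    """Hypothetical helper: like a line-based split, but counts CSV records,
    so a quoted field containing a newline stays in one piece.
    Returns the number of output files written."""
    if split_size < 1:
        raise ValueError("split_size must be at least 1")
    filenum = 0
    row_count = 0
    out_file = None
    writer = None
    try:
        # newline="" is what the csv module expects for both reading and writing.
        with open(in_path, "r", newline="") as in_file:
            for record in csv.reader(in_file):
                # Start a new output file for the first record and whenever the current one is full.
                if out_file is None or row_count >= split_size:
                    if out_file is not None:
                        out_file.close()
                    filenum += 1
                    out_name = os.path.join(out_dir, "OutPut-file_%d.csv" % filenum)
                    out_file = open(out_name, "w", newline="")
                    writer = csv.writer(out_file)
                    row_count = 0
                writer.writerow(record)
                row_count += 1
    finally:
        # Close the last output file even if parsing fails part-way.
        if out_file is not None:
            out_file.close()
    return filenum
```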

Regarding python - Split a large csv file with python, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/46662134/
