I am writing a script that writes a large amount of data to .csv
files. To make the data easier to pass around between interested users, I want to put a limit on the number of rows per file. For example, the first million records would be written to some_csv_file_1.csv,
the second million records to some_csv_file_2.csv,
and so on, until all records have been written.
I have been trying to get the following to work:
import csv

csv_record_counter = 1
csv_file_counter = 1

while csv_record_counter <= 1000000:
    with open('some_csv_file_' + str(csv_file_counter) + '.csv', 'w') as csvfile:
        output_writer = csv.writer(csvfile, lineterminator="\n")
        output_writer.writerow(['record'])
        csv_record_counter += 1

while not csv_record_counter <= 1000000:
    csv_record_counter = 1
    csv_file_counter += 1
The problem: once the record count climbs past 1,000,000, no subsequent files are created; the script keeps adding records to the original file. (As written, the reset loop only runs after the writing loop has already finished, so csv_file_counter never advances while records are still being written.)
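For what it's worth, the rollover check has to happen inside the loop that writes the records. A minimal sketch of that idea (not part of the original question; records is a placeholder for the script's actual data source):

import csv

records = []  # stand-in for the script's real data source
csv_record_counter = 0
csv_file_counter = 1

csvfile = open('some_csv_file_1.csv', 'w', newline='')
output_writer = csv.writer(csvfile, lineterminator="\n")
for record in records:
    if csv_record_counter == 1000000:
        # Close the full file and roll the counters over
        # *inside* the writing loop, not after it.
        csvfile.close()
        csv_file_counter += 1
        csv_record_counter = 0
        csvfile = open('some_csv_file_{}.csv'.format(csv_file_counter), 'w', newline='')
        output_writer = csv.writer(csvfile, lineterminator="\n")
    output_writer.writerow([record])
    csv_record_counter += 1
csvfile.close()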
Best answer
I like to batch the data before exporting it: each batch then maps onto exactly one output file, so there are no counters to reset.
import csv

def batch(iterable, n=1):
    # Yield successive slices of at most n items from a sequence.
    length = len(iterable)
    for ndx in range(0, length, n):
        yield iterable[ndx:min(ndx + n, length)]

headers = []   # Your headers
products = []  # Millions of products go here
batch_size = len(products) // 4  # Example
# OR, in your case: batch_size = 1000000

for idx, product_batch in enumerate(batch(products, batch_size)):
    # One numbered file per batch: products_1.csv, products_2.csv, ...
    with open('products_{}.csv'.format(idx + 1), 'w', newline='') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=headers)
        writer.writeheader()
        for product in product_batch:
            writer.writerow(product)
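One caveat worth noting: batch() relies on len() and slicing, so it only works on in-memory sequences such as lists. For records that arrive from a streaming source, such as a database cursor or a generator, a sketch along these lines using itertools.islice splits the stream into fixed-size files without loading everything at once. The names write_in_chunks, records, and headers are illustrative, not from the answer above, and the records are assumed to be dicts keyed by the header names:

import csv
from itertools import islice

def write_in_chunks(records, headers, rows_per_file=1000000):
    # Consume any iterable of dicts, writing at most rows_per_file
    # data rows to each numbered output file.
    records = iter(records)
    file_counter = 1
    while True:
        chunk = list(islice(records, rows_per_file))
        if not chunk:
            break  # source exhausted; stop creating files
        with open('some_csv_file_{}.csv'.format(file_counter), 'w', newline='') as csvfile:
            writer = csv.DictWriter(csvfile, fieldnames=headers)
            writer.writeheader()
            writer.writerows(chunk)
        file_counter += 1

Only one chunk is materialized in memory at a time, so the memory footprint stays bounded by rows_per_file.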
Reference: on making Python's CSV writer automatically limit the number of rows per file and create new files, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/47537014/