Python, memory error, csv file too large

Tags: python csv memory

I have a problem with my Python module: it cannot handle importing a large data file (the file Targets.csv is nearly 1 GB).

The error is raised when this line is loaded:

targets = [(name, float(X), float(Y), float(Z), float(BG))
           for name, X, Y, Z, BG in csv.reader(open('targets.csv'))]

Traceback:

Traceback (most recent call last):
  File "C:\Users\gary\Documents\EPSON STUDIES\colors_text_D65.py", line 41, in <module>
    for name, X, Y, Z, BG in csv.reader(open('targets.csv'))]
MemoryError

I would like to know whether there is a way to read targets.csv line by line, and whether that would slow the process down.

The module is already quite slow...

Thanks!

import geometry
import csv
import numpy as np
import random
import cv2

S = 0


img = cv2.imread("MAP.tif", -1)
height, width = img.shape

pixx = height * width
iterr = float(pixx / 1000)
accomplished = 0
temp = 0

ppm = file("epson gamut.ppm", 'w')

ppm.write("P3" + "\n" + str(width) + " " + str(height) + "\n" + "255" + "\n")
# PPM file header

all_colors = [(name, float(X), float(Y), float(Z))
              for name, X, Y, Z in csv.reader(open('XYZcolorlist_D65.csv'))]

# background is marked SUPPORT
support_i = [i for i, color in enumerate(all_colors) if color[0] == '255 255 255']
if len(support_i)>0:
    support = np.array(all_colors[support_i[0]][1:])
    del all_colors[support_i[0]]
else:
    support = None

tg, hull_i = geometry.tetgen_of_hull([(X,Y,Z) for name, X, Y, Z in all_colors])
colors = [all_colors[i] for i in hull_i]

print ("thrown out: "
       + ", ".join(set(zip(*all_colors)[0]).difference(zip(*colors)[0])))

targets = [(name, float(X), float(Y), float(Z), float(BG))
           for name, X, Y, Z, BG in csv.reader(open('targets.csv'))]

for target in targets:

    name, X, Y, Z, BG = target

    target_point = support + (np.array([X,Y,Z]) - support)/(1-BG)

    tet_i, bcoords = geometry.containing_tet(tg, target_point)

    if tet_i == None:
        #print str("out")    
        ppm.write(str("255 255 255") + "\n")
        print "out"

        temp += 1

        if temp >= iterr:

            accomplished += temp 
            print str(100 * accomplished / (float(pixx))) + str(" %")
            temp = 0

        continue 
        # not in gamut

    else:

        A = bcoords[0]
        B = bcoords[1]
        C = bcoords[2]
        D = bcoords[3]

        R = random.uniform(0,1)

        names = [colors[i][0] for i in tg.tets[tet_i]]

        if R <= A:
            S = names[0] 

        elif R <= A+B:
            S = names[1]

        elif R <= A+B+C:
            S = names[2]

        else:
            S = names[3]

        ppm.write(str(S) + "\n")

        temp += 1

        if temp >= iterr:

            accomplished += temp 
            print str(100 * accomplished / (float(pixx))) + str(" %")
            temp = 0


print "done"
ppm.close()

Best Answer

csv.reader() already reads one row at a time. However, you first collect all of the rows into a list. You should process one row at a time instead. One way to do that is to switch to a generator, for example:

targets = ((name, float(X), float(Y), float(Z), float(BG))
           for name, X, Y, Z, BG in csv.reader(open('targets.csv')))

(Switching from square brackets to parentheses changes targets from a list comprehension into a generator expression.)
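Because the main loop only iterates over targets once, this generator can be dropped in without touching the rest of the script. Equivalently, you can skip the intermediate name and loop over the reader directly; here is a minimal sketch (assuming the same five-column layout, with the per-target work moved into the loop body):

import csv

# Stream the rows so that only one is held in memory at a time;
# the "with" block also makes sure the file is closed afterwards.
with open('targets.csv') as f:
    for name, X, Y, Z, BG in csv.reader(f):
        X, Y, Z, BG = float(X), float(Y), float(Z), float(BG)
        # ... per-target work here (the body of "for target in targets:") ...

Either way the file is still parsed row by row, so reading lazily should not make the module slower; the only difference is that the parsed rows are no longer accumulated in one big list.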

Regarding Python, memory error, csv file too large, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/21591353/
