1 2 3 4 5
1 0.000 0.733 0.762 0.745 0.692
2 0.733 0.000 0.842 0.766 0.701
3 0.762 0.842 0.000 0.851 0.803
4 0.745 0.766 0.851 0.000 0.402
5 0.692 0.701 0.803 0.402 0.000
I am writing the following Python code:
    import csv
    import time
    import numpy as np
    import matplotlib.pyplot as plt

    t0 = time.time()
    count = 0
    with open('test.csv', 'r') as infile:
        reader = csv.reader(infile, delimiter='\t', lineterminator='\n')
        next(reader)  # skip the header row
        for rows in reader:
            numbers = np.array([float(col) for col in rows])
            numbersnz = numbers[numbers != 0.0]
            if (numbersnz[1:] >= 0.5):
                # HERE I want to calculate how many rows (in the CSV data above)
                # have 50% or more of their data points greater than 0.5,
                # but I don't understand how to do it. Please help!
    print(time.time() - t0, "seconds")
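This code is a bit slow for 50000 * 50000 data, so any improvement on that front would be welcome. I am new to Python and cannot work out how to write faster code myself.
Thanks in advance!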
Best answer
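As @DavidAlber said, 50000 * 50000 numbers (2.5 billion values, roughly 10 GB even as 32-bit floats) may not fit in your RAM.
However, the code below should be fast enough, and it keeps only the current row in memory.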
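    import csv
    import time
    import numpy as np

    count = 0
    with open('test.csv', 'r') as infile:
        reader = csv.reader(infile, delimiter='\t', lineterminator='\n')
        next(reader)  # skip the header row
        for row in reader:
            # row[0] is the row label; convert the remaining fields to floats
            rec = np.fromiter((float(col) for col in row[1:]), dtype=np.float32)
            # len(rec) - 1 assumes exactly one zero per row (the diagonal entry)
            if (rec > 0.5).sum() >= (len(rec) - 1) * 0.5:
                count += 1
    print(count)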
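To check the counting rule without a 50000 * 50000 file, here is a minimal sketch that applies the same idea to the five-row sample from the question. The function name `count_rows_above` and its `threshold` and `fraction` parameters are hypothetical, introduced only for this illustration. Unlike the answer's `len(rec) - 1`, this version filters out zeros explicitly, as the question's code does, so it does not assume exactly one zero per row.

```python
import csv
import io

import numpy as np


def count_rows_above(lines, threshold=0.5, fraction=0.5):
    """Count rows in which at least `fraction` of the non-zero values exceed `threshold`.

    `lines` is any iterable of tab-separated text lines (for example an open file).
    The first line is treated as a header, and the first field of each row as a label.
    """
    reader = csv.reader(lines, delimiter="\t")
    next(reader, None)  # skip the header row
    count = 0
    for row in reader:
        if not row:
            continue  # ignore blank lines
        values = np.asarray(row[1:], dtype=np.float64)  # drop the row label
        nonzero = values[values != 0.0]  # ignore zeros such as the diagonal
        if nonzero.size and (nonzero > threshold).sum() >= fraction * nonzero.size:
            count += 1
    return count


# Sample data from the question, written with spaces and converted to tabs here.
SAMPLE = """\
1 2 3 4 5
1 0.000 0.733 0.762 0.745 0.692
2 0.733 0.000 0.842 0.766 0.701
3 0.762 0.842 0.000 0.851 0.803
4 0.745 0.766 0.851 0.000 0.402
5 0.692 0.701 0.803 0.402 0.000
"""
sample_tsv = "\n".join("\t".join(line.split()) for line in SAMPLE.splitlines())

print(count_rows_above(io.StringIO(sample_tsv)))
```

On this sample every row has at least three of its four non-zero values above 0.5, so the script prints 5. For the real file, passing the object returned by `open('test.csv', newline='')` in place of the `io.StringIO` wrapper should still read one row at a time.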
Regarding "python - counting the number of rows that satisfy a condition (data from CSV) and slow code", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/8265955/