python - File too large to read

Tags: python file

I have a 3.8 GB file, "uniprot.tab".

I am trying to plot a histogram from this file, but the computation never finishes because the file is so large.

I tested my code beforehand on a small file, "mock.tab", and it worked fine.

Edit: for example, here are a few lines of "mock.tab":

Entry   Status  Cross-reference (PDB)
A1WYA9  reviewed    
Q6LLK1  reviewed    
Q1ACM9  reviewed    
P10994  reviewed    1OY8;1OY9;1OY9;1OY9;
Q0HV56  reviewed    
Q2NQJ2  reviewed    
B7HCE7  reviewed    
P0A959  reviewed    4CVQ;
B7HLI3  reviewed    
P31224  reviewed    1IWG;1OY6;1OY8;1OY9;4CVQ;

Here is the code I used on the small file:

import matplotlib.pyplot as plt

occurrences = []
with open('/home/martina/Documents/webstormProj/unpAnalysis/mock.tab', 'r') as f:
    next(f) #do not read the heading
    for line in f:
        col_third = line.split('\t')[2] #take third column
        occ = col_third.count(';') # count how many times it finds ; in each line
        occurrences.append(occ)

x_min = min(occurrences)
x_max = max(occurrences)


x = [] # x-axis
x = list(range(x_min, x_max + 1))

y = [] # y-axis
for i in x:
    y.append(occurrences.count(i))

plt.bar(x,y,align='center') # draw the plot
plt.xlabel('Bins')
plt.ylabel('Frequency')
plt.show()

How can I draw this plot from my large file?

Best answer

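Instead of building a list of all the values and then counting the occurrences of each one (every call to occurrences.count(i) rescans the whole list), it is faster to build the histogram directly while iterating. You can use collections.Counter for this: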

from collections import Counter

my_file = 'uniprot.tab'  # path to the large tab-separated file

histogram = Counter()
with open(my_file, 'r') as f:
    next(f)  # skip the header line
    for line in f:
        # take the third column and count the ';' characters in it
        occ = line.split('\t')[2].count(';')
        histogram[occ] += 1

# histogram now maps each "occurrence" value to the number of lines where it was seen.

x_axis = list(range(min(histogram), max(histogram) + 1))
y_axis = [histogram[x] for x in x_axis]
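
To check the counting logic without touching the 3.8 GB file, the same idea can be tried on a few rows from the question's sample. The following is only a minimal sketch: the helper names count_semicolons and to_axes are hypothetical, and treating a row with a missing third column as zero occurrences is an assumption, not something the original answer specifies.

```python
from collections import Counter


def count_semicolons(lines, column=2):
    """Stream over tab-separated lines and tally ';' counts in one column."""
    histogram = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Assumption: a row that lacks the column counts as zero occurrences.
        value = fields[column] if len(fields) > column else ""
        histogram[value.count(";")] += 1
    return histogram


def to_axes(histogram):
    """Expand the tally into dense x/y lists, filling missing bins with zero."""
    x_axis = list(range(min(histogram), max(histogram) + 1))
    y_axis = [histogram[x] for x in x_axis]
    return x_axis, y_axis


# Four rows copied from the mock.tab sample in the question (header excluded).
sample = [
    "A1WYA9\treviewed\t\n",
    "P10994\treviewed\t1OY8;1OY9;1OY9;1OY9;\n",
    "P0A959\treviewed\t4CVQ;\n",
    "P31224\treviewed\t1IWG;1OY6;1OY8;1OY9;4CVQ;\n",
]
x_axis, y_axis = to_axes(count_semicolons(sample))
print(x_axis, y_axis)
```

For the real file, the open file object can be passed in place of the sample list after skipping the header with next(f); only one line and the small Counter are held in memory at a time. The resulting x_axis and y_axis can then go to plt.bar(x_axis, y_axis, align='center') as in the question.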

Regarding "python - File too large to read", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/57711114/
