python - Streamlining the processing of large files in pandas

Is there a way to streamline the processing of large CSV or Excel files in pandas without using a lot of memory?

What I do at the moment is load the whole file like this:

data = pd.read_csv('SUPERLARGEFILE.csv', index_col=0, encoding = "ISO-8859-1", low_memory=False)

Perform some task

data.to_csv('Results.csv', sep=',')

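Suppose I am working on a computer with little memory. Is there a way to stream and process a large data file with an iterating function that does the following: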

   Load first 1000 rows, store this in memory

   Perform some task

   Save data

   Load next 1000 rows, overwriting the previous rows in memory

   Perform some task

   Append to the saved file

Best answer

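Just add the chunksize argument to your code. With chunksize set, read_csv returns an iterator that yields one DataFrame per chunk instead of loading the whole file: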

import pandas as pd

data = pd.read_csv('SUPERLARGEFILE.csv', index_col=0, encoding="ISO-8859-1", low_memory=False, chunksize=10)

result = []
for chunk in data:  # get chunks of 10 rows each
    result.append(chunk.mean())  # column means of this chunk only
# do something with result, e.g. pd.DataFrame(result).to_csv("result.csv")
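
The load, process, append loop described in the question could be sketched as follows. This is only a minimal sketch, not part of the original answer: the helper name process_in_chunks and its task argument are hypothetical, and it assumes the task can be applied to each chunk independently and returns a DataFrame. It applies to CSV input only, since pd.read_excel does not accept a chunksize argument.

```python
import pandas as pd


def process_in_chunks(src, dst, task, chunksize=1000, **read_kwargs):
    """Hypothetical helper: stream src through task chunk by chunk, appending to dst."""
    first = True
    # With chunksize set, read_csv yields one DataFrame per chunk,
    # so only about `chunksize` rows are held in memory at a time.
    for chunk in pd.read_csv(src, chunksize=chunksize, **read_kwargs):
        out = task(chunk)  # assumption: task returns a DataFrame
        # The first chunk creates the file and writes the header;
        # later chunks are appended without repeating the header.
        out.to_csv(dst, mode="w" if first else "a", header=first)
        first = False
```

For the file in the question, the call would look like process_in_chunks('SUPERLARGEFILE.csv', 'Results.csv', task, index_col=0, encoding="ISO-8859-1"), where task is your own function. Opening the output with mode="w" for the first chunk means a rerun replaces the old results instead of appending to them.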

The original question, python - streamlining the processing of large files in pandas, can be found on Stack Overflow: https://stackoverflow.com/questions/23641484/
