python - What is the simplest way to load a filtered .tda file using pandas?

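Pandas has the excellent .read_table() function, but huge files cause a MemoryError.
Since I only need the rows that satisfy a certain condition, I am looking for a way to load only those.

This can be done with a temporary file: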

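import pandas

# hugeTdaFile is the path to the large tab-delimited file;
# SomeCondition is a predicate applied to each raw text line.
with open(hugeTdaFile) as huge:
    with open(hugeTdaFile + ".partial.tmp", "w") as tmp:
        tmp.write(huge.readline())  # the header line
        for line in huge:
            if SomeCondition(line):
                tmp.write(line)

# tmp is closed at this point, but tmp.name still holds its path
t = pandas.read_table(tmp.name)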

Is there a way to avoid using a temporary file like this?

Best answer

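You can use the chunksize parameter to get back an iterator instead of a single DataFrame.

See: http://pandas.pydata.org/pandas-docs/stable/io.html#iterating-through-files-chunk-by-chunk

Filter each chunk (itself a DataFrame) as needed
Append the filtered chunks to a list
Concatenate them at the end

(Alternatively, you could write them out to new CSVs, HDFStores, or whatever suits you.)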
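
The chunked approach described in this answer could look roughly like the sketch below. The helper name load_filtered, its keep_rows argument, and the sample columns id and value are made up for illustration; only read_table's chunksize parameter and pandas.concat come from pandas itself. Note that the filter here works on parsed DataFrame columns, not on raw text lines as SomeCondition does in the question.

```python
import io

import pandas as pd


def load_filtered(source, keep_rows, chunksize=100_000, **read_kwargs):
    """Read a tab-delimited source chunk by chunk, keeping only matching rows.

    source    -- a path or file-like object accepted by pandas.read_table
    keep_rows -- hypothetical callback: takes a DataFrame chunk and returns
                 a boolean mask selecting the rows to keep
    """
    filtered = []
    # With chunksize set, read_table yields DataFrames of at most
    # `chunksize` rows instead of loading the whole file at once.
    for chunk in pd.read_table(source, chunksize=chunksize, **read_kwargs):
        filtered.append(chunk[keep_rows(chunk)])
    if not filtered:
        # Nothing was read at all (for example, no data rows).
        return pd.DataFrame()
    # Only the filtered rows are ever held in memory together.
    return pd.concat(filtered, ignore_index=True)


# Small in-memory stand-in for the huge file (made-up sample data).
sample = io.StringIO("id\tvalue\n1\t10\n2\t55\n3\t70\n4\t5\n")
result = load_filtered(sample, lambda df: df["value"] > 50, chunksize=2)
print(result)
```

Peak memory then depends on the chunk size plus the rows that survive the filter, so a smaller chunksize trades speed for a lower memory ceiling.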

Regarding "python - What is the simplest way to load a filtered .tda file using pandas?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/15088190/
