Hi, I want to randomize the lines of a 153-million-line text file, but the approach I'm currently using runs out of memory:
import random

with open(inputfile, 'r') as source:
    data = [(random.random(), line) for line in source]
data.sort()

with open(outputfile, 'w') as target:
    for _, line in data:
        target.write(line)
Best Answer
Using h5py, you can port your data file to HDF5 format and then shuffle it:
https://stackoverflow.com/a/44866734/3841261
You can use random.shuffle(dataset). This takes a little over 11 minutes for a 30 GB dataset on my laptop (Core i5 processor, 8 GB of RAM, 256 GB SSD).
Regarding "python - Randomize a 153 million line file without running out of memory", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/52010848/