python - 克服 Ashton String 任务中的 MemoryError/Slow Runtime

标签 python string out-of-memory n-gram

Ashton String task ,目标是:

Arrange all the distinct substrings of a given string in lexicographical order and concatenate them. Print the Kth character of the concatenated string. It is assured that given value of K will be valid i.e. there will be a Kth character.

输入格式:

First line will contain a number T i.e. number of test cases. First line of each test case will contain a string containing characters (a−z) and second line will contain a number K.

输出格式:

Print Kth character ( the string is 1 indexed )

约束

1 ≤ T ≤ 5
1 ≤ length ≤ 105
K will be an appropriate integer.

例如,给定输入:

1
dbac
3

输出将是:c

我已经尝试使用这段代码完成任务,它适用于相对较短的字符串:

from itertools import chain

def zipngram(text, n=2):
    words = list(text)
    return zip(*[words[i:] for i in range(n)])

for _ in input():
    t = input()
    position = int(input())-1 # 0th indexing
    chargrams = chain(*[zipngram(t,i) for i in range(1,len(t)+1)])
    concatstr = ''.join(sorted([''.join(s) for s in chargrams]))
    print (concatstr[position])

但是如果输入文件是这样的:http://pastebin.com/raw/WEi2p09H所需的输出是:

l
s
y
h
s

解释器会抛出一个MemoryError:

Traceback (most recent call last):
  File "solution.py", line 11, in <module>
    chargrams = chain(*[zipngram(t,i) for i in range(1,len(t)+1)])
  File "solution.py", line 11, in <listcomp>
    chargrams = chain(*[zipngram(t,i) for i in range(1,len(t)+1)])
  File "solution.py", line 6, in zipngram
    return zip(*[words[i:] for i in range(n)])
  File "solution.py", line 6, in <listcomp>
    return zip(*[words[i:] for i in range(n)])
MemoryError

如何解决 MemoryError?是否可以使用 native python2 或 python3 以另一种方式解决?

我尝试通过使用 heapq 修剪堆栈来解决 MemoryError 但现在它进入了 super 缓慢的运行时推送和弹出堆而不是占用太多内存.

from itertools import chain
import heapq

t = int(input())
s = list(input())
k = int(input())

stack = []
for n in range(1,len(s)+1):
    for j in range(n):
        ngram = (''.join(s[j:]))
        ngram_len = len(ngram)
        # Push the ngram into the heapq.
        heapq.heappush(stack, ngram)
        pruned_stack = []
        temp_k = 0
        # Prune stack.
        while stack != [] and temp_k < k:
            x = heapq.heappop(stack)
            pruned_stack.append(x)
            temp_k+=len(x)
        # Replace stack with pruend_stack.
        stack = pruned_stack

print (''.join(chain(*pruned_stack))[k])

有没有办法在不使用太多内存导致 MemoryErrorheapq 推送和弹出运行时太慢之间取得平衡?

最佳答案

MemoryError 表示程序消耗了所有可用内存并因此崩溃。

一个可能的解决方案是使用惰性的迭代器(它们也可以在 Py2 中工作,但 Py3 对它们有更好的支持)(它们只按需计算值,而不是一次全部计算)。

使你的程序适应生成器应该只需要很小的改变,在不使用列表的情况下索引生成器(这会抵消懒惰的好处)参见:Get the nth item of a generator in Python

关于python - 克服 Ashton String 任务中的 MemoryError/Slow Runtime,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/34531136/

相关文章:

python - 系统/pip readline 版本不匹配

Python boolean 比较

python - 如何从一串数字和逗号中挑选出数百万?

java - 解析数百万个 XML 文件 - Java

android - Android中如何设置imageView的背景避免OutOfMemoryError?

python - 如何绘制正确的分布 TreeMap ?

python - 每次启动应用程序时轮换日志文件 (Python)

java - 如何从 String 数组或 ArrayList 中删除相似元素

C++ 字符串操作/输入

java - Buttons背景引起的OutOfMemory