python heapq排序列表错误?

标签 python sorting

我正在尝试将列表分类为一个列表,其中包含部分、子部分和子部分的编号和名称。该程序如下所示:

import heapq

sections = ['1. Section', '2. Section', '3. Section', '4. Section', '5. Section', '6. Section', '7. Section', '8. Section', '9. Section', '10. Section', '11. Section', '12. Section']
subsections = ['1.1 Subsection', '1.2 Subsection', '1.3 Subsection', '1.4 Subsection', '2.1 Subsection', '4.1 My subsection', '7.1 Subsection', '8.1 Subsection', '12.1 Subsection']
subsubsections = ['1.2.1 Subsubsection', '1.2.2 Subsubsection', '1.4.1 Subsubsection', '2.1.1 Subsubsection', '7.1.1 Subsubsection', '8.1.1 Subsubsection', '12.1.1 Subsubsection']

sorted_list = list(heapq.merge(sections, subsections, subsubsections))

print(sorted_list)

我得到的是这样的:

['1. Section', '1.1 Subsection', '1.2 Subsection', '1.2.1 Subsubsection', '1.2.2 Subsubsection', '1.3 Subsection', '1.4 Subsection', '1.4.1 Subsubsection', '2. Section', '2.1 Subsection', '2.1.1 Subsubsection', '3. Section', '4. Section', '4.1 My subsection', '5. Section', '6. Section', '7. Section', '7.1 Subsection', '7.1.1 Subsubsection', '8. Section', '8.1 Subsection', '12.1 Subsection', '8.1.1 Subsubsection', '12.1.1 Subsubsection', '9. Section', '10. Section', '11. Section', '12. Section']

我的第 12 小节和子小节位于第 8 节内,而不是第 12 节内。

为什么会这样?原始列表已排序,一切顺利,显然达到了第 10 位。

我不确定为什么会这样,有没有办法根据列表中的数字更好地将其分类为“树”?我正在构建各种目录,这将返回(一旦我过滤掉列表)

1. Section
    1.1 Subsection
    1.2 Subsection
        1.2.1 Subsubsection
        1.2.2 Subsubsection
    1.3 Subsection
    1.4 Subsection
        1.4.1 Subsubsection
2. Section
    2.1 Subsection
        2.1.1 Subsubsection
3. Section
4. Section
    4.1 My subsection
5. Section
6. Section
7. Section
    7.1 Subsection
        7.1.1 Subsubsection
8. Section
    8.1 Subsection
    12.1 Subsection
        8.1.1 Subsubsection
        12.1.1 Subsubsection
9. Section
10. Section
11. Section
12. Section

注意 8.1 小节后面的 12.1 小节和 8.1.1 小节后面的 12.1.1 小节。

最佳答案

在人眼看来,您的列表可能显示 已排序。但对于 Python 而言,您的输入并未完全排序,因为它按字典顺序 对字符串进行排序。这意味着 '12' 按排序顺序排在 '8' 之前,因为只比较前字符

因此,合并是完全正确的;在看到'8.1'字符串之后遇到以'12.1'开头的字符串,但是以'8.1.1'开头的字符串被排序之后。

您必须使用键函数从字符串中提取整数元组才能正确排序:

section = lambda s: [int(d) for d in s.partition(' ')[0].split('.') if d]
heapq.merge(sections, subsections, subsubsections, key=section))

请注意,key 参数仅在 Python 3.5 及更高版本中可用;在早期版本中,您必须手动进行装饰-合并-取消装饰。

演示(使用 Python 3.6):

>>> section = lambda s: [int(d) for d in s.partition(' ')[0].split('.') if d]
>>> sorted_list = list(heapq.merge(sections, subsections, subsubsections, key=section))
>>> from pprint import pprint
>>> pprint(sorted_list)
['1. Section',
 '1.1 Subsection',
 '1.2 Subsection',
 '1.2.1 Subsubsection',
 '1.2.2 Subsubsection',
 '1.3 Subsection',
 '1.4 Subsection',
 '1.4.1 Subsubsection',
 '2. Section',
 '2.1 Subsection',
 '2.1.1 Subsubsection',
 '3. Section',
 '4. Section',
 '4.1 My subsection',
 '5. Section',
 '6. Section',
 '7. Section',
 '7.1 Subsection',
 '7.1.1 Subsubsection',
 '8. Section',
 '8.1 Subsection',
 '8.1.1 Subsubsection',
 '9. Section',
 '10. Section',
 '11. Section',
 '12. Section',
 '12.1 Subsection',
 '12.1.1 Subsubsection']

键控合并很容易向后移植到 Python 3.3 和 3.4:

import heapq

def _heappop_max(heap):
    lastelt = heap.pop()
    if heap:
        returnitem = heap[0]
        heap[0] = lastelt
        heapq._siftup_max(heap, 0)
        return returnitem
    return lastelt

def _heapreplace_max(heap, item):
    returnitem = heap[0]
    heap[0] = item
    heapq._siftup_max(heap, 0)
    return returnitem

def merge(*iterables, key=None, reverse=False):    
    h = []
    h_append = h.append

    if reverse:
        _heapify = heapq._heapify_max
        _heappop = _heappop_max
        _heapreplace = _heapreplace_max
        direction = -1
    else:
        _heapify = heapify
        _heappop = heappop
        _heapreplace = heapreplace
        direction = 1

    if key is None:
        for order, it in enumerate(map(iter, iterables)):
            try:
                next = it.__next__
                h_append([next(), order * direction, next])
            except StopIteration:
                pass
        _heapify(h)
        while len(h) > 1:
            try:
                while True:
                    value, order, next = s = h[0]
                    yield value
                    s[0] = next()           # raises StopIteration when exhausted
                    _heapreplace(h, s)      # restore heap condition
            except StopIteration:
                _heappop(h)                 # remove empty iterator
        if h:
            # fast case when only a single iterator remains
            value, order, next = h[0]
            yield value
            yield from next.__self__
        return

    for order, it in enumerate(map(iter, iterables)):
        try:
            next = it.__next__
            value = next()
            h_append([key(value), order * direction, value, next])
        except StopIteration:
            pass
    _heapify(h)
    while len(h) > 1:
        try:
            while True:
                key_value, order, value, next = s = h[0]
                yield value
                value = next()
                s[0] = key(value)
                s[2] = value
                _heapreplace(h, s)
        except StopIteration:
            _heappop(h)
    if h:
        key_value, order, value, next = h[0]
        yield value
        yield from next.__self__

decorate-sort-undecorate 合并非常简单:

def decorate(iterable, key):
    for elem in iterable:
        yield key(elem), elem

sorted = [v for k, v in heapq.merge(
    decorate(sections, section), decorate(subsections, section)
    decorate(subsubsections, section))]

因为您的输入已经排序,所以使用归并排序更有效。作为最后的手段,您可以只使用 sorted() 但是:

from itertools import chain
result = sorted(chain(sections, subsections, subsubsections), key=section)

关于python heapq排序列表错误?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/39675066/

相关文章:

python - 有没有一种在 Django 模型中模拟虚拟继承的明智方法?

python - 将 2 列中的值合并为 pandas 数据框中的单列

python - 在 C 中,如何在没有嵌套函数的情况下为一个函数提供另一个函数的作用域?

python - 如何修复 Visual Studio "AttributeError: ' 引擎'对象没有属性 'getproperty' ”

java - 对具有两个元素的表进行排序

PHP根据值将数组拆分为两个数组

bash - 文件名的数字排序

python - 使用 BeautifulSoup 删除标签但保留其内容

java - 将值存储在列表中然后使用集合对它们进行排序不起作用

c++ - C++中的模板排序