python - How to reduce the runtime of lexicographic sorting code [Python]

Tags: python algorithm sorting runtime

Goal:
Given integers n and k, find the k-th smallest integer in lexicographic order within the range [1, n].

Example:
Input: n=13, k=2
Output: 10

Explanation:
The lexicographic order is [1, 10, 11, 12, 13, 2, 3, 4, 5, 6, 7, 8, 9], so the second smallest number is 10.
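To make the ordering concrete, here is a minimal sketch that reproduces the list above by comparing the numbers as decimal strings. The helper name `lexicographic_order` is made up for illustration and is not part of the question.

```python
def lexicographic_order(n):
    """Return the integers 1..n ordered by their decimal string form."""
    # Strings compare character by character, so "10" sorts before "2".
    return [int(s) for s in sorted(str(i) for i in range(1, n + 1))]


order = lexicographic_order(13)
print(order)         # [1, 10, 11, 12, 13, 2, 3, 4, 5, 6, 7, 8, 9]
print(order[2 - 1])  # 10 (k = 2 is index 1, because k is 1-based)
```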

My code works for numbers up to 10^5 but fails at around 10^6. I am not at all familiar with improving runtime in Python.

Is using Python's sort method not the right approach?

Code:

    class Solution(object):
        def findKthNumber(self, n, k):
            """
            :type n: int
            :type k: int
            :rtype: int
            """
            # Collect 1..n, sort the decimal strings, and pick the k-th one.
            A = []
            for i in range(1, n + 1):
                A.append(i)
            x = sorted(map(str, A))
            return int(x[k - 1])

Best answer

Use a k-th selection algorithm instead of sorting.

def select_median_of_medians_pivot(array, k):
    '''
    Implementation of the Blum, Floyd, Pratt, Rivest, Tarjan SELECT
    algorithm as described by David Eppstein.

    CITATION: http://www.ics.uci.edu/~eppstein/161/960130.html

    This algorithm has worst case run time of O(N) where N is the
    number of entries in the array.

    Although this algorithm has better worst case performance than
    select_random_pivot(), that algorithm is preferred because it
    is much faster in practice.

    Here is how you might use it:

        # Create a list of pseudo random numbers.
        # Duplicates can occur.
        num = 10000
        array = [random.randint(1,1000) for i in range(num)]
        random.shuffle(array)
        random.shuffle(array)

        # Get the value of the kth item.
        k = 7
        kval = select_median_of_medians_pivot(array, k)

        # Test it.
        sorted_array = sorted(array)
        assert sorted_array[k] == kval

    @param array the list of values
    @param k     zero-based index of the item to select in sorted order.
    '''

    # If the array is short, terminate the recursion and return the
    # value without partitioning.
    if len(array) <= 10:
        array.sort()
        return array[k]

    # Partition the array into subsets with a maximum of 5 elements
    # each.
    subset_size = 5  # max items in a subset
    subsets = []  # list of subsets
    num_medians = len(array) // subset_size  # integer division (needed on Python 3)
    if (len(array) % subset_size) > 0:
        num_medians += 1  # not divisible by 5
    for i in range(num_medians):
        beg = i * subset_size
        end = min(len(array), beg + subset_size)
        subset = array[beg:end]
        subsets.append(subset)

    # Find the medians in each subset.
    # Note that it calls select_median_of_medians_pivot() recursively taking
    # advantage of the fact that for len(array) <= 10, the select
    # operation simply sorts the array and returns the k-th item. This
    # could be done here but since the termination condition is
    # required to avoid an infinite loop we may as well use it.
    medians = []  # list of medians
    for subset in subsets:
        median = select_median_of_medians_pivot(subset, len(subset) // 2)
        medians.append(median)

    # Now get the median of the medians recursively.
    # Assign it to the local pivot variable because
    # the pivot handling code is the same regardless
    # of how it was generated. See select_random_pivot() for
    # a different approach for generating the pivot.
    median_of_medians = select_median_of_medians_pivot(medians, len(medians) // 2)
    pivot = median_of_medians  # pivot point value (not index)

    # Now select recursively using the pivot.
    # At this point we have the pivot. Use it to partition the input
    # array into 3 categories: items that are less than the pivot
    # value (array_lt), items that are greater than the pivot value
    # (array_gt) and items that are exactly equal to the pivot value
    # (array_eq).
    array_lt = []
    array_gt = []
    array_eq = []
    for item in array:
        if item < pivot:
            array_lt.append(item)
        elif item > pivot:
            array_gt.append(item)
        else:
            array_eq.append(item)

    # The array values have been partitioned according to their
    # relation to the pivot value. The partitions look like this:
    #
    #   +---+---+---+...+---+---+---+...+---+---+---+...
    #   | 0 | 1 | 2 |   | e |e+1|e+2|   | g |g+1|g+2|
    #   +---+---+---+...+---+---+---+...+---+---+---+...
    #      array_lt        array_eq       array_gt
    #
    # If the value of k is in the range [0..e) then we know that
    # the desired value is in array_lt so we need to recurse.
    #
    # If the value of k in the range [e..g) then we know that the
    # desired value is in array_eq and we are done.
    #
    # If the value of k is >= g then we know the desired value is in
    # array_gt and we need to recurse but we also have to make sure
    # that k is normalized with respect to array_gt so that it has the
    # proper offset in the recursion. We normalize it by subtracting
    # len(array_lt) and len(array_eq).
    #
    if k < len(array_lt):
        # select_fct was undefined; recurse into this function instead.
        return select_median_of_medians_pivot(array_lt, k)
    elif k < len(array_lt) + len(array_eq):
        return array_eq[0]
    else:
        normalized_k = k - (len(array_lt) + len(array_eq))
        return select_median_of_medians_pivot(array_gt, normalized_k)
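The docstring above mentions select_random_pivot() as the variant that is usually faster in practice, but its body is not shown. The following is only a sketch of what a random-pivot selection could look like, not the original author's implementation. It assumes 0 <= k < len(array) and uses the same zero-based convention for k.

```python
import random


def select_random_pivot(array, k):
    """Return the item that would sit at index k (0-based) if array were sorted.

    Sketch only: expected O(N) time, worst case O(N^2) with unlucky pivots.
    """
    items = list(array)  # work on a copy so the caller's list is untouched
    while True:
        pivot = random.choice(items)
        lt = [x for x in items if x < pivot]
        eq = [x for x in items if x == pivot]
        if k < len(lt):
            items = lt  # the answer is among the smaller items
        elif k < len(lt) + len(eq):
            return pivot  # the answer equals the pivot
        else:
            # The answer is among the larger items; shift k accordingly.
            k -= len(lt) + len(eq)
            items = [x for x in items if x > pivot]


# Usage with the example from the question (strings give lexicographic order).
strings = [str(i) for i in range(1, 14)]
print(select_random_pivot(strings, 2 - 1))  # 10
```

Unlike the median-of-medians version, this one gives no worst-case guarantee, which is the trade-off the docstring describes.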

Modified code:

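class Solution(object):

    def findKthNumber(self, n, k):
        """
        :type n: int
        :type k: int
        :rtype: int
        """
        # Convert 1..n to decimal strings so that comparisons are lexicographic.
        # list() is needed on Python 3, where map() returns an iterator.
        x = list(map(str, range(1, n + 1)))
        # k is 1-based in the problem; the selection function is 0-based.
        return int(select_median_of_medians_pivot(x, k - 1))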

This reduces the worst-case running time from O(n*log n) to O(n).
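Even at O(n), the selection approach still builds all n strings. If that remains too slow for large n, a different technique (not part of this answer) is to count how many numbers share a given prefix and skip whole blocks without listing them. The sketch below is one way to do that; the names `count_with_prefix` and `find_kth_number` are hypothetical, and it assumes 1 <= k <= n.

```python
def count_with_prefix(prefix, n):
    """Count the integers in [1, n] whose decimal form starts with prefix."""
    count = 0
    first, last = prefix, prefix  # range of numbers with this prefix at one length
    while first <= n:
        count += min(last, n) - first + 1
        first *= 10          # e.g. 1 -> 10 -> 100
        last = last * 10 + 9  # e.g. 1 -> 19 -> 199
    return count


def find_kth_number(n, k):
    """Return the k-th (1-based) smallest integer in [1, n] in lexicographic order."""
    current = 1
    k -= 1  # number of steps still to take; current is already the 1st number
    while k > 0:
        block = count_with_prefix(current, n)
        if block <= k:
            # The answer is not under this prefix: skip the whole block.
            current += 1
            k -= block
        else:
            # The answer is under this prefix: descend one digit.
            current *= 10
            k -= 1
    return current


print(find_kth_number(13, 2))  # 10
```

Each loop iteration either moves to the next sibling prefix or goes one digit deeper, so the work grows with the number of digits of n rather than with n itself.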

code for select_median_of_medians_pivot was taken from here

Regarding "python - How to reduce the runtime of lexicographic sorting code [Python]", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/40200441/
