Python 对 >= 10,000 的数字进行列表比较的效率

标签 python

我一直在尝试完成最近 ACM 编程挑战赛后的一个问题,但遇到了障碍。问题说明

Your team has been retained by the director of a competition who supervises a panel of judges. The competition asks the judges to assign integer scores to competitors – the higher the score, the better. Although the event has standards for what score values mean, each judge is likely to interpret those standards differently. A score of 100, say, may mean different things to different judges.

The director's main objective is to determine which competitors should receive prizes for the top positions. Although absolute scores may differ from judge to judge, the director realizes that relative rankings provide the needed information – if two judges rank the same competitors first, second, third, ... then they agree on who should receive the prizes.

Your team is to write a program to assist the director by comparing the scores of pairs of judges. The program is to read two lists of integer scores in competitor order and determine the highest ranking place (first place being highest) at which the judges disagree.

Input to your program will be a series of score list pairs. Each pair begins with a single integer giving the number of competitors N, 1 < N < 1,000,000. The next N integers are the scores from the first judge in competitor order. These are followed by the second judge's scores – N more integers, also in competitor order. Scores are in the range 0 to 100,000,000 inclusive. Judges are not allowed to give ties, so each judge’s scores will be unique. Values are separated from each other by one or more spaces and/or newlines. The last score list pair is followed by the end-of-file indicator.

有示例测试用例涵盖 N = 4 和 N = 8

4

3 8 6 2

15 37 17 3

8

80 60 40 20 10 30 50 70

160 100 120 80 20 60 90 135

以及预期输出: 对于每个分数对,打印一行,其中的整数代表评委不同意的最高排名。如果评委们在每个地方都达成一致,则打印一行仅包含“同意”一词。使用以下格式:“案例”、一个空格、案例编号、一个冒号和一个空格,以及该案例的答案(不带尾随空格)。

Case 1: agree

Case 2: 3

我的代码如下:

import sys

def calculate(competitors, scores1, scores2):
    scores1sort = sorted(scores1, reverse = True)
    scores2sort = sorted(scores2, reverse = True)

    for x in range(len(scores1)) :
        indexed1 = scores1.index(scores1sort[x])
        #print ('place: ', x+1, 'Position: ',indexed1+1)
    #iterating over the entire length  of the sorted lists multiple times takes too long
        indexed2 = scores2.index(scores2sort[x])
        #print ('place: ', x+1, 'Position: ',indexed2+1)
        if indexed2 != indexed1 :
            print ( "Case",  str(case) + ":", x+1)
            return

    #run both fors at the same time, compare indexed of scores1 to index of scores2
    #if the position(indexed + 1) doesnt match between the two, print the place(x+1) of the disparity


    #if match:
        #print ("Case " + case +": " + "agree"
    #else: print (Case " + case + ": " + index of disagreement

    print ("Case", str(case) + ":" , "agree")



    scores1 = [];
    scores2 = [];
    case = 1;
    state = 0;
    # 0 indicates number of competitors
    # 1 indicates judge 1
    # 2 indicates judge 2
    #for line in sys.stdin:
    for line in test.split("\n"):
        line = line.strip().split()
        if not line:
            continue

    if state == 0:
        #if empty line, error
        competitors = int(line[0])
        state = 1;

    else:
        for y in line:
            if state == 1:
                scores1.append(int(y))
                if len(scores1) >= competitors:
                    state = 2;
            elif state == 2:
                scores2.append(int(y))
                if len(scores2) >= competitors:
                    state = 0;
                    #print (competitors, score1, scores2)
                    calculate(competitors, scores1, scores2);
                    case += 1;

我的代码当前使用一个文本文件运行,其中包含留给我们的编程竞赛的测试输入,其中包括小型测试值,但也包括一组包含 10,000 个参赛者的值。

我毫不怀疑,如果有足够的时间,代码可以完成,但编程挑战指南指定代码必须在比当前运行时更短的时间窗口中运行。

因此,我想询问任何人关于如何优化我的代码以加快执行速度的任何提示。

最佳答案

现在你的程序正在以二次方的时间运行,因此随着 N 变大,运行时间将急剧增加。您必须在远低于 O(n) 的内部循环中完成工作才能处理更大的数据集。

此外,简单地在开始时对输入进行排序对您没有帮助,因为您会丢失两个数组中关联条目之间的映射。

像这样怎么样:

def calculate(N, scores1, scores2):
    ranked1 = sorted(enumerate(scores1),key=lambda x: x[1], reverse=True)
    ranked2 = sorted(enumerate(scores2),key=lambda x: x[1], reverse=True)
    ...

现在你有两个数组,从最高排名到最低排名排序,成本为 O(n log n),你可以只搜索第 1[i][0] !=第2[i][0] 的情况最坏情况下也是 O(n)。

因此,最坏情况下总体运行时间为 O(n + n log n)

关于Python 对 >= 10,000 的数字进行列表比较的效率,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/27207090/

相关文章:

python - 访问 docker 容器外部的日志

python - 使用 R Markdown 样式文档 (.Rmd) 作为 Pweave 的输入

php - 是否有与 PHP 函数 htmlspecialchars() 等效的 Python?

Python 子进程输出到标准输出

python - 伪装类的真实模块

python - 如何在caffe模型中获取概率

python - Statsmodels 基于异方差一致性标准误差绘制平均置信区间

python - 使用Python操作Excel电子表格

python - oct2py - 在 python 中使用线程调用 Octave 函数

python - 由于 cffi.api.CDefError 无法使用 pyinstaller