python - 在 20 个随机数字列表中找到最常见的一对、三重奏等，玩过 100 次

所以我有一个包含 100 行的 CSV，每行有 20 个数字，用逗号分隔，如下所示:

23, 52, 63, 76, 23, 45, ...
39, 52, 83, 33, 35, 23, ...
etc.

我想编写一个算法，找到一对的所有不同组合，并列出它发生的次数，并对三重奏、四重奏等执行相同的操作。

显然，每一行都会有很多组合(因为顺序并不重要)

这将是每一行的答案

每行有 15 个组合。现在我只能想象如果是 20 个而不是 6 个，会有多少个组合，但现在我们不用担心这个。

由于有 15 个，我不希望程序显示每一个组合，只显示“计数”大于 1 的组合，“计数”是它们出现的次数。

因此，如果我仅针对上面的这两行运行程序，对于 6 个数字而不是 20 个数字，程序将返回

23, 52: 2

看看如何只显示出现多次的对，这就是我想做的，但这很简单，只是不显示计数是否等于 1。

无论如何，我如何开始创建这个算法？我不知道从哪里开始，我想我会从抓取每一行开始，并获取每一对，但我该怎么做？

提前谢谢您，毫无疑问，我会继续解决这个问题，如果我找到解决方案，我会发布它(以及代码)。再次感谢您。

最佳答案

代码内嵌注释。

import itertools

lines = [[23, 52, 63, 76, 23, 45],
         [39, 52, 83, 33, 35, 23]]

# This set will store the unique elements
# across all lines
uniq = set()

# This list will hold a dict for each line
# These dicts will contain the frequency
# of each unique number in that line

freq = [{} for _ in range(len(lines))]
for index, line in enumerate(lines):
    for j in line:

        freq[index][j] = freq[index].get(j, 0) + 1
        uniq.add(j)

# This will store the frequency of each possible pair
counter = {}

# There will be k * (k - 1) / 2 combinations,
# where k is the number of unique pairs

for i in itertools.combinations(uniq, 2):
    for j in range(len(lines)):

        freq1 = freq[j].get(i[0], 0)
        freq2 = freq[j].get(i[1], 0)

        # Multiplying frequencies gives us the number
        # of pairs with these numbers in this line

        counter[i] = counter.get(i, 0) + (freq1 * freq2)

# Performs a descending sort on all pairs
sol = sorted(counter.items(), key=lambda value: value[1], reverse=True)
print(sol)

输出:

   [((52, 23), 3), ((45, 23), 2), ((23, 63), 2), ((76, 23), 2), ((33, 35), 1),
    ((76, 45), 1), ((52, 63), 1), ((39, 83), 1), ((76, 52), 1), ((45, 52), 1),
    ((35, 39), 1), ((39, 23), 1), ((33, 52), 1), ((39, 52), 1), ((33, 23), 1),
    ((35, 23), 1), ((35, 52), 1), ((76, 63), 1), ((33, 83), 1), ((35, 83), 1),
    ((33, 39), 1), ((83, 23), 1), ((45, 63), 1), ((83, 52), 1), ((83, 63), 0),
    ((35, 76), 0), ((33, 76), 0), ((35, 45), 0), ((33, 45), 0), ((45, 83), 0),
    ((39, 63), 0), ((33, 63), 0), ((39, 76), 0), ((39, 45), 0), ((76, 83), 0),
    ((35, 63), 0)]

当心!如果列表主要由唯一值组成，则该算法的运行时间约为 O(n^4)。

编辑:这是另一个应该表现更好的版本(它维护每行的 uniq 设置，因此避免零大小组合)。

import functools
import itertools
import operator

lines = [[23, 52, 63, 76, 23, 45],
         [39, 52, 83, 33, 35, 23]]

# Set this variable before use!
permlength = 2

uniq = [set() for _ in range(len(lines))]
freq = [{} for _ in range(len(lines))]

for index, line in enumerate(lines):
    for j in line:
        freq[index][j] = freq[index].get(j, 0) + 1
        uniq[index].add(j)

counter = {}

print("Total number of lines: {}".format(len(lines)))

for i in range(len(lines)):
    print("Line {}...".format(i + 1))

    for j in itertools.combinations(uniq[i], permlength):

        freqp = [freq[i].get(j[x], 0) for x in range(permlength)]
        counter[j] = counter.get(j, 0) + functools.reduce(operator.mul, freqp)

sol = sorted(counter.items(), key=lambda value: value[1], reverse=True)

print(sol)

输出:

[((52, 23), 3), ((45, 23), 2), ((63, 23), 2), ((76, 23), 2), ((76, 52), 1),
 ((76, 45), 1), ((52, 63), 1), ((39, 83), 1), ((35, 39), 1), ((45, 52), 1),
 ((33, 52), 1), ((39, 52), 1), ((33, 23), 1), ((35, 23), 1), ((35, 52), 1),
 ((39, 23), 1), ((76, 63), 1), ((33, 83), 1), ((33, 39), 1), ((83, 23), 1),
 ((45, 63), 1), ((83, 52), 1), ((33, 35), 1), ((35, 83), 1)]

关于python - 在 20 个随机数字列表中找到最常见的一对、三重奏等，玩过 100 次，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/35242026/

python - 在 20 个随机数字列表中找到最常见的一对、三重奏等，玩过 100 次

上一篇：python - 如何在 pygame 中的幻灯片之间切换

下一篇：python - 将 PIL 图像转换为 numpy 数组有时不起作用