python - Comparing multiple values given to a key in Python

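I am trying to write a mapper and a reducer in Python for a particular data set. I have already written the mapper, which outputs the store name and the cost of each transaction made at that store.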

For example:

Nike $45.99
Adidas $72.99
Puma $56.99
Nike $109.99
Adidas $85.99

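Here the key is the store name and the value is the transaction cost. Now I am trying to write the reducer, which should compare the transaction costs for each store and output the highest transaction per store.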

The output I want is:

Nike $109.99
Adidas $85.99
Puma $56.99

My question is: how do I compare the different values given to a key in Python?

Best answer

Well, the MapReduce paradigm is built on key-value pairs, and every mapper should emit its output in exactly that format.

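As for the reducer, the Hadoop framework guarantees, through its shuffle-and-sort phase, that a reducer receives all the values for a given key, so two different reducers can never receive entries for the same key.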

A single reducer can, however, handle more than one key.
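To make the grouping guarantee concrete, here is a minimal in-process sketch of what shuffle-and-sort does to the mapper output from the question. It is not Hadoop code: the function name `shuffle_sort` and the list `mapped` are made up for illustration, and the costs are assumed to be already parsed into floats.

```python
from itertools import groupby
from operator import itemgetter

# Mapper output from the question as (store, cost) pairs (assumed pre-parsed).
mapped = [
    ("Nike", 45.99),
    ("Adidas", 72.99),
    ("Puma", 56.99),
    ("Nike", 109.99),
    ("Adidas", 85.99),
]


def shuffle_sort(pairs):
    """Sort pairs by key and collect the values of each key together.

    Hypothetical helper that only imitates the grouping a reducer relies on.
    """
    ordered = sorted(pairs, key=itemgetter(0))
    return {
        key: [cost for _, cost in group]
        for key, group in groupby(ordered, key=itemgetter(0))
    }


grouped = shuffle_sort(mapped)
for store, costs in grouped.items():
    print(store, costs)
```

Every value for a store ends up in one list, which is why the reducer only has to look at one key at a time.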

Regarding your question, suppose you have 3 different values for the same key, for example:

Nike $109.99
Nike $45.99
Nike $294.99

Conceptually, the reduction can proceed pairwise. The reducer function for your key first receives 2 of the values:

$109.99
$45.99

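It has to output the higher of the two using a simple comparison, so the output is $109.99. That value becomes an input to the second run of your reducer function, which this time receives:

$109.99
$294.99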

Again, a comparison gives the highest value as the output, namely $294.99.
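The pairwise walk-through above can be sketched in plain Python with `functools.reduce`, which applies a two-argument function to a running result and the next value. The helper name `higher` and the list `nike_costs` are illustrative only, and the costs are assumed to be floats already.

```python
from functools import reduce


def higher(a, b):
    """Return the larger of two transaction costs."""
    return a if a > b else b


# The three Nike values from the example, in the order they arrive.
nike_costs = [109.99, 45.99, 294.99]

# First call: higher(109.99, 45.99) gives 109.99.
# Second call: higher(109.99, 294.99) gives 294.99.
highest = reduce(higher, nike_costs)
print("$%.2f" % highest)
```

For a real job the built-in `max` does the same thing in one call; the explicit `reduce` is only here to mirror the two comparison steps.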

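As for the code, you will need a very simple script, for example:

Edit: I assume your delimiter is a tab, but you can change the format to whatever you are using.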

#!/usr/bin/env python3

import sys

current_store = None
current_max_cost = 0.0

# input comes from STDIN, one "store<TAB>cost" record per line
for line in sys.stdin:
    # remove leading and trailing whitespace
    line = line.strip()

    # parse the input we got from the mapper; skip lines without a tab
    try:
        store, cost = line.split('\t', 1)
    except ValueError:
        continue

    # convert cost (a string such as "$45.99") to float
    try:
        cost = float(cost.strip().lstrip('$'))
    except ValueError:
        # cost was not a number, so silently
        # ignore/discard this line
        continue

    # this IF-switch only works because Hadoop sorts map output
    # by key (here: store) before it is passed to the reducer
    if current_store == store:
        if cost > current_max_cost:
            current_max_cost = cost
    else:
        if current_store is not None:
            # write result to STDOUT
            print('%s\t$%.2f' % (current_store, current_max_cost))
        current_max_cost = cost
        current_store = store

# do not forget to output the last store, if there was any input
if current_store is not None:
    print('%s\t$%.2f' % (current_store, current_max_cost))
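As a variation, the same key-change logic can be expressed with `itertools.groupby`, which groups consecutive records that share a key and therefore relies on the same sorted-by-key input. This is only a sketch under the assumptions already stated (tab delimiter, costs written like `$45.99`); the function names `parse` and `max_per_store` and the `sample` list are hypothetical.

```python
from itertools import groupby


def parse(lines, sep="\t"):
    """Yield (store, cost) pairs, skipping lines that do not parse."""
    for line in lines:
        store, found, cost = line.strip().partition(sep)
        if not found:
            continue  # no delimiter on this line
        try:
            yield store, float(cost.strip().lstrip("$"))
        except ValueError:
            continue  # cost was not a number


def max_per_store(lines, sep="\t"):
    """Yield (store, highest cost) for input already sorted by store."""
    records = parse(lines, sep)
    for store, group in groupby(records, key=lambda pair: pair[0]):
        yield store, max(cost for _, cost in group)


# Sample reducer input, sorted by key as Hadoop would deliver it.
sample = [
    "Adidas\t$72.99",
    "Adidas\t$85.99",
    "Nike\t$45.99",
    "Nike\t$109.99",
    "Puma\t$56.99",
]

for store, cost in max_per_store(sample):
    print("%s\t$%.2f" % (store, cost))
```

To use it as a streaming reducer, pass `sys.stdin` (after `import sys`) instead of `sample`.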

A similar question about comparing multiple values given to a key in Python can be found on Stack Overflow: https://stackoverflow.com/questions/37782885/
