python - Is it better to use a global variable or to pass an argument to a map function?

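I am using pyspark to do some processing on server logs, and I am fairly new to functional programming concepts. I have a lookup table that my function uses to pick one of several options, as shown below: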

user_agent_vals = {
        'CanvasAPI': 'api',
        'candroid': 'mobile_app_android',
        'iCanvas': 'mobile_app_ios',
        'CanvasKit': 'mobile_app_ios',
        'Windows NT': 'desktop',
        'MacBook': 'desktop',
        'iPhone': 'mobile',
        'iPod Touch': 'mobile',
        'iPad': 'mobile',
        'iOS': 'mobile',
        'CrOS': 'desktop',
        'Android': 'mobile',
        'Linux': 'desktop',
        'Mac OS': 'desktop',
        'Macintosh': 'desktop'
    }

def parse_requests(line):
    """
    Expects an input list, which is then mapped to the correct fieldnames in
    a dict.

    :param line: A list of values.
    :return: A list containing the values for writing to a file.
    """
    values = dict(zip(requests_fieldnames, line))
    print(values)
    values['request_timestamp'] = values['request_timestamp'].split('-')[1]
    found = False
    for key, value in user_agent_vals.items():
        if key in values['user_agent']:
            found = True
            values['user_agent'] = value
    if not found:
        values['user_agent'] = 'other_unknown'
    return [
        values['user_id'],
        values['context_id'],
        values['request_timestamp'],
        values['user_agent']
    ]

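I don't want to redefine the dictionary every time the function is called (which will be millions of times), but simply relying on Python's LEGB lookup to find it in the module namespace does not seem ideal either. Should I pass a parameter to the map function that calls parse_requests (and if so, how?), or what is the best-practice way to handle this?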

For reference, this is my map call:

parsed_data = course_data.map(parse_requests)

Best answer

By convention, global "constants" like this are named in all caps:

USER_AGENT_VALS

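For example, with its default settings pylint only accepts all-uppercase names for module-level variables (functions and classes excepted).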
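As a minimal sketch of this convention: the table is defined once at module level under an all-caps name, and the function simply reads it. The helper name classify_user_agent and the shortened table are hypothetical, chosen only for illustration.

```python
# Module-level lookup table, built once when the module is imported.
# The all-caps name signals that it is meant to be treated as a constant.
# This is a shortened, hypothetical subset of the table from the question.
USER_AGENT_VALS = {
    'CanvasAPI': 'api',
    'iPhone': 'mobile',
    'Windows NT': 'desktop',
}


def classify_user_agent(user_agent):
    """Return the category of the first table key found in user_agent."""
    for key, value in USER_AGENT_VALS.items():
        if key in user_agent:
            return value
    return 'other_unknown'
```

The dictionary is not rebuilt on each call; the function only looks up the existing module-level object.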

Alternatively, you can pass user_agent_vals as a second argument:

def parse_requests(line, user_agent_vals):

Call it like this:

parse_requests(line, user_agent_vals)
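A small sketch of the explicit-argument style, assuming a simplified helper (classify_user_agent is a hypothetical name, not the question's function) that receives the lookup table as its second parameter:

```python
def classify_user_agent(user_agent, user_agent_vals):
    """Return the category of the first key of user_agent_vals found in user_agent."""
    for key, value in user_agent_vals.items():
        if key in user_agent:
            return value
    return 'other_unknown'


# Hypothetical small table; any mapping can be supplied by the caller.
table = {'iPad': 'mobile', 'Linux': 'desktop'}
result = classify_user_agent('Mozilla/5.0 (X11; Linux x86_64)', table)
```

Because the table is a parameter, the function no longer depends on a module-level name, which also makes it easy to call with a different table in tests.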

You can use functools.partial() to "freeze" the function's argument:

from functools import partial

parse_requests_for_map = partial(parse_requests, user_agent_vals=user_agent_vals)

Now you can use it with map:

parsed_data = course_data.map(parse_requests_for_map)
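To see the whole pattern in one place, here is a self-contained sketch that uses Python's built-in map as a stand-in for the RDD's map method, so it runs without Spark. The simplified parse_requests body, the two-field rows, and the shortened table are assumptions made for illustration only.

```python
from functools import partial


def parse_requests(line, user_agent_vals):
    """Simplified stand-in: line is assumed to be [user_id, user_agent]."""
    user_id, user_agent = line
    category = 'other_unknown'
    for key, value in user_agent_vals.items():
        if key in user_agent:
            category = value
            break
    return [user_id, category]


# Hypothetical shortened lookup table.
user_agent_vals = {'iPhone': 'mobile', 'Windows NT': 'desktop'}

# Freeze the second argument so the result takes a single argument,
# which is the shape map expects.
parse_requests_for_map = partial(parse_requests, user_agent_vals=user_agent_vals)

rows = [
    ['u1', 'Mozilla/5.0 (iPhone; CPU iPhone OS 9_3)'],
    ['u2', 'curl/7.47.0'],
]
parsed = list(map(parse_requests_for_map, rows))
```

The partial object holds a reference to the same dictionary, so the table is still created only once rather than on every call.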

A similar question on this topic can be found on Stack Overflow: https://stackoverflow.com/questions/36655604/
