python - How can I find the average from a list of lists given two constraints?

Tags: python python-2.7 dictionary

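So I have a list of lists where the 7th index of each sublist contains the value I want to average; however, the numbers must be averaged according to their type. The type to match on can be found at the 11th index of each sublist.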

Below is some code I wrote for this example.

# Open the csv file
opened_file = open('AppleStore.csv')
from csv import reader
read_file = reader(opened_file)
# Store the data as a list of lists
apps_data = list(read_file)

# idx_num = index number of interest
# list_doc = the list of lists
# row_start = 1
def extract(idx_num,list_doc,row_start=1):
    a_list = []
    for row in list_doc[row_start:]:
        var = row[idx_num]
        a_list.append(var)
    return a_list

# Use the extract function to get an array
a_list = extract(11, apps_data, 0)
# Find unique elements
a_list_set = set(a_list)
# Create a dictionary with initial values at [0,0]
dic = dict.fromkeys(a_list_set,[0,0])

print(dic)
# Works as intended
# {'Weather': [0, 0], 'Sports': [0, 0], 'Productivity': [0, 0], 'Games': [0, 0],
#  'News': [0, 0], 'Finance': [0, 0], 'Education': [0, 0], 'Entertainment': [0, 0],
#  'Health & Fitness': [0, 0], 'Business': [0, 0], 'Social Networking': [0, 0],
#  'prime_genre': [0, 0], 'Photo & Video': [0, 0], 'Navigation': [0, 0],
#  'Music': [0, 0], 'Medical': [0, 0], 'Travel': [0, 0], 'Reference': [0, 0],
#  'Shopping': [0, 0], 'Utilities': [0, 0], 'Food & Drink': [0, 0],
#  'Lifestyle': [0, 0], 'Catalogs': [0, 0], 'Book': [0, 0]}


for row in apps_data[1:]:
    price = float(row[4])
    genre = row[11]

# Here is the issue:
# I thought that this would allow for the genre instance to be matched to the appropriate key and then I could append my values.

    if genre in dic.keys():
        dic[genre][0] += 1
        dic[genre][1] += (price)
    else:
        dic[genre][0] = 1
        dic[genre][1] = price


print(dic)

## From here I would extract the array contents of the dictionary
for genre in a_list_set:
    print(str(genre) + " mean price:" + str(round(dic[genre][1]/dic[genre][0], 2)))


This is the output I get:

{'Weather': [7197, 12423.58999999945], 'Sports': [7197, 12423.58999999945], 'Productivity': [7197, 12423.58999999945], 'Games': [7197, 12423.58999999945], 'News': [7197, 12423.58999999945], 'Finance': [7197, 12423.58999999945], 'Education': [7197, 12423.58999999945], 'Entertainment': [7197, 12423.58999999945], 'Health & Fitness': [7197, 12423.58999999945], 'Business': [7197, 12423.58999999945], 'Social Networking': [7197, 12423.58999999945], 'prime_genre': [7197, 12423.58999999945], 'Photo & Video': [7197, 12423.58999999945], 'Navigation': [7197, 12423.58999999945], 'Music': [7197, 12423.58999999945], 'Medical': [7197, 12423.58999999945], 'Travel': [7197, 12423.58999999945], 'Reference': [7197, 12423.58999999945], 'Shopping': [7197, 12423.58999999945], 'Utilities': [7197, 12423.58999999945], 'Food & Drink': [7197, 12423.58999999945], 'Lifestyle': [7197, 12423.58999999945], 'Catalogs': [7197, 12423.58999999945],'Book': [7197, 12423.58999999945]}

Best answer

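We can do this with itertools.groupby. First, we extract the "columns" of interest from the data, namely the 7th and 11th values of each row, into a subset, which is also sorted by the 11th value.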

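Then we use groupby to split the subset into groups whose members all share the same second element (the original 11th element). Finally, a dict comprehension computes the mean of the first elements of each group's members.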

from itertools import groupby
from operator import itemgetter
from statistics import mean
subset = sorted(((row[6], row[10]) for row in data), key=itemgetter(1))
result = {key: mean(map(itemgetter(0), group)) for key, group in groupby(subset, itemgetter(1))}

print(result)
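The sort in the first step matters because groupby only merges adjacent items whose keys are equal. The following is a minimal sketch of that behavior; the `pairs` list is made-up illustration data, not taken from the question.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical (value, type) pairs, deliberately not sorted by type.
pairs = [(1.0, "this"), (2.0, "that"), (3.0, "this")]

# Without sorting, "this" shows up as two separate groups.
unsorted_keys = [key for key, _ in groupby(pairs, key=itemgetter(1))]

# After sorting by type, each type forms exactly one group.
sorted_pairs = sorted(pairs, key=itemgetter(1))
sorted_keys = [key for key, _ in groupby(sorted_pairs, key=itemgetter(1))]

print(unsorted_keys)  # ['this', 'that', 'this']
print(sorted_keys)    # ['that', 'this']
```

In a dict comprehension, a key that appears in more than one group would keep only the value from its last group, so skipping the sort would silently give wrong means.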

Some sample data for `data`:

[[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -4.926456602181107, 0.0, 0.0, 0.0, 'this'],
 [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -4.261928508086729, 0.0, 0.0, 0.0, 'that'],
 [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 6.582427615396794, 0.0, 0.0, 0.0, 'other'],
 [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -0.08345371286375847, 0.0, 0.0, 0.0, 'other'],
 [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.6323414510835206, 0.0, 0.0, 0.0, 'this'],
 [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -7.755177634382969, 0.0, 0.0, 0.0, 'this'],
 [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -5.948058847184649, 0.0, 0.0, 0.0, 'that'],
 [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -5.767820549798114, 0.0, 0.0, 0.0, 'other'],
 [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.609131600539092, 0.0, 0.0, 0.0, 'this'],
 [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.2106567350536854, 0.0, 0.0, 0.0, 'that'],
 [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -3.1550716372338297, 0.0, 0.0, 0.0, 'other'],
 [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -0.6037278107842077, 0.0, 0.0, 0.0, 'that'],
 [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -11.819322083983815, 0.0, 0.0, 0.0, 'this'],
 [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 5.441817745217389, 0.0, 0.0, 0.0, 'other'],
 [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.4961079817344718, 0.0, 0.0, 0.0, 'other'],
 [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 8.269603775378254, 0.0, 0.0, 0.0, 'this'],
 [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -0.42023137240633596, 0.0, 0.0, 0.0, 'this'],
 [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 7.855652365179269, 0.0, 0.0, 0.0, 'this'],
 [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -8.048026683773955, 0.0, 0.0, 0.0, 'that'],
 [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -4.577046681982131, 0.0, 0.0, 0.0, 'this']]

Result:

{'other': 0.585667907075492,
 'that': -3.530217022955171,
 'this': -0.9035005758618025}
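As for the dictionary-based attempt in the question: `dict.fromkeys(a_list_set, [0, 0])` stores the same list object under every key, so each `+=` updates that one shared list. This matches the identical counts and totals in the question's output. Below is a minimal single-pass sketch that gives each key its own list. The function name `mean_by_type`, its parameters, and the small `rows` list are illustrative assumptions, not part of the original post.

```python
from collections import defaultdict


def mean_by_type(rows, value_idx=6, type_idx=10):
    """Average row[value_idx] per distinct row[type_idx] in one pass."""
    # Each new key gets its own fresh [count, total] list, unlike
    # dict.fromkeys(keys, [0, 0]), which shares a single list.
    totals = defaultdict(lambda: [0, 0.0])
    for row in rows:
        entry = totals[row[type_idx]]
        entry[0] += 1
        entry[1] += float(row[value_idx])
    return {key: total / count for key, (count, total) in totals.items()}


# Hypothetical three-column rows: value at index 0, type at index 2.
rows = [
    [1.0, "x", "this"],
    [3.0, "x", "that"],
    [5.0, "x", "this"],
]
print(mean_by_type(rows, value_idx=0, type_idx=2))
```

For the question's CSV, the call would presumably be `mean_by_type(apps_data[1:], value_idx=4, type_idx=11)`, since the question's loop reads the price from `row[4]` and the genre from `row[11]` and skips the header row. This version needs no sorting and does not rely on the `statistics` module, which is only available from Python 3.4 onward.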

A similar question can be found on Stack Overflow: https://stackoverflow.com/questions/56248849/
