python - How to efficiently build an affinity matrix from transaction rows?

Tags: python json graph affinity data-munging

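Given a JSON file (possibly 2+ GB) of transactions between nodes, with roughly 1 million nodes and roughly 10 million transactions, each transaction containing 10-1000 nodes, e.g.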

{"transactions":
 [
  {"transaction 1": ["node1","node2","node7"], "weight":0.41},
  {"transaction 2": ["node4","node2","node1","node3","node10","node7","node9"], "weight":0.67},
  {"transaction 3": ["node3","node10","node11","node2","node1"], "weight":0.33},...
  ]
}
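To put the sizes above in perspective, here is a quick estimate of how much memory a dense affinity matrix would need. It assumes 8-byte float64 entries; the question does not specify a dtype:

```python
# Back-of-the-envelope size of a dense affinity matrix for the quoted sizes.
n_nodes = 1_000_000
bytes_per_entry = 8  # assumption: float64 entries

# full n x n matrix
full = n_nodes * n_nodes * bytes_per_entry
# lower triangle including the diagonal: n * (n + 1) / 2 entries
lower = n_nodes * (n_nodes + 1) // 2 * bytes_per_entry

print(f"full matrix:    {full / 1e12:.1f} TB")   # full matrix:    8.0 TB
print(f"lower triangle: {lower / 1e12:.1f} TB")  # lower triangle: 4.0 TB
```

Even the lower triangle alone needs about 4 TB. That points toward storing only the node pairs that actually co-occur, i.e. a sparse representation.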

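What is the most elegant and efficient Pythonic way to convert this into a node affinity matrix, where the affinity is the sum of the weighted transactions between nodes?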

affinity[i, j] = weighted transaction count between nodes[i] and nodes[j] = affinity[j, i]

e.g.

affinity[node1, node7] = [0.41 (transaction1) + 0.67 (transaction2)] / 2 = affinity[node7, node1]
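The worked example divides the summed weights by the number of transactions the two nodes share, so it is effectively a mean. Let $T_{ij}$ be the set of transactions containing both nodes and $w_t$ the weight of transaction $t$. The example then corresponds to the formula below. Taking the affinity to be 0 when $T_{ij}$ is empty is an assumption; the question does not say.

```latex
\operatorname{affinity}[i,j]
  = \frac{1}{|T_{ij}|} \sum_{t \in T_{ij}} w_t
  = \operatorname{affinity}[j,i],
\qquad
T_{ij} = \{\, t : i \in t \text{ and } j \in t \,\}

% node1, node7: T = {transaction 1, transaction 2}
\operatorname{affinity}[1,7] = \frac{0.41 + 0.67}{2} = 0.54
```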

Note: the affinity matrix is symmetric, so computing only the lower triangle is sufficient.
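One common way to use both the symmetry and the sparsity is to scan the transactions once and accumulate each co-occurring pair a single time. Keying each pair as (larger id, smaller id) keeps only the lower triangle. Below is a minimal sketch. It assumes node names have already been mapped to integer ids and uses the mean-of-shared-weights reading of the example above. `sparse_affinity` is a hypothetical helper name, not an established API.

```python
from collections import defaultdict
from itertools import combinations


def sparse_affinity(transactions):
    """Accumulate lower-triangle affinities only for node pairs that co-occur.

    transactions: iterable of (nodes, weight) pairs, nodes being integer ids.
    Returns {(i, j): mean weight of shared transactions} with i >= j.
    """
    weight_sum = defaultdict(float)  # (i, j) -> sum of shared transaction weights
    count = defaultdict(int)         # (i, j) -> number of shared transactions
    for nodes, w in transactions:
        unique = sorted(set(nodes))
        # diagonal: every transaction that contains node i
        for i in unique:
            weight_sum[(i, i)] += w
            count[(i, i)] += 1
        # off-diagonal: combinations of a sorted list yield a < b,
        # stored as (b, a) so the row index is >= the column index
        for a, b in combinations(unique, 2):
            weight_sum[(b, a)] += w
            count[(b, a)] += 1
    return {pair: weight_sum[pair] / count[pair] for pair in weight_sum}


tx = [([1, 2, 7], 0.41), ([4, 2, 1, 3, 10, 7, 9], 0.67), ([3, 10, 11, 2, 1], 0.33)]
aff = sparse_affinity(tx)
print(round(aff[(7, 1)], 2))  # 0.54
```

A transaction with k nodes contributes k(k-1)/2 pairs, so a 1000-node transaction adds about 500,000 updates. At the question's volumes the dictionaries can still become very large. The sketch shows the accumulation pattern rather than a tuned solution.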

The values are not representative; this is only an example of the structure!

          node1 | node2 | node3 | node4 | ...
node1       1       .4      .1      .9    ...
node2      .4        1      .6      .3    ...
node3      .1       .6       1      .7    ...
node4      .9       .3      .7       1    ...
...


Best answer

First, I would clean up the data and represent each node by an integer, starting from a list of dictionaries like this:

data=[{'transaction': [1, 2, 7], 'weight': 0.41},
      {'transaction': [4, 2, 1, 3, 10, 7, 9], 'weight': 0.67},
      {'transaction': [3, 10, 11, 2, 1], 'weight': 0.33}]
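The answer doesn't show how to get from the question's JSON layout to this integer form. Below is a minimal sketch. It assumes every transaction object has exactly one key besides "weight" (such as "transaction 1") and that this key holds the node list. `clean_transactions` is a hypothetical helper. Ids are assigned in order of first appearance, so they will not match the numeric suffixes in the list above; for example, node7 becomes 3 here.

```python
import json


def clean_transactions(raw):
    """Convert the question's JSON layout into the answer's integer format.

    Assumes each entry has exactly one key other than "weight" whose value
    is the list of node names. Returns (cleaned, node_ids), where node_ids
    maps node name -> integer id, numbered from 1 in order of first appearance.
    """
    node_ids = {}
    cleaned = []
    for entry in raw["transactions"]:
        names = next(v for k, v in entry.items() if k != "weight")
        # setdefault evaluates len(node_ids) + 1 before inserting a new name
        ids = [node_ids.setdefault(name, len(node_ids) + 1) for name in names]
        cleaned.append({"transaction": ids, "weight": entry["weight"]})
    return cleaned, node_ids


raw = json.loads('''{"transactions": [
  {"transaction 1": ["node1","node2","node7"], "weight":0.41},
  {"transaction 2": ["node4","node2","node1","node3","node10","node7","node9"], "weight":0.67},
  {"transaction 3": ["node3","node10","node11","node2","node1"], "weight":0.33}]}''')
cleaned, node_ids = clean_transactions(raw)
print(cleaned[0])  # {'transaction': [1, 2, 3], 'weight': 0.41}
```

For a 2+ GB file, `json.load` reads the whole document into memory at once, so a streaming parser may be preferable. The id-mapping logic stays the same.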

I'm not sure whether this is Pythonic enough, but it should be self-explanatory:

def weight(i, j, data_item):
    # weight of this transaction if it contains both nodes i and j, else 0
    nodes = data_item["transaction"]
    return data_item["weight"] if i in nodes and j in nodes else 0

def affinity(i, j):
    if j < i:  # matrix is symmetric: affinity(i, j) == affinity(j, i)
        return affinity(j, i)
    # weights of all transactions shared by i and j (0 means "not shared")
    weights = [w for w in (weight(i, j, item) for item in data) if w != 0]
    if not weights:
        return 0
    # affinity = mean weight of the shared transactions
    return sum(weights) / float(len(weights))

ln = 10  # number of nodes
A = [[affinity(i, j) for j in range(1, ln + 1)] for i in range(1, ln + 1)]
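The code above scans every transaction for each of the n² node pairs, which will not scale to the question's sizes. The same numbers can come from two matrix products over a 0/1 transaction-by-node incidence matrix B and a weight vector w. B^T diag(w) B gives the summed weights of shared transactions, and B^T B gives the shared-transaction counts. Below is a dense NumPy sketch for the small example; `affinity_matrix` is a hypothetical name. It differs from the code above only if some transaction has a weight of exactly 0, which is counted here but skipped above.

```python
import numpy as np


def affinity_matrix(data, n_nodes):
    """Dense sketch: mean weight of shared transactions via matrix products.

    Assumes node ids are integers 1..n_nodes, as in the answer's `data`;
    ids outside that range are ignored (like node 11 with ln = 10).
    """
    # B[t, k] = 1 if node k + 1 appears in transaction t
    B = np.zeros((len(data), n_nodes))
    for t, item in enumerate(data):
        for node in item["transaction"]:
            if 1 <= node <= n_nodes:
                B[t, node - 1] = 1.0
    w = np.array([item["weight"] for item in data])
    weight_sum = B.T @ (w[:, None] * B)  # summed weights of shared transactions
    count = B.T @ B                      # number of shared transactions
    # mean where the pair shares at least one transaction, otherwise 0
    return np.divide(weight_sum, count, out=np.zeros_like(weight_sum), where=count > 0)


data = [{'transaction': [1, 2, 7], 'weight': 0.41},
        {'transaction': [4, 2, 1, 3, 10, 7, 9], 'weight': 0.67},
        {'transaction': [3, 10, 11, 2, 1], 'weight': 0.33}]
A2 = affinity_matrix(data, 10)
print(np.round(A2, 2))
```

For a million nodes a dense B is not feasible. The same two products can be formed with scipy.sparse matrices, although the size of the result still grows with the number of co-occurring pairs.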

To view the affinity matrix:

import numpy as np
print(np.array(A))
    [[ 0.47  0.47  0.5   0.67  0.    0.    0.54  0.    0.67  0.5 ]
     [ 0.47  0.47  0.5   0.67  0.    0.    0.54  0.    0.67  0.5 ]
     [ 0.5   0.5   0.5   0.67  0.    0.    0.67  0.    0.67  0.5 ]
     [ 0.67  0.67  0.67  0.67  0.    0.    0.67  0.    0.67  0.67]
     [ 0.    0.    0.    0.    0.    0.    0.    0.    0.    0.  ]
     [ 0.    0.    0.    0.    0.    0.    0.    0.    0.    0.  ]
     [ 0.54  0.54  0.67  0.67  0.    0.    0.54  0.    0.67  0.67]
     [ 0.    0.    0.    0.    0.    0.    0.    0.    0.    0.  ]
     [ 0.67  0.67  0.67  0.67  0.    0.    0.67  0.    0.67  0.67]
     [ 0.5   0.5   0.5   0.67  0.    0.    0.67  0.    0.67  0.5 ]]

A similar question, python - How to efficiently build an affinity matrix from transaction rows?, can be found on Stack Overflow: https://stackoverflow.com/questions/44451015/
