python - Node pair occurrence rate with weights in Python

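I have to count how often each node pair occurs in a set of data and compute each pair's percentage of the total. I can compute that part fine, but the problem comes when I also include a third parameter, the time in seconds. The node pairs contact each other and stay connected for some time, measured in seconds. The total time (in seconds) for each node pair should also be computed.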

For example, input:

    Node  Node Time
     A     B    455
     A     B    456
     A     B    463
     A     C    4
     A     C    675
     C     B    64
     C     B    78
     C     B    579

The output I get so far is correct:

Node   Node   paircount pairpercentage
   A       B      3       37.5
   A       C      2       25
   C       B      3       37.5

The output I should get is:

Node Node Paircount pairpercentage  Time
A     B      3       37.5            1374
A     C      2       25               679
C     B      3       37.5             721

Code:

from collections import defaultdict
d = defaultdict(int)

# get number of occurrences for the first two columns
with open('Inputfile.txt', 'r') as f:
    #f.readline() # discard the header line
    for numlines, line in enumerate(f,1):
        line = line.strip().split()
        c = line[0], line[1]
        d[c] += 1

# compute 100*(occurrences/numlines) for each key in d
d = {k:(v, 100*float(v)/numlines) for k,v in d.iteritems()}
with open('outputfile.txt', 'w') as outfile:
    for k in d:
        #print k, d[k]
        outfile.write("%s %s\n" % (k, d[k]))

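Note: the code above works well for the first half I mentioned. I need some help with the remaining half, which is summing the time for each node pair.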

Input file:

5454 5070 2755.0
5070 4391 2935.0
1158 305  1.0
5045 3140 48767.0
4921 3140 58405.0
5372 2684 460.0
1885 1158 351.0
1349 1174 6375.0
1980 1174 650.0
1980 1349 650.0
4821 2684 469.0
4821 937  459.0
2684 937  318.0
1980 606  390.0
1349 606  750.0
1174 606  750.0
5045 3545 8133.0
4921 3545 8133.0
3545 3140 8133.0
5045 4243 14863.0
4921 4243 14863.0
4243 3545 8013.0
4243 3140 14863.0
4821 4376 5471.0
4376 937  136.0
2613 968  435.0
5372 937  83.0

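Code 2: with the help of the answer below, I can compute the pair count and the total time, but I cannot get the percentage now. I am also looking for some output cleanup.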

from collections import defaultdict
paircount = defaultdict(int)
pairtime = defaultdict(float)
pairper = defaultdict(float)

#get number of pair occurrences and total time 
with open('USC_Test.txt', 'r') as f:
  with open('pair.txt', 'w') as o:
    numline = 0
    for line in f:
        numline += 1
        line = line.split()
        pair = line[0], line[1]
        paircount[pair] += 1
        pairtime[pair] += float(line[2])
        pairper = float(paircount/line)*100      

print "%s\n" % paircount
print "%s\n" % pairtime
print "%s\n" % pairper

Output: this is produced only when pairper = float(paircount/line)*100 is commented out; otherwise the code raises the error shown further down.

defaultdict(<type 'int'>, {('1349', '606'): 1, ('2684', '937'): 1,
defaultdict(<type 'float'>, {('1349', '606'): 750.0, ('2684', '937'): 318.0,

But when pairper = float(paircount/line)*100 is not commented out:

Error: TypeError: unsupported operand type(s) for /: 'collections.defaultdict' and 'list'
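
The traceback follows from the operand types: paircount is the whole dictionary and line is the list of fields from one row, so the / operator has nothing defined for them. Below is a minimal sketch of that situation, written for Python 3 (an assumption, since the code in this post is Python 2) and using one row from the input file. A percentage needs two numbers instead: one pair's count and the total number of rows.

```python
from collections import defaultdict

paircount = defaultdict(int)
line = "1349 606 750.0".split()      # one parsed row: a list of strings
paircount[(line[0], line[1])] += 1

# Dividing the dictionary itself by a list is undefined, so Python raises TypeError.
try:
    paircount / line
    raised = False
except TypeError:
    raised = True

# A percentage is one pair's count divided by the total number of rows read.
numlines = 1                          # only one row was read in this sketch
percentage = 100.0 * paircount[("1349", "606")] / numlines
```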

The expected output should not contain the defaultdict(<type 'int'>, or defaultdict(<type 'float'>, text, only:

node node paircount pairper   pairtime
1349 606     1      somevalue  750.0
2684 937     1      somevalue  318.0

Any suggestions are appreciated.

Best answer

You can use another defaultdict to add up the times. (Edit: other changes have now been made as well.)

from collections import defaultdict
paircount = defaultdict(int)
pairtime = defaultdict(float)

with open('Inputfile.txt') as f:
    numlines = 0
    for line in f:
        numlines += 1
        line = line.split()
        pair = line[0], line[1]
        paircount[pair] += 1
        pairtime[pair] += float(line[2])

pairper = dict((pair, c * 100.0 / numlines) for (pair, c) in paircount.iteritems())
for pair, c in paircount.iteritems():
    print pair[0], pair[1], c, pairper[pair], pairtime[pair]

I also fixed an off-by-one error involving numlines, removed the redundant .strip(), and renamed some variables for clarity.
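
For readers on Python 3, the same idea can be sketched as follows. This is an illustrative port, not part of the original answer: the function names summarize_pairs and format_rows are hypothetical, skipping rows with fewer than three fields is an added assumption, and the sample rows are the small example from the question. It also prints plain rows rather than the defaultdict repr, which is the output cleanup the question asks for.

```python
from collections import defaultdict


def summarize_pairs(lines):
    """Return (pair, count, percentage, total_time) rows in first-seen order."""
    paircount = defaultdict(int)
    pairtime = defaultdict(float)
    numlines = 0
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # assumption: ignore blank or short rows
        pair = fields[0], fields[1]
        paircount[pair] += 1
        pairtime[pair] += float(fields[2])
        numlines += 1  # count only the rows that were actually used
    # The percentage is computed after the loop, once the total is known.
    return [(pair, c, 100.0 * c / numlines, pairtime[pair])
            for pair, c in paircount.items()]


def format_rows(rows):
    """Render the rows as plain text lines with a header."""
    out = ["node node paircount pairper pairtime"]
    for (a, b), count, per, total in rows:
        out.append(f"{a} {b} {count} {per:.1f} {total:.1f}")
    return out


sample = ["A B 455", "A B 456", "A B 463", "A C 4",
          "A C 675", "C B 64", "C B 78", "C B 579"]
rows = summarize_pairs(sample)
print("\n".join(format_rows(rows)))
```

To read from a file instead, pass the open file object (for example the handle from open('Inputfile.txt')) as lines, since iterating over a file yields one line at a time.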

Regarding python - Node pair occurrence rate with weights in Python, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/25469274/
