performance - 不使用 Matlab 内置函数的无向图中节点唯一邻居的有效方法

我有一个由 N 个节点和 L 个链接组成的无向图，其结构通常由其邻接矩阵 A 和 描述N 行和 N 列。在这个邻接矩阵中，值“1”表示两个节点之间的链接，相反，“0”表示没有链接。

我在 Matlab 工作，想避免内置 graph() 函数提供的所有复杂功能，例如 N = neighbors(G, NODEID) 其中 G 是 G = graph(adjm); 而 adjm 是邻接矩阵。

在此图中，我需要确定哪些唯一节点(邻居)连接到一个或多个起始节点，标记为 idx。我目前正在使用以下代码行:

fidx = unique(ceil(find(adjm(idx,:)) / numel(idx)));

其中 fidx 包含一组 idx 起始节点的邻居(连接节点)。这个操作针对不同的节点重复了很多次，经过一些分析，我发现指令 find(adjm(idx,:)) 确实阻碍了整体运行时间。我的第一次尝试是尽量避免 find()。例如使用:

A = adjm(idx,:);
A = A(:) == 1;
V = 1:size(A,1);
idx = unique(ceil(V(A) / numel(idx)));

但是，这对我的应用程序来说仍然太慢了。

对于减少与该特定代码部分相关的运行时间，您有什么建议？

这是一个最小的例子:

%Minimal example:
f = @() get_neighb();
timeit(f)

function get_neighb()
%generate a graph with a large number of nodes N, in order to have
%computations times large enough:
N = 2e4; %number of nodes
G = round( rand(N) / 1.5 );
G = triu(G) + triu(G,1)';
adjm = G - diag(diag(G));

%for visualisation purposes:
%plot(graph(adjm));

s = size(adjm,1);

%list of some nodes to search for (randomly chosen):
n_nodes_test = 2000; %number of couples to test
%In this minimal example, we focus on searching for the neighbors of each couple (2-tuple) of
%nodes only. But 'snodes' can be single nodes or n-tuples, i.e. containing more
%than 2 nodes
snodes = [ round(linspace(1, s/2, n_nodes_test)); round(linspace(s/2, s, n_nodes_test))]';

%This section is the critical one:
for i=1:n_nodes_test
    
    idx = snodes(i,:); %assign current couple of starting nodes to 'idx'.
    
    %fidx = unique(ceil(find(adjm(idx,:)) / numel(idx)));
    %can be replaced by the following:
    A = adjm(idx,:);
    A = A(:) == 1;
    V = 1:size(A,1);
    idxf = unique(ceil(V(A) / numel(idx)));
    
    %which is similar to (using Matlab built-in 'graph' functions):
    % G = graph(adjm);
    % T = unique([neighbors(G, snodes(1,1)); neighbors(G, snodes(1,2))]);
    % here idxf' == T
end


end

基准

这是所提议解决方案的简短基准。关于如何使用 timeit() 或 profiler 准确测量运行时间，有几个建议。在此基准测试中，我们坚持使用分析器来测量每个函数 find_neighb_v() 的运行时间。在上面发布的最小示例中，大约 80% 的运行时间专门用于构建图 adjm。然而，我们的目标是测量检索节点邻居的最快方法，我们不想测量图形创建的时间。

关于图(G)的注释

Matlab 内置套件 graph() 提供了实用且高效的函数来返回节点的邻居。然而，对于邻接矩阵 adjm 的每次直接修改，都需要通过调用 graph(G) 来更新 graph() 结构，这可能非常耗时，特别是对于大型网络。可以直接修改 graph() 结构的底层节点和边列表，但对于使用邻接矩阵属性(例如使用频谱分析)直接处理图拓扑的人来说，这是不可能的。此处还提供了 graph() 函数的运行时间以供比较。

代码

%Benchmarck using profiler:

%generate a graph with a large number of nodes N, in order to have
%computations times large enough:
N = 1.5e4; %number of nodes
G = round( rand(N) / 1.5 );
G = triu(G) + triu(G,1)';
adjm = G - diag(diag(G));

%for visualisation purposes:
%plot(graph(adjm));

s = size(adjm,1);

%list of some nodes to search for (randomly chosen):
n_nodes_test = 2000; %number of couples to test
%In this minimal example, we focus on searching for the neighbors of each couple (2-tuple) of
%nodes only. But 'snodes' can be single nodes or n-tuples, i.e. containing more
%than 2 nodes
snodes = [ round(linspace(1, s/2, n_nodes_test)); round(linspace(s/2, s, n_nodes_test))]';

G_test = graph(adjm);

profile on
%1st version, using find()
find_neighb_v1(adjm, snodes, n_nodes_test);
%2nd version, without find()
find_neighb_v2(adjm, snodes, n_nodes_test);
%3rd version, using any() and find()
find_neighb_v3(adjm, snodes, n_nodes_test);
%base version, using built-in graph() and neighbors-) functions
find_neighb_base(G_test, snodes, n_nodes_test);
profile viewer

function find_neighb_v1(adjm, snodes, n_nodes_test)
    for i=1:n_nodes_test
        idx = snodes(i,:); %assign current couple of starting nodes to 'idx'.
        
        idxf = unique(ceil(find(adjm(idx,:)) / numel(idx)));
    end
end

function find_neighb_v2(adjm, snodes, n_nodes_test)

    for i=1:n_nodes_test
        idx = snodes(i,:); %assign current couple of starting nodes to 'idx'.
    
        A = adjm(idx,:);
        A = A(:) == 1;
        V = 1:size(A,1);
        idxf = unique(ceil(V(A) / numel(idx)));
    end

end

function find_neighb_v3(adjm, snodes, n_nodes_test)

    for i=1:n_nodes_test
        idx = snodes(i,:); %assign current couple of starting nodes to 'idx'.
    
        idxf = find(any(adjm(idx, :)));
    end

end

function find_neighb_base(G, snodes, n_nodes_test)

    for i=1:n_nodes_test
        idx = snodes(i,:); %assign current couple of starting nodes to 'idx'.
        
        idxf = unique([neighbors(G, idx(1)); neighbors(G, idx(2))]);
    end

end

结果与结论

在一台简单的办公电脑上得到的结果:

find_neighb_v1 : 1.437s
find_neighb_v2 : 1.314s
find_neighb_v3 : 0.995s
find_neighb_base : 0.666s

并证明所提出的解决方案 (v3) 更有效，但受到 find() 的阻碍。

最佳答案

您可以使用 any ，所以代码将简化为:

idxf = find(any(adjm(idx, :)));

解释:
假设您有两个逻辑行向量 A 和 B，并且您想要找到包含 1 的向量 C其中 A 或 B 的任何元素都是 1。换句话说，我们想要找到A 或B 的唯一索引，其中它们包含1。我们可以简单地使用逻辑 OR 运算符:

C = A | B;

等效地，您可以使用任何函数来计算C:

C = any([A;B]);

当我们使用 idx = find(C) 时，索引将是唯一的，不需要使用 unique 函数。

第二种解决方案

idxf = find(idx * adjm);

这里我们可以使用矩阵乘法向量 idx * adjm 等价于 sum(adjm(idx, :))。然而，在这种情况下，索引向量 idx 被假定为逻辑向量，其长度与 adjm 的列相同。

关于performance - 不使用 Matlab 内置函数的无向图中节点唯一邻居的有效方法，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/73484902/

performance - 不使用 Matlab 内置函数的无向图中节点唯一邻居的有效方法

基准

关于图(G)的注释

代码

结果与结论

上一篇：c++20 默认比较运算符和空基类

下一篇：python - 如何快速填充一个序列