arrays - Finding all the largest sequences

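What is a suitable algorithm or strategy for clustering patterns in a multidimensional array of numbers whose elements have different lengths?

An example is an array containing these elements: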

0: [4,2,8,5,3,2,8]
1: [1,3,6,2]
2: [8,3,8]
3: [3,2,5,2,1,8]

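The goal is to find patterns in these lists of numbers and cluster them. For example, element "3" contains the pattern "2,5,2,8" (not contiguous), which can also be found in element "0". The numbers of the pattern are not contiguous in either element "0" or element "3", but they appear in the same order.

Note: the example uses integers for clarity, but the real data will be floats that are rarely exactly equal; two values will be considered a "match" when they lie within a given threshold of each other.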
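To make the matching rule concrete, here is a minimal sketch of an ordered, non-contiguous pattern check with a threshold. The function name contains_pattern and the parameter tol are hypothetical, and an absolute difference is assumed as the distance measure; the real data may call for a different one.

```python
def contains_pattern(pattern, sequence, tol=0.0):
    """Return True if `pattern` occurs in `sequence` in order (gaps allowed).

    Two values count as a match when they differ by at most `tol`
    (an assumed absolute threshold).
    """
    pos = 0
    for p in pattern:
        # Skip ahead to the earliest remaining value that matches p.
        while pos < len(sequence) and abs(sequence[pos] - p) > tol:
            pos += 1
        if pos == len(sequence):
            return False  # ran out of values before matching p
        pos += 1  # consume the matched value
    return True


# The pattern from the example is found in elements 0 and 3 but not in 1.
print(contains_pattern([2, 5, 2, 8], [4, 2, 8, 5, 3, 2, 8]))  # True
print(contains_pattern([2, 5, 2, 8], [3, 2, 5, 2, 1, 8]))     # True
print(contains_pattern([2, 5, 2, 8], [1, 3, 6, 2]))           # False
```

With floats, the same call takes a non-zero threshold, for instance contains_pattern([2.0, 5.0], [4.1, 2.04, 8.0, 4.97], tol=0.05).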

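Edit 2: Although Abhishek Bansai's approach is helpful, if we select only the longest common subsequence we may miss other important patterns. Take these two sequences, for example: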

0: [4,5,2,1,3,6,8,9]
1: [2,1,3,4,5,6,7,8]

The longest common subsequence is [2,1,3,6,8], but we would miss another important subsequence, [4,5,6,8].
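One way to see both patterns at once is to list every common subsequence that cannot be extended any further, rather than only the longest one. The sketch below does this by brute force; it is exponential in the length of the first sequence, so it is only an illustration for short inputs, not a proposed solution. The function names are hypothetical, and exact equality is assumed instead of a threshold.

```python
from itertools import combinations


def is_subsequence(pattern, sequence):
    """True if `pattern` occurs in `sequence` in order, gaps allowed."""
    it = iter(sequence)
    return all(p in it for p in pattern)


def maximal_common_subsequences(a, b):
    """Brute force: all common subsequences not contained in a longer one."""
    common = set()
    for r in range(1, len(a) + 1):
        for idx in combinations(range(len(a)), r):
            candidate = tuple(a[i] for i in idx)
            if is_subsequence(candidate, b):
                common.add(candidate)
    # Keep a candidate only if no other common subsequence contains it.
    maximal = [c for c in common
               if not any(c != d and is_subsequence(c, d) for d in common)]
    return sorted(maximal, key=lambda c: (-len(c), c))


s0 = [4, 5, 2, 1, 3, 6, 8, 9]
s1 = [2, 1, 3, 4, 5, 6, 7, 8]
print(maximal_common_subsequences(s0, s1))
# [(2, 1, 3, 6, 8), (4, 5, 6, 8)]
```

For these two sequences the result contains exactly the two patterns mentioned above: the longest common subsequence and the shorter one that a plain LCS would drop.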

Edit 1: Abhishek Bansai's answer looks like a good way to solve this problem.

It is the Longest Common Subsequence algorithm.

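Comparing every element with every other element using this algorithm will return all the patterns; the next step is to generate clusters from those patterns.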
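The pairwise comparison described here could be sketched as follows. The helper names lcs and pairwise_patterns are hypothetical, exact equality is assumed (for floats the == test would be replaced by a threshold check), and the memoized recursion is only suitable for short sequences because of Python's recursion limit.

```python
from functools import lru_cache
from itertools import combinations


def lcs(a, b):
    """Return one longest common subsequence of a and b as a list."""
    a, b = tuple(a), tuple(b)

    @lru_cache(maxsize=None)
    def go(i, j):
        # Longest common subsequence of a[i:] and b[j:].
        if i == len(a) or j == len(b):
            return ()
        if a[i] == b[j]:
            return (a[i],) + go(i + 1, j + 1)
        skip_a, skip_b = go(i + 1, j), go(i, j + 1)
        return skip_a if len(skip_a) >= len(skip_b) else skip_b

    return list(go(0, 0))


def pairwise_patterns(arrays):
    """Map each index pair (i, j), i < j, to the LCS of the two arrays."""
    return {(i, j): lcs(arrays[i], arrays[j])
            for i, j in combinations(range(len(arrays)), 2)}


data = [
    [4, 2, 8, 5, 3, 2, 8],
    [1, 3, 6, 2],
    [8, 3, 8],
    [3, 2, 5, 2, 1, 8],
]
for pair, pattern in pairwise_patterns(data).items():
    print(pair, pattern)
```

For the example data, the pair (0, 3) yields [2, 5, 2, 8], the pattern described in the question. The resulting dictionary is one possible input for the clustering step, which is not shown here.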

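Best answer

Since you seem more interested in finding the "similarity" between sequences by finding all matches (per Edits 1 and 2), you will find that there is a large body of research in the field of Sequence Alignment. From Wikipedia: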

In bioinformatics, a sequence alignment is a way of arranging the sequences of DNA, RNA, or protein to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships between the sequences. Aligned sequences of nucleotide or amino acid residues are typically represented as rows within a matrix. Gaps are inserted between the residues so that identical or similar characters are aligned in successive columns.
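As a rough illustration of what an alignment with inserted gaps looks like on numeric lists, here is a minimal sketch of global alignment in the style of Needleman-Wunsch. The answer does not name a specific algorithm, so this choice, the function name align, and the scoring values (match=1, mismatch=-1, gap=-1) are all assumptions. Gaps are represented by None.

```python
def align(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align a and b; return (row_a, row_b, score).

    Scoring values are illustrative assumptions, not tuned parameters.
    """
    n, m = len(a), len(b)

    def sub(i, j):
        # Score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one alignment.
    row_a, row_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            row_a.append(a[i - 1]); row_b.append(b[j - 1])
            i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            row_a.append(a[i - 1]); row_b.append(None)  # gap in b
            i -= 1
        else:
            row_a.append(None); row_b.append(b[j - 1])  # gap in a
            j -= 1
    return row_a[::-1], row_b[::-1], score[n][m]


print(align([1, 2, 3], [1, 3]))
# ([1, 2, 3], [1, None, 3], 1)
```

For float data, the equality test in sub would need to be replaced by a threshold comparison. A local alignment method may suit the question better when only parts of the sequences are similar.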

A similar question about arrays - finding all the largest sequences can be found on Stack Overflow: https://stackoverflow.com/questions/23966882/
