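This seemingly simple problem is giving me a real headache (this is not homework, but a sticking point in actual research).
I have a list with 2266 levels. The list looks a bit like this: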
[1] ~/folder1/folder1/a.bin
[2] ~/folder1/folder1/b.bin
[3] ~/folder1/folder1/c.bin
[4] ~/folder1/folder2/a.bin
[5] ~/folder1/folder2/b.bin
[6] ~/folder1/folder2/c.bin
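To explain: the list holds the file names of binary files that I read with the readBin function. I want to compare every line with every other line, so what I want is two columns containing all unique combinations derived from my single column.
choose(2266, 2)
tells me that there are 2566245 combinations of my single column into two columns.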
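expand.grid() seems to get me halfway there. But it gives twice as many combinations as I need: I get two columns of 5132490 rows each. This means there are duplicates: 1 + 2 and 2 + 1 are the same for my purposes.
expand.grid.df with unique=TRUE does not seem to help either.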
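My last idea was to md5-hash each of the 5 million rows and try to detect duplicates that way.
I am looking for some way to make two lists covering the 2566245 combinations of my list, or some way to remove all the duplicates. I am not completely wedded to R, and have looked into awk or sed to do the same thing, but without success so far.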
Best answer
I think you are looking for combn, which is similar to expand.grid. Using @Arun's data:
v <- c("~/folder1/folder1/a.bin",
"~/folder1/folder1/b.bin",
"~/folder1/folder1/c.bin",
"~/folder1/folder2/a.bin",
"~/folder1/folder2/b.bin",
"~/folder1/folder2/c.bin")
do.call(rbind, combn(v, 2, simplify = FALSE))
      [,1]                      [,2]
 [1,] "~/folder1/folder1/a.bin" "~/folder1/folder1/b.bin"
 [2,] "~/folder1/folder1/a.bin" "~/folder1/folder1/c.bin"
 [3,] "~/folder1/folder1/a.bin" "~/folder1/folder2/a.bin"
 [4,] "~/folder1/folder1/a.bin" "~/folder1/folder2/b.bin"
 [5,] "~/folder1/folder1/a.bin" "~/folder1/folder2/c.bin"
 [6,] "~/folder1/folder1/b.bin" "~/folder1/folder1/c.bin"
 [7,] "~/folder1/folder1/b.bin" "~/folder1/folder2/a.bin"
 [8,] "~/folder1/folder1/b.bin" "~/folder1/folder2/b.bin"
 [9,] "~/folder1/folder1/b.bin" "~/folder1/folder2/c.bin"
[10,] "~/folder1/folder1/c.bin" "~/folder1/folder2/a.bin"
[11,] "~/folder1/folder1/c.bin" "~/folder1/folder2/b.bin"
[12,] "~/folder1/folder1/c.bin" "~/folder1/folder2/c.bin"
[13,] "~/folder1/folder2/a.bin" "~/folder1/folder2/b.bin"
[14,] "~/folder1/folder2/a.bin" "~/folder1/folder2/c.bin"
[15,] "~/folder1/folder2/b.bin" "~/folder1/folder2/c.bin"
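As a side note, the same two-column result can probably be obtained a little more directly by transposing the matrix that combn returns by default. The sketch below is only an illustration on the six example paths; the variable name pairs is introduced here and is not part of the original answer.

```r
# Example data, same six paths as above
v <- c("~/folder1/folder1/a.bin",
       "~/folder1/folder1/b.bin",
       "~/folder1/folder1/c.bin",
       "~/folder1/folder2/a.bin",
       "~/folder1/folder2/b.bin",
       "~/folder1/folder2/c.bin")

# combn(v, 2) returns a 2 x choose(n, 2) matrix with one pair per column;
# t() turns it into one pair per row
pairs <- t(combn(v, 2))

# Sanity check: the row count should match choose(n, 2)
stopifnot(nrow(pairs) == choose(length(v), 2))
stopifnot(ncol(pairs) == 2)
```

The row-count check against choose() is the same test the question applies to the full list of 2266 names.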
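Edit
I think the path format makes the problem look more complicated than it is. If we use, for example, letters instead of file names, we get: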
do.call(rbind, combn(letters[1:4], 2, simplify = FALSE))
     [,1] [,2]
[1,] "a"  "b"
[2,] "a"  "c"
[3,] "a"  "d"
[4,] "b"  "c"
[5,] "b"  "d"
[6,] "c"  "d"
So as you can see, there are no duplicates.
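For comparison, here is a rough sketch of why expand.grid produces the doubled count mentioned in the question, and one way the mirrored pairs could be dropped if you wanted to stay with expand.grid. It works on integer positions rather than on the strings themselves; the names x, idx, keep and pairs are made up for this illustration.

```r
x <- letters[1:4]
n <- length(x)

# expand.grid builds every ordered pair of positions, including (i, i)
idx <- expand.grid(i = seq_len(n), j = seq_len(n))
stopifnot(nrow(idx) == n^2)

# Dropping self-pairs still leaves each pair twice, as (i, j) and (j, i)
stopifnot(sum(idx$i != idx$j) == 2 * choose(n, 2))

# Keeping only i < j leaves each unordered pair exactly once
keep <- idx[idx$i < idx$j, ]
pairs <- data.frame(first = x[keep$i], second = x[keep$j])
stopifnot(nrow(pairs) == choose(n, 2))
```

With n = 2266 the same arithmetic gives 2266 * 2265 = 5132490 ordered pairs without self-pairs and choose(2266, 2) = 2566245 unordered pairs. The rows come out in a different order than with combn, so this is an alternative route rather than a drop-in replacement.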
Regarding "r - making two columns from one column, covering all combinations", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/15309433/