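I have read a csv file into R that contains co-authorship data along with other information. The author column of the file holds co-authorship information like this: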
Miyazaki T., Akisawa A., Saha B.B., El-Sharkawy I.I., Chakraborty A.
Saha B.B., Chakraborty A., Koyama S., Aristov Y.I.
Ali S.M., Chakraborty A.
...
I would like to convert this information into an edge list of the following form:
Miyazaki T. Akisawa A.
Miyazaki T. Saha B.B.
Miyazaki T. El-Sharkawy I.I.
Miyazaki T. Chakraborty A.
Akisawa A. Saha B.B.
Akisawa A. El-Sharkawy I.I.
Akisawa A. Chakraborty A.
Saha B.B. El-Sharkawy I.I.
Saha B.B. Chakraborty A.
El-Sharkawy I.I. Chakraborty A.
Saha B.B. Chakraborty A.
Saha B.B. Koyama S.
....
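Basically, the network is an undirected graph. Any help or starter code would be greatly appreciated. Also, is there a way to keep the count/frequency of collaborations (e.g. in the example, Saha has published with Chakraborty twice)?
Given that your input data (dat in my example) holds one paper per row and uses NA for missing values, because most papers have fewer authors than the maximum, you can use the following R code: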
# data
dat <- rbind(c("Miyazaki T.", "Akisawa A.", "Saha B.B.", "El-Sharkawy I.I.", "Chakraborty A."),
             c("Saha B.B.", "Chakraborty A.", "Koyama S.", "Aristov Y.I.", NA),
             c("Ali S.M.", "Chakraborty A.", NA, NA, NA))
# loop through all rows of dat (all papers, I presume)
transformed.dat <- lapply(seq_len(nrow(dat)), function(row.num) {
  row.el <- dat[row.num, ]          # the row (paper) handled in this iteration
  row.el <- row.el[!is.na(row.el)]  # drop the NA padding
  # number of authors per paper
  n.authors <- length(row.el)
  # a single-author paper has no pairs, and combn() would throw an error for it
  if (n.authors < 2) {
    return(data.frame(aut1 = character(0), aut2 = character(0)))
  }
  # matrix of all index pairs, one pair per column (try combn(4, 2) to see what it does)
  pairings <- combn(n.authors, 2)
  # create a data.frame with names aut1 and aut2
  data.frame(aut1 = row.el[pairings[1, ]],
             aut2 = row.el[pairings[2, ]],
             stringsAsFactors = FALSE)
})
# use data.table's rbindlist to bind the list of combinations together
final.dat <- data.table::rbindlist(transformed.dat)
final.dat
#                 aut1             aut2
#  1:      Miyazaki T.       Akisawa A.
#  2:      Miyazaki T.        Saha B.B.
#  3:      Miyazaki T. El-Sharkawy I.I.
#  4:      Miyazaki T.   Chakraborty A.
#  5:       Akisawa A.        Saha B.B.
#  6:       Akisawa A. El-Sharkawy I.I.
#  7:       Akisawa A.   Chakraborty A.
#  8:        Saha B.B. El-Sharkawy I.I.
#  9:        Saha B.B.   Chakraborty A.
# 10: El-Sharkawy I.I.   Chakraborty A.
# 11:        Saha B.B.   Chakraborty A.
# 12:        Saha B.B.        Koyama S.
# 13:        Saha B.B.     Aristov Y.I.
# 14:   Chakraborty A.        Koyama S.
# 15:   Chakraborty A.     Aristov Y.I.
# 16:        Koyama S.     Aristov Y.I.
# 17:         Ali S.M.   Chakraborty A.
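As for the count/frequency part of the question, here is a minimal sketch in base R. It assumes the author column is available as a character vector with one comma-separated string per paper (the vector name authors and the helper names below are made up for illustration). Because the graph is undirected, the authors of each paper are sorted before pairing, so that "A, B" and "B, A" end up as the same edge:

# hypothetical input: one comma-separated author string per paper
authors <- c(
  "Miyazaki T., Akisawa A., Saha B.B., El-Sharkawy I.I., Chakraborty A.",
  "Saha B.B., Chakraborty A., Koyama S., Aristov Y.I.",
  "Ali S.M., Chakraborty A."
)
# split each string into a vector of author names
author.list <- strsplit(authors, ",\\s*")
# one two-column matrix of pairs per paper
pair.list <- lapply(author.list, function(a) {
  a <- sort(unique(trimws(a)))             # sorted, so each pair has a fixed order
  if (length(a) < 2) return(NULL)          # single-author papers have no pairs
  t(combn(a, 2))                           # one row per pair
})
edges <- as.data.frame(do.call(rbind, pair.list), stringsAsFactors = FALSE)
names(edges) <- c("aut1", "aut2")
# count how often each pair occurs
edges$n <- 1
edge.counts <- aggregate(n ~ aut1 + aut2, data = edges, FUN = sum)
# most frequent collaborations first
edge.counts[order(-edge.counts$n), ]

With the three example papers, the pair Chakraborty A. / Saha B.B. gets n = 2 and every other pair gets n = 1. The n column can then serve as an edge weight if you pass the result on to a graph package.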