r - How to prepare input data for a Sankey diagram in R?

Tags: r data-visualization sankey-diagram networkd3

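I am trying to produce a Sankey diagram in R, also known as a river plot. I have seen the question "Sankey Diagrams in R?", which lists a variety of packages that produce Sankey diagrams. Given input data in the right shape, I can produce such a diagram with any of those tools/packages, but my question is: how do I prepare the input data?

Suppose we want to show how users migrate between different states over 10 days, and we start from a data set like this: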

data.frame(userID = 1:100,
                     day1_state = sample(letters[1:8], replace = TRUE, size = 100),
                     day2_state = sample(letters[1:8], replace = TRUE, size = 100),
                     day3_state = sample(letters[1:8], replace = TRUE, size = 100),
                     day4_state = sample(letters[1:8], replace = TRUE, size = 100),
                     day5_state = sample(letters[1:8], replace = TRUE, size = 100),
                     day6_state = sample(letters[1:8], replace = TRUE, size = 100),
                     day7_state = sample(letters[1:8], replace = TRUE, size = 100),
                     day8_state = sample(letters[1:8], replace = TRUE, size = 100),
                     day9_state = sample(letters[1:8], replace = TRUE, size = 100),
                     day10_state = sample(letters[1:8], replace = TRUE, size = 100)
                     ) -> dt

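Now, if you want to create a Sankey diagram with the networkD3 package, how should this dt data.frame be transformed into the required input,

so that it can be passed in the same way as the input in this example?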
library(networkD3)
URL <- paste0(
        "https://cdn.rawgit.com/christophergandrud/networkD3/",
        "master/JSONdata/energy.json")
Energy <- jsonlite::fromJSON(URL)
# Plot
sankeyNetwork(Links = Energy$links, Nodes = Energy$nodes, Source = "source",
             Target = "target", Value = "value", NodeID = "name",
             units = "TWh", fontSize = 12, nodeWidth = 30)
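
For orientation, the two inputs that sankeyNetwork() consumes in the example above can also be built by hand. The following is a minimal sketch with invented data (the names toy_nodes, toy_links and toy_plot are hypothetical): Nodes is a data frame with one row per node, and Links refers to nodes by zero-based row index.

```r
# Toy input in the shape sankeyNetwork() expects (all values invented for illustration).
toy_nodes <- data.frame(name = c("a day1", "b day1", "a day2", "b day2"),
                        stringsAsFactors = FALSE)

# source/target are zero-based row indices into toy_nodes; value is the flow size.
toy_links <- data.frame(source = c(0, 0, 1),
                        target = c(2, 3, 3),
                        value  = c(5, 2, 4))

# Build the widget only if networkD3 is installed; print(toy_plot) renders it.
if (requireNamespace("networkD3", quietly = TRUE)) {
  toy_plot <- networkD3::sankeyNetwork(Links = toy_links, Nodes = toy_nodes,
                                       Source = "source", Target = "target",
                                       Value = "value", NodeID = "name")
}
```

So the data preparation task for dt comes down to counting the transitions between consecutive days and mapping each "state + day" label to such an index.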

Edit

I found a script that prepares data for a different case and adapted it, so I think this question can now be closed:

https://github.com/mi2-warsaw/JakOniGlosowali/blob/master/sankey/sankey.R

Best answer

I found a script that prepares data for a different case and adapted it, so I think this question can now be closed:

https://github.com/mi2-warsaw/JakOniGlosowali/blob/master/sankey/sankey.R

The following code then generates such a Sankey diagram for the data.frame from the question:

# Cross-tabulate two label vectors. The special-case branch is inherited from
# the linked script and only runs when the first row and column labels start
# with "_"; for the state labels used here it is skipped, so the function
# simply returns table(...).
fixtable <- function(...) {
    tab <- table(...)
    if (substr(colnames(tab)[1], 1, 1) == "_" &&
        substr(rownames(tab)[1], 1, 1) == "_") {
        tab2 <- tab
        colnames(tab2) <- sapply(strsplit(colnames(tab2), split = " "), `[`, 1)
        rownames(tab2) <- sapply(strsplit(rownames(tab2), split = " "), `[`, 1)
        tab2[1, 1] <- 0
        # seat stays within the club (translated from the Polish comment)
        for (par in names(which(tab2[1, ] > 0))) {
            delta <- min(tab2[par, 1], tab2[1, par])
            tab2[par, par] <- tab2[par, par] + delta
            tab2[1, par] <- tab2[1, par] - delta
            tab2[par, 1] <- tab2[par, 1] - delta
        }
        # passes through the independents, "niez." (translated from the Polish comment)
        for (par in names(which(tab2[1, ] > 0))) {
            tab2["niez.", par] <- tab2["niez.", par] + tab2[1, par]
            tab2[1, par] <- 0
        }
        for (par in names(which(tab2[, 1] > 0))) {
            tab2[par, "niez."] <- tab2[par, "niez."] + tab2[par, 1]
            tab2[par, 1] <- 0
        }

        tab[] <- tab2[]
    }
    tab
}


# One contingency table per pair of consecutive days, stacked into a long
# data frame with columns z (from), do (to) and Freq (number of users).
# The data frame from the question is called dt.
flow2 <- rbind(
    data.frame(fixtable(z = paste0(dt$day1_state, " day1"), do = paste0(dt$day2_state, " day2"))),
    data.frame(fixtable(z = paste0(dt$day2_state, " day2"), do = paste0(dt$day3_state, " day3"))),
    data.frame(fixtable(z = paste0(dt$day3_state, " day3"), do = paste0(dt$day4_state, " day4"))),
    data.frame(fixtable(z = paste0(dt$day4_state, " day4"), do = paste0(dt$day5_state, " day5"))),
    data.frame(fixtable(z = paste0(dt$day5_state, " day5"), do = paste0(dt$day6_state, " day6"))),
    data.frame(fixtable(z = paste0(dt$day6_state, " day6"), do = paste0(dt$day7_state, " day7"))),
    data.frame(fixtable(z = paste0(dt$day7_state, " day7"), do = paste0(dt$day8_state, " day8"))),
    data.frame(fixtable(z = paste0(dt$day8_state, " day8"), do = paste0(dt$day9_state, " day9"))),
    data.frame(fixtable(z = paste0(dt$day9_state, " day9"), do = paste0(dt$day10_state, " day10"))))

flow2 <- flow2[flow2[,3] > 0,]

nodes2 <- data.frame(name=unique(c(levels(factor(flow2[,1])), levels(factor(flow2[,2])))))
nam2 <- seq_along(nodes2[,1])-1
names(nam2) <- nodes2[,1]

links2 <- data.frame(source = nam2[as.character(flow2[,1])],
                                        target = nam2[as.character(flow2[,2])],
                                        value = flow2[,3])

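library(networkD3)

# Pass the objects built above (links2 and nodes2).
sankeyNetwork(Links = links2, Nodes = nodes2,
              Source = "source", Target = "target",
              Value = "value", NodeID = "name",
              fontFamily = "Arial", fontSize = 12, nodeWidth = 40,
              # D3 v3 syntax; newer networkD3 releases (built on D3 v4) expect
              # JS("d3.scaleOrdinal(d3.schemeCategory20);") here instead.
              colourScale = "d3.scale.category20()")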
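
Since none of the state labels in this data set start with "_", fixtable() reduces to a plain table() call here, so the same kind of links and nodes can be produced more compactly. The sketch below is one possible base-R variant, not the original author's code. The helper name make_sankey_input, the demo data, and the assumption that the state columns are named day1_state, day2_state, ... and passed in chronological order are all mine.

```r
# Hypothetical helper: turn wide per-day state columns into networkD3-style input.
make_sankey_input <- function(df, state_cols) {
  # One block of (from, to, Freq) rows per pair of consecutive days.
  flows <- lapply(seq_len(length(state_cols) - 1), function(i) {
    from <- paste(df[[state_cols[i]]],     sub("_state$", "", state_cols[i]))
    to   <- paste(df[[state_cols[i + 1]]], sub("_state$", "", state_cols[i + 1]))
    as.data.frame(table(from = from, to = to), stringsAsFactors = FALSE)
  })
  flows <- do.call(rbind, flows)
  flows <- flows[flows$Freq > 0, ]   # drop transitions nobody made

  # Nodes are the distinct "state day" labels; links use zero-based indices.
  nodes <- data.frame(name = unique(c(flows$from, flows$to)),
                      stringsAsFactors = FALSE)
  links <- data.frame(source = match(flows$from, nodes$name) - 1,
                      target = match(flows$to,   nodes$name) - 1,
                      value  = flows$Freq)
  list(links = links, nodes = nodes)
}

# Small demo data set (invented): 100 users, 3 states, 3 days.
set.seed(1)
demo <- data.frame(userID = 1:100,
                   day1_state = sample(letters[1:3], 100, replace = TRUE),
                   day2_state = sample(letters[1:3], 100, replace = TRUE),
                   day3_state = sample(letters[1:3], 100, replace = TRUE))
res <- make_sankey_input(demo, paste0("day", 1:3, "_state"))
```

res$links and res$nodes can then be passed to sankeyNetwork() with Source = "source", Target = "target", Value = "value" and NodeID = "name". For the data from the question, the call would be make_sankey_input(dt, paste0("day", 1:10, "_state")).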

Regarding "r - How to prepare input data for a Sankey diagram in R?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/33499476/
