r - Create a unique ID for dyads, ignoring direction

Tags: r, uniqueidentifier

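I have a data frame of country-year imports and exports to other countries. As in the example dataset, the import and export data for a dyad do not overlap perfectly.
For example: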

    library(tidyverse)

    df <- data.frame("Reporter" = c("USA", "USA", "USA", "USA", "USA", "USA", "USA", "USA", "Africa","Africa", "Africa","Africa", "Africa","Africa", "Africa","Africa", "EU", "EU","EU", "EU", "EU", "EU","EU", "EU"), 
                     "Partner" = c("Africa","Africa", "Africa","Africa","EU", "EU","EU", "EU", "USA", "USA", "USA", "USA", "EU", "EU","EU", "EU","USA", "USA", "USA", "USA","Africa","Africa", "Africa","Africa"),
                     "Year" = c(1970, 1970, 1980, 1980, 1970, 1970, 1980, 1980, 1970, 1970, 1980, 1980, 1970, 1970, 1980, 1980,  1970, 1970, 1980, 1980, 1970, 1970, 1980, 1980), 
                     "Flow" = c("Import", "Export","Import", "Export","Import", "Export","Import", "Export","Import", "Export","Import", "Export","Import", "Export","Import", "Export","Import", "Export","Import", "Export","Import", "Export","Import", "Export"),
                     "Val" = runif(24, min=0, max=100), stringsAsFactors = FALSE)                    

    #     Reporter Partner Year Flow     Val
    # 1       USA  Africa 1970 Import 13.169790
    # 2       USA  Africa 1970 Export 28.531263
    # 3       USA  Africa 1980 Import 66.811160
    # 4       USA  Africa 1980 Export 47.556102
    # 5       USA      EU 1970 Import 59.166556
    # 6       USA      EU 1970 Export 71.032895
    # 7       USA      EU 1980 Import 89.688642
    # 8       USA      EU 1980 Export 36.563593
    # 9    Africa     USA 1970 Import 33.088294
    # 10   Africa     USA 1970 Export 10.692528
    # 11   Africa     USA 1980 Import 69.296384
    # 12   Africa     USA 1980 Export 54.697131
    # 13   Africa      EU 1970 Import 64.327314
    # 14   Africa      EU 1970 Export 64.659566
    # 15   Africa      EU 1980 Import  6.139465
    # 16   Africa      EU 1980 Export 97.317815
    # 17       EU     USA 1970 Import  7.245794
    # 18       EU     USA 1970 Export 72.291265
    # 19       EU     USA 1980 Import 14.134386
    # 20       EU     USA 1980 Export 60.288242
    # 21       EU  Africa 1970 Import 29.648374
    # 22       EU  Africa 1970 Export 81.916536
    # 23       EU  Africa 1980 Import 47.665834
    # 24       EU  Africa 1980 Export 64.307639
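
Because Val comes from runif() without a seed, the numbers change on every run (which is also why the values in the answer's output further down differ from the ones above). A minimal sketch, assuming reproducible values are wanted, builds the same 24 rows more compactly with rep() and fixes the seed; the seed value 42 is an arbitrary choice.

```r
# Same Reporter/Partner/Year/Flow layout as the data frame above, built with rep().
# set.seed(42) is arbitrary; it only makes runif() return the same Val on every run.
set.seed(42)
df <- data.frame(
  Reporter = rep(c("USA", "Africa", "EU"), each = 8),
  Partner  = rep(c("Africa", "EU", "USA", "EU", "USA", "Africa"), each = 4),
  Year     = rep(rep(c(1970, 1980), each = 2), times = 6),
  Flow     = rep(c("Import", "Export"), times = 12),
  Val      = runif(24, min = 0, max = 100),
  stringsAsFactors = FALSE
)
head(df)
```

Only the Val column differs from the original construction; the identifying columns come out row for row the same.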

I created a wide version of this data:

    wide_df <- df %>% spread("Flow", "Val")
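
For readers without the tidyverse, base R's reshape() can produce the same wide layout. This is a sketch that rebuilds example data with the same layout so that it runs on its own (the seed is arbitrary). Unlike spread(), reshape() names the new columns Val.Import and Val.Export rather than Import and Export.

```r
# Example data with the same layout as in the question (Val is random; seed is arbitrary).
set.seed(1)
df <- data.frame(
  Reporter = rep(c("USA", "Africa", "EU"), each = 8),
  Partner  = rep(c("Africa", "EU", "USA", "EU", "USA", "Africa"), each = 4),
  Year     = rep(rep(c(1970, 1980), each = 2), times = 6),
  Flow     = rep(c("Import", "Export"), times = 12),
  Val      = runif(24, min = 0, max = 100),
  stringsAsFactors = FALSE
)

# One row per Reporter/Partner/Year, one Val column per Flow value.
wide_df <- reshape(df,
                   idvar     = c("Reporter", "Partner", "Year"),
                   timevar   = "Flow",
                   direction = "wide")
rownames(wide_df) <- NULL  # drop the row names inherited from df
wide_df
```

In current tidyr, spread() is superseded by pivot_wider(); the corresponding call would be df %>% pivot_wider(names_from = Flow, values_from = Val).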

I was able to create a directed ID for the dyads:

    wide_df$ReporterID <- as.numeric(factor(wide_df$Reporter, levels = unique(wide_df$Reporter)))
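
However, the resulting IDs treat the two directions of a pair as different dyads, e.g. USA and Africa versus Africa and USA.
Question: how can I create a unique ID for each dyad, regardless of direction?
Can anyone think of a way to collapse these dyads into a single ID code?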
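
To make the problem concrete, here is a base-R sketch on a cut-down version of wide_df (one row per direction of each pair; Year and the trade values are dropped for brevity). DirectedDyad is a hypothetical column name used only for illustration.

```r
# One row per direction of each country pair.
wide_df <- data.frame(
  Reporter = c("USA", "USA", "Africa", "Africa", "EU", "EU"),
  Partner  = c("Africa", "EU", "USA", "EU", "USA", "Africa"),
  stringsAsFactors = FALSE
)

# The question's approach: this numbers reporters, not pairs.
wide_df$ReporterID <- as.numeric(factor(wide_df$Reporter,
                                        levels = unique(wide_df$Reporter)))

# Pasting Reporter and Partner gives a directed key:
# "USA_Africa" and "Africa_USA" are two different keys for the same pair.
wide_df$DirectedDyad <- paste(wide_df$Reporter, wide_df$Partner, sep = "_")
wide_df
length(unique(wide_df$DirectedDyad))  # directed keys for only 3 country pairs
```

Three country pairs yield six directed keys, so any ID built from the raw column order keeps the direction.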

Best answer

    library(tidyverse)

    # vectorised function to order and combine values
    f <- function(x, y) paste(sort(c(x, y)), collapse = "_")
    f <- Vectorize(f)

    df %>%
      spread("Flow", "Val") %>%
      mutate(ID1 = f(Reporter, Partner),
             ID2 = as.numeric(as.factor(ID1)))

    #    Reporter Partner  Year Export Import ID1          ID2
    #  1 Africa   EU       1970  56.6  98.9   Africa_EU      1
    #  2 Africa   EU       1980  95.3   2.25  Africa_EU      1
    #  3 Africa   USA      1970  50.4  10.3   Africa_USA     2
    #  4 Africa   USA      1980  29.4   3.08  Africa_USA     2
    #  5 EU       Africa   1970  88.8  56.3   Africa_EU      1
    #  6 EU       Africa   1980  53.6  48.0   Africa_EU      1
    #  7 EU       USA      1970   4.50 83.8   EU_USA         3
    #  8 EU       USA      1980  79.1   0.473 EU_USA         3
    #  9 USA      Africa   1970  61.9  37.2   Africa_USA     2
    # 10 USA      Africa   1980   9.88 39.6   Africa_USA     2
    # 11 USA      EU       1970  10.4  29.3   EU_USA         3
    # 12 USA      EU       1980  21.1  35.3   EU_USA         3

One option is ID1, which combines the actual values: the two country names, sorted alphabetically and joined with "_".
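
A quick check that ID1 really ignores direction, using the answer's own helper f. Vectorize() is part of base R, so this sketch needs no tidyverse; the test vectors are made up for illustration.

```r
# The answer's helper: sort the two names, then join them with "_".
f <- function(x, y) paste(sort(c(x, y)), collapse = "_")
f <- Vectorize(f)

# The key is the same whichever country is the reporter.
a <- f("USA", "Africa")
b <- f("Africa", "USA")
print(c(a, b))

# Vectorize() names its result after the first argument; unname() gives a plain vector.
ids <- unname(f(c("USA", "EU", "Africa"), c("Africa", "USA", "EU")))
print(ids)
```

The names attached by Vectorize() are harmless inside mutate(), but unname() is handy when comparing keys directly.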

Another option is ID2, which assigns a number based on ID1.

The ID2 numbers follow the order of the levels of the factor built from ID1, which in this case is alphabetical.
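
If you would rather number dyads in order of first appearance than alphabetically, a sketch is to use match() against unique(), or equivalently to set the factor levels explicitly (the same levels = unique(...) pattern the question used for ReporterID). The ID1 vector below is a made-up example.

```r
ID1 <- c("EU_USA", "Africa_USA", "EU_USA", "Africa_EU", "Africa_USA")

# Alphabetical levels, as in the answer's ID2: Africa_EU, Africa_USA, EU_USA.
alpha_id <- as.numeric(as.factor(ID1))

# Levels in order of first appearance: EU_USA, Africa_USA, Africa_EU.
first_seen_id  <- match(ID1, unique(ID1))
first_seen_id2 <- as.numeric(factor(ID1, levels = unique(ID1)))

print(alpha_id)
print(first_seen_id)
```

Either way, the same undirected key always maps to the same number; only the numbering order changes.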

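If you don't need the original columns Reporter and Partner, you can drop them at the end of the pipeline with select(-Reporter, -Partner). unite(ID1, Reporter, Partner, remove = TRUE) also removes them, but it pastes the names in Reporter_Partner order, so the key it creates keeps the direction.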
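
As an alternative not taken from the answer above, base R's pmin() and pmax() compare two character columns element-wise, so the alphabetically smaller name can be put first for every row at once, without Vectorize() or a per-row sort(). This sketch also drops Reporter and Partner with base-R indexing; the small data frame is a hypothetical stand-in for wide_df.

```r
wide_df <- data.frame(
  Reporter = c("Africa", "Africa", "EU", "EU", "USA", "USA"),
  Partner  = c("EU", "USA", "Africa", "USA", "Africa", "EU"),
  Year     = 1970,
  stringsAsFactors = FALSE
)

# Element-wise min/max of the two name columns: the smaller name comes first,
# whatever the direction of the row.
wide_df$ID1 <- paste(pmin(wide_df$Reporter, wide_df$Partner),
                     pmax(wide_df$Reporter, wide_df$Partner),
                     sep = "_")
# Alphabetical factor levels, as in the answer's ID2.
wide_df$ID2 <- as.numeric(as.factor(wide_df$ID1))

# Base-R counterpart of select(-Reporter, -Partner).
ids_only <- wide_df[setdiff(names(wide_df), c("Reporter", "Partner"))]
print(ids_only)
```

Inside a dplyr pipeline the same idea fits in mutate(ID1 = paste(pmin(Reporter, Partner), pmax(Reporter, Partner), sep = "_")). For plain country names like these it gives the same keys as the sort-based helper, and because it works on whole columns it tends to scale better on large panels.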

A similar question about r - Create a unique ID for dyads, ignoring direction can be found on Stack Overflow: https://stackoverflow.com/questions/52316998/
