r - Add a character to duplicated rows in a column

Tags: r

I have a data frame that looks like this:

          var1         var2
1  927720_2005  927720_2006
2  927720_2006  927720_2007
3  841555_2005  841555_2006
4   88095_2005   88095_2006
5 1003464_2005 1003464_2006
6 1003464_2005 1003464_2006
7 1003464_2006 1003464_2007
8 1037388_2005 1037388_2006
9 1037388_2006 1037388_2007

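The observation 1003464_2005 in column var1 is duplicated, so when I apply rownames(MyMatrix) <- df$var1, the row names end up with one observation as 1003464_2005 and the other as 1003464_2005.1. I don't mind this, but when I use colnames(MyMatrix) <- df$var2, the column names are allowed to contain duplicates.

I would like to get the data as: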

            var1           var2
1    927720_2005    927720_2006
2    927720_2006    927720_2007
3    841555_2005    841555_2006
4     88095_2005     88095_2006
5   1003464_2005   1003464_2006
6 1003464_2005.1 1003464_2006.1
7   1003464_2006   1003464_2007
8   1037388_2005   1037388_2006
9   1037388_2006   1037388_2007

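If I have "3 duplicates" in var1, just add another "counter", giving 1003464_2005.2 or 1003464_2005.1.1. That way I would have no duplicates in the var1 column, and whatever has been "added" to var1 is also added to the var2 column.

Thanks in advance for your help!

Data: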

df <- structure(list(var1 = structure(c(7L, 8L, 5L, 6L, 1L, 1L, 2L, 
3L, 4L), .Label = c("1003464_2005", "1003464_2006", "1037388_2005", 
"1037388_2006", "841555_2005", "88095_2005", "927720_2005", "927720_2006"
), class = "factor"), var2 = c("927720_2006", "927720_2007", 
"841555_2006", "88095_2006", "1003464_2006", "1003464_2006", 
"1003464_2007", "1037388_2006", "1037388_2007")), class = "data.frame", row.names = c(NA, 
-9L))

With the duplicated row names, my matrix looks like this:

structure(c(0.0000000000000000111365086910415, 0.0242390433922595, 
0.294121286748089, 0.302965878225595, 0.259626633772708, 0.25760904856241, 
0.248574305825551, 0.17848782814175, 0.191657814393258, 0.0242390433922595, 
0.0000000000000000113968217215608, 0.310381807852827, 0.293653514681392, 
0.245957439956465, 0.249142123526167, 0.251115609138352, 0.166302748882678, 
0.176256028117321, 0.294121286748089, 0.310381807852827, -0.00000000000000000151197688178523, 
0.355703128500295, 0.319662657194485, 0.317127296846476, 0.305644319511071, 
0.255031411391534, 0.275597914790561, 0.302965878225595, 0.293653514681392, 
0.355703128500295, 0.00000000000000000801369440490437, 0.309841957462355, 
0.311910981514099, 0.317253692884325, 0.254335300246398, 0.265496031285385, 
0.259626633772708, 0.245957439956465, 0.319662657194485, 0.309841957462355, 
0.0000000000000000105380873106143, 0.0104634838149491, 0.0245937753301301, 
0.221744045353809, 0.22476375867925, 0.25760904856241, 0.249142123526167, 
0.317127296846476, 0.311910981514099, 0.0104634838149491, 0.00000000000000000986038424517971, 
0.0257337720292454, 0.220483645448676, 0.224712591289328, 0.248574305825551, 
0.251115609138352, 0.305644319511071, 0.317253692884325, 0.0245937753301301, 
0.0257337720292454, 0.0000000000000000121630774340264, 0.213285559165696, 
0.229922308724439, 0.17848782814175, 0.166302748882678, 0.255031411391534, 
0.254335300246398, 0.221744045353809, 0.220483645448676, 0.213285559165696, 
0.0000000000000000139766402734024, 0.0152113168185518, 0.191657814393258, 
0.176256028117321, 0.275597914790561, 0.265496031285385, 0.22476375867925, 
0.224712591289328, 0.229922308724439, 0.0152113168185518, 0.0000000000000000120926010568502
), .Dim = c(9L, 9L), .Dimnames = list(c("927720_2005", "927720_2006", 
"841555_2005", "88095_2005", "1003464_2005", "1003464_2005", 
"1003464_2006", "1037388_2005", "1037388_2006"), c("927720_2005", 
"927720_2006", "841555_2005", "88095_2005", "1003464_2005", "1003464_2005", 
"1003464_2006", "1037388_2005", "1037388_2006")))

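Best answer

This is easily done with make.unique. Loop over the columns of interest and apply make.unique to each one. It expects the column to be of class character. According to ?make.unique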

names - a character vector

So, if a column is a factor, convert it to character first

df[] <- lapply(df, function(x) make.unique(as.character(x)))
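To illustrate what make.unique does on its own, here is a minimal sketch on a toy vector. The vector `ids` is made up for illustration and reuses one ID from the question three times, which also covers the "3 duplicates" case raised there.

```r
# Toy vector (hypothetical): one ID from the question repeated three times
ids <- c("1003464_2005", "1003464_2005", "1003464_2005", "927720_2005")

# The first occurrence is kept as-is; later repeats get .1, .2, ...
unique_ids <- make.unique(ids)
print(unique_ids)
# "1003464_2005" "1003464_2005.1" "1003464_2005.2" "927720_2005"

# A factor has to be converted to character before calling make.unique()
ids_factor <- factor(ids)
unique_from_factor <- make.unique(as.character(ids_factor))
```

Repeats are numbered as .1, .2 and so on, not as .1.1, so every value in the result is distinct.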

In dplyr, this can be done similarly, but with mutate_if

library(dplyr)
df %>%
   mutate_if(is.factor, as.character) %>%
   mutate_if(is.character, make.unique)
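Both approaches above make each column unique independently. That gives the requested output for the example data, because var1 and var2 are duplicated in the same rows, but it would not in general copy the suffix added to var1 over to var2. If that coupling matters, one possible sketch in base R is shown below; it is not part of the original answer, and the small data frame `df_small` and the helper variables are hypothetical names used only for illustration.

```r
# Hypothetical three-row subset of the question's data
df_small <- data.frame(
  var1 = c("1003464_2005", "1003464_2005", "1003464_2006"),
  var2 = c("1003464_2006", "1003464_2006", "1003464_2007"),
  stringsAsFactors = FALSE
)

v1 <- as.character(df_small$var1)
v1_unique <- make.unique(v1)

# Whatever make.unique() appended to var1 ("" for rows left unchanged)
suffix <- substring(v1_unique, nchar(v1) + 1)

# Apply the same suffix to the matching rows of var2
df_small$var1 <- v1_unique
df_small$var2 <- paste0(as.character(df_small$var2), suffix)
print(df_small)
```

Note that this only guarantees uniqueness in var1; var2 could still contain duplicates if it repeats in rows where var1 does not. With the full data frame, the resulting columns could then be assigned with rownames(MyMatrix) <- df$var1 and colnames(MyMatrix) <- df$var2, assuming the matrix dimensions match the number of rows.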

Regarding "r - Add a character to duplicated rows in a column", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/57876084/
