R: Randomly replace elements of a data frame with 0

Tags: r dataframe random integer data-manipulation

I am working with the R programming language.

Suppose I have the following data frame:

var_1 = var_2 = var_3 = var_4 = var_5 =  c("1,2,3,4,5,6,7,8,9,10")

my_data = data.frame(var_1,var_2,var_3,var_4,var_5)

my_data = rbind(my_data, my_data[rep(1, 100), ])

rownames(my_data) = 1:nrow(my_data)

The data look like this:

head(my_data)

                 var_1                var_2                var_3                var_4                var_5
1 1,2,3,4,5,6,7,8,9,10 1,2,3,4,5,6,7,8,9,10 1,2,3,4,5,6,7,8,9,10 1,2,3,4,5,6,7,8,9,10 1,2,3,4,5,6,7,8,9,10
2 1,2,3,4,5,6,7,8,9,10 1,2,3,4,5,6,7,8,9,10 1,2,3,4,5,6,7,8,9,10 1,2,3,4,5,6,7,8,9,10 1,2,3,4,5,6,7,8,9,10
3 1,2,3,4,5,6,7,8,9,10 1,2,3,4,5,6,7,8,9,10 1,2,3,4,5,6,7,8,9,10 1,2,3,4,5,6,7,8,9,10 1,2,3,4,5,6,7,8,9,10
4 1,2,3,4,5,6,7,8,9,10 1,2,3,4,5,6,7,8,9,10 1,2,3,4,5,6,7,8,9,10 1,2,3,4,5,6,7,8,9,10 1,2,3,4,5,6,7,8,9,10
5 1,2,3,4,5,6,7,8,9,10 1,2,3,4,5,6,7,8,9,10 1,2,3,4,5,6,7,8,9,10 1,2,3,4,5,6,7,8,9,10 1,2,3,4,5,6,7,8,9,10
6 1,2,3,4,5,6,7,8,9,10 1,2,3,4,5,6,7,8,9,10 1,2,3,4,5,6,7,8,9,10 1,2,3,4,5,6,7,8,9,10 1,2,3,4,5,6,7,8,9,10

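My question: I would like to randomly replace individual numbers inside this data frame with 0. For example, the final result should look like this (for brevity I only show the first row):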

# desired result

                 var_1                var_2                var_3                var_4                var_5
1 1,0,3,0,5,6,0,0,9,10 1,2,0,4,5,0,0,8,9,0 1,0,3,0,0,0,0,8,9,0 1,2,3,4,0,6,7,0,0,10 1,2,0,4,5,0,7,8,0,10

I tried to do this with the following lines of code (adapted from "Replace random values in a column in a dataframe"):

my_data$var_1[sample(nrow(my_data),as.integer(0.5*nrow(my_data)) , replace = TRUE)] <- 0
my_data$var_2[sample(nrow(my_data),as.integer(0.5*nrow(my_data)), replace = TRUE)] <- 0
my_data$var_3[sample(nrow(my_data),as.integer(0.5*nrow(my_data)), replace = TRUE)] <- 0
my_data$var_4[sample(nrow(my_data),as.integer(0.5*nrow(my_data)), replace = TRUE)] <- 0
my_data$var_5[sample(nrow(my_data),as.integer(0.5*nrow(my_data)), replace = TRUE)] <- 0

But this replaces entire cells with 0 (the whole comma-separated string) instead of only some of the numbers inside each cell:

head(my_data)
                 var_1                var_2                var_3                var_4                var_5
1 1,2,3,4,5,6,7,8,9,10 1,2,3,4,5,6,7,8,9,10                    0                    0                    0
2                    0                    0 1,2,3,4,5,6,7,8,9,10 1,2,3,4,5,6,7,8,9,10                    0
3                    0 1,2,3,4,5,6,7,8,9,10                    0 1,2,3,4,5,6,7,8,9,10 1,2,3,4,5,6,7,8,9,10
4                    0 1,2,3,4,5,6,7,8,9,10 1,2,3,4,5,6,7,8,9,10                    0 1,2,3,4,5,6,7,8,9,10
5 1,2,3,4,5,6,7,8,9,10 1,2,3,4,5,6,7,8,9,10 1,2,3,4,5,6,7,8,9,10 1,2,3,4,5,6,7,8,9,10 1,2,3,4,5,6,7,8,9,10
6 1,2,3,4,5,6,7,8,9,10                    0 1,2,3,4,5,6,7,8,9,10 1,2,3,4,5,6,7,8,9,10                    0
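A quick check, added here as an illustrative sketch, shows why this happens: every cell of the data frame holds a single character string, so indexing a column by row position and assigning 0 overwrites the whole string. The individual numbers only become addressable after splitting the string with strsplit(). The one-column data frame demo is hypothetical and only mirrors how the columns above are built (stringsAsFactors = FALSE is spelled out so the sketch also behaves the same on R versions before 4.0).

```r
# Hypothetical one-column data frame built like the columns in the question
demo <- data.frame(var_1 = "1,2,3,4,5,6,7,8,9,10", stringsAsFactors = FALSE)

# The cell is ONE string, not ten separate numbers
length(demo$var_1[1])                       # 1
cell_parts <- strsplit(demo$var_1[1], ",")[[1]]
length(cell_parts)                          # 10

# Assigning by row position replaces the entire string
demo$var_1[1] <- 0
demo$var_1[1]                               # "0"
```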

Can someone tell me what I am doing wrong and how to get the desired result?

Thanks!

Best answer

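Here is a version that uses Map so that you can specify, for each column separately, the proportion pnul of numbers that should become 0. The length of each split string is multiplied by the matching element of pnul to get the number of positions to sample and set to zero, so every cell in a column gets exactly that share of zeros. You can also set pnul to a scalar to use the same proportion in all columns.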
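On a single cell, the core step of this approach can be sketched as follows. The helper name zero_fraction is hypothetical and only isolates what the inner sapply() call does. With 10 numbers in the string and a proportion of 0.2, length(s) * a is 10 * 0.2 = 2, so exactly two randomly chosen positions become "0".

```r
# Hypothetical helper: zero out a fixed share `a` of the numbers in one string
zero_fraction <- function(x, a) {
  s <- strsplit(x, ",")[[1]]                  # "1,2,...,10" -> c("1", "2", ..., "10")
  idx <- sample(seq_along(s), length(s) * a)  # length(s) * a positions, no repeats
  s[idx] <- "0"                               # overwrite those positions with "0"
  paste(s, collapse = ",")                    # rebuild the comma-separated string
}

set.seed(1)  # only to make the illustration reproducible
out <- zero_fraction("1,2,3,4,5,6,7,8,9,10", 0.2)
out
sum(strsplit(out, ",")[[1]] == "0")           # 2 zeros
```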

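# Share of numbers to set to 0, one entry per column of my_data
pnul <- c(.0, .2, .5, .8, 1)

# Note: the \(x) lambda shorthand and the |> pipe require R >= 4.1.0
res <- Map(\(x, a) {            # x: one column of my_data, a: its pnul entry
  S <- strsplit(x, ',')         # split every cell into its individual numbers
  sapply(S, \(s) {
    s[sample(seq_along(s), length(s)*a)] <- '0'   # zero a random subset of positions
    paste(s, collapse=',')      # glue the numbers back into one string
  })
}, my_data, pnul) |> as.data.frame()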

head(res)
#                  var_1                var_2                var_3                var_4               var_5
# 1 1,2,3,4,5,6,7,8,9,10 0,0,3,4,5,6,7,8,9,10  1,2,0,4,0,0,7,8,0,0  0,0,0,0,0,0,0,8,9,0 0,0,0,0,0,0,0,0,0,0
# 2 1,2,3,4,5,6,7,8,9,10  1,0,3,4,5,6,7,8,9,0 1,0,3,0,5,0,0,0,9,10  0,0,0,0,0,0,7,8,0,0 0,0,0,0,0,0,0,0,0,0
# 3 1,2,3,4,5,6,7,8,9,10 1,0,0,4,5,6,7,8,9,10 1,0,0,0,0,6,7,0,9,10 0,0,0,0,5,0,0,0,0,10 0,0,0,0,0,0,0,0,0,0
# 4 1,2,3,4,5,6,7,8,9,10 1,2,3,0,5,6,7,0,9,10 0,0,3,0,5,0,7,0,9,10  0,0,0,4,0,0,7,0,0,0 0,0,0,0,0,0,0,0,0,0
# 5 1,2,3,4,5,6,7,8,9,10  1,0,3,4,5,6,7,8,9,0 0,2,0,4,5,0,7,0,0,10  1,0,0,0,0,0,0,8,0,0 0,0,0,0,0,0,0,0,0,0
# 6 1,2,3,4,5,6,7,8,9,10 0,2,3,4,5,6,0,8,9,10  1,2,3,0,5,0,7,0,0,0  0,0,0,4,5,0,0,0,0,0 0,0,0,0,0,0,0,0,0,0
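The answer above zeroes an exact share of every cell (for example, always 2 of 10 numbers when pnul is 0.2). If you would rather have each number set to 0 independently with a given probability, so that the count of zeros varies from cell to cell, one option is a Bernoulli mask drawn with runif(). This is a swapped-in variant, not part of the original answer; the helper zero_random, the data frame df and the vector probs are hypothetical.

```r
# Hypothetical variant: each number becomes "0" independently with probability p
zero_random <- function(x, p) {
  vapply(strsplit(x, ","), function(s) {
    s[runif(length(s)) < p] <- "0"   # one uniform draw per number, TRUE with prob. p
    paste(s, collapse = ",")
  }, character(1))
}

# Small example data frame in the same shape as the question's data
df <- data.frame(var_1 = rep("1,2,3,4,5,6,7,8,9,10", 3),
                 var_2 = rep("1,2,3,4,5,6,7,8,9,10", 3),
                 stringsAsFactors = FALSE)
probs <- c(0.3, 0.7)  # one probability per column; Map also recycles a scalar

set.seed(42)  # reproducible draws
res2 <- as.data.frame(Map(zero_random, df, probs), stringsAsFactors = FALSE)
res2
```

Because the draws are independent, a cell can end up with no zeros or with all zeros even for intermediate probabilities; use the fixed-share version above if every cell must contain exactly the same number of zeros.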

A similar question about randomly replacing data frame elements with 0 in R can be found on Stack Overflow: https://stackoverflow.com/questions/71814413/
