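Suppose I have a data frame whose first column holds a few numbers. I want to take these numbers, use them as positions in a string, and extract a substring containing the 2 characters before and after each position. To clarify,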
aggSN <- data.frame(V1=c(5,6,7,8),V2="blah")
gen <- "AJSDAFKSDAFJKLASDFKJKA" # <- take this string
aggSN # <- take the numbers in the first column
# V1 V2
# 5 blah
# 6 blah
# 7 blah
# 8 blah
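and create a new column V3 that looks like this (positions 10 and 2 are not in the data frame above; they are shown only to illustrate other cases):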
aggSN
# V1 V2 V3
# 5 blah SDAFK # <- took the two characters before and after the 5th character
# 6 blah DAFKS # <- took the two characters before and after the 6th character
# 7 blah AFKSD # <- took the two characters before and after the 7th character
# 10 blah SDAFJ # <- took the two characters before and after the 10th character
# 2 blah AJSD # <- here you can see that the substring is cut off at the start of the string
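Currently I am using a for loop, which works but takes a lot of time on very large data frames and long strings. Are there any alternatives? Thanks.
# pre-allocate the result, then fill it one row at a time
fillvector <- character(nrow(aggSN))
for (j in seq_len(nrow(aggSN))) {
  fillvector[j] <- substr(gen, aggSN[j, "V1"] - 2, aggSN[j, "V1"] + 2)
}
aggSN$V3 <- fillvector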
Best answer
You can use substring() without writing a loop:
aggSN <- data.frame(V1=c(5,6,7,8,2),V2="blah")
gen <- "AJSDAFKSDAFJKLASDFKJKA"
with(aggSN, substring(gen, V1-2, V1+2))
# [1] "SDAFK" "DAFKS" "AFKSD" "FKSDA" "AJSD"
So, to add the new column,
aggSN$V3 <- with(aggSN, substring(gen, V1-2, V1+2))
aggSN
# V1 V2 V3
# 1 5 blah SDAFK
# 2 6 blah DAFKS
# 3 7 blah AFKSD
# 4 8 blah FKSDA
# 5 2 blah AJSD
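The reason substring() can replace the loop, while the substr() call from the question cannot simply be handed a vector of positions, is that the two functions recycle their arguments differently. A minimal sketch of the difference, using two made-up start/stop pairs (the variable names `one` and `many` are only for illustration):

```r
gen <- "AJSDAFKSDAFJKLASDFKJKA"

# substr() returns a result the same length as its first argument,
# so with a single string only the first start/stop pair is used.
one <- substr(gen, c(3, 4), c(7, 8))
one
# [1] "SDAFK"

# substring() recycles the string to the length of the longest argument,
# so every start/stop pair produces its own substring.
many <- substring(gen, c(3, 4), c(7, 8))
many
# [1] "SDAFK" "DAFKS"
```

This is why the vectorised call works directly on the whole V1 column.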
If you want something a bit faster, I would use stringi::stri_sub instead of substring().
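A possible stringi version is sketched below, assuming the stringi package is installed. One difference to keep in mind: stri_sub() interprets negative indices as counting from the end of the string, so a start position such as V1 - 2 that drops below 1 should not be passed through unchanged. Clamping it with pmax() is my own addition here, not part of the original answer.

```r
aggSN <- data.frame(V1 = c(5, 6, 7, 8, 2), V2 = "blah")
gen <- "AJSDAFKSDAFJKLASDFKJKA"

# Clamp the start position to 1 so that positions near the beginning
# of the string are cut off there instead of wrapping around to the end.
start <- pmax(aggSN$V1 - 2, 1)

aggSN$V3 <- stringi::stri_sub(gen, from = start, to = aggSN$V1 + 2)
aggSN$V3
# [1] "SDAFK" "DAFKS" "AFKSD" "FKSDA" "AJSD"
```

Whether this is noticeably faster than substring() depends on the data size, so it is worth timing both on your own data.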
The original question about creating a new column in R without a for loop can be found on Stack Overflow: https://stackoverflow.com/questions/31778557/