I have a file like this:
President Dr. Norbert Lammert: I declare the session open.
I will now give the floor to Bundesminister Alexander Dobrindt.
(Applause of CDU/CSU and delegates of the SPD)
Alexander Dobrindt, Minister for Transport and Digital Infrastructure:
Ladies and Gentleman. We will today start the biggest investment in infrastructure that ever existed, with over 270 billion Euro, over 1 000 projects and a clear financing perspective.
(Volker Kauder [CDU/CSU]: Genau!)
(Applause of the CDU/CSU and the SPD)
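When I read in these .txt documents, I want to create a second column that indicates the speaker's name.
So what I tried was to first create a list of all possible names and replace them: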
library(qdap)
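# speaker headers as they appear in the file, and the same headers tagged with "@"
members <- c("Alexander Dobrindt, Minister for Transport and Digital Infrastructure:","President Dr. Norbert Lammert:")
members_r <- c("@Alexander Dobrindt, Minister for Transport and Digital Infrastructure:","@President Dr. Norbert Lammert:")
# read the file line by line
prok <- scan(".txt", what = "character", sep = "\n")
# prefix every known speaker header with "@"
prok <- mgsub(members, members_r, prok)
prok <- as.data.frame(prok, stringsAsFactors = FALSE)
# TRUE on lines that contain a tagged speaker header
prok$speaker <- grepl("@[^\\@:]*:", prok$prok, ignore.case = TRUE)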
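If qdap is not available, the tagging step can be approximated in base R. This is a minimal sketch, not the code from the question: `tag_speakers` is a hypothetical helper, and it assumes each speaker header starts its own line and is spelled exactly as in `members`. The sample lines are copied from the file above.

```r
# Hypothetical helper: prefix known speaker headers with "@" (base R, no qdap).
# Assumes each header starts its line and is spelled exactly as in `members`.
tag_speakers <- function(lines, members) {
  for (m in members) {
    hit <- startsWith(lines, m)  # literal prefix test, so "Dr." needs no escaping
    lines[hit] <- paste0("@", lines[hit])
  }
  lines
}

members <- c("Alexander Dobrindt, Minister for Transport and Digital Infrastructure:",
             "President Dr. Norbert Lammert:")
lines <- c("President Dr. Norbert Lammert: I declare the session open.",
           "I will now give the floor to Bundesminister Alexander Dobrindt.",
           "(Applause of CDU/CSU and delegates of the SPD)",
           "Alexander Dobrindt, Minister for Transport and Digital Infrastructure:",
           "(Volker Kauder [CDU/CSU]: Genau!)")

prok <- data.frame(prok = tag_speakers(lines, members), stringsAsFactors = FALSE)
# TRUE on lines that open a new speaker turn
prok$speaker <- startsWith(prok$prok, "@")
prok$speaker
```

Unlike `mgsub()`, which replaces a header wherever it occurs, `startsWith()` only tags a header at the beginning of a line, so the same text appearing inside a speech is left alone.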
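My plan was then to use a regular expression to extract the name between @ and : on the lines where speaker == TRUE, and to carry that name down to the following lines until a different name appears (and, obviously, to remove all the applause/heckling brackets), but I am not sure how to do that either.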
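The plan above can be sketched in base R roughly as follows. It is a minimal sketch under stated assumptions: the lines have already been tagged with "@" as in the question, a name ends at the first comma or colon of its header (so the job title is dropped), and applause and heckling lines are wrapped entirely in round brackets. Taking `cumsum()` over the `speaker` flags numbers the speaker turns, which carries each name down until the next header appears.

```r
# Sample lines after the mgsub() step from the question (speaker headers start with "@")
prok <- data.frame(
  prok = c(
    "@President Dr. Norbert Lammert: I declare the session open.",
    "I will now give the floor to Bundesminister Alexander Dobrindt.",
    "(Applause of CDU/CSU and delegates of the SPD)",
    "@Alexander Dobrindt, Minister for Transport and Digital Infrastructure:",
    "Ladies and Gentleman. We will today start the biggest investment in infrastructure that ever existed, with over 270 billion Euro, over 1 000 projects and a clear financing perspective.",
    "(Volker Kauder [CDU/CSU]: Genau!)",
    "(Applause of the CDU/CSU and the SPD)"
  ),
  stringsAsFactors = FALSE
)

# TRUE on lines that open a new speaker turn
prok$speaker <- grepl("^@[^:]*:", prok$prok)

# Name = text after "@" up to the first comma or colon (drops the job title)
found <- sub("^@([^:,]*).*$", "\\1", prok$prok[prok$speaker])

# cumsum() numbers the turns; lines before the first header get NA
turn <- cumsum(prok$speaker)
prok$name <- c(NA_character_, found)[turn + 1]

# Strip the "@Header:" prefix, keeping any speech on the same line
prok$text <- sub("^@[^:]*:[[:space:]]*", "", prok$prok)

# Drop lines entirely in round brackets (applause, heckling) and empty lines
keep <- !grepl("^\\(.*\\)$", prok$text) & nzchar(prok$text)
result <- prok[keep, c("name", "text")]
rownames(result) <- NULL
result
```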
Best answer
Here is an approach:
library(qdap)
# `text` is the whole document as a single character string
# (e.g. the lines of the file joined with paste(..., collapse = " "))
# remove round brackets and the text between them
a <- bracketX(text, "round")
names <- c("President Dr. Norbert Lammert","Alexander Dobrindt" )
searchString <- paste(names[1],names[2], sep = ".+")
# get the string from names[1] up to names[2] using searchString
string <- regmatches(a, regexpr(searchString, a))
# remove only the trailing names[2]; the name can also occur inside the speech
string <- sub(paste0(names[2], "$"), "", string)
When there are more than two names, this code can be run in a loop.
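A minimal base-R sketch of how this might be generalised to more than two speakers. Instead of looping over pairs of names with a greedy `.+` pattern, it swaps in a different technique: it finds every occurrence of each full speaker header with `gregexpr(..., fixed = TRUE)` and cuts the text between consecutive headers. `split_turns` is a hypothetical helper. It assumes the full headers (including the trailing colon) only occur where a speaker starts, which avoids splitting at mentions such as "Bundesminister Alexander Dobrindt." inside a speech, and it uses base `gsub()` in place of qdap's `bracketX()` to drop round-bracket asides, assuming they are not nested.

```r
# Hypothetical helper: split one transcript string into speaker turns.
# `headers` are the full speaker headers (with trailing ":"), matched literally.
split_turns <- function(text, headers) {
  # One row per occurrence of each header; fixed = TRUE so "Dr." is not a regex
  hits <- do.call(rbind, lapply(headers, function(h) {
    pos <- gregexpr(h, text, fixed = TRUE)[[1]]
    if (pos[1] == -1) return(NULL)
    data.frame(header = h, start = as.integer(pos), stringsAsFactors = FALSE)
  }))
  if (is.null(hits)) return(data.frame(speaker = character(), text = character()))
  hits <- hits[order(hits$start), ]
  # A turn runs from the end of its header to just before the next header
  from <- hits$start + nchar(hits$header)
  to <- c(hits$start[-1] - 1, nchar(text))
  turn_text <- trimws(gsub(" {2,}", " ", substring(text, from, to)))
  data.frame(speaker = sub("[,:].*$", "", hits$header),  # drop title and colon
             text = turn_text, stringsAsFactors = FALSE)
}

lines <- c(
  "President Dr. Norbert Lammert: I declare the session open.",
  "I will now give the floor to Bundesminister Alexander Dobrindt.",
  "(Applause of CDU/CSU and delegates of the SPD)",
  "Alexander Dobrindt, Minister for Transport and Digital Infrastructure:",
  "Ladies and Gentleman. We will today start the biggest investment in infrastructure that ever existed, with over 270 billion Euro, over 1 000 projects and a clear financing perspective.",
  "(Volker Kauder [CDU/CSU]: Genau!)",
  "(Applause of the CDU/CSU and the SPD)"
)
# Join the lines into one string and drop round-bracket asides (assumes no nesting)
text <- gsub("\\([^)]*\\)", "", paste(lines, collapse = " "))
headers <- c("President Dr. Norbert Lammert:",
             "Alexander Dobrindt, Minister for Transport and Digital Infrastructure:")
turns <- split_turns(text, headers)
turns
```

Because the positions are collected for all headers at once, a speaker who talks several times (for example the President between other speakers) gets one row per turn.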
Regarding "r - Split speakers and dialogue in RStudio", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/41100482/