regex - How to append an incrementing number to each occurrence of a word in a string in Stata?

Tags: regex stata

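I have a string variable named talk. Suppose I want to find every instance of the word "please" in talk and, within each row, append to each "please" a suffix holding an incrementing count of that word.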

For example, if talk looks like this:

"will you please come here please do it as soon as you can if you please"

I want it to look like this:

"will you please1 come here please2 do it as soon as you can if you please3"

In other words, "please1" marks the first occurrence of "please", "please2" the second, and so on.

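I wrote some code (below) using regular expressions and a couple of loops, but it does not work perfectly, and even if I could fix it, it seems overly complicated. Is there a simpler way?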

// First extract the portion of 'talk' from the first "please" to the last one
gen talk_pl = strtrim(stritrim(regexs(0))) if regexm(talk, "please.+please")
// Count how many times "please" occurs in 'talk_pl' (noccur() is from egenmore)
egen count = noccur(talk_pl), string("please")
// In the loop below, n = number for the last "please",
// i = number for the second-to-last "please", x = number for the third-to-last
qui levelsof count
foreach n in `r(levels)' {
    local i = `n' - 1
    local x = `i' - 1
    replace talk_pl = ustrregexrf(talk_pl, "please$", "please`n'") if count == `n'
    replace talk_pl = ustrregexrf(talk_pl, "please (?=.+?please`n')", "please`i' ") if count == `n'
    replace talk_pl = ustrregexrf(talk_pl, "please (?=.+?please`i')", "please`x' ") if count == `n'
}
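For comparison, here is a minimal sketch of a non-regex alternative (not from the original post): rebuild each string word by word with `word()` and `wordcount()`, keeping a per-row counter `k`. It assumes words are separated by blanks and that only the exact token "please" should be numbered; a token such as "please," would be skipped, and runs of several spaces collapse to one. The variable names `wanted`, `nwords`, `k` and `w` are illustrative.

```stata
clear
input str71 talk
"will you please come here please do it as soon as you can if you please"
end

// Alternative (no regex): rebuild each string word by word
gen wanted = ""
gen nwords = wordcount(talk)
gen int k = 0                      // running count of "please" within each row

sum nwords, meanonly
forvalues j = 1/`r(max)' {
    gen w = word(talk, `j')        // "" once j exceeds the row's word count
    replace k = k + 1 if w == "please"
    replace w = w + string(k) if w == "please"
    // add a separating blank before every word except the first
    replace wanted = wanted + cond(`j' > 1 & w != "", " ", "") + w
    drop w
}
drop nwords k
list wanted
```

This trades brevity for transparency; a regex with `\b` word boundaries is shorter and also numbers "please" when it is followed by punctuation.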

Best answer

* Example generated by -dataex-. To install: ssc install dataex
clear
input str71 talk
"will you please come here please do it as soon as you can if you please"
end

// Install egenmore if not installed already
* ssc install egenmore

clonevar wanted = talk

// count occurrences of "please"
egen countplease = noccur(talk), string(please)

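// Loop from 1 to the maximum number of occurrences. Each pass numbers only the
// first remaining bare "please"; "please1" etc. are not matched again because
// \b does not match between "e" and a digit.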
sum countplease, meanonly 
forval i = 1/`r(max)' {
    replace wanted = ustrregexrf(wanted, "\bplease\b", "please`i'")
}
list

     +---------------------------------------------------------------------------------------+
  1. |                                                                           talk        |
     |        will you please come here please do it as soon as you can if you please        |
     |---------------------------------------------------------------------------------------|
     |                                                                     wanted | countp~e |
     | will you please1 come here please2 do it as soon as you can if you please3 |        3 |
     +---------------------------------------------------------------------------------------+
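A variant of the answer that drops the egenmore dependency, sketched under the assumption of Stata 14 or later (for the Unicode regex functions `ustrregexra()` and `ustrregexrf()`): the count comes from the change in length after deleting every whole-word "please" (6 characters each). The second observation, "please pass the salt", is made up for illustration and shows that numbering restarts in each row, because `ustrregexrf()` replaces the first match separately in every observation.

```stata
clear
input str80 talk
"will you please come here please do it as soon as you can if you please"
"please pass the salt"
end

clonevar wanted = talk

// Count whole-word matches without egenmore: delete every "please"
// and divide the drop in length by 6, the length of "please"
gen countplease = (ustrlen(talk) - ustrlen(ustrregexra(talk, "\bplease\b", ""))) / 6

// Each pass numbers the first remaining bare "please" in every row
sum countplease, meanonly
forvalues i = 1/`r(max)' {
    replace wanted = ustrregexrf(wanted, "\bplease\b", "please`i'")
}
list wanted countplease
```

Rows with fewer occurrences than the maximum are unaffected by the extra passes, since `ustrregexrf()` finds no bare "please" left to replace.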

Regarding "regex - How to append an incrementing number to each occurrence of a word in a string in Stata?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/65604864/
