regex - How do I append an incrementing value to each occurrence of a string in Stata?

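I have a string variable named talk. Suppose I want to find every instance of the word "please" in talk and, within each row, give each "please" a suffix holding the running count of that word.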

For example, if talk looks like this:

"will you please come here please do it as soon as you can if you please"

I want it to look like this:

"will you please1 come here please2 do it as soon as you can if you please3"

In other words, "please1" marks the first occurrence of "please", "please2" the second, and so on.

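I wrote some code using regular expressions and a couple of loops (below), but it doesn't work perfectly, and even if I can fix it, it seems overly complicated. Is there a simpler way?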

* First, extract the portion of 'talk' from the first "please" to the last
gen talk_pl = strtrim(stritrim(regexs(0))) if regexm(talk, "please.+please")
* Count the number of times "please" occurs in 'talk_pl'
egen count = noccur(talk_pl), string("please")
* In the loop below, n = index of the last "please"; i = second-to-last; x = third-to-last
qui levelsof count
foreach n in `r(levels)' {
    local i = `n' - 1
    local x = `i' - 1
    replace talk_pl = regexrf(talk_pl, "please$", "please`n'") if count == `n'
    replace talk_pl = regexrf(talk_pl, "please (?=.+?please`n')", "please`i' ") if count == `n'
    replace talk_pl = regexrf(talk_pl, "please (?=.+?please`i')", "please`x' ") if count == `n'
}

Best answer

* Example generated by -dataex-. To install: ssc install dataex
clear
input str71 talk
"will you please come here please do it as soon as you can if you please"
end

// Install egenmore if not installed already
* ssc install egenmore

clonevar wanted = talk

// count occurrences of "please"
egen countplease = noccur(talk), string(please)

// Loop over 1 to max number of occurrences
sum countplease, meanonly 
forval i = 1/`r(max)' {
    replace wanted = ustrregexrf(wanted, "\bplease\b", "please`i'")
}
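// Why each pass moves on to the next occurrence: once a match has gained a
// digit suffix, there is no word boundary between "please" and the digit, so
// "\bplease\b" skips it and the next pass finds the next plain "please".
// Sanity check (silent when true):
assert ustrregexm("please1", "\bplease\b") == 0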
list

     +---------------------------------------------------------------------------------------+
  1. |                                                                           talk        |
     |        will you please come here please do it as soon as you can if you please        |
     |---------------------------------------------------------------------------------------|
     |                                                                     wanted | countp~e |
     | will you please1 come here please2 do it as soon as you can if you please3 |        3 |
     +---------------------------------------------------------------------------------------+
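
For comparison, here is a minimal sketch of a regex-free variant that walks through talk word by word and keeps a running counter per row. It is not part of the answer above. The variable names wanted2, nseen and nwords are made up for illustration, and it assumes words are separated by spaces and that "please" appears as a bare word (a token like "please," would not be counted). It also collapses repeated spaces, since the string is rebuilt from its words.

```stata
clear
input str71 talk
"will you please come here please do it as soon as you can if you please"
end

gen wanted2 = ""                 // rebuilt string (hypothetical name)
gen nseen   = 0                  // running count of "please" in each row
gen nwords  = wordcount(talk)    // number of space-separated words per row

summarize nwords, meanonly
forvalues j = 1/`r(max)' {
    // bump the counter in rows whose j-th word is exactly "please"
    replace nseen = nseen + (word(talk, `j') == "please")
    // append the j-th word, plus the counter if that word is "please"
    replace wanted2 = wanted2 + " " + word(talk, `j') + ///
        cond(word(talk, `j') == "please", strofreal(nseen), "") if `j' <= nwords
}
replace wanted2 = strtrim(wanted2)

// check against the result requested in the question
assert wanted2 == "will you please1 come here please2 do it as soon as you can if you please3"
list wanted2
```

One reason to consider this form: noccur() counts substring matches, so a row containing "pleased" gets a higher count than the number of whole-word matches. That is harmless in the loop above (the extra passes simply replace nothing), but the word-by-word version makes the whole-word rule explicit.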

This Q&A is based on a similar question on Stack Overflow: https://stackoverflow.com/questions/65604864/
