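Stata: remove whole words from a string

Tags: stata

I have a string variable from which I want to remove certain words. Many other words in the variable are partial matches, and I do not want those touched. A word should be removed if and only if it matches exactly.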

clear
* Add in some example data
input index str50 words
1 "more mor morph test"
2 "ten tennis tenner tenth keeper"
3 "badder baddy bad other"
end

* I create a copy to compare before/after the strip
gen strip_words = words

* This is a list of words I want removed. In reality, this is a fairly long list
local removs "mor ten bad"
* For each of these words, remove the complete word from the string
foreach w of local removs {
    replace strip_words = subinstr(strip_words, "`w'","", .) 
}

list
     +---------------------------------------------------------------+
     | index                            words            strip_words |
     |---------------------------------------------------------------|
  1. |     1              more mor morph test            e ph test   |
  2. |     2   ten tennis tenner tenth keeper     nis ner th keeper  |
  3. |     3           badder baddy bad other         der dy other   |
     +---------------------------------------------------------------+

I have tried padding with spaces, using replace strip_words = " " + strip_words + " ", but that also removes the spaces separating the other words. My desired output is:

     +-------------------------------------------------------------------------+
     | index                            words                      strip_words |
     |-------------------------------------------------------------------------|
  1. |     1              more mor morph test              more  morph test    |
  2. |     2   ten tennis tenner tenth keeper    tennis tenner tenth keeper    |
  3. |     3           badder baddy bad other           badder baddy  other    |
     +-------------------------------------------------------------------------+

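Best answer

See help string functions for subinword(). It takes the same arguments as subinstr() but replaces a match only when it stands as a whole word, that is, a token delimited by spaces or by the start or end of the string.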
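As a quick illustration of the difference, the two functions can be compared on one of the example strings. This is only a sketch for interactive use; the square brackets are added here so that any leftover blanks are visible in the output.

```stata
* Same arguments, different matching rules.
* subinstr() removes "mor" wherever it occurs, including inside "more" and "morph".
display "[" subinstr("more mor morph test", "mor", "", .) "]"

* subinword() removes "mor" only where it is a complete space-delimited word.
display "[" subinword("more mor morph test", "mor", "", .) "]"
```

In the second case the blanks on either side of the removed word stay in place, which is why the answer's code finishes with a whitespace cleanup step.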

clear
* Add in some example data
input index str50 words
1 "more mor morph test"
2 "ten tennis tenner tenth keeper"
3 "badder baddy bad other"
end

* I create a copy to compare before/after the strip
gen strip_words = words

* This is a list of words I want removed. In reality, this is a fairly long list
local removs "mor ten bad"
* For each of these words, remove the complete word from the string
foreach w of local removs {
    replace strip_words = subinword(strip_words, "`w'","", .) 
}

* Collapse the repeated internal blanks left behind by the removed words
replace strip_words = itrim(strip_words)
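
For a long word list, one possible alternative is a single regular-expression pass instead of a loop. The sketch below is not part of the original answer. It assumes Stata 14 or later (for ustrregexra(), strtrim() and stritrim()) and assumes that none of the words contain regex metacharacters. The macro name pattern is introduced here purely for illustration.

```stata
clear
input index str50 words
1 "more mor morph test"
2 "ten tennis tenner tenth keeper"
3 "badder baddy bad other"
end

local removs "mor ten bad"

* Turn the space-separated list into a regex alternation: mor|ten|bad
* (assumes the words are separated by single blanks)
local pattern : subinstr local removs " " "|", all

* \b marks a word boundary, so only complete words are matched
gen strip_words = ustrregexra(words, "\b(`pattern')\b", "")

* Collapse repeated internal blanks, then drop leading/trailing blanks
replace strip_words = strtrim(stritrim(strip_words))

list
```

Note that \b also treats punctuation as a boundary, so a token such as "ten-year" would lose its "ten", whereas subinword() only recognises space-delimited words. Which behaviour is preferable depends on the data.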

This question on removing whole words from a string in Stata comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/65598883/
