This question already has answers here:
Deleting multiple words from a file using terminal
(2 answers)
Remove specific words from sentences in bash?
(2 answers)
Closed 10 months ago.
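I want to remove stop words from the sentences in a file.
By stop words, I mean: [I, a, an, as, at, the, by, in, for, of, on, that]
I have this sentence in the file my_text.txt:
One of the primary goals in the design of the Unix system was to create an environment that promoted efficient program
I want to remove the stop words from the sentence above.
I used this script: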
array=( I a an as at the by in for of on that )
for i in "${array[@]}"
do
cat $p | sed -e 's/\<$i\>//g'
done < my_text.txt
But the output is: One of the primary goals in the design of the Unix system was to create an environment that promoted efficient program
The expected output is:
One primary goals design Unix system was to create an environment promoted efficient program
Note: I want to remove stop words, not duplicate words.
Best answer
Like this, assuming $p is an existing file:
sed -i -e "s/\<$i\>//g" "$p"
You must use double quotes, not single quotes, so that the shell expands the variable. The -i switch edits the file in place. Learn how to quote properly in shell; it's very important:
"Double quote" every literal that contains spaces/metacharacters and every expansion: "$var", "$(command "$var")", "${array[@]}", "a & b". Use 'single quotes' for code or literal $'s: 'Costs $5 US', ssh host 'echo "$HOSTNAME"'. See
http://mywiki.wooledge.org/Quotes
http://mywiki.wooledge.org/Arguments
http://wiki.bash-hackers.org/syntax/words
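To make the quoting point concrete, here is a minimal sketch that runs the same substitution once with single quotes and once with double quotes. It assumes bash and GNU sed (the \< and \> word boundaries are GNU extensions); the variable names line, single and double are only illustrative.

```shell
#!/usr/bin/env bash
# Sketch only: assumes GNU sed, since \< and \> are GNU extensions.
i="the"
line="One of the primary goals"

# Single quotes (deliberate here): the shell does not expand $i, so sed
# receives the literal characters $i, which never occur in the text.
single=$(printf '%s\n' "$line" | sed -e 's/\<$i\>//g')

# Double quotes: the shell expands $i to "the" before sed runs.
double=$(printf '%s\n' "$line" | sed -e "s/\<$i\>//g")

printf 'single quotes: [%s]\n' "$single"   # line is unchanged
printf 'double quotes: [%s]\n' "$double"   # "the" is gone, its space remains
```

The single-quoted run prints the line unchanged, which matches the behaviour described in the question.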
Finally:
array=( I a an as at the by in for of on that )
for i in "${array[@]}"
do
sed -i -e "s/\<$i\>\s*//g" Input_File
done
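The loop above runs sed once per stop word and rewrites the file each time. As an alternative sketch (not part of the original answer), the words can be joined into one alternation so a single sed call is enough. This assumes bash and GNU sed; the names stopwords, pattern and remove_stopwords are hypothetical, and the function reads standard input instead of editing a file in place.

```shell
#!/usr/bin/env bash
# Sketch only: assumes GNU sed (-E with \<, \> and \s).
stopwords=( I a an as at the by in for of on that )

# Join the array with "|" to build one alternation: I|a|an|as|...
pattern=$(IFS='|'; printf '%s' "${stopwords[*]}")

# Hypothetical helper: filters stdin, removing each whole-word match
# together with any whitespace that follows it.
remove_stopwords() {
    sed -E "s/\<(${pattern})\>\s*//g"
}

sentence="One of the primary goals in the design of the Unix system was to create an environment that promoted efficient program"
cleaned=$(printf '%s\n' "$sentence" | remove_stopwords)
printf '%s\n' "$cleaned"
```

For the sample sentence this prints "One primary goals design Unix system was to create environment promoted efficient program". Note that "an" is removed as well, because it is in the stop word list. To process the file from the question, something like remove_stopwords < my_text.txt should work.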
Bonus: try the loop above without \s* to understand why I added it to the regex.
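A small comparison shows what the \s* is for. This is a sketch assuming GNU sed, with an illustrative input string; the variable names are made up for the example.

```shell
#!/usr/bin/env bash
# Sketch only: assumes GNU sed (\<, \> and \s are GNU extensions).
line="goals in the design"

# Without \s*: only the word is deleted, so the spaces that surrounded
# it end up next to each other.
without_ws=$(printf '%s\n' "$line" | sed -e 's/\<the\>//g')

# With \s*: the whitespace after the word is deleted along with it.
with_ws=$(printf '%s\n' "$line" | sed -e 's/\<the\>\s*//g')

printf '[%s]\n' "$without_ws"   # [goals in  design]  (two spaces)
printf '[%s]\n' "$with_ws"      # [goals in design]
```

One caveat: because only trailing whitespace is consumed, a stop word at the very end of a line still leaves the space that preceded it.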
Regarding "bash - How to remove stop words from sentences using a shell script?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/65331755/