regex - Find matching strings within a paragraph

Tags: regex perl awk sed

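I have a TXT file containing LaTeX math equations, where each inline equation is delimited by a single $ before and after it.

I want to find every equation within a paragraph and replace the delimiters with XML opening and closing tags...

For example,

the following paragraph: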

This is the beginning of a paragraph $first equation$ ...and here is some text... $second equation$ ...and here is more text... $third equation$ ...and here is yet more text... $fourth equation$

should become:

This is the beginning of a paragraph <equation>first equation</equation> ...and here is some text... <equation>second equation</equation> ...and here is more text... <equation>third equation</equation> ...and here is yet more text... <equation>fourth equation</equation>

I have tried sed and perl commands such as the following:

perl -p -e 's/(\$)(.*[^\$])(\$)/<equation>$2<\/equation>/'

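But these commands convert only the opening delimiter of the first equation and the closing delimiter of the last one; none of the equations in between are converted: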

This is the beginning of a paragraph <equation>first equation$ ...and here is some text... $second equation$ ...and here is more text... $third equation$ ...and here is yet more text... $fourth equation</equation>

I would also like a robust solution that accounts for single $ characters that are not used as LaTeX delimiters. For example,

This is the beginning of a paragraph $first equation$ ...and here is some text that includes a single dollar sign: He paid $2.50 for a pack of cigarettes... $second equation$ ...and here is more text... $third equation$ ...and here is yet more text... $fourth equation$

should not become:

This is the beginning of a paragraph <equation>first equation$ ...and here is some text that includes a single dollar sign: He paid <equation>2.50 for a pack of cigarettes... $second equation$ ...and here is more text... $third equation$ ...and here is yet more text... $fourth equation</equation>

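Note: I am writing this in Bash.

Best answer

Note: the first part of this answer deals only with replacing pairs of $'s; for the OP's requirement about standalone $'s, see the second part of the answer.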


Replacing paired $'s

Sample data:

$ cat latex.txt
... $first equation$ ... $second equation$ ... $third equation$

One sed idea:

sed -E 's|\$([^$]*)\$|<equation>\1</equation>|g' latex.txt

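Where:

  • -E - enable extended regular expression support
  • \$ - match a literal $
  • ([^$]*) - [capture group #1] - match any run of characters that are not a literal $ (in this case, everything between the pair of $'s)
  • \$ - match a literal $
  • <equation>\1</equation> - replace the matched string with <equation> + contents of capture group #1 + </equation>
  • g (the trailing flag) - repeat the search/replace as often as needed on each line

This generates: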

... <equation>first equation</equation> ... <equation>second equation</equation> ... <equation>third equation</equation>

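Dealing with standalone $'s

If a standalone $ can be escaped (e.g. \$), one idea is to have sed replace it with a nonsense literal, perform the <equation> / </equation> replacement, and then change the nonsense literal back to \$.

Sample data: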

$ cat latex.txt
... $first equation$ ... $second equation$ ... $third equation$
... $first equation$ ... \$3.50 cup of coffee ... $third equation$

The original sed solution with the new replacements added:

sed -E 's|\\\$|LITDOL|g;s|\$([^$]*)\$|<equation>\1</equation>|g;s|LITDOL|\\\$|g' latex.txt

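Here we replace \$ with LITDOL (LITeral DOLlar), perform our original replacement, and then switch LITDOL back to \$. This assumes the string LITDOL does not otherwise occur in the input.

This generates: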

... <equation>first equation</equation> ... <equation>second equation</equation> ... <equation>third equation</equation>
... <equation>first equation</equation> ... \$3.50 cup of coffee ... <equation>third equation</equation>
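
If the standalone $'s cannot be escaped in the input (as in the question's "$2.50" example), the same placeholder trick can be combined with a heuristic. The sketch below assumes that a $ immediately followed by a digit is a currency amount and not an equation delimiter; that assumption is not from the original answer, and it will leave any equation that starts with a digit (e.g. $2x+1$) untagged and may mispair the delimiters that follow it. The function name tag_equations_skip_currency is hypothetical.

```shell
# Hypothetical helper: reads text on stdin, writes the tagged text to stdout.
tag_equations_skip_currency() {
  # 1. Hide every $ that is directly followed by a digit (assumed currency).
  # 2. Tag the remaining $...$ pairs as equations.
  # 3. Restore the hidden dollar signs.
  sed -E 's|\$([0-9])|LITDOL\1|g;s|\$([^$]*)\$|<equation>\1</equation>|g;s|LITDOL|$|g'
}

printf '%s\n' '$first equation$ ... He paid $2.50 for a pack of cigarettes... $second equation$' |
  tag_equations_skip_currency
```

As with the escaped variant, this relies on LITDOL not appearing anywhere in the input.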

Source: a similar question on Stack Overflow, "regex - Find matching strings within a paragraph": https://stackoverflow.com/questions/65386540/
