bash - Basic parsing of curly braces in Bash/sed/AWK

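Because of my mathematical background, I have managed to condition myself, Pavlov-style, to write only in (plain) TeX, so I regularly use TeX for a lot of non-mathematical text editing that people would more commonly do in MS Word or Google Docs. (Once you get used to the elegance of a TeX-generated PDF, the clunkiness of word-processor typesetting really becomes glaring.) This occasionally becomes a problem when I am asked to submit something in .docx format, which usually requires me to copy the TeX code into Google Docs, spend anywhere from a few minutes to half an hour making changes by hand, and then export it as a .docx file. I have a small Bash function in my .zshrc that handles the very simple things I don't want to do by hand (European accents, etc.):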

tex2doc() {
    tr '\n' ' ' < "$1.tex" |
    sed 's/  /\n\t/g' |
    sed "s/\\\'e/é/g" |
    sed "s/\\\^o/ô/g" |
    sed "s/\\\'E/É/g" |
    sed 's/```/“‘/g' |
    sed "s/'''/’”/g" |
    sed 's/``/“/g' |
    sed "s/''/”/g" |
    sed 's/`/‘/g' |
    sed "s/'/’/g" |
    sed 's/\\\"//g' |
    sed 's/\\~n/ñ/g' |
    sed 's/\~/ /g' |
    sed 's/\\ / /g' |
    sed 's/\\-//g' |
    sed 's/\\c{c}/ç/g' |
    sed 's/\\\///g' |
    sed 's/---/—/g' |
    sed 's/--/–/g' > "$1.txt"
}

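With that taken care of, what I am left with is the (computationally) more complex task of converting italics from the form {\it ... } into something Markdown-like, so that I can open the result in some Markdown reader and copy-paste from there. This would be easy if I only ever used italics, but sometimes I use bold, in the form {\bf ... }, or small caps, in the form {\sc ... }. So the functionality I would like is the following: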

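Upon seeing {\it, replace it with _.
Upon seeing {\bf, replace it with **.
Upon seeing {\sc, replace it with nothing. (I don't mind losing small caps in the .doc output.)
Upon seeing }, replace it with _ if the last token encountered was {\it, with ** if the last one was {\bf, or with nothing if the last one was {\sc. (Then pop the "stack", however it is implemented.)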

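Since this is a (very simple) parsing job, I considered using a Python-based lexer and parser to parse everything and then emit Markdown, but I feel that would take a while, and it has been a very long time since I last used lex and yacc. It also feels like using a sledgehammer to hang a picture. Is there a simple "Bash-esque" solution to this problem? I know sed by itself probably can't solve it, since it is geared more toward regular expressions, but perhaps AWK could help. I have used AWK in the past, but I remember its syntax being quite obtuse and difficult, and I'm not sure of its limitations, so I would appreciate any help! Thanks in advance :)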

Best answer

Sample data stream coming out of the OP's series of sed calls:

$ cat sample.tex
nothing to do on this line
this line {\itis just {\bfmade up {\scout of {\it{thin}}}}} air
nothing to do on this line

One GNU awk idea (it relies on the fourth argument to split()):

$ cat tex.awk

BEGIN  { map["\\it"]="_"                                                    # populate our replacements array
         map["\\bf"]="**"
         map["\\sc"]=""
         delete stack                                                       # designate "stack" as an array
         s=0                                                                # index for stack[] array; default value is "0" so do not need this line except for documentation
       }

/[{}]/ { n = split($0,data,/[{}]/,seps)                                     # if line contains braces then split on braces; data goes into data[], delimiters go into seps[]
         newline = data[1]                                                  # initialize new line of output to anything before first delimiter

         for (i=2;i<=n;i++) {                                               # loop through rest of parsed data
             if (seps[i-1] == "}") {                                        # if delimiter is "}" then ...
                newline = newline stack[s--] data[i]                        # pop associated string off stack[]
             }
             else {                                                         # delimiter is "{"
                f3 = substr(data[i],1,3)                                    # get first 3 characters
                if (f3 in map) {                                            # if f3 is an index in map[] then ...
                   newline = newline map[f3] substr(data[i],length(f3)+1)   # append map[] replacement and rest of data to our new line of output
                   stack[++s] = map[f3]                                     # push associated map[] value on stack
                }
                else {                                                      # delimiter is standalone "{" so ...
                   newline = newline "{" data[i]                            # (effectively) do nothing but append to new line
                   stack[++s] = "}"                                         # push closing brace on stack[]
                }
             }
         }
         print newline
         next
       }
1                                                                           # line does not contain any braces so just print as is
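
The script pops the stack on every }, so it depends on the braces in the input being balanced. A quick pre-check along the following lines could flag trouble before conversion. This is only a sketch: check_braces is a made-up helper name, it uses plain POSIX awk, and it ignores TeX's escaped \{ and \}.

```shell
# check_braces: hypothetical helper, not part of the original answer.
# Reads files (or stdin) and exits non-zero if any brace is left unmatched.
check_braces() {
    awk '
    {
        # walk the line one character at a time, tracking nesting depth
        for (i = 1; i <= length($0); i++) {
            c = substr($0, i, 1)
            if (c == "{") {
                depth++
            } else if (c == "}") {
                if (depth == 0) { printf "line %d: unmatched }\n", NR; bad = 1 }
                else depth--
            }
        }
    }
    END {
        # depth carries across lines, so groups may span line breaks
        if (depth > 0) { printf "%d unclosed {\n", depth; bad = 1 }
        exit bad + 0
    }' "$@"
}
```

Something like check_braces sample.tex && awk -f tex.awk sample.tex would then run the conversion only when the check passes.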

Notes:

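assumes all replacements are keyed on 3 characters; replacements keyed on fewer or more characters can be added, with the understanding that the test logic will need to be expanded beyond the current f3
assumes braces come in pairs (i.e., every { has a matching }, and vice versa)
assumes a standalone pair of braces is matched to itself, e.g., in {\it{thin}} the first } closes {thin rather than {\it (the source may instead have intended different handling for {thin and }, in which case that } would be matched with {\it)
once awk has been pulled into the mix, it would be possible to move all of the current sed code into the same awk script, though the OP may prefer to keep the current sed calls as they are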
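
To illustrate the first note, here is one possible way to lift the 3-character restriction: instead of testing a fixed-width f3, try every key in map[] as a prefix and keep the longest match. This is a sketch, not the answer's script. It scans with match() so it should run in any POSIX awk rather than needing GNU split(), the function name tex2md is made up, and the \itshape and \bfseries entries are illustrative additions that do not appear in the question.

```shell
# tex2md: hypothetical variant of tex.awk using longest-prefix lookup in map[].
tex2md() {
    awk '
    BEGIN {
        map["\\it"] = "_";  map["\\bf"] = "**";  map["\\sc"] = ""
        map["\\itshape"] = "_"; map["\\bfseries"] = "**"   # illustrative extras
    }
    {
        rest = $0; out = ""
        while (match(rest, /[{}]/)) {                # find the next brace
            out   = out substr(rest, 1, RSTART - 1)  # copy text before it
            brace = substr(rest, RSTART, 1)
            rest  = substr(rest, RSTART + 1)
            if (brace == "}") {
                # pop the closing string; a stray } is passed through as is
                out = out (top > 0 ? stack[top--] : "}")
                continue
            }
            best = ""                                # longest key that prefixes rest
            for (k in map)
                if (substr(rest, 1, length(k)) == k && length(k) > length(best))
                    best = k
            if (best != "") {
                out = out map[best]
                stack[++top] = map[best]             # remember how to close it
                rest = substr(rest, length(best) + 1)
            } else {
                out = out "{"                        # standalone brace: keep it
                stack[++top] = "}"
            }
        }
        print out rest
    }' "$@"
}
```

Because the longest key wins, {\itshape is handled by the \itshape entry rather than being cut short at \it. Like the original, the stack carries over from one line to the next.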

We'll use cat sample.tex to simulate the output coming from the OP's series of sed calls:

$ cat sample.tex
nothing to do on this line
this line {\itis just {\bfmade up {\scout of {\it{thin}}}}} air
nothing to do on this line

$ cat sample.tex | awk -f tex.awk
nothing to do on this line
this line _is just **made up out of _{thin}_**_ air
nothing to do on this line
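
If the sed steps were ever folded into awk, each one would map onto a gsub() call. The sketch below shows only a handful of the OP's substitutions, chosen because their order matters; the function name detex is made up for this example, and the rest of the sed chain is left out.

```shell
# detex: hypothetical helper showing a few of the sed substitutions as awk gsub() calls.
detex() {
    awk '
    {
        gsub(/\\c[{]c[}]/, "ç")   # \c{c} -> c-cedilla; run before any brace parsing
        gsub(/\\~n/, "ñ")         # \~n   -> n-tilde; run before the bare ~ rule
        gsub(/~/, " ")            # TeX tie -> ordinary space
        gsub(/\\-/, "")           # drop discretionary hyphens
        gsub(/---/, "—")          # three hyphens first, then two
        gsub(/--/, "–")
        print
    }' "$@"
}
```

In a single script these calls would sit at the top of the main block, ahead of the brace handling, so that \c{c} is consumed before its braces can be pushed onto the stack.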

Regarding "bash - Basic parsing of curly braces in Bash/sed/AWK", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/75849873/
