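bash - AWK: replace NULL column values with the previous row's column value (continued)

Tags: bash awk replace

This post is a follow-up to a question I asked earlier.

Suppose I have the following sample file: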

cat sample2.txt
HOST [email protected]
PORT 1066
DATABASE ORACLE_1
SCHEMA DEPT.*;
SCHEMA EMP.*;
DATABASE ORACLE_2
SCHEMA JOB.*;
SCHEMA SALARY.*;
HOST [email protected]
PORT 89
DATABASE MYSQL_1
SCHEMA PURCHASE.*;
DATABASE MYSQL_2
SCHEMA PRICE.*;
SCHEMA PRODUCT.*;

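From the file above, I want to print only the column next to HOST/PORT/DATABASE/SCHEMA. Assuming the last column of each record ends with a semicolon, I also want to fill in every missing column value with the value from the previous row.

@anubhava helped me achieve something similar, as shown in my previous post: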

cat sample2.txt | awk 'tolower($0)~/^host|^port|^database|^schema/{printf "%s",$2 OFS;}' | awk -v RS=';' -v ORS=';\n' 'NF' | awk 'NF==1{print c1, c2, c3, $1; next} NF==2{print c1, c2, $1, $2; next} {c1=$1; c2=$2; c3=$3} 1' | sed 's|^[[:blank:]]*||g; s|\;$||g'
    [email protected] 1066 ORACLE_1 DEPT.*
    [email protected] 1066 ORACLE_1 EMP.*
    [email protected] 1066 ORACLE_2 JOB.*
    [email protected] 1066 ORACLE_1 SALARY.*
    [email protected] 89 MYSQL_1 PURCHASE.*
    [email protected] 89 MYSQL_2 PRICE.*
    [email protected] 89 MYSQL_1 PRODUCT.*

But I would like the output to be as follows:

[email protected] 1066 ORACLE_1 DEPT.*
[email protected] 1066 ORACLE_1 EMP.*
[email protected] 1066 ORACLE_2 JOB.*
[email protected] 1066 ORACLE_2 SALARY.*
[email protected] 89 MYSQL_1 PURCHASE.*
[email protected] 89 MYSQL_2 PRICE.*
[email protected] 89 MYSQL_2 PRODUCT.*

Thanks

Best answer

You can get this output with a single awk command:

awk '{sub(/;$/, "")} $1=="HOST"{host=$2} $1=="PORT"{port=$2} $1=="DATABASE"{db=$2}
      $1=="SCHEMA"{print host, port, db, $2}' sample2.txt

[email protected] 1066 ORACLE_1 DEPT.*
[email protected] 1066 ORACLE_1 EMP.*
[email protected] 1066 ORACLE_2 JOB.*
[email protected] 1066 ORACLE_2 SALARY.*
[email protected] 89 MYSQL_1 PURCHASE.*
[email protected] 89 MYSQL_2 PRICE.*
[email protected] 89 MYSQL_2 PRODUCT.*

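Explanation:

  • The sub function strips the trailing ; from each line
  • When $1=="HOST", we store the second column in the variable host
  • When $1=="PORT", we store the second column in the variable port
  • When $1=="DATABASE", we store the second column in the variable db
  • When $1=="SCHEMA", we print host, port, db, and the second column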
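
The steps above can also be sketched as an annotated shell function, which may be easier to adapt. This is only a sketch under a few assumptions: the function name fill_down, the host names host-a.example and host-b.example, and the shortened sample data are placeholders invented for illustration, and the toupper() call is an addition (not part of the answer) that makes the keyword match case-insensitive, mirroring the tolower() in the question's pipeline.

```shell
#!/bin/sh
# fill_down is a hypothetical helper name; it reads the files given as
# arguments, or standard input when no file is given.
fill_down() {
  awk '
    { sub(/;$/, ""); key = toupper($1) }   # drop trailing ";" and normalise the keyword
    key == "HOST"     { host = $2 }        # remember the most recent host
    key == "PORT"     { port = $2 }        # remember the most recent port
    key == "DATABASE" { db   = $2 }        # remember the most recent database
    key == "SCHEMA"   { print host, port, db, $2 }  # one output row per schema
  ' "$@"
}

# Placeholder sample data (host names are made up for this demo).
fill_down <<'EOF'
HOST host-a.example
PORT 1066
DATABASE ORACLE_1
SCHEMA DEPT.*;
SCHEMA EMP.*;
DATABASE ORACLE_2
SCHEMA JOB.*;
HOST host-b.example
PORT 89
DATABASE MYSQL_1
SCHEMA PURCHASE.*;
EOF
```

Because each variable is overwritten as soon as its keyword appears, every SCHEMA line is printed with the most recent HOST, PORT and DATABASE values, so no second pass over the output is needed. The same function should work on the original file as fill_down sample2.txt.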

Regarding "bash - AWK: replace NULL column values with the previous row's column value (continued)", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/36342300/
