bash - Extract URLs from a text table (multi-line)

Tags: bash shell

My source:

+-----------+-------+----------------------+----------------------------------------------------------------------------------+
| positives | total |      scan_date       |                                       url                                        |
+===========+=======+======================+==================================================================================+
|     4     |  65   | 2015-09-21 23:29:33  | http://thebackpack.fr/wp-content/themes/salient/wpbakery/js_composer/assets/lib/ |
|           |       |                      | prettyphoto/images/prettyPhoto/light_rounded/66836487162.txt                     |
+-----------+-------+----------------------+----------------------------------------------------------------------------------+
|     1     |  64   | 2015-09-17 19:28:50  | http://thebackpack.fr/                                                           |
+-----------+-------+----------------------+----------------------------------------------------------------------------------+
|     1     |  64   | 2015-09-17 08:44:16  | http://thebackpack.fr/wp-content/themes/salient/wpbakery/js_composer/assets/lib/ |
|           |       |                      | prettyphoto/images/prettyPhoto/light_rounded/                                    |
+-----------+-------+----------------------+----------------------------------------------------------------------------------+

I want to extract the complete URLs, each one on a single line:

hxxp://thebackpack.fr/wp-content/themes/salient/wpbakery/js_composer/assets/lib/prettyphoto/images/prettyPhoto/light_rounded/66836487162.txt
hxxp://thebackpack.fr/
hxxp://thebackpack.fr/wp-content/themes/salient/wpbakery/js_composer/assets/lib/prettyphoto/images/prettyPhoto/light_rounded/

My problem is the URLs that wrap across several lines. I tried, for example: awk '{print $9}'

Thanks in advance for your help!

Best answer

You can use the following awk command:

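awk -F '[[:blank:]]*\\|[[:blank:]]*' '
   NR<3 || NF<5 {next}                           # skip the header row and the +---+ border lines
   $2 != "" {if (url) print url; url=$5; next}   # non-empty first column (even "0"): a new row starts
   {url=url $5}                                  # continuation line: append the fragment to the URL
   END{if (url) print url}' file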

Output:

http://thebackpack.fr/wp-content/themes/salient/wpbakery/js_composer/assets/lib/prettyphoto/images/prettyPhoto/light_rounded/66836487162.txt
http://thebackpack.fr/
http://thebackpack.fr/wp-content/themes/salient/wpbakery/js_composer/assets/lib/prettyphoto/images/prettyPhoto/light_rounded/
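To try the idea without a file on disk, here is a minimal sketch that wraps the same approach in a shell function and feeds it a small inline table. The function name extract_urls and the example.com URLs are hypothetical and not part of the original answer. It assumes an awk that supports POSIX character classes such as [[:blank:]]. Splitting on "|" with surrounding blanks puts the URL column in $5; a non-empty first column ($2) marks the start of a row, and lines with an empty $2 are treated as continuations.

```shell
#!/usr/bin/env bash

# extract_urls is a hypothetical helper name. It reads the table from the
# given files, or from standard input when no file is given.
extract_urls() {
  awk -F '[[:blank:]]*[|][[:blank:]]*' '
    NF < 5 || $2 == "positives" { next }                    # borders and header row
    $2 != "" { if (url != "") print url; url = $5; next }   # a new table row starts
    { url = url $5 }                                        # wrapped URL fragment
    END { if (url != "") print url }
  ' "$@"
}

# Sample data with made-up URLs; the first row has 0 positives on purpose.
extract_urls <<'EOF'
+-----------+-------+---------------------+------------------------+
| positives | total |      scan_date      |          url           |
+===========+=======+=====================+========================+
|     0     |  64   | 2015-09-17 19:28:50 | http://example.com/a/  |
|           |       |                     | b/c.txt                |
+-----------+-------+---------------------+------------------------+
|     1     |  64   | 2015-09-17 08:44:16 | http://example.com/    |
+-----------+-------+---------------------+------------------------+
EOF
```

For this sample the function prints http://example.com/a/b/c.txt followed by http://example.com/. Comparing $2 against the empty string, rather than testing its truth value, keeps rows whose positives count is 0 from being mistaken for continuation lines.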

Source: this question and answer come from Stack Overflow, "bash - Extract URLs from a text table (multi-line)": https://stackoverflow.com/questions/32796453/
