regex - Egrep special expressions like \w in bracket expressions []

Tags: regex bash grep

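I am trying to extract data from JSON using extended grep. The regular expression I am using works in my regexr instance, but for some reason it does not work in bash.

I have tried many things, in particular a bare double dash and various small changes to how the regular expression is escaped.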

#!/bin/bash
networks='{ "networks": [ { "admin_state_up": true, "availability_zone_hints": [], "availability_zones": [], "created_at": "2019-03-12T23:45:13Z", "description": "", "id": "7188504a-72cb-4590-a9b0-414732017837", "ipv4_address_scope": null, "ipv6_address_scope": null, "is_default": false, "mtu": 1450, "name": "BLUE", "port_security_enabled": true, "project_id": "187d635aec4c43fe8e8918afb3a5c82e", "provider:network_type": "vxlan", "provider:physical_network": null, "provider:segmentation_id": 86, "revision_number": 2, "router:external": false, "shared": false, "status": "ACTIVE", "subnets": [], "tags": [], "tenant_id": "187d635aec4c43fe8e8918afb3a5c82e", "updated_at": "2019-03-12T23:45:13Z" }, { "admin_state_up": true, "availability_zone_hints": [], "availability_zones": [], "created_at": "2019-03-12T23:45:13Z", "description": "", "id": "ed82083f-0a7c-4322-a4fb-de8db23e2bae", "ipv4_address_scope": null, "ipv6_address_scope": null, "is_default": false, "mtu": 1450, "name": "RED", "port_security_enabled": true, "project_id": "187d635aec4c43fe8e8918afb3a5c82e", "provider:network_type": "vxlan", "provider:physical_network": null, "provider:segmentation_id": 108, "revision_number": 2, "router:external": false, "shared": false, "status": "ACTIVE", "subnets": [], "tags": [], "tenant_id": "187d635aec4c43fe8e8918afb3a5c82e", "updated_at": "2019-03-12T23:45:13Z" }, { "admin_state_up": true, "availability_zone_hints": [], "availability_zones": [], "created_at": "2019-03-12T23:45:13Z", "description": "", "id": "1eb6647e-869e-4e83-9468-43e2c320bccc", "ipv4_address_scope": null, "ipv6_address_scope": null, "is_default": false, "mtu": 1450, "name": "public", "port_security_enabled": true, "project_id": "187d635aec4c43fe8e8918afb3a5c82e", "provider:network_type": "vxlan", "provider:physical_network": null, "provider:segmentation_id": 32, "revision_number": 2, "router:external": false, "shared": false, "status": "ACTIVE", "subnets": [], "tags": [], "tenant_id": 
"187d635aec4c43fe8e8918afb3a5c82e", "updated_at": "2019-03-12T23:45:13Z" } ] }'
result=`echo $networks | grep -oE '"(id|name)": "([\w+-]+)"'`
echo $result

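The code above does not work, but if I switch to the following regular expression, it does. I only need to extend it so that it also extracts the id field, and I would like to be able to use the \2 backreference (group 2) to pull out the id and name values.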

grep -oE '"(id|name)": "(\w+)"'

Can you help me understand why the script does not work?

The JSON, fully formatted:

{
  "networks": [{
    "admin_state_up": true,
    "availability_zone_hints": [],
    "availability_zones": [],
    "created_at": "2019-03-12T23:45:13Z",
    "description": "",
    "id": "7188504a-72cb-4590-a9b0-414732017837",
    "ipv4_address_scope": null,
    "ipv6_address_scope": null,
    "is_default": false,
    "mtu": 1450,
    "name": "BLUE",
    "port_security_enabled": true,
    "project_id": "187d635aec4c43fe8e8918afb3a5c82e",
    "provider:network_type": "vxlan",
    "provider:physical_network": null,
    "provider:segmentation_id": 86,
    "revision_number": 2,
    "router:external": false,
    "shared": false,
    "status": "ACTIVE",
    "subnets": [],
    "tags": [],
    "tenant_id": "187d635aec4c43fe8e8918afb3a5c82e",
    "updated_at": "2019-03-12T23:45:13Z"
  }, {
    "admin_state_up": true,
    "availability_zone_hints": [],
    "availability_zones": [],
    "created_at": "2019-03-12T23:45:13Z",
    "description": "",
    "id": "ed82083f-0a7c-4322-a4fb-de8db23e2bae",
    "ipv4_address_scope": null,
    "ipv6_address_scope": null,
    "is_default": false,
    "mtu": 1450,
    "name": "RED",
    "port_security_enabled": true,
    "project_id": "187d635aec4c43fe8e8918afb3a5c82e",
    "provider:network_type": "vxlan",
    "provider:physical_network": null,
    "provider:segmentation_id": 108,
    "revision_number": 2,
    "router:external": false,
    "shared": false,
    "status": "ACTIVE",
    "subnets": [],
    "tags": [],
    "tenant_id": "187d635aec4c43fe8e8918afb3a5c82e",
    "updated_at": "2019-03-12T23:45:13Z"
  }, {
    "admin_state_up": true,
    "availability_zone_hints": [],
    "availability_zones": [],
    "created_at": "2019-03-12T23:45:13Z",
    "description": "",
    "id": "1eb6647e-869e-4e83-9468-43e2c320bccc",
    "ipv4_address_scope": null,
    "ipv6_address_scope": null,
    "is_default": false,
    "mtu": 1450,
    "name": "public",
    "port_security_enabled": true,
    "project_id": "187d635aec4c43fe8e8918afb3a5c82e",
    "provider:network_type": "vxlan",
    "provider:physical_network": null,
    "provider:segmentation_id": 32,
    "revision_number": 2,
    "router:external": false,
    "shared": false,
    "status": "ACTIVE",
    "subnets": [],
    "tags": [],
    "tenant_id": "187d635aec4c43fe8e8918afb3a5c82e",
    "updated_at": "2019-03-12T23:45:13Z"
  }]
}

Best answer

According to man grep:

The Backslash Character and Special Expressions

The symbol \w is a synonym for [[:alnum:]] and \W is a synonym for [^[:alnum:]]. ... A bracket expression is a list of characters enclosed by [ and ]. ... To include a literal ] place it first in the list. Similarly, to include a literal ^ place it anywhere but first. Finally, to include a literal - place it last.

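That shorthand only applies outside a bracket expression, though. Inside [ ], the backslash loses its special meaning and is treated as an ordinary character, so \w is not expanded to [[:alnum:]] there. The bracket expression [\w+-] therefore matches exactly one of four literal characters: a backslash, w, + or - (the hyphen is literal because it comes last).

As a result, "([\w+-]+)" only matches a quoted value made up entirely of those four characters. None of the id or name values in the JSON look like that (the ids contain hexadecimal digits, and the names contain other letters), so grep prints nothing.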
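The behaviour of \w inside a bracket expression can be checked on small inputs. The following is a minimal sketch, assuming GNU grep (or another grep that follows the POSIX rule that a backslash is an ordinary character inside [ ]). The variable names literal, no_match and fixed are made up for illustration.

```shell
#!/bin/bash
# Inside [...], the backslash is an ordinary character, so [\w+-]
# matches only these four characters: \  w  +  -
literal=$(printf '%s\n' 'w+w-' | grep -oE '[\w+-]+')
echo "literal: $literal"

# A quoted hex id contains digits and letters other than w,
# so the same class cannot match the whole quoted value.
# "|| true" keeps the script going when grep finds nothing.
no_match=$(printf '%s\n' '"7188504a-72cb"' | grep -oE '"[\w+-]+"' || true)
echo "no_match: [$no_match]"

# Using the POSIX class name directly inside the brackets works.
fixed=$(printf '%s\n' '"7188504a-72cb"' | grep -oE '"[[:alnum:]+-]+"')
echo "fixed: $fixed"
```

If the first command prints w+w- and the second prints nothing, the grep in use treats \w inside brackets as two literal characters rather than as a character class.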

If you try

echo $networks | grep -oE '"(id|name)": "([[:alnum:]+-]+)"'

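in which the class name [:alnum:] replaces \w without an outer bracket expression of its own, the relevant part means "a group, enclosed in double quotes, made up of one or more letters, digits, plus signs, and hyphens", and the output is: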

"id": "7188504a-72cb-4590-a9b0-414732017837"
"name": "BLUE"
"id": "ed82083f-0a7c-4322-a4fb-de8db23e2bae"
"name": "RED"
"id": "1eb6647e-869e-4e83-9468-43e2c320bccc"
"name": "public"
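The question also asks about pulling out only group 2 (the value). grep -o prints the whole match and has no option to print a single capture group, so one possible approach is to pipe each match through sed and keep the \2 backreference. This is a sketch under stated assumptions, not the answer's own method: the sample variable is a shortened, hypothetical stand-in for the networks JSON, and sed -E is assumed to be available (it is in GNU and BSD sed).

```shell
#!/bin/bash
# Shortened, hypothetical sample in the same shape as the networks JSON above.
sample='{ "networks": [ { "id": "7188504a-72cb-4590-a9b0-414732017837", "name": "BLUE" }, { "id": "ed82083f-0a7c-4322-a4fb-de8db23e2bae", "name": "RED" } ] }'

# grep -o puts each "key": "value" pair on its own line;
# sed -E then replaces the pair with capture group 2 (the value only).
values=$(printf '%s\n' "$sample" \
  | grep -oE '"(id|name)": "[[:alnum:]+-]+"' \
  | sed -E 's/^"(id|name)": "([^"]+)"$/\2/')

printf '%s\n' "$values"
```

Quoting "$sample" and using printf instead of an unquoted echo avoids word splitting and glob expansion on the JSON text, although the unquoted form happens to work for this particular input.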

Source: a similar question about "regex - Egrep special expressions like \w in bracket expressions []" on Stack Overflow: https://stackoverflow.com/questions/55152338/
