regex - Passing keys and values from a Perl regular expression into a hash

Tags: regex perl parsing

Could you tell me how to populate a hash in Perl with the contents of regex capture groups?

Example:

I have a file:

https://www.youtube.com/watch?v=5qap5aO4i9A
http://example.com:8080/r/p?s=10&z=11#text
https://exapmle.com/test/p?var=100
http://test.org:81/
https://main.org
gopher://gopher.floodgap.com/gopher/relevance.txt
file:///home/user/.profile
gemini://transjovian.org/

I want to break each line of this file into a set of key-value pairs, add them to a hash, and then print the contents of that hash.

My script:

#!/usr/bin/env perl

use strict;
use utf8;
use warnings;
use feature qw(say);
use Data::Dumper;

sub parse_url {
    my ($url) = @_;
    if ($url =~ m#(.*):/(.*)#) {
        my (%hash, $scheme, $domain, $port, $path, $query_string, $anchor);
        $url =~ m!^(?<scheme>[^:]+):/{2,3}(?<domain>[^:/]+)(?::(?<port>(?:\d+)?)?)(?<path>(?:/[^?]+)?)(?:\?(?<query_string>(?:[^\#]+)?)?)(?:\#(?<anchor>(?:.+)?)?)!;

        if(defined($scheme)) { $hash{'scheme'} = $scheme; }
        if(defined($domain)) { $hash{'domain'} = $domain; }
        if(defined($port)) { $hash{'port'} = $port; }
        if(defined($path)) { $hash{'path'} = $path; }
        if(defined($query_string)) { $hash{'query_string'} = $query_string; }
        if(defined($anchor)) { $hash{'anchor'} = $anchor; }
        return %hash;
    }
}

while (my $row = <>) {
    chomp $row;
    say $row;
    my %hash = parse_url($row);
    print Dumper \%hash;
}

I want to get this output:

https://www.youtube.com/watch?v=5qap5aO4i9A
$VAR1 = {
    scheme         => 'https',
    domain         => 'www.youtube.com',
    path           => '/watch',
    query_string   => 'v=5qap5aO4i9A',
};
http://example.com:8080/r/p?s=10&z=11#text
$VAR1 = {
    scheme         => 'http',
    domain         => 'example.com',
    port           => '8080',
    path           => '/r/p',
    query_string   => 's=10&z=11',
    anchor         => 'text',
};
https://exapmle.com/test/p?var=100
$VAR1 = {
    scheme         => 'https',
    domain         => 'exapmle.com',
    path           => '/test/p',
    query_string   => 'var=100',
};
http://test.org:81/
$VAR1 = {
    scheme         => 'http',
    domain         => 'test.org',
    port           => '81',
};
https://main.org
$VAR1 = {
    scheme         => 'https',
    domain         => 'main.org',
};
gopher://gopher.floodgap.com/gopher/relevance.txt
$VAR1 = {
    scheme         => 'gopher',
    domain         => 'gopher.floodgap.com',
    path           => '/gopher/relevance.txt',
};
file:///home/user/.profile
$VAR1 = {
    scheme         => 'file',
    path           => '/home/user/.profile',
};
gemini://transjovian.org/
$VAR1 = {
    scheme         => 'gemini',
    domain         => 'transjovian.org',
};

But I get this output instead:

https://www.youtube.com/watch?v=5qap5aO4i9A
$VAR1 = {};
http://example.com:8080/r/p?s=10&z=11#text
$VAR1 = {};
https://exapmle.com/test/p?var=100
$VAR1 = {};
http://test.org:81/
$VAR1 = {};
https://main.org
$VAR1 = {};
gopher://gopher.floodgap.com/gopher/relevance.txt
$VAR1 = {};
file:///home/user/.profile
$VAR1 = {};
gemini://transjovian.org/
$VAR1 = {};

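Thanks for your help!

Best answer

Named groups such as (?<scheme>...) do not assign to lexical variables of the same name, so $scheme, $domain and the others stay undefined in your script and the hash stays empty. You can use the special variable %+ (or its alias %{^CAPTURE}, available in Perl 5.26 and later) to get the named captures, like this: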

use strict;
use utf8;
use warnings;
use feature qw(say);
use open ':std', ':encoding(utf-8)';
use Data::Dumper;

sub parse_url {
    my ($url) = @_;
    if ($url =~ m#(.*):/(.*)#) {
        $url =~ m!
           ^(?<scheme>[^:]+):/{2,3}
            (?<domain>[^:/]+)
              (?::?(?<port>(?:\d+)?)?)
              (?<path>(?:/[^?]+)?)
              (?:\??(?<query_string>(?:[^\#]+)?)?)
              (?:\#?(?<anchor>(?:.+)?)?)
        !x;
        # Copy the named captures right away; the next successful match resets %+.
        my %hash = %+;
        return %hash;
    }
}

while (my $row = <>) {
    chomp $row;
    say $row;
    my %hash = parse_url($row);
    if (%hash) {
        print Dumper \%hash;
    }
    else {
        say "  -> No match";
    }
}

Output:

http://example.com:8080/r/p?s=10&z=11#text
$VAR1 = {
          'anchor' => 'text',
          'path' => '/r/p',
          'query_string' => 's=10&z=11',
          'port' => '8080',
          'scheme' => 'http',
          'domain' => 'example.com'
        };
https://www.youtube.com/watch?v=5qap5aO4i9A
$VAR1 = {
          'scheme' => 'https',
          'domain' => 'www.youtube.com',
          'port' => '',
          'anchor' => '',
          'query_string' => 'v=5qap5aO4i9A',
          'path' => '/watch'
        };
https://exapmle.com/test/p?var=100
$VAR1 = {
          'port' => '',
          'scheme' => 'https',
          'anchor' => '',
          'path' => '/test/p',
          'domain' => 'exapmle.com',
          'query_string' => 'var=100'
        };
http://test.org:81/
$VAR1 = {
          'scheme' => 'http',
          'domain' => 'test.org',
          'port' => '81',
          'anchor' => '',
          'query_string' => '/',
          'path' => ''
        };
https://main.org
$VAR1 = {
          'port' => '',
          'scheme' => 'https',
          'domain' => 'main.org',
          'anchor' => '',
          'path' => '',
          'query_string' => ''
        };
gopher://gopher.floodgap.com/gopher/relevance.txt
$VAR1 = {
          'domain' => 'gopher.floodgap.com',
          'scheme' => 'gopher',
          'port' => '',
          'query_string' => '',
          'path' => '/gopher/relevance.txt',
          'anchor' => ''
        };
file:///home/user/.profile
$VAR1 = {
          'port' => '',
          'scheme' => 'file',
          'domain' => 'home',
          'anchor' => '',
          'path' => '/user/.profile',
          'query_string' => ''
        };
gemini://transjovian.org/
$VAR1 = {
          'domain' => 'transjovian.org',
          'scheme' => 'gemini',
          'port' => '',
          'query_string' => '/',
          'path' => '',
          'anchor' => ''
        };
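
The output above differs from the requested one in a few ways: absent components show up as empty strings, a bare trailing "/" lands in query_string, and file:///home/... yields a domain of 'home'. This happens because every named group in that pattern is allowed to match the empty string and the separators are optional. Since %+ only holds keys for groups that actually took part in the match, one possible adjustment is to make each whole component optional while keeping its separator mandatory. The following is only a sketch under that assumption; the name parse_url_sparse is made up for illustration, and the pattern is not a complete URL parser.

```perl
use strict;
use warnings;

# Hypothetical variant of parse_url (the name is invented for this sketch).
# Each separator is required inside an optional group, so a component that
# is missing from the URL leaves no key in %+ at all.
sub parse_url_sparse {
    my ($url) = @_;
    return unless $url =~ m{
        ^ (?<scheme> [^:]+ ) ://
        (?<domain> [^:/?\#]+ )?              # skipped for file:///...
        (?: : (?<port> \d+ ) )?
        (?<path> / [^?\#]+ )?                # needs at least one character after "/"
        /?                                   # swallow a bare trailing slash
        (?: \? (?<query_string> [^\#]* ) )?
        (?: \# (?<anchor> .* ) )?
        \z
    }x;
    my %parts = %+;    # copy before any later match resets %+
    return %parts;
}

for my $url ('http://test.org:81/', 'file:///home/user/.profile') {
    my %parts = parse_url_sparse($url);
    print $url, ' -> ', join(', ', map { "$_=$parts{$_}" } sort keys %parts), "\n";
}
# http://test.org:81/ -> domain=test.org, port=81, scheme=http
# file:///home/user/.profile -> path=/home/user/.profile, scheme=file
```

Whether a lone "/" should count as a path is a design choice; this sketch drops it only because the requested output does. For real-world URLs a dedicated module such as URI from CPAN is usually a safer choice than a hand-written pattern.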

This Q&A on passing keys and values from a Perl regular expression into a hash is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/69965270/
