regex - How can I filter specific lines from file output bound to a scalar variable?

Tags: regex perl vmware

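I am trying to use a regex to filter out everything except one very specific line of text from VMware VMX files. I run the files through a foreach loop, since there are several files, one per virtual machine. On each pass the loop binds the output of Net::OpenSSH, which runs cat against a file on the VM server, to a scalar variable.

I'm not sure that really makes sense.

Anyway, the problem is that when the script runs, nothing in my regex matches; it just prints all of the VMX files one after another, concatenated. I can't see what I'm missing.

Here is a sample of the code I'm working with.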

sub get_virtual_machines {
my $esx_host = config_file()->{ESX}{host};
my $ssh_port = config_file()->{ESX}{port};
my $esx_user = config_file()->{ESX}{user};
my $esx_password = config_file()->{ESX}{password};
my %options = (
    port => $ssh_port,
    user => $esx_user, 
    password => $esx_password
);
my $ssh1 = Net::OpenSSH->new($esx_host, %options);
print color 'blue';
print "Collecting virtual machine data for $esx_host\n";
my @virtual_machines = $ssh1->capture('vim-cmd vmsvc/getallvms');
shift @virtual_machines;
print color 'reset';
# Filter data from ESX\ESXi output
my %virtual_machines = ();

foreach my $vm (@virtual_machines) {

    # Replace "[" with "/"

    $vm =~ s/\[/\//;

    # Replace "]" with "/"

    $vm =~ s/\]/\//;

    # Match ID, NAME and VMX location
    $vm =~  m/^(\d+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\D+)(\D)(\d)(\d)/x;
    # Build hash table of discovered virtual machines
    $virtual_machines{"$2"}{"ID"} = "$1";
    $virtual_machines{"$2"}{"VMX"} = "/vmfs/volumes$3$4";
    $virtual_machines{"$2"}{"Version"} = "$9";
}
undef @virtual_machines;
foreach my $vm (keys %virtual_machines) {
$vm = $ssh1->capture("cat $virtual_machines{$vm}{VMX}");
$vm =~ m/^(\bguestOSAltName\b)/x;
print "$1\n";
}
#print Dumper (\%virtual_machines);

}

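The part in question comes after the "undef @virtual_machines" line (line 38 in the example). My first goal is just to match the line containing the word "guestOSAltName". I figure that once that part works I'll be on my way again; I've just hit a roadblock.

Here is a sample VMX file to look at as well.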

.encoding = "UTF-8"
config.version = "8"
virtualHW.version = "7"
pciBridge0.present = "TRUE"
pciBridge4.present = "TRUE"
pciBridge4.virtualDev = "pcieRootPort"
pciBridge4.functions = "8"
pciBridge5.present = "TRUE"
pciBridge5.virtualDev = "pcieRootPort"
pciBridge5.functions = "8"
pciBridge6.present = "TRUE"
pciBridge6.virtualDev = "pcieRootPort"
pciBridge6.functions = "8"
pciBridge7.present = "TRUE"
pciBridge7.virtualDev = "pcieRootPort"
pciBridge7.functions = "8"
vmci0.present = "TRUE"
nvram = "NS02.nvram"
deploymentPlatform = "windows"
virtualHW.productCompatibility = "hosted"
unity.customColor = "|23C0C0C0"
tools.upgrade.policy = "useGlobal"
powerType.powerOff = "default"
powerType.powerOn = "default"
powerType.suspend = "default"
powerType.reset = "default"

displayName = "NS02"
extendedConfigFile = "NS02.vmxf"

scsi0.present = "TRUE"
scsi0.sharedBus = "none"
scsi0.virtualDev = "lsilogic"
memsize = "512"
scsi0:0.present = "TRUE"
scsi0:0.fileName = "NS02.vmdk"
scsi0:0.deviceType = "scsi-hardDisk"
ide1:0.present = "TRUE"
ide1:0.clientDevice = "FALSE"
ide1:0.deviceType = "cdrom-image"
ide1:0.startConnected = "FALSE"
ethernet0.present = "TRUE"
ethernet0.virtualDev = "e1000"
ethernet0.networkName = "solignis.local"
ethernet0.addressType = "generated"
chipset.onlineStandby = "FALSE"
guestOSAltName = "Ubuntu Linux (64-bit)"
guestOS = "ubuntu-64"
uuid.location = "56 4d ab a6 1e 7b c5 43-02 45 7c 24 1f fc 28 d9"
uuid.bios = "56 4d ab a6 1e 7b c5 43-02 45 7c 24 1f fc 28 d9"
vc.uuid = "52 50 c1 4b be 91 07 d5-22 0e 86 ee db 88 6d 8a"
snapshot.action = "keep"
sched.cpu.min = "0"
sched.cpu.units = "mhz"
sched.cpu.shares = "normal"
sched.mem.minsize = "0"
sched.mem.shares = "normal"

sched.scsi0:0.shares = "normal"
bios.forceSetupOnce = "FALSE"
floppy0.present = "FALSE"

ethernet0.generatedAddress = "00:0c:29:fc:28:d9"
tools.syncTime = "FALSE"
cleanShutdown = "FALSE"
replay.supported = "FALSE"
sched.swap.derivedName = "/vmfs/volumes/4cbcad5b-b51efa39-c3d8-001517585013/NS02/NS02-510988a0.vswp"
scsi0:0.redo = ""
vmotion.checkpointFBSize = "4194304"
pciBridge0.pciSlotNumber = "17"
pciBridge4.pciSlotNumber = "21"
pciBridge5.pciSlotNumber = "22"
pciBridge6.pciSlotNumber = "23"
pciBridge7.pciSlotNumber = "24"
scsi0.pciSlotNumber = "16"
ethernet0.pciSlotNumber = "32"
vmci0.pciSlotNumber = "33"
ethernet0.generatedAddressOffset = "0"
vmci0.id = "536619225"
hostCPUID.0 = "0000000a756e65476c65746e49656e69"
hostCPUID.1 = "000006fb000408000000e3bdbfebfbff"
hostCPUID.80000001 = "00000000000000000000000120100800"
guestCPUID.0 = "0000000a756e65476c65746e49656e69"
guestCPUID.1 = "000006fb00010800800022010febfbff"
guestCPUID.80000001 = "00000000000000000000000120100800"
userCPUID.0 = "0000000a756e65476c65746e49656e69"
userCPUID.1 = "000006fb000408000000e3bdbfebfbff"
userCPUID.80000001 = "00000000000000000000000120100800"
evcCompatibilityMode = "FALSE"
ide1:0.fileName = "/usr/lib/vmware/isoimages/linux.iso"

Best answer

It's hard to say from the information you've provided, but I think the problem is that the regex

$vm =~ m/^(\bguestOSAltName\b)/x;

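doesn't match the file you provided, because ^ asserts a match at the start of the string, not at the start of a line. Since the regex doesn't match, $1 keeps its old value from earlier in the program, and that is what gets printed. To be safe, you should check that the regex actually matched before using the captures: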

use Carp;    # provides carp()

if ($vm =~ m/^(\bguestOSAltName\b)/x) {
    print "$1\n";
}
else {
    carp "Couldn't find guestOSAltName!";
}
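
The stale-capture behaviour is easy to reproduce in isolation. The following is a minimal sketch, not the asker's script: the string in $vmx is a two-line stand-in built from the sample VMX file, and the "42 NS02" string is a made-up value used only to set $1 beforehand.

```perl
use strict;
use warnings;

# Two lines from the sample VMX file, held in a single scalar.
my $vmx = qq{.encoding = "UTF-8"\nguestOSAltName = "Ubuntu Linux (64-bit)"\n};

# An earlier, successful match sets $1 (the input here is made up).
"42 NS02" =~ m/^(\d+)/ or die "setup match failed";
my $before = $1;

# Without /m, ^ only matches at the very start of $vmx, so this fails ...
my $matched = $vmx =~ m/^(guestOSAltName)/ ? 1 : 0;

# ... and $1 still holds the capture from the earlier successful match.
my $after = $1;

print "matched=$matched before=$before after=$after\n";
# prints: matched=0 before=42 after=42
```

Because the failed match leaves $1 untouched, printing it unconditionally shows whatever the last successful match captured.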

Or grab the captures by putting the match in list context:

# $result gets $1 if the match succeeds, undef if it fails.
my ($result) = $vm =~ m/^(\bguestOSAltName\b)/x;

To make ^ match at the start of a line, you need the /m modifier, which changes ^ and $ to match per line instead of per string:

if ($vm =~ m/^(\bguestOSAltName\b)/xm) { ... }
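
As a rough illustration of /m combined with the list-context form, the sketch below pulls the quoted value of guestOSAltName out of a multi-line scalar. The three lines in $vmx are copied from the sample VMX file; the variable name $alt_name and the exact pattern are assumptions made for this example, not part of the original script.

```perl
use strict;
use warnings;

# A few lines from the sample VMX file, in one scalar, the way a
# captured `cat` would return them.
my $vmx = <<'END_VMX';
chipset.onlineStandby = "FALSE"
guestOSAltName = "Ubuntu Linux (64-bit)"
guestOS = "ubuntu-64"
END_VMX

# /m lets ^ and $ match at each line boundary; /x permits the spacing.
# In list context the match returns its captures, or an empty list on failure.
my ($alt_name) = $vmx =~ m/^ guestOSAltName \s* = \s* "([^"]*)" \s* $/xm;

print defined $alt_name ? "$alt_name\n" : "guestOSAltName not found\n";
# prints: Ubuntu Linux (64-bit)
```

Since $alt_name is undef when nothing matches, there is no stale $1 to print by accident.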

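This is why Damian Conway recommends in Perl Best Practices that you always use /m: with it, ^ and $ always do what you intuitively expect them to do. [In fact, he recommends always using /xms. You're already a third of the way there :)]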


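PS: From this point on, everything is general code-review criticism and not directly related to the question. I hope it's useful, but feel free to ignore it.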

I find that regexes and other double-quoted contexts with heavily escaped characters, such as

$vm =~ s/\[/\//;

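are usually better rewritten in a single-quoted context (the [ keeps its backslash because it is still a regex metacharacter; only the escaped / goes away):

$vm =~ s'\['/';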
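
Other delimiters give the same relief from escaped slashes. The sketch below shows two alternatives that are swapped in here for comparison and are not from the original answer: brace delimiters for s///, and tr/// to replace both brackets in one pass. The contents of $line are a hypothetical getallvms-style line in the shape the question's regex expects.

```perl
use strict;
use warnings;

# Hypothetical input line; the real output of vim-cmd may differ.
my $line = '16  NS02  [datastore1] NS02/NS02.vmx  ubuntu64Guest  vmx-07';

# Brace delimiters: the slash in the replacement needs no escaping.
(my $with_braces = $line) =~ s{\[}{/};
$with_braces =~ s{\]}{/};

# tr/// works on literal characters, so both brackets go in one pass.
(my $with_tr = $line) =~ tr{[]}{//};

print "$with_braces\n$with_tr\n";
```

Both variables end up holding the same string, with "[datastore1]" turned into "/datastore1/".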

Also, this regex is hard to read:

$vm =~  m/^(\d+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\D+)(\D)(\d)(\d)/x;

You're already using the /x flag, so why not take advantage of it?

$vm =~  m/^(\d+) \s+ # $1: number of some sort
           (\S+) \s+ # $2: identifier we're interested in
           (\S+) \s+ # $3: VMX filename part a
           (\S+) \s+ # $4: VMX filename part b
           (\S+) \s+ # $5: another identifier
           (\D+)(\D) # $6, $7: at least two nondigits
           (\d)   # $8: digit
           (\d)   # $9: version digit
           /x;

I would also consider using named captures:

$vm =~  m/^(?:      \d+) \s+ # number of some sort
           (?<ID>   \S+) \s+ # $+{ID}: identifier we're interested in
           (?<VMXa> \S+) \s+ # $+{VMXa}: VMX filename part a
           (?<VMXb> \S+) \s+ # $+{VMXb}: VMX filename part b
           (?:      \S+) \s+
           (?:\D+)(?:\D)     # at least two nondigits
           (?:\d)            # one digit
           (?<VERSION> \d)   # $+{VERSION}: version digit
           /x;

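Now, instead of cryptic references to $2 and $9, you get clear, obvious, self-documenting references to $+{ID} and $+{VERSION}. I've made the remaining groups non-capturing, (?:regex), but if I later want to capture one of them, I can turn it into a named capture without changing the indices of all the other captures, unlike with positional captures.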

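Named captures are also less likely to run into the stale-value problem mentioned above, where a failed match leaves all the $1-style variables in their old state.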
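
Putting the pieces together, a named-capture version that only fills the hash when the match succeeds might look like the sketch below. It assumes Perl 5.10 or later (for %+), and it follows the field order used in the question's hash-building code (numeric ID first, then the VM name), which is an assumption about the real output. The input line and the NAME group are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical line, after "[" and "]" have been replaced with "/".
my $line = '16  NS02  /datastore1/ NS02/NS02.vmx  ubuntu64Guest  vmx-07';

my %vm;
if ($line =~ m{^ (?<ID>      \d+) \s+   # numeric VM id
                 (?<NAME>    \S+) \s+   # VM name
                 (?<VMXa>    \S+) \s+   # datastore part of the path
                 (?<VMXb>    \S+) \s+   # path below the datastore
                 \S+              \s+   # guest type, not captured
                 \D+ \d                 # for example "vmx-0"
                 (?<VERSION> \d)        # version digit
              }x) {
    # Copy the captures out right away, while %+ refers to this match.
    %vm = (
        ID      => $+{ID},
        NAME    => $+{NAME},
        VMX     => '/vmfs/volumes' . $+{VMXa} . $+{VMXb},
        Version => $+{VERSION},
    );
    print "$vm{NAME}: id=$vm{ID} vmx=$vm{VMX} version=$vm{Version}\n";
}
else {
    warn "Line did not match the expected layout: $line\n";
}
```

Copying the values into %vm inside the if block means a line that fails to match never contributes captures left over from a previous line.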

This question, "regex - How can I filter specific lines from file output bound to a scalar variable?", comes from Stack Overflow: https://stackoverflow.com/questions/4591049/
