php - How can I adjust my regex to allow escaped quotes?

Tags: php regex preg-replace preg-match preg-split

Introduction

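First, my general problem: I want to replace question marks in a string with another string, but only when they are not inside quotes. I found an answer to a similar question on SO and started testing its code. Unfortunately, that code does not account for escaped quotes.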

For example: $string = 'hello="is it me your are looking for\\"?" AND test=?';

I adapted the regex and code from that answer to the question "How to replace words outside double and single quotes". It is reproduced here to make my question easier to read:

<?php
function str_replace_outside_quotes($replace,$with,$string){
    $result = "";
    $outside = preg_split('/("[^"]*"|\'[^\']*\')/',$string,-1,PREG_SPLIT_DELIM_CAPTURE);
    while ($outside)
        $result .= str_replace($replace,$with,array_shift($outside)).array_shift($outside);
    return $result;
}
?>

The actual problem

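So I tried to adjust the pattern so that, inside double quotes, it matches either an escaped quote \" or any character that is not a quote ":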

<?php
$pattern = '/("(\\"|[^"])*"' . '|' . "'[^']*')/";

// when parsed/echoed by PHP the pattern evaluates to
// /("(\"|[^"])*"|'[^']*')/
?>

But this does not work the way I had hoped.

My test string is: hello="is it me your are looking for\"?" AND test=?

I get the following matches:

array
  0 => string 'hello=' (length=6)
  1 => string '"is it me your are looking for\"?"' (length=34)
  2 => string '?' (length=1)
  3 => string ' AND test=?' (length=11)

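The element at index 2 should not exist. That question mark should be treated only as part of the element at index 1, not repeated on its own.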

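Once this is solved, the same fix should also be applied to the other side of the main alternation, the one for single quotes/apostrophes (').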
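A note on where the extra element likely comes from: with PREG_SPLIT_DELIM_CAPTURE, preg_split() returns the text of every capturing group in the delimiter pattern, and the inner (\"|[^"]) is one of them. The sketch below is not taken from the linked answer. It makes the inner groups non-capturing with (?:...) and handles escapes the same way for both quote styles. The /s flag and the choice to let a backslash escape any character are assumptions.

```php
<?php
// Outer group: captured, so each quoted chunk is returned as a delimiter.
// Inner groups: (?:...) is non-capturing, so no extra array elements appear.
// '\\\\' in a single-quoted PHP string is the regex \\ (one literal backslash).
$pattern = '/("(?:\\\\.|[^"\\\\])*"|\'(?:\\\\.|[^\'\\\\])*\')/s';

$string = 'hello="is it me your are looking for\\"?" AND test=?';
$parts  = preg_split($pattern, $string, -1, PREG_SPLIT_DELIM_CAPTURE);

print_r($parts);
```

With this pattern the array alternates strictly between unquoted and quoted chunks, which is what the while loop in str_replace_outside_quotes() relies on.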

After being processed by the complete function, it should output:

echo str_replace_outside_quotes('?', '%s', 'hello="is it me your are looking for\\"?" AND test=?');
// hello="is it me your are looking for\"?" AND test=%s

I hope this makes sense and that I have provided enough information to answer the question. If not, I am happy to provide whatever you need.

Debug code

My current (complete) code sample is also on codepad for forking:

function str_replace_outside_quotes($replace, $with, $string){
    $result = '';
    var_dump($string);
    $pattern = '/("(\\"|[^"])*"' . '|' . "'[^']*')/";
    var_dump($pattern);
    $outside = preg_split($pattern, $string, -1, PREG_SPLIT_DELIM_CAPTURE);
    var_dump($outside);
    while ($outside) {
        $result .= str_replace($replace, $with, array_shift($outside)) . array_shift($outside);
    }
    return $result;
}
echo str_replace_outside_quotes('?', '%s', 'hello="is it me your are looking for\\"?" AND test=?');

Sample input and expected output

In: hello="is it me your are looking for\\"?" AND test=? AND hello='is it me your are looking for\\'?' AND test=? hello="is it me your are looking for\\"?" AND test=?' AND hello='is it me your are looking for\\'?' AND test=?
Out: hello="is it me your are looking for\\"?" AND test=%s AND hello='is it me your are looking for\\'?' AND test=%s hello="is it me your are looking for\\"?" AND test=%s AND hello='is it me your are looking for\\'?' AND test=%s

In: my_var = ? AND var_test = "phoned?" AND story = 'he said \'where is it?!?\''
Out: my_var = %s AND var_test = "phoned?" AND story = 'he said \'where is it?!?\''

Best answer

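The following tested script first checks that the given string is valid, meaning it consists solely of single-quoted, double-quoted, and unquoted chunks. The $re_valid regex performs this validation. If the string is valid, the script then parses it one chunk at a time using preg_replace_callback() and the $re_parse regex. The callback function processes unquoted chunks with preg_replace() and returns all quoted chunks unaltered. The only tricky part of the logic is passing the $replace and $with argument values from the main function to the callback function. (Note that procedural PHP makes passing these variables from the main function to the callback a bit awkward.) Here is the script: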

<?php // test.php Rev:20121113_1500
function str_replace_outside_quotes($replace, $with, $string){
    $re_valid = '/
        # Validate string having embedded quoted substrings.
        ^                           # Anchor to start of string.
        (?:                         # Zero or more string chunks.
          "[^"\\\\]*(?:\\\\.[^"\\\\]*)*"  # Either a double quoted chunk,
        | \'[^\'\\\\]*(?:\\\\.[^\'\\\\]*)*\'  # or a single quoted chunk,
        | [^\'"\\\\]+               # or an unquoted chunk (no escapes).
        )*                          # Zero or more string chunks.
        \z                          # Anchor to end of string.
        /sx';
    if (!preg_match($re_valid, $string)) // Exit if string is invalid.
        exit("Error! String not valid.");
    $re_parse = '/
        # Match one chunk of a valid string having embedded quoted substrings.
          (                         # Either $1: Quoted chunk.
            "[^"\\\\]*(?:\\\\.[^"\\\\]*)*"  # Either a double quoted chunk,
          | \'[^\'\\\\]*(?:\\\\.[^\'\\\\]*)*\'  # or a single quoted chunk.
          )                         # End $1: Quoted chunk.
        | ([^\'"\\\\]+)             # or $2: an unquoted chunk (no escapes).
        /sx';
    _cb(null, $replace, $with); // Pass args to callback func.
    return preg_replace_callback($re_parse, '_cb', $string);
}
function _cb($matches, $replace = null, $with = null) {
    // Only set local static vars on first call.
    static $_replace, $_with;
    if (!isset($matches)) { 
        $_replace = $replace;
        $_with = $with;
        return; // First call is done.
    }
    // Return quoted string chunks (in group $1) unaltered.
    if ($matches[1]) return $matches[1];
    // Process only unquoted chunks (in group $2).
    return preg_replace('/'. preg_quote($_replace, '/') .'/',
        $_with, $matches[2]);
}
$data = file_get_contents('testdata.txt');
$output = str_replace_outside_quotes('?', '%s', $data);
file_put_contents('testdata_out.txt', $output);
?>
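As a possible alternative to the static-variable trick, on PHP 5.3 or later an anonymous function can carry $replace and $with into the callback through a use clause. The following is a minimal sketch along those lines, not part of the original answer. The function name str_replace_outside_quotes_closure is hypothetical, the $re_valid check is left out for brevity, and str_replace() stands in for the preg_replace()/preg_quote() pair because the search term is a literal string.

```php
<?php
// Hypothetical variant; the name is an assumption made for illustration.
function str_replace_outside_quotes_closure($replace, $with, $string) {
    $re_parse = '/
          (                                       # $1: a quoted chunk.
            "[^"\\\\]*(?:\\\\.[^"\\\\]*)*"        # Double quoted, escapes allowed,
          | \'[^\'\\\\]*(?:\\\\.[^\'\\\\]*)*\'    # or single quoted.
          )
        | ([^\'"\\\\]+)                           # $2: an unquoted chunk.
        /sx';
    return preg_replace_callback(
        $re_parse,
        // The closure imports $replace and $with by value via "use".
        function ($matches) use ($replace, $with) {
            if ($matches[1] !== '') {
                return $matches[1];               // Quoted chunk: unchanged.
            }
            return str_replace($replace, $with, $matches[2]);
        },
        $string
    );
}

echo str_replace_outside_quotes_closure(
    '?', '%s', 'hello="is it me your are looking for\\"?" AND test=?'
);
```

Because this sketch skips validation, any character that matches neither group (for example a stray quote or a backslash outside quotes) is passed through unchanged rather than reported as an error.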

Regarding "php - How can I adjust my regex to allow escaped quotes?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/13360870/
