php - Find all strings in a PHP code base

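I have a PHP code base of several million lines with no real separation between display and logic, and I'm trying to extract every string represented in the code so it can be localized. Separating display and logic is a long-term goal, but right now I just want to be able to localize.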

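Strings appear in the code in every format PHP supports, so I need a theoretical (or practical) way to parse our entire source and at least find where every string is. Ideally, of course, I would replace each string with a function call, so that, for example,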

"this is a string"

would be replaced with

_("this is a string")

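Of course, I need to support both single- and double-quoted strings. I'm not too worried about the other formats; they appear so rarely that I can change them by hand.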

Also, I obviously don't want to localize array indices, so a string like

$arr["value"]

should not become

$arr[_("value")]

Can anyone help me get started?

Best answer

You can use token_get_all() to get all the tokens from a PHP file, for example:

<?php

$fileStr = file_get_contents('file.php');

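// Single-character tokens such as ';' or '[' are returned as plain strings, not arrays.
foreach (token_get_all($fileStr) as $token) {
    if (is_array($token) && $token[0] === T_CONSTANT_ENCAPSED_STRING) {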
        echo "found string {$token[1]}\r\n";
        //$token[2] is line number of the string
    }
}
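The loop above only reports strings tokenized as `T_CONSTANT_ENCAPSED_STRING`, which covers single- and double-quoted literals that contain no variables. A double-quoted string with interpolation, such as `"Hello $name"`, is split into a `"` token, `T_ENCAPSED_AND_WHITESPACE` pieces and variable tokens, and a closing `"`, so it is silently skipped. The following is a minimal sketch, assuming you want those reported as well. `findStrings()` is a hypothetical helper name; it takes the source as a string rather than a file name so it is easy to test, and it does not attempt heredoc/nowdoc syntax (which the question says is rare) or nested quotes inside `{$...}` expressions.

```php
<?php
// Hypothetical helper: returns a list of [line, source text] pairs for every
// single- or double-quoted string, including double-quoted strings that
// interpolate variables. Heredoc/nowdoc strings are NOT handled.
function findStrings(string $source): array
{
    $found = [];
    $tokens = token_get_all($source);
    $count = count($tokens);

    for ($i = 0; $i < $count; $i++) {
        $token = $tokens[$i];

        if (is_array($token) && $token[0] === T_CONSTANT_ENCAPSED_STRING) {
            // Plain literal: 'abc' or "abc" with no variables inside.
            $found[] = [$token[2], $token[1]];
        } elseif ($token === '"') {
            // Interpolated string: glue tokens together up to the closing quote.
            $text = '"';
            $line = null;
            for ($i++; $i < $count && $tokens[$i] !== '"'; $i++) {
                $part = $tokens[$i];
                if (is_array($part)) {
                    if ($line === null) {
                        $line = $part[2]; // line of the first token inside the quotes
                    }
                    $text .= $part[1];
                } else {
                    $text .= $part;
                }
            }
            $found[] = [$line, $text . '"'];
        }
    }

    return $found;
}
```

For example, `print_r(findStrings(file_get_contents('file.php')));` lists every string in `file.php` together with its line number.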

You can do a very dirty check that the string isn't being used as an array index with something like this:

$fileLines = file('file.php');

//inside the loop and if
$line = $fileLines[$token[2] - 1];
if (false === strpos($line, "[{$token[1]}]")) {
    //not an array index
}

But it will be really hard to do this correctly, because someone may have written something you didn't anticipate, for example:

$str = 'string that is not immediately an array index';
doSomething($array[$str]);

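EDIT: As Ant P points out, for the second part of this answer you're better off looking for [ and ] in the surrounding tokens instead of using my strpos hack, like this: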

$tokens = token_get_all(file_get_contents('file.php'));
$num = count($tokens);
for ($i = 0; $i < $num; $i++) {
    $token = $tokens[$i];

    if (!is_array($token) || $token[0] !== T_CONSTANT_ENCAPSED_STRING) {
        //not a string, ignore
        continue;
    }

    // '[' and ']' are single-character tokens, i.e. plain strings;
    // ?? null avoids an undefined-index warning at either end of the array
    $prev = $tokens[$i - 1] ?? null;
    $next = $tokens[$i + 1] ?? null;
    if ($prev === '[' && $next === ']') {
        //immediately used as an array index, ignore
        continue;
    }

    echo "found string {$token[1]}\r\n";
    //$token[2] is line number of the string
}
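Going back to the question's ideal goal of turning `"this is a string"` into `_("this is a string")`, the same token stream can be written back out, because concatenating the text of every token reproduces the original source exactly. The sketch below is an assumption-laden starting point, not a finished tool. `wrapStrings()` is a hypothetical name; it skips whitespace and comments when looking for the surrounding `[` and `]` (so `$arr[ 'x' ]` is also left alone), and it additionally leaves array-literal keys (`'x' => ...`) untouched. It still cannot detect the indirect case `$array[$str]`, and it will wrap strings where a function call is not allowed (for example `const` values or property defaults, which then fail to parse) or not wanted (such as `include` paths), so review the diff before committing.

```php
<?php
// Hypothetical helper: rewrites PHP source so that every plain string literal
// becomes a call to $fn (default "_", PHP's alias for gettext()), except strings
// used directly as an array index ($arr['x']) or as an array key ('x' => ...).
// Interpolated and heredoc strings are copied unchanged.
function wrapStrings(string $source, string $fn = '_'): string
{
    $tokens = token_get_all($source);
    $count = count($tokens);

    // Nearest token in direction $step (-1 or +1) that is not whitespace or a
    // comment. Returns the token id (int) for array tokens, the character
    // (string) for single-character tokens, or null at either end.
    $neighbour = function (int $i, int $step) use ($tokens, $count) {
        $skip = [T_WHITESPACE, T_COMMENT, T_DOC_COMMENT];
        for ($j = $i + $step; $j >= 0 && $j < $count; $j += $step) {
            $t = $tokens[$j];
            if (is_array($t) && in_array($t[0], $skip, true)) {
                continue;
            }
            return is_array($t) ? $t[0] : $t;
        }
        return null;
    };

    $out = '';
    foreach ($tokens as $i => $token) {
        if (!is_array($token)) {
            $out .= $token;        // single-character token such as ';' or '['
            continue;
        }
        if ($token[0] !== T_CONSTANT_ENCAPSED_STRING) {
            $out .= $token[1];     // every other token is copied unchanged
            continue;
        }
        $prev = $neighbour($i, -1);
        $next = $neighbour($i, 1);
        $isIndex = $prev === '[' && $next === ']';
        $isKey = $next === T_DOUBLE_ARROW;
        $out .= ($isIndex || $isKey) ? $token[1] : "{$fn}({$token[1]})";
    }

    return $out;
}
```

For example, `file_put_contents('file.php', wrapStrings(file_get_contents('file.php')));` rewrites a file in place; running `php -l` on the result afterwards is a cheap way to catch the parse errors mentioned above.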

Regarding php - Find all strings in a PHP code base, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/571734/
