javascript - Finding regular expression literals in a string of JavaScript code

Tags: javascript regex parsing

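I am doing some rough parsing of JavaScript code, using JavaScript. I'll spare the details of why I need to do this, but suffice it to say that I don't want to integrate a large chunk of library code, since that is unnecessary for my purposes and it is important that I keep this very lightweight and relatively simple. So please don't suggest that I use JsLint or anything like that. If the answer needs more code than you can paste into an answer, it is probably more than I am hoping for.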

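My code can currently do a good job of detecting quoted sections and comments, and then matching braces, brackets and parentheses (making sure, of course, not to be confused by quotes and comments, or by escapes within quotes). That is all I need it to do, and it does it well... with one exception: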

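It can be confused by regular expression literals. So I am hoping for some help with detecting regular expression literals in a string of JavaScript, so that I can handle them appropriately.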

Something like this:

function getRegExpLiterals (stringOfJavascriptCode) {
  var output = [];
  // todo!
  return output;
}

var jsString = "var regexp1 = /abcd/g, regexp2 = /efg/;";
console.log (getRegExpLiterals (jsString));

// should print:
// [{startIndex: 14, length: 7}, {startIndex: 33, length: 5}]
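
To make the failure mode concrete, here is a minimal sketch of the kind of scanner described above. It is not the asker's actual code: the name bracketsBalanced and every detail of it are assumptions made for illustration. It skips quoted strings and comments and checks that brackets pair up, but it has no notion of regular expression literals.

```javascript
// Hypothetical stand-in for the scanner described in the question.
// It skips string literals and comments, then checks bracket balance.
// It deliberately knows nothing about regex literals.
function bracketsBalanced(code) {
  var closers = { ")": "(", "]": "[", "}": "{" };
  var stack = [];
  var i = 0;
  while (i < code.length) {
    var ch = code[i];
    var next = code[i + 1];
    if (ch === '"' || ch === "'") {
      // Skip a quoted string, honouring backslash escapes.
      i++;
      while (i < code.length && code[i] !== ch) {
        if (code[i] === "\\") i++;
        i++;
      }
      i++;
    } else if (ch === "/" && next === "/") {
      // Skip a line comment.
      while (i < code.length && code[i] !== "\n") i++;
    } else if (ch === "/" && next === "*") {
      // Skip a block comment.
      var end = code.indexOf("*/", i + 2);
      i = end === -1 ? code.length : end + 2;
    } else {
      if (ch === "(" || ch === "[" || ch === "{") {
        stack.push(ch);
      } else if (closers.hasOwnProperty(ch)) {
        if (stack.pop() !== closers[ch]) return false;
      }
      i++;
    }
  }
  return stack.length === 0;
}

console.log(bracketsBalanced('f("(", [1, 2]); // )')); // true
console.log(bracketsBalanced("var re = /[(]/;"));      // false, although the code is valid
```

The second call reports an imbalance because the "(" inside the regex literal is counted as a real parenthesis, which is exactly the confusion the question asks how to avoid.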

Best answer

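es5-lexer is a JS lexer that uses a very accurate heuristic to distinguish regular expressions from division expressions in JS code. It also provides a token-level transformation that you can use to make sure the resulting program will be interpreted by a full JS parser in the same way as by the lexer.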

The bit that determines whether a / starts a regular expression is in guess_is_regexp.js, and the tests start at scanner_test.js line 401.

var REGEXP_PRECEDER_TOKEN_RE = new RegExp(
  "^(?:"  // Match the whole tokens below
    + "break"
    + "|case"
    + "|continue"
    + "|delete"
    + "|do"
    + "|else"
    + "|finally"
    + "|in"
    + "|instanceof"
    + "|return"
    + "|throw"
    + "|try"
    + "|typeof"
    + "|void"
    // Binary operators which cannot be followed by a division operator.
    + "|[+]"  // Match + but not ++.  += is handled below.
    + "|-"    // Match - but not --.  -= is handled below.
    + "|[.]"    // Match . but not a number with a trailing decimal.
    + "|[/]"  // Match /, but not a regexp.  /= is handled below.
    + "|,"    // Second binary operand cannot start a division.
    + "|[*]"  // Ditto binary operand.
  + ")$"
  // Or match a token that ends with one of the characters below to match
  // a variety of punctuation tokens.
  // Some of the single char tokens could go above, but putting them below
  // allows closure-compiler's regex optimizer to do a better job.
  // The right column explains why the terminal character to the left can only
  // precede a regexp.
  + "|["
    + "!"  // !           prefix operator operand cannot start with a division
    + "%"  // %           second binary operand cannot start with a division
    + "&"  // &, &&       ditto binary operand
    + "("  // (           expression cannot start with a division
    + ":"  // :           property value, labelled statement, and operand of ?:
           //             cannot start with a division
    + ";"  // ;           statement & for condition cannot start with division
    + "<"  // <, <<       ditto binary operand
    // !=, !==, %=, &&=, &=, *=, +=, -=, /=, <<=, <=, =, ==, ===, >=, >>=, >>>=,
    // ^=, |=, ||=
    // All are binary operands (assignment ops or comparisons) whose right
    // operand cannot start with a division operator
    + "="
    + ">"  // >, >>, >>>  ditto binary operand
    + "?"  // ?           expression in ?: cannot start with a division operator
    + "["  // [           first array value & key expression cannot start with
           //             a division
    + "^"  // ^           ditto binary operand
    + "{"  // {           statement in block and object property key cannot start
           //             with a division
    + "|"  // |, ||       ditto binary operand
    + "}"  // }           PROBLEMATIC: could be an object literal divided or
           //             a block.  More likely to be start of a statement after
           //             a block which cannot start with a /.
    + "~"  // ~           ditto binary operand
  + "]$"
  // The exclusion of ++ and -- from the above is also problematic.
  // Both are prefix and postfix operators.
  // Given that there is rarely a good reason to increment a regular expression
  // and good reason to have a post-increment operator as the left operand of
  // a division (x++ / y) this pattern treats ++ and -- as division preceders.
  );
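
As a rough illustration of how a table like this can drive the function from the question, here is a minimal sketch. It is not es5-lexer's scanner: PRECEDER_RE is a condensed copy of the pattern above, and the tokenizer, skipString and skipRegExp are hypothetical helpers written for this example. It assumes that "/" starts a regex when the previous token is a preceder (or when there is no previous token), and that it is a division otherwise.

```javascript
// Condensed copy of the preceder pattern from the answer (assumption: same
// alternatives, with the comments stripped).
var PRECEDER_RE = new RegExp(
  "^(?:break|case|continue|delete|do|else|finally|in|instanceof|return" +
  "|throw|try|typeof|void|[+]|-|[.]|[/]|,|[*])$" +
  "|[!%&(:;<=>?[^{|}~]$"
);

// Hypothetical helper: returns the index just past a quoted string.
function skipString(code, i) {
  var quote = code[i++];
  while (i < code.length && code[i] !== quote) {
    if (code[i] === "\\") i++; // skip the escaped character
    i++;
  }
  return Math.min(i + 1, code.length);
}

// Hypothetical helper: returns the index just past a regex literal and its
// flags. A "/" inside a character class [...] does not end the literal.
function skipRegExp(code, i) {
  var inClass = false;
  i++; // opening slash
  while (i < code.length) {
    var ch = code[i];
    if (ch === "\\") { i += 2; continue; }
    if (ch === "[") inClass = true;
    else if (ch === "]") inClass = false;
    else if (ch === "/" && !inClass) { i++; break; }
    i++;
  }
  while (i < code.length && /[A-Za-z]/.test(code[i])) i++; // flags
  return Math.min(i, code.length);
}

function getRegExpLiterals(code) {
  var output = [];
  var prev = ""; // last significant token; "" means start of input
  var i = 0;
  while (i < code.length) {
    var ch = code[i];
    var next = code[i + 1];
    if (/\s/.test(ch)) {
      i++;
    } else if (ch === '"' || ch === "'") {
      i = skipString(code, i);
      prev = "str"; // placeholder token: a string never precedes a regex
    } else if (ch === "/" && next === "/") {
      while (i < code.length && code[i] !== "\n") i++;
    } else if (ch === "/" && next === "*") {
      var end = code.indexOf("*/", i + 2);
      i = end === -1 ? code.length : end + 2;
    } else if (ch === "/" && (prev === "" || PRECEDER_RE.test(prev))) {
      var stop = skipRegExp(code, i);
      output.push({ startIndex: i, length: stop - i });
      i = stop;
      prev = "re"; // placeholder token: a regex is followed by division
    } else if (/[A-Za-z0-9_$]/.test(ch)) {
      var start = i;
      while (i < code.length && /[A-Za-z0-9_$]/.test(code[i])) i++;
      prev = code.slice(start, i);
    } else {
      // Punctuation: keep ++ and -- together so they are not mistaken
      // for the binary + and - preceders.
      prev = (ch === "+" || ch === "-") && next === ch ? ch + ch : ch;
      i += prev.length;
    }
  }
  return output;
}

var jsString = "var regexp1 = /abcd/g, regexp2 = /efg/;";
console.log(JSON.stringify(getRegExpLiterals(jsString)));
// [{"startIndex":14,"length":7},{"startIndex":33,"length":5}]
console.log(getRegExpLiterals("x++ / y").length); // 0
```

This sketch inherits the heuristic's blind spots and adds some of its own: ")" and "]" are always treated as preceding a division, so `if (x) /re/.test(y)` is misread, and template literals and numbers with a trailing decimal point are not handled.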

Regarding "javascript - Finding regular expression literals in a string of JavaScript code", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/7936593/
