regex - Building a regex writer

I was reading the Java project ideas described here:

The user gives examples of what he wants and does not want to match. The program tries to deduce a regex that fits the examples. Then it generates examples that would fit and not fit. The user corrects the examples it got wrong, and it composes a new regex. Iteratively, you get a regex that is close enough to what you need.

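This sounds like a really interesting idea to me. Does anyone know how to go about it? My first thought was something like a genetic algorithm, but I'd like to hear your input.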
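Whatever strategy produces the candidate regex, the loop in the quoted idea needs a way to find which examples a candidate gets wrong, so the user (or the program) can correct them. The following is a minimal sketch, assuming a candidate must match a positive example in full and must not fully match any negative example. The function name `check_candidate` is hypothetical.

```python
import re

def check_candidate(pattern, positives, negatives):
    """Return (missed, false_hits) for a candidate regex (hypothetical helper).

    missed:     positive examples the pattern does NOT match in full
    false_hits: negative examples the pattern DOES match in full
    """
    compiled = re.compile(pattern)
    missed = [s for s in positives if compiled.fullmatch(s) is None]
    false_hits = [s for s in negatives if compiled.fullmatch(s) is not None]
    return missed, false_hits

if __name__ == "__main__":
    # "Cats" was meant to be rejected, but the optional s lets it through.
    print(check_candidate(r"Ca(t|n)s?", ["Cat", "Cans"], ["Car", "Cats"]))
```

An empty pair of lists means the candidate agrees with every example seen so far; anything else is exactly the feedback the next iteration would have to fix.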

Best answer

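Actually, this is looking more and more like a compiler application. In fact, if I remember correctly, Aho's Dragon Book on compilers uses a regular expression example to build a DFA. That is where to start. This could make a really cool compiler project.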
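The Dragon Book material goes from a regex to a DFA; this problem runs the other way, from examples to an automaton. A common starting point for that direction (a grammatical-inference technique, swapped in here, not something the answer names) is a prefix tree acceptor: a DFA that accepts exactly the positive examples, which later passes can then merge or generalize. A minimal sketch, with hypothetical names `build_prefix_tree` and `accepts`:

```python
def build_prefix_tree(examples):
    """Build a DFA accepting exactly the given strings (prefix tree acceptor).

    Returns (transitions, accepting): transitions maps state -> {char: state},
    state 0 is the start state, and accepting is a set of final states.
    """
    transitions = {0: {}}
    accepting = set()
    for word in examples:
        state = 0
        for ch in word:
            if ch not in transitions[state]:
                new_state = len(transitions)  # next unused state number
                transitions[new_state] = {}
                transitions[state][ch] = new_state
            state = transitions[state][ch]
        accepting.add(state)
    return transitions, accepting

def accepts(dfa, word):
    """Run the DFA on word; a missing transition means rejection."""
    transitions, accepting = dfa
    state = 0
    for ch in word:
        state = transitions[state].get(ch)
        if state is None:
            return False
    return state in accepting

dfa = build_prefix_tree(["Cat", "Catches", "Cans"])
print(accepts(dfa, "Cat"), accepts(dfa, "Catch"))
```

Shared prefixes share states, which is the automaton counterpart of the "similar starting conditions" pass in the answer.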

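If that is too much, you can treat it as an optimization problem and refine the result over several passes, but each pass starts from a predefined algorithm: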

First pass: want to match Cat, Catches, Cans. Result: /Cat|Catches|Cans/
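The first pass is just a literal alternation of the examples. A minimal sketch (the name `first_pass` is hypothetical); the slashes in the answer are delimiters and not part of the pattern, and `re.escape` guards against examples that contain regex metacharacters:

```python
import re

def first_pass(examples):
    """Literal alternation of the examples: matches exactly those strings."""
    return "|".join(re.escape(s) for s in examples)

print(first_pass(["Cat", "Catches", "Cans"]))
```

Used with `re.fullmatch`, this pattern accepts exactly the listed words and nothing else, so it is a safe (if verbose) baseline for the later passes to compress.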

Second pass: look for common starting conditions. Result: /Ca(t|tches|ns)/
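Pulling out the common starting condition amounts to factoring the longest common prefix out of the alternation. A minimal sketch, assuming the prefix is shared by every example; `factor_prefix` is a hypothetical name, and `os.path.commonprefix` is used because it compares strings character by character:

```python
import os
import re

def factor_prefix(examples):
    """Factor the longest common prefix out of a literal alternation."""
    prefix = os.path.commonprefix(examples)
    rests = [s[len(prefix):] for s in examples]
    alternatives = "|".join(re.escape(r) for r in rests)
    return f"{re.escape(prefix)}({alternatives})"

print(factor_prefix(["Cat", "Catches", "Cans"]))
```

If one example is the prefix itself (for example `["Cat", "Cats"]`), the result contains an empty alternative, `Cat(|s)`, which Python's `re` accepts.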

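Second pass: look for common ending conditions. Result: /Ca(t|tche|n)s*/ (the trailing s* generalizes beyond the examples: this also matches, for example, Can and Cats)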
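The answer's ending-condition pass makes the trailing s repeatable, which also accepts strings that were never given as examples. An exact variant is to group the alternatives left after prefix factoring by their last character and pull each group's common suffix out. A minimal sketch under that assumption; `common_suffix`, `factor_suffixes` and `factor_examples` are hypothetical names:

```python
import os
import re

def common_suffix(strings):
    """Longest common suffix, computed as the common prefix of the reversed strings."""
    return os.path.commonprefix([s[::-1] for s in strings])[::-1]

def factor_suffixes(alternatives):
    """Group alternatives by last character and factor out each group's common suffix."""
    groups = {}
    for alt in alternatives:
        groups.setdefault(alt[-1:], []).append(alt)  # dicts keep insertion order
    parts = []
    for members in groups.values():
        if len(members) == 1:
            parts.append(re.escape(members[0]))
            continue
        suffix = common_suffix(members)
        heads = "|".join(re.escape(m[: len(m) - len(suffix)]) for m in members)
        parts.append(f"({heads}){re.escape(suffix)}")
    return "|".join(parts)

def factor_examples(examples):
    """Common prefix first, then per-group common suffixes."""
    prefix = os.path.commonprefix(examples)
    rests = [s[len(prefix):] for s in examples]
    return f"{re.escape(prefix)}({factor_suffixes(rests)})"

print(factor_examples(["Cat", "Catches", "Cans"]))
```

Unlike the `s*` form, this pattern still matches only Cat, Catches and Cans; choosing how far to generalize is exactly where the user's corrections come in.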

Third pass: look for further refinements, such as repetitions and negative conditions.
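One possible reading of this pass: turn runs of a repeated character into `+` quantifiers, then use the negative examples as a guard and keep the generalization only if it rejects all of them. A minimal sketch under those assumptions (the answer does not specify the rule); `generalize_repeats` and `refine` are hypothetical names:

```python
import re
from itertools import groupby

def generalize_repeats(example):
    """Replace each run of 2+ identical characters with 'c+' (e.g. 'baaad' -> 'ba+d')."""
    parts = []
    for ch, run in groupby(example):
        count = len(list(run))
        parts.append(re.escape(ch) + ("+" if count > 1 else ""))
    return "".join(parts)

def refine(positives, negatives):
    """Use the generalized pattern unless it fully matches a negative example."""
    literal = "|".join(re.escape(s) for s in positives)
    generalized = "|".join(generalize_repeats(s) for s in positives)
    compiled = re.compile(generalized)
    if any(compiled.fullmatch(n) for n in negatives):
        return literal  # generalization accepts a negative example: fall back
    return generalized

print(refine(["baaad", "hello"], []))
print(refine(["baaad"], ["bad"]))
```

The all-or-nothing fallback keeps the sketch short; a finer version would decide per example or per run which repetitions the negative examples allow.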

For regex - Building a regex writer, see the similar question on Stack Overflow: https://stackoverflow.com/questions/3651962/