c++ - 如何在 C 或 C++ 中编写简单的正则表达式模式匹配函数?

标签 c++ c regex algorithm

这是我今天笔试的一道题,函数签名是

int is_match(char* pattern,char* string)

模式仅限于ASCII字符和量化*?,所以比较简单。 is_match 如果匹配则返回 1,否则返回 0。

我该怎么做?

最佳答案

Brian Kernighan提供了一篇关于 A Regular Expression Matcher 的短文那Rob Pike作为他们正在编写的一本书的演示程序编写的。这篇文章读起来很不错,解释了一些关于代码和正则表达式的一般信息。

我玩过这段代码,做了一些更改以试验一些扩展,例如返回字符串中模式匹配的位置,以便可以从原始文本中复制匹配模式的子字符串。

来自文章:

I suggested to Rob that we needed to find the smallest regular expression package that would illustrate the basic ideas while still recognizing a useful and non-trivial class of patterns. Ideally, the code would fit on a single page.

Rob disappeared into his office, and at least as I remember it now, appeared again in no more than an hour or two with the 30 lines of C code that subsequently appeared in Chapter 9 of TPOP. That code implements a regular expression matcher that handles these constructs:

c    matches any literal character c
.    matches any single character
^    matches the beginning of the input string
$    matches the end of the input string
*    matches zero or more occurrences of the previous character

This is quite a useful class; in my own experience of using regular expressions on a day-to-day basis, it easily accounts for 95 percent of all instances. In many situations, solving the right problem is a big step on the road to a beautiful program. Rob deserves great credit for choosing so wisely, from among a wide set of options, a very small yet important, well-defined and extensible set of features.

Rob's implementation itself is a superb example of beautiful code: compact, elegant, efficient, and useful. It's one of the best examples of recursion that I have ever seen, and it shows the power of C pointers. Although at the time we were most interested in conveying the important role of a good notation in making a program easier to use and perhaps easier to write as well, the regular expression code has also been an excellent way to illustrate algorithms, data structures, testing, performance enhancement, and other important topics.

文章中的实际 C 源代码非常好。

/* match: search for regexp anywhere in text */
int match(char *regexp, char *text)
{
    if (regexp[0] == '^')
        return matchhere(regexp+1, text);
    do {    /* must look even if string is empty */
        if (matchhere(regexp, text))
            return 1;
    } while (*text++ != '\0');
    return 0;
}

/* matchhere: search for regexp at beginning of text */
int matchhere(char *regexp, char *text)
{
    if (regexp[0] == '\0')
        return 1;
    if (regexp[1] == '*')
        return matchstar(regexp[0], regexp+2, text);
    if (regexp[0] == '$' && regexp[1] == '\0')
        return *text == '\0';
    if (*text!='\0' && (regexp[0]=='.' || regexp[0]==*text))
        return matchhere(regexp+1, text+1);
    return 0;
}

/* matchstar: search for c*regexp at beginning of text */
int matchstar(int c, char *regexp, char *text)
{
    do {    /* a * matches zero or more instances */
        if (matchhere(regexp, text))
            return 1;
    } while (*text != '\0' && (*text++ == c || c == '.'));
    return 0;
}

关于c++ - 如何在 C 或 C++ 中编写简单的正则表达式模式匹配函数?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/4040835/

相关文章:

c - 通过 void 指针的指针转换是否定义明确?

c - 登录进程作为协进程

c - 启动 Windows 驱动程序时出错 : The handle is invalid

php - 解析字符串php并替换子字符串

c++ - 链表作业数组 - 运行时错误

c++ - 在消息前加上消息的大小

javascript - 找一个正则表达式来检查字母、汉字和斜线(/)

mysql - 在 MySQL 中按 Unicode 范围搜索

c++ - 动态分配不适用于全局整数指针

c++ - 模板类截断 double