c - Using regex groups in C

I need to get the groups matched by a regular expression in C in order to process the logs of a Java program.

I have tested this regex in an online regex tester, and it works there:

(Client:\s[a-zA-Z\s]+)|(Wallet:\s[a-zA-Z0-9]+)|(ID\s*:\s*[0-9]{3}.{0,1}[0-9]{3}.{0,1}[0-9]{3}-{0,1}[0-9]{2})

But in my C program, it does not work.

#include <regex.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
  const char *source =
      "[com.example.app.JavaClass.JavaMethod(JavaClass.java:1)] (Thread-1) - "
      "Client: FirstName MiddleName AnotherName LastName, Wallet: WL01, "
      "Agency: 9999, ID: 06611486123, Ticket: TKR211";
  const char *regexString =
      "(Client:\\s[a-zA-Z[:space:]]+)|(Wallet:\\s[a-zA-Z0-9]+)|(ID\\s*:\\s*[0-"
      "9]{3}.{0,1}[0-9]{3}.{0,1}[0-9]{3}-{0,1}[0-9]{2})";

  regex_t regexCompiled;

  if (regcomp(&regexCompiled, regexString, REG_ICASE | REG_EXTENDED) != 0) {
    fprintf(stderr, "Could not compile regex\n");
    return 1;
  }

  /* One slot for the whole match plus one per parenthesized group. */
  size_t ngroups = regexCompiled.re_nsub + 1;
  regmatch_t *groups = malloc(ngroups * sizeof(regmatch_t));
  if (groups == NULL) {
    regfree(&regexCompiled);
    return 1;
  }

  if (regexec(&regexCompiled, source, ngroups, groups, 0) == 0) {
    for (size_t nmatched = 0; nmatched < ngroups; nmatched++) {
      /* Groups that did not take part in the match have rm_so == -1. */
      if (groups[nmatched].rm_so == -1) {
        break;
      }

      size_t len = (size_t)(groups[nmatched].rm_eo - groups[nmatched].rm_so);
      char *match = calloc(len + 1, sizeof(char)); /* +1 for the NUL byte */
      if (match == NULL) {
        break;
      }
      memcpy(match, &source[groups[nmatched].rm_so], len);
      printf("Match: [%2d-%2d]: \"%s\"\n", (int)groups[nmatched].rm_so,
             (int)groups[nmatched].rm_eo, match);
      free(match);
    }
  }

  free(groups);
  regfree(&regexCompiled);

  return 0;
}

Running:

$ gcc -Wall -Wextra -Wwrite-strings reg.c && ./a.out

produces this output:

Match: [70-119]: "Client: FirstName MiddleName AnotherName LastName"
Match: [70-119]: "Client: FirstName MiddleName AnotherName LastName"

But what I want is:

Match: [xx-xx]: "Client: FirstName MiddleName AnotherName LastName"
Match: [xx-xx]: "Wallet: WL01"
Match: [xx-xx]: "ID: 06611486123"

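Can someone tell me whether this can be done in C, or do I need a different approach?

Edit:

In my case, some of the fields ("Client", "Wallet", or "ID") may not appear in the log.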

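Best answer

Your regex has the form (a)|(b)|(c), where a, b, and c correspond to the Client regex, the Wallet regex, and the ID regex.

This is not what you want: you can see in your own RegExr test that you get not one match but three separate matches. In C, you only match once.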

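What you really want is to match your source string only once and have each group contain its own substring. In other words, we want to change your regex:

(a)|(b)|(c) -> (a),(b),(c): a single match that spans the whole string.

This does the trick: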

const char *regexString =
    "(Client:\\s[a-zA-Z[:space:]]+), (Wallet:\\s[a-zA-Z0-9]+).*(ID\\s*:\\s*[0-"
    "9]{3}.{0,1}[0-9]{3}.{0,1}[0-9]{3}-{0,1}[0-9]{2})";

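I changed the first | to ", " (a comma and a space), which separates the Client and Wallet substrings, and I changed the second | to .*, which covers everything between the Wallet and ID substrings.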

Running it now gives:

Match: [70-164]: "Client: FirstName MiddleName AnotherName LastName, Wallet: WL01, Agency: 9999, ID: 06611486123"
Match: [70-119]: "Client: FirstName MiddleName AnotherName LastName"
Match: [121-133]: "Wallet: WL01"
Match: [149-164]: "ID: 06611486123"

The first line gives you the entire match, while the following lines give you the contents of each individual group.


Regarding c - Using regex groups in C, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/68504315/
