java - How to find occurrences when patterns overlap

Tags: java regex

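Context: This is a log analysis task. I am writing a regex-based program to find occurrences of certain requests sent from a client to a server. I have client log files that contain these requests along with other log entries.

Problem: When a request message is sent to the server, the client should write 2 log statements, like: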

sending..
message_type

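When the above statements (the pattern) are found, we can say that one request has been sent. It is a combined pattern. So far, so good.

We expect the log file content to look like this: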

sending..
message_type
...//other text
sending..
message_type
...//other text
sending..
message_type

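From the log above we can tell that the client has sent 3 messages. But in the actual log file, the patterns somehow overlap as follows (not for all messages, but for some):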

sending..(1)
...//other text
sending..(2)
message_type(2)
...//other text
message_type(1)
sending..(3)
message_type(3)

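There are still 3 requests (I numbered the messages to make this easier to follow), but the patterns overlap: the second message is logged before the first one has been fully logged. The explanation above is only an illustration. Below is part of the original log:

Original log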

Send message to server:
Created post notification log dir
Created post notification log dir
Created post notification log dir
Send message to server:
Created post notification log dir
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><message schema_version="3644767c-2632-411a-9416-44f8a7dee08e"><request xaction_guid="new xaction guid" type="createsession"/></message>
INFO [a] - Server Response: <?xml version="1.0" encoding="UTF-8" standalone="yes"?><message schema_version="3644767c-2632-411a-9416-44f8a7dee08e"><response xaction_guid="new xaction guid" type="ok"></params></response></message>
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><message schema_version="3644767c-2632-411a-9416-44f8a7dee08e"><request xaction_guid="new xaction guid" type="createsession"/></message>
INFO [a] - Server Response: <?xml version="1.0" encoding="UTF-8" standalone="yes"?><message schema_version="3644767c-2632-411a-9416-44f8a7dee08e"><response xaction_guid="new xaction guid" type="ok"></response></message>

Here, following the explanation above, a single request is identified by its 2 parts:

Send message to server:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><message schema_version="3644767c-2632-411a-9416-44f8a7dee08e"><request xaction_guid="new xaction guid" type="createsession"/></message>

What I tried

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogMatcher {

    static final String create_session = "Send message to server(.){10,1000}(<\\?xml(.){10,500}type=\"createsession\"(.){1,100}</message>)";

    public static void main(String[] args) throws IOException {
        StringBuilder b = new StringBuilder();
        // The log shown above is stored in this file.
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(new File("D:/dummy.txt"))))) {
            String line;
            // readLine() strips the line terminators, so the whole log ends up as one long line.
            while ((line = reader.readLine()) != null) {
                b.append(line);
            }
        }

        findMatch(b, "Send message to server", "Send message to server");
        findMatch(b, create_session, "create_session");
    }

    // Counts the non-overlapping matches of the pattern and prints the count next to its label.
    private static int findMatch(StringBuilder b, String pattern, String type) {
        int count = 0;
        Pattern regex = Pattern.compile(pattern, Pattern.MULTILINE);
        Matcher regexMatcher = regex.matcher(b.toString());
        while (regexMatcher.find()) {
            count++;
        }
        System.out.printf("%25s%2d\n", type + ": ", count);
        return count;
    }
}

Current output

The intent is to find the number of createsession messages sent.

Send message to server:  2
        create_session:  1

Expected output

The log clearly shows that 2 messages were sent, so the output should be:

 Send message to server:  2
         create_session:  2

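You can see the pattern I tried in the code. Can anyone suggest a pattern that gives the desired result?

Note: One might ask why not simply count Send message to server on its own. The log contains many types of messages, such as login, closesession, etc., and all of them have Send message to server as their first part. The message types are also logged on their own for other purposes, so we cannot rely on either part alone (we can only rely on the combination).

Best answer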

Find occurrences of certain requests sent to a server from a client.

The "other way" that can be neglected here logs something like Store in DB : instead of Send message to server, followed by the xml message.

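I would propose a new strategy:

  1. Use only 1 regex to match all the alternatives, parsing the log only once (which improves performance on long files).
  2. Match the type="createsession" xml independently.
  3. Also match the Store in DB: xml, but ignore those matches (do not increment the counter).

We can use the following expression to match the messages sent to the server.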

^(?<toserver>Send message to server:)
  • Note that I am using a named group, which we can later reference as regexMatcher.group("toserver") to increment the counter.

And match the target xmls independently with:

^(?<message><\? *xml\b.{10,500} type *= *\"createsession\")
  • Referenced later as regexMatcher.group("message").
  • We will use a separate counter for it.
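
To illustrate the two named groups before the Store in DB case is added, here is a minimal sketch. The class name NamedGroupCount, the method count, and the shortened sample log are made up for this example; only the two alternatives come from the expressions above. In each match only one alternative participates, so the other group returns null.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedGroupCount {
    // Two alternatives, each in its own named group (hypothetical helper class).
    static final Pattern P = Pattern.compile(
        "^(?:(?<toserver>Send message to server:)"
        + "|(?<message><\\? *xml\\b.{10,500} type *= *\"createsession\"))",
        Pattern.MULTILINE);

    // Returns {number of "Send message to server:" lines, number of createsession xml lines}.
    static int[] count(String log) {
        int toServer = 0;
        int messages = 0;
        Matcher m = P.matcher(log);
        while (m.find()) {
            // group(name) is null when that alternative did not take part in this match.
            if (m.group("toserver") != null) {
                toServer++;
            } else if (m.group("message") != null) {
                messages++;
            }
        }
        return new int[] {toServer, messages};
    }

    public static void main(String[] args) {
        // Shortened stand-in for the request xml from the log.
        String xml = "<?xml version=\"1.0\"?><message><request xaction_guid=\"g\" type=\"createsession\"/></message>";
        String log = "Send message to server:\n"
                   + "Created post notification log dir\n"
                   + "Send message to server:\n"
                   + xml + "\n"
                   + "INFO [a] - Server Response: <?xml version=\"1.0\"?><message><response type=\"ok\"/></message>\n"
                   + xml + "\n";
        int[] c = count(log);
        System.out.println("to server: " + c[0] + ", create_session: " + c[1]);
    }
}
```

Because both alternatives are anchored with ^, the <?xml that appears in the middle of the Server Response line is not counted, and the order in which the two kinds of lines interleave does not matter.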

So, how do we ignore the Store in DB: xml? We can match them without creating a capture.

^Store in DB ?:\r?\n(?:.*\n)*?<\? *xml\b.*
  • It matches the literal Store in DB :, followed by
  • \r?\n(?:.*\n)*? as few lines as possible, until
  • <\? *xml\b.* it matches the first <?xml
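
As a small check of what this branch consumes, the sketch below runs only the ignore expression against a made-up four-line input. The class name IgnoreStoreInDb and the method consumed are hypothetical names used for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IgnoreStoreInDb {
    // The "ignore" branch on its own: no capturing group, so nothing is counted.
    static final Pattern IGNORE = Pattern.compile(
        "^Store in DB ?:\\r?\\n(?:.*\\n)*?<\\? *xml\\b.*",
        Pattern.MULTILINE);

    // Returns the text consumed by the first "Store in DB" match, or null if there is none.
    static String consumed(String log) {
        Matcher m = IGNORE.matcher(log);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String log = "Store in DB :\n"
                   + "some other line\n"
                   + "<?xml first?>\n"
                   + "<?xml second?>\n";
        // The lazy (?:.*\n)*? stops at the first line that starts with <?xml,
        // so the second xml line is left for the other alternatives.
        System.out.println(consumed(log));
    }
}
```

The match spans the Store in DB : line, the line in between, and the first <?xml line. Since the matcher resumes after that point, the xml belonging to Store in DB can no longer be matched by the message alternative.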

Regex

^(?:Store in DB ?:\r?\n(?:.*\n)*?<\? *xml\b.*|(?<toserver>Send message to server:)|(?<message><\? *xml\b.{10,500} type *= *\"createsession\"))

regex101 demo


Code

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogMatcher {

    static final String create_session = "^(?:Store in DB ?:\\r?\\n(?:.*\\n)*?<\\? *xml\\b.*|(?<toserver>Send message to server:)|(?<message><\\? *xml\\b.{10,500} type *= *\\\"createsession\\\"))";

    public static void main(String[] args) {
        // For testing purposes: the log is inlined, with its newlines preserved.
        final String text = "Send message to server:\nCreated post notification log dir\nCreated post notification log dir\nCreated post notification log dir\nSend message to server:\nCreated post notification log dir\n<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?><message schema_version=\"3644767c-2632-411a-9416-44f8a7dee08e\"><request xaction_guid=\"new xaction guid\" type=\"createsession\"/></message>\nStore in DB :\n<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?><message schema_version=\"3644767c-2632-411a-9416-44f8a7dee08e\"><request xaction_guid=\"new xaction guid\" type=\"createsession\"/></message>\nINFO [a] - Server Response: <?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?><message schema_version=\"3644767c-2632-411a-9416-44f8a7dee08e\"><response xaction_guid=\"new xaction guid\" type=\"ok\"></params></response></message>\n<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?><message schema_version=\"3644767c-2632-411a-9416-44f8a7dee08e\"><request xaction_guid=\"new xaction guid\" type=\"createsession\"/></message>\nINFO [a] - Server Response: <?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?><message schema_version=\"3644767c-2632-411a-9416-44f8a7dee08e\"><response xaction_guid=\"new xaction guid\" type=\"ok\"></response></message>";
        System.out.println("INPUT:\n" + text + "\n\nCOUNT:");
        StringBuilder b = new StringBuilder();
        b.append(text);

        findMatch(b, create_session, "create_session");
    }

    // Scans the log once, prints both counters, and returns the number of createsession messages.
    private static int findMatch(StringBuilder b, String pattern, String type) {
        int count = 0;      // counter for "Send message to server:"
        int countType = 0;  // counter for type="createsession"
        Pattern regex = Pattern.compile(pattern, Pattern.MULTILINE);
        Matcher regexMatcher = regex.matcher(b.toString());
        while (regexMatcher.find()) {
            if (regexMatcher.group("toserver") != null) {
                count++;
            } else if (regexMatcher.group("message") != null) {
                countType++;
            } else {
                // Ignoring "Store in DB :\n<?xml...."
            }
        }
        System.out.printf("%25s%2d\n%25s%2d\n", "to server: ", count, type + ": ", countType);
        return countType;
    }
}

Output

INPUT:
Send message to server:
Created post notification log dir
Created post notification log dir
Created post notification log dir
Send message to server:
Created post notification log dir
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><message schema_version="3644767c-2632-411a-9416-44f8a7dee08e"><request xaction_guid="new xaction guid" type="createsession"/></message>
Store in DB :
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><message schema_version="3644767c-2632-411a-9416-44f8a7dee08e"><request xaction_guid="new xaction guid" type="createsession"/></message>
INFO [a] - Server Response: <?xml version="1.0" encoding="UTF-8" standalone="yes"?><message schema_version="3644767c-2632-411a-9416-44f8a7dee08e"><response xaction_guid="new xaction guid" type="ok"></params></response></message>
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><message schema_version="3644767c-2632-411a-9416-44f8a7dee08e"><request xaction_guid="new xaction guid" type="createsession"/></message>
INFO [a] - Server Response: <?xml version="1.0" encoding="UTF-8" standalone="yes"?><message schema_version="3644767c-2632-411a-9416-44f8a7dee08e"><response xaction_guid="new xaction guid" type="ok"></response></message>

COUNT:
              to server:  2
         create_session:  2

ideone demo
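
One practical point when applying this to a real file: the expression anchors every alternative with ^ under Pattern.MULTILINE, so the text passed to the matcher needs its newlines. The reading loop in the question appends the result of readLine() without a separator, which removes them. The following is a sketch of one way to load the file with line breaks intact. The class name LogFileCounter, the method names, the UTF-8 encoding, and the normalization of \r\n to \n are assumptions made for this example, not part of the original answer.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogFileCounter {
    // Same expression as in the answer, compiled once.
    static final Pattern P = Pattern.compile(
        "^(?:Store in DB ?:\\r?\\n(?:.*\\n)*?<\\? *xml\\b.*"
        + "|(?<toserver>Send message to server:)"
        + "|(?<message><\\? *xml\\b.{10,500} type *= *\"createsession\"))",
        Pattern.MULTILINE);

    // Counts createsession request lines in text that still contains its newlines.
    static int countInText(String log) {
        // Assumption: normalize Windows line endings, because "." does not match \r
        // in Java and (?:.*\n)*? would otherwise fail to step over \r\n-terminated lines.
        String normalized = log.replace("\r\n", "\n");
        int messages = 0;
        Matcher m = P.matcher(normalized);
        while (m.find()) {
            if (m.group("message") != null) {
                messages++;
            }
        }
        return messages;
    }

    // Reads the whole file at once so that line breaks are preserved (UTF-8 is assumed).
    static int countInFile(Path file) throws IOException {
        String log = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        return countInText(log);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("usage: java LogFileCounter <path to log file>");
            return;
        }
        System.out.println("create_session: " + countInFile(Paths.get(args[0])));
    }
}
```

It could be run, for example, as java LogFileCounter D:/dummy.txt, using the file path from the question. Reading the whole file into memory is only reasonable for logs of moderate size.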

Regarding java - How to find occurrences when patterns overlap, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/33822886/
