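Java regex to parse sections of a long text

I have a huge text file that can contain multiple house names, and each house has a set of values specific to it. Here is a representative excerpt of my txt file: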

getHouseName: house1
random useless text
price: 1000
squaremtr: 75
sellVal: 1000
random useless text
random useless text
random useless text
rentPrice: 150
getHouseName: house2
price: 1004
squaremtr: 85
sellVal: 950
random useless text
rentPrice: 150
getHouseName: house3
price: 1099
squaremtr: 90
random useless text
random useless text
sellVal: 1100
random useless text
rentPrice: 199

For each house, I want to retrieve that house's values with a regex and store them in variables. This is my code so far:

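import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// new Scanner(File) throws a checked FileNotFoundException, so it must be declared
public void testHouse() throws FileNotFoundException {
    Scanner txt = new Scanner(new File("path//to//file"));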

    String houseName ="";
    String price = "";
    String squaremtr = "";
    String sellVal = "";
    String rentPrice = "";
    
    Pattern houseNamePatt = Pattern.compile("getHouseName: ((?!getHouseName: \\s).)*", Pattern.DOTALL);

    while(txt.hasNextLine()) {
        String str = txt.nextLine();
        Matcher m = houseNamePatt.matcher(str);
        if(m.find()) {
            houseName=str.substring(m.end());
            System.out.println("houses: " + m.group());
        }
    }
}

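But this way I only get a list of all the house names, not the lines between one name and the next, and I certainly can't assign each house's values to my variables. Where am I going wrong? Thanks

Best answer

You can get all the values by matching each name followed by a capture group. If there are lines of random text in between, you can use a negative lookahead (?!...) to match every line that does not start with the next expected name.

Then set each variable to the value of the corresponding group number.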

^getHouseName:\h+(.+)(?:\R(?!price:).*)*\Rprice: (\d+)(?:\R(?!squaremtr:).*)*\Rsquaremtr:\h+(\d+)(?:\R(?!sellVal:).*)*\RsellVal:\h+(\d+)(?:\R(?!rentPrice:).*)*\RrentPrice:\h+(\d+)

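In parts:

^ Start of the string (in Java, compile with Pattern.MULTILINE so that it also matches at the start of each line and every house is found)
getHouseName:\h+(.+) Match getHouseName and capture its value in group 1
(?:\R(?!price:).*)*\Rprice: (\d+) Match lines until the next line starts with price, then capture 1+ digits in group 2
(?:\R(?!squaremtr:).*)*\Rsquaremtr:\h+(\d+) Match lines until the next line starts with squaremtr, then capture 1+ digits in group 3
(?:\R(?!sellVal:).*)*\RsellVal:\h+(\d+) Match lines until the next line starts with sellVal, then capture 1+ digits in group 4
(?:\R(?!rentPrice:).*)*\RrentPrice:\h+(\d+) Match lines until the next line starts with rentPrice, then capture 1+ digits in group 5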
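The pattern above can be applied in Java roughly as follows. This is a minimal sketch, not the answerer's own code: the class HouseParser, the House holder, and the parse method are hypothetical names introduced for illustration. It assumes the whole file has already been read into one String (the question's line-by-line Scanner loop cannot match across lines) and that every house lists all four values in the order shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HouseParser {

    // Hypothetical value holder, introduced only for this example.
    static final class House {
        final String name;
        final int price, squaremtr, sellVal, rentPrice;

        House(String name, int price, int squaremtr, int sellVal, int rentPrice) {
            this.name = name;
            this.price = price;
            this.squaremtr = squaremtr;
            this.sellVal = sellVal;
            this.rentPrice = rentPrice;
        }

        @Override
        public String toString() {
            return name + ": price=" + price + ", squaremtr=" + squaremtr
                    + ", sellVal=" + sellVal + ", rentPrice=" + rentPrice;
        }
    }

    // The regex from the answer with Java string escaping.
    // MULTILINE makes ^ match at the start of every line, not only at the start of the input.
    private static final Pattern HOUSE = Pattern.compile(
            "^getHouseName:\\h+(.+)"
            + "(?:\\R(?!price:).*)*\\Rprice: (\\d+)"
            + "(?:\\R(?!squaremtr:).*)*\\Rsquaremtr:\\h+(\\d+)"
            + "(?:\\R(?!sellVal:).*)*\\RsellVal:\\h+(\\d+)"
            + "(?:\\R(?!rentPrice:).*)*\\RrentPrice:\\h+(\\d+)",
            Pattern.MULTILINE);

    // Runs the pattern over the whole text; each successful find() is one house.
    static List<House> parse(String text) {
        List<House> houses = new ArrayList<>();
        Matcher m = HOUSE.matcher(text);
        while (m.find()) {
            houses.add(new House(
                    m.group(1).trim(),            // group 1: house name
                    Integer.parseInt(m.group(2)), // group 2: price
                    Integer.parseInt(m.group(3)), // group 3: squaremtr
                    Integer.parseInt(m.group(4)), // group 4: sellVal
                    Integer.parseInt(m.group(5)))); // group 5: rentPrice
        }
        return houses;
    }

    public static void main(String[] args) {
        // Shortened sample in the same shape as the question's data.
        String text = String.join("\n",
                "getHouseName: house1",
                "random useless text",
                "price: 1000",
                "squaremtr: 75",
                "sellVal: 1000",
                "random useless text",
                "rentPrice: 150",
                "getHouseName: house2",
                "price: 1004",
                "squaremtr: 85",
                "sellVal: 950",
                "random useless text",
                "rentPrice: 150");
        for (House h : parse(text)) {
            System.out.println(h);
        }
    }
}
```

For a real file, the text could be loaded in one call, for example with Files.readString on Java 11 or later. One caveat of this pattern: the groups that skip lines do not stop at a getHouseName line, so if a house is missing one of the values, the match can run on into the next house's block and pair the wrong values with the name.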


This Q&A on parsing sections of a long text with a Java regex comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/64929773/
