I have some text; an example is shown below:
Lactose Hydrogen Breath Test
Time
Time Point (min)
H2 (ppm)
H2 Change
(ppm)
Hydrogen (ppm)
0937
0
0/0
Time point (min)
0
10
20
30
40
50
60
70
80
90
100
Notes: Measurements at 120 and 150 mins are insignificant changes and are most probably due to sporadic error.
Results are not consistent with Lactose malabsorption.
Lactose intolerance is not suggested.
This is now some other text that can be anything
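I only want to extract the first five lines after "Notes" and leave out everything else (in this case, everything up to "Lactose intolerance is not suggested.", but any kind of text can follow).
I am currently using the following Java to extract this: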
public Map<String,String> LactoseTestExtractor(String str){
Pattern match_pattern = Pattern.compile("Lactose Hydrogen Breath Test(.*?Interpretation[^\\r|^\\n]*)",Pattern.DOTALL);
Matcher matchermatch_pattern = match_pattern.matcher(str);
Pattern match_pattern2 = Pattern.compile("Lactose Hydrogen Breath Test.*?(Notes:.*?\\r|\\n[\\r|\\n]?.*?\\r|\\n[\\r|\\n]?)",Pattern.DOTALL);
Matcher matchermatch_pattern2 = match_pattern2.matcher(str);
if (matchermatch_pattern.find()) {
lact=matchermatch_pattern.group(1).toString().trim();
System.out.println("lact1"+lact);
}
else if (matchermatch_pattern2.find()){
lact=matchermatch_pattern2.group(1).toString().trim();
System.out.println("lact2"+lact);
}
But I get the whole match, whereas what I want is:
Measurements at 120 and 150 mins are insignificant changes and are most probably due to sporadic error.
Results are not consistent with Lactose malabsorption.
Lactose intolerance is not suggested.
How can I correct this? I am not sure whether it is a Java problem or a regex problem.
Best answer
First, Java 8 supports \R to match a linebreak.
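As a small illustration of what \R buys you, the sketch below splits a string that mixes Windows, Unix, and old Mac line endings. The class and method names (LinebreakDemo, splitLines) are made up for this example and are not part of the question's code.

```java
import java.util.Arrays;

class LinebreakDemo {
    // Hypothetical helper: split text on any linebreak sequence.
    // \R treats "\r\n" as a single linebreak, so no empty strings appear between \r and \n.
    static String[] splitLines(String text) {
        return text.split("\\R");
    }

    public static void main(String[] args) {
        // Mixed line endings: \r\n, \n and \r
        System.out.println(Arrays.toString(splitLines("a\r\nb\nc\rd")));
        // prints [a, b, c, d]
    }
}
```

This avoids hand-written alternations such as \\r|\\n, which are easy to get wrong because | binds more loosely than the surrounding sequence.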
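For the regex, you can use a lookbehind to match "Notes:" and then the following 5 lines, like so (written as a Java string literal, hence the doubled backslash):
(?<=Notes:)(.*\\R){5}
The result is in group(0).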
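Here is a minimal sketch of how that pattern might be wired into Java. The class name NotesExtractor, the method linesAfterNotes, and the lineCount parameter are assumptions for illustration; the sample text is shortened from the question. The pattern is compiled without Pattern.DOTALL, so that . stops at the end of each line, and the group is made non-capturing because only group(0) is needed.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NotesExtractor {
    // Hypothetical helper: return the lineCount lines that follow "Notes:",
    // or an empty Optional if "Notes:" is missing or fewer complete lines follow it.
    static Optional<String> linesAfterNotes(String text, int lineCount) {
        // No DOTALL flag: '.' must not match linebreaks, otherwise '.*' would run across lines.
        Pattern p = Pattern.compile("(?<=Notes:)(?:.*\\R){" + lineCount + "}");
        Matcher m = p.matcher(text);
        // group() is group(0), the whole match; the lookbehind keeps "Notes:" out of it.
        return m.find() ? Optional.of(m.group().trim()) : Optional.empty();
    }

    public static void main(String[] args) {
        String text = "Lactose Hydrogen Breath Test\n"
                + "Notes: Measurements at 120 and 150 mins are insignificant changes.\n"
                + "Results are not consistent with Lactose malabsorption.\n"
                + "Lactose intolerance is not suggested.\n"
                + "This is now some other text that can be anything\n";
        // Ask for 3 lines here because the shortened sample has only 3 note lines.
        System.out.println(linesAfterNotes(text, 3).orElse("no match"));
    }
}
```

Note that each counted line must end with a linebreak, so the match fails if the text ends before the requested number of lines is complete; lowering lineCount or appending a final newline are two possible ways to handle that case.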
Regarding "java - regex to get n lines after a match", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/39789686/