OK, there are plenty of regex questions out there but, as usual, none of them seem to cover what I'm trying to do.
I have a text file:
F00220034277909272011
H001500020003000009272011
D001500031034970000400500020000000025000000515000000000
D001500001261770008003200010000000025000000132500000000
H004200020001014209272011
D004200005355800007702200005142000013420000000000000000
D004200031137360000779000005000000012000000000000000000
H050100180030263709272011
D050100001876700006000300019500000025000000250000001500
D050100001247060000071500030000000025000000280000000000
D050100002075670000430400020000000025000000515000000000
D050100008342500007702600005700000010000000000000000700
D050100009460270000702100015205000025000000000000006205
D050100008135120000702400015000000010000000000000001000
D050100006938430000702200026700000010000000000000001000
D050100006423710008000200025700000000000000000000001000
D050100009488040008000600007175000000000000000000001000
D050100001299190000800100016300000000000000000000003950
D050100001244850000800400005407000000000000000000001607
D050100001216280000840200020000000000000001000000006200
D050100001216840000479000008175000000000000100000001000
D050100001265880000410200014350000000000000100000001000
D050100007402650002000300026700000000000000100000001000
D050100001305150002000200016175000000000001000000000000
D050100005435430000899700022350000000000001000000000000
D050100031113850000500200008200000000250000100000001000
Using a multiline regex (.NET flavor), I'd like to do a replacement so that I end up with:
H050100180030263709272011
D050100001876700006000300019500000025000000250000001500
D050100001247060000071500030000000025000000280000000000
D050100002075670000430400020000000025000000515000000000
D050100008342500007702600005700000010000000000000000700
D050100009460270000702100015205000025000000000000006205
D050100008135120000702400015000000010000000000000001000
D050100006938430000702200026700000010000000000000001000
D050100006423710008000200025700000000000000000000001000
D050100009488040008000600007175000000000000000000001000
D050100001299190000800100016300000000000000000000003950
D050100001244850000800400005407000000000000000000001607
D050100001216280000840200020000000000000001000000006200
D050100001216840000479000008175000000000000100000001000
D050100001265880000410200014350000000000000100000001000
D050100007402650002000300026700000000000000100000001000
D050100001305150002000200016175000000000001000000000000
D050100005435430000899700022350000000000001000000000000
D050100031113850000500200008200000000250000100000001000
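So, basically, I'm grabbing everything that starts with [HD]0501 and nothing else.
I know this seems better suited to a match than to a replace, but I'm working with a pre-built engine that accepts only a regex pattern string and a regex replacement string.
What can I supply for the pattern and the replacement string to get the result I want? Multiline regex is a hard-coded setting.
I initially thought something like these would work:
Search: (?<Match>^[HD]0501\d+$), but this matched nothing.
Search: (?!^[HD]0501\d+$), but this matched a bunch of empty strings, and I didn't know what to put in the replacement string.
Search: (?!(?<Omit>^[HD]0501\d+$)), which failed with "Group 'Omit' not found."
This seems like it should be simple but, as usual, regex makes me feel stupid. Any help would be appreciated.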
Best answer
Try matching the following pattern:
(?m)^(?![HD]0501).+(\r?\n)?
and replace it with an empty string.
A demo:
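using System;
using System.Text.RegularExpressions;

namespace Test
{
    class MainClass
    {
        public static void Main(string[] args)
        {
            // Verbatim string: the records stay at column 0 so that no
            // leading whitespace becomes part of the input.
            string input = @"F00220034277909272011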
H001500020003000009272011
D001500031034970000400500020000000025000000515000000000
D001500001261770008003200010000000025000000132500000000
H004200020001014209272011
D004200005355800007702200005142000013420000000000000000
D004200031137360000779000005000000012000000000000000000
H050100180030263709272011
D050100001876700006000300019500000025000000250000001500
D050100001247060000071500030000000025000000280000000000
D050100002075670000430400020000000025000000515000000000
D050100008342500007702600005700000010000000000000000700
D050100009460270000702100015205000025000000000000006205
D050100008135120000702400015000000010000000000000001000
D050100006938430000702200026700000010000000000000001000
D050100006423710008000200025700000000000000000000001000
D050100009488040008000600007175000000000000000000001000
D050100001299190000800100016300000000000000000000003950
D050100001244850000800400005407000000000000000000001607
D050100001216280000840200020000000000000001000000006200
D050100001216840000479000008175000000000000100000001000
D050100001265880000410200014350000000000000100000001000
D050100007402650002000300026700000000000000100000001000
D050100001305150002000200016175000000000001000000000000
D050100005435430000899700022350000000000001000000000000
D050100031113850000500200008200000000250000100000001000";
            // Remove every non-empty line that does not start with H0501 or D0501.
            string regex = @"(?m)^(?![HD]0501).+(\r?\n)?";
            Console.WriteLine(Regex.Replace(input, regex, ""));
        }
    }
}
Which prints:
H050100180030263709272011
D050100001876700006000300019500000025000000250000001500
D050100001247060000071500030000000025000000280000000000
D050100002075670000430400020000000025000000515000000000
D050100008342500007702600005700000010000000000000000700
D050100009460270000702100015205000025000000000000006205
D050100008135120000702400015000000010000000000000001000
D050100006938430000702200026700000010000000000000001000
D050100006423710008000200025700000000000000000000001000
D050100009488040008000600007175000000000000000000001000
D050100001299190000800100016300000000000000000000003950
D050100001244850000800400005407000000000000000000001607
D050100001216280000840200020000000000000001000000006200
D050100001216840000479000008175000000000000100000001000
D050100001265880000410200014350000000000000100000001000
D050100007402650002000300026700000000000000100000001000
D050100001305150002000200016175000000000001000000000000
D050100005435430000899700022350000000000001000000000000
D050100031113850000500200008200000000250000100000001000
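The same replacement can be tried on a shorter input. The sketch below is only an illustration: the sample records, the `LineFilter` class and the `KeepPrefixed` helper are made up for this example and are not part of the original answer. It also assumes the file might use Windows line endings (\r\n), which the question does not state.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LineFilter
{
    // Pattern from the answer: matches every non-empty line that does NOT
    // start with H0501 or D0501, plus the line break that follows it.
    private const string Pattern = @"(?m)^(?![HD]0501).+(\r?\n)?";

    // Hypothetical helper: removes the matched lines by replacing them with "".
    public static string KeepPrefixed(string input)
    {
        return Regex.Replace(input, Pattern, "");
    }

    public static void Main()
    {
        // Shortened, made-up records with Windows line endings (an assumption).
        string input = "F002\r\nH0501AAA\r\nD0015BBB\r\nD0501CCC";
        Console.WriteLine(KeepPrefixed(input));
    }
}
```

In .NET, `.` matches every character except \n, so on \r\n input the `.+` already consumes the \r and the optional group only has to match the \n. Empty lines are left in place, because `.+` needs at least one character.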
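Quick explanation:
(?m) - enables multiline mode, so that ^ matches at the start of each line;
(?![HD]0501) - a negative lookahead: the line must not start with "H0501" or "D0501";
.+ - matches one or more characters other than the newline character (\n);
(\r?\n)? - matches an optional line break.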
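One possible reason the question's first attempt, ^[HD]0501\d+$, matched nothing is line endings. This is an assumption, since the question does not say how the file is terminated. In .NET multiline mode, $ matches just before \n (or at the end of the string), so with \r\n endings the \r sits between the last digit and $. The sketch below uses made-up records and a hypothetical `CountMatches` helper to show the difference an optional \r makes.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchorDemo
{
    // Hypothetical helper: counts the matches of a pattern in multiline mode.
    public static int CountMatches(string input, string pattern)
    {
        return Regex.Matches(input, pattern, RegexOptions.Multiline).Count;
    }

    public static void Main()
    {
        // Made-up records terminated with \r\n (an assumption about the file).
        string crlf = "H0501123\r\nD0015456\r\nD0501789\r\n";

        // Without \r? the trailing \r blocks $ from matching.
        Console.WriteLine(CountMatches(crlf, @"^[HD]0501\d+$"));

        // Allowing an optional \r before $ lets both 0501 lines match.
        Console.WriteLine(CountMatches(crlf, @"^[HD]0501\d+\r?$"));
    }
}
```

If the file turns out to use plain \n endings, the two patterns behave the same and the cause lies elsewhere, for example in how the engine applies the replacement string.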
Regarding "regex - multiline regex replace", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/7588706/