I have a text file that contains a repeating structure of header and detail records, for example:
StopService::
697::12::test::20::a@yahoo.com::20 Main Rd::Alcatraz::CA::1200::Please send me Information to
A@gmail.com::0::::
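I want to remove the line break between the header and the detail record so that they can be processed as a single record. Because a detail record can itself contain line breaks, I only need to remove the line breaks that directly follow the :: sign.
I am not an expert with regular expressions, so I searched and tried this approach, but it does not work: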
string text = File.ReadAllText(path);
Regex.Replace(text, @"(?<=(:))(?!\1):\n", String.Empty);
File.WriteAllText(path, text);
I also tried this:
Regex.Replace(text, @"(?<=::)\n", String.Empty);
Any idea how to use a regex lookbehind in this case? My output should look like this:
StopService::697::12::test::20::a@yahoo.com::20 Main Rd::Alcatraz::CA::1200::Please send me Information to
A@gmail.com::0::::
Best answer
Non-regex way
Read the file line by line. Check whether the line is equal to StopService:: and, if it is, do not add a line break (Environment.NewLine) after it.
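The line-by-line idea could be sketched roughly as follows. This is only a minimal illustration: HeaderJoiner and JoinHeaders are hypothetical names, and it assumes a header is a line that is exactly equal to StopService::.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class HeaderJoiner
{
    // Hypothetical helper: glues each header line to the line that follows it.
    // Assumption: a header is a line that is exactly equal to `header`.
    public static string JoinHeaders(IEnumerable<string> lines, string header = "StopService::")
    {
        var sb = new StringBuilder();
        foreach (var line in lines)
        {
            sb.Append(line);
            // Every line except a header gets its line break back.
            if (line != header)
            {
                sb.Append(Environment.NewLine);
            }
        }
        return sb.ToString();
    }
}
```

It could then be called as File.WriteAllText(path, HeaderJoiner.JoinHeaders(File.ReadAllLines(path))). Note that this sketch always ends the output with a line break and normalizes all line endings to Environment.NewLine.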
Regex way
You can match the line break after the first :: with a (?<=^[^:]*::) lookbehind:
var str = "StopService::\r\n697::12::test::20::a@yahoo.com::20 Main Rd::Alcatraz::CA::1200::Please send me Information to\r\nA@gmail.com::0::::";
var rgx = new Regex(@"(?<=^[^:]*::)[\r\n]+");
Console.WriteLine(rgx.Replace(str, string.Empty));
Output:
StopService::697::12::test::20::a@yahoo.com::20 Main Rd::Alcatraz::CA::1200::Please send me Information to
A@gmail.com::0::::
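The lookbehind ((?<=...)) matches:
- ^ - the start of the string
- [^:]* - 0 or more characters other than :
- :: - 2 colons
The [\r\n]+ pattern makes sure we match all newline symbols, even if there are several of them.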
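Two things may matter when applying this to the whole file. Without RegexOptions.Multiline, ^ only matches at the very start of the input, so only the first header would be joined. Regex.Replace also returns a new string instead of modifying its argument, so the result has to be assigned before it is written back. The sketch below is one possible variant for a file with many records, not the answer's exact code: HeaderNewlineRemover and RemoveHeaderBreaks are hypothetical names, and \r\n is added to the negated class as an assumption so that the lookbehind stays within a single line.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HeaderNewlineRemover
{
    // Multiline lets ^ match at the start of every line, not just the start of the input.
    // [^:\r\n]* (an assumption added here) keeps the lookbehind on one line.
    private static readonly Regex HeaderBreak =
        new Regex(@"(?<=^[^:\r\n]*::)[\r\n]+", RegexOptions.Multiline);

    // Removes the line break(s) that directly follow the first :: of a line,
    // provided that :: is the last thing on the line.
    public static string RemoveHeaderBreaks(string text)
    {
        return HeaderBreak.Replace(text, string.Empty);
    }
}
```

A call such as File.WriteAllText(path, HeaderNewlineRemover.RemoveHeaderBreaks(File.ReadAllText(path))) would then write the joined records back. One limitation: a continuation line of a detail record that happens to end right after its first :: would be joined to the next line as well.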
Source: "c# - How to use lookbehind in a C# regex to remove line breaks?" on Stack Overflow: https://stackoverflow.com/questions/30602447/