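EDIT FOR CLARITY - I know there are ways to do this in multiple steps, or using LINQ or vanilla C# string manipulation. The reason I am using a single regex call is that I wanted practice with complex regex patterns. - END EDIT
I am trying to write a single regular expression that performs word wrapping. It is very close to the desired output, but I can't quite get it to work.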
Regex.Replace(text, @"(?<=^|\G)(.{1,20}(\s|$))", "$1\r\n", RegexOptions.Multiline)
This correctly wraps lines that are too long, but it also adds a line break where one already exists.
Input
"This string is really long. There are a lot of words in it.\r\nHere's another line in the string that's also very long."
Expected output
"This string is \r\nreally long. There \r\nare a lot of words \r\nin it.\r\nHere's another line \r\nin the string that's \r\nalso very long."
Actual output
"This string is \r\nreally long. There \r\nare a lot of words \r\nin it.\r\n\r\nHere's another line \r\nin the string that's \r\nalso very long.\r\n"
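Note the double "\r\n" between the sentences where the input already had a line break, and the extra "\r\n" appended at the end.
Perhaps there is a way to conditionally apply different replacement patterns? I.e., if the match ends in "\r\n", use the replacement pattern "$1"; otherwise, use the replacement pattern "$1\r\n".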
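One way to get this kind of conditional replacement in .NET is the Regex.Replace overload that takes a MatchEvaluator delegate instead of a replacement string. The sketch below is an illustration, not part of the original post: the class and method names (WordWrapSketch, Wrap) are hypothetical, the pattern is the one from the question, and the end-of-input check is an added assumption to avoid the trailing "\r\n".

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordWrapSketch
{
    // Hypothetical helper; the pattern is copied unchanged from the question.
    public static string Wrap(string text)
    {
        return Regex.Replace(
            text,
            @"(?<=^|\G)(.{1,20}(\s|$))",
            m =>
            {
                // Ordinal comparison, so "\r\n" is not treated as a single
                // unit by culture-sensitive string matching.
                bool endsWithNewline = m.Value.EndsWith("\n", StringComparison.Ordinal);
                // Assumption: do not append a break after the final chunk.
                bool atEndOfInput = m.Index + m.Length == text.Length;
                return (endsWithNewline || atEndOfInput) ? m.Value : m.Value + "\r\n";
            },
            RegexOptions.Multiline);
    }

    public static void Main()
    {
        string input = "This string is really long. There are a lot of words in it.\r\n"
                     + "Here's another line in the string that's also very long.";
        Console.WriteLine(Wrap(input));
    }
}
```

The evaluator runs once per match, so the decision "append a break or not" is made in C# rather than inside the pattern. This keeps a single Regex.Replace call, although the conditional logic no longer lives in the regex itself.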
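Here is a link to a similar question, about wrapping a string that contains no whitespace, which I used as a starting point: Regular expression to find unbroken text and insert space
Best answer
This was quickly tested in Perl.
Edit - This regex code simulates the word wrap (good or bad) used in MS-Windows Notepad.exe.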
# MS-Windows "Notepad.exe Word Wrap" simulation
# ( N = 16 )
# ============================
# Find: @"(?:((?>.{1,16}(?:(?<=[^\S\r\n])[^\S\r\n]?|(?=\r?\n)|$|[^\S\r\n]))|.{1,16})(?:\r?\n)?|(?:\r?\n|$))"
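# Replace: "$1\r\n" (in C#, a regular string rather than a verbatim one; .NET replacement patterns do not interpret \r\n escapes)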
# Flags: Global
# Note - Through trial and error discovery, it appears Notepad accepts an extra whitespace
# (possibly in the N+1 position) to help alignment. This does not matter because their viewport hides it.
# There is no trimming of any whitespace, so the wrapped buffer could be reconstituted by inserting/detecting a
# wrap point code which is different than a linebreak.
# This regex works on un-wrapped source, but could probably be adjusted to produce/work on wrapped buffer text.
# To reconstitute the source all that is needed is to remove the wrap code which is probably just an extra "\r".
(?:
     # -- Words/Characters
     (                          # (1 start)
          (?>                        # Atomic Group - Match words with valid breaks
               .{1,16}                    # 1-N characters
                                          # Followed by one of 4 prioritized, non-linebreak whitespace
               (?:                        # break types:
                    (?<= [^\S\r\n] )           # 1. - Behind a non-linebreak whitespace
                    [^\S\r\n]?                 #      ( optionally accept an extra non-linebreak whitespace )
                 |  (?= \r? \n )               # 2. - Ahead a linebreak
                 |  $                          # 3. - EOS
                 |  [^\S\r\n]                  # 4. - Accept an extra non-linebreak whitespace
               )
          )                          # End atomic group
       |
          .{1,16}                    # No valid word breaks, just break on the N'th character
     )                          # (1 end)
     (?: \r? \n )?              # Optional linebreak after Words/Characters
  |
     # -- Or, Linebreak
     (?: \r? \n | $ )           # Stand alone linebreak or at EOS
)
Test case: the wrap width N is 16. The output matches Notepad's, and did so across a variety of widths.
$/ = undef;
$string1 = <DATA>;
$string1 =~ s/(?:((?>.{1,16}(?:(?<=[^\S\r\n])[^\S\r\n]?|(?=\r?\n)|$|[^\S\r\n]))|.{1,16})(?:\r?\n)?|(?:\r?\n|$))/$1\r\n/g;
print $string1;
__DATA__
hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
bbbbbbbbbbbbbbbbEDIT FOR CLARITY - I know there are ways to do this in multiple steps, or using LINQ or vanilla C#
string manipulation.
The reason I am using a single regex call, is because I wanted practice. with complex
regex patterns. - END EDIT
pppppppppppppppppppUf
Output >>
hhhhhhhhhhhhhhhh
hhhhhhhhhhhhhhh
bbbbbbbbbbbbbbbb
EDIT FOR CLARITY
- I
know there
are ways to do
this in
multiple steps,
or using LINQ or
vanilla C#
string
manipulation.
The reason I am
using a single
regex call, is
because I wanted
practice. with
complex
regex patterns.
- END EDIT
pppppppppppppppp
pppUf
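Since the question targets C#, the same pattern can also be tried through Regex.Replace. The following is a minimal sketch under stated assumptions, not a tested port of the Perl run above: the class and method names (NotepadWrapSketch, Wrap) and the sample input are made up for illustration, the width is fixed at N = 16 as in the answer, and the sample uses "\n" line endings only. Because `.` also matches "\r", input with "\r\n" line endings may need normalizing first.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NotepadWrapSketch
{
    // The answer's pattern for N = 16, as a verbatim string so the
    // backslashes reach the regex engine unchanged.
    private const string Pattern =
        @"(?:((?>.{1,16}(?:(?<=[^\S\r\n])[^\S\r\n]?|(?=\r?\n)|$|[^\S\r\n]))|.{1,16})(?:\r?\n)?|(?:\r?\n|$))";

    // Hypothetical helper wrapping the single Regex.Replace call.
    public static string Wrap(string text)
    {
        // A regular (non-verbatim) string, so "\r\n" is a real CR LF pair;
        // .NET replacement patterns do not interpret backslash escapes.
        return Regex.Replace(text, Pattern, "$1\r\n");
    }

    public static void Main()
    {
        // Made-up sample input with "\n" line endings.
        string input = "The reason I am using a single regex call\nis practice.";
        foreach (string line in Wrap(input).Split(new[] { "\r\n" }, StringSplitOptions.None))
        {
            // Brackets make trailing whitespace visible.
            Console.WriteLine("[" + line + "]");
        }
    }
}
```

Each wrapped line holds at most N characters plus the one optional extra whitespace the pattern accepts, and no non-linebreak character is dropped, which is consistent with the answer's note that no whitespace is trimmed.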
For more on c# - word wrapping with a regular expression, see the original question on Stack Overflow: https://stackoverflow.com/questions/20431801/