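c# - Word wrap with a regular expression

Tags: c# regex word-wrap

EDIT FOR CLARITY - I know there are ways to do this in multiple steps, or using LINQ or vanilla C# string manipulation. The reason I am using a single regex call is that I wanted practice with complex regex patterns. - END EDIT

I am trying to write a regular expression that performs word wrap. It gets very close to the desired output, but I can't quite get it to work.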

Regex.Replace(text, @"(?<=^|\G)(.{1,20}(\s|$))", "$1\r\n", RegexOptions.Multiline)

This correctly wraps lines that are too long, but it also adds a line break where one already exists.

Input

"This string is really long. There are a lot of words in it.\r\nHere's another line in the string that's also very long."

Expected output

"This string is \r\nreally long. There \r\nare a lot of words \r\nin it.\r\nHere's another line \r\nin the string that's \r\nalso very long."

Actual output

"This string is \r\nreally long. There \r\nare a lot of words \r\nin it.\r\n\r\nHere's another line \r\nin the string that's \r\nalso very long.\r\n"

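Note the double "\r\n" between the sentences where the input already had a line break, and the extra "\r\n" appended at the end.

Perhaps there is a way to conditionally apply different replacement patterns? I.e., if the match ends with "\r\n", use the replacement pattern "$1"; otherwise use the replacement pattern "$1\r\n".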
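One way to sketch the conditional replacement described above is to pass a MatchEvaluator (here a lambda) to Regex.Replace instead of a replacement string. The class name `ConditionalWrap` and the method `Wrap` are hypothetical names chosen for illustration, and the extra end-of-input check is an assumption added to avoid the trailing "\r\n". This is a minimal sketch built on the pattern from the question, not a tested general-purpose wrapper.

```csharp
using System;
using System.Text.RegularExpressions;

static class ConditionalWrap
{
    // Hypothetical helper (not from the original post): wraps at 20 characters
    // using the question's pattern, but decides per match whether to append "\r\n".
    public static string Wrap(string text)
    {
        return Regex.Replace(
            text,
            @"(?<=^|\G)(.{1,20}(\s|$))",
            m =>
            {
                // The match already consumed an existing line break: leave it alone.
                bool endsWithNewline = m.Value.EndsWith("\n", StringComparison.Ordinal);
                // Assumption: no line break is wanted after the very last chunk.
                bool atEndOfInput = m.Index + m.Length == text.Length;
                return endsWithNewline || atEndOfInput ? m.Value : m.Value + "\r\n";
            },
            RegexOptions.Multiline);
    }
}
```

Calling `ConditionalWrap.Wrap(input)` on the sample input should yield the expected output listed in the question. The sketch inherits the limits of the original pattern: a word longer than 20 characters has no valid match, so the rest of that line is likely left unwrapped.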

Here is a link to a similar question about wrapping a string that contains no whitespace, which I used as a starting point: Regular expression to find unbroken text and insert space

Best answer

This was tested quickly in Perl.

Edit - This regex simulates the word wrap used in MS-Windows Notepad.exe (for better or worse).

 # MS-Windows  "Notepad.exe Word Wrap" simulation
 # ( N = 16 )
 # ============================
 # Find:     @"(?:((?>.{1,16}(?:(?<=[^\S\r\n])[^\S\r\n]?|(?=\r?\n)|$|[^\S\r\n]))|.{1,16})(?:\r?\n)?|(?:\r?\n|$))"
 # Replace:  "$1\r\n"     ( in C#, a regular non-verbatim string, so \r\n are real CR LF characters )
 # Flags:    Global     

 # Note - Through trial and error discovery, it appears Notepad accepts an extra whitespace
 # (possibly in the N+1 position) to help alignment. This does not matter because their viewport hides it.
 # There is no trimming of any whitespace, so the wrapped buffer could be reconstituted by inserting/detecting a
 # wrap point code which is different than a linebreak.
 # This regex works on un-wrapped source, but could probably be adjusted to produce/work on wrapped buffer text.
 # To reconstitute the source all that is needed is to remove the wrap code which is probably just an extra "\r".

 (?:
      # -- Words/Characters 
      (                       # (1 start)
           (?>                     # Atomic Group - Match words with valid breaks
                .{1,16}                 #  1-N characters
                                        #  Followed by one of 4 prioritized, non-linebreak whitespace
                (?:                     #  break types:
                     (?<= [^\S\r\n] )        # 1. - Behind a non-linebreak whitespace
                     [^\S\r\n]?              #      ( optionally accept an extra non-linebreak whitespace )
                  |  (?= \r? \n )            # 2. - Ahead a linebreak
                  |  $                       # 3. - EOS
                  |  [^\S\r\n]               # 4. - Accept an extra non-linebreak whitespace
                )
           )                       # End atomic group
        |  
           .{1,16}                 # No valid word breaks, just break on the N'th character
      )                       # (1 end)
      (?: \r? \n )?           # Optional linebreak after Words/Characters
   |  
      # -- Or, Linebreak
      (?: \r? \n | $ )        # Stand alone linebreak or at EOS
 )
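Since the question is tagged c#, the pattern above can be carried over to .NET roughly as follows. This is a sketch under assumptions: the class `NotepadWrap`, the method `Wrap`, and the `width` parameter are hypothetical names introduced here, and the port has only been reasoned through by hand, not compared against Notepad. The pattern is assembled by concatenation so that the width N is not hard-coded.

```csharp
using System;
using System.Text.RegularExpressions;

static class NotepadWrap
{
    // Hypothetical helper (not from the original answer): rebuilds the
    // answer's pattern with .{1,N} for an arbitrary wrap width N.
    public static string Wrap(string text, int width)
    {
        string run = ".{1," + width + "}";   // 1 to N characters, no "\n"

        string pattern =
            "(?:(" +
                // Atomic group: a run followed by one of the four break types
                "(?>" + run + @"(?:(?<=[^\S\r\n])[^\S\r\n]?|(?=\r?\n)|$|[^\S\r\n]))" +
                // Fallback: no valid word break, cut at the N'th character
                "|" + run +
            // Optional existing line break, or a stand-alone line break / EOS
            @")(?:\r?\n)?|(?:\r?\n|$))";

        // Regular (non-verbatim) replacement string: "\r\n" is a real CR LF.
        // An existing line break is consumed outside group 1 and re-emitted here.
        return Regex.Replace(text, pattern, "$1\r\n");
    }
}
```

For example, `NotepadWrap.Wrap(text, 16)` corresponds to the N = 16 case above. Because the pattern can also match the empty string at the end of the input, the result ends with an additional line break; trim it afterwards if that matters for your use case.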

Test case: the wrap width N is 16. The output matches Notepad's output, and it does so across several different widths.

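 # Slurp mode: read the whole DATA section as a single string
 $/ = undef;

 my $string1 = <DATA>;

 # Apply the wrap regex globally ( N = 16 ), ending every wrapped piece with CR LF
 $string1 =~ s/(?:((?>.{1,16}(?:(?<=[^\S\r\n])[^\S\r\n]?|(?=\r?\n)|$|[^\S\r\n]))|.{1,16})(?:\r?\n)?|(?:\r?\n|$))/$1\r\n/g;

 print $string1;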

 __DATA__
 hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
 bbbbbbbbbbbbbbbbEDIT FOR CLARITY - I                    know there are  ways to do this in   multiple steps, or using LINQ or vanilla C#
 string manipulation. 

 The reason I am using a single regex call, is because I wanted practice. with complex
 regex patterns. - END EDIT
 pppppppppppppppppppUf

Output >>

 hhhhhhhhhhhhhhhh
 hhhhhhhhhhhhhhh
 bbbbbbbbbbbbbbbb
 EDIT FOR CLARITY 
 - I              
       know there 
 are  ways to do 
 this in   
 multiple steps, 
 or using LINQ or 
 vanilla C#
 string 
 manipulation. 

 The reason I am 
 using a single 
 regex call, is 
 because I wanted 
 practice. with 
 complex
 regex patterns. 
 - END EDIT
 pppppppppppppppp
 pppUf

Regarding c# - word wrap with a regular expression, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/20431801/
