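I am preparing some WhatsApp chat logs to produce statistics and a word cloud. However, my data occasionally contains double line breaks that corrupt the log format, and I would like to know how to fix this automatically.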
13 Mar 18:51 - nicola: mainly he's crap
13 Mar 18:52 - Sebastian K: ... you didn't really dress it up
13 Mar 18:52 - nicola: and he has no natural grace like most cats
well no i didn't lol
13 Mar 18:52 - nicola: you saw the last video
13 Mar 18:53 - Sebastian K: Stilton jumped onto that wall effortlessly while Ched almost killed himself yea...
Searching for and deleting the empty lines was an easy fix. But I am still left with lines that break the date and time format:
13 Mar 18:51 - nicola: mainly he's crap
13 Mar 18:52 - Sebastian K: ... you didn't really dress it up
13 Mar 18:52 - nicola: and he has no natural grace like most cats
well no i didn't lol
13 Mar 18:52 - nicola: you saw the last video
13 Mar 18:53 - Sebastian K: Stilton jumped onto that wall effortlessly while Ched almost killed himself yea...
Target format:
13 Mar 18:51 - nicola: mainly he's crap
13 Mar 18:52 - Sebastian K: ... you didn't really dress it up
13 Mar 18:52 - nicola: and he has no natural grace like most cats well no i didn't lol
13 Mar 18:52 - nicola: you saw the last video
13 Mar 18:53 - Sebastian K: Stilton jumped onto that wall effortlessly while Ched almost killed himself yea...
Perhaps the solution is to exploit this rule: the line breaks I need to keep follow this pattern:
TEXT *linebreak*
NUMBER (beginning of date column)
The unwanted ones follow this pattern:
TEXT *linebreak*
TEXT
How can I fix this with Notepad++?
Best answer
In the Search and Replace dialog you can search for this pattern:
\r\n(?!\d)
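Enable the "Regular expression" search mode and replace with nothing (or with a single space, so that the joined words stay separated as in the target format).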
\r\n
matches a line break consisting of CR and LF. Enable the display of control characters in Notepad++ to see which kind of line breaks your file uses.
(?!\d)
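is a negative lookahead: it succeeds only when the line break is not followed by a digit. This works for your example but may fail in special cases, for instance when a continuation line itself starts with a digit. You can extend it to a stricter pattern such as (?!\d{2}\s)
if the day is always written with two digits.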
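For anyone who wants to check the patterns outside Notepad++, here is a minimal Python sketch of the same idea. It assumes CRLF line endings, as the answer does, and replaces each unwanted break with a single space (an assumption, chosen to match the target format). The sample strings and variable names are made up for illustration.

```python
import re

# Sample log with Windows (CRLF) line endings, as the answer assumes.
log = (
    "13 Mar 18:52 - nicola: and he has no natural grace like most cats\r\n"
    "well no i didn't lol\r\n"
    "13 Mar 18:52 - nicola: you saw the last video"
)

# Basic pattern: a CRLF that is NOT followed by a digit is an unwanted break.
# Replacing it with a space (assumption) keeps the joined words apart.
basic = re.compile(r"\r\n(?!\d)")
fixed = basic.sub(" ", log)

# Special case: a continuation line that itself starts with a digit.
tricky = (
    "13 Mar 18:53 - Sebastian K: he fell off\r\n"
    "3 times actually\r\n"
    "13 Mar 18:54 - nicola: ouch"
)

# The basic pattern leaves this break in place because "3" is a digit.
still_broken = basic.sub(" ", tricky)

# Stricter pattern: keep a break only when two digits plus whitespace follow,
# which assumes the day of the month is always written with two digits.
strict = re.compile(r"\r\n(?!\d{2}\s)")
fixed_tricky = strict.sub(" ", tricky)

print(fixed)
print(fixed_tricky)
```

The stricter lookahead still misfires if a continuation line happens to start with two digits and a space, so anchoring on the full date shape would be safer for messy logs.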
On the topic of regex and double line breaks in Notepad++, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/30314592/