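.net - How to split an RTF file

Tags: .net parsing rtf

I want to split an RTF file into two or more parts at the string [BreakPage], using C# or VB.Net. For example, I have the following file, which contains [BreakPage] and needs to be split into two parts: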

{\rtf1\ansi\ansicpg1251\uc1\deff0\stshfdbch0\stshfloch0\stshfhich0\stshfbi0\deflang1049\deflangfe1049{\fonttbl{\f0\froman\fcharset204\fprq2{*\panose 02020603050405020304}Times New Roman;}{\f38\froman\fcharset0\fprq2 Times New Roman;} {\f36\froman\fcharset238\fprq2 Times New Roman CE;}{\f39\froman\fcharset161\fprq2 Times New Roman Greek;}{\f40\froman\fcharset162\fprq2 Times New Roman Tur;}{\f41\froman\fcharset177\fprq2 Times New Roman (Hebrew);} {\f42\froman\fcharset178\fprq2 Times New Roman (Arabic);}{\f43\froman\fcharset186\fprq2 Times New Roman Baltic;}{\f44\froman\fcharset163\fprq2 Times New Roman (Vietnamese);}}{\colortbl;\red0\green0\blue0;\red0\green0\blue255;\red0\green255\blue255; \red0\green255\blue0;\red255\green0\blue255;\red255\green0\blue0;\red255\green255\blue0;\red255\green255\blue255;\red0\green0\blue128;\red0\green128\blue128;\red0\green128\blue0;\red128\green0\blue128;\red128\green0\blue0;\red128\green128\blue0; \red128\green128\blue128;\red192\green192\blue192;}{\stylesheet{\ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \fs24\lang1049\langfe1049\cgrid\langnp1049\langfenp1049 \snext0 Normal;}{*\cs10 \additive \ssemihidden Default Paragraph Font;}{*\ts11\tsrowd\trftsWidthB3\trpaddl108\trpaddr108\trpaddfl3\trpaddft3\trpaddfb3\trpaddfr3\trcbpat1\trcfpat1\tscellwidthfts0\tsvertalt\tsbrdrt\tsbrdrl\tsbrdrb\tsbrdrr\tsbrdrdgl\tsbrdrdgr\tsbrdrh\tsbrdrv \ql \li0\ri0\widctlpar\aspalpha\aspnum\faauto\adjustright\rin0\lin0\itap0 \fs20\lang1024\langfe1024\cgrid\langnp1024\langfenp1024 \snext11 \ssemihidden Normal Table;}}{*\latentstyles\lsdstimax156\lsdlockeddef0}{*\rsidtbl \rsid2111663\rsid7154806 \rsid15558346}{*\generator Microsoft Word 11.0.5604;}{\info{\author Programmer}{\operator Programmer}{\creatim\yr2011\mo8\dy2\hr12\min45}{\revtim\yr2011\mo8\dy5\hr12\min34}{\version3}{\edmins1}{\nofpages1}{\nofwords5}{\nofchars34}{\nofcharsws38} {\vern24689}}\margl1701\margr850\margt1134\margb1134 
\widowctrl\ftnbj\aenddoc\noxlattoyen\expshrtn\noultrlspc\dntblnsbdb\nospaceforul\hyphcaps0\horzdoc\dghspace120\dgvspace120\dghorigin1701\dgvorigin1984\dghshow0\dgvshow3 \jcompress\viewkind1\viewscale100\nolnhtadjtbl\rsidroot15558346 \fet0\sectd \linex0\sectdefaultcl\sftnbj {*\pnseclvl1\pnucrm\pnstart1\pnindent720\pnhang {\pntxta .}}{*\pnseclvl2\pnucltr\pnstart1\pnindent720\pnhang {\pntxta .}}{*\pnseclvl3 \pndec\pnstart1\pnindent720\pnhang {\pntxta .}}{*\pnseclvl4\pnlcltr\pnstart1\pnindent720\pnhang {\pntxta )}}{*\pnseclvl5\pndec\pnstart1\pnindent720\pnhang {\pntxtb (}{\pntxta )}}{*\pnseclvl6\pnlcltr\pnstart1\pnindent720\pnhang {\pntxtb (}{\pntxta )}} {*\pnseclvl7\pnlcrm\pnstart1\pnindent720\pnhang {\pntxtb (}{\pntxta )}}{*\pnseclvl8\pnlcltr\pnstart1\pnindent720\pnhang {\pntxtb (}{\pntxta )}}{*\pnseclvl9\pnlcrm\pnstart1\pnindent720\pnhang {\pntxtb (}{\pntxta )}}\pard\plain \ql \li0\ri0\nowidctlpar\faauto\rin0\lin0\itap0 \fs24\lang1049\langfe1049\cgrid\langnp1049\langfenp1049 {\b\insrsid7154806\charrsid7154806 Line 1 \par }{\insrsid7154806 \par }{\i\insrsid7154806\charrsid7154806 Line3}{\lang1048\langfe1049\langnp1048\insrsid7154806 \par }{\lang1048\langfe1049\langnp1048\insrsid2111663 [BreakPage] \par }{\insrsid7154806 Line4 \par \par Line5 \par }}

Can anyone help me?

Thanks!

Best answer

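The problem is that RTF keeps some (but not necessarily all) of its formatting information in a global header. To split RTF text so that each piece is again valid RTF with its formatting applied, you essentially need to know where the header information is and copy it into every piece.

There are two ways to do that:

  1. Write an RTF parser
  2. Use an existing RTF parser

(1) is feasible but takes time. Fortunately, RTF parsers already exist, for example this one on CodeProject.

Alternatively, you can load the RTF text into a RichTextBox, search it for the split text "[BreakPage]", programmatically select the first part and then the second part, and retrieve the RTF for each selection through the SelectedRtf property.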
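The RichTextBox approach could look roughly like the sketch below. It is only an illustration under some assumptions: the project references System.Windows.Forms, the control is never shown on screen, and the class name `RtfSplitter` and method name `Split` are made up for this example. The control regenerates the RTF header for each selection, so the header does not have to be copied by hand.

```csharp
using System;
using System.Collections.Generic;
using System.Windows.Forms;

// Hypothetical helper, not part of any library.
static class RtfSplitter
{
    // Returns the RTF of each section found between occurrences of 'marker'.
    public static List<string> Split(string rtf, string marker)
    {
        var parts = new List<string>();
        using (var box = new RichTextBox())
        {
            // Force creation of the native control so that selection and
            // search operate on the loaded document.
            IntPtr handle = box.Handle;
            box.Rtf = rtf;

            int start = 0;
            while (true)
            {
                // Only search while there is text left after 'start'; calling
                // Find with an empty range may search the whole document again.
                int hit = start < box.TextLength
                    ? box.Find(marker, start, RichTextBoxFinds.None)
                    : -1;

                // The current part ends at the marker, or at the end of the text.
                int end = hit < 0 ? box.TextLength : hit;
                box.Select(start, end - start);
                parts.Add(box.SelectedRtf);

                if (hit < 0)
                {
                    break;
                }
                start = hit + marker.Length; // continue after the marker
            }
        }
        return parts;
    }
}
```

A call such as `RtfSplitter.Split(File.ReadAllText("input.rtf"), "[BreakPage]")` (the file name is a placeholder) would return one RTF string per section, which can then be written to separate files. The paragraph break that followed the marker stays at the start of the next part, so some trimming may be needed.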

Source: this question on Stack Overflow: https://stackoverflow.com/questions/6954289/
