java - Regular expression to select a specific part of RTF source

Tags: java regex

I am trying to select all the data from {\\*\listtable (inclusive) up to {\\*\listoverridetable{ (exclusive).

Here is a sample RTF:

    {\\rtf1\adeflang1\ansi\\ansicpg1\uc1\adeff3\deff0\stshfdbch3\stshfloch3\stshfhich3\stshfbi3\deflang1\deflangfe1\themelang1\themelangfe0\themelangcs0{\\fonttbl{{\\f0\fbidi \froman\\fcharset0\fprq2{\\*\panose 02020603050405020304}Times New Roman;}{\\f0\fbidi \froman\\fcharset0\fprq2{\\*\panose 02020603050405020304}Times New Roman;}
{\\f3\fbidi \fswiss\\fcharset0\fprq2{\\*\panose 020f0502020204030204}Calibri;}{\\flomajor\\f3\fbidi \froman\\fcharset0\fprq2{\\*\panose 02020603050405020304}Times New Roman;}
{\\fdbmajor\\f3\fbidi \froman\\fcharset0\fprq2{\\*\panose 02020603050405020304}Times New Roman;}{\\fhimajor\\f3\fbidi \fswiss\\fcharset0\fprq2{\\*\panose 020f0302020204030204}Calibri Light;}
{\\fbimajor\\f3\fbidi \froman\\fcharset0\fprq2{\\*\panose 02020603050405020304}Times New Roman;}{\\flominor\\f3\fbidi \froman\\fcharset0\fprq2{\\*\panose 02020603050405020304}Times New Roman;}
{\\fdbminor\\f3\fbidi \froman\\fcharset0\fprq2{\\*\panose 02020603050405020304}Times New Roman;}{\\fhiminor\\f3\fbidi \fswiss\\fcharset0\fprq2{\\*\panose 020f0502020204030204}Calibri;}
{\\fbiminor\\f3\fbidi \froman\\fcharset0\fprq2{\\*\panose 02020603050405020304}Times New Roman;}{\\f3\fbidi \froman\\fcharset2\fprq2Times New Roman CE;}{\\f4\fbidi \froman\\fcharset2\fprq2Times New Roman Cyr;}
{\\f4\fbidi \froman\\fcharset1\fprq2Times New Roman Greek;}{\\f4\fbidi \froman\\fcharset1\fprq2Times New Roman Tur;}{\\f4\fbidi \froman\\fcharset1\fprq2Times New Roman (Hebrew);}{\\f4\fbidi \froman\\fcharset1\fprq2Times New Roman (Arabic);}
{\\f4\fbidi \froman\\fcharset1\fprq2Times New Roman Baltic;}{\\f4\fbidi \froman\\fcharset1\fprq2Times New Roman (Vietnamese);}{\\f3\fbidi \froman\\fcharset2\fprq2Times New Roman CE;}{\\f4\fbidi \froman\\fcharset2\fprq2Times New Roman Cyr;}
{\\f4\fbidi \froman\\fcharset1\fprq2Times New Roman Greek;}{\\f4\fbidi \froman\\fcharset1\fprq2Times New Roman Tur;}{\\f4\fbidi \froman\\fcharset1\fprq2Times New Roman (Hebrew);}{\\f4\fbidi \froman\\fcharset1\fprq2Times New Roman (Arabic);}
{\\f4\fbidi \froman\\fcharset1\fprq2Times New Roman Baltic;}{\\f4\fbidi \froman\\fcharset1\fprq2Times New Roman (Vietnamese);}{\\f4\fbidi \fswiss\\fcharset2\fprq2Calibri CE;}{\\f4\fbidi \fswiss\\fcharset2\fprq2Calibri Cyr;}
{\\f4\fbidi \fswiss\\fcharset1\fprq2Calibri Greek;}{\\f4\fbidi \fswiss\\fcharset1\fprq2Calibri Tur;}{\\f4\fbidi \fswiss\\fcharset1\fprq2Calibri Baltic;}{\\f4\fbidi \fswiss\\fcharset1\fprq2Calibri (Vietnamese);}
{\\flomajor\\f3\fbidi \froman\\fcharset2\fprq2Times New Roman CE;}{\\flomajor\\f3\fbidi \froman\\fcharset2\fprq2Times New Roman Cyr;}{\\flomajor\\f3\fbidi \froman\\fcharset1\fprq2Times New Roman Greek;}
{\\flomajor\\f3\fbidi \froman\\fcharset1\fprq2Times New Roman Tur;}{\\flomajor\\f3\fbidi \froman\\fcharset1\fprq2Times New Roman (Hebrew);}{\\flomajor\\f3\fbidi \froman\\fcharset1\fprq2Times New Roman (Arabic);}
{\\flomajor\\f3\fbidi \froman\\fcharset1\fprq2Times New Roman Baltic;}{\\flomajor\\f3\fbidi \froman\\fcharset1\fprq2Times New Roman (Vietnamese);}{\\fdbmajor\\f3\fbidi \froman\\fcharset2\fprq2Times New Roman CE;}
{\\fdbmajor\\f3\fbidi \froman\\fcharset2\fprq2Times New Roman Cyr;}{\\fdbmajor\\f3\fbidi \froman\\fcharset1\fprq2Times New Roman Greek;}{\\fdbmajor\\f3\fbidi \froman\\fcharset1\fprq2Times New Roman Tur;}
{\\fdbmajor\\f3\fbidi \froman\\fcharset1\fprq2Times New Roman (Hebrew);}{\\fdbmajor\\f3\fbidi \froman\\fcharset1\fprq2Times New Roman (Arabic);}{\\fdbmajor\\f3\fbidi \froman\\fcharset1\fprq2Times New Roman Baltic;}
{\\fdbmajor\\f3\fbidi \froman\\fcharset1\fprq2Times New Roman (Vietnamese);}{\\fhimajor\\f3\fbidi \fswiss\\fcharset2\fprq2Calibri Light CE;}{\\fhimajor\\f3\fbidi \fswiss\\fcharset2\fprq2Calibri Light Cyr;}
{\\fhimajor\\f3\fbidi \fswiss\\fcharset1\fprq2Calibri Light Greek;}{\\fhimajor\\f3\fbidi \fswiss\\fcharset1\fprq2Calibri Light Tur;}{\\fhimajor\\f3\fbidi \fswiss\\fcharset1\fprq2Calibri Light Baltic;}
{\\fhimajor\\f3\fbidi \fswiss\\fcharset1\fprq2Calibri Light (Vietnamese);}{\\fbimajor\\f3\fbidi \froman\\fcharset2\fprq2Times New Roman CE;}{\\fbimajor\\f3\fbidi \froman\\fcharset2\fprq2Times New Roman Cyr;}
{\\fbimajor\\f3\fbidi \froman\\fcharset1\fprq2Times New Roman Greek;}{\\fbimajor\\f3\fbidi \froman\\fcharset1\fprq2Times New Roman Tur;}{\\fbimajor\\f3\fbidi \froman\\fcharset1\fprq2Times New Roman (Hebrew);}
{\\fbimajor\\f3\fbidi \froman\\fcharset1\fprq2Times New Roman (Arabic);}{\\fbimajor\\f3\fbidi \froman\\fcharset1\fprq2Times New Roman Baltic;}{\\fbimajor\\f3\fbidi \froman\\fcharset1\fprq2Times New Roman (Vietnamese);}
{\\flominor\\f3\fbidi \froman\\fcharset2\fprq2Times New Roman CE;}{\\flominor\\f3\fbidi \froman\\fcharset2\fprq2Times New Roman Cyr;}{\\flominor\\f3\fbidi \froman\\fcharset1\fprq2Times New Roman Greek;}
{\\flominor\\f3\fbidi \froman\\fcharset1\fprq2Times New Roman Tur;}{\\flominor\\f3\fbidi \froman\\fcharset1\fprq2Times New Roman (Hebrew);}{\\flominor\\f3\fbidi \froman\\fcharset1\fprq2Times New Roman (Arabic);}
{\\flominor\\f3\fbidi \froman\\fcharset1\fprq2Times New Roman Baltic;}{\\flominor\\f3\fbidi \froman\\fcharset1\fprq2Times New Roman (Vietnamese);}{\\fdbminor\\f3\fbidi \froman\\fcharset2\fprq2Times New Roman CE;}
{\\fdbminor\\f3\fbidi \froman\\fcharset2\fprq2Times New Roman Cyr;}{\\fdbminor\\f3\fbidi \froman\\fcharset1\fprq2Times New Roman Greek;}{\\fdbminor\\f3\fbidi \froman\\fcharset1\fprq2Times New Roman Tur;}
{\\fdbminor\\f3\fbidi \froman\\fcharset1\fprq2Times New Roman (Hebrew);}{\\fdbminor\\f3\fbidi \froman\\fcharset1\fprq2Times New Roman (Arabic);}{\\fdbminor\\f3\fbidi \froman\\fcharset1\fprq2Times New Roman Baltic;}
{\\fdbminor\\f3\fbidi \froman\\fcharset1\fprq2Times New Roman (Vietnamese);}{\\fhiminor\\f3\fbidi \fswiss\\fcharset2\fprq2Calibri CE;}{\\fhiminor\\f3\fbidi \fswiss\\fcharset2\fprq2Calibri Cyr;}
{\\fhiminor\\f3\fbidi \fswiss\\fcharset1\fprq2Calibri Greek;}{\\fhiminor\\f3\fbidi \fswiss\\fcharset1\fprq2Calibri Tur;}{\\fhiminor\\f3\fbidi \fswiss\\fcharset1\fprq2Calibri Baltic;}
{\\fhiminor\\f3\fbidi \fswiss\\fcharset1\fprq2Calibri (Vietnamese);}{\\fbiminor\\f3\fbidi \froman\\fcharset2\fprq2Times New Roman CE;}{\\fbiminor\\f3\fbidi \froman\\fcharset2\fprq2Times New Roman Cyr;}
{\\fbiminor\\f3\fbidi \froman\\fcharset1\fprq2Times New Roman Greek;}{\\fbiminor\\f3\fbidi \froman\\fcharset1\fprq2Times New Roman Tur;}{\\fbiminor\\f3\fbidi \froman\\fcharset1\fprq2Times New Roman (Hebrew);}
{\\fbiminor\\f3\fbidi \froman\\fcharset1\fprq2Times New Roman (Arabic);}{\\fbiminor\\f3\fbidi \froman\\fcharset1\fprq2Times New Roman Baltic;}{\\fbiminor\\f3\fbidi \froman\\fcharset1\fprq2Times New Roman (Vietnamese);}}
{\\colortbl;;\red0\green0\blue0;\red0\green0\blue2;\red0\green2\blue2;\red0\green2\blue0;\red2\green0\blue2;\red2\green0\blue0;\red2\green2\blue0;\red2\green2\blue2;\red0\green0\blue1;\red0\green1\blue1;\red0\green1\blue0;
\red1\green0\blue1;\red1\green0\blue0;\red1\green1\blue0;\red1\green1\blue1;\red1\green1\blue1;}{\\*\defchp \f3\fs2}{\\*\defpap \ql \li0\ri0\sa1\sl2\slmult1
\widctlpar\\wrapdefault\\aspalpha\\aspnum\\faauto\\adjustright\\rin0\lin0\itap0}\noqfpromote {\\stylesheet{{\\ql \li0\ri0\sa1\sl2\slmult1\widctlpar\\wrapdefault\\aspalpha\\aspnum\\faauto\\adjustright\\rin0\lin0\itap0\rtlch\\fcs1\af3\afs2\alang1
\ltrch\\fcs0\f3\fs2\lang1\langfe1\cgrid\\langnp1\langfenp1\snext0\sqformat \spriority0Normal;}{\\*\cs1\additive \ssemihidden \sunhideused \spriority1Default Paragraph Font;}{\\*
\ts1\tsrowd\\trftsWidthB3\trpaddl1\trpaddr1\trpaddfl3\trpaddft3\trpaddfb3\trpaddfr3\tblind0\tblindtype3\tsvertalt\\tsbrdrt\\tsbrdrl\\tsbrdrb\\tsbrdrr\\tsbrdrdgl\\tsbrdrdgr\\tsbrdrh\\tsbrdrv \ql \li0\ri0\sa1\sl2\slmult1
\widctlpar\\wrapdefault\\aspalpha\\aspnum\\faauto\\adjustright\\rin0\lin0\itap0\rtlch\\fcs1\af3\afs2\alang1\ltrch\\fcs0\f3\fs2\lang1\langfe1\cgrid\\langnp1\langfenp1\snext1\ssemihidden \sunhideused Normal Table;}{

\s1\ql \li7\ri0\sa1\sl2\slmult1\widctlpar\\wrapdefault\\aspalpha\\aspnum\\faauto\\adjustright\\rin0\lin7\itap0\contextualspace \rtlch\\fcs1\af3\afs2\alang1\ltrch\\fcs0\f3\fs2\lang1\langfe1\cgrid\\langnp1\langfenp1
\sbasedon0\snext1\sqformat \spriority3\styrsid1List Paragraph;}{\\*\ts1\tsrowd\\trbrdrt\\brdrs\\brdrw1\trbrdrl\\brdrs\\brdrw1\trbrdrb\\brdrs\\brdrw1\trbrdrr\\brdrs\\brdrw1\trbrdrh\\brdrs\\brdrw1\trbrdrv\\brdrs\\brdrw1
\trpaddl1\trpaddr1\trpaddfl3\trpaddft3\trpaddfb3\trpaddfr3\tblind0\tblindtype0\tsvertalt\\tsbrdrt\\tsbrdrl\\tsbrdrb\\tsbrdrr\\tsbrdrdgl\\tsbrdrdgr\\tsbrdrh\\tsbrdrv \ql \li0\ri0\widctlpar\\wrapdefault\\aspalpha\\aspnum\\faauto\\adjustright\\rin0\lin0\itap0
\rtlch\\fcs1\af0\afs2\alang1\ltrch\\fcs0\f3\fs2\lang1\langfe1\cgrid\\langnp1\langfenp1\sbasedon1\snext1\spriority3\styrsid1Table Grid;}}{\\*\listtable{{\\list\\listtemplateid6\listhybrid{{\\listlevel\\levelnfc0
\levelnfcn0\leveljc0\leveljcn0\levelfollow0\levelstartat1\levelspace3\levelindent0{\\leveltext\\leveltemplateid6\'0\'0.;}{\\levelnumbers\\'0;}\rtlch\\fcs1\af0\ltrch\\fcs0\fi-3\li7\lin7}{\\listlevel\\levelnfc4\levelnfcn4\leveljc0\leveljcn0
\levelfollow0\levelstartat1\lvltentative\\levelspace3\levelindent0{\\leveltext\\leveltemplateid6\'0\'0.;}{\\levelnumbers\\'0;}\rtlch\\fcs1\af0\ltrch\\fcs0\fi-3\li1\lin1}{\\listlevel\\levelnfc2\levelnfcn2\leveljc2\leveljcn2\levelfollow0
\levelstartat1\lvltentative\\levelspace3\levelindent0{\\leveltext\\leveltemplateid6\'0\'0.;}{\\levelnumbers\\'0;}\rtlch\\fcs1\af0\ltrch\\fcs0\fi-1\li2\lin2}{\\listlevel\\levelnfc0\levelnfcn0\leveljc0\leveljcn0\levelfollow0\levelstartat1
\lvltentative\\levelspace3\levelindent0{\\leveltext\\leveltemplateid6\'0\'0.;}{\\levelnumbers\\'0;}\rtlch\\fcs1\af0\ltrch\\fcs0\fi-3\li2\lin2}{\\listlevel\\levelnfc4\levelnfcn4\leveljc0\leveljcn0\levelfollow0\levelstartat1\lvltentative

\levelspace3\levelindent0{\\leveltext\\leveltemplateid6\'0\'0.;}{\\levelnumbers\\'0;}\rtlch\\fcs1\af0\ltrch\\fcs0\fi-3\li3\lin3}{\\listlevel\\levelnfc2\levelnfcn2\leveljc2\leveljcn2\levelfollow0\levelstartat1\lvltentative\\levelspace3
\levelindent0{\\leveltext\\leveltemplateid6\'0\'0.;}{\\levelnumbers\\'0;}\rtlch\\fcs1\af0\ltrch\\fcs0\fi-1\li4\lin4}{\\listlevel\\levelnfc0\levelnfcn0\leveljc0\leveljcn0\levelfollow0\levelstartat1\lvltentative\\levelspace3\levelindent0
{\\leveltext\\leveltemplateid6\'0\'0.;}{\\levelnumbers\\'0;}\rtlch\\fcs1\af0\ltrch\\fcs0\fi-3\li5\lin5}{\\listlevel\\levelnfc4\levelnfcn4\leveljc0\leveljcn0\levelfollow0\levelstartat1\lvltentative\\levelspace3\levelindent0{\\leveltext

\leveltemplateid6\'0\'0.;}{\\levelnumbers\\'0;}\rtlch\\fcs1\af0\ltrch\\fcs0\fi-3\li5\lin5}{\\listlevel\\levelnfc2\levelnfcn2\leveljc2\leveljcn2\levelfollow0\levelstartat1\lvltentative\\levelspace3\levelindent0{\\leveltext

\leveltemplateid6\'0\'0.;}{\\levelnumbers\\'0;}\rtlch\\fcs1\af0\ltrch\\fcs0\fi-1\li6\lin6}{\\listname ;}\listid9}}{\\*\listoverridetable{

So far I have written:

public String getContent(){           //my method
    String str = null;
    Pattern pattern = Pattern.compile("({\\\\\*\\listtable(.*\W)*){\\\\\*\\listoverridetable");
    Matcher matcher = pattern.matcher(bodyContent);

    if (matcher.find()) {
         str = matcher.group(2);
    }
    return str;
}

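But whenever I try to write the regular expression this way, I always get errors such as "invalid escape sequence". How should I do this in Java?

Best answer

You need to be very careful with escaping in Java regular expressions. To avoid at least some of the escaping, simply use character classes such as [{] and [*].

Here is a pattern string that matches the text: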

String pattern = "(?s)[{]\\\\\\\\[*]\\\\listtable.*?(?=[{]\\\\\\\\[*]\\\\listoverridetable[{])";

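To match a single literal \ in the input, the Java string literal needs 4 \ characters: the compiler reduces them to 2, and the regex engine reads those 2 as one escaped backslash.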
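The two escaping levels can be checked with a small sketch. The class name, helper method, and sample string below are hypothetical and only serve as illustration; the regex calls are from the standard java.util.regex package.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names and sample strings are illustrative only.
class BackslashEscapeDemo {
    // Java literal "\\\\" holds 2 characters at runtime (\\),
    // which the regex engine reads as ONE literal backslash.
    static final Pattern ONE_BACKSLASH = Pattern.compile("\\\\");

    // Inside a character class, '{' and '*' need no escaping at all.
    static final Pattern BRACE_STAR = Pattern.compile("[{][*]");

    // Counts how many literal backslashes occur in s.
    static int countBackslashes(String s) {
        Matcher m = ONE_BACKSLASH.matcher(s);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String sample = "{\\*\\listtable";                        // runtime value: {\*\listtable
        System.out.println("\\\\".length());                      // prints 2
        System.out.println(countBackslashes(sample));             // prints 2
        System.out.println(BRACE_STAR.matcher("{*").matches());   // prints true
    }
}
```

The "invalid escape sequence" error comes from the first level: a lone `\*` or `\l` inside a Java string literal is rejected by the compiler before the regex engine ever sees it.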

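Note the (?s) single-line (DOTALL) modifier, which lets the dot match newline characters. In your regex, the (.*\W)* part is very inefficient when fetching a substring that contains newlines, because the nested quantifiers force a lot of backtracking.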
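A minimal sketch of what (?s) changes, using a made-up three-line string rather than the RTF sample. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the input text is made up for illustration.
class DotAllDemo {
    // Returns the first match of regex in input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String text = "start\nmiddle\nend";

        // Without (?s) the dot stops at the first newline, so nothing matches.
        System.out.println(firstMatch("start.*?end", text) == null);          // prints true

        // With (?s) the dot also matches '\n', so the whole text matches.
        System.out.println(text.equals(firstMatch("(?s)start.*?end", text))); // prints true

        // Pattern.DOTALL is the flag equivalent of the inline (?s) modifier.
        System.out.println(
            Pattern.compile("start.*?end", Pattern.DOTALL).matcher(text).find()); // prints true
    }
}
```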

See the IDEONE demo.

Regarding java - Regular expression to select a specific part of RTF source, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/31876889/
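As a rough sketch, the pattern from the answer could be dropped into the asker's method like this. The class name is hypothetical, the method takes the text as a parameter instead of reading a bodyContent field, and the input string is a shortened stand-in for the RTF sample. It assumes, as in the sample, that the source text really contains two backslashes before the * and one before the control word.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper class around the pattern given in the answer.
class ListTableExtractor {
    // Matches from {\\*\listtable (inclusive) up to, but not including,
    // {\\*\listoverridetable{ ; the lookahead keeps the end marker out of the match.
    private static final Pattern LIST_TABLE = Pattern.compile(
        "(?s)[{]\\\\\\\\[*]\\\\listtable.*?(?=[{]\\\\\\\\[*]\\\\listoverridetable[{])");

    // Returns the list table section, or null when it is not found.
    static String getContent(String bodyContent) {
        Matcher matcher = LIST_TABLE.matcher(bodyContent);
        // group() is the whole match; the pattern has no capturing groups.
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        // Shortened stand-in for the sample. Runtime value (with a real newline):
        // Table Grid;}}{\\*\listtable{\list
        // \listid9}}{\\*\listoverridetable{
        String rtf = "Table Grid;}}{\\\\*\\listtable{\\list\n\\listid9}}"
                   + "{\\\\*\\listoverridetable{";
        System.out.println(getContent(rtf));
    }
}
```

Compiling the pattern once in a static field avoids recompiling it on every call, which matters if the method runs over many documents.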
