c# - Is there a way to read a Word document line by line?

Tags: c# ms-word

I am trying to extract all the words in a Word document. I can do it all in one pass, as follows:

Word.Application word = new Word.Application();
Word.Document doc = word.Documents.Open(@"C:\SampleText.doc");
doc.Activate();

foreach (Word.Range docRange in doc.Words) // loads all words in document
{
    IEnumerable<string> sortedSubstrings = Enumerable.Range(0, docRange.Text.Trim().Length)
        .Select(i => docRange.Text.Substring(i))
        .OrderBy(s => s.Length < 3 ? s : s.Remove(2, Math.Min(s.Length - 2, 2)));

    wordPosition =
        (int)
        docRange.get_Information(
            Microsoft.Office.Interop.Word.WdInformation.wdFirstCharacterColumnNumber);

    foreach (var substring in sortedSubstrings)
    {
        index = docRange.Text.IndexOf(substring) + wordPosition;
        charLocation[index] = substring;
    }
}

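However, I would prefer to load the document one line at a time. Is it possible to do that?

I can load it paragraph by paragraph, but I cannot iterate over a paragraph to extract all of its words.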

foreach (Word.Paragraph para in doc.Paragraphs)
{
    foreach (Word.Range docRange in para) // Error: type Word.Paragraph is not enumerable
    {
        IEnumerable<string> sortedSubstrings = Enumerable.Range(0, docRange.Text.Trim().Length)
            .Select(i => docRange.Text.Substring(i))
            .OrderBy(s => s.Length < 3 ? s : s.Remove(2, Math.Min(s.Length - 2, 2)));

        wordPosition =
            (int)
            docRange.get_Information(
                Microsoft.Office.Interop.Word.WdInformation.wdFirstCharacterColumnNumber);

        foreach (var substring in sortedSubstrings)
        {
            index = docRange.Text.IndexOf(substring) + wordPosition;
            charLocation[index] = substring;
        }

    }
}

Best answer

This will help you get the text line by line.

    object file = Path.GetDirectoryName(Application.ExecutablePath) + @"\Answer.doc";

    Word.Application wordObject = new Word.ApplicationClass();
    wordObject.Visible = false;

    object nullobject = Missing.Value;
    Word.Document docs = wordObject.Documents.Open
        (ref file, ref nullobject, ref nullobject, ref nullobject,
        ref nullobject, ref nullobject, ref nullobject, ref nullobject,
        ref nullobject, ref nullobject, ref nullobject, ref nullobject,
        ref nullobject, ref nullobject, ref nullobject, ref nullobject);

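    String strLine;
    bool bolEOF = false;

    // Start with the first character of the document selected.
    docs.Characters[1].Select();

    int index = 0;
    do
    {
        // Move the end of the selection forward by one line unit.
        object unit = Word.WdUnits.wdLine;
        object count = 1;
        wordObject.Selection.MoveEnd(ref unit, ref count);

        strLine = wordObject.Selection.Text;
        richTextBox1.Text += ++index + " - " + strLine + "\r\n"; // for our understanding

        // Collapse the selection to its end so the next pass starts from there.
        object direction = Word.WdCollapseDirection.wdCollapseEnd;
        wordObject.Selection.Collapse(ref direction);

        // Stop once the selection reaches Word's predefined \EndOfDoc bookmark.
        if (wordObject.Selection.Bookmarks.Exists(@"\EndOfDoc"))
            bolEOF = true;
    } while (!bolEOF);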

    docs.Close(ref nullobject, ref nullobject, ref nullobject);
    wordObject.Quit(ref nullobject, ref nullobject, ref nullobject);
    docs = null;
    wordObject = null;

Here is the genius behind the code. Follow the link for more explanation of how it works.
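
As for the paragraph-based attempt in the question: a `Word.Paragraph` cannot be enumerated directly, but its `Range` exposes a `Words` collection that can. The following is only a minimal sketch under that assumption. The class name `ParagraphWords` and the helper `WordsByParagraph` are hypothetical names introduced here for illustration, not part of the original post, and the sketch assumes a reference to the Microsoft.Office.Interop.Word assembly.

```csharp
using System.Collections.Generic;
using Word = Microsoft.Office.Interop.Word;

static class ParagraphWords
{
    // Hypothetical helper (not from the original post): collects the words
    // of each paragraph, in document order.
    public static List<List<string>> WordsByParagraph(Word.Document doc)
    {
        var result = new List<List<string>>();

        foreach (Word.Paragraph para in doc.Paragraphs)
        {
            var words = new List<string>();

            // Paragraph itself is not enumerable, but para.Range.Words is.
            foreach (Word.Range wordRange in para.Range.Words)
            {
                string text = wordRange.Text;
                if (text == null)
                    continue;

                // Drop trailing spaces and the paragraph mark.
                text = text.Trim();
                if (text.Length > 0)
                    words.Add(text);
            }

            result.Add(words);
        }

        return result;
    }
}
```

Note that Word counts punctuation marks as separate items in `Words`, so the per-paragraph lists may contain entries such as "." or ",". The body of the inner loop could be swapped for the substring logic from the question.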

Regarding "c# - Is there a way to read a Word document line by line?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/6924056/
