java - How to read doc and docx in Java

Tags: java android apache-poi

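First of all, you should know that I have already looked through many questions, but none of them helped me. I want to be able to read doc and docx documents (by "read" I mean the simplest thing: just getting the text). I have seen some posts about POI and its scratchpad module, but I could not get them to work, and most of the time Eclipse could not even build my project...

Can someone give me a code example for both doc and docx, along with the names of (or links to) all the JARs I need?

Thanks!

Basically, this is the code: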

try {
    if (getFileExtention(path).equals("docx")) {
        FileInputStream fis = new FileInputStream(path);
        XWPFWordExtractor oleTextExtractor =
            new XWPFWordExtractor(new XWPFDocument(fis));
        return oleTextExtractor.getText();
    } else if (getFileExtention(path).equals("doc")) {
        FileInputStream fis = new FileInputStream(path);
        WordExtractor we = new WordExtractor(fis);
        return we.getText();
    }
} catch (FileNotFoundException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}


return "";

I have the following JARs:

dom4j-1.6.1.jar

poi-3.8-20120326.jar

poi-ooxml-3.8-20120326.jar

poi-scratchpad-3.8-20120326.jar

xmlbeans-xmlpublic-2.4.0.jar

I have the following problems:

This warning appears many times during the build:

> [2012-07-05 14:12:53 - iCards] Dx warning: Ignoring InnerClasses
> attribute for an anonymous inner class
> (org.dom4j.xpath.DefaultXPath$1) that doesn't come with an associated
> EnclosingMethod attribute. This class was probably produced by a
> compiler that did not target the modern .class file format. The
> recommended solution is to recompile the class from source, using an
> up-to-date compiler and without specifying any "-target" type options.
> The consequence of ignoring this warning is that reflective operations
> on this class will incorrectly indicate that it is *not* an inner
> class.

Another one (when trying to read a docx):

> 07-05 14:17:13.245: W/System.err(4339): java.io.IOException: read failed: EBADF (Bad file number)
> 07-05 14:17:13.255: W/System.err(4339):     at libcore.io.IoBridge.read(IoBridge.java:432)
> 07-05 14:17:13.260: W/System.err(4339):     at java.io.FileInputStream.read(FileInputStream.java:179)
> 07-05 14:17:13.265: W/System.err(4339):     at java.io.PushbackInputStream.read(PushbackInputStream.java:196)
> 07-05 14:17:13.270: W/System.err(4339):     at libcore.io.Streams.readFully(Streams.java:81)
> 07-05 14:17:13.275: W/System.err(4339):     at java.util.zip.ZipInputStream.getNextEntry(ZipInputStream.java:230)
> 07-05 14:17:13.280: W/System.err(4339):     at org.apache.poi.openxml4j.util.ZipInputStreamZipEntrySource.<init>(ZipInputStreamZipEntrySource.java:51)
> 07-05 14:17:13.285: W/System.err(4339):     at org.apache.poi.openxml4j.opc.ZipPackage.<init>(ZipPackage.java:83)
> 07-05 14:17:13.290: W/System.err(4339):     at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:228)
> 07-05 14:17:13.295: W/System.err(4339):     at org.apache.poi.util.PackageHelper.open(PackageHelper.java:39)
> 07-05 14:17:13.300: W/System.err(4339):     at org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:120)
> 07-05 14:17:13.305: W/System.err(4339):     at com.ronEven.iCards.AddRemove.loadFile(AddRemove.java:504)
> 07-05 14:17:13.310: W/System.err(4339):     at com.ronEven.iCards.AddRemove.showDoc(AddRemove.java:495)
> 07-05 14:17:13.315: W/System.err(4339):     at com.ronEven.iCards.AddRemove.setFilePath(AddRemove.java:492)
> 07-05 14:17:13.320: W/System.err(4339):     at com.ronEven.iCards.FileDialog$1.onClick(FileDialog.java:177)
> 07-05 14:17:13.325: W/System.err(4339):     at android.view.View.performClick(View.java:3591)
> 07-05 14:17:13.330: W/System.err(4339):     at android.view.View$PerformClick.run(View.java:14263)
> 07-05 14:17:13.335: W/System.err(4339):     at android.os.Handler.handleCallback(Handler.java:605)
> 07-05 14:17:13.340: W/System.err(4339):     at android.os.Handler.dispatchMessage(Handler.java:92)
> 07-05 14:17:13.345: W/System.err(4339):     at android.os.Looper.loop(Looper.java:137)
> 07-05 14:17:13.345: W/System.err(4339):     at android.app.ActivityThread.main(ActivityThread.java:4507)
> 07-05 14:17:13.345: W/System.err(4339):     at java.lang.reflect.Method.invokeNative(Native Method)
> 07-05 14:17:13.350: W/System.err(4339):     at java.lang.reflect.Method.invoke(Method.java:511)
> 07-05 14:17:13.350: W/System.err(4339):     at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:790)
> 07-05 14:17:13.350: W/System.err(4339):     at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:557)
> 07-05 14:17:13.350: W/System.err(4339):     at dalvik.system.NativeStart.main(Native Method)
> 07-05 14:17:13.355: W/System.err(4339): Caused by: libcore.io.ErrnoException: read failed: EBADF (Bad file number)
> 07-05 14:17:13.360: W/System.err(4339):     at libcore.io.Posix.readBytes(Native Method)
> 07-05 14:17:13.360: W/System.err(4339):     at libcore.io.Posix.read(Posix.java:118)
> 07-05 14:17:13.360: W/System.err(4339):     at libcore.io.BlockGuardOs.read(BlockGuardOs.java:149)
> 07-05 14:17:13.360: W/System.err(4339):     at libcore.io.IoBridge.read(IoBridge.java:422)
> 07-05 14:17:13.365: W/System.err(4339):     ... 24 more

The last one occurs when trying to read a doc:

07-05 14:17:37.015: W/System.err(4339): java.io.IOException: read failed: EBADF (Bad file number)
07-05 14:17:37.020: W/System.err(4339):     at libcore.io.IoBridge.read(IoBridge.java:432)
07-05 14:17:37.025: W/System.err(4339):     at java.io.FileInputStream.read(FileInputStream.java:179)
07-05 14:17:37.055: W/System.err(4339):     at java.io.PushbackInputStream.read(PushbackInputStream.java:196)
07-05 14:17:37.055: W/System.err(4339):     at java.io.InputStream.read(InputStream.java:163)
07-05 14:17:37.060: W/System.err(4339):     at org.apache.poi.hwpf.HWPFDocumentCore.verifyAndBuildPOIFS(HWPFDocumentCore.java:95)
07-05 14:17:37.065: W/System.err(4339):     at org.apache.poi.hwpf.extractor.WordExtractor.<init>(WordExtractor.java:53)
07-05 14:17:37.070: W/System.err(4339):     at com.ronEven.iCards.AddRemove.loadFile(AddRemove.java:509)
07-05 14:17:37.075: W/System.err(4339):     at com.ronEven.iCards.AddRemove.showDoc(AddRemove.java:495)
07-05 14:17:37.085: W/System.err(4339):     at com.ronEven.iCards.AddRemove.setFilePath(AddRemove.java:492)
07-05 14:17:37.090: W/System.err(4339):     at com.ronEven.iCards.FileDialog$1.onClick(FileDialog.java:177)
07-05 14:17:37.095: W/System.err(4339):     at android.view.View.performClick(View.java:3591)
07-05 14:17:37.100: W/System.err(4339):     at android.view.View$PerformClick.run(View.java:14263)
07-05 14:17:37.105: W/System.err(4339):     at android.os.Handler.handleCallback(Handler.java:605)
07-05 14:17:37.110: W/System.err(4339):     at android.os.Handler.dispatchMessage(Handler.java:92)
07-05 14:17:37.115: W/System.err(4339):     at android.os.Looper.loop(Looper.java:137)
07-05 14:17:37.120: W/System.err(4339):     at android.app.ActivityThread.main(ActivityThread.java:4507)
07-05 14:17:37.120: W/System.err(4339):     at java.lang.reflect.Method.invokeNative(Native Method)
07-05 14:17:37.125: W/System.err(4339):     at java.lang.reflect.Method.invoke(Method.java:511)
07-05 14:17:37.125: W/System.err(4339):     at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:790)
07-05 14:17:37.130: W/System.err(4339):     at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:557)
07-05 14:17:37.130: W/System.err(4339):     at dalvik.system.NativeStart.main(Native Method)
07-05 14:17:37.130: W/System.err(4339): Caused by: libcore.io.ErrnoException: read failed: EBADF (Bad file number)
07-05 14:17:37.150: W/System.err(4339):     at libcore.io.Posix.readBytes(Native Method)
07-05 14:17:37.160: W/System.err(4339):     at libcore.io.Posix.read(Posix.java:118)
07-05 14:17:37.160: W/System.err(4339):     at libcore.io.BlockGuardOs.read(BlockGuardOs.java:149)
07-05 14:17:37.160: W/System.err(4339):     at libcore.io.IoBridge.read(IoBridge.java:422)
07-05 14:17:37.165: W/System.err(4339):     ... 20 more

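Best answer

Tika supports the Microsoft Office formats as well as many others. It gives you a common interface for all formats and hides the complexity of maintaining and learning how to use many different libraries. It is as simple as calling a function. You can also use OfficeParser and OOXMLParser directly.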
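To give a feel for the kind of format handling such libraries hide, here is a minimal sketch that uses only the JDK. It relies on the fact that a .docx file is a ZIP archive whose body text is stored in `word/document.xml` as WordprocessingML. The class name `DocxTextSketch` and the method `extractText` are made up for this illustration; this is not how Tika or POI implement extraction, it ignores headers, footnotes and text boxes, and it does nothing for the binary .doc format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration only; not part of POI or Tika.
public class DocxTextSketch {
    // Namespace of the WordprocessingML elements (w:p, w:t, ...).
    private static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns the body text of a .docx stream, one paragraph per line. */
    public static String extractText(InputStream docx) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!"word/document.xml".equals(entry.getName())) {
                    continue;
                }
                // Copy the entry into memory so the XML parser
                // never touches (or closes) the underlying zip stream.
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                byte[] chunk = new byte[8192];
                int n;
                while ((n = zip.read(chunk)) != -1) {
                    buf.write(chunk, 0, n);
                }
                return textOf(buf.toByteArray());
            }
        }
        throw new IOException("word/document.xml not found; not a .docx file?");
    }

    // Collects the text of every w:t element, grouped by w:p paragraph.
    private static String textOf(byte[] xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document dom = factory.newDocumentBuilder()
                              .parse(new ByteArrayInputStream(xml));
        StringBuilder out = new StringBuilder();
        NodeList paragraphs = dom.getElementsByTagNameNS(W_NS, "p");
        for (int i = 0; i < paragraphs.getLength(); i++) {
            Element paragraph = (Element) paragraphs.item(i);
            NodeList texts = paragraph.getElementsByTagNameNS(W_NS, "t");
            for (int j = 0; j < texts.getLength(); j++) {
                out.append(texts.item(j).getTextContent());
            }
            out.append('\n');
        }
        return out.toString();
    }
}
```

A call would look like `String text = DocxTextSketch.extractText(new FileInputStream(path));`. For anything beyond plain body text, or for .doc files, a library such as Tika or POI remains the practical choice.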

Regarding "java - How to read doc and docx in Java", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/11342978/
