Python cannot import tika

Tags: python module apache-tika

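I'm having trouble importing tika in a Python file. I've spent a lot of time googling but couldn't find anything. Below are the IPython command, import tika, and the stack trace that follows.

I suspect that one of the modules tika depends on, such as requests or urllib3, may be broken. However, when I try to install them with pip, it says the requirement is already satisfied. I also double-checked the PYTHONHOME directory, and I'm 99% sure it is correct.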

    $ ipython
    Python 3.6.1 (v3.6.1:69c0db5, Mar 21 2017, 17:54:52) [MSC v.1900 32 bit (Intel)]
    Type "copyright", "credits" or "license" for more information.

    IPython 4.2.1 -- An enhanced Interactive Python.
    ?         -> Introduction and overview of IPython's features.
    %quickref -> Quick reference.
    help      -> Python's own help system.
    object?   -> Details about 'object', use 'object??' for extra details.
    WARNING: Readline services not available or not loaded.
    WARNING: Proper color support under MS Windows requires the pyreadline library.
    You can find it at:
    http://ipython.org/pyreadline.html

    Defaulting color scheme to 'NoColor'

    In [1]: import tika
    ---------------------------------------------------------------------------
    ImportError                               Traceback (most recent call last)
    C:\cygwin64\lib\python3.6\site-packages\requests\packages\__init__.py in <module>()
         26 try:
    ---> 27     from . import urllib3
         28 except ImportError:

    ImportError: cannot import name 'urllib3'

    During handling of the above exception, another exception occurred:

    ValueError                                Traceback (most recent call last)
    <ipython-input-1-9f3de0ba3e70> in <module>()
    ----> 1 import tika

    C:\cygwin64\lib\python3.6\site-packages\tika\tika.py in <module>()
         18
         19 try:
    ---> 20     __import__('pkg_resources').declare_namespace(__name__)
         21 except ImportError:
         22     from pkgutil import extend_path

    C:\cygwin64\lib\python3.6\site-packages\pkg_resources\__init__.py in declare_namespace(packageName)
       2161             # Ensure all the parent's path items are reflected in the child,
       2162             # if they apply
    -> 2163             _handle_ns(packageName, path_item)
       2164
       2165     finally:

    C:\cygwin64\lib\python3.6\site-packages\pkg_resources\__init__.py in _handle_ns(packageName, path_item)
       2096         path = module.__path__
       2097         path.append(subpath)
    -> 2098         loader.load_module(packageName)
       2099         _rebuild_mod_path(path, packageName, module)
       2100     return subpath

    C:\cygwin64\lib\python3.6\site-packages\tika\tika.py in <module>()
         89     open = codecs.open
         90
    ---> 91 import requests
         92 import socket
         93 import tempfile

    C:\cygwin64\lib\python3.6\site-packages\requests\__init__.py in <module>()
         50 # Attempt to enable urllib3's SNI support, if possible
         51 try:
    ---> 52     from .packages.urllib3.contrib import pyopenssl
         53     pyopenssl.inject_into_urllib3()
         54 except ImportError:

    C:\cygwin64\lib\python3.6\site-packages\requests\packages\__init__.py in <module>()
         27     from . import urllib3
         28 except ImportError:
    ---> 29     import urllib3
         30     sys.modules['%s.urllib3' % __name__] = urllib3
         31

    C:\cygwin64\lib\python3.6\site-packages\urllib3\__init__.py in <module>()
          6 import warnings
          7
    ----> 8 from .connectionpool import (
          9     HTTPConnectionPool,
         10     HTTPSConnectionPool,

    C:\cygwin64\lib\python3.6\site-packages\urllib3\connectionpool.py in <module>()
          9
         10
    ---> 11 from .exceptions import (
         12     ClosedPoolError,
         13     ProtocolError,

    C:\cygwin64\lib\python3.6\site-packages\urllib3\exceptions.py in <module>()
          1 from __future__ import absolute_import
    ----> 2 from .packages.six.moves.http_client import (
          3     IncompleteRead as httplib_IncompleteRead
          4 )
          5 # Base Exceptions

    ValueError: source code string cannot contain null bytes
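
The last frame of the trace ends in "source code string cannot contain null bytes", which means Python was handed a source file containing NUL bytes. One possible cause is a corrupted or partially written file in site-packages, which pip would still report as "already satisfied". A minimal sketch for locating such files, using only the standard library; find_files_with_null_bytes is a hypothetical helper name, not part of any package mentioned here:

```python
from pathlib import Path


def find_files_with_null_bytes(root):
    """Return the sorted paths of all .py files under root that contain a NUL byte."""
    suspicious = []
    for path in Path(root).rglob("*.py"):
        try:
            data = path.read_bytes()
        except OSError:
            # Unreadable file: skip it rather than abort the whole scan.
            continue
        if b"\x00" in data:
            suspicious.append(path)
    return sorted(suspicious)


# Example call (path taken from the traceback; adjust to your own install):
# find_files_with_null_bytes(r"C:\cygwin64\lib\python3.6\site-packages\urllib3")
```

If the scan reports files, reinstalling the affected package (for example with pip install --force-reinstall urllib3) may be worth trying, though I have not verified that this was the cause here.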

Best answer

In case anyone else runs into this, here is how I eventually solved the problem.

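I had wrongly assumed that the python-tika module was a fully packaged, ready-to-run version of tika. In fact, you need to download the Java tika server from Apache, and it must be running whenever you use python-tika (you can easily run the server on localhost).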
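
A minimal sketch of pointing python-tika at a server that is already running, assuming the server listens on localhost at port 9998 (the Tika server default). The TIKA_CLIENT_ONLY and TIKA_SERVER_ENDPOINT environment variables are how I understand the tika-python README, and they may not exist in older releases; use_running_server and extract_text are hypothetical helper names.

```python
import os


def use_running_server(endpoint="http://localhost:9998"):
    """Tell python-tika to act purely as a client of an existing server.

    Assumption: the library reads these variables when it is first imported,
    so this has to be called before any `import tika`.
    """
    os.environ["TIKA_CLIENT_ONLY"] = "True"
    os.environ["TIKA_SERVER_ENDPOINT"] = endpoint
    return endpoint


def extract_text(path):
    """Send a local file to the Tika server and return the extracted text."""
    # Imported lazily so the environment variables above are already set.
    from tika import parser

    parsed = parser.from_file(path)
    return parsed.get("content")
```

Typical use would be to call use_running_server() once at startup and then extract_text("some_document.pdf"), where the file name is only a placeholder.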

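The python-tika module then lets you make requests to that server from your Python code. I probably should have known this, but for some reason I didn't find it in the documentation.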
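
To illustrate what "making requests to that server" amounts to, here is a rough sketch that talks to the Tika server's REST interface directly with the standard library, without python-tika. It assumes the server is at http://localhost:9998 and that a PUT of the document body to /tika returns the extracted text; build_tika_request and extract_text_over_http are hypothetical names, and this is not how python-tika itself is implemented.

```python
import urllib.request


def build_tika_request(document_bytes, base_url="http://localhost:9998"):
    """Build (but do not send) a PUT request asking Tika for plain text."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/tika",
        data=document_bytes,
        method="PUT",
        headers={"Accept": "text/plain"},
    )


def extract_text_over_http(document_bytes, base_url="http://localhost:9998"):
    """Send the document to a running Tika server and return its text output."""
    request = build_tika_request(document_bytes, base_url)
    with urllib.request.urlopen(request, timeout=60) as response:
        return response.read().decode("utf-8", errors="replace")
```

If the server is not running, extract_text_over_http fails with a connection error, which is a quick way to check whether the server is up.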

Regarding Python being unable to import tika, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/44597445/
