python-2.7 - mapper_parsing_exception error when indexing a PDF in Elasticsearch

Tags: python-2.7 elasticsearch

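I am trying to index a PDF with Elasticsearch 2.3.4 and Python. I want to extract the text and metadata from the PDF into the index, and I am using the mapper_attachment plugin.

When I try to index the document, I get a "mapper_parsing_exception" error. Here is my code: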

import json
from elasticsearch import Elasticsearch

# Configuration

DIR = 'D:/QA_Testing/testing/data'
ES_HOST = {"host" : "localhost", "port" : 9200}
INDEX_NAME = 'testing'
TYPE_NAME = 'documents'
URL = "D:/xyz.pdf"

es = Elasticsearch(hosts = [ES_HOST])

mapping = {
  "mappings": {
    "documents": {
      "properties": {
        "cv": { "type": "attachment" }
}}}}

file64 = open(URL, "rb").read().encode("base64")
data_dict = {'cv': file64}
data_dict = json.dumps(data_dict)

res = es.indices.create(index = INDEX_NAME, body = mapping)

es.index(index = INDEX_NAME, body = data_dict ,doc_type = "attachment", id=1)

Error:
Traceback (most recent call last):
  File "C:/Users/537095/Desktop/QA/IndexingWorkspace/MainWorkspace/index3.py", line 51, in <module>
    es.index(index = INDEX_NAME, body = data_dict ,doc_type = "attachment", id=1)
  File "C:\Python27\lib\site-packages\elasticsearch\client\utils.py", line 69, in _wrapped
    return func(*args, params=params, **kwargs)
  File "C:\Python27\lib\site-packages\elasticsearch\client\__init__.py", line 261, in index
    _make_path(index, doc_type, id), params=params, body=body)
  File "C:\Python27\lib\site-packages\elasticsearch\transport.py", line 329, in perform_request
    status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
  File "C:\Python27\lib\site-packages\elasticsearch\connection\http_urllib3.py", line 106, in perform_request
    self._raise_error(response.status, raw_data)
  File "C:\Python27\lib\site-packages\elasticsearch\connection\base.py", line 105, in _raise_error
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
RequestError: TransportError(400, u'mapper_parsing_exception', u'failed to parse')

Am I doing something wrong?

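Best answer

You need to change doc_type: it should be documents, not attachment. The mapping declares the cv attachment field under the documents type, so the document has to be indexed into that same type.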

es.index(index = INDEX_NAME, body = data_dict ,doc_type = "documents", id=1)
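To make the mismatch harder to reintroduce, one option is to take the type name from a single constant and reuse it in both the mapping and the index call. The following is only a sketch under assumptions: the helper names build_attachment_body and index_pdf are hypothetical and not part of the original code, and es is assumed to be an Elasticsearch client from the 2.x Python package, whose index call accepts doc_type. It also uses base64.b64encode rather than .encode("base64"), since the latter inserts line breaks into the output and only exists in Python 2.

```python
import base64
import json

INDEX_NAME = "testing"
TYPE_NAME = "documents"  # single source of truth for the mapping type

# The mapping key and the doc_type used at index time both come from TYPE_NAME.
mapping = {
    "mappings": {
        TYPE_NAME: {
            "properties": {
                "cv": {"type": "attachment"}
            }
        }
    }
}


def build_attachment_body(raw_bytes, field="cv"):
    """Hypothetical helper: return a JSON body with the file bytes
    base64-encoded under `field`."""
    encoded = base64.b64encode(raw_bytes).decode("ascii")
    return json.dumps({field: encoded})


def index_pdf(es, path, doc_id=1):
    """Hypothetical helper: read a PDF from `path` and index it into the
    mapped type. `es` is assumed to be an elasticsearch-py 2.x client."""
    with open(path, "rb") as handle:
        body = build_attachment_body(handle.read())
    return es.index(index=INDEX_NAME, doc_type=TYPE_NAME, body=body, id=doc_id)
```

With this in place, you would call es.indices.create(index=INDEX_NAME, body=mapping) once and then index_pdf(es, URL), so the type name can no longer drift between the two calls.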

Source: this question is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/39116433/
