python - 将 Twitter 导入 Elasticsearch 时出现 Illegal_argument_Exception

标签 python elasticsearch

我是 Elasticsearch 的新手,正在尝试通过将 Twitter 数据导入 Elasticsearch 并在其上运行 Kibana 来对 Twitter 数据进行一些数据分析。将 Twitter 数据导入 Elasticsearch 时我遇到了困难。如有任何帮助,我们将不胜感激!

这是一个产生错误的示例工作程序。

import json
from elasticsearch import Elasticsearch
es = Elasticsearch()
data = json.loads(open("data.json").read())
es.index(index='tweets5', doc_type='tweets', id=data['id'], body=data)

错误如下:

Traceback (most recent call last):
  File "elasticsearch_import_test.py", line 5, in <module>
    es.index(index='tweets5', doc_type='tweets', id=data['id'], body=data)
  File "/usr/local/lib/python2.7/site-packages/elasticsearch/client/utils.py", line 69, in _wrapped
    return func(*args, params=params, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/elasticsearch/client/__init__.py", line 279, in index
    _make_path(index, doc_type, id), params=params, body=body)
  File "/usr/local/lib/python2.7/site-packages/elasticsearch/transport.py", line 329, in perform_request
    status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
  File "/usr/local/lib/python2.7/site-packages/elasticsearch/connection/http_urllib3.py", line 109, in perform_request
    self._raise_error(response.status, raw_data)
  File "/usr/local/lib/python2.7/site-packages/elasticsearch/connection/base.py", line 108, in _raise_error
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
elasticsearch.exceptions.RequestError: TransportError(400, u'illegal_argument_exception', u'[Raza][127.0.0.1:9300][indices:data/write/index[p]]')

这是一个示例 Twitter JSON 文件 (data.json)

{
    "_id": {
        "$oid": "570597358c68d71c16b3b722"
    },
    "contributors": null,
    "coordinates": null,
    "created_at": "Wed Apr 06 23:09:41 +0000 2016",
    "entities": {
        "hashtags": [
            {
                "indices": [
                    68,
                    72
                ],
                "text": "dnd"
            },
            {
                "indices": [
                    73,
                    79
                ],
                "text": "Nat20"
            },
            {
                "indices": [
                    80,
                    93
                ],
                "text": "CriticalRole"
            },
            {
                "indices": [
                    94,
                    103
                ],
                "text": "d20babes"
            }
        ],
        "media": [
            {
                "display_url": "pic.twitter.com/YQoxEuEAXV",
                "expanded_url": "http://twitter.com/Zenttsilverwing/status/715953298076012545/photo/1",
                "id": 715953292849754112,
                "id_str": "715953292849754112",
                "indices": [
                    104,
                    127
                ],
                "media_url": "http://pbs.twimg.com/media/Ce-TugAUsAASZht.jpg",
                "media_url_https": "https://pbs.twimg.com/media/Ce-TugAUsAASZht.jpg",
                "sizes": {
                    "large": {
                        "h": 768,
                        "resize": "fit",
                        "w": 1024
                    },
                    "medium": {
                        "h": 450,
                        "resize": "fit",
                        "w": 600
                    },
                    "small": {
                        "h": 255,
                        "resize": "fit",
                        "w": 340
                    },
                    "thumb": {
                        "h": 150,
                        "resize": "crop",
                        "w": 150
                    }
                },
                "source_status_id": 715953298076012545,
                "source_status_id_str": "715953298076012545",
                "source_user_id": 2375847847,
                "source_user_id_str": "2375847847",
                "type": "photo",
                "url": "https://shortened.url/YQoxEuEAXV"
            }
        ],
        "symbols": [],
        "urls": [
            {
                "display_url": "darkcastlecollectibles.com",
                "expanded_url": "http://www.darkcastlecollectibles.com/",
                "indices": [
                    44,
                    67
                ],
                "url": "https://shortened.url/SJgFTE0o8h"
            }
        ],
        "user_mentions": [
            {
                "id": 2375847847,
                "id_str": "2375847847",
                "indices": [
                    3,
                    19
                ],
                "name": "Zack Chini",
                "screen_name": "Zenttsilverwing"
            }
        ]
    },
    "extended_entities": {
        "media": [
            {
                "display_url": "pic.twitter.com/YQoxEuEAXV",
                "expanded_url": "http://twitter.com/Zenttsilverwing/status/715953298076012545/photo/1",
                "id": 715953292849754112,
                "id_str": "715953292849754112",
                "indices": [
                    104,
                    127
                ],
                "media_url": "http://pbs.twimg.com/media/Ce-TugAUsAASZht.jpg",
                "media_url_https": "https://pbs.twimg.com/media/Ce-TugAUsAASZht.jpg",
                "sizes": {
                    "large": {
                        "h": 768,
                        "resize": "fit",
                        "w": 1024
                    },
                    "medium": {
                        "h": 450,
                        "resize": "fit",
                        "w": 600
                    },
                    "small": {
                        "h": 255,
                        "resize": "fit",
                        "w": 340
                    },
                    "thumb": {
                        "h": 150,
                        "resize": "crop",
                        "w": 150
                    }
                },
                "source_status_id": 715953298076012545,
                "source_status_id_str": "715953298076012545",
                "source_user_id": 2375847847,
                "source_user_id_str": "2375847847",
                "type": "photo",
                "url": "https://shortened.url/YQoxEuEAXV"
            },
            {
                "display_url": "pic.twitter.com/YQoxEuEAXV",
                "expanded_url": "http://twitter.com/Zenttsilverwing/status/715953298076012545/photo/1",
                "id": 715953295727009793,
                "id_str": "715953295727009793",
                "indices": [
                    104,
                    127
                ],
                "media_url": "http://pbs.twimg.com/media/Ce-TuquUIAEsVn9.jpg",
                "media_url_https": "https://pbs.twimg.com/media/Ce-TuquUIAEsVn9.jpg",
                "sizes": {
                    "large": {
                        "h": 768,
                        "resize": "fit",
                        "w": 1024
                    },
                    "medium": {
                        "h": 450,
                        "resize": "fit",
                        "w": 600
                    },
                    "small": {
                        "h": 255,
                        "resize": "fit",
                        "w": 340
                    },
                    "thumb": {
                        "h": 150,
                        "resize": "crop",
                        "w": 150
                    }
                },
                "source_status_id": 715953298076012545,
                "source_status_id_str": "715953298076012545",
                "source_user_id": 2375847847,
                "source_user_id_str": "2375847847",
                "type": "photo",
                "url": "https://shortened.url/YQoxEuEAXV"
            }
        ]
    },
    "favorite_count": 0,
    "favorited": false,
    "filter_level": "low",
    "geo": null,
    "id": 717851801417031680,
    "id_str": "717851801417031680",
    "in_reply_to_screen_name": null,
    "in_reply_to_status_id": null,
    "in_reply_to_status_id_str": null,
    "in_reply_to_user_id": null,
    "in_reply_to_user_id_str": null,
    "is_quote_status": false,
    "lang": "en",
    "place": null,
    "possibly_sensitive": false,
    "retweet_count": 0,
    "retweeted": false,
    "retweeted_status": {
        "contributors": null,
        "coordinates": null,
        "created_at": "Fri Apr 01 17:25:42 +0000 2016",
        "entities": {
            "hashtags": [
                {
                    "indices": [
                        47,
                        51
                    ],
                    "text": "dnd"
                },
                {
                    "indices": [
                        52,
                        58
                    ],
                    "text": "Nat20"
                },
                {
                    "indices": [
                        59,
                        72
                    ],
                    "text": "CriticalRole"
                },
                {
                    "indices": [
                        73,
                        82
                    ],
                    "text": "d20babes"
                }
            ],
            "media": [
                {
                    "display_url": "pic.twitter.com/YQoxEuEAXV",
                    "expanded_url": "http://twitter.com/Zenttsilverwing/status/715953298076012545/photo/1",
                    "id": 715953292849754112,
                    "id_str": "715953292849754112",
                    "indices": [
                        83,
                        106
                    ],
                    "media_url": "http://pbs.twimg.com/media/Ce-TugAUsAASZht.jpg",
                    "media_url_https": "https://pbs.twimg.com/media/Ce-TugAUsAASZht.jpg",
                    "sizes": {
                        "large": {
                            "h": 768,
                            "resize": "fit",
                            "w": 1024
                        },
                        "medium": {
                            "h": 450,
                            "resize": "fit",
                            "w": 600
                        },
                        "small": {
                            "h": 255,
                            "resize": "fit",
                            "w": 340
                        },
                        "thumb": {
                            "h": 150,
                            "resize": "crop",
                            "w": 150
                        }
                    },
                    "type": "photo",
                    "url": "https://shortened.url/YQoxEuEAXV"
                }
            ],
            "symbols": [],
            "urls": [
                {
                    "display_url": "darkcastlecollectibles.com",
                    "expanded_url": "http://www.darkcastlecollectibles.com/",
                    "indices": [
                        23,
                        46
                    ],
                    "url": "https://shortened.url/SJgFTE0o8h"
                }
            ],
            "user_mentions": []
        },
        "extended_entities": {
            "media": [
                {
                    "display_url": "pic.twitter.com/YQoxEuEAXV",
                    "expanded_url": "http://twitter.com/Zenttsilverwing/status/715953298076012545/photo/1",
                    "id": 715953292849754112,
                    "id_str": "715953292849754112",
                    "indices": [
                        83,
                        106
                    ],
                    "media_url": "http://pbs.twimg.com/media/Ce-TugAUsAASZht.jpg",
                    "media_url_https": "https://pbs.twimg.com/media/Ce-TugAUsAASZht.jpg",
                    "sizes": {
                        "large": {
                            "h": 768,
                            "resize": "fit",
                            "w": 1024
                        },
                        "medium": {
                            "h": 450,
                            "resize": "fit",
                            "w": 600
                        },
                        "small": {
                            "h": 255,
                            "resize": "fit",
                            "w": 340
                        },
                        "thumb": {
                            "h": 150,
                            "resize": "crop",
                            "w": 150
                        }
                    },
                    "type": "photo",
                    "url": "https://shortened.url/YQoxEuEAXV"
                },
                {
                    "display_url": "pic.twitter.com/YQoxEuEAXV",
                    "expanded_url": "http://twitter.com/Zenttsilverwing/status/715953298076012545/photo/1",
                    "id": 715953295727009793,
                    "id_str": "715953295727009793",
                    "indices": [
                        83,
                        106
                    ],
                    "media_url": "http://pbs.twimg.com/media/Ce-TuquUIAEsVn9.jpg",
                    "media_url_https": "https://pbs.twimg.com/media/Ce-TuquUIAEsVn9.jpg",
                    "sizes": {
                        "large": {
                            "h": 768,
                            "resize": "fit",
                            "w": 1024
                        },
                        "medium": {
                            "h": 450,
                            "resize": "fit",
                            "w": 600
                        },
                        "small": {
                            "h": 255,
                            "resize": "fit",
                            "w": 340
                        },
                        "thumb": {
                            "h": 150,
                            "resize": "crop",
                            "w": 150
                        }
                    },
                    "type": "photo",
                    "url": "https://shortened.url/YQoxEuEAXV"
                }
            ]
        },
        "favorite_count": 5,
        "favorited": false,
        "filter_level": "low",
        "geo": null,
        "id": 715953298076012545,
        "id_str": "715953298076012545",
        "in_reply_to_screen_name": null,
        "in_reply_to_status_id": null,
        "in_reply_to_status_id_str": null,
        "in_reply_to_user_id": null,
        "in_reply_to_user_id_str": null,
        "is_quote_status": false,
        "lang": "en",
        "place": null,
        "possibly_sensitive": false,
        "retweet_count": 1,
        "retweeted": false,
        "source": "<a href=\"http://twitter.com\" rel=\"nofollow\">Twitter Web Client</a>",
        "text": "coins came in!! Thanks https://shortened.url/SJgFTE0o8h #dnd #Nat20 #CriticalRole #d20babes https://shortened.url/YQoxEuEAXV",
        "truncated": false,
        "user": {
            "contributors_enabled": false,
            "created_at": "Thu Mar 06 19:59:14 +0000 2014",
            "default_profile": true,
            "default_profile_image": false,
            "description": "DM Geek Critter Con-man. I am here to like your art ^.^",
            "favourites_count": 4990,
            "follow_request_sent": null,
            "followers_count": 57,
            "following": null,
            "friends_count": 183,
            "geo_enabled": false,
            "id": 2375847847,
            "id_str": "2375847847",
            "is_translator": false,
            "lang": "en",
            "listed_count": 7,
            "location": "Flower Mound, TX",
            "name": "Zack Chini",
            "notifications": null,
            "profile_background_color": "C0DEED",
            "profile_background_image_url": "http://abs.twimg.com/images/themes/theme1/bg.png",
            "profile_background_image_url_https": "https://abs.twimg.com/images/themes/theme1/bg.png",
            "profile_background_tile": false,
            "profile_banner_url": "https://pbs.twimg.com/profile_banners/2375847847/1430928759",
            "profile_image_url": "http://pbs.twimg.com/profile_images/708816622358663168/mNF4Ysr5_normal.jpg",
            "profile_image_url_https": "https://pbs.twimg.com/profile_images/708816622358663168/mNF4Ysr5_normal.jpg",
            "profile_link_color": "0084B4",
            "profile_sidebar_border_color": "C0DEED",
            "profile_sidebar_fill_color": "DDEEF6",
            "profile_text_color": "333333",
            "profile_use_background_image": true,
            "protected": false,
            "screen_name": "Zenttsilverwing",
            "statuses_count": 551,
            "time_zone": null,
            "url": null,
            "utc_offset": null,
            "verified": false
        }
    },
    "source": "<a href=\"http://twitter.com\" rel=\"nofollow\">Twitter Web Client</a>",
    "text": "RT @Zenttsilverwing: coins came in!! Thanks https://shortened.url/SJgFTE0o8h #dnd #Nat20 #CriticalRole #d20babes https://shortened.url/YQoxEuEAXV",
    "timestamp_ms": "1459984181156",
    "truncated": false,
    "user": {
        "contributors_enabled": false,
        "created_at": "Tue Feb 10 04:31:18 +0000 2009",
        "default_profile": false,
        "default_profile_image": false,
        "description": "I use Twitter to primarily retweet Critter artwork of Critical Role and their own creations. I maintain a list of all the Critter artists I've come across.",
        "favourites_count": 17586,
        "follow_request_sent": null,
        "followers_count": 318,
        "following": null,
        "friends_count": 651,
        "geo_enabled": true,
        "id": 20491914,
        "id_str": "20491914",
        "is_translator": false,
        "lang": "en",
        "listed_count": 33,
        "location": "SanDiego, CA",
        "name": "UnknownOutrider",
        "notifications": null,
        "profile_background_color": "EDECE9",
        "profile_background_image_url": "http://abs.twimg.com/images/themes/theme3/bg.gif",
        "profile_background_image_url_https": "https://abs.twimg.com/images/themes/theme3/bg.gif",
        "profile_background_tile": false,
        "profile_image_url": "http://pbs.twimg.com/profile_images/224346493/cartoon_dragon_tattoo_designs_normal.jpg",
        "profile_image_url_https": "https://pbs.twimg.com/profile_images/224346493/cartoon_dragon_tattoo_designs_normal.jpg",
        "profile_link_color": "088253",
        "profile_sidebar_border_color": "D3D2CF",
        "profile_sidebar_fill_color": "E3E2DE",
        "profile_text_color": "634047",
        "profile_use_background_image": true,
        "protected": false,
        "screen_name": "UnknownOutrider",
        "statuses_count": 12760,
        "time_zone": "Pacific Time (US & Canada)",
        "url": null,
        "utc_offset": -25200,
        "verified": false
    }
}

最佳答案

不起作用的原因是您尝试使用名为 _id 的字段来索引文档,该字段已作为默认字段存在。因此,删除该字段或更改字段名称:

import json
from elasticsearch import Elasticsearch
es = Elasticsearch()
data = json.loads(open("data.json").read())
# data['id_'] = data['_id']    <= You can change _id as id_
del data['_id']
es.index(index='tweets5', doc_type='tweets', id=data['id'], body=data)

关于python - 将 Twitter 导入 Elasticsearch 时出现 Illegal_argument_Exception,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/37284945/

相关文章:

elasticsearch - Elasticsearch 2.0如何在Loopback和Non-Loopback接口(interface)上绑定(bind)?

hadoop - 禁用动态映射在ElasticSearch中不起作用

python - 如何在Python中绘制从最大范围到最小范围的图

python - 处理 Python 脚本中的错误

ElasticSearch,与过滤器的多重匹配?

groovy - 基于日期字段的elasticsearch自定义评分

python - 在 BeautifulSoup 中查找具有特定条件的 id

python - 字符串(到二进制)到 python 中的 int?

python:如何计算两个单词列表的余弦相似度?

amazon-web-services - 多节点 Elasticsearch 集群负载均衡器每次返回不同的排名结果