python - URL request that simulates curl --data-binary

I want to send a URL request whose POST data consists of JSON objects separated by newlines. The goal is to bulk index two items in Elasticsearch.

This works fine:

curl -XPOST 'localhost:9200/myindex/mydoc?pretty=true' --data-binary @myfile.json

where myfile.json contains:

{"index": {"_parent": "btaCovzjQhqrP4s3iPjZKQ"}}    
{"title": "hello"}
{"index": {"_parent": "btaCovzjQhqrP4s3iPjZKQ"}}
{"title": "world"}

When I try the following:

req = urllib2.Request(url, data=
    json.dumps({"index": {"_parent": "btaCovzjQhqrP4s3iPjZKQ"}}) + "\n" +
    json.dumps({"title": "hello"}) + "\n" +
    json.dumps({"index": {"_parent": "btaCovzjQhqrP4s3iPjZKQ"}}) + "\n" +
    json.dumps({"title": "world"}))

I get:

HTTP Error 500: Internal Server Error

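Best answer

The "HTTP Error 500" is probably caused by leaving out the index name or the index type. The URL in the code below includes both, along with the _bulk endpoint.

Also: for bulk inserts, Elasticsearch requires a trailing "\n" character after the last record; otherwise that record is not inserted.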
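The trailing-newline rule can be enforced before the request is sent. Here is a minimal sketch, assuming the payload lives in a file such as the question's myfile.json. The helper name load_bulk_payload is hypothetical and not part of the original answer. The file is read in binary mode so the bytes are sent unmodified, which is what curl's --data-binary does.

```python
def load_bulk_payload(path):
    """Read a bulk file as raw bytes and guarantee a trailing newline."""
    # Binary mode keeps the newlines between the JSON lines exactly as stored.
    with open(path, "rb") as handle:
        payload = handle.read()
    # Append the final newline only when the file does not already end with one.
    if not payload.endswith(b"\n"):
        payload += b"\n"
    return payload
```

The returned bytes could be passed as the data argument of the request in place of a hand-built string.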

Try:

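import urllib2
import json

# The _bulk endpoint is addressed under the index name and the index type.
url = 'http://localhost:9200/myindex/mydoc/_bulk?pretty=true'

# Action lines and document lines alternate, one JSON object per line.
data = (
    json.dumps({"index": {"_parent": "btaCovzjQhqrP4s3iPjZKQ"}}) + "\n" +
    json.dumps({"title": "hello"}) + "\n" +
    json.dumps({"index": {"_parent": "btaCovzjQhqrP4s3iPjZKQ"}}) + "\n" +
    json.dumps({"title": "world"})
)

# The extra "\n" terminates the last record.
req = urllib2.Request(url, data=data + "\n")

f = urllib2.urlopen(req)
print f.read()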

Or, with some refactoring:

import urllib2
import json

url = 'http://localhost:9200/myindex/mydoc/_bulk?pretty=true'

data = [
    {"index": {"_parent": "btaCovzjQhqrP4s3iPjZKQ"}},
    {"title":"hello"},
    {"index": {"_parent": "btaCovzjQhqrP4s3iPjZKQ"}},
    {"title":"world"}
]

# Serialize each object onto its own line and end the body with a newline.
encoded_data = "\n".join(map(json.dumps, data)) + "\n"

req = urllib2.Request(url, data=encoded_data)

f = urllib2.urlopen(req)
print f.read()
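urllib2 exists only in Python 2. On Python 3 the same request could be built roughly as follows. This is a sketch rather than part of the original answer. The function names build_bulk_request and send are hypothetical. The Content-Type header is an assumption: newer Elasticsearch releases check the content type of request bodies and accept application/x-ndjson for bulk data, while the version used in the question may not need it.

```python
import json
import urllib.request


def build_bulk_request(url, records):
    """Return a POST request whose body is newline-delimited JSON."""
    # One JSON document per line, plus the trailing newline the bulk API expects.
    body = "\n".join(json.dumps(record) for record in records) + "\n"
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),  # Python 3 requires bytes for the request body
        # Assumption: only newer Elasticsearch versions require this header.
        headers={"Content-Type": "application/x-ndjson"},
    )


def send(request):
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Calling send(build_bulk_request(url, data)) with the url and data values defined above would correspond to the urllib2 version. Supplying a data argument is what makes the request a POST.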

This question and answer come from Stack Overflow: https://stackoverflow.com/questions/15551968/
