python - Create a DataFrame from Elasticsearch results in Python

Tags: python pandas dataframe elasticsearch

I have query results from Elasticsearch in the following format:

[

{
    "_index": "product",
    "_type": "_doc",
    "_id": "23234sdf",
    "_score": 2.2295187,
    "_source": {
        "SERP_KEY": "",
        "r_variant_info": "",
        "s_asin": "",
        "pid": "394",
        "r_gtin": "00838128000547",        
        "additional_attributes_remarks": "publisher:0|size:0",            
        "s_gtin": "",            
        "r_category": "",
        "confidence_score": "2.4545",      
        "title_match": "45.45"
    }
},
{
    "_index": "product",
    "_type": "_doc",
    "_id": "23234sdf",
    "_score": 2.2295187,
    "_source": {
        "SERP_KEY": "",
        "r_variant_info": "",
        "s_asin": "",
        "pid": "394",
        "r_gtin": "00838128000547",        
        "additional_attributes_remarks": "publisher:0|size:0",            
        "s_gtin": "",            
        "r_category": "",
        "confidence_score": "2.4545",      
        "title_match": "45.45"
    }
},

]

I am trying to load the _source fields, together with _id, into a DataFrame.

I tried this:
def fetch_records_from_elasticsearch_index(index, filter_json):
    search_param = prepare_es_body(filter_json_dict=filter_json)
    response = settings.ES.search(index=index, body=search_param, size=10)

    if len(response['hits']['hits']) > 0:
        import pandas as pd

        all_hits = response['hits']['hits']
        # return all_hits
        # export es hits to pandas dataframe
        df = pd.concat(map(pd.DataFrame.from_dict, all_hits), axis=1)['_source'].T

        return df
    else:
        return 0
df contains only the _source fields, but I also want to add the _id field to it.

This is the df output format:
{

"AdminEdit": [
    "False",
    "False",
    "False",
    "False",        
],
"Group": [
    "Grp2",
    "Grp2",
    "Grp2",
    "Grp2"       
],

}

How can I add _id?

Best answer

There are two ways to solve this:

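  • Direct approach
    import pandas as pd
    # json_normalize flattens nested keys, so _source fields become columns such as "_source.pid"
    df = pd.json_normalize(all_hits)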
    
  • Improving the original code
    import pandas as pd
    # Build the frame from _source as before, then attach each hit's _id in the same order
    df = pd.concat(map(pd.DataFrame.from_dict, all_hits), axis=1)['_source'].T
    df["_id"] = [i["_id"] for i in all_hits]
    

  • The JSON used:
    all_hits = [
    
    {
        "_index": "product",
        "_type": "_doc",
        "_id": "23234sdg",
        "_score": 2.2295187,
        "_source": {
            "SERP_KEY": "",
            "r_variant_info": "",
            "s_asin": "",
            "pid": "394",
            "r_gtin": "00838128000547",        
            "additional_attributes_remarks": "publisher:0|size:0",            
            "s_gtin": "",            
            "r_category": "",
            "confidence_score": "2.4545",      
            "title_match": "45.45"
        }
    },
    {
        "_index": "product",
        "_type": "_doc",
        "_id": "23234sdf",
        "_score": 2.2295187,
        "_source": {
            "SERP_KEY": "",
            "r_variant_info": "",
            "s_asin": "",
            "pid": "394",
            "r_gtin": "00838128000547",        
            "additional_attributes_remarks": "publisher:0|size:0",            
            "s_gtin": "",            
            "r_category": "",
            "confidence_score": "2.4545",      
            "title_match": "45.45"
        }
    },
    
    ]
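
As a rough sketch of how the two ideas above could be tidied up: the direct approach leaves extra columns (_index, _score, ...) and prefixes every _source field with "_source.", so one option is to keep only _id plus the _source columns and strip the prefix. Another option, not part of the original answer, is to merge _id into each _source dict before building the frame. The sample hits below are trimmed to three _source fields for brevity, and the helper name strip_source_prefix is made up for this illustration.

```python
import pandas as pd

# Two sample hits shaped like the Elasticsearch response above
# (only three _source fields kept, to keep the sketch short).
all_hits = [
    {"_index": "product", "_type": "_doc", "_id": "23234sdg", "_score": 2.2295187,
     "_source": {"pid": "394", "r_gtin": "00838128000547", "title_match": "45.45"}},
    {"_index": "product", "_type": "_doc", "_id": "23234sdf", "_score": 2.2295187,
     "_source": {"pid": "394", "r_gtin": "00838128000547", "title_match": "45.45"}},
]


def strip_source_prefix(column):
    """Hypothetical helper: turn '_source.pid' into 'pid', leave other names alone."""
    prefix = "_source."
    return column[len(prefix):] if column.startswith(prefix) else column


# Option A: flatten everything, keep _id plus the _source.* columns, drop the prefix.
flat = pd.json_normalize(all_hits)
keep = ["_id"] + [c for c in flat.columns if c.startswith("_source.")]
df_a = flat[keep].rename(columns=strip_source_prefix)

# Option B: merge _id into each _source dict and build the frame in one step.
df_b = pd.DataFrame([{"_id": hit["_id"], **hit["_source"]} for hit in all_hits])

print(df_a)
print(df_b)
```

Both variants should give one row per hit with _id as the first column. Note that the values stay strings here, so fields such as title_match would still need pd.to_numeric if numbers are wanted.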
    

    This Q&A is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/62018576/
