Suppose I have the following list of dictionaries:
dict_ = [
    {'key1': 'value1',
     'key2': 'value2',
     'key3': 'value3',
     'key4': 'value4',
     'key5': 'value5',
     'nested_dicts': [
         {'nested_key1': 'nested_val_1',
          'nested_key2': 'nested_val_2',
          'nested_key3': 'nested_val_3',
          'nested_key4': 'nested_val_4'
         },
         {'nested_key1': 'nested_val_5',
          'nested_key2': 'nested_val_6',
          'nested_key3': 'nested_val_7',
          'nested_key4': 'nested_val_8'
         }
     ]},
    {
     'key1': 'value6',
     'key2': 'value7',
     'key3': 'value8',
     'key4': 'value9',
     'key5': 'value10',
     'nested_dicts': [
         {'nested_key1': 'nested_val_9',
          'nested_key2': 'nested_val_10',
          'nested_key3': 'nested_val_11',
          'nested_key4': 'nested_val_12'
         },
         {'nested_key1': 'nested_val_13',
          'nested_key2': 'nested_val_14',
          'nested_key3': 'nested_val_15',
          'nested_key4': 'nested_val_16'
         }
     ]}
]
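I need to group by the values of key1 to key5, so that the grouped-by columns appear only once per group and all other columns are shown as they are. The expected output looks like this: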
key1    key2    key3    key4    key5     nested_key1    nested_key2    nested_key3    nested_key4
value1  value2  value3  value4  value5   nested_val_1   nested_val_2   nested_val_3   nested_val_4
                                         nested_val_5   nested_val_6   nested_val_7   nested_val_8
value6  value7  value8  value9  value10  nested_val_9   nested_val_10  nested_val_11  nested_val_12
                                         nested_val_13  nested_val_14  nested_val_15  nested_val_16
Any solution using groupby, a MultiIndex, or other pandas functions is acceptable.
Best answer
Here is one possible solution:
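from pandas import json_normalize

# Flatten: one row per entry in 'nested_dicts', with key1-key5 copied onto each row
df = json_normalize(
    dict_,
    record_path=['nested_dicts'],
    meta=['key1', 'key2', 'key3', 'key4', 'key5'],
)
# Move key1-key5 into a MultiIndex; pandas prints repeated index labels only once
df = df.set_index(['key1', 'key2', 'key3', 'key4', 'key5'])
print(df)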
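Here I used the json_normalize function to flatten the dictionaries: each entry of nested_dicts becomes one row, and the meta argument copies key1 to key5 onto every row. I then set key1 to key5 as the index. The result is a MultiIndex, and because pandas displays repeated MultiIndex labels only once by default, each group's key values appear a single time, as in the expected output.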
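If a flat table with ordinary columns is preferred over a MultiIndex, one alternative is to keep key1 to key5 as regular columns and blank out the repeated key values for display. This is a minimal sketch, assuming the data has the same shape as dict_ above (with the second record's nested values numbered 13 to 16, as in the expected output); the names keys, flat and dup are illustrative only. Note that it only changes how the table looks: the blanked cells become empty strings, so apply it after any further computation on the data.

```python
import pandas as pd

# Same shape as dict_ in the question: outer keys key1-key5 plus a
# 'nested_dicts' list whose entries become separate rows.
dict_ = [
    {'key1': 'value1', 'key2': 'value2', 'key3': 'value3',
     'key4': 'value4', 'key5': 'value5',
     'nested_dicts': [
         {'nested_key1': 'nested_val_1', 'nested_key2': 'nested_val_2',
          'nested_key3': 'nested_val_3', 'nested_key4': 'nested_val_4'},
         {'nested_key1': 'nested_val_5', 'nested_key2': 'nested_val_6',
          'nested_key3': 'nested_val_7', 'nested_key4': 'nested_val_8'},
     ]},
    {'key1': 'value6', 'key2': 'value7', 'key3': 'value8',
     'key4': 'value9', 'key5': 'value10',
     'nested_dicts': [
         {'nested_key1': 'nested_val_9', 'nested_key2': 'nested_val_10',
          'nested_key3': 'nested_val_11', 'nested_key4': 'nested_val_12'},
         {'nested_key1': 'nested_val_13', 'nested_key2': 'nested_val_14',
          'nested_key3': 'nested_val_15', 'nested_key4': 'nested_val_16'},
     ]},
]

keys = ['key1', 'key2', 'key3', 'key4', 'key5']
flat = pd.json_normalize(dict_, record_path=['nested_dicts'], meta=keys)

# json_normalize appends the meta columns at the end; move them to the front
# so the column order matches the expected output.
flat = flat[keys + [c for c in flat.columns if c not in keys]]

# Mark every row whose key1-key5 combination already appeared earlier,
# then blank those key cells so each group's keys are shown only once.
dup = flat.duplicated(subset=keys)
flat.loc[dup, keys] = ''
print(flat.to_string(index=False))
```

Compared with the MultiIndex answer, this keeps a plain RangeIndex, which can be easier to export (for example to CSV), at the cost of losing the key values in the blanked rows.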
Regarding "python : group by columns with columns values that are grouped by occurs only once and retain all other columns", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/72505245/