python - How to flatten a column of nested JSON objects in an already-flattened DataFrame

Tags: python json pandas dictionary json-normalize

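I have a JSON file with nested objects that has been flattened into a pandas DataFrame. One column still contains nested JSON objects, and I'm finding it hard to flatten.

I've tried many approaches; this is the one that got me the furthest.

Any help is much appreciated, thanks.

Unfortunately, I couldn't find a jsfiddle-like alternative for Python to provide a working example.

I know that json_normalize's meta parameter lets me add columns to my DataFrame. But that approach doesn't work for the column that isn't flattened, because I can only get json_normalize to work in my setup by setting record_path to "markets" (the main JSON object in my file). So in this setup I can't point record_path at "marketStats" and add any related columns via the meta parameter.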

Goal

The goal is to turn one or all of the JSON objects inside marketStats into columns of the DataFrame.

Code

import json
import pandas as pd

# Load the raw JSON file
with open('Data/20012022.json') as file:
    data = json.load(file)

# Flatten data: one row per entry in "markets"
df0 = pd.json_normalize(
    data,
    record_path=['markets']
)

df0.head(3)

Screenshot

This is a screenshot of how the table currently looks; the marketStats column contains the nested JSON.


Data

Here is a snippet from the JSON file.

{
  "markets": [
    {
      "id": 335,
      "baseCurrency": "eth",
      "quoteCurrency": "btc",
      "exchangeName": "Binance",
      "exchangeCode": "BINA",
      "longName": "BTC-ETH",
      "marketName": "btc-eth",
      "symbol": "ETHBTC",
      "volume": "40624.5823",
      "quoteVolume": "3026.13646935",
      "btcVolume": "3026.13646935",
      "usdVolume": "127009429.050524367",
      "currentPrice": 0.074681,
      "latestBase": {
        "id": 161774475,
        "time": 1639576800,
        "date": "2021-12-15T14:00:00.000+00:00",
        "price": "0.077653",
        "lowestPrice": "0.0729",
        "bounce": "6.283",
        "currentDrop": "-3.8272829124438206",
        "crackedAt": "2022-01-07T03:00:00.000Z",
        "respectedAt": "2022-01-15T15:00:00.000Z",
        "isLowest": false
      },
      "marketStats": [
        {
          "algorithm": "original",
          "ratio": "50.0",
          "medianDrop": "-4.08",
          "medianBounce": "5.51",
          "hoursToRespected": 106,
          "crackedCount": 2,
          "respectedCount": 1
        },
        {
          "algorithm": "day_trade",
          "ratio": "100.0",
          "medianDrop": "-6.12",
          "medianBounce": "6.28",
          "hoursToRespected": 204,
          "crackedCount": 1,
          "respectedCount": 1
        },
        {
          "algorithm": "conservative",
          "ratio": "100.0",
          "medianDrop": "-6.12",
          "medianBounce": "8.38",
          "hoursToRespected": 204,
          "crackedCount": 1,
          "respectedCount": 1
        },
        {
          "algorithm": "position",
          "ratio": "50.0",
          "medianDrop": "-6.12",
          "medianBounce": "6.19",
          "hoursToRespected": 204,
          "crackedCount": 2,
          "respectedCount": 1
        },
        {
          "algorithm": "hodloo",
          "ratio": "50.0",
          "medianDrop": "-3.29",
          "medianBounce": "0.0",
          "hoursToRespected": 225,
          "crackedCount": 4,
          "respectedCount": 2
        }
      ]
    },
    {
      "id": 337,
      "baseCurrency": "ltc",
      "quoteCurrency": "btc",
      "exchangeName": "Binance",
      "exchangeCode": "BINA",
      "longName": "BTC-LTC",
      "marketName": "btc-ltc",
      "symbol": "LTCBTC",
      "volume": "68309.637",
      "quoteVolume": "223.79294524",
      "btcVolume": "223.79294524",
      "usdVolume": "9392773.4219378968",
      "currentPrice": 0.003275,
      "latestBase": {
        "id": 163982984,
        "time": 1642374000,
        "date": "2022-01-16T23:00:00.000+00:00",
        "price": "0.003346",
        "lowestPrice": "0.00322",
        "bounce": "3.839",
        "currentDrop": "-2.1219366407650926",
        "crackedAt": "2022-01-18T23:00:00.000Z",
        "respectedAt": null,
        "isLowest": false
      },
      "marketStats": [
        {
          "algorithm": "original",
          "ratio": "57.14",
          "medianDrop": "-3.28",
          "medianBounce": "3.84",
          "hoursToRespected": 186,
          "crackedCount": 7,
          "respectedCount": 4
        },
        {
          "algorithm": "day_trade",
          "ratio": "0.0",
          "medianDrop": "0.0",
          "medianBounce": "5.68",
          "hoursToRespected": 0,
          "crackedCount": 1,
          "respectedCount": 0
        },
        {
          "algorithm": "conservative",
          "ratio": "0.0",
          "medianDrop": "0.0",
          "medianBounce": "5.68",
          "hoursToRespected": 0,
          "crackedCount": 1,
          "respectedCount": 0
        },
        {
          "algorithm": "position",
          "ratio": "0.0",
          "medianDrop": "0.0",
          "medianBounce": "8.16",
          "hoursToRespected": 0,
          "crackedCount": 1,
          "respectedCount": 0
        },
        {
          "algorithm": "hodloo",
          "ratio": "75.0",
          "medianDrop": "-3.7",
          "medianBounce": "0.0",
          "hoursToRespected": 35,
          "crackedCount": 4,
          "respectedCount": 3
        }
      ]
    },
    {
      "id": 339,
      "baseCurrency": "bnb",
      "quoteCurrency": "btc",
      "exchangeName": "Binance",
      "exchangeCode": "BINA",
      "longName": "BTC-BNB",
      "marketName": "btc-bnb",
      "symbol": "BNBBTC",
      "volume": "154576.177",
      "quoteVolume": "1724.66664804",
      "btcVolume": "1724.66664804",
      "usdVolume": "72385673.4448901928",
      "currentPrice": 0.01099,
      "latestBase": {
        "id": 163753765,
        "time": 1642068000,
        "date": "2022-01-13T10:00:00.000+00:00",
        "price": "0.01093",
        "lowestPrice": "0.01093",
        "bounce": "3.102",
        "currentDrop": "0.5489478499542543",
        "crackedAt": null,
        "respectedAt": null,
        "isLowest": false
      },
      "marketStats": [
        {
          "algorithm": "original",
          "ratio": "100.0",
          "medianDrop": "-7.18",
          "medianBounce": "4.34",
          "hoursToRespected": 62,
          "crackedCount": 2,
          "respectedCount": 2
        },
        {
          "algorithm": "day_trade",
          "ratio": "100.0",
          "medianDrop": "-6.19",
          "medianBounce": "4.3",
          "hoursToRespected": 63,
          "crackedCount": 1,
          "respectedCount": 1
        },
        {
          "algorithm": "conservative",
          "ratio": "66.67",
          "medianDrop": "-3.15",
          "medianBounce": "4.05",
          "hoursToRespected": 62,
          "crackedCount": 3,
          "respectedCount": 2
        },
        {
          "algorithm": "position",
          "ratio": "100.0",
          "medianDrop": "-3.15",
          "medianBounce": "4.46",
          "hoursToRespected": 60,
          "crackedCount": 2,
          "respectedCount": 2
        },
        {
          "algorithm": "hodloo",
          "ratio": "100.0",
          "medianDrop": "-7.46",
          "medianBounce": "0.0",
          "hoursToRespected": 62,
          "crackedCount": 5,
          "respectedCount": 5
        }
      ]
    }
  ]
}

Best answer

You can apply some post-processing to df0 to get what you want. Here, you can apply explode and then apply(pd.Series) to the 'marketStats' column:

df1 = df0.explode('marketStats')['marketStats'].apply(pd.Series)

df1 looks like this:

    algorithm       ratio    medianDrop    medianBounce    hoursToRespected    crackedCount    respectedCount
--  ------------  -------  ------------  --------------  ------------------  --------------  ----------------
 0  original        50            -4.08            5.51                 106               2                 1
 0  day_trade      100            -6.12            6.28                 204               1                 1
 0  conservative   100            -6.12            8.38                 204               1                 1
 0  position        50            -6.12            6.19                 204               2                 1
 0  hodloo          50            -3.29            0                    225               4                 2
 1  original        57.14         -3.28            3.84                 186               7                 4
 1  day_trade        0             0               5.68                   0               1                 0
 1  conservative     0             0               5.68                   0               1                 0
 1  position         0             0               8.16                   0               1                 0
 1  hodloo          75            -3.7             0                     35               4                 3
 2  original       100            -7.18            4.34                  62               2                 2
 2  day_trade      100            -6.19            4.3                   63               1                 1
 2  conservative    66.67         -3.15            4.05                  62               3                 2
 2  position       100            -3.15            4.46                  60               2                 2
 2  hodloo         100            -7.46            0                     62               5                 5
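
The two steps can be tried without the original file. The following is a minimal sketch on a trimmed-down stand-in for the question's data (two markets, two stats each, only a few keys kept); the intermediate name `exploded` and the final `pd.to_numeric` conversion are illustrative additions, not part of the answer above.

```python
import pandas as pd

# Trimmed-down stand-in for the question's JSON (assumption: same shape, fewer keys)
data = {
    "markets": [
        {"id": 335, "symbol": "ETHBTC", "marketStats": [
            {"algorithm": "original", "ratio": "50.0", "crackedCount": 2},
            {"algorithm": "day_trade", "ratio": "100.0", "crackedCount": 1},
        ]},
        {"id": 337, "symbol": "LTCBTC", "marketStats": [
            {"algorithm": "original", "ratio": "57.14", "crackedCount": 7},
            {"algorithm": "day_trade", "ratio": "0.0", "crackedCount": 1},
        ]},
    ]
}

# Same flattening as in the question: marketStats stays a list of dicts
df0 = pd.json_normalize(data, record_path=["markets"])

# explode() gives every dict in the list its own row and repeats the index label
exploded = df0.explode("marketStats")

# apply(pd.Series) expands each dict into one row of columns
df1 = exploded["marketStats"].apply(pd.Series)

# Optional: the JSON stores ratio as a string, so convert it if numbers are needed
df1["ratio"] = pd.to_numeric(df1["ratio"])

print(df1)
```

The repeated index labels (0, 0, 1, 1 here) are what later let df1 line up with the rows of df0.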

If you want to combine it with all the other columns, you can use join:

df0.join(df1)

I won't post the output of this command because it is fairly large.
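
As a possible alternative to the explode approach, json_normalize can apparently reach marketStats directly if it is given the list under "markets" rather than the whole document. The sketch below assumes every market has a marketStats list and uses a reduced, made-up subset of the question's keys; the variable name `stats` is hypothetical.

```python
import pandas as pd

# Reduced stand-in for the question's JSON (assumption: same shape, fewer keys)
data = {
    "markets": [
        {"id": 335, "symbol": "ETHBTC",
         "latestBase": {"price": "0.077653"},
         "marketStats": [
             {"algorithm": "original", "ratio": "50.0"},
             {"algorithm": "day_trade", "ratio": "100.0"},
         ]},
        {"id": 337, "symbol": "LTCBTC",
         "latestBase": {"price": "0.003346"},
         "marketStats": [
             {"algorithm": "original", "ratio": "57.14"},
             {"algorithm": "day_trade", "ratio": "0.0"},
         ]},
    ]
}

# Pass the list of markets, use marketStats as the record path,
# and pull parent fields in through meta (a nested field is given as a path list)
stats = pd.json_normalize(
    data["markets"],
    record_path="marketStats",
    meta=["id", "symbol", ["latestBase", "price"]],
)

print(stats)
```

This yields one row per marketStats entry with the parent market's fields repeated alongside, so no separate join is needed. A meta name that also exists inside the records would raise a conflict error unless a prefix is supplied, which is why the nested field is addressed by its path here.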

This question, "python - How to flatten a column of nested JSON objects in an already-flattened DataFrame", comes from a similar question found on Stack Overflow: https://stackoverflow.com/questions/70826674/
