python - Pandas SparseDataFrame from a list of dicts

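I'm trying to convert a list of Python dicts into a Pandas DataFrame.
Because each dict has different keys, the result takes up too much memory. Since most of the values are NaN, a SparseDataFrame should help in this case.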

import pandas

df = pandas.DataFrame(keyword_data).to_sparse(fill_value=.0)

This works, but it uses a lot of memory because a regular DataFrame is created along the way, and it sometimes raises a MemoryError.
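In pandas 1.0 and later, `DataFrame.to_sparse` no longer exists. A minimal sketch of the equivalent conversion, assuming a small hypothetical `keyword_data` shaped like the sample data in this question, fills missing entries with 0.0 and casts every column to `pd.SparseDtype("float64", 0.0)`. The `fillna` step matters: NaN does not compare equal to a 0.0 fill value, so without it every NaN would be stored explicitly. Like the original line, this still builds the dense frame first, so it does not remove the peak memory cost.

```python
import pandas as pd

# Hypothetical sample data in the shape described in the question:
# each dict has its own string keys and float values.
keyword_data = [
    {"a": 0.672366, "b": 0.667276},
    {"c": 0.507752, "d": 0.532593, "e": 0.507793},
]

# Dense step: keys missing from a dict become NaN in a float64 DataFrame.
dense = pd.DataFrame(keyword_data)

# Current-pandas replacement for .to_sparse(fill_value=.0):
# turn NaN into 0.0, then store each column as a SparseArray with fill value 0.0,
# so only the non-zero entries are kept.
sparse_df = dense.fillna(0.0).astype(pd.SparseDtype("float64", 0.0))

print(sparse_df.dtypes)
print(sparse_df.sparse.density)  # fraction of entries actually stored
```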

Is it possible to create a SparseDataFrame from this data without that intermediate step? The Pandas documentation isn't much help here...
Doing this:

pandas.SparseDataFrame(keyword_data, default_fill_value=.0)

raises:

TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

The data looks like this:

[{'a': 0.672366,
  'b': 0.667276,
  # ...
 },
 {'c': 0.507752,
  'd': 0.532593,
  'e': 0.507793
  # ...
 },
 # ...
]

The keys are always strings, each dict has different keys, and the values are floats.

Is there a way to create a SparseDataFrame directly from this data, without going through a regular DataFrame?

Best answer

As of Pandas v1.0.0, SparseDataFrame and SparseSeries were removed.
They are no longer needed. Quoting the documentation:

There’s no performance or memory penalty to using a Series or DataFrame with sparse values, rather than a SparseSeries or SparseDataFrame.
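Applied to the question, the target is therefore a regular DataFrame whose columns use a sparse dtype. A minimal sketch that never materializes the dense frame, assuming SciPy is installed and using a hypothetical helper `dicts_to_sparse_frame`: collect (row, column, value) triplets from the dicts, build a SciPy COO matrix, and wrap it with `pd.DataFrame.sparse.from_spmatrix`. Keys missing from a dict become implicit zeros, which matches the question's `fill_value=.0`.

```python
import numpy as np
import pandas as pd
from scipy.sparse import coo_matrix

# Hypothetical sample data in the shape described in the question.
keyword_data = [
    {"a": 0.672366, "b": 0.667276},
    {"c": 0.507752, "d": 0.532593, "e": 0.507793},
]


def dicts_to_sparse_frame(records):
    """Hypothetical helper: build a DataFrame of sparse columns with no dense step."""
    columns = {}  # key -> column position, in order of first appearance
    rows, cols, vals = [], [], []
    for i, record in enumerate(records):
        for key, value in record.items():
            # Assign a new column position the first time a key is seen.
            j = columns.setdefault(key, len(columns))
            rows.append(i)
            cols.append(j)
            vals.append(value)
    # COO triplets: only present values are stored; absent keys are implicit zeros.
    matrix = coo_matrix(
        (np.asarray(vals, dtype="float64"), (rows, cols)),
        shape=(len(records), len(columns)),
    )
    # Each column becomes a SparseArray with fill value 0.
    return pd.DataFrame.sparse.from_spmatrix(matrix, columns=list(columns))


df = dicts_to_sparse_frame(keyword_data)
print(df.dtypes)
print(df.sparse.density)
```

Column order follows the first appearance of each key, as with `pd.DataFrame(keyword_data)`, and `df.sparse.to_dense()` converts back to ordinary float64 columns if a later step needs them.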

Regarding python - Pandas SparseDataFrame from a list of dicts, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/26626964/
