Python - Pandas - How to remove null values from to_json after merging DataFrames

Tags: python json csv null output

I am building a process to "outer join" two csv files and export the result as a json object.

import pandas

# read the source csv files (they have no header row, so name the columns here)
firstcsv = pandas.read_csv('file1.csv', names=['main_index', 'attr_one', 'attr_two'])
secondcsv = pandas.read_csv('file2.csv', names=['main_index', 'attr_three', 'attr_four'])

# merge them, keeping rows that appear in only one of the files
output = firstcsv.merge(secondcsv, on='main_index', how='outer')

# serialize as a json array with one object per row
jsonresult = output.to_json(orient='records')
print(jsonresult)

The two csv files look like this:

file1.csv:
1, aurelion, sol
2, lee, sin
3, cute, teemo

file2.csv:
1, midlane, mage
2, jungler, melee

I would like the resulting json to be:

[{"main_index":1,"attr_one":"aurelion","attr_two":"sol","attr_three":"midlane","attr_four":"mage"},
{"main_index":2,"attr_one":"lee","attr_two":"sin","attr_three":"jungler","attr_four":"melee"},
{"main_index":3,"attr_one":"cute","attr_two":"teemo"}]

Instead, for the row with main_index = 3 I get:

{"main_index":3,"attr_one":"cute","attr_two":"teemo","attr_three":null,"attr_four":null}]

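So null values are added to the output automatically. I would like to remove them. I looked around but could not find a proper way to do it.

Hope someone can help me!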

Best answer

Because we are working with a DataFrame, pandas fills the cells that have no match in the merge with NaN, i.e.

>>> print(output)
      main_index   attr_one attr_two attr_three attr_four
0           1   aurelion      sol    midlane      mage
1           2        lee      sin    jungler     melee
2           3       cute    teemo        NaN       NaN

I could not see any option in the DataFrame.to_json documentation to skip null values: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_json.html
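To see the behaviour in isolation, here is a minimal sketch. The one-row frame is a made-up stand-in for the unmatched row of the merge, not data from the question; it only shows that to_json writes NaN as null.

```python
import pandas

# Hypothetical one-row frame standing in for the unmatched row (main_index = 3).
frame = pandas.DataFrame({'main_index': [3], 'attr_three': [float('nan')]})

# NaN has no JSON representation, so to_json emits null for it.
as_json = frame.to_json(orient='records')
print(as_json)
```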

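So the approach I came up with is to rebuild the JSON string. It parses the whole output a second time, so it may not perform well on large datasets with millions of rows (but there are fewer than 200 champions in League, so it should not be a big problem!)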

from collections import OrderedDict
import json

import pandas

jsonresult = output.to_json(orient='records')
# read the json string to get a list of dictionaries
rows = json.loads(jsonresult)

# simpler variant, which does not explicitly follow the column order of output:
# new_rows = [
#     # rebuild the dictionary for each row, only including non-null values
#     {key: val for key, val in row.items() if pandas.notnull(val)}
#     for row in rows
# ]

# to maintain the column order, use OrderedDict
new_rows = [
    OrderedDict([
        (key, row[key]) for key in output.columns
        if (key in row) and pandas.notnull(row[key])
    ])
    for row in rows
]

new_json_output = json.dumps(new_rows)

You will find that new_json_output has dropped every key with a NaN value and kept the column order:

>>> print(new_json_output)
[{"main_index": 1, "attr_one": " aurelion", "attr_two": " sol", "attr_three": " midlane", "attr_four": " mage"},
 {"main_index": 2, "attr_one": " lee", "attr_two": " sin", "attr_three": " jungler", "attr_four": " melee"},
 {"main_index": 3, "attr_one": " cute", "attr_two": " teemo"}]

Source: Stack Overflow question "Python - Pandas - How to remove null values from to_json after merging DataFrames": https://stackoverflow.com/questions/46187654/
