I have scraped some data. Because of the way the website is structured, I put the data into two dictionaries.
>>> pprint(dict(data))
{u'Additional compensation': [u'$32,241'],
 u'Agency': [u'Chesterfield County Schools', u'City of Richmond Schools'],
 u'Bonuses or other allowances': [u'$12,500'],
 u'COMMENTS': [u'$28,088 - Board Paid Annuity; $4,153 - Excess Health Benefit Contribution;',
               u''],
 u'Full Name': [u'Marcus J. Newsome', u'Dana T. Bedden'],
 u'Total Compensation': [u'$282,258', u'']}
>>> pprint(dict(data2))
{u'Base Salary': [u'$229,758', u'$234,068'],
 u'COMMENTS': [u'12,500 CAR ALLOWANCE, 40,000 DEFFERRED COMPENSATION'],
 u'Deferred compensation': [u'$40,000'],
 u'Job Title': [u'SUPERINTENDENT', u'SUPERINTENDENT'],
 u'Total Compensation': [u'$266,309'],
 u'Work location': [u'Office Of Superintendent']}
I merged the data into one master dictionary and tried to write it to a csv file.
for d in data2, data:
    for k, v in d.iteritems():
        master_data[k].append(v)
with open('test2.csv', 'wb') as f:
    writer = csv.writer(f)
    writer.writerows(zip(*([k] + master_data[k] for k in sorted(master_data))))
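The problem is that only the first person's information (Marcus J. Newsome) is exported to the csv. I think this is because some keys/values that belong to Dana T. Bedden (for example, Additional compensation) do not exist in Marcus J. Newsome's data.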
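One likely reason only a single data row is written is that `zip()` stops at its shortest input. The following is a minimal Python 3 sketch of that behavior, using simplified, hypothetical columns rather than the real `master_data`; `itertools.zip_longest` is shown only for contrast.

```python
from itertools import zip_longest

# Hypothetical, simplified columns: a header followed by that column's values.
columns = [
    ["Full Name", "Marcus J. Newsome", "Dana T. Bedden"],
    ["Additional compensation", "$32,241"],  # only one value was scraped
]

# zip() stops at the shortest column, so the last row is silently dropped.
truncated = list(zip(*columns))

# zip_longest() pads the short column at the END with a fill value instead.
padded = list(zip_longest(*columns, fillvalue=None))

print(truncated)
print(padded)
```

Note that `zip_longest` can only pad at the end of a short column. It cannot tell which person a lone value belongs to, so padding this way may attach a value to the wrong row.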
To fix this, I tried adding None in the missing positions.
for d in data2, data:
    master_data.update((k, [None, master_data[k]]) for k in master_data if k not in d)
>>> pprint(dict(master_data))
{u'Additional compensation': [None, [[u'$32,241']]],
 u'Agency': [None,
             [[u'Chesterfield County Schools', u'City of Richmond Schools']]],
 u'Base Salary': [None, [[u'$229,758', u'$234,068']]],
 u'Bonuses or other allowances': [None, [[u'$12,500']]],
 u'COMMENTS': [[u'12,500 CAR ALLOWANCE, 40,000 DEFFERRED COMPENSATION'],
               [u'$28,088 - Board Paid Annuity; $4,153 - Excess Health Benefit Contribution;',
                u'']],
 u'Deferred compensation': [None, [[u'$40,000']]],
 u'Full Name': [None, [[u'Marcus J. Newsome', u'Dana T. Bedden']]],
 u'Job Title': [None, [[u'SUPERINTENDENT', u'SUPERINTENDENT']]],
 u'Total Compensation': [[u'$266,309'], [u'$282,258', u'']],
 u'Work location': [None, [[u'Office Of Superintendent']]]}
Unfortunately, this does not work the way I want. Ultimately, I would like my output to look like this:
Desired output
{u'Additional compensation': [[None, [u'$32,241']]],
u'Agency': [[u'Chesterfield County Schools'], [u'City of Richmond Schools']]],
u'Base Salary': [[u'$229,758'], [u'$234,068']]],
u'Bonuses or other allowances': [[u'$12,500'], None]],
u'COMMENTS': [[u'12,500 CAR ALLOWANCE, 40,000 DEFFERRED COMPENSATION'],
[u'$28,088 - Board Paid Annuity; $4,153 - Excess Health Benefit Contribution;',
u'']],
u'Deferred compensation': [[u'$40,000'], None]],
u'Full Name': [[u'Marcus J. Newsome'], [u'Dana T. Bedden']]],
u'Job Title': [[u'SUPERINTENDENT'], [u'SUPERINTENDENT']]],
u'Total Compensation': [[u'$266,309'], [u'$282,258', u'']],
u'Work location': [None, [u'Office Of Superintendent']]]}
Does anyone have any ideas?
Best answer
It would be better to change the way you store the scraped data.
Pseudocode:
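data = []
for row in table:
    # one dict per person, built from the table row...
    person = get_data_from_row(row)
    # ...and extended with the fields from that person's own page
    person.update(get_data_from_person_page(row))
    data.append(person)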
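The pseudocode above can be made concrete as a rough Python 3 sketch. Everything here apart from the loop itself is an assumption for illustration: the two helper functions are stubs that read from an in-memory `table` instead of parsing HTML, and the names and amounts are made up.

```python
# Hypothetical stand-ins for the scraping helpers; real versions would parse HTML.
def get_data_from_row(row):
    """Fields available directly in the listing table."""
    return {"Full Name": row["name"], "Agency": row["agency"]}


def get_data_from_person_page(row):
    """Fields that only appear on the person's detail page."""
    return dict(row["details"])


# Made-up sample input: each person has a different set of detail fields.
table = [
    {"name": "Person A", "agency": "Agency One",
     "details": {"Base Salary": "$100,000", "Bonus": "$5,000"}},
    {"name": "Person B", "agency": "Agency Two",
     "details": {"Base Salary": "$120,000"}},
]

data = []
for row in table:
    person = get_data_from_row(row)
    person.update(get_data_from_person_page(row))
    data.append(person)

print(data)
```

Because each dict describes exactly one person, a missing field is simply an absent key, and no positional padding with None is needed.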
Then you can use csv.DictWriter without any complicated data manipulation:
with open('data.csv', 'w') as f:
    fieldnames = data[0].keys()
    writer = csv.DictWriter(f, fieldnames)
    writer.writeheader()
    for row in data:
        writer.writerow(row)
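One caveat: taking `fieldnames` from `data[0].keys()` only covers the first person's keys, and by default `csv.DictWriter` raises `ValueError` when a later row contains a key that is not in `fieldnames`. A possible Python 3 variation, assuming records whose keys differ, is to collect the union of all keys and let `restval` fill the gaps. The sample records are made up, and `io.StringIO` stands in for a real file opened with `open('data.csv', 'w', newline='')`.

```python
import csv
import io

# Made-up records with different key sets.
data = [
    {"Full Name": "Person A", "Base Salary": "$100", "Bonus": "$5"},
    {"Full Name": "Person B", "Base Salary": "$200", "Deferred": "$10"},
]

# Collect every key that appears in any record, keeping first-seen order.
fieldnames = []
for person in data:
    for key in person:
        if key not in fieldnames:
            fieldnames.append(key)

buffer = io.StringIO()  # stand-in for a file opened with newline=''
# restval is written for any key a record does not have.
writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
writer.writeheader()
writer.writerows(data)

csv_text = buffer.getvalue()
print(csv_text)
```

Missing values come out as empty cells here; passing a different `restval` would change the placeholder.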
Regarding "python - merging two dictionaries together and adding None where needed", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/39923922/