Python: parsing a URL into a dict with duplicate keys

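I have a raw text data file made up of ids and URLs. I want to parse each URL into a Python dictionary and then convert the result to a pandas data frame, so that I can analyze certain URL elements.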

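The problem is that some elements are duplicated. For example, a URL might be /browse/?item_type=15&color=336&color=45. Note that color= appears twice. If I then parse the URL with urllib.parse.parse_qs, the resulting dictionary contains the key-value pair 'color': ['336', '45'], where the value is a list of two items. So when I try to concatenate the parsed row onto the existing data frame of URL elements, an error is raised: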

ValueError: arrays must all be same length

import urllib.parse
import pandas as pd
from pandas import DataFrame

# df1 holds the raw data, with 'id' and 'url' columns
new_df = DataFrame.from_dict(urllib.parse.parse_qs(df1['url'][1]), orient='columns', dtype=None)
new_df['id'] = df1['id'][1]
for i in range(2, 35):
    add_df = DataFrame.from_dict(urllib.parse.parse_qs(df1['url'][i]), orient='columns', dtype=None)
    add_df['id'] = df1['id'][i]
    new_df = pd.concat([new_df, add_df])

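My question is: how can I get around this? At this point, if a URL has two colors, I am willing to keep just one of them in my data frame, since URLs with two colors are rare.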

Best answer

from urllib.parse import parse_qs

{k: [v[0]] for k, v in parse_qs('item_type=15&color=336&color=45').items()}

This keeps only the first value for each key, so any duplicates are dropped and every list has length one.
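As a rough sketch of how the same idea could be applied to every row, the snippet below collects one dict per URL. The sample `records` data and the helper name `first_values` are assumptions made for illustration and are not from the original post. It also uses `urlsplit` to isolate the query string first, because calling `parse_qs` on a whole URL such as /browse/?item_type=15 folds the path into the first key.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical stand-in for the (id, url) pairs in the asker's df1.
records = [
    (101, "/browse/?item_type=15&color=336&color=45"),
    (102, "/browse/?item_type=7&size=2"),
]


def first_values(url):
    """Parse the query string of `url`, keeping only the first value per key."""
    query = urlsplit(url).query  # drop the path, keep 'item_type=15&color=...'
    return {key: values[0] for key, values in parse_qs(query).items()}


rows = []
for row_id, url in records:
    params = first_values(url)
    params["id"] = row_id  # carry the id along with the parsed elements
    rows.append(params)

print(rows)
```

Passing `rows` to `pd.DataFrame(rows)` should then give one row per URL, with NaN wherever a URL lacks a given key, which avoids concatenating inside the loop.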

Source: a similar question on Stack Overflow, https://stackoverflow.com/questions/28708258/
