Python pandas: map CSV files

Tags: python pandas csv dictionary

I want to "merge" two CSV files. For each email in file 1, I want to look up its userId in file 2 and assign that userId to the matching row in file 1.

Example:

File 1

name, userId, email
john, null, john@a.com
alex, null, alex@a.com
micheal, null, mike@a.com
alex, null, alex@a.com
john, null, john@a.com

File 2

name, userId, email
alex, 5, alex@a.com
micheal, 10, mike@a.com
john, 12, john@a.com

Output file

name, userId, email
john, 12, john@a.com
alex, 5, alex@a.com
micheal, 10, mike@a.com
alex, 5, alex@a.com
john, 12, john@a.com

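This is my code, but it does not assign the userId that belongs to each email, because the emails are not sorted the same way in both files: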

import pandas as pd

df1 = pd.read_csv("file1.csv", sep=",")
df2 = pd.read_csv("file2.csv", sep=",", index_col=0)

df1["userId"] = df2["userId"].values

df1.to_csv("output.csv", sep=";")

Can anyone help me?

Best answer

Use DataFrame.merge:

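import pandas as pd

# skipinitialspace=True strips the blank that follows each comma in the sample files
df1 = pd.read_csv("file1.csv", sep=",", skipinitialspace=True)
df2 = pd.read_csv("file2.csv", sep=",", skipinitialspace=True)

# Drop the empty userId column so the merge does not produce userId_x / userId_y
df1 = df1.drop(['userId'], axis=1)

# A left join keeps every row of file 1, in its original order
result = pd.merge(df1, df2, on=['name', 'email'], how='left')

result.to_csv("output.csv", sep=";")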
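
Since the question mentions mapping, here is an alternative sketch that is not part of the original answer: build an email-to-userId lookup from file 2 and apply it with Series.map. The frames below are built in memory from the question's sample data, and the name lookup is purely illustrative. It assumes each email appears only once in file 2.

```python
import pandas as pd

# Sample frames mirroring the question's two files
df1 = pd.DataFrame({
    "name": ["john", "alex", "micheal", "alex", "john"],
    "userId": [None] * 5,
    "email": ["john@a.com", "alex@a.com", "mike@a.com", "alex@a.com", "john@a.com"],
})
df2 = pd.DataFrame({
    "name": ["alex", "micheal", "john"],
    "userId": [5, 10, 12],
    "email": ["alex@a.com", "mike@a.com", "john@a.com"],
})

# Build an email -> userId lookup from file 2 (assumes emails are unique there)
lookup = df2.set_index("email")["userId"]

# Map each email in file 1 through the lookup; unmatched emails would become NaN
df1["userId"] = df1["email"].map(lookup)

print(df1)
```

Unlike the merge, this keeps the column order of file 1 and matches on email alone, so it may be the simpler choice when name is not needed as part of the key.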

How I tested it:

import pandas as pd

# In-memory stand-ins for file 1 and file 2
df1 = pd.DataFrame({'name': ['john', 'alex', 'michael', 'alex', 'john'],
                    'userId': ['null', 'null', 'null', 'null', 'null'],
                    'email': ['john@a.com', 'alex@a.com', 'mike@a.com', 'alex@a.com', 'john@a.com']
                    }, columns=['name','userId','email'])

df2 = pd.DataFrame({'name': ['alex', 'michael', 'john'],
                    'userId': ['5', '10', '12'],
                    'email': ['alex@a.com', 'mike@a.com', 'john@a.com']
                    })

# Drop the placeholder userId column before merging
df1 = df1.drop(['userId'], axis=1)

# A left join keeps all five rows of df1 in their original order
result = pd.merge(df1, df2, on=['name','email'], how='left')

print(df1)
print(df2)

print(result)
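
As a further check that is not part of the original answer, pd.merge accepts validate and indicator arguments that can flag duplicate keys in file 2 and rows of file 1 that found no match. The sketch below uses a hypothetical extra user, sam, who is absent from file 2, to show what an unmatched row looks like.

```python
import pandas as pd

df1 = pd.DataFrame({
    "name": ["john", "alex", "sam"],  # "sam" is a hypothetical unmatched user
    "email": ["john@a.com", "alex@a.com", "sam@a.com"],
})
df2 = pd.DataFrame({
    "name": ["alex", "john"],
    "userId": [5, 12],
    "email": ["alex@a.com", "john@a.com"],
})

# validate="many_to_one" raises MergeError if a (name, email) pair repeats in df2;
# indicator=True adds a "_merge" column recording where each row was found
checked = pd.merge(df1, df2, on=["name", "email"], how="left",
                   validate="many_to_one", indicator=True)

# Rows marked "left_only" had no matching entry in file 2
unmatched = checked[checked["_merge"] == "left_only"]
print(unmatched[["name", "email"]])
```

Rows listed in unmatched end up with a missing userId in the merged result, so inspecting them before writing the output file can catch typos in names or emails.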

Source: a similar question about mapping CSV files with Python pandas on Stack Overflow: https://stackoverflow.com/questions/47819255/
