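I am trying to read an xlsx file, compare all the reference numbers in one column against the files in a folder and, where they correspond, rename each file to the email address associated with its reference number.
The Excel file contains the following fields: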
Reference  EmailAddress
1123       bob.smith@yahoo.com
1233       john.drako@gmail.com
1334       samuel.manuel@yahoo.com
...        .....
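My folder applicantCVs contains only .doc files named after the values in the Reference column (for example, 1233.doc).
How can I compare the contents of the applicantCVs folder with the Reference field in the Excel file and, where they match, rename each file to the corresponding email address?
This is what I have tried so far: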
import os
import pandas as pd

dfOne = pd.read_excel('Book2.xlsx', na_values=['NA'], usecols="A:D")
references = dfOne['Reference']
emailAddress = dfOne['EmailAddress']
cleanedEmailList = [x for x in emailAddress if str(x) != 'nan']
print(cleanedEmailList)

excelArray = []
filesArray = []

for root, dirs, files in os.walk("applicantCVs"):
    for filename in files:
        print(filename)  # Original file name with extension, e.g. 1233.doc
        reworkedFile = os.path.splitext(filename)[0]
        filesArray.append(reworkedFile)

for entry in references:
    excelArray.append(str(entry))

for i in excelArray:
    if i in filesArray:
        print(i, "corresponds to the file names")
I compare the reference names with the folder contents and print each one that matches:
for i in excelArray:
    if i in filesArray:
        print(i, "corresponds to the file names")
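I tried to rename the files with os.rename(filename, cleanedEmailList), but that does not work because cleanedEmailList is a list of email addresses rather than a single file name.
How can I match and rename the files?
Update: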
import pandas as pd
from pathlib import Path

dfOne = pd.read_excel('Book2.xlsx', na_values=['NA'], usecols="A:D")

# Drop incomplete rows, then store the references as strings so they can be
# compared with file stems such as "1233".
dfOne = dfOne.dropna(subset=["Reference", "EmailAddress"])
dfOne["Reference"] = dfOne["Reference"].astype(str)

# Map each reference to its email address.
references = dict(dfOne.set_index("Reference")["EmailAddress"])
print(references)

files = Path("applicantCVs").glob("*")
for file in files:
    new_name = references.get(file.stem, file.stem)
    file.rename(file.with_name(f"{new_name}{file.suffix}"))
Best answer
Based on this sample data:
Reference  EmailAddress
1123       bob.smith@yahoo.com
1233       john.drako@gmail.com
nan        jane.smith#example.com
1334       samuel.manuel@yahoo.com
First, build a dict with the references as keys and the new names as values:
references = dict(df.dropna(subset=["Reference","EmailAddress"]).set_index("Reference")["EmailAddress"])
{'1123': 'bob.smith@yahoo.com', '1233': 'john.drako@gmail.com', '1334': 'samuel.manuel@yahoo.com'}
Note that the references here are str. If they are not strings in your original data, you can convert them with astype(str).
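The mapping step and the string conversion can be tried without the Excel file. The following is only a sketch under stated assumptions: the in-memory `df` is a hypothetical stand-in for the sheet, and the references are assumed to be whole numbers. With a missing value in the column, pandas stores the numbers as floats, so a direct astype(str) would produce keys like '1123.0' that never match a file stem.

```python
import pandas as pd

# Hypothetical stand-in for the sheet returned by pd.read_excel(...).
df = pd.DataFrame(
    {
        "Reference": [1123, 1233, None, 1334],
        "EmailAddress": [
            "bob.smith@yahoo.com",
            "john.drako@gmail.com",
            "jane.smith#example.com",
            "samuel.manuel@yahoo.com",
        ],
    }
)

# Drop incomplete rows first; astype(str) would otherwise turn NaN into "nan".
complete = df.dropna(subset=["Reference", "EmailAddress"])

# The missing value made the column float, so go through int to avoid "1123.0".
# Assumption: every remaining reference is a whole number.
keys = complete["Reference"].astype(int).astype(str)

# Pair each string key with its email address.
references = dict(zip(keys, complete["EmailAddress"]))
print(references)
```

If the references are not purely numeric (an assumption this sketch does not cover), the astype(int) step would have to be dropped.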
Then use pathlib.Path to find all the files in the data directory:
files = Path("../data/renames").glob("*")
[WindowsPath('../data/renames/1123.docx'), WindowsPath('../data/renames/1156.pptx'), WindowsPath('../data/renames/1233.txt')]
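To see what the glob yields and which part of each path gets compared with the references, here is a small sketch that works in a throwaway temporary directory. The directory and the three file names are illustrative and only mirror the listing above.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    # Create empty placeholder files that mirror the listing above.
    for name in ("1123.docx", "1156.pptx", "1233.txt"):
        (folder / name).touch()

    # glob("*") is lazy; sorted() materialises it into a stable list.
    files = sorted(folder.glob("*"))

    # The stem is the file name without its extension; the suffix is the extension.
    stems = [f.stem for f in files]
    suffixes = [f.suffix for f in files]

print(stems)     # ['1123', '1156', '1233']
print(suffixes)  # ['.docx', '.pptx', '.txt']
```

The stems are strings, which is why the dict keys need to be strings as well.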
The renaming itself then becomes very simple:
for file in files:
    new_name = references.get(file.stem, file.stem)
    file.rename(file.with_name(f"{new_name}{file.suffix}"))
references.get looks up the new file name and, if no entry is found, falls back to the original stem, so unmatched files keep their names.
[WindowsPath('../data/renames/1156.pptx'), WindowsPath('../data/renames/bob.smith@yahoo.com.docx'), WindowsPath('../data/renames/john.drako@gmail.com.txt')]
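The whole rename step can be tried end to end in a temporary directory. This is a slightly more defensive variant of the loop above, not the answer's exact code: the helper name `rename_by_reference` is hypothetical, unmatched files are skipped outright, and an existing target is never overwritten.

```python
import tempfile
from pathlib import Path


def rename_by_reference(folder, references):
    """Rename files whose stem is a key in `references`; return the new paths."""
    renamed = []
    # list() takes a snapshot so files renamed mid-loop are not visited again.
    for file in list(Path(folder).glob("*")):
        new_stem = references.get(file.stem)
        if new_stem is None or not file.is_file():
            continue  # no matching reference (or not a file): leave untouched
        target = file.with_name(f"{new_stem}{file.suffix}")
        if target.exists():
            continue  # avoid overwriting an existing file
        file.rename(target)
        renamed.append(target)
    return renamed


# Illustrative mapping and file names, mirroring the sample data above.
references = {"1123": "bob.smith@yahoo.com", "1233": "john.drako@gmail.com"}

with tempfile.TemporaryDirectory() as tmp:
    for name in ("1123.docx", "1156.pptx", "1233.txt"):
        (Path(tmp) / name).touch()
    rename_by_reference(tmp, references)
    result = sorted(p.name for p in Path(tmp).iterdir())

print(result)
# ['1156.pptx', 'bob.smith@yahoo.com.docx', 'john.drako@gmail.com.txt']
```

Running it on a copy of the real folder first is a cheap way to check the mapping before touching the original files.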
Regarding python - Renaming files based on DataFrame content with Python and Pandas, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/55922680/