Python csv: copying a column

Tags: python csv

I have a file with the following contents:

first_name,last_name,uid,email,dep_code,dep_name
john,smith,jsmith,jsmith@gmail.com,finance,21230
john,king,jking,jjing@gmail.com,human resource,31230

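I want to copy the column "email" into a new column "email2", and then replace gmail.com with hotmail.com in the email2 column.

I am new to Python, so I would appreciate help from the experts. I have tried a few scripts, but if there is a better way to do this, please let me know. The original file contains 60000 rows.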

with open('c:\\Python27\\scripts\\colnewfile.csv', 'rb') as fp_in1, open('c:\\Python27\\scripts\\final.csv', 'wb') as fp_out1:
    writer1 = csv.writer(fp_out1, delimiter=",")
    reader1 = csv.reader(fp_in1, delimiter=",")
    domain = "@hotmail.com"
    for row in reader1:
        if row[2:3] == "uid":
            writer1.append("Email2")
        else:
            writer1.writerow(row+[row[2:3]])

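This is the final script. The only problem is that it does not complete the whole output file: it writes only 61409 rows, whereas the input file has 61438 rows.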

inFile = 'c:\Python27\scripts\in-093013.csv'
outFile = 'c:\Python27\scripts\final.csv'

with open(inFile, 'rb') as fp_in1, open(outFile, 'wb') as fp_out1:
    writer = csv.writer(fp_out1, delimiter=",")
    reader = csv.reader(fp_in1, delimiter=",")
    for col in reader:
        del col[6:]
        writer.writerow(col)
    headers = next(reader)
    writer.writerow(headers + ['email2'])
    for row in reader:
        if len(row) > 3:
            email = email.split('@', 1)[0] + '@hotmail.com'
        writer.writerow(row + [email])

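Best answer

If you call next() on the reader, you get one row at a time; use that to copy the headers. Copying the email column is then straightforward: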

import csv

infilename = r'c:\Python27\scripts\colnewfile.csv'
outfilename = r'c:\Python27\scripts\final.csv'

with open(infilename, 'rb') as fp_in, open(outfilename, 'wb') as fp_out:
    reader = csv.reader(fp_in, delimiter=",")
    headers = next(reader)  # read first row

    writer = csv.writer(fp_out, delimiter=",")
    writer.writerow(headers + ['email2'])

    for row in reader:
        email = ''  # rows with fewer than 4 columns get an empty email2
        if len(row) > 3:
            # the email address is the fourth column
            email = row[3].split('@', 1)[0] + '@hotmail.com'
        writer.writerow(row + [email])

This code splits the email address on the first @ sign, takes the first part of the split, and appends @hotmail.com to it:

>>> 'example@gmail.com'.split('@', 1)[0]
'example'
>>> 'example@gmail.com'.split('@', 1)[0] + '@hotmail.com'
'example@hotmail.com'
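
Note that this split replaces whatever domain the address has. If only gmail.com addresses should be rewritten, as the question describes, a small helper could check the domain first. The following is a minimal sketch, not part of the original answer; the name swap_domain and its parameters are hypothetical.

```python
def swap_domain(address, old='gmail.com', new='hotmail.com'):
    """Return address with its domain changed from old to new.

    Addresses on any other domain, or without an @ sign, are
    returned unchanged.
    """
    # partition() splits on the first @ and reports whether one was found
    local, sep, domain = address.partition('@')
    if sep and domain.lower() == old:
        return local + '@' + new
    return address
```

In the loop, `swap_domain(row[3])` would then take the place of the split expression.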

The above produces:

first_name,last_name,uid,email,dep_code,dep_name,email2
john,smith,jsmith,jsmith@gmail.com,finance,21230,jsmith@hotmail.com
john,king,jking,jjing@gmail.com,human resource,31230,jjing@hotmail.com

for your sample input.
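
The script above opens both files in binary mode, which is what the csv module expects on Python 2. On Python 3 the csv module works with text files opened with newline=''. The following is a sketch of the same approach for Python 3, not the original answer's code. The function name copy_email_column is made up for illustration, and giving short rows an empty email2 value is an assumption about the desired behavior.

```python
import csv
import io


def copy_email_column(fp_in, fp_out, new_domain='hotmail.com'):
    """Copy rows from fp_in to fp_out, appending an email2 column."""
    reader = csv.reader(fp_in)
    writer = csv.writer(fp_out)
    headers = next(reader)  # the first row holds the column names
    writer.writerow(headers + ['email2'])
    for row in reader:
        email2 = ''  # assumption: rows without an email column get an empty value
        if len(row) > 3:
            email2 = row[3].split('@', 1)[0] + '@' + new_domain
        writer.writerow(row + [email2])


# In-memory demonstration using one row of the sample data from the question.
sample = (
    'first_name,last_name,uid,email,dep_code,dep_name\n'
    'john,smith,jsmith,jsmith@gmail.com,finance,21230\n'
)
out = io.StringIO()
copy_email_column(io.StringIO(sample), out)
print(out.getvalue())
```

With real files, the same function could be called as `copy_email_column(fp_in, fp_out)` inside `with open(infilename, newline='') as fp_in, open(outfilename, 'w', newline='') as fp_out:`.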

Regarding copying a column with Python's csv module, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/19324968/
