python - How to conditionally split cell values and add them to a column using Pandas

Tags: python pandas dataframe

For example:

testing.csv:

First Name    Last Name  Profile URL
Ashleigh      Phelps     https://www.linkedin.com/in/ashleighephelps
Jonathan                 https://www.linkedin.com/in/jonathantsegal
Camilla Innes            https://www.linkedin.com/in/camilla-innes-61213628  
Rachel                   https://www.linkedin.com/in/rachel-hudesman-335b8120
Michael                  https://www.linkedin.com/in/mikeitalia
Antonio                  https://www.linkedin.com/in/antoniomolinelli
Lauren        Zsigray    https://www.linkedin.com/in/lauren-zsigray-13b5aa25

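The code I am using only handles the URLs that contain hyphens. How do I get both the first name and the last name in the other cases?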

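import pandas as pd

df = pd.read_csv("testing.csv", sep=',', encoding="utf-8")
# Keep only the rows where the last name is missing.
df = df[df['Last Name'].isnull()]
p = df.pop('Profile URL')
# Take the last path segment of each URL.
tmp_df = p.str.split('/')
df['Last Name'] = tmp_df.str[-1]
# Split on hyphens and keep the pieces between the first and the last one.
tmp1_df = df.pop('Last Name').str.split('-')
df['Last Name'] = tmp1_df.str[1:-1].str.join(sep='-')
df = pd.concat([df, p], axis=1)
print(df)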

This gives the following output:

First Name  Last Name       Profile URL
Ashleigh    Phelps          https://www.linkedin.com/in/ashleighephelps
Jonathan                    https://www.linkedin.com/in/jonathantsegal
Camilla     Innes           https://www.linkedin.com/in/camilla-innes-61213628
Rachel      hudesman        https://www.linkedin.com/in/rachel-hudesman-335b8120
Michael                     https://www.linkedin.com/in/mikeitalia
Antonio                     https://www.linkedin.com/in/antoniomolinelli
Lauren      Zsigray         https://www.linkedin.com/in/lauren-zsigray-13b5aa25

Expected output:

First Name  Last Name       Profile URL
Ashleigh    Phelps          https://www.linkedin.com/in/ashleighephelps
Jonathan    tsegal          https://www.linkedin.com/in/jonathantsegal
Camilla     Innes           https://www.linkedin.com/in/camilla-innes-13628
Rachel      hudesman        https://www.linkedin.com/in/rachel-hudesman-33
Michael                     https://www.linkedin.com/in/mikeitalia
Antonio     molinelli       https://www.linkedin.com/in/antoniomolinelli
Lauren      Zsigray         https://www.linkedin.com/in/lauren-zsigray-13b5a  

What should I use to get output in this format?

Best answer

Try this code:

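import pandas as pd

df = pd.read_csv("testing.csv", sep=',', encoding="utf-8")

# Replace NaN with empty strings so the string methods below are safe to call.
df.fillna('', inplace=True)

def clear_data(x):
    fname = x['First Name']
    lname = x['Last Name'].strip()
    url = x['Profile URL']
    if not lname:
        # Keep only the first word of the first name ("Camilla Innes" -> "Camilla").
        fname = fname.split(' ')[0]
        # Last path segment of the URL, split on hyphens.
        url_name = url.split('/')[-1].split('-')
        if len(url_name) > 1:
            # Hyphenated slug: take the piece just before the trailing ID.
            lname = url_name[-2].title()
        else:
            # Unhyphenated slug: whatever follows the first name is the last name.
            index_of_fname = url_name[0].lower().find(fname.lower())
            if index_of_fname != -1:
                index_of_fname += len(fname)
                lname = url_name[0][index_of_fname:].title()

        x['First Name'] = fname
        x['Last Name'] = lname
    else:
        lname = lname.split('-')[0].strip()
        x['Last Name'] = lname

    return x


# apply returns a new DataFrame, so assign the result back to df.
df = df.apply(clear_data, axis=1)

print(df)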
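
To try the idea without a CSV file, here is a minimal sketch that runs the same kind of logic on a small in-memory DataFrame built from a few of the rows in the question. The helper name last_name_from_url is hypothetical, and it checks the first name with startswith instead of find, which is a small simplification of the code above rather than the answer's exact method. Only rows with an empty Last Name are touched.

```python
import pandas as pd


def last_name_from_url(first_name: str, url: str) -> str:
    """Guess a last name from the final path segment of a profile URL.

    Hypothetical helper for illustration; returns '' when no guess is possible.
    """
    slug = url.rstrip('/').split('/')[-1]
    parts = slug.split('-')
    if len(parts) > 1:
        # "camilla-innes-61213628" -> piece just before the trailing ID.
        return parts[-2].title()
    # Unhyphenated slug: strip the first name from the front, if it is there.
    first = first_name.split(' ')[0].lower()
    if first and slug.lower().startswith(first):
        return slug[len(first):].title()
    return ''


# Sample rows copied from the question's testing.csv.
df = pd.DataFrame({
    'First Name': ['Jonathan', 'Camilla Innes', 'Michael', 'Lauren'],
    'Last Name': [None, None, None, 'Zsigray'],
    'Profile URL': [
        'https://www.linkedin.com/in/jonathantsegal',
        'https://www.linkedin.com/in/camilla-innes-61213628',
        'https://www.linkedin.com/in/mikeitalia',
        'https://www.linkedin.com/in/lauren-zsigray-13b5aa25',
    ],
})

df['Last Name'] = df['Last Name'].fillna('').str.strip()
missing = df['Last Name'] == ''

# Fill in last names only where they are missing.
df.loc[missing, 'Last Name'] = [
    last_name_from_url(f, u)
    for f, u in zip(df.loc[missing, 'First Name'], df.loc[missing, 'Profile URL'])
]
# Then trim the first name to its first word ("Camilla Innes" -> "Camilla").
df.loc[missing, 'First Name'] = df.loc[missing, 'First Name'].str.split(' ').str[0]

print(df)
```

A row such as Michael / mikeitalia keeps an empty last name, because the URL does not start with the first name and there is nothing reliable to strip.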

Regarding "python - How to conditionally split cell values and add them to a column using Pandas", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/48984869/
