For example, testing.csv contains:
First Name  Last Name  Profile URL
Ashleigh    Phelps     https://www.linkedin.com/in/ashleighephelps
Jonathan               https://www.linkedin.com/in/jonathantsegal
Camilla     Innes      https://www.linkedin.com/in/camilla-innes-61213628
Rachel                 https://www.linkedin.com/in/rachel-hudesman-335b8120
Michael                https://www.linkedin.com/in/mikeitalia
Antonio                https://www.linkedin.com/in/antoniomolinelli
Lauren      Zsigray    https://www.linkedin.com/in/lauren-zsigray-13b5aa25
The code I'm using only splits the hyphenated URLs. How can I get the first and last names from the other URLs as well?
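import pandas as pd

df = pd.read_csv("testing.csv", sep=',', encoding="utf-8")
# Keep only the rows whose Last Name is missing
df = df[df['Last Name'].isnull()]
# Take the URL column out and grab the slug after the last '/'
p = df.pop('Profile URL')
tmp_df = p.str.split('/')
df['Last Name'] = tmp_df.str[-1]
# Split the slug on '-' and keep the middle parts (drops first name and ID suffix)
tmp1_df = df.pop('Last Name').str.split('-')
df['Last Name'] = tmp1_df.str[1:-1].str.join(sep='-')
# Put the URL column back on the right-hand side
df = pd.concat([df, p], axis=1)
print(df)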
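A minimal sketch of the same two split steps, run on two of the sample slugs (a hand-built Series assumed to match the rows above, not read from testing.csv), shows why run-together slugs come out empty: they contain no '-', so slicing off the first and last parts leaves nothing to join.

```python
import pandas as pd

# Hand-built sample of Profile URL values copied from the table above
urls = pd.Series([
    "https://www.linkedin.com/in/jonathantsegal",
    "https://www.linkedin.com/in/rachel-hudesman-335b8120",
])

# Step 1 (as in the question): keep the slug after the last '/'
slug = urls.str.split('/').str[-1]

# Step 2: split on '-', drop the first and last parts, re-join the middle.
# 'jonathantsegal' -> ['jonathantsegal'] -> [] -> ''
# 'rachel-hudesman-335b8120' -> ['hudesman'] -> 'hudesman'
middle = slug.str.split('-').str[1:-1].str.join('-')

print(middle.tolist())
```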
This gives the following output:
First Name  Last Name  Profile URL
Ashleigh    Phelps     https://www.linkedin.com/in/ashleighephelps
Jonathan               https://www.linkedin.com/in/jonathantsegal
Camilla     Innes      https://www.linkedin.com/in/camilla-innes-61213628
Rachel      hudesman   https://www.linkedin.com/in/rachel-hudesman-335b8120
Michael                https://www.linkedin.com/in/mikeitalia
Antonio                https://www.linkedin.com/in/antoniomolinelli
Lauren      Zsigray    https://www.linkedin.com/in/lauren-zsigray-13b5aa25
Expected output:
First Name  Last Name  Profile URL
Ashleigh    Phelps     https://www.linkedin.com/in/ashleighephelps
Jonathan    tsegal     https://www.linkedin.com/in/jonathantsegal
Camilla     Innes      https://www.linkedin.com/in/camilla-innes-13628
Rachel      hudesman   https://www.linkedin.com/in/rachel-hudesman-33
Michael                https://www.linkedin.com/in/mikeitalia
Antonio     molinelli  https://www.linkedin.com/in/antoniomolinelli
Lauren      Zsigray    https://www.linkedin.com/in/lauren-zsigray-13b5a
What should I use to get output in this format?
Best answer
Try this code:
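import pandas as pd

df = pd.read_csv("testing.csv", sep=',', encoding="utf-8")
# Replace NaN with '' so the string methods below work on every cell
df.fillna('', inplace=True)

def clear_data(x):
    fname = x['First Name']
    lname = x['Last Name'].strip()
    url = x['Profile URL']
    if not lname:
        # No last name given: keep only the first word of the first name
        fname = fname.split(' ')[0]
        url_name = url.split('/')[-1].split('-')
        if len(url_name) > 1:
            # Hyphenated slug, e.g. rachel-hudesman-335b8120 -> 'Hudesman'
            lname = url_name[-2].title()
        else:
            # Run-together slug, e.g. jonathantsegal: drop the first-name prefix
            index_of_fname = url_name[0].lower().find(fname.lower())
            if index_of_fname != -1:
                index_of_fname += len(fname)
                lname = url_name[0][index_of_fname:].title()
        x['First Name'] = fname
        x['Last Name'] = lname
    else:
        # Last name present: keep only the part before any '-'
        lname = lname.split('-')[0].strip()
        x['Last Name'] = lname
    return x

# apply() returns a new DataFrame, so assign the result back
df = df.apply(clear_data, axis=1)
print(df)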
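To try the same rules without a testing.csv on disk, here is a minimal sketch that feeds the sample rows through io.StringIO. It assumes the file is comma-separated with empty Last Name cells for the missing values, and it condenses the answer's logic for missing last names into a hypothetical helper `split_name` (an illustrative name, not part of the answer); rows that already have a last name are left untouched.

```python
import io
import pandas as pd

# Sample rows from the question, assumed to be plain CSV with empty Last Name cells
CSV = """First Name,Last Name,Profile URL
Ashleigh,Phelps,https://www.linkedin.com/in/ashleighephelps
Jonathan,,https://www.linkedin.com/in/jonathantsegal
Camilla,Innes,https://www.linkedin.com/in/camilla-innes-61213628
Rachel,,https://www.linkedin.com/in/rachel-hudesman-335b8120
Michael,,https://www.linkedin.com/in/mikeitalia
Antonio,,https://www.linkedin.com/in/antoniomolinelli
Lauren,Zsigray,https://www.linkedin.com/in/lauren-zsigray-13b5aa25
"""

def split_name(first, url):
    """Hypothetical helper: guess a last name from a LinkedIn URL slug."""
    first = first.split(' ')[0]
    parts = url.split('/')[-1].split('-')
    if len(parts) > 1:
        # Hyphenated slug: take the second-to-last part, as the answer does
        return parts[-2].title()
    # Run-together slug: strip the first-name prefix if it is present
    slug = parts[0]
    pos = slug.lower().find(first.lower())
    if pos == -1:
        return ''  # e.g. 'mikeitalia' does not contain 'michael'
    return slug[pos + len(first):].title()

df = pd.read_csv(io.StringIO(CSV)).fillna('')

# Only fill rows whose Last Name is empty
missing = df['Last Name'].str.strip() == ''
df.loc[missing, 'Last Name'] = [
    split_name(f, u)
    for f, u in zip(df.loc[missing, 'First Name'], df.loc[missing, 'Profile URL'])
]
print(df)
```

Selecting the rows with a boolean mask and writing back through df.loc keeps the change in df; calling df.apply(...) without assigning its return value would leave df as it was. Drop the .title() calls if the lowercase form shown in the expected output is preferred.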
A similar question, "python - How to conditionally split cell values and add them to a column using Pandas", can be found on Stack Overflow: https://stackoverflow.com/questions/48984869/