I have a DataFrame:
import pandas as pd

df = pd.DataFrame({"id": [1,2,3,4,5],
                   "text": ["This is a ratio of 13.4/10","Favorate rate of this id is 11/9","It may not be a good looking person. But he is vary popular (15/10)","Ratio is 12/10","very popular 17/10"],
                   "name":["Joe","Adam","Sara","Jose","Bob"]})
I want to extract the numbers into two columns to get the following result:
df = pd.DataFrame({"id": [1,2,3,4,5],
"text": ["This is a ratio of 13.4/10","Favorate rate of this id is 11/9","It may not be a good looking person. But he is vary popular (15/10)","Ratio is 12/10","very popular 17/10"],
"name":["Joe","Adam","Sara","Jose","Bob"],
"rating_nominator":[13.4,11,15,12,17],
"rating_denominator":[10,9,10,10,10]})
Any help is appreciated.
Best answer
You can use
df[['rating_nominator', 'rating_denominator']] = df['text'].str.extract(r'(-?\d+(?:\.\d+)?)/(-?\d+(?:\.\d+)?)').astype(float)
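The regular expression (-?\d+(?:\.\d+)?)/(-?\d+(?:\.\d+)?)
captures an integer or a float on each side of the slash: the first group is the numerator and the second group is the denominator.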
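To make the two capture groups easier to read, here is a minimal sketch that uses only the standard `re` module. The pattern is the one from the answer, written in verbose mode so each part can carry a comment. The helper name `split_ratio` is hypothetical and only serves the illustration.

```python
import re

# The pattern from the answer, in verbose form so each part carries a comment.
RATIO = re.compile(r"""
    (-?\d+(?:\.\d+)?)   # group 1 (numerator): optional minus, digits, optional .digits
    /                   # the literal slash between the two numbers
    (-?\d+(?:\.\d+)?)   # group 2 (denominator): same shape as group 1
""", re.VERBOSE)


def split_ratio(text):
    """Return (numerator, denominator) as floats for the first match, or None."""
    match = RATIO.search(text)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


print(split_ratio("This is a ratio of 13.4/10"))      # (13.4, 10.0)
print(split_ratio("But he is vary popular (15/10)"))  # (15.0, 10.0)
print(split_ratio("no ratio here"))                   # None
```

`str.extract` applies the same first-match logic to every row of the column and returns one output column per capture group.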
(Edit: the regex in this answer covers more cases. I made some assumptions, for example that you will not find a unary + sign in your numbers.)
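As a rough illustration of the unary + assumption, the sketch below compares the answer's pattern with a variant that also accepts a leading plus sign. The variant `SIGNED` and the sample string are assumptions made for this example and are not taken from the linked answer.

```python
import re

# Pattern from the answer: only an optional minus sign is allowed.
ANSWER = re.compile(r"(-?\d+(?:\.\d+)?)/(-?\d+(?:\.\d+)?)")
# Hypothetical variant: [-+]? accepts either sign in front of each number.
SIGNED = re.compile(r"([-+]?\d+(?:\.\d+)?)/([-+]?\d+(?:\.\d+)?)")

text = "+12.5/-10"

# The answer's pattern starts matching after the plus sign.
print(ANSWER.search(text).groups())  # ('12.5', '-10')
# The variant keeps the plus sign inside the captured text.
print(SIGNED.search(text).groups())  # ('+12.5', '-10')
```

Since `float('+12.5')` and `float('12.5')` give the same value, the difference only matters if the captured text itself is needed.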
Demo:
>>> df
   id                  text
0   1  foo 14.12/10.123 bar
1   2                 10/12
2   3             13.4/14.5
3   4          -12.24/-13.5
4   5                1/-1.2
>>>
>>> df[['rating_nominator', 'rating_denominator']] = df['text'].str.extract(r'(-?\d+(?:\.\d+)?)/(-?\d+(?:\.\d+)?)').astype(float)
>>> df
   id                  text  rating_nominator  rating_denominator
0   1  foo 14.12/10.123 bar             14.12              10.123
1   2                 10/12             10.00              12.000
2   3             13.4/14.5             13.40              14.500
3   4          -12.24/-13.5            -12.24             -13.500
4   5                1/-1.2              1.00              -1.200
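The demo uses different sample data than the question. As a quick check, the sketch below applies the same line to the question's DataFrame and compares the result with the values the question asks for. It assumes pandas is installed; the column names are the ones used in the question.

```python
import pandas as pd

# DataFrame from the question.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "text": [
        "This is a ratio of 13.4/10",
        "Favorate rate of this id is 11/9",
        "It may not be a good looking person. But he is vary popular (15/10)",
        "Ratio is 12/10",
        "very popular 17/10",
    ],
    "name": ["Joe", "Adam", "Sara", "Jose", "Bob"],
})

# Raw string so that \d and \. reach the regex engine unchanged.
pattern = r"(-?\d+(?:\.\d+)?)/(-?\d+(?:\.\d+)?)"

# One output column per capture group; the captured strings are converted to floats.
df[["rating_nominator", "rating_denominator"]] = (
    df["text"].str.extract(pattern).astype(float)
)

print(df[["id", "rating_nominator", "rating_denominator"]])
```

Note that `astype(float)` makes both new columns floats, so the denominators come back as 10.0 and 9.0 rather than the integers shown in the question's expected output.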
Regarding python - extracting a ratio from a text column into two columns, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/53658016/