python - Extract a ratio from a text column into two columns

I have a DataFrame:

import pandas as pd

df = pd.DataFrame({"id": [1,2,3,4,5],
                "text": ["This is a ratio of 13.4/10","Favorate rate of this id is 11/9","It may not be a good looking person. But he is vary popular (15/10)","Ratio is 12/10","very popular 17/10"],
                "name":["Joe","Adam","Sara","Jose","Bob"]})

I want to extract the numbers into two columns to get the following result:

df = pd.DataFrame({"id": [1,2,3,4,5],
                "text": ["This is a ratio of 13.4/10","Favorate rate of this id is 11/9","It may not be a good looking person. But he is vary popular (15/10)","Ratio is 12/10","very popular 17/10"],
                "name":["Joe","Adam","Sara","Jose","Bob"],
                "rating_nominator":[13.4,11,15,12,17],
                "rating_denominator":[10,9,10,10,10]})

Any help is appreciated.

Best answer

You can use

df[['rating_nominator', 'rating_denominator']] = df['text'].str.extract(r'(-?\d+(?:\.\d+)?)/(-?\d+(?:\.\d+)?)').astype(float)

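The regex (-?\d+(?:\.\d+)?)/(-?\d+(?:\.\d+)?) captures an integer or a float, optionally preceded by a minus sign, on each side of the slash. str.extract returns one column per capture group, so the first group becomes the numerator and the second the denominator.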
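To make the pieces of the pattern easier to read, here is a minimal sketch that uses only the standard `re` module with the same regex written in verbose form. The helper name `parse_ratio` is hypothetical and not part of the original answer.

```python
import re

# Same pattern as in the answer, split over several lines with comments.
RATIO = re.compile(r"""
    (-?\d+(?:\.\d+)?)   # numerator: optional minus, digits, optional decimal part
    /                   # literal slash between the two numbers
    (-?\d+(?:\.\d+)?)   # denominator: same shape as the numerator
""", re.VERBOSE)


def parse_ratio(text):
    """Return (numerator, denominator) as floats, or None if no ratio is found."""
    match = RATIO.search(text)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


print(parse_ratio("This is a ratio of 13.4/10"))   # (13.4, 10.0)
print(parse_ratio("he is vary popular (15/10)"))   # (15.0, 10.0)
print(parse_ratio("no ratio here"))                # None
```

Only the first match in each string is used, which mirrors how `str.extract` behaves.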

(Edit: the regex in this answer covers more cases. I made some assumptions, for example that you won't find a unary + sign in the numbers.)
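If the unary + assumption does not hold for your data, one possible adjustment is to accept either sign with `[-+]?`. This is a sketch of that variation, not the pattern from the linked answer; the name `SIGNED_RATIO` and the sample string are made up for illustration.

```python
import re

# Variation on the answer's pattern: allow an optional leading + or -.
SIGNED_RATIO = re.compile(r"([-+]?\d+(?:\.\d+)?)/([-+]?\d+(?:\.\d+)?)")

m = SIGNED_RATIO.search("score +7.5/-10")
numerator, denominator = float(m.group(1)), float(m.group(2))
print(numerator, denominator)  # 7.5 -10.0
```

`float()` accepts a leading `+`, so no extra cleanup is needed before the conversion.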

Demo:

>>> df
   id                  text
0   1  foo 14.12/10.123 bar
1   2                 10/12
2   3             13.4/14.5
3   4          -12.24/-13.5
4   5                1/-1.2
>>>
>>> df[['rating_nominator', 'rating_denominator']] = df['text'].str.extract(r'(-?\d+(?:\.\d+)?)/(-?\d+(?:\.\d+)?)').astype(float)
>>> df
   id                  text  rating_nominator  rating_denominator
0   1  foo 14.12/10.123 bar               14.12            10.123
1   2                 10/12               10.00            12.000
2   3             13.4/14.5               13.40            14.500
3   4          -12.24/-13.5              -12.24           -13.500
4   5                1/-1.2                1.00            -1.200
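For completeness, here is the same approach applied to the DataFrame from the question. It is a self-contained sketch that assumes pandas is installed; the variable name `pattern` is introduced here for readability.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "text": [
        "This is a ratio of 13.4/10",
        "Favorate rate of this id is 11/9",
        "It may not be a good looking person. But he is vary popular (15/10)",
        "Ratio is 12/10",
        "very popular 17/10",
    ],
    "name": ["Joe", "Adam", "Sara", "Jose", "Bob"],
})

# Raw string so the backslashes reach the regex engine unchanged.
pattern = r"(-?\d+(?:\.\d+)?)/(-?\d+(?:\.\d+)?)"

# One output column per capture group; rows without a match would become NaN.
df[["rating_nominator", "rating_denominator"]] = (
    df["text"].str.extract(pattern).astype(float)
)

print(df[["id", "rating_nominator", "rating_denominator"]])
```

Both new columns are floats because of `astype(float)`, so a denominator such as 10 is stored as 10.0.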

This question, "Extract a ratio from a text column into two columns", corresponds to a similar question on Stack Overflow: https://stackoverflow.com/questions/53658016/
