python - How to compare strings in two columns and replace the case of the strings in one column with the case from the other?

Tags: python regex pandas pattern-matching

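I have two columns, "Sentences" and "Updates". For each word at the end of a URL in the Updates column, I want to find the same word in the corresponding sentence and replace the URL word with the casing it has in the sentence.

I don't know how to do this comparison; any help is appreciated. The actual data has 43k rows with different URLs.

Sample code: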

import pandas as pd

dict1 = {'Updates': ['The new abc.com/Line','Its a abc.com/bright and abc.com/Sunny Day','abc.com/smartphone have taken our the abc.com/WORLD','abc.com/GLOBAL Warming is abc.com/Reaching its abc.com/peak'],
     'Sentences': ['The new line','Its a bright and sunny day','Smartphone have taken our the World','GLOBAL Warming is reaching its Peak ']
        }

df = pd.DataFrame(dict1)

Current output:

Sentences                            Updates
The new line                         The new abc.com/Line
Its a bright and sunny day           Its a abc.com/bright and abc.com/Sunny Day
Smartphone have taken our the World  abc.com/smartphone have taken our the abc.com/WORLD
GLOBAL Warming is reaching its Peak  abc.com/GLOBAL Warming is abc.com/Reaching its abc.com/peak

Expected output:

Sentences                            Updates
The new line                         The new abc.com/line
Its a bright and sunny day           Its a abc.com/bright and abc.com/sunny day
Smartphone have taken our the World  abc.com/Smartphone have taken our the abc.com/World
GLOBAL Warming is reaching its Peak  abc.com/GLOBAL Warming is abc.com/reaching its abc.com/Peak

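Best answer

Use the re module: for each URL in an update, take the word after the last "/", look it up case-insensitively in the sentence, and substitute the sentence's spelling back into the update.

Code: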

import re

dict1 = {
    'Sentences': [
        'The new line',
        'Its a bright and sunny day',
        'Smartphone have taken our the World',
        'GLOBAL Warming is reaching its Peak '
    ],
    'Updates': [
        'The new abc.com/Line',
        'Its a abc.com/bright and abc.com/Sunny Day',
        'abc.com/smartphone have taken our the abc.com/WORLD',
        'abc.com/GLOBAL Warming is abc.com/Reaching its abc.com/peak'
    ]
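}

for sentence, update in zip(dict1['Sentences'], dict1['Updates']):
    # Last path segment of every whitespace-separated token that contains "/"
    urls = [x.split("/")[-1] for x in update.split() if "/" in x]
    for url in urls:
        pattern = re.escape(url)  # match the word literally, even if it contains regex metacharacters
        match = re.search(pattern, sentence, re.IGNORECASE)
        if match is None:
            continue  # word not present in the sentence: leave it unchanged
        found = match.group()
        update = re.sub(pattern, lambda _m: found, update, flags=re.IGNORECASE)

    print(f"{sentence}\t{update}")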

Output:

The new line    The new abc.com/line
Its a bright and sunny day  Its a abc.com/bright and abc.com/sunny Day
Smartphone have taken our the World abc.com/Smartphone have taken our the abc.com/World
GLOBAL Warming is reaching its Peak     abc.com/GLOBAL Warming is abc.com/reaching its abc.com/Peak
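Note that the loop above substitutes every case-insensitive occurrence of the word in the update, including substrings of other words. If only the word directly after the URL should change, one possible variant is sketched below. It is not part of the original answer: the helper name `match_case`, the pattern `URL_WORD`, and the assumption that a URL is a run of non-space characters ending in "/" followed by a single word are all illustrative.

```python
import re

# Assumed URL shape: any non-space prefix ending in "/", then one word.
URL_WORD = re.compile(r"(\S+/)(\w+)")


def match_case(sentence, update):
    """Give the word after each URL in `update` the casing it has in `sentence`."""
    # Map each lowercased sentence word to its first spelling in the sentence.
    casing = {}
    for word in re.findall(r"\w+", sentence):
        casing.setdefault(word.lower(), word)

    def fix(m):
        prefix, word = m.group(1), m.group(2)
        # Words that do not occur in the sentence are left unchanged.
        return prefix + casing.get(word.lower(), word)

    return URL_WORD.sub(fix, update)


sentences = [
    'The new line',
    'Its a bright and sunny day',
    'Smartphone have taken our the World',
    'GLOBAL Warming is reaching its Peak ',
]
updates = [
    'The new abc.com/Line',
    'Its a abc.com/bright and abc.com/Sunny Day',
    'abc.com/smartphone have taken our the abc.com/WORLD',
    'abc.com/GLOBAL Warming is abc.com/Reaching its abc.com/peak',
]

results = [match_case(s, u) for s, u in zip(sentences, updates)]
for line in results:
    print(line)
```

With the question's DataFrame, the same helper could be applied row by row, for example `df['Updates'] = [match_case(s, u) for s, u in zip(df['Sentences'], df['Updates'])]`. Building the lookup dictionary once per sentence avoids a separate regex search per URL, which may matter at 43k rows.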

This question and answer come from Stack Overflow: https://stackoverflow.com/questions/58403985/
