python - Regex to remove non-ASCII characters from both ends

I have to run this code in a loop many times. Is there a better way?

item = '!@#$abc-123-4;5.def)(*&^;\n'

or

'!@#$abc-123-4;5.def)(*&^;\n_'

or

'!@#$abc-123-4;5.def)_(*&^;\n_'

My attempt does not work:

item = re.sub('^\W|\W$', '', item)

Expected:

abc-123-4;5.def

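The end goal is to strip anything that is not in [a-zA-Z0-9] from both ends while keeping every character in the middle. The first and last characters of the result should be in the [a-zA-Z0-9] class.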

Best answer

This expression is not bounded on the left side, and it may run faster if all of your inputs resemble the example in your question:

([a-z0-9;.-]+)(.*)

Here we assume that you only want to filter out the special characters on the left and right parts of the input string.

You can include other characters and boundaries in the expression, and you can modify it into a simpler, faster expression if you wish.
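One possible simpler variant, sketched under the assumption that "special" means anything outside [a-zA-Z0-9] as the question states, is to delete the non-alphanumeric run at each end with a single substitution. The names `EDGE_JUNK` and `strip_ends` are made up for this illustration and are not part of the original answer.

```python
import re

# Assumption: anything outside [a-zA-Z0-9] counts as "special" (per the question).
# The first alternative removes the leading run, the second the trailing run.
EDGE_JUNK = re.compile(r"^[^a-zA-Z0-9]+|[^a-zA-Z0-9]+$")


def strip_ends(item: str) -> str:
    """Remove non-alphanumeric runs from both ends and keep the middle intact."""
    return EDGE_JUNK.sub("", item)


samples = [
    "!@#$abc-123-4;5.def)(*&^;\n",
    "!@#$abc-123-4;5.def)(*&^;\n_",
    "!@#$abc-123-4;5.def)_(*&^;\n_",
]
for s in samples:
    print(strip_ends(s))  # each line prints: abc-123-4;5.def
```

The `+` quantifier is what the attempt in the question lacks: `^\W|\W$` removes only one character per end, and `\W` does not cover `_`, which is why the negated class is used here instead.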

RegEx descriptive graph

This graph visualizes how the expression works; other expressions can be visualized with the same linked tool.

If you want to add another boundary on the right side, you can simply do this:

([a-z0-9;.-]+)(.*)$

You can even list your special characters explicitly on the left and right sides of the capture group.
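As a rough Python sketch of that idea, the special characters can be listed on both sides of a lazy capture group. The `SPECIAL` set below is an assumption taken from the sample strings in the question (plus `_` and whitespace), and `extract_core` is a hypothetical helper name.

```python
import re
from typing import Optional

# Assumption: this set only covers the junk characters seen in the samples.
SPECIAL = r"[!@#$)(*&^;_\s]"

# Junk on the left, the wanted core in group 1 (lazy), junk on the right.
PATTERN = re.compile(rf"{SPECIAL}*([a-z0-9;.-]+?){SPECIAL}*")


def extract_core(item: str) -> Optional[str]:
    """Return the core between the listed special characters, or None."""
    m = PATTERN.fullmatch(item)
    return m.group(1) if m else None


print(extract_core("!@#$abc-123-4;5.def)_(*&^;\n_"))  # abc-123-4;5.def
print(extract_core("abc~def"))                        # None ("~" is not listed)
```

Listing the characters makes the pattern strict: an input containing a character outside both sets does not match at all, which may or may not be what you want.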

JavaScript test

const regex = /([a-z0-9;.-]+)(.*)$/gm;
const str = `!@#\$abc-123-4;5.def)(*&^;\\n`;
let m;

while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }
    
    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}

Performance test

This JavaScript snippet measures the performance of the expression using a simple loop.

const repeat = 1000000;
const start = Date.now();

for (var i = repeat; i > 0; i--) {
	const string = '!@#\$abc-123-4;5.def)(*&^;\\n';
	const regex = /([!@#$)(*&^;]+)([a-z0-9;.-]+)(.*)$/gm;
	var match = string.replace(regex, "$2");
}

const end = Date.now() - start;
console.log("YAAAY! \"" + match + "\" is a match 💚💚💚 ");
console.log(end / 1000 + " is the runtime of " + repeat + " times benchmark test. 😳 ");
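Since the question is about Python, a comparable measurement could be sketched with the standard `timeit` module. This is only an illustration: the second pattern (a negated-class substitution) is an assumption added for comparison and is not part of the original answer, and the timings will vary by machine, so none are shown.

```python
import re
import timeit

# Sample input with a real trailing newline, as in the question.
TEXT = "!@#$abc-123-4;5.def)(*&^;\n"

# Pattern from the answer: capture the core, then the rest of the line.
CAPTURE = re.compile(r"([a-z0-9;.-]+)(.*)$")
# Assumed alternative: delete non-alphanumeric runs at both ends.
STRIP = re.compile(r"^[^a-zA-Z0-9]+|[^a-zA-Z0-9]+$")


def via_capture() -> str:
    return CAPTURE.search(TEXT).group(1)


def via_strip() -> str:
    return STRIP.sub("", TEXT)


def benchmark(repeat: int = 100_000) -> None:
    """Time both approaches over `repeat` calls and print the totals."""
    for fn in (via_capture, via_strip):
        seconds = timeit.timeit(fn, number=repeat)
        print(f"{fn.__name__}: {seconds:.3f}s for {repeat} runs")


benchmark()
```

Both patterns are compiled once outside the timed functions so that the loop measures matching rather than compilation.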

Python test

import re

regex = r"([a-z0-9;.-]+)(.*)$"
test_str = "!@#$abc-123-4;5.def)(*&^;\\n"
print(re.findall(regex, test_str))

Output

[('abc-123-4;5.def', ')(*&^;\\n')]

For more on python - Regex to remove non-ASCII characters from both ends, see the similar question on Stack Overflow: https://stackoverflow.com/questions/56069317/
