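I'm building a Node.js application that stores URLs in a database. I want to use the URL as the primary key to avoid storing duplicates. To do that, I need each URL in its simplest possible form, with extra slashes, query parameters, and prefixes removed.
How can I convert all of the URLs listed below into the same string as the first one? Is there a safe way to do this that also accounts for variations I may not have listed below?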
https://website.com/coolpage/938921/
https://www.website.com/coolpage/938921/
http://website.com/coolpage/938921/
https://website.com/coolpage/938921/
https://website.com/coolpage/938921/?awesome=1
Best answer
Use the standard Node.js url module.
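As a quick illustration of what the module offers, here is a minimal sketch of how the WHATWG URL class splits a URL into parts. The sample URL (including the #top fragment) is made up for the example; the property names are the standard ones.

```javascript
// URL is also available as a global in current Node.js releases;
// the explicit import is kept here for clarity.
const { URL } = require('url');

const parsed = new URL('https://www.website.com/coolpage/938921/?awesome=1#top');

// Each component is exposed as a separate property.
const parts = {
  protocol: parsed.protocol, // 'https:'
  host: parsed.host,         // 'www.website.com'
  pathname: parsed.pathname, // '/coolpage/938921/'
  search: parsed.search,     // '?awesome=1'
  hash: parsed.hash,         // '#top'
};
console.log(parts);
```

Building the key from host and pathname alone is what drops the scheme, the query string, and the fragment.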
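Solution:
const { URL } = require('url');

// Reduce a URL to "host/path": no scheme, no "www." prefix, no query string,
// and no duplicate or trailing slashes.
function getBaseUrl(url) {
  const u = new URL(url);
  const result = `${u.host}${u.pathname}`
    .replace(/\/{2,}/g, '/') // collapse any run of slashes into a single one
    .replace(/^www\./, '');  // strip only a leading "www."
  // cut off the trailing '/' character from the result
  return result.endsWith('/') ? result.slice(0, -1) : result;
}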
Test:
const urls = [
  "https://website.com/coolpage/938921/",
  "https://www.website.com/coolpage/938921/",
  "http://website.com/coolpage/938921/",
  "https://website.com/coolpage/938921/",
  "https://website.com/coolpage/938921/?awesome=1",
  "https://website.com/coolpage/938921?awesome=1",
  "https:///website.com//coolpage//938921//"
];
for (let i = 0; i < urls.length; i++) {
  const u = getBaseUrl(urls[i]);
  console.log(`${i}: ${u}`);
}
Console output:
0: website.com/coolpage/938921
1: website.com/coolpage/938921
2: website.com/coolpage/938921
3: website.com/coolpage/938921
4: website.com/coolpage/938921
5: website.com/coolpage/938921
6: website.com/coolpage/938921
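The question also asks about variations that aren't listed. Below is a rough sketch of a more defensive variant, not part of the original answer: toUrlKey is a hypothetical helper name, and the sample inputs are made up. It assumes you want invalid input reported as null rather than thrown, and it relies on the URL parser to lowercase the hostname and to omit default ports.

```javascript
const { URL } = require('url');

// Hypothetical helper: returns a normalized key, or null when the input
// cannot be parsed as an absolute URL.
function toUrlKey(input) {
  let u;
  try {
    u = new URL(input);
  } catch (err) {
    return null; // new URL() throws a TypeError on invalid input
  }
  // hostname is lowercased by the parser and does not include the port
  const host = u.hostname.replace(/^www\./, '');
  // u.port is '' when the port is the scheme default (80 for http, 443 for https)
  const port = u.port ? `:${u.port}` : '';
  const path = u.pathname
    .replace(/\/{2,}/g, '/') // collapse runs of slashes
    .replace(/\/+$/, '');    // drop trailing slashes
  return `${host}${port}${path}`;
}

console.log(toUrlKey('HTTPS://WWW.Website.com:443//coolpage///938921/?awesome=1#top'));
// website.com/coolpage/938921
console.log(toUrlKey('http://website.com:8080/coolpage/938921/'));
// website.com:8080/coolpage/938921
console.log(toUrlKey('not a url'));
// null
```

Keep in mind that dropping the query string, as the answer does, treats pages that differ only by their parameters as the same entry. That fits the URLs in the question but may not fit every site.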
For more on javascript - JS: Convert URL into its simplest form, see the similar question on Stack Overflow: https://stackoverflow.com/questions/54472715/