javascript - Bidirectional, URL-friendly slug function for Node.js

Tags: javascript node.js unicode slug

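I'm trying to create a function in Node.js that takes an article title, whether it is written in Arabic, in a Latin-script language, or in a combination of both, and converts it into a URL-friendly string that respects the text direction.

Currently, everything works perfectly as long as text of different directions isn't mixed. Here are some tests in different languages: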

makeURLFriendly("Est-ce que vous avez des frères et sœurs? (Do you have siblings?)")
// French test, returns:
// est-ce-que-vous-avez-des-freres-et-soeurs-do-you-have-siblings

makeURLFriendly("Kannst du/ Können Sie mir helfen?")
// German test, returns:
// kannst-du-konnen-sie-mir-helfen

makeURLFriendly("A=+n_the)m, w!h@a#`t w~e k$n%o^w s&o f*a(r!")
// English with a bunch of symbols test, returns:
// anthem-what-we-know-so-far

makeURLFriendly("إليك أقوى برنامج إسترجاع ملفات في العالم بعرض حصري !")
// Arabic test, returns:
// إليك-أقو-برنامج-إسترجاع-ملفات-في-العالم-بعرض-حصري

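The problems start when languages of both directions are used together, and they lie not only in what the function returns but also in what is fed to it. For example, when I try to type a test title that mixes Arabic and English, I get something like this: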

ماكروسوفت تطور من Outlook.com

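The direction is messed up, but I noticed that when I paste the same string into Facebook, it gets fixed:

[Image: a Facebook message]

How can I achieve the same result in Node.js before feeding the string to the makeURLFriendly function?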

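Best answer

The solution is to add the "Right-to-Left Embedding" character, U+202B, to the start of the string and before every left-to-right word.

Here is the final function, in case anyone needs it: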
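
As a minimal sketch of that idea in isolation: the helper name `embedRightToLeft` is hypothetical, and the sketch assumes the input has already been cleaned into a hyphen-separated slug. It is not the full function, only the embedding step.

```javascript
// Hypothetical helper, for illustration only. It assumes `slug` is already
// a hyphen-separated string of Arabic and Latin words.
const RLE = "\u202B" // U+202B RIGHT-TO-LEFT EMBEDDING

// Matches any character in the Hebrew or Arabic Unicode blocks.
const isRightToLeft = /[\u0590-\u05ff\u0600-\u06ff]/

const embedRightToLeft = slug =>
    RLE + slug
        .split("-")
        // Prefix every word that is not right-to-left with RLE.
        .map(word => (isRightToLeft.test(word) ? word : RLE + word))
        .join("-")

const result = embedRightToLeft("ماكروسوفت-تطور-من-outlook-com")
// Escape the invisible character so its positions can be inspected.
console.log(result.replace(/\u202B/g, "\\u202B"))
```

Splitting `result` on "-" still yields the words in their logical order; only the invisible embedding characters are added.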

const makeURLFriendly = string => {
    let urlFriendlyString = ""

    // Initial clean up.
    string = string
        // Remove spaces from start and end.
        .trim()
        // Changes all characters to lower case.
        .toLowerCase()
        // Replace symbols with a space.
        .replace(/[`~!@#$%^&*()_|+\-=?;:'",.<>\{\}\[\]\\\/]/g, " ")

    // Special characters and the characters they will be replaced by.
    const specialCharacters = "àáäâãåăæçèéëêǵḧìíïîḿńǹñòóöôœṕŕßśșțùúüûǘẃẍÿź"
    const replaceCharacters = "aaaaaaaaceeeeghiiiimnnnoooooprssstuuuuuwxyz"
    // Creates a regular expression that matches all the special characters
    // from the specialCharacters constant. Will make something like this:
    // /à|á|ä/g and matches à or á or ä...
    const specialCharactersRegularExpression = new RegExp(
        specialCharacters.split("").join("|"),
        "g"
    )
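    // Replaces special characters by their url friendly equivalent.
    string = string
        // Expand the ligature first. The single-character map below
        // turns "œ" into "o", so doing this afterwards would never match.
        .replace(/œ/g, "oe")
        .replace(
            specialCharactersRegularExpression,
            matchedCharacter => replaceCharacters.charAt(
                specialCharacters.indexOf(matchedCharacter)
            )
        )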

    // Only keeps Arabic, English and numbers in the string.
    const arabicLetters = "ىشغظذخثتسرقضفعصنملكيطحزوهدجبأاإآلإلألآؤءئة"
    const englishLetters = "abcdefghijklmnopqrstuvwxyz"
    const numbers = "0123456789"
    for (let character of string) {
        if (character === " ") {
            urlFriendlyString += character
            continue
        }
        const characterIsURLFriendly = Boolean(
            arabicLetters.includes(character) ||
            englishLetters.includes(character) ||
            numbers.includes(character)
        )
        if (characterIsURLFriendly) urlFriendlyString += character
    }

    // Clean up before text direction algorithm.
    // Replace each run of whitespace with a single hyphen.
    urlFriendlyString = urlFriendlyString.replace(/\s+/g, "-")

    // Regular expression that matches strings that have
    // right to left direction.
    const isRightToLeft = /[\u0590-\u05ff\u0600-\u06ff]/u
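    // Makes an array of all the words in urlFriendlyString, skipping the
    // empty entries produced by leading, trailing or repeated hyphens so
    // they are not mistaken for left to right words.
    let words = urlFriendlyString.split("-").filter(word => word !== "")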

    // Checks if urlFriendlyString is a unidirectional string, that is,
    // whether every word has the same direction as the first word.
    const wordDirections = words.map(word => isRightToLeft.test(word))
    const stringIsUnidirectional = wordDirections.every(
        isWordRightToLeft => isWordRightToLeft === wordDirections[0]
    )

    // If the string is unidirectional, there is no need for
    // it to pass through our bidirectional algorithm.
    if (stringIsUnidirectional) {
        return urlFriendlyString
            // Replaces multiple hyphens by one hyphen
            .replace(/-+/g, "-")
            // Remove hyphen from start.
            .replace(/^-+/, "")
            // Remove hyphen from end.
            .replace(/-+$/, "")
    }

    // Reset urlFriendlyString so we can rewrite it in the
    // direction we want.
    urlFriendlyString = ""
    // Add U+202B "Right to Left Embedding" character to the
    // start of the words array.
    words.unshift("\u202B")
    // Loop through the values of the words array.
    for (let word of words) {
        // Concatenate - before every word (the first one will
        // be cleaned later on).
        urlFriendlyString += "-"
        // If the word isn't right to left, concatenate the "Right
        // to Left Embedding" character before the word.
        if (!isRightToLeft.test(word)) urlFriendlyString += `\u202B${word}`
        // Otherwise just concatenate the word.
        else urlFriendlyString += word
    }

    return urlFriendlyString
        // Replaces multiple hyphens by one hyphen.
        .replace(/-+/g, "-")
        // Remove hyphen from start.
        .replace(/^-+/, "")
        // Remove hyphen from end.
        .replace(/-+$/, "")
        // The character U+202B is invisible, so if it is at the start
        // or the end of the string, the two regular expressions above
        // won't match and the string will look like it still has
        // hyphens at the start or the end.
        .replace(/^\u202B-+/, "")
        .replace(/-+\u202B$/, "")
        // Removes the first U+202B that is directly followed by hyphens,
        // together with those hyphens (there is no g flag).
        .replace(/\u202B-+/, "")

}

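Also, when I .split() the returned string, the words come out in the right order. Maybe that is good for SEO.

The console I use displays Arabic characters incorrectly or not at all, so I wrote this script, which writes to a file, to test the function's return values: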

const fs = require("fs")

const test = () => {
    const writeStream = fs.createWriteStream("./test.txt")

    writeStream.write(makeURLFriendly("Est-ce que vous avez des frères et sœurs? (Do you have siblings?)"))
    writeStream.write("\n")
    writeStream.write(makeURLFriendly("Quel est ton/votre film préféré? (What’s your favorite movie?)"))
    writeStream.write("\n")
    writeStream.write(makeURLFriendly("Kannst du/ Können Sie mir helfen?"))
    writeStream.write("\n")
    writeStream.write(makeURLFriendly("Ich bin (Übersetzer/Dolmetscher) / Geschäftsmann"))
    writeStream.write("\n")
    writeStream.write(makeURLFriendly("你吃饭了吗"))
    writeStream.write("\n")
    writeStream.write(makeURLFriendly("慢慢吃"))
    writeStream.write("\n")
    writeStream.write(makeURLFriendly("# (sd sdsds   (lakem 0.5) "))
    writeStream.write("\n")
    writeStream.write(makeURLFriendly("A=+n_the)m, w!h@a#`t w~e k$n%o^w s&o f*a(r!"))
    writeStream.write("\n")
    writeStream.write(makeURLFriendly("كيف تجد النيش ذات النقرات مرتفعة الثمن في أدسنس"))
    writeStream.write("\n")
    writeStream.write(makeURLFriendly("إليك أقوى برنامج إسترجاع ملفات في العالم بعرض حصري !"))
    writeStream.write("\n")
    writeStream.write(makeURLFriendly("عاجل ...  شركة Oppo تستعرض هاتفها الجديد  Eno"))
    writeStream.write("\n")
    writeStream.write(makeURLFriendly("إنترنيت الجيل الخامس مميزاتها ! و هل صحيح ما يقوله الخبراء عن سوء إستخدامها من طرف الصين للتجسس ؟؟"))
    writeStream.write("\n")
    writeStream.write(makeURLFriendly("لماذا إنخفضت أسهم شركة Apple بنسبة %20 ؟؟"))
    writeStream.write("\n")
    writeStream.write(makeURLFriendly("10 نصائح لتصبح محترف في مجال Dropshipping"))
    writeStream.write("\n")
    writeStream.write(makeURLFriendly(`"إيلون ماسك" و "زوكربرغ"... ما سبب الخلاف يا ترى ؟`))
    writeStream.write("\n")
    writeStream.write(makeURLFriendly(`ماكروسوفت تطور من Outlook.com`))
    writeStream.write("\n")
    writeStream.write(makeURLFriendly(`ما هو  HTTPS  و هل يضمن الأمان %100 ؟؟`))
    writeStream.write("\n")
    writeStream.write(makeURLFriendly(`ما هي خدمة  Apple TV+ و لماذا هذا التوقيت ؟؟`))
    writeStream.write("\n")
    writeStream.write(makeURLFriendly(`مُراجعة هاتف سَامسونغ S10 Plus`))

    // Close the stream once everything has been written.
    writeStream.end()
}

test()
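
Because U+202B is invisible and some consoles cannot render Arabic, another way to inspect a slug is to print it as escaped code points. The sketch below is only an illustration and not part of the original answer; `escapeNonASCII` is a hypothetical helper name, and the sample string is written by hand in the shape the function is expected to return.

```javascript
// Hypothetical debugging helper: renders every non-ASCII character as a
// \u{...} escape so invisible characters such as U+202B become visible.
const escapeNonASCII = text =>
    Array.from(text)
        .map(character => {
            const codePoint = character.codePointAt(0)
            return codePoint < 128
                ? character
                : `\\u{${codePoint.toString(16).toUpperCase()}}`
        })
        .join("")

// A hand-written sample: U+202B, an Arabic word, a hyphen,
// then U+202B again followed by a Latin word.
const sample = "\u202B\u0645\u0646-\u202Boutlook"
console.log(escapeNonASCII(sample))
// prints: \u{202B}\u{645}\u{646}-\u{202B}outlook
```

Passing the return value of makeURLFriendly through the same helper would show exactly where the embedding characters ended up, without relying on the console's rendering.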

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/55613514/
