python - Parsing Unicode letters with pyparsing

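I need to use pyparsing on Unicode characters. I tried the simple example from their GitHub repository with a word containing a French accented character ("cédille"), but it raises an error.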

My code:

from pyparsing import Word, alphas
greet = Word(alphas) + "," + Word(alphas) + "!"
hello = "Hello, cédille!"
greet.parseString(hello)

It fails with this error:

pyparsing.ParseException: Expected "!" (at char 8), (line:1, col:9)

Is there a way to solve this?

Best answer

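Pyparsing has a pyparsing_unicode module that defines a number of Unicode character ranges, each with its own definitions of alphas, nums, and so on. The ranges include CJK, Cyrillic, Devanagari, Hebrew, Arabic, and others. The greetingInGreek.py and greetingInKorean.py examples in the examples directory show a couple of them in action.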
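As a minimal sketch of how one of these ranges is used (written in the spirit of the Greek greeting example, not copied from it; the sample sentence and the variable names are assumptions chosen for illustration):

```python
from pyparsing import Word, pyparsing_unicode as ppu

# Alphabetic characters taken from the Greek range only.
greek_alphas = ppu.Greek.alphas

# Same grammar shape as in the question: word, comma, word, exclamation mark.
greet = Word(greek_alphas) + "," + Word(greek_alphas) + "!"

hello = "Καλημέρα, κόσμε!"  # illustrative input ("Good morning, world!")
result = greet.parseString(hello)
print(result.asList())
```

Swapping ppu.Greek for another range such as ppu.Cyrillic should work the same way, as long as the input text falls inside that range.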

Your example would use the Latin1 set, like this:

from pyparsing import Word, pyparsing_unicode as ppu
intl_alphas = ppu.Latin1.alphas
greet = Word(intl_alphas) + "," + Word(intl_alphas) + "!"
hello = "Hello, cédille!"
print(greet.parseString(hello))

This prints:

['Hello', ',', 'cédille', '!']
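To see why the original grammar stopped at char 8, here is a small standard-library sketch that finds the letters in the input that fall outside the ASCII alphabet. The name ascii_alphas is a stand-in introduced here, on the assumption that pyparsing's alphas covers only the ASCII upper- and lowercase letters:

```python
import string

# Assumption: stand-in for pyparsing.alphas (ASCII letters only).
ascii_alphas = string.ascii_uppercase + string.ascii_lowercase

hello = "Hello, cédille!"

# Collect (index, character) pairs for letters the ASCII set does not cover.
outside = [(i, ch) for i, ch in enumerate(hello)
           if ch.isalpha() and ch not in ascii_alphas]
print(outside)
```

With an ASCII-only letter set, the second Word matches just "c", and the parser then looks for "!" at the position of "é", which is the position reported in the exception.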

alphas8bit will probably be kept for legacy support, but new applications should use pyparsing_unicode.Latin1.alphas.
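For comparison, a sketch that puts the legacy spelling next to the recommended one. It assumes a pyparsing version that still exports alphas8bit; the variable names are illustrative:

```python
from pyparsing import Word, alphas, alphas8bit, pyparsing_unicode as ppu

hello = "Hello, cédille!"

# Legacy approach: ASCII letters plus the 8-bit accented letters.
legacy_word = Word(alphas + alphas8bit)
legacy = legacy_word + "," + legacy_word + "!"

# Recommended approach: the Latin1 range from pyparsing_unicode.
latin1_word = Word(ppu.Latin1.alphas)
modern = latin1_word + "," + latin1_word + "!"

legacy_tokens = legacy.parseString(hello).asList()
modern_tokens = modern.parseString(hello).asList()
print(legacy_tokens == modern_tokens)
```

Both grammars should accept this particular input; the Latin1 form is the one the answer recommends for new code.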

Regarding python - parsing Unicode letters with pyparsing, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/59106516/
