Python - Syntax checking of molecular formulas

Tags: python parsing syntax formula bnf

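This is a long question, so please bear with me.

I am writing a program that checks whether the syntax of a molecular formula is correct.

I have a BNF grammar: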

<formel>::= <mol> \n
<mol>   ::= <group> | <group><mol>
<group> ::= <atom> |<atom><num> | (<mol>) <num>
<atom>  ::= <LETTER> | <LETTER><letter>
<LETTER>::= A | B | C | ... | Z
<letter>::= a | b | c | ... | z
<num>   ::= 2 | 3 | 4 | ...

Here is my code:

from linkedQFile import LinkedQ
import string
import sys

ATOMER = ["H","He","Li","Be","B","C","N","O","F","Ne","Na","Mg","Al","Si","P","S","Cl","Ar"]

class FormelError(Exception):
    pass
class Gruppfel(Exception):
    pass

q = LinkedQ()
formel= "(Cl)2)3"

for symbol in formel:
    q.put(symbol)


def readNum():
    """Reads digits larger than 1. Raises exception if condition is not fulfilled."""
    try:
        if int(q.peek()) >= 2:
            print(q.peek())
            q.get()
            return
        else:
            q.get()
            print("Too small digit at the end of row: "+getRest())
            sys.exit()
    except (ValueError,TypeError):
        raise FormelError("Not a number.")

def readletter():
    """Reads lowercase letters and returns them."""
    if q.peek() in string.ascii_lowercase:
        print(q.peek())
        return q.get()
    else:
        raise FormelError("Expected lowercase letter.")

def readLetter():
    """Reads capital letters and returns them."""
    if q.peek() in string.ascii_uppercase:
        print(q.peek())
        return q.get()
    else:
        raise FormelError("Expected capital letter.")

def readAtom():
    """Reads atoms on the form X and Xx. Raises Exception if the format for an atom is not fulfilled or if the atom does not exist."""
    X = ""
    try:
        X += readLetter()
    except FormelError:
        print("Missing capital letter at end of row: "+getRest())
        sys.exit()  
        return

    try:
        x = readletter()
        atom = X+x
    except (FormelError, TypeError):
        atom = X

    if atom in ATOMER:
        return
    else:
        raise FormelError("Unknown atom.")

def readGroup():
    if q.peek() in string.ascii_uppercase or q.peek() in string.ascii_lowercase:
        try:
            readAtom()
        except:
            print("Unknown atom at end of row: "+getRest())
            sys.exit()
        try:
            while True:
                readNum()
        except FormelError:
            pass
        return
    if q.peek() == "(":
        print(q.peek())
        q.get()
        try:
            readMol()
        except FormelError:
            pass
        if q.peek() == ")":
            print(q.peek())
            q.get()
        else:
            print("Missing right parenthesis at end of row: "+ getRest())
            sys.exit()
            return
        digitfound = False
        try:
            while True:
                readNum()
                digitfound = True
        except:
            if digitfound:
                return
            print("Missing digit at end of row: "+getRest())
            sys.exit()
            return
    raise FormelError("Incorrect start of group")

def readMol():
    try:
        readGroup()
    except FormelError:
        print("Incorrect start of group at end of row: "+getRest()) 
        raise FormelError
    if q.peek() == None:
        return
    if not q.peek() == ")": 
        try:
            readMol()
        except FormelError:
            pass

def readFormel():
    try:
        readMol()
    except:
        return
    print("Correct formula")

def getRest():
    rest = ""
    while not q.isEmpty():
        rest += q.get()
    return rest

readFormel()

The code is supposed to accept some given formulas and to print an error message for some given incorrect ones. Here are the formulas:

Correct: Si(C3(COOH)2)4(H2O)7

Incorrect: H2O)Fe

(Cl)2)3

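The program accepts the correct formula, but unfortunately it also accepts the incorrect ones. The reason is that the if statement in readMol:

if not q.peek() == ")":
    try:
        readMol()
    except FormelError:
        pass

lets parentheses that are unbalanced on the right (one or more closing parentheses too many) slip through instead of being detected as an incorrect start of a group. How can I fix this while still having Si(C3(COOH)2)4(H2O)7 accepted as syntactically correct?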

Thanks for your patience :)

Best answer

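Your readMol code has that incorrect test for ")" (as you point out yourself). If you are writing a recursive descent parser (as you appear to be), your grammar does not call for any such test.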

In fact, your grammar has an odd rule for mol:

<mol>   ::= <group> | <group><mol>

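A rule in this form does not work directly in a recursive descent parser, because both alternatives begin with <group> and the parser cannot tell which one to choose. You have to refactor such rules so that the common prefix of the alternatives is shared. In this case, that is easy: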

<mol>   ::= <group> ( <mol> | empty ) ;

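Then write the code directly from the grammar rules, following the recursive descent approach mentioned above. [That is almost what you did, apart from the ")" check.] It should look something like this (I'm not a Python expert):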

def readMol():
    try:
        readGroup()
    except FormelError:
        print("Incorrect start of group at end of row: "+getRest())
        raise FormelError
    try:
        readMol()
    except FormelError:
        pass
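
A variant of the same recognizer picks between the two alternatives of ( <mol> | empty ) by peeking at the next symbol instead of catching an exception. The sketch below rests on assumptions that are not in the question: it works on a plain string and an index rather than the LinkedQ queue, and toy_group is a hypothetical stand-in for readGroup that accepts a single capital letter, so that the mol rule can be shown in isolation.

```python
import string

# Symbols that can begin a <group>: a capital letter (start of an atom) or "(".
GROUP_START = set(string.ascii_uppercase) | {"("}


def read_mol(text, pos, read_group):
    """Recognize <mol> ::= <group> ( <mol> | empty ) starting at text[pos].

    Returns the index just past the recognized <mol>.
    """
    pos = read_group(text, pos)                  # shared prefix: <group>
    if pos < len(text) and text[pos] in GROUP_START:
        return read_mol(text, pos, read_group)   # first alternative: <mol>
    return pos                                   # second alternative: empty


def toy_group(text, pos):
    """Hypothetical stand-in for readGroup: accepts one capital letter."""
    if pos < len(text) and text[pos] in string.ascii_uppercase:
        return pos + 1
    raise ValueError(f"Incorrect start of group at position {pos}")
```

With these definitions, read_mol("HCN)", 0, toy_group) stops at index 3, just before the ")". The rule itself never mentions ")"; whether a ")" is legal at that point is left to whoever called read_mol.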

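When writing a recursive descent parser, it helps to first massage the grammar into the most compatible form (as I did with your mol rule). Coding the individual recognizers is then a purely mechanical task that is hard to get wrong.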
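
To make this concrete, here is a minimal self-contained sketch of a checker written rule by rule from the grammar. Several things in it are assumptions rather than part of the question: it reads from a plain string with an index instead of the LinkedQ queue, it raises a single FormulaError instead of calling sys.exit(), the names FormulaParser and check are made up for illustration, the end of the string stands in for the newline in <formel>, and numbers with a leading zero are rejected. The point it tries to show is that read_mol never tests for ")"; a stray ")" is caught because read_formula requires the input to be used up once <mol> has been read.

```python
import string

ATOMS = {"H", "He", "Li", "Be", "B", "C", "N", "O", "F", "Ne",
         "Na", "Mg", "Al", "Si", "P", "S", "Cl", "Ar"}


class FormulaError(Exception):
    """Raised with a message describing the first syntax error found."""


class FormulaParser:
    def __init__(self, text):
        self.text = text
        self.pos = 0

    def peek(self):
        """Return the next character, or None at end of input."""
        return self.text[self.pos] if self.pos < len(self.text) else None

    def fail(self, message):
        raise FormulaError(f"{message} at end of row: {self.text[self.pos:]}")

    def read_formula(self):
        # <formel> ::= <mol> \n   (assumption: end of string replaces \n)
        self.read_mol()
        if self.peek() is not None:
            # Leftover input, e.g. an unmatched ")", cannot start a group.
            self.fail("Incorrect start of group")

    def read_mol(self):
        # <mol> ::= <group> ( <mol> | empty )
        self.read_group()
        nxt = self.peek()
        if nxt is not None and (nxt in string.ascii_uppercase or nxt == "("):
            self.read_mol()

    def read_group(self):
        # <group> ::= <atom> | <atom><num> | (<mol>) <num>
        ch = self.peek()
        if ch == "(":
            self.pos += 1
            self.read_mol()
            if self.peek() != ")":
                self.fail("Missing right parenthesis")
            self.pos += 1
            if not self.read_num():
                self.fail("Missing digit")
        elif ch is not None and ch in string.ascii_letters:
            self.read_atom()
            self.read_num()          # the number is optional after an atom
        else:
            self.fail("Incorrect start of group")

    def read_atom(self):
        # <atom> ::= <LETTER> | <LETTER><letter>
        ch = self.peek()
        if ch is None or ch not in string.ascii_uppercase:
            self.fail("Missing capital letter")
        atom = ch
        self.pos += 1
        nxt = self.peek()
        if nxt is not None and nxt in string.ascii_lowercase:
            atom += nxt
            self.pos += 1
        if atom not in ATOMS:
            self.fail("Unknown atom")

    def read_num(self):
        # <num> ::= 2 | 3 | 4 | ...   Returns False when no digits are present.
        start = self.pos
        while self.peek() is not None and self.peek() in string.digits:
            self.pos += 1
        digits = self.text[start:self.pos]
        if not digits:
            return False
        # Assumption: 0, 1 and numbers with a leading zero are rejected.
        if digits[0] == "0" or int(digits) < 2:
            self.fail("Too small number")
        return True


def check(formula):
    """Return "Correct formula" or the message for the first error found."""
    try:
        FormulaParser(formula).read_formula()
    except FormulaError as err:
        return str(err)
    return "Correct formula"
```

For the three formulas in the question, check("Si(C3(COOH)2)4(H2O)7") returns "Correct formula", check("H2O)Fe") returns "Incorrect start of group at end of row: )Fe", and check("(Cl)2)3") returns "Incorrect start of group at end of row: )3".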

Regarding Python - Syntax checking of molecular formulas, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/29738017/
