Python regex for parsing SQL statements

Tags: python regex

I need to use a regular expression to parse some information out of a SQL DDL statement. The SQL statement looks like this:

CREATE TABLE default.table1 (DATA4 BIGINT, DATA5 BIGINT, DATA2 BIGINT, DATA3 BIGINT)
USING parquet
OPTIONS (
  serialization.format '1'
)
PARTITIONED BY (DATA2, DATA3)

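I need to parse it in Python and extract the columns named in the PARTITIONED BY clause. I have come up with a regex that works once the newlines are stripped, but I cannot get it to work when the statement contains newlines. Here is some demo code: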

import re
def print_partition_columns_if_found(ddl_string):
    regex = r'CREATE +?(TEMPORARY +)?TABLE *(?P<db>.*?\.)?(?P<table>.*?)\((?P<col>.*?)\).*?USING +([^\s]+)( +OPTIONS *\([^)]+\))?( *PARTITIONED BY \((?P<pcol>.*?)\))?'
    match = re.search(regex, ddl_string, re.MULTILINE | re.DOTALL)
    if match.group("pcol"):
        print(match.group("pcol").strip())
    else:
        print('did not find any pcols in {matches}'.format(matches=match.groups()))


ddl_string1 = """
CREATE TABLE default.table1 (DATA4 BIGINT, DATA5 BIGINT, DATA2 BIGINT, DATA3 BIGINT)
USING parquet OPTIONS (serialization.format '1') PARTITIONED BY (DATA2, DATA3)"""
print_partition_columns_if_found(ddl_string1)

print("--------")

ddl_string2 = """
CREATE TABLE default.table1 (DATA4 BIGINT, DATA5 BIGINT, DATA2 BIGINT, DATA3 BIGINT)
USING parquet
OPTIONS (
  serialization.format '1'
)
PARTITIONED BY (DATA2, DATA3)
"""
print_partition_columns_if_found(ddl_string2)

This returns:

DATA2, DATA3
--------
did not find any pcols in (None, 'default.', 'table1 ', 'DATA4 BIGINT, DATA5 BIGINT, DATA2 BIGINT, DATA3 BIGINT', 'parquet', None, None, None)

Would any regex expert be willing to help me?

Best answer

Let's look at the python sqlparse documentation (Documentation - getting started):

>>> import sqlparse
>>> ddl_string2 = """
... CREATE TABLE default.table1 (DATA4 BIGINT, DATA5 BIGINT, DATA2 BIGINT, DATA3 BIGINT)
... USING parquet
... OPTIONS (
...   serialization.format '1'
... )
... PARTITIONED BY (DATA2, DATA3)
... """
>>> ddl_string1 = """
... CREATE TABLE default.table1 (DATA4 BIGINT, DATA5 BIGINT, DATA2 BIGINT, DATA3 BIGINT)
... USING parquet OPTIONS (serialization.format '1') PARTITIONED BY (DATA2, DATA3)"""
>>> def print_partition_columns_if_found(sql):
...     parse = sqlparse.parse(sql)
...     data = next(item for item in reversed(parse[0].tokens) if item.ttype is None)[1]
...     print(data)
...
>>> print_partition_columns_if_found(ddl_string1)
DATA2, DATA3
>>> print_partition_columns_if_found(ddl_string2)
DATA2, DATA3
>>>
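
If adding a dependency is not an option, a regex-only fix is also possible. The pattern in the question fails on the multi-line DDL most likely because ` +` and ` *` match only literal spaces, so the newline before OPTIONS and before PARTITIONED BY makes both optional groups match nothing; re.DOTALL changes what `.` matches, not what a space matches. Below is a minimal sketch that uses `\s` instead. The names PARTITION_RE and partition_columns are hypothetical, and the pattern assumes simple identifiers and an OPTIONS block without nested parentheses; it is not a general SQL parser.

```python
import re

# Hypothetical pattern: same clause order as the question's regex, but every
# gap between keywords is matched with \s (which includes newlines).
PARTITION_RE = re.compile(
    r'CREATE\s+(?:TEMPORARY\s+)?TABLE\s+(?:(?P<db>\w+)\.)?(?P<table>\w+)\s*'
    r'\((?P<col>.*?)\)\s*'                      # column list, up to the ')' before USING
    r'USING\s+(?P<fmt>\S+)'                     # storage format, e.g. parquet
    r'(?:\s+OPTIONS\s*\([^)]*\))?'              # optional OPTIONS (...) block
    r'(?:\s*PARTITIONED\s+BY\s*\((?P<pcol>[^)]*)\))?',  # optional partition columns
    re.IGNORECASE | re.DOTALL,
)


def partition_columns(ddl_string):
    """Return the PARTITIONED BY column names, or an empty list if none are found."""
    match = PARTITION_RE.search(ddl_string)
    if match is None or match.group('pcol') is None:
        return []
    return [name.strip() for name in match.group('pcol').split(',') if name.strip()]


ddl_string2 = """
CREATE TABLE default.table1 (DATA4 BIGINT, DATA5 BIGINT, DATA2 BIGINT, DATA3 BIGINT)
USING parquet
OPTIONS (
  serialization.format '1'
)
PARTITIONED BY (DATA2, DATA3)
"""
print(partition_columns(ddl_string2))  # ['DATA2', 'DATA3']
```

Returning a list rather than printing keeps the function easy to reuse; for DDL with quoted identifiers, comments, or nested parentheses in OPTIONS, a real parser such as sqlparse is the safer choice.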

This question about a Python regex for parsing SQL statements comes from a similar question found on Stack Overflow: https://stackoverflow.com/questions/49321785/
