parsing - How do I use the PARSE dialect to read a line from a CSV?

Tags: parsing csv rebol

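I'm trying to use PARSE to turn a CSV line into a Rebol block. It is easy enough to write in open code, but as with my other questions, I'm trying to learn what the dialect itself can do.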

So if a line reads:

"Look, that's ""MR. Fork"" to you!",Hostile Fork,,http://hostilefork.com

Then I want the block:
[{Look, that's "MR. Fork" to you!} {Hostile Fork} none {http://hostilefork.com}]

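Issues to note:
  • Embedded quotes in CSV strings are indicated with ""
  • Commas can appear inside quotes, and are then part of the literal rather than a column divider
  • Adjacent column-dividing commas indicate an empty field
  • Strings that contain no quotes or commas can appear without quotes
  • For the moment, we can keep things like http://rebol.com as STRING! instead of LOADing them into types such as URL!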

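To make it more uniform, the first thing I do is append a comma to the input line. Then I have a column-rule that captures a single column terminated by a comma; the column may or may not be in quotes.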

I know how many columns there should be because of the header line, so the code looks like this:
    unless parse line compose [(column-count) column-rule] [
        print rejoin [{Expected } column-count { columns.}]
    ]
    

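But I'm a bit stuck on writing column-rule. I need a way to express in the dialect: "Once you find a quote, keep skipping quote pairs until you find a quote standing on its own." What's a good way to do that?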

Best answer

As with most parsing problems, I try to build a grammar that best describes the elements of the input format.

In this case, we have nouns:

    [comma ending value-chars qmark quoted-chars value header row]
    

Some verbs:
    [row-feed emit-value]
    

And the operational nouns:
    [current chunk current-row width]
    

I suppose I could break it down a little more, but this is enough to work with. First, the foundation:
    comma: ","
    ending: "^/"
    qmark: {"}
    value-chars: complement charset reduce [qmark comma ending]
    quoted-chars: complement charset reduce [qmark]
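
As a quick check of what these character sets match, here is a minimal sketch, assuming Rebol 2 (the answer uses parse/all, which is Rebol 2 syntax). The sample strings and the word field are made up for illustration and are not part of the answer:

```rebol
REBOL []

; Same definitions as in the answer
comma: ","
ending: "^/"
qmark: {"}
value-chars: complement charset reduce [qmark comma ending]

; An unquoted field matches as a whole (spaces are ordinary value characters)
print parse/all "Hostile Fork" [some value-chars]    ; true

; Matching stops at the comma, so PARSE does not reach the end of the input
print parse/all "Hostile,Fork" [some value-chars]    ; false

; COPY captures only the run of value characters before the comma
; (field is a hypothetical word used just for this demonstration)
parse/all "Hostile,Fork" [copy field some value-chars to end]
print field    ; Hostile
```

Because value-chars excludes the quote, the comma, and the newline, an unquoted field ends at whichever of those comes first.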
    

Now the value structure. Quoted values are built up from chunks of valid characters or quotes as we find them:
    current: chunk: none
    quoted-value: [
        qmark (current: copy "")
        any [
            copy chunk some quoted-chars (append current chunk)
            |
            qmark qmark (append current qmark)
        ]
        qmark
    ]
    
    value: [
        copy current some value-chars
        | quoted-value
    ]
    
    emit-value: [
        (
            delimiter: comma
            append current-row current
        )
    ]
    
    emit-none: [
        (
            delimiter: comma
            append current-row none
        )
    ]
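
To see how the quoted-value rule copes with doubled quotes, the sketch below runs it on its own against the first field of the sample line from the question. It assumes Rebol 2, and the word sample is introduced only for this demonstration:

```rebol
REBOL []

qmark: {"}
quoted-chars: complement charset reduce [qmark]

; Rule as given in the answer
current: chunk: none
quoted-value: [
    qmark (current: copy "")
    any [
        ; a run of ordinary characters is appended as one chunk
        copy chunk some quoted-chars (append current chunk)
        |
        ; a pair of quotes stands for one literal quote
        qmark qmark (append current qmark)
    ]
    qmark    ; a quote that is not part of a pair closes the field
]

; Hypothetical test input: the first field of the question's CSV line
sample: {"Look, that's ""MR. Fork"" to you!"}

if parse/all sample quoted-value [
    print current    ; Look, that's "MR. Fork" to you!
]
```

The pair alternative fails on the final quote because no second quote follows it, so the any loop ends there and the closing qmark consumes it.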
    

Note that delimiter is set to ending at the beginning of each row, then changed to comma as soon as we pass a value. Thus, an input row is defined as [ending value any [comma value]].
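
The delimiter switch can be tried in isolation. The following is a simplified sketch, assuming Rebol 2 and unquoted fields only. The words field-chars, fields, field and row-sketch are hypothetical stand-ins, not the answer's own rules:

```rebol
REBOL []

comma: ","
ending: "^/"

; Hypothetical simplification: any character except comma and newline
field-chars: complement charset reduce [comma ending]

delimiter: ending    ; a row starts with the line ending of the previous row
fields: copy []
field: none

; Each field is preceded by the current delimiter. Once the first field
; has been handled, the delimiter becomes a comma.
row-sketch: [
    3 [
        delimiter
        [copy field some field-chars | (field: none)]
        (append fields field  delimiter: comma)
    ]
]

print parse/all "^/a,,c" row-sketch    ; true
probe fields                           ; ["a" none "c"]
```

PARSE looks up the value of delimiter each time it reaches the word, so reassigning it inside a paren changes what the next repetition expects.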

All that is left is to define the document structure:
    current-row: none
    row-feed: [
        (
            delimiter: ending
            append/only out current-row: copy []
        )
    ]
    
    width: none
    header: [
        (out: copy [])
        row-feed any [
            value comma
            emit-value
        ]
        value body: ending :body
        emit-value
        (width: length? current-row)
    ]
    
    row: [
        row-feed width [
            delimiter [
                value emit-value
                | emit-none
            ]
        ]
    ]
    
    if parse/all stream [header some row opt ending][out]
    

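Wrap it up to shield all those words, and you have the script below. In this version, row first checks for the end of the input (opt ending end break), so the trailing opt ending is no longer needed in the PARSE call: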
    REBOL [
        Title: "CSV Parser"
        Date: 19-Nov-2012
        Author: "Christopher Ross-Gill"
    ]
    
    parse-csv: use [
        comma ending delimiter value-chars qmark quoted-chars
        value quoted-value header row
        row-feed emit-value emit-none
        out current current-row width
    ][
        comma: ","
        ending: "^/"
        qmark: {"}
        value-chars: complement charset reduce [qmark comma ending]
        quoted-chars: complement charset reduce [qmark]
    
        current: none
        quoted-value: use [chunk][
            [
                qmark (current: copy "")
                any [
                    copy chunk some quoted-chars (append current chunk)
                    |
                    qmark qmark (append current qmark)
                ]
                qmark
            ]
        ]
    
        value: [
            copy current some value-chars
            | quoted-value
        ]
    
        current-row: none
        row-feed: [
            (
                delimiter: ending
                append/only out current-row: copy []
            )
        ]
        emit-value: [
            (
                delimiter: comma
                append current-row current
            )
        ]
        emit-none: [
            (
                delimiter: comma
                append current-row none
            )
        ]
    
        width: none
        header: [
            (out: copy [])
            row-feed any [
                value comma
                emit-value
            ]
            value body: ending :body
            emit-value
            (width: length? current-row)
        ]
    
        row: [
            opt ending end break
            |
            row-feed width [
                delimiter [
                    value emit-value
                    | emit-none
                ]
            ]
        ]
    
        func [stream [string!]][
            if parse/all stream [header some row][out]
        ]
    ]
    

Regarding "parsing - How do I use the PARSE dialect to read a line from a CSV?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/13451026/
