python - BigQuery Python client: load_table_from_file not working with CSV file

Tags: python csv google-cloud-platform google-bigquery storage

I am trying to append new rows from a CSV file to an existing BigQuery table. The CSV is:

"sprotocol";"w5q53";"insertingdate";"closeddate";"sollectidate";"company";"companyid";"contact"
"20-22553";"DELETED";"2020-01-26;0000-01-01 00:00";"0000-01-01 00:00";"";"";"this is a ticket"

This is my Python function:

job_config = bigquery.LoadJobConfig()
    job_config.source_format = 'text/csv'
    job_config.write_disposition = bigquery.WriteDisposition.WRITE_APPEND
    job_config.source_format = bigquery.SourceFormat.CSV
    job_config.skip_leading_rows = 1
    job_config.autodetect = False
    job_config.schema = [
        bigquery.SchemaField("sprotocol", "STRING", mode="NULLABLE"),
        bigquery.SchemaField("w5q53", "STRING", mode="NULLABLE"),
        bigquery.SchemaField("insertingdate", "TIMESTAMP", mode="NULLABLE"),
        bigquery.SchemaField("closeddate", "STRING", mode="NULLABLE"),
        bigquery.SchemaField("sollectidate", "STRING", mode="NULLABLE"),
        bigquery.SchemaField("company", "STRING", mode="NULLABLE"),
        bigquery.SchemaField("companyid", "STRING", mode="NULLABLE"),
        bigquery.SchemaField("contact", "STRING", mode="NULLABLE")
    ]
    job_config.fieldDelimiter = ';'
    job_config.allow_quoted_newlines = True

    with open(file_path, "rb") as file:
        load_job = _connection.load_table_from_file(
            file,
            table_ref,
            job_config=job_config
        )  # API request
        print("Starting job {}".format(load_job.job_id))

        load_job.result()  # Waits for table load to complete.
        print("Job finished.")
    file.close()

I get the following error:

[{'reason': 'invalid', 'message': 'Error while reading data, error message: CSV table encountered too many errors, giving up. Rows: 1; errors: 1. Please look into the errors[] collection for more details.'}, {'reason': 'invalid', 'message': 'Error while reading data, error message: CSV table references column position 55, but line starting at position:743 contains only 1 columns.'}]

I also tried removing the schema definition, but I get the same error. Can anyone help me?

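Best answer

There are three problems with the code above:

  1. Use field_delimiter instead of fieldDelimiter. Setting fieldDelimiter only creates an unused attribute on the config object, so the load still splits on the default comma.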

    job_config.field_delimiter = ';'

  2. Use DATE instead of TIMESTAMP, because the input contains only a date

    bigquery.SchemaField("insertingdate", "DATE", mode="NULLABLE"),

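  3. The double quotes in the data row are wrong: the closing quote after 2020-01-26 and the opening quote before the next value are missing, so two fields are read as one. The corrected row is:

    "20-22553";"DELETED";"2020-01-26";"0000-01-01 00:00";"0000-01-01 00:00";"";"";"this is a ticket"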
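The three fixes can be combined and the quoting problem caught before uploading. The following is only a minimal sketch, not the original poster's code: find_bad_rows and build_job_config are hypothetical helper names, the check assumes the CSV is small enough to hold in memory, and the job config assumes a google-cloud-bigquery version in which LoadJobConfig accepts its properties as keyword arguments.

```python
import csv
import io


def find_bad_rows(csv_text, delimiter=";", quotechar='"'):
    """Return (line_number, field_count) for each data row whose number of
    fields differs from the header row's."""
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter, quotechar=quotechar)
    header = next(reader)
    bad = []
    for row in reader:
        # Skip blank lines; flag rows with the wrong number of fields.
        if row and len(row) != len(header):
            bad.append((reader.line_num, len(row)))
    return bad


def build_job_config():
    """Build a load config with the three corrections applied (hypothetical helper)."""
    # Imported here so the CSV check above runs without the BigQuery client installed.
    from google.cloud import bigquery

    return bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
        skip_leading_rows=1,
        field_delimiter=";",  # snake_case property, not fieldDelimiter
        allow_quoted_newlines=True,
        schema=[
            bigquery.SchemaField("sprotocol", "STRING", mode="NULLABLE"),
            bigquery.SchemaField("w5q53", "STRING", mode="NULLABLE"),
            bigquery.SchemaField("insertingdate", "DATE", mode="NULLABLE"),  # date only
            bigquery.SchemaField("closeddate", "STRING", mode="NULLABLE"),
            bigquery.SchemaField("sollectidate", "STRING", mode="NULLABLE"),
            bigquery.SchemaField("company", "STRING", mode="NULLABLE"),
            bigquery.SchemaField("companyid", "STRING", mode="NULLABLE"),
            bigquery.SchemaField("contact", "STRING", mode="NULLABLE"),
        ],
    )
```

Running find_bad_rows on the file contents before calling load_table_from_file reports the line number and field count of any row that does not match the header, which is how the mis-quoted row in the question shows up. The config returned by build_job_config can then be passed as job_config in the question's load_table_from_file call.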

This question and answer come from a similar question found on Stack Overflow: https://stackoverflow.com/questions/61021127/
