python - Bad Request error when querying data from BigQuery in a loop

Tags: python google-bigquery

I am querying data from BigQuery in a loop, using the get_data_from_bq method below:

def get_data_from_bq(product_ids):
    format_strings = ','.join([("\"" + str(_id) + "\"") for _id in product_ids])
    query = "select productId, eventType, count(*) as count from [xyz:xyz.abc] where productId in (" + format_strings + ") and eventTime > CAST(\"" + time_thresh +"\" as DATETIME) group by eventType, productId order by productId;"
    query_job = bigquery_client.query(query, job_config=job_config)
    return query_job.result()

The first query (iteration) returns the correct data, but every subsequent query throws the following exception:

    results = query_job.result()
  File "/home/ishank/.local/lib/python2.7/site-packages/google/cloud/bigquery/job.py", line 2415, in result
    super(QueryJob, self).result(timeout=timeout)
  File "/home/ishank/.local/lib/python2.7/site-packages/google/cloud/bigquery/job.py", line 660, in result
    return super(_AsyncJob, self).result(timeout=timeout)
  File "/home/ishank/.local/lib/python2.7/site-packages/google/api_core/future/polling.py", line 120, in result
    raise self._exception
google.api_core.exceptions.BadRequest: 400 Cannot explicitly modify anonymous table xyz:_bf4dfedaed165b3ee62d8a9efa.anon1db6c519_b4ff_dbc67c17659f

Edit 1: Below is a sample query that throws the exception above. The same query runs without problems in the BigQuery console.

select productId, eventType, count(*) as count from [xyz:xyz.abc] where productId in ("168561","175936","161684","161681","161686") and eventTime > CAST("2018-05-30 11:21:19" as DATETIME) group by eventType, productId order by productId;
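To check that the query text itself is not the problem, the string-building logic of get_data_from_bq can be pulled into a standalone function and compared with the Edit 1 query. This is a sketch: build_query is a hypothetical name, and it takes time_thresh as a parameter instead of reading the global used in the question.

```python
def build_query(product_ids, time_thresh):
    """Hypothetical helper: rebuilds the query string the same way get_data_from_bq does."""
    # Quote each id and join them into the body of the IN (...) list
    format_strings = ','.join('"' + str(_id) + '"' for _id in product_ids)
    return ("select productId, eventType, count(*) as count from [xyz:xyz.abc]"
            + " where productId in (" + format_strings + ")"
            + " and eventTime > CAST(\"" + time_thresh + "\" as DATETIME)"
            + " group by eventType, productId order by productId;")


print(build_query(["168561", "175936", "161684", "161681", "161686"], "2018-05-30 11:21:19"))
```

The printed string matches the Edit 1 query character for character. Because that query succeeds in the console, the failure has to come from how the job is submitted, not from the SQL.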

Best answer

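I ran into exactly the same problem. The issue is not the query itself. Most likely you are reusing the same QueryJobConfig. When you run a query without setting a destination, BigQuery stores the results in an anonymous table, and that table is recorded in the QueryJobConfig object. If you reuse this config, the next job tries to write its results to the same anonymous table, which causes the error. Honestly, I'm not a fan of this behavior.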

You should rewrite your code so that it creates a new QueryJobConfig on every call, like this:

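from google.cloud import bigquery

def get_data_from_bq(product_ids):
    format_strings = ','.join([("\"" + str(_id) + "\"") for _id in product_ids])
    query = "select productId, eventType, count(*) as count from [xyz:xyz.abc] where productId in (" + format_strings + ") and eventTime > CAST(\"" + time_thresh + "\" as DATETIME) group by eventType, productId order by productId;"
    # Create a new config for every call instead of reusing one shared job_config object
    job_config = bigquery.QueryJobConfig()
    job_config.use_legacy_sql = True  # assumption: the original job_config enabled legacy SQL, which the [xyz:xyz.abc] syntax needs
    query_job = bigquery_client.query(query, job_config=job_config)
    return query_job.result()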
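The mechanism described in this answer can be illustrated with a toy model in plain Python. FakeQueryJobConfig, FakeClient and run_loop are hypothetical names. The model only mimics the reported behavior, where a shared config object remembers the first job's anonymous destination. It is not how the google-cloud-bigquery library is actually implemented.

```python
import uuid


class FakeQueryJobConfig(object):
    """Hypothetical stand-in for QueryJobConfig: a mutable bag of job settings."""

    def __init__(self):
        self.destination = None  # None means "let the service pick a table"


class FakeClient(object):
    """Toy model of the behavior described in the answer; NOT the real BigQuery client."""

    def query(self, sql, job_config):
        dest = job_config.destination
        if dest is not None and dest.startswith("_anon"):
            # Mirrors the 400 error: a job may not explicitly target an anonymous table
            raise ValueError("400 Cannot explicitly modify anonymous table " + dest)
        if dest is None:
            # Results go to a fresh anonymous table, and the config object is
            # updated to point at it (this is the state that leaks between calls)
            job_config.destination = "_anon" + uuid.uuid4().hex
        return "results of: " + sql


def run_loop(client, queries, config_factory):
    """Run one query per iteration, asking config_factory for the config each time."""
    return [client.query(sql, config_factory()) for sql in queries]


client = FakeClient()
queries = ["query 1", "query 2"]

# Fix from the answer: a brand-new config object on every iteration
print(run_loop(client, queries, FakeQueryJobConfig))

# Bug from the question: one config object reused across iterations
shared = FakeQueryJobConfig()
try:
    run_loop(client, queries, lambda: shared)
except ValueError as exc:
    print("second iteration failed: " + str(exc))
```

Passing a factory (here the class itself) instead of a single instance gives every iteration a clean config, so no iteration inherits the previous job's anonymous destination.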

Hope this helps!

For the question "python - Bad Request error when querying data from BigQuery in a loop", see the original on Stack Overflow: https://stackoverflow.com/questions/50838525/

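Related articles:

python - Segmentation fault when exiting a PyQT application

python - Merging multiple CSV files and removing duplicates by field

python - HTTP error when downloading from Google Drive with TensorFlow

python - How to interrupt a Tornado coroutine

sql - Resources exceeded during query execution

google-bigquery - BigQuery: Querying repeated fields

mysql - How to sum multiple counts over different tables and different filters

python - How can I detect from Python that a MySQL server has died completely?

amazon-s3 - Exporting data from Google Cloud Storage to Amazon S3

sql - Does BigQuery GitHub contain all the data?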