mysql - Find all records with a 'true' value in nested JSON

Tags: mysql json apache-drill

This is my nested JSON:

{  
   "business_id":"pNQwnY_q4okdlnPiR-3RBA",
   "full_address":"6105 S Fort Apache Rd\nSpring Valley\nLas Vegas, NV 89148",
   "hours":{  },
   "open":true,
   "categories":[  ],
   "city":"Las Vegas",
   "review_count":68,
   "name":"Empire Bagels",
   "neighborhoods":[  
      "Spring Valley"
   ],
   "longitude":-115.298175926911,
   "state":"NV",
   "stars":3.0,
   "latitude":36.07728616051,
   "attributes":{  
      "Take-out":true,
      "Wi-Fi":"no",
      "Good For":{  
         "dessert":false,
         "latenight":false,
         "lunch":false,
         "dinner":false,
         "breakfast":true,
         "brunch":false
      },
      "Caters":true,
      "Noise Level":"quiet",
      "Takes Reservations":false,
      "Delivery":false,
      "Ambience":{  
         "romantic":false,
         "intimate":false,
         "classy":false,
         "hipster":false,
         "divey":false,
         "touristy":false,
         "trendy":false,
         "upscale":false,
         "casual":true
      },
      "Parking":{  
         "garage":false,
         "street":false,
         "validated":false,
         "lot":true,
         "valet":false
      },
      "Has TV":true,
      "Outdoor Seating":true,
      "Attire":"casual",
      "Alcohol":"none",
      "Waiter Service":false,
      "Accepts Credit Cards":true,
      "Good for Kids":true,
      "Good For Groups":true,
      "Price Range":1
   },
   "type":"business"
}

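I am querying this with Apache Drill. I want to find the 10 most common 'true' attributes across all restaurants in a city. What I want is something like this: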

Accepts Credit Cards : 200,
Alcohol: 300,
Good For Kids : 500

What would my query look like? This is what I tried:

select attributes, count(*) attributes from `yelp_dataset` group by attributes;

I get this error:

Error: SYSTEM ERROR: UnsupportedOperationException: Map, Array, Union or repeated scalar type should not be used in group by, order by or in a comparison operator. Drill does not support compare between MAP:REQUIRED and MAP:REQUIRED.

Fragment 0:0

[Error Id: 8fe8a616-92c7-4da0-ab65-b5542d391f47 on 192.168.10.104:31010] (state=,code=0)

What should my query be?

Best answer

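Because the attributes have mixed data types, I could not use KVGEN() to flatten them automatically, but you could try a brute-force CTE built from UNION ALL branches, one per attribute: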

WITH ReviewAttributes AS (
    SELECT
      reviews.name,
      'Accepts Credit Cards' as `AttributeName`,
      CASE WHEN reviews.attributes.`Accepts Credit Cards` = true THEN 1 ELSE 0 END as `AttributeValue`
    FROM 
      `yelp_dataset` reviews
    UNION ALL
    SELECT
      reviews.name,
      'Alcohol' as `AttributeName`,
      CASE WHEN reviews.attributes.`Alcohol` <> 'none' THEN 1 ELSE 0 END as `AttributeValue`
    FROM
      `yelp_dataset` reviews
    UNION ALL
    SELECT
      reviews.name,
      'Good for Kids' as `AttributeName`,
      CASE WHEN reviews.attributes.`Good for Kids` = true THEN 1 ELSE 0 END as `AttributeValue`
    FROM
      `yelp_dataset` reviews
)
SELECT 
  `AttributeName`,
  SUM(`AttributeValue`) as `AttributeCount`
FROM
  ReviewAttributes
GROUP BY
  `AttributeName`;

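The CASE expressions also let you reconcile the differences between boolean fields and enum-like fields. For example, Alcohol in the sample is a string, so it is counted whenever its value is anything other than 'none'.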
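To sanity-check the counting logic outside Drill, the same per-attribute rules can be sketched in plain Python. This is only an illustration under stated assumptions: the `RULES` table, the `count_attributes` helper, and the three sample records (including the `"full_bar"` value) are made up for the example and are not part of the question or of any Drill API. It also adds the city filter and the top-10 cut that the question asks for.

```python
from collections import Counter

# Hypothetical rules mirroring the CASE expressions in the query above:
# boolean attributes count when they are True; "Alcohol" is a string and
# counts when it is present and not "none".
RULES = {
    "Accepts Credit Cards": lambda v: v is True,
    "Alcohol": lambda v: v is not None and v != "none",
    "Good for Kids": lambda v: v is True,
}


def count_attributes(records, city=None, top_n=10):
    """Return (attribute, count) pairs, most common first."""
    counts = Counter()
    for rec in records:
        # Optional city filter, as requested in the question.
        if city is not None and rec.get("city") != city:
            continue
        attrs = rec.get("attributes") or {}
        for name, is_set in RULES.items():
            # A missing attribute yields None and is not counted.
            if is_set(attrs.get(name)):
                counts[name] += 1
    return counts.most_common(top_n)


# Made-up sample records shaped like the JSON in the question.
sample = [
    {"city": "Las Vegas",
     "attributes": {"Accepts Credit Cards": True, "Alcohol": "none",
                    "Good for Kids": True}},
    {"city": "Las Vegas",
     "attributes": {"Accepts Credit Cards": True, "Alcohol": "full_bar"}},
    {"city": "Phoenix",
     "attributes": {"Good for Kids": True}},
]

result = count_attributes(sample, city="Las Vegas")
for name, count in result:
    print(f"{name}: {count}")
```

Keeping one rule per attribute mirrors the one-branch-per-attribute shape of the UNION ALL query, so a new attribute means one more entry in `RULES` (or one more branch in the CTE).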

Source: a similar question on Stack Overflow, "mysql - Find all records with a 'true' value in nested JSON": https://stackoverflow.com/questions/40648431/
