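I want to work out the new visitors for each day from the full list of visitors on that day. The only data I currently have is the first 2 columns, so I need to derive the last 2 columns from them.
This is what I have so far to create the first and second columns: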
WITH visitor_log_response AS (
SELECT
CAST(JSON_PARSE(visitor_log) AS MAP<VARCHAR, VARCHAR>) AS visitor_map,
date
FROM visitor_log_response_table
),
names_and_dates AS (
SELECT DISTINCT
visitor_name AS visitor_name,
date
FROM visitor_log_response
CROSS JOIN UNNEST(visitor_map) AS u(visitor_name, visitor_age)
),
visitor_names AS (
SELECT
date,
ARRAY_JOIN(
ARRAY_AGG(
visitor_name
ORDER BY
visitor_name
),
','
) AS visitors_today
FROM names_and_dates
GROUP BY
date
ORDER BY
date DESC
)
SELECT
date,
visitors_today
FROM visitor_names
The result looks like this:
If I normalize the table with this query:
SELECT ds, visitors_today_split
FROM previous_table
CROSS JOIN UNNEST(SPLIT(visitors_today, ',')) AS t(visitors_today_split)
I get this output:
Best answer
You can use array aggregation with window functions (remove ARRAY_JOIN from the visitor_names CTE so that visitors_today stays an array):
-- sample data
with dataset(date, visitors_today) as (
values ('Dec 6', array['Allie', 'Jon']),
('Dec 7', array['Allie', 'Jon', 'Zach']),
('Dec 8', array['Barb', 'Jon']),
('Dec 9', array['Janet', 'Zach'])
)
-- query
select date,
visitors_today,
array_distinct(visitors_today || prev_visitors) all_visitors_to_date,
array_except(visitors_today, prev_visitors) new_visitors
from (
select *,
coalesce(
flatten(array_distinct(array_agg(visitors_today)
over (order by date rows between UNBOUNDED PRECEDING and 1 PRECEDING))),
array[]) as prev_visitors -- combine all visitors before today into non null array
from dataset);
Output:
Note that arrays may not be the best type in terms of performance, and in Presto/Trino they are limited to 10000 elements.
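If the arrays do become a bottleneck, one possible alternative is to stay on the normalized rows (one row per date and visitor, as produced by the names_and_dates CTE in the question) and treat a visitor as new on the earliest date they appear. The following is only a sketch under that assumption: the names visits and first_visits are hypothetical, and the sample dates are plain strings that happen to sort correctly, just like in the sample data above. With real data the column should be a proper DATE.

```sql
-- sample data in normalized form: one row per (date, visitor)
with visits(date, visitor_name) as (
    values ('Dec 6', 'Allie'), ('Dec 6', 'Jon'),
           ('Dec 7', 'Allie'), ('Dec 7', 'Jon'), ('Dec 7', 'Zach'),
           ('Dec 8', 'Barb'),  ('Dec 8', 'Jon'),
           ('Dec 9', 'Janet'), ('Dec 9', 'Zach')
),
-- earliest date on which each visitor was seen (hypothetical helper CTE)
first_visits as (
    select visitor_name, min(date) as first_date
    from visits
    group by visitor_name
)
-- query
select v.date,
       array_agg(v.visitor_name order by v.visitor_name) as visitors_today,
       -- keep only visitors whose first visit is this date;
       -- the filtered aggregate is NULL when a date has no new visitors
       array_agg(v.visitor_name order by v.visitor_name)
           filter (where v.date = f.first_date) as new_visitors
from visits v
join first_visits f on f.visitor_name = v.visitor_name
group by v.date
order by v.date;
```

With the sample rows, Zach is first seen on Dec 7, so he shows up in new_visitors for that date only. This version still builds one array per day for display, but it no longer accumulates every earlier visitor into a single array, which is where the window-based query is most likely to run into the size limit.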
Regarding sql - creating new field columns from daily fields in SQL Presto, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/74777204/