SQL/Hive: How to calculate days to purchase

Tags: sql hadoop hive

SQL/Hive: I want to calculate how many days it takes each visitor to make a purchase. This is what my data looks like:

date    visitor orders
1-Jan   A   0  
1-Jan   B   0  
4-Jan   B   1  
5-Jan   A   0  
12-Jan  A   1

This is the result I expect:

Days to purchase    count of visitors
0   0
1   0 
2   0
3   1
4   0
5   0
.   .
.   .
.   .
11  1

Any help?

Best answer

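If I understand correctly, the first thing you need is the earliest date for each combination of visitor and orders. Dots are not valid in unquoted column aliases, so the aliases use underscores, and `date` and `table` are keywords, so they are quoted with backticks (Hive syntax):

select visitor, orders, min(`date`) as min_date from `table` group by visitor, orders

This should give something like:

visitor orders min_date
  A         0  1-Jan 
  B         0  1-Jan
  B         1  4-Jan
  A         1  12-Jan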
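To check this first step against the sample data, here is a minimal sketch that runs the same GROUP BY in SQLite through Python's standard library. SQLite only stands in for Hive here. The table name `visits`, the column name `visit_date`, and the year 2016 are assumptions made for illustration, since the question shows only day and month.

```python
import sqlite3

# Sample rows from the question. The year 2016 is an assumption:
# the original data only shows day and month.
ROWS = [
    ("2016-01-01", "A", 0),
    ("2016-01-01", "B", 0),
    ("2016-01-04", "B", 1),
    ("2016-01-05", "A", 0),
    ("2016-01-12", "A", 1),
]


def earliest_dates(rows):
    """Return (visitor, orders, min_date) for each visitor/orders pair."""
    conn = sqlite3.connect(":memory:")
    # "visits" and "visit_date" are hypothetical names for illustration.
    conn.execute(
        "CREATE TABLE visits (visit_date TEXT, visitor TEXT, orders INTEGER)"
    )
    conn.executemany("INSERT INTO visits VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT visitor, orders, MIN(visit_date) AS min_date
        FROM visits
        GROUP BY visitor, orders
        ORDER BY visitor, orders
        """
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in earliest_dates(ROWS):
        print(row)
```

ISO-formatted date strings sort chronologically, which is why MIN works on a text column in this sketch.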

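This table (call it tbl) can be joined to itself to get the number of days between each visitor's first visit and first purchase. In Hive, datediff(enddate, startdate) takes two arguments and returns the difference in days, so the purchase date goes first; it expects dates in yyyy-MM-dd form:

select A.visitor, datediff(B.purchase_date, A.first_visit) as days_to_purchase
from (select visitor, min_date as first_visit from tbl where orders=0) A
inner join (select visitor, min_date as purchase_date from tbl where orders=1) B
on A.visitor=B.visitor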

Now wrap this query in an outer query that counts the visitors with the same date difference:

select days_to_purchase, count(visitor) as visitors from
(select A.visitor, datediff(B.purchase_date, A.first_visit) as days_to_purchase
    from (select visitor, min_date as first_visit from tbl where orders=0) A
    inner join (select visitor, min_date as purchase_date from tbl where orders=1) B
    on A.visitor=B.visitor
) joined
group by days_to_purchase order by days_to_purchase
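The self-join and the outer count can be tried end to end with a small sketch, again using SQLite from Python as a stand-in for Hive. SQLite has no two-argument datediff, so the difference of two `julianday()` values is used in its place. The names `visits` and `visit_date` and the year 2016 are assumptions for illustration.

```python
import sqlite3

# Sample rows from the question; the year 2016 is an assumption.
ROWS = [
    ("2016-01-01", "A", 0),
    ("2016-01-01", "B", 0),
    ("2016-01-04", "B", 1),
    ("2016-01-05", "A", 0),
    ("2016-01-12", "A", 1),
]

# tbl holds the earliest date per visitor/orders pair; A and B are the
# first-visit and first-purchase views of it, joined per visitor.
QUERY = """
WITH tbl AS (
    SELECT visitor, orders, MIN(visit_date) AS min_date
    FROM visits
    GROUP BY visitor, orders
)
SELECT days_to_purchase, COUNT(visitor) AS visitors
FROM (
    SELECT A.visitor,
           CAST(julianday(B.purchase_date) - julianday(A.first_visit)
                AS INTEGER) AS days_to_purchase
    FROM (SELECT visitor, min_date AS first_visit
          FROM tbl WHERE orders = 0) A
    INNER JOIN (SELECT visitor, min_date AS purchase_date
                FROM tbl WHERE orders = 1) B
        ON A.visitor = B.visitor
) joined
GROUP BY days_to_purchase
ORDER BY days_to_purchase
"""


def days_to_purchase_counts(rows):
    """Return (days_to_purchase, visitors) pairs for the given rows."""
    conn = sqlite3.connect(":memory:")
    # "visits" and "visit_date" are hypothetical names for illustration.
    conn.execute(
        "CREATE TABLE visits (visit_date TEXT, visitor TEXT, orders INTEGER)"
    )
    conn.executemany("INSERT INTO visits VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(days_to_purchase_counts(ROWS))
```

Only day counts that actually occur appear in the output, so the zero-count rows in the expected result would have to be filled in separately.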

I hope I understood you correctly. I'm not sure this is the right solution, but you didn't give me much to start with :)

The complete solution, with tbl expanded inline, might be:

select days_to_purchase, count(visitor) as visitors from
(select A.visitor, datediff(B.purchase_date, A.first_visit) as days_to_purchase
    from
    (select visitor, min_date as first_visit from
        (select visitor, orders, min(`date`) as min_date from `table` group by visitor, orders) tbl
     where orders=0) A
    inner join
    (select visitor, min_date as purchase_date from
        (select visitor, orders, min(`date`) as min_date from `table` group by visitor, orders) tbl
     where orders=1) B
    on A.visitor=B.visitor
) joined
group by days_to_purchase order by days_to_purchase
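One limitation of the join above is that a visitor who buys on the very first visit has no orders=0 row and is dropped by the inner join. A possible alternative, which is not the answer's method, is conditional aggregation: compute each visitor's first visit and first purchase in a single GROUP BY. The sketch below tries this in SQLite from Python as a stand-in for Hive. The names `visits` and `visit_date`, the year 2016, and the extra visitors C and D are all hypothetical.

```python
import sqlite3

# Sample rows from the question; the year 2016 is an assumption.
ROWS = [
    ("2016-01-01", "A", 0),
    ("2016-01-01", "B", 0),
    ("2016-01-04", "B", 1),
    ("2016-01-05", "A", 0),
    ("2016-01-12", "A", 1),
]

# Hypothetical extra visitors: C buys on the first visit, D never buys.
EXTRA_ROWS = [
    ("2016-01-02", "C", 1),
    ("2016-01-03", "D", 0),
]

# One pass per visitor: MIN over all rows is the first visit, MIN over
# rows with an order is the first purchase (NULL if there is none).
QUERY = """
SELECT days_to_purchase, COUNT(*) AS visitors
FROM (
    SELECT visitor,
           CAST(julianday(MIN(CASE WHEN orders > 0 THEN visit_date END))
                - julianday(MIN(visit_date)) AS INTEGER) AS days_to_purchase
    FROM visits
    GROUP BY visitor
    HAVING MIN(CASE WHEN orders > 0 THEN visit_date END) IS NOT NULL
) per_visitor
GROUP BY days_to_purchase
ORDER BY days_to_purchase
"""


def days_to_purchase_single_pass(rows):
    """Return (days_to_purchase, visitors) pairs using one GROUP BY."""
    conn = sqlite3.connect(":memory:")
    # "visits" and "visit_date" are hypothetical names for illustration.
    conn.execute(
        "CREATE TABLE visits (visit_date TEXT, visitor TEXT, orders INTEGER)"
    )
    conn.executemany("INSERT INTO visits VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(days_to_purchase_single_pass(ROWS))
    print(days_to_purchase_single_pass(ROWS + EXTRA_ROWS))
```

In Hive the same shape should carry over with datediff(enddate, startdate) in place of the julianday subtraction, though that variant is untested here.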

Regarding "SQL/Hive: How to calculate days to purchase", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/34976796/
