sql - Databricks: Equivalent code for SQL query

Tags: sql apache-spark-sql databricks azure-databricks

I'm looking for the Databricks equivalent of a SQL Server query. I have added some sample data and the expected result. Right now I'm stuck on the CROSS APPLY STRING_SPLIT part.

Sample SQL data:

    CREATE TABLE FactTurnover
    (
        ID INT,
        SalesPriceExcl NUMERIC (9,4),
        Discount VARCHAR(100)
    )

    INSERT INTO FactTurnover
    VALUES
        (1, 100, '10'),
        (2, 39.5877, '58, 12'),
        (3, 100, '50, 10, 15'),
        (4, 100, 'B')

Query:

    ;WITH CTE AS
    (
     SELECT Id, SalesPriceExcl, 
         CASE WHEN value = 'B' THEN 0
         ELSE CAST(value as int) END AS Discount
     From FactTurnover
     CROSS APPLY STRING_SPLIT(Discount, ',')
     )
     SELECT Id,  
       Min(SalesPriceExcl) AS SalesPriceExcludingDiscount,
       EXP(SUM(LOG((100 - Discount) / 100.0))) As TotalDiscount,
       Cast(EXP(SUM(LOG((100 - Discount) / 100.0))) * 
            MIN(SalesPriceExcl) As Numeric(9,2))
        PriceAfterDiscount
     FROM CTE
     GROUP BY ID

Expected result:

| Id | SalesPriceExcludingDiscount |       TotalDiscount | PriceAfterDiscount |
|----|-----------------------------|---------------------|--------------------|
|  1 |                         100 |                 0.9 |                 90 |
|  2 |                     39.5877 | 0.36960000000000004 |              14.63 |
|  3 |                         100 | 0.38250000000000006 |              38.25 |
|  4 |                         100 |                   1 |                100 |

Best answer

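Use SPLIT to convert the comma-separated string into an array, then use LATERAL VIEW and EXPLODE to work with the elements of that array. The roughly equivalent syntax (including the CTE) is: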

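    %sql
    -- SELECT * FROM FactTurnover;

    WITH cte AS
    (
        SELECT
            Id,
            SalesPriceExcl,
            -- Treat the marker 'B' as a 0% discount, as the CASE expression in the question does
            CASE WHEN TRIM(xdiscount) = 'B' THEN 0
                 ELSE CAST(TRIM(xdiscount) AS INT) END AS Discount
        FROM
        (
            -- SPLIT turns the comma-separated string into an array
            SELECT Id, SalesPriceExcl, SPLIT(Discount, ',') AS discountArray
            FROM FactTurnover
        ) x
        -- EXPLODE emits one row per array element
        LATERAL VIEW EXPLODE(discountArray) d AS xdiscount
    )
    SELECT
        Id,
        MIN(SalesPriceExcl) AS SalesPriceExcludingDiscount,
        EXP(SUM(LOG((100 - Discount) / 100.00))) AS TotalDiscount
    FROM cte
    GROUP BY Id
    ORDER BY Id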
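
To make it clearer what the SPLIT, EXPLODE and `EXP(SUM(LOG(...)))` steps compute, here is a minimal plain-Python sketch of the same logic on the sample data from the question. It is only an illustration, not Databricks code: the helper names `explode_discounts` and `total_discounts` are made up for this example, and it assumes `'B'` should count as a 0% discount, as in the question's CASE expression.

```python
import math
from collections import defaultdict

# Sample rows from the question: (Id, SalesPriceExcl, Discount)
rows = [
    (1, 100.0, "10"),
    (2, 39.5877, "58, 12"),
    (3, 100.0, "50, 10, 15"),
    (4, 100.0, "B"),
]


def explode_discounts(rows):
    """Yield one (Id, SalesPriceExcl, discount) tuple per list element (hypothetical helper)."""
    for row_id, price, discount_text in rows:
        for token in discount_text.split(","):  # like SPLIT(Discount, ',')
            token = token.strip()
            # Assumption: 'B' means "no discount", i.e. 0
            yield row_id, price, 0 if token == "B" else int(token)


def total_discounts(rows):
    """Group by Id and compute exp(sum(log((100 - discount) / 100))) (hypothetical helper)."""
    log_sums = defaultdict(float)
    for row_id, _price, discount in explode_discounts(rows):
        log_sums[row_id] += math.log((100 - discount) / 100.0)
    return {row_id: math.exp(total) for row_id, total in sorted(log_sums.items())}


totals = total_discounts(rows)
for row_id, factor in totals.items():
    print(row_id, factor)
```

Summing logarithms and exponentiating the result is just a way to multiply the per-row factors inside a GROUP BY, since SQL has no built-in product aggregate. For Id 2 that is 0.42 * 0.88. Note that a 100% discount would hit the logarithm of zero, so this trick needs all discounts to be below 100.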

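If you are feeling brave, you can also do this with higher-order functions. I have given two examples below. I would say these are harder to debug, and you should probably test how they perform; it comes down to what you are comfortable with: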

    %sql
    -- Convert the Discount text column to an array with SPLIT and filter the value 'B' out of the array
    WITH filterB AS (
        SELECT *, FILTER(SPLIT(Discount, ','), x -> x != 'B') AS discountArray
        FROM FactTurnover
    ), cte1 AS (
        -- Do the initial calculations on the array
        SELECT
            Id,
            TRANSFORM(discountArray, d -> LOG((100 - d) / 100.00)) AS discountArray2
        FROM filterB
    )
    SELECT
        Id,
        EXP(AGGREGATE(discountArray2, CAST(0 AS DOUBLE), (x, y) -> x + y)) AS x
    FROM cte1;

    -- All-in-one example
    SELECT
        Id,
        EXP(AGGREGATE(TRANSFORM(FILTER(SPLIT(Discount, ','), x -> x != 'B'), y -> LOG((100 - y) / 100.00)), CAST(0 AS DOUBLE), (z, a) -> z + a))
        AS final
    FROM FactTurnover
    ORDER BY Id
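
If the FILTER, TRANSFORM and AGGREGATE chain is hard to read, the same pipeline can be sketched in plain Python with `filter`, `map` and `functools.reduce`. This is only an analogy to show what each step does to one row's array, not something you would run in Databricks. The function name `total_discount` is hypothetical, and the explicit `strip()` is an assumption to deal with the spaces after the commas.

```python
import math
from functools import reduce


def total_discount(discount_text):
    """Mirror SPLIT -> FILTER -> TRANSFORM -> AGGREGATE for one Discount string (hypothetical helper)."""
    # SPLIT: comma-separated text to a list of tokens (whitespace stripped by assumption)
    tokens = [t.strip() for t in discount_text.split(",")]
    # FILTER: drop the non-numeric marker 'B'
    numeric = filter(lambda t: t != "B", tokens)
    # TRANSFORM: map each discount to the log of its remaining fraction
    logs = map(lambda t: math.log((100 - int(t)) / 100.0), numeric)
    # AGGREGATE: fold with a start value of 0.0, like CAST(0 AS DOUBLE)
    log_sum = reduce(lambda acc, value: acc + value, logs, 0.0)
    return math.exp(log_sum)


samples = {1: "10", 2: "58, 12", 3: "50, 10, 15", 4: "B"}
results = {row_id: total_discount(text) for row_id, text in samples.items()}
for row_id, factor in results.items():
    print(row_id, factor)
```

Because `'B'` is filtered out, the array for Id 4 is empty, the fold returns its start value of 0, and the exponential of 0 gives a factor of 1, which matches the expected result in the question.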

For sql - Databricks: Equivalent code for SQL query, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/57994948/
