sql - MS SQL - User-defined function - Slope, Intercept, RSquare; how to group by portfolio

Tags: sql sql-server statistics user-defined-functions

Below is a sample of my data, from the table RR_Linest:

Portfolio   Month Number   Collections
A           1              $100
A           2              $90
A           3              $80
A           4              $70
B           1              $100
B           2              $90
B           3              $80

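I was able to work out how to get the slope, intercept, and RSquare for a single portfolio: I removed the portfolio column, selected only the month number (x) and the collections data (y) for one portfolio (I deleted the data for portfolio B), and then ran the code below.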

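I have been trying to change the function so that, when I run it, it gives me the slope, intercept, and R-squared for each portfolio. Does anyone know how to do this? I have tried many approaches, but I just can't figure it out.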

First, I created the function:

declare @RegressionInput_A  [dbo].[RegressionInput_A]

insert into @RegressionInput_A (x,y)
select
([model month]),log([collection $])
from [dbo].[RR_Linest]

select * from [dbo].[LinearRegression_A](@RegressionInput_A)

GO

drop function dbo.LinearRegression_A

CREATE FUNCTION dbo.LinearRegression_A
( 
@RegressionInputs_A AS dbo.RegressionInput_A READONLY 
) 
RETURNS @RegressionOutput_A TABLE 
( 
Slope DECIMAL(18, 6), 
Intercept DECIMAL(18, 6), 
RSquare DECIMAL(18, 6) 
) 
AS
BEGIN 

DECLARE @Xaverage AS DECIMAL(18, 6)
DECLARE @Yaverage AS DECIMAL(18, 6)
DECLARE @slope AS DECIMAL(18, 6)
DECLARE @intercept AS DECIMAL(18, 6)
DECLARE @rSquare AS DECIMAL(18, 6)

SELECT
@Xaverage = AVG(x),
@Yaverage = AVG(y)
FROM
@RegressionInputs_A

SELECT
@slope = SUM((x - @Xaverage) * (y - @Yaverage))/SUM(POWER(x - @Xaverage, 2))
FROM
@RegressionInputs_A

SELECT
@intercept = @Yaverage - (@slope * @Xaverage) 

SELECT @rSquare = 1 - (SUM(POWER(y - (@intercept + @slope * x), 2))/(SUM(POWER(y - (@intercept + @slope * x), 2)) + SUM(POWER(((@intercept + @slope * x) - @Yaverage), 2))))
FROM
@RegressionInputs_A

INSERT INTO
@RegressionOutput_A
(
Slope,
Intercept,
RSquare
)
SELECT
@slope,
@intercept,
@rSquare

RETURN

END
GO

Then I run the function:
declare @RegressionInput_A  [dbo].[RegressionInput_A]

insert into @RegressionInput_A (x,y)
select
([model month]),log([collection $])
from [dbo].[RR_Linest]

select * from [dbo].[LinearRegression_A](@RegressionInput_A)

Best answer

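Wow, this is a really cool example of how to use nested CTEs in an inline table-valued function (ITVF). You want to use an ITVF because they are fast. See Wayne Sheffield's blog post, which demonstrates this.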
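To make the distinction concrete, here is a minimal sketch of the two function shapes. The names dbo.SquaresMulti and dbo.SquaresInline are hypothetical and appear nowhere in the question or the blog post; they only illustrate the syntax difference between a multi-statement function (like the one in the question) and an inline one.

```sql
-- Hypothetical names, for illustration only.

-- Multi-statement TVF: declares a return table variable and fills it
-- inside BEGIN ... END. The optimizer treats the body as a black box.
CREATE FUNCTION dbo.SquaresMulti (@n int)
RETURNS @out TABLE (i int, i_squared int)
AS
BEGIN
    INSERT INTO @out (i, i_squared)
    SELECT v.i, v.i * v.i
    FROM (VALUES (1), (2), (3), (4), (5)) AS v(i)
    WHERE v.i <= @n;
    RETURN;
END;
GO

-- Inline TVF: a single RETURN (SELECT ...). The definition is expanded
-- into the calling query, much like a parameterized view.
CREATE FUNCTION dbo.SquaresInline (@n int)
RETURNS TABLE
AS
RETURN
(
    SELECT v.i, v.i * v.i AS i_squared
    FROM (VALUES (1), (2), (3), (4), (5)) AS v(i)
    WHERE v.i <= @n
);
GO

-- Both are called the same way.
SELECT * FROM dbo.SquaresMulti(3);
SELECT * FROM dbo.SquaresInline(3);
```

Both calls return the same rows; the difference lies in how the optimizer can plan the inline version as part of the calling query.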

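When a problem is complicated enough that I want to be sure I am giving the user a correct solution, I always start with a sample database/table.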

Let's create a database named [Test], based on the model database.

--
-- Create a simple db
--

-- use master
use master;
go

-- delete existing databases
IF EXISTS (SELECT name FROM sys.databases WHERE name = N'Test')
DROP DATABASE Test
GO

-- simple db based on model
create database Test;
go

-- switch to new db
use [Test];
go

Let's create a table type named [InputToLinearReg].
--
-- Create table type to pass data
--

-- Delete the existing table type
IF EXISTS (SELECT * FROM sys.types WHERE is_table_type = 1 AND name = 'InputToLinearReg')
DROP TYPE dbo.InputToLinearReg
GO

--  Create the table type
CREATE TYPE InputToLinearReg AS TABLE
(
portfolio_cd char(1),
month_num int,
collections_amt money
);
go

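OK, here is the multi-layered SELECT statement built from CTEs. The query optimizer treats it as a single SQL statement, which can be executed in parallel; a regular multi-statement function cannot. See the "black box" section of Wayne's article.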
--
-- Create in line table value function (fast)
--

-- Remove if it exists
IF OBJECT_ID('CalculateLinearReg') IS NOT NULL
DROP FUNCTION CalculateLinearReg
GO

-- Create the function
CREATE FUNCTION CalculateLinearReg
( 
    @ParmInTable AS dbo.InputToLinearReg READONLY 
) 
RETURNS TABLE 
AS
RETURN
(

  WITH cteRawData as
  (
    SELECT
        T.portfolio_cd,
        CAST(T.month_num as decimal(18, 6)) as x,
        LOG(CAST(T.collections_amt as decimal(18, 6))) as y
    FROM
        @ParmInTable as T
  ),

  cteAvgByPortfolio as
  (
    SELECT
        portfolio_cd,
        AVG(x) as xavg,
        AVG(y) as yavg
    FROM
        cteRawData 
    GROUP BY 
        portfolio_cd
  ),

  cteSlopeByPortfolio as
  (
    SELECT
        R.portfolio_cd,
        SUM((R.x - A.xavg) * (R.y - A.yavg)) / SUM(POWER(R.x - A.xavg, 2)) as slope
    FROM
        cteRawData as R 
    INNER JOIN 
        cteAvgByPortfolio A
    ON 
        R.portfolio_cd = A.portfolio_cd
    GROUP BY 
        R.portfolio_cd
  ),

  cteInterceptByPortfolio as
  (
    SELECT
        A.portfolio_cd,
        (A.yavg - (S.slope * A.xavg)) as intercept
    FROM
        cteAvgByPortfolio as A
    INNER JOIN 
        cteSlopeByPortfolio S
    ON 
        A.portfolio_cd = S.portfolio_cd

  )

  SELECT 
      A.portfolio_cd,
      A.xavg,
      A.yavg,
      S.slope,
      I.intercept,
      1 - (SUM(POWER(R.y - (I.intercept + S.slope * R.x), 2)) /
      (SUM(POWER(R.y - (I.intercept + S.slope * R.x), 2)) + 
      SUM(POWER(((I.intercept + S.slope * R.x) - A.yavg), 2)))) as rsquared
  FROM
      cteRawData as R 
        INNER JOIN 
      cteAvgByPortfolio as A ON R.portfolio_cd = A.portfolio_cd
        INNER JOIN 
      cteSlopeByPortfolio S ON A.portfolio_cd = S.portfolio_cd
        INNER JOIN 
      cteInterceptByPortfolio I ON S.portfolio_cd = I.portfolio_cd
  GROUP BY 
      A.portfolio_cd,
      A.xavg,
      A.yavg,
      S.slope,
      I.intercept
);
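For reference, the CTEs above correspond to the usual least-squares formulas, evaluated separately for each portfolio_cd. The notation below is my own restatement of the code, with x taken from month_num and y the natural log of collections_amt (T-SQL's LOG with one argument is the natural logarithm):

```latex
\begin{aligned}
x_i &= \text{month\_num}, \qquad y_i = \ln(\text{collections\_amt}) \\
\bar{x} &= \frac{1}{n}\sum_i x_i, \qquad \bar{y} = \frac{1}{n}\sum_i y_i \\
\text{slope} &= \frac{\sum_i (x_i-\bar{x})(y_i-\bar{y})}{\sum_i (x_i-\bar{x})^2} \\
\text{intercept} &= \bar{y} - \text{slope}\cdot\bar{x} \\
\hat{y}_i &= \text{intercept} + \text{slope}\cdot x_i \\
R^2 &= 1 - \frac{\sum_i (y_i-\hat{y}_i)^2}{\sum_i (y_i-\hat{y}_i)^2 + \sum_i (\hat{y}_i-\bar{y})^2}
\end{aligned}
```

Note that the slope denominator is zero when a portfolio has only one distinct month, in which case the function as written raises a divide-by-zero error.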

Last but not least, set up the table variable and get the answers. Unlike your solution above, this one groups by portfolio ID.
-- Load data into variable
DECLARE @InTable AS InputToLinearReg;

-- insert data
insert into @InTable
values
('A', 1, 100.00),
('A', 2, 90.00),
('A', 3, 80.00),
('A', 4, 70.00),
('B', 1, 100.00),
('B', 2, 90.00),
('B', 3, 80.00);

-- show data
select * from CalculateLinearReg(@InTable)
go
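To run the function against the real table instead of hand-typed values, something along these lines should work. This is only a sketch: the question never shows the name of the portfolio column, so [Portfolio] is an assumption, while [model month] and [collection $] are taken from the question's code. It also assumes [collection $] is stored as a numeric type and that portfolio codes fit in char(1).

```sql
-- Assumption: the portfolio column in dbo.RR_Linest is named [Portfolio].
-- [model month] and [collection $] come from the question's code.
DECLARE @InTable AS InputToLinearReg;

INSERT INTO @InTable (portfolio_cd, month_num, collections_amt)
SELECT
    [Portfolio],
    [model month],
    [collection $]
FROM dbo.RR_Linest
WHERE [collection $] > 0;   -- LOG() raises an error for zero or negative input

-- One row per portfolio: averages, slope, intercept, rsquared
SELECT * FROM CalculateLinearReg(@InTable);
GO
```

If the portfolio codes are longer than one character, the portfolio_cd column of the table type would need to be widened first.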

Here is a screenshot of the results using your data.
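As a cross-check, the same per-portfolio regression can be computed in a single GROUP BY using the raw-sum form of the least-squares formulas. This is a different technique from the function above, not a replacement for it. It uses float instead of decimal(18, 6), so trailing digits may differ slightly, and the CTE names d and s are arbitrary.

```sql
-- Cross-check sketch: regression per portfolio from raw sums
-- (n, Sx, Sy, Sxx, Syy, Sxy) instead of deviations from the mean.
WITH d AS
(
    SELECT
        v.portfolio_cd,
        CAST(v.month_num AS float) AS x,
        LOG(CAST(v.collections_amt AS float)) AS y
    FROM (VALUES
        ('A', 1, 100.00), ('A', 2, 90.00), ('A', 3, 80.00), ('A', 4, 70.00),
        ('B', 1, 100.00), ('B', 2, 90.00), ('B', 3, 80.00)
    ) AS v(portfolio_cd, month_num, collections_amt)
),
s AS
(
    SELECT
        portfolio_cd,
        COUNT(*) AS n,
        SUM(x) AS sx, SUM(y) AS sy,
        SUM(x * x) AS sxx, SUM(y * y) AS syy, SUM(x * y) AS sxy
    FROM d
    GROUP BY portfolio_cd
)
SELECT
    portfolio_cd,
    -- slope = (n*Sxy - Sx*Sy) / (n*Sxx - Sx^2)
    (n * sxy - sx * sy) / NULLIF(n * sxx - sx * sx, 0) AS slope,
    -- intercept = (Sy - slope*Sx) / n
    (sy - ((n * sxy - sx * sy) / NULLIF(n * sxx - sx * sx, 0)) * sx) / n AS intercept,
    -- R^2 = squared correlation coefficient for simple linear regression
    POWER(n * sxy - sx * sy, 2)
        / NULLIF((n * sxx - sx * sx) * (n * syy - sy * sy), 0) AS rsquared
FROM s;
```

The NULLIF guards return NULL instead of raising a divide-by-zero error when a portfolio has a single month or constant values.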

This question, "sql - MS SQL - User-defined function - Slope, Intercept, RSquare; how to group by portfolio", comes from Stack Overflow: https://stackoverflow.com/questions/21311393/
