SQL geography vs. decimal(8,6) lat, long performance

Tags: sql sql-server database-performance sqlgeography geographic-distance

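I am looking into the performance of selecting the closest points within a certain distance of a given coordinate.

The options are to use two decimal(8,6) columns (lat, long), or a single geography column and work with that.

I am only interested in which is faster.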

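Best answer

TL;DR: geography is ~10 times faster, once the query actually uses the spatial index (see the update at the end).

OK, so I set up a test:

A couple of tables: one (dbo.Temp2) with id, lat, long (int, decimal(8,6), decimal(8,6)), the other (dbo.Temp) with id, coord (int, geography).

Then I inserted 47k rows of random data into each.

For indexing, the lat, long table got a nonclustered ascending index on lat, long with fill factor 95. The geography table got a spatial index with GRIDS = (LEVEL_1 = LOW, LEVEL_2 = MEDIUM, LEVEL_3 = LOW, LEVEL_4 = LOW), also with fill factor 95.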

CREATE TABLE dbo.Temp
(
Id int NOT NULL IDENTITY (1, 1),
Coord geography NOT NULL
)  ON [PRIMARY]
 TEXTIMAGE_ON [PRIMARY]
GO
ALTER TABLE dbo.Temp ADD CONSTRAINT
    PK_Temp PRIMARY KEY CLUSTERED 
    (
    Id
    ) WITH( STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

GO


declare @i int = 0
declare @lat decimal(8,6) = 0.0
declare @long decimal(8,6) = 0.0

while (@i < 47000)
begin
    -- random values between -90 and 90 for both coordinates
    set @lat = (select (0.9 - Rand()*1.8)*100)
    set @long = (select (0.9 - Rand()*1.8)*100)

    insert into Temp
    select geography::Point(@lat, @long, 4326)

    set @i = @i + 1
end

go


CREATE SPATIAL INDEX [SpatialIndex_1] ON [dbo].Temp
(
    [coord]
)USING  GEOGRAPHY_GRID 
WITH (GRIDS =(LEVEL_1 = LOW,LEVEL_2 = MEDIUM,LEVEL_3 = LOW,LEVEL_4 = LOW), 
CELLS_PER_OBJECT = 16, PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = OFF, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 95) ON [PRIMARY]

GO

CREATE TABLE [dbo].[Temp2](
    [Id] [int] IDENTITY(1,1) NOT NULL,
    [Lat] [decimal](8, 6) NOT NULL,
    [Long] [decimal](8, 6) NOT NULL,
 CONSTRAINT [PK_Temp2] PRIMARY KEY CLUSTERED 
(
    [Id] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

GO


declare @i int = 0
declare @lat decimal(8,6)  = 0 
declare @long decimal(8,6)  = 0

while (@i < 47000)
begin
set @lat = (select (0.9 - (RAND()*1.8))*100)
set @long = (select (0.9 - (RAND()*1.8))*100)

insert into Temp2
select @lat , @long

set @i = @i +1
end

go
CREATE NONCLUSTERED INDEX [Coord_IX] ON [dbo].[Temp2] 
(
    [Lat] ASC,
    [Long] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON, FILLFACTOR = 95) ON [PRIMARY]
GO

Then I ran a couple of tests:

The first is for lat, long.

declare @lat decimal(8,6) = 0.0,
 @lon decimal(8,6) = 0.0,
@i int = 0,
@start datetime = getdate()

while(@i < 100)
begin

set @lat =   (select (0.9 - Rand()*1.8)*100)
set @lon =  (select (0.9 - (RAND()*1.8))*100.0)

DECLARE @lat_s FLOAT = SIN(@lat * PI() / 180),
        @lat_c FLOAT = COS(@lat * PI() / 180)


SELECT DISTINCT top 1000 @lat, @lon, *
FROM (
    SELECT
        lat,
        long,
        ((ACOS(@lat_s * SIN(lat * PI() / 180) + @lat_c * COS(lat * PI() / 180) * COS((@lon - long) * PI() / 180)) * 180 / PI()) * 60 * 1.1515) AS dist
    FROM dbo.Temp2
) t
ORDER BY dist

set @i= @i+1
end
print CONVERT(varchar,(getdate()-@start),108)
go
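
As a sanity check on the formula in the script above, here is a minimal sketch that evaluates the same spherical law of cosines for a single pair of points and compares it with STDistance. The two coordinates are illustrative values only, not part of the benchmark data. The conversion assumes the `60 * 1.1515` factor yields statute miles, which is how that constant is normally used. The two columns should roughly agree, with small differences expected because geography measures on the WGS 84 ellipsoid while the formula assumes a sphere.

```sql
-- Illustrative coordinates (assumed for this sketch, not taken from the benchmark)
DECLARE @lat1 float = 48.8566, @lon1 float = 2.3522,
        @lat2 float = 51.5074, @lon2 float = -0.1278;

-- Same expression as in the benchmark query, written with RADIANS()
DECLARE @cosAngle float =
      SIN(RADIANS(@lat1)) * SIN(RADIANS(@lat2))
    + COS(RADIANS(@lat1)) * COS(RADIANS(@lat2)) * COS(RADIANS(@lon1 - @lon2));

-- Clamp to [-1, 1]: rounding can push the value just outside ACOS's domain
-- when the two points (almost) coincide
SET @cosAngle = CASE WHEN @cosAngle > 1 THEN 1
                     WHEN @cosAngle < -1 THEN -1
                     ELSE @cosAngle END;

DECLARE @p1 geography = geography::Point(@lat1, @lon1, 4326);
DECLARE @p2 geography = geography::Point(@lat2, @lon2, 4326);

SELECT DEGREES(ACOS(@cosAngle)) * 60 * 1.1515 AS SphericalMiles,
       @p1.STDistance(@p2) / 1609.344         AS GeographyMiles;  -- meters to statute miles
```

The clamp is not in the original benchmark query; it only guards against a domain error from ACOS and does not change results for well-separated points.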

The second is for geography.

DECLARE @g geography;
declare @point nvarchar(50) = ''
declare @i int = 0,
        @lat decimal(8,6) = 0.0,
        @long decimal(8,6) = 0.0,
        @start datetime = getdate()

while (@i < 100)
begin
    set @lat = (select (0.9 - Rand()*1.8)*100)
    set @long = (select (0.9 - Rand()*1.8)*100)
    -- WKT for geography lists longitude first, then latitude
    set @point = (select 'POINT(' + CONVERT(varchar(10), @long) + ' ' + CONVERT(varchar(10), @lat) + ')')
    SET @g = geography::STGeomFromText(@point, 4326);

    SELECT TOP 1000
        @lat,
        @long,
        @g.STDistance(st.[coord]) AS [DistanceFromPoint (in meters)],
        st.[coord],
        st.id
    FROM Temp st
    ORDER BY @g.STDistance(st.[coord]) ASC

    set @i = @i + 1
end
print CONVERT(varchar,(getdate()-@start),108)
go

Results:

  • lat, long: 00:00:10
  • geography: 00:02:21

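For those wondering why geography performs so badly: the execution plan shows that it does not use the spatial index, and the sort takes ages because the row size is 4047 bytes (versus 25 bytes for the decimal columns). Trying to force the index results in a runtime error.

[Image: execution plan for the geography query]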
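
Since the question asks about points within a certain distance, one query shape worth trying is a distance predicate in the WHERE clause, which is a form SQL Server documents as eligible for spatial index use. The following is only a sketch against the dbo.Temp table above: the search center and radius are hypothetical values, and whether the optimizer actually picks the spatial index depends on the SQL Server version and its cost estimates, so check the execution plan rather than assuming it.

```sql
-- Hypothetical search center (lat, long) and radius; adjust to your data
DECLARE @g geography = geography::Point(10.0, 20.0, 4326);
DECLARE @radiusMeters float = 50000;

SELECT TOP (1000)
       st.Id,
       st.Coord.STDistance(@g) AS DistanceMeters
FROM   dbo.Temp AS st
WHERE  st.Coord.STDistance(@g) <= @radiusMeters   -- radius filter the spatial index can support
ORDER BY st.Coord.STDistance(@g);
```

The radius predicate also shrinks the set of rows that has to be sorted, which addresses the expensive sort described above even when the result is ordered by distance.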

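P.S. I also ran one for a flat surface, but the difference from the spherical version is very small, about 0.5 s (it keeps coming back in 9.5-10.0 seconds, which does seem slightly faster). Still, to have everything in one place, here is the script: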

print 'flat'
declare @lat decimal(8,6) = 0.0,
 @lon decimal(8,6) = 0.0,
@i int = 0,
@start datetime = getdate()

while(@i < 100)
begin

set @lat =   (select (0.9 - Rand()*1.8)*100)
set @lon =  (select (0.9 - (RAND()*1.8))*100.0)

SELECT DISTINCT top 1000 @lat, @lon, *
FROM (
    SELECT
        lat,
        long,
        sqrt(power((@lat - lat),2) + (power((@lon - long),2))) AS dist
    FROM dbo.Temp2
) t

ORDER BY dist

set @i= @i+1
end
print CONVERT(varchar,(getdate()-@start),108)
go
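
Keep in mind that the flat version measures distance in degrees, and a degree of longitude covers less ground the further you are from the equator, so the ordering it produces can differ from the spherical one. A small sketch with three illustrative points (hypothetical values, not from the benchmark data) makes this visible: at latitude 60 the flat formula rates a point one degree east and a point one degree north as equally far, while STDistance reports the eastern one as roughly half the distance.

```sql
-- Illustrative points (assumed for this sketch)
DECLARE @a geography = geography::Point(60.0, 0.0, 4326);  -- lat 60, long 0
DECLARE @b geography = geography::Point(60.0, 1.0, 4326);  -- one degree east of @a
DECLARE @c geography = geography::Point(61.0, 0.0, 4326);  -- one degree north of @a

SELECT
    -- flat distances in degrees, same expression as the script above
    SQRT(POWER(60.0 - 60.0, 2) + POWER(0.0 - 1.0, 2)) AS FlatEastDegrees,
    SQRT(POWER(60.0 - 61.0, 2) + POWER(0.0 - 0.0, 2)) AS FlatNorthDegrees,
    -- distances in meters on the WGS 84 ellipsoid
    @a.STDistance(@b) AS MetersEast,
    @a.STDistance(@c) AS MetersNorth;
```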

Update:

After switching to SQL 2014 and forcing the index to be used, with 10M records:

  • lat, long took 00:00:22.935
  • flat took 00:00:22.988
  • geography took 00:00:02.427

Geography script used:

DECLARE @g geography;
declare @point nvarchar(50)  =''
declare @i int =0,
        @lat decimal(8,6) =0.0,
        @long decimal(8,6) =0.0,
        @start datetime = getdate()
set @lat =(select (0.9 -Rand()*1.8)*100)
set @long =(select (0.9 -Rand()*1.8)*100)
-- WKT for geography lists longitude first, then latitude
set @point = (select 'POINT('+CONVERT(varchar(10), @long)+ ' ' 
             +CONVERT(varchar(10), @lat)+')')
SET @g = geography::STGeomFromText(@point, 4326);

SELECT TOP 1000
    @lat,
    @long,
        @g.STDistance(st.[coord]) AS [DistanceFromPoint (in meters)] 
    ,   st.[coord]
    ,   st.id
FROM    Temp st with(index([SpatialIndex_1]))
WHERE @g.STDistance(st.[coord])  IS NOT NULL
ORDER BY @g.STDistance(st.[coord]) asc

Regarding SQL geography vs. decimal(8,6) lat, long performance, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/34763910/
