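django - PostGIS nearest-neighbor search results out of order?

Tags: django postgresql postgis nearest-neighbor

I have a Django/PostgreSQL application that shows which users are closest to a given user. It uses the PostGIS 2.0 KNN (k-nearest-neighbor) <-> operator in the ORDER BY clause to list users, nearest first. In my initial data set I found two search results that are out of order (all distances are measured from Los Angeles, CA):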

Member, City, State, Distance (miles)

user1, North Las Vegas, NV, 239
user2, Phoenix, AZ, 365
user3, Provo, UT, 568
user4, Twin Falls, ID, 630
user5, Albuquerque, NM, 673
user6, Portland, OR, 828
user7, Bozeman, MT, 896
user8, Seattle, WA, 962
user9, Boulder, CO, 834       <- Out of order!
user10, Laramie, WY, 862      <- Out of order!
user11, Naperville, IL, 1756

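The member name is simply the username column from the User class in Django's contrib.auth.models. The UserAccount class, which holds the geometry information, is defined as follows: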

class UserAccount(models.Model):
    user = models.OneToOneField(User, primary_key=True, unique=True)
    address_line_1 = models.CharField(max_length=30)
    address_line_2 = models.CharField(max_length=30, blank=True)
    city = models.CharField(max_length=30)
    region = models.CharField(max_length=30, blank=True)
    postal_code = models.CharField(max_length=10, blank=True)
    country = models.ForeignKey('Country')
    measurement_sys = models.CharField(max_length=5)  # US or Metric

    # User's home (default) and current longitude and latitude
    home_lon = models.FloatField(default=0.0)
    home_lat = models.FloatField(default=0.0)
    current_lon = models.FloatField(default=0.0)
    current_lat = models.FloatField(default=0.0)

    # GeoDjango-specific fields 
    home_point = models.PointField(srid=4326)
    current_point = models.PointField(srid=4326)
    objects = models.GeoManager()

Here is the query in my Django view:

def members(request, template):
    """View all members of the website."""
    uid = request.session['uid']   # PK from User table

    # Get the current user's lon/lat and measurement system
    try:
        ua = UserAccount.objects.get(user_id=uid)
        lon = ua.current_lon
        lat = ua.current_lat
        measurement_sys = ua.measurement_sys
    except UserAccount.DoesNotExist as e:
        return HttpResponseRedirect(reverse('unable-to-display-members'))

    # Define the proximity query.
    if measurement_sys == 'US':
        multiplier = 0.000621371  # Convert to miles
    else:
        multiplier = 0.001  # Convert to kilometers

    query = "SELECT \
                ua.user_id, \
                au.username, \
                ua.city, \
                ua.region, \
                ST_Distance( \
                    ua.current_point::geography, \
                    ST_GeographyFromText( \
                        'SRID=4326;POINT(" \
                            + str(lon) \
                            + " " \
                            + str(lat) + \
                        ")' \
                    ) \
                )*" + str(multiplier) + " AS distance \
            FROM \
                user_account ua \
                INNER JOIN \
                auth_user au \
                ON (ua.user_id = au.id) \
            WHERE ua.user_id != %s \
            ORDER BY \
                ua.current_point::geometry \
                <-> \
                'SRID=4326;POINT(" + str(lon) + " " + str(lat) + ")'::geometry \
            LIMIT 250;"

    # Run the proximity query
    raw_queryset = UserAccount.objects.raw(query, [uid])

    # Paginate results
    user_list = [user for user in raw_queryset]
    list_size = len(list(user_list))
    paginator = Paginator(user_list, 10, 4)
    paginator._count = list_size

    page = request.GET.get('page')
    try:
        users = paginator.page(page)
    except PageNotAnInteger:
        users = paginator.page(1)
    except EmptyPage:
        users = paginator.page(paginator.num_pages)
    return render(request, template, {'users': users})

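Am I doing something wrong in the query? Does the KNN operator sometimes "hiccup" and return some results out of order? I ask because when I removed the two out-of-order records from my table and then added extra records for users with addresses farther away (in IL, LA, MI, NC, PA, NY, and ME), all of the results came back in the correct order.

By the way, my input data is here.

Thanks!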

Best answer

Updated answer:

PostGIS has offered two approximate solutions for kNN (k-nearest-neighbor) search since September 2011:

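  • With the <-> operator, you get nearest neighbors using the centers of the bounding boxes to compute inter-object distances.
  • With the <#> operator, you get nearest neighbors using the bounding boxes themselves to compute inter-object distances.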
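
One detail worth spelling out for point data: a point's bounding box is essentially the point itself, so on SRID 4326 geometry the <-> operator ranks rows by planar distance in degrees, whereas ST_Distance on geography returns geodesic meters. The two orderings can disagree. The sketch below illustrates this in plain Python. The city coordinates are approximate values I am assuming for illustration (not the asker's data), and a spherical haversine formula stands in for ST_Distance:

```python
import math

# Approximate (lon, lat) city centers; illustrative assumptions only.
LOS_ANGELES = (-118.2437, 34.0522)
SEATTLE = (-122.3321, 47.6062)
BOULDER = (-105.2705, 40.0150)


def planar_degrees(a, b):
    """Euclidean distance in degrees between two (lon, lat) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def haversine_miles(a, b):
    """Great-circle distance on a sphere of radius 3958.8 miles."""
    lon1, lat1, lon2, lat2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 3958.8 * math.asin(math.sqrt(h))


# Print both measures side by side for the two cities.
for name, city in (("Seattle", SEATTLE), ("Boulder", BOULDER)):
    print(name,
          round(planar_degrees(LOS_ANGELES, city), 2),
          round(haversine_miles(LOS_ANGELES, city)))
```

With these coordinates, Seattle comes out slightly nearer than Boulder in degrees but farther in miles, which is consistent with the misordering shown in the question.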

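Your problem is that both are approximations, and so not perfect. If you want the best 250 results, use either operator to retrieve, say, the best 1000 candidates, then sort those same rows by ST_Distance and apply LIMIT 250 to get the best 250 of those roughly 1000 candidates.

Example: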

SELECT * FROM
    (SELECT ua.*, ST_Distance(ua.current_point::geography, 'SRID=4326;POINT(" + str(lon) + " " + str(lat) + ")'::geography) AS st_dist
    FROM user_account ua
    ORDER BY ua.current_point::geometry <-> 'SRID=4326;POINT(" + str(lon) + " " + str(lat) + ")'::geometry
    LIMIT 1000) AS s
ORDER BY st_dist LIMIT 250;
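
The same two-stage query can also be assembled in Python with bound parameters instead of string concatenation. This is only a sketch under assumptions: build_knn_query and its oversample argument are hypothetical names of mine, the table and column names are taken from the question, and ST_GeomFromEWKT is used to parse the point so that no cast on a placeholder is needed:

```python
def build_knn_query(lon, lat, exclude_user_id, limit=250, oversample=4):
    """Return (sql, params) for a KNN prefilter followed by an exact re-sort.

    oversample is a hypothetical knob: the inner query fetches
    limit * oversample candidates, mirroring the 1000-then-250 example.
    """
    # float() guards against non-numeric input before building the EWKT text.
    point_ewkt = "SRID=4326;POINT(%s %s)" % (float(lon), float(lat))
    sql = (
        "SELECT * FROM ("
        " SELECT ua.*,"
        " ST_Distance(ua.current_point::geography,"
        " ST_GeographyFromText(%s)) AS st_dist"
        " FROM user_account ua"
        " WHERE ua.user_id != %s"
        " ORDER BY ua.current_point::geometry <-> ST_GeomFromEWKT(%s)"
        " LIMIT %s"
        ") AS s ORDER BY st_dist LIMIT %s"
    )
    params = [point_ewkt, exclude_user_id, point_ewkt,
              limit * oversample, limit]
    return sql, params


# Example call; in the view this pair would be handed to
# UserAccount.objects.raw(sql, params).
sql, params = build_knn_query(-118.2437, 34.0522, exclude_user_id=7)
print(sql)
print(params)
```

The distance returned in st_dist is in meters, so the miles or kilometers multiplier from the question would still need to be applied, either in SQL or in the template.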

Regarding "django - PostGIS nearest-neighbor search results out of order?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/23941425/
