algorithm - 用于存储数千个向量的数据结构

标签 algorithm language-agnostic data-structures

我在一个空间中有多达 10,000 个随机定位的点,我需要能够在任何给定时间分辨出光标离哪个点最近。为了添加一些上下文,这些点采用矢量绘图的形式,因此用户可以不断快速地添加和删除它们,并且还可能在 Canvas 空间中不平衡。

因此,我试图找到最有效的数据结构来存储和查询这些点。如果可能的话,我想让这个问题与语言无关。

最佳答案

问题更新后

  1. 使用两个 Red-Black TreeSkip_list map 。两者都是紧凑的自平衡数据结构,为您提供 O(log n) 的搜索、插入和删除操作时间。一张 map 将每个点的 X 坐标用作键,点本身作为值,另一张 map 将使用 Y 坐标作为键,点本身作为值。

  2. 作为权衡,我建议最初用一个正方形限制光标周围的搜索区域。为了完美匹配,正方形的边应等于光标周围“敏感度圆”的直径。即,如果您只对距离光标 10 像素半径内的最近邻居感兴趣,则正方形的边需要为 20px。作为替代方案,如果您在寻找最近邻居之后而不考虑距离,您可以尝试通过评估相对于光标的地板和天花板动态地找到边界。

  3. 然后从边界内的 map 中检索两个点子集,合并以仅包含两个子集中的点。

  4. 遍历结果,计算与每个点的接近度(dx^2+dy^2,避免平方根,因为您对实际距离不感兴趣,只对接近度感兴趣),找到最近的邻居。

  5. 从邻近图中取平方根来测量到最近邻的距离,看它是否大于“敏感圆”的半径,如果是则表示圆内没有点。

  6. 我建议对每种方法进行一些基准测试;通过优化很容易达到顶峰。在我的适度硬件 (Duo Core 2) 上,在 10K 点内重复一千次的最近邻居的朴素单线程搜索在 Java 中需要 350 毫秒。只要整个 UI react 时间低于 100 毫秒,它对用户来说似乎是即时的,请记住,即使是简单的搜索也可能会给你足够快的 react 。

通用解决方案

最有效的数据结构取决于您计划使用的算法、时空权衡以及点的预期相对分布:

  • 如果空间不是问题,最有效的方法可能是预先计算屏幕上每个点的最近邻居,然后将最近邻居的唯一 ID 存储在表示屏幕的二维数组中。
  • 如果时间不是问题,则将 10K 个点存储在一个简单的二维数组中并每次都进行简单搜索,即遍历每个点并计算距离可能是一个很好且简单易维护的选项。
  • 对于两者之间的一些权衡,这里有一个关于各种可用的最近邻搜索选项的很好的演示:http://dimacs.rutgers.edu/Workshops/MiningTutorial/pindyk-slides.ppt
  • 各种最近邻搜索算法的大量详细资料:http://simsearch.yury.name/tutorial.html ,只需选择最适合您的需求即可。

因此,评估数据结构与算法的隔离是不可能的,如果没有很好地了解任务约束和优先级,这反过来又很难评估。

示例 Java 实现

import java.util.*;
import java.util.concurrent.ConcurrentSkipListMap;

class Test
{

  public static void main (String[] args)
  {

      Drawing naive = new NaiveDrawing();
      Drawing skip  = new SkipListDrawing();

      long start;

      start = System.currentTimeMillis();
      testInsert(naive);
      System.out.println("Naive insert: "+(System.currentTimeMillis() - start)+"ms");
      start = System.currentTimeMillis();
      testSearch(naive);
      System.out.println("Naive search: "+(System.currentTimeMillis() - start)+"ms");


      start = System.currentTimeMillis();
      testInsert(skip);
      System.out.println("Skip List insert: "+(System.currentTimeMillis() - start)+"ms");
      start = System.currentTimeMillis();
      testSearch(skip);
      System.out.println("Skip List search: "+(System.currentTimeMillis() - start)+"ms");

  }

  public static void testInsert(Drawing d)
  {
      Random r = new Random();
      for (int i=0;i<100000;i++)
            d.addPoint(new Point(r.nextInt(4096),r.nextInt(2048)));
  }

  public static void testSearch(Drawing d)
  {
      Point cursor;
      Random r = new Random();
      for (int i=0;i<1000;i++)
      {
          cursor = new Point(r.nextInt(4096),r.nextInt(2048));
          d.getNearestFrom(cursor,10);
      }
  }


}

// A simple point class
class Point
{
    public Point (int x, int y)
    {
        this.x = x;
        this.y = y;
    }
    public final int x,y;

    public String toString()
    {
        return "["+x+","+y+"]";
    }
}

// Interface will make the benchmarking easier
interface Drawing
{
    void addPoint (Point p);
    Set<Point> getNearestFrom (Point source,int radius);

}


class SkipListDrawing implements Drawing
{

    // Helper class to store an index of point by a single coordinate
    // Unlike standard Map it's capable of storing several points against the same coordinate, i.e.
    // [10,15] [10,40] [10,49] all can be stored against X-coordinate and retrieved later
    // This is achieved by storing a list of points against the key, as opposed to storing just a point.
    private class Index
    {
        final private NavigableMap<Integer,List<Point>> index = new ConcurrentSkipListMap <Integer,List<Point>> ();

        void add (Point p,int indexKey)
        {
            List<Point> list = index.get(indexKey);
            if (list==null)
            {
                list = new ArrayList<Point>();
                index.put(indexKey,list);
            }
            list.add(p);
        }

        HashSet<Point> get (int fromKey,int toKey)
        {
            final HashSet<Point> result = new HashSet<Point> ();

            // Use NavigableMap.subMap to quickly retrieve all entries matching
            // search boundaries, then flatten resulting lists of points into
            // a single HashSet of points.
            for (List<Point> s: index.subMap(fromKey,true,toKey,true).values())
                for (Point p: s)
                 result.add(p);

            return result;
        }

    }

    // Store each point index by it's X and Y coordinate in two separate indices
    final private Index xIndex = new Index();
    final private Index yIndex = new Index();

    public void addPoint (Point p)
    {
        xIndex.add(p,p.x);
        yIndex.add(p,p.y);
    }


    public Set<Point> getNearestFrom (Point origin,int radius)
    {


          final Set<Point> searchSpace;
          // search space is going to contain only the points that are within
          // "sensitivity square". First get all points where X coordinate
          // is within the given range.
          searchSpace = xIndex.get(origin.x-radius,origin.x+radius);

          // Then get all points where Y is within the range, and store
          // within searchSpace the intersection of two sets, i.e. only
          // points where both X and Y are within the range.
          searchSpace.retainAll(yIndex.get(origin.y-radius,origin.y+radius));


          // Loop through search space, calculate proximity to each point
          // Don't take square root as it's expensive and really unneccessary
          // at this stage.
          //
          // Keep track of nearest points list if there are several
          // at the same distance.
          int dist,dx,dy, minDist = Integer.MAX_VALUE;

          Set<Point> nearest = new HashSet<Point>();

          for (Point p: searchSpace)
          {
             dx=p.x-origin.x;
             dy=p.y-origin.y;
             dist=dx*dx+dy*dy;

             if (dist<minDist)
             {
                   minDist=dist;
                   nearest.clear();
                   nearest.add(p);
             }
             else if (dist==minDist)
             {
                 nearest.add(p);
             }


          }

          // Ok, now we have the list of nearest points, it might be empty.
          // But let's check if they are still beyond the sensitivity radius:
          // we search area we have evaluated was square with an side to
          // the diameter of the actual circle. If points we've found are
          // in the corners of the square area they might be outside the circle.
          // Let's see what the distance is and if it greater than the radius
          // then we don't have a single point within proximity boundaries.
          if (Math.sqrt(minDist) > radius) nearest.clear();
          return nearest;
   }
}

// Naive approach: just loop through every point and see if it's nearest.
class NaiveDrawing implements Drawing
{
    final private List<Point> points = new ArrayList<Point> ();

    public void addPoint (Point p)
    {
        points.add(p);
    }

    public Set<Point> getNearestFrom (Point origin,int radius)
    {

          int prevDist = Integer.MAX_VALUE;
          int dist;

          Set<Point> nearest = Collections.emptySet();

          for (Point p: points)
          {
             int dx = p.x-origin.x;
             int dy = p.y-origin.y;

             dist =  dx * dx + dy * dy;
             if (dist < prevDist)
             {
                   prevDist = dist;
                   nearest  = new HashSet<Point>();
                   nearest.add(p);
             }
             else if (dist==prevDist) nearest.add(p);

          }

          if (Math.sqrt(prevDist) > radius) nearest = Collections.emptySet();

          return nearest;
   }
}

关于algorithm - 用于存储数千个向量的数据结构,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/1920692/

相关文章:

algorithm - 如果旅行推销员乘飞机旅行怎么办?

java - 根据系统时间调用函数

language-agnostic - TDD - 重构为黑盒?

winapi - 将 `Ex`添加到函数/方法名称是什么意思?

url - 我如何找到链接到特定长网址的所有短网址?

algorithm - 数据结构问题

java - 我应该使用八角树吗?

algorithm - 自相交多边形的面积

haskell - 为字典实现 Applicative 实例(Map、关联数组)

data-structures - 与哈希表相比,splay 树有哪些优势?