c# - Efficient query involving a count in a subquery

Tags: c# linq entity-framework-5


public class Paper
{
  public int Id { get; set; }
  public string Title { get; set; }
  public virtual ICollection<Author> Authors { get; set; }
}

public class Author
{
  public int Id { get; set; }
  public string Name { get; set; }
  public virtual ICollection<Paper> Papers { get; set; }
}

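I want to build a LINQ query that gives each author's "popularity" relative to the other authors, that is, the number of papers the author contributed to divided by the total number of contributions across all authors. I have come up with a few queries to achieve this.

Option 1: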

var query1 = from author in db.Authors
             let sum = (double)db.Authors.Sum(a => a.Papers.Count)
             select new
             {
               Author = author,
               Popularity = author.Papers.Count / sum
             };

Option 2:

var temp = db.Authors.Select(a => new
           {
             Auth = a,
             Contribs = a.Papers.Count
           });
var query2 = temp.Select(a => new
             {
               Author = a.Auth,
               Popularity = a.Contribs / (double)temp.Sum(a2 => a2.Contribs)
             });


Option 3:

double sum = db.Authors.Sum(a => a.Papers.Count);
var query3 = from author in db.Authors
             select new
             {
               Author = author,
               Popularity = author.Papers.Count / sum
             };



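The first thing to check is whether each option translates cleanly to SQL, or as close to it as possible, so that the data is not all loaded into memory just to apply these calculations.

That said, I think option 2 may be your best bet, with one further optimization: cache the total number of contributed papers. That way you make a single call to the database to fetch the authors, which you need anyway, and the rest runs in your own code, where you can parallelize or do whatever else you need to speed things up.

So something like this (sorry, I prefer the fluent style of LINQ):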

// Here you could load only the info you need if you don't need the whole entity.
// You probably only need the name and Papers.Count, which would be another optimization.
// Include("Papers") loads the papers in the same query; ToList() materializes the authors.
var allAuthors = db.Authors.Include("Papers").ToList();

var totalPaperCount = allAuthors.Sum(x => x.Papers.Count);

var theEndResult = allAuthors.Select(a => new
{
    Author = a,
    Popularity = a.Papers.Count / (double)totalPaperCount
});
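
To try the arithmetic without a database, here is a minimal in-memory sketch of the same idea: compute the total once, then divide each author's count by it. It uses LINQ to Objects only, so it says nothing about how Entity Framework translates the query to SQL. The `PopularitySketch` class, its `Compute` and `SampleAuthors` helpers, and the sample data are all made up for illustration, and it assumes author names are unique.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// In-memory stand-ins for the entities; no database or EF context is involved.
public class Paper
{
    public int Id { get; set; }
    public string Title { get; set; }
    public List<Author> Authors { get; set; } = new List<Author>();
}

public class Author
{
    public int Id { get; set; }
    public string Name { get; set; }
    public List<Paper> Papers { get; set; } = new List<Paper>();
}

public static class PopularitySketch
{
    // Hypothetical helper: maps each author name to (own contributions / all contributions).
    // Assumes names are unique, because they are used as dictionary keys.
    public static Dictionary<string, double> Compute(IEnumerable<Author> authors)
    {
        // Project to just the data needed, then compute the total a single time.
        var contribs = authors
            .Select(a => new { a.Name, Count = a.Papers.Count })
            .ToList();
        double total = contribs.Sum(c => c.Count);

        // Guard against division by zero when nobody has any papers.
        return contribs.ToDictionary(
            c => c.Name,
            c => total == 0 ? 0.0 : c.Count / total);
    }

    // Hypothetical sample data: Alice has 3 contributions, Bob has 1 (4 in total).
    public static List<Author> SampleAuthors()
    {
        var alice = new Author { Id = 1, Name = "Alice" };
        var bob = new Author { Id = 2, Name = "Bob" };
        var papers = new[]
        {
            new Paper { Id = 1, Title = "P1" },
            new Paper { Id = 2, Title = "P2" },
            new Paper { Id = 3, Title = "P3" }
        };

        foreach (var p in papers) Link(alice, p);
        Link(bob, papers[2]); // P3 is co-authored

        return new List<Author> { alice, bob };
    }

    // Keeps both sides of the many-to-many relationship in sync.
    private static void Link(Author author, Paper paper)
    {
        author.Papers.Add(paper);
        paper.Authors.Add(author);
    }

    public static void Main()
    {
        foreach (var entry in Compute(SampleAuthors()))
            Console.WriteLine(entry.Key + ": " + entry.Value);
    }
}
```

With this sample data Alice holds 3 of the 4 contributions and Bob 1 of 4, so the ratios are 0.75 and 0.25. Note that a co-authored paper counts once per author in the total, which matches the "sum of Papers.Count over all authors" definition used in the question.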

