I have a table with half a million records and I need to find the duplicates. So I am using this code that I wrote:
var dups2 = from m in mg_B
group m by new { m.Addr1, m.Addr2, m.City, m.State }
into g
where g.Count() > 1
select g;
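The problem with this code is that it does not treat two records as duplicates when Addr1 is an empty string "" in one and NULL in the other.
Basically, when it compares a NULL field with an empty-string field it considers them different, but I need them to be treated as the same.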
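A minimal in-memory sketch of the behaviour described above. The `Rec` type, the `NullVsEmptyDemo` class, and the sample rows are made up for illustration; only two of the grouping fields are modelled, and the query runs as LINQ to Objects rather than against the database.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a table row; not the real entity class.
class Rec
{
    public int Id { get; set; }
    public string Addr1 { get; set; }
    public string City { get; set; }
}

static class NullVsEmptyDemo
{
    // Counts the groups that contain more than one row when grouping on the raw fields.
    public static int CountDuplicateGroups(IEnumerable<Rec> rows)
    {
        var dups = from m in rows
                   group m by new { m.Addr1, m.City }
                   into g
                   where g.Count() > 1
                   select g;
        return dups.Count();
    }

    static void Main()
    {
        var rows = new List<Rec>
        {
            new Rec { Id = 1, Addr1 = "",   City = "Springfield" },
            new Rec { Id = 2, Addr1 = null, City = "Springfield" },
        };

        // Anonymous-type keys are compared property by property, and null is not
        // equal to "", so the two rows end up in separate single-row groups.
        Console.WriteLine(CountDuplicateGroups(rows)); // prints 0
    }
}
```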
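I know I could loop over every record and replace the nulls with "", but it took my computer 1 minute to go through 4,000 records, and this would be repeated every time someone clicks the button.
I discovered this null vs. empty string issue because I had initially created a class with only some of the fields (the table has 40+ fields):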
List<CombineClass> mg = (from m in db.MG_Backup
                         where m.IsArchived == false
                         select new CombineClass
                         {
                             id = m.ID,
                             name = m.Name,
                             addr1 = string.IsNullOrEmpty(m.Addr1) ? "" : m.Addr1,
                             addr2 = string.IsNullOrEmpty(m.Addr2) ? "" : m.Addr2,
                             city = m.City,
                             state = m.State
                         }).ToList();
Any ideas?
Best answer
This version is compatible with LINQ to SQL / LINQ to Entities:
var dups2 = from m in mg_B
group m by new
{
Addr1 = m.Addr1 ?? string.Empty,
Addr2 = m.Addr2 ?? string.Empty,
City = m.City ?? string.Empty,
State = m.State ?? string.Empty,
}
into g
where g.Count() > 1
select g;
The generated SQL looks something like this:
-- Parameters
DECLARE @p0 NVarChar(1000) = ''
DECLARE @p1 NVarChar(1000) = ''
DECLARE @p2 NVarChar(1000) = ''
DECLARE @p3 NVarChar(1000) = ''
DECLARE @p4 Int = 1
SELECT [t2].[value2] AS [Addr1], [t2].[value22] AS [Addr2], [t2].[value3] AS [City], [t2].[value4] AS [State]
FROM (
SELECT COUNT(*) AS [value], [t1].[value] AS [value2], [t1].[value2] AS [value22], [t1].[value3], [t1].[value4]
FROM (
SELECT COALESCE([t0].[Addr1],@p0) AS [value], COALESCE([t0].[Addr2],@p1) AS [value2], COALESCE([t0].[City],@p2) AS [value3], COALESCE([t0].[State],@p3) AS [value4]
FROM [SettingSystemNodes] AS [t0]
) AS [t1]
GROUP BY [t1].[value], [t1].[value2], [t1].[value3], [t1].[value4]
) AS [t2]
WHERE [t2].[value] > @p4
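Note that if you assign string.Empty to a local variable before the query, or even to a let variable inside the query, only one parameter will be used for the empty string.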
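As a rough sketch of the `let` variant, the query could be written as below. The `Row` type, the `CoalesceDemo` class, and the sample rows are hypothetical and exist only to make the example self-contained. It runs in memory as LINQ to Objects, so it only shows that the grouping treats null and "" alike; the single-parameter effect concerns the SQL produced by the LINQ to SQL provider and is not observable here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row type for illustration only; not the real entity class.
class Row
{
    public int Id { get; set; }
    public string Addr1 { get; set; }
    public string Addr2 { get; set; }
    public string City { get; set; }
    public string State { get; set; }
}

static class CoalesceDemo
{
    // Returns the ids of all rows that fall into a group of two or more,
    // with null and "" coalesced to the same key value.
    public static List<int> DuplicateIds(IEnumerable<Row> rows)
    {
        var dups = from m in rows
                   let empty = string.Empty // one shared value for every ?? below
                   group m by new
                   {
                       Addr1 = m.Addr1 ?? empty,
                       Addr2 = m.Addr2 ?? empty,
                       City = m.City ?? empty,
                       State = m.State ?? empty,
                   }
                   into g
                   where g.Count() > 1
                   select g;

        return dups.SelectMany(g => g.Select(r => r.Id))
                   .OrderBy(id => id)
                   .ToList();
    }

    static void Main()
    {
        var rows = new List<Row>
        {
            new Row { Id = 1, Addr1 = "",          Addr2 = null, City = "Springfield", State = "IL" },
            new Row { Id = 2, Addr1 = null,        Addr2 = "",   City = "Springfield", State = "IL" },
            new Row { Id = 3, Addr1 = "1 Main St", Addr2 = null, City = "Springfield", State = "IL" },
        };

        // Rows 1 and 2 differ only in null vs "", so they share a group key.
        Console.WriteLine(string.Join(",", DuplicateIds(rows))); // prints 1,2
    }
}
```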
Source: a similar question on Stack Overflow, "c# - LINQ to SQL: find duplicates but treat null and empty string as the same": https://stackoverflow.com/questions/15788702/