我有一组代表文档历史的字符串。每个字符串都是整个文档 - 尚未进行任何差异分析。
我需要一个相对高效的算法来允许我用它们来自的版本注释文档的子字符串。
例如,如果文档历史是这样的:
Rev1: The quiet fox
Rev2: The quiet brown fox
Rev3: The quick brown fox
算法会给出:
The quick brown fox
1111111331222222111
即修订一加“qui”,修订三加“ck”,修订一加“”,修订二加“棕”,最后修订一加“fox”。
最佳答案
我有一个类库可以很容易地做到这一点,但我不知道它在进行大量或多次此类修订时在性能方面的表现如何。
图书馆在这里:DiffLib on CodePlex (您也可以通过 NuGet 安装它。)
问题中示例的脚本在这里(如果添加对 DiffLib 程序集的引用,则可以在 LINQPad 中运行它):
void Main()
{
var revs = new string[]
{
"The quiet fox",
"The quiet brown fox",
"The quick brown fox",
"The quick brown fox.",
"The quick brown fox jumped over the lazy dog.",
"The quick brown fox jumped over the lazy cat.",
"The Quick Brown Fox jumped over the Lazy Cat.",
};
string current = revs[0];
List<int> owner = new List<int>();
foreach (char c in current)
owner.Add(1); // owner 1 owns entire string
Action<int> dumpRev = delegate(int rev)
{
Debug.WriteLine("rev " + rev);
Debug.WriteLine(current);
Debug.WriteLine(new string(owner.Select(i => (char)(48 + i)).ToArray()));
Debug.WriteLine("");
};
dumpRev(0);
for (int index = 1; index < revs.Length; index++)
{
int ownerId = index + 1;
var diff = new DiffLib.Diff<char>(current, revs[index]).ToArray();
int position = 0;
foreach (var part in diff)
{
if (part.Equal)
position += part.Length1;
else
{
// get rid of old owner for the part that was
// removed or replaced
for (int index2 = 0; index2 < part.Length1; index2++)
owner.RemoveAt(position);
// insert new owner for the part that was
// added or did replace the old text
for (int index2 = 0; index2 < part.Length2; index2++)
owner.Insert(position, ownerId);
position += part.Length2;
}
}
current = revs[index];
dumpRev(index);
}
}
输出:
rev 0 The quiet fox 1111111111111 rev 1 The quiet brown fox 1111111111222222111 rev 2 The quick brown fox 1111111331222222111 rev 3 The quick brown fox. 11111113312222221114 rev 4 The quick brown fox jumped over the lazy dog. 111111133122222211155555555555555555555555554 rev 5 The quick brown fox jumped over the lazy cat. 111111133122222211155555555555555555555556664 rev 6 The Quick Brown Fox jumped over the Lazy Cat. 111171133172222271155555555555555555755557664
关于algorithm - 字符差异/注释算法,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/5079679/