C#: How to generate a new string based on multiple range indices

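Suppose I have a string like this, where the left part is a word and the right part is a set of indices (single or range) that reference the furigana (phonetic readings) for the kanji in my word: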

string myString = "子で子にならぬ時鳥,0:こ;2:こ;7-8:ほととぎす";

The pattern in detail:

word,<startIndex>(-<endIndex>):<furigana>

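What would be the best way to produce something like this (with a space in front of each kanji group to mark which part the [furigana] is attached to):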

子[こ]で 子[こ]にならぬ 時鳥[ほととぎす]

Edit: (thanks for your comments)

Here is what I have written so far:

// Requires: using System; using System.Collections.Generic; using System.IO; using System.Linq;
static void Main(string[] args)
{
    string myString = "ABCDEF,1:test;3:test2";

    // Split kanji / indices
    string[] tokens = myString.Split(',');

    // Extract furigana indices
    string[] indices = tokens[1].Split(';');

    // Dictionary to store furigana indices
    Dictionary<string, string> furiganaIndices = new Dictionary<string, string>();

    // Collect
    foreach (string index in indices)
    {
        string[] splitIndex = index.Split(':');
        furiganaIndices.Add(splitIndex[0], splitIndex[1]);
    }

    // Processing
    string result = tokens[0] + ",";

    for (int i = 0; i < tokens[0].Length; i++)
    {
        string currentIndex = i.ToString();

        if (furiganaIndices.ContainsKey(currentIndex)) // add [furigana]
        {
            string currentFurigana = furiganaIndices[currentIndex].ToString();
            result = result + " " + tokens[0].ElementAt(i) + string.Format("[{0}]", currentFurigana);
        }
        else // nothing to add
        {
            result = result + tokens[0].ElementAt(i);
        }
    }

    File.AppendAllText(@"D:\test.txt", result + Environment.NewLine);
}

Result:

ABCDEF,A B[test]C D[test2]EF

I am struggling to find a way to handle range indices:

string myString = "ABCDEF,1:test;2-3:test2";
Desired result: ABCDEF,A B[test] CD[test2]EF

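Best answer

I have nothing against manipulating strings by hand per se. But given that you seem to have a regular pattern describing the input, it seems to me that a solution using regular expressions would be more maintainable and readable. With that in mind, here is a sample program that takes that approach: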

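using System;
using System.Text;
using System.Text.RegularExpressions;

class Program
{
    private const string _kinvalidFormatException = "Invalid format for edit specification";

    // regex1 splits the input into the word and its list of edits;
    // regex2 picks apart a single edit: start index, optional end index, furigana.
    private static readonly Regex
        regex1 = new Regex(@"(?<word>[^,]+),(?<edit>(?:\d+)(?:-(?:\d+))?:(?:[^;]+);?)+", RegexOptions.Compiled),
        regex2 = new Regex(@"(?<start>\d+)(?:-(?<end>\d+))?:(?<furigana>[^;]+);?", RegexOptions.Compiled);

    static void Main(string[] args)
    {
        string myString = "子で子にならぬ時鳥,0:こ;2:こ;7-8:ほととぎす";
        string result = EditString(myString);
        Console.WriteLine(result); // 子[こ]で 子[こ]にならぬ 時鳥[ほととぎす]
    }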

    private static string EditString(string myString)
    {
        Match editsMatch = regex1.Match(myString);

        if (!editsMatch.Success)
        {
            throw new ArgumentException(_kinvalidFormatException);
        }

        int ichCur = 0;
        string input = editsMatch.Groups["word"].Value;
        StringBuilder text = new StringBuilder();

        foreach (Capture capture in editsMatch.Groups["edit"].Captures)
        {
            Match oneEditMatch = regex2.Match(capture.Value);

            if (!oneEditMatch.Success)
            {
                throw new ArgumentException(_kinvalidFormatException);
            }

            int start, end;

            if (!int.TryParse(oneEditMatch.Groups["start"].Value, out start))
            {
                throw new ArgumentException(_kinvalidFormatException);
            }

            Group endGroup = oneEditMatch.Groups["end"];

            if (endGroup.Success)
            {
                if (!int.TryParse(endGroup.Value, out end))
                {
                    throw new ArgumentException(_kinvalidFormatException);
                }
            }
            else
            {
                end = start;
            }

            text.Append(input.Substring(ichCur, start - ichCur));
            if (text.Length > 0)
            {
                text.Append(' ');
            }
            ichCur = end + 1;
            text.Append(input.Substring(start, ichCur - start));
            text.Append(string.Format("[{0}]", oneEditMatch.Groups["furigana"].Value));
        }

        if (ichCur < input.Length)
        {
            text.Append(input.Substring(ichCur));
        }

        return text.ToString();
    }
}

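Notes:

This implementation assumes that the edit specifications are listed in order and do not overlap. It makes no attempt to validate that part of the input; depending on where your input comes from, you may want to add that. If listing the specifications out of order is valid, you can also extend the above to first store the edits in a list and sort the list by start index before actually editing the string. (This is similar to how the other proposed answer works, though I don't know why it uses a dictionary rather than a simple list to store the individual edits; that seems needlessly complicated to me.)
I included basic input validation, throwing an exception when a pattern match fails. A more user-friendly implementation would add more specific information to each exception, describing which part of the input is actually invalid.
The Regex class actually has a Replace() method that allows complete customization. The above could be implemented that way, using Replace() and a MatchEvaluator to supply the replacement text instead of appending text to a StringBuilder. Which way to go is mostly a matter of preference, though the MatchEvaluator approach might be preferable if you need more flexible implementation options (i.e. if the exact format of the result can vary).
If you do choose to use the other proposed answer, I strongly recommend using a StringBuilder instead of simply concatenating onto the result variable. For short strings it won't matter much, but you should get into the habit of always using StringBuilder when you have a loop that incrementally adds to a string value, because for long strings the performance cost of concatenation can be severe.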
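
The extensions suggested in the notes above (storing the edits in a list, sorting by start index, rejecting overlaps, and reporting which edit is invalid) could be sketched roughly as follows. This is only an illustration under stated assumptions, not part of the original program: the names FuriganaFormatter, Annotate and Edit are hypothetical, the digits are restricted to ASCII [0-9], and the edit list is split on ';' before each piece is matched.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
static class FuriganaFormatter
{
    // One edit: "<start>(-<end>):<furigana>", anchored so stray text is rejected.
    private static readonly Regex EditRegex =
        new Regex(@"^(?<start>[0-9]+)(?:-(?<end>[0-9]+))?:(?<furigana>[^;]+)$");

    private sealed class Edit
    {
        public int Start;
        public int End;
        public string Furigana;
    }

    public static string Annotate(string spec)
    {
        int comma = spec.IndexOf(',');
        if (comma <= 0)
            throw new ArgumentException("Expected '<word>,<edits>'.", nameof(spec));

        string word = spec.Substring(0, comma);
        var edits = new List<Edit>();

        foreach (string part in spec.Substring(comma + 1)
                     .Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries))
        {
            Match m = EditRegex.Match(part);
            int start = 0, end = 0;
            bool ok = m.Success && int.TryParse(m.Groups["start"].Value, out start);
            if (ok)
            {
                // A single index is treated as a one-character range.
                end = start;
                if (m.Groups["end"].Success)
                    ok = int.TryParse(m.Groups["end"].Value, out end);
            }
            if (!ok)
                throw new ArgumentException("Invalid edit '" + part + "'.", nameof(spec));
            if (end < start || end >= word.Length)
                throw new ArgumentException("Edit '" + part + "' is out of range.", nameof(spec));

            edits.Add(new Edit { Start = start, End = end, Furigana = m.Groups["furigana"].Value });
        }

        // Sorting lets the caller list the edits in any order.
        edits.Sort((a, b) => a.Start.CompareTo(b.Start));

        var text = new StringBuilder();
        int cursor = 0; // index of the first character of the word not yet copied

        foreach (Edit edit in edits)
        {
            if (edit.Start < cursor)
                throw new ArgumentException("Edits overlap at index " + edit.Start + ".", nameof(spec));

            text.Append(word, cursor, edit.Start - cursor);            // plain text before the edit
            if (text.Length > 0) text.Append(' ');                     // space marks the annotated group
            text.Append(word, edit.Start, edit.End - edit.Start + 1);  // the annotated characters
            text.Append('[').Append(edit.Furigana).Append(']');
            cursor = edit.End + 1;
        }

        text.Append(word, cursor, word.Length - cursor);               // remaining tail, possibly empty
        return text.ToString();
    }
}

class Program
{
    static void Main()
    {
        // Edits deliberately listed out of order.
        Console.WriteLine(FuriganaFormatter.Annotate("ABCDEF,2-3:test2;1:test"));
        // A B[test] CD[test2]EF
    }
}
```

Unlike the program above, this sketch returns only the annotated word, without the "ABCDEF," prefix that the question's code prepends. Overlapping or out-of-range edits raise an ArgumentException naming the offending edit or index.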

Source: a similar question on Stack Overflow about generating a new string based on multiple range indices in C#: https://stackoverflow.com/questions/38680298/
