I am working with the following line:
(SENT (VBP (HPP (HP Vem))(VB kan)(VBP (VB få)(PMP (PM ATP)))(MADP (MAD ?))))
I want to produce the following output from it:
SENT -> VBP -> HPP -> HP
SENT -> VBP -> VB
SENT -> VBP -> VBP -> VB
SENT -> VBP -> VBP -> PMP -> PM
SENT -> VBP -> MADP -> MAD
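To achieve this, my first thought was to loop through each pair of parentheses, starting with the outermost one and going deeper and deeper (if there are deeper levels). (Maybe a recursive function?)
But since there is no built-in function that splits on pairs of parentheses, I tried splitting on ( and then looking for ) while looping, like this: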
    var row = "(SENT (VBP (HPP (HP Vem))(VB kan)(VBP (VB få)(PMP (PM ATP)))(MADP (MAD ?))))";
    string[] splitP = row.Split('(');
    for (int i = 0; i < splitP.Length; i++)
    {
        string data = splitP[i];
        // string[] dataSplit = data.Split(')');
        Console.WriteLine(data);
    }
    Console.ReadLine();
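But as you can see, I am stuck, and the code above does not even represent what I am trying to achieve, because I realized that my approach was wrong and that it cannot be done that way.
How can I achieve this?
Update.
A larger test line: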
(SENT (VBP (PPP (PP På)(NNP (NN grundval))(PPP (PP av))(NNP (DTP (DT en))(NN intervju)(PPP (PP efter)(NNP (NN experimentet)))(PPP (PP med)(PCP (DTP (DT de))(PC oinvigda)(VBP (HPP (HP som))(VB gjort)(NNP (JJP (JJ felaktiga))(NN bedömningar)))))))(VB kunde)(PNP (PN man))(VBP (VB dela)(PLP (PL in))(PNP (PN dem))(PPP (PP i)(NNP (RGP (RG tre))(NN grupper)(MIDP (MID :))(KNP (NNP (NN (a)))(PNP (PN de)(VBP (HPP (HP som))(ABP (AB faktiskt))(VB trodde)(SNP (SN att)(VBP (PNP (PN de))(VB bedömt)(ABP (AB riktigt))))))(MIDP (MID ,))(PNP (NNP (NN (b)))(PN de)(VBP (HPP (HP som))(VB trodde)(SNP (SN att)(VBP (DTP (DT de)(JJP (JJ själva)))(VB måste)(VBP (VB ha)(VBP (VB misstagit)(PNP (PN sig))(SNP (SN eftersom)(VBP (ABP (AB inte))(PNP (ABP (AB så))(PN många))(VB kan)(VBP (VB ha)(ABP (AB fel))(PPP (PP mot)(NNP (DTP (DT en))(JJP (JJ enda))(NN person))))))))))))(KN och)(PNP (NNP (NN (c)))(PN de)(KNP (VBP (HPP (HP som))(ABP (AB faktiskt))(VB var)(JJP (JJ medvetna))(PPP (PP om)(SNP (SN att)(VBP (PNP (PN de))(VB angav)(NNP (JJP (JJ felaktiga))(NN bedömningar))))))(KN men)(VBP (HPP (HP som))(ABP (AB inte))(VB ville)(VBP (VB avvika)(PPP (PP från)(NNP (NN gruppen)))))))))))(MADP (MAD .))))
Best answer
Here is another answer; I have tried to make it as easy to understand as possible:
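    using System;
    using System.Collections.Generic;

    public class Class1
    {
        public static void Main()
        {
            new Class1().myRec("(SENT (VBP (HPP (HP Vem))(VB kan)(VBP (VB få)(PMP (PM ATP)))(MADP (MAD ?))))", null);
        }

        // Prints one "A -> B -> C" line for every leaf of the bracketed tree.
        // input is the text still to be parsed; start is the path built so far.
        public void myRec(string input, string start)
        {
            if (string.IsNullOrEmpty(input))
                return;

            // Not wrapped in parentheses: we reached a word, so print the path.
            if (input[0] != '(' || input[input.Length - 1] != ')')
            {
                Console.WriteLine(start);
                return;
            }

            // Strip the outer pair of parentheses.
            input = input.Remove(0, 1);
            input = input.Remove(input.Length - 1, 1);

            // The node label runs up to the first space (or to the end if there is none).
            int i = input.IndexOf(' ');
            string nextInput = i > 0 ? input.Substring(0, i) : input;
            if (start != null)
                start = start + " -> " + nextInput;
            else
                start = nextInput;
            input = input.Remove(0, i + 1);

            // Split the remainder into top-level "(...)" groups by tracking nesting depth.
            int count = 0;
            List<string> subStrs = new List<string>();
            string tempStr = "";
            for (int j = 0; j < input.Length; j++)
            {
                tempStr += input[j];
                if (input[j] == '(')
                    count++;
                else if (input[j] == ')')
                {
                    count--;
                    if (count == 0)
                    {
                        subStrs.Add(tempStr);
                        tempStr = "";
                    }
                }
            }

            // No nested group was found: the remainder is a plain word.
            if (subStrs.Count == 0)
                subStrs.Add(tempStr);

            // Recurse into each child group with the path extended so far.
            foreach (string it in subStrs)
                myRec(it, start);
        }
    }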
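It uses recursion, and it only works if your input is well-formed, by which I mean that it contains an equal number of ( and ). Also, I am not a C# programmer, so I know this code could be improved a lot.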
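Since the recursion depends on balanced parentheses, it may help to validate the input first. The following is only a minimal sketch: the class name ParenCheck and the method IsBalanced are hypothetical helpers, not part of the answer's code, and the input is assumed to be non-null.

```csharp
using System;

public static class ParenCheck
{
    // Hypothetical helper (not part of the answer above): returns true when
    // every '(' has a matching ')' and no ')' appears before its '('.
    // Assumes input is not null.
    public static bool IsBalanced(string input)
    {
        int depth = 0;
        foreach (char c in input)
        {
            if (c == '(')
            {
                depth++;
            }
            else if (c == ')')
            {
                depth--;
                if (depth < 0)
                    return false; // a ')' closed something that was never opened
            }
        }
        return depth == 0; // anything left open means a ')' is missing
    }
}
```

One way to use it would be to call ParenCheck.IsBalanced(row) before myRec(row, null) and report an error instead of recursing when it returns false.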
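Edit: Replaced the array with a list to make the code more accurate.
Edit 2: To make it work with input that may be missing some spaces, such as the OP's new, larger test case, I made a small change.
Replace this in my code:
    if (start != null)
        start = start + " -> " + input.Substring(0, i);
    else
        start = input.Substring(0, i);
with this:
    string nextInput = i > 0 ? input.Substring(0, i) : input;
    if (start != null)
        start = start + " -> " + nextInput;
    else
        start = nextInput;
(I have already made this change in the code above.)
This is the output for the larger test line: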
SENT -> VBP -> PPP -> PP
SENT -> VBP -> PPP -> NNP -> NN
SENT -> VBP -> PPP -> PPP -> PP
SENT -> VBP -> PPP -> NNP -> DTP -> DT
SENT -> VBP -> PPP -> NNP -> NN
SENT -> VBP -> PPP -> NNP -> PPP -> PP
SENT -> VBP -> PPP -> NNP -> PPP -> NNP -> NN
SENT -> VBP -> PPP -> NNP -> PPP -> PP
SENT -> VBP -> PPP -> NNP -> PPP -> PCP -> DTP -> DT
SENT -> VBP -> PPP -> NNP -> PPP -> PCP -> PC
SENT -> VBP -> PPP -> NNP -> PPP -> PCP -> VBP -> HPP -> HP
SENT -> VBP -> PPP -> NNP -> PPP -> PCP -> VBP -> VB
SENT -> VBP -> PPP -> NNP -> PPP -> PCP -> VBP -> NNP -> JJP -> JJ
SENT -> VBP -> PPP -> NNP -> PPP -> PCP -> VBP -> NNP -> NN
SENT -> VBP -> VB
SENT -> VBP -> PNP -> PN
SENT -> VBP -> VBP -> VB
SENT -> VBP -> VBP -> PLP -> PL
SENT -> VBP -> VBP -> PNP -> PN
SENT -> VBP -> VBP -> PPP -> PP
SENT -> VBP -> VBP -> PPP -> NNP -> RGP -> RG
SENT -> VBP -> VBP -> PPP -> NNP -> NN
SENT -> VBP -> VBP -> PPP -> NNP -> MIDP -> MID
SENT -> VBP -> VBP -> PPP -> NNP -> KNP -> NNP -> NN -> a
SENT -> VBP -> VBP -> PPP -> NNP -> KNP -> PNP -> PN
SENT -> VBP -> VBP -> PPP -> NNP -> KNP -> PNP -> VBP -> HPP -> HP
SENT -> VBP -> VBP -> PPP -> NNP -> KNP -> PNP -> VBP -> ABP -> AB
SENT -> VBP -> VBP -> PPP -> NNP -> KNP -> PNP -> VBP -> VB
SENT -> VBP -> VBP -> PPP -> NNP -> KNP -> PNP -> VBP -> SNP -> SN
SENT -> VBP -> VBP -> PPP -> NNP -> KNP -> PNP -> VBP -> SNP -> VBP -> PNP -> PN
SENT -> VBP -> VBP -> PPP -> NNP -> KNP -> PNP -> VBP -> SNP -> VBP -> VB
SENT -> VBP -> VBP -> PPP -> NNP -> KNP -> PNP -> VBP -> SNP -> VBP -> ABP -> AB
SENT -> VBP -> VBP -> PPP -> NNP -> KNP -> MIDP -> MID
SENT -> VBP -> VBP -> PPP -> NNP -> KNP -> PNP -> NNP -> NN -> b
SENT -> VBP -> VBP -> PPP -> NNP -> KNP -> PNP -> PN
SENT -> VBP -> VBP -> PPP -> NNP -> KNP -> PNP -> VBP -> HPP -> HP
SENT -> VBP -> VBP -> PPP -> NNP -> KNP -> PNP -> VBP -> VB
SENT -> VBP -> VBP -> PPP -> NNP -> KNP -> PNP -> VBP -> SNP -> SN
SENT -> VBP -> VBP -> PPP -> NNP -> KNP -> PNP -> VBP -> SNP -> VBP -> DTP -> DT
SENT -> VBP -> VBP -> PPP -> NNP -> KNP -> PNP -> VBP -> SNP -> VBP -> DTP -> JJP -> JJ
SENT -> VBP -> VBP -> PPP -> NNP -> KNP -> PNP -> VBP -> SNP -> VBP -> VB
SENT -> VBP -> VBP -> PPP -> NNP -> KNP -> PNP -> VBP -> SNP -> VBP -> VBP -> VB
SENT -> VBP -> VBP -> PPP -> NNP -> KNP -> PNP -> VBP -> SNP -> VBP -> VBP -> VBP -> VB
SENT -> VBP -> VBP -> PPP -> NNP -> KNP -> PNP -> VBP -> SNP -> VBP -> VBP -> VBP -> PNP -> PN
SENT -> VBP -> VBP -> PPP -> NNP -> KNP -> PNP -> VBP -> SNP -> VBP -> VBP -> VBP -> SNP -> SN
SENT -> VBP -> VBP -> PPP -> NNP -> KNP -> PNP -> VBP -> SNP -> VBP -> VBP -> VBP -> SNP -> VBP -> ABP -> AB
SENT -> VBP -> VBP -> PPP -> NNP -> KNP -> PNP -> VBP -> SNP -> VBP -> VBP -> VBP -> SNP -> VBP -> PNP -> ABP -> AB
SENT -> VBP -> VBP -> PPP -> NNP -> KNP -> PNP -> VBP -> SNP -> VBP -> VBP -> VBP -> SNP -> VBP -> PNP -> PN
SENT -> VBP -> VBP -> PPP -> NNP -> KNP -> PNP -> VBP -> SNP -> VBP -> VBP -> VBP -> SNP -> VBP -> VB
SENT -> VBP -> VBP -> PPP -> NNP -> KNP -> PNP -> VBP -> SNP -> VBP -> VBP -> VBP -> SNP -> VBP -> VBP -> VB
SENT -> VBP -> VBP -> PPP -> NNP -> KNP -> PNP -> VBP -> SNP -> VBP -> VBP -> VBP -> SNP -> VBP -> VBP -> ABP -> AB
SENT -> VBP -> VBP -> PPP -> NNP -> KNP -> PNP -> VBP -> SNP -> VBP -> VBP -> VBP -> SNP -> VBP -> VBP -> PPP -> PP
SENT -> VBP -> VBP -> PPP -> NNP -> KNP -> PNP -> VBP -> SNP -> VBP -> VBP -> VBP -> SNP -> VBP -> VBP -> PPP -> NNP -> DTP -> DT
SENT -> VBP -> VBP -> PPP -> NNP -> KNP -> PNP -> VBP -> SNP -> VBP -> VBP -> VBP -> SNP -> VBP -> VBP -> PPP -> NNP -> JJP -> JJ
SENT -> VBP -> VBP -> PPP -> NNP -> KNP -> PNP -> VBP -> SNP -> VBP -> VBP -> VBP -> SNP -> VBP -> VBP -> PPP -> NNP -> NN
SENT -> VBP -> VBP -> PPP -> NNP -> KNP -> KN
SENT -> VBP -> VBP -> PPP -> NNP -> KNP -> PNP -> NNP -> NN -> c
SENT -> VBP -> VBP -> PPP -> NNP -> KNP -> PNP -> PN
SENT -> VBP -> VBP -> PPP -> NNP -> KNP -> PNP -> KNP -> VBP -> HPP -> HP
SENT -> VBP -> VBP -> PPP -> NNP -> KNP -> PNP -> KNP -> VBP -> ABP -> AB
SENT -> VBP -> VBP -> PPP -> NNP -> KNP -> PNP -> KNP -> VBP -> VB
SENT -> VBP -> VBP -> PPP -> NNP -> KNP -> PNP -> KNP -> VBP -> JJP -> JJ
SENT -> VBP -> VBP -> PPP -> NNP -> KNP -> PNP -> KNP -> VBP -> PPP -> PP
SENT -> VBP -> VBP -> PPP -> NNP -> KNP -> PNP -> KNP -> VBP -> PPP -> SNP -> SN
SENT -> VBP -> VBP -> PPP -> NNP -> KNP -> PNP -> KNP -> VBP -> PPP -> SNP -> VBP -> PNP -> PN
SENT -> VBP -> VBP -> PPP -> NNP -> KNP -> PNP -> KNP -> VBP -> PPP -> SNP -> VBP -> VB
SENT -> VBP -> VBP -> PPP -> NNP -> KNP -> PNP -> KNP -> VBP -> PPP -> SNP -> VBP -> NNP -> JJP -> JJ
SENT -> VBP -> VBP -> PPP -> NNP -> KNP -> PNP -> KNP -> VBP -> PPP -> SNP -> VBP -> NNP -> NN
SENT -> VBP -> VBP -> PPP -> NNP -> KNP -> PNP -> KNP -> KN
SENT -> VBP -> VBP -> PPP -> NNP -> KNP -> PNP -> KNP -> VBP -> HPP -> HP
SENT -> VBP -> VBP -> PPP -> NNP -> KNP -> PNP -> KNP -> VBP -> ABP -> AB
SENT -> VBP -> VBP -> PPP -> NNP -> KNP -> PNP -> KNP -> VBP -> VB
SENT -> VBP -> VBP -> PPP -> NNP -> KNP -> PNP -> KNP -> VBP -> VBP -> VB
SENT -> VBP -> VBP -> PPP -> NNP -> KNP -> PNP -> KNP -> VBP -> VBP -> PPP -> PP
SENT -> VBP -> VBP -> PPP -> NNP -> KNP -> PNP -> KNP -> VBP -> VBP -> PPP -> NNP -> NN
SENT -> VBP -> MADP -> MAD
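As an alternative to the recursive approach, the same paths can probably be collected in a single pass with an explicit stack. This is a sketch of a different technique, not the answer's method: the names TreePaths and LeafPaths are hypothetical, a node label is assumed to end at the first whitespace or parenthesis, and a node counts as a leaf when no other node is nested inside it.

```csharp
using System;
using System.Collections.Generic;

public static class TreePaths
{
    // Hypothetical helper: returns one "A -> B -> C" path per leaf node.
    public static List<string> LeafPaths(string input)
    {
        var paths = new List<string>();
        var labels = new List<string>();  // labels of the currently open nodes
        var hasChild = new List<bool>();  // whether each open node contains a nested node
        int i = 0;
        while (i < input.Length)
        {
            char c = input[i];
            if (c == '(')
            {
                // Read the label that follows the opening parenthesis.
                int start = ++i;
                while (i < input.Length && input[i] != '(' && input[i] != ')'
                       && !char.IsWhiteSpace(input[i]))
                    i++;
                if (hasChild.Count > 0)
                    hasChild[hasChild.Count - 1] = true; // the parent now has a child
                labels.Add(input.Substring(start, i - start));
                hasChild.Add(false);
            }
            else if (c == ')')
            {
                if (labels.Count == 0)
                    throw new FormatException("Unmatched ')' at index " + i);
                // A node that closes without nested nodes is a leaf: record its path.
                if (!hasChild[hasChild.Count - 1])
                    paths.Add(string.Join(" -> ", labels));
                labels.RemoveAt(labels.Count - 1);
                hasChild.RemoveAt(hasChild.Count - 1);
                i++;
            }
            else
            {
                i++; // whitespace or part of a word: nothing to record
            }
        }
        if (labels.Count != 0)
            throw new FormatException("Missing ')' at end of input");
        return paths;
    }
}
```

Printing each element of TreePaths.LeafPaths(row) with Console.WriteLine should give the five lines requested in the question for the short test line. Unlike the recursive version, this sketch also tolerates whitespace between sibling groups and throws a FormatException on unbalanced input.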
Source: a similar question on Stack Overflow about splitting/looping through a line containing parentheses in C#: https://stackoverflow.com/questions/27686014/