I have some text extracted from a PDF document. The document contains a bulleted list with content like the following:
3 BILL REFERRED TO MAIL COMMITTEE
Mr Fitzgibbon (Chief Government Whip), by leave, moved—That the Tax Laws Amendment (2011 Measures No. 7) Bill 2011 be referred to the Main Committee for further consideration. Question—put and passed.
4 CORPORATIONS AMENDMENT (FUTURE OF FINANCIAL ADVICE) BILL 2011
Mr Shorten (Minister for Financial Services and Superannuation), pursuant to notice, presented a Bill for an Act to amend the law in relation to financial advice, and for related purposes. Document Mr Shorten presented an explanatory memorandum to the bill. Bill read a first time. Mr Shorten moved—That the bill be now read a second time. Debate adjourned (Mr Randall), and the resumption of the debate made an order of the day for the next sitting.
5 TAX LAWS AMENDMENT (2011 MEASURES NO. 8) BILL 2011
Mr Shorten (Minister for Financial Services and Superannuation) presented a Bill for an Act to amend the law relating to taxation, and for related purposes. Document
I need to split these up so that each bullet point looks like this:
[0,0] = Title
[0,1] = Body
[1,0] = Title
[1,1] = Body
I have revised the example to include some real-world content.
Any help would be appreciated.
I am using C# on the .NET Framework.
Best answer
You can use LINQ:
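var result = input
    // Accept both Windows (\r\n) and Unix (\n) line endings.
    .Split(new[] { "\r\n", "\n" }, StringSplitOptions.None)
    .Where(x => !string.IsNullOrWhiteSpace(x))
    // A line stays in the current group unless it starts with a digit,
    // so every numbered title line opens a new group.
    .GroupAdjacent((g, x) => !char.IsDigit(x[0]))
    .Select(g => new
    {
        Title = g.First().Trim(),
        Body = string.Join(" ", g.Skip(1).Select(x => x.Trim()))
    })
    .ToArray();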
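One caveat with the "starts with a digit" test: a wrapped body line that happens to begin with a number (for example a line starting with "2011") would be taken for a title. The following is a minimal sketch of a stricter check, assuming that titles are a number followed by all-uppercase text as in the sample. `TitleHeuristic` and `LooksLikeTitle` are hypothetical names, not part of the answer.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class TitleHeuristic
{
    // A number, some whitespace, then at least one more character.
    private static readonly Regex NumberPrefix = new Regex(@"^\d+\s+\S");

    // Assumption based on the sample data only: titles contain no lowercase letters.
    public static bool LooksLikeTitle(string line)
    {
        string trimmed = line.Trim();
        return NumberPrefix.IsMatch(trimmed) && !trimmed.Any(char.IsLower);
    }

    public static void Main()
    {
        Console.WriteLine(LooksLikeTitle("3 BILL REFERRED TO MAIL COMMITTEE"));      // True
        Console.WriteLine(LooksLikeTitle("2011 be referred to the Main Committee")); // False
    }
}
```

With this helper, the grouping predicate could be written as `(g, x) => !TitleHeuristic.LooksLikeTitle(x)`.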
Example:
string input = @"3 BILL REFERRED TO MAIL COMMITTEE
Mr Fitzgibbon (Chief Government Whip), by leave, moved—That the
Tax Laws Amendment (2011 Measures No. 7) Bill 2011 be referred
to the Main Committee for further consideration. Question—put
and passed.
4 CORPORATIONS AMENDMENT (FUTURE OF FINANCIAL ADVICE) BILL 2011
Mr Shorten (Minister for Financial Services and Superannuation),
pursuant to notice, presented a Bill for an Act to amend the law
in relation to financial advice, and for related purposes. Mr
Shorten presented an explanatory memorandum to the bill. Bill
read a first time. Mr Shorten moved—That the bill be now read
a second time. Debate adjourned (Mr Randall), and the resumption
of the debate made an order of the day for the next sitting.
5 TAX LAWS AMENDMENT (2011 MEASURES NO. 8) BILL 2011
Mr Shorten (Minister for Financial Services and Superannuation)
presented a Bill for an Act to amend the law relating to
taxation, and for related purposes.";
Output:
result[0] == { Title = "3 BILL REFERRED ...", Body = "Mr Fitzgibbon ..." }
result[1] == { Title = "4 CORPORATIONS ...", Body = "Mr Shorten ..." }
result[2] == { Title = "5 TAX LAWS ...", Body = "Mr Shorten ..." }
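The question asks for a two-dimensional array, whereas the query yields an array of anonymous objects. Below is a rough sketch of one way to copy the same grouping into a `string[,]`, where `[i,0]` is the title and `[i,1]` is the body. `BulletParser` and `ToTable` are hypothetical names, and the grouping helper is repeated only so that the sketch compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BulletParser
{
    // Grouping helper repeated here so the sketch is self-contained.
    public static IEnumerable<IEnumerable<T>> GroupAdjacent<T>(
        this IEnumerable<T> source, Func<IEnumerable<T>, T, bool> adjacent)
    {
        var g = new List<T>();
        foreach (var x in source)
        {
            if (g.Count != 0 && !adjacent(g, x))
            {
                yield return g;
                g = new List<T>();
            }
            g.Add(x);
        }
        if (g.Count != 0)
            yield return g;
    }

    // Hypothetical helper: table[i, 0] = title, table[i, 1] = body.
    public static string[,] ToTable(string input)
    {
        var items = input
            .Split(new[] { "\r\n", "\n" }, StringSplitOptions.None)
            .Where(x => !string.IsNullOrWhiteSpace(x))
            .Select(x => x.Trim())
            .GroupAdjacent((g, x) => !char.IsDigit(x[0]))
            .Select(g => g.ToList())
            .ToList();

        var table = new string[items.Count, 2];
        for (int i = 0; i < items.Count; i++)
        {
            table[i, 0] = items[i][0];
            table[i, 1] = string.Join(" ", items[i].Skip(1));
        }
        return table;
    }

    public static void Main()
    {
        // Shortened, made-up input in the same shape as the sample.
        string input = "3 FIRST TITLE\nbody line one\nbody line two\n4 SECOND TITLE\nanother body";
        string[,] table = ToTable(input);
        for (int i = 0; i < table.GetLength(0); i++)
        {
            Console.WriteLine("[{0},0] = {1}", i, table[i, 0]);
            Console.WriteLine("[{0},1] = {1}", i, table[i, 1]);
        }
    }
}
```

A rectangular array works here because every item has exactly two fields. If more fields were added later, a small class or the anonymous type from the query would probably be easier to maintain.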
Extension method:
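using System;
using System.Collections.Generic;

// Extension methods must live in a static class; the class name is arbitrary.
public static class EnumerableExtensions
{
    // Splits a sequence into runs: x joins the current group g while
    // adjacent(g, x) is true; otherwise x starts a new group.
    public static IEnumerable<IEnumerable<T>> GroupAdjacent<T>(
        this IEnumerable<T> source, Func<IEnumerable<T>, T, bool> adjacent)
    {
        var g = new List<T>();
        foreach (var x in source)
        {
            if (g.Count != 0 && !adjacent(g, x))
            {
                yield return g;
                g = new List<T>();
            }
            g.Add(x);
        }
        // Skip the trailing group when the source was empty.
        if (g.Count != 0)
            yield return g;
    }
}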
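To see how the predicate drives the grouping, it may help to run the helper on something simpler than text. The sketch below splits a list of integers into ascending runs: an element joins the current group only while it is greater than the group's last element. `GroupAdjacentDemo` and `AscendingRuns` are hypothetical names used for illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupAdjacentDemo
{
    // Grouping helper repeated here so the sketch is self-contained.
    public static IEnumerable<IEnumerable<T>> GroupAdjacent<T>(
        this IEnumerable<T> source, Func<IEnumerable<T>, T, bool> adjacent)
    {
        var g = new List<T>();
        foreach (var x in source)
        {
            if (g.Count != 0 && !adjacent(g, x))
            {
                yield return g;
                g = new List<T>();
            }
            g.Add(x);
        }
        if (g.Count != 0)
            yield return g;
    }

    // Hypothetical helper: x stays in the current run while it exceeds the run's last element.
    public static List<List<int>> AscendingRuns(IEnumerable<int> numbers)
    {
        return numbers
            .GroupAdjacent((g, x) => x > g.Last())
            .Select(g => g.ToList())
            .ToList();
    }

    public static void Main()
    {
        foreach (var run in AscendingRuns(new[] { 1, 2, 3, 2, 5, 1 }))
            Console.WriteLine(string.Join(",", run));
        // prints:
        // 1,2,3
        // 2,5
        // 1
    }
}
```

The predicate receives the whole current group, so it can inspect the first element, the last element, or the group's size. The title/body query only looks at the incoming line.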
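Regarding "c# - How to split a string of bullet points (with title and body content) into a multi-dimensional array?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/8462780/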