给定以下 HTML:
<p><span class="xn-location">OAK RIDGE, N.J.</span>, <span class="xn-chron">March 16, 2011</span> /PRNewswire/ -- Lakeland Bancorp, Inc. (Nasdaq: <a href='http://studio-5.financialcontent.com/prnews?Page=Quote&Ticker=LBAI' target='_blank' title='LBAI'> LBAI</a>), the holding company for Lakeland Bank, today announced that it redeemed <span class="xn-money">$20 million</span> of the Company's outstanding <span class="xn-money">$39 million</span> in Fixed Rate Cumulative Perpetual Preferred Stock, Series A that was issued to the U.S. Department of the Treasury under the Capital Purchase Program on <span class="xn-chron">February 6, 2009</span>, thereby reducing Treasury's investment in the Preferred Stock to <span class="xn-money">$19 million</span>. The Company paid approximately <span class="xn-money">$20.1 million</span> to the Treasury to repurchase the Preferred Stock, which included payment for accrued and unpaid dividends for the shares.  This second repayment, or redemption, of Preferred Stock will result in annualized savings of <span class="xn-money">$1.2 million</span> due to the elimination of the associated preferred dividends and related discount accretion.  A one-time, non-cash charge of <span class="xn-money">$745 thousand</span> will be incurred in the first quarter of 2011 due to the acceleration of the Preferred Stock discount accretion.  The warrant previously issued to the Treasury to purchase 997,049 shares of common stock at an exercise price of <span class="xn-money">$8.88</span>, adjusted for stock dividends and subject to further anti-dilution adjustments, will remain outstanding.</p>
我想获取 <span>
中的值元素。我还想获得 class
的值<span>
上的属性元素。
理想情况下,我可以通过一个函数运行一些 HTML 并取回提取实体的字典(基于上面定义的 <span>
解析)。
上面的代码是一个更大的源 HTML 文件的片段,它无法与 XML 解析器匹配。所以我正在寻找一个可能的正则表达式来帮助提取感兴趣的信息。
最佳答案
使用此工具(免费): http://www.radsoftware.com.au/regexdesigner/
使用这个正则表达式:
"<span[^>]*>(.*?)</span>"
组 1 中的值(对于每个匹配项)将是您需要的文本。
在 C# 中它看起来像:
Regex regex = new Regex("<span[^>]*>(.*?)</span>");
string toMatch = "<span class=\"ajjsjs\">Some text</span>";
if (regex.IsMatch(toMatch))
{
MatchCollection collection = regex.Matches(toMatch);
foreach (Match m in collection)
{
string val = m.Groups[1].Value;
//Do something with the value
}
}
修正回答评论:
Regex regex = new Regex("<span class=\"(.*?)\">(.*?)</span>");
string toMatch = "<span class=\"ajjsjs\">Some text</span>";
if (regex.IsMatch(toMatch))
{
MatchCollection collection = regex.Matches(toMatch);
foreach (Match m in collection)
{
string class = m.Groups[1].Value;
string val = m.Groups[2].Value;
//Do something with the class and value
}
}
关于regex - 如何使用 RegEx 从 HTML 中提取值?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/5327503/