Can anyone help me solve this problem?
I have an HTML page formatted like this:
<ul class="ms-spbTree" id="_ul">
<li id="XXX\$username">Person0
<ul id="XXX\$username_ul">
<li id="XXX\$username">Person1
<ul id="XXX\$username_ul"></ul>
</li>
<li id="XXX\$username">Person2
<ul id="XXX\$username_ul"></ul>
</li>
<li id="XXX\$username">Person3
<ul id="XXX\$username_ul"></ul>
</li>
<li id="XXX\$username">Person4
<ul id="XXX\$username_ul">
<li id="XXX\$username">Person5
<ul id="XXX\$username_ul"></ul>
</li>
<li id="XXX\$username">Person6
<ul id="XXX\$username_ul"></ul>
</li>
<li id="XXX\$username">Person7
<ul id="XXX\$username_ul"></ul>
</li>
</ul>
</li>
<li id="XXX\$username">Person8
<ul id="XXX\$username_ul"></ul>
</li>
</ul>
</li>
</ul>
My function is:
func loadTutorial(webString: NSString)
{
    var data : NSData = webString.dataUsingEncoding(NSUTF8StringEncoding)!
    var tutorialsParser = TFHpple(HTMLData: data)
    var tutorialsXPathString = "//ul/li"
    var tutorialNodes = tutorialsParser.searchWithXPathQuery(tutorialsXPathString) as NSArray
    if(tutorialNodes.count == 0)
    {
        println("empty here")
    }
    else
    {
        for element in tutorialNodes
        {
            var elementTwo: TFHppleElement = element as TFHppleElement
            // var tutorial = Tutorial()
            println("\(elementTwo.firstChild.content)")
            // println(elementTwo.raw)
            let userscanner = NSScanner(string:elementTwo.raw)
            var userscanned: NSString?
            if userscanner.scanUpToString("li id=\"", intoString:nil){
                userscanner.scanString("li id=\"", intoString:nil)
                if userscanner.scanUpToString("\">", intoString:&userscanned) {
                    let newResult: String = userscanned as String
                    println("NewResultValue: \(newResult)")
                }
            }
        }
    }
}
But what I get is the following list:
Person0
Person1
Person2
Person3
Person4
Person5
Person6
Person7
Person8
I only want to retrieve:
Person0
Person1
Person2
Person3
Person4
Person8
Or a list that lets me tell that Person5, 6 and 7 have Person4 as their manager.
I tried:
var tutorialsXPathString = "//ul[not(contains(@style, 'style=\"display: none;\"'))/li"
because all the child people (5, 6, 7) have this in their tags, but it doesn't work :(
Any help would be appreciated.
Best answer
A few thoughts:
Using // means "look for this anywhere in the HTML". If you want to control which levels are considered, use / and start from the root of the document. For example, to get the second level, but not the first or third, you can do the following:
let tutorialsParser = TFHpple(HTMLData: data)
let tutorialsXPathString = "/html/body/ul/li/ul/li"
if let tutorialNodes = tutorialsParser.searchWithXPathQuery(tutorialsXPathString) as? [TFHppleElement] {
    for element in tutorialNodes {
        let content = element.firstChild.content.stringByTrimmingCharactersInSet(NSCharacterSet.whitespaceAndNewlineCharacterSet())
        let identifier = element.attributes["id"] as String
        println("id = \(identifier); content = \(content)")
    }
}
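To see the difference between // and an absolute path without depending on TFHpple, here is a minimal sketch that uses Foundation's XMLDocument instead (a substitution made for illustration, not what the question uses). Because XMLDocument parses the fragment as XML, no html/body wrapper is added, so the absolute path starts at /ul rather than /html/body/ul. The ids p0, p1, p4, p5 and the helper ids(matching:in:) are made up for this example.

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML   // XMLDocument lives here on Linux
#endif

// Cut-down, well-formed version of the nested list from the question.
let xml = """
<ul id="_ul">
  <li id="p0">Person0
    <ul>
      <li id="p1">Person1<ul></ul></li>
      <li id="p4">Person4
        <ul>
          <li id="p5">Person5<ul></ul></li>
        </ul>
      </li>
    </ul>
  </li>
</ul>
"""

// Hypothetical helper: runs an XPath query and returns the id attribute of every match.
func ids(matching xpath: String, in doc: XMLDocument) throws -> [String] {
    return try doc.nodes(forXPath: xpath).compactMap { node in
        (node as? XMLElement)?.attribute(forName: "id")?.stringValue
    }
}

let doc = try XMLDocument(xmlString: xml, options: [])

// "//ul/li" matches an <li> under a <ul> at any depth.
let everyLevel = try ids(matching: "//ul/li", in: doc)

// An absolute path matches exactly one depth: here the second level only.
let secondLevel = try ids(matching: "/ul/li/ul/li", in: doc)

print("every level:", everyLevel)
print("second level:", secondLevel)
```

Here everyLevel picks up all four li elements, while secondLevel contains only p1 and p4, the two entries directly under Person0.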
Note, I'm not sure why you are using a scanner, but if you want an element's attributes, you can use the attributes method. I also defined tutorialNodes as an array of TFHppleElement objects, which simplifies the for loop a little.
If you want the top-level /ul/li followed by the second level, but not the third, you can do the following:
let tutorialsParser = TFHpple(HTMLData: data)
let tutorialsXPathString = "/html/body/ul/li"
if let tutorialNodes = tutorialsParser.searchWithXPathQuery(tutorialsXPathString) as? [TFHppleElement] {
    for element in tutorialNodes {
        let content = element.firstChild.content.stringByTrimmingCharactersInSet(NSCharacterSet.whitespaceAndNewlineCharacterSet())
        let identifier = element.attributes["id"] as String
        println("id = \(identifier); content = \(content)")
        if let ul = element.childrenWithTagName("ul") as? [TFHppleElement] {
            if let li = ul.first?.childrenWithTagName("li") as? [TFHppleElement] {
                for element in li {
                    let content = element.firstChild.content.stringByTrimmingCharactersInSet(NSCharacterSet.whitespaceAndNewlineCharacterSet())
                    let identifier = element.attributes["id"] as String
                    println(" child id = \(identifier); content = \(content)")
                }
            }
        }
    }
}
Or you can do this:
let tutorialsParser = TFHpple(HTMLData: data)
let tutorialsXPathString = "/html/body/ul/li"
if let tutorialNodes = tutorialsParser.searchWithXPathQuery(tutorialsXPathString) as? [TFHppleElement] {
    for element in tutorialNodes {
        let content = element.firstChild.content.stringByTrimmingCharactersInSet(NSCharacterSet.whitespaceAndNewlineCharacterSet())
        let identifier = element.attributes["id"] as String
        println("id = \(identifier); content = \(content)")
        if let children = element.searchWithXPathQuery("/html/body/li/ul/li") as? [TFHppleElement] {
            for element in children {
                let content = element.firstChild.content.stringByTrimmingCharactersInSet(NSCharacterSet.whitespaceAndNewlineCharacterSet())
                let identifier = element.attributes["id"] as String
                println(" child id = \(identifier); content = \(content)")
            }
        }
    }
}
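If the goal is the second form mentioned in the question, knowing who each person's manager is, one option is to walk the tree recursively and pass the parent's name down to its children. The following is a rough sketch under the assumption that Foundation's XMLDocument is used rather than TFHpple; the names ownText(of:), record(_:manager:into:) and managerOf are hypothetical.

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML   // XMLDocument lives here on Linux
#endif

// Trimmed-down version of the question's markup (ids left out for brevity).
let markup = """
<ul>
  <li>Person0
    <ul>
      <li>Person4
        <ul>
          <li>Person5<ul></ul></li>
          <li>Person6<ul></ul></li>
        </ul>
      </li>
      <li>Person8<ul></ul></li>
    </ul>
  </li>
</ul>
"""

// Text that sits directly inside `element`, ignoring nested elements.
func ownText(of element: XMLElement) -> String {
    let pieces = (element.children ?? [])
        .filter { $0.kind == .text }
        .compactMap { $0.stringValue }
    return pieces.joined().trimmingCharacters(in: .whitespacesAndNewlines)
}

// Records each <li> under its manager, then recurses into its own ul/li children.
func record(_ items: [XMLNode], manager: String?, into managerOf: inout [String: String]) throws {
    for case let item as XMLElement in items {
        let name = ownText(of: item)
        if let manager = manager {
            managerOf[name] = manager
        }
        // "ul/li" is a relative path, evaluated with `item` as the context node.
        try record(item.nodes(forXPath: "ul/li"), manager: name, into: &managerOf)
    }
}

let document = try XMLDocument(xmlString: markup, options: [])
var managerOf: [String: String] = [:]
try record(document.nodes(forXPath: "/ul/li"), manager: nil, into: &managerOf)

for (name, manager) in managerOf.sorted(by: { $0.key < $1.key }) {
    print("\(name) reports to \(manager)")
}
```

Using a relative path at each step keeps the query tied to the current element, so the nesting depth does not have to be known in advance.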
For "html - Swift Parse HTML", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/27470363/