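I have been working on a school newspaper application for the iPad. I am using NSXMLParser to get the title, brief description, and link of each article. To get the HTML items from each parsed link, I decided to use the Hpple parser. I think I am parsing and storing the RSS items correctly, but when I try to parse the HTML items from each parsed link using a for loop, I get an exception saying that an array is empty, which I took to be the array of RSS items. However, I can display the contents of the RSS items holder on the console, so it is not empty. I will post part of my code and the console output. Please help me out. The deadline for this project is soon. Thanks in advance.
Here is how I start loading my RSS parser (articleParser):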
- (void)loadData {
    [self loadInitData];
    //[self loadDataWithLink];
}

- (void)loadInitData {
    if (sections == nil) {
        [activityIndicator startAnimating];
        NSLog(@"STARTING ARTICLE PARSER FROM MAIN URL!!!");
        Parser *articleParser = [[Parser alloc] init];
        [articleParser parseRssFeed:@"http://theaggie.org/rss/headlines.xml" withDelegate:self];
        [articleParser release];
    } else {
    }
}
Here is how I store the received article items in an NSMutableArray called "sections". I then use a for loop to go through each link of the parsed articles.
- (void)receivedArticleItems:(Article *)theArticle {
    if (sections == nil) {
        sections = [[NSMutableArray alloc] init];
    }
    [sections addObject:theArticle];
    NSLog(@"We recieved the article!");
    NSLog(@"Article: %@", theArticle);
    NSLog(@"What is in sections: %@", sections);
    for (int i = 1; i < 5; i++) {
        NSLog(@"articleItems: %@", [sections objectAtIndex:0]);
        NSLog(@"articleItems at index 0: %@", [[[sections objectAtIndex:0] articleItems] objectAtIndex:0]);
        [self loadDataWithLink:[[[[sections objectAtIndex:0] articleItems] objectAtIndex:0] objectForKey:@"link"]];
    }
    [activityIndicator stopAnimating];
}
Here is how I use the TFHpple parser to get the HTML items from each parsed link:
- (void)loadDataWithLink:(NSString *)urlString {
    NSData *htmlData = [NSData dataWithContentsOfURL:[NSURL URLWithString:urlString]];
    // Create parser
    TFHpple *xpathParser = [[TFHpple alloc] initWithHTMLData:htmlData];
    // Get all the cells in the main body
    htmlElements = [xpathParser search:@"//div[@id='main']/div[@id='mainCol1']/div[@id='main-body']"];
    // Access the first cell
    TFHppleElement *htmlElement = [htmlElements objectAtIndex:0];
    // NSString *title = [htmlElement content];
    NSLog(@"What is in element: %@", htmlElement);
    [xpathParser release];
    //[htmlData release];
}
This is what I get on the console:
2011-05-02 22:58:35.355 TheCalAggie[2443:207] Parsing started for article!
2011-05-02 22:58:35.356 TheCalAggie[2443:207] Adding story title: Students say, 'No time for books'
2011-05-02 22:58:35.356 TheCalAggie[2443:207] From the link: http://theaggie.org/article/2011/05/03/students-say-no-time-for-books
2011-05-02 22:58:35.357 TheCalAggie[2443:207] Summary: The last book managerial economics major Kiyan Parsa read for fun was The Lord of the Rings. That was in high school.
2011-05-02 22:58:35.358 TheCalAggie[2443:207] Published on: Tue, 03 May 2011 00:00:00 -0700
2011-05-02 22:58:35.359 TheCalAggie[2443:207] Parsing started for article!
2011-05-02 22:58:35.360 TheCalAggie[2443:207] Adding story title: UC Davis craft center one of largest college crafting centers
2011-05-02 22:58:35.360 TheCalAggie[2443:207] From the link: http://theaggie.org/article/2011/05/02/uc-davis-craft-center-one-of-largest-college-crafting-centers
2011-05-02 22:58:35.361 TheCalAggie[2443:207] Summary: Hidden away in the South Silo, the UC Davis Craft Center offers 10 craft studios and more than a hundred classes for students looking to learn or perfect their crafting skills.
2011-05-02 22:58:35.362 TheCalAggie[2443:207] Published on: Mon, 02 May 2011 00:00:00 -0700
2011-05-02 22:58:35.362 TheCalAggie[2443:207] We recieved the article!
2011-05-02 22:58:35.363 TheCalAggie[2443:207] Article: *nil description*
2011-05-02 22:58:35.364 TheCalAggie[2443:207] What is in sections: (
(null)
)
2011-05-02 22:58:35.374 TheCalAggie[2443:207] articleItems: *nil description*
2011-05-02 22:58:35.375 TheCalAggie[2443:207] articleItems at index 0: {
link = "http://theaggie.org/article/2011/05/03/peaceful-rally-held-on-campus-after-killing-of-bin-laden\n";
pubDate = "Tue, 03 May 2011 00:00:00 -0700";
summary = "The announcement of Osama bin Laden's death sent a wave of patriotism across the nation and UC Davis. Bin Laden was the leader of al-Qaeda - the organization allegedly behind the Sept. 11, 2001 attacks that killed over 3,000 Americans.\n";
title = "Peaceful rally held on campus after killing of bin Laden \n";
}
2011-05-02 22:59:35.376 TheCalAggie[2443:207] Unable to parse.
2011-05-02 22:59:35.379 TheCalAggie[2443:207] *** Terminating app due to uncaught exception 'NSRangeException', reason: '*** -[NSMutableArray objectAtIndex:]: index 0 beyond bounds for empty array'
*** Call stack at first throw:
Any help would be greatly appreciated. Thanks again.
Best answer
2011-05-02 22:59:35.376 TheCalAggie[2443:207] Unable to parse.
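The parser is struggling to parse the HTML. That parser is not perfect when it comes to HTML, and running XPath queries against an HTML document that may be broken or invalid is a complicated thing to do.
Passing the link you are trying to parse through the W3C validator throws some errors, so the page is not completely valid HTML. Whether it is too broken for that parser to handle is something you will have to debug and find out. Note that when the parse fails, the search returns no elements, so the objectAtIndex:0 call on the result is what raises the NSRangeException. To really figure out the problem, you will need to set breakpoints inside the TFHpple parser you are using to learn more.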
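While the parse failure is being investigated, it may help to check each step before indexing so that the app logs the problem instead of crashing. The following is only a sketch, not a confirmed fix: `CleanedURLFromLink` and `FirstObjectOrNil` are hypothetical helper names, and trimming the trailing `\n` that appears in the logged `link` value is an assumption that has not been verified against the feed. Only Foundation calls are used.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: strips surrounding whitespace and newlines from a
// parsed link (the logged link ends in "\n") and builds an NSURL from it.
// Returns nil if the string is missing, empty, or cannot form a URL.
static NSURL *CleanedURLFromLink(NSString *link) {
    if (link == nil) {
        return nil;
    }
    NSString *trimmed = [link stringByTrimmingCharactersInSet:
                         [NSCharacterSet whitespaceAndNewlineCharacterSet]];
    if ([trimmed length] == 0) {
        return nil;
    }
    return [NSURL URLWithString:trimmed];
}

// Hypothetical helper: returns the first element, or nil for a nil or empty
// array, instead of raising NSRangeException as objectAtIndex:0 would.
static id FirstObjectOrNil(NSArray *array) {
    return ([array count] > 0) ? [array objectAtIndex:0] : nil;
}
```

In loadDataWithLink:, the URL could be built with CleanedURLFromLink(urlString), htmlData could be checked for nil before the TFHpple parser is created, and FirstObjectOrNil(htmlElements) could take the place of [htmlElements objectAtIndex:0], logging and returning early whenever a step yields nil.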
Regarding objective-c - Iteratively parsing inner HTML using the Hpple parser and NSXMLParser, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/5865737/