objective-c - Iteratively parsing inner HTML with the Hpple parser and NSXMLParser

Tags: objective-c ipad nsxmlparser hpple

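I have been developing a school newspaper app for the iPad. I am using NSXMLParser to get the title, short description, and link of each article. To get the HTML content behind each parsed link, I decided to use the Hpple parser. I think I am parsing and storing the RSS items correctly, but when I try to parse the HTML from each parsed link in a for loop, it tells me that the array holding the RSS items is empty. However, I can print the contents of the RSS item holder to the console, so it is not empty. I will post parts of my code and the console output. Please help; the deadline for this project is coming up soon. Thanks in advance.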

Here is how I start loading my RSS parser (articleParser):

- (void)loadData {
    [self loadInitData];

    //[self loadDataWithLink];

}

- (void)loadInitData {
    if (sections == nil) {
        [activityIndicator startAnimating];

        NSLog(@"STARTING ARTICLE PARSER FROM MAIN URL!!!");

        Parser *articleParser = [[Parser alloc] init];
        [articleParser parseRssFeed:@"http://theaggie.org/rss/headlines.xml" withDelegate:self];
        [articleParser release];
    } else {

    }

}

Here is how I store the received article items in an NSMutableArray called "sections". I then use a for loop to go through the link of each parsed article.

- (void)receivedArticleItems:(Article *)theArticle {
    if (sections == nil) {
        sections = [[NSMutableArray alloc] init];
    }
    [sections addObject:theArticle];

    NSLog(@"We recieved the article!");
    NSLog(@"Article: %@", theArticle);
    NSLog(@"What is in sections: %@", sections);

    for (int i = 1; i < 5; i++) {
        NSLog(@"articleItems: %@",[sections objectAtIndex:0]);
        NSLog(@"articleItems at index 0: %@",[[[sections objectAtIndex:0] articleItems] objectAtIndex:0]);

        [self loadDataWithLink:[[[[sections objectAtIndex:0] articleItems] objectAtIndex:0] objectForKey:@"link"]];
    }
    [activityIndicator stopAnimating];
}

Here is how I use the TFHpple parser to get the HTML content from each parsed link:

- (void)loadDataWithLink:(NSString *)urlString {
    NSData *htmlData = [NSData dataWithContentsOfURL:[NSURL URLWithString:urlString]];

    // Create parser
    TFHpple *xpathParser = [[TFHpple alloc] initWithHTMLData:htmlData];

    // Get all the cells main body
    htmlElements = [xpathParser search:@"//div[@id='main']/div[@id='mainCol1']/div[@id='main-body']"];

    // Access the first cell
    TFHppleElement *htmlElement = [htmlElements objectAtIndex:0];

    // NSString *title = [htmlElement content];

    NSLog(@"What is in element: %@", htmlElement);

    [xpathParser release];
    //[htmlData release];
}

This is what I get on the console:

2011-05-02 22:58:35.355 TheCalAggie[2443:207] Parsing started for article!
2011-05-02 22:58:35.356 TheCalAggie[2443:207] Adding story title: Students say, 'No time for books'
2011-05-02 22:58:35.356 TheCalAggie[2443:207] From the link: http://theaggie.org/article/2011/05/03/students-say-no-time-for-books
2011-05-02 22:58:35.357 TheCalAggie[2443:207] Summary: The last book managerial economics major Kiyan Parsa read for fun was The Lord of the Rings. That was in high school.
2011-05-02 22:58:35.358 TheCalAggie[2443:207] Published on: Tue, 03 May 2011 00:00:00 -0700
2011-05-02 22:58:35.359 TheCalAggie[2443:207] Parsing started for article!
2011-05-02 22:58:35.360 TheCalAggie[2443:207] Adding story title: UC Davis craft center one of largest college crafting centers
2011-05-02 22:58:35.360 TheCalAggie[2443:207] From the link: http://theaggie.org/article/2011/05/02/uc-davis-craft-center-one-of-largest-college-crafting-centers
2011-05-02 22:58:35.361 TheCalAggie[2443:207] Summary: Hidden away in the South Silo, the UC Davis Craft Center offers 10 craft studios and more than a hundred classes for students looking to learn or perfect their crafting skills.
2011-05-02 22:58:35.362 TheCalAggie[2443:207] Published on: Mon, 02 May 2011 00:00:00 -0700
2011-05-02 22:58:35.362 TheCalAggie[2443:207] We recieved the article!
2011-05-02 22:58:35.363 TheCalAggie[2443:207] Article: *nil description*
2011-05-02 22:58:35.364 TheCalAggie[2443:207] What is in sections: (
    (null)
)
2011-05-02 22:58:35.374 TheCalAggie[2443:207] articleItems: *nil description*
2011-05-02 22:58:35.375 TheCalAggie[2443:207] articleItems at index 0: {
    link = "http://theaggie.org/article/2011/05/03/peaceful-rally-held-on-campus-after-killing-of-bin-laden\n";
    pubDate = "Tue, 03 May 2011 00:00:00 -0700";
    summary = "The announcement of Osama bin Laden's death sent a wave of patriotism across the nation and UC Davis. Bin Laden was the leader of al-Qaeda - the organization allegedly behind the Sept. 11, 2001 attacks that killed over 3,000 Americans.\n";
    title = "Peaceful rally held on campus after killing of bin Laden \n";
}
2011-05-02 22:59:35.376 TheCalAggie[2443:207] Unable to parse.
2011-05-02 22:59:35.379 TheCalAggie[2443:207] *** Terminating app due to uncaught exception 'NSRangeException', reason: '*** -[NSMutableArray objectAtIndex:]: index 0 beyond bounds for empty array'
*** Call stack at first throw:

Any help would be greatly appreciated. Thanks again.

Best answer

2011-05-02 22:59:35.376 TheCalAggie[2443:207] Unable to parse.

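The parser is struggling to parse the HTML. This parser is not perfect when it comes to HTML, and running XPath queries against a document that may be broken or invalid HTML is a complicated business.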
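Whatever the reason the parse fails, the exception in the question's log is raised by calling objectAtIndex:0 on an empty result array. A minimal sketch of guarding that step follows, using Foundation only. FirstObjectOrNil is a hypothetical helper and not part of Hpple, and the empty array stands in for what the XPath search returns when nothing matches.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of Hpple): returns the first element,
// or nil when the array is nil or empty, instead of raising NSRangeException.
static id FirstObjectOrNil(NSArray *array) {
    return ([array count] > 0) ? [array objectAtIndex:0] : nil;
}

int main(void) {
    @autoreleasepool {
        // Assumption: stand-in for the search result when no node matches.
        NSArray *htmlElements = [NSArray array];

        id htmlElement = FirstObjectOrNil(htmlElements);
        if (htmlElement == nil) {
            // Skip this link instead of crashing the whole loop.
            NSLog(@"No element matched the XPath query; skipping this link.");
        } else {
            NSLog(@"What is in element: %@", htmlElement);
        }
    }
    return 0;
}
```

This does not make the page parse, but it keeps one bad page from terminating the app while the other links are processed.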

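Running the link you are trying to parse through the W3C validator reports some errors, so it is not completely valid HTML. Whether it is too broken for this parser to handle is something you will have to debug and find out. To really get to the bottom of it, set breakpoints inside the TFHpple parser you are using to learn more.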
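Before stepping into the parser, it may also be worth checking what is handed to it. In the question's console output the parsed link ends with a newline ("...bin-laden\n"), and +[NSURL URLWithString:] may return nil for a string that is not a well-formed URL, in which case no HTML data is loaded at all. Whether that is the cause here is an assumption and not something the log proves. A small sketch of cleaning the link first, where CleanedURLFromLink is a hypothetical helper name:

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: strips leading/trailing whitespace and newlines
// from a feed link and builds an NSURL, or returns nil for an empty link.
static NSURL *CleanedURLFromLink(NSString *link) {
    NSString *trimmed = [link stringByTrimmingCharactersInSet:
                         [NSCharacterSet whitespaceAndNewlineCharacterSet]];
    if ([trimmed length] == 0) {
        return nil;
    }
    return [NSURL URLWithString:trimmed];
}

int main(void) {
    @autoreleasepool {
        // Link copied from the question's console output, trailing newline included.
        NSString *link = @"http://theaggie.org/article/2011/05/03/students-say-no-time-for-books\n";

        NSURL *url = CleanedURLFromLink(link);
        if (url == nil) {
            NSLog(@"Could not build a URL from link: %@", link);
        } else {
            NSLog(@"Cleaned URL: %@", url);
        }
    }
    return 0;
}
```

Logging the cleaned URL and the length of the downloaded data before creating the TFHpple instance helps separate a bad input from a genuine HTML parsing problem.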

Original question on Stack Overflow: https://stackoverflow.com/questions/5865737/
