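I'm trying to build an iOS app that extracts part of a web page.
I have code that connects to the URL and stores the HTML in an NSString.
I tried the following, but the result is always an empty string: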
NSScanner* newScanner = [NSScanner scannerWithString:htmlData];
// Create a new scanner and give it the html data to parse.
while (![newScanner isAtEnd])
{
    [newScanner scanUpToString:@"<body>" intoString:NULL];
    // Scan until the <body> tag is found
    [newScanner scanUpToString:@"</body>" intoString:&bodyText];
    // Everything up to the end tag will get placed into the memory address of the result string
}
I also tried another approach:
NSScanner* newScanner = [NSScanner scannerWithString:htmlData];
// Create a new scanner and give it the html data to parse.
while (![newScanner isAtEnd])
{
    [newScanner scanUpToString:@"<body" intoString:NULL];
    // Scan until the start of the opening <body tag is found
    [newScanner scanUpToString:@">" intoString:NULL];
    // Go to end of opening <body> tag
    [newScanner scanUpToString:@"</body>" intoString:&bodyText];
    // Everything up to the end tag will get placed into the memory address of the result string
}
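The second approach returns a string that starts with >< script... and so on.
To be honest, I don't have a good URL to test this with, and I think it would be easier with some help stripping the tags inside the body as well (such as <p></p>).
Any help is greatly appreciated.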
Best answer
I don't know why your first approach didn't work. I assume you declared bodyText before that snippet. This code works fine for me:
- (void)viewDidLoad {
    [super viewDidLoad];
    NSString *htmlData = @"This is some stuff before <body> this is the body </body> with some more stuff";
    NSScanner* newScanner = [NSScanner scannerWithString:htmlData];
    NSString *bodyText;
    while (![newScanner isAtEnd]) {
        [newScanner scanUpToString:@"<body>" intoString:NULL];
        [newScanner scanString:@"<body>" intoString:NULL];
        [newScanner scanUpToString:@"</body>" intoString:&bodyText];
    }
    NSLog(@"%@",bodyText); // 2015-01-28 15:58:00.360 ScanningOfHTMLProblem[1373:661934] this is the body
}
Note that I added a call to scanString:intoString: to move the scanner past the first "<body>".
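Two details of NSScanner may explain the results in the question. scanUpToString:intoString: stops just before the string it is looking for, so in the second attempt the scanner is still sitting on the ">" of the opening tag when the body is captured, which is why the result begins with ">". And if the page's opening tag carries attributes (an assumption, since the real HTML isn't shown), the literal "<body>" never matches, so the first attempt scans to the end without ever setting bodyText. The sketch below handles both cases and also strips tags such as <p></p>. The helper names GRBodyFromHTML and GRStripTags are hypothetical, not part of any API, and this is a minimal illustration rather than a full HTML parser.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the text between the opening <body ...> tag
// and </body>, or nil if either tag is missing.
// Assumes attribute values do not themselves contain ">".
static NSString *GRBodyFromHTML(NSString *html) {
    NSScanner *scanner = [NSScanner scannerWithString:html];
    scanner.charactersToBeSkipped = nil; // keep whitespace exactly as it appears

    // Match "<body" rather than "<body>" so that attributes are tolerated.
    [scanner scanUpToString:@"<body" intoString:NULL];
    if (![scanner scanString:@"<body" intoString:NULL]) return nil;

    // Skip any attributes, then consume the ">" that ends the opening tag.
    [scanner scanUpToString:@">" intoString:NULL];
    if (![scanner scanString:@">" intoString:NULL]) return nil;

    // Capture everything up to (not including) the closing tag.
    NSString *body = @"";
    [scanner scanUpToString:@"</body>" intoString:&body];
    if ([scanner isAtEnd]) return nil; // ran off the end: no closing tag found
    return body;
}

// Hypothetical helper: removes anything between "<" and ">" and keeps the rest.
// It only drops the tags themselves; text inside <script> or <style> is kept.
static NSString *GRStripTags(NSString *fragment) {
    NSScanner *scanner = [NSScanner scannerWithString:fragment];
    scanner.charactersToBeSkipped = nil;
    NSMutableString *text = [NSMutableString string];

    while (![scanner isAtEnd]) {
        NSString *chunk = nil;
        // Copy the plain text that precedes the next tag, if there is any.
        if ([scanner scanUpToString:@"<" intoString:&chunk]) {
            [text appendString:chunk];
        }
        // Consume the tag itself: "<", its contents, and the closing ">".
        if ([scanner scanString:@"<" intoString:NULL]) {
            [scanner scanUpToString:@">" intoString:NULL];
            [scanner scanString:@">" intoString:NULL];
        }
    }
    return text;
}
```

Calling GRStripTags(GRBodyFromHTML(htmlData)) would then give the body text without markup. Because the early returns replace the while loop, only the first <body> element is considered, which seems reasonable for a well-formed page. For arbitrary real-world HTML a dedicated parser is likely to be more robust than string scanning.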
A similar question, "html - Objective C: using NSScanner to get the <body> from HTML", can be found on Stack Overflow: https://stackoverflow.com/questions/28204380/