I'm trying to fetch an HTML page using the UTF-8 character set:

    NSError *error = nil;
    NSString *html = [NSString stringWithContentsOfURL:[NSURL URLWithString:@"http://forums.drom.ru/general/t1151288178.html"]
                                              encoding:NSUTF8StringEncoding
                                                 error:&error];

But NSLog(@"%@", html) prints (null). Why does this happen?
Best answer
The problem is that although the file's meta tag claims it is UTF-8, it actually isn't (at least not entirely). You can confirm this as follows:
Download the HTML as NSData (this succeeds):

    NSError *error = nil;
    NSURL *url = [NSURL URLWithString:@"http://forums.drom.ru/general/t1151288178.html"];
    // Raw bytes, no text decoding, so this works even if the page is not valid UTF-8
    NSData *data = [NSData dataWithContentsOfURL:url options:0 error:&error];
    // Save a copy in the app's Documents directory so it can be inspected
    NSString *docsPath = NSSearchPathForDirectoriesInDomains(NSDocumentDirectory, NSUserDomainMask, YES)[0];
    NSString *filename = [docsPath stringByAppendingPathComponent:@"test.html"];
    [data writeToFile:filename atomically:YES];
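Run iconv from the Terminal command line. It reports an error, including the line and character number of the offending input:

    iconv -f UTF-8 test.html > /dev/null

(Thanks to Torsten Marek for sharing this technique.)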
When I look at that portion of the HTML, there are definitely non-UTF-8 characters there, buried in the assignment to the clever_cut_pattern JavaScript variable.
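The same iconv check can also run in-process instead of at the command line. What follows is a minimal sketch in plain C, which is also valid inside an Objective-C file. It assumes the POSIX iconv(3) interface is available; on Apple platforms you may need to link against libiconv. The name first_invalid_utf8_offset is a hypothetical helper, not a library function. The function converts UTF-8 to UTF-8 and throws away the output. It relies on iconv leaving its input pointer at the start of the offending sequence when it fails.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: returns the byte offset of the first invalid or
   truncated UTF-8 sequence in buf, len if all bytes are valid UTF-8,
   or (size_t)-1 if iconv could not be set up or failed unexpectedly. */
size_t first_invalid_utf8_offset(const char *buf, size_t len)
{
    iconv_t cd = iconv_open("UTF-8", "UTF-8");
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *in = (char *)buf;   /* iconv() takes a non-const input pointer */
    size_t in_left = len;
    char scratch[4096];       /* converted output is discarded */
    size_t result = len;

    while (in_left > 0) {
        char *out = scratch;
        size_t out_left = sizeof scratch;
        if (iconv(cd, &in, &in_left, &out, &out_left) != (size_t)-1)
            break;            /* all remaining input was valid */
        if (errno == E2BIG)
            continue;         /* scratch buffer full: keep converting */
        if (errno == EILSEQ || errno == EINVAL)
            /* invalid sequence, or a truncated one at the end:
               iconv leaves `in` pointing at its first byte */
            result = (size_t)(in - buf);
        else
            result = (size_t)-1;
        break;
    }

    iconv_close(cd);
    return result;
}
```

You could call it with data.bytes and data.length from the NSData downloaded earlier. A result smaller than data.length is the byte offset of the first bad sequence.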
If we thought you had simply guessed the wrong encoding, the usual advice in these cases would be to use the stringWithContentsOfURL:usedEncoding:error: variant. Rather than guessing what the encoding is, you let NSString determine it for you:

    // On success, `encoding` is set to the encoding NSString detected and used
    NSStringEncoding encoding;
    NSString *html = [NSString stringWithContentsOfURL:url usedEncoding:&encoding error:&error];

Unfortunately, even that fails in this case, presumably because the file claims to be UTF-8 but isn't.
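The question then becomes, "OK, so what do I do now?" That depends on why you are downloading this HTML in your app in the first place. If you really need to convert it to UTF-8 (i.e., strip out the non-UTF-8 characters), you could in theory use the GNU iconv(3) function, which is part of the libiconv library. It can identify the nonconforming characters, which you could then remove. The real question is how much work you are willing to put into handling this nonconforming web page.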
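If dropping the bad bytes is acceptable, the stripping approach described above might look roughly like the minimal sketch below. It is again plain C against the POSIX iconv(3) API; strip_invalid_utf8 is a hypothetical name, not part of libiconv. Each time iconv reports an invalid sequence (EILSEQ), the loop skips one byte and resumes. A sequence truncated at the very end (EINVAL) is dropped.

```c
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>

/* Hypothetical helper: copies buf into a newly malloc'd, NUL-terminated
   buffer, dropping bytes that are not part of a valid UTF-8 sequence.
   Stores the output length in *out_len. Returns NULL on failure.
   The caller must free() the result. */
char *strip_invalid_utf8(const char *buf, size_t len, size_t *out_len)
{
    iconv_t cd = iconv_open("UTF-8", "UTF-8");
    if (cd == (iconv_t)-1)
        return NULL;

    /* Valid UTF-8 converts to itself byte for byte, so the output
       should never be longer than the input. */
    char *result = malloc(len + 1);
    if (result == NULL) {
        iconv_close(cd);
        return NULL;
    }

    char *in = (char *)buf;
    size_t in_left = len;
    char *out = result;
    size_t out_left = len;

    while (in_left > 0) {
        if (iconv(cd, &in, &in_left, &out, &out_left) != (size_t)-1)
            break;                    /* everything left was valid */
        if (errno == EILSEQ) {        /* invalid byte: skip it and retry */
            in++;
            in_left--;
        } else if (errno == EINVAL) { /* truncated sequence at the end */
            break;                    /* drop the incomplete tail */
        } else {                      /* unexpected, e.g. E2BIG */
            free(result);
            iconv_close(cd);
            return NULL;
        }
    }

    iconv_close(cd);
    *out_len = (size_t)(out - result);
    result[*out_len] = '\0';
    return result;
}
```

From Objective-C you could pass in data.bytes and data.length. Then build the string with [[NSString alloc] initWithBytes:clean length:cleanLen encoding:NSUTF8StringEncoding], where clean and cleanLen are illustrative names for the returned buffer and length, and free() the buffer afterwards. Note that this simply discards the offending bytes. If they are really text in some other encoding, that text is lost rather than recovered.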
Regarding this iOS UTF-8 encoding problem, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/18167932/