I'm loading a website's HTML with this call:
NSMutableURLRequest *request = [NSMutableURLRequest requestWithURL:url];
[request setValue:@"utf-8" forHTTPHeaderField:@"Accept-Encoding"];
[request setValue:@"text/html" forHTTPHeaderField:@"Accept"];
[NSURLConnection sendAsynchronousRequest:request
                                   queue:[NSOperationQueue currentQueue]
                       completionHandler:^(NSURLResponse *response, NSData *data, NSError *error) { ... }];
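Then, to convert the NSData to an NSString, I need to know the encoding, so I call:
NSString *textEncoding = [response textEncodingName];
from inside the completion block, but it returns nil for websites that do not specify a "Content-Encoding" header field.
Without knowing the encoding, [[NSString alloc] initWithData:data encoding:responseEncoding]
does not give me readable HTML.
How can I detect the correct encoding for websites that do not send a "Content-Encoding" header field?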
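For reference, when textEncodingName is not nil, the step from that name to the NSStringEncoding passed to initWithData:encoding: could look roughly like the sketch below. As far as I know, textEncodingName reflects the charset parameter of the Content-Type header, so it holds an IANA charset name such as "utf-8". The helper name EncodingFromIANAName is hypothetical and not part of the question; the sketch assumes ARC (for the __bridge cast).

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the question): maps an IANA charset name
// such as @"utf-8" to an NSStringEncoding.
// Returns 0 when the name is nil, empty, or not recognized.
static NSStringEncoding EncodingFromIANAName(NSString *name)
{
    if (name.length == 0) {
        return 0;
    }
    CFStringEncoding cfEncoding =
        CFStringConvertIANACharSetNameToEncoding((__bridge CFStringRef)name);
    if (cfEncoding == kCFStringEncodingInvalidId) {
        return 0;
    }
    return CFStringConvertEncodingToNSStringEncoding(cfEncoding);
}
```

A return value of 0 is the case the question is about: the server gave no usable charset, so the encoding has to be guessed from the bytes.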
Best answer
You can try different encodings and see which one produces readable text:
// NSStringEncoding is an NSUInteger; some of these constants do not fit in an int.
static const NSStringEncoding encodingPriority[] = {
    NSUTF8StringEncoding,
    NSASCIIStringEncoding,
    NSISOLatin1StringEncoding,
    NSISOLatin2StringEncoding,
    NSUnicodeStringEncoding,
    NSWindowsCP1251StringEncoding,
    NSWindowsCP1252StringEncoding,
    NSWindowsCP1253StringEncoding,
    NSWindowsCP1254StringEncoding,
    NSWindowsCP1250StringEncoding,
    NSNEXTSTEPStringEncoding,
    NSJapaneseEUCStringEncoding,
    NSNonLossyASCIIStringEncoding,
    NSShiftJISStringEncoding, /* kCFStringEncodingDOSJapanese */
    NSISO2022JPStringEncoding, /* ISO 2022 Japanese encoding for e-mail */
    NSMacOSRomanStringEncoding,
    NSUTF16BigEndianStringEncoding,
    NSUTF16LittleEndianStringEncoding,
    NSUTF32StringEncoding,
    NSUTF32BigEndianStringEncoding,
    NSUTF32LittleEndianStringEncoding
};
#define REQUIRED_HTML_STRING @"<html"
- (NSString *)htmlStringForUnknownEncodingData:(NSData *)data detectedEncoding:(NSStringEncoding *)detectedEncoding
{
    // Number of entries in the table; sizeof alone would give its size in bytes.
    const size_t count = sizeof(encodingPriority) / sizeof(encodingPriority[0]);
    for (size_t i = 0; i < count; i++) {
        NSStringEncoding encoding = encodingPriority[i];
        // Try this encoding; the result is nil if the bytes are not valid for it.
        NSString *html = [[NSString alloc] initWithData:data encoding:encoding];
        // A wrong encoding tends to produce unreadable text, so require a known marker.
        if (html && [html rangeOfString:REQUIRED_HTML_STRING options:NSCaseInsensitiveSearch].location != NSNotFound) {
            if (detectedEncoding) {
                *detectedEncoding = encoding;
            }
            return html;
        }
    }
    return nil;
}
Then, to detect which encoding the HTML in the NSData uses, call:
NSStringEncoding encoding;
NSString *html = [self htmlStringForUnknownEncodingData:data detectedEncoding:&encoding];
if (html) {
    NSLog(@"Encoding detected!");
} else {
    NSLog(@"No encoding detected");
}
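As an alternative to a hand-written priority list, newer SDKs (iOS 8 / OS X 10.10 and later) let Foundation do the guessing through +[NSString stringEncodingForData:encodingOptions:convertedString:usedLossyConversion:]. The following is a minimal sketch under that assumption, not part of the original answer. The helper name HTMLStringByGuessingEncoding is hypothetical, and suggesting UTF-8 first is my own choice.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original answer): asks Foundation to
// guess the encoding of `data`. Returns the decoded string, or nil when no
// encoding could be determined. `detectedEncoding` may be NULL.
static NSString *HTMLStringByGuessingEncoding(NSData *data, NSStringEncoding *detectedEncoding)
{
    NSString *converted = nil;
    BOOL usedLossyConversion = NO;
    NSDictionary *options = @{
        // Assumption: most pages are UTF-8, so suggest it to the detector.
        NSStringEncodingDetectionSuggestedEncodingsKey: @[@(NSUTF8StringEncoding)],
        // Do not accept results that required dropping or replacing characters.
        NSStringEncodingDetectionAllowLossyKey: @NO
    };
    NSStringEncoding encoding = [NSString stringEncodingForData:data
                                                encodingOptions:options
                                                convertedString:&converted
                                            usedLossyConversion:&usedLossyConversion];
    if (encoding == 0) {
        // 0 means the encoding could not be determined.
        return nil;
    }
    if (detectedEncoding) {
        *detectedEncoding = encoding;
    }
    return converted;
}
```

The detection is heuristic, so it can still guess wrong on short or ambiguous input; checking the result for a marker such as "<html", as the method above does, remains a reasonable sanity check.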
Regarding html - detecting the HTML encoding when NSURLResponse returns nil for textEncodingName, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/17702782/