I am trying to split a string as follows:
let path = "/Users/user/Downloads/history.csv"
do {
    let contents = try NSString(contentsOfFile: path, encoding: String.Encoding.utf8.rawValue)
    let rows = contents.components(separatedBy: "\n")
    print("contents: \(contents)")
    print("rows: \(rows)")
}
catch {
}
I have two files that look almost identical. The output for the first file is as follows:
Output for file 1:
contents: 2017-07-31 16:29:53,0.10109999,9.74414271,0.98513273,0.15%,42302999779,-0.98513273,9.72952650
2017-07-31 16:29:53,0.10109999,0.25585729,0.02586716,0.25%,42302999779,-0.02586716,0.25521765
rows: ["2017-07-31 16:29:53,0.10109999,9.74414271,0.98513273,0.15%,42302999779,-0.98513273,9.72952650", "2017-07-31 16:29:53,0.10109999,0.25585729,0.02586716,0.25%,42302999779,-0.02586716,0.25521765", "", ""]
Output for file 2:
contents: 40.75013313,0.00064825,5/18/2017 7:17:01 PM
19.04004820,0.00059900,5/19/2017 9:17:03 PM
rows: ["4\00\0.\07\05\00\01\03\03\01\03\0,\00\0.\00\00\00\06\04\08\02\05\0,\05\0/\01\08\0/\02\00\01\07\0 \07\0:\01\07\0:\00\01\0 \0P\0M\0", "\0", "1\09\0.\00\04\00\00\04\08\02\00\0,\00\0.\00\00\00\05\09\09\00\00\0,\0\05\0/\01\09\0/\02\00\01\07\0 \09\0:\01\07\0:\00\03\0 \0P\0M\0", "\0", "\0", "\0"]
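So both files can be read as strings, since print(contents) works.
But once the string is split, the second file is no longer readable.
I have tried different encodings, but nothing worked. Does anyone know how to handle the second file so that the split strings stay readable?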
Best answer
Your file is apparently UTF-16 (little-endian) encoded:
$ hexdump fullorders4.csv
0000000 4f 00 72 00 64 00 65 00 72 00 55 00 75 00 69 00
0000010 64 00 2c 00 45 00 78 00 63 00 68 00 61 00 6e 00
0000020 67 00 65 00 2c 00 54 00 79 00 70 00 65 00 2c 00
0000030 51 00 75 00 61 00 6e 00 74 00 69 00 74 00 79 00
...
For ASCII characters, the first byte of the UTF-16 little-endian encoding is the ASCII code, and the second byte is zero.
If the file is read as UTF-8, each zero byte is decoded as an ASCII NUL character, which is what you see as \0 in the output.
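To make the byte layout concrete, here is a minimal sketch, assuming a short sample ("40.75", taken from the first field of the second file). It builds the UTF-16 little-endian bytes by hand, misreads them as UTF-8, and then decodes them with the matching encoding. The variable names are illustrative only.

```swift
import Foundation

// Sample text taken from the first field of the question's second file.
let sample = "40.75"

// Build the UTF-16 little-endian bytes by hand: low byte first, then high byte.
let utf16leBytes: [UInt8] = sample.utf16.flatMap { (unit: UInt16) -> [UInt8] in
    [UInt8(unit & 0xFF), UInt8(unit >> 8)]
}
// Every ASCII code is followed by a zero byte.
print(utf16leBytes)

// Misreading those bytes as UTF-8 does not fail: 0x00 is a valid UTF-8 byte (NUL).
let misread = String(decoding: utf16leBytes, as: UTF8.self)
print(misread.debugDescription)

// Decoding with the matching encoding recovers the original text.
let decoded = String(data: Data(utf16leBytes), encoding: .utf16LittleEndian)
print(decoded ?? "decoding failed")
```

The misread string has twice as many characters as the original, because every second one is a NUL. That is why print(contents) looks fine in a terminal while the escaped array output shows \0 everywhere.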
Therefore, specifying the encoding as utf16LittleEndian works in your case:
let contents = try NSString(contentsOfFile: path, encoding: String.Encoding.utf16LittleEndian.rawValue)
// or:
let contents = try String(contentsOfFile: path, encoding: .utf16LittleEndian)
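Putting the decoding and the split together might look like the following sketch. It uses an in-memory stand-in for the file (the two lines shown in the question, with "\n" line endings assumed), and the helper name rows(fromUTF16LE:) is hypothetical. Filtering out empty strings is an extra step, not part of the original answer, that drops the trailing "" entries seen in the question's output.

```swift
import Foundation

// Hypothetical helper: decode UTF-16LE data and return its non-empty lines.
func rows(fromUTF16LE data: Data) -> [String]? {
    guard let text = String(data: data, encoding: .utf16LittleEndian) else { return nil }
    // Splitting on any newline character also copes with "\r\n" line endings;
    // the filter removes the empty pieces that such a split leaves behind.
    return text.components(separatedBy: .newlines).filter { !$0.isEmpty }
}

// In-memory stand-in for the file contents (line endings are an assumption).
let fileText = "40.75013313,0.00064825,5/18/2017 7:17:01 PM\n19.04004820,0.00059900,5/19/2017 9:17:03 PM\n"

// Encode as UTF-16 little-endian without a BOM: low byte first, then high byte.
let fileBytes: [UInt8] = fileText.utf16.flatMap { (unit: UInt16) -> [UInt8] in
    [UInt8(unit & 0xFF), UInt8(unit >> 8)]
}

let parsedRows = rows(fromUTF16LE: Data(fileBytes)) ?? []
print(parsedRows)
```

With a real file, the string returned by String(contentsOfFile:encoding:) with .utf16LittleEndian can be split in the same way.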
There is also a way to try to detect the encoding used (compare iOS: What's the best way to detect a file's encoding). In Swift, that would be:
var enc: UInt = 0
let contents = try NSString(contentsOfFile: path, usedEncoding: &enc)
// or:
var enc = String.Encoding.ascii
let contents = try String(contentsOfFile: path, usedEncoding: &enc)
However, in your particular case, this would read the file as UTF-8 again, because it is valid UTF-8. Prepending a byte order mark (BOM) to the file (FF FE for UTF-16 little-endian) solves the problem reliably.
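To illustrate why a BOM removes the ambiguity, here is a rough sketch of BOM-based detection done by hand. It is not how Foundation implements usedEncoding; the helpers sniffEncoding and decodeUsingBOM are hypothetical, and UTF-32 BOMs are deliberately ignored to keep the example short.

```swift
import Foundation

// Hypothetical helper: choose an encoding from a leading BOM, defaulting to UTF-8.
// UTF-32 BOMs are not handled in this sketch.
func sniffEncoding(_ data: Data) -> (encoding: String.Encoding, bomLength: Int) {
    let prefix = [UInt8](data.prefix(3))
    if prefix.starts(with: [0xEF, 0xBB, 0xBF]) { return (.utf8, 3) }
    if prefix.starts(with: [0xFF, 0xFE]) { return (.utf16LittleEndian, 2) }
    if prefix.starts(with: [0xFE, 0xFF]) { return (.utf16BigEndian, 2) }
    return (.utf8, 0)
}

// Hypothetical helper: strip the BOM (if any) and decode with the sniffed encoding.
func decodeUsingBOM(_ data: Data) -> String? {
    let (encoding, bomLength) = sniffEncoding(data)
    return String(data: Data(data.dropFirst(bomLength)), encoding: encoding)
}

// UTF-16LE bytes for a sample taken from the question, with and without a BOM.
let text = "19.04004820,0.00059900"
let body: [UInt8] = text.utf16.flatMap { (unit: UInt16) -> [UInt8] in
    [UInt8(unit & 0xFF), UInt8(unit >> 8)]
}
let withoutBOM = Data(body)
let withBOM = Data([0xFF, 0xFE] + body)

// With the BOM, the text is recovered as written.
print(decodeUsingBOM(withBOM) ?? "decoding failed")
// Without it, the UTF-8 fallback succeeds but leaves NUL characters in the string.
print((decodeUsingBOM(withoutBOM) ?? "decoding failed").debugDescription)
```

Without the BOM, nothing in the bytes rules out UTF-8, so the fallback is taken and the NUL characters reappear. With FF FE in front, the choice is unambiguous.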
Regarding "ios - String returns only numbers after splitBy", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/45663620/