r - Extracting binary data from a mixed data file

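I am trying to use R to read binary data from a mixed data file (ASCII and binary) that is laid out in a pseudo-XML format. My idea was to use the scan function to read specific lines and then convert the binary to numeric values, but I can't seem to do this in R. I have a Python script that does it, shown below, but I would like to do the job in R. The binary section of the data file is enclosed by the start and end tags <BinData> and </BinData>.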

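The data file is in a proprietary format containing spectral data; a link to a sample data file is included below. To quote the user manual: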

Data of BinData elements are written as a binary array of bytes. Each 8 bytes of the binary array represent a one double-precision floating-point value. Therefore the size of the binary array is NumberOfPoints * 8 bytes. For two-dimensional arrays, data layout follows row-major form used by SafeArrays. This means that moving to next array element increments the last index. For example, if a two-dimensional array (e.g. Data(i,j)) is written in such one-dimensional binary byte array form, moving to the next 8 byte element of the binary array increments last index of the original two-dimensional array (i.e. Data(i,j+1)). After the last element of the binary array the combination of carriage return and linefeed characters (ANSI characters 13 and 10) is written.
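
As a minimal sketch of the layout the manual describes, the following Python snippet packs a small two-dimensional array into bytes and decodes it again. The 2 x 3 shape, the values, and the little-endian byte order (`<d`) are assumptions made for illustration only; the manual excerpt does not state the byte order.

```python
import struct

# Hypothetical 2 x 3 array, Data(i, j), used only for illustration
rows, cols = 2, 3
data = [[1.0, 2.0, 3.0],
        [4.0, 5.0, 6.0]]

# Row-major: the last index (j) varies fastest in the byte stream
flat = [data[i][j] for i in range(rows) for j in range(cols)]
# '<' = little-endian is an assumption; the manual excerpt does not say
payload = struct.pack("<%dd" % len(flat), *flat) + b"\r\n"  # CR LF trailer

# Size check from the manual: NumberOfPoints * 8 bytes, plus CR LF
assert len(payload) == rows * cols * 8 + 2

# Decode: drop the trailer, then read one double per 8 bytes
values = struct.unpack("<%dd" % (rows * cols), payload[:-2])
# Rebuild the 2-D array: element (i, j) sits at flat index i * cols + j
decoded = [list(values[i * cols:(i + 1) * cols]) for i in range(rows)]
print(decoded)
```

Moving 8 bytes forward in `payload` corresponds to going from Data(i,j) to Data(i,j+1), which is why the flat index is `i * cols + j`.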

Thanks in advance for any suggestions!

Link to a sample data file:

https://docs.google.com/file/d/0B5F27d7b1eMfQWg0QVRHUWUwdk0/edit?usp=sharing

Python script:

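import sys, struct, csv

# Read the whole file as bytes
with open(sys.argv[1], 'rb') as f:
    t = f.read()

# Skip past the opening tag and its \r\n line end
start = t.find(b"<BinData>") + len(b"<BinData>") + 2
header = t[:start]

t = t[start:]
end = t.find(b"\r\n</BinData>")
payload = t[:end]  # raw bytes between the tags

# Each 8 bytes holds one double ('d' uses the machine's native byte order)
doubles = []
for k in range(len(payload) // 8):
    doubles.append(struct.unpack('d', payload[k*8:(k+1)*8])[0])

# Everything after the \r\n that ends the binary payload
footer = t[end+2:]

with open("output.csv", 'w', newline='') as myfile:
    wr = csv.writer(myfile, quoting=csv.QUOTE_ALL)
    wr.writerow(doubles)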

Best answer

I wrote the pack package to make this easier. You still need to search for the start/end of the binary data, though.

b <- readBin("120713b01.ols", "raw", 4000)
# raw version of the start of the BinData tag
beg.raw <- charToRaw("<BinData>\r\n")
# only take first match, in case binary data randomly contains "<BinData>\r\n"
beg.loc <- grepRaw(beg.raw,b,fixed=TRUE)[1] + length(beg.raw)
# convert header to text
header <- scan(text=rawToChar(b[1:(beg.loc-1)]),what="",sep="\n")
# search for "<Number of Points>" tags and calculate total number of points
numPts <- prod(as.numeric(header[grep("<Number of Points>",header)+1]))

library(pack)
Data <- unlist(unpack(rep("d", numPts), b[beg.loc:length(b)]))
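
For comparison with the question's script, here is a rough Python sketch of the same strategy: take the first start tag, read the point count from the header, and slice the payload by length instead of searching for the end tag, since the binary bytes could by chance contain the end tag. The function name `extract_doubles`, the synthetic `sample` bytes, the little-endian byte order, and the header layout (a `<Number of Points>` line followed by a line holding the count, as the `grep(...)+1` above suggests) are all assumptions, not details confirmed by the file format.

```python
import struct

def extract_doubles(blob):
    """Return the doubles stored after the first <BinData> tag in blob."""
    start_tag = b"<BinData>\r\n"
    # First match only, in case the payload happens to contain the tag bytes
    start = blob.index(start_tag) + len(start_tag)
    header_lines = blob[:start].decode("ascii").splitlines()

    # Assumed header layout: each "<Number of Points>" line is followed by
    # a line holding a count; multiply the counts for multi-dimensional data
    num_pts = 1
    for k, line in enumerate(header_lines):
        if "<Number of Points>" in line:
            num_pts *= int(header_lines[k + 1])

    # Slice by length instead of searching for the end tag
    payload = blob[start:start + num_pts * 8]
    # '<' = little-endian is an assumption
    return list(struct.unpack("<%dd" % num_pts, payload))

# Synthetic stand-in for a data file (not the real format in full)
sample = (b"<Number of Points>\r\n3\r\n<BinData>\r\n"
          + struct.pack("<3d", 0.5, 1.5, 2.5)
          + b"\r\n</BinData>\r\n")
print(extract_doubles(sample))
```

If the file is shorter than the header claims, `struct.unpack` raises an error rather than returning truncated data, which makes a partial read easy to notice.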

This Q&A on extracting binary data from a mixed data file in R is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/17973314/
