r - In R, how to read a file with a custom end of line (eol)

I have a text file that I want to read in R (and store in a data.frame). The file is organized in several rows and columns. Both the "sep" and the "eol" are custom.

Problem: the custom eol, i.e. "\t&nd" (without quotes), cannot be set in read.table(...) (or read.csv(...), read.csv2(...), ...), nor in fread(...), and I have not been able to find a solution.

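I searched here ("[r] read eol" and other queries I no longer remember) and did not find a solution. The only one I found was to preprocess the file and change the eol, which is not possible in my case: some fields contain things like \n, \r, \n\r, ", ..., which is the reason for the custom eol in the first place.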

Thanks!

Best answer

You can solve this in two different ways:

A. If the file is not too wide, you can use scan to read the rows, use strsplit to split each row into its columns, and then combine the result into a data.frame. Example:

# Provide a reproducible example of the file ("raw.txt" here) you are starting with
your_text <- "a~b~c!1~2~meh!4~5~wow"
write(your_text, "raw.txt"); rm(your_text)

eol_str <- "!" # the character the rows divide on (scan()'s sep accepts a single character only)
sep_str <- "~" # whatever character(s) the columns divide on

# Read and parse the text file:
# scan() returns a character vector with one string per row;
# strsplit() turns it into a list with one character vector of fields per row
# (fixed = TRUE treats sep_str literally rather than as a regular expression).
rows <- scan("raw.txt", what = character(), sep = eol_str, quiet = TRUE)
row_list <- strsplit(rows, sep_str, fixed = TRUE)

# The first row holds the column names; the remaining rows are the data.
df <- data.frame(do.call(rbind, row_list[-1]))
names(df) <- row_list[[1]]

df
#   a b   c
# 1 1 2 meh
# 2 4 5 wow
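scan()'s sep argument accepts only one character, and scan() ends a field at every unquoted end-of-line marker, so approach A as written cannot split on a multi-character eol such as \t&nd or keep \n and \r\n inside fields. A minimal sketch of the same idea without scan(): read the whole file into one string with readChar() and split it with strsplit(..., fixed = TRUE). The separator "~", the header in the first row, unquoted fields, and an equal number of fields per row are hypothetical assumptions for illustration; note also that strsplit() drops a trailing empty field, so rows ending in an empty column would need extra handling.

```r
# Sketch only: generalizes approach A to a multi-character eol.
# Hypothetical assumptions: "~" separates columns, the first row is the header,
# every row has the same number of fields, and fields are not quoted.
eol_str <- "\t&nd"
sep_str <- "~"

# Write a small example file whose fields contain "\n" and "\r\n".
path <- tempfile(fileext = ".txt")
writeChar(paste0("a~b~c", eol_str,
                 "1~2~line one\nline two", eol_str,
                 "4~5~x\r\ny", eol_str),
          path, eos = NULL)

# Read the whole file as one string (readChar opens it in binary mode, so "\r" is kept).
raw <- readChar(path, nchars = file.info(path)$size, useBytes = TRUE)

# Split into rows on the literal eol string; fixed = TRUE disables regex matching.
rows <- strsplit(raw, eol_str, fixed = TRUE)[[1]]
rows <- rows[nzchar(rows)]  # drop empty pieces, if any

# Split every row into its fields.
fields <- strsplit(rows, sep_str, fixed = TRUE)

# First row = column names, remaining rows = data.
df <- as.data.frame(do.call(rbind, fields[-1]), stringsAsFactors = FALSE)
names(df) <- fields[[1]]
df
```

Reading the file as a single string keeps the approach simple, but it holds the whole file in memory at once, so it suits moderately sized files best.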

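B. If A doesn't work, I agree with @BondedDust that you may need an external utility, but you can call it from R with system() and have it do a find/replace that reformats the file for read.table. The exact call will be specific to your OS; for an example, see https://askubuntu.com/questions/20414/find-and-replace-text-within-a-file-using-commands. Since you note that your text contains \n and \r\n, I suggest you first find them and replace them with temporary placeholders (perhaps quoted versions of themselves); you can then convert them back after you have built your data.frame.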
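The find/replace in approach B can also be done inside R with gsub(..., fixed = TRUE) instead of an external tool called through system(); that substitution is a swapped-in technique, not the answer's own. A minimal sketch of the placeholder idea, under hypothetical assumptions: a single-character column separator "~" (read.table()'s sep also accepts only one character), a header row, and placeholder strings <<LF>> and <<CR>> that never occur in the data.

```r
# Sketch only: approach B (placeholders + find/replace) done with gsub() instead of system().
# Hypothetical assumptions: eol "\t&nd", single-character sep "~", header in the
# first row, and the placeholder strings never appear in the data.
eol_str <- "\t&nd"
sep_str <- "~"
lf_tag  <- "<<LF>>"  # hypothetical placeholder for "\n"
cr_tag  <- "<<CR>>"  # hypothetical placeholder for "\r"

# Example file with an apostrophe, a "\n" and a "\r\n" inside fields.
path <- tempfile(fileext = ".txt")
writeChar(paste0("a~b~c", eol_str,
                 "1~2~it's\nfine", eol_str,
                 "4~5~x\r\ny", eol_str),
          path, eos = NULL)
raw <- readChar(path, nchars = file.info(path)$size, useBytes = TRUE)

# 1. Protect the line breaks that belong to the data.
raw <- gsub("\r", cr_tag, raw, fixed = TRUE)
raw <- gsub("\n", lf_tag, raw, fixed = TRUE)
# 2. Turn the custom eol into an ordinary newline.
raw <- gsub(eol_str, "\n", raw, fixed = TRUE)

# 3. Parse as usual; keep every column as character for now.
df <- read.table(text = raw, sep = sep_str, header = TRUE,
                 quote = "", comment.char = "", colClasses = "character")

# 4. Put the original line breaks back, then convert column types.
restore <- function(x) {
  gsub(lf_tag, "\n", gsub(cr_tag, "\r", x, fixed = TRUE), fixed = TRUE)
}
df[] <- lapply(df, restore)
df[] <- lapply(df, type.convert, as.is = TRUE)
df
```

Setting quote = "" and comment.char = "" stops read.table() from treating apostrophes or # characters in the data as quotes or comments, and colClasses = "character" keeps the placeholders intact until they are restored.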

Regarding "r - In R, how to read a file with a custom end of line (eol)", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/29803117/
