string - R string interpretation: why does "\040" get interpreted as " " and what other potential pitfalls could I come across in string interpretation?

Tags: string r string-interpolation

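Today I was helping someone regex some information out of a PDF file that we had read in as a txt file. Unfortunately, the tm package's readPDF function was not working properly at the time, although with some modifications we were able to get it to work. While regexing some content out of the .txt file, we found something that surprised most of us: the string "\040" is interpreted as a space, " ".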

> x <- "\040"
> x
[1] " "

This does not seem to happen for other, similar strings where you might expect it (i.e. "\n" or "\t").

> y <- "\n"
> y
[1] "\n"
> z <- "\t"
> z
[1] "\t"

Why is this? What other strings are interpreted differently in R?

Edit:

After some simple experimentation, it appears that any "\xxx" where the x's are digits yields a different result. What is the value in this?

Best answer

Take a look here: http://stat.ethz.ch/R-manual/R-devel/library/base/html/Quotes.html

Backslash is used to start an escape sequence inside character constants. Escaping a character not in the following table is an error.

...

\nnn character with given octal code (1, 2 or 3 digits)

Then take a look at an ASCII table to see what the octal codes represent. As you will see, 040 is a space.
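
The octal reading can also be checked from within R, without an ASCII table. The following is a minimal sketch rather than part of the original answer; the variable names are illustrative, and it relies only on the base functions strtoi() and utf8ToInt().

```r
# Sketch: confirm that "\040" is the octal escape for a space (code point 32).
x <- "\040"

# strtoi() parses the digits "040" as a base-8 integer: 0*64 + 4*8 + 0 = 32.
code_from_octal <- strtoi("040", base = 8L)

# utf8ToInt() returns the code point of the single character stored in x.
code_in_string <- utf8ToInt(x)

print(code_from_octal)     # 32
print(code_in_string)      # 32
print(identical(x, " "))   # TRUE: the escape and a literal space are the same string
print(nchar(x))            # 1: one character, not four
```
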
Just for fun:

> '\110\145\154\154\157\040\127\157\162\154\144\041'
[1] "Hello World!"
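
As for the "\n" and "\t" cases in the question, those escapes are interpreted too; what differs is how print() displays the result. A space is printable, so it is shown as is, while a newline or tab is shown in escaped form. The sketch below is an illustration under that reading, with made-up variable names; it also shows one way to keep a literal backslash, which is the usual concern when building regular expressions.

```r
# Sketch: escapes are interpreted when the string is parsed;
# print() merely re-escapes non-printable characters for display.
y <- "\n"
z <- "\t"

print(nchar(y))       # 1: a single newline character
print(utf8ToInt(y))   # 10: the code point of a line feed
print(utf8ToInt(z))   # 9: the code point of a horizontal tab

print(y)              # shown in escaped form by print()
cat("a", z, "b", y, sep = "")  # cat() writes the raw characters instead

# To keep the four literal characters \040, double the backslash.
lit <- "\\040"
print(nchar(lit))                # 4: backslash, 0, 4, 0
print(substr(lit, 1, 1) == "\\") # TRUE: the first character is a backslash
```

When in doubt about what a string really contains, nchar() and utf8ToInt() report the stored characters, whereas print() reports a display form.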

A similar question can be found on Stack Overflow: https://stackoverflow.com/questions/20552825/
