regex - Remove US zip codes from a string: R regex

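I am trying to remove/extract zip codes from strings. The logic is that I want to capture anything that:

must contain exactly 5 consecutive digits, or

must contain exactly 5 consecutive digits, followed by a dash, then exactly 4 consecutive digits, or

must contain exactly 5 consecutive digits, followed by a space, then exactly 4 consecutive digits

The zip portion of the string may be preceded by a space, but it may not be.

Here is a MWE and what I have tried. The two regexes I attempted are based on two other questions: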

text.var <- c("Mr. Bean bought 2 tickets 2-613-213-4567",
  "43 Butter Rd, Brossard QC K0A 3P0 – 613 213 4567", 
  "Rat Race, XX, 12345",
  "Ignore phone numbers(613)2134567",
  "Grab zips with dashes 12345-6789 or no space before12345-6789",  
  "Grab zips with spaces 12345 6789 or no space before12345 6789",
  "I like 1234567 dogs"
)

pattern1 <- "\\d{5}([- ]*\\d{4})?"
pattern2 <- "[0-9]{5}(-[0-9]{4})?(?!.*[0-9]{5}(-[0-9]{4})?)"


regmatches(text.var, gregexpr(pattern1, text.var, perl = TRUE)) 
regmatches(text.var, gregexpr(pattern2, text.var, perl = TRUE)) 

## [[1]]
## character(0)
## 
## [[2]]
## character(0)
## 
## [[3]]
## [1] "12345"
## 
## [[4]]
## [1] "21345"
## 
## [[5]]
## [1] "12345-6789"
## 
## [[6]]
## [1] "12345"
## 
## [[7]]
## [1] "12345"

Desired output

## [[1]]
## character(0)
## 
## [[2]]
## character(0)
## 
## [[3]]
## [1] "12345"
## 
## [[4]]
## character(0)
## 
## [[5]]
## [1] "12345-6789" "12345-6789"
## 
## [[6]]
## [1] "12345 6789" "12345 6789"
## 
## [[7]]
## character(0)

Note: R's regex is similar to other regex flavors but has its own specifics. This question is specific to R's regex and is not a general regex question.

Best answer

You can use a regex like this:

"(?<!\\d)(\\d{5}(?:[-\\s]\\d{4})?)\\b"
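
As a rough sketch of how this pattern might be applied, the snippet below runs it over a few of the question's ASCII-only test strings with base R. The names zip_pattern, zips, and cleaned are illustrative and not part of the original answer. perl = TRUE is passed because the lookbehind (?<!\\d) is PCRE syntax. The lookbehind rejects a match that starts in the middle of a longer digit run, and the trailing \\b rejects one that is immediately followed by more digits.

```r
# A subset of the sample strings from the question
text.var <- c("Rat Race, XX, 12345",
  "Ignore phone numbers(613)2134567",
  "Grab zips with dashes 12345-6789 or no space before12345-6789",
  "Grab zips with spaces 12345 6789 or no space before12345 6789",
  "I like 1234567 dogs"
)

# Pattern from the answer: 5 digits not preceded by a digit,
# optionally followed by a dash or whitespace plus 4 digits,
# ending at a word boundary
zip_pattern <- "(?<!\\d)(\\d{5}(?:[-\\s]\\d{4})?)\\b"

# Extract every zip code found in each string (a list, one element per string)
zips <- regmatches(text.var, gregexpr(zip_pattern, text.var, perl = TRUE))

# Remove the zip codes instead of extracting them
cleaned <- gsub(zip_pattern, "", text.var, perl = TRUE)

print(zips)
print(cleaned)
```

Note that gsub leaves the surrounding whitespace in place, so a follow-up trim or whitespace collapse may be wanted when removing rather than extracting.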


Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/25223792/
