I have a string containing an email chain, and I need to extract the senders' names (the text after "From :"). A sample email chain is shown below.
str1 <- 'From : Wendy YEOW (SLA) To : xxxx@lt.org Subject : RE: OneService@S
From: SLA Enquiry (SLA) Sent: Friday, 5 June, 2015 5:26 PM To : xxxx@lt.org Subject : RE: OneService@S
From: Siti Zaharah RAMAN (ARKS) Sent: Friday, 5 June, 2015 5:26 PM To : xxxx@lt.org Subject : RE: OneService@S
From: SLA Enquiry (SLA) Sent: Friday, 5 June, 2015 5:26 PM To : xxxx@lt.org Subject : RE: OneService@S
From: Chin Hwang LAU (TA) Sent: Friday, 5 June, 2015 5:26 PM To : xxxx@lt.org Subject : RE: OneService@S'
I have the following code to extract the names:
library(stringr)
str_extract_all(string=str1,pattern="\\b(From\\s*[:]+\\s*(\\w*))\\b")[[1]]
[1] "From : Wendy" "From: SLA" "From: Siti" "From: SLA" "From: Chin"
But the output I want is:
[1] "Wendy YEOW (SLA)" "SLA Enquiry (SLA)" "Siti Zaharah RAMAN (ARKS)" "SLA Enquiry (SLA)" "Chin Hwang LAU (TA)"
Best answer
You can use strsplit. There is no need for gsub here.
strsplit(str1, "From ?: | (To|Sent) ?:.*?(\\nFrom ?: |$)")[[1]][-1]
# [1] "Wendy YEOW (SLA)" "SLA Enquiry (SLA)" "Siti Zaharah RAMAN (ARKS)"
# [4] "SLA Enquiry (SLA)" "Chin Hwang LAU (TA)"
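The regular expression consists of two alternatives:
"From ?: ": This matches the beginning of the string. Splitting on it returns an empty string and the rest of the original string.
" (To|Sent) ?:.*?(\\nFrom ?: |$)": This matches the text that follows a name. It covers the substring that starts with " To" or " Sent" and runs through either a newline ("\\n") followed by the next "From", or the end of the string ("$").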
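To see the two alternatives at work, here is a minimal sketch on a shortened, made-up chain. The variable names `demo`, `part1`, and `both` and the sample text are assumptions for illustration, not taken from the question. Splitting on the first alternative alone leaves the trailing headers attached to each name; adding the second alternative consumes them.

```r
# Hypothetical two-message chain, shortened from the question's format.
demo <- "From : Ann LEE (SLA) To : a@b.org Subject : Hi\nFrom: Bob TAN (TA) Sent: Friday To : a@b.org"

# First alternative only: splits at each "From :" but keeps the headers
# ("To :", "Sent:", "Subject :") that follow every name.
part1 <- strsplit(demo, "From ?: ")[[1]]

# Both alternatives: the second one also consumes everything from " To :" or
# " Sent:" up to the next "From :" (or the end of the string).
both <- strsplit(demo, "From ?: | (To|Sent) ?:.*?(\\nFrom ?: |$)")[[1]]

print(part1)
print(both)
```

In both results the first element is an empty string, because the pattern matches at the very start of the chain.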
Finally, [-1] is needed to remove the empty string (the one before the first "From").
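The steps above can be wrapped in a small helper. This is only a sketch: the function name `extract_senders` and the sample `chain` are hypothetical. It assumes the chain always begins with a "From" header, so that the first element of the split is the empty string to drop.

```r
# Hypothetical helper (not part of the original answer): returns the sender
# names found in a single email-chain string.
extract_senders <- function(chain) {
  pieces <- strsplit(chain, "From ?: | (To|Sent) ?:.*?(\\nFrom ?: |$)")[[1]]
  # Assumes the chain starts with "From :", so pieces[1] is the empty string.
  pieces[-1]
}

# Made-up sample input in the same layout as the question's str1.
chain <- "From : Ann LEE (SLA) To : a@b.org Subject : Hi\nFrom: Bob TAN (TA) Sent: Friday To : a@b.org"
senders <- extract_senders(chain)
print(senders)
```

If a chain might not start with "From", dropping the first element unconditionally would discard real text, so a check on `pieces[1]` would be needed in that case.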
This Q&A on extracting names from emails with a regular expression in R is based on a similar question on Stack Overflow: https://stackoverflow.com/questions/30747992/