r - Expanding the headers of an email dataset in R

Tags: r dplyr tidyr tibble tidytext

I have a large amount of email data that looks like this:

library(dplyr)

emails <- tibble(
  from = c('employee.1@xtra.co', 'employee.5@xtra.co', 'employee.1@xtra.co',
           'employee.3@xtra.co', 'employee.1@xtra.co'),
  to = list(
    c('employee.5@xtra.co', 'employee.3xtra.co'),
    c('employee.3@xtra.co', 'employee.1@xtra.co'),
    c('employee.2@xtra.co'),
    c('employee.1@xtra.co'),
    c('employee.3@xtra.co', 'employee.5@xtra.co', 'employee.6@xtra.co')),
  
  cc = list(
    c('employee.2xtra.co', 'employee.4xtra.co', 'employee.6xtra.co'),
    c('employee.1xtra.co', 'employee.8xtra.co', 'employee.6xtra.co'),
    NA,
    c('employee.2xtra.co', 'employee.4xtra.co'),
    c('employee.2xtra.co', 'employee.6xtra.co'))
)

emails

# A tibble: 5 x 3
  from               to        cc       
  <chr>              <list>    <list>   
1 employee.1@xtra.co <chr [2]> <chr [3]>
2 employee.5@xtra.co <chr [2]> <chr [3]>
3 employee.1@xtra.co <chr [1]> <lgl [1]>
4 employee.3@xtra.co <chr [1]> <chr [2]>
5 employee.1@xtra.co <chr [3]> <chr [2]>

I need help expanding each record into one row per combination of from, to, and cc. For example, this is what I want to get for row 1:

from                to                  cc
employee.1@xtra.co  employee.5@xtra.co  employee.2xtra.co
employee.1@xtra.co  employee.5@xtra.co  employee.4xtra.co
employee.1@xtra.co  employee.5@xtra.co  employee.6xtra.co
employee.1@xtra.co  employee.3xtra.co   employee.2xtra.co
employee.1@xtra.co  employee.3xtra.co   employee.4xtra.co
employee.1@xtra.co  employee.3xtra.co   employee.6xtra.co

Thank you very much for your time.

Best answer

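We can apply unnest twice, once for each list column. Each call expands one list column into rows, so chaining the two calls yields every to/cc combination for each original record.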

library(dplyr)
library(tidyr)

emails2 <- emails %>%
  unnest(cols = "to") %>%
  unnest(cols = "cc")
head(emails2)
# # A tibble: 6 x 3
#   from               to                 cc               
#   <chr>              <chr>              <chr>            
# 1 employee.1@xtra.co employee.5@xtra.co employee.2xtra.co
# 2 employee.1@xtra.co employee.5@xtra.co employee.4xtra.co
# 3 employee.1@xtra.co employee.5@xtra.co employee.6xtra.co
# 4 employee.1@xtra.co employee.3xtra.co  employee.2xtra.co
# 5 employee.1@xtra.co employee.3xtra.co  employee.4xtra.co
# 6 employee.1@xtra.co employee.3xtra.co  employee.6xtra.co
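
As a quick sanity check on this kind of expansion, the number of rows after unnesting both columns should equal the sum, over the original rows, of the number of to entries times the number of cc entries. The sketch below uses a small made-up tibble (toy, with hypothetical addresses that are not part of the question's data) rather than the full emails object:

```r
library(dplyr)
library(tidyr)

# Hypothetical two-row example, not the question's data
toy <- tibble(
  from = c("a@xtra.co", "b@xtra.co"),
  to   = list(c("x@xtra.co", "y@xtra.co"), "z@xtra.co"),
  cc   = list(c("p@xtra.co", "q@xtra.co", "r@xtra.co"), "s@xtra.co")
)

# Rows expected after full expansion: (2 * 3) + (1 * 1)
expected_rows <- sum(lengths(toy$to) * lengths(toy$cc))

# Expand one list column at a time
expanded <- toy %>%
  unnest(cols = "to") %>%
  unnest(cols = "cc")

# TRUE when every to/cc combination produced exactly one row
nrow(expanded) == expected_rows
```

The same comparison can be run against emails and emails2 to confirm that no combination was lost.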

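If you need to expand more than two columns, here is one approach. First identify the list columns and store their names in names_target, then use a for loop to apply the unnest function to each of them in turn.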

names_target <- emails %>%
  select(where(is.list)) %>%
  names()

temp <- emails

for (i in names_target){
  temp <- temp %>% unnest(cols = all_of(i))
}

identical(temp, emails2)
# [1] TRUE
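
The loop can also be written as a fold with base R's Reduce, which some readers may find tidier. This is only a sketch of an equivalent formulation, not part of the original answer; the toy tibble and its bcc column are hypothetical and exist only to show a third list column:

```r
library(dplyr)
library(tidyr)

# Hypothetical data with three list columns (bcc is made up for illustration)
toy <- tibble(
  from = c("a@xtra.co", "b@xtra.co"),
  to   = list(c("x@xtra.co", "y@xtra.co"), "z@xtra.co"),
  cc   = list(c("p@xtra.co", "q@xtra.co"), "r@xtra.co"),
  bcc  = list("s@xtra.co", c("t@xtra.co", "u@xtra.co"))
)

# Names of every list column, found the same way as names_target above
list_cols <- toy %>%
  select(where(is.list)) %>%
  names()

# Fold over the column names, unnesting one column per step;
# the third argument (toy) is the starting value of the accumulator
expanded <- Reduce(
  function(df, col) unnest(df, cols = all_of(col)),
  list_cols,
  toy
)

nrow(expanded)
```

Both forms unnest the columns in the order they appear in the data, so the result should match what the for loop produces.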

Regarding "r - Expanding the headers of an email dataset in R", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/66416750/
