r - Split a column in a data frame into several other columns

Tags: r dataframe

I have a data frame:

                     value
2020-11-20 09:10:28:005 DEBUG <main> {EVENT-upload} [Item_create] increase values: user = "jbohl"
2020-11-20 09:11:10:055 DEBUG <main> {EVENT-upload} [Item_create] redirect: user = "msmith". limit test
2020-11-20 09:10:28:174 INFO <main> {EVENT-upload} [INPUT] new set: id = 12442, user = "msmith"
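How can I split the column "value" into 6 columns, delimited by the timestamp and the brackets? The desired result should look like this: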
        timestamp           col2      col3          col4               col5              message
2020-11-20 09:10:28:005    DEBUG     <main>      {EVENT-upload}     [Item_create]      increase values: user = "jbohl"
2020-11-20 09:11:10:055    DEBUG     <main>      {EVENT-upload}     [Item_create]      redirect: user = "msmith". limit test
2020-11-20 09:10:28:174    INFO      <main>      {EVENT-upload}     [INPUT]            new set: id = 12442, user = "msmith"
Data (dput output):
df <- structure(list(value = c("2020-11-20 09:10:28:005 DEBUG <main> {EVENT-upload} [Item_create] increase values: user = jbohl", "2020-11-20 09:11:10:055 DEBUG <main> {EVENT-upload} [Item_create] redirect: user = msmith. limit test", "2020-11-20 09:10:28:174 INFO <main> {EVENT-upload} [INPUT] new set: id = 12442, user = msmith" )), class = "data.frame", row.names = c(NA, -3L)) 

Best answer

You can use tidyr::extract and supply a regular expression with one capture group for each column you want to extract.

tidyr::extract(df, value,
               c('timestamp', paste0('col', 2:5), 'message'), 
               '(\\d+-\\d+-\\d+ \\d+:\\d+:\\d+:\\d+)\\s*([A-Z]+)\\s*(<.*?>)\\s*({.*?})\\s*(\\[.*?\\])\\s*(.*)')

#               timestamp  col2   col3           col4          col5
#1 2020-11-20 09:10:28:005 DEBUG <main> {EVENT-upload} [Item_create]
#2 2020-11-20 09:11:10:055 DEBUG <main> {EVENT-upload} [Item_create]
#3 2020-11-20 09:10:28:174  INFO <main> {EVENT-upload}       [INPUT]

#                              message
#1       increase values: user = jbohl
#2 redirect: user = msmith. limit test
#3  new set: id = 12442, user = msmith
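The capture groups work as follows:
timestamp - matches the digits that follow the pattern num-num-num num:num:num:num
col2 - matches the run of uppercase letters that follows
col3 - matches the value enclosed in <.*>
col4 - matches the value enclosed in {.*}
col5 - matches the value enclosed in [.*]
message - matches all remaining text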
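To make the role of each capture group more concrete, here is a minimal base R sketch of the same idea, without tidyr. It is not the answer's method. It assumes every row matches the pattern, and it escapes the curly braces explicitly, which is a cautious choice rather than something the original pattern does. The names `pattern`, `m`, `parts` and `result` are illustrative only.

```r
# Sample rows copied from the question
df <- data.frame(value = c(
  "2020-11-20 09:10:28:005 DEBUG <main> {EVENT-upload} [Item_create] increase values: user = jbohl",
  "2020-11-20 09:11:10:055 DEBUG <main> {EVENT-upload} [Item_create] redirect: user = msmith. limit test",
  "2020-11-20 09:10:28:174 INFO <main> {EVENT-upload} [INPUT] new set: id = 12442, user = msmith"
), stringsAsFactors = FALSE)

# One capture group per target column; braces are escaped explicitly here
pattern <- paste0(
  "(\\d+-\\d+-\\d+ \\d+:\\d+:\\d+:\\d+)\\s*",  # timestamp
  "([A-Z]+)\\s*",                               # col2: uppercase log level
  "(<.*?>)\\s*",                                # col3: value in angle brackets
  "(\\{.*?\\})\\s*",                            # col4: value in curly braces
  "(\\[.*?\\])\\s*",                            # col5: value in square brackets
  "(.*)"                                        # message: everything that remains
)

# regexec() gives the full match followed by one entry per capture group
m <- regmatches(df$value, regexec(pattern, df$value, perl = TRUE))

# Drop the full match (first element) and bind the six groups into columns;
# this step assumes that every row matched the pattern
parts <- do.call(rbind, lapply(m, function(x) x[-1]))
result <- as.data.frame(parts, stringsAsFactors = FALSE)
names(result) <- c("timestamp", paste0("col", 2:5), "message")
result
```

The lazy quantifier `.*?` keeps each bracketed group from running past its own closing bracket, which matters when the message itself contains brackets.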
Data
df <- structure(list(value = c("2020-11-20 09:10:28:005 DEBUG <main> {EVENT-upload} [Item_create] increase values: user = jbohl", 
"2020-11-20 09:11:10:055 DEBUG <main> {EVENT-upload} [Item_create] redirect: user = msmith. limit test", 
"2020-11-20 09:10:28:174 INFO <main> {EVENT-upload} [INPUT] new set: id = 12442, user = msmith"
)), class = "data.frame", row.names = c(NA, -3L))
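Rows that do not match the pattern come back as NA from tidyr::extract, so it can help to check the log lines first. The sketch below does this with grepl(). It uses two rows from the data above plus one malformed line that is hypothetical and not part of the original data, and it escapes the curly braces explicitly as a precaution.

```r
# Two rows from the data above plus one hypothetical malformed line
value <- c(
  "2020-11-20 09:10:28:005 DEBUG <main> {EVENT-upload} [Item_create] increase values: user = jbohl",
  "2020-11-20 09:10:28:174 INFO <main> {EVENT-upload} [INPUT] new set: id = 12442, user = msmith",
  "malformed line without a timestamp"  # hypothetical, not in the original data
)

# Same capture groups as in the answer, with the braces escaped
pattern <- "(\\d+-\\d+-\\d+ \\d+:\\d+:\\d+:\\d+)\\s*([A-Z]+)\\s*(<.*?>)\\s*(\\{.*?\\})\\s*(\\[.*?\\])\\s*(.*)"

# TRUE where the whole pattern can be found in the line
matched <- grepl(pattern, value, perl = TRUE)

# Positions of the lines that would need separate handling
which(!matched)
```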

Source: a similar question on Stack Overflow, "r - Split a column in a data frame into several other columns": https://stackoverflow.com/questions/64984213/
