I have a data frame:
value
2020-11-20 09:10:28:005 DEBUG <main> {EVENT-upload} [Item_create] increase values: user = "jbohl"
2020-11-20 09:11:10:055 DEBUG <main> {EVENT-upload} [Item_create] redirect: user = "msmith". limit test
2020-11-20 09:10:28:174 INFO <main> {EVENT-upload} [INPUT] new set: id = 12442, user = "msmith"
How can I split the column "value" into 6 columns, delimited by the timestamp and the brackets? The desired result must look like this:
timestamp col2 col3 col4 col5 message
2020-11-20 09:10:28:005 DEBUG <main> {EVENT-upload} [Item_create] increase values: user = "jbohl"
2020-11-20 09:11:10:055 DEBUG <main> {EVENT-upload} [Item_create] redirect: user = "msmith". limit test
2020-11-20 09:10:28:174 INFO <main> {EVENT-upload} [INPUT] new set: id = 12442, user = "msmith"
dput output:
df <- structure(list(value = c("2020-11-20 09:10:28:005 DEBUG <main> {EVENT-upload} [Item_create] increase values: user = jbohl", "2020-11-20 09:11:10:055 DEBUG <main> {EVENT-upload} [Item_create] redirect: user = msmith. limit test", "2020-11-20 09:10:28:174 INFO <main> {EVENT-upload} [INPUT] new set: id = 12442, user = msmith" )), class = "data.frame", row.names = c(NA, -3L))
Best answer
You can use extract from tidyr and supply a pattern with one capture group for each column value to extract.
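# Each pair of parentheses in the pattern is one capture group, i.e. one output column.
# The braces are escaped so that they are always read as literal characters.
tidyr::extract(df, value,
  c('timestamp', paste0('col', 2:5), 'message'),
  '(\\d+-\\d+-\\d+ \\d+:\\d+:\\d+:\\d+)\\s*([A-Z]+)\\s*(<.*?>)\\s*(\\{.*?\\})\\s*(\\[.*?\\])\\s*(.*)')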
# timestamp col2 col3 col4 col5
#1 2020-11-20 09:10:28:005 DEBUG <main> {EVENT-upload} [Item_create]
#2 2020-11-20 09:11:10:055 DEBUG <main> {EVENT-upload} [Item_create]
#3 2020-11-20 09:10:28:174 INFO <main> {EVENT-upload} [INPUT]
# message
#1 increase values: user = jbohl
#2 redirect: user = msmith. limit test
#3 new set: id = 12442, user = msmith
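To see how the capture groups in the pattern line up with the columns, here is a minimal sketch in base R (regexec and regmatches instead of tidyr), run on a single line from the question. The variable names line, pattern, parts and groups are made up for illustration, and the braces are written escaped so that they are read as literal characters regardless of the regex engine.

```r
# One log line copied from the question's data.
line <- "2020-11-20 09:10:28:174 INFO <main> {EVENT-upload} [INPUT] new set: id = 12442, user = msmith"

# Six capture groups, one per output column.
pattern <- paste0(
  "(\\d+-\\d+-\\d+ \\d+:\\d+:\\d+:\\d+)\\s*",  # timestamp
  "([A-Z]+)\\s*",                               # uppercase log level
  "(<.*?>)\\s*",                                # text in <...>
  "(\\{.*?\\})\\s*",                            # text in {...}
  "(\\[.*?\\])\\s*",                            # text in [...]
  "(.*)"                                        # all remaining text
)

# regexec() locates the match and its groups; regmatches() returns the text.
# Element 1 is the full match, elements 2 to 7 are the capture groups.
parts <- regmatches(line, regexec(pattern, line, perl = TRUE))[[1]]
groups <- setNames(parts[-1], c("timestamp", paste0("col", 2:5), "message"))
print(groups)
```

If a line does not match the pattern at all, parts comes back empty, which is a quick way to spot log lines in a different format.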
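timestamp - extracts the digits that follow the pattern num-num-num num:num:num:num
col2 - extracts all the uppercase text that follows
col3 - extracts the value in <.*>
col4 - extracts the value in {.*}
col5 - extracts the value in [.*]
col6 - all the remaining text (this column is named message in the code above)
Data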
df <- structure(list(value = c("2020-11-20 09:10:28:005 DEBUG <main> {EVENT-upload} [Item_create] increase values: user = jbohl",
"2020-11-20 09:11:10:055 DEBUG <main> {EVENT-upload} [Item_create] redirect: user = msmith. limit test",
"2020-11-20 09:10:28:174 INFO <main> {EVENT-upload} [INPUT] new set: id = 12442, user = msmith"
)), class = "data.frame", row.names = c(NA, -3L))
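If adding tidyr as a dependency is not desirable, the same split can probably be done in base R with utils::strcapture, which is a different function from the one used in the answer. The sketch below is an assumption-laden illustration, not the answer's method: logs, proto and out are hypothetical names, and only two of the three lines are used to keep it short.

```r
# Two of the question's log lines in a one-column data frame.
logs <- data.frame(
  value = c(
    "2020-11-20 09:10:28:005 DEBUG <main> {EVENT-upload} [Item_create] increase values: user = jbohl",
    "2020-11-20 09:10:28:174 INFO <main> {EVENT-upload} [INPUT] new set: id = 12442, user = msmith"
  ),
  stringsAsFactors = FALSE
)

# Same six capture groups as in the answer, with the braces escaped.
pattern <- "(\\d+-\\d+-\\d+ \\d+:\\d+:\\d+:\\d+)\\s*([A-Z]+)\\s*(<.*?>)\\s*(\\{.*?\\})\\s*(\\[.*?\\])\\s*(.*)"

# The prototype fixes the name and type of each output column.
proto <- data.frame(
  timestamp = character(), col2 = character(), col3 = character(),
  col4 = character(), col5 = character(), message = character(),
  stringsAsFactors = FALSE
)

# strcapture() returns one row per input string and one column per capture group.
out <- strcapture(pattern, logs$value, proto, perl = TRUE)
print(out)
```

Rows that do not match the pattern come back as NA in every column, so they remain visible in the result.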
About "r - Split a column in a data frame into several other columns": a similar question was found on Stack Overflow: https://stackoverflow.com/questions/64984213/