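I have a large txt file with a specific structure. My goal is to load the text into R using readLines, and I want to replace each record's weight value with a new value from my df data frame. I don't want to change the .txt structure or parse the .txt file. The final output should have exactly the same structure as the original .txt (written with writeLines()). How can I read it and update the values? Thanks.
Here is my reference data frame: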
df <- tibble::tribble(
~House_id, ~id, ~new_weight,
18105265, "Mab", 4567,
18117631, "Maa", 3367,
18121405, "Mab", 4500,
71811763, "Maa", 2455,
71811763, "Mab", 2872
)
Here is a small part of my .txt:
H18105265_0
R1_0
Mab_3416311514210525745_W923650.80
T1_0
T2_0
T3_0
V64_0_2_010_ab171900171959
H18117631_0
R1_0
Maa_1240111711220682016_W123650.80
T1_0
V74_0_1_010_aa081200081259_aa081600081859_aa082100095659_aa095700101159_aa101300105059
H18121405_0
R1_0
Mab_2467211713110643835_W923650.80
T1_0
T2_0
V62_0_1_010_090500092459_100500101059_101100101659_140700140859_141100141359
H71811763_0
R1_0
Maa_5325411210120486554_W923650.80
Mab_5325411210110485554_W723650.80
T1_0
T2_0
T3_0
T4_0
Here is the desired output for the first individual record, house_id = 18105265. Based on df, Mab_3416311514210525745_W923650.80 is updated to Mab_3416311514210525745_W4567:
H18105265_0
R1_0
Mab_3416311514210525745_W4567
T1_0
T2_0
T3_0
V64_0_2_010_ab171900171959
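Best answer
Edit: added id to the lookup to distinguish between non-unique House_id values.
Here is one approach: I read in the data, join the updated weights from df, and then use the new weight to build the updated value on the rows that start with "M".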
library(tidyverse)

read_fwf("txt_sample.txt", col_positions = fwf_empty("txt_sample.txt")) %>% # edit suggested by DanG
  # if the row starts with H, extract 8 digit house number and
  # use that to join to the table with new weights
  mutate(House_id = if_else(str_starts(X1, "H"), as.numeric(str_sub(X1, 2, 9)), NA_real_),
         id = if_else(str_starts(X1, "M"), str_sub(X1, 1, 3), NA_character_)) %>%
  fill(House_id) %>%
  left_join(df, by = c("House_id", "id")) %>%
  fill(new_weight) %>%
  # make new string using updated weight (or keep existing string)
  mutate(X1_new = coalesce(
    if_else(str_starts(X1, "M"),
            paste0(word(X1, end = 2, sep = "_"), "_W", new_weight),
            NA_character_),
    X1)) %>%
  pull(X1_new) %>%
  writeLines()
Output
H18105265_0
R1_0
Mab_3416311514210525745_W4567
T1_0
T2_0
T3_0
V64_0_2_010_ab171900171959
H18117631_0
R1_0
Maa_1240111711220682016_W3367
T1_0
V74_0_1_010_aa081200081259_aa081600081859_aa082100095659_aa095700101159_aa101300105059
H18121405_0
R1_0
Mab_2467211713110643835_W4500
T1_0
T2_0
V62_0_1_010_090500092459_100500101059_101100101659_140700140859_141100141359
H71811763_0
R1_0
Maa_5325411210120486554_W2455
Mab_5325411210110485554_W2872
T1_0
T2_0
T3_0
T4_0
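Since the question asks specifically for readLines() and writeLines(), the same idea can also be sketched in base R without any packages. This is only an illustration, not part of the original answer: update_weights() is a hypothetical helper name, and the sketch assumes that every house header looks like H<digits>_..., that the member id is the first three characters of an "M" line, and that the weight is the trailing _W<number> field. Lines with no match in df are left unchanged.

```r
# Reference table of new weights (values copied from the question).
df <- data.frame(
  House_id   = c(18105265, 18117631, 18121405, 71811763, 71811763),
  id         = c("Mab", "Maa", "Mab", "Maa", "Mab"),
  new_weight = c(4567, 3367, 4500, 2455, 2872)
)

# Hypothetical helper: returns the lines with the weights replaced.
update_weights <- function(lines, df) {
  is_house  <- startsWith(lines, "H")
  is_member <- startsWith(lines, "M")

  # House id = the digits between the leading "H" and the first underscore.
  header_ids <- sub("^H([0-9]+)_.*$", "\\1", lines[is_house])

  # Carry each header's id down to the lines that follow it.
  block <- cumsum(is_house)                 # 0 before the first header
  house <- rep(NA_character_, length(lines))
  house[block > 0] <- header_ids[block[block > 0]]
  house <- as.numeric(house)

  # Look up the new weight by house id plus 3-letter member id.
  idx <- match(paste(house, substr(lines, 1, 3)),
               paste(df$House_id, df$id))
  hit <- is_member & !is.na(idx)

  # Swap only the trailing "_W<number>" field; the rest of the line is kept.
  lines[hit] <- paste0(sub("_W[^_]*$", "", lines[hit]),
                       "_W", df$new_weight[idx[hit]])
  lines
}

# Small in-memory sample taken from the question's file.
sample_lines <- c(
  "H18105265_0", "R1_0", "Mab_3416311514210525745_W923650.80", "T1_0",
  "H71811763_0", "R1_0",
  "Maa_5325411210120486554_W923650.80",
  "Mab_5325411210110485554_W723650.80", "T1_0"
)
updated <- update_weights(sample_lines, df)
writeLines(updated)
```

For a real file, the call would be along the lines of writeLines(update_weights(readLines("txt_sample.txt"), df), "txt_updated.txt"), where the output file name is made up for this example. Because the lines are only ever edited in place, the number and order of lines in the output match the input.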
Regarding "r - load a txt file into R and replace some values based on another data frame", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/69243577/