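I have the following data frame; you can get the data in CSV format here: http://www.sharecsv.com/s/c4d94fd3a7dc43e3ec249bff373a0082/data.csv
My column names are shown below. As you can see, the names are not consecutive (for example, Q1 is missing, there is no Q3 or Q4, and so on). I need to keep them that way. Note that some columns carry an extra, irrelevant ".1" suffix.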
[1] "Q2_1" "Q2_2" "Q5_1" "Q5_2" "Q6_1" "Q6_2" "Q9_1" "Q9_2" "Q11_1" "Q11_2" "Q8_1" "Q8_2" "Q14_1"
[14] "Q14_2" "Q16_1" "Q16_2" "Q10_1" "Q10_2" "Q11_1.1" "Q11_2.1" "Q19_1" "Q19_2" "Q20_1" "Q20_2" "Q21_1" "Q21_2"
[27] "Q15_1" "Q15_2" "Q23_1" "Q23_2" "Q24_1" "Q24_2" "Q25_1" "Q25_2" "Q26_1" "Q26_2" "Q20_1.1" "Q20_2.1" "Q21_1.1"
[40] "Q21_2.1" "Q29_1" "Q29_2" "Q30_1" "Q30_2" "Q35_1" "Q35_2" "Q36_1" "Q36_2" "Q37_1" "Q37_2" "Q38_1" "Q38_2"
[53] "Q39_1" "Q39_2" "Q41_1" "Q41_2" "Q30_1.1" "Q30_2.1" "Q43_1" "Q43_2" "Q44_1" "Q44_2" "Q45_1" "Q45_2" "Q47_1"
[66] "Q47_2" "Q48_1" "Q48_2" "Q36_1.1" "Q36_2.1" "Q37_1.1" "Q37_2.1" "Q51_1" "Q51_2" "Q52_1" "Q52_2" "Q53_1" "Q53_2"
[79] "Q41_1.1" "Q41_2.1" "Q42_1" "Q42_2" "Q56_1" "Q56_2" "Q57_1" "Q57_2" "Q58_1" "Q58_2" "Q59_1" "Q59_2" "Q60_1"
[92] "Q60_2" "Q61_1" "Q61_2" "Q62_1" "Q62_2" "Q63_1" "Q63_2" "Q64_1" "Q64_2" "Q65_1" "Q65_2" "Q53_1.1" "Q53_2.1"
[105] "Q54_1" "Q54_2" "Q68_1" "Q68_2" "Q75_1" "Q75_2" "Q57_1.1" "Q57_2.1" "Q58_1.1" "Q58_2.1" "Q59_1.1" "Q59_2.1" "Q60_1.1"
[118] "Q60_2.1" "Q61_1.1" "Q61_2.1" "Q81_1" "Q81_2" "Q82_1" "Q82_2" "Q83_1" "Q83_2" "Q87_1" "Q87_2" "Q88_1" "Q88_2"
[131] "Q89_1" "Q89_2" "Q90_1" "Q90_2" "Q91_1" "Q91_2" "Q92_1" "Q92_2" "Q93_1" "Q93_2" "Q94_1" "Q94_2" "Q95_1"
[144] "Q95_2" "Q74_1" "Q74_2" "Q75_1.1" "Q75_2.1" "Q76_1" "Q76_2" "Q77_1" "Q77_2" "Q100_1" "Q100_2" "Q101_1" "Q101_2"
[157] "Q102_1" "Q102_2" "Q103_1" "Q103_2" "Q104_1" "Q104_2" "Q105_1" "Q105_2" "Q106_1" "Q106_2" "Q107_1" "Q107_2" "Q108_1"
[170] "Q108_2" "Q113_1" "Q113_2" "Q114_1" "Q114_2" "Q117_1" "Q117_2" "Q96_1" "Q96_2" "Q97_1" "Q97_2" "Q98_1" "Q98_2"
[183] "Q121_1" "Q121_2" "Q103_1.1" "Q103_2.1" "Q104_1.1" "Q104_2.1" "Q127_1" "Q127_2" "Q128_1" "Q128_2" "Q129_1" "Q129_2"
Problem: I need to convert this from WIDE to LONG to get something like the following:
QUESTION CASE VALUE
Q2 1 5
Q2 2 5
Q5 1 1
Q5 2 2
I tried to reshape it as follows, but I keep getting different errors, and I am also not sure I am splitting the names correctly:
test <- reshape(data, sep = "_", times = c(1, 2), direction = "long", varying = colnames(data))
Best answer
You can use pivot_longer from tidyr, splitting the column names on "_":
library(tidyr)
pivot_longer(data, cols=everything(), names_sep="_",
names_to=c("Question","Case"))
# A tibble: 194 x 3
Question Case value
<chr> <chr> <int>
1 Q2 1 5
2 Q2 2 5
3 Q5 1 1
4 Q5 2 2
5 Q6 1 4
6 Q6 2 4
7 Q9 1 4
8 Q9 2 4
9 Q11 1 5
10 Q11 2 3
# ... with 184 more rows
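Because everything after the "_" ends up in Case, the columns with the extra ".1" produce labels such as "1.1" rather than "1". The following is a minimal sketch on a small, hypothetical data frame (`toy` and its values are made up for illustration, not taken from the CSV) showing that behaviour and one possible way to strip the suffix afterwards with base R's `sub`:

```r
library(tidyr)

# Hypothetical one-row data frame that mimics the naming scheme in the question,
# including two columns with the extra ".1" suffix.
toy <- data.frame(Q2_1 = 5L, Q2_2 = 5L,
                  Q11_1 = 5L, Q11_2 = 3L,
                  Q11_1.1 = 4L, Q11_2.1 = 2L)

long <- pivot_longer(toy, cols = everything(),
                     names_sep = "_",
                     names_to = c("Question", "Case"))

# Inspect the labels: the suffixed columns keep their ".1" inside Case.
unique(long$Case)

# Assumption: the ".1" really is irrelevant, so it can simply be removed.
long$Case <- sub("\\.1$", "", long$Case)
```

Whether the suffix should be dropped or kept as a marker for the repeated question is a judgment call that depends on what the duplicated columns mean in the survey.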
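Trying to use the reshape function fails because the variable names in the wide format are inconsistent. For example, as you mentioned, some names have an additional ".1", which you say is irrelevant. The tidyr package agrees with you, in the sense that it simply collects everything after the separator and puts whatever it finds into the "Case" variable (the second item in the names_to argument). The reshape function is stricter. As far as reshape is concerned, the additional ".1" is not irrelevant: the function tries to guess the values encoded in the names (the part after the "_"), finds that the resulting groups of columns have unequal lengths, and fails with the error 'varying' arguments must be the same length.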
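The contrast can be reproduced on a small scale. This is a sketch under the assumption of a hypothetical data frame `toy` (not the original data): with the suffixed columns present, Q11 has four columns while Q2 has two, so reshape stops with an error. With only consistently named columns it runs. Dropping the ".1" columns here is purely to show the contrast, not a recommendation to discard data.

```r
# Hypothetical data: Q2 has two columns, Q11 has four (two carry ".1").
toy <- data.frame(Q2_1 = 5L, Q2_2 = 5L,
                  Q11_1 = 5L, Q11_2 = 3L,
                  Q11_1.1 = 4L, Q11_2.1 = 2L)

# Capture the error message instead of stopping the script.
res <- tryCatch(
  reshape(toy, direction = "long", varying = names(toy), sep = "_"),
  error = function(e) conditionMessage(e)
)
print(res)

# Keep only the consistently named columns (illustration only).
consistent <- toy[, !grepl("\\.1$", names(toy))]
ok <- reshape(consistent, direction = "long",
              varying = names(consistent), sep = "_")
print(ok)
```

Note that even when reshape succeeds, it returns one column per question (Q2, Q11) plus a time column, which is a different layout from the Question/Case/value shape requested in the question.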
An alternative using sep="" is given below...
Regarding "r - wide to long with inconsistent column names", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/60923171/