The raw data look like this:
Restaurant    Question                 rating
McDonalds     How was the food?        5
McDonalds     How were the drinks?     3
McDonalds     How were the workers?    2
Burger_King   How was the food?        1
Burger_King   How were the drinks?     3
Burger_King   How were the workers?    4
The averages are:
Question                average_rating   error
How was the food?       3.13             0.7
How were the drinks?    2.37             0.56
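How can I plot points from the raw data (x = question, y = rating, fill = restaurant) and then draw error bars on top of them (ymin/ymax = average rating ± error)?
For convenience, here are the data as tribble() calls: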
library(tibble)

# Raw ratings; the answer refers to this data frame as df1.
df1 <- tribble(
  ~restaurant,  ~question,              ~rating,
  "McDonalds",  "How was the food?",    5,
  "McDonalds",  "How were the drinks?", 3,
  "McDonalds",  "How were the drinks?", 2,
  "BurgerKing", "How was the food?",    1,
  "BurgerKing", "How were the drinks?", 3,
  "BurgerKing", "How were the drinks?", 4
)
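# Per-question averages; the answer's final snippet refers to a data frame
# with these columns as df3.
df3 <- tribble(
  ~question,              ~average_rating, ~error,
  "How was the food?",    3.13,            0.7,
  "How were the drinks?", 2.37,            0.56
)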
Best answer
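The output you want does not quite match your current data frames, because your second data frame contains the average rating per restaurant, not per question (as @StupidWolf pointed out). So either you plot restaurant on the x axis, which is easy to do, or you merge the two data frames and make Average_rating one of the discrete values of the variable question.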
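The first option (restaurant on the x axis) might look roughly like the sketch below. It assumes a per-restaurant summary with the columns restaurant, average_rating and error; that shape is inferred from the joined output shown in this answer, not taken from the question's code, and the name df2_by_restaurant is made up for illustration.

```r
library(tibble)
library(ggplot2)

# Raw ratings, copied from the question.
df1 <- tribble(
  ~restaurant,  ~question,              ~rating,
  "McDonalds",  "How was the food?",    5,
  "McDonalds",  "How were the drinks?", 3,
  "McDonalds",  "How were the drinks?", 2,
  "BurgerKing", "How was the food?",    1,
  "BurgerKing", "How were the drinks?", 3,
  "BurgerKing", "How were the drinks?", 4
)

# ASSUMPTION: one summary row per restaurant (hypothetical name and layout,
# values taken from the joined output in this answer).
df2_by_restaurant <- tribble(
  ~restaurant,  ~average_rating, ~error,
  "McDonalds",  3.13,            0.7,
  "BurgerKing", 2.37,            0.56
)

p_restaurant <- ggplot(df1, aes(x = restaurant, y = rating)) +
  # Raw points, jittered horizontally only so the ratings stay exact.
  geom_point(aes(color = question),
             position = position_jitter(width = 0.1, height = 0)) +
  # Error bars come from the summary frame, so they must not inherit y = rating.
  geom_errorbar(
    data = df2_by_restaurant,
    aes(x = restaurant,
        ymin = average_rating - error,
        ymax = average_rating + error),
    inherit.aes = FALSE,
    width = 0.1
  )

p_restaurant
```

Because the restaurant names are spelled the same way in both data frames here, the bars line up with the points without any renaming.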
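For the second option, I did the following:
library(dplyr)

# Turn the averages into extra rows: label them "Average_rating", rename the
# value column to match df1, join, then harmonise the restaurant names.
df2 %>%
  mutate(question = "Average_rating") %>%
  rename(rating = average_rating) %>%
  full_join(df1, .) %>%
  mutate(restaurant = sub("BurgerKing", "Burger_King", restaurant))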
Joining, by = c("restaurant", "question", "rating")
# A tibble: 8 x 4
  restaurant  question             rating error
  <chr>       <chr>                 <dbl> <dbl>
1 McDonalds   How was the food?      5    NA
2 McDonalds   How were the drinks?   3    NA
3 McDonalds   How were the drinks?   2    NA
4 Burger_King How was the food?      1    NA
5 Burger_King How were the drinks?   3    NA
6 Burger_King How were the drinks?   4    NA
7 McDonalds   Average_rating         3.13  0.7
8 Burger_King Average_rating         2.37  0.56
Then, if you want to add the plot, you can do:
library(ggplot2)
library(dplyr)

df2 %>%
  mutate(question = "Average_rating") %>%
  rename(rating = average_rating) %>%
  full_join(df1, .) %>%
  mutate(restaurant = sub("BurgerKing", "Burger_King", restaurant)) %>%
  ggplot(aes(x = question, y = rating, color = restaurant)) +
  # Dodge points and bars by the same amount so they stay aligned.
  geom_point(position = position_dodge(0.9)) +
  geom_errorbar(aes(ymin = rating - error, ymax = rating + error),
                width = 0.1, position = position_dodge(0.9))
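In the joined data only the Average_rating rows have a non-missing error, so geom_errorbar can draw bars only for those rows and ggplot2 warns about the others. One way to make that explicit is to give the error-bar layer just the rows that have an error. The sketch below types the joined data in from the output printed above (the name joined is made up for illustration) and passes a function as the layer's data argument, which ggplot2 calls with the plot data.

```r
library(tibble)
library(ggplot2)

# The joined data, typed in from the output printed above
# (hypothetical object name).
joined <- tribble(
  ~restaurant,   ~question,              ~rating, ~error,
  "McDonalds",   "How was the food?",    5,       NA_real_,
  "McDonalds",   "How were the drinks?", 3,       NA_real_,
  "McDonalds",   "How were the drinks?", 2,       NA_real_,
  "Burger_King", "How was the food?",    1,       NA_real_,
  "Burger_King", "How were the drinks?", 3,       NA_real_,
  "Burger_King", "How were the drinks?", 4,       NA_real_,
  "McDonalds",   "Average_rating",       3.13,    0.7,
  "Burger_King", "Average_rating",       2.37,    0.56
)

p_joined <- ggplot(joined, aes(x = question, y = rating, color = restaurant)) +
  geom_point(position = position_dodge(0.9)) +
  geom_errorbar(
    # Keep only the rows that carry an error value for this layer.
    data = function(d) d[!is.na(d$error), ],
    aes(ymin = rating - error, ymax = rating + error),
    width = 0.1,
    position = position_dodge(0.9)
  )

p_joined
```

The picture should be the same as with the unfiltered layer; the difference is only that the rows without an error are dropped deliberately rather than by a warning.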
Edit: plotting the mean and error for each question
With the new data frame containing the average rating per question, you can use geom_pointrange like this:
ggplot(df1, aes(x = question, y = rating, color = restaurant))+
geom_jitter(width = 0.2)+
geom_pointrange(inherit.aes = FALSE,
data = df3,
aes(x = question,
y = average_rating,
ymin = average_rating-error,
ymax = average_rating+error))
Does this answer your question?
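The question asked for error bars specifically, and geom_pointrange draws a plain vertical line without end caps. If capped bars are preferred, a variant along the following lines may work. It is only a sketch: it is self-contained, defines df1 and df3 from the question's data, and the object name p_question plus the shape and size values are arbitrary choices.

```r
library(tibble)
library(ggplot2)

# Raw ratings, copied from the question.
df1 <- tribble(
  ~restaurant,  ~question,              ~rating,
  "McDonalds",  "How was the food?",    5,
  "McDonalds",  "How were the drinks?", 3,
  "McDonalds",  "How were the drinks?", 2,
  "BurgerKing", "How was the food?",    1,
  "BurgerKing", "How were the drinks?", 3,
  "BurgerKing", "How were the drinks?", 4
)

# Per-question averages and errors, copied from the question.
df3 <- tribble(
  ~question,              ~average_rating, ~error,
  "How was the food?",    3.13,            0.7,
  "How were the drinks?", 2.37,            0.56
)

p_question <- ggplot(df1, aes(x = question, y = rating)) +
  # Raw points, coloured by restaurant and jittered horizontally only.
  geom_jitter(aes(color = restaurant), width = 0.2, height = 0) +
  # Capped error bars from the summary frame; inherit.aes = FALSE because
  # df3 has no rating column.
  geom_errorbar(
    data = df3,
    aes(x = question,
        ymin = average_rating - error,
        ymax = average_rating + error),
    inherit.aes = FALSE,
    width = 0.15
  ) +
  # Mark the mean itself with a diamond.
  geom_point(
    data = df3,
    aes(x = question, y = average_rating),
    inherit.aes = FALSE,
    shape = 18,
    size = 3
  )

p_question
```

Both layers built from df3 set inherit.aes = FALSE for the same reason as the geom_pointrange call: the plot-level y = rating mapping does not exist in the summary data.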
For "r - Plot points from one df and error bars from another df", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/60233796/