r - Plot points from one df, and error bars from another df

Tags: r ggplot2 dplyr

The raw data looks like this:

Restaurant     Question               rating

McDonalds      How was the food?      5       
McDonalds      How were the drinks?   3     
McDonalds      How were the workers?  2     
Burger_King    How was the food?      1       
Burger_King    How were the drinks?   3       
Burger_King    How were the workers?  4      

The averages look like this:

Question              average_rating    error
How was the food?     3.13              0.7
How were the drinks?  2.37              0.56

How can I plot points from the raw data (x = question, y = rating, fill = restaurant) and then draw error bars on top of them (ymin/ymax = average_rating ± error)?

tribbles for convenience:

tribble(
  ~restaurant, ~question,  ~rating,
  "McDonalds", "How was the food?", 5,
  "McDonalds", "How were the drinks?", 3,
  "McDonalds", "How were the drinks?", 2,
  "BurgerKing", "How was the food?", 1,
  "BurgerKing", "How were the drinks?", 3,
  "BurgerKing", "How were the drinks?", 4
)
tribble(
  ~question, ~average_rating, ~error,
  "How was the food?", 3.13, 0.7,
  "How were the drinks?", 2.37, 0.56
)

Best answer

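Your desired output does not quite match your current data frames, because your second data frame contains the average rating per restaurant rather than per question (as pointed out by @StupidWolf). So either you want to plot the restaurants on the x-axis, which is easy to do, or you need to merge the two data frames and add Average_rating as a discrete value of the variable question.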
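The first option is not spelled out in this answer, so here is a minimal sketch of what it could look like. It assumes `df1` is the raw-data tribble from the question and that `df2` holds one mean and one error per restaurant; the `df2` values below are taken from the two Average_rating rows of the joined output shown in this answer, and the names `df2` and `p_restaurant` are only illustrative.

```r
library(tibble)
library(ggplot2)

# Raw ratings, copied from the tribble in the question
df1 <- tribble(
  ~restaurant, ~question, ~rating,
  "McDonalds", "How was the food?", 5,
  "McDonalds", "How were the drinks?", 3,
  "McDonalds", "How were the drinks?", 2,
  "BurgerKing", "How was the food?", 1,
  "BurgerKing", "How were the drinks?", 3,
  "BurgerKing", "How were the drinks?", 4
)

# ASSUMPTION: per-restaurant means and errors (layout inferred, not given in the question)
df2 <- tribble(
  ~restaurant, ~average_rating, ~error,
  "McDonalds", 3.13, 0.7,
  "BurgerKing", 2.37, 0.56
)

# Points come from df1, error bars from df2; both layers share the restaurant x-axis
p_restaurant <- ggplot(df1, aes(x = restaurant, y = rating)) +
  geom_point(aes(color = question),
             position = position_jitter(width = 0.1, height = 0)) +
  geom_errorbar(
    data = df2,
    aes(x = restaurant,
        ymin = average_rating - error,
        ymax = average_rating + error),
    inherit.aes = FALSE,  # df2 has no rating column, so do not inherit y
    width = 0.1
  )

p_restaurant
```

Setting `inherit.aes = FALSE` on the error-bar layer keeps it from looking for a `rating` column that `df2` does not have.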

Here is what I did for the second option:

library(dplyr)
df2 %>% mutate(question = "Average_rating") %>%
  rename(rating = average_rating) %>% full_join(df1,.) %>%
  mutate(restaurant = sub("BurgerKing","Burger_King",restaurant)) 
Joining, by = c("restaurant", "question", "rating")
# A tibble: 8 x 4
  restaurant  question             rating error
  <chr>       <chr>                 <dbl> <dbl>
1 McDonalds   How was the food?      5    NA   
2 McDonalds   How were the drinks?   3    NA   
3 McDonalds   How were the drinks?   2    NA   
4 Burger_King How was the food?      1    NA   
5 Burger_King How were the drinks?   3    NA   
6 Burger_King How were the drinks?   4    NA   
7 McDonalds   Average_rating         3.13  0.7 
8 Burger_King Average_rating         2.37  0.56

Then, to plot it, you can do the following:

library(ggplot2)
library(dplyr)
df2 %>% mutate(question = "Average_rating") %>%
  rename(rating = average_rating) %>% full_join(df1,.) %>%
  mutate(restaurant = sub("BurgerKing","Burger_King",restaurant)) %>%
  ggplot(aes(x = question, y= rating, color = restaurant))+
  geom_point(position = position_dodge(0.9))+
  geom_errorbar(aes(ymin = rating-error, ymax = rating+error), width = 0.1, position = position_dodge(0.9))
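For anyone who wants to run the pipeline above from scratch, here is a self-contained sketch. It assumes `df1` is the raw-data tribble from the question and that `df2` holds per-restaurant means (values read off the joined output above). Two details are my own additions rather than part of the answer: the join keys are passed explicitly via `by`, and the error-bar layer receives only the rows that have an `error` value, since the raw-rating rows carry `NA` there and ggplot2 typically drops such rows with a warning.

```r
library(tibble)
library(dplyr)
library(ggplot2)

# Raw ratings, copied from the tribble in the question
df1 <- tribble(
  ~restaurant, ~question, ~rating,
  "McDonalds", "How was the food?", 5,
  "McDonalds", "How were the drinks?", 3,
  "McDonalds", "How were the drinks?", 2,
  "BurgerKing", "How was the food?", 1,
  "BurgerKing", "How were the drinks?", 3,
  "BurgerKing", "How were the drinks?", 4
)

# ASSUMPTION: per-restaurant means and errors, inferred from the joined output
df2 <- tribble(
  ~restaurant, ~average_rating, ~error,
  "McDonalds", 3.13, 0.7,
  "BurgerKing", 2.37, 0.56
)

# Turn the means into extra rows with question = "Average_rating"
combined <- df2 %>%
  mutate(question = "Average_rating") %>%
  rename(rating = average_rating) %>%
  full_join(df1, ., by = c("restaurant", "question", "rating")) %>%
  mutate(restaurant = sub("BurgerKing", "Burger_King", restaurant))

p_combined <- ggplot(combined, aes(x = question, y = rating, color = restaurant)) +
  geom_point(position = position_dodge(0.9)) +
  geom_errorbar(
    # Only the Average_rating rows have an error value
    data = function(d) filter(d, !is.na(error)),
    aes(ymin = rating - error, ymax = rating + error),
    width = 0.1,
    position = position_dodge(0.9)
  )

p_combined
```

Passing a function to `data` lets the layer filter the plot's own data, so the mean rows do not have to be stored in a separate object.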


EDIT: Plotting the mean and error per question

With the new data frame containing the average rating per question, you can use geom_pointrange as follows:

ggplot(df1, aes(x = question, y = rating, color = restaurant))+
  geom_jitter(width = 0.2)+
  geom_pointrange(inherit.aes = FALSE,
                  data = df3, 
                  aes(x = question, 
                      y = average_rating,
                      ymin = average_rating-error,
                      ymax = average_rating+error))  
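The snippet above relies on `df1` and `df3` already existing. As a minimal runnable sketch, I assume here that `df1` is the first tribble from the question and `df3` is the second one (the per-question means); the names `df3` and `p_question` are not defined anywhere in the question itself.

```r
library(tibble)
library(ggplot2)

# Raw ratings, copied from the first tribble in the question
df1 <- tribble(
  ~restaurant, ~question, ~rating,
  "McDonalds", "How was the food?", 5,
  "McDonalds", "How were the drinks?", 3,
  "McDonalds", "How were the drinks?", 2,
  "BurgerKing", "How was the food?", 1,
  "BurgerKing", "How were the drinks?", 3,
  "BurgerKing", "How were the drinks?", 4
)

# ASSUMPTION: df3 is the second tribble from the question (means per question)
df3 <- tribble(
  ~question, ~average_rating, ~error,
  "How was the food?", 3.13, 0.7,
  "How were the drinks?", 2.37, 0.56
)

# Jittered raw points from df1, mean +/- error per question from df3
p_question <- ggplot(df1, aes(x = question, y = rating, color = restaurant)) +
  geom_jitter(width = 0.2) +
  geom_pointrange(
    inherit.aes = FALSE,  # df3 has no rating or restaurant column
    data = df3,
    aes(x = question,
        y = average_rating,
        ymin = average_rating - error,
        ymax = average_rating + error)
  )

p_question
```

If bars with end caps are preferred over point ranges, `geom_errorbar()` accepts the same `x`, `ymin` and `ymax` mappings.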


Does it answer your question?

Regarding r - Plot points from one df, and error bars from another df, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/60233796/
