r - 多重响应分析

标签 r spss survey

df1 <-
  data.frame(c("male", "female", "male"),
             c("1", "2", "3", "4", "5", "6"),
             seq(141, 170))

names(df1) = c("gender", "age", "height")

df1$age <- factor(
  df1$age,
  levels = c(1, 2, 3, 4, 5, 6),
  labels = c("16-24", "25-34", "35-44", "45-54", "55-64", "65+")
)

q1a = c(1, 0, 1, 0, 0, 1)
q1b = c(0, 0, 2, 2, 2, 0)
q1c = c(0, 0, 3, 3, 0, 3)
# 1,2 and 3 used to be compatible with existing datasets. 
# Could change all to 1 if necessary.

df2 <- data.frame(q1a = q1a, q1b = q1b, q1c = q1c)
df1 <- cbind(df1, df2)

rm(q1a, q1b, q1c, df2)

我希望在 R 中复制 SPSS 的多重响应问题的分析。

目前我正在使用此代码:

#creating function for analysing questions with grouped data
multfreqtable <- function(a, b, c) {
  # number of respondents (for percent of cases)
  totrep = sum(a == 1 | b == 2 | c == 3)
  
  #creating frequency table
  table_a = data.frame("a", sum(a == 1))
  names(table_a) = c("question", "freq")
  table_b = data.frame("b", sum(b == 2))
  names(table_b) = c("question", "freq")
  table_c = data.frame("c", sum(c == 3))
  names(table_c) = c("question", "freq")
  table_question <- rbind(table_a, table_b, table_c)
  
  #remove individual question tables
  rm(table_a, table_b, table_c)
  
  #adding total
  total = as.data.frame("Total")
  totalsum = (sum(table_question$freq, na.rm = TRUE))
  totalrow = cbind(total, totalsum)
  names(totalrow) = c("question", "freq")
  table_question = rbind(table_question, totalrow)
  
  #adding percentage column to frequency table
  percentcalc = as.numeric(table_question$freq)
  percent = (percentcalc / totalsum) * 100
  table_question <- cbind(table_question, percent)
  
  #adding percent of cases column to frequency table
  poccalc = as.numeric(table_question$freq)
  percentofcases = (poccalc / totrep) * 100
  table_question <- cbind(table_question, percentofcases)
  
  #print percent of cases value
  total_respondents <<- data.frame(totrep)
  
  #remove all unnecessary data and values
  rm(
    total,
    totalsum,
    totalrow,
    b,
    c,
    percent,
    percentcalc,
    percentofcases,
    totrep,
    poccalc
  )
  
  return(table_question)
}

#calling function - must tie to data.frame using $ !!!
q1_frequency <- multfreqtable(df1$q1a, df1$q1b, df1$q1c)

#renaming percent of cases - This is very important while using current method
total_respondents_q1 <- total_respondents
rm(total_respondents)

生成此表的结果:

Output table

我正在寻找一种更有效的方法来执行此操作,如果存在或多或少的多项选择问题,理想情况下不需要编辑函数。

最佳答案

您的函数实际上对于您需要做的事情来说太复杂了。我认为这样的功能应该可以工作并且更加灵活。

multfreqtable = function(data, question.prefix) {
  # Find the columns with the questions
  a = grep(question.prefix, names(data))
  # Find the total number of responses
  b = sum(data[, a] != 0)
  # Find the totals for each question
  d = colSums(data[, a] != 0)
  # Find the number of respondents
  e = sum(rowSums(data[,a]) !=0)
  # d + b as a vector. This is your overfall frequency 
  f = as.numeric(c(d, b))
  data.frame(question = c(names(d), "Total"),
             freq = f,
             percent = (f/b)*100,
             percentofcases = (f/e)*100 )
}

向示例数据集添加另一个问题:

set.seed(1); df1$q2a = sample(c(0, 1), 30, replace=T)
set.seed(2); df1$q2b = sample(c(0, 2), 30, replace=T)
set.seed(3); df1$q2c = sample(c(0, 3), 30, replace=T)

为“q1”响应制作一个表格:

> multfreqtable(df1, "q1")
  question freq   percent percentofcases
1      q1a   15  33.33333             60
2      q1b   15  33.33333             60
3      q1c   15  33.33333             60
4    Total   45 100.00000            180

为“q2”响应制作一个表格:

> multfreqtable(df1, "q2")
  question freq   percent percentofcases
1      q2a   14  31.11111       53.84615
2      q2b   13  28.88889       50.00000
3      q2c   18  40.00000       69.23077
4    Total   45 100.00000      173.07692

多个问题的表格

以下是该函数的修改版本,允许您同时为多个问题创建表格列表:

multfreqtable = function(data, question.prefix) {
  z = length(question.prefix)
  temp = vector("list", z)

  for (i in 1:z) {
    a = grep(question.prefix[i], names(data))
    b = sum(data[, a] != 0)
    d = colSums(data[, a] != 0)
    e = sum(rowSums(data[,a]) !=0)
    f = as.numeric(c(d, b))
    temp[[i]] = data.frame(question = c(sub(question.prefix[i], 
                                            "", names(d)), "Total"),
                           freq = f,
                           percent = (f/b)*100,
                           percentofcases = (f/e)*100 )
    names(temp)[i] = question.prefix[i]
  }
  temp
}

示例:

> multfreqtable(df1, "q1")
$q1
  question freq   percent percentofcases
1        a   15  33.33333             60
2        b   15  33.33333             60
3        c   15  33.33333             60
4    Total   45 100.00000            180

> test1 = multfreqtable(df1, c("q1", "q2"))
> test1
$q1
  question freq   percent percentofcases
1        a   15  33.33333             60
2        b   15  33.33333             60
3        c   15  33.33333             60
4    Total   45 100.00000            180

$q2
  question freq   percent percentofcases
1        a   14  31.11111       53.84615
2        b   13  28.88889       50.00000
3        c   18  40.00000       69.23077
4    Total   45 100.00000      173.07692

> test1$q1
  question freq   percent percentofcases
1        a   15  33.33333             60
2        b   15  33.33333             60
3        c   15  33.33333             60
4    Total   45 100.00000            180

关于r - 多重响应分析,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/9265003/

相关文章:

r - R 是否有像 Perl 的 qw() 这样的类似引号的操作符?

r - 预测与实际情节

variables - SPSS语法将所有日期变量转换为字符串

statistics - 如何遍历 SPSS 中的变量?我想避免代码重复

r - 基于两个总体基准分布的分层抽样

r - 按行名聚合大矩阵中的行

r - 计算 3D (NED) 地理坐标(纬度、经度、深度)之间的距离

python - 如何使用Python修改SPSS输出文件?

r - 是否可以在 Windows 上获得 R 调查包的 `svyby` 函数 multicore= 参数?

python - 如何匹配某个人在不同时间段内的调查响应以形成面板数据集?