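r - Calculating entropy

Tags: r frequency entropy

I have been trying to calculate entropy for several hours now, and I know I am missing something. Hopefully someone here can give me an idea!

Edit: I think my formula is wrong!

Code: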

info <- function(CLASS.FREQ){
  freq.class <- CLASS.FREQ
  info <- 0
  for(i in 1:length(freq.class)){
    if(freq.class[[i]] != 0){ # zero check in class
      entropy <- -sum(freq.class[[i]] * log2(freq.class[[i]]))  # I calculate the entropy for each class i here
    }else{
      entropy <- 0
    }
    info <- info + entropy # sum up entropy from all classes
  }
  return(info)
}

I hope my post is clear, since this is my first time posting here.

This is my dataset:

buys <- c("no", "no", "yes", "yes", "yes", "no", "yes", "no", "yes", "yes", "yes", "yes", "yes", "no")

credit <- c("fair", "excellent", "fair", "fair", "fair", "excellent", "excellent", "fair", "fair", "fair", "excellent", "excellent", "fair", "excellent")

student <- c("no", "no", "no","no", "yes", "yes", "yes", "no", "yes", "yes", "yes", "no", "yes", "no")

income <- c("high", "high", "high", "medium", "low", "low", "low", "medium", "low", "medium", "medium", "medium", "high", "medium")

age <- c(25, 27, 35, 41, 48, 42, 36, 29, 26, 45, 23, 33, 37, 44) # we change the age from categorical to numeric

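Best answer

As far as I can tell, your code is correct: it runs without error and implements the entropy formula. I think the part you are missing is the calculation of the class frequencies; once you pass those in, you get your answer. Looking over the objects you provided, I suspect you want the entropy of buys: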

buys <- c("no", "no", "yes", "yes", "yes", "no", "yes", "no", "yes", "yes", "yes", "yes", "yes", "no")
freqs <- table(buys)/length(buys)
info(freqs)
[1] 0.940286
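
To make the frequency step explicit, here is a small sketch that writes out the counts behind the result above. buys contains 5 "no" and 9 "yes" values out of 14, so the entropy can also be computed by hand. The names counts and h are only for illustration and do not appear in the original post.

```r
buys <- c("no", "no", "yes", "yes", "yes", "no", "yes", "no", "yes", "yes", "yes", "yes", "yes", "no")

counts <- table(buys)        # class counts: no = 5, yes = 9
freqs  <- prop.table(counts) # relative frequencies, same as table(buys)/length(buys)

# Entropy written out term by term for the two classes
h <- -(5/14) * log2(5/14) - (9/14) * log2(9/14)
h  # about 0.940286, matching info(freqs)
```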

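As a matter of improving your code, you can simplify it dramatically: if you are given a vector of class frequencies, you do not need a loop.

For example: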

# calculate shannon-entropy
-sum(freqs * log2(freqs))
[1] 0.940286
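
One caveat with the one-liner: in R, 0 * log2(0) evaluates to NaN, so it loses the zero check that the original loop had. A minimal sketch of a vectorized version that keeps that check is below; the function name shannon_entropy is hypothetical and not part of the original post or any package.

```r
# Hypothetical helper: vectorized Shannon entropy (base 2) with a zero check.
shannon_entropy <- function(freqs) {
  p <- as.numeric(freqs)
  p <- p[p > 0]        # drop empty classes; their contribution is taken to be 0
  -sum(p * log2(p))
}

shannon_entropy(c(5/14, 9/14))   # about 0.940286
shannon_entropy(c(0.5, 0.5, 0))  # 1, whereas the bare one-liner would give NaN
```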

As an aside, the function entropy.empirical in the entropy package lets you set the unit to log2, which gives you some more flexibility. Example:

library(entropy)
entropy.empirical(freqs, unit="log2")
[1] 0.940286

A similar question on r - Calculating entropy can be found on Stack Overflow: https://stackoverflow.com/questions/27254550/
