r - R中的for循环回归分析

标签 r for-loop regression linear-regression

我有一个鱼类丰度数据的数据集,我想对其执行回归分析。 但是,我想对数据的不同子集执行大量回归,而不必手动执行此操作,并将 coefs 和 P 值保存在新的数据框中。

数据结构如下(示例):

site year species abund
a    2011  a      3
a    2016  b      5
b    2011  a      4
b    2015  a      9
a    2018  b      1
c    2010  a      2
b    2016  c      3
c    2012  a      1

我总共有 883 行、21 个独特地点、41 个独特物种和 8 个不同年份。

我想要每个物种-位点组合的回归模型。 (每个组合至少有 5 个观察值)模型如下:

lm(abund ~ year)  

每个地点的每个物种都有一个模型。因此,一种模型适用于 a 地点的物种 a,一种适用于 a 地点的物种 b,一种适用于 b 地点的物种 a 等。

堆栈上有几个关于此的主题,但似乎不适合我的需要。 我的想法是使用 for 循环,但我无法让它正常工作。一整天都在调整,但无法让它发挥作用。

slopes <- numeric(nrow(df))

for (i in 1:nrow(df)) {
  y <- as.numeric(df[i,4]) # row 4 is the abundancy data
  x <- df([i, 1]) # row 1 is the year data
  slopes[i] <- coef(lm(y ~ x))[2]
}

我的问题:如何对所有独特的位点物种组合进行线性回归模型并将 coefs 和 P 值存储在新的数据框中?最好使用我失败的尝试的改进版本。

提前致谢

50 个随机行样本的输出:

df <- structure(list(year = c(2015, 2012, 2017, 2014, 2018, 2018, 2012, 
    2018, 2013, 2013, 2012, 2018, 2012, 2018, 2016, 2013, 2018, 2019, 
    2012, 2019, 2017, 2014, 2013, 2014, 2018, 2016, 2013, 2019, 2019, 
    2018, 2019, 2014, 2012, 2018, 2017, 2016, 2017, 2015, 2017, 2019, 
    2012, 2016, 2019, 2019, 2018, 2014, 2012, 2015, 2012, 2012), 
        species = c("Aal", "Brasem", "Kolblei", "Dunlipharder", "Snoekbaars", 
        "Snoekbaars", "Paling", "Baars", "Tong", "Sprot", "Paling", 
        "Kolblei", "Baars", "Sprot", "Tong", "Baars", "Baars", "Zwartbekgrondel", 
        "Dikkopje", "Snoekbaars", "Blankvoorn", "Kolblei", "Kolblei", 
        "Baars", "Aal", "Kolblei", "Bot", "Snoekbaars", "Baars", 
        "Blankvoorn", "Zeebaars", "Snoekbaars", "Zeebaars", "Bot", 
        "Snoekbaars", "Bot", "Baars", "Baars", "Aal", "Snoekbaars", 
        "Baars", "Baars", "Bot", "Bot", "Bot", "Kleine koornaarvis", 
        "Snoekbaars", "Bot", "Blankvoorn", "Kleine koornaarvis"), 
        site = c("Amerikahaven (kop Aziëhaven)", "Het IJ (thv EyE)", 
        "Westhaven en ADM-haven (kop)", "Jan van Riebeeckhaven (thv Nuon)", 
        "Coenhaven", "Mercuriushaven", "Amerikahaven (kop Aziëhaven)", 
        "Mercuriushaven", "Amerikahaven kop Australiëhaven", "Het IJ (thv het Keerkringpark)", 
        "Amerikahaven kop Australiëhaven", "Coenhaven", "Jan van Riebeeckhaven (thv Nuon)", 
        "Het IJ (thv EyE)", "Westhaven en ADM-haven (kop)", "Het IJ (thv het Keerkringpark)", 
        "Jan van Riebeeckhaven (kop NZK)", "Het IJ (thv EyE)", "Westhaven en ADM-haven (kop)", 
        "Het IJ (thv EyE)", "Het IJ (thv EyE)", "Het IJ (thv het Keerkringpark)", 
        "Westhaven en ADM-haven (kop)", "Het IJ (kop Noordhollandsch kanaal)", 
        "Amerikahaven kop Australiëhaven", "Jan van Riebeeckhaven (thv Nuon)", 
        "Amerikahaven (kop Aziëhaven)", "Jan van Riebeeckhaven (kop NZK)", 
        "Westhaven en ADM-haven (kop)", "Jan van Riebeeckhaven (thv Nuon)", 
        "Amerikahaven kop Australiëhaven", "Het IJ (thv EyE)", "Amerikahaven (kop Aziëhaven)", 
        "Mercuriushaven", "Westhaven en ADM-haven (kop)", "Amerikahaven kop Australiëhaven", 
        "Minervahaven", "Westhaven en ADM-haven (kop)", "Westhaven en ADM-haven (kop)", 
        "Amerikahaven kop Australiëhaven", "Amerikahaven (kop Aziëhaven)", 
        "Amerikahaven kop Australiëhaven", "Jan van Riebeeckhaven (thv Nuon)", 
        "Westhaven en ADM-haven (kop)", "Petroleumhaven", "Westhaven en ADM-haven (kop)", 
        "Het IJ (thv EyE)", "Het IJ (kop Noordhollandsch kanaal)", 
        "Het IJ (thv EyE)", "Westhaven en ADM-haven (kop)"), abund = c(5, 
        25, 2, 15, 3, 4, 1, 176, 4, 1, 1, 4, 55, 1, 1, 37, 75, 11, 
        1, 121, 4, 2, 2, 412, 38, 1, 5, 2, 443, 2, 6, 12, 1, 10, 
        33, 14, 120, 377, 67, 29, 43, 524, 4, 31, 18, 5, 18, 1, 9, 
        31), n = c(6L, 4L, 4L, 7L, 3L, 3L, 3L, 3L, 7L, 4L, 5L, 3L, 
        8L, 4L, 8L, 7L, 8L, 6L, 4L, 7L, 7L, 4L, 4L, 7L, 8L, 6L, 5L, 
        8L, 8L, 5L, 6L, 7L, 3L, 3L, 8L, 8L, 3L, 8L, 8L, 8L, 6L, 8L, 
        7L, 8L, 3L, 4L, 7L, 3L, 7L, 4L)), row.names = c(NA, -50L), class = c("tbl_df", 
    "tbl", "data.frame"), na.action = structure(c(`47` = 47L, `52` = 52L, 
    `60` = 60L, `88` = 88L, `128` = 128L, `401` = 401L, `488` = 488L, 
    `593` = 593L, `633` = 633L), class = "omit"))

最佳答案

all_sites <- unique(df$site)
all_species <- unique(df$species)

slopes <- expand.grid(all_sites, all_species)
names(slopes) <- c('site','species')
slopes$coef <- NA_real_
  
for (site in all_sites) {
  for (species in all_species){
    this_subset <- df[(df$site==site & df$species==species),]
    if (nrow(this_subset)<2) next;
    y <- this_subset$abund
    x <- this_subset$year
    cat('\n',site,species,nrow(this_subset),sum(is.na(x)),sum(is.na(y)))
    slopes[slopes$site==site & slopes$species==species, ]$coef <- coef(lm(y ~ x))[2]
  }
}

关于r - R中的for循环回归分析,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/64823367/

相关文章:

r - 为什么 dplyr::cummean(x) 不等于 cumsum(x)/seq_along(x)?

c++ - 为什么进程返回-1?

apache-spark - Spark MLLib : convert arbitrary,稀疏特征到固定长度向量

python - 如何使用 GridSearchCV 获得每组参数的预测?

r - 在 POSIXct 样本中生成随机时间

r - 在 R 中选择列的规范方法

r - 如何使用 RCurl 与 reddit 进行身份验证

python - 遍历文件夹中的文件

java - java中如何使用多维数组获取每一行的总和?

python - 在类内的函数之间传递变量