R - Filtering data from a data frame

Tags: r filter dataframe

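I'm new to R and really not sure how to filter data in a data frame.

I have created a data frame with two columns, holding monthly dates and the corresponding temperatures. It has 324 rows.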

> head(Nino3.4_1974_2000)
  Month_common               Nino3.4_degree_1974_2000_plain
1   1974-01-15                       -1.93025
2   1974-02-15                       -1.73535
3   1974-03-15                       -1.20040
4   1974-04-15                       -1.00390
5   1974-05-15                       -0.62550
6   1974-06-15                       -0.36915


I have already removed the rows where the temperature is below 0.5 degrees (see below).
el_nino <- Nino3.4_1974_2000[which(Nino3.4_1974_2000$Nino3.4_degree_1974_2000_plain >= 0.5), ]

> head(el_nino)
   Month_common               Nino3.4_degree_1974_2000_plain
32   1976-08-15                      0.5192000
33   1976-09-15                      0.8740000
34   1976-10-15                      0.8864501
35   1976-11-15                      0.8229501
36   1976-12-15                      0.7336500
37   1977-01-15                      0.9276500




temps <- Nino3.4_1974_2000$Nino3.4_degree_1974_2000_plain

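So, since consecutive temperatures in this vector are always one month apart, we just need to look for runs where temps[i] >= 0.5, and each run must be at least 5 long.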

ofinterest <- temps >= 0.5

We will then have a vector ofinterest of the form TRUE FALSE FALSE TRUE TRUE ..., which is TRUE where temps[i] is >= 0.5 and FALSE otherwise.
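As a small illustration of that comparison, here is a sketch on a made-up vector. The name `toy_temps` and its values are invented for this example and are not taken from the Nino3.4 data.

```r
# Hypothetical temperatures, invented purely for illustration
toy_temps <- c(0.7, 0.2, -0.1, 0.5, 0.9)

# The comparison is vectorised: it yields one TRUE/FALSE per element
toy_ofinterest <- toy_temps >= 0.5
print(toy_ofinterest)
# [1]  TRUE FALSE FALSE  TRUE  TRUE
```

Note that 0.5 itself counts as TRUE, because the test is >= rather than >.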

To restate your question, then, we just need to look for occurrences of at least five TRUEs in a row.

To do this we can use the function rle. ?rle gives:
> ?rle
     Compute the lengths and values of runs of equal values in a vector
     - or the reverse operation.
     ‘rle()’ returns an object of class ‘"rle"’ which is a list with
 lengths: an integer vector containing the length of each run.
  values: a vector of the same length as ‘lengths’ with the
          corresponding values.

So we use rle to count up all the streaks of consecutive TRUEs and consecutive FALSEs, and look for the streaks of at least 5 TRUEs in a row.
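To make the shape of the rle result concrete, here is a minimal sketch on a short logical vector. The vector `x` is invented for illustration; only `rle` and `which` are real base R functions.

```r
# Hypothetical logical vector, invented for illustration:
# 2 TRUEs, 1 FALSE, 5 TRUEs, 2 FALSEs
x <- c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE)

toy_runs <- rle(x)
toy_runs$lengths  # 2 1 5 2
toy_runs$values   # TRUE FALSE TRUE FALSE

# Runs that are TRUE and at least 5 long: only the third run qualifies
toy_streaks <- which(toy_runs$lengths >= 5 & toy_runs$values)
toy_streaks       # 3
```

The result indexes the runs, not the original vector, so it still has to be converted back into positions in `x`.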

# for you, temps <- Nino3.4_1974_2000$Nino3.4_degree_1974_2000_plain
temps <- runif(1000) 

# make a vector that is TRUE when temperature is >= 0.5 and FALSE otherwise
ofinterest <- temps >= 0.5

# count up the runs of TRUEs and FALSEs using rle:
runs <- rle(ofinterest) 

# we need to find runs where runs$lengths >= 5 (i.e. at least 5 in a row),
# AND runs$values is TRUE (so at least 5 TRUEs in a row).
streakIs <- which(runs$lengths >= 5 & runs$values)

# these are all the el_nino occurrences.
# We need to convert `streakIs` into indices into our original `temps` vector.
# To do this we add up all the `runs$lengths` before `streakIs[i]` and add 1;
#  that gives the index into `temps` where the streak starts.
# that is:
# startMonths <- c()
# for ( n in streakIs ) {
#     startMonths <- c(startMonths, sum(runs$lengths[seq_len(n - 1)]) + 1)
# }
# However, since this is R we can vectorise with sapply.
# (seq_len(n - 1) is used instead of 1:(n-1): for n = 1, 1:0 counts backwards
#  and would wrongly include runs$lengths[1], whereas seq_len(0) is empty.)
startMonths <- sapply(streakIs, function(n) sum(runs$lengths[seq_len(n - 1)]) + 1)

Now if you do Nino3.4_1974_2000$Month_common[startMonths] you will get all the months in which an El Niño started.

runs <- rle(Nino3.4_1974_2000$Nino3.4_degree_1974_2000_plain >= 0.5)
streakIs <- which(runs$lengths >= 5 & runs$values)
# seq_len(n - 1) keeps the first run correct (start index 1) when n = 1
startMonths <- sapply(streakIs, function(n) sum(runs$lengths[seq_len(n - 1)]) + 1)
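For an end-to-end check, the following sketch runs the same idea on a small made-up data frame. The data frame `toy`, its column names, and its values are all invented for illustration. It computes the run start positions with `cumsum` instead of a per-streak sum, which is an alternative formulation rather than the method above, and it also handles a streak that begins at the very first row.

```r
# Hypothetical 12-month series, invented for illustration
toy <- data.frame(
  month = seq(as.Date("2000-01-15"), by = "month", length.out = 12),
  temp  = c(0.6, 0.7, 0.8, 0.9, 0.6, 0.1, 0.2, 0.6, 0.7, 0.1, 0.0, -0.2)
)

toy_runs <- rle(toy$temp >= 0.5)

# Position in toy$temp at which each run starts:
# the first run starts at 1, each later run starts after all earlier runs
run_starts <- cumsum(c(1, head(toy_runs$lengths, -1)))

# Keep only runs of TRUE lasting at least 5 months
keep <- toy_runs$values & toy_runs$lengths >= 5

events <- data.frame(
  start  = toy$month[run_starts[keep]],
  months = toy_runs$lengths[keep]
)
print(events)  # one event, starting 2000-01-15 and lasting 5 months
```

The two-month warm spell in months 8 and 9 is dropped because it is shorter than 5 months.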
