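A random forest implementation has to sort the training sample values at every node of every tree in order to determine the best threshold to split on. I would like to know which sorting algorithm the R package randomForest uses. I could not find any information about this. Thanks a lot!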
Best answer
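The randomForest package provides an R interface to the Fortran programs by Breiman and Cutler (see here for more details).
The Fortran code is available here.
The subroutine used for sorting is the following (based on a "modified Hoare quicksort"):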
      subroutine quicksort(v,iperm,ii,jj,kk)
c
c     puts into iperm the permutation vector which sorts v into
c     increasing order.  only elements from ii to jj are considered.
c     array iu(k) and array il(k) permit sorting up to 2**(k+1)-1 elements
c
c     this is a modification of acm algorithm #347 by r. c. singleton,
c     which is a modified hoare quicksort.
c
      real v(kk),vt,vtt
      integer t,tt,iperm(kk),iu(32),il(32)
      integer ii,jj,kk,m,i,j,k,ij,l
c
      m=1
      i=ii
      j=jj
 10   if (i.ge.j) go to 80
c     pivot vt: median of the first, middle and last element
 20   k=i
      ij=(j+i)/2
      t=iperm(ij)
      vt=v(ij)
      if (v(i).le.vt) go to 30
      iperm(ij)=iperm(i)
      iperm(i)=t
      t=iperm(ij)
      v(ij)=v(i)
      v(i)=vt
      vt=v(ij)
 30   l=j
      if (v(j).ge.vt) go to 50
      iperm(ij)=iperm(j)
      iperm(j)=t
      t=iperm(ij)
      v(ij)=v(j)
      v(j)=vt
      vt=v(ij)
      if (v(i).le.vt) go to 50
      iperm(ij)=iperm(i)
      iperm(i)=t
      t=iperm(ij)
      v(ij)=v(i)
      v(i)=vt
      vt=v(ij)
      go to 50
c     exchange the out-of-place pair found by the two scans below
 40   iperm(l)=iperm(k)
      iperm(k)=tt
      v(l)=v(k)
      v(k)=vtt
c     scan down from the top for an element not greater than vt
 50   l=l-1
      if (v(l).gt.vt) go to 50
      tt=iperm(l)
      vtt=v(l)
c     scan up from the bottom for an element not less than vt
 60   k=k+1
      if (v(k).lt.vt) go to 60
      if (k.le.l) go to 40
c     stack the longer segment, continue with the shorter one
      if (l-i.le.j-k) go to 70
      il(m)=i
      iu(m)=l
      i=k
      m=m+1
      go to 90
 70   il(m)=k
      iu(m)=j
      j=l
      m=m+1
      go to 90
c     pop the next segment from the stack, return when it is empty
 80   m=m-1
      if (m.eq.0) return
      i=il(m)
      j=iu(m)
c     segments with j-i > 10 are partitioned again
 90   if (j-i.gt.10) go to 20
      if (i.eq.ii) go to 10
      i=i-1
c     insertion sort for short segments
 100  i=i+1
      if (i.eq.j) go to 80
      t=iperm(i+1)
      vt=v(i+1)
      if (v(i).le.vt) go to 100
      k=i
 110  iperm(k+1)=iperm(k)
      v(k+1)=v(k)
      k=k-1
      if (vt.lt.v(k)) go to 110
      iperm(k+1)=t
      v(k+1)=vt
      go to 100
      end
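
To tie the subroutine back to the question, here is a rough Python sketch of how a permutation vector like iperm can be used to pick a split threshold for one feature. It is an illustration only, not code from randomForest. The helper names `gini` and `best_threshold`, the midpoint rule for candidate thresholds, and the weighted Gini score are assumptions made for this example. The permutation is built with Python's built-in `sorted` rather than Singleton's quicksort.

```python
from collections import Counter


def gini(counts, n):
    """Gini impurity of a node holding n samples with the given class counts."""
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_threshold(values, labels):
    """Return (threshold, weighted_gini) for the best binary split of one feature.

    Hypothetical helper for illustration; not part of randomForest.
    """
    n = len(values)
    # Permutation vector: perm[r] is the original index of the r-th smallest
    # value. This is the role iperm plays in the Fortran subroutine
    # (0-based here, 1-based there).
    perm = sorted(range(n), key=lambda idx: values[idx])

    left = Counter()          # class counts left of the candidate threshold
    right = Counter(labels)   # class counts right of the candidate threshold
    best = (None, float("inf"))

    for rank in range(n - 1):
        idx = perm[rank]
        # Walking the sorted order moves one sample from right to left.
        left[labels[idx]] += 1
        right[labels[idx]] -= 1

        lo, hi = values[idx], values[perm[rank + 1]]
        if lo == hi:
            continue  # equal values cannot be separated by a threshold

        n_left = rank + 1
        n_right = n - n_left
        score = (n_left * gini(left, n_left) + n_right * gini(right, n_right)) / n
        if score < best[1]:
            best = ((lo + hi) / 2.0, score)
    return best


if __name__ == "__main__":
    x = [2.0, 7.0, 1.0, 8.0, 3.0, 9.0]
    y = ["a", "b", "a", "b", "a", "b"]
    print(best_threshold(x, y))
```

Once the values are sorted, every candidate threshold can be scored in a single pass by updating the class counts incrementally, which is why the sort dominates the cost of evaluating a feature at a node.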
Regarding "r - Node sorting algorithm during training (R: randomForest)", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/48665559/