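I have a large dataset in which each station always has the same latitude and longitude. Some rows are missing their latitude and longitude and show "unknown" instead. I need to fill in those unknowns with the latitude and longitude from other rows of the same station where the data is not missing.
In this example, I want row 5 to be filled in with 3 and 8 as the latitude and longitude: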
> station <- c("a","b","c","c","c")
> lat <- c("1","2","3","3","unknown")
> lon <- c("6","7","8","8","unknown")
> data.frame(station,lat,lon)
  station     lat     lon
1       a       1       6
2       b       2       7
3       c       3       8
4       c       3       8
5       c unknown unknown
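My dataset has a million rows. It's fine if this takes a few minutes, because it only runs once before the analysis starts. I'd rather not install another package unless it's really necessary.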
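Because each station has a single fixed location, one loop-free way to do this in base R (no extra packages) is to build a lookup table holding one row of known coordinates per station and index into it with match(). This is a minimal sketch, assuming the data frame is called df, lat/lon are character columns, "unknown" is the only missing-value marker, and lat and lon are always unknown together (as in the example):

```r
# Example data from the question; stringsAsFactors = FALSE keeps lat/lon as character
station <- c("a", "b", "c", "c", "c")
lat <- c("1", "2", "3", "3", "unknown")
lon <- c("6", "7", "8", "8", "unknown")
df <- data.frame(station, lat, lon, stringsAsFactors = FALSE)

# Lookup table: the first row with known coordinates for each station
known <- df[df$lat != "unknown", ]
known <- known[!duplicated(known$station), ]

# Positions of the unknown rows and the matching entry in the lookup table
miss <- which(df$lat == "unknown")
idx <- match(df$station[miss], known$station)

# Fill only where a known location exists; stations with no known row stay "unknown"
ok <- !is.na(idx)
df$lat[miss[ok]] <- known$lat[idx[ok]]
df$lon[miss[ok]] <- known$lon[idx[ok]]
print(df)
```

Since match() is vectorized, the whole fill is a single pass over the data instead of one subset per station, which should scale comfortably to a million rows.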
Best answer
Perhaps something like this:
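# Assumes df has character columns station, lat, lon
# (e.g. df <- data.frame(station, lat, lon, stringsAsFactors = FALSE))
df$station <- as.character(df$station)
# Stations that have at least one row with unknown coordinates
unknownstations <- unique(df$station[df$lat == "unknown"])
# Exactly one row of known coordinates per such station
unknownstationscoords <- df[df$station %in% unknownstations & df$lat != "unknown", ]
unknownstationscoords <- unknownstationscoords[!duplicated(unknownstationscoords$station), ]
# Skip stations that have no known coordinates at all
for (i in intersect(unknownstations, unknownstationscoords$station))
{
  df[df$station == i, "lat"] <- unknownstationscoords$lat[unknownstationscoords$station == i]
  df[df$station == i, "lon"] <- unknownstationscoords$lon[unknownstationscoords$station == i]
}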
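A loop-free alternative to the answer above, still base R only: ave() applies a function to each station's group of values and reassembles the results in the original row order. A sketch assuming character lat/lon columns and "unknown" as the marker; fill_unknown is a hypothetical helper name introduced here for illustration:

```r
# Example data from the question
station <- c("a", "b", "c", "c", "c")
lat <- c("1", "2", "3", "3", "unknown")
lon <- c("6", "7", "8", "8", "unknown")
df <- data.frame(station, lat, lon, stringsAsFactors = FALSE)

# Hypothetical helper: within one station's values, replace "unknown"
# by that station's first known value (left unchanged if none is known)
fill_unknown <- function(x) {
  known <- x[x != "unknown"]
  if (length(known) > 0) x[x == "unknown"] <- known[1]
  x
}

# ave() splits each column by station, applies fill_unknown, and puts values back in row order
df$lat <- ave(df$lat, df$station, FUN = fill_unknown)
df$lon <- ave(df$lon, df$station, FUN = fill_unknown)
print(df)
```

Here lat and lon are filled independently, so this also works if only one of the two is unknown in a row.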
A similar Stack Overflow question about filling missing values in an R data frame from other data: https://stackoverflow.com/questions/19750708/