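I am completely lost trying to calculate the prevalence of my disease by a variable (in my case, postal code). I have tried everything, but nothing seems to work :(
I know disease prevalence is easy to calculate (total number of cases divided by total population), but I cannot get R to sum the cases and sum the population by postal code and then divide one by the other.
The column I am trying to calculate prevalence for is called "Lyme", and it is a logical variable (0 = negative, 1 = positive). The "FSA" column is my postal code. Please help!
Here is my code: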
Data.All.df <- data.frame(Data.All)  ## create data frame from data file
Data.All.df.2008 <- subset(Data.All.df, Year=="2008")  ## only use 2008
library(dplyr)
Data.All.df.2008 <- Data.All.df.2008 %>%
  group_by(FSA) %>%
  mutate_each(funs(Cases = ((Lyme=="1")/((Lyme=="0")+(Lyme=="1")))))
X.1 X Source Patient Accession Customer Year Date Country City Province Postal Name Age Gender Species Breed SNAP Apspp Ehrspp HW Lyme Coinfections dupID FSA
1710 4913 4913 Veterinary Clinic Bronson Sprartacus796575981360 7.97e+13 79657 2008 2008-01-08 Canada WINDSOR ON N8N 3T4 Bronson Sprartacus 132 Not Specified Canine Not Specified 4Dx 0 0 0 0 0 TRUE N8N
1711 4915 4915 Veterinary Clinic Scotty9233669481432 9.23e+13 92336 2008 2008-01-08 Canada WINDSOR ON N8R 1A5 Scotty 84 Not Specified Canine Not Specified 4Dx 0 0 0 0 0 TRUE N8R
1712 4916 4916 Veterinary Clinic Hershey9233683161435 9.23e+13 92336 2008 2008-01-08 Canada WINDSOR ON N8R 1A5 Hershey 48 Not Specified Canine Not Specified 4Dx 0 0 0 0 0 TRUE N8R
1713 4918 4918 Veterinary Clinic Brandy7965736441362 7.97e+13 79657 2008 2008-01-09 Canada WINDSOR ON N8N 3T4 Brandy 156 Not Specified Canine Not Specified 4Dx 0 0 0 0 0 TRUE N8N
1714 4919 4919 Veterinary Clinic Trish9233699481443 9.23e+13 92336 2008 2008-01-10 Canada WINDSOR ON N8R 1A5 Trish 132 Not Specified Canine Not Specified 4Dx 0 0 0 0 0 TRUE N8R
1715 4929 4929 Veterinary Clinic Lexie8001685020761364 8.00e+13 80016 2008 2008-01-17 Canada HALIFAX NS B3L 2C2 Lexie 29 Spayed Canine Non-Sporting 4Dx 0 0 0 0 0 TRUE B3L
1716 4937 4937 Veterinary Clinic CUBBIE79700431 7.97e+12 79700 2008 2008-01-21 Canada DARTMOUTH NS B2W 2N3 CUBBIE 118 Spayed Canine Non-Sporting 4Dx 0 0 0 0 0 TRUE B2W
1717 4945 4945 Veterinary Clinic Stevie7965765291433 7.97e+13 79657 2008 2008-01-25 Canada WINDSOR ON N8N 3T4 Stevie 36 Not Specified Canine Not Specified 4Dx 0 0 0 0 0 TRUE N8N
1718 4947 4947 Veterinary Clinic Bailey9233644191501 9.23e+13 92336 2008 2008-01-25 Canada WINDSOR ON N8R 1A5 Bailey 132 Not Specified Canine Not Specified 4Dx 0 0 0 0 0 TRUE N8R
1719 4948 4948 Veterinary Clinic ZAK925369448482 9.25e+12 92536 2008 2008-01-25 Canada HUNTSVILLE ON P1H 1B5 ZAK 96 Neutered Canine Hound 4Dx 0 0 0 0 0 TRUE P1H
Best answer
Using the following minimal example data:
# Generate data.
set.seed(0934)
Data.All.df.2008 <- data.frame(FSA = sample(c("N8N", "N8R", "B3L", "P1H"), 50, TRUE),
                               Lyme = sample(0:1, 50, TRUE),
                               stringsAsFactors = FALSE)
# First 6 observations (the default for head()).
head(Data.All.df.2008)
# FSA Lyme
# 1 N8N 1
# 2 P1H 1
# 3 N8N 0
# 4 P1H 0
# 5 N8N 1
# 6 N8N 1
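Prevalence can be calculated as the number of positive diagnoses divided by the total number of observations, i.e. sum(Lyme)/n(). Because the result should be one row per postal code rather than one value per observation, the appropriate function is summarise, not mutate: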
library(dplyr)
Data.All.df.2008 %>%
  group_by(FSA) %>%
  summarise(Prevalence = sum(Lyme)/n())
# # A tibble: 4 x 2
# FSA Prevalence
# <chr> <dbl>
# 1 B3L 0.778
# 2 N8N 0.571
# 3 N8R 0.583
# 4 P1H 0.467
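The question compares Lyme against "1" and "0", which suggests the column may be stored as text or as a factor rather than as a number. In that case sum(Lyme) would fail, so the following is a minimal sketch, assuming that situation, that converts the column first and also reports the case count and the number tested next to the prevalence. The data frame toy and the result name prev are hypothetical and exist only for illustration; FSA and Lyme follow the question.

```r
library(dplyr)

# Hypothetical toy data; column names follow the question (FSA, Lyme).
toy <- data.frame(
  FSA  = c("N8N", "N8N", "N8N", "N8R", "N8R", "B3L"),
  Lyme = c("1", "0", "1", "0", "0", "1"),  # stored as text on purpose (assumption)
  stringsAsFactors = FALSE
)

prev <- toy %>%
  # as.character() first, so the conversion is also safe for a factor column
  mutate(Lyme = as.integer(as.character(Lyme))) %>%
  group_by(FSA) %>%
  summarise(
    Cases      = sum(Lyme),      # positive results per postal code
    Tested     = n(),            # all observations per postal code
    Prevalence = Cases / Tested
  )

prev
```

When Lyme is numeric 0/1 with no missing values, mean(Lyme) gives the same result as sum(Lyme)/n(). Keeping Cases and Tested in the output makes it easier to see which postal codes have too few observations for the prevalence to be meaningful.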
Regarding "r - How to calculate disease prevalence by a variable in R", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/59941654/