- 泊松回归介绍
泊松回归适用于在给定时间内响应变量为事件发生数目的情况,它假设 Y 服从泊松分布,线性模型的拟合形式为

其中
是 Y 的均值(也等于方差)。此时,连接函数为
,概率分布为泊松分布,拟合泊松回归模型如下


glm(Y ~ X1 + X2 + X3, family=poisson(link="log"),data=mydata)
1
1
1
glm(Y ~ X1 + X2 + X3, family=poisson(link="log"),data=mydata)
例
使用robust包中的breslow(癫痫数据)数据
响应变量为sumy(随机化后八周内癫痫发病数),预测变量为治疗条件(Trt)、年龄(Age)、前八周的基础癫痫发病数(Base)
观看基础的癫痫发病数和年龄,对响应变量的潜在影响,最后看下药物治疗能否减少癫痫的发病数
> data(breslow.dat,package = "robust") #导入robust包中的breslow数据
> names(breslow.dat) #变量名称
[1] "ID" "Y1" "Y2" "Y3" "Y4" "Base" "Age" "Trt" "Ysum" "sumY" "Age10"
[12] "Base4"
> summary(breslow.dat[c(6,7,8,10)]) #获得6、7、8、10的变量数据等同于 breslow.dat[,c(6,7,8,10)]
Base Age Trt sumY
Min. : 6.00 Min. :18.00 placebo :28 Min. : 0.00
1st Qu.: 12.00 1st Qu.:23.00 progabide:31 1st Qu.: 11.50
Median : 22.00 Median :28.00 Median : 16.00
Mean : 31.22 Mean :28.34 Mean : 33.05
3rd Qu.: 41.00 3rd Qu.:32.00 3rd Qu.: 36.00
Max. :151.00 Max. :42.00 Max. :302.00
#绘制图形观察基本的情况
> opar <- par(no.readonly = TRUE) #复制一份图形设置
> par(mfrow = c(1,2)) #修改参数
> attach(breslow.dat)
> hist(sumY,breaks = 20,xlab="Seizure Count",
+ main="Distribution of Seizure")
> boxplot(sumY~Trt,xlab ="Treatment",main="Group Comparisons")
> par(opar) #还原原来的设置
22
22
1
> data(breslow.dat,package = "robust") #导入robust包中的breslow数据
2
> names(breslow.dat) #变量名称
3
[1] "ID" "Y1" "Y2" "Y3" "Y4" "Base" "Age" "Trt" "Ysum" "sumY" "Age10"
4
[12] "Base4"
5
> summary(breslow.dat[c(6,7,8,10)]) #获得6、7、8、10的变量数据等同于 breslow.dat[,c(6,7,8,10)]
6
Base Age Trt sumY
7
Min. : 6.00 Min. :18.00 placebo :28 Min. : 0.00
8
1st Qu.: 12.00 1st Qu.:23.00 progabide:31 1st Qu.: 11.50
9
Median : 22.00 Median :28.00 Median : 16.00
10
Mean : 31.22 Mean :28.34 Mean : 33.05
11
3rd Qu.: 41.00 3rd Qu.:32.00 3rd Qu.: 36.00
12
Max. :151.00 Max. :42.00 Max. :302.00
13
14
#绘制图形观察基本的情况
15
> opar <- par(no.readonly = TRUE) #复制一份图形设置
16
> par(mfrow = c(1,2)) #修改参数
17
> attach(breslow.dat)
18
19
> hist(sumY,breaks = 20,xlab="Seizure Count",
20
+ main="Distribution of Seizure")
21
> boxplot(sumY~Trt,xlab ="Treatment",main="Group Comparisons")
22
> par(opar) #还原原来的设置

因变量的偏倚特性以及可能的离群点,初看图形时,药物治疗下癫痫发病数似乎变小,且方差也变小了(泊松分布中,较小的方差伴随着较小的均值)。
与标准最小二乘法回归不同,泊松回归并不关注方差异质性
#拟合泊松回归
> fit <- glm(sumY ~ Base + Age + Trt, data=breslow.dat, family=poisson()) # family = poisson()
> summary(fit)
Call:
glm(formula = sumY ~ Base + Age + Trt, family = poisson(), data = breslow.dat)
Deviance Residuals:
Min 1Q Median 3Q Max
-6.0569 -2.0433 -0.9397 0.7929 11.0061
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.9488259 0.1356191 14.370 < 2e-16 ***
Base 0.0226517 0.0005093 44.476 < 2e-16 ***
Age 0.0227401 0.0040240 5.651 1.59e-08 ***
Trtprogabide -0.1527009 0.0478051 -3.194 0.0014 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 2122.73 on 58 degrees of freedom
Residual deviance: 559.44 on 55 degrees of freedom
AIC: 850.71
Number of Fisher Scoring iterations: 5
x
27
1
#拟合泊松回归
2
> fit <- glm(sumY ~ Base + Age + Trt, data=breslow.dat, family=poisson()) # family = poisson()
3
> summary(fit)
4
5
Call:
6
glm(formula = sumY ~ Base + Age + Trt, family = poisson(), data = breslow.dat)
7
8
Deviance Residuals:
9
Min 1Q Median 3Q Max
10
-6.0569 -2.0433 -0.9397 0.7929 11.0061
11
12
Coefficients:
13
Estimate Std. Error z value Pr(>|z|)
14
(Intercept) 1.9488259 0.1356191 14.370 < 2e-16 ***
15
Base 0.0226517 0.0005093 44.476 < 2e-16 ***
16
Age 0.0227401 0.0040240 5.651 1.59e-08 ***
17
Trtprogabide -0.1527009 0.0478051 -3.194 0.0014 **
18
---
19
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
20
21
(Dispersion parameter for poisson family taken to be 1)
22
23
Null deviance: 2122.73 on 58 degrees of freedom
24
Residual deviance: 559.44 on 55 degrees of freedom
25
AIC: 850.71
26
27
Number of Fisher Scoring iterations: 5
发现 age,base,Trtprogabide都是显著的,需要进行过度离势检测