Summary Nonparametric Dynamic Trend of Spatio-Temporal Model with Spatial-Time Interaction
INLA는 잠재 가우스 모형 (latent gaussian model)에 특화된 방법이며 복잡한 사후 분포를 단순화된 라플라스 근사치를 이용하고 적분을 유한개의 가중 합으로 계산하기 때문에 MCMC에 비해 비교적 짧은 계산시간에 정확한 결과를 제공한다는 이점이 있다. 이러한 장점을 이용하여 2017년 ~ 2019년에 서울에서 발생한 5대강력범죄를 분석하기 위해 R-INLA (Integrated Nested Laplace Approximations)를 적용해보았다. 데이터는 서울 열린데이터광장에서 수집하였다. 게다가 범죄발생은 공간과 밀접한 관련이 있다는 것은 잘 알려진 것이며, 공간의 가변적 특성을 고려한다면
시공간 모형(Spatio-Temporal Model)
을 적용하여 분석하는 것이 적절하다. 공간 모형으로는BYM Model
, 시간 모형으로는"Nonparametric Dynamic Trend"
을 이용하였다.
서울 강간강제추행범죄는 count data이기 때문에 Poisson regression이 사용된다. \(i\)번째 자치구의 \(t\)시간에서 발생한 5대 강력범죄 건수를 \(y_{it}\)라고 할 때, \[\begin{align*} y_{it} &\thicksim Poisson(\lambda_{it}), \;\; \lambda_{it} = E_{it}\rho_{it}\\ \eta_{it} &= \log{\rho_{it}} = b_{0} + u_{i} + v_{i} + \gamma_{t} + \phi_{t} + \delta_{it} \end{align*}\]
서울의 5대강력범죄에 대한 공간 패턴과 시간 패턴을 파악하기 위해서 공간 모형으로는 공간 모형으로는 BYM Model
, 시간 모형으로는 "Nonparametric Dynamic Trend"
을 이용한다. 하지만 이 두 모형은 전반적인 공간과 시간 패턴을 파악하기 때문에, 모든 시간 또는 모든 공간에 대해 변하지 않는다. 그렇기 때문에 시공간 상호작용은 공간과 시간의 주요 요인으로 설명되지 않는 추과 효과를 포착
할 수 있다. 또한 시간에 따른 공간 패턴
도 파악 할 수 있으며, 각 지역의 추세
도 파악할 수 있다. 시공간 상호작용은 다음과 같은 네가지의 Type
을 가진다.
Interaction | Parameter interacting |
---|---|
Type I | \(v_{i}\) and \(\phi_t\) |
Type II | \(v_{i}\) and \(\gamma_t\) |
Type III | \(u_{i}\) and \(\phi_t\) |
Type IV | \(u_{i}\) and \(\gamma_t\) |
2017년 ~ 2019년에 서울에서 발생한 5대강력범죄를 분석하기 위해 R-INLA (Integrated Nested Laplace Approximations)를 이용하여 시공간모형을 적용해보았다. 데이터는 서울 열린데이터광장에서 수집하였다.
pacman::p_load("maptools", # For readShapePoly
"spdep", # For poly2nb
"dplyr",
"RColorBrewer", # For brewer.pal
"ggplot2",
"INLA")
dat.2017 <- read.csv("2017_crime.csv",header=T)
dat.2018 <- read.csv("2018_crime.csv",header=T)
dat.2019 <- read.csv("2019_crime.csv",header=T)
# Convert rows in the order of ESPI_PK
dat.2017 <- dat.2017[order(dat.2017$ESRI_PK),]
dat.2018 <- dat.2018[order(dat.2018$ESRI_PK),]
dat.2019 <- dat.2019[order(dat.2019$ESRI_PK),]
# Change for year
dat.2017 <- dat.2017 %>%
mutate(year = 1)
dat.2018 <- dat.2018 %>%
mutate(year = 2)
dat.2019 <- dat.2019 %>%
mutate(year = 3)
# Combining for data
dat <- rbind(dat.2017, dat.2018, dat.2019)
seoul.map <- maptools::readShapePoly("./TL_SCCO_SIG_W_SHP/TL_SCCO_SIG_W.shp") # Call .shp file
seoul.nb <- poly2nb(seoul.map) # Builds a neighbours list based on regions with contiguous boundaries
seoul.listw <- nb2listw(seoul.nb) # Supplements a neighbours list with spatial weights for the chosen coding scheme
seoul.mat <- nb2mat(seoul.nb) # Generates a weights matrix for a neighbours list with spatial weights for the chosen coding scheme
# Object of class "nb"
흔하게 사용하는 예상되는 수 (\(E_{it}\))는 \[\begin{align*} E_{it} = n_{it}\frac{\sum_{t}\sum_{i} y_{it}}{\sum_{t}\sum_{i} n_{it}}, \end{align*}\]
(참고 : Bayesian Disease Mapping: Hierarchical Modeling in Spatial Epidemiology p.293)
INLA에서는 근접행렬을 이용하여 그래프로 나타낼 수 있다.
H <- inla.read.graph(filename="Seoul.graph")
image(inla.graph2matrix(H),xlab="",ylab="")
Type I 상호작용은 두 개의 비구조화된 효과 \(v_{i}\) 와 \(\phi_t\) 상호작용이다. 이 때 상호작용효과 \(\delta_{it}\thicksim N(0, 1/\tau_{\delta})\), where \(\tau_{\delta}=1/\sigma^2_{\delta}\)로 가정된다.
f(area.year, model="iid")
dat$ID.area <- 1:25 # The Identifiers For The Boroughs
dat$year1 <- dat$year # The Identifiers For year
dat$area.year <- 1:dim(dat)[1] # Interaction Index (25 areas * 3 years)
formulaI <- crime ~ 1 + f(ID.area, model="bym", graph = seoul.adj) +
f(year, model="rw1") + # RW of order1
f(year1, model="iid") + # phi_t
f(area.year, model="iid") # Interaction
lcs <- inla.make.lincombs(year = diag(3), year1 = diag(3)) # Linear Combination gamma_{t} + phi_{t}
model.intI <- inla(formulaI, family = "poisson", data = dat, E = E, # E = E or offset = log(E)
lincomb = lcs,
control.predictor=list(compute=TRUE), # Compute the marginals of the model parameters
control.compute = list(dic = TRUE)) # Compute some model choice criteria
summary(model.intI)
Call:
c("inla(formula = formulaI, family = \"poisson\", data = dat,
E = E, ", " lincomb = lcs, control.compute = list(dic = TRUE),
control.predictor = list(compute = TRUE))" )
Time used:
Pre = 0.878, Running = 0.68, Post = 0.146, Total = 1.7
Fixed effects:
mean sd 0.025quant 0.5quant 0.975quant mode kld
(Intercept) 0.022 0.073 -0.123 0.022 0.167 0.022 0
Linear combinations (derived):
ID mean sd 0.025quant 0.5quant 0.975quant mode kld
lc1 1 0.019 0.013 -0.004 0.018 0.048 0.016 0
lc2 2 -0.015 0.013 -0.044 -0.014 0.006 -0.012 0
lc3 3 -0.004 0.012 -0.028 -0.004 0.021 -0.003 0
Random effects:
Name Model
ID.area BYM model
year RW1 model
year1 IID model
area.year IID model
Model hyperparameters:
mean sd
Precision for ID.area (iid component) 8.04 2.25
Precision for ID.area (spatial component) 1929.51 1927.42
Precision for year 22721.45 28607.64
Precision for year1 15985.06 32507.37
Precision for area.year 417.00 95.97
0.025quant 0.5quant
Precision for ID.area (iid component) 4.37 7.80
Precision for ID.area (spatial component) 132.18 1358.93
Precision for year 1326.40 13942.18
Precision for year1 795.64 7215.41
Precision for area.year 257.03 407.60
0.975quant mode
Precision for ID.area (iid component) 13.16 7.34
Precision for ID.area (spatial component) 7068.68 362.09
Precision for year 97427.97 3501.36
Precision for year1 85832.58 1982.98
Precision for area.year 632.35 389.89
Expected number of effective parameters(stdev): 70.30(0.937)
Number of equivalent replicates : 1.07
Deviance Information Criterion (DIC) ...............: 906.82
Deviance Information Criterion (DIC, saturated) ....: 146.44
Effective number of parameters .....................: 70.56
Marginal log-Likelihood: -578.39
Posterior marginals for the linear predictor and
the fitted values are computed
Spatio-Temporal Interaction Model에서 시간에 따른 공간패턴
을 파악하기 위하여 상호작용 효과에 대한 그래프를 그려보았다.
# Random Effect for spatial structure (ui+vi)
del <- model.intI$marginals.random$area.year # Extract the marginal posterior distribution for each element of the random effects
exdel <- lapply(del,function(x) inla.emarginal(exp,x)) # The exponential transformation and calculate the posterior mean for each of them
# Define the cutoff for zeta
exdel.cutoff <- c(0.9, 1.0, 1.1, 1.2)
# Transform in categorical variable
exdel.cat <- cut(unlist(exdel),breaks=exdel.cutoff,
include.lowest=TRUE)
maps.cat.del <- data.frame(ESRI_PK=dat.2019$ESRI_PK, del1.2017=exdel.cat[1:25],
del1.2018=exdel.cat[26:50], del1.2019=exdel.cat[51:75])
# For adding name of Seoul
lbls <- as.character(seoul.map$SIG_ENG_NM) # 자치구명
spl <- list('sp.text', coordinates(seoul.map), lbls, cex=.7)
#Add the categorized zeta to the spatial polygon
data.boroughs <- attr(seoul.map, "data")
attr(seoul.map, "data") <- merge(data.boroughs, maps.cat.del,
by="ESRI_PK")
# Map
spplot(obj=seoul.map, zcol= c("del1.2017", "del1.2018", "del1.2019"),
names.attr = c("2017", "2018", "2019"), # Changes Names
sp.layout = spl,
col.regions=brewer.pal(4,"Blues"), as.table=TRUE)
Type II 상호작용은 비구조화된 공간 효과 \(v_{i}\) 와 구조화된 시간 효과 \(\gamma_t\) 상호작용이다. 이 상호작용은 랜덤 워크(Random Walk)
를 통해서 명시된 이웃 구조(Neighborhood Structure)이다. 즉, \(i\)번째 지역에 대한 모수 벡터 \(\delta_{i1},\ldots,\delta_{iT}\)는 서로 다른 지역들과 독립이며, 시간 요소에 대해서 랜덤 워크
를 가진다. 그러므로 Type II 상호작용은 시간적 추세가 지역마다 다르지만 어떠한 공간 구조도 가지고 있지 않을 때 적절
하다.
ID.area.int <- dat$ID.area
year.int <- dat$year
formulaII <- crime ~ 1 + f(ID.area, model="bym", graph = seoul.adj) +
f(year, model="rw1") + # RW of order1
f(year1, model="iid") + # phi_t
f(ID.area.int, model="iid", group=year.int,
control.group=list(model="rw1")) # Interaction
lcs <- inla.make.lincombs(year = diag(3), year1 = diag(3)) # Linear Combination gamma_{t} + phi_{t}
model.intII <- inla(formulaII, family = "poisson", data = dat, E = E, # E = E or offset = log(E)
lincomb = lcs,
control.predictor=list(compute=TRUE), # Compute the marginals of the model parameters
control.compute = list(dic = TRUE)) # Compute some model choice criteria
summary(model.intII)
Call:
c("inla(formula = formulaII, family = \"poisson\", data = dat,
E = E, ", " lincomb = lcs, control.compute = list(dic = TRUE),
control.predictor = list(compute = TRUE))" )
Time used:
Pre = 0.59, Running = 16.3, Post = 0.132, Total = 17
Fixed effects:
mean sd 0.025quant 0.5quant 0.975quant mode kld
(Intercept) -0.013 23170.47 -45491.54 -0.665 45453.26 -0.013 0
Linear combinations (derived):
ID mean sd 0.025quant 0.5quant 0.975quant mode kld
lc1 1 0.018 0.011 -0.003 0.018 0.041 0.018 0
lc2 2 -0.018 0.010 -0.038 -0.018 0.001 -0.017 0
lc3 3 0.000 0.011 -0.022 0.000 0.021 0.000 0
Random effects:
Name Model
ID.area BYM model
year RW1 model
year1 IID model
ID.area.int IID model
Model hyperparameters:
mean sd
Precision for ID.area (iid component) 2355.01 764.83
Precision for ID.area (spatial component) 1092.06 380.30
Precision for year 32326.89 11090.69
Precision for year1 8755.95 4215.68
Precision for ID.area.int 1002.66 287.75
0.025quant 0.5quant
Precision for ID.area (iid component) 1146.82 2263.19
Precision for ID.area (spatial component) 505.38 1041.91
Precision for year 15874.43 30575.12
Precision for year1 4026.80 7639.48
Precision for ID.area.int 488.57 993.15
0.975quant mode
Precision for ID.area (iid component) 4114.25 2080.00
Precision for ID.area (spatial component) 1976.22 941.94
Precision for year 58955.59 27369.08
Precision for year1 19793.92 5960.92
Precision for ID.area.int 1588.88 968.85
Expected number of effective parameters(stdev): 70.26(8.79)
Number of equivalent replicates : 1.07
Deviance Information Criterion (DIC) ...............: 906.18
Deviance Information Criterion (DIC, saturated) ....: 145.81
Effective number of parameters .....................: 69.25
Marginal log-Likelihood: -531.13
Posterior marginals for the linear predictor and
the fitted values are computed
# Random Effect for spatial structure (ui+vi)
del <- model.intII$marginals.random$ID.area.int # Extract the marginal posterior distribution for each element of the random effects
edel <- lapply(del,function(x) inla.emarginal(function(x) x, x)) # Calculate the posterior mean for each of them
# Define the cutoff for zeta
del.cutoff <- c(-0.7,-0.4,-0.1,0.2,0.5,0.8,1.2)
# Transform in categorical variable
edel.cat <- cut(unlist(edel),breaks=del.cutoff,
include.lowest=TRUE)
maps.cat.del2 <- data.frame(ESRI_PK=dat.2019$ESRI_PK, del2.2017=edel.cat[1:25],
del2.2018=edel.cat[26:50], del2.2019=edel.cat[51:75])
# For adding name of Seoul
lbls <- as.character(seoul.map$SIG_ENG_NM) # 자치구명
spl <- list('sp.text', coordinates(seoul.map), lbls, cex=.7)
#Add the categorized zeta to the spatial polygon
data.boroughs <- attr(seoul.map, "data")
attr(seoul.map, "data") <- merge(data.boroughs, maps.cat.del2,
by="ESRI_PK")
# Map
spplot(obj=seoul.map, zcol= c("del2.2017", "del2.2018", "del2.2019"),
names.attr = c("2017", "2018", "2019"), # Changes Names
sp.layout = spl,
col.regions=brewer.pal(6,"Blues"), as.table=TRUE)
Type III 상호작용은 구조화된 공간 효과 \(u_{i}\) 와 비구조화된 시간 효과 \(\phi_t\) 상호작용이다. 이 상호작용은 CAR
를 통해서 정의된 이웃 구조(Neighborhood Structure)이다. 즉, \(t\) 시간점의 모수 \(\delta_{1},\ldots,\delta_{n}\)은 다른 시점들로부터 독립인 공간 구조를 가진다. 그러므로 Type III 상호작용은 공간 구조가 시점마다 다를 때 적절
하다.
ID.area.int <- dat$ID.area
year.int <- dat$year
formulaIII <- crime ~ 1 + f(ID.area, model="bym", graph = seoul.adj) +
f(year, model="rw1") + # RW of order1
f(year1, model="iid") + # phi_t
f(year.int,model="iid", group=ID.area.int,
control.group=list(model="besag",
graph=seoul.adj))
lcs <- inla.make.lincombs(year = diag(3), year1 = diag(3)) # Linear Combination gamma_{t} + phi_{t}
model.intIII <- inla(formulaIII, family = "poisson", data = dat, E = E, # E = E or offset = log(E)
lincomb = lcs,
control.predictor=list(compute=TRUE), # Compute the marginals of the model parameters
control.compute = list(dic = TRUE)) # Compute some model choice criteria
summary(model.intIII)
Call:
c("inla(formula = formulaIII, family = \"poisson\", data =
dat, E = E, ", " lincomb = lcs, control.compute = list(dic =
TRUE), control.predictor = list(compute = TRUE))" )
Time used:
Pre = 0.638, Running = 9.88, Post = 0.143, Total = 10.7
Fixed effects:
mean sd 0.025quant 0.5quant 0.975quant mode kld
(Intercept) 0.011 0.641 -1.247 0.011 1.267 0.011 0
Linear combinations (derived):
ID mean sd 0.025quant 0.5quant 0.975quant mode kld
lc1 1 0 0.015 -0.030 0 0.030 0 0
lc2 2 0 0.013 -0.026 0 0.026 0 0
lc3 3 0 0.015 -0.030 0 0.030 0 0
Random effects:
Name Model
ID.area BYM model
year RW1 model
year1 IID model
year.int IID model
Model hyperparameters:
mean sd
Precision for ID.area (iid component) 8.09 2.28
Precision for ID.area (spatial component) 1858.81 1834.91
Precision for year 18755.46 18425.61
Precision for year1 18744.35 18419.26
Precision for year.int 267.05 62.09
0.025quant 0.5quant
Precision for ID.area (iid component) 4.38 7.84
Precision for ID.area (spatial component) 125.17 1316.78
Precision for year 1262.62 13321.41
Precision for year1 1262.32 13311.85
Precision for year.int 164.04 260.75
0.975quant mode
Precision for ID.area (iid component) 13.27 7.38
Precision for ID.area (spatial component) 6714.20 341.26
Precision for year 67493.04 3451.86
Precision for year1 67467.26 3450.38
Precision for year.int 407.08 248.84
Expected number of effective parameters(stdev): 71.11(0.929)
Number of equivalent replicates : 1.05
Deviance Information Criterion (DIC) ...............: 907.85
Deviance Information Criterion (DIC, saturated) ....: 147.47
Effective number of parameters .....................: 70.34
Marginal log-Likelihood: -592.75
Posterior marginals for the linear predictor and
the fitted values are computed
# Random Effect for spatial structure (ui+vi)
del <- model.intIII$marginals.random$year.int # Extract the marginal posterior distribution for each element of the random effects
exdel <- lapply(del,function(x) inla.emarginal(exp, x)) # The exponential transformation and calculate the posterior mean for each of them
# Define the cutoff for zeta
exdel.cutoff <- c(1.13,1.17,1.21,1.25,1.29,1.33,1.37,1.4)
# Transform in categorical variable
exdel.cat2 <- cut(unlist(exdel),breaks=exdel.cutoff,
include.lowest=TRUE)
maps.cat.del3 <- data.frame(ESRI_PK=dat.2019$ESRI_PK, del3.2017=exdel.cat2[1:25],
del3.2018=exdel.cat2[26:50], del3.2019=exdel.cat2[51:75])
# For adding name of Seoul
lbls <- as.character(seoul.map$SIG_ENG_NM) # 자치구명
spl <- list('sp.text', coordinates(seoul.map), lbls, cex=.7)
#Add the categorized zeta to the spatial polygon
data.boroughs <- attr(seoul.map, "data")
attr(seoul.map, "data") <- merge(data.boroughs, maps.cat.del3,
by="ESRI_PK")
# Map
spplot(obj=seoul.map, zcol= c("del3.2017", "del3.2018", "del3.2019"),
names.attr = c("2017", "2018", "2019"), # Changes Names
sp.layout = spl,
col.regions=brewer.pal(7,"Blues"), as.table=TRUE)
Type IV 상호작용은 구조화된 공간 효과 \(u_{i}\) 와 구조화된 시간 효과 \(\gamma_t\) 상호작용이다. 이 상호작용은 시간과 공간에 따라 완전히 달라지며 더 이상 독립적으로 분해할 수 없다는 것을 의미한다. 즉, 각 지역에 대한 시간 의존 구조는 더 이상 다른 모든 지역으로부터 독립적이지 않으며 이웃 지역의 시간 패턴에도 의존한다.
ID.area.int <- dat$ID.area
year.int <- dat$year
formulaIV <- crime ~ 1 + f(ID.area, model="bym", graph = seoul.adj) +
f(year, model="rw1") + # RW of order1
f(year1, model="iid") + # phi_t
f(ID.area.int,model="besag", graph=seoul.adj,
group=year.int,
control.group=list(model="rw1"))
lcs <- inla.make.lincombs(year = diag(3), year1 = diag(3)) # Linear Combination gamma_{t} + phi_{t}
model.intIV <- inla(formulaIV, family = "poisson", data = dat, E = E, # E = E or offset = log(E)
lincomb = lcs,
control.predictor=list(compute=TRUE), # Compute the marginals of the model parameters
control.compute = list(dic = TRUE)) # Compute some model choice criteria
summary(model.intIV)
Call:
c("inla(formula = formulaIV, family = \"poisson\", data = dat,
E = E, ", " lincomb = lcs, control.compute = list(dic = TRUE),
control.predictor = list(compute = TRUE))" )
Time used:
Pre = 0.71, Running = 1.1, Post = 0.111, Total = 1.92
Fixed effects:
mean sd 0.025quant 0.5quant 0.975quant mode kld
(Intercept) 0.022 0.014 -0.007 0.022 0.051 0.022 0
Linear combinations (derived):
ID mean sd 0.025quant 0.5quant 0.975quant mode kld
lc1 1 0.027 0.012 0.002 0.026 0.052 0.026 0
lc2 2 -0.023 0.012 -0.048 -0.023 0.002 -0.022 0
lc3 3 -0.004 0.012 -0.028 -0.004 0.021 -0.004 0
Random effects:
Name Model
ID.area BYM model
year RW1 model
year1 IID model
ID.area.int Besags ICAR model
Model hyperparameters:
mean sd
Precision for ID.area (iid component) 1911.12 1903.51
Precision for ID.area (spatial component) 1898.13 1887.42
Precision for year 16297.48 18060.06
Precision for year1 3981.82 3406.26
Precision for ID.area.int 249.20 59.10
0.025quant 0.5quant
Precision for ID.area (iid component) 131.21 1348.15
Precision for ID.area (spatial component) 129.51 1340.01
Precision for year 655.66 10567.24
Precision for year1 603.88 3044.76
Precision for ID.area.int 151.53 243.09
0.975quant mode
Precision for ID.area (iid component) 6984.57 359.36
Precision for ID.area (spatial component) 6910.57 354.83
Precision for year 64503.74 1472.40
Precision for year1 13016.60 1618.43
Precision for ID.area.int 382.82 231.53
Expected number of effective parameters(stdev): 68.80(1.12)
Number of equivalent replicates : 1.09
Deviance Information Criterion (DIC) ...............: 906.45
Deviance Information Criterion (DIC, saturated) ....: 146.07
Effective number of parameters .....................: 69.11
Marginal log-Likelihood: -564.91
Posterior marginals for the linear predictor and
the fitted values are computed
# Random Effect for spatial structure (ui+vi)
del <- model.intIV$marginals.random$ID.area.int # Extract the marginal posterior distribution for each element of the random effects
exdel <- lapply(del,function(x) inla.emarginal(exp, x)) # The exponential transformation and calculate the posterior mean for each of them
# Define the cutoff for zeta
exdel.cutoff <- c(0.5,0.9,1.3,1.7,2.1,2.5,2.9,3.4)
# Transform in categorical variable
exdel.cat3 <- cut(unlist(exdel),breaks=exdel.cutoff,
include.lowest=TRUE)
maps.cat.del4 <- data.frame(ESRI_PK=dat.2019$ESRI_PK, del4.2017=exdel.cat3[1:25],
del4.2018=exdel.cat3[26:50], del4.2019=exdel.cat3[51:75])
# For adding name of Seoul
lbls <- as.character(seoul.map$SIG_ENG_NM) # 자치구명
spl <- list('sp.text', coordinates(seoul.map), lbls, cex=.7)
#Add the categorized zeta to the spatial polygon
data.boroughs <- attr(seoul.map, "data")
attr(seoul.map, "data") <- merge(data.boroughs, maps.cat.del4,
by="ESRI_PK")
# Map
spplot(obj=seoul.map, zcol= c("del4.2017", "del4.2018", "del4.2019"),
names.attr = c("2017", "2018", "2019"), # Changes Names
sp.layout = spl,
col.regions=brewer.pal(7,"Blues"), as.table=TRUE)
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".