Origin Estimation for Propagation Processes on Complex Networks

This is the main function for origin estimation for propagation processes on complex networks. Different methods are available: effective distance median ('edm'), recursive backtracking ('backtracking'), and centrality-based source estimation ('centrality'). For details on the methodological background, we refer to the corresponding publications.

origin_edm for effective distance-median origin estimation (Manitz et al., 2016)

origin(events, type = c("edm", "backtracking", "centrality", "bayesian"), ...)

origin_edm(events, distance, silent = TRUE)

origin_backtracking(events, graph, start_with_event_node = TRUE, silent = TRUE)

origin_centrality(events, graph, silent = TRUE)

origin_bayesian(
  events,
  thres.vec,
  obs.vec,
  mu.mat,
  lambda.list,
  poss.candidate.vec,
  prior,
  use.prior = TRUE
)

Arguments

events	numeric vector of event counts at a specific time point; if type is 'bayesian', 'events' is a matrix, number of nodes x time points; entries represent number of cases
type	character specifying the method, `'edm'`, `'backtracking'`, `'centrality'` and `'bayesian'` are available.
...	parameters to be passed to origin methods `origin_edm`, `origin_backtracking`, `origin_centrality` or `origin_centrality`
distance	numeric matrix specifying the distance matrix (for `type='edm'`)
silent	locigal, should the messages be suppressed?
graph	igraph object specifying the underlying network graph (for `type='backtracking'` and `type='centrality'`)
start_with_event_node	logical specifying whether backtracking only starts from nodes that experienced events (for `type='backtracking'`)
thres.vec	vector, length represents number of cities/nodes, representing thresholds for cities/nodes that they are infected
obs.vec	list of cities ids used as observers
mu.mat	matrix- number of cities/nodes x number of observers, each row represents - if this node is the source, the mean of arrival time vector
lambda.list	a length-number of cities/nodes list, each element is a number of observers x number of observers matrix - if a node is the source, the covariance matrix for arrival time vector
poss.candidate.vec	a boolean vector indicating if a node has the potential to be the source
prior	vector, length - number of cities/nodes, prior for cities
use.prior	boolean, TRUE or FALSE, if use prior, default TRUE

Value

origin_edm returns an object of class origin, list with

est origin estimate
aux data.frame with auxiliary variables
- id as node identifier,
- events for event magnitude,
- wmean for weighted mean,
- wvar for weighted variance, and
- mdist mean distance from a node to all other nodes.
type = 'edm' effective distance median origin estimation

origin_backtracking returns an object of class origin, list with

est origin estimate
aux data.frame with auxiliary variables
- id as node identifier,
- events for event magnitude, and
- bcount for backtracking counts, how often backtracking identifies this source node.
type = 'backtracking' backtracking origin estimation

origin_centrality returns an object of class origin, list with

est origin estimate
aux data.frame with auxiliary variables
- id as node identifier,
- events for event magnitude, and
- cent for node centrality (betweenness divided degree).
type = 'centrality' centrality-based origin estimation

a dataframe with columns 'nodes' and 'probab', indicating nodes indices and their posteriors

References

Comin, C. H. and da Fontoura Costa, L. (2011). Identifying the starting point of a spreading process in complex networks. Physical Review E, 84. <doi: 10.1103/PhysRevE.84.056105>
Manitz, J., J. Harbering, M. Schmidt, T. Kneib, and A. Schoebel (2017): Source Estimation for Propagation Processes on Complex Networks with an Application to Delays in Public Transportation Systems. Journal of Royal Statistical Society C (Applied Statistics), 66: 521-536. <doi: 10.1111/rssc.12176>
Manitz, J. (2014). Statistical Inference for Propagation Processes on Complex Networks. Ph.D. thesis, Georg-August-University Goettingen. Verlag Dr.~Hut, ISBN 978-3-8439-1668-4. Available online: https://ediss.uni-goettingen.de/handle/11858/00-1735-0000-0022-5F38-B.
Manitz, J., Kneib, T., Schlather, M., Helbing, D. and Brockmann, D. (2014). Origin detection during food-borne disease outbreaks - a case study of the 2011 EHEC/HUS outbreak in Germany. PLoS Currents Outbreaks, 1. <doi: 10.1371/currents.outbreaks.f3fdeb08c5b9de7c09ed9cbcef5f01f2>
Li, J., Manitz, J., Bertuzzo, E. and Kolaczyk, E.D. (2020). Sensor-based localization of epidemic sources on human mobility networks. arXiv preprint Available online: https://arxiv.org/abs/2011.00138.

Author

Juliane Manitz with contributions by Jonas Harbering

Jun Li

Examples

data(delayGoe)

# compute effective distance
data(ptnGoe)
goenet <- igraph::as_adjacency_matrix(ptnGoe, sparse=FALSE)
p <- goenet/rowSums(goenet)
eff <- eff_dist(p)
#> Computing the effective distance between 257 nodes:
#>  1...................................................................................................
#>  100...................................................................................................
#>  200.........................................................done
# apply effective distance median source estimation
om <- origin(events=delayGoe[10,-c(1:2)], type='edm', distance=eff)
summary(om)
#> Effective distance median origin estimation:
#> 
#> estimated node of origin 91: X.Gotthelf.Leimbach.Strasse 
#> 
#> auxiliary variables:
#>        id          events            wmean             wvar       
#>  Min.   :  1   Min.   : 0.0000   Min.   : 5.482   Min.   :0.3987  
#>  1st Qu.: 65   1st Qu.: 0.0000   1st Qu.:21.572   1st Qu.:2.2761  
#>  Median :129   Median : 0.0000   Median :27.345   Median :2.4050  
#>  Mean   :129   Mean   : 0.6459   Mean   :26.948   Mean   :2.4989  
#>  3rd Qu.:193   3rd Qu.: 0.0000   3rd Qu.:33.359   3rd Qu.:2.9986  
#>  Max.   :257   Max.   :46.0000   Max.   :47.762   Max.   :6.2052  
#>      mdist      
#>  Min.   :14.34  
#>  1st Qu.:20.75  
#>  Median :24.23  
#>  Mean   :24.92  
#>  3rd Qu.:28.88  
#>  Max.   :39.16  
plot(om, 'mdist',start=1)
plot(om, 'wvar',start=1)
performance(om, start=1, graph=ptnGoe)
#>                   start                         est  hitt rank spj dist
#> 1 X.Adolf.Hoyer.Strasse X.Gotthelf.Leimbach.Strasse FALSE    2   2 1332

# backtracking origin estimation (Manitz et al., 2016)
ob <- origin(events=delayGoe[10,-c(1:2)], type='backtracking', graph=ptnGoe)
summary(ob)
#> Backtracking origin estimation:
#> 
#> estimated node of origin 87: X.Gesundbrunnen 
#> 
#> auxiliary variables:
#>        id          events            bcount       
#>  Min.   :  1   Min.   : 0.0000   Min.   :0.00000  
#>  1st Qu.: 65   1st Qu.: 0.0000   1st Qu.:0.00000  
#>  Median :129   Median : 0.0000   Median :0.00000  
#>  Mean   :129   Mean   : 0.6459   Mean   :0.03891  
#>  3rd Qu.:193   3rd Qu.: 0.0000   3rd Qu.:0.00000  
#>  Max.   :257   Max.   :46.0000   Max.   :3.00000  
plot(ob, start=1)
performance(ob, start=1, graph=ptnGoe)
#>                   start             est  hitt rank spj dist
#> 1 X.Adolf.Hoyer.Strasse X.Gesundbrunnen FALSE    4   8 5328

# centrality-based origin estimation (Comin et al., 2011)
oc <- origin(events=delayGoe[10,-c(1:2)], type='centrality', graph=ptnGoe)
summary(oc)
#> Centrality-based origin estimation:
#> 
#> estimated node of origin 119: X.Hermann.Kolbe.Strasse 
#> 
#> auxiliary variables:
#>        id          events             cent      
#>  Min.   :  1   Min.   : 0.0000   Min.   :0.000  
#>  1st Qu.: 65   1st Qu.: 0.0000   1st Qu.:2.938  
#>  Median :129   Median : 0.0000   Median :5.875  
#>  Mean   :129   Mean   : 0.6459   Mean   :5.167  
#>  3rd Qu.:193   3rd Qu.: 0.0000   3rd Qu.:8.062  
#>  Max.   :257   Max.   :46.0000   Max.   :9.000  
#>                                  NA's   :247    
plot(oc, start=1)
performance(oc, start=1, graph=ptnGoe)
#>                   start                     est  hitt rank spj dist
#> 1 X.Adolf.Hoyer.Strasse X.Hermann.Kolbe.Strasse FALSE    9   5 3330

# fake training data, indicating format
nnodes <- 851
max.day <- 1312
nsimu <- 20
max.case.per.day <- 10
train.data.fake <- list()
for (j in 1:nnodes) {
  train.data.fake[[j]] <- matrix(sample.int(max.day, 
    size = nsimu*nnodes, replace = TRUE), nrow = nsimu, ncol = nnodes)
}
obs.vec <- (1:9)
candidate.thres <- 0.3
mu.lambda.list <- compute_mu_lambda(train.data.fake, obs.vec, candidate.thres)
#> This is loop: 100
#> This is loop: 200
#> This is loop: 300
#> This is loop: 400
#> This is loop: 500
#> This is loop: 600
#> This is loop: 700
#> This is loop: 800
# matrix representing number of cases per node per day
cases.node.day <- matrix(sample.int(max.case.per.day, 
  size = nnodes*max.day, replace = TRUE), nrow = nnodes, ncol = max.day)
nnodes <- dim(cases.node.day)[1] # number of nodes
# fixed threshold for all nodes - 10 infected people
thres.vec <- rep(10, nnodes)
# flat/non-informative prior
prior <- rep(1, nnodes) 
result2.df <- origin(events = cases.node.day, type = "bayesian",
                     thres.vec = thres.vec,
                     obs.vec = obs.vec,
                     mu.mat=mu.lambda.list$mu.mat, lambda.list = mu.lambda.list$lambda.list, 
                     poss.candidate.vec=mu.lambda.list$poss.candidate.vec,
                     prior=prior, use.prior=TRUE)