Network Inference from Co-Occurrences
Figueiredo, M. A. T.; Nowak, R.
IEEE Transactions on Information Theory, Vol. 54, No. 9, pp. 4053-4068, September 2008.
ISSN (print): 0018-9448
Journal Impact Factor: 3.793 (in 2008)
Digital Object Identifier: 10.1109/TIT.2008.926315
The discovery of networks is a fundamental problem arising in numerous fields of science and technology, including communication systems, biology, sociology, and neuroscience. Unfortunately, it is often difficult, or impossible, to obtain data that directly reveal network structure, and so one must infer a network from incomplete data. This paper considers inferring network structure from “co-occurrence” data: observations that identify which network components (e.g., switches, routers, genes) carry each transmission but do not indicate the order in which they handle the transmission. Without order information, the number of networks that are consistent with the data grows exponentially with the size of the network (i.e., the number of nodes). Yet, the basic engineering/evolutionary principles underlying most networks strongly suggest that not all data-consistent networks are equally likely. In particular, nodes that co-occur in many observations are probably closely connected. With this in mind, we model the co-occurrence observations as independent realizations of a random walk on the network, subjected to a random permutation to account for the lack of order information. Treating permutations as missing data, we derive an expectation-maximization (EM) algorithm for estimating the random walk parameters. The model and EM algorithm significantly simplify the problem, but the computational complexity of the reconstruction process does grow exponentially in the length of each transmission path. For networks with long paths the exact E-step may be computationally intractable. We propose a polynomial-time Monte Carlo EM (MCEM) algorithm based on importance sampling and derive conditions which ensure convergence of the algorithm with high probability. Simulations and experiments with Internet measurements demonstrate the promise of this approach.
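The observation model described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (sample_path, shuffle_observation, path_prob, ordering_posterior) and the parameter names pi (initial-node distribution) and A (transition matrix) are hypothetical, chosen here for illustration. The sketch draws a random-walk path, shuffles it to discard order, and computes the exact posterior over orderings by brute-force enumeration, which is the quantity an exact E-step would need.

```python
import itertools
import random


def sample_path(pi, A, length, rng):
    """Draw a random-walk path: first node from pi, then follow rows of A."""
    nodes = list(range(len(pi)))
    path = [rng.choices(nodes, weights=pi)[0]]
    for _ in range(length - 1):
        path.append(rng.choices(nodes, weights=A[path[-1]])[0])
    return path


def shuffle_observation(path, rng):
    """Return a randomly permuted copy of the path (the order is lost)."""
    obs = list(path)
    rng.shuffle(obs)
    return obs


def path_prob(path, pi, A):
    """Probability of an ordered path under the random-walk model."""
    p = pi[path[0]]
    for a, b in zip(path, path[1:]):
        p *= A[a][b]
    return p


def ordering_posterior(obs, pi, A):
    """Exact posterior over distinct orderings of a co-occurrence observation.

    Enumerates every permutation, so the cost grows factorially with the
    length of the observation.
    """
    orderings = sorted(set(itertools.permutations(obs)))
    weights = [path_prob(o, pi, A) for o in orderings]
    total = sum(weights)
    if total == 0:
        raise ValueError("no ordering of the observation is possible under (pi, A)")
    return {o: w / total for o, w in zip(orderings, weights)}


if __name__ == "__main__":
    rng = random.Random(0)
    # Assumed toy network: 3 nodes, mostly cycling 0 -> 1 -> 2 -> 0.
    pi = [0.8, 0.1, 0.1]
    A = [[0.0, 0.9, 0.1],
         [0.1, 0.0, 0.9],
         [0.9, 0.1, 0.0]]
    obs = shuffle_observation(sample_path(pi, A, 3, rng), rng)
    for ordering, prob in ordering_posterior(obs, pi, A).items():
        print(ordering, round(prob, 4))
```

An observation with L distinct nodes has L! candidate orderings, so this enumeration is only practical for short paths. That cost is what motivates replacing the exact E-step with a Monte Carlo approximation based on importance sampling, as the abstract proposes.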