Probabilistic Evidence Accumulation for Clustering Ensembles

Lourenço, A. ; Rota Bulò, S. ; Rebagliati, N. ; Figueiredo, M. A. T. ; Fred, A. L. N. ; Pelillo, M.

Probabilistic Evidence Accumulation for Clustering Ensembles, Proc International Conf. on Pattern Recognition Applications and Methods - ICPRAM, Barcelona, Spain, Vol. ?, pp. ? - ?, February, 2013.

Ensemble clustering methods derive a consensus partition of a set of objects from the results of a collection of base clustering algorithms, which together form the ensemble. Each partition in the ensemble provides a set of pairwise observations of the co-occurrence of objects in the same cluster. The evidence accumulation clustering paradigm uses these co-occurrence statistics to derive a similarity matrix, referred to as the co-association matrix, which is fed to a pairwise similarity clustering algorithm to obtain a final consensus clustering. The advantage of this solution is that it avoids the label correspondence problem, which affects other ensemble clustering schemes. In this paper we derive a principled approach for extracting a consensus clustering from the observations encoded in the co-association matrix. We introduce a probabilistic model for the co-association matrix, parametrized by the unknown assignments of objects to clusters, which are in turn estimated using a maximum likelihood approach. Additionally, we propose a novel algorithm that carries out the parameter estimation and is guaranteed to converge to a local solution. Experiments on both synthetic and real benchmark data show the effectiveness of the proposed approach.
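
As an illustration of the evidence accumulation step described in the abstract, the following minimal Python sketch builds a co-association matrix from a small ensemble of partitions. It is not the probabilistic model or the estimation algorithm proposed in the paper. The function name `co_association` and the toy ensemble are hypothetical, and the matrix is assumed here to hold the fraction of partitions in which two objects share a cluster.

```python
import numpy as np


def co_association(partitions):
    """Return the fraction of partitions in which each pair of objects
    is assigned to the same cluster (hypothetical helper)."""
    partitions = np.asarray(partitions)  # shape: (n_partitions, n_objects)
    n_partitions, n_objects = partitions.shape
    counts = np.zeros((n_objects, n_objects))
    for labels in partitions:
        # Pairwise indicator: True where objects i and j share a label.
        counts += labels[:, None] == labels[None, :]
    return counts / n_partitions


# Toy ensemble (made up for illustration): three partitions of five objects.
ensemble = [
    [0, 0, 1, 1, 2],
    [1, 1, 0, 0, 0],  # label names differ from the first partition
    [0, 0, 0, 1, 1],
]
C = co_association(ensemble)

# Renaming the labels inside any partition leaves the matrix unchanged.
relabelled = [
    [2, 2, 0, 0, 1],
    [0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0],
]
C_relabelled = co_association(relabelled)
```

Because only the pairwise "same cluster" relation is counted, the cluster labels of different partitions never need to be matched, which is why this construction sidesteps the label correspondence problem.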