Efficient Unsupervised Feature Selection for Sparse Data

Ferreira, A. ; Figueiredo, M. A. T.

Efficient Unsupervised Feature Selection for Sparse Data, Proc. Conf. on Telecommunications - ConfTele, Lisbon, Portugal, April 2011.

Feature selection and feature reduction are central
problems in machine learning and pattern recognition. Many
datasets have a sparse nature, that is, many features have zero
value. For instance, in text classification based on the bag-of-words
(BoW) or similar representations, there is usually a large
number of features, many of which may be irrelevant (or even
detrimental) for classification tasks.
This paper proposes a new unsupervised feature selection
method for sparse data, suitable for both standard and binarized
representations. The method is applicable to supervised, semi-supervised,
and unsupervised learning, since it does not use
class labels. The experimental results on standard benchmarks
show that the proposed method performs better than existing
ones on both numeric (floating-point) and binary features. It yields
efficient feature selection, reducing the number of features while
simultaneously improving the classification accuracy.
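
To illustrate the general idea of label-free feature selection on sparse data, here is a minimal sketch. The abstract does not state which relevance criterion the paper uses, so per-feature variance is an assumption made purely for illustration, and the names `rank_features_by_variance` and `select_top`, as well as the toy documents, are hypothetical. Rows are stored as dictionaries holding only nonzero entries, so the ranking cost grows with the number of nonzeros rather than with the full matrix size.

```python
from collections import defaultdict


def rank_features_by_variance(rows, n_features):
    """Return feature indices sorted by decreasing variance.

    rows: list of dicts mapping feature index -> nonzero value
          (a sparse bag-of-words style representation).
    No class labels are used, so the ranking is unsupervised.
    NOTE: variance is an illustrative criterion, not the paper's.
    """
    n = len(rows)
    s1 = defaultdict(float)  # per-feature sum of values
    s2 = defaultdict(float)  # per-feature sum of squared values
    for row in rows:
        for j, v in row.items():  # only nonzero entries are visited
            s1[j] += v
            s2[j] += v * v
    # Var[x] = E[x^2] - E[x]^2; implicit zeros add nothing to either sum.
    var = [s2[j] / n - (s1[j] / n) ** 2 for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: var[j], reverse=True)


def select_top(rows, keep):
    """Drop every feature whose index is not in `keep`."""
    keep_set = set(keep)
    return [{j: v for j, v in row.items() if j in keep_set} for row in rows]


# Toy corpus: 4 documents, 4 features; feature 0 is constant, feature 3 unused.
docs = [
    {0: 1.0, 2: 3.0},
    {0: 1.0, 1: 1.0},
    {0: 1.0, 2: 5.0},
    {0: 1.0},
]
ranking = rank_features_by_variance(docs, n_features=4)
reduced = select_top(docs, ranking[:2])  # keep the two highest-ranked features
```

The same functions accept binarized data (all nonzero values equal to 1), in which case the variance depends only on how often each feature is nonzero. The reduced rows can then be passed to any classifier or clustering method, since no labels were needed to produce them.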