On spectral clustering, informativeness and seriation
Date
2022
Authors
Journal Title
Journal ISSN
Volume Title
Publisher
University of Delaware
Abstract
This thesis studies spectral clustering and seriation, which have very similar
relaxed objective functions. We analyzed these problems from the standpoint of informativeness,
an unsupervised measure of the similarity within a data set. We are
motivated by the fact that spectral clustering depends on the choice of similarity and it
does not perform well for all values of Gaussian kernel bandwidth parameter. We proposed
three alternative fixes to spectral clustering algorithm, such that it could be used
for all kind of similarities. We also implemented state-of-the-art seriation algorithms,
GNCR and FAQ. For both clustering and seriation, we analyzed the relationship between
the Gaussian kernel bandwidth parameter with informativeness and observed
that, in order to find the optimal Gaussian kernel bandwidth parameter, one can use
informativeness as a guiding principle to get a candidate range of values. We also
observed that GNCR is more robust to variations in the kernel bandwidth parameter
as compared to FAQ.
Description
Keywords
Seriation algorithms, Spectral clustering, Informativeness