Chapter One
Clustering Methods and Their Uses
in Computational Chemistry
Geoff M. Downs and John M. Barnard
Barnard Chemical Information Ltd., 46 Uppergate Road,
Stannington, Sheffield S6 6BX, United Kingdom
INTRODUCTION
Clustering is a data analysis technique that, when applied to a set of
heterogeneous items, identifies homogeneous subgroups as defined by a given
model or measure of similarity. Clustering has many uses, but a prime motivation
for the increasing interest in clustering methods is their use in the selection
and design of combinatorial libraries of chemical structures pertinent to
pharmaceutical discovery.
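As a minimal illustration of grouping items by a measure of similarity, the sketch below clusters a few hypothetical feature sets. The Tanimoto coefficient, the threshold of 0.5, the leader-style assignment rule, and the names `tanimoto`, `leader_cluster`, and `molecules` are all assumptions chosen for illustration; they are not taken from the text above, and other similarity measures and clustering methods would serve equally well.

```python
# Illustrative sketch only: the feature sets, the threshold, and the
# leader-style grouping are assumptions, not a method prescribed by the text.

def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two feature sets."""
    if not a and not b:
        return 1.0  # convention: two empty sets are treated as identical
    return len(a & b) / len(a | b)


def leader_cluster(items: dict, threshold: float) -> list:
    """Assign each item to the first cluster whose leader is similar enough;
    otherwise start a new cluster with the item as its leader."""
    clusters = []  # each cluster is a list of names; the first name is the leader
    for name, features in items.items():
        for cluster in clusters:
            if tanimoto(features, items[cluster[0]]) >= threshold:
                cluster.append(name)
                break
        else:
            # No existing leader was similar enough, so open a new cluster.
            clusters.append([name])
    return clusters


# Hypothetical feature sets standing in for structural descriptors.
molecules = {
    "mol_a": frozenset({1, 2, 3, 4}),
    "mol_b": frozenset({1, 2, 3, 5}),
    "mol_c": frozenset({7, 8, 9}),
    "mol_d": frozenset({7, 8, 9, 10}),
}

clusters = leader_cluster(molecules, threshold=0.5)
print(clusters)
```

No group labels are supplied to the function: the subgroups emerge only from the chosen similarity measure and threshold, so a different measure or threshold would generally yield different groups.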
One feature of clustering is that the process is unsupervised; that is, there
is no predefined grouping that the clustering seeks to reproduce. In contrast to
supervised learning, where the task is to establish relationships between given
inputs and outputs to enable prediction of the output from new inputs, in
unsupervised learning only the inputs are available and the task is to reveal
aspects of the underlying distribution of the input ...
(Excerpt from Reviews in Computational Chemistry, Volume 18.)