PAUL E. GREEN
Wharton School, University of Pennsylvania
FRANK J. CARMONE, JR.
Wright State University
SCOTT M. SMITH
Brigham Young University
Original Publication:
Allyn and Bacon
1970, 1989
ISBN 0-205-11657-4
Published on the Internet 1997
SECTION FIVE: DIMENSION REDUCING METHODS AND CLUSTER ANALYSIS 110
THE DATA MATRIX 110 REDUCED SPACE ANALYSIS 111
Metric Reduction--Principal Components Analysis 112
TABLE 5-1 113
Figure 5-1 113
Figure 5.2 113
Nonmetric Reduced Space Analysis 114
CLUSTER ANALYSIS 114
Basic Questions in Cluster Analysis 115
Choice of Proximity Measure 116
Distance-Type Measures 116
Other Euclidean Distance Measures 117
Hierarchical Methods 118
Figures 5-3 120
Figure 5-4 120
Single Linkage 120
Complete Linkage 120
Average Linkage 120
The Howard-Harris Clustering Method 120
Factor Analytic Methods 121
Matching-Type Measures 122
Describing the Clusters 124
AN APPLICATION OF REDUCED SPACE AND CLUSTER ANALYSIS 125
TABLE 5-5 126
Figure 5-7 127
Figure 5-8 128
Summary of Study 129
RECENT DEVELOPMENTS IN CLUSTERING TECHNIQUES 129
Nominal vs. Dimensional Structures 129
Figure 5-9 130
Overlapping Clustering Techniques 131
SUMMARY 132
SECTION FIVE
___________________________________________________________
DIMENSION-REDUCING METHODS AND CLUSTER ANALYSIS
___________________________________________________________
In preceding sections we have frequently mentioned the phrases "objective space" and "cluster analysis." While our discussion emphasized the use of multidimensional scaling methods in the representation of psychological responses--namely, similarities and preferences--we also indicated from time to time that the methodology can be viewed as a means of data reduction and summarization. In this latter role, our interest is on reducing the subject's evaluation data (evaluation of each of a set of attributes on each of a set of stimuli) into a space of fewer dimensions, with little loss in information.
We can, of course, turn the problem around and consider the dual question of reducing the number of subjects by partitioning them into homogeneous groups, based on their "closeness" to one another in variable space. The first approach may be called reduced space analysis (factor analysis) and the second, cluster analysis.(1) We shall often consider the use of both approaches.
In this section, we first discuss these dual problems of data reduction and clustering in the context of non-psychological data. After describing some of the basic characteristics of the data matrix we discuss both metric and nonmetric approaches to reduced space representation. We then turn to the topic of cluster analysis and describe the major characteristics of clustering procedures, including the choice of association coefficient, types of grouping algorithms, cluster description, and problems of statistical inference.
The concluding section discusses some recent developments in reduced
space and cluster analysis, particularly an overlapping clustering technique
that appears particularly useful in product positioning where products/subjects
may belong to several clusters simultaneously.
THE DATA MATRIX The key concept underlying the utilization of reduced
space and cluster analysis is that in most real world situations the multiple
measurements collected on a set of subjects are partly redundant, i.e.,
correlated in some way. The dual of this assertion is that some sets pairs
of subjects, when considered across variables, are similar to each other.
Geometrically, this could be observed by putting N subjects in N-1 dimensional
space. We would generally not find all points equidistant from each other,
thus indicating that certain sets of points are correlated.
While one can often take advantage of natural association to make predictions as, for example, in regression analysis or discriminant analysis, situations occur in which no variable can be singled out as a dependent or criterion variable. In these situations, we may still wish to summarize the information provided by a whole set of subjects-by-stimuli "scores". We would search for some more parsimonious structure which removes redundancy in the original set of data while preserving most of the information contained in the original data matrix.
In either case, the "raw input" to any such analysis consists of the data matrix. The data matrix is here assumed to be a rectangular matrix of numerical entries whose informational content is to be summarized and portrayed in some way. For example, the computation of the mean and standard deviation of a unidimensional array is often done simply because we are unable to comprehend the meaning of the entire column of values. In so doing we often (willingly) forego the full information provided by the data in order to understand some of its basic characteristics, e.g., central tendency and dispersion. The problem becomes even more complex when we have multiple measurements on a set of objects.
Table 5-1 shows a conceptual illustration of a data matrix. We note that the array consists of a set of subjects (the N rows) and a set of variables (the n columns). The (i,j)th cell entry represents the value of subject i on variable j. The subjects may be people, objects, concepts, or events. The variables are characteristics, properties, attributes or some other stimuli descriptive or related to the subjects.
Cell values may consist of nominal, ordinal, interval, or ratio-scaled measurements or various combinations of these as we go across columns. The complete (row) vector of values is often called a subject's profile.
In many cases the investigator is able to partition the data matrix into subsets of columns (or rows) on the basis of prior information. For example, suppose the first column of Table 5-1 is the elapsed time for an automobile to travel from 0 to 100 KPH. and the other columns consist of various automobile performance and physical dimensions for the N automobiles. The researcher may wish to predict quickness or elapsed time from some linear combination of the n-1 remaining variables. If so, prior judgement is used to describe how the dependency is to be explained. In this instance, we would probably use a multiple regression of horsepower, torque, weight and tire size to establish the hypothesized functional relationships.
Quite frequently, however, we may have no reasonable basis for prior
partitioning of the data matrix into criterion (dependent) or predictor
(independent) variables. The purpose may merely be to group subjects into
"similar" subsets (based on their similarity over the whole profile
of variables) or to portray the columns of the data matrix in terms of
a smaller number of new variables (e.g., linear combinations of the original
set) which retain most of the information in the original data matrix.
These are the cases with which this section is concerned. Both classes
of procedures start with the data matrix or some set of measures derived
from it.
REDUCED SPACE ANALYSIS In earlier sections we saw how a set of association
measures (similarities data, for example) could be "dimensionalized"
by multidimensional scaling techniques. We also saw how a set of stimuli
in variable space could be converted into a set of "association"
measures (e.g., distances) between each stimulus pair. In general, we can
go from an association matrix to a dimensional representation and vice
versa.
In reduced space analysis we usually, though not necessarily, start off with a data matrix like that conceptualized in Table 5-1. The objective here is to transform this matrix into one having the same number of rows but fewer columns, without serious loss of information.
In finding this reduced space representation, however, we may go through the seemingly roundabout process of:
-- finding correlation measures;
-- rotating of the initial configuration of points (objects to a new orientation
of the same dimensionality. This rotated configuration exhibits the characteristic
of mutually orthogonal dimensions with sequentially maximal variance. That
is, the second dimension displays the next largest variance, subject to
being orthogonal to the first, and so on;
-- reducing the dimensionality of this transformed space, usually by discarding
those higher dimensions that exhibit the smallest variance of point projections;
-- finding still a new orientation of the reduced space that makes the
retained dimensions more interpretable from a content point of view;
-- interpreting the reoriented dimensions in terms of the variables that
show high association with each dimension [Green 1978].
Historically, the methods of factor analysis have been used for a variety
of purposes [Rummel 1970]:
-- reducing the dimensionality of a variable space;
-- scaling and the spatial representation of perception and preference
data;
-- Untangling complex patterns of intervariable association in multivariate
data;
-- exploratory research in the identification of latent characteristics
for future experimentation;
-- developing empirical typologies of variables;
-- developing a data-based unidimensional index that maximally separates
individuals;
-- Testing hypothesized relationships between certain variables in specific
content areas;
-- transforming the matrix of predictor variables prior to applying some
other technique like multiple regression or canonical correlation. We shall
briefly discuss one type of factoring--principal components factor analysis--as
the major metric technique for reducing the dimensionality of space.
Metric Reduction--Principal Components Analysis Factor analysis
reduces the dimensionality of a data set to produce a set of underlying
dimensions, which are often referred to as components, or factors.
Factor analysis then seeks a linear combination of the original variable
scores--assumed here to be at least interval scaled that displays maximum
variance if the subject points are projected on to it.
Once this is done, we seek a second linear combination, axis or underlying dimension that is orthogonal to the first and maximizes residual variance explained. This process continues until all of the variance in the original data is explained. Having done this, we disregard those higher dimensioned linear combinations that account for small amounts of variance.
As we construct the linear combinations, any set of weights, in the
linear combination, same or different over each column, plus or minus,
might suffice. As a matter of fact, the various types of factoring methods
are differentiated in terms of the bases upon which the weights defining
the factor are selected. Factor scores are obtained by merely finding for
each subject the numerical value of the linear combination obtained by
substituting original data values in the linear equation of, say, factor
one:
F1 = a1X1 + a2X2
+...+ anXn
where the aj are the factor weights.
If we then correlate each of these new columns of the (subject x factor)
factor score matrix with the original variables, we obtain a set of factor
loadings. A factor loading is defined simply as the correlation (across
subjects) of an original variable with a factor.
Again, if the method of principal components is used, the weights are chosen in such a way that the successive principal components will account for a maximal portion of total variability in the data, subject to being mutually orthogonal to all previously obtained linear combinations. That is, the first set of weights will lead to a set of component scores which account for maximum variance. Each successive component will account for a decreasing portion of total variance in the original set of data and be orthogonal to all previously existing components.
Some of the geometric aspects of principal components analysis are shown in Figures 5-1 and 5-2. We assume that the original data matrix consists of the measurements of N subjects on three variables. We note from Figure 5-1 that the positions of the actual subject data (from Table 5-1) are portrayed as a somewhat elliptical scatterplots of points. Because variables Xs1 and Xs2 are not perfectly correlated, the plot of points forms an ellipse, rather than a straight line. Figure 5-2 (a) depicts this more clearly with the principal components being represented by the new set of axes (Zi) drawn through the major and minor axes of the ellipse.
Z1, the first principal axis, is the major axis of the ellipse. However, it does not exhaust all of the information in the data, necessitating a second axis Z2. Z2 is the minor axis of the ellipse. We note that most of the information would be preserved if only Z1 were retained.
Figure 5-2(b) shows the analogous case for three dimensions in which a concentration ellipsoid describes the point scatter. Figure 5-2(b) would appear as a flattened balloon because length (Z1), width (Z2), and height (Z3) have less and less variance (dispersion) to account for.
Figure 5-2(c) reflects the situation where the variables are uncorrelated
in three dimensions. The scatterplot appears as a sphere, where each of
the dimensions is orthogonal to each other.
Factoring the Association Matrix. In practice, the factors are obtained
not by factoring the data matrix itself, but a set of association measures--cross
products, covariances, correlations--that are derived from the data matrix.
(See Table 5-1)
If as many components are extracted as there are variables, the original
association matrix (e.g., matrix of correlation coefficients) can be perfectly
reproduced by a matrix product of the factor loadings. Of course, in this
case no parsimony would be obtained, since one would have as many linear
combinations (components) of the variables as there were variables to begin
with.
INSERT Figure 5.1 Scatter Plots of Standardized Data of Table 5.1 and the
First Principal Component z1
INSERT Figure 5.2 Geometric Aspects of Principal Components Analysis
Metric Approach to Reduced Space Analysis. In terms of our emphasis
on reduced space techniques, principal components analysis represents the
major metric procedure for reducing the original variable space. In the
usual case where a (variable by variable) correlation matrix serves as
the starting point for the component analysis (R-type factor analysis),
one can compute unit-variance (factor) scores for each object on each component.
If weighted squared distances in component space are computed between object
pairs, using component eigenvalues as weights, such measures will be proportional
to squared interpoint distance in the original variables space if all components
are extracted.
Nonmetric Reduced Space Analysis In general, if the function relating
interpoint distance between pairs of points to input data is nonlinear
but monotonic, a metric procedure like principal components analysis will
lead to a larger number of dimensions in which to portray the point locations
in terms of orthogonal axes than would be the case with linear functions.
However, if we replace the objective of attempting to reconstruct the actual
values of the input matrix with the objective of trying to reproduce only
their rank order, we will obtain a space of fewer dimensions. Moreover,
it turns out that multidimensional scaling techniques discussed in this
book can be used as types of nonmetric factoring procedures. The input
data can consist of a symmetric matrix of any type of association measure
whose values can be weakly ordered and obey the metric axioms discussed
in Section 2.
In addition to nonmetric programs that find a dimensional representation
of a single set of points (e.g., objects) whose ranks of interpoint distances
are monotonic with the original input data, other approaches leading to
joint spaces can be employed. For example, the vector and unfolding models
described in Section 4 can be used for the same general objective, depending
upon the type of representation desired.
CLUSTER ANALYSIS Cluster analysis, like reduced space analysis,
is concerned with data matrices in which the variables have not been partitioned
beforehand into criterion versus predictor subsets. In reduced space analysis
our interest centered on reducing the variable space to a smaller number
of orthogonal dimensions which maintained most of the information--metric
or ordinal-- contained in the original data matrix. Emphasis was placed
on the variables rather than on the subjects (rows) of the data matrix.
In cluster analysis our concern is with the similarity of the subjects--that is, the resemblance of their profiles over the whole set of variables. These variables may be the original set or may consist of a representation of them in reduced space. In either case the objective of cluster analysis is to find similar groups of subjects, where "similarity" between each pair of subjects is usually construed to mean some global measure over the whole set of characteristics--either original variables or derived coordinates, if preceded by a reduced space analysis. In this section we discuss various methods of clustering and the key role that distance functions play as measures of the proximity of pairs of points.
We first discuss the fundamentals of cluster analysis in terms of major questions concerning choice of proximity measure, choice of clustering technique, and descriptive measures by which the resultant clusters can be defined. We show that clustering results can be sensitive to the type of distance function used to summarize proximity between pairs of profiles.
We next discuss the characteristics of various computer programs that
have been proposed for grouping profiles, i.e., for partitioning the rows
(subjects) of the data matrix. This is followed by brief discussions of
statistics for defining clusters and the problems associated with statistical
inference in this area.
Basic Questions in Cluster Analysis The most common use of cluster
analysis is for classification. That is, subjects are separated into groups
such that each subject is more like other subjects in its group than it
is to subjects outside the group. Cluster analysis is thus concerned ultimately
with classification and represents a set of techniques which are part of
the field of numerical taxonomy (Frank and Green, [1968]; Punj and Stewart
[1983]; Aldenderfer and Blashfield [1984]).
We will initially focus on clustering procedures that result in the assignment of each subject to one and only one class. Subjects within a class are usually assumed to be indistinguishable from one another. Thus, we assume that the underlying structure of the data involves an unordered set of discrete classes. In some cases we may also view these classes as hierarchical in nature, with some classes divided into subclasses.
Clustering procedures can be viewed as "preclassificatory"
in the sense that the researcher has not used prior judgment to partition
the subjects (rows of the data matrix). However, it is assumed that some
of the objectives are heterogeneous; that is, that "clusters"
exist. This presupposition of different groups is based on commonalities
within the set of independent variables. This assumption is different from
that made in the case of discriminant analysis or automatic interaction
detection, where the dependent variable is used to formally define groups
of objects and the distinction is not made on the basis of profile resemblance
in the data matrix itself. Thus, given that no information on group definition
is formally evaluated in advance, the major problems of cluster analysis
will be discussed as follows:
1. What measure of intersubject similarity is to be used and how is each
variable to be "weighted" in the construction of such a summary
measure?
2. After intersubject similarities are obtained, how are the classes to
be formed?
3. After the classes have been formed, what summary measures of each cluster
are appropriate in a descriptive sense; that is, how are the clusters to
be defined?
4. Assuming that adequate descriptions of the clusters can be obtained,
what inferences can be drawn regarding their statistical significance?
Choice of Proximity Measure The choice of proximity, similarity,
association, or resemblance measure (all four terms will be used synonymously
here) is an interesting problem in cluster analysis. The concept of similarity
always connotes the question: similarity with respect to what? Proximity
measures are usually viewed in relative terms--two objects are similar,
relative to the group, if their profiles across variables are "close"
or they share "many" aspects in common, relative to those which
other pairs share in common.
Most clustering procedures use pairwise measures of proximity. The choice
of which subjects and variables to use in the first place is largely a
matter for the researcher's judgment. While these (prior) choices are important
ones, they are beyond our scope of coverage. Even assuming that such choices
have been made, however, the possible measures of pairwise proximity are
many. Generally speaking, these measures fall into two classes: (a) distance-type
measures (including correlation coefficients); and (b) matching-type measures.
The characteristics of each class are discussed in turn.
Distance-Type Measures. A surprisingly large number of proximity
measures--including correlation measures--can be viewed as distances in
some type of metric space. In Section 2 we introduced the notion of Euclidean
distance between two points in a space of r dimensions. We recall that
the formula was:
![]()
where xij, xjk are the projections of points i and j on dimension k; (k = 1,2,...,r).
Inasmuch as the variables are often measured in different units, the above formula is usually applied after each variable has been standardized to mean zero and unit standard deviation. Our subsequent discussion will assume that this preliminary step has been taken.
The Euclidean distance measure assumes that the space of (standardized)
variables is orthogonal, i.e., that the variables are uncorrelated. While
the Euclidean measure can still be used with correlated variables, it is
useful to point out that (implicit) weighting of the components underlying
the associated variables occurs with the use of the Euclidean measure:
1. Squared Euclidean distance in the original variable space has the effect
of weighting each underlying principal component by that component's eigenvalue.
2. Squared Euclidean distance in the component space (where all components
are first standardized to unit variance) has the effect of assigning equal
weights to all components.
3. In terms of the geometry of the configuration, in the first case all
points are rotated to orthogonal axes with no change in squared interpoint
distance. The general effect is to portray the original configuration as
a hyperellipsoid with principal components serving as axes of that figure.
Equating all axes to equal length has the effect of transforming the hyperellipsoid
into a hypersphere where all "axes" are of equal length.
The above considerations can be represented in terms of the following squared
distance model:
![]()
where: yik, yjk denote unit variance components of profiles i and j on component axis k (k = 1,2,...,r). If one weights the component scores according to the variances of the components (before standardization) the expression is:
where is the k-th component variance, or eigenvalue. This expression is equivalent to d2ij expressed in original variable space.
The above relationships assume that all principal components are extracted. As described earlier, if such is not the case, squared interpoint distances will be affected by the fact that they are computed in a component space of lower dimensionality than the original variable space.
In summary, both the Euclidean distance measure in original variable
space and the Euclidean distance in component space (assuming all components
have been extracted) preserve all of the information in the original data
matrix. Finally it should be pointed out that if (in addition to being
standardized to mean zero and unit variance) the original variables are
uncorrelated, both d2ij and *d2ij
will be equivalent.
Other Euclidean Distance Measures. Two other measures have often
been proposed as proximity measures. Both of these measures derive
from historical clustering methods which used Q-type factor analysis to
cluster subjects. In Q-type factor analysis--as described briefly
in our discussion of reduced space analysis--the correlation (or covariance)
matrix to be factored consists of intersubject rather than intervariable
proximities. In these methods the weights k are left intact.
The effect of Q-type component analysis of either covariance or correlation matrices, has been shown by Cronbach and Gleser [1953], to reduce the dimensionality of the space underlying the computation of proximity measures. Both procedures reduce the dimensionality of the original space to one less dimension by equating all profiles to zero mean. As such, profile differences in elevation are removed. In addition, a Q-type analysis applied to the intersubject correlation matrix will remove interprofile variation due to differences in dispersion. The result, is projection of points representing each profile into two fewer dimensions.
Figures 5-3 and 5-4 show these effects geometrically. In the case of either covariance or correlation matrices the profile mean is subtracted from each vector component which, in the 2-component case of Figure 5-3, results in a centroid with (new) origin located at point X on the figure.
Figure 5-4 shows the effect of removing profile dispersion. If we assume that the profiles were originally positioned in three-space, removal of each profile's mean reduces their dimensionality to two-space. By using a correlation matrix we further reduce dimensionality by projecting all points on to the unit circle, since the standard deviation of a profile can be represented by the distance of the profile point from the origin (the centroid of the points first having been translated to the origin).
Thus, profiles a, b, and c would all be identical after the transformation, as would profiles d and e. The cosine of the angle separating the two vectors represents the Q-correlation between them.
Of course, there may be cases where the researcher is not interested in profile differences due to either mean and/or dispersion. If so, a Q-type analysis applied to covariance or correlation matrices, as the case may be, is perfectly sensible even though information is (willingly) discarded. In general, however, we should expect differences in the derived squared distance measure computed from these procedures--both between themselves and between those computed by the techniques previously discussed.
While the authors have a predilection for the information-preserving measures d2ij and *d2ij, it is well to point out that no universally applicable distance measure exists. The choice of which measure to use depends upon which aspect of the data is worth "preserving." A wide variety of distance-type measures are available for cluster analysis; several of which are compared by Aldenderfer and Blashfield (1984).
Once the researcher has selected a method of measuring pairwise profile
similarity, the computational routine for clustering the subjects must
be selected. Aldenderfer and Blashfield identify several families of clustering
methods, each of which uses a different approach to creating groups: (1)
hierarchical agglomerative; (2) hierarchical divisive; (3) factor analytic;
(4) non-hierarchical.
Hierarchical Methods. These procedures are characterized by the
construction of a hierarchy or tree-like structure. In some methods each
point starts out as a unit (single-point) cluster. At the next level the
two closest points are placed in a cluster. At the next level a third point
joins the first two or else a second two-point cluster is formed, based
on various criterion rules for assignment. In application, hierarchical
clustering is useful in determining if points are substitutable rather
than mutually exclusive.
----------------------------------
(INSERT Figures 5-2, HERE)
----------------------------------
----------------------------------
(INSERT Figure 5-3 HERE)
----------------------------------
The different assignment rules include:
1. Single Linkage.(2) The single
linkage or minimum distance rule starts out by finding the two points with
the minimum distance. These are placed in the first cluster. At the next
stage a third point joins the already-formed cluster of two if the minimum
distance to any of the members of the cluster is smaller than the distance
between the two closest unclustered points. Otherwise, the two closest
unclustered points are placed in a cluster. The process continues until
all points end up in one cluster. The distance between two clusters is
defined as the shortest distance from a point in the first cluster that
is closest to a point in the second.
2. Complete Linkage. The complete linkage option starts out in just
the same way by clustering the two points with the minimum distance. However,
the criterion for joining points to clusters or clusters to clusters involves
maximum (rather than minimum) distance. That is, a third point joins the
already formed cluster of two if the maximum distance to any of the members
of the cluster is smaller than the distance between the two closest unclustered
points. In other words, the distance between two clusters is the longest
distance from a point in the first cluster to a point in the second cluster.
3. Average Linkage. The average linkage option starts out in the
same way as the other two. However, in this case the distance between two
clusters is the average distance from points in the first cluster to points
in the second cluster.
Ward's Method. Ward's Method starts out by finding two points with
the minimum within groups sum of squares. Points continue to be joined
to the first cluster or to other points depending on which combination
minimizes the error sum of squares from the group centroid. This method
is also known as a k-means approach.
Closely related to Ward's algorithm is the Howard-Harris algorithm [Harris,
1981]. The Howard-Harris algorithm is a hierarchical divisive method which
uses the k-means method of assigning cases to the clusters. The k-means
method assigns the case to the closest centroid. The approach may take
either of the two forms described below:
Approach #1
1. Initially the entire set of observations is considered as one set.
The group is split based on the one variable which makes the greatest contribution
to within-group sum of squares.
2. Group centroids are re-computed and subject distances to all group centroids
are computed. The subject that would best improve the objective function
is re-assigned. This process is repeated until either a finite number of
transfers are performed, no further improvement in within-groups sum of
squares is found, or a local optimum is reached.
3. The group with the largest within-groups sum of squares is selected
for splitting. Steps 2 and 3 are then repeated until the desired number
of clusters are identified.
Approach #2
1. A m x m covariance matrix is formed and analyzed using principal
components analysis. A factor score is computed for each of the n subjects
on the first (and most important) dimension or factor.
2. All subjects with factor scores that exceed the mean value of the factor
are assigned into a new cluster.
3. After splitting, each observation is re-evaluated against all clusters.
If the objective function is improved by re-assigning a case to another
cluster, the case making the greatest improvement is re-assigned. Optimization
continues until
a) a finite number of transfers are performed,
b) no further improvement in the objective function is found, or
c) a local optimum is reached.
4. The next factor is selected as the basis for splitting the next cluster.
Steps 2, 3, and 4 are then repeated until the desired number of clusters
are identified.
Hierarchical clustering algorithms may be identified as either hierarchical
agglomerative or hierarchical divisive, meaning that they contract or expand
the space between groups of points in the multivariate space. Divisive
linkage methods tend to start new clusters rather than join points to existing
clusters. Ward's method and complete linkage rules are of the divisive
variety and tend to create clusters of roughly equal size that are hyperspherical
in form. The average linkage method neither expands nor contracts the original
space, while the single linkage tends to agglomerate or contract the space
between groups of points in multivariate space.
Factor Analytic Methods. These methods analyze an n x n correlation
matrix of similarities between the n cases to find a dimensional representation
of the points. Clusters are then developed based on the resulting factor
loadings (the correlation between the subject and the underlying dimension).
The use of factor analytic methods for clustering have been criticized
because of the use of a linear model that is developed across cases rather
than across variables. The linear model tends to moderate the correct identification
of anything other than linear additive predictive groups.
Non Hierarchical Methods, have in general been subject to limited
use and testing, making their specific operational characteristics difficult
to identify. In general, however, these methods start right from a proximity
matrix and work in the following fashion (Aldenderfer and Blashfield, 1984):
1. Begin with an initial split of the data into a specified number of clusters.
2. Allocate each data point to the cluster with the nearest centroid.
3. Compute the new centroid of the clusters after all points are assigned
to clusters.
4. Iterate steps 2 and 3 until no further changes occur.
Several general types of non-hierarchical clustering designs exist and
can be characterized:
a. Sequential Threshold--in this case a cluster center is selected and
all objects within a prespecified threshold value are grouped. Then a new
cluster center is selected and the process is completed. Once points enter
a cluster they are removed from further processing.
b. Parallel Threshold--this method is similar to the one immediately above
except that several cluster centers are selected in advance and points
within the threshold level are assigned to the nearest center; threshold
levels can then be adjusted to admit fewer or more points to clusters.
c. Parallel Partitioning--this method is similar to the one immediately
above except that once several cluster centers are chosen, the whole set
of data is partitioned into disjoint sets based on nearest distance to
cluster centers being within threshold distance of still other objects.
Matching-Type Measures. Quite often the analyst wishing to cluster
profiles must contend with data that are only nominal scaled, in whole
or in part. While dichotomous data, after suitable transformation,
can often be expressed in terms of interpoint distances, the usual approach
to nominal-scaled data uses attribute matching coefficients. Intuitively
speaking, two profiles are viewed as similar to the extent to which they
share common attributes.
As an illustration of this approach, consider the two profiles appearing
below:
ATTRIBUTE
| Object 1 2 3 4 5 6 |
| 1 1 0 0 1 1 0 |
| 2 0 1 0 1 0 1 |
Each of the above objects is characterized by possession or non-possession
of each of six attributes, where a "1" denotes possession and
"0" denotes non-possession. Suppose we just count up the
total number of matches--either 1, 1 or 0, 0--and divide by the total number
of attributes. A simple matching measure could then be stated as:
S12 = M/N = 2/6 = 1/3
where M denotes the number of attributes held in common (matching 1's or 0's) and N denotes the total number of attributes. We notice that this measure varies between zero and one.
If weak matches (non-possession of an attribute) are to be de-emphasized, the above measures can be modified to:
![]()
In this case, Sij = 1/5. A variety of such matching
type coefficients are described by Sneath and Sokal [1973] and Everett
[1980]. Attributes need not be limited to dichotomies, however. In
the case of unordered multichotomies, matching coefficients are often developed
by means similar to the above by recoding the k-state variables into k-1
dummy (zero-one) variables. Naturally such coefficients will be sensitive
to variation in the number of states in each multichotomy.
Finally, mention should be made of the case in which the variables consist
of mixed scales--nominal, ordinal, and interval. Interval-scaled
variables may be handled in terms of similarity coefficients by the simple
device of computing the range of the variable Rk and finding:

This measure has been suggested by Gower [1967] as a device to handle both nominal- and interval-scaled data in a single similarity coefficient.
Mixed scales which include ordinal-scaled variables present greater difficulties. If ordinal and interval scales occur, one can downgrade the interval-scaled data to ordinal scales and use nonmetric procedures. If all three scales--nominal, ordinal, and interval--appear, one is more or less forced to downgrade all data to nominal measures and use matching type coefficients. An alternative approach would be to compute "distances" for each pair of objects according to each scale type separately, standardize the measures to zero mean and unit standard deviation, and then compute some type of weighted association measure. Such approaches are quite ad hoc, however.
While the above classes of clustering algorithms are not exhaustive of the field, most of the currently available routines can be typed as falling into one (or a combination) of the above categories. Criteria for grouping include such measures as average within-cluster distance and threshold cut-off values. The fact remains, however, that even the "optimizing" approaches generally achieve only conditional optima, since an unsettled
question in this field is how many clusters to form in the first place.
The recommendation of an algorithm is difficult at best. For classification purposes, clusters should be able to identify distinct separations between different clusters of items. Clusters should also be internally consistent. Because meeting these challenges is often a function of the type of data analyzed, selection of an optimal algorithm is also a function of the characteristics of the data. Overall, the k-means clustering technique appears to perform well (see Punj and Stewart 1983) when the initial starting configuration is non-random. In situations where a random starting configuration is required, the minimum variance type of algorithm often performs well. It is even suggested that clustering might best be approached using a combination of reduced space analysis and clustering techniques, so as to group points in the space obtained from principal components or nonmetric scaling techniques. This approach is particularly beneficial if the number of dimensions is small, allowing the researcher to augment the clustering results with visual inspection of the configuration.
If the researcher is more concerned with structure than classification,
overlapping clustering ignores the concept of distinct separations between
clusters in an attempt to allow products/subjects to belong to more than
one cluster.
Describing the Clusters Once clusters are developed, the researcher
still faces the task of describing the clusters. One measure which
is used frequently is the centroid; that is, the average value of the objects
contained in the cluster on each of the variables making up each object's
profile. If the data are interval scaled and clustering is performed
in original variable space, this measure appears quite natural as a summary
description. If the space consists of principal components' dimensions
obtained from nonmetric scaling methods, the axes usually are not capable
of being described simply. Often in this case the researcher will
want to go back to the original variables and compute average profile measures
in these terms.
If matching type coefficients are used, the cluster may be described by the group's modal profile on each of the attributes; in other cases, arithmetic averages may be computed.
In addition to central tendency, the researcher may compute some measure
of the cluster's variability, e.g., average interpoint distance of all
members of the cluster from their centroid or average interpoint distance
between all pairs of points within the cluster.
Statistical Significance Despite attempts made to construct various
tests of statistical significance of clusters, current statistical tests
are little more than heuristics offering relatively indefensible procedures.
The lack of appropriate tests stems from the difficulty of specifying realistic
null hypotheses. First, it is not clear just what the universe of
content is. Quite often the researcher arbitrarily selects objects
and variables and is often interested in confining attention to only this
sample. Second, the researcher is usually assuming that heterogeneity
exists in the first place--otherwise, why bother to cluster? Moreover,
the clusters are formed from the data and not on the basis of outside criteria.
Thus, one would be placed in the uncomfortable statistical position of
"testing" the significance between groups formed on the basis
of the data itself. Third, the distributions of objects are largely
unknown, and it would be dangerous to assume that they conformed to some
tractable model like a multivariate normal distribution.
It is indeed likely that different types of clusters may be present simultaneously in the data. It is a major difficulty to specify a-priori the type of clustering or homogeniety to be detected.
These limitations in mind, Arnold (1979) proposed using a statistic originally suggested by Friedman and Rubin (1967). The statistic, given by
![]()
where |T| is the determinant of the total variance-covariance matrix and |W| is the determinant of the pooled within-groups covariance matrix.
We continue to believe that, at least in the present state of cluster analysis, the objective of is class of techniques should be to formulate rather than test categorizations of data. After a classification has been developed and supported by theoretical research and subsequent reformulation of classes, other techniques like discriminant analysis might prove useful in the assignment of new members to groups identified on grounds which are not solely restricted to the original cluster analysis.
While the above caveats are not to be taken lightly, clustering techniques
are useful--in ways comparable to the objectives of factor analysis--as
systematic procedures for the orderly preclassification of multivariate
data. The results of using these approaches can be helpful and meaningful
(after the fact) as will be illustrated next.
AN APPLICATION OF REDUCED SPACE AND CLUSTER ANALYSIS Thus far, our
descriptions of reduced space and clustering methods have largely remained
at the conceptual level. In this section of the section we describe
their application to a realistically sized problem--one dealing with the
similarities and differences among 90 1987 automobiles, trucks and utility
vehicles whose prices range from $5,000 to $168,000. In this abridged version
of the study, we illustrate the use of cluster analysis to 90 vehicles.
Figure 5-5 identifies the 90 vehicles for which data was collected on 20
attributes. This 90 vehicle x 20 variable data matrix forms the basis for
our analysis directed at grouping the vehicles according to similarity
of attributes.
Application of the Howard Harris procedure yielded two different clustering
solutions, which, based on the within-group sum of squares for each group,
appeared to be worth examining. Figure 5-6 shows the "Scree"-type
diagram plotting sum of squares against number of clusters. The curve appears
to flatten at 5 clusters and again at 12 clusters.
The 5-cluster solution is shown in Figure 5-7 and the 12-cluster solution is shown in Figure 5-8. As might be surmised, the 5 cluster representation was inferior to the 12 cluster representation.
TABLE 5-2
VEHICLE PERFORMANCE CHARACTERISTICS AND LISTING
_____________________________________________________________
| VAR | COLUMN | TYPE | EXPLANATION |
| 1 | 1-3 | Real | 1987 Model Base Price (In U.S. $) |
| 2-3 | 4-7 | Dummy | Country of Mgf. .(1 0 = US;1 1 = Asia;0 1 = Europe) |
| 4 | 8-9 | Real | Number of Doors (2,3,4,5) |
| 5-6 | 10-13 | Dummy | Body Style (0 0=Car;0 1=Truck;1 0=Wagon;1 1=Van) |
| 7 | 14-15 | Real | Number of Passengers (2,3,4,5,6,7,8,9) |
| 8 | 16-18 | Real | Number of Engine Cyl. (4,5,6,8,12) |
| 9-10 | 19-22 | Dummy | Motor Location.(1 0 = Front;1 1 = Mid-car;0 1 = Rear) |
| 11-12 | 23-26 | Dummy | Drive Location (1 0 = Front;1 1 = 4 WD;0 1 = Rear) |
| 13 | 27-30 | Real | Horse Power Rating |
| 14 | 31-32 | Ordinal. | Tire Type (1=<175 cm, 2=<200 cm, 3=>200 cm, 4=H/V
Rated, 5= Special Racing Tire) |
| 15 | 33-35 | Real | Turning Radius (FT.) |
| 16 | 36-38 | Real | Vehicle Curb Wgt (in 1000 lbs.) |
| 17 | 39-42 | Real | Vehicle Length in inches |
| 18 | 43-45 | Real | Vehicle Width in inches |
| 19 | 46-48 | Real | Vehicle Height in inches |
| 20 | 49-52 | Real | FT./LBS. of Torque |
VEHICLES
| 1 ASTON MARTIN SALOON
2 FERRARI TESTAROSA 3 LAMBORGHINI CONTACH 4 LOTUS TURBO ESPIRIT 5 PORSCHE 928 6 PORSCHE 944 7 MAZDA RX7 8 NISSAN 300ZX 9 FERRARI 928 GTS 10 CORVETTE 11 LINCOLN VII LSC 12 ROLLS CORNICHE 13 MERCEDES 560 SEC 14 CADILLAC SEVILLE 15 ASTON MARTN LAGONDA 16 BMW 730i 17 JAGUAR XJ6 18 CADILLAC DEVILLE 19 BMW 528e 20 JAGUAR XJS 21 MASERATI BITURBO 22 TOYOTA SUPRA 23 MITSUBISHI STARION 24 ACURA LEGEND 25 AUDI COUPE GT 26 CHRYS. LEBARON GTS 27 HONDA PRELUDE 28 ISUZU IMPULSE 29 NISSAN 200 SX 30 FIREBIRD TRANSAM |
31 SHELBY CHARGER GLHS
32 TOYOTA CELICA 33 VOLVO 780 COUPE 34 CHEVROLET CHEVETTE 35 DODGE COLT 36 FORD ESCORT 37 HONDA CIVIC 38 HYUNDI EXCEL 39 ISUZU IMARK 40 MAZDA 323 41 TOYOTA MR2 42 MERKUR XR4TI 43 MITSUBISHI MIRAGE 44 NISSAN SENTRA 45 FORD MUSTANG 46 SUBARU HATCHBACK 47 TOYOTA COROLLA FWD 48 VW GOLF 49 YUGO GV 50 AUDI 5000 51 DODGE COLT VISTA 52 TOYOTA CAMRY 53 BUICK SKYHAWK 54 CHEV. CELEBRITY 55 CHRYSLER LEBARON 56 HONDA CIVIC WAGON 57 DODGE ARIES 58 MAZDA 323 WAGON 59 NISSAN STANZA 60 SUBARU WAGON |
61 TOYOTA TERCELL
62 MERCEDES 300E 63 CHEV. G10 SPORTSVAN 64 DODGE B250 MAXIVAN 65 FORD E150 CLUB WGN 66 CHEV ASTRO 67 DODGE CARAVAN 68 FORD AEROSTAR 69 NISSAN VAN 70 TOYOTA VAN LE 71 VW VANAGON 72 SUBURBAN R10 73 JEEP GRAND WGN 74 DODGE RAM 50 75 CHEV. S10 76 ISUZU P'UP 77 MAZDA B2000 78 MITSUBISHI 79 NISSAN 80 TOYOTA 81 DODGE DAKOTA 82 JEEP COMANCHE 83 S10 BLAZER 84 BRONCO II 85 JEEP CHEROKEE 86 JEEP WRANGLER 87 ISUZU TROOPER 88 MITSUBIS. MONTERRO 89 TOYOTA LANDCRUSER 90 SUZUKI SAMARI |
FIVE CLUSTER SOLUTION
| CLUSTER NUMBER 1 N = 1
71 VW VANAGON CLUSTER NUMBER 2 N = 33 24 ACURA LEGEND 25 AUDI COUPE GT 26 CHRYSLER LEBARON GTS 27 HONDA PRELUDE 28 ISUZU IMPULSE 31 SHELBY CHARGER GLH-S 32 TOYOTA CELICA 35 DODGE COLT 36 FORD ESCORT 37 HONDA CIVIC 38 HYUNDI EXCEL 39 ISUZU IMARK 40 MAZDA 323 43 MITSUBISHI MIRAGE 44 NISSAN SENTRA 46 SUBARU HATCHBACK 47 TOYOTA COROLLA FWD 48 VW GOLF 49 YUGO GV 50 AUDI 5000 51 DODGE COLT VISTA 52 TOYOTA CAMRY 53 BUICK SKYHAWK 54 CHEV. CELEBRITY 55 CHRYSLER LEBARON 56 HONDA CIVIC WAGON 57 DODGE ARIES 58 MAZDA 323 WAGON 59 NISSAN STANZA 60 SUBARU WAGON 61 TOYOTA TERCELL 86 JEEP WRANGLER 90 SUZUKI SAMURAI |
CLUSTER NUMBER 3 N = 16
1 ASTON MARTIN SALOON 2 FERRARI TESTAROSA 3 LAMBORGHINI COUNTACH 4 LOTUS TURBO ESPRIT 5 PORSCHE 928 9 FERRARI 928 GTS 10 CORVETTE 11 LINCOLN MARK VII LSC 12 ROLLS ROYCE CORNICHE 13 MERCEDES 560 SEC 15 ASTON MARTIN LAGONDA 16 BMW 730i 17 JAGUAR XJ6 20 JAGUAR XJS 30 FIREBIRD TRANS AM 62 MERCEDES 300E CLUSTER NUMBER 4 N = 17 18 CADILLAC DEVILLE 63 CHEV. G10 SPORTSVAN 64 DODGE B250 MAXIVAN 65 FORD E150 CLUB WAGON 66 CHEV ASTRO 67 DODGE CARAVAN 68 FORD AEROSTAR 69 NISSAN VAN 70 TOYOTA VAN LE 72 SUBURBAN R10 73 JEEP GRAND WAGONEER 83 S10 BLAZER 84 BRONCO II 85 JEEP CHEROKEE 87 ISUZU TROOPER 89 TOYOTA LANDCRUSER |
CLUSTER NUMBER 5 N = 23 6 PORSCHE 944 7 MAZDA RX7 8 NISSAN 300ZX 19 BMW 528e 21 MASERATI BITURBO 22 TOYOTA SUPRA 23 MITSUBISHI STARION 29 NISSAN 200 SX 33 VOLVO 780 COUPE 34 CHEVROLET CHEVETTE 41 TOYOTA MR2 42 MERKUR XR4TI 45 FORD MUSTANG 74 DODGE RAM 50 75 CHEV. S10 76 ISUZU P'UP 77 MAZDA B2000 78 MITSUBISHI 79 NISSAN 80 TOYOTA 81 DODGE DAKOTA 82 JEEP COMANCHE 88 MITSUBISHI MONTERO |
12 CLUSTER SOLUTION
| CLUSTER NUMBER 1 N = 1
71 VW VANAGON CLUSTER NUMBER 2 N = 8 14 CADILLAC SEVILLE 24 ACURA LEGEND 25 AUDI COUPE GT 26 CHRYSLER LEBARON GTS 31 SHELBY CHARGER GLH-S 50 AUDI 5000 52 TOYOTA CAMRY 54 CHEV. CELEBRITY CLUSTER NUMBER 3 N = 10 5 PORSCHE 928 10 CORVETTE 11 LINCOLN MARK VII LSC 13 MERCEDES 560 SEC 16 BMW 730i 17 JAGUAR XJ6 18 CADILLAC DEVILLE 20 JAGUAR XJS 30 FIREBIRD TRANS AM 62 MERCEDES 300E CLUSTER NUMBER 4 N = 8 68 FORD AEROSTAR 69 NISSAN VAN 70 TOYOTA VAN LE 83 S10 BLAZER 84 BRONCO II 85 JEEP CHEROKEE 87 ISUZU TROOPER 89 TOYOTA LANDCRUSER |
CLUSTER NUMBER 5 N = 12
6 PORSCHE 944 7 MAZDA RX7 8 NISSAN 300ZX 19 BMW 528e 21 MASERATI BITURBO 22 TOYOTA SUPRA 23 MITSUBISHI STARION 29 NISSAN 200 SX 33 VOLVO 780 COUPE 41 TOYOTA MR2 42 MERKUR XR4TI 45 FORD MUSTANG CLUSTER NUMBER 6 N = 6 63 CHEV. G10 SPORTSVAN 64 DODGE B250 MAXIVAN 65 FORD E150 CLUB WAGON 66 CHEV ASTRO 72 SUBURBAN R10 73 JEEP GRAND WAGONEER CLUSTER NUMBER 7 N = 3 86 JEEP WRANGLER 88 MITSUBISHI MONTERO 90 SUZUKI SAMURAI CLUSTER NUMBER 8 N = 4 2 FERRARI TESTAROSA 3 LAMBORGHINI COUNTACH 500 4 LOTUS TURBO ESPRIT 9 FERRARI 928 GTS CLUSTER NUMBER 9 N = 9 74 DODGE RAM 50 PICKUP 75 CHEV. S10 PICKUP 76 ISUZU P'UP 77 MAZDA B2000 PICKUP 78 MITSUBISHI PICKUP 79 NISSAN PICKUP 80 TOYOTA PICKUP 81 DODGE DAKOTA PICKUP 82 JEEP COMANCHE PICKUP |
CLUSTER NUMBER 10 N = 3
1 ASTON MARTIN SALOON 12 ROLLS ROYCE CORNICHE II 15 ASTON MARTIN LAGONDA CLUSTER NUMBER 11 N = 18 27 HONDA PRELUDE 28 ISUZU IMPULSE 32 TOYOTA CELICA 34 CHEVROLET CHEVETTE 35 DODGE COLT 36 FORD ESCORT 37 HONDA CIVIC 38 HYUNDI EXCEL 39 ISUZU IMARK 40 MAZDA 323 43 MITSUBISHI MIRAGE 44 NISSAN SENTRA 46 SUBARU HATCHBACK 47 TOYOTA COROLLA FWD 48 VW GOLF 49 YUGO GV 53 BUICK SKYHAWK 61 TOYOTA TERCELL CLUSTER NUMBER 12 N = 8 51 DODGE COLT VISTA 55 CHRYSLER LEBARON 56 HONDA CIVIC WAGON 57 DODGE ARIES 58 MAZDA 323 WAGON 59 NISSAN STANZA 60 SUBARU WAGON 67 DODGE CARAVAN |
In Figure 5-6, we note that cluster membership is somewhat more evenly
distributed than in Figure 5-7 (twelve) clusters. The groups are
rather homogeneous, though now and again, a vehicle seems to be out of
place.
From Tables 5-6 and 5-7 one can get some idea of the current intermanufacturer
competition. Market positioning strategies seem to be well developed,
with several of the manufacturers having multiple vehicles within the same
segments. This product positioning is even more apparent when one recognizes
that these are the major model differences and that minor options/product
distinctions are present for most vehicles.
Summary of Study The foregoing results constituted only one of several possible facets of this study. Additional analytical steps may have involved: (a) the development of clusters based only on the nominal-scaled (features) data; (b) the development of clusters based only on the interval-scaled data; and (c) clustering (involving both features and measured data) on a combined time period basis.
In terms of substantive results, we found that five "clusters" explained most of the similarities and differences among the vehicles models--VW Vanagon, smaller 4-5 passenger vehicles and wagons, exotic sports and large capacity passenger cars and utility vehicles, and popular sports cars and pickups. Of course, the clusters became more detailed as the number of clusters increased. The resulting clusters indicated which manufacturers competed with which other manufacturers in terms of similarity in the performance profiles of their vehicles.
For purposes of this section, suffice it to say that clustering techniques
can be used in marketing studies involving large-scale data banks.
Moreover, the combination of reduced space (principal components) and cluster
analysis can provide a useful dual treatment of the data. The reduced
space phase may provides help in summarizing the original variables in
terms of a smaller number of dimensions, e.g., speed or cargo capacity.
The clustering phase permits one to group vehicles according to their coordinates
in this reduced space.
RECENT DEVELOPMENTS IN CLUSTERING TECHNIQUES Our previous discussion of clustering analysis has tended to emphasize the tandem approach of dimensional and nominal (class-like) representation of data structures. In addition to using multidimensional scaling techniques for reduced space analysis, a number of other nonlinear approaches have been developed, including nonlinear factor analysis [McDonald, 1962], polynomial factor analysis [Carroll 1969], correspondence analysis [Carroll, Green and Schaffer, 1986].
Space does not permit anything but brief mention of this interesting
work. We do consider in some detail, however, a combination qualitative-quantitative
approach to an important problem in reduced space analysis--the interpretation
of data structures.
Nominal vs. Dimensional Structures As mentioned earlier, even
a pure class structure--where class membership accounts for all of the
information in the data--can be represented spatially. More commonly,
however, we consider cluster analysis as a more appropriate technique for
characterizing such data. On the other hand, other data structures
are inherently dimensional, so that measures of proximity are assumed to
be able to vary rather continuously throughout the whole matrix of proximities.
Pure typal and pure dimensional structures represent only two extremes. Since all proximity matrices (that obey certain properties [Gower, 1966]) can be represented spatially, it would seem of interest to consider data structures in terms of the restrictions placed on the points as they are arranged in that space. This motivation underlies many of the most recent developments in cluster analysis.
Torgerson [1965] was one of the first researchers to become interested
in the problem of characterizing data as "mixtures" of discrete
class and quantitative variables. Several varieties of such structures
can be obtained:
1. Data consisting of pure and unordered class structure. Dimensional
representation of such data would consist of points at the n vertices of
an n-1 dimensional simplex where interpoint distances are all equal.
For example, three classes could be represented by an equilateral triangle
in two-space, four classes by a regular tetrahedron in three-space, and
so on.
2. Data consisting of concentrated masses of points, corresponding to classes,
where interclass distances are unequal, thus implying the existence of
latent dimensions underlying class descriptions.
3. Data consisting of hierarchical sets of attributes where some classes
are nested within other classes, e.g., cola and non-cola drinks within
the diet-drink class.
4. Data consisting of dimensional variables nested within discrete classes,
e.g., sweet to non-sweet cereals within the class of "processed"
shape (as opposed to "natural" shape) cereals.
5. Data consisting of mixtures of ideal (mutually exclusive) classes so
that one may find, for example, points in the interior of an equilateral
triangle whose vertices represent three unordered classes.
6. Data consisting of pure dimensional structure in which, theoretically,
all of the space can be filled up by points.
Insert Figure 5-7
While the above categorizations are neither exclusive nor exhaustive, they are illustrative of the variety of data structures that could be obtained in the analysis of "objective" data or subjective (similarities) data of the sort described in the preceding sections. From the viewpoint of cluster analysis, some of the above structures could produce elongated, parallel clusters in which average intracluster distance need not be smaller than intercluster distances. Moreover, one could have structures in which the clusters curve or twist around one another along some manifold embedded in a higher dimensional space [Shepard and Carroll, 1966].
Figure 5-8 shows three types of data structures as related to the above categories [Torgerson, 1965]. The first panel illustrates the case of three unordered discrete classes. The second panel illustrates the case of discrete class structure where class descriptors are assumed to be orderable. The third panel shows the case of three discrete classes and an orthogonal variable which is quantitative. Points occur only along the solid lines of the prism. The fourth panel illustrates the case where objects are made up of mixtures of discrete classes plus an orthogonal quantitative dimension. In this case all objects lie on or within the boundaries of the curve prism while "pure" cases would lie at one of the three edges with location dependent upon the degree of the quantitative variable which each possesses.
Research in cluster analysis and related techniques is proceeding in
new directions for dealing with heretofore intractable data structures.
The continued development and refinement of interactive display devices
should further these efforts by enabling the researcher to "visualize"
various characteristics of the data array as a guide to the selection of
appropriate grouping methods.
Overlapping Clustering Techniques The key element of all clustering
techniques discussed so far is the mutually exclusive and exhaustive nature
of the clusters developed. While in most cases, managers view segments
as mutually exclusive and hierarchical in nature, cases do exist where
segments are mutually exclusive. Indeed, consumers may well fit into several
segments. Overlapping clustering is a new clustering model which relaxes
the exclusivity constraint of most other hierarchical and non-hierarchical
cluster models.
As an example of a cluster analysis of brands of soft drinks, Tab may be perceived as fitting into clusters identifying diet drink, cola, and used by women, whereas Diet Pepsi would fit into only the first two benefit clusters. Brands might compete across product categories. V8 drink would compete against other vegetable/fruit drinks, as well as against soft drinks and even as a between meals snack. A cluster of toothpaste users might show that Aqua-Fresh toothpaste appeals to the fresh breath, decay prevention, and brighteners clusters, while Crest may appeal to only the decay prevention benefit cluster.
Overlapping clustering simply allows for patterns of overlapping to be considered. Arabie [1977], Shepard and Arabie [1979], Arabie and Carroll [1980], Arabie, Carroll, DeSarbo and Wind [1981] outline methods for overlapping clustering, but point out that limitations do occur in practice. First, it is difficult to develop an algorithm that effectively considers all possible cluster overlap options, especially if the sample size is large. Second, most overlapping clustering algorithms produce too many clusters with excessive overlap. A high degree of overlap results in poor configuration recovery, or in otherwords, a great mathematical model that is difficult to visualize from the data.
Shepard and Arabie [1979] provide a detailed explaination of their ADCLUS (for "additive clustering") model. The ADCLUS model represents a set of m clusters which may or may not be overlapping. Each cluster is assigned a numberical weight, wk, where k=1,...,m. The similarity between any pair of points is predicted in the model as the sum of the weights of those clusters that contain the pair. Arabie and Carroll [1980] and Arabie, Carroll, DeSarbo and Wind [1981] further develop the ability to fit the ADCLUS by presenting the MAPCLUS (for MAthematical Programming CLUStering) algorithm. This implementation appears to meet the needs of clustering items in more than a single cluster. In addition, clusters may be added, deleted, or modified to produce constrained solutions [Carroll and Arabie, 1980], and estimate (in a regression sense) the importance of new sets of clusters in explaining variance in the data.
The importance of overlapping clustering is self evident, particularly
in applications where clusters are not mutually exclusive, but are overlapping.
This reality reflects the existence of multi-attribute decision rules in
decision making behavior, divergent product application or use scenarios,
and even joint decisions made by multiple users within the same household.
SUMMARY This section has considered a companion objective of the
scaling of similarities and preference data--the use of metric and nonmetric
approaches in data reduction and taxonomy. We have pointed out that
many of the multidimensional scaling programs can serve useful functions
as types of nonmetric factoring procedures. Moreover, clustering
procedures are often a helpful adjunct in data analysis when one desires
to group objects (or variables) according to their relative similarity.
We first discussed metric approaches to reduced space analysis, more specifically focusing on principal components. This was followed by brief descriptions of nonmetric analogues to factor analysis, including several of the algorithms originally discussed in the context of similarities and preference data.
We then turned to a description of clustering methods and addressed the topics of association measures, grouping algorithms, cluster descriptions and statistical inference. This led to presentation of some pilot research utilizing cluster analysis, in examining the performance structure of the automobile market.
We concluded the section with a description of the general problem of portraying data structures that consist of mixtures of categorical and dimensional variables and a discussion of the usefulness of overlapping clustering.
1. We note that factor analysis can be used to cluster respondents and cluster analysis can be used to group variables. This is done by transposing the data matrix (i.e., using a variables by subjects data matrix rather than the more common subjects by variables data matrix).
2. See Jackson [1983] for a step-by-step demonstration of the single, complete, and average linkage rules.