USE AND THEORY OF MONANOVA, A PROGRAM TO ANALYZE FACTORIAL EXPERIMENTS BY ESTIMATING MONOTONE TRANSFORMATIONS OF THE DATA(1)

J. B. Kruskal
Bell Telephone Laboratories

Frank J. Carmone, Jr.
Marketing Science Institute


NON-TECHNICAL INTRODUCTION

MONANOVA fits an additive (main-effects) model for conjoint analysis with full factorial designs. Operationally, each respondent is given a set of descriptions to rank or otherwise respond to. Each description is formed from a combination of levels of two or more independent variables, and the set of descriptions must follow a full factorial design.

The analysis finds a utility value for each level of each independent variable such that the ranking implied by summing these main-effect utilities best preserves the original ranking of the design descriptions.

MONANOVA is capable of analyzing data for multiple respondents or aggregate data for a group of respondents.


TECHNICAL INTRODUCTION

A preliminary transformation of data will sometimes greatly improve subsequent analysis.  For example, suppose a 3x3 factorial experiment yields the values:

1   4   9
4   9  16
9  16  25

Then taking square-roots removes the interaction term from the analysis of variance.  It also leads to a much simpler description of the original data as the squares of an additive table.
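The claim is easy to check numerically. The Python sketch below (the helper name `interaction_ss` is illustrative, not part of MONANOVA) removes the grand mean, row effects and column effects from a complete two-way table with one observation per cell, and returns the sum of squares of what is left, which is exactly the interaction term.

```python
import math

def interaction_ss(table):
    """Sum of squares of the interaction term of a complete two-way table
    (one observation per cell): residuals after removing the grand mean,
    the row effects and the column effects."""
    n_rows, n_cols = len(table), len(table[0])
    grand = sum(map(sum, table)) / (n_rows * n_cols)
    row_means = [sum(row) / n_cols for row in table]
    col_means = [sum(table[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    return sum((table[i][j] - row_means[i] - col_means[j] + grand) ** 2
               for i in range(n_rows) for j in range(n_cols))

data = [[1, 4, 9],
        [4, 9, 16],
        [9, 16, 25]]
roots = [[math.sqrt(x) for x in row] for row in data]

print(interaction_ss(data))   # about 16: the raw table has an interaction term
print(interaction_ss(roots))  # 0.0: the square roots form an additive table
```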

MONANOVA (Monotone Analysis of Variance) is a procedure for transforming data from a factorial experiment.  It searches over all monotone (ascending) transformations of the data and picks the 'best' one, namely the monotone transformation for which the main effects account for the greatest percentage of variance.

The theory and computational procedure have been published by Kruskal (1964). Briefly, the algorithm seeks to minimize the 'stress' (which can be thought of as the square root of the residual variance divided by the total variance), and uses an iterative numerical method of steepest descent (method of gradients) for this purpose.  Further explanation is given in the Theory section, which contains important new insights not included in Reference 1.


Theory

Since a complete discussion of the theory has already been published, we give here only a brief discussion to help clarify what the program does.  However, this discussion embodies a further clarification of the theory which was not available before.  For simplicity, it will be phrased in terms of a two-way design with no replications, although the theory and the program apply much more generally.

Suppose the data values are $d_{ij}$, with $i = 1, \dots, I$ and $j = 1, \dots, J$.  For convenience, we will sometimes think of the data values as strung out in a long vector, which we call D.  The idea is to find a monotonic transformation $f_M$ such that the transformed values $f_M(d_{ij})$ can be explained as fully as possible by their main effects, that is, so that they are as nearly additive as possible.  Likewise, we will sometimes think of the values $f_M(d_{ij})$ as strung out in a long vector, which we call M.  Any vector M obtained in this way will be 'monotonic over D,' or just 'monotonic.'

Given any vector M, we can do an analysis of variance and use the main effects to form the 'fitted value' for each cell.  (If the grand mean is $\mu$, the row effects are $\alpha_i$, and the column effects are $\beta_j$, then the fitted value for cell ij is $\mu + \alpha_i + \beta_j$.)

Of course, the fitted values are perfectly additive.  When they are strung out into a long vector, we call that vector Pa(M).  Any such vector which comes from perfectly additive values is called an 'additive vector,' and may be referred to as A.  Going back to the basic meaning of analysis of variance, Pa(M) may be described as the additive vector A which is closest (in the sum-of-squares sense) to M.  It may also be described as the orthogonal projection of M onto the space of additive vectors.

In the terminology just described, our goal is to find the monotonic function $f_M$ such that the sum of squares of the components of [M - Pa(M)] is as small as possible.  Of course, this must be meant in a relative rather than an absolute sense, since the sum of squares can be made 0 trivially by using $f_M = 0$, and hence M = 0.  Thus, more precisely, our goal can be taken as finding an M that minimizes:

$$R_1(M) = \frac{\text{sum of squares of } [M - P_a(M)]}{\text{sum of squares of } M} = \sin^2\big(\text{angle between } M \text{ and } P_a(M)\big)$$
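For a complete two-way table with one value per cell, both Pa and R1 take only a few lines. The Python sketch below is illustrative (the function names are hypothetical); it centres M to average 0 first, following the convention adopted later in the proof sketch, since otherwise adding a large constant to M would make R1 artificially small. It also checks the sine identity for the 3x3 table of the Technical Introduction.

```python
def project_additive(m, n_rows, n_cols):
    """Pa(M): least-squares additive (main-effects) fit to a complete two-way
    table stored row by row in the flat list m, one value per cell."""
    grand = sum(m) / len(m)
    row_eff = [sum(m[i * n_cols:(i + 1) * n_cols]) / n_cols - grand
               for i in range(n_rows)]
    col_eff = [sum(m[i * n_cols + j] for i in range(n_rows)) / n_rows - grand
               for j in range(n_cols)]
    return [grand + row_eff[i] + col_eff[j]
            for i in range(n_rows) for j in range(n_cols)]

def r1(m, n_rows, n_cols):
    """R1(M) = SS[M - Pa(M)] / SS[M], with M first centred to average 0."""
    mean = sum(m) / len(m)
    m = [x - mean for x in m]
    fitted = project_additive(m, n_rows, n_cols)
    residual_ss = sum((x - f) ** 2 for x, f in zip(m, fitted))
    return residual_ss / sum(x * x for x in m)

def sin2_angle(u, v):
    """Squared sine of the angle between vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    return 1.0 - dot * dot / (sum(a * a for a in u) * sum(b * b for b in v))

M = [1, 4, 9, 4, 9, 16, 9, 16, 25]        # the 3x3 table, row by row
print(r1(M, 3, 3))                        # 16/452, about 0.035
centred = [x - sum(M) / len(M) for x in M]
print(sin2_angle(centred, project_additive(centred, 3, 3)))  # same value
```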

However, finding M is equivalent to finding Pa(M), for it is possible to recover M from the latter (see remarks below).  It is computationally more convenient to work with additive vectors rather than monotonic vectors, primarily because they can be described by far fewer parameters.

Given any additive vector A, we can do a least squares monotone regression of A over D, and form a vector Pm (A) which is monotone over D.  (By definition, Pm (A) is the monotone vector for which the sum of squares of [A - Pm (A)] is minimum.) Thus, we are naturally led to a very similar but apparently distinct goal, namely to find that A which minimizes:

$$R_2(A) = \frac{\text{sum of squares of } [A - P_m(A)]}{\text{sum of squares of } A} = \sin^2\big(\text{angle between } A \text{ and } P_m(A)\big)$$
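Least-squares monotone regression can be computed with the pool-adjacent-violators algorithm, a standard method for this problem; whether the program uses exactly this procedure internally is an assumption. The Python sketch below (function names hypothetical) computes Pm(A) and R2(A); the stress, as defined later in this section, is the square root of R2. Tied data values are simply taken in their given order here, which is a simplification.

```python
import math

def monotone_regression(values, data):
    """Pm(A): least-squares fit to `values` that is non-decreasing in `data`,
    computed with the pool-adjacent-violators algorithm. Tied data values
    are simply taken in their given order (see TIES for proper handling)."""
    order = sorted(range(len(data)), key=lambda k: data[k])
    blocks = []                                # each block is [sum, count]
    for k in order:
        blocks.append([values[k], 1])
        # merge while the previous block's mean exceeds the last block's mean
        while len(blocks) > 1 and (blocks[-2][0] / blocks[-2][1]
                                   > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted, pos = [0.0] * len(values), 0
    for total, count in blocks:
        for k in order[pos:pos + count]:
            fitted[k] = total / count
        pos += count
    return fitted

def r2(a, data):
    """R2(A) = SS[A - Pm(A)] / SS[A]."""
    fitted = monotone_regression(a, data)
    return sum((x - f) ** 2 for x, f in zip(a, fitted)) / sum(x * x for x in a)

D = [1, 4, 9, 4, 9, 16, 9, 16, 25]
A = [-2, -1, 0, -1, 0, 1, 0, 1, 2]       # centred square roots: additive
print(r2(A, D))                           # 0.0: A is already monotone over D
print(monotone_regression([1, 3, 2, 4], [10, 20, 30, 40]))  # [1.0, 2.5, 2.5, 4.0]
print(math.sqrt(r2([1, -1], [1, 2])))     # 1.0: the worst possible stress
```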

Fortunately, it is possible to prove mathematically that these two goals are essentially the same.

In brief, given the data vector D, suppose M' minimizes R1 over all vectors monotone in D, and suppose A' minimizes R2 over all additive vectors.  (Here we restrict our attention entirely to vectors whose components average to 0.  Clearly, this makes no real difference.)

Theorem:

Pm (A') is a scalar multiple of M',
Pa(M') is a scalar multiple of A',
R1(M') = R2(A').

Actually, we need to assume also that one of these problems has a unique solution (up to scalar multiples).  This added assumption apparently holds except when it is possible to find a vector which is both additive and monotonic at the same time, that is, except when:

R1(M') = R2(A') = 0.

'Stress' refers to the square root of R2.  The program actually searches over the space of additive vectors, parameterized by the row and column effects, and seeks to minimize the stress, using the method of gradients to perform this search.  By the theorem, the 'natural' ways of normalizing the badness of fit (dividing by the sum of squares of M, as in R1, or of A, as in R2) all lead to essentially the same solutions.  This further increases the theoretical appeal of this method.
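The search just described can be sketched for a two-way table. The Python code below is a bare-bones steepest-descent illustration, not the program's own step-size rule or starting configuration (the starting point and all function names are assumptions). It uses the fact that the squared distance to a closed convex set has gradient 2(A - Pm(A)), so the gradient of stress squared with respect to A is 2[(A - Pm(A)) - S²A] / SS[A]; summing it over a row or column gives the gradient for that row or column effect. A trial step that would increase the stress is rejected and the step is halved.

```python
import math

def monotone_regression(values, data):
    """Pm: pool-adjacent-violators fit, non-decreasing in `data` (no ties assumed)."""
    order = sorted(range(len(data)), key=lambda k: data[k])
    blocks = []                                    # [sum, count]
    for k in order:
        blocks.append([values[k], 1])
        while len(blocks) > 1 and (blocks[-2][0] / blocks[-2][1]
                                   > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted, pos = [0.0] * len(values), 0
    for total, count in blocks:
        for k in order[pos:pos + count]:
            fitted[k] = total / count
        pos += count
    return fitted

def additive(row, col):
    """Additive vector A with cells a_ij = row_i + col_j, stored row by row."""
    return [r + c for r in row for c in col]

def stress_and_gradient(row, col, data):
    """Stress S and the gradient of S^2 with respect to the row and column effects."""
    a = additive(row, col)
    fit = monotone_regression(a, data)
    total = sum(x * x for x in a)
    s2 = sum((x - f) ** 2 for x, f in zip(a, fit)) / total
    # d(S^2)/dA = 2[(A - Pm(A)) - S^2 A] / SS[A]
    g = [2.0 * ((x - f) - s2 * x) / total for x, f in zip(a, fit)]
    n_cols = len(col)
    g_row = [sum(g[i * n_cols:(i + 1) * n_cols]) for i in range(len(row))]
    g_col = [sum(g[i * n_cols + j] for i in range(len(row))) for j in range(n_cols)]
    return math.sqrt(s2), g_row, g_col

def minimise_stress(data, n_rows, n_cols, iterations=50, step=1.0):
    """Steepest descent over the effects; halve the step whenever a trial
    step would increase the stress."""
    grand = sum(data) / len(data)
    # assumed starting point: centred row and column means of the raw data
    row = [sum(data[i * n_cols:(i + 1) * n_cols]) / n_cols - grand
           for i in range(n_rows)]
    col = [sum(data[i * n_cols + j] for i in range(n_rows)) / n_rows - grand
           for j in range(n_cols)]
    stress, g_row, g_col = stress_and_gradient(row, col, data)
    history = [stress]
    for _ in range(iterations):
        new_row = [r - step * g for r, g in zip(row, g_row)]
        new_col = [c - step * g for c, g in zip(col, g_col)]
        new_stress, new_g_row, new_g_col = stress_and_gradient(new_row, new_col, data)
        if new_stress <= stress:
            row, col, stress = new_row, new_col, new_stress
            g_row, g_col = new_g_row, new_g_col
            history.append(stress)
        else:
            step /= 2.0
    return row, col, history

data = [1.0, 3.0, 7.0, 2.0, 6.0, 20.0, 5.0, 15.0, 60.0]  # illustrative 3x3 table, no ties
row, col, history = minimise_stress(data, 3, 3)
print(history[0], history[-1])   # final stress is no larger than the starting stress
```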

This is not the place to prove the theorem.  However, a brief sketch may be useful.  The proofs are not deep, but require a certain amount of mathematical sophistication.  Briefly, let us impose the convention that all additive and monotonic vectors have average value 0.  Then the set of all possible additive vectors A forms a subspace $\mathcal{A}$ of the vector space, and the set of all possible monotonic vectors M forms a polyhedral convex cone $\mathcal{M}$ with its vertex at the origin.  Pa and Pm are the orthogonal projections onto $\mathcal{A}$ and $\mathcal{M}$, respectively.  We look at the face of $\mathcal{M}$ which contains M', and reduce the problem to a simpler one in which $\mathcal{M}$ is a subspace instead of a polyhedral cone.  Then we use the fact that A' minimizes the numerator sum of squares, subject to the constraint of having length 1 and lying in $\mathcal{A}$.  We set this up as an unconstrained minimization by the use of Lagrange multipliers, and use the fact that the gradient vector must be 0.  (Colin Mallows provided the idea for this proof.)


RUNNING MONANOVA

The PC-MDS version of MONANOVA accommodates data from factorial experiments with up to 12 factors, up to 100 levels per factor, and unlimited replication within cells.  The total number of data values, however, is limited to 5000.  Any number of observations may be missing from the design.  Thus, it is possible to handle a Latin square design, for example, by considering it in a standard manner as a three-way table with most entries missing.

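Data must be entered as though the problem were a full factorial design. This means that if the design is a fractional factorial, zero values must be entered in the portion of the full factorial design where no data are collected. Note that such placeholders are discarded only if they fall below CUTOFF; with the default cutoff of -1.23E+20, zeros would be read as data, so either set CUTOFF above zero or use placeholders below the cutoff, as described under CUTOFF.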
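The Latin square case mentioned above can be laid out as a three-way table (row x column x treatment) with most cells missing. The Python sketch below is illustrative only: the response values are made up, and the placeholder -1.0E21 is a hypothetical choice that lies below the default CUTOFF of -1.23E+20 (your Fortran format must be able to represent it). `itertools.product` varies its last index fastest, which matches the order in which MONANOVA expects the data values.

```python
from itertools import product

MISSING = -1.0e21   # hypothetical placeholder, below the default CUTOFF of -1.23E+20

# Treatment letter in each (row, column) cell of a 3x3 Latin square.
square = [["A", "B", "C"],
          ["B", "C", "A"],
          ["C", "A", "B"]]
treatments = ["A", "B", "C"]
# Made-up responses, one per (row, column) cell: 1.0 .. 9.0.
response = [[float(3 * r + c + 1) for c in range(3)] for r in range(3)]

# Factors: row (3 levels), column (3 levels), treatment (3 levels); 1 replication,
# so the parameter line would read "3 3 3 3 1".
values = [response[r][c] if square[r][c] == treatments[t] else MISSING
          for r, c, t in product(range(3), range(3), range(3))]

print(len(values))                          # 27 = 3 x 3 x 3 x 1
print(sum(v != MISSING for v in values))    # 9 observed cells
```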

Figure 1 shows a complete data set ready for input to MONANOVA.

                        FIGURE 1
                  A PRELIMINARY EXAMPLE
+------------------------------------------------------------+
¦ LINE NO.  COLUMNS 1-72          FORMAT                     ¦
+------------------------------------------------------------+
¦                                                            ¦
¦   1         3  2  2  2  1       FREE        Parameter line ¦
¦   2       (4F5.3)                           Format line    ¦
¦   3        10.0  2.1  4.3  5.2  Defined by  Data line      ¦
¦   4         7.1  8.4  7.3  8.9  Format                     ¦
¦                                                            ¦
+------------------------------------------------------------+


Interpreting MONANOVA

The first line must contain the parameters. This line contains up to 6 integers.
The first parameter indicates the number of factors, which cannot exceed 12. 
The next several parameters indicate the number of levels in each factor.  There must be as many of these parameters as there are factors. 
The last parameter indicates the maximum number of replications in each cell of the design. 
If any cell contains fewer than this number of replications, artificial values must be supplied (so that the program can tell which values go in which cells).  The artificial values should be chosen so that they will be discarded, as explained in connection with the parameter CUTOFF.

In the example, line 1 contains the parameter input.  The first parameter indicates that there are 3 factors, that is, the data pertain to a 3-way design.  The next parameters indicate that each of these three factors has 2 levels.  The last parameter indicates that each cell of the design has (no more than) 1 replication.  From these parameters it follows that the data must contain 2x2x2x1 = 8 data values.

The second line of the data file must contain a format statement. It should contain a Fortran format suitable for reading the entire set of data.  (For information about Fortran formats, consult the first section of this manual or a Fortran manual.) A format conversion character like E or F, which yields floating-point numbers internally, must be used.  The format should be enclosed in parentheses, and should occupy no more than the first 80 characters of the line.

In the example, line 2 contains the format for reading the data.  The format statement (4F5.3) is constructed according to the rules for Fortran formats: four observations are read per line of data, each from a field five columns wide with three implied decimal places. Because each value in the example contains an explicit decimal point, that point overrides the three implied decimal places. The format does, however, still read four consecutive fields of five columns each.
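The field rule just described can be mimicked outside Fortran. The Python sketch below handles only a single repeated descriptor such as (4F5.3) or (8F7.2); the function names are hypothetical, and it assumes blanks inside a field are ignored (Fortran's BN blank mode), which is why explicit decimal points, as in the example, are the safer choice.

```python
import re

def parse_simple_format(fmt):
    """Parse a single repeated descriptor such as '(4F5.3)' into
    (repeat, width, decimals). Only this simple rFw.d / rEw.d form is handled."""
    match = re.fullmatch(r"\(\s*(\d*)\s*[FE](\d+)\.(\d+)\s*\)", fmt.strip(), re.IGNORECASE)
    if match is None:
        raise ValueError(f"unsupported format: {fmt!r}")
    repeat = int(match.group(1)) if match.group(1) else 1
    return repeat, int(match.group(2)), int(match.group(3))

def read_fixed(line, repeat, width, decimals):
    """Read `repeat` fields of `width` columns. A field containing a decimal
    point is taken literally; otherwise its last `decimals` digits are the
    fraction. Blanks are ignored (assumed) and an empty field reads as 0.0."""
    values = []
    for k in range(repeat):
        field = line[k * width:(k + 1) * width].replace(" ", "")
        if not field:
            values.append(0.0)
        elif "." in field:
            values.append(float(field))
        else:
            values.append(float(field) / 10 ** decimals)
    return values

repeat, width, decimals = parse_simple_format("(4F5.3)")
print(read_fixed(" 10.0  2.1  4.3  5.2", repeat, width, decimals))  # [10.0, 2.1, 4.3, 5.2]
print(read_fixed(" 2100", 1, 5, 3))   # [2.1]: no decimal point, so 3 digits are fractional
```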

The third and subsequent lines should contain the data values themselves, entered according to your format on the format line.  The parameter line specifies exactly how many data values should occur, namely the product of the number of levels in each factor, times the number of replications.  The format specifies exactly how many data values must occur on each data line (except the last).  Thus, the number of data lines is implicitly specified.  Exactly the right number of them must appear.  If any data values are missing, artificial values should be supplied, in such a way that they will be discarded, as explained in connection with the option CUTOFF.
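The bookkeeping in the paragraph above amounts to two small calculations. The Python sketch below (hypothetical helper names) derives the number of data values from a parameter line, applying the 12-factor and 5000-value limits stated earlier, and then the number of data lines implied by the values per line in the format.

```python
from math import prod

def expected_values(parameter_line):
    """Number of data values implied by a parameter line such as '3 2 2 2 1':
    the product of the levels of each factor, times the replications."""
    fields = [int(tok) for tok in parameter_line.split()]
    n_factors = fields[0]
    if not 1 <= n_factors <= 12 or len(fields) != n_factors + 2:
        raise ValueError("expected: n_factors, one level count per factor, replications")
    levels, replications = fields[1:1 + n_factors], fields[-1]
    total = prod(levels) * replications
    if total > 5000:
        raise ValueError("the PC-MDS version is limited to 5000 data values")
    return total

def data_lines_needed(n_values, values_per_line):
    """Every data line holds `values_per_line` values except possibly the last."""
    return -(-n_values // values_per_line)   # ceiling division

n = expected_values("3 2 2 2 1")
print(n, data_lines_needed(n, 4), data_lines_needed(n, 8))   # 8 2 1
```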

The arrangement of the data items is rigidly specified.  Briefly speaking, the replication index 'changes fastest,' the index of the last factor 'changes next fastest,' and so forth, and the index of the first factor 'changes slowest.' More precisely, let us characterize each data value (e.g., 4.3 in the preliminary example) by a sequence of indices (e.g., 1211), where the first index indicates the level of the first factor, the second index indicates the level of the second factor, and so forth, and the last index indicates the replication number.  Then the data values are arranged so as to put the corresponding sequences in dictionary (lexicographic) order.

Because the replication subscript changes most rapidly, data from multiple subjects (entered as replications) are read together. For example, the values of all subjects are read for the first level combination; next, the values of all subjects are read for the second level combination; and so on, until the values of all subjects have been read for the last level combination.
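This ordering is exactly what Python's `itertools.product` produces, since it varies its last argument fastest. The sketch below (the helper name is hypothetical) lists the index sequences in the required order; it reproduces the sequence 1211 for the third value of the preliminary example and shows several subjects, entered as replications, being read for one cell before moving to the next.

```python
from itertools import product

def cell_order(levels, replications):
    """Index sequences (factor 1, ..., factor K, replication), 1-based, in the
    order in which the data values must appear: lexicographic order, with the
    replication index changing fastest and the first factor slowest."""
    ranges = [range(1, n + 1) for n in levels] + [range(1, replications + 1)]
    return list(product(*ranges))

print(cell_order([2, 2, 2], 1)[2])   # (1, 2, 1, 1): the value 4.3 in the preliminary example
print(cell_order([2, 2], 3)[:4])     # [(1, 1, 1), (1, 1, 2), (1, 1, 3), (1, 2, 1)]
```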

In the example, lines 3 and 4 contain a total of 8 data values (2x2x2), four on each data line (as specified by the format statement). The 8 values are arranged in the lexicographic order of their level indices, and they are laid out in fixed format, so that each value occupies the same columns on every data line.


Output

The program writes its output to a user-specified disk file.  This output is an ASCII text file of 80-character records, which may be read with any word processor or file editor.

Output includes (1) a final configuration, or solution; (2) the corresponding value of the 'stress,' the badness-of-fit quantity minimized by the program; (3) a 'history of the calculation,' which shows the progress of the iterative convergence process; and (4) a scatter plot that displays two functions on one plot.  The first function shows the best monotonic transformation of the data (vertical axis) versus the original data values (horizontal axis).  The second shows, for each original data value, the fitted value for that cell based on the main effects.

The monotonic transformation displayed is a least squares monotone regression of the fitted values over the data.  (This is the operation referred to as Pm in the Theory section.) If we go from the monotonically transformed values to fitted values by conventional analysis of variance (the operation referred to as Pa in the Theory section), we obtain fitted values which are not the same as those displayed and printed, but are smaller by a constant factor, for reasons made clear in the Theory section.  (If the calculation has not converged fully, even this proportionality may fail.  Indeed, such failure is good evidence of incomplete convergence.) This constant factor is equal to (1 - stress²), and is thus nearly 1.  For example, if the stress is 0.1 (that is, 10%), then the factor is 0.99.  Thus, for practical purposes, we may think of the main effects as the result of an ordinary analysis of variance of the transformed data values.
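Why the factor is (1 - stress²) can be seen in a few lines. The derivation below is a sketch under stated assumptions rather than a quotation from the original theory: it writes A' for the optimal additive vector, $\hat M = P_m(A')$ for the displayed monotone values, $\lVert\cdot\rVert^2$ for a sum of squares, keeps the average-0 convention, and uses the standard fact that least-squares projection onto a closed convex cone leaves a residual orthogonal to the projection. The theorem of the Theory section supplies $P_a(\hat M) = c\,A'$, and $P_a$, being an orthogonal projection, can be moved across an inner product.

```latex
\begin{align*}
S^2 &= \frac{\lVert A' - \hat M\rVert^2}{\lVert A'\rVert^2},
  \qquad \hat M = P_m(A'),\quad S = \text{stress} \\
\langle A' - \hat M,\ \hat M\rangle &= 0
  \;\Rightarrow\; \lVert \hat M\rVert^2 = \langle A', \hat M\rangle
  \;\Rightarrow\; S^2 = 1 - \frac{\lVert \hat M\rVert^2}{\lVert A'\rVert^2} \\
P_a(\hat M) &= c\,A' \quad \text{(theorem)} \\
c\,\lVert A'\rVert^2 &= \langle P_a(\hat M), A'\rangle
  = \langle \hat M, P_a(A')\rangle = \langle \hat M, A'\rangle = \lVert \hat M\rVert^2 \\
\Rightarrow\quad c &= \frac{\lVert \hat M\rVert^2}{\lVert A'\rVert^2} = 1 - S^2
\end{align*}
```

With S = 0.1 this gives c = 0.99, in agreement with the example above.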


Control Options

The program offers various options, which are accessible from the MONANOVA menu and are described below.  For every option there is a 'normal' or 'default' choice, which is used if the option is not explicitly set.  Thus, it is only necessary to set those options for which a non-default choice is wanted.

+---------------------------------------------------------+
¦              M O N A N O V A   M E N U                  ¦
¦       ENTER IDENTIFYING NUMBER TO SELECT OPTION:        ¦
¦                                                         ¦
¦      NUM.  VARIABLE   DESCRIPTION/DEFAULT               ¦
¦       1     STRMIN    MINIMUM STRESS OBJECTIVE AT       ¦
¦                       WHICH ITERATION IS TERMINATED     ¦
¦                       (DEFAULT = .01)                   ¦
¦       2     SRATST    MINIMUM ACCEPTABLE CHANGE IN      ¦
¦                       STRESS ON A GIVEN ITERATION       ¦
¦                       (DEFAULT=.999) (MUST BE LT. 1.0)  ¦
¦       3     ITERAT    MAX. ITERATIONS TO BE             ¦
¦                       PERFORMED (DEFAULT = 50)          ¦
¦       4     CUTOFF    IGNORES DATA POINTS IF LT. CUTOFF ¦
¦                       (DEFAULT CUTOFF = -1.23E+20)      ¦
¦       5     PRINT     PRINT INPUT (DEFAULT = YES)       ¦
¦       6     TIES      1=PRIMARY(DEF)   2=SECONDARY      ¦
¦       7     READ      1 = DATA SETUP (DEFAULT)          ¦
¦                       2 = CONFIGURATION SETUP           ¦
¦       8     WRITE     CONFIGURATION FILE (DEF = NO)     ¦
¦       9     TITLE     GIVE A NEW TITLE   (DEF = NO)     ¦
¦      10     EDITING COMPLETE, START ANALYSIS NOW.       ¦
¦      11     QUIT  PROGRAM                               ¦
+---------------------------------------------------------+

For control option parameters, a sharp distinction is made between integers and numbers-with-decimal-point.  They are distinguished by the presence or absence of the decimal point, and the correct type must be used whenever a parameter value is entered.

Normally, the computation stops when any one of several criteria is reached.  The reason for stopping is indicated in a general way by a printed phrase such as MINIMUM WAS ACHIEVED (in fact, this may be either a local or the global minimum, but the program has no way of knowing) or MAXIMUM NUMBER OF ITERATIONS WAS USED.  To alter the numerical values used in these criteria, the following control options may be used:



STRMIN = number-with-decimal-point (normal value is 0.01),

SRATST = number-with-decimal-point (normal value is 0.999),

ITERAT = integer (normal value is 50).



STRMIN

STRMIN (short for 'stress minimum') sets the 'satisfactory' value of stress at which iteration is terminated, regardless of whether the other criteria have been reached.  SRATST pertains to the criterion for deciding that a local minimum has been reached, and ITERAT limits the number of iterations.



SRATST

SRATST refers to how slowly the stress values must be changing for a local minimum to be presumed.  It must always be smaller than 1.  For greater stringency, use a value closer to 1, such as 0.9999; for less stringency, use a value such as 0.99.



ITERAT

ITERAT simply controls the maximum number of iterations that will be performed.  When this number is reached, iteration is terminated regardless of other criteria.
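The three stopping rules can be summarized in code. The Python sketch below is hypothetical: it treats stress as a fraction (0.01 = 1 percent), and it applies the SRATST test to the plain ratio of successive stress values, whereas the program also prints a smoothed average ratio (SRTAVG) whose exact rule is not given in this document. As in the sample output, more than one criterion can be met at once, so the function returns a list of the messages that apply.

```python
def stop_reasons(stress, previous_stress, iteration,
                 strmin=0.01, sratst=0.999, iterat=50):
    """Return the stopping criteria met at this iteration (empty list = continue).

    Hypothetical sketch: stress is a fraction, and the SRATST test uses the
    plain ratio of successive stress values rather than the program's
    smoothed ratio.
    """
    reasons = []
    if stress <= strmin:
        reasons.append("SATISFACTORY STRESS WAS REACHED")
    # stress is still falling, but only very slowly: presume a local minimum
    if previous_stress and sratst < stress / previous_stress <= 1.0:
        reasons.append("MINIMUM WAS ACHIEVED")
    if iteration >= iterat:
        reasons.append("MAXIMUM NUMBER OF ITERATIONS WAS USED")
    return reasons

print(stop_reasons(0.2000, 0.2001, 7))   # ['MINIMUM WAS ACHIEVED']
print(stop_reasons(0.3, 0.5, 50))        # ['MAXIMUM NUMBER OF ITERATIONS WAS USED']
print(stop_reasons(0.3, 0.5, 12))        # []: keep iterating
```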

CUTOFF

If individual data values are missing, indicate this by using values that are below the 'cutoff' value.  Such data items are discarded when the data are read in.  Unless otherwise specified, the cutoff value is -1.23E+20.  If you wish to use some other cutoff, select the CUTOFF option and enter a value:



CUTOFF = number-with-decimal-point (normal value is -1.23E+20).



Values which are < CUTOFF will be discarded.  Since the value of this parameter is used while the data is being read in, it must be specified from the menu. If the option is omitted, the 'normal' value is used, which is unlikely to exclude any data values.
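Because a value's position in the file determines its cell, discarding a value must not shift the others. The Python sketch below (the helper name is hypothetical) pairs each value with its index sequence before dropping values below the cutoff, so the survivors keep their cells; the placeholder -1.0E21 is an illustrative choice below the default cutoff.

```python
from itertools import product

CUTOFF = -1.23e20   # default cutoff from the MONANOVA menu

def kept_values(values, levels, replications, cutoff=CUTOFF):
    """Pair each data value with its index sequence (factor levels, then
    replication) and drop values below the cutoff. The surviving values
    keep the cells determined by their original positions."""
    ranges = [range(1, n + 1) for n in levels] + [range(1, replications + 1)]
    return [(cell, v) for cell, v in zip(product(*ranges), values) if v >= cutoff]

# A 2x2 design with one replication; the value for cell (2, 1) is missing.
print(kept_values([5.0, 3.0, -1.0e21, 1.0], [2, 2], 1))
# [((1, 1, 1), 5.0), ((1, 2, 1), 3.0), ((2, 2, 1), 1.0)]
```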



PRINT

The print option may be used to print the data, together with the values of the factor indices.  This is convenient for reference, or if the program seems to misunderstand how your data are arranged.



TIES

If the data values contain ties (that is, exactly equal values), two different kinds of (least squares) monotone regression are possible.  In the 'primary approach,' no restriction is placed by the regression on values which correspond to tied data values.  In other words, these fitted values need not be equal.  In the 'secondary approach,' it is required that if two data values are tied, then their corresponding fitted regression values must be equal.

There does not seem to be any basis in general for preferring one approach over the other.  The primary approach is appealing because it gives a broader meaning to the notion of 'monotonic transformation.' In some contexts, one approach or the other seems intuitively to be more natural.  If there are only a few ties, it usually makes little practical difference which approach is used. 



When there are a great many ties (for example, when the data values simply indicate which of several coarse categories the appropriate item lies in), the secondary approach is less likely to lead to degeneracy.  For further information, see References 1 and 8.

To determine how ties among the data should be treated, you may use one of these control options:



TIES 1 = PRIMARY (Default normal case),

TIES 2 = SECONDARY.

In the 'default' case, the option need not be specified.
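The two approaches to ties can be contrasted directly. The Python sketch below is one common way to implement them, not necessarily the program's internal code: for the primary approach, cells with tied data values are ordered by the values being fitted before a pool-adjacent-violators pass; for the secondary approach, tied cells are pooled into one weighted block first, so they receive equal fitted values. Function names are hypothetical.

```python
from itertools import groupby

def _pava(means, weights):
    """Weighted pool-adjacent-violators: non-decreasing least-squares fit."""
    blocks = []                                   # [weighted sum, weight, n_items]
    for m, w in zip(means, weights):
        blocks.append([m * w, w, 1])
        while len(blocks) > 1 and (blocks[-2][0] / blocks[-2][1]
                                   > blocks[-1][0] / blocks[-1][1]):
            s, w2, n = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += w2
            blocks[-1][2] += n
    out = []
    for s, w, n in blocks:
        out.extend([s / w] * n)
    return out

def monotone_regression(values, data, ties="primary"):
    """Least-squares monotone regression of `values` over `data`.

    primary:   within a group of tied data values, cells are ordered by the
               values being fitted, so tied cells may receive different fits.
    secondary: tied cells are pooled first, so they receive equal fits.
    """
    fitted = [0.0] * len(values)
    if ties == "primary":
        order = sorted(range(len(data)), key=lambda k: (data[k], values[k]))
        for k, f in zip(order, _pava([values[k] for k in order], [1] * len(order))):
            fitted[k] = f
    elif ties == "secondary":
        order = sorted(range(len(data)), key=lambda k: data[k])
        groups = [list(g) for _, g in groupby(order, key=lambda k: data[k])]
        means = [sum(values[k] for k in g) / len(g) for g in groups]
        for g, f in zip(groups, _pava(means, [len(g) for g in groups])):
            for k in g:
                fitted[k] = f
    else:
        raise ValueError("ties must be 'primary' or 'secondary'")
    return fitted

data = [1, 2, 2, 3]              # the second and third cells have tied data values
values = [0.0, 3.0, 1.0, 2.0]
print(monotone_regression(values, data, "primary"))    # [0.0, 2.5, 1.0, 2.5]
print(monotone_regression(values, data, "secondary"))  # [0.0, 2.0, 2.0, 2.0]
```

The primary fit here leaves a residual sum of squares of 0.5 against 2.0 for the secondary fit; the primary fit can never be worse, because it imposes fewer constraints.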



READ

1 = Data File (Default)

2 = Configuration File



Normally, the program generates a configuration of its own from which to start the iterative process.  If for some reason you want to start the iterative process from a configuration of your own choosing, you can include a 'configuration data set.' The option READ signals that a fixed sequence of non-control lines will be read in.



WRITE

Following the analysis of one data file, it is possible to create a configuration file containing the resulting configuration.  To control this option, use one of these phrases:



WRITE 1 = NO (Normal case),

WRITE 2 = YES.

The latter option causes an entire 'configuration file' to be output for each run performed.  This file contains a parameter line and, finally, the configuration itself.

TITLE

TITLE 1 = NO (Default),

TITLE 2 = YES.

TITLE allows input of a new title for the purpose of identifying a new analysis.  This option is useful when multiple analyses are run from a single data file containing one or more data sets.


SAMPLE DATA FILE
  3  2  2  2  1 
(8F7.2)
  98.18  65.62  39.97   7.41  87.08  54.52  28.86    .00


NOTES

1. This paper has been extensively modified for PC-MDS by Scott M. Smith


SAMPLE OUTPUT
                                 M O N A N O V A 
                         MONOTONE  ANALYSIS  OF  VARIANCE 
                     PROGRAM WRITTEN BY DR. JOSEPH B. KRUSKAL 
                                 PC-MDS  VERSION 
  

 ANALYSIS TITLE: MONANOVA TEST DATA
 DATA IS READ FROM FILE: MONANOVA.DAT
 OUTPUT FILE IS: MONANOVA.PRN    
  
 INPUT PARAMETERS:  3  2  2  2  1
 INPUT FORMAT:        (8F7.2)                                                     
 SEQ. NO.   DATA     SUBSCRIPTS 
  
     1      98.18000    1    1    1 
     2      65.62000    1    1    2 
     3      39.97000    1    2    1 
     4       7.41000    1    2    2 
     5      87.08000    2    1    1 
     6      54.52000    2    1    2 
     7      28.86000    2    2    1 
     8        .00000    2    2    2 
  
 HISTORY OF COMPUTATION. 
  
ITERAT STRESS  SRAT  SRTAVG CAGRGL  COSAV  ACSAV    GRMAG   GRMULT    STEP 
   0   .000   .0000  1.2000   .000   .000   .200   .00000   .00000  .00000 
  
 ZERO STRESS WAS REACHED 
 MINIMUM WAS ACHIEVED 
 SATISFACTORY STRESS WAS REACHED 
 FINAL CONFIGURATION HAS STRESS OF     .0 PERCENT. 
  
 MONANOVA TEST DATA                                                               
   2   .266  -.266 
   2  1.498 -1.498 
   2   .827  -.827 
          .   5.8908.  27.4904.  49.0900.  70.6896.  92.2892. 
       -4.9090   16.6906   38.2902   59.8898   81.4894  103.0890 
        *.****.****.****.****.****.****.****.****.****.****.* 
  2.85 ..                                                   ..2.85 
T 2.64 ..                                                0  ..2.64 
H 2.43 ..                                                   ..2.43 
E 2.22 ..                                                   ..2.22 
  2.01 ..                                           0       ..2.01 
X 1.80 ..                                                   ..1.80 
  1.58 ..                                                   ..1.58 
A 1.37 ..                                                   ..1.37 
R 1.16 ..                                                   ..1.16 
E  .95 ..                                 0                 .. .95 
   .74 ..                                                   .. .74 
T  .53 ..                                                   .. .53 
H  .32 ..                            0                      .. .32 
E  .11 ..                                                   .. .11 
  -.11 ..                                                   ..-.11 
L -.32 ..                     0                             ..-.32 
I -.53 ..                                                   ..-.53 
N -.74 ..                                                   ..-.74 
E -.95 ..                0                                  ..-.95 
A-1.16 ..                                                   .-1.16 
R-1.37 ..                                                   .-1.37 
 -1.58 ..                                                   .-1.58 
M-1.80 ..                                                   .-2.80 
O-2.01 ..      0                                            .-2.01 
D-2.22 ..                                                   .-2.22 
E-2.43 ..                                                   .-2.43 
L-2.64 ..  0                                                .-2.64 
 -2.85 ..                                                   .-2.85 
        *.****.****.****.****.****.****.****.****.****.****.* 
         .   5.8908.  27.4904.  49.0900.  70.6896.  92.2892. 
       -4.9090   16.6906   38.2902   59.8898   81.4894  103.0890 
  
  
 SEQ NO  DATA     LINEAR   MONOTONE MODELS 
    1    98.180     2.592     2.592 
    2    65.620      .937      .937 
    3    39.970     -.405     -.405 
    4     7.410    -2.059    -2.059 
    5    87.080     2.059     2.059 
    6    54.520      .405      .405 
    7    28.860     -.937     -.937 
    8      .000    -2.592    -2.592 
  
                                  ************ 
            SPEARMAN-S RANK DIFFERENCE CORRELATION COEFFICIENT (RD)  
                           RD           =  1.000000 

                           RD - SQUARED =  1.000000
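The final configuration in this sample output appears to list, for each factor, its number of levels followed by the main effect of each level. The Python sketch below checks that reading: summing each cell's printed effects reproduces the LINEAR column to within the rounding of the printed effects, and ranking those fitted values against the data reproduces the Spearman coefficient of 1.0. The variable names are illustrative.

```python
from itertools import product

# Main effects as printed in the final configuration (one row per factor).
effects = [[0.266, -0.266],
           [1.498, -1.498],
           [0.827, -0.827]]
data = [98.18, 65.62, 39.97, 7.41, 87.08, 54.52, 28.86, 0.00]
printed_linear = [2.592, 0.937, -0.405, -2.059, 2.059, 0.405, -0.937, -2.592]

# Fitted value of each cell = sum of its main effects (cells in data order).
fitted = [sum(effects[f][level] for f, level in enumerate(cell))
          for cell in product(range(2), repeat=3)]

def ranks(xs):
    """Rank positions 1..n (no ties assumed)."""
    order = sorted(range(len(xs)), key=lambda k: xs[k])
    r = [0] * len(xs)
    for position, k in enumerate(order, start=1):
        r[k] = position
    return r

n = len(data)
d2 = sum((a - b) ** 2 for a, b in zip(ranks(data), ranks(fitted)))
spearman = 1 - 6 * d2 / (n * (n * n - 1))
print([round(f, 3) for f in fitted])   # agrees with the LINEAR column to within rounding
print(spearman)                        # 1.0, as printed (RD = 1.000000)
```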