STEPWISE DISCRIMINANT ANALYSIS

The program, documentation and technical appendix are modified from the BMD statistical package. BMD07M was developed under a National Science Foundation grant.


REQUIREMENTS: Discriminant analysis tests n predictor (independent) variables as discriminators of the differences between k groups of a single discrete (categorical) dependent variable. The predictor variables are interval scaled.


Multiple regression analysis, the analysis of variance and the analysis of covariance each deal with dependence structures, that is, data structures having a single criterion variable and multiple predictors. Several other techniques are available for such structures, discriminant analysis among them.

TWO-GROUP DISCRIMINANT ANALYSIS:

In discriminant analysis we predict classification into a given category as a function of the predictor variables. Here we assume a linear function. In two-group discriminant analysis, only two categories are involved.

The Objectives of a Two-Group Analysis:

1) Finding linear combinations of the predictor variables that enable the analyst to separate the groups by maximizing between-groups variation relative to within-groups variation.

2) Establishing procedures for assigning new individuals to the groups. For these individuals, the profile on the predictor variables is known, but the group identity is not.

3) Testing whether significant differences exist between the means of the two groups on the predictor variable profiles.

4) Determining which variables account for most of the differences in mean profiles of the groups.

The Two-Group Discriminant Analysis:

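---------------------------------------------------
            ¦   Group X1       ¦  Group X2       ¦
 Category A ¦                  ¦                 ¦
            ¦                  ¦                 ¦
 Category B ¦                  ¦                 ¦
---------------------------------------------------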

1) The dependent variable is classified into two categories. The various sums of squares and cross products, the means of each group and the total sample mean are computed. The key problem of two-group discriminant analysis is to find a new axis such that projections of the points onto that axis maximize the difference between the group means relative to their variability on the composite. The discriminant axis is determined as a set of weights, one for each predictor-variable axis, so that we have a linear function such as:

Y = C1X1 + C2X2, where C1,C2 are the weights.

2) Computing the Discriminant Weights. In this case we find the sums of squares and the cross products that relate to variation within the groups.

--------------------------------------------------------------------------------
                                                   |  Group X1   |  Group X2   |
--------------------------------------------------------------------------------
 \sum x_1^2   = \sum X_1^2 - n\bar{X}_1^2          |             |             |
 \sum x_2^2   = \sum X_2^2 - n\bar{X}_2^2          |             |             |
 \sum x_1 x_2 = \sum X_1 X_2 - n\bar{X}_1\bar{X}_2 |             |             |
--------------------------------------------------------------------------------

To find C1 and C2, we solve the following simultaneous equations:

    (\sum x_1^2) C_1 + (\sum x_1 x_2) C_2 = \bar{X}_{1A} - \bar{X}_{1B}
    (\sum x_2^2) C_2 + (\sum x_1 x_2) C_1 = \bar{X}_{2A} - \bar{X}_{2B}

3) Plotting the Discriminant Function. Along with the original plot, the discriminant axis is shown by passing a straight line from the origin through (C1,C2). The original points can then be projected onto the discriminant axis, and the discriminant score for each case is obtained.

4) The Discriminant Criterion. The two group means and the grand mean of the discriminant scores are calculated. We then compute a measure of between-groups variability by finding the deviation of each of the two group means from the grand mean on the discriminant function. These deviations are squared, multiplied by the number of persons in each group, and summed. Next, the within-groups variability is found by taking the squared deviations about each group mean. The sum of the two separate measures gives a pooled within-groups sum of squares. The discriminant criterion is the ratio of the between-groups sum of squares to the within-groups sum of squares. This is the criterion we are trying to maximize in the analysis.
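Steps 2 to 4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two small groups of (X1, X2) observations are hypothetical, the function names are ours and are not part of the DISCRIM program, and the sums of squares and cross products are assumed to be pooled over the two categories.

```python
# Sketch of steps 2 to 4 for two predictors. The data are hypothetical.

def mean(values):
    return sum(values) / len(values)

def pooled_within(groups):
    """Pooled within-groups sums of squares and cross products (s11, s22, s12)."""
    s11 = s22 = s12 = 0.0
    for group in groups:
        m1 = mean([x1 for x1, _ in group])
        m2 = mean([x2 for _, x2 in group])
        for x1, x2 in group:
            s11 += (x1 - m1) ** 2
            s22 += (x2 - m2) ** 2
            s12 += (x1 - m1) * (x2 - m2)
    return s11, s22, s12

def discriminant_weights(group_a, group_b):
    """Solve the two simultaneous equations for C1 and C2 by Cramer's rule."""
    s11, s22, s12 = pooled_within([group_a, group_b])
    d1 = mean([x1 for x1, _ in group_a]) - mean([x1 for x1, _ in group_b])
    d2 = mean([x2 for _, x2 in group_a]) - mean([x2 for _, x2 in group_b])
    det = s11 * s22 - s12 ** 2
    c1 = (d1 * s22 - d2 * s12) / det
    c2 = (d2 * s11 - d1 * s12) / det
    return c1, c2

def discriminant_criterion(group_a, group_b, c1, c2):
    """Between-groups over pooled within-groups sum of squares of the scores."""
    scores_a = [c1 * x1 + c2 * x2 for x1, x2 in group_a]
    scores_b = [c1 * x1 + c2 * x2 for x1, x2 in group_b]
    grand = mean(scores_a + scores_b)
    between = (len(scores_a) * (mean(scores_a) - grand) ** 2
               + len(scores_b) * (mean(scores_b) - grand) ** 2)
    within = (sum((y - mean(scores_a)) ** 2 for y in scores_a)
              + sum((y - mean(scores_b)) ** 2 for y in scores_b))
    return between / within

# Hypothetical observations (X1, X2) for the two categories.
category_a = [(2, 4), (3, 2), (4, 5), (5, 4), (6, 7)]
category_b = [(7, 6), (8, 4), (9, 7), (10, 6), (11, 9)]

c1, c2 = discriminant_weights(category_a, category_b)
best = discriminant_criterion(category_a, category_b, c1, c2)
print("C1 =", c1, "C2 =", c2, "criterion =", best)
```

Any other pair of weights, for example (1, 0) or (0, 1), should give a criterion value no larger than the one obtained with the solved weights.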

5) Classification. The classification problem in turn involves two additional questions:

a) How well does the function assign the known cases in the sample, and
b) How well does it assign new cases not used in computing the discriminant analysis parameters?

These questions are direct parallels of R-square (the strength of the relationship) and cross-validation in regression analysis.

Assignment Rule: Assign all cases with discriminant scores to the left of the mid-point to Category A and those to the right of the mid-point to Category B. This assignment rule makes two specific assumptions: 1) The prior probability of a new case falling into each of the groups is equal across the groups, and 2) The cost of misclassification is equal across the groups.
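A minimal sketch of the assignment rule follows. The function name and the score values are hypothetical. The sketch does not assume that Category A lies to the left; it places each case on the same side of the mid-point as the nearer group mean, which reduces to the rule above when the Category A mean is the smaller one.

```python
def assign(score, mean_a, mean_b):
    """Assign a discriminant score to category "A" or "B" using the mid-point rule.

    Assumes equal prior probabilities and equal misclassification costs.
    A score exactly at the mid-point is assigned to "B" in this sketch.
    """
    midpoint = (mean_a + mean_b) / 2
    if mean_a < mean_b:
        return "A" if score < midpoint else "B"
    return "A" if score > midpoint else "B"

# Hypothetical group means on the discriminant axis; the mid-point is 4.0.
print(assign(3.1, mean_a=2.0, mean_b=6.0))  # A
print(assign(5.0, mean_a=2.0, mean_b=6.0))  # B
```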

6) Testing Statistical Significance: We will test whether the group centroids are different. Tests of the equality of the group centroids are based on the F-ratio. This statistical test is calculated from a variability measure called the Mahalanobis squared distance.
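For reference, the F-ratio used in the worked example later in this document can be written in the following form. Here n_1 and n_2 are the two group sizes, n = n_1 + n_2, p is the number of predictors and D^2 is the Mahalanobis squared distance. The degrees of freedom (p, n - p - 1) are inferred from the F4,7 critical values quoted in that example.

```latex
F = \frac{n - p - 1}{p} \cdot \frac{n_1 n_2}{n (n - 2)} \cdot D^2
```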


MULTIPLE DISCRIMINANT ANALYSIS

The assumptions and objectives of the Two-Group analysis hold for the Multiple Discriminant Analysis. The primary distinguishing feature between the two is that in a multiple discriminant analysis, more than one discriminant function may be computed. In general with K groups and L predictors, we can find up to the lesser of K-1 or L discriminant functions.
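The limit on the number of discriminant functions can be expressed directly. The function name below is ours, used only for illustration.

```python
def max_discriminant_functions(k_groups, n_predictors):
    """Upper limit on the number of discriminant functions: min(K - 1, L)."""
    return min(k_groups - 1, n_predictors)

# Three groups and four predictors, as in the sample run later in this document.
print(max_discriminant_functions(3, 4))  # 2
```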

Because of its complexity, multiple discriminant analysis is usually carried out by means of computer programs. The computations in the stepwise discriminant program follow these steps:

I. Separate F ratios are computed for each of the predictors.

II. The predictor variable with the largest F ratio is entered into the equation. The predictor is entered based on the minimum significance and tolerance levels preset for the analysis.

III. The partial F ratios of all variables not yet entered in the equation are computed. These F ratios are adjusted for the predictor variable already entered. The next predictor is then added according to its adjusted F ratio.

IV. Each variable is tested for retention in the discriminant function based on association with the other predictor variables.

V. The stepwise process is repeated until all variables meeting the pre-set levels of significance, tolerance and retention are entered into the equation.

VI. At each stage of the stepwise procedure, tests are made of overall inter-group separation and of pairwise separation between all distinct groups. At some steps, the discriminant functions are computed for the variables included as predictors at that point.

VII. A summary of the stepwise procedure is presented. Output includes the variables entered or removed, their associated F ratios, the inter-group significance and the posterior probability of each case arising from each group.
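Steps I and II can be illustrated with a short sketch. It computes the ordinary one-way analysis of variance F ratio for each predictor and picks the largest. It does not reproduce the partial F ratios, tolerance checks or removal tests of the later steps, and it is not the program's own code. The ratings are the "Buy" and "Not Buy" ratings from the simple example later in this document; the function and variable names are ours.

```python
def one_way_f(groups):
    """One-way ANOVA F ratio for one predictor; `groups` holds one list per group."""
    n = sum(len(g) for g in groups)
    k = len(groups)
    grand = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Ratings for traits A to D: first list is the Buy group, second the Not Buy group.
ratings = {
    "A": [[9, 7, 10, 8, 9, 8, 7], [4, 3, 6, 2, 1]],
    "B": [[8, 6, 7, 4, 9, 6, 5], [4, 6, 3, 4, 2]],
    "C": [[7, 6, 8, 5, 3, 7, 6], [4, 6, 3, 5, 2]],
    "D": [[6, 5, 2, 4, 3, 2, 2], [6, 3, 4, 2, 1]],
}

f_ratios = {trait: one_way_f(groups) for trait, groups in ratings.items()}
first_entered = max(f_ratios, key=f_ratios.get)  # candidate for entry at step II
print(f_ratios, first_entered)
```

With these ratings, Trait A has by far the largest F ratio and would be the first candidate for entry.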


STAGES OF ANALYSIS

There are differences in the applications and requirements of discriminant analysis. These are often misinterpreted, especially with respect to the validation of the predictive discriminant analysis. The differences are summarized below.

+-----------------------------------------------------------------------------------------+
¦       ¦                  ¦   Classification    ¦    Classification ¦   Classification   ¦  
¦       ¦                  ¦     Analysis of     ¦      Analysis of  ¦    Analysis of     ¦  
¦       ¦  Predictive      ¦    Initial Data     ¦      New Data Set ¦    New Data Set    ¦  
¦       ¦  Discriminant    ¦     Set of Known    ¦        of Known   ¦        of          ¦  
¦       ¦   Analysis       ¦      Groupings      ¦       Groupings   ¦ Unknown Groupings  ¦  
¦       ¦                  ¦                     ¦                   ¦                    ¦ 
+-------+------------------+---------------------+-------------------+--------------------¦  
¦       ¦                  ¦                     ¦                   ¦                    ¦
¦Purpose¦ Derive Discrim-  ¦ Determine how well  ¦1) Classify data   ¦ 1) Classify data   ¦
¦       ¦ inant function   ¦ discriminant        ¦   using classifi- ¦    using classifi- ¦
¦       ¦ using initial    ¦ function classifies ¦   cation rule     ¦    cation rule     ¦
¦       ¦ data set : No    ¦ (biased)            ¦   derived from    ¦    derived from    ¦
¦       ¦ classification   ¦                     ¦   the predictive  ¦    the predictive  ¦
¦       ¦ involved         ¦                     ¦   function.       ¦    function.       ¦
¦       ¦                  ¦                     ¦2) May be part of  ¦ 2) May be part     ¦
¦       ¦                  ¦                     ¦   validation      ¦    of validation   ¦
¦       ¦                  ¦                     ¦   analysis of     ¦    analysis of     ¦
¦       ¦                  ¦                     ¦   initial predict-¦    initial         ¦
¦       ¦                  ¦                     ¦   ive function.   ¦    predictive      ¦
¦       ¦                  ¦                     ¦                   ¦    function.       ¦
¦       ¦                  ¦                     ¦                   ¦                    ¦
+-------+------------------+---------------------+-------------------+--------------------¦
¦       ¦                  ¦                     ¦                   ¦                    ¦
¦Require¦ Assumptions of   ¦  No Validation      ¦   Validation      ¦    Initial         ¦
¦ ments ¦ linear discrimin-¦  Required           ¦   Required        ¦    Predictive      ¦
¦       ¦ ant model :      ¦                     ¦                   ¦    Function        ¦
¦       ¦ No validation    ¦                     ¦                   ¦    must have been  ¦
¦       ¦ Required         ¦                     ¦                   ¦    previously      ¦
¦       ¦                  ¦                     ¦                   ¦    validated       ¦
¦       ¦                  ¦                     ¦                   ¦                    ¦
+-----------------------------------------------------------------------------------------+

A SIMPLE EXAMPLE

A firm is interested in obtaining information about the commercial acceptability of a new industrial product. The firm wants to determine the relative importance of four product characteristics (A,B,C,D) as they influence the potential buyer's overall evaluation of product desirability. The ratings given below represent the judgement of twelve potential buyers regarding the individual characteristic ratings and a "buy" versus "not buy" response. Each respondent rates the product on each of the four characteristics and then indicates whether they would buy or not buy the product.

	Hypothetical Rating of the Research Product
    	( 0 = very poor; 10 = excellent)
+---------------------------------------------------------------+
¦        ¦                                                      ¦
¦        ¦ Trait A       Trait B        Trait C       Trait D   ¦
¦--------+------------------------------------------------------¦
¦        ¦   9             8              7            6        ¦
¦        ¦   7             6              6            5        ¦
¦        ¦   10            7              8            2        ¦
¦  Buy   ¦   8             4              5            4        ¦
¦        ¦   9             9              3            3        ¦
¦        ¦   8             6              7            2        ¦
¦        ¦   7             5              6            2        ¦
¦--------+------------------------------------------------------¦
¦        ¦   4             4              4            6        ¦
¦  Not   ¦   3             6              6            3        ¦
¦  Buy   ¦   6             3              3            4        ¦
¦        ¦   2             4              5            2        ¦
¦        ¦   1             2              2            1        ¦
¦        ¦                                                      ¦
+---------------------------------------------------------------+

Steps involved in constructing the discriminant function.

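m = 7, n = 12, p = 4
(m = number of cases in Group I, the "Buy" group; n = total number of cases; p = number of predictors)

I. Means:

     Variable    Group I        Group II        Difference (di)
     1           8.28572        3.2             5.08572
     2           6.42857        3.8             2.62857
     3           6              4               2
     4           3.42857        3.2             0.22857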

II. Dispersion Matrix:

                  Sij(I) + Sij(II)                 
         (Sij) =  -----------------  
                       n - 2   

       +-                                            -+
       ¦  2.22286     0.834286     0.2       0.994286 ¦
       ¦  0.834286    2.65143      0.6       0.591429 ¦
       ¦  0.2         0.6          2.6       0.1      ¦
       ¦  0.994286    0.591429     0.1       3.05143  ¦
       +-                                            -+
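Steps I and II can be reproduced from the ratings table with a short sketch. The variable and function names are ours; the data are the twelve ratings given above.

```python
# Ratings on traits A, B, C, D for each respondent.
buy = [(9, 8, 7, 6), (7, 6, 6, 5), (10, 7, 8, 2), (8, 4, 5, 4),
       (9, 9, 3, 3), (8, 6, 7, 2), (7, 5, 6, 2)]
not_buy = [(4, 4, 4, 6), (3, 6, 6, 3), (6, 3, 3, 4), (2, 4, 5, 2), (1, 2, 2, 1)]

def column_means(rows):
    """Mean of each variable (column) over the cases (rows)."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def sscp(rows):
    """Sums of squares and cross products about the group means."""
    m = column_means(rows)
    p = len(m)
    return [[sum((r[i] - m[i]) * (r[j] - m[j]) for r in rows) for j in range(p)]
            for i in range(p)]

mean_1 = column_means(buy)        # Group I means
mean_2 = column_means(not_buy)    # Group II means
d = [a - b for a, b in zip(mean_1, mean_2)]

n = len(buy) + len(not_buy)
s1, s2 = sscp(buy), sscp(not_buy)
p = len(d)
# Pooled dispersion matrix: (Sij(I) + Sij(II)) / (n - 2)
dispersion = [[(s1[i][j] + s2[i][j]) / (n - 2) for j in range(p)] for i in range(p)]

for row in dispersion:
    print(["%.6f" % value for value in row])
```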
III. Inverse of the Dispersion Matrix (Sij)-1

    +-                                               -+
    ¦  0.575889  -0.144545    -0.004809     -0.159476 ¦
    ¦ -0.144545   0.45174     -0.091688     -0.037452 ¦
    ¦ -0.004809  -0.091688     0.405912      0.006036 ¦
    ¦ -0.159476  -0.037452     0.006036      0.386741 ¦
    +-                                               -+

IV. Discriminant function coefficients:

    l_i = \sum_{j=1}^{4} S^{ij} d_j

where S^{ij} is element (i,j) of the inverse dispersion matrix and d_j is the difference between the two group means on variable j.

      Variable                Discriminant Function
                              Coefficient (li)
      ----------------------------------------
         1	 2.50279
         2	 0.260376
         3	 0.547738
         4	-0.809025

Mean discriminant scores of the two groups:

    D_I    = \sum_{i=1}^{4} l_i \bar{x}_i(I)  = 22.9239

    D_{II} = \sum_{i=1}^{4} l_i \bar{x}_i(II) = 8.60042

V. With the boundary score D* = (DI + DII)/2 = 15.7621, the matrix giving the classification ability of the discriminant function is:

                        Predicted
                        Group I     Group II   Total
          Group I         7           0          7
  Actual  Group II        0           5          5
          Total           7           5         12

VI. Mahalanobis squared distance and F-ratio:

    D^2 = \sum_{i=1}^{p} l_i d_i = 14.3234

    F = \frac{12 - 4 - 1}{4} \cdot \frac{7(5)}{12(10)} \cdot 14.3234 = 7.31092

    F_{4,7}(\alpha = 0.05) = 4.12 and F computed > 4.12
    F_{4,7}(\alpha = 0.01) = 7.85 and F computed < 7.85
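Steps III to VI can be checked numerically. The sketch below takes the group means and the dispersion matrix printed in steps I and II, solves for the coefficients with a small Gaussian elimination routine rather than forming the inverse explicitly, and then computes the group scores, the boundary score, D-squared and F. All names are ours.

```python
# Values copied from steps I and II of the example.
dispersion = [
    [2.22286, 0.834286, 0.2, 0.994286],
    [0.834286, 2.65143, 0.6, 0.591429],
    [0.2, 0.6, 2.6, 0.1],
    [0.994286, 0.591429, 0.1, 3.05143],
]
mean_1 = [8.28572, 6.42857, 6.0, 3.42857]   # Group I (Buy)
mean_2 = [3.2, 3.8, 4.0, 3.2]               # Group II (Not Buy)
n1, n2, p = 7, 5, 4

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    size = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * size
    for r in range(size - 1, -1, -1):
        x[r] = (m[r][size] - sum(m[r][c] * x[c] for c in range(r + 1, size))) / m[r][r]
    return x

d = [a - b for a, b in zip(mean_1, mean_2)]
coefficients = solve(dispersion, d)                          # l_i
score_1 = sum(l * x for l, x in zip(coefficients, mean_1))   # D_I
score_2 = sum(l * x for l, x in zip(coefficients, mean_2))   # D_II
boundary = (score_1 + score_2) / 2                           # D*
d_squared = sum(l * dj for l, dj in zip(coefficients, d))    # D^2
n = n1 + n2
f_ratio = ((n - p - 1) / p) * (n1 * n2 / (n * (n - 2))) * d_squared

print(coefficients, score_1, score_2, boundary, d_squared, f_ratio)
```

The printed values should agree with those in the text to about four significant figures; small differences come from the rounding of the inputs.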



Null Hypothesis (Ho): There is no significant difference between the mean values of the two groups;
i.e. µ1I = µ1II, µ2I = µ2II, µ3I = µ3II, µ4I = µ4II.

The null hypothesis is rejected at the 5% level of significance, but cannot be rejected at the 1% level of significance.

VII. If the boundary score is D*, then the probability of misclassification is the probability that a standard normal variate takes a value less than (D* - 22.9239)/\sqrt{14.3234} plus the probability that it takes a value greater than (D* - 8.60042)/\sqrt{14.3234}.

If D* = (DI + DII)/2, then the probability of misclassification is:

    2 P\left( Z \ge \frac{|15.7621 - 22.9239|}{\sqrt{14.3234}} \right) = 2 P(Z \ge 1.89232) = 0.06 (which is quite small)
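The probability in step VII can be checked with the standard normal tail function. This sketch uses only the Python standard library and the values printed in the example; the names are ours.

```python
import math

def upper_tail(z):
    """P(Z >= z) for a standard normal variate."""
    return 0.5 * math.erfc(z / math.sqrt(2))

score_1, score_2, d_squared = 22.9239, 8.60042, 14.3234
boundary = (score_1 + score_2) / 2
z = abs(boundary - score_1) / math.sqrt(d_squared)
p_misclassification = 2 * upper_tail(z)

print(round(z, 4), round(p_misclassification, 2))
```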


THE DISCRIM PROGRAM

The PC-MDS Command File

This file defines the variables, their format and locations, the values that denote missing data for different variables, and any recoding of variable values. Command files are designated by an "SPS" extension (i.e., *.SPS). As an example, we shall refer to the command file DISCRIM.SPS, which is given below. The name of the PC-MDS command file is specified interactively by the user.

 
	TITLE BMD07M TEST DATA 
	FILE NAME     'DISCRIM.DAT' 
	DATA LIST     V1 TO V5 
	5    (5F3.0) 
	VARIABLE LABELS    V1 'VARIABLE 1' 
	              V2 'VARIABLE 2' 
	              V3 'VARIABLE 3' 
	              V4 'VARIABLE 4' 
	              V5 'VARIABLE 5' 

The Data File

The data file contains the data in the format described in the Command File. Data files are usually named with a "DAT" extension (i.e., *.DAT). The example data file for the discriminant analysis is called DISCRIM.DAT and is given below. The data file is specified in line 2 of the command file (the FILE NAME command).

      57 28 41 13  2        ¦         60 27 51 16  2
      63 25 49 15  2        ¦         69 31 49 15  2
      60 30 48 18  3        ¦         50 33 14  2  1
      51 38 19  4  1        ¦         56 29 36 13  2
      67 31 44 14  2        ¦         65 30 58 22  3
      44 30 13  2  1        ¦         47 32 16  2  1
      72 30 58 16  3        ¦         68 28 48 14  2
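For readers who want to inspect such a data file outside PC-MDS, the (5F3.0) format can be read as five consecutive fields of three characters each. The sketch below rests on that assumption about Fortran-style fixed-width fields; the function name is ours, and blank fields are treated as zero, which is also an assumption.

```python
def parse_record(line, n_fields=5, width=3):
    """Split one fixed-width record into floats, n_fields fields of `width` characters."""
    padded = line.rstrip("\n").ljust(n_fields * width)
    fields = [padded[i * width:(i + 1) * width] for i in range(n_fields)]
    # Assumption: a blank field is read as zero.
    return [float(field.strip() or "0") for field in fields]

print(parse_record(" 57 28 41 13  2"))  # [57.0, 28.0, 41.0, 13.0, 2.0]
```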

The Output File

An output file must be specified interactively by the user when running each of the PC-MDS programs. The output file is the file to which the results of the discriminant analysis are written. A common convention is to name the file with a "PRN" extension to signify a print file (i.e., *.PRN).

RUNNING THE DISCRIM PROGRAM

STEP 1: Enter EDITOR (a word processor or program editor that produces ASCII files will suffice) and prepare the command file and the data file.

STEP 2: Load the DISCRIM program. The program can be loaded by simply typing DISCRIM and then pressing the [ENTER] key.

A> DISCRIM [ENTER]

STEP 3: After the initial logo identifying the program, a message will appear on the screen requesting the location and name of the command file.

 
     +-----------------------------------------------------+
     ¦  ENTER THE NAME OF THE PC-MDS COMMAND FILE          ¦  
     ¦                                                     ¦  
     ¦  USE THE FORM:  DRV:FILENAME.EXT  (e.g. B:STAT.SPS) ¦  
     ¦                                                     ¦  
     ¦                                                     ¦  
     ¦  A: DISCRIM.SPS                                     ¦ 
     +-----------------------------------------------------+             

RESPOND with the location and name of the command file.

A:DISCRIM.SPS [ENTER]

(Assumes the DISCRIM.SPS file is in the A: drive.) If the command file name is not acceptable, a message will ask you to re-enter it.

STEP 4: If the command file was specified correctly, the next menu item will pop up, asking you to specify the location and name of the output file.

+----------------------------------------------------+
¦  ENTER THE NAME OF THE PC-MDS COMMAND FILE         ¦
¦                                                    ¦
¦  USE THE FORM:  DRV:FILENAME.EXT  (e.g. B:STAT.SPS)¦
¦                                                    ¦
¦  A:DISCRIM.SPS                                     ¦
¦  +--------------------------------------------------+
+--¦  ENTER THE NAME OF THE FILE TO SAVE OUTPUT       ¦
   ¦                                                  ¦
   ¦  USE THE FORM:  DRV:FILENAME.EXT(e.g. B:STAT.PRN)¦
   ¦                                                  ¦
   ¦                                                  ¦
   ¦  A:DISCRIM.PRN                                   ¦
   +--------------------------------------------------+

Enter the name of the output file.

A:DISCRIM.PRN [ENTER]

(Assumes you want the output file DISCRIM.PRN in the A: drive).

If a file with the same name already exists, the following message will appear on screen:

+-----------------------------------------+
¦  THIS OUTPUT FILE NAME ALREADY EXISTS!  ¦
¦  DO YOU WANT TO OVERWRITE IT ? (Y/N) Y  ¦
+-----------------------------------------+

STEP 5: Once the output file name is correctly entered, error messages, if any, will be displayed on screen as follows:

ERROR MESSAGES
ERROR: LINE # : MESSAGE

If errors are found, the program aborts. It is recommended that the user makes a note of the errors. The user must edit the Command file to correct errors.

If there were no errors then the following message will appear onscreen for verifying the data set chosen for the analysis:

+---------------------------------------------------------------------------+
¦ STMT#  #VARIABLES    FORMAT STATEMENT AND DATA                            ¦
¦---------------------------------------------------------------------------¦
¦   1        5                                                              ¦
¦   (5F3.0)                                                                 ¦
+---------------------------------------------------------------------------+ 
+---------------------------------------------------------------------------+ 
¦ 5.700000e+001  2.800000e+001  4.100000e+001  1.300000e+001  2.000000e+000 ¦ 
¦                                                                           ¦ 
+---------------------------------------------------------------------------+ 
                 +-----------------------------------+
                 ¦   WAS THE DATA READ CORRECTLY? Y  ¦
                 +-----------------------------------+                      

The program next reads the first line of data, displays the input format for reading the data, and lists the values for the first data case. If the data is read incorrectly, you may re-specify the format statement. After you indicate that the data was read correctly, the program proceeds with the discriminant analysis of the data.

STEP 6: After the data is entered correctly the following message will appear onscreen:

+-----------------------------------------------------+
¦   DISCRIMINANT ANALYSIS OPTIONS:                    ¦
¦                                                     ¦
¦   5  VARIABLES HAVE BEEN DECLARED.                  ¦
¦   SELECT THE APPROPRIATE OPTION:                    ¦
¦                                                     ¦
¦   (1) SPECIFY THE VARIABLES FOR ANALYSIS            ¦
¦       (VARIABLES ARE SPECIFIED BY SEQUENCE NUMBER)  ¦
¦   (2) VIEW A LIST OF VARIABLE NUMBERS               ¦
¦   (3) QUIT PROGRAM                                  ¦
¦                                                     ¦
¦   YOUR CHOICE : 1                                   ¦
+-----------------------------------------------------+

Once the option of specifying the variables is chosen, the dependent variable is specified.

+-------------------------------------------------+
¦  THE DEPENDENT VARIABLE IS THE Z VARIABLE IN    ¦
¦  THE FORMULA: Z = d0 + d1X1 + ... + dkXk.       ¦
¦  YOU ARE DISCRIMINATING BETWEEN GROUPS IN Z     ¦
¦                                                 ¦
¦  +--------------------------------------------+ ¦
¦  ¦ ENTER THE DEPENDENT (OR GROUPING) VARIABLE ¦ ¦
¦  +--------------------------------------------+ ¦
¦  5                                              ¦
+-------------------------------------------------+

In the current data set, variable 5 is the dependent variable.

STEP 7: The message for specifying the dependent variable groups will appear:

+-----------------------------------------------------+  
¦ SPECIFICATION OF DEPENDENT VARIABLE GROUPS:         ¦  
¦                                                     ¦  
¦ ENTER GROUP NUMBERS ONE AT A TIME.                  ¦  
¦ A blank space must follow each group number.        ¦  
¦ The dash (-) may be used to simplify statements.    ¦  
¦ PRESS ENTER to quit this menu                       ¦  
¦ For example,                                        ¦  
¦ 1 2 3 4 5 and 1 - 5 are equivalent statements.      ¦  
¦ 1 - 3                                               ¦ 
+-----------------------------------------------------+        

In this example, we enter groups 1 - 3 of the dependent variable for the discriminant analysis.

STEP 8: Specifying the independent variables:

+-----------------------------------------------------+  
¦ INDEPENDENT VARIABLES SPECIFICATION:                ¦  
¦                                                     ¦  
¦   ENTER INDEPENDENT VARIABLES ONE AT A TIME.        ¦  
¦   A blank space must follow each variable number.   ¦  
¦   The dash (-) may be used to simplify statements.  ¦  
¦   PRESS ENTER to quit this menu                     ¦  
¦   For example,                                      ¦  
¦   1 2 3 4 5 and 1 - 5 are equivalent statements.    ¦  
¦                                                     ¦  
¦ 1 - 4                                               ¦ 
+-----------------------------------------------------+         
Enter 1 - 4.
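The equivalence of "1 2 3 4 5" and "1 - 5" stated in the two menus above can be illustrated with a small sketch. It shows how such a specification could be expanded; it is not the program's own parser, and the function name is ours.

```python
import re

def expand_spec(spec):
    """Expand a specification such as "1 - 3" or "1 2 3" into a list of numbers."""
    tokens = re.findall(r"\d+|-", spec)
    result = []
    i = 0
    while i < len(tokens):
        if i + 2 < len(tokens) and tokens[i + 1] == "-":
            # A dash between two numbers denotes an inclusive range.
            result.extend(range(int(tokens[i]), int(tokens[i + 2]) + 1))
            i += 3
        else:
            # A stray dash reaches int() here and raises ValueError.
            result.append(int(tokens[i]))
            i += 1
    return result

print(expand_spec("1 - 5"))      # [1, 2, 3, 4, 5]
print(expand_spec("1 2 3 4 5"))  # [1, 2, 3, 4, 5]
```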

STEP 9: Once the variables are specified, the discriminant analysis menu appears:

+----------------------------------------------------------------------+
¦              STEPWISE DISCRIMINANT ANALYSIS MENU                     ¦
¦----------------------------------------------------------------------¦
¦RETURN  IF MODIFICATIONS ARE COMPLETE...RUN DISCRIMINANT ANALYSIS NOW ¦
¦                                                                      ¦
¦   1    NUMBER OF DEPENDENT VARIABLE GROUPS:                       3  ¦
¦                                                                      ¦
¦   2    NUMBER AND IDENTIFICATION OF INDEPENDENT VARIABLES         4  ¦
¦                                                                      ¦
¦   3    NUMBER OF GROUPS TO PLOT:                                  3  ¦
¦                                                                      ¦
¦   4    WEIGHTS FOR THE COVARIANCE MATRIX?                        NO  ¦
¦                                                                      ¦
¦   5    CREATE OUTPUT FILE FOR CANONICAL VARIABLES?               NO  ¦
¦                                                                      ¦
¦   6    CREATE OUTPUT FILE FOR CANONICAL COEFFICIENTS?            NO  ¦
¦                                                                      ¦
¦   7    STRATIFY FIRST TWO CANONICAL VARIABLES ON THE             NO  ¦
¦           BASIS OF THE THIRD CANONICAL VARIABLE                      ¦
¦   8    TYPE OF PRIOR PROBABILITIES OF GROUP MEMBERSHIP:           0  ¦
¦           0=EQUAL        1=ni/n       2=READ IN                      ¦
+----------------------------------------------------------------------+  
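Menu item 8 offers three ways of setting the prior probabilities of group membership. A minimal sketch of what the three options mean follows; the function name and argument layout are ours, not the program's.

```python
def prior_probabilities(option, group_sizes, supplied=None):
    """Prior probabilities for option 0 (equal), 1 (ni/n) or 2 (read in)."""
    k = len(group_sizes)
    if option == 0:
        return [1.0 / k] * k
    if option == 1:
        total = sum(group_sizes)
        return [size / total for size in group_sizes]
    if option == 2:
        if supplied is None or len(supplied) != k:
            raise ValueError("one prior probability is needed for each group")
        return list(supplied)
    raise ValueError("option must be 0, 1 or 2")

# Three groups of 50 cases each, as in the sample output: equal priors of 1/3.
print(prior_probabilities(0, [50, 50, 50]))
```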

STEP 10: During the computations, the Discriminant Analysis Subproblem Menu appears. This menu controls the parameters for entering variables into the discriminant function.

  +----------------------------------------------------+
  ¦       DISCRIMINANT ANALYSIS SUBPROBLEM MENU        ¦
  ¦          SELECT ITEM TO ENTER A NEW VALUE          ¦
  ¦                                                    ¦
  ¦       ITEM DESCRIPTION             CURRENT VALUE   ¦
  ¦       --------------------------- ---------------  ¦
  ¦    1  No changes,  Start Analysis                  ¦
  ¦    2  Max Number of steps               8          ¦
  ¦    3  F Value for inclusion         .010000        ¦
  ¦    4  F Value for deletion          .005000        ¦
  ¦    5  Tolerance Level               .000100        ¦
  ¦    6  Control Delete Option:          NO           ¦
  ¦    7  Print Posterior Prob.           NO           ¦
  ¦    8  Print Sub-Optimum Functions:                 ¦
  ¦        0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0       ¦
  ¦                                                    ¦
  ¦       ENTER  CHOICE:                               ¦
  +----------------------------------------------------+

 

Press 1 to continue with the existing set-up of the program.

STEP 11: Once the computations are completed, the options for terminating the program will appear onscreen. The output will be in the DISCRIM.PRN file. Run the EDITOR or a word processing program to read the output file. The output file may be printed if desired. A printed copy of the output file produced by the above commands follows.


Sample Output
	PC-MDS
	STEPWISE DISCRIMINANT ANALYSIS
 
 ANALYSIS TITLE      BMD07M TEST DATA                                        
 INPUT DATA FILE     DISCRIM.DAT                                        
 OUTPUT PRINT FILE   DISCRIM.OUT                                        
 NO. OF VARIABLES       5 
 
   PARAMETERS SPECIFIED (8 VALUES): 
      NUMBER OF VARIABLES                4 
      NUMBER OF GROUPS                   3 
      NO. GRPS TO PLOT ON EACH PAGE   
        IF CANONICAL ANALYSIS DONE       3 
      WEIGHTING OF COVARIANCE MATRIX    NO  
      CANONICAL VARIABLE OUTPUT         NO  
      CANONICAL COEFFICIENT OUTPUT      NO  
      STRATIFICATION OF CANON. VARS.    NO  
      PRIOR PROBABILITIES USED          NO  
 
 DATA TREATED AS HAVING NO MISSING VALUES 
 
 DATA FOR RECORD:     1 
  .57E+02 .28E+02 .41E+02 .13E+02 .20E+01 
 DATA FOR RECORD:   150 
  .79E+02 .38E+02 .64E+02 .20E+02 .30E+01 
 
 GROUP SAMPLE SIZES COMPUTED:  
 NUMBER OF CASES IN EACH GROUP    50    50    50 
 PRIOR PROBABILITIES 
          .3333     .3333     .3333 
 
 MEANS (THE LAST COLUMN CONTAINS THE GRAND MEAN OF THE GROUPS IN THE ANALYSIS) 
            GROUP
            AGROUP      BGROUP      CGROUP
  VARIABLE
    1      50.0600     59.3600     65.8800     58.4333
    2      34.2800     27.7000     29.7400     30.5733
    3      14.6200     42.6000     55.5200     37.5800
    4       2.4600     13.2600     20.2600     11.9933


 STANDARD DEVIATIONS 
        	AGROUP      BGROUP      CGROUP 
 VARIABLE 
   1      3.5249      5.1617      6.3588 
   2      3.7906      3.1380      3.2250 
   3      1.7366      4.6991      5.5189 
   4      1.0539      1.9775      2.7465 
 WITHIN GROUPS COVARIANCE MATRIX 

            VARIABLES 
            1           2           3           4 
 VARIABLE 
   1     26.5008 
   2      9.2721     11.5388 
   3     16.7514      5.5244     18.5188 
   4      3.8401      3.2710      4.2665      4.1882 

 WITHIN GROUPS CORRELATION MATRIX 

            VARIABLES 
            1           2           3           4 
 VARIABLE 
   1      1.0000 
   2       .5302      1.0000 
   3       .7562       .3779      1.0000 
   4       .3645       .4705       .4845      1.0000 

 SUBPROBLEM                  0 
 F-LEVEL FOR INCLUSION   .0100 
 F-LEVEL FOR DELETION    .0050 
 TOLERANCE LEVEL         .0001 
 CONTROL VALUES           1111 
 *************************************************************************** 
 STEP NUMBER         0 
 VARIABLE ENTERED  

 VARIABLES NOT INCLUDED AND F TO ENTER - DEGREES OF FREEDOM      2    147 
  1:   119.264  2:    49.160  3:  1180.161  4:   960.007 

 ***************************************************************************
 STEP NUMBER         1 
 VARIABLE ENTERED    3 

 VARIABLES INCLUDED AND F TO REMOVE - DEGREES OF FREEDOM       2    147 
  3:  1180.161 
 VARIABLES NOT INCLUDED AND F TO ENTER - DEGREES OF FREEDOM      2    146 
  1:    34.323  2:    43.035  4:    24.766 

 U-STATISTIC            .05863      DEGREES OF FREEDOM     1    2  147 
 APPROXIMATE F      1180.16100      DEGREES OF FREEDOM     2  147.00 

 F MATRIX - DEGREES OF FREEDOM     1  147 

            GROUP  
            AGROUP      BGROUP 
 GROUP 
 BGROUP    1056.874 
 CGROUP    2258.262     225.348 
 ******************************************************************************* 

 STEP NUMBER         2 
 VARIABLE ENTERED    2 

 VARIABLES INCLUDED AND F TO REMOVE - DEGREES OF FREEDOM       2    146 
  2:    43.035  3:  1112.954 
 VARIABLES NOT INCLUDED AND F TO ENTER - DEGREES OF FREEDOM      2    145 
  1:    12.269  4:    34.569 

 U-STATISTIC            .03688      DEGREES OF FREEDOM     2    2  147 
 APPROXIMATE F       307.10460      DEGREES OF FREEDOM     4  292.00 

 F MATRIX - DEGREES OF FREEDOM     2  146 


            GROUP  
            AGROUP      BGROUP 
 GROUP 
 BGROUP     804.511 
 CGROUP    1473.231     116.038 

  ******************************************************************************* 
 STEP NUMBER         3 
 VARIABLE ENTERED    4 

 VARIABLES INCLUDED AND F TO REMOVE - DEGREES OF FREEDOM       2    145 
  2:    54.577  3:    38.724  4:    34.569 
 VARIABLES NOT INCLUDED AND F TO ENTER - DEGREES OF FREEDOM      2    144 
  1:     4.721 

 U-STATISTIC            .02498      DEGREES OF FREEDOM     3    2  147 
 APPROXIMATE F       257.50320      DEGREES OF FREEDOM     6  290.00 

 F MATRIX - DEGREES OF FREEDOM     3  145 

            GROUP  
            AGROUP      BGROUP 
 GROUP 
 BGROUP     692.014 
 CGROUP    1381.162     133.373 
 ******************************************************************************* 

 STEP NUMBER         4 
 VARIABLE ENTERED    1 

 VARIABLES INCLUDED AND F TO REMOVE - DEGREES OF FREEDOM       2    144 
  1:     4.721  2:    21.936  3:    35.590  4:    24.904 
 
 U-STATISTIC            .02344      DEGREES OF FREEDOM     4    2  147 
 APPROXIMATE F       199.14540      DEGREES OF FREEDOM     8  288.00 

 F MATRIX - DEGREES OF FREEDOM     4  144 
 
            GROUP  
            AGROUP      BGROUP 
 GROUP 
 BGROUP     550.189 
 CGROUP    1098.274     105.313 
 
 F LEVEL INSUFFICIENT FOR FURTHER COMPUTATION 
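
NOTE ON THE STEPS ABOVE: The APPROXIMATE F values and their degrees of freedom appear consistent with Rao's F approximation to the U-statistic (Wilks' lambda). That this is the formula used inside the program is an assumption here, not something the printout states. The sketch below, in Python, recomputes the four values from the rounded U-statistics; the function name rao_f and its argument names are hypothetical.

```python
import math


def rao_f(u, p, q, n_e):
    """Rao's F approximation for the U-statistic (Wilks' lambda).

    u   : U-statistic
    p   : number of variables included
    q   : hypothesis degrees of freedom (number of groups - 1)
    n_e : within-groups (error) degrees of freedom
    Returns (F, df1, df2).
    """
    denom = p * p + q * q - 5
    # By convention s = 1 when the ratio below is undefined (denom <= 0).
    s = math.sqrt((p * p * q * q - 4) / denom) if denom > 0 else 1.0
    m = n_e + q - (p + q + 1) / 2
    df1 = p * q
    df2 = m * s - p * q / 2 + 1
    root = u ** (1 / s)
    return (1 - root) / root * df2 / df1, df1, df2


# (U-statistic, number of variables included) at steps 1 to 4 of the printout.
steps = [(0.05863, 1), (0.03688, 2), (0.02498, 3), (0.02344, 4)]
results = [rao_f(u, p, q=2, n_e=147) for u, p in steps]
for (u, p), (f, df1, df2) in zip(steps, results):
    print(f"p={p}  U={u:.5f}  F={f:10.3f}  DF=({df1}, {df2:.0f})")
```

Because the U-statistics are printed to only five decimals, the recomputed F values match the printout only to about three or four significant figures.
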

          FUNCTION 
          AGROUP      BGROUP      CGROUP 
 VARIABLE 
   1        2.35442        1.56982        1.24458 
   2        2.35879         .70725         .36853 
   3       -1.64306         .52114        1.27665 
   4       -1.73984         .64342        2.10791 
 CONSTANT -86.30845     -72.85261    -104.36830 
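
NOTE ON THE FUNCTIONS ABOVE: Each column is a linear classification function. A case is assigned to the group whose function gives the largest value. The following Python sketch applies the three functions to the first four values of DATA FOR RECORD 1; treating the fifth value on that record as the group code is an assumption, and the name classify is hypothetical.

```python
# Coefficients (variables 1 to 4) and constants, copied from the printout.
functions = {
    "AGROUP": ([2.35442, 2.35879, -1.64306, -1.73984], -86.30845),
    "BGROUP": ([1.56982, 0.70725, 0.52114, 0.64342], -72.85261),
    "CGROUP": ([1.24458, 0.36853, 1.27665, 2.10791], -104.36830),
}


def classify(x):
    """Return (group with the largest function value, all function values)."""
    scores = {
        group: const + sum(w * v for w, v in zip(weights, x))
        for group, (weights, const) in functions.items()
    }
    return max(scores, key=scores.get), scores


# First four fields of DATA FOR RECORD 1 (.57E+02 .28E+02 .41E+02 .13E+02).
record_1 = [57.0, 28.0, 41.0, 13.0]
group, scores = classify(record_1)
print(group)
for name, value in scores.items():
    print(f"{name}  {value:10.4f}")
```

With equal prior probabilities, the difference between two function values equals half the difference between the corresponding squared distances, so these values can be checked against the classification listing.
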

          GROUP WITH          SQUARE OF DISTANCE FROM AND POSTERIOR 
          LARGEST PROB.                PROBABILITY FOR GROUP - 
 AGROUP                   AGROUP               BGROUP               CGROUP 
  CASE 
    1      AGROUP    2.4034    1.0000,   74.2557     .0000,  160.0998     .0000, 
    2      AGROUP     .2419    1.0000,   90.6602     .0000,  181.5587     .0000, 
    3      AGROUP    4.4186    1.0000,  115.7274     .0000,  208.9456     .0000, 
    .        .          .         .         .          .        .         .
      CONTINUED FOR CASE 4 TO CASE 47
    .        .          .         .         .          .        .         .
   48      AGROUP   10.6473    1.0000,  145.3974     .0000,  249.2639     .0000, 
   49      AGROUP    1.0069    1.0000,   92.3452     .0000,  179.2350     .0000, 
   50      AGROUP     .5533    1.0000,   87.2894     .0000,  177.0700     .0000, 

  BGROUP                   AGROUP               BGROUP               CGROUP 
  CASE 
    1      BGROUP   84.7852     .0000,     .3734     .9999,   19.4218     .0001, 
    2      CGROUP  149.0303     .0000,    8.4393     .1434,    4.8645     .8566, 
    3      BGROUP  131.6616     .0000,    8.4307     .9596,   14.7647     .0404, 
    .        .          .         .         .          .        .         .
      CONTINUED FOR CASE 4 TO CASE 47
    .        .          .         .         .          .        .         .
   48      BGROUP   94.1567     .0000,    1.0258     .9998,   18.2519     .0002, 
   49      BGROUP   90.7313     .0000,    2.1033     .9999,   20.1282     .0001, 
   50      BGROUP   72.2566     .0000,    3.1074    1.0000,   32.9760     .0000, 

 CGROUP                   AGROUP               BGROUP               CGROUP 
  CASE 
    1      CGROUP  175.5659     .0000,   21.1320     .0660,   15.8331     .9340, 
    2      CGROUP  195.4075     .0000,   26.1183     .0000,    2.2160    1.0000, 
    3      CGROUP  194.3124     .0000,   22.4317     .0000,    1.3114    1.0000, 
    .        .          .         .         .          .        .         .
      CONTINUED FOR CASE 4 TO CASE 47
    .        .          .         .         .          .        .         .
   48      CGROUP  161.2736     .0000,   13.3249     .0062,    3.1597     .9938, 
   49      CGROUP  184.8178     .0000,   24.9964     .0000,    3.8859    1.0000, 
   50      CGROUP  176.5572     .0000,   26.7791     .0005,   11.6560     .9995, 
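
NOTE ON THE LISTING ABOVE: With equal prior probabilities, the posterior probability for each group is proportional to exp(-D/2), where D is the square of the distance from that group. The Python sketch below recomputes the posterior probabilities for BGROUP case 2, the first misclassified case shown; the function name posteriors is hypothetical, and the formula assumes a common within-groups covariance matrix.

```python
import math


def posteriors(d2, priors=None):
    """Posterior probabilities from squared distances to each group.

    d2     : squared distances, one per group
    priors : prior probabilities (equal priors if omitted)
    """
    if priors is None:
        priors = [1.0 / len(d2)] * len(d2)
    d_min = min(d2)  # subtracting the minimum avoids underflow in exp()
    weights = [p * math.exp(-(d - d_min) / 2) for p, d in zip(priors, d2)]
    total = sum(weights)
    return [w / total for w in weights]


# BGROUP case 2: squared distances from AGROUP, BGROUP and CGROUP.
post = posteriors([149.0303, 8.4393, 4.8645])
print("  ".join(f"{p:.4f}" for p in post))
```

The case lies closer to the CGROUP mean than to its own group mean, which is why it is assigned to CGROUP.
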

           NUMBER OF CASES CLASSIFIED INTO GROUP - 
 ACTUAL  AGROUP BGROUP CGROUP 
        --------------------- 
 AGROUP|    50|     0|     0| 
        --------------------- 
 BGROUP|     0|    48|     2| 
        --------------------- 
 CGROUP|     0|     1|    49| 
        --------------------- 
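
NOTE ON THE TABLE ABOVE: The diagonal of the classification table holds the correctly classified cases. A short Python sketch of the proportion correct, overall and by group, follows; the variable names are illustrative only. The cases classified here appear to be the same 150 cases used to derive the functions, so the proportion is likely to be optimistic for new cases.

```python
groups = ["AGROUP", "BGROUP", "CGROUP"]

# Rows are actual groups, columns are the groups classified into.
table = [
    [50, 0, 0],
    [0, 48, 2],
    [1 * 0, 1, 49],
]

correct = sum(table[i][i] for i in range(len(groups)))
total = sum(sum(row) for row in table)
accuracy = correct / total
per_group = {g: table[i][i] / sum(table[i]) for i, g in enumerate(groups)}

print(f"CORRECT {correct} OF {total}  PROPORTION {accuracy:.4f}")
for g, rate in per_group.items():
    print(f"{g}  {rate:.4f}")
```
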

 SUMMARY TABLE 
  STEP        VARIABLE         F VALUE TO          NUMBER OF       U-STATISTIC 
 NUMBER   ENTERED  REMOVED   ENTER OR REMOVE   VARIABLES INCLUDED 
    1        3                1180.1610                 1              .0586 
    2        2                  43.0354                 2              .0369 
    3        4                  34.5687                 3              .0250 
    4        1                   4.7212                 4              .0234 

      EIGENVALUES 
          32.19192        .28539        .00000       -.00000 

      CUMULATIVE PROPORTION OF TOTAL DISPERSION 
            .99121       1.00000       1.00000       1.00000 

     CANONICAL CORRELATIONS 
            .98482        .47120        .00049        .00123 

 COEFFICIENTS FOR CANONICAL VARIABLE - 
 ORIGINAL     1             2             3             4 
 VARIABLE 
   1       -.08294       -.00241        .22927        .22133 
   2       -.15345       -.21645       -.03570       -.26583 
   3        .22012        .09319       -.02152       -.30048 
   4        .28105       -.28392       -.16343        .42593 
 CONSTANT -2.10511       6.66147      -9.53701       1.37844 

 GROUP          CANONICAL VARIABLES EVALUATED AT GROUP MEANS 
   1      -7.60760       -.21513        .00000        .00000 
   2       1.82505        .72790        .00000        .00000 
   3       5.78255       -.51277        .00000        .00000 
  THE FIRST  2 OF THE CANONICAL VARIABLES REFER TO DIFFERENCES BETWEEN GROUPS. 
 THE REMAINING REFER TO VARIATION WITHIN GROUPS WHICH IS INDEPENDENT OF 
 GROUP DIFFERENCES. 
 CHECK ON FINAL U-STATISTIC         .02344 
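
NOTE ON THE CANONICAL ANALYSIS ABOVE: The cumulative proportion of total dispersion, the canonical correlations and the check on the final U-statistic all follow from the eigenvalues. The Python sketch below recomputes them from the two nonzero eigenvalues printed above; the variable names are illustrative only.

```python
import math

# The two nonzero eigenvalues from the printout.
eigenvalues = [32.19192, 0.28539]

total = sum(eigenvalues)
cumulative = []
running = 0.0
for lam in eigenvalues:
    running += lam
    cumulative.append(running / total)

# Canonical correlation for each eigenvalue: sqrt(lambda / (1 + lambda)).
canon_corr = [math.sqrt(lam / (1 + lam)) for lam in eigenvalues]

# U-statistic as the product of 1 / (1 + lambda) over the eigenvalues.
u_check = 1.0
for lam in eigenvalues:
    u_check /= 1 + lam

print("CUMULATIVE  ", "  ".join(f"{c:.5f}" for c in cumulative))
print("CANON. CORR.", "  ".join(f"{r:.5f}" for r in canon_corr))
print(f"U-STATISTIC  {u_check:.5f}")
```
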

  POINTS PLOTTED ON THE FOLLOWING GRAPH 
     X = FIRST CANONICAL VARIABLE 
     Y = SECOND CANONICAL VARIABLE 
     CASE  NUMBER FOLLOWED BY * INDICATES THE POINT IS OFF THE GRAPH 

 GROUP AGROUP  MEAN COORDINATES   -7.608   -.215 

  CASE      X        Y      CASE      X        Y      CASE      X        Y     
    1     -6.771     .971     2     -7.672     .135     3     -8.582   -1.834 
    .        .          .         .         .          .        .         .
      CONTINUED FOR CASE 4 TO CASE 45
    .        .          .         .         .          .        .         .
   46     -9.468   -1.825    47     -8.078    -.969    48     -9.850   -1.586 
   49     -7.586   -1.208    50     -7.490     .265 
 
 
 GROUP BGROUP  MEAN COORDINATES    1.825    .728 
 
  CASE      X        Y      CASE      X        Y      CASE      X        Y     
    1      1.549     .593     2      4.498     .883     3      3.498    1.685 
    .        .          .         .         .          .        .         .
      CONTINUED FOR CASE 4 TO CASE 45
    .        .          .         .         .          .        .         .
   46      2.257    1.427    47      2.479    1.941    48      1.956    1.154 
   49      1.750     .821    50       .606    1.943 


 GROUP CGROUP  MEAN COORDINATES    5.783   -.513 

  CASE      X        Y      CASE      X        Y      CASE      X        Y     
    1      5.107    2.131     2      6.273   -1.649     3      6.292    -.467 
    .        .          .         .         .          .        .         .
      CONTINUED FOR CASE 4 TO CASE 45
    .        .          .         .         .          .        .         .
   46      5.958     .094    47      5.361    -.646    48      4.996    -.188 
   49      5.807   -2.010    50      5.220   -1.468 
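
NOTE ON THE COORDINATES ABOVE: The X and Y coordinates are the first two canonical variables, obtained by applying the canonical coefficients and constants to the original variables. The Python sketch below evaluates them for the AGROUP means and for the first four values of DATA FOR RECORD 1, which appears to correspond to BGROUP case 1 (an inference from the matching coordinates, not something the printout states). The function name canonical_xy is hypothetical.

```python
# Coefficients for canonical variables 1 and 2 (original variables 1 to 4)
# and their constants, copied from the printout.
coef_x, const_x = [-0.08294, -0.15345, 0.22012, 0.28105], -2.10511
coef_y, const_y = [-0.00241, -0.21645, 0.09319, -0.28392], 6.66147


def canonical_xy(values):
    """Return the first two canonical variables for one set of values."""
    x = const_x + sum(c * v for c, v in zip(coef_x, values))
    y = const_y + sum(c * v for c, v in zip(coef_y, values))
    return x, y


agroup_means = [50.0600, 34.2800, 14.6200, 2.4600]
record_1 = [57.0, 28.0, 41.0, 13.0]

mean_x, mean_y = canonical_xy(agroup_means)
case_x, case_y = canonical_xy(record_1)
print(f"AGROUP MEAN  {mean_x:8.3f} {mean_y:8.3f}")
print(f"RECORD 1     {case_x:8.3f} {case_y:8.3f}")
```

Small differences in the last decimal place are to be expected, since the coefficients are printed to five decimals only.
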

 
 PLOT OF FIRST VS SECOND CANONICAL VARIABLES
 OVERLAP IS INDICATED BY $, GROUP MEANS BY *.
 
 
         -10.608              -6.299              -1.990               2.319               6.628          
                    -8.453              -4.144                .165               4.474               8.783
            +....+....+....+....+....+....+....+....+....+....+....+....+....+....+....+....+....+....+
     6.269 .                                                                                           .    6.269
     5.910 .                                                                                           .    5.910
     5.551 .                                                                                           .    5.551
     5.192 .                                                                                           .    5.192
     4.833 .                                                                                           .    4.833
     4.474 .                                                                                           .    4.474
     4.115 .                                                                                           .    4.115
     3.755 .                                                                                           .    3.755
     3.396 .                                                                                           .    3.396
     3.037 .                                                                                           .    3.037
     2.678 .                                                       E                                   .    2.678
     2.319 .                                                                       I                   .    2.319
     1.960 .                       S                           EE        E           I                 .    1.960
     1.601 .                                                  E  EE     E    E                         .    1.601
     1.242 .               S                                E      E  E        E           I           .    1.242
      .883 .              SSSSSS                           E     E   E*E E     I  EI      II    I      .     .883
      .524 .                S SS                                  EEEEE  EE        I I                 .     .524
      .165 .            S S S                                      EE   EE        I  III I      I      .     .165
     -.194 .           SS * S                                  E   EE EE  EE  E I  II      I           .    -.194
     -.553 .         SSSSS S  S S                                     E         I     I * I  I         .    -.553
     -.913 .         S  S    S                                           E    E     I   I    I         .    -.913
    -1.272 .       S    S S    S                                       E          I  I  I              .   -1.272
    -1.631 .    S        S                                                           I I  I II         .   -1.631
    -1.990 .     S   S                                                               I  I         I    .   -1.990
    -2.349 .                                                                             I  II         .   -2.349
    -2.708 .       S                                                                         I         .   -2.708
    -3.067 .                                                                                           .   -3.067
    -3.426 .                                                                                           .   -3.426
    -3.785 .                                                                                           .   -3.785
    -4.144 .                                                                                           .   -4.144
    -4.503 .                                                                                           .   -4.503
    -4.862 .                                                                                           .   -4.862
    -5.221 .                                                                                           .   -5.221
    -5.581 .                                                                                           .   -5.581
    -5.940 .                                                                                           .   -5.940
    -6.299 .                                                                                           .   -6.299
            +....+....+....+....+....+....+....+....+....+....+....+....+....+....+....+....+....+....+
                    -8.453              -4.144                .165               4.474               8.783
         -10.608              -6.299              -1.990               2.319               6.628          
  COMPLETION OF STEPWISE DISCRIMINANT ANALYSIS 


Discriminant Analysis Technical Appendix