TWO-GROUP DISCRIMINANT ANALYSIS:
In discriminant analysis we predict classification into a given category as a function of the predictor variables. Here we assume a linear function. In the case of two-group discriminant analysis, only two categories are involved.
The Objectives of a Two-Group Analysis:
1) Finding linear combinations of the predictor variables that enable the analyst to separate the groups by maximizing between-groups variation relative to within-groups variation.
2) Establishing procedures for assigning new individuals to the groups. For these individuals, the profile on the predictor variables is known, but group identity is not.
3) Testing whether significant differences exist between the means of the two groups on the predictor variable profiles.
4) Determining which variables account for most of the differences in mean profiles of the groups.
The Two-Group Discriminant Analysis:
1) Each case belongs to one of two categories. The various sums of squares and cross products, the means of each group, and the total sample mean are computed. The key problem of two-group discriminant analysis is to find a new axis such that the projections of the points onto that axis maximize the difference between the group means relative to their variability on the composite. The discriminant axis is determined as a set of weights, one for each predictor-variable axis, so that we have a linear function such as:
Y = C1X1 + C2X2, where C1,C2 are the weights.
2) Computing the Discriminant Weights. First we compute the sums of squares and cross products that describe variation within the groups. For each group (Category A and Category B) separately, with n cases in the group:
$$S_{x_1}^2 = \sum X_1^2 - n\bar{X}_1^2, \qquad S_{x_2}^2 = \sum X_2^2 - n\bar{X}_2^2, \qquad S_{x_1x_2} = \sum X_1X_2 - n\bar{X}_1\bar{X}_2$$
The pooled within-groups quantities are the sums of the Category A and Category B values. To find C1 and C2, we solve the following simultaneous equations, where $\bar{X}_{1A}$ is the mean of X1 in Category A, and so on:
$$S_{x_1}^2 C_1 + S_{x_1x_2} C_2 = \bar{X}_{1A} - \bar{X}_{1B}$$
$$S_{x_1x_2} C_1 + S_{x_2}^2 C_2 = \bar{X}_{2A} - \bar{X}_{2B}$$
3) Plotting the Discriminant Function. Along with the original plot, the discriminant axis is shown as a straight line drawn from the origin through the point (C1, C2). The original points can then be projected onto the discriminant axis, giving a discriminant score for each case.
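The two simultaneous equations of step 2 can be solved directly. The sketch below is a minimal illustration in plain Python (no external libraries), not the PC-MDS implementation: it pools the within-group sums of squares and cross products for two predictors and solves the 2 x 2 system by Cramer's rule. As illustrative input it borrows Traits A and B of the buy / not-buy ratings from the simple example later in this section; the function names `within_sscp` and `two_group_weights` are hypothetical.

```python
def within_sscp(xs, ys):
    """Sums of squares and cross products about the means for one group."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum(x * x for x in xs) - n * mx * mx                # S_x1^2 = sum X1^2 - n*mean^2
    syy = sum(y * y for y in ys) - n * my * my                # S_x2^2
    sxy = sum(x * y for x, y in zip(xs, ys)) - n * mx * my    # S_x1x2
    return sxx, syy, sxy, mx, my


def two_group_weights(group_a, group_b):
    """Solve S11*C1 + S12*C2 = d1 and S12*C1 + S22*C2 = d2 for (C1, C2).

    Each group is a list of (X1, X2) pairs; the S terms are pooled over both groups.
    """
    a11, a22, a12, a_m1, a_m2 = within_sscp(*zip(*group_a))
    b11, b22, b12, b_m1, b_m2 = within_sscp(*zip(*group_b))
    s11, s22, s12 = a11 + b11, a22 + b22, a12 + b12   # pooled within groups
    d1, d2 = a_m1 - b_m1, a_m2 - b_m2                   # differences in group means
    det = s11 * s22 - s12 * s12
    c1 = (d1 * s22 - s12 * d2) / det                    # Cramer's rule
    c2 = (s11 * d2 - s12 * d1) / det
    return c1, c2


# Traits A and B of the simple example: Category A = buy, Category B = not buy.
buy = [(9, 8), (7, 6), (10, 7), (8, 4), (9, 9), (8, 6), (7, 5)]
not_buy = [(4, 4), (3, 6), (6, 3), (2, 4), (1, 2)]
c1, c2 = two_group_weights(buy, not_buy)
scores = [c1 * x1 + c2 * x2 for x1, x2 in buy + not_buy]   # Y = C1*X1 + C2*X2
print(round(c1, 4), round(c2, 4))
```

Because these equations use sums of squares rather than variances, the weights are proportional to (not equal to) weights computed from a pooled dispersion matrix; the direction of the discriminant axis, and therefore the ordering of the scores, is the same.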
4) The Discriminant Criterion. The two group means and the grand mean of the discriminant scores are calculated. A measure of between-groups variability is then computed by finding the deviation of each group mean from the grand mean on the discriminant function; these deviations are squared, multiplied by the number of cases in each group, and summed. Next, the within-groups variability is found by summing the squared deviations of the scores about each group mean; adding the two groups' sums gives the pooled within-groups sum of squares. The discriminant criterion is the ratio of the between-groups sum of squares to the pooled within-groups sum of squares, and it is this ratio that the analysis maximizes.
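A short sketch of the criterion in plain Python, assuming the discriminant scores of the two groups have already been computed; the function name and the scores in the usage line are hypothetical.

```python
def discriminant_criterion(scores_a, scores_b):
    """Ratio of between-groups to pooled within-groups sum of squares of the scores."""
    all_scores = list(scores_a) + list(scores_b)
    grand = sum(all_scores) / len(all_scores)
    between = 0.0
    within = 0.0
    for group in (scores_a, scores_b):
        mean = sum(group) / len(group)
        between += len(group) * (mean - grand) ** 2    # n_g * (group mean - grand mean)^2
        within += sum((s - mean) ** 2 for s in group)  # squared deviations about group mean
    return between / within


# Hypothetical discriminant scores for two small groups.
print(discriminant_criterion([1.0, 2.0, 3.0], [5.0, 6.0, 7.0]))  # → 6.0
```

Multiplying every score by the same constant leaves the ratio unchanged, which is why the discriminant weights are determined only up to a scale factor.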
5) Classification. The classification problem in turn involves two additional questions:
1) How well does the function assign the known cases in the sample?
2) How well does it assign new cases that were not used in estimating the discriminant function?
These questions directly parallel, respectively, R-square (the strength of the relationship) and cross-validation in regression analysis.
Assignment Rule: Assign all cases whose discriminant scores fall to the left of the midpoint between the two group mean scores to Category A, and those to the right of the midpoint to Category B. This rule makes two specific assumptions: 1) the prior probability of a new case falling into each group is the same for both groups, and 2) the cost of misclassification is the same for both groups.
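The rule can be written as a small function. This is a sketch under stated assumptions: the weights are taken to be the inverse pooled dispersion (covariance) matrix times the difference between the group means, as in the simple example below, so that group A has the higher mean score. The optional `prior_*` and `cost_*` parameters (hypothetical names) add the standard textbook generalization for unequal priors and costs, which is not part of the rule described here; at their default values the cutoff reduces to the midpoint. The test values 22.9239 and 8.60042 are the group mean scores from the simple example.

```python
import math


def assign(score, mean_a, mean_b, prior_a=0.5, prior_b=0.5,
           cost_b_as_a=1.0, cost_a_as_b=1.0):
    """Assign a case to "A" or "B" from its discriminant score.

    Assumes weights = S^-1 (mean_A - mean_B) with S the pooled covariance matrix,
    so group A has the higher mean score. cost_b_as_a is the (hypothetical) cost
    of calling a true B case an A, and cost_a_as_b the reverse.
    """
    if mean_a <= mean_b:
        raise ValueError("expects group A to have the higher mean score")
    midpoint = (mean_a + mean_b) / 2
    # Equal priors and equal costs give log(1) = 0, so the cutoff is the midpoint.
    shift = math.log((prior_b * cost_b_as_a) / (prior_a * cost_a_as_b))
    return "A" if score - midpoint >= shift else "B"


print(assign(20.49, 22.9239, 8.60042), assign(14.205, 22.9239, 8.60042))
```

Raising the prior of group A (or the cost of missing an A case) moves the cutoff toward group B's mean, so more borderline cases are assigned to A.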
6) Testing Statistical Significance: We test whether the group centroids differ. The test of equality of the group centroids is based on an F ratio, which is calculated from a distance measure called the Mahalanobis squared distance (D²).
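The conversion from D² to F can be sketched as follows, assuming two groups of sizes n_a and n_b measured on p predictors. The expression is the one used in step VI of the simple example below; the function name is hypothetical.

```python
def centroid_f_test(d2, n_a, n_b, p):
    """Convert Mahalanobis D^2 between two group centroids into an F ratio.

    Returns (F, (df1, df2)) with df1 = p and df2 = n_a + n_b - p - 1.
    """
    n = n_a + n_b
    f = ((n - p - 1) / p) * (n_a * n_b / (n * (n - 2))) * d2
    return f, (p, n - p - 1)


f, df = centroid_f_test(14.3234, n_a=7, n_b=5, p=4)
print(round(f, 4), df)
```

With the simple example's values this gives F of about 7.31 on (4, 7) degrees of freedom. The Python standard library has no F distribution; if SciPy is available, `scipy.stats.f.ppf(0.95, 4, 7)` returns the 5% critical value to compare against.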
The assumptions and objectives of the Two-Group analysis hold for the Multiple Discriminant Analysis. The primary distinguishing feature between the two is that in a multiple discriminant analysis, more than one discriminant function may be computed. In general with K groups and L predictors, we can find up to the lesser of K-1 or L discriminant functions.
Because of its complexity, multiple discriminant analysis is usually carried out by means of computer programs. The computations in the stepwise discriminant program follow these steps:
I. Separate F ratios are computed for each of the predictors.
II. The predictor variable with the largest F ratio is entered into the equation, provided it meets the minimum significance and tolerance levels preset for the analysis.
III. The partial F ratios of all variables not yet in the equation are computed, adjusting for (holding constant) the predictors already entered. The next predictor is then added according to its adjusted F ratio.
IV. Each variable is tested for retention in the discriminant function based on association with the other predictor variables.
V. The stepwise process is repeated until all variables meeting the pre-set levels of significance, tolerance and retention are entered into the equation.
VI. At each step, the procedure tests the overall separation among the groups and the pairwise separation between every pair of distinct groups. At some steps, the discriminant functions are computed for the variables included as predictors at that point.
VII. A summary of the stepwise procedure is produced. The output includes the variables entered or removed, their associated F ratios, the significance of the separation between groups, and the posterior probability of each case arising from each group.
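Steps I and II amount to a one-way analysis of variance for each predictor followed by picking the largest F ratio. A minimal plain-Python sketch, applied to the buy / not-buy ratings of the simple example below; it omits the preset significance and tolerance checks and the partial F ratios of the later steps, and the function names are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def univariate_f(groups):
    """One-way ANOVA F ratio for a single predictor; groups is a list of value lists."""
    values = [v for g in groups for v in g]
    n, k = len(values), len(groups)
    grand = mean(values)
    ssb = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)   # between groups
    ssw = sum((v - mean(g)) ** 2 for g in groups for v in g)     # within groups
    return (ssb / (k - 1)) / (ssw / (n - k))


def first_entry(data_by_group):
    """Steps I-II: an F ratio for every predictor, then the index of the largest.

    data_by_group maps a group label to a list of cases (one tuple of predictors per case).
    """
    groups = list(data_by_group.values())
    p = len(groups[0][0])
    f_ratios = [univariate_f([[case[j] for case in g] for g in groups]) for j in range(p)]
    best = max(range(p), key=lambda j: f_ratios[j])
    return best, f_ratios


buy = [(9, 8, 7, 6), (7, 6, 6, 5), (10, 7, 8, 2), (8, 4, 5, 4),
       (9, 9, 3, 3), (8, 6, 7, 2), (7, 5, 6, 2)]
not_buy = [(4, 4, 4, 6), (3, 6, 6, 3), (6, 3, 3, 4), (2, 4, 5, 2), (1, 2, 2, 1)]
best, f_ratios = first_entry({"buy": buy, "not buy": not_buy})
print(best, [round(f, 2) for f in f_ratios])
```

With two groups each F ratio has 1 and n - 2 degrees of freedom; for these data Trait A (index 0) would enter first, with F of about 33.9, ahead of Trait B at about 7.6.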
The applications and requirements of discriminant analysis differ, and these differences are often misinterpreted, especially with respect to the validation of predictive discriminant analysis. They are summarized below.
Predictive Discriminant Analysis
   Purpose: Derive the discriminant function using the initial data set; no classification is involved.
   Requirements: Assumptions of the linear discriminant model; no validation required.
Classification Analysis of Initial Data Set of Known Groupings
   Purpose: Determine how well the discriminant function classifies the cases used to derive it (a biased estimate).
   Requirements: No validation required.
Classification Analysis of New Data Set of Known Groupings
   Purpose: 1) Classify data using the classification rule derived from the predictive function. 2) May be part of the validation analysis of the initial predictive function.
   Requirements: Validation required.
Classification Analysis of New Data Set of Known Groupings
   Purpose: 1) Classify data using the classification rule derived from the predictive function. 2) May be part of the validation analysis of the initial predictive function.
   Requirements: The initial predictive function must have been previously validated.
A SIMPLE EXAMPLE
A firm is interested in obtaining information about the commercial acceptability of a new industrial product. In particular, it wants to determine the relative importance of four product characteristics (A, B, C, D) as they influence potential buyers' overall evaluation of product desirability. The ratings given below represent the judgments of twelve potential buyers: each respondent rates the product on each of the four characteristics and then indicates whether they would "buy" or "not buy" the product.
Hypothetical Rating of the Research Product
( 0 = very poor; 10 = excellent)
+---------+-----------------------------------------+
|         |   Trait A   Trait B   Trait C   Trait D |
|---------+-----------------------------------------|
|         |         9         8         7         6 |
|         |         7         6         6         5 |
|         |        10         7         8         2 |
| Buy     |         8         4         5         4 |
|         |         9         9         3         3 |
|         |         8         6         7         2 |
|         |         7         5         6         2 |
|---------+-----------------------------------------|
|         |         4         4         4         6 |
| Not     |         3         6         6         3 |
| Buy     |         6         3         3         4 |
|         |         2         4         5         2 |
|         |         1         2         2         1 |
+---------+-----------------------------------------+
Steps involved in constructing the discriminant function (Group I = Buy, Group II = Not Buy):
m = 7 (cases in Group I), n = 12 (total cases), p = 4 (predictors)
I. Means:
Variable   Group I (Buy)   Group II (Not Buy)   Difference (d_i)
   1          8.28572            3.2               5.08572
   2          6.42857            3.8               2.62857
   3          6                  4                 2
   4          3.42857            3.2               0.22857
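II. Dispersion Matrix (pooled within-groups covariance matrix), where $S_{ij}(I)$ and $S_{ij}(II)$ are the within-group sums of squares and cross products of Groups I and II:
$$S_{ij} = \frac{S_{ij}(I) + S_{ij}(II)}{n - 2}$$
$$(S_{ij}) = \begin{pmatrix} 2.22286 & 0.834286 & 0.2 & 0.994286 \\ 0.834286 & 2.65143 & 0.6 & 0.591429 \\ 0.2 & 0.6 & 2.6 & 0.1 \\ 0.994286 & 0.591429 & 0.1 & 3.05143 \end{pmatrix}$$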
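III. Inverse of the Dispersion Matrix, $(S_{ij})^{-1}$ (symmetric, like the dispersion matrix itself):
$$(S_{ij})^{-1} = \begin{pmatrix} 0.575889 & -0.144545 & -0.004809 & -0.159476 \\ -0.144545 & 0.45174 & -0.091688 & -0.037452 \\ -0.004809 & -0.091688 & 0.405912 & 0.006036 \\ -0.159476 & -0.037452 & 0.006036 & 0.386741 \end{pmatrix}$$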
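IV. Discriminant function coefficients, where $S^{ij}$ is the (i, j) element of the inverse dispersion matrix and $d_j$ is the difference between the group means on variable j:
$$l_i = \sum_{j=1}^{4} S^{ij} d_j$$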
Variable Discriminant Function
Coefficient (li)
----------------------------------------
1 2.50279
2 0.260376
3 0.547738
4 -0.809025
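The discriminant scores evaluated at the Group I and Group II means are:
$$D_I = \sum_{i=1}^{4} l_i \bar{x}_i(I) = 22.9239$$
$$D_{II} = \sum_{i=1}^{4} l_i \bar{x}_i(II) = 8.60042$$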
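V. Boundary score: $D^* = (D_I + D_{II})/2 = 15.7621$. Because $D_I > D_{II}$, a case whose discriminant score exceeds $D^*$ is assigned to Group I (Buy), and any other case to Group II (Not Buy).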
Matrix giving the classification ability of the discriminant function:
                          Predicted
                     Group I   Group II   Total
Actual   Group I        7         0         7
         Group II       0         5         5
         Total          7         5        12
VI. Mahalanobis squared distance:
$$D^2 = \sum_{i=1}^{p} l_i d_i = 14.3234$$
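$$F = \left(\frac{n-p-1}{p}\right)\left(\frac{m(n-m)}{n(n-2)}\right)D^2 = \left(\frac{12-4-1}{4}\right)\left(\frac{7(5)}{12(10)}\right)(14.3234) = 7.31092$$
$F_{4,7}(\alpha = 0.05) = 4.12$, and the computed F exceeds 4.12.
$F_{4,7}(\alpha = 0.01) = 7.85$, and the computed F is less than 7.85.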
Null Hypothesis ($H_0$): The two groups have equal population means on every predictor, i.e.
$$\mu_1(I) = \mu_1(II), \quad \mu_2(I) = \mu_2(II), \quad \mu_3(I) = \mu_3(II), \quad \mu_4(I) = \mu_4(II)$$
The null hypothesis is rejected at the 5% level of significance, but cannot be rejected at the 1% level of significance.
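VII. The standard deviation of the discriminant scores is $\sqrt{D^2}$. If the boundary score is $D^*$, the probability of misclassification is taken as the probability that a standard normal variate takes a value less than $(D^* - 22.9239)/\sqrt{14.3234}$ (a Group I case falling below the boundary) plus the probability that it takes a value greater than $(D^* - 8.60042)/\sqrt{14.3234}$ (a Group II case falling above it).
If $D^* = (D_I + D_{II})/2$, the two terms are equal by symmetry, so the probability of misclassification is:
$$2\,P\left(Z \ge \frac{|15.7621 - 22.9239|}{\sqrt{14.3234}}\right) = 2\,P(Z \ge 1.89232) = 0.06$$
which is quite small.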
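Steps I through VII can be reproduced in plain Python as a check on the arithmetic. This sketch solves the system S l = d by Gaussian elimination instead of forming the inverse dispersion matrix explicitly (a swap for numerical convenience; the resulting coefficients are the same), and it uses `math.erfc` for the normal tail probability. Names such as `solve`, `sscp`, and `weights` are hypothetical.

```python
import math

# Ratings from the table above: (Trait A, Trait B, Trait C, Trait D).
buy = [(9, 8, 7, 6), (7, 6, 6, 5), (10, 7, 8, 2), (8, 4, 5, 4),
       (9, 9, 3, 3), (8, 6, 7, 2), (7, 5, 6, 2)]          # Group I
not_buy = [(4, 4, 4, 6), (3, 6, 6, 3), (6, 3, 3, 4),
           (2, 4, 5, 2), (1, 2, 2, 1)]                     # Group II
P = 4


def means(rows):
    return [sum(r[j] for r in rows) / len(rows) for j in range(P)]


def sscp(rows, m):
    """Within-group sums of squares and cross products about the group means."""
    return [[sum((r[i] - m[i]) * (r[j] - m[j]) for r in rows) for j in range(P)]
            for i in range(P)]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def score(x, w):
    return sum(wi * xi for wi, xi in zip(w, x))


m1, m2 = means(buy), means(not_buy)                         # step I
d = [a - b for a, b in zip(m1, m2)]
n = len(buy) + len(not_buy)
s1, s2 = sscp(buy, m1), sscp(not_buy, m2)
S = [[(s1[i][j] + s2[i][j]) / (n - 2) for j in range(P)] for i in range(P)]  # step II
weights = solve(S, d)               # steps III-IV: solves S l = d instead of inverting S
D_I, D_II = score(m1, weights), score(m2, weights)
D_star = (D_I + D_II) / 2                                   # step V: boundary score
hits = (sum(score(x, weights) > D_star for x in buy)
        + sum(score(x, weights) <= D_star for x in not_buy))
D2 = sum(wi * di for wi, di in zip(weights, d))             # step VI: Mahalanobis D^2
z = abs(D_star - D_I) / math.sqrt(D2)                       # step VII
tail = 0.5 * math.erfc(z / math.sqrt(2))                    # P(Z > z), standard normal
print([round(v, 5) for v in weights], round(D2, 4), hits, round(2 * tail, 4))
```

Because $D_I - D_{II}$ equals $D^2$, the standardized distance from either group mean to the midpoint is $D/2 = \sqrt{14.3234}/2 \approx 1.892$, which is where the value in step VII comes from.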
The PC-MDS Command File
This file defines the variables, their formats and locations, the missing-value codes for each variable, and any recodes of variable values. Command files are designated by an "SPS" extension (i.e., *.SPS). As an example, we shall refer to the command file DISCRIM.SPS, given below. The name of the PC-MDS command file is specified interactively by the user.
TITLE BMD07M TEST DATA
FILE NAME 'DISCRIM.DAT'
DATA LIST V1 TO V5 5 (5F3.0)
VARIABLE LABELS V1 'VARIABLE 1' V2 'VARIABLE 2' V3 'VARIABLE 3' V4 'VARIABLE 4' V5 'VARIABLE 5'
The Data File
The data file contains the data in the format described in the command file. Data files are usually named with a "DAT" extension (i.e., *.DAT). The example data file for the discriminant analysis, DISCRIM.DAT, is given below. The data file is specified in line 2 of the command file (the FILE NAME command).
 57 28 41 13  2      60 27 51 16  2
 63 25 49 15  2      69 31 49 15  2
 60 30 48 18  3      50 33 14  2  1
 51 38 19  4  1      56 29 36 13  2
 67 31 44 14  2      65 30 58 22  3
 44 30 13  2  1      47 32 16  2  1
 72 30 58 16  3      68 28 48 14  2
The Output File
An output file must be specified interactively by the user each time a PC-MDS program is run. It is the file to which the results of the discriminant analysis are printed. A common convention is to name it with a "PRN" extension to signify a print file (i.e., *.PRN).
RUNNING THE DISCRIM PROGRAM
STEP 1: Enter EDITOR (a word processor or program editor that produces ASCII files will suffice) and prepare the command file and the data file.
STEP 2: Load the DISCRIM program by typing DISCRIM and then pressing the [ENTER] key.
A> DISCRIM [ENTER]
STEP 3: After the initial logo identifying the program, a message appears on the screen requesting the location and name of the command file.
+-----------------------------------------------------+
| ENTER THE NAME OF THE PC-MDS COMMAND FILE           |
|                                                     |
| USE THE FORM: DRV:FILENAME.EXT (e.g. B:STAT.SPS)    |
|                                                     |
|                                                     |
| A: DISCRIM.SPS                                      |
+-----------------------------------------------------+
RESPOND with the location and name of the command file.
A:DISCRIM.SPS [ENTER]
(This assumes the DISCRIM.SPS file is on the A: drive.) If the command file name is not acceptable, a message asks you to re-enter it.
STEP 4: If the command file was specified correctly, the next prompt pops up asking you to specify the location and name of the output file.
+----------------------------------------------------+
| ENTER THE NAME OF THE PC-MDS COMMAND FILE          |
|                                                    |
| USE THE FORM: DRV:FILENAME.EXT (e.g. B:STAT.SPS)   |
|                                                    |
| A:DISCRIM.SPS                                      |
|  +--------------------------------------------------+
+--| ENTER THE NAME OF THE FILE TO SAVE OUTPUT        |
   |                                                  |
   | USE THE FORM: DRV:FILENAME.EXT (e.g. B:STAT.PRN) |
   |                                                  |
   | A:DISCRIM.PRN                                    |
   +--------------------------------------------------+
Enter the name of the output file.
A:DISCRIM.PRN [ENTER]
(Assumes you want the output file DISCRIM.PRN in the A: drive).
If a file with the same name already exists, the following message appears on screen:
+-----------------------------------------+
| THIS OUTPUT FILE NAME ALREADY EXISTS!   |
| DO YOU WANT TO OVERWRITE IT ? (Y/N) Y   |
+-----------------------------------------+
STEP 5: Once the output file name is correctly entered, any error messages are displayed on screen.
ERROR MESSAGES: If errors are found, the program aborts. Make a note of the errors, then edit the command file to correct them.
If there were no errors, the following screen appears so that you can verify the data set chosen for the analysis:
+---------------------------------------------------------------------------+
| STMT#   #VARIABLES   FORMAT STATEMENT AND DATA                            |
|---------------------------------------------------------------------------|
|   1         5                                                             |
|                      (5F3.0)                                              |
+---------------------------------------------------------------------------+
+---------------------------------------------------------------------------+
| 5.700000e+001 2.800000e+001 4.100000e+001 1.300000e+001 2.000000e+000     |
|                                                                           |
+---------------------------------------------------------------------------+
+-----------------------------------+
| WAS THE DATA READ CORRECTLY? Y    |
+-----------------------------------+
The program next reads the first line of data, displays the input format for reading the data, and lists the values for the first data case. If the data is read incorrectly, you may re-specify the format statement. After you indicate that the data was read correctly, the program proceeds with the discriminant analysis of the data.
STEP 6: After you confirm that the data were read correctly, the following menu appears:
+-----------------------------------------------------+
| DISCRIMINANT ANALYSIS OPTIONS:                      |
|                                                     |
| 5 VARIABLES HAVE BEEN DECLARED.                     |
| SELECT THE APPROPRIATE OPTION:                      |
|                                                     |
| (1) SPECIFY THE VARIABLES FOR ANALYSIS              |
|     (VARIABLES ARE SPECIFIED BY SEQUENCE NUMBER)    |
| (2) VIEW A LIST OF VARIABLE NUMBERS                 |
| (3) QUIT PROGRAM                                    |
|                                                     |
| YOUR CHOICE : 1                                     |
+-----------------------------------------------------+
Once option 1 (specify the variables for analysis) is chosen, the program asks for the dependent variable.
+-------------------------------------------------+
| THE DEPENDENT VARIABLE IS THE Z VARIABLE IN     |
| THE FORMULA: Z = d0 + d1X1 + ... + dkXk.        |
| YOU ARE DISCRIMINATING BETWEEN GROUPS IN Z      |
|                                                 |
| +--------------------------------------------+  |
| | ENTER THE DEPENDENT (OR GROUPING) VARIABLE |  |
| +--------------------------------------------+  |
| 5                                               |
+-------------------------------------------------+
In the current data set, variable 5 is the dependent variable.
STEP 7: The message for specifying the dependent variable groups will appear:
+-----------------------------------------------------+
| SPECIFICATION OF DEPENDENT VARIABLE GROUPS:         |
|                                                     |
| ENTER GROUP NUMBERS ONE AT A TIME.                  |
| A blank space must follow each group number.        |
| The dash (-) may be used to simplify statements.    |
| PRESS ENTER to quit this menu                       |
| For example,                                        |
| 1 2 3 4 5 and 1 - 5 are equivalent statements.      |
| 1 - 3                                               |
+-----------------------------------------------------+
In this example, we enter groups 1 - 3 of the dependent variable for the discriminant analysis.
STEP 8: Specifying the independent variables:
+-----------------------------------------------------+
| INDEPENDENT VARIABLES SPECIFICATION:                |
|                                                     |
| ENTER INDEPENDENT VARIABLES ONE AT A TIME.          |
| A blank space must follow each variable number.     |
| The dash (-) may be used to simplify statements.    |
| PRESS ENTER to quit this menu                       |
| For example,                                        |
| 1 2 3 4 5 and 1 - 5 are equivalent statements.      |
|                                                     |
| 1 - 4                                               |
+-----------------------------------------------------+
Enter 1 - 4.
STEP 9: Once the variables are specified, the discriminant analysis menu appears:
+----------------------------------------------------------------------+
| STEPWISE DISCRIMINANT ANALYSIS MENU                                  |
|----------------------------------------------------------------------|
|RETURN IF MODIFICATIONS ARE COMPLETE...RUN DISCRIMINANT ANALYSIS NOW  |
|                                                                      |
| 1 NUMBER OF DEPENDENT VARIABLE GROUPS:                   3           |
|                                                                      |
| 2 NUMBER AND IDENTIFICATION OF INDEPENDENT VARIABLES     4           |
|                                                                      |
| 3 NUMBER OF GROUPS TO PLOT:                              3           |
|                                                                      |
| 4 WEIGHTS FOR THE COVARIANCE MATRIX?                     NO          |
|                                                                      |
| 5 CREATE OUTPUT FILE FOR CANONICAL VARIABLES?            NO          |
|                                                                      |
| 6 CREATE OUTPUT FILE FOR CANONICAL COEFFICIENTS?         NO          |
|                                                                      |
| 7 STRATIFY FIRST TWO CANONICAL VARIABLES ON THE          NO          |
|   BASIS OF THE THIRD CANONICAL VARIABLE                              |
| 8 TYPE OF PRIOR PROBABILITIES OF GROUP MEMBERSHIP:       0           |
|   0=EQUAL 1=ni/n 2=READ IN                                           |
+----------------------------------------------------------------------+
STEP 10: During the computations, the Discriminant Analysis Subproblem Menu appears. This menu controls the parameters for entering variables into the discriminant function.
+----------------------------------------------------+
| DISCRIMINANT ANALYSIS SUBPROBLEM MENU              |
| SELECT ITEM TO ENTER A NEW VALUE                   |
|                                                    |
| ITEM DESCRIPTION                   CURRENT VALUE   |
| ---------------------------        --------------- |
|  1 No changes, Start Analysis                      |
|  2 Max Number of steps             8               |
|  3 F Value for inclusion           .010000         |
|  4 F Value for deletion            .005000         |
|  5 Tolerance Level                 .000100         |
|  6 Control Delete Option:          NO              |
|  7 Print Posterior Prob.           NO              |
|  8 Print Sub-Optimum Functions:                    |
|  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0             |
|                                                    |
| ENTER CHOICE:                                      |
+----------------------------------------------------+
Press 1 to continue with the existing set-up of the program.
STEP 11: Once the computations are completed, the options for terminating the program will appear onscreen. The output will be in the DISCRIM.PRN file. Run the EDITOR or a word processing program to read the output file. The output file may be printed if desired. A printed copy of the output file produced by the above commands follows.
PC-MDS
STEPWISE DISCRIMINANT ANALYSIS
ANALYSIS TITLE BMD07M TEST DATA
INPUT DATA FILE DISCRIM.DAT
OUTPUT PRINT FILE DISCRIM.OUT
NO. OF VARIABLES 5
PARAMETERS SPECIFIED (8 VALUES):
NUMBER OF VARIABLES 4
NUMBER OF GROUPS 3
NO. GRPS TO PLOT ON EACH PAGE
IF CANONICAL ANALYSIS DONE 3
WEIGHTING OF COVARIANCE MATRIX NO
CANONICAL VARIABLE OUTPUT NO
CANONICAL COEFFICIENT OUTPUT NO
STRATIFICATION OF CANON. VARS. NO
PRIOR PROBABILITIES USED NO
DATA TREATED AS HAVING NO MISSING VALUES
DATA FOR RECORD: 1
.57E+02 .28E+02 .41E+02 .13E+02 .20E+01
DATA FOR RECORD: 150
.79E+02 .38E+02 .64E+02 .20E+02 .30E+01
GROUP SAMPLE SIZES COMPUTED:
NUMBER OF CASES IN EACH GROUP 50 50 50
PRIOR PROBABILITIES
.3333 .3333 .3333
MEANS (THE LAST COLUMN CONTAINS THE GRAND MEAN OF THE GROUPS IN THE ANALYSIS)
GROUP
AGROUP BGROUP CGROUP
VARIABLE
1 50.0600 59.3600 65.8800 58.4333
2 34.2800 27.7000 29.7400 30.5733
3 14.6200 42.6000 55.5200 37.5800
4 2.4600 13.2600 20.2600 11.9933
STANDARD DEVIATIONS
AGROUP BGROUP CGROUP
VARIABLE
1 3.5249 5.1617 6.3588
2 3.7906 3.1380 3.2250
3 1.7366 4.6991 5.5189
4 1.0539 1.9775 2.7465
WITHIN GROUPS COVARIANCE MATRIX
VARIABLES
1 2 3 4
VARIABLE
1 26.5008
2 9.2721 11.5388
3 16.7514 5.5244 18.5188
4 3.8401 3.2710 4.2665 4.1882
WITHIN GROUPS CORRELATION MATRIX
VARIABLES
1 2 3 4
VARIABLE
1 1.0000
2 .5302 1.0000
3 .7562 .3779 1.0000
4 .3645 .4705 .4845 1.0000
SUBPROBLEM 0
F-LEVEL FOR INCLUSION .0100
F-LEVEL FOR DELETION .0050
TOLERANCE LEVEL .0001
CONTROL VALUES 1111
***************************************************************************
STEP NUMBER 0
VARIABLE ENTERED
VARIABLES NOT INCLUDED AND F TO ENTER - DEGREES OF FREEDOM 2 147
1: 119.264 2: 49.160 3: 1180.161 4: 960.007
***************************************************************************
STEP NUMBER 1
VARIABLE ENTERED 3
VARIABLES INCLUDED AND F TO REMOVE - DEGREES OF FREEDOM 2 147
3: 1180.161
VARIABLES NOT INCLUDED AND F TO ENTER - DEGREES OF FREEDOM 2 146
1: 34.323 2: 43.035 4: 24.766
U-STATISTIC .05863 DEGREES OF FREEDOM 1 2 147
APPROXIMATE F 1180.16100 DEGREES OF FREEDOM 2 147.00
F MATRIX - DEGREES OF FREEDOM 1 147
GROUP
AGROUP BGROUP
GROUP
BGROUP 1056.874
CGROUP 2258.262 225.348
*******************************************************************************
STEP NUMBER 2
VARIABLE ENTERED 2
VARIABLES INCLUDED AND F TO REMOVE - DEGREES OF FREEDOM 2 146
2: 43.035 3: 1112.954
VARIABLES NOT INCLUDED AND F TO ENTER - DEGREES OF FREEDOM 2 145
1: 12.269 4: 34.569
U-STATISTIC .03688 DEGREES OF FREEDOM 2 2 147
APPROXIMATE F 307.10460 DEGREES OF FREEDOM 4 292.00
F MATRIX - DEGREES OF FREEDOM 2 146
GROUP
AGROUP BGROUP
GROUP
BGROUP 804.511
CGROUP 1473.231 116.038
*******************************************************************************
STEP NUMBER 3
VARIABLE ENTERED 4
VARIABLES INCLUDED AND F TO REMOVE - DEGREES OF FREEDOM 2 145
2: 54.577 3: 38.724 4: 34.569
VARIABLES NOT INCLUDED AND F TO ENTER - DEGREES OF FREEDOM 2 144
1: 4.721
U-STATISTIC .02498 DEGREES OF FREEDOM 3 2 147
APPROXIMATE F 257.50320 DEGREES OF FREEDOM 6 290.00
F MATRIX - DEGREES OF FREEDOM 3 145
GROUP
AGROUP BGROUP
GROUP
BGROUP 692.014
CGROUP 1381.162 133.373
*******************************************************************************
STEP NUMBER 4
VARIABLE ENTERED 1
VARIABLES INCLUDED AND F TO REMOVE - DEGREES OF FREEDOM 2 144
1: 4.721 2: 21.936 3: 35.590 4: 24.904
U-STATISTIC .02344 DEGREES OF FREEDOM 4 2 147
APPROXIMATE F 199.14540 DEGREES OF FREEDOM 8 288.00
F MATRIX - DEGREES OF FREEDOM 4 144
GROUP
AGROUP BGROUP
GROUP
BGROUP 550.189
CGROUP 1098.274 105.313
F LEVEL INSUFFICIENT FOR FURTHER COMPUTATION
FUNCTION
AGROUP BGROUP CGROUP
VARIABLE
1 2.35442 1.56982 1.24458
2 2.35879 .70725 .36853
3 -1.64306 .52114 1.27665
4 -1.73984 .64342 2.10791
CONSTANT -86.30845 -72.85261 -104.36830
GROUP WITH SQUARE OF DISTANCE FROM AND POSTEROR
LARGEST PROB. PROBABILITY FOR GROUP -
AGROUP AGROUP BGROUP CGROUP
CASE
1 AGROUP 2.4034 1.0000, 74.2557 .0000, 160.0998 .0000,
2 AGROUP .2419 1.0000, 90.6602 .0000, 181.5587 .0000,
3 AGROUP 4.4186 1.0000, 115.7274 .0000, 208.9456 .0000,
. . . . . . . .
CONTINUED FOR CASE 4 TO CASE 47
. . . . . . . .
48 AGROUP 10.6473 1.0000, 145.3974 .0000, 249.2639 .0000,
49 AGROUP 1.0069 1.0000, 92.3452 .0000, 179.2350 .0000,
50 AGROUP .5533 1.0000, 87.2894 .0000, 177.0700 .0000,
BGROUP AGROUP BGROUP CGROUP
CASE
1 BGROUP 84.7852 .0000, .3734 .9999, 19.4218 .0001,
2 CGROUP 149.0303 .0000, 8.4393 .1434, 4.8645 .8566,
3 BGROUP 131.6616 .0000, 8.4307 .9596, 14.7647 .0404,
. . . . . . . .
CONTINUED FOR CASE 4 TO CASE 47
. . . . . . . .
48 BGROUP 94.1567 .0000, 1.0258 .9998, 18.2519 .0002,
49 BGROUP 90.7313 .0000, 2.1033 .9999, 20.1282 .0001,
50 BGROUP 72.2566 .0000, 3.1074 1.0000, 32.9760 .0000,
CGROUP AGROUP BGROUP CGROUP
CASE
1 CGROUP 175.5659 .0000, 21.1320 .0660, 15.8331 .9340,
2 CGROUP 195.4075 .0000, 26.1183 .0000, 2.2160 1.0000,
3 CGROUP 194.3124 .0000, 22.4317 .0000, 1.3114 1.0000,
. . . . . . . .
CONTINUED FOR CASE 4 TO CASE 47
. . . . . . . .
48 CGROUP 161.2736 .0000, 13.3249 .0062, 3.1597 .9938,
49 CGROUP 184.8178 .0000, 24.9964 .0000, 3.8859 1.0000,
50 CGROUP 176.5572 .0000, 26.7791 .0005, 11.6560 .9995,
NUMBER OF CASES CLASSIFIED INTO GROUP -
ACTUAL AGROUP BGROUP CGROUP
---------------------
AGROUP| 50| 0| 0|
---------------------
BGROUP| 0| 48| 2|
---------------------
CGROUP| 0| 1| 49|
---------------------
SUMMARY TABLE
STEP VARIABLE F VALUE TO NUMBER OF U-STATISTIC
NUMBER ENTERED REMOVED ENTER OR REMOVE VARIABLES INCLUDED
1 3 1180.1610 1 .0586
2 2 43.0354 2 .0369
3 4 34.5687 3 .0250
4 1 4.7212 4 .0234
EIGENVALUES
32.19192 .28539 .00000 -.00000
CUMULATIVE PROPORTION OF TOTAL DISPERSION
.99121 1.00000 1.00000 1.00000
CANONICAL CORRELATIONS
.98482 .47120 .00049 .00123
COEFFICIENTS FOR CANONICAL VARIABLE -
ORIGINAL 1 2 3 4
VARIABLE
1 -.08294 -.00241 .22927 .22133
2 -.15345 -.21645 -.03570 -.26583
3 .22012 .09319 -.02152 -.30048
4 .28105 -.28392 -.16343 .42593
CONSTANT -2.10511 6.66147 -9.53701 1.37844
GROUP CANONICAL VARIABLES EVALUATED AT GROUP MEANS
1 -7.60760 -.21513 .00000 .00000
2 1.82505 .72790 .00000 .00000
3 5.78255 -.51277 .00000 .00000
THE FIRST 2 OF THE CANONICAL VARIABLES REFER TO DIFFERENCES BETWEEN GROUPS.
THE REMAINING REFER TO VARIATION WITHIN GROUPS WHICH IS INDEPENDENT OF
GROUP DIFFERENCES.
CHECK ON FINAL U-STATISTIC .02344
POINTS PLOTTED ON THE FOLLOWING GRAPH
X = FIRST CANONICAL VARIABLE
Y = SECOND CANONICAL VARIABLE
CASE NUMBER FOLLOWED BY * INDICATES THE POINT IS OFF THE GRAPH
GROUP AGROUP MEAN COORDINATES -7.608 -.215
CASE X Y CASE X Y CASE X Y
1 -6.771 .971 2 -7.672 .135 3 -8.582 -1.834
. . . . . . . .
CONTINUED FOR CASE 4 TO CASE 45
. . . . . . . .
46 -9.468 -1.825 47 -8.078 -.969 48 -9.850 -1.586
49 -7.586 -1.208 50 -7.490 .265
GROUP BGROUP MEAN COORDINATES 1.825 .728
CASE X Y CASE X Y CASE X Y
1 1.549 .593 2 4.498 .883 3 3.498 1.685
. . . . . . . .
CONTINUED FOR CASE 4 TO CASE 45
. . . . . . . .
46 2.257 1.427 47 2.479 1.941 48 1.956 1.154
49 1.750 .821 50 .606 1.943
GROUP CGROUP MEAN COORDINATES 5.783 -.513
CASE X Y CASE X Y CASE X Y
1 5.107 2.131 2 6.273 -1.649 3 6.292 -.467
. . . . . . . .
CONTINUED FOR CASE 4 TO CASE 45
. . . . . . . .
46 5.958 .094 47 5.361 -.646 48 4.996 -.188
49 5.807 -2.010 50 5.220 -1.468
PLOT OF FIRST VS SECOND CANONICAL VARIABLES
OVERLAP IS INDICATED BY $, GROUP MEANS BY *.
[Character plot: X = first canonical variable, axis from -10.608 to 8.783; Y = second canonical variable, axis from -6.299 to 6.269.]
COMPLETION OF STEPWISE DISCRIMINANT ANALYSIS
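Several printed quantities are tied together by standard identities, which makes the listing easy to sanity-check. The plain-Python sketch below copies values from the output (eigenvalues, classification FUNCTION coefficients, and squared distances) and recomputes: Wilks' U as the product of 1/(1 + eigenvalue), the canonical correlations as √(λ/(1 + λ)), the first function's share of dispersion, the group with the largest classification score for data record 1 (57 28 41 13), and posterior probabilities from squared distances with equal priors. These formulas are textbook relationships assumed to match the BMD07M computations; they are not taken from PC-MDS documentation, and the variable and function names are hypothetical.

```python
import math

# Values copied from the PC-MDS output listing above.
eigenvalues = [32.19192, 0.28539]                   # the two nonzero eigenvalues

wilks_u = 1.0
for lam in eigenvalues:
    wilks_u *= 1.0 / (1.0 + lam)                    # U = product of 1/(1 + lambda)
canonical_r = [math.sqrt(lam / (1.0 + lam)) for lam in eigenvalues]
first_share = eigenvalues[0] / sum(eigenvalues)     # cumulative proportion, function 1


def posteriors(sq_distances, priors=None):
    """Posterior probabilities: prior_k * exp(-d2_k / 2), normalized to sum to 1."""
    k = len(sq_distances)
    priors = priors or [1.0 / k] * k
    d_min = min(sq_distances)                       # shift by the minimum for stability
    w = [p * math.exp(-(d2 - d_min) / 2) for p, d2 in zip(priors, sq_distances)]
    total = sum(w)
    return [x / total for x in w]


# Linear classification functions (FUNCTION table): coefficients and constant.
coef = {
    "AGROUP": ([2.35442, 2.35879, -1.64306, -1.73984], -86.30845),
    "BGROUP": ([1.56982, 0.70725, 0.52114, 0.64342], -72.85261),
    "CGROUP": ([1.24458, 0.36853, 1.27665, 2.10791], -104.36830),
}
record_1 = [57, 28, 41, 13]
class_scores = {g: sum(c * v for c, v in zip(w, record_1)) + c0
                for g, (w, c0) in coef.items()}
predicted = max(class_scores, key=class_scores.get)
post = posteriors([149.0303, 8.4393, 4.8645])      # BGROUP case 2 (classified CGROUP)
print(round(wilks_u, 5), [round(r, 4) for r in canonical_r],
      round(first_share, 5), predicted, [round(p, 4) for p in post])
```

The recomputed U agrees with CHECK ON FINAL U-STATISTIC (.02344), record 1 (coded as group 2 in DISCRIM.DAT) is assigned to BGROUP, and the posteriors for BGROUP case 2 come out near .1434 and .8566, matching the printed values.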