79652 55462 12345 16523 46525 79665 65321 98653 46521 65435 32165 56523 65454 16589 98965 73195 15937 35079 62486 46428
These data represent the scores (on a 0 to 9 scale) of 20 students on five final exams (e.g. Math, English, History, Geography, Science). Can we say that the students' exam grades in the different subjects are related? The relationships among the grades are not directly measurable; they are latent. Grades in different courses could be related because of the student's intellectual capabilities, memory capacity, or simply interest. Although the test grades of one person may not be completely correlated with one another, it is reasonable to expect that the grades in all subject areas depend to some degree on general intelligence or on other factors common to the learning of the subject material. Accordingly, we may identify one or more factors that explain the 'common' portion of the variance in the original raw scores.
The basic objectives of a Factor Analysis are:
iv) To determine the amount of each factor possessed by each observation. (Identified by the factor scores)
Factor analysis is a decompositional analysis of a set of data. Data sets are traditionally in the form of an observations-by-variables matrix. Some researchers may, however, need to analyze data forms that do not conform to the traditional mode. For example, occasions (repeated measures) may be included, or data matrices could be transposed. Each of these data forms may be analyzed using factor analysis, but the result will be a decomposition of observations or occasions. Alternate forms of the factor analysis data matrix appear below. (The most common forms of factor analysis are R-Type, where factors are loaded by variables and are computed across the persons, and Q-Type, where factors are loaded by persons and are computed across the variables.)
The alternative modes of factor analysis can be portrayed graphically. The original data set is viewed as a variables-persons-occasions matrix. R-Type and Q-Type techniques deal with the variables-persons dichotomy. In contrast, P-Type and O-Type analyses are used for the occasions-variables situation, and S-Type and T-Type are used when the occasions-persons relationship is of interest.
             VARIABLES                   VARIABLES
           +-----------+               +-----------+
           ¦¦¦         ¦               +-----------¦
   PERSONS ¦¦¦ R-TYPE  ¦       PERSONS ¦  Q-TYPE   ¦
           ¦¦¦         ¦               +-----------¦
           +-----------+               +-----------+

             VARIABLES                   VARIABLES
           +-----------+               +-----------+
           ¦¦¦         ¦               +-----------¦
 OCCASIONS ¦¦¦ P-TYPE  ¦     OCCASIONS ¦  O-TYPE   ¦
           ¦¦¦         ¦               +-----------¦
           +-----------+               +-----------+

              PERSONS                     PERSONS
           +-----------+               +-----------+
           ¦¦¦         ¦               +-----------¦
 OCCASIONS ¦¦¦ S-TYPE  ¦     OCCASIONS ¦  T-TYPE   ¦
           ¦¦¦         ¦               +-----------¦
           +-----------+               +-----------+
DEFINITIONS
BIQUARTIMIN: The factor loadings matrix is transformed by an oblique (so the factors are correlated) rotation such that there is one variable with a large squared loading on the factor and the rest of the variable loadings on the factor would be close to zero.
COMMON FACTOR ANALYSIS: Factor analysis based upon a correlation matrix with values less than 1.0 on the diagonal. The values on the diagonal are known as communalities; they are inserted in the diagonal to represent only the common variance (excluding specific and error variance) that should be solved for by the factor analysis.
COMMUNALITY: The amount of variance in the variable shared with all other variables.
PRINCIPAL COMPONENTS ANALYSIS: One variety of factor analysis. The factors are based upon an analysis of the total variance in the original data. In application, this means that the factor analysis begins with a correlation matrix which has the value of '1' on the diagonal. Computationally, this implies that 100% of the variance is treated as common or shared between the variables. Other forms of factor analysis may begin with other values in the diagonal that reflect the amount of variance expected to be explained for each variable.
CORRELATION MATRIX: A table showing intercorrelation among all variables analyzed.
EIGENVALUE: The sum of squares of the loadings in a column in the factor matrix. Eigenvalues are also referred to as latent roots and represent the amount of variance accounted for by a factor.
FACTOR: The smaller set of underlying composite dimensions of all variables in the data set. Factors are linear combinations of the original variables.
FACTOR LOADINGS: These are the correlation coefficients between the variables and the factors. The variables with the highest correlations provide the most meaning (in an interpretation sense) to the factor solution.
FACTOR MATRIX: This k variable by m factor matrix contains the factor loadings of all variables on each factor.
FACTOR ROTATION: Given a cartesian coordinate system where the axes are the factors and the points are the variables, factor rotation is the process of holding the points constant and moving (rotating) the factor axes. The rotation is done in a manner so that the points are highly correlated with the axes and provide a more meaningful interpretation of the factor solution.
FACTOR SCORES: This is the score of each observation on the newly identified factors. This factor score is a linear combination of all of the original variables that were relevant in making the new factor.
GAMMA OF ROTATION: A user input parameter (spelled GAMA in the program menu) that leads to different rotation schemes. Standard values of gamma include 0 (for quartimax, quartimin, direct quartimin), .5 (for bi-quartimin), and 1.0 (for varimax and covarimin).
KAISER NORMALIZATION: A process by which each row of the initial factor loading matrix is normalized by dividing it by the square root of h_i^2, the row's communality. This normalization makes the squared loadings in each row sum to 1.0. This transformation does not affect the varimax solution.
OBLIMIN: Also called simple structure; the term refers to the rotated factor loadings matrix. Simple structure is difficult to define precisely: it describes the situation where most of the loadings on any specific factor are small and a few loadings are as large as possible.
OBLIQUE FACTOR SOLUTIONS: A computed factor solution where the extracted factors are not independent, but are correlated. In many situations, there is no a priori (or theoretical) reason why the factors should be independent of each other. The analysis is conducted to express the relationship between factors that may or may not be orthogonal, rather than arbitrarily constraining the factor solution so that the factors are independent of each other.
ORTHOGONAL: Refers to mathematical independence of the factors. Operationally, orthogonal factor axes are at right angles to each other (90°).
ORTHOGONAL FACTOR SOLUTIONS: The direction cosines of the angles between the factors in the factor solution correspond to the correlations between the factors. Orthogonality means no correlation and is synonymous with a 90° angle in a Cartesian coordinate system. Orthogonal factor solutions therefore extract the factors so that the factor axes are maintained at right angles. Thus each factor is independent of all other factors and the correlation between the factors is zero.
SQUARED FACTOR LOADINGS: Because loadings are the correlations between the variables and the factors, the squared factor loadings can be compared to R-Square in a regression analysis. The SQUARED FACTOR LOADING indicates the percentage of the variance of the original variable that is explained by the factor. For a given factor, the sum of these squared factor loadings is the eigenvalue or latent root associated with that factor.
TRACE: The sum of the numbers on the diagonal of the correlation matrix used in the factor analysis. In principal components analysis, the trace is equal to the number of variables, based on the assumption that the variance in each variable is equal to 1. In common factor analysis, the trace is equal to the sum of the communalities on the diagonal of the reduced correlation matrix, which is also equal to the amount of common variance for the variables being analyzed.
VARIMAX ROTATIONS: An orthogonal rotation of factors that redistributes the variance accounted within the pattern of factor loadings. Both the communalities and the total variance accounted for are the same before and after rotation. This procedure is the most commonly used to re-orient or clean up the loadings obtained in a principal components analysis.
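The relationships among several of these definitions can be checked numerically. The following is a minimal Python sketch, not part of the PC-MDS program; it assumes NumPy is available and uses the unrotated two-factor loadings printed in the sample output of this manual. The variable names are illustrative only.

```python
import numpy as np

# Unrotated loadings copied from the sample output (rows V1..V5, columns factor 1 and 2).
loadings = np.array([
    [ 0.55331,  0.71481],
    [ 0.82906,  0.15512],
    [ 0.74201, -0.10205],
    [-0.44063,  0.83096],
    [-0.58819,  0.13983],
])
squared = loadings ** 2                  # SQUARED FACTOR LOADINGS

eigenvalues = squared.sum(axis=0)        # EIGENVALUE: column sums of squares
communalities = squared.sum(axis=1)      # COMMUNALITY: row sums of squares
trace = loadings.shape[0]                # TRACE with 1's on the diagonal = number of variables
proportion = eigenvalues / trace         # share of total variance per factor

# KAISER NORMALIZATION: divide each row by the square root of its communality.
normalized = loadings / np.sqrt(communalities)[:, None]
row_sums = (normalized ** 2).sum(axis=1)  # each row now sums to 1.0

print(np.round(eigenvalues, 4))
print(np.round(communalities, 4))
print(np.round(proportion, 4))
print(np.round(row_sums, 4))
```

The column sums reproduce the first two eigenvalues of the sample run, and the row sums reproduce its final communalities, up to rounding of the printed loadings.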
The FACTOR program performs a factor analysis of up to 50 variables. The factoring may be done using either a raw data set or a correlation matrix. Initial communality estimates may be squared multiple correlations, regression variances, maximum absolute row values, or they may be specified by the user. If requested, the program will iterate on the initial communality estimates. Three types of rotations are available, all based on the oblimin criterion. In the first, the factors are restricted to be orthogonal, which yields, among others, quartimax and varimax rotations. In the second, the criterion is applied to the reference factor structure and the factors are allowed to be oblique, which yields standard oblimin rotations. In the third, the criterion is applied to the primary factor loadings, allowing the factors to be oblique and yielding simple loading rotations.
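To illustrate how a single gamma parameter selects among orthogonal rotations, here is a minimal Python sketch of an orthomax rotation. It is not the PC-MDS algorithm, which this manual does not describe; it uses a widely used SVD-based iteration, assumes NumPy, and the function name `orthomax` and its arguments are hypothetical. Its results are therefore not expected to match the program's printed rotated matrix.

```python
import numpy as np

def orthomax(loadings, gamma=1.0, max_iter=50, tol=1e-8):
    """Orthogonal rotation: gamma=1.0 gives varimax, gamma=0.0 gives quartimax."""
    n_vars, n_factors = loadings.shape
    rotation = np.eye(n_factors)
    previous = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        col_ss = (rotated ** 2).sum(axis=0)          # sum of squares per factor
        # Gradient-like target matrix of the orthomax criterion.
        target = loadings.T @ (rotated ** 3 - (gamma / n_vars) * rotated * col_ss)
        u, s, vt = np.linalg.svd(target)
        rotation = u @ vt                            # closest orthogonal matrix to the target
        if previous != 0.0 and s.sum() < previous * (1.0 + tol):
            break                                    # criterion stopped improving
        previous = s.sum()
    return loadings @ rotation, rotation

# Unrotated loadings taken from the sample output in this manual.
unrotated = np.array([
    [ 0.55331,  0.71481],
    [ 0.82906,  0.15512],
    [ 0.74201, -0.10205],
    [-0.44063,  0.83096],
    [-0.58819,  0.13983],
])
varimax_loadings, varimax_rotation = orthomax(unrotated, gamma=1.0)
quartimax_loadings, quartimax_rotation = orthomax(unrotated, gamma=0.0)
print(np.round(varimax_loadings, 4))
```

Because the rotation matrix is orthogonal, the communality of each variable is the same before and after rotation, whichever gamma is chosen.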
The PC-MDS Command File: This file defines the variables, their format and locations, the codes that denote missing values for different variables, and any recoding of variable values. Command files are usually named *.SPS. As an example of a command file for a FACTOR analysis we shall refer to the file FACTOR.SPS. The name of the PC-MDS command file is specified interactively by the user.
TITLE BMD08M TEST DATA
FILE NAME 'FACTOR.DAT'
DATA LIST V1 TO V5
5 (5F1.0)
VARIABLE LABELS
V1 'VARIABLE 1'
V2 'VARIABLE 2'
V3 'VARIABLE 3'
V4 'VARIABLE 4'
V5 'VARIABLE 5'
The Data File: The data file contains the data in the format described in the command file. The data files are usually named *.DAT. As an example of a data file for a FACTOR analysis we shall refer to the file FACTOR.DAT given below. The data file is specified in line 2 of the command file (the FILE NAME command).
79652
55462
12345
16523
46525
79665
65321
98653
46521
65435
32165
56523
65454
16589
98965
73195
15937
35079
62486
46428
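For readers who want to check the sample data outside PC-MDS, the following minimal Python sketch shows how the (5F1.0) format splits each record into five one-character fields. It assumes NumPy and assumes the records are separated by whitespace; the helper name `read_fixed_width` is hypothetical.

```python
import numpy as np

# The 20 records of FACTOR.DAT as listed above, separated by whitespace.
RAW = (
    "79652 55462 12345 16523 46525 79665 65321 98653 46521 65435 "
    "32165 56523 65454 16589 98965 73195 15937 35079 62486 46428"
)

def read_fixed_width(text, n_fields=5, width=1):
    """Split each record into n_fields columns of `width` characters each."""
    rows = []
    for record in text.split():
        rows.append([float(record[i * width:(i + 1) * width]) for i in range(n_fields)])
    return np.array(rows)

scores = read_fixed_width(RAW)
print(scores.shape)   # (20, 5)
print(scores[0])      # values of V1..V5 for the first record
```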
The Output File:
The output file is usually named *.PRN. The output includes:
1) Mean and Standard Deviation for the variables
2) Variance-Covariance Matrix
3) Correlation Matrix
4) N Matrix
5) Eigenvalues
6) Cumulative proportion of total variance
7) Proportion of Variance per Eigenvalue
8) Factor Matrix before rotation
9) Rotated Factor Matrix
10) Factor Score Coefficients
HOW TO RUN THE FACTOR PROGRAM
STEP 1: Enter the EDITOR (a word processor or program editor that produces ASCII files will suffice), and prepare the command file and the data file.
STEP 2: Load the FACTOR program. The program is loaded by simply typing FACTOR and then pressing the [ENTER] key.
A> FACTOR [ENTER]
STEP 3: After the initial logo identifying the program, the following message will appear on the screen.
+-----------------------------------------------------+
¦ ENTER THE NAME OF THE PC-MDS COMMAND FILE           ¦
¦                                                     ¦
¦ USE THE FORM: DRV:FILENAME.EXT (e.g. B:STAT.SPS)    ¦
¦                                                     ¦
¦                                                     ¦
¦ A:FACTOR.SPS                                        ¦
+-----------------------------------------------------+
RESPOND with the location and name of the command file:
A:FACTOR.SPS [ENTER]
(Assumes the FACTOR.SPS file is in the A: drive). If the format of entry of the command file was not acceptable, then a message will ask you to re-enter the command file name.
STEP 4: If the command file name was specified correctly, the next menu item will pop up asking you to specify the location and name of the output file.
+----------------------------------------------------+
¦ ENTER THE NAME OF THE PC-MDS COMMAND FILE          ¦
¦                                                    ¦
¦ USE THE FORM: DRV:FILENAME.EXT (e.g. B:STAT.SPS)   ¦
¦                                                    ¦
¦ A:FACTOR.SPS                                       ¦
¦  +--------------------------------------------------+
+--¦ ENTER THE NAME OF THE FILE TO SAVE OUTPUT        ¦
   ¦                                                  ¦
   ¦ USE THE FORM: DRV:FILENAME.EXT (e.g. B:STAT.PRN) ¦
   ¦                                                  ¦
   ¦                                                  ¦
   ¦ A:FACTOR.PRN                                     ¦
   +--------------------------------------------------+
Enter the name of the output file:
A:FACTOR.PRN [ENTER]
(Assumes you want the output file FACTOR.PRN written to the A: drive). If a file already exists with the same name, then the message will appear on screen:
+-----------------------------------------------------+
¦ THIS OUTPUT FILE NAME ALREADY EXISTS!               ¦
¦ DO YOU WANT TO OVERWRITE IT? (Y/N) Y                ¦
+-----------------------------------------------------+
STEP 5: Once the output file name is correctly entered, the initial computations required for reading the command file take place. Initial error messages associated with the command file, if any, will be displayed:
ERROR MESSAGES
ERROR: LINE # : MESSAGE
If errors are found, the program aborts. It is recommended that the user makes a note of the errors. The user must edit the Command file to correct the errors. The FACTOR program may then be rerun.
If there were no errors then the message on screen will be:
+-------------------------------------------------+
¦ WHAT TYPE OF DATA HAVE YOU SPECIFIED? ¦
¦ ¦
¦ +--------------------------------------------+ ¦
¦ ¦ PRESS ENTER IF RAW DATA ¦ ¦
¦ +--------------------------------------------+ ¦
¦ +--------------------------------------------+ ¦
¦ ¦ ENTER 1 IF CORRELATION MATRIX FILE ¦ ¦
¦ +--------------------------------------------+ ¦
¦ ¦
+-------------------------------------------------+
The file FACTOR.DAT contains raw data. Press ENTER to continue.
STEP 6: The program next reads the first line of data, displays the input format for reading the data, and lists the values for the first data case. If the data is read incorrectly, you may re-specify the format statement, and then the program proceeds with the factor analysis of the data.
+------------------------------------------------------------------------+
¦ STMT# #VARIABLES FORMAT STATEMENT AND DATA ¦
¦------------------------------------------------------------------------¦
¦ 1 5 ¦
¦ (5F1.0) ¦
+------------------------------------------------------------------------+
+------------------------------------------------------------------------+
¦ 7.00000e+000 9.00000e+000 6.00000e+000 5.00000e+000 2.00000e+000 ¦
+------------------------------------------------------------------------+
+-----------------------------------+
¦ WAS THE DATA READ CORRECTLY? Y ¦
+-----------------------------------+
STEP 7: The option to SPECIFY THE VARIABLES is next presented.
+-----------------------------------------------------+
¦ FACTOR ANALYSIS PROGRAM OPTIONS:                    ¦
¦                                                     ¦
¦ 5 VARIABLES HAVE BEEN DECLARED.                     ¦
¦ SELECT THE APPROPRIATE OPTION:                      ¦
¦                                                     ¦
¦ (1) SPECIFY THE VARIABLES FOR ANALYSIS              ¦
¦     (VARIABLES ARE SPECIFIED BY SEQUENCE NUMBER)    ¦
¦ (2) VIEW A LIST OF VARIABLE NUMBERS                 ¦
¦ (3) QUIT PROGRAM                                    ¦
¦                                                     ¦
¦ YOUR CHOICE : 1                                     ¦
+-----------------------------------------------------+
Option 1 is selected to specify the variables that are to be included in the analysis. Enter the variables you want to study and press enter. The variables selected are listed.
+-------------------------------------------------------+
¦ FACTOR ANALYSIS VARIABLES SPECIFICATION: ¦
¦ ¦
¦ ENTER VARIABLES ONE AT A TIME. ¦
¦ A blank space must follow each variable number. ¦
¦ The dash (-) may be used to simplify statements. ¦
¦ PRESS ENTER to quit this menu ¦
¦ For example, ¦
¦ 1 2 3 4 5 and 1 - 5 are equivalent statements. ¦
¦ ¦
¦ 1 - 5 ¦
+-------------------------------------------------------+
+------------------------------------------------------------------------+
¦ SELECTED VARIABLES ¦
+------------------------------------------------------------------------+
+------------------------------------------------------------------------+
¦ 1 V1 2 V2 3 V3 4 V4 5 V5 ¦
¦ ¦
¦ VARIABLES CORRECT? Y ¦
+------------------------------------------------------------------------+
STEP 8: The option for saving the correlation matrix separately for future usage is next presented. ENTER a FILENAME.EXT or press ENTER to continue.
+-------------------------------------------------+
¦ THE CORRELATION MATRIX MAY BE SAVED TO BE USED  ¦
¦ IN FUTURE ANALYSES. (REDUCES PROCESSING TIME)   ¦
¦                                                 ¦
¦ PRESS ENTER TO CONTINUE WITHOUT SAVING          ¦
¦ +---------------------------------------------+ ¦
¦ ¦ ENTER THE NAME OF THE CORRELATION MATRIX.   ¦ ¦
¦ +---------------------------------------------+ ¦
¦                                                 ¦
+-------------------------------------------------+
STEP 9: Next the FACTOR MENU will appear:
+----------------------------------------------------------------------+
¦                        F A C T O R   M E N U                         ¦
¦----------------------------------------------------------------------¦
¦ RETURN IF MODIFICATIONS ARE COMPLETE...RUN FACTOR ANALYSIS NOW       ¦
¦                                                                      ¦
¦ 1 MAXIMUM ITERATIONS FOR COMMUNALITY: 1                              ¦
¦                                                                      ¦
¦ 2 TYPE OF ROTATION: 1                                                ¦
¦   0=NO ROTATION; 1=ORTHOGONAL; 2=OBLIMIN; 3=OBLIQUE                  ¦
¦ 3 MAXIMUM NUMBER OF ITERATIONS IN ROTATION: 50                       ¦
¦                                                                      ¦
¦ 4 VALUE OF GAMA FOR ROTATION: 1.000                                  ¦
¦   0=STD VALUE; .5=BIQUARTIMIN; 1=VARIMAX (ORTHOGONAL)                ¦
¦ 5 NUMBER OF FACTORS TO ROTATE (#VAR/2=DEFAULT): 2                    ¦
¦                                                                      ¦
¦ 6 MINIMUM EIGENVALUE CUTOFF VALUE (ROTATE IF >CUTOFF):1.000          ¦
¦                                                                      ¦
¦ 7 IS KAISER NORMALIZATION TO BE USED FOR ROTATION? NO                ¦
¦                                                                      ¦
¦ 8 UPPER LIMIT ON CORRELATIONS OF BI-FACTORS IF                       ¦
¦   OBLIMIN ROTATION IS PERFORMED (DEFAULT=.95): .950                  ¦
+----------------------------------------------------------------------+
STEP 10: Once the variables and the options for the program are entered, the computations will begin:
+---------------------------------------+
¦PLEASE WAIT...COMPUTATIONS IN PROGRESS ¦
+---------------------------------------+
Once the statistical analysis is complete, the program prompts the user with an options menu. The QUIT PROGRAM option should be used to exit the FACTOR program.
+---------------------------------------------+
¦ FACTOR ANALYSIS PROGRAM OPTIONS:            ¦
¦ SELECT THE APPROPRIATE OPTION:              ¦
¦                                             ¦
¦ (0) QUIT PROGRAM                            ¦
¦ (1) START A NEW ANALYSIS (NEW COMMAND FILE) ¦
¦ (2) FURTHER ANALYSIS WITH CURRENT DATA      ¦
¦ (3) PRODUCE FACTOR SCORES                   ¦
¦                                             ¦
¦ YOUR CHOICE:                                ¦
¦                                             ¦
+---------------------------------------------+
STEP 11: The output will be in the FACTOR.PRN file. Run the EDITOR or a word processing program to read the output file. The output file may be printed if desired. A printed copy of the output file produced by the above commands follows.
PC-MDS
FACTOR ANALYSIS
ANALYSIS TITLE BMD08M TEST DATA
INPUT DATA FILE A:FACTOR.DAT
OUTPUT PRINT FILE A:FACTOR.PRN
NO. OF VARIABLES 5
DATA TREATED AS HAVING NO MISSING VALUES
DATA FOR RECORD: 1
.70E+01 .90E+01 .60E+01 .50E+01 .20E+01
DATA FOR RECORD: 20
.40E+01 .60E+01 .40E+01 .20E+01 .80E+01
VARIABLE MEAN STAND. DEV. MINIMUM MAXIMUM
V1 4.7500 2.53138 1.00000 9.00000
V2 5.4500 2.08945 2.00000 9.00000
V3 4.4500 2.28208 .00000 9.00000
V4 4.6500 2.32322 2.00000 9.00000
V5 4.6500 2.36810 1.00000 9.00000
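The descriptive statistics in this table can be reproduced independently. The sketch below is a minimal Python example, assuming NumPy; it is not the program's own code. It treats the standard deviation as the sample standard deviation (divisor n-1), which is an assumption that happens to agree with the printed values.

```python
import numpy as np

RAW = (
    "79652 55462 12345 16523 46525 79665 65321 98653 46521 65435 "
    "32165 56523 65454 16589 98965 73195 15937 35079 62486 46428"
)
# One row per student, one column per variable (V1..V5).
scores = np.array([[float(ch) for ch in record] for record in RAW.split()])

means = scores.mean(axis=0)
std_devs = scores.std(axis=0, ddof=1)    # sample standard deviation (n - 1)
minimums = scores.min(axis=0)
maximums = scores.max(axis=0)

for j in range(scores.shape[1]):
    print(f"V{j + 1} {means[j]:8.4f} {std_devs[j]:9.5f} {minimums[j]:9.5f} {maximums[j]:9.5f}")
```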
CORRELATION MATRIX
V1 .10000E+01
V2 .42042E+00 .10000E+01
V3 .17538E+00 .61757E+00 .10000E+01
V4 .22597E+00 -.20438E+00 -.27647E+00 .10000E+01
V5 -.37534E+00 -.20051E+00 -.12515E+00 .38792E+00 .10000E+01
1 2 3 4 5
N-MATRIX
V1 20
V2 20 20
V3 20 20 20
V4 20 20 20 20
V5 20 20 20 20 20
1 2 3 4 5
FACTOR ANALYSIS SUMMARY STATISTICS
NUMBER OF CASES 20
NUMBER OF VARIABLES 5
MAX. ITERATIONS FOR COMMUNALITIES 1
MAX. ITERATIONS FOR ROTATION 50
NUMBER OF FACTORS TO BE ROTATED 2
EIGENVALUE CUTOFF CONSTANT 1.000000
UPPER LIMIT ON CORRELATION COEFFICIENT .95000
DIAGONAL ELEMENTS ARE UNALTERED
VARIMAX ROTATION IS PERFORMED
EIGENVALUES
2.08418 1.25547 1.04697 .36381 .24957
CUMULATIVE PROPORTION OF TOTAL VARIANCE
.41684 .66793 .87732 .95009 1.00000
PROPORTION OF VARIANCE PER EIGENVALUE
VARIANCE PERCENT
       ..............................................................
       .                                                            .
 .4168 .***********                                                 .
       .***********                                                 .
       .***********                                                 .
       .***********                                                 .
 .2779 .***********                                                 .
       .***********                                                 .
       .*********** ***********                                     .
       .*********** *********** ***********                         .
 .1389 .*********** *********** ***********                         .
       .*********** *********** ***********                         .
       .*********** *********** ***********                         .
       .*********** *********** *********** ***********             .
       .*********** *********** *********** *********** *********** .
       ..............................................................
EIGENVALUE   0           0           0           0           0
             1           2           3           4           5
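The eigenvalues and variance proportions reported above can be recomputed from the raw scores. The following minimal Python sketch assumes NumPy and assumes, as the "DIAGONAL ELEMENTS ARE UNALTERED" line suggests, that the analysis is run on the correlation matrix with 1's on the diagonal. It is an independent check, not the program's routine.

```python
import numpy as np

RAW = (
    "79652 55462 12345 16523 46525 79665 65321 98653 46521 65435 "
    "32165 56523 65454 16589 98965 73195 15937 35079 62486 46428"
)
scores = np.array([[float(ch) for ch in record] for record in RAW.split()])

# Correlation matrix of the five variables (columns).
corr = np.corrcoef(scores, rowvar=False)

# Eigenvalues (latent roots) of the symmetric matrix, largest first.
eigenvalues = np.sort(np.linalg.eigvalsh(corr))[::-1]

proportion = eigenvalues / eigenvalues.sum()   # the sum equals the trace, i.e. 5
cumulative = np.cumsum(proportion)

print(np.round(eigenvalues, 5))
print(np.round(proportion, 4))
print(np.round(cumulative, 5))
```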
VARIABLE ESTIMATED FINAL
COMMUNALITY COMMUNALITY
V1 1.000000 .817099
V2 1.000000 .711402
V3 1.000000 .560990
V4 1.000000 .884644
V5 1.000000 .365514
FACTOR MATRIX BEFORE ROTATION
VAR# VARIABLE NAME FACTOR
1 2
1 V1 .55331 .71481
2 V2 .82906 .15512
3 V3 .74201 -.10205
4 V4 -.44063 .83096
5 V5 -.58819 .13983
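An unrotated factor matrix of this kind can be obtained by scaling each retained eigenvector by the square root of its eigenvalue. The minimal Python sketch below assumes NumPy and a principal components extraction of two factors; the sign convention (each column flipped so that V1 loads positively) is an assumption chosen for readability, since eigenvector signs are arbitrary.

```python
import numpy as np

RAW = (
    "79652 55462 12345 16523 46525 79665 65321 98653 46521 65435 "
    "32165 56523 65454 16589 98965 73195 15937 35079 62486 46428"
)
scores = np.array([[float(ch) for ch in record] for record in RAW.split()])
corr = np.corrcoef(scores, rowvar=False)

values, vectors = np.linalg.eigh(corr)           # eigenvalues in ascending order
order = np.argsort(values)[::-1][:2]             # indices of the two largest roots

# Loadings: eigenvectors scaled by the square roots of their eigenvalues.
loadings = vectors[:, order] * np.sqrt(values[order])
loadings *= np.sign(loadings[0])                 # assumed sign convention: V1 positive

# Final communalities: variance of each variable reproduced by the two factors.
communalities = (loadings ** 2).sum(axis=1)

print(np.round(loadings, 5))
print(np.round(communalities, 6))
```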
ORTHOGONAL ROTATION
ITERATION SIMPLICITY
CRITERION
0 -1.095068
1 -1.095877
2 -1.095877
FACTOR - 1 VARIANCE ACCOUNTED FOR: .4168
VARIABLE
2 V2 .82062
3 V3 .74606
5 V5 -.59424
1 V1 .51822
4 V4 -.48016
FACTOR - 2 VARIANCE ACCOUNTED FOR: .2511
VARIABLE
4 V4 .80876
1 V1 .74064
2 V2 .19489
5 V5 .11132
3 V3 -.06618
ROTATED FACTOR MATRIX:
VAR# VARIABLE NAME FACTOR
1 2
1 V1 .51822 .74064
2 V2 .82062 .19489
3 V3 .74606 -.06618
4 V4 -.48016 .80876
5 V5 -.59424 .11132
FACTOR SCORE COEFFICIENTS
VAR# VARIABLE NAME FACTOR
1 2
1 V1 .2377 .5815
2 V2 .3914 .1426
3 V3 .3595 -.0640
4 V4 -.2431 .6509
5 V5 -.2873 .0976
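One plausible way to turn these coefficients into factor scores is to multiply each observation's standardized values by the coefficient matrix. The minimal Python sketch below assumes NumPy and assumes that scores are formed from variables standardized with the sample standard deviation; the manual does not state the program's exact scoring convention, so treat this as an illustration rather than a reproduction of the PRODUCE FACTOR SCORES option.

```python
import numpy as np

RAW = (
    "79652 55462 12345 16523 46525 79665 65321 98653 46521 65435 "
    "32165 56523 65454 16589 98965 73195 15937 35079 62486 46428"
)
scores = np.array([[float(ch) for ch in record] for record in RAW.split()])

# Standardize each variable (assumption: sample standard deviation, n - 1).
z = (scores - scores.mean(axis=0)) / scores.std(axis=0, ddof=1)

# Factor score coefficients copied from the listing above (rows V1..V5).
coefficients = np.array([
    [ 0.2377,  0.5815],
    [ 0.3914,  0.1426],
    [ 0.3595, -0.0640],
    [-0.2431,  0.6509],
    [-0.2873,  0.0976],
])

# Each factor score is a linear combination of the standardized variables.
factor_scores = z @ coefficients

print(np.round(factor_scores[:3], 4))    # scores of the first three students
```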
FACTOR ANALYSIS COMPLETE, NORMAL END OF PROGRAM