CONSTRUCTING A COMMAND FILE FOR THE FREQUENCY, REGRESSION, DISCRIMINANT AND FACTOR PROGRAMS


Several of the PC-MDS programs use an SPSS (c) type of command language. This command language facilitates a broad range of data manipulation options. Because the multidimensional scaling procedures most often focus on aggregate data, the Frequency, Regression, Discriminant, and FACTOR analysis programs are the only ones which operate using this command structure.

The user must prepare two files in order to run each of these PC-MDS programs.

REQUIRED FILES            SOURCE OF FILE
PC-MDS COMMAND FILE => Is created with the PC-MDS EDITOR or your word processor.
See "THE PC-MDS COMMAND FILE" documentation. Unless new variable transformations are required for different analyses, the command file is the same for the FACTOR, REGRESSION, DISCRIMINANT and FREQUENCY programs.
ASCII DATA FILE => Is created using the PC-MDS EDITOR or your word processor, saving the file as a DOS text file.
The Data File may also be downloaded from a mainframe computer or produced by another computer program. The Data File MUST be an ASCII file (numbers and letters only). ASCII Data Files do not use commas and do not require spaces to separate variables.
Example:
1775113118861141175114117113115817711771114711567430829120987615347821096453 
05720938761452209876154360295318742198763450305982174610987624531098762453 
057330987416526644142666441646642235132432343324323433142244346643236243234333 
0574544334352542354322111023313435 
57   1   2   2   2   2

The same data file is used for all analyses.

PC-MDS OUTPUT FILE => Is automatically created by PC-MDS.
The name is specified interactively when running each PC-MDS program.


RUNNING THE PROGRAMS

Each of the command-file-driven PC-MDS programs prompts the user for the names of the three files that are required for all analyses: the Command file, the Data file, and the Output file. The following description identifies the contents of each file.

The PC-MDS Command File

The command file defines the various variables, their format and locations, defines missing values for variables and recodes the values of the variables, if desired. The same command file may be used for all analyses.

For purposes of clarification, the command files are designated as files with an "SPS" extension (i.e., *.SPS). The name of the PC-MDS command file is specified interactively by the user when each program is run. (Note that the .SPS designation is used for instructional clarity only. The command file may have any name and does not require the .SPS extension).


SAMPLE COMMAND FILES FOR DISCRIMINANT, FACTOR, REGRESS, FREQUENCY ONLY

EXAMPLE # 1

TITLE SAMPLE COMMAND FILE 
FILE NAME 'SAMPLE.DAT'
DATA LIST  V1 TO V20, V21 
21  (3X,F4.0,19F4.1,F4.0) 
VARIABLE LABELS V1 'CITY' 
V2 'SEX' 
V3 'FIRST NEWS SOURCE' 
V4 'SECOND NEWS SOURCE'
V5 'THIRD NEWS SOURCE' 
V20 'IMP ADS' 
V21 'READ HERALD' 
COMPUTE V1 = 20*0.5 
COMPUTE V2 = 81**.5 
IF(V1 EQ 10) V33=1 
IF(V2 EQ  9) V34=2 
IF(V1 EQ 10) V35=1 
COMPUTE V29 = COS(0.5+0.5*2) 
COMPUTE V39 = COS(0.5+0.5*2) 
VARIABLE LABELS    V29 'AGE' 
V39 'INCOME'  
MISSING VALUES  V29 (0)/V21(1110)/V1 TO V19 (5,55) 
RECODE V5,V7 TO V10 (LOW TO 10 = 55) (LOW TO 20 = 66)/
V12 TO V15 (1 TO HIGH = 5) / 
V19 TO V20 (1,2,3 = 7) (4,5,6= 8)


EXAMPLE # 2

TITLE 7-11 CREDIT CARD ANALYSIS 
FILE NAME     'SEVEN.DAT' 
DATA LIST     V1 TO V30  
30    (12F1.0,F2.0,2F1.0,9F3.1,5F1.0,F3.0) 
VARIABLE LABELS    V1 'AVERAGE WEEKLY VISITS' 
V2 'AVERAGE WEEKLY GAS PURCHASES' 
V3 'GAS QUALITY IMPORTANCE' 
V4 '7-11 GAS QUALITY IMPRESSION' 
V5 'SERVICE QUALITY' 
V6 'MERCHANDISE QUALITY' 
V7 'HIGHEST QUALITY GAS'  
V8 'CREDIT CARD FOR GAS' 
V9 'MOST COMMON GAS PAYMENT' 
V10 'SECOND GAS PAYMENT CHOICE' 
V11 'HOW OFTEN GAS CREDIT CARD PAID' 
V12 'HOW FAR AWAY FROM 7-11' 
V13 'AGE' 
V14 'SEX' 
IF (V2 LE 1) NEWVAR=1
IF ((V2 GT 2) AND (V4 LE 3)) NEWVAR=2
IF ((V2 GT 5) AND (V4 GT 3)) NEWVAR=3
COMPUTE V31 = COS(0.5+0.5*2)
COMPUTE V32 = LG10(V1)
MISSING VALUES V1 TO V12,V14 TO V29(0) 
RECODE V13 (0 TO 17=1) (18 TO 23=2) (24 TO 29=3) 
(30 TO 39=4) (40 TO 49=5) (50 TO 64=6) 
(65 TO 72=7)/ V14 (4=0) (6=0) 
SELECT IF ( V14 EQ 2)


The Data File

The data file contains the data in the format described in the Command File. Data files are usually named with a "DAT" extension (i.e., *.DAT). Example # 1 above identifies the data file as 'SAMPLE.DAT'. The data file is specified in line 2 of the command file (the FILE NAME command).

(Note that the .DAT designation is used for instructional clarity only. The data file may have any name and does not, in reality, require the .DAT extension).


SAMPLE PC-MDS DATA FILE FOR DISCRIM, FACTOR, REGRESS OR FREQ

533351221103241110                         41112307 
4224432141111721 5              16         41122308 
3233532212122011 5                         41121309 
544342221505271110                         41112310 
5423511251013222 5           16            41122311 
452342125101291311      6                  41121312 
322343215111311312     11                  41121313 
5544532115513113 6           32            41121314 
4335524111312913107    12                  41121315 
1111542011143021 5      6                 541121316 
2334421112223613208                        41122317 
4244522151413014115     7  3    2         541122318 
342342121502261215                         41122319 


The Output File

An output file name must be interactively specified by the user while running each of the PC-MDS programs. The output file is the file to which the analysis is printed. A common convention is to name the file with a "PRN" extension to signify a print file (i.e., *.PRN).


CONSTRUCTING A COMMAND FILE FOR PC-MDS

The COMMAND file contains information that defines the variables to be included in the analysis. The command language is very similar to that employed in the mainframe version of the SPSS (c) computer program. The major difference is that statistical options for running each specific program are specified interactively rather than in the command file. For those familiar with SPSS, the COMMAND file employs the command options: PRINTBACK, TITLE, FILE NAME, DATA LIST, VARIABLE LABELS, MISSING VALUES, RECODE, COMPUTE, IF, SELECT IF, and WEIGHT. The following section describes the syntax for each command.

1. PRINTBACK (OPTIONAL)

Use: To print the command file as part of the output file. The default is PRINTBACK YES.

PRINTBACK NO (Does not print command file)

2. TITLE (REQUIRED)

Use: To specify a title.

The word TITLE is followed by one or more blank spaces and a title (maximum length 50 characters). For example:

TITLE MATURE MARKET STUDY

3. FILE NAME (REQUIRED)

Use: To specify the name and location of the data file.

Specify the complete path for reading the data file. The path and file name must be placed in single quotes. The entire path and file name are limited to 50 characters. Generic examples of the FILE NAME command include:

FILE NAME 'C:\SUB1\FILENAM.EXT' or

FILE NAME 'FILENAM.EXT'

In the first example, the data file is called FILENAM.EXT and is found in the subdirectory called SUB1 within the C drive. In the second example the data file is found in the default directory and is called FILENAM.EXT.

4. DATA LIST (REQUIRED)

Use: To specify the variables to be read in from the data file, as well as the column in which each variable is found in the data file.

The DATA LIST command identifies the variables included in the data file. The data file name is specified by the FILE NAME command. The DATA LIST has two parts. The first part specifies the list of variables. The second part specifies how the data is to be read. An example DATA LIST, with the variable list followed by the format lines, appears as follows:

DATA LIST V17,V28 TO V37,V40,V41,V49,V108 TO V116,

V145,V146,V148,CLUS1

V149,V151,V152,CLUS2

V155,V156,V158,CLUS3

14 (34X,F1.0,10X,9F1.0,F2.0,3X,2F1.0,7X,F1.0)

9 (57X,9F1.0)

4 (25X,2F1.0,2X,F1.0/19X,F1.0)

The words DATA and LIST must be separated by a space and followed by a space. The DATA LIST specification of variables can continue on to several lines of the file, as long as the first columns are blank. The variable names may be up to 8 characters long and must start with a letter (variable names cannot start with a number). The "TO" convention may be used to specify a continuously numbered sequence. For example, V28 TO V37 is used to specify 10 variables. The word "TO" must be preceded and followed by a blank space. Up to 500 total variables may be defined in the Command file, including those created using COMPUTE or IF statements. The maximum number of variables that can be included in a given analysis varies by analysis. For example, up to 50 of the 500 defined variables may be specified interactively for inclusion in the DISCRIM, FACTOR, and REGRESS programs.
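The "TO" convention can be illustrated with a short Python sketch. This is not part of PC-MDS; the function name expand_variable_list is hypothetical, and the sketch assumes that a range always joins two names with the same letter prefix and ascending numbers.

```python
import re

def expand_variable_list(spec):
    """Expand a DATA LIST variable list, resolving 'A TO B' ranges."""
    names = []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing comma at the end of a line
        match = re.fullmatch(r"([A-Za-z]+)(\d+)\s+TO\s+([A-Za-z]+)(\d+)", item)
        if match:
            prefix1, start, prefix2, end = match.groups()
            # Assumption: both ends share a prefix and the range ascends.
            if prefix1 != prefix2 or int(end) < int(start):
                raise ValueError(f"bad range: {item}")
            names.extend(f"{prefix1}{n}" for n in range(int(start), int(end) + 1))
        else:
            names.append(item)
    return names

variables = expand_variable_list(
    "V17,V28 TO V37,V40,V41,V49,V108 TO V116,V145,V146,V148,CLUS1"
)
print(len(variables))  # 27
```

Here V28 TO V37 contributes 10 names and V108 TO V116 contributes 9, so the list expands to 27 variables in total.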

The second part of the DATA LIST is the input format for reading each of the variables identified in the first part. The format specification may be up to 15 lines in length (This is 15 lines of specification, not 15 lines per respondent in the data file). Each line begins in column 1 by indicating the number of variables to read. This number is followed by one or more blank spaces and then the FORTRAN format statement for reading the number of variables specified. Note that each format line may specify more than one line of data in the data file. The format statement in each format line is limited to 120 characters in length. Examples of valid statements include:

14 (34X,F1.0,10X,9F1.0,F2.0,3X,2F1.0,7X,F1.0)

9 (57X,9F1.0)

4 (25X,2F1.0,2X,F1.0/19X,F1.0)

The example is a valid DATA LIST sequence for reading 27 variables from 4 lines in the data file. Note that the last format specification reads from two lines within the data file (the "/" causes a skip to the next line of data).
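The counts in this example (27 variables, 4 data lines) can be checked mechanically. The following Python sketch is illustrative only: parse_format is a hypothetical helper that understands just the nX, Fw.d and "/" descriptors used above, and it assumes that each format line starts on a new line of the data file.

```python
import re

def parse_format(fmt):
    """Turn '(3X,2F1.0/19X,F1.0)' into a flat list of edit descriptors.

    Returns tuples: ('X', n) skips n columns, ('F', w, d) is a numeric
    field, and ('/',) advances to the next data line.
    """
    body = fmt.strip()
    if not (body.startswith("(") and body.endswith(")")):
        raise ValueError("format must be enclosed in parentheses")
    descriptors = []
    # Split on commas, keeping each '/' as its own token.
    for token in re.findall(r"/|[^,/]+", body[1:-1]):
        token = token.strip().upper()
        if token == "/":
            descriptors.append(("/",))
        elif m := re.fullmatch(r"(\d+)X", token):
            descriptors.append(("X", int(m.group(1))))
        elif m := re.fullmatch(r"(\d*)F(\d+)\.(\d+)", token):
            repeat = int(m.group(1) or 1)
            descriptors.extend([("F", int(m.group(2)), int(m.group(3)))] * repeat)
        else:
            raise ValueError(f"unsupported descriptor: {token}")
    return descriptors

format_lines = [
    (14, "(34X,F1.0,10X,9F1.0,F2.0,3X,2F1.0,7X,F1.0)"),
    (9, "(57X,9F1.0)"),
    (4, "(25X,2F1.0,2X,F1.0/19X,F1.0)"),
]

total_fields = 0
total_records = 0
for declared, fmt in format_lines:
    parsed = parse_format(fmt)
    fields = sum(1 for d in parsed if d[0] == "F")
    if fields != declared:
        raise ValueError(f"{fmt} reads {fields} fields, not {declared}")
    total_fields += fields
    # One data line per format line, plus one more for every '/'.
    total_records += 1 + sum(1 for d in parsed if d[0] == "/")

print(total_fields, total_records)  # 27 4
```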

Preparing A Format Statement: Format statements tell PC-MDS how the data file is to be read for a single observation or respondent. Format statements are prepared using the standard FORTRAN language coding conventions. These conventions are quite simple and easy to learn and have previously been discussed in this manual.
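As a reminder of how a format statement maps onto a line of data, the following Python sketch reads one fixed-width record. It is an approximation of standard FORTRAN behavior, not PC-MDS code: read_record is a hypothetical name, only the nX and Fw.d descriptors are handled, an entirely blank field is assumed to read as zero, and the sample record is invented for illustration.

```python
import re

def read_record(line, fmt):
    """Read one fixed-width data line using X (skip) and F (numeric) descriptors."""
    values = []
    column = 0
    for token in fmt.strip("() ").split(","):
        token = token.strip().upper()
        skip = re.fullmatch(r"(\d+)X", token)
        field = re.fullmatch(r"(\d*)F(\d+)\.(\d+)", token)
        if skip:
            column += int(skip.group(1))
        elif field:
            repeat = int(field.group(1) or 1)
            width, decimals = int(field.group(2)), int(field.group(3))
            for _ in range(repeat):
                text = line[column:column + width].strip()
                column += width
                if text == "":
                    values.append(0.0)  # assumption: a blank field reads as zero
                elif "." in text:
                    values.append(float(text))  # an explicit decimal point wins
                else:
                    # No decimal point: the last `decimals` digits are fractional.
                    values.append(int(text) / 10 ** decimals)
        else:
            raise ValueError(f"unsupported descriptor: {token}")
    return values

# Hypothetical record: skip 2 columns, then read widths 1, 2 and 3.
print(read_record("99512345", "(2X,F1.0,F2.0,F3.1)"))  # [5.0, 12.0, 34.5]
```

Note how F3.1 turns the digits 345 into 34.5: when the field contains no decimal point, the format supplies it.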

5. VARIABLE LABELS (OPTIONAL)

Use: To identify labels defining the variables.

The command contains labels (up to 40 characters each) identifying variables from the DATA LIST.

Labels must follow a valid variable name. At least one space must appear between the variable name and the label. The label must be enclosed in single quotes. The variable names must start in column two or beyond. VARIABLE LABELS may be declared for new variables after they have been created using COMPUTE or IF statements.

VARIABLE LABELS

V17 'OVERALL HEALTH CONDITION'

V28 'ABILITY TO PERFORM EVERYDAY ACTIVITIES'

V29 'ABILITY TO...TRANSPORTATION'

V30 'ABILITY TO...HOUSE CLEANING'

CLUS 'CLUSTER GROUPING'

6. MISSING VALUES (OPTIONAL)

Use: To specify variable values that are missing or are to be otherwise excluded from the analysis.

The MISSING VALUES command is used to specify those variables for which certain values are deemed to be missing. The convention used to specify MISSING VALUES is to list the variable names using either individual names, or a "TO" argument which specifies a range of variables for which values are missing. The variable list is followed by a list of the values to be declared as missing.

The keywords "LO" and "HIGH" are valid operators. The slash "/" is used to separate the variable lists having different values declared as missing. The values to be declared as missing are identified within parentheses. If multiple values are declared as missing, then they must be separated by commas. Examples might appear as follows:

MISSING VALUES V17,V28 TO V37,V40,V41,V49,V108 TO V116,V145,V146,V148,CLUS(0)/

V28(6)/V145(8)/

V29 TO V36,V40,V41(4,5)

MISSING VALUES V29 (0)/V21(1110)/V1 TO V19 (5,55)

Note that the last missing value specification must NOT end with a slash. Respondent data that has missing values may be treated for LISTWISE or PAIRWISE deletion, depending on the analysis conducted.

PAIRWISE deletion means that if the value is missing (for the dependent and/or independent variable), then the respondent's data for the independent variable in question is eliminated from the analysis.

LISTWISE deletion means that the entire case is excluded from analysis for all variables.

PAIRWISE OR LISTWISE deletion is specified interactively by the user in all bivariate or multivariate programs. Several programs use only LISTWISE deletion, because pairwise deletion does not provide complete information required for the analysis. In this case, the LISTWISE option is invoked automatically if missing values are declared.
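The difference between the two deletion rules can be sketched in Python. This is an illustration under assumptions, not the PC-MDS implementation: the function names and the four sample cases are hypothetical, and 0 is declared missing for every variable, as in MISSING VALUES V1 TO V3 (0).

```python
def listwise_cases(cases, variables, missing):
    """Keep only cases with no missing value on any of the listed variables."""
    return [
        case for case in cases
        if not any(case[v] in missing.get(v, ()) for v in variables)
    ]

def pairwise_cases(cases, var_a, var_b, missing):
    """Keep cases that are valid on this particular pair of variables."""
    return listwise_cases(cases, [var_a, var_b], missing)

# Hypothetical data: 0 is the declared missing value for each variable.
missing = {"V1": {0}, "V2": {0}, "V3": {0}}
cases = [
    {"V1": 3, "V2": 4, "V3": 5},
    {"V1": 0, "V2": 2, "V3": 1},
    {"V1": 2, "V2": 0, "V3": 4},
    {"V1": 5, "V2": 1, "V3": 0},
]

listwise_n = len(listwise_cases(cases, ["V1", "V2", "V3"], missing))
pairwise_n = {
    pair: len(pairwise_cases(cases, pair[0], pair[1], missing))
    for pair in [("V1", "V2"), ("V1", "V3"), ("V2", "V3")]
}
print(listwise_n)   # 1
print(pairwise_n)   # each pair keeps 2 cases
```

Listwise deletion leaves a single complete case, while each pair of variables still has two usable cases under pairwise deletion.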

7. IF (OPTIONAL)

Use: To specify conditional relationships between variables.

The IF command specifies a conditional relationship between the variables. The IF command may be used to assign values to existing variables or to new variables that are created automatically when defined in the IF statement. Variables may be transformed using the IF statement. Mathematical expressions are supported on the right side of the equality expression. Examples of general form include:

IF (VAR1 EQ 3) VAR2 = 1

IF (VAR1 EQ 3) VAR2 = VAR5

IF (VAR1 EQ V2) VAR3 = COS(VAR4*.5)

IF ((VAR1 EQ 3) AND (VAR2 EQ 4)) VAR3 = 5

IF ((VAR1 EQ VAR3) OR (VAR2 EQ 22)) VAR4 = 25

The command word IF must begin in column 1 of the command file and be followed by a blank space.

One of the following RELATIONAL or LOGICAL OPERATORS is used to specify the IF relationship:

EQ EQUALS

GT GREATER THAN

LT LESS THAN

GE GREATER THAN OR EQUAL

LE LESS THAN OR EQUAL

NE NOT EQUAL

AND TWO CONDITIONS HOLD

OR EITHER CONDITION HOLDS
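The operators above correspond directly to ordinary comparisons. The following Python sketch mirrors two of the IF examples; relation_holds and the sample case are hypothetical, and the sketch makes no claim about what PC-MDS stores in the target variable when the condition is false.

```python
import operator

RELATIONAL = {
    "EQ": operator.eq, "NE": operator.ne,
    "GT": operator.gt, "LT": operator.lt,
    "GE": operator.ge, "LE": operator.le,
}

def relation_holds(case, left, op, right):
    """Evaluate one relation; operands may be variable names or numbers."""
    lhs = case[left] if isinstance(left, str) else left
    rhs = case[right] if isinstance(right, str) else right
    return RELATIONAL[op](lhs, rhs)

case = {"VAR1": 3, "VAR2": 4}

# IF ((VAR1 EQ 3) AND (VAR2 EQ 4)) VAR3 = 5
if relation_holds(case, "VAR1", "EQ", 3) and relation_holds(case, "VAR2", "EQ", 4):
    case["VAR3"] = 5

# IF ((VAR1 EQ VAR3) OR (VAR2 EQ 22)) VAR4 = 25
if relation_holds(case, "VAR1", "EQ", "VAR3") or relation_holds(case, "VAR2", "EQ", 22):
    case["VAR4"] = 25

print(case)  # VAR3 is set to 5; VAR4 is not assigned because neither condition holds
```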

8. COMPUTE (OPTIONAL)

Use: To unconditionally compute values for new or existing variables.

The COMPUTE statement allows the user to create new variables or compute intermediate values for each case read by the program. The COMPUTE statement may be used to compute new variables not defined in the DATA LIST or to transform current variables. The COMPUTE statement may be used with any of the mathematical and function operators listed below. The following 13 examples would each begin in the first column of the line.

COMPUTE V1 = 20*0.5

COMPUTE V2 = 81**.5

COMPUTE V3 = ABS(12)

COMPUTE V4 = ABS(V1-V2)

COMPUTE V5 = TRUNC(2.345)

COMPUTE V6 = TRUNC(5.95)

COMPUTE V7 = 256

COMPUTE V8 = 50

COMPUTE V9 = SQRT(V8)

COMPUTE V10 = LG10(V9)

COMPUTE V11 = ARSIN(0.14)

COMPUTE V33 = 0

COMPUTE V29 = COS(0.5+0.5*2)

MATHEMATICAL AND FUNCTION OPERATORS

KEYWORD            MEANING              EXAMPLE
ABS(value)         Absolute Value       VAR3=ABS(V2-V1)
ARCOS(value)       Arc Cosine           VAR3=ARCOS(V2-V1)
ARSIN(value)       Arc Sine             VAR3=ARSIN(V2-V1)
ARTAN(value)       Arc Tangent          VAR3=ARTAN(V2-V1)
COS(value)         Cosine               VAR3=COS(V2-V1)
EXP(value)         Exponentiation       VAR3=EXP(V2-V1)
LG10(value)        Log Base 10          VAR3=LG10(V2-V1)
LN(value)          Natural Logarithm    VAR3=LN(V2-V1)
MOD(value,value)   Remainder of V1/V2   VAR3=MOD(V1,V2)
RND(value)         Round to whole #     VAR3=RND(V2-V1)
SIN(value)         Sine                 VAR3=SIN(V2-V1)
SQRT(value)        Square Root          VAR3=SQRT(V2-V1)
TAN(value)         Tangent              VAR3=TAN(V2-V1)
TRUNC(value)       Truncate             VAR3=TRUNC(V2-V1)
/                  Division             VAR3=V1/V2
*                  Multiplication       VAR3=V1*V2
+                  Addition             VAR3=V1+V2
-                  Subtraction          VAR3=V1-V2
**                 Exponentiation       VAR3=V1**V2

NOTE: For any of the above COMPUTE mathematical and function operators, you may substitute VARIABLE NAMES or VALUES for V1 and V2.
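For readers who want to check a COMPUTE expression by hand, the keywords can be mapped onto Python's math module. This mapping is a sketch under assumptions rather than a statement of how PC-MDS computes: angles are assumed to be in radians, EXP is assumed to mean e raised to the given power, and RND is assumed to round halves upward.

```python
import math

# Assumed Python equivalents of the PC-MDS function keywords.
FUNCTIONS = {
    "ABS": abs, "ARCOS": math.acos, "ARSIN": math.asin, "ARTAN": math.atan,
    "COS": math.cos, "EXP": math.exp, "LG10": math.log10, "LN": math.log,
    "MOD": math.fmod, "SIN": math.sin, "SQRT": math.sqrt, "TAN": math.tan,
    "TRUNC": math.trunc,
    "RND": lambda x: math.floor(x + 0.5),  # assumption: halves round up
}

v = {}
v["V1"] = 20 * 0.5                             # COMPUTE V1 = 20*0.5
v["V2"] = 81 ** .5                             # COMPUTE V2 = 81**.5
v["V4"] = FUNCTIONS["ABS"](v["V1"] - v["V2"])  # COMPUTE V4 = ABS(V1-V2)
v["V5"] = FUNCTIONS["TRUNC"](2.345)            # COMPUTE V5 = TRUNC(2.345)
v["V6"] = FUNCTIONS["TRUNC"](5.95)             # COMPUTE V6 = TRUNC(5.95)
v["V8"] = 50                                   # COMPUTE V8 = 50
v["V9"] = FUNCTIONS["SQRT"](v["V8"])           # COMPUTE V9 = SQRT(V8)
v["V10"] = FUNCTIONS["LG10"](v["V9"])          # COMPUTE V10 = LG10(V9)
v["V29"] = FUNCTIONS["COS"](0.5 + 0.5 * 2)     # COMPUTE V29 = COS(0.5+0.5*2)

for name, value in v.items():
    print(name, value)
```

Under these assumptions V1 is 10.0, V2 is 9.0, V4 is 1.0, V5 is 2 and V6 is 5; TRUNC discards the fractional part rather than rounding.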



9. RECODE (OPTIONAL)

Use: To combine or re-assign data values for defined variables.

The RECODE statement is used to recode data values. A variable list is specified, followed by one or more value lists specifying the values to be recoded and the value to which they are to be recoded. The slash "/" operator separates the different variable lists. The keywords LOW, TO and HIGH are valid arguments.

RECODE V5,V7 TO V10 (LOW TO 10 = 55) (LOW TO 20 = 66) /

V12 TO V15 (1 TO HIGH = 5) /

V18 (2,3,4 TO 10 = 6) /

V19 TO V20 (1,2,3 = 7) (4,5,6= 8)

The last list in the recode statement must NOT end with a slash.
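The effect of a RECODE value list can be sketched in Python. The manual does not say how overlapping lists such as (LOW TO 10 = 55) (LOW TO 20 = 66) are resolved, so this sketch assumes that the first matching list wins, that range bounds are inclusive, and that values named in no list are left unchanged. The recode function and its rule layout are hypothetical.

```python
LOW, HIGH = float("-inf"), float("inf")

def recode(value, rules):
    """Apply the first rule whose value list contains `value`.

    Each rule is (targets, new_value); a target is a single value or a
    (low, high) tuple standing for 'low TO high', bounds inclusive.
    """
    for targets, new_value in rules:
        for target in targets:
            if isinstance(target, tuple):
                if target[0] <= value <= target[1]:
                    return new_value
            elif value == target:
                return new_value
    return value  # assumption: unlisted values are left unchanged

# RECODE V5 (LOW TO 10 = 55) (LOW TO 20 = 66)
v5_rules = [([(LOW, 10)], 55), ([(LOW, 20)], 66)]
# RECODE V18 (2,3,4 TO 10 = 6)
v18_rules = [([2, 3, (4, 10)], 6)]

print([recode(x, v5_rules) for x in (7, 15, 25)])   # [55, 66, 25]
print([recode(x, v18_rules) for x in (1, 3, 8)])    # [1, 6, 6]
```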

10. WEIGHT (OPTIONAL)

Use: To weight the sample size used in computation of statistics.

The WEIGHT statement is used to weight the sample size for all variables included in the analysis. The weight applied to each respondent is that respondent's value on the variable declared as the weight variable. For example, if VAR1 is declared as the weight variable, then each respondent's value on VAR1 (multiplied by 1.0) is the increment that the respondent contributes to the sample size when the statistics are computed for the variables defined in the analysis. Weights are most often used for adjusting sampling proportions. The form of the statement is:

WEIGHT VAR1
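The arithmetic behind weighting can be sketched in Python. The function name, the sample cases and the choice of the mean as the statistic are assumptions made for illustration; the manual states only that the weight variable supplies each respondent's increment to the sample size.

```python
def weighted_n_and_mean(cases, variable, weight_variable):
    """Return the weighted sample size and weighted mean of `variable`."""
    n = 0.0
    total = 0.0
    for case in cases:
        weight = case[weight_variable] * 1.0  # the respondent's increment to N
        n += weight
        total += weight * case[variable]
    return n, total / n

# Hypothetical cases: VAR1 is the weight variable, V5 the analysis variable.
cases = [
    {"VAR1": 2, "V5": 10},
    {"VAR1": 1, "V5": 4},
    {"VAR1": 3, "V5": 6},
]

n, mean = weighted_n_and_mean(cases, "V5", "VAR1")
print(n, mean)  # 6.0 7.0
```

Three respondents count as a sample of six, and the respondent with weight 3 influences the mean three times as much as the respondent with weight 1.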

11. SELECT IF (OPTIONAL)

Use: To select or exclude individual respondents' data.

The SELECT IF statement is used to include or exclude an entire respondent's data from the analysis. The statement is of the same form as the IF statement, but does not assign a mathematical value. Instead, evaluating the condition results in an internal 0 or 1 that determines whether the case is included in the analysis. The proper form is:

SELECT IF (VAR2 EQ 2)

The statement must include a variable name, one of the relational operators (EQ GT LT GE LE NE) and then either a value or another variable name.
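The internal 0 or 1 can be pictured with a short Python sketch. The helper select_flags and the sample cases are hypothetical and only illustrate the idea of flagging each case before analysis.

```python
import operator

OPS = {"EQ": operator.eq, "NE": operator.ne, "GT": operator.gt,
       "LT": operator.lt, "GE": operator.ge, "LE": operator.le}

def select_flags(cases, variable, op, operand):
    """Return a 0/1 flag per case for SELECT IF (variable op operand)."""
    flags = []
    for case in cases:
        # The operand may be a value or the name of another variable.
        rhs = case[operand] if isinstance(operand, str) else operand
        flags.append(1 if OPS[op](case[variable], rhs) else 0)
    return flags

# Hypothetical cases; SELECT IF (V14 EQ 2) keeps the first and third.
cases = [{"V13": 34, "V14": 2}, {"V13": 51, "V14": 1}, {"V13": 22, "V14": 2}]
flags = select_flags(cases, "V14", "EQ", 2)
selected = [case for case, keep in zip(cases, flags) if keep]

print(flags)          # [1, 0, 1]
print(len(selected))  # 2
```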