HOW TO USE AID


DEVELOPED BY THE INSTITUTE FOR SOCIAL RESEARCH
UNIVERSITY OF MICHIGAN (1)


ABSTRACT

The AID (Automatic Interaction Detector) program is useful in studying the interrelationships among a set of up to 60 variables. Regarding one of the variables as a dependent variable, the analysis employs a non-symmetrical branching process based on variance analysis techniques to subdivide the sample into a series of subgroups which maximize one's ability to predict values of the dependent variable. Linearity and additivity assumptions inherent in conventional multiple regression techniques are not required.


PROGRAM DESCRIPTION

Essentially, AID is something like a "stepwise regression program" in which the independent variates (predictors) need not be quantitative; i.e., one can have either quantitative categories, such as those for age or income, or "qualitative" categories, such as those for sex, marital status, cause of death, or political preference. Also, with AID, quantitative predictors can be categorized into intervals of unequal length (e.g., incomes under $15,000, $15,000-$24,999, $25,000-$49,999, etc.) or into non-ordinal categories (e.g., ages 25-54, 20-24, or 55-64 and 65 and over).

The AID program operates by finding that dichotomy based on any predictor which gives the lowest within-group sum of squared deviations for the dependent variable, Y. Essentially this is the dichotomization which "accounts for" more of the variance of the dependent variate (i.e., has a larger "correlation" with the dependent variate) than any other dichotomization based on grouping the categories of a single predictor into two groups.
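Because the within-group and between-group sums of squares of a dichotomy add up to the parent group's total sum of squares, finding the lowest within-group sum of squares is the same as finding the largest between-group sum of squares (BSS). The Python sketch below computes BSS from each side's sum of weights and weighted sum of Y. The function names are hypothetical; the figures come from group 1 and the SEX split in the sample output at the end of this section.

```python
def bss_for_split(w_a, s_a, w_b, s_b):
    """Between-group sum of squares for a dichotomy into groups A and B.

    w_* is the sum of weights and s_* the weighted sum of Y on each side:
    BSS = S_A^2/W_A + S_B^2/W_B - (S_A + S_B)^2/(W_A + W_B)
    """
    w, s = w_a + w_b, s_a + s_b
    return s_a ** 2 / w_a + s_b ** 2 / w_b - s ** 2 / w


def total_ss(w, s, s2):
    """Total sum of squares of a group: sum(w*y^2) - (sum(w*y))^2 / sum(w)."""
    return s2 - s ** 2 / w


# Group 1 of the sample output: total weight, sum of Y, sum of Y squared.
tss = total_ss(16590.0, 62995.0, 408855.0)
# Split on SEX: code 0 (W = 8560, S = 25375) versus code 1 (W = 8030, S = 37620).
bss = bss_for_split(8560.0, 25375.0, 8030.0, 37620.0)
print(f"TSS = {tss:.1f}, BSS = {bss:.1f}, BSS/TSS = {bss / tss:.5f}")
# prints: TSS = 169652.5, BSS = 12265.5, BSS/TSS = 0.07230
```

These agree with the TSS, MAX. BSS and BSS/TSS values printed for step 1 of the sample run.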

Having made this first dichotomy, the AID program then takes the eligible group with the largest within-group sum of squared deviations for Y and "splits" it in a similar manner. A group is "eligible" for splitting if it has enough cases that both resulting groups can contain at least the specified minimum number of cases (NMIN), and a within-group sum of squared deviations at least as great as a specified proportion (P1) of the original (total) sum of squared deviations. A split is made only if it reduces the within-group sum of squared deviations (WGSSD) by at least a minimum proportion of the total sum of squares, specified by the parameter P2. The process of dichotomizing groups continues until no eligible group can be split so as to yield the specified minimum WGSSD reduction, or until a specified maximum number (MAXGRP) of groups has been created.
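The eligibility and stopping rules above can be sketched as a few small predicates. This is an illustration, not the program's code: the names are hypothetical, and the strict "more than" comparisons follow the "DO NOT SPLIT UNLESS ... IS MORE THAN" message in the sample output, so the exact boundary behaviour is an assumption.

```python
from typing import NamedTuple


class Group(NamedTuple):
    number: int
    n: int      # number of cases in the group
    tss: float  # within-group sum of squared deviations of Y


def is_candidate(group, tss_total, p1):
    """P1 rule: the group must hold more than P1 of the total sum of squares."""
    return group.tss > p1 * tss_total


def split_is_valid(n_a, n_b, nmin):
    """NMIN rule: both resulting groups need at least NMIN cases."""
    return n_a >= nmin and n_b >= nmin


def split_accepted(best_bss, tss_total, p2):
    """P2 rule: the best valid split must explain more than P2 of the total."""
    return best_bss > p2 * tss_total


def next_parent(candidates, n_unsplit, maxgrp):
    """Pick the candidate with the largest within-group sum of squares, or
    None once MAXGRP unsplit groups exist or no candidate remains."""
    if n_unsplit >= maxgrp or not candidates:
        return None
    return max(candidates, key=lambda g: g.tss)


# Figures from the sample output: TSS = 169652.5, P1 = P2 = 0.01, NMIN = 25.
groups = [Group(3, 26, 94272.91), Group(2, 26, 63114.14)]
parent = next_parent([g for g in groups if is_candidate(g, 169652.5, 0.01)], 2, 90)
print(parent.number)               # 3, the parent group of step 2 in the sample
print(split_is_valid(13, 13, 25))  # False: no division of 26 cases leaves 25 per side
```

This is why the sample run reports that group 3 "CANNOT BE SPLIT" on any predictor: with NMIN = 25, a 26-case group has no valid dichotomy.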


OUTPUT
	The output from the PC-MDS AID program includes:
    1.  A listing of all parameters used in the run as specified by the user in the setup file.

    2.  The number of cases having valid data on the dependent variable, the sum of weights,
        the sum of the dependent variable Y, the sum of Y squared, the total sum of squares, 
        and the group mean and standard deviation of Y are printed for the first parent group, 
        which represents the entire sample of valid data.

    3.  Within each parent group, each predictor or independent variable is listed with its 
        respective "field identification number" and name.  The following statistics are produced 
        for each "class" defined by its code: 
		
		(a) the number of cases in the class, 
		(b) sum of weights, 
		(c) sum of Y, 
		(d) sum of Y squared, 
		(e) between sum of squares, 
		(f) class mean, and 
		(g) standard deviation.  
		
        A listing of the codes defining the best possible partition of the parent group based on that 
        predictor is presented, along with the associated between-group sum of squares (BSS) and its 
        ratio to the total sum of squares (BSS/TSS).
		
    4.  After the "splitting" of the parent group based on the criteria provided by the user 
        has taken place, the predictor used in the "split" is identified along with the 
        definitions of the two new eligible groups, or the "Candidate Groups".  
        A list of all "Candidate Groups" and their respective group statistics will 
        then appear on the output.  

    5.  In the event that one or both of the new groups do not satisfy the criteria set forth 
        by the user for eligibility as a candidate for splitting, appropriate comments will 
        appear declaring the status of that group (or groups).  Residuals, if requested by the user,
        will be produced for each final group.  They will appear in the following format:

        a.  Cols. 1-3 = the final group number of the observation, or "0" if the observation 
            was omitted from the run in the initial output pass (see TYPE 05 input for exclusions).
        b.  Cols 4-9 = the identification number as defined on the TYPE 03 input.
        c.  Cols 10-15 = the record weight.
        d.  Cols 16-24 = the residual (100 times the deviation of the dependent variate value of 
            Y from its final group mean).
				
    6.  All error comments pertaining to the setup will also appear in the output file.
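The residual record layout in item 5 maps directly onto fixed-width formatting. The sketch below is a minimal Python illustration with a hypothetical function name. It assumes integer identification numbers and weights, and assumes the residual is rounded to an integer to fit the 9-column field (the layout does not say how fractions are handled). For excluded cases it follows the MDOUT rule given with the TYPE 03 input: MDOUT if supplied, otherwise 100*Y.

```python
def residual_record(group, ident, weight, y, group_mean=0.0, mdout=None):
    """One 24-column residual line: cols 1-3 group, 4-9 ID, 10-15 weight,
    16-24 residual. Group 0 marks a case excluded from the analysis."""
    if group == 0:
        residual = mdout if mdout else 100 * y   # MDOUT, or 100*Y if blank/zero
    else:
        residual = 100 * (y - group_mean)         # 100 x deviation from group mean
    return f"{group:3d}{ident:6d}{weight:6d}{round(residual):9d}"


# A case with Y = 0, ID 10 and weight 50 in a hypothetical final group 2 whose mean is 2.964369.
print(repr(residual_record(2, 10, 50, 0, 2.964369)))  # '  2    10    50     -296'
```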


PROGRAM RESTRICTIONS

    1.  The maximum number of input variables allowed is 60; the minimum is 2.

    2.  The minimum number of valid cases in the sample allowed in the analysis is 50.

    3.  The maximum number of groups (MAXGRP) to be created may not exceed 90 and must be
        greater than or equal to 1.

    4.  The maximum number of classes, or resulting recode categories, for all predictors taken
        together must not exceed 1000. Each "specified category" counts as 2 toward the 1000 limit,
        since a "specified category" requires both a lower and an upper limit.

    5.  0 <= P1 <= P2 <= 1.0

    6.  The maximum number of predictors is 39.

    7.  Since all data are read under the FORTRAN I format specification, blanks in the data
        are read as zeros.
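A setup can be checked against these limits before a run. The Python sketch below is a hypothetical pre-flight check, not part of PC-MDS. It takes the "category units" of restriction 4 as already counted (1 per equal-interval class, 2 per specified category); whether special-value recodes count toward the limit is an assumption left to the caller.

```python
def check_restrictions(nv, n_valid, maxgrp, p1, p2, n_predictors, category_units):
    """Return messages for violated AID restrictions (empty list if none)."""
    problems = []
    if not 2 <= nv <= 60:
        problems.append(f"NV = {nv}: must be between 2 and 60")
    if n_valid < 50:
        problems.append(f"{n_valid} valid cases: at least 50 are required")
    if not 1 <= maxgrp <= 90:
        problems.append(f"MAXGRP = {maxgrp}: must be between 1 and 90")
    if not 0 <= p1 <= p2 <= 1.0:
        problems.append(f"P1 = {p1}, P2 = {p2}: need 0 <= P1 <= P2 <= 1.0")
    if n_predictors > 39:
        problems.append(f"{n_predictors} predictors: at most 39 are allowed")
    if category_units > 1000:
        problems.append(f"{category_units} category units: at most 1000 are allowed")
    return problems


# Sample run: 7 variables, 52 valid cases, 3 predictors. SEX and AGE have 10
# equal-interval classes each; AGE AT MARRIAGE has 7 specified categories.
print(check_restrictions(7, 52, 90, 0.01, 0.01, 3, 10 + 10 + 2 * 7))  # []
print(check_restrictions(7, 52, 91, 0.02, 0.01, 3, 34))
```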

INSTRUCTIONS FOR USE OF THE AID PROGRAM

The AID program begins by requesting file information. Specifically, the user must input:

    (1) The analysis title,
    (2) The name of the input file which contains the AID control commands,
    (3) The name of the input file which contains the AID data,
    (4) The name of the output file to contain the printed output, and
    (5) The name of the output file to which residuals are to be written.

Following this input, the AID program retrieves the AID control file and begins to process the instructions.


CONTROL FILE SETUP


	INPUT TYPE 02:  INPUT FORMAT STATEMENT

	Input type 02 is reserved for the input format statement that is used to read the 
	input data file.  The input statement 02 is arranged as follows:

	Cols 01 - 02 = 02 (The input type must be entered as 02)

	Cols 03 - 80 =	The FORTRAN format statement to be used.  
	In the sample control file, the statement for input type 02 appears as:

		02 (I4,3I3,2I2,I5) 

		ONLY ONE INPUT TYPE 02 LINE MAY BE USED
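To see how a format statement such as (I4,3I3,2I2,I5) divides a data line into fields, the Python sketch below expands repeated integer descriptors into field widths and reads a record, treating blanks inside a field as zeros (program restriction 7). It is a simplified stand-in for FORTRAN formatted input: only rIw descriptors are handled, and the function names are hypothetical.

```python
import re


def field_widths(fmt):
    """Expand a FORTRAN format such as "(I4,3I3,2I2,I5)" into field widths.

    Only (repeated) integer descriptors rIw are handled in this sketch.
    """
    widths = []
    for repeat, width in re.findall(r"(\d*)I(\d+)", fmt.upper()):
        widths.extend([int(width)] * int(repeat or 1))
    return widths


def read_record(line, widths):
    """Read one data line as integers; blanks inside a field count as zeros."""
    line = line.ljust(sum(widths))  # short lines are padded with blanks
    values, pos = [], 0
    for w in widths:
        # Leading blanks are dropped so a sign still parses; remaining blanks -> 0.
        text = line[pos:pos + w].lstrip().replace(" ", "0")
        values.append(int(text) if text else 0)
        pos += w
    return values


widths = field_widths("(I4,3I3,2I2,I5)")
print(widths)                                           # [4, 3, 3, 3, 2, 2, 5]
print(read_record("0010 25 33 00 0 0 0050 1", widths))  # [10, 25, 33, 0, 0, 0, 50]
```

The seven widths correspond to NV = 7 fields; anything after column 22 of a data line is not read.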

	INPUT TYPE 03:  SPLIT LIMITATION SPECIFICATION

	Input type 03 is used to specify the split limitations to be used in the AID analysis.
	All entries for input type 03 must be entered "RIGHT JUSTIFIED" within their respective
	fields.  Example (from the sample control file):

		03  7999990100001000   90   25    1    1

	Cols 01 - 02 = 03 (The input type must be entered as 03)

	Cols 03 - 05 = NV, the number of variables used as input in the run.
	               (In the sample data set, NV = 7.)

	Cols 06 - 10 = NC, the number of cases in the input file.  If the number is not known,
	               enter 99999; if only the first K records or cases are to be used in the
	               analysis, enter K in this field.

	Cols 11 - 15 = P1, the proportion of the total sum of squares that must be contained in
	               any group if that group is to become a candidate for splitting.  For example,
	               to specify one percent, enter "01000".  The decimal point is not entered in
	               the field but is assumed to lie to the left of the field.  Other values may
	               be entered at the user's option; the recommended value is one percent.
	               The range is .00001 <= P1 < 1.0.

	Cols 16 - 20 = P2, the proportion of the total sum of squares by which the best split on
	               a candidate group must reduce the unexplained sum of squares.  If the best
	               split does not achieve this reduction, the group is not split and does not
	               become a candidate group, even though it may meet the P1 requirement above.
	               The range is 0 < P1 <= P2.  As with P1, the decimal point is assumed to lie
	               to the left of the field.  The recommended value for samples where
	               1000 <= NC <= 5000 is .006; increase this up to .01 as NC drops to 200.

	Cols 21 - 25 = MAXGRP, the maximum allowable number of groups which at any point in the
	               process are eligible for a split attempt.  The splitting process stops,
	               regardless of P1 or P2, when the sample has been divided into MAXGRP
	               unsplit groups.  MAXGRP may not be larger than 90.

	Cols 26 - 30 = NMIN, the number of observations that must be contained in both resultant
	               groups if a particular BSS is to be considered a possible basis for making
	               a split.  For example, NMIN = 15 requires that both resultant groups from
	               the split have at least 15 observations in them.  A split that would leave
	               one or both resultant groups with fewer than NMIN observations is not
	               considered a valid split attempt, even if it would otherwise meet the
	               criteria.  A group which contains fewer than NMIN*2 observations is
	               therefore not eligible for further splitting.  Normally, this parameter
	               should be de-activated by setting it to 2.

	Cols 31 - 35 = Blank or 1 (INPUT OUTPUT CONTROL VALUE)

	Cols 36 - 40 = NR, the field number of the identification variable to be used in the
	               residual output.  Note: if residuals are to be produced, a non-zero value
	               for NR must be supplied.

	Cols 41 - 50 = MDOUT, the missing data code to be written as the residual for all
	               observations excluded from the analysis (optional; leave blank if
	               residuals are not requested).  Note: if MDOUT is blank or zero, the
	               residual for missing data will be 100*Y, where Y is the value of the
	               dependent variable.
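Reading a TYPE 03 line is a matter of slicing fixed card columns and applying the implied decimal point to P1 and P2. A minimal Python sketch follows; the helper names and dictionary keys (including IOCONTROL for cols 31-35) are hypothetical labels, not names used by the program.

```python
def cols(line, first, last):
    """Text in 1-based, inclusive card columns first..last (padded if short)."""
    return line.ljust(last)[first - 1:last]


def int_field(line, first, last):
    """Right-justified integer field, or None if the columns are blank."""
    text = cols(line, first, last).strip()
    return int(text) if text else None


def implied_fraction(line, first, last):
    """Field whose decimal point is assumed to lie just left of it: 01000 -> .01."""
    text = cols(line, first, last).strip()
    return int(text) / 10 ** (last - first + 1) if text else 0.0


def parse_type03(line):
    if cols(line, 1, 2) != "03":
        raise ValueError("not a TYPE 03 line")
    return {
        "NV": int_field(line, 3, 5),
        "NC": int_field(line, 6, 10),
        "P1": implied_fraction(line, 11, 15),
        "P2": implied_fraction(line, 16, 20),
        "MAXGRP": int_field(line, 21, 25),
        "NMIN": int_field(line, 26, 30),
        "IOCONTROL": int_field(line, 31, 35),
        "NR": int_field(line, 36, 40),
        "MDOUT": int_field(line, 41, 50),
    }


spec = parse_type03("03  7999990100001000   90   25    1    1")
print(spec["NV"], spec["NC"], spec["P1"], spec["P2"], spec["MAXGRP"], spec["NMIN"])
# 7 99999 0.01 0.01 90 25
```

These values match the parameter listing at the top of the sample output (P1 = 0.01000, P2 = 0.01000, MAXGP = 90, NMIN = 25).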

INPUT TYPE 04: PREDICTOR SPECIFICATION

The user must tell AID which of the input variables are to be used as predictors. Any input variable may be used as a predictor provided it is specified in the input format (TYPE 02 input) as a field to be read. Predictor values must be positive or negative integers in the range -99999 to 999999. The program recodes these values, as specified by the user (see below), into categories numbered 00, 01, ..., 39. The maximum recode value is 39 (i.e., the total number of recode values for any one predictor may not exceed 40). It is recommended that the number of classes not exceed 7.

Two methods for specifying recodes are provided:

1. Equal Interval - A minimum value, a maximum value, and an interval length are specified. The range between the minimum and maximum values is divided into recode categories of the specified interval length. Values below the specified minimum plus the interval length are given the recode value 00. Values equal to or greater than the maximum value are given a recode value 1 greater than the recode for the interval just below the maximum. For example, if for the predictor AGE the minimum is specified as 10, the maximum as 75, and the interval length as 5, the result would be:
                      AGE        RECODE
               Less than 15          00
                    15 - 19          01
                    20 - 24          02
                    25 - 29          03
                    30 - 34          04
                    35 - 39          05
                    40 - 44          06
                    45 - 49          07
                    50 - 54          08
                    55 - 59          09
                    60 - 64          10
                    65 - 69          11
                    70 - 74          12
                75 and over          13


Provision is also made for reassigning up to 3 values of the predictor to some specified recode. For example, unknown ages may have been entered as 99 but are to be assigned to the class containing 43, the median age (recode 06). There might also be a class coded as -1 for persons whose exact age was unknown but who were known to be adults at least 50 years of age; if it seems desirable to treat them as in the 60-64 year old interval, a recode value of 10 could be specified for all ages coded as -1. (The specification of equal interval recodes on the TYPE 04 line is discussed following this section.)
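The equal-interval rule, including reassigned special values, can be expressed as a short function. This Python sketch uses hypothetical names; it assumes that MAX - MIN is a multiple of the interval length (as in the examples) and that special values are checked before the interval rule. The class count uses the (Range / Interval Length) + 1 formula given with the TYPE 04 layout.

```python
def equal_interval_recode(value, vmin, vmax, width, special=None):
    """Recode a predictor value under an equal-interval specification.

    `special` maps up to three input values to fixed recodes; this sketch
    checks them first.
    """
    if special and value in special:
        return special[value]
    if value >= vmax:
        return (vmax - vmin) // width        # one above the last full interval
    if value < vmin + width:
        return 0                             # everything below MIN + interval
    return (value - vmin) // width


def n_classes(vmin, vmax, width):
    """Number of interval classes: (MAX - MIN) / interval length + 1."""
    return (vmax - vmin) // width + 1


ages = [14, 15, 19, 74, 75, 99, -1]
print([equal_interval_recode(a, 10, 75, 5, special={-1: 10}) for a in ages])
# [0, 1, 1, 12, 13, 13, 10]
print(n_classes(10, 75, 5))  # 14
```

With MIN 10, MAX 80, interval 10 and special values 91 and 93 (the AGE line of the sample control file), the same function reproduces recodes 0 through 9 in the sample output's AGE listing.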

2. Specified Categories - Intervals of unequal length may be used. As with the equal interval specification, a lower limit, V(1), is given, and all values less than V(1) are assigned the recode associated with that lower limit. A V(2) is given: all values equal to or greater than V(1) but less than V(2) are assigned the second recode, and so on. Note that the upper limit for the ith recode forms the lower limit for the (i+1)th recode. The cut points specified by the V's must be in ascending order. The final V value must be 999999, signifying that the immediately previous V is the lower limit of the last recode category: all values equal to or greater than the last V value before 999999 are given the last recode specified.

For example, given the variable Marital Status with the following codes:
          Original Code        Category
               0               Not Known
               1               Single
               2               Married
               3               Widowed
               4               Separated
               5               Divorced
               6               Common-law marriage


If we wished to collapse the original codes to form four groups - (single), (not known, married, common-law marriage), (widowed), and (separated, divorced) - we would perform the recoding as follows:
          Original code(s)                         Recode
          to be recoded        i    V(i)         (new value)
               0               1     1               1
               1               2     2               0
               2               3     3               1
               3               4     4               2
               4,5             5     6               3
               6               6     999999          1


The first V specified is 1, so all original values less than 1 are recoded to the specified recode 1: all original 0's become 1's. The second V is 2, so all values equal to or greater than the first V (1) but less than 2 are recoded to the specified value 0: all original 1's become 0's. The third V is 3: all original 2's are recoded to 1's. The fourth V recodes all original codes greater than or equal to 3 but less than 4 (all original 3's) to 2's. The fifth V, V(5), recodes all values equal to or greater than V(4) = 4 but less than V(5) = 6 to 3's. The last V, V(6), is coded 999999; this signifies that all original values equal to or greater than V(5), the value 6, are recoded to 1's.
          Recode   Includes Original Categories     V(i-1) <= Code < V(i)
            0      1  Single                         1 <= Code < 2
            1      0  Not Known                           Code < 1
                   2  Married                        2 <= Code < 3
                   6  Common-law marriage            6 <= Code < 999999
            2      3  Widowed                        3 <= Code < 4
            3      4  Separated                      4 <= Code < 6
                   5  Divorced                       4 <= Code < 6


Another example would be recoding income by intervals of $2,000 up to $9,999 and by intervals of $5,000 from $10,000 to $24,999. Incomes of $25,000 and over are to form another class, and incomes of zero or negative values (including cases that are unknown) are also a separate class. We would specify:
          Includes Values      V(i)       Recode
          < = 0                   1          0
          1 - 1999             2000          1
          2000 - 3999          4000          2
          4000 - 5999          6000          3
          6000 - 7999          8000          4
          8000 - 9999         10000          5
          10000 - 14999       15000          6
          15000 - 19999       20000          7
          20000 - 24999       25000          8
          25000 - and over   999999          9
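The specified-categories rule amounts to a search for the first cut point the value falls below. The Python sketch below (hypothetical names) checks the two structural requirements stated above, ascending cut points and a final V of 999999, and reproduces the marital status recoding; the income table works the same way.

```python
OPEN_END = 999999  # the required final V(i)


def specified_recode(value, cutpoints, recodes):
    """Recode a value under a "specified categories" definition.

    cutpoints are V(1) < V(2) < ... < 999999 and recodes[i] is paired with
    cutpoints[i]: values below V(1) get recodes[0], values in [V(i-1), V(i))
    get recodes[i-1], and values at or above the last V before 999999 get
    the final recode.
    """
    if len(cutpoints) != len(recodes) or cutpoints[-1] != OPEN_END:
        raise ValueError("need one recode per V(i), ending with V = 999999")
    if any(a >= b for a, b in zip(cutpoints, cutpoints[1:])):
        raise ValueError("cut points must be in ascending order")
    for v, r in zip(cutpoints[:-1], recodes[:-1]):
        if value < v:
            return r
    return recodes[-1]


# Marital status example: V(i) and recodes as in the recoding table above.
V = [1, 2, 3, 4, 6, OPEN_END]
R = [1, 0, 1, 2, 3, 1]
print([specified_recode(code, V, R) for code in range(7)])  # [1, 0, 1, 2, 3, 3, 1]
```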


In the AID program, predictors can be specified as "monotonic" or "free". For "monotonic" predictors, splits are made so that every recode value in one group is less than every recode value in the other group. For a "free" predictor this restriction does not apply. Thus, if the income variable shown above were specified as monotonic, the possible splits would be:
          Group A Recodes          Group B Recodes
               0                    1 thru 9
               0,1                  2 thru 9
               0,1,2                3 thru 9
               0,1,2,3              4 thru 9
               ...                     ...
               0 thru 7                8,9
               0 thru 8                  9


In the above splits, every Group A recode is less than every Group B recode. If the income variable were specified as free, one could get splits such as recodes 0, 1, 7, 9 in Group A and recodes 2 thru 6 and 8 in Group B, where some of the Group A recodes are less than 2 (which is in Group B) and some are greater than 2.
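The search for the best split of one group on one predictor can be sketched as a scan over cut points with running totals, skipping cuts that violate NMIN. For a monotonic predictor the codes stay in natural order. For a free predictor this sketch does not try every subset of codes; it sorts the codes by class mean and tries only cuts in that order, a known shortcut that still finds the best two-way grouping under a sum-of-squares criterion. The sample output lists the free predictor AGE AT MARRIAGE in ascending order of class mean, which suggests the program does something similar, but that is an inference. Names are hypothetical.

```python
def best_split(classes, monotonic, nmin):
    """Best dichotomy of one parent group on one predictor.

    classes maps recode -> (n, total_weight, sum_y). Monotonic predictors keep
    their natural code order; free predictors are ordered by class mean (an
    assumption, consistent with the sample output). Only cuts between adjacent
    codes in that order are tried. Returns (bss, codes_a, codes_b), or None if
    no cut leaves at least nmin cases on each side.
    """
    if monotonic:
        order = sorted(classes)
    else:
        order = sorted(classes, key=lambda c: classes[c][2] / classes[c][1])
    n_tot = sum(n for n, _, _ in classes.values())
    w_tot = sum(w for _, w, _ in classes.values())
    s_tot = sum(s for _, _, s in classes.values())
    best = None
    n_a = w_a = s_a = 0
    for i, code in enumerate(order[:-1]):
        n, w, s = classes[code]
        n_a, w_a, s_a = n_a + n, w_a + w, s_a + s  # running totals for side A
        n_b, w_b, s_b = n_tot - n_a, w_tot - w_a, s_tot - s_a
        if n_a < nmin or n_b < nmin:
            continue  # NMIN rule: this cut is not a valid split
        bss = s_a ** 2 / w_a + s_b ** 2 / w_b - s_tot ** 2 / w_tot
        if best is None or bss > best[0]:
            best = (bss, order[:i + 1], order[i + 1:])
    return best


# SEX in step 1 of the sample output (free predictor, NMIN = 25).
sex = {0: (26, 8560.0, 25375.0), 1: (26, 8030.0, 37620.0)}
bss, codes_a, codes_b = best_split(sex, monotonic=False, nmin=25)
print(round(bss, 1), codes_a, codes_b)  # 12265.5 [0] [1]
```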

PREPARATION OF THE TYPE 04 PREDICTOR SPECIFICATION

Type 04 Predictor specification is prepared in the following format:

     Cols  1-2     = Specification Type (Must be 04)

     Cols  3 - 20     = Name of Predictor (e.g., Race, Age, Cause of Death, Education, etc.)

     Cols 21 - 23     = Field number of predictor (see type 02 line)

     Col  25          = 0, for "Free"
                      = 1, for "Monotonic"

     Col  26          = 0, for Equal Interval
                      = 1, for Specified Categories
THE REMAINDER OF THE PREDICTOR SPECIFICATION DEPENDS UPON THE CODE IN COLUMN 26.

     Equal Interval (0 in Col 26)
     Cols 27 - 32     Minimum Value
     Cols 33 - 38     Maximum Value
     Cols 39 - 44     Interval Length
     Cols 45 - 47     Specified recode (or output value)-- enter -1 (minus 1) if no recode is specified
     Cols 48 - 53     Input value to be given recode result specified in Cols 45 - 47
     Cols 54 - 56     Recode--same as Cols 45-47
     Cols 57 - 62     Value--same as Cols 48-53
     Cols 63 - 65     Recode--same as Cols 45-47
     Cols 66 - 71     Value--same as Cols 48-53


NOTE: When using equal intervals, the minimum and maximum values (Cols 27 - 32 and Cols 33 - 38) plus the interval length (Cols 39 - 44) are used to compute the number of output predictor codes. Hence, the number of unique codes or classes is computed as follows:
     MAX - MIN      =     Range over which the interval length is to be applied.     
     C              =     Number of output classes 
                    =      (Range/Interval Length) +1


     Specified Categories (1 in Col 26)
     Output                    Input
     Recode Value in          Corresponding V(i) Value in
     Cols 27-29               Cols 30-35
     Cols 36-38               Cols 39-44
     Cols 45-47               Cols 48-53
     Cols 54-56               Cols 57-62
     Cols 63-65               Cols 66-71


The last V(i) value must always be 999999. If a predictor with Specified Categories requires more than 5 values to cover the range, additional lines with duplicate entries in Cols 1 - 26 should be prepared.

NOTE: As indicated above, the recodes for any predictor with either Equal Interval or Specified Categories must be between 00 and 39, so that the number of recode categories is <= 40. In addition, the total number of recode categories for all predictors combined must be < = 1000.

EXAMPLE TYPE 04 LINES (FROM THE SAMPLE CONTROL FILE):

04SEX                 6 00     0     9     1 -1 
04AGE                 3 10    10    80    10  8    91  9    93 -1 
04AGE AT MARRIAGE     2 01  0    17  1    18  2    20  3    21  4    26 
04AGE AT MARRIAGE     2 01  5    31  6999999 
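The TYPE 04 column layout can be read mechanically. The Python sketch below parses one TYPE 04 line into a dictionary; the helper names and keys are hypothetical. It assumes that a recode of -1 (or a blank field) ends the list of special values on an equal-interval line, as on the SEX and AGE lines above, and that a blank recode ends the category list on a specified-categories line. Continuation lines for a predictor with more than five specified categories are parsed separately, and the caller concatenates their category lists.

```python
def cols(line, first, last):
    """Text in 1-based, inclusive card columns first..last (padded if short)."""
    return line.ljust(last)[first - 1:last]


def int_field(line, first, last):
    """Right-justified integer field, or None if the columns are blank."""
    text = cols(line, first, last).strip()
    return int(text) if text else None


def parse_type04(line):
    """Parse one TYPE 04 predictor line into a dict (hypothetical keys)."""
    if cols(line, 1, 2) != "04":
        raise ValueError("not a TYPE 04 line")
    spec = {
        "name": cols(line, 3, 20).strip(),
        "field": int_field(line, 21, 23),
        "monotonic": cols(line, 25, 25) == "1",
        "specified": cols(line, 26, 26) == "1",
    }
    if spec["specified"]:
        # Up to five (recode, V(i)) pairs starting in cols 27, 36, 45, 54, 63.
        pairs = []
        for start in (27, 36, 45, 54, 63):
            recode = int_field(line, start, start + 2)
            if recode is None:
                break
            pairs.append((recode, int_field(line, start + 3, start + 8)))
        spec["categories"] = pairs
    else:
        spec["min"] = int_field(line, 27, 32)
        spec["max"] = int_field(line, 33, 38)
        spec["interval"] = int_field(line, 39, 44)
        # Up to three (recode, input value) pairs; a recode of -1 ends the list.
        special = {}
        for start in (45, 54, 63):
            recode = int_field(line, start, start + 2)
            if recode is None or recode == -1:
                break
            special[int_field(line, start + 3, start + 8)] = recode
        spec["special"] = special
    return spec


# The AGE line above, built column by column to keep the spacing exact.
age_line = ("04" + "AGE".ljust(18) + "  3 10" + "%6d%6d%6d" % (10, 80, 10)
            + "%3d%6d%3d%6d%3d" % (8, 91, 9, 93, -1))
print(parse_type04(age_line))
```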

INPUT TYPE 05: DEPENDENT VARIABLE LINE
     Cols 1 - 2          = Input type (Must be 05)

     Cols 3 - 20          = Name of Dependent Variable

     Cols 21 - 23          = The field number of the variable Y to be used in the analysis.  
                             For example, if the dependent variable is the 8th variable in the 
                             format (type 02 input), then enter 008 here.  
                             Restriction:  1 <= Y <= NV

     Cols 24 - 26          = The field number of the variable to be used in the weighting of the data.
                             If a weight variable is used, enter the field number here.  
                             Enter 000 if the cases are not to be weighted.  
                             If a weight field is specified (WT>0), the values of the variable must 
                             contain integers between 1 and 99999.  Restriction: 0 < = WT < = NV

     Cols 27 - 32          = Ymax, the maximum value for the dependent variable allowed in 
                             the analysis.  Hence, any observation whose dependent variable 
                             has a value algebraically larger than YMAX will be read, 
                             but not used by the program in the analysis.  
                             NOTE:  If you do not wish to use YMAX as a screening value, 
                             then enter the actual maximum value of the dependent variable for YMAX.

     Cols 33 - 38          = YMIN, the minimum value of the dependent variable allowed in the analysis. 
                             The same restrictions for YMAX apply here.  NOTE: If one does not wish to 
                             use YMIN for screening, then enter the minimum value of the dependent 
                             variable here.

     Cols 39 - 44          = MD1, first "missing data" code for the dependent variable.  
                             Will cause any observation with the dependent variable equal to MD1 
                             to be deleted from the analysis.  If you do not wish to use MD1, 
                             then enter a value larger than YMAX or smaller than YMIN.

     Cols 45 - 50          = MD2, Second "missing data" code.  Specify this code as above using the 
                             appropriate value.
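A similar sketch reads the TYPE 05 line and applies the screening rule described by YMAX, YMIN, MD1 and MD2. The names are hypothetical, and validation of weight values (integers from 1 to 99999) is omitted.

```python
def cols(line, first, last):
    """Text in 1-based, inclusive card columns first..last (padded if short)."""
    return line.ljust(last)[first - 1:last]


def int_field(line, first, last):
    """Right-justified integer field, or None if the columns are blank."""
    text = cols(line, first, last).strip()
    return int(text) if text else None


def parse_type05(line):
    """Parse the TYPE 05 dependent-variable line (hypothetical dict keys)."""
    if cols(line, 1, 2) != "05":
        raise ValueError("not a TYPE 05 line")
    return {
        "name": cols(line, 3, 20).strip(),
        "y_field": int_field(line, 21, 23),
        "weight_field": int_field(line, 24, 26) or 0,  # 0 = unweighted
        "ymax": int_field(line, 27, 32),
        "ymin": int_field(line, 33, 38),
        "md1": int_field(line, 39, 44),
        "md2": int_field(line, 45, 50),
    }


def is_valid_case(y, spec):
    """Screen a case: Y must lie in [YMIN, YMAX] and differ from MD1 and MD2."""
    if y in (spec["md1"], spec["md2"]):
        return False
    return spec["ymin"] <= y <= spec["ymax"]


# The TYPE 05 line of the sample control file.
spec = parse_type05("05NO. OF CHILDREN   004  7    98     0999999999999")
print([is_valid_case(y, spec) for y in (0, 8, 98, 100)])  # [True, True, True, False]
```

In the sample data set, record 0100 has 100 in the NO. OF CHILDREN field, above YMAX = 98, so it is dropped; this is consistent with the 52 valid cases reported in the sample output.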

CONTINUE COMMAND

The CONTINUE command specifies another analysis to be performed on the data. It is followed by a full set of control commands.


AID CONTROL FILE FOR THE SAMPLE DATA SET


02 (I4,3I3,2I2,I5) 
03  7999990100001000   90   25    1    1 
04SEX                 6 00     0     9     1 -1 
04AGE                 3 10    10    80    10  8    91  9    93 -1 
04AGE AT MARRIAGE     2 01  0    17  1    18  2    20  3    21  4    26 
04AGE AT MARRIAGE     2 01  5    31  6999999 
05NO. OF CHILDREN   004  7    98     0999999999999 


SAMPLE AID DATA SET
0010 25 33 00 0 0 0050 1     
0010 28 56 03 1 0 0100 1 
0020 19 62 08 2 1 0250 2 
0020 40 99 11 3 1 0500 1 
0020 31 91 02 4 1 1000 1 
0031 17 20 01 5 0 1000 2 
0032 19 25 03 6 0 0500 0 
0032 20 39 04 7 0 0250 1 
0100 22 36 02 8 1 0100 2 
0100 25 50100 9 1 0050 1 
0201 17 65 05 1 0 0025 1 
0202 26 37 02 1 1 0050 1 
0203 19 21 01 1 0 0075 2 
0204 26 52 04 1 1 0100 2 
0205 22 31 03 1 1 0500 2 
0206 18 65 00 1 0 1000 0 
0207 43 61 00 1 0 0100 1 
0208 19 46 02 1 0 0250 1 
0207 33 84 00 1 0 0200 1 
0208 24 35 00 1 1 0050 2 
0209 28 63 06 1 0 0400 2 
0211 53 55 00 0 0 0020 1 
0212 36 45 01 1 0 0200 1 
0213 72 73 00 1 1 0100 1 
0215 42 75 05 9 0 0050 0 
0216 34 52 03 9 1 0010 1 
0217 24 53 07 9 0 1000 1 
0218 19 45 07 9 0 0045 2 
0220 19 31 04 9 1 0040 2 
0221 23 34 01 1 0 0100 2 
0222 27 34 02 9 1 0600 1 
0223 26 41 07 9 0 1000 1 
0224 25 39 02 1 0 0025 2 
0225 23 37 09 9 1 0200 2 
0226 17 38 08 9 1 0200 2 
0227 27 51 03 9 1 0300 2 
0228 15 50 10 9 0 0050 2 
0229 18 56 09 5 1 0400 1 
0230 45 56 00 9 1 0020 2 
0231 55 75 00 9 1 0200 1 
0232 28 37 01 9 0 1000 1 
0233 25 46 03 9 0 0020 2 
0234 27 56 03 9 1 0010 2 
0235 28 59 02 9 1 0500 2 
0236 28 39 01 9 0 0200 1 
0237 34 63 05 9 1 0400 1 
0238 21 49 02 9 0 0400 2 
0239 17 49 08 9 1 0700 1
0240 18 48 02 9 1 0400 1
0241 63 89 00 9 1 0300 2
0243 25 89 02 0 0 0500 2
0244 23 33 00 9 1 0200 2
0245 26 78 08 9 1 0900 1

  

SAMPLE AID OUTPUT
          AUTOMATIC  INTERACTION  DETECTOR 
   DEVELOPED BY THE INSTITUTE FOR SOCIAL RESEARCH 
             UNIVERSITY OF MICHIGAN 
                PC-MDS VERSION      

 AID TEST DATA
 
 INPUT FORMAT IS: 
  (I4,3I3,2I2,I5)                                   
 
 NO. OF RECORDS        99999 
 NO. OF FIELDS             7 
 IDENT. IN FIELD NO.       0 
 RESIDUALS IN FILE:        RESAID.OUT 
 
 P1 =  0.01000    P2 =  0.01000    MAXGP =   90    NMIN =   25    TLEV =  0.00 
 
 
          VARIABLE-FIELD NO.   RECODE  CORRESPONDS TO 
 
  1  SEX                  6        0    LESS THAN      1 
                                   1         1 TO      1 
                                   2         2 TO      2 
                                   3         3 TO      3 
                                   4         4 TO      4 
                                   5         5 TO      5 
                                   6         6 TO      6 
                                   7         7 TO      7 
                                   8         8 TO      8 
                                   9         9 OR OVER 
 
  2  AGE                  3        0    LESS THAN     20 
                                   1        20 TO     29 
                                   2        30 TO     39 
                                   3        40 TO     49 
                                   4        50 TO     59 
                                   5        60 TO     69 
                                   6        70 TO     79 
                                   7        80 OR OVER 
                                   8          91 
                                   9          93 
 
  3  AGE AT MARRIAGE      2        0    LESS THAN     17 
                                   1        17 TO     17 
                                   2        18 TO     19 
                                   3        20 TO     20 
                                   4        21 TO     25 
                                   5        26 TO     30 
                                   6        31 OR OVER 
 DEPENDENT VARIABLE-Y IN FIELD NO.   4 IS  NO. OF CHILDREN    
 
 YMAX =       98             EXCLUDE          999999  999999 
 YMIN =        0             WEIGHT IN FIELD NO.   7 
 
 
 
 DO NOT SPLIT UNLESS TSS IS MORE THAN      0.16965250E+04  AND 
                     BSS IS MORE THAN      0.16965250E+04. / 
 VALID CASES:      52. 
 
  GROUP 1 VALUES: 
 
    GROUP   N     TOTAL WEIGHT        SUM OF Y      SUM Y-SQUARE 
      1     52   0.16590000E+05   0.62995000E+05   0.40885500E+06 
 
                      T S S            MEAN          STD. DEV. 
                 0.16965250E+06   0.37971670E+01   0.31978410E+01 
 
 
 ******************************************************************************* 
 
    AID TEST DATA                                                                  
    ** STEP NO.=  1       PARENT GROUP =  1 ** 
 
 
  TRY ON PREDICTOR  1   SEX                
  CODE   N     TOTAL WEIGHT        SUM OF Y      SUM Y-SQUARE          B S S 
    0    26   0.85600000E+04   0.25375000E+05   0.13833500E+06   0.12265450E+05 
    1    26   0.80300000E+04   0.37620000E+05   0.27052000E+06   0.00000000E+00 
 
  CODE   N       MEAN            STD. DEV. 
    0    26   0.29643690E+01   0.27153540E+01 
    1    26   0.46849310E+01   0.34263810E+01 
 
  MAX. BSS=    0.12265450E+05   BSS/TSS =   0.07230   BETWEEN  
  CODES       0 
  AND CODES   1 
 
  TRY ON PREDICTOR  2   AGE                
  CODE   N     TOTAL WEIGHT        SUM OF Y      SUM Y-SQUARE          B S S 
    1     3   0.15750000E+04   0.25750000E+04   0.55750000E+04   0.81360180E+04 
    2    15   0.35650000E+04   0.89100000E+04   0.42540000E+05   0.18187500E+05 
    3     8   0.30150000E+04   0.15275000E+05   0.10058500E+06   0.42663240E+04 
    4    11   0.25100000E+04   0.13760000E+05   0.93780000E+05   0.13182830E+00 
    5     6   0.21750000E+04   0.65250000E+04   0.41025000E+05   0.10082350E+04 
    6     4   0.12500000E+04   0.74500000E+04   0.58850000E+05   0.46432280E+03 
    7     4   0.15000000E+04   0.65000000E+04   0.62500000E+05   0.34369750E+04 

  CODE   N       MEAN            STD. DEV. 
    1     3   0.16349210E+01   0.93097660E+00 
    2    15   0.24992990E+01   0.23845720E+01 
    3     8   0.50663350E+01   0.27737660E+01 
    4    11   0.54820720E+01   0.27035970E+01 
    5     6   0.30000000E+01   0.31403930E+01 
    6     4   0.59600000E+01   0.33997650E+01 
    7     4   0.43333330E+01   0.47842340E+01 
 
  MAX. BSS=    0.42663240E+04   BSS/TSS =   0.02515   BETWEEN  
  CODES       0  1  2  3 
  AND CODES   4  5  6  7 
 
 
   TRY ON PREDICTOR  3   AGE AT MARRIAGE    
  CODE   N     TOTAL WEIGHT        SUM OF Y      SUM Y-SQUARE          B S S 
    0     2   0.10500000E+04   0.25000000E+04   0.90000000E+04   0.22482300E+04 
    2     9   0.29600000E+04   0.89500000E+04   0.58420000E+05   0.46906520E+04 
    6    13   0.31000000E+04   0.99800000E+04   0.76040000E+05   0.76303650E+04 
    4    12   0.31450000E+04   0.12510000E+05   0.74080000E+05   0.63840390E+04 
    3     1   0.25000000E+03   0.10000000E+04   0.40000000E+04   0.63572070E+04 
    5    12   0.51600000E+04   0.21730000E+05   0.13209000E+06   0.90572440E+04 
    1     4   0.19250000E+04   0.83250000E+04   0.59225000E+05   0.00000000E+00 
 
 
  CODE   N       MEAN            STD. DEV. 
    0     2   0.23809520E+01   0.17036710E+01 
    2     9   0.30236490E+01   0.32548480E+01 
    6    13   0.32193550E+01   0.37636140E+01 
    4    12   0.39777420E+01   0.27807220E+01 
    3     1   0.40000000E+01   0.00000000E+00 
    5    12   0.42112400E+01   0.28043350E+01 
    1     4   0.43246760E+01   0.34732430E+01 
 
      GROUP  1 CANNOT BE SPLIT ON PREDICTOR  3   AGE AT MARRIAGE    
 
 SPLIT GROUP  1 ON PREDICTOR  1   SEX                
 INTO GROUP   2 WITH CODES    0 
 AND GROUP    3 WITH CODES    1 
 
   BSS IS    0.12265450E+05...BSS/TSS IS   0.07230...T-VALUE  1.97 
 
   CANDIDATE GROUPS ARE  AS FOLLOWS. 
 
 GROUP   N     TOTAL WEIGHT        SUM OF Y      SUM Y-SQUARE          T S S 
    3    26   0.80300000E+04   0.37620000E+05   0.27052000E+06   0.94272910E+05 
    2    26   0.85600000E+04   0.25375000E+05   0.13833500E+06   0.63114140E+05 
 
 
  CODE   N       MEAN            STD. DEV. 
    3    26   0.46849310E+01   0.34263810E+01 
    2    26   0.29643690E+01   0.27153540E+01 
  ******************************************************************************* 
 
    AID TEST DATA                                                                  
    ** STEP NO.=  2       PARENT GROUP =  3 ** 
 
 
  TRY ON PREDICTOR  1   SEX                
  CODE   N     TOTAL WEIGHT        SUM OF Y      SUM Y-SQUARE          B S S 
    1    26   0.80300000E+04   0.37620000E+05   0.27052000E+06   0.00000000E+00 
 
 
  CODE   N       MEAN            STD. DEV. 
    1    26   0.46849310E+01   0.34263810E+01 
 
      GROUP  3 CANNOT BE SPLIT ON PREDICTOR  1   SEX                
 
 
  TRY ON PREDICTOR  2   AGE                
  CODE   N     TOTAL WEIGHT        SUM OF Y      SUM Y-SQUARE          B S S 
    2     9   0.19400000E+04   0.65600000E+04   0.37140000E+05   0.43462590E+04 
    3     2   0.11000000E+04   0.64000000E+04   0.46400000E+05   0.87025980E+03 
    4     7   0.13400000E+04   0.59600000E+04   0.38880000E+05   0.12858570E+04 
    5     2   0.65000000E+03   0.40000000E+04   0.26000000E+05   0.22153810E+03 
    6     3   0.12000000E+04   0.72000000E+04   0.57600000E+05   0.62317530E+03 
    7     2   0.80000000E+03   0.55000000E+04   0.60500000E+05   0.82343100E+04 
 
 
  CODE   N       MEAN            STD. DEV. 
    2     9   0.33814430E+01   0.27767190E+01 
    3     2   0.58181820E+01   0.28862740E+01 
    4     7   0.44477610E+01   0.30384780E+01 
    5     2   0.61538460E+01   0.14595120E+01 
    6     3   0.60000000E+01   0.34641020E+01 
    7     2   0.68750000E+01   0.53253520E+01 
 
      GROUP  3 CANNOT BE SPLIT ON PREDICTOR  2   AGE                
 
 
  TRY ON PREDICTOR  3   AGE AT MARRIAGE    
  CODE   N     TOTAL WEIGHT        SUM OF Y      SUM Y-SQUARE          B S S 
    0     1   0.10000000E+04   0.20000000E+04   0.40000000E+04   0.82343100E+04 
    4     5   0.10500000E+04   0.35000000E+04   0.21100000E+05   0.11033120E+05 
    6     8   0.25300000E+04   0.95300000E+04   0.74590000E+05   0.20991620E+05 
    5     7   0.24600000E+04   0.10830000E+05   0.66590000E+05   0.58438860E+05 
    2     4   0.10900000E+04   0.65600000E+04   0.50640000E+05  -0.31736590E+06 
    1     2   0.90000000E+03   0.72000000E+04   0.57600000E+05   0.00000000E+00 

  CODE   N       MEAN            STD. DEV. 
    0     1   0.20000000E+01   0.00000000E+00 
    4     5   0.33333330E+01   0.29973530E+01 
    6     8   0.37667980E+01   0.39106830E+01 
    5     7   0.44024390E+01   0.27726590E+01 
    2     4   0.60183490E+01   0.31997180E+01 
    1     2   0.80000000E+01   0.00000000E+00 
 
      GROUP  3 CANNOT BE SPLIT ON PREDICTOR  3   AGE AT MARRIAGE    
 
      GROUP  3 CANNOT BE SPLIT. MAXIMUM BSS =  0.00000000E+00 FOR PREDICTOR 
 
 GROUP   N     TOTAL WEIGHT        SUM OF Y      SUM Y-SQUARE          T S S 
    3    26   0.80300000E+04   0.37620000E+05   0.27052000E+06   0.94272910E+05 
 

  CODE   N       MEAN            STD. DEV. 
    3    26   0.46849310E+01   0.34263810E+01 
    CANDIDATE GROUPS ARE  AS FOLLOWS. 
 
 GROUP   N     TOTAL WEIGHT        SUM OF Y      SUM Y-SQUARE          T S S 
    2    26   0.85600000E+04   0.25375000E+05   0.13833500E+06   0.63114140E+05 
 
  CODE   N       MEAN            STD. DEV. 
    2    26   0.29643690E+01   0.27153540E+01 
 
 ******************************************************************************* 
    AID TEST DATA                                                                  
    ** STEP NO.=  2       PARENT GROUP =  2 ** 
 
  TRY ON PREDICTOR  1   SEX                
  CODE   N     TOTAL WEIGHT        SUM OF Y      SUM Y-SQUARE          B S S 
    0    26   0.85600000E+04   0.25375000E+05   0.13833500E+06   0.00000000E+00 
 
  CODE   N       MEAN            STD. DEV. 
    0    26   0.29643690E+01   0.27153540E+01 
 
      GROUP  2 CANNOT BE SPLIT ON PREDICTOR  1   SEX                
 

  TRY ON PREDICTOR  2   AGE                
  CODE   N     TOTAL WEIGHT        SUM OF Y      SUM Y-SQUARE          B S S 
    1     3   0.15750000E+04   0.25750000E+04   0.55750000E+04   0.34113910E+04 
    2     6   0.16250000E+04   0.23500000E+04   0.54000000E+04   0.10381870E+05 
    3     6   0.19150000E+04   0.88750000E+04   0.54185000E+05   0.90213570E+03 
    4     4   0.11700000E+04   0.78000000E+04   0.54900000E+05   0.52770370E+04 
    5     4   0.15250000E+04   0.25250000E+04   0.15025000E+05   0.13843170E+04 
    6     1   0.50000000E+02   0.25000000E+03   0.12500000E+04   0.17981170E+04 
    7     2   0.70000000E+03   0.10000000E+04   0.20000000E+04   0.00000000E+00 
 
  CODE   N       MEAN            STD. DEV. 
    1     3   0.16349210E+01   0.93097660E+00 
    2     6   0.14461540E+01   0.11098270E+01 
    3     6   0.46344650E+01   0.26108950E+01 
    4     4   0.66666670E+01   0.15743680E+01 
    5     4   0.16557380E+01   0.26666440E+01 
    6     1   0.50000000E+01   0.00000000E+00 
    7     2   0.14285710E+01   0.90350790E+00 
 
      GROUP  2 CANNOT BE SPLIT ON PREDICTOR  2   AGE                
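In each TRY ON PREDICTOR table, the B S S value on a row matches the between-group sum of squares for cutting the group just below that row. All categories down to and including that row go on one side and the remaining rows on the other. For group 2, whose category totals add up to the group totals, the last row shows 0 because no cut is possible there.

For two parts with weights W1 and W2 and means m1 and m2, this BSS equals W1 W2 / (W1 + W2) (m1 - m2)^2. That is the same quantity as the parent's TSS minus the two parts' TSS.

The Python sketch below assumes this reading of the table and recomputes the AGE column for group 2 from the TOTAL WEIGHT and SUM OF Y columns. `cumulative_bss` is a hypothetical helper, not an AID routine.

```python
def cumulative_bss(categories):
    """BSS for cutting an ordered list of categories after each position.

    `categories` holds (weight, sum_y) pairs in the order they are tried.
    Entry k of the result is the BSS of categories[0..k] versus the rest;
    the last entry is 0 because nothing is left on the right-hand side.
    """
    total_w = sum(w for w, _ in categories)
    total_y = sum(s for _, s in categories)
    result = []
    left_w = left_y = 0.0
    for k, (w, s) in enumerate(categories):
        left_w += w
        left_y += s
        if k == len(categories) - 1:
            result.append(0.0)
            break
        right_w, right_y = total_w - left_w, total_y - left_y
        diff = left_y / left_w - right_y / right_w
        # Weighted two-group form: W1 * W2 / (W1 + W2) * (mean1 - mean2)^2
        result.append(left_w * right_w / total_w * diff ** 2)
    return result


# Group 2, predictor 2 (AGE), codes 1 to 7 in code order:
# (TOTAL WEIGHT, SUM OF Y) copied from the listing above.
age_group2 = [(1575, 2575), (1625, 2350), (1915, 8875), (1170, 7800),
              (1525, 2525), (50, 250), (700, 1000)]

bss = cumulative_bss(age_group2)
best = max(range(len(bss)), key=bss.__getitem__)
print([round(b, 2) for b in bss])
print("largest BSS for the cut after code", best + 1)
```

The largest value, about 10381.87, belongs to the cut between codes 2 and 3. The printout still reports that group 2 cannot be split on AGE, and it does not say which of the splitting rules rejected that cut.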
 
  TRY ON PREDICTOR  3   AGE AT MARRIAGE    
  CODE   N     TOTAL WEIGHT        SUM OF Y      SUM Y-SQUARE          B S S 
    6     5   0.57000000E+03   0.45000000E+03   0.14500000E+04   0.28885460E+04 
    1     2   0.10250000E+04   0.11250000E+04   0.16250000E+04   0.76610190E+04 
    2     5   0.18700000E+04   0.23900000E+04   0.77800000E+04   0.19284510E+05 
    3     1   0.25000000E+03   0.10000000E+04   0.40000000E+04   0.17393700E+05 
    5     5   0.27000000E+04   0.10900000E+05   0.65500000E+05   0.61782450E+04 
    4     7   0.20950000E+04   0.90100000E+04   0.52980000E+05   0.24895520E+04 
    0     1   0.50000000E+02   0.50000000E+03   0.50000000E+04   0.00000000E+00 
 
  CODE   N       MEAN            STD. DEV. 
    6     5   0.78947370E+00   0.13858540E+01 
    1     2   0.10975610E+01   0.61702980E+00 
    2     5   0.12780750E+01   0.15896390E+01 
    3     1   0.40000000E+01   0.00000000E+00 
    5     5   0.40370370E+01   0.28216300E+01 
    4     7   0.43007160E+01   0.26062660E+01 
    0     1   0.10000000E+02   0.00000000E+00 
 
      GROUP  2 CANNOT BE SPLIT ON PREDICTOR  3   AGE AT MARRIAGE    
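For AGE AT MARRIAGE the categories are not printed in code order. They appear in ascending order of mean Y, from 0.789 for code 6 up to 10.0 for code 0, and the B S S column is again cumulative down that order.

This matches the way AID treats a predictor whose categories may be grouped freely rather than kept in sequence. For a sum-of-squares criterion, the best two-way grouping of categories always falls at some cut in the ranking by mean, so only those cuts need to be examined.

A minimal Python sketch follows. It assumes AGE AT MARRIAGE was treated as such a free predictor in this run, which the listing order suggests. `free_predictor_bss` is a hypothetical helper name.

```python
def free_predictor_bss(categories):
    """Rank categories by mean Y, then score every cut along that ranking.

    `categories` maps a category code to its (weight, sum_y) totals.
    Returns the ranked codes and the BSS for cutting after each of them.
    """
    ranked = sorted(categories, key=lambda c: categories[c][1] / categories[c][0])
    total_w = sum(w for w, _ in categories.values())
    total_y = sum(s for _, s in categories.values())
    bss = []
    left_w = left_y = 0.0
    for k, code in enumerate(ranked):
        w, s = categories[code]
        left_w += w
        left_y += s
        if k == len(ranked) - 1:
            bss.append(0.0)  # nothing left to split off
            break
        right_w, right_y = total_w - left_w, total_y - left_y
        diff = left_y / left_w - right_y / right_w
        bss.append(left_w * right_w / total_w * diff ** 2)
    return ranked, bss


# Group 2, predictor 3 (AGE AT MARRIAGE): code -> (TOTAL WEIGHT, SUM OF Y).
age_at_marriage_group2 = {
    0: (50, 500), 1: (1025, 1125), 2: (1870, 2390), 3: (250, 1000),
    4: (2095, 9010), 5: (2700, 10900), 6: (570, 450),
}

ranked, bss = free_predictor_bss(age_at_marriage_group2)
print(ranked)
print([round(b, 2) for b in bss])
```

The ranked order comes out as 6, 1, 2, 3, 5, 4, 0, the same order as the listing. The largest BSS, about 19284.5, is for the cut after code 2.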
 
      GROUP  2 CANNOT BE SPLIT. MAXIMUM BSS =  0.00000000E+00 FOR PREDICTOR 
  GROUP   N     TOTAL WEIGHT        SUM OF Y      SUM Y-SQUARE          T S S 
    2    26   0.85600000E+04   0.25375000E+05   0.13833500E+06   0.63114140E+05 
 
  CODE   N       MEAN            STD. DEV. 
    2    26   0.29643690E+01   0.27153540E+01 

 END OF RUN 

 
 
       ***       53 RESIDUALS WERE PRODUCED IN THIS A.I.D. RUN *** 
 
 
 
 ADDITIONAL PROCESSING PASS IMPLIED, BUT "CONTINUATION" INDICATOR CARD WAS NOT 
 PRESENT OR WAS MISPUNCHED. RUN TERMINATED. 
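The CANDIDATE GROUPS lists show how the next parent is chosen: the eligible group with the largest T S S is tried first. That is why group 3 (TSS 94272.91) was processed before group 2 (TSS 63114.14).

The Python sketch below combines that choice with two rules described at the beginning of this guide:

- A group is eligible only if it has at least RMIN cases and a TSS of at least P1 times the original sum of squares.
- A split is accepted only if its BSS is at least P2 times the original sum of squares.

The control-parameter values are hypothetical, since the printout does not show those used for this run. The class and function names are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Group:
    number: int
    n: int        # number of cases (N)
    tss: float    # within-group sum of squared deviations (T S S)


def is_eligible(group, rmin, p1, total_tss):
    """A group may be split only if it is large enough and not too homogeneous."""
    return group.n >= rmin and group.tss >= p1 * total_tss


def split_accepted(bss, p2, total_tss):
    """A split is kept only if it removes at least P2 of the original sum of squares."""
    return bss >= p2 * total_tss


def next_parent(candidates, rmin, p1, total_tss):
    """Pick the eligible candidate with the largest TSS, or None if none remain."""
    eligible = [g for g in candidates if is_eligible(g, rmin, p1, total_tss)]
    return max(eligible, key=lambda g: g.tss, default=None)


# TSS of the whole sample, recomputed from the printed totals of groups 2 and 3.
TOTAL_TSS = 169652.47
candidates = [Group(3, 26, 94272.91), Group(2, 26, 63114.14)]

# Hypothetical control parameters, chosen only for illustration.
RMIN, P1, P2 = 10, 0.01, 0.005

parent = next_parent(candidates, RMIN, P1, TOTAL_TSS)
print("next parent group:", parent.number)
print("step-1 split accepted:", split_accepted(12265.45, P2, TOTAL_TSS))
```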

SAMPLE AID DATA RESIDUAL FILE OUTPUT  (FILE RESID.OUT)

  0     1    50    10000
  3     1   250   199532
  3     1   500   549532
  3     1  1000   199532
  3     1   100    19532
  3     1    50     9532
  3     1   100    39532
  3     1   500   149532
  3     1    50     -468
  3     1   100     -468
  3     1    10     2532
  3     1    40    15532
  3     1   600   119532
  3     1   200   179532
  3     1   200   159532
  3     1   300    89532
  3     1   400   359532
  3     1    20     -468
  3     1   200     -468
  3     1    10     2532
  3     1   500    99532
  3     1   400   199532
  3     1   700   559532
  3     1   400    79532
  3     1   300     -468
  3     1   200     -468
  3     1   900   719532
  2     1    50     -296
  2     1   100    29704
  2     1  1000    99704
  2     1   500   149704
  2     1   250    99704
  2     1    25    12204
  2     1    75     7204
  2     1  1000     -296
  2     1   100     -296
  2     1   250    49704
  2     1   200     -296
  2     1   400   239704
  2     1    20     -296
  2     1   200    19704
  2     1    50    24704
  2     1  1000   699704
  2     1    45    31204
  2     1   100     9704
  2     1  1000   699704
  2     1    25     4704
  2     1    50    49704
  2     1  1000    99704
  2     1    20     5704
  2     1   200    19704
  2     1   400    79704
  2     1   500    99704
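The columns of RESID.OUT are not labelled in this listing. One way to tie the records back to the printout is to take the first column as the final group number and the third as the case weight. On that reading, groups 3 and 2 each have 26 records, and their weights sum to 8030 and 8560, the TOTAL WEIGHT printed for those groups.

The small Python sketch below performs that check. It assumes the file holds one record of four whitespace-separated integers per line. It makes no assumption about the meaning of the second and fourth columns, and it reports the single record coded 0 as its own group.

```python
from collections import defaultdict


def summarize_residual_file(lines):
    """Count records and sum column 3 for each value of column 1.

    Assumes each record line holds four whitespace-separated integers;
    lines with any other number of fields are skipped.
    """
    counts = defaultdict(int)
    weights = defaultdict(int)
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or non-record lines
        group, _, weight, _ = (int(f) for f in fields)
        counts[group] += 1
        weights[group] += weight
    return dict(counts), dict(weights)


# A few records copied from the sample listing.
sample = """\
  0     1    50    10000
  3     1   250   199532
  2     1    50     -296
  2     1   400   239704
"""
counts, weights = summarize_residual_file(sample.splitlines())
print(counts, weights)
```

Given the full 53-record file, for example with `summarize_residual_file(open("RESID.OUT"))`, the sketch should report:

- 26 records with total weight 8030 for group 3
- 26 records with total weight 8560 for group 2
- 1 record with weight 50 for group 0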


NOTES

1. This guide to the AID program was developed by the University of North Carolina and extracted from the documentation of the Computation Services Facility, Institute for Social Research, University of Michigan. The AID program described here is the University of Pennsylvania version.