Frequency analysis is particularly useful for describing discrete categories of data having multiple choice or yes-no response formats. This analysis involves constructing a frequency distribution. The frequency distribution is a record of the number of scores that fall within each response category. The frequency distribution, then, has two elements: (1) the categories of response, and (2) the frequency with which respondents are identified with each category.
The only technical requirement of the frequency analysis is that the categories of response be mutually exclusive and exhaustive. Mutually exclusive means that the same observation cannot be counted as belonging to more than one response category. Exhaustive means that every respondent must fit into some category.
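The two elements of a frequency distribution, and the two requirements on its categories, can be sketched in a few lines of Python. This is only an illustration: the response values and the helper name `frequency_distribution` are hypothetical and are not part of PC-MDS.

```python
from collections import Counter

def frequency_distribution(responses, categories):
    """Count how many responses fall in each declared category.

    Each response is a single value, so it is counted in exactly one
    category (mutually exclusive). A response outside the declared
    categories raises an error (the categories must be exhaustive).
    """
    counts = Counter(responses)
    unknown = set(counts) - set(categories)
    if unknown:
        raise ValueError(f"responses outside the categories: {sorted(unknown)}")
    return {category: counts.get(category, 0) for category in categories}

# Hypothetical yes-no-undecided responses, for illustration only.
responses = ["yes", "no", "yes", "yes", "no", "undecided"]
table = frequency_distribution(responses, ["yes", "no", "undecided"])
for category, freq in table.items():
    print(f"{category:<10} {freq}")
```

Because every response is counted exactly once, the frequencies always sum to the number of respondents.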
Most sets of data tend to cluster around a central value. There are various measures of central tendency, including the mean, median, mode, quartiles and various percentiles.
(1) The Arithmetic Mean:
The mean is the most commonly used measure of central tendency in statistical analysis, largely because of its relationship to the variance statistic. The mean is also important in the sampling distribution, which is formed from the means of all possible samples of a given size and has as its center the mean of the population. The mean is affected by extreme scores (outliers), which may not be typical of the sample (or population) as a whole. The mean is preferred when a distribution is symmetric and interest is centered on a single score that represents all scores.
The mean value, \bar{X}, is obtained by summing the score values for a given variable and dividing by the number of scores.
For any variable X having n observations X_1, X_2, X_3, ..., X_n, the mean of variable X is defined as

\[ \bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n} = \frac{1}{n} \sum_{i=1}^{n} X_i \]

Sometimes, however, the data are not a simple set of values to be summed, but instead a set of categories. Consider an example where respondents are grouped into age categories. In this example, the midpoint of each category is taken as the mean of the category and is substituted for the individual responses in the computation of the mean.
Age Group (X)   Frequency   Mean   Freq * Mean
---------------------------------------------------
41-43                1       42         42
38-40                3       39        117
35-37                4       36        144
32-34                4       33        132
29-31                5       30        150
26-28                8       27        216
23-25               10       24        240
20-22               18       21        378
17-19               23       18        414
14-16               17       15        255
11-13               10       12        120
---------------------------------------------------
Sum                103               2,208

Mean = \bar{X} = 2208 / 103 = 21.44
Thus we see that for ungrouped data, the identified categories of X1, X2, X3, ..... Xn have corresponding frequencies f1, f2, f3, ........ fn, and for grouped data, the values X1, X2, X3,....... Xn are the class-midpoints of the n class intervals with class frequencies f1, f2, f3, ...... fn respectively.
Then the mean \bar{X} is defined as

\[ \bar{X} = \frac{X_1 f_1 + X_2 f_2 + X_3 f_3 + \cdots + X_n f_n}{f_1 + f_2 + f_3 + \cdots + f_n} = \frac{\sum_{i=1}^{n} f_i X_i}{\sum_{i=1}^{n} f_i} \]
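As a check on the arithmetic, the weighted mean can be computed directly from the class midpoints and frequencies of the age-group example. The following Python sketch uses the figures from the table above; the function name `grouped_mean` is illustrative only.

```python
# Class midpoints and frequencies from the age-group example (41-43 down to 11-13).
midpoints = [42, 39, 36, 33, 30, 27, 24, 21, 18, 15, 12]
frequencies = [1, 3, 4, 4, 5, 8, 10, 18, 23, 17, 10]

def grouped_mean(midpoints, frequencies):
    """Mean of grouped data: sum of f_i * X_i divided by sum of f_i."""
    total = sum(frequencies)
    if total == 0:
        raise ValueError("the distribution contains no observations")
    weighted_sum = sum(f * x for f, x in zip(frequencies, midpoints))
    return weighted_sum / total

mean = grouped_mean(midpoints, frequencies)
print(sum(frequencies))   # 103
print(round(mean, 2))     # 21.44
```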
(2) The Median:
The median is the value that divides a frequency distribution in two equal parts (the distribution is arranged in ascending or descending order of magnitude).
The median value is an appropriate indicator of central tendency when the distribution of points is skewed and when the most typical value is desired (typical is described as the middle point between the extremes).
If X1, X2, X3, ............ Xn is a set of data arranged in ascending order of magnitude, then the median of the set of data is given by :
\[ Me = X_{(n+1)/2} \quad \text{if } n \text{ is odd,} \]
\[ Me = \frac{X_{n/2} + X_{n/2+1}}{2} \quad \text{if } n \text{ is even.} \]
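This result also holds for an ungrouped frequency distribution. If the data form a grouped frequency distribution, then the median is

\[ Me = l_1 + \frac{N/2 - c}{f} \, (l_2 - l_1) \]

where l_1 and l_2 are the lower and upper bounds of the median class, f is the frequency of the median class, c is the cumulative frequency of the class preceding the median class, and N = (f1 + f2 + f3 + ... + fn).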
It should be noted that cumulative frequencies may be defined in two ways. (1) the cumulative frequency less than the upper boundary of a class, and (2) the cumulative frequency greater than the lower boundary of class. Unless otherwise stated, the term cumulative frequency refers to the first definition.
In this example (the same data as for the mean calculation, with the classes arranged in ascending order), the cumulative frequencies of the categories are computed. The median class is the first class whose cumulative frequency reaches N/2, and its bounds and frequency are used in the computation of the median.
Age Group (X)    Freq    Cumulative Frequency (Less Than)
---------------------------------------------------------
11-13             10       10
14-16             17       27
17-19             23       50   <--- Modal Class
20-22             18       68   <--- Median Class
23-25             10       78
26-28              8       86
29-31              5       91
32-34              4       95
35-37              4       99
38-40              3      102
41-43              1      103
---------------------------------------------------------
Sum              103

N/2 = 103/2 = 51.5
Median Class Lower Bound = l1 = 20
Median Class Upper Bound = l2 = 22
N = 103, l2 - l1 = 2, f = 18 (frequency of the median class), c = 50 (cumulative frequency below the median class)
Median = 20 + ((103/2 - 50)/18)*2 = 20.1667
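The interpolation can be sketched in Python as follows. The sketch assumes that f denotes the frequency of the median class itself (18) and c the cumulative frequency of the classes below it (50), and it uses the stated class limits as l1 and l2, as in the worked example. The function name `grouped_median` is illustrative only.

```python
# (lower limit, upper limit, frequency) for each age class, in ascending order.
classes = [
    (11, 13, 10), (14, 16, 17), (17, 19, 23), (20, 22, 18),
    (23, 25, 10), (26, 28, 8), (29, 31, 5), (32, 34, 4),
    (35, 37, 4), (38, 40, 3), (41, 43, 1),
]

def grouped_median(classes):
    """Median of grouped data by linear interpolation within the median class."""
    n = sum(freq for _, _, freq in classes)
    half = n / 2
    below = 0  # cumulative frequency of the classes before the current one (c)
    for lower, upper, freq in classes:
        if freq > 0 and below + freq >= half:
            # This is the median class: interpolate between its limits.
            return lower + (half - below) / freq * (upper - lower)
        below += freq
    raise ValueError("the distribution contains no observations")

median = grouped_median(classes)
print(round(median, 4))   # 20.1667
```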
(3) The Mode:
The mode is defined as the most frequently observed value. For grouped data, the mode is the most commonly observed category, and for ungrouped data, the mode is the value which occurs most frequently.
For a grouped frequency distribution, the mode is given by
Mo = l1 + ((f1 - f0)/(2f1 - f0 - f2))*(l2 - l1)
where:
l1, l2 = the lower and upper bounds of the modal class
f1 = frequency of the modal class
f0 = frequency of the class preceding the modal class
f2 = frequency of the class following the modal class
Using the same example, the modal class is the class with the highest frequency (17-19). Its bounds and the frequencies of the neighboring classes are substituted into the formula.
l1 = 17, l2 = 19, l2 - l1 = 2, f1 = 23, f0 = 17, f2 = 18
Mo = 17 + ((23-17)/(2*23-17-18))*(2) = 17 + (6/11)*2 =18.091
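The same computation can be sketched in Python. Two choices here are assumptions made for the sketch and are not taken from the text: a missing neighbor class (when the modal class is first or last) is treated as having frequency 0, and the class midpoint is returned when the denominator is zero. The function name `grouped_mode` is illustrative only.

```python
# (lower limit, upper limit, frequency) for each age class, in ascending order.
classes = [
    (11, 13, 10), (14, 16, 17), (17, 19, 23), (20, 22, 18),
    (23, 25, 10), (26, 28, 8), (29, 31, 5), (32, 34, 4),
    (35, 37, 4), (38, 40, 3), (41, 43, 1),
]

def grouped_mode(classes):
    """Mode of grouped data: l1 + (f1 - f0) / (2*f1 - f0 - f2) * (l2 - l1)."""
    # The modal class is the class with the highest frequency.
    idx = max(range(len(classes)), key=lambda i: classes[i][2])
    lower, upper, f1 = classes[idx]
    # Assumption: a neighbor that does not exist counts as frequency 0.
    f0 = classes[idx - 1][2] if idx > 0 else 0
    f2 = classes[idx + 1][2] if idx < len(classes) - 1 else 0
    denominator = 2 * f1 - f0 - f2
    if denominator == 0:
        # Assumption: fall back to the class midpoint when the formula is undefined.
        return (lower + upper) / 2
    return lower + (f1 - f0) / denominator * (upper - lower)

mode = grouped_mode(classes)
print(round(mode, 3))   # 18.091
```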
Dispersion is the spread of the data about the measure of central tendency. There are various measures of dispersion that may be applied to a data set. The most commonly used measures include the range, the variance, and the standard deviation.
(1) Maximum & Minimum
The largest and smallest values of the variable.
(2) Range
The Range is the difference between the highest and the lowest values of the variable.
(3) The Variance
The variance is defined as the "mean of the squared deviations around the mean". It is calculated in three steps.
Step 1: Calculate the mean of the set of data (let mean = \bar{X}).
Step 2: Calculate the deviation of each score from the mean score for the variable. [ Deviation = (X_i - \bar{X}) ]
Step 3: Square each deviation, sum the squared deviations, and divide the sum by n - 1.

\[ s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} \]
If the set of data is very large, the denominator ( n-1 ) can be approximated by n.
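The three steps can be followed literally in a short Python sketch. The scores below are hypothetical and chosen only so the arithmetic is easy to follow by hand. The result is compared with `statistics.variance` from the Python standard library, which also divides by n - 1.

```python
import statistics

def sample_variance(scores):
    """Variance computed in the three steps described above."""
    n = len(scores)
    if n < 2:
        raise ValueError("at least two scores are needed to divide by n - 1")
    mean = sum(scores) / n                       # Step 1: the mean
    deviations = [x - mean for x in scores]      # Step 2: deviations from the mean
    return sum(d * d for d in deviations) / (n - 1)   # Step 3: squared, summed, / (n - 1)

# Hypothetical scores, for illustration only (mean = 5, sum of squares = 32).
scores = [2, 4, 4, 4, 5, 5, 7, 9]
variance = sample_variance(scores)

print(max(scores) - min(scores))       # range: 7
print(round(variance, 4))              # 4.5714
print(statistics.variance(scores))     # same value from the standard library
```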
(4) The Standard Deviation
The standard deviation is defined as the root mean square of the deviations around the mean. The standard deviation for a sample is the square root of the variance and is represented by s. The advantage of the standard deviation is that it is expressed in the same unit as the original variable.

\[ s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}} \]

(5) The Standard Error
If an infinite number of samples (or all possible samples) of equal size were drawn from the population, the mean of each sample would be an estimate of the population mean, but the sample means would not all be identical. For sufficiently large samples, these means would be approximately normally distributed in what is called a `sampling distribution'. The standard deviation of this sampling distribution is called the standard error. The standard error indicates the potential for discrepancy between the sample mean and the population mean. Because the population standard deviation is usually unknown, the standard error cannot be calculated directly, but is estimated by dividing the sample standard deviation by the square root of the number of cases in the sample.

\[ SE = \frac{s}{\sqrt{n}} \]
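As an illustration, the mean, standard deviation, and standard error can be recomputed from the valid counts that the sample FREQ output in this section reports for variable V1 (112 responses of 1, 68 of 2, and 4 of 3). The Python sketch below is not the PC-MDS implementation; it only applies the definitions given above, and the function name `describe` is illustrative.

```python
import math

# Valid (non-missing) counts for V1, taken from the sample output in this section.
v1_counts = {1: 112, 2: 68, 3: 4}

def describe(counts):
    """Return n, mean, standard deviation (n - 1 denominator) and standard error."""
    n = sum(counts.values())
    mean = sum(value * count for value, count in counts.items()) / n
    sum_of_squares = sum(count * (value - mean) ** 2 for value, count in counts.items())
    sd = math.sqrt(sum_of_squares / (n - 1))
    se = sd / math.sqrt(n)   # standard deviation divided by the square root of n
    return n, mean, sd, se

n, mean, sd, se = describe(v1_counts)
print(f"n={n} mean={mean:.3f} sd={sd:.3f} se={se:.3f}")
# n=184 mean=1.413 sd=0.536 se=0.040
```

These values agree with the V1 row of the summary table in the sample output (1.413, .536, .040, sample size 184).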

The Frequency Analysis program produces frequency distributions and descriptive statistics for variables identified in the data set. As previously indicated, three distinct files are required to run the frequency analysis: the command file, the data file, and the output file.
The PC-MDS Command File
The command file defines the various variables, their format and locations, defines missing values for variables and recodes the values of the variables, if desired. For purposes of clarification, the command files are designated as files with an "SPS" extension (i.e., *.SPS). As an example command file, we shall refer to the command file for the hospital data. The file is designated HOSP.SPS and is found on the data disk. The name of the PC-MDS command file is specified interactively by the user when each program is run. (Note that the .SPS designation is used for instructional clarity only. The command file may have any name and does not, in reality, require the .SPS extension).
The Data File
The data file contains the data in the format described in the Command File. The data files are usually named with a "DAT" extension (i.e., *.DAT). The example data file for the hospital data is called HOSP.DAT and is found on the data disk. The data file is specified in line 2 of the command file (the FILENAME command).
(Note that the .DAT designation is used for instructional clarity only. The data file may have any name and does not, in reality, require the .DAT extension).
The Output File
An output file must be interactively specified by the user while running each of the PC-MDS programs. The output file is the file to which the frequency analysis is printed. A common convention is to name the file with a "PRN" extension to signify a print file (i.e., *.PRN). For the frequency analysis, the output file contains the following output.
1) The variable number
2) The mean
3) The standard deviation
4) The standard error
5) The sample size
6) The maximum and minimum values
7) The range
8) The frequency distribution with percentages
HOW TO RUN THE FREQ PROGRAM
STEP 1: Enter the EDITOR (a word processor or program editor that produces ASCII files will suffice), and prepare the command file and the data file.
STEP 2: Load the FREQ program. The program is loaded by simply typing FREQ and then pressing the [ENTER] key.
C> FREQ [ENTER]
STEP 3: After the initial logo identifying the program, a message will appear on the screen requesting the location and name of the command file.
ENTER THE NAME OF THE PC-MDS COMMAND FILE
ENTER THE NAME OF THE FILE TO SAVE OUTPUT
USE THE FORM: DRV:FILENAME.EXT(e.g. B:STAT.PRN)
C:HOSP.PRN
THIS OUTPUT FILE NAME ALREADY EXISTS! DO YOU WANT TO OVERWRITE IT? (Y/N) Y
FREQUENCY PROGRAM OPTIONS:
64 VARIABLES HAVE BEEN DECLARED.
SELECT THE APPROPRIATE OPTION:
(1) SPECIFY THE VARIABLES FOR ANALYSIS
(VARIABLES ARE SPECIFIED BY SEQUENCE NUMBER)
(2) VIEW A LIST OF VARIABLE NUMBERS
(3) QUIT PROGRAM
YOUR CHOICE : 2
SEQ#  NAME  VARIABLE LABEL
1 V1 PERSON FILLING OUT
2 V2 TYPE OF SURGERY
3 V3 OUTPATIENT INSURANCE
4 V4 PERCENT DR COVERAGE
VARIABLES SPECIFICATION:
ENTER VARIABLES ONE AT A TIME.
A blank space must follow each variable number.
The dash (-) may be used to simplify statements.
PRESS ENTER to quit this menu
For example,
1 2 3 4 5 and 1 - 5 are equivalent statements.
1 -3
SELECTED VARIABLES
1 V1 2 V2 3 V3
VARIABLES CORRECT? Y
STMT#  #VARIABLES  FORMAT STATEMENT AND DATA
1      64          (4X,F1.0,F2.0,53F1.0,2X,9F1.0)
1.00000e+000 6.00000e+000 1.00000e+000 1.00000e+000 1.00000e+000
5.00000e+000 4.00000e+000 4.00000e+000 4.00000e+000 4.00000e+000
4.00000e+000 4.00000e+000 4.00000e+000 4.00000e+000 4.00000e+000
2.00000e+000 3.00000e+000 1.00000e+000 2.00000e+000 1.00000e+000
2.00000e+000 1.00000e+000 2.00000e+000 2.00000e+000 2.00000e+000
1.00000e+000 3.00000e+000 2.00000e+000 2.00000e+000 3.00000e+000
WAS THE DATA READ CORRECTLY? Y
PLEASE ENTER THE NEW FORMAT STATEMENT
(4X,F1.0,F2.0,53F1.0,2X,9F1.0)
PRINT OPTION:
Press ENTER to QUIT ANALYSIS  OR  Type 1 for FREQUENCY TABLES
THE HOSP.SPS FILE
TITLE PATIENT SURVEY, LOCAL HOSPITAL
FILE NAME 'C:HOSP.DAT'
DATA LIST V1 TO V64 64 (4X,F1.0,F2.0,53F1.0,2X,9F1.0)
VARIABLE LABELS V1 'PERSON FILLING OUT'
V2 'TYPE OF SURGERY'
V3 'OUTPATIENT INSURANCE'
V4 'PERCENT DR COVERAGE'
V5 'PERCENT HOSPITAL COVERAGE'
V6 'DECISION OF WHERE TO HAVE SURGERY'
----------------------------------------------
Continues for V7 to V64
----------------------------------------------
RECODE V7 TO V10(1,2=1)(3,4=2)(5=0)
MISSING VALUES V1 TO V10(0,9)

THE HOSP.DAT FILE
00101061115444444444231212122213223111326355533326424666250200213141735
00201071412222299223331122111211221111222222222222233202331100115141222
00301061334233333333131111111111111111113442442325344333441200124102343
-----------------------------------------------------------------------
Continues for Observations 4 - 181 (NUMBERING IS NOT CONSECUTIVE)
-----------------------------------------------------------------------
19602011115444333324441111111111111111112334324224025432222100122137125
19702031113111449994941111111111111911311111111111111911311100124137124
19802011115224444444441111111211222111211112121111122221212100124131122
19902021435333233222231111111111111111111313111213425429232200113131743
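To see how the format statement (4X,F1.0,F2.0,53F1.0,2X,9F1.0) maps a data record onto the 64 variables, the following Python sketch reads the first HOSP.DAT record field by field. It is not the PC-MDS reader: it handles only the nX (skip) and Fw.d (numeric field) descriptors used here, and treating a blank field as zero is an assumption of the sketch. The function names `parse_format` and `read_record` are illustrative.

```python
import re

FORMAT = "(4X,F1.0,F2.0,53F1.0,2X,9F1.0)"
RECORD = "00101061115444444444231212122213223111326355533326424666250200213141735"

def parse_format(fmt):
    """Expand a format statement into a list of (kind, width, decimals) steps."""
    steps = []
    for token in fmt.strip().strip("()").split(","):
        token = token.strip().upper()
        skip = re.fullmatch(r"(\d+)X", token)
        field = re.fullmatch(r"(\d*)F(\d+)\.(\d+)", token)
        if skip:
            steps.append(("skip", int(skip.group(1)), 0))
        elif field:
            repeat = int(field.group(1) or 1)   # 53F1.0 means 53 one-column fields
            steps.extend([("read", int(field.group(2)), int(field.group(3)))] * repeat)
        else:
            raise ValueError(f"unsupported descriptor: {token!r}")
    return steps

def read_record(line, steps):
    """Apply the steps to one record and return the numeric values."""
    values, pos = [], 0
    for kind, width, decimals in steps:
        text = line[pos:pos + width]
        pos += width
        if kind == "skip":
            continue
        if not text.strip():
            values.append(0.0)                  # assumption: blank field reads as zero
        elif "." in text:
            values.append(float(text))
        else:
            values.append(int(text) / 10 ** decimals)
    return values

values = read_record(RECORD, parse_format(FORMAT))
print(len(values))    # 64
print(values[:6])     # [1.0, 6.0, 1.0, 1.0, 1.0, 5.0]
```

The first six values match the values that FREQ echoes on screen when it asks whether the data were read correctly.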
PC-MDS
FREQUENCY ANALYSIS
ANALYSIS TITLE PATIENT SURVEY, LOCAL HOSPITAL
INPUT DATA FILE C:HOSP.DAT
OUTPUT PRINT FILE C:HOSP.PRN
NO. OF VARIABLES 64
DATA FOR RECORD: 1
.10E+01 .60E+01 .10E+01 .10E+01 .10E+01 .50E+01 .20E+01 .20E+01 .20E+01 .20E+01 .20E+01 .20E+01 .20E+01 .20E+01 .20E+01 .10E+01 .20E+01 .10E+01 .10E+01 .10E+01 .10E+01 .10E+01 .10E+01 .10E+01 .10E+01 .10E+01 .20E+01 .10E+01 .10E+01 .20E+01 .10E+01 .10E+01 .10E+01 .20E+01 .10E+01 .00E+00 .10E+01 .20E+01 .20E+01 .20E+01 .10E+01 .10E+01 .10E+01 .10E+01 .00E+00 .20E+01 .10E+01 .20E+01 .00E+00 .00E+00 .00E+00 .10E+01 .20E+01 .00E+00 .10E+01 .20E+01 .10E+01 .30E+01 .10E+01 .40E+01 .10E+01 .70E+01 .30E+01 .50E+01
DATA FOR RECORD: 185
.20E+01 .20E+01 .10E+01 .40E+01 .30E+01 .50E+01 .20E+01 .20E+01 .20E+01 .10E+01 .20E+01 .20E+01 .10E+01 .10E+01 .10E+01 .10E+01 .20E+01 .10E+01 .10E+01 .10E+01 .10E+01 .10E+01 .10E+01 .10E+01 .10E+01 .10E+01 .10E+01 .10E+01 .10E+01 .10E+01 .10E+01 .10E+01 .10E+01 .10E+01 .10E+01 .10E+01 .10E+01 .10E+01 .10E+01 .10E+01 .10E+01 .10E+01 .10E+01 .10E+01 .10E+01 .20E+01 .10E+01 .20E+01 .20E+01 .10E+01 .90E+01 .10E+01 .10E+01 .10E+01 .10E+01 .10E+01 .10E+01 .30E+01 .10E+01 .30E+01 .10E+01 .70E+01 .40E+01 .30E+01
DATA MODIFICATION COMPLETE 185 OBSERVATIONS READ.
VAR #          MEAN  STD.DEV.  STD.ERR.  SAMPLE   MAXIMUM   MINIMUM     RANGE
---------  --------  --------  --------  ------  --------  --------  --------
V1            1.413      .536      .040     184      3.00      1.00      2.00
V2            4.286     2.476      .182     185     16.00      1.00     15.00
V3            1.059      .237      .017     185      2.00      1.00      1.00
VARIABLE: V1 PERSON FILLING OUT
___________________________________________________________
     VALUE     COUNT     VALID %     PERCENT   CUMULATIVE%
___________________________________________________________
      .000         1     MISSING        .005       MISSING
     1.000       112        .609        .605          .609
     2.000        68        .370        .368          .978
     3.000         4        .022        .022         1.000
===========================================================
                 185       1.000       1.000         1.000
1. Count = Total number of responses for each value of the variable in the frequency distribution.
2. Valid % = The percentage after adjustment for missing values (to account for non-respondents or undesired codes).
3. Percent = (Count of Each Value)/(Total Count).
4. Cumulative % = The sum of the valid percentages of all values less than or equal to the current value. The cumulative % of the last value must be 100% (printed as 1.000).
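The four columns can be reproduced from the raw counts. The Python sketch below uses the V1 counts from the table above and the missing-value codes (0 and 9) declared for V1 in the command file. It is an illustration of the definitions, not the PC-MDS code, and the function name `frequency_table` is illustrative.

```python
# Counts for V1 from the frequency table; 0 and 9 are declared missing values.
counts = {0: 1, 1: 112, 2: 68, 3: 4}
missing = {0, 9}

def frequency_table(counts, missing):
    """Return rows of (value, count, valid, percent, cumulative) as proportions."""
    total = sum(counts.values())
    valid_total = sum(c for v, c in counts.items() if v not in missing)
    rows, running = [], 0
    for value in sorted(counts):
        count = counts[value]
        percent = count / total                  # share of all responses
        if value in missing:
            rows.append((value, count, None, percent, None))
        else:
            running += count                     # cumulative count of valid responses
            rows.append((value, count, count / valid_total, percent, running / valid_total))
    return rows

rows = frequency_table(counts, missing)
for value, count, valid, percent, cumulative in rows:
    valid_text = "MISSING" if valid is None else f"{valid:.3f}"
    cum_text = "MISSING" if cumulative is None else f"{cumulative:.3f}"
    print(f"{value:>6.3f} {count:>6} {valid_text:>8} {percent:>8.3f} {cum_text:>8}")
```

The valid percentages are computed over the 184 non-missing responses, while the plain percentages are computed over all 185 responses, which is why the two columns differ slightly for V1.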
VARIABLE: V2 TYPE OF SURGERY
___________________________________________________________
     VALUE     COUNT     VALID %     PERCENT   CUMULATIVE%
___________________________________________________________
     1.000        42        .227        .227          .227
     2.000         6        .032        .032          .259
     3.000        28        .151        .151          .411
     4.000        12        .065        .065          .476
     5.000        35        .189        .189          .665
     6.000        31        .168        .168          .832
     7.000        10        .054        .054          .886
     8.000        20        .108        .108          .995
    16.000         1        .005        .005         1.000
===========================================================
                 185       1.000       1.000         1.000
VARIABLE: V3 OUTPATIENT INSURANCE
___________________________________________________________
     VALUE     COUNT     VALID %     PERCENT   CUMULATIVE%
___________________________________________________________
     1.000       174        .941        .941          .941
     2.000        11        .059        .059         1.000
===========================================================
                 185       1.000       1.000         1.000