Section VI: Appendix A (Administrative Databases)

Part A: Use of Official Statistics in Administrative Records

[Source: Social Research Associates, "Wyoming Program Performance Measurement Through Unemployment Insurance Wage Record Follow-up," section II B pp. 13-18. Copyright 1999 by the Wyoming Department of Employment, Research & Planning.]

Governmental organizations have long collected and used administrative data. The invention of writing nearly 6000 years ago in Mesopotamia was initially put to service mainly in keeping public records of taxes and agricultural production (Lenski and Lenski 1974). Similarly, the field of statistics ("state-istics," or "political arithmetic") arose in the Renaissance in connection with numerical records kept by the governments of emerging national states -- births, deaths, marriages, and of course taxes (see Douglas 1967).

With the rise of social science as a discipline in the nineteenth century, records compiled by governments were put to use in testing abstract theoretical ideas far removed from their original purposes, ranging from suicides and social solidarity to wages and social exploitation. To the present day, academic researchers continue to use administrative data from the public sector (e.g., Grandjean 1981). Increasingly private firms do so as well (e.g., Ishida, Spilerman, and Su 1997). The compilation of administrative data has spread from public bureaucracies to become standard practice in organizations of all sorts throughout the world, from businesses to charities. Computerization has accelerated that trend.

Today organizations use administrative data to monitor the performance of individual employees and of the organization as a whole. Governments also use administrative records to monitor compliance with statutory requirements. The development of interagency data sharing systems by state governments to monitor the client outcomes of workforce development agencies is yet another step in the widening circle of the application of administrative records for purposes beyond those envisioned when record-keeping protocols were established. As the Florida Department of Education (FETPIP 1997, p. 1) noted:

The collection of data by electronically linking administrative data bases as a means of supporting statistical analyses is a relatively new phenomenon. Its use for vocational education or JTPA [Job Training Partnership Act] follow-up is but one of several applications that have been and are being examined using the technique. It has been used in health and vital statistics by the Center for Disease Control, U.S. Census to Internal Revenue Service master files, enhancements from the U.S. Survey of Income and Program Participation and economic data, and a myriad of others.

In the social sciences, Emile Durkheim ([1897] 1951) pioneered the use of official statistics to test abstract theory. The French sociologist used official cause-of-death figures from throughout Europe to support his argument that suicide rates were a function of social solidarity. Durkheim's work is credited with establishing sociology as an empirical academic discipline (see Douglas 1967). Durkheim's ingenious analyses of data shed light on questions that went far beyond the European governments' reasons for collecting the data. This work continues to serve as a methodological model and a theoretical base for research in such areas as education, crime, and highway safety (see Grandjean 1974; Kim, Grandjean, and Milner 1993; O'Leary 1984).

Despite this influence, Durkheim's use of official statistics to assess social solidarity is not above criticism (see Kim 1986). The first thorough critique was provided by Douglas (1967), and many of his comments still apply to current uses of administrative data in general. Administrative records are an attractive source of information because of their availability and low cost. Douglas (1967, p. 166) observed, "it is always easier to use the great quantities of published official statistics on any subject than to go out and collect even a small part of the statistics for oneself."

Concerns about the accuracy of data are typically dismissed on the assumptions that (1) errors are few and small, and (2) errors are random, so there will be no systematic bias in analyses. Douglas (1967, pp. 167-231) cited reasons these assumptions may be unfounded. Most fundamentally, the definitions applied in collecting the data may be quite different from those of interest in the analyses, and may change over time or from one social context to another. Even if the definitions are reasonably similar and constant, the observer responsible for applying the definition to a particular event may not have enough of the relevant information to do so accurately, or may be influenced by irrelevant or false information from others. This is especially so when the event being coded is socially sensitive (e.g., a loved one's possible suicide, or one's own unemployment). Both the concealment of information and the provision of misleading information are by definition intentional social actions, and hence such errors are not likely to be randomly distributed (cf. Kitsuse and Cicourel 1963).

The advantages and disadvantages of using administrative data for purposes beyond the original intent are still much debated. Coleman et al. (1998) found hospital records were about equal in accuracy to patient surveys, but had the virtue of more complete coverage, lower cost, and no bias due to non-response. However, Iezzoni (1997) concluded that medical administrative data were best used only to flag cases for further study using other methods (cf. Dyson, Power, and Wozniak 1997). Hauser (1975) reviewed and applauded the wide range of applications to which administrative data were put, from education, the labor force, and welfare programs to housing, recreation, and transportation. Wheeler (1969) compiled a volume covering much the same range, but with an emphasis on issues of misuse and loss of privacy. Levitas and Guy (1996) took a similar tack in addressing government statistics in the United Kingdom. Papers presented at a symposium sponsored by Statistics Canada (1988) gave detailed attention to the problems and prospects for record-linkage and sharing of administrative data in various contexts, while Stevens and McGowan (1985) provided an overview of the management of administrative information systems in public organizations.

An excellent review of issues in the sharing of administrative data among public agencies was provided by Dawes (1996). Dawes listed several benefits of information-sharing: avoiding the wasteful and costly duplication when different agencies collect the same information; promoting standardization in the definitions of data elements; providing more complete and higher quality information that agencies can apply to their own internal questions; expanding professional networks and cross-agency contacts; placing the programs of separate agencies in a broader context for policy decisions; increasing public accountability of the programs; and integrating program planning, service delivery, and program evaluation. Dawes (1996) also noted that data-sharing may be particularly attractive to agencies if the alternative is "service integration" accomplished through extensive re-structuring, including the possible elimination or merging of departments or entire agencies.

Despite these advantages, there are important barriers to information-sharing. Some barriers are technical, such as incompatible hardware, software, or data structures across agencies (Dawes 1996). But most are organizational, such as agency defensiveness over "turf" or organizational self-interest; the internal control of data handling by agency professionals steeped in the existing organizational culture; agency relationships with the immediate constituency, who may be equally committed to the existing ways of doing things; and traditions of agency autonomy, reinforced by the primacy of existing named programs in the budget process (Dawes 1996).

Levesque and Alt (1994) focused specifically on the use of Unemployment Insurance (UI) wage records for evaluating outcomes of workforce development programs. Their review summarizes the advantages of UI data over survey methods of tracking outcomes as follows: coverage rates of 60-90 percent, compared to response rates of 25-50 percent in many surveys; less bias, whether from non-response, self-selection, or lack of objectivity when the agency conducts its own survey; freedom from errors due to respondents' memory failures or distortions; substantially lower costs, with savings estimated at 80 percent compared to the cost of survey follow-up; and a reduced data burden, both on the agencies (which can rely on the facilities and expertise of a centralized unit for data collection and analysis) and on clients (who no longer need to be surveyed for additional information).

However, in workforce development as in the medical context mentioned earlier, there is still debate about the accuracy of administrative data. According to Levesque and Alt (1994, p. 8), the Bureau of Labor Statistics has reported that 5 to 10 percent of UI wage records contain incorrect Social Security Numbers, which are the key to linking these records with other administrative data bases. Some estimates of other errors in the wage records are as high as 30 percent (see Levesque and Alt 1994). As Douglas (1967) observed with regard to suicide data, such errors are likely to be non-random.

There are many steps in the process of recording administrative data where errors may appear (FETPIP 1997, p. 2):

The accuracy of wage report data requires that employers accurately record and report employee identification and payroll information. It also requires that the employers' data are entered accurately when received by the unemployment insurance agency. The assignment of Standard Industrial Classification Codes to employers must be accurate as well. Similarly, the accuracy of student or participant data to be used in a record linkage program requires that ... student level information such as demographic attributes, socio-economic characteristics, program distinctions, etc. must be faithfully represented.

UI wage records have other limitations as well. Only about 90 percent of workers are covered by state UI systems, since some kinds of employers or employment are excluded by federal and state regulations (Levesque and Alt 1994, p. 11). Wage records typically do not include hours worked, sequence of jobs held, or other occupational information (Levesque and Alt 1994, p. 12), so some states supplement UI data with surveys of employers (FETPIP 1997, p. 32). And the legal definition of unemployment, established for purposes of determining eligibility for benefits, may understate joblessness among recent graduates whose employment history is too short to meet eligibility requirements (Levesque and Alt 1994, p. 13; cf. Douglas 1967).

Other concerns expressed by users of administrative labor-market information (not just UI records) in a survey of state and local agencies included a lack of timeliness and local specificity in the available data (Duggan and Kane 1990, p. 1). Notably, however, the main concern was not the accuracy of the data but rather "the lack of analysis of data and cumbersome presentation" of the analyses that were conducted. In this they echoed a social-science treatment of the issues by Wilensky (1967). Focusing on the organizational causes and consequences of "intelligence failures," Wilensky concluded that often the problems stem not from inadequacies in the data collected, but rather from incomplete analysis and faulty interpretation of the data.

Typically the reasons for intelligence failure can be traced to organizational dynamics. For example, the filtering of communications through multiple layers of organizational hierarchy, or between separate departments in the structure, increases the likelihood of those communications being distorted. Lower levels of the hierarchy may have reasons to conceal information from higher levels, or vice versa, and departments may manipulate information to protect their departmental "turf." Among other structural remedies to such problems, Wilensky (1967) advocated the use of interdepartmental working groups, to move the communications out of routine channels and into a face-to-face arena where differences of opinion and conflicting interests can be addressed directly. Implicitly, this argument also suggested that interagency data sharing could generate improved analyses and interpretation of workforce development outcomes -- by increasing the quality and quantity of data available for analysis, but especially by fostering communication between agencies.

In The Dynamics of Bureaucracy, Peter Blau (1955) provided one of the first detailed ethnographic accounts of the impact of administrative record-keeping on behavior. The organization he described was a state employment agency. This coincidence alone would merit at least brief mention in a review of literature on administrative records in workforce development, but the substance of Blau's observations warrants a somewhat fuller treatment.

Blau (1955, p. 33) listed some of the intended functions of administrative records as follows:

The preparation of periodic statistical reports constitutes a method for evaluating operations well suited to the administration of large organizations. Dehumanized lists of cold figures correspond to the abstract, impersonal criteria which govern bureaucratic activities. Statistical records provide precise and comparable information on operations quickly and in a concise form that is quickly communicated. ... Statistical records are also more economical, since they can be prepared by clerks.

Blau identified unintended functions and dysfunctions (Blau 1955, pp. 35-43) in reporting. For example, the recording system initially counted each employment agent's interviews, but not successful job placements. Understandably, agents rushed through as many interviews as possible, and placed only a small proportion of their interviewees in jobs. When the number of job referrals was added to the reporting system, referrals went up, but agents had no incentive for care in matching referrals to job openings, so successful placements did not increase much. When the proportion of placements was added to the reporting system, both the number and the rate of job placements went up (Blau 1955, pp. 35-38).

Conversely, the system did not record the number of "counseling" interviews, and so agents performed few of these. Though arguably an important part of the agency's services, they were quite time-consuming and they kept the agents from doing the placement interviews that were being recorded. Agents could also manipulate their counts, such as by "referring" a client to the very same job from which s/he had been temporarily laid off and thus scoring a "placement" when the worker was recalled to work (Blau 1955, pp. 38-43).

These results illustrated a fundamental principle of administrative record-keeping: systems established for the purpose of counting that which is regarded as important end up defining as important whatever is being counted.

In the employment agency, the recording system had another effect: competition. As Blau (1955, p. 49) pointed out, the competition was particularly dysfunctional because the agents were "dependent on common and limited resources," namely the supply of job openings. As a result, social cohesion and cooperation suffered, both between agents and between departments, and the organizational mission was impaired. An agent who learned of an opening would conceal that information, in hopes of being able to claim a suitable referral among the agent's own interviewees. Another agent with a well-suited interviewee might never know of the opening, and the result might be no placement for anyone. Concealing openings thus improved an agent's individual statistics, but impeded the overall organizational goal of maximizing total placements. Blau (1955, p. 53) documented this seeming paradox by showing that the unit of the agency with the least competitive agents had the highest average placement scores, even though, within each unit, the most competitive agent had the highest placements for that unit.

Blau's (1955) classic study called attention to the unintended or "latent" consequences of organizational systems established for the collection and analysis of administrative records. Such systems hold the potential for distorting the core activities of the organization.


Blau, Peter M. 1955. The Dynamics of Bureaucracy: A Study of Interpersonal Relations in Two Government Agencies. Chicago: University of Chicago Press.

Coleman, E.A., E.H. Wagner, L.C. Grothaus, J. Hecht, J. Savarino, and D.M. Buchner. 1998. "Predicting Hospitalization and Functional Decline in Older Health Plan Enrollees: Are Administrative Data as Accurate as Self-Report?" Journal of the American Geriatrics Society 46:419-425.

Dawes, Sharon S. 1996. "Interagency Information Sharing: Expected Benefits, Manageable Risks." Journal of Policy Analysis and Management 15:377-394.

Douglas, Jack D. 1967. The Social Meanings of Suicide. Princeton, New Jersey: Princeton University Press.

Duggan, Paula, and Matt Kane. 1990. Final Report: Assessing the Adequacy of Labor Market Information at the State and Local Level. Contract Number 99-9-3436-75-050-01, U.S. Department of Labor, Employment and Training Administration. Washington, D.C.: Northeast-Midwest Institute.

Durkheim, Emile. [1897] 1951. Suicide: A Study in Sociology. New York: Free Press.

Dyson, G.P., K.G. Power, and E. Wozniak. 1997. "Problems with Using Official Records from Young Offender Institutions as Indexes of Bullying." International Journal of Offender Therapy and Comparative Criminology 41:121-138.

FETPIP [Florida Education and Training Placement Information Program]. 1997. Initial Steps - The Ground Work. Tallahassee: Florida Department of Education.

Grandjean, Burke D. 1974. "The Division of Labor, Technology, and Education: Cross-National Evidence." Social Science Quarterly 55:543-552.

Grandjean, Burke D. 1981. "History and Career in a Bureaucratic Labor Market." American Journal of Sociology 86:1057-92.

Hauser, Philip M. 1975. Social Statistics In Use. New York: Russell Sage Foundation.

Iezzoni, L.I. 1997. "Assessing Quality Using Administrative Data." Annals of Internal Medicine 127:666-674.

Ishida, Hiroshi, Seymour Spilerman, and Kuo-Hsien Su. 1997. "Educational Credentials and Promotion Chances in Japanese and American Organizations." American Sociological Review 62:866-882.

Kim, Sung-Soon Clara. 1986. Dimensions of Social Integration: Solidarity and Deviance in American Cities. Unpublished Ph.D. dissertation, University of Virginia.

Kim, Sung-Soon Clara, Burke D. Grandjean, and Murray Milner, Jr. 1993. "Solidarity and Deviance: Durkheimian Sources of Social Integration in American Cities." Paper presented at the annual meeting of the American Sociological Association.

Kitsuse, John I., and Aaron V. Cicourel. 1963. "A Note on the Uses of Official Statistics." Social Problems 11:131-139.

Levesque, Karen A., and Martha Naomi Alt. 1994. A Comprehensive Guide to Using Unemployment Insurance Data for Program Follow-Up. Berkeley, California: Institute for the Study of Family, Work, and Community.

Levitas, Ruth, and Will Guy, editors. 1996. Interpreting Official Statistics. London: Routledge.

O'Leary, Thomas J. 1984. Alone at the Wheel: A Study of Social Solidarity and Automobile Accidents. Unpublished Ph.D. dissertation, University of Virginia.

Statistics Canada. 1988. Statistical Uses of Administrative Data: Proceedings. Ottawa: Statistics Canada.

Stevens, John M., and Robert P. McGowan. 1985. Information Systems and Public Management. New York: Praeger.

Wheeler, Stanton, editor. 1969. On Record: Files and Dossiers in American Life. New York: Russell Sage Foundation.

Part B: Data Validation through Enumeration Verification System (EVS)

The Social Security Administration's EVS provides a way for an employer or agency to verify the information they obtain from their employees. The Department of Employment's Unemployment Insurance (UI) program used EVS to help in identifying fraudulent claims. This also allowed us to check the accuracy of the Wage Records database. A file containing SSN's from UI Wage Record files and demographic information from the driver's license master file was sent to the Social Security Administration. They returned a file with verification codes and a few other fields added. This file was then matched to quarterly Wage Record files (fourth quarter of 1996 through first quarter of 1998). The following code list and table describe the verification codes and how often they occurred in each quarter.

Code  Definition
1     SSN not in file (never issued to anyone)
2     Name and DOB match, sex code does not
3     Name and sex code match, DOB does not
4     Name matches, DOB and sex code do not
5     Name does not match, DOB and sex code not checked
*     Input SSN did not verify; Social Security located and verified different SSN
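The per-quarter tally of verification codes described above could be produced along the following lines. This is only a minimal sketch in Python, not the procedure Research & Planning actually used: the function names, the record layouts (a mapping from SSN to code, and a list of quarter/SSN pairs), and the sample values are all hypothetical. Each quarter is counted on its own, so an SSN that appears in several quarters is counted once in each.

```python
from collections import Counter, defaultdict

# Verification codes and descriptions, taken from the code list above.
EVS_CODES = {
    "1": "SSN not in file (never issued to anyone)",
    "2": "Name and DOB match, sex code does not",
    "3": "Name and sex code match, DOB does not",
    "4": "Name matches, DOB and sex code do not",
    "5": "Name does not match, DOB and sex code not checked",
    "*": "Input SSN did not verify; Social Security located and verified different SSN",
}


def tally_codes_by_quarter(evs_results, wage_records):
    """Count wage records and flagged codes for each quarter.

    evs_results  -- dict mapping SSN -> verification code (hypothetical layout)
    wage_records -- iterable of (quarter, ssn) pairs (hypothetical layout)
    Returns {quarter: {"total": int, "codes": Counter}}.
    """
    summary = defaultdict(lambda: {"total": 0, "codes": Counter()})
    for quarter, ssn in wage_records:
        summary[quarter]["total"] += 1
        code = evs_results.get(ssn)  # None when the SSN carried no code
        if code in EVS_CODES:
            summary[quarter]["codes"][code] += 1
    return dict(summary)


def code_rate(summary, quarter, code):
    """Percent of a quarter's wage records that carry the given code."""
    entry = summary[quarter]
    if entry["total"] == 0:
        return 0.0
    return 100.0 * entry["codes"][code] / entry["total"]


# Made-up records for illustration only; these are not real SSNs or results.
sample_evs = {"000000001": "1", "000000002": "*"}
sample_wages = [
    ("1997Q4", "000000001"),
    ("1997Q4", "000000003"),
    ("1998Q1", "000000002"),
    ("1998Q1", "000000003"),
]
sample_summary = tally_codes_by_quarter(sample_evs, sample_wages)
```

Calling `code_rate(sample_summary, "1997Q4", "1")` then gives the share of that quarter's records flagged with code 1, which is the kind of figure the memo below reports as an average across quarters.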

The memo below was written by Research & Planning's Mike Evans to communicate the findings to those in Wyoming's Department of Employment involved with this issue.

Memorandum - Wyoming Employment Resources Division

July 21, 1998

To: Greg Olson, Ellen Schreiner, and Wendy Tyson

From: Mike Evans

Subject: Enumeration Verification of Social Security Numbers (SSN's)

We received the file back from the Social Security Administration (SSA) and I had Norman match it with wage records. The match shows that the attached SSN's were not on file (code #1) with the SSA (See SSNVER matches with Wage Records attachment).

SSA verified some SSN's. Of these SSN's marked by asterisks (code *) some did not match the names we had supplied from the driver's license file. The attachments show matches from the SSA file and wage records for each quarter from the first quarter of 1998 to the fourth quarter of 1996. The quarters were not cross matched so some SSN's could show up in different quarters.

Code 1 shows that SSA never issued the SSN and that it is not on file. The probability that these individuals are using SSN's improperly is higher than the probability of a typo and/or the driver's license file having an improper name. The reverse is true of code *: the probability of a typo and/or driver's license file problem is higher than that of fraud. I have attached the file structure we submitted to SSA, along with the codes returned to us (See EVS Requests on Diskette attachment). Codes 2, 3, 4, and 5 are problems with the driver's license file specifically (See previous memo dated April 8, 1996) and not the wage record file.

On average, only 0.01 percent of the wage records have a code 1 error, which indicates a definite problem with the SSN. The * code problem occurred 0.04 percent of the time on average. This indicates that the SSN's in the wage records files are reliable.

These findings correspond with previous research in which we showed wage records to have only a slight SSN problem. That research involved looking at the first three digits of the SSN in wage records to verify that they existed; only a small percentage (0.001 %) of wage records were affected.

I hope this information is useful to you. If you would like a copy of the entire file from SSA, please let me know.


cc: Beth Nelson

Tom Gallagher
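The earlier check mentioned in the memo, which looked at the first three digits of each SSN, might be sketched as follows. The memo does not say which list of valid three-digit prefixes was used, so this Python sketch makes a simplifying assumption: it flags only the prefixes that the Social Security Administration has never issued (000, 666, and 900 through 999), and it also flags malformed entries. The function names and sample values are hypothetical, and the original check may well have used a fuller list of prefixes assigned at the time.

```python
def area_number_never_issued(ssn):
    """Return True if the SSN is malformed or its first three digits
    fall in a range SSA has never issued (000, 666, 900-999).

    Assumption: this is a simplified stand-in for the memo's check.
    """
    digits = ssn.replace("-", "")
    if len(digits) != 9 or not digits.isdigit():
        return True  # treat malformed entries as problems too
    area = int(digits[:3])
    return area == 0 or area == 666 or area >= 900


def percent_flagged(ssns):
    """Percent of the given SSNs flagged by area_number_never_issued."""
    ssns = list(ssns)
    if not ssns:
        return 0.0
    flagged = sum(area_number_never_issued(s) for s in ssns)
    return 100.0 * flagged / len(ssns)


# Made-up values for illustration only; these are not real SSNs.
sample_ssns = ["520-12-3456", "000-12-3456", "520123457", "520123458"]
sample_percent = percent_flagged(sample_ssns)
```

A check of this kind catches only prefixes that could not have been issued; unlike EVS, it cannot tell whether a plausible-looking number belongs to the person named.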
