Overview of chapters
*1 Change or continuity: the information society or informatization?
*Introduction
*The Information Society
*Informatization
*The social shaping of technology
*Users, communities and identities
*Social influences on the innovation process
*The development and reception of hypertext
*Introduction
*History of hypertext and the World Wide Web
*Hype and hypertext: the reception of hypertext by the humanities
*2 The use of CAPI in social survey research
*Computer assisted interviewing
*Overview
*Concerns about CAI
*Emerging issues
*Current solutions to documentation problems
*The Family Resources Survey
*3 Documenting the Family Resources Survey
*Design principles
*What is to be documented?
*General information
*Metadata
*Questionnaire
*Project summary
*4 Writing hypertext: the construction of online documentation
*Introduction
*Multidocument hypermedia
*Topic driven hypermedia texts
*Structure of the hierarchical dataset website
*Variable metadata: comparing the spreadsheet and HTML versions
*Problems with the HTML version
*Search facilities
*Summary
*Central text hypermedia
*Conclusion
*References
Research overview
Large-scale social survey research for public policy is increasingly being carried out by interviewers using laptop computers and CAPI (computer-assisted personal interviewing) programmes (de Leeuw and Nicholls, 1996). National benchmark surveys, such as the General Household Panel Survey, the Family Expenditure Survey, the British Crime Survey, and the British Social Attitudes Survey, are all now CAPI based, having been paper and pencil questionnaires when they started. The Family Resources Survey, sponsored by the Department of Social Security, has been a CAPI based survey since its inception in 1993. This survey is a large and complex multi-level survey into the living standards and characteristics of around 24,000 households each year. Survey fieldwork is carried out jointly by the Social Survey Division of the Office for National Statistics (ONS) and the National Centre for Social Research (NCSR, formerly known as Social and Community Planning Research). The survey uses the CAPI program BLAISE, developed by Statistics Netherlands and used by ONS and NCSR for almost all public sector surveys.
Traditionally in survey research, a key document for the survey researcher and the survey user has been the paper questionnaire, which contained clearly set out information on the question wordings, the response categories, the routing, the raw form of the data items, the checks made by interviewers, and so on. The paper survey questionnaire has now been replaced by the electronic questionnaire, written in BLAISE code, which is viewed by the interviewer on the laptop as a series of screens containing the questions to be put, with the routing through the questionnaire not visible, but written into the programme. This routing is to ensure that only the right people are asked the right questions, and can involve the nesting of questions (for example, between the household and the individual level). The paper questionnaire has been replaced by a printout, in relatively raw form, of the questions and, separately, of the routing instructions in BLAISE. The latest version of the FRS 'questionnaire', produced by one of the survey agencies, is over 1000 pages long, and is not regarded as particularly user-friendly by those who have examined it.
The project on which this research is based aims to contribute to providing full electronic documentation of the Family Resources Survey for the Department of Social Security ASD Intranet, and potentially in future for the World Wide Web. It will aim to do so in as user-friendly a form as possible, to enhance use of the data from this complex survey, and to make the processes of data collection more transparent to users. A possible model for the kind of facility to be created is the website of the British Household Panel Study (BHPS). BHPS, however, is still a paper and pencil survey and has not had to face the problems which documentation of CAPI surveys creates.
The research questions to which the project is addressed, using the FRS as a case study of a survey which has never had a paper questionnaire, concern how to develop full electronic documentation for CAPI surveys. What changes have electronics wrought in the survey collection process? While survey agencies are fully familiar with this new environment, there is relatively little experience in the academic community, and few academic surveys as yet are mounted via CAPI. What, then, is the impact on users of the survey data of introducing new technology?
In using hypertext languages (HTML and XML) to produce the online documentation, the research is concerned with issues relating to the impact of new technology on both society and the representation of knowledge. Hypertext, as a communications medium, is a particularly social technology: it is a technical means to produce and disseminate information. The case study therefore also acts as a means to illuminate two connected themes within the academic literature on new technologies: firstly, the extent to which technology acts as a driving force for change in society; secondly, the impact of new communications technologies on the representation of knowledge.
The first area covers a broad range of issues. To what extent is technological innovation a driving force for change within society? Many social theorists currently suggest that what we are seeing is a distinctive break from the past, the emergence of an 'information society', in which the Internet and associated communications media are the most significant forces behind change in society. Conversely, other writers, drawing on a range of literature from the historical reception of new technologies to the social construction of science, suggest that we are seeing not an emergent information society but rather an 'informatization' of society (Webster, 1995). Rather than standing outside of society, it is argued, science and technology are, to a large extent, socially constructed.
Secondly, it is frequently suggested that the shift to the electronic representation of information brought about by the Internet and associated technologies such as hypertext marks a decisive break with the types of knowledge produced by previous modes of communication. This change is customarily compared to the way in which the increased proliferation of the printed word had a profound impact on societies which had previously transmitted information primarily through the spoken word. Hypertext, it is argued, has a similar capacity to affect our culture, language, education, and intellectual life (Landow, 1992).
New technologies are frequently the stimulus for hyperbolic pronouncements, and the debate surrounding them is rapidly cast in terms of their liberating promise, or their potentially deleterious effects. The Internet and, by extension, the World Wide Web and hypertext, have been no exception to this rhetoric. However, this rhetoric, regardless of whether one takes an optimistic or pessimistic stance, is based on an assumption that these new technologies have an essential nature which stands outside of the historical, political, and social situation in which they were created.
This study aims to review some of the ways in which the nature of hypertext has been essentialized: either as democratizing and liberating, or as the cause of information overload and 'dumbing down'. By outlining the different ways in which hypertext applications can be constructed, and the various social influences which impact upon this construction process, I will attempt to offer a way of conceptualizing new communications technologies which avoids imbuing them with an essential nature.
Overview of chapters
Chapter 1 presents an overview of literature related to the social impact of new information and communications technologies (ICTs). The debate between those who see a decisive break with the past and the emergence of an 'information society' and those who see continuity with the past and merely an 'informatization' of society is reviewed and the various theories critiqued. Since the online documentation is being developed using HTML and XML, the history of hypertext and the Web is outlined and the way in which hypertext has been received by the humanities is critically reviewed.
Chapter 2 provides information on the context in which the study is being undertaken. The concept of computer assisted interviewing (CAI) is introduced and explained. In particular, the history and impact of computer assisted personal interviewing (CAPI) in social survey research are reviewed and the problems of producing documentation for CAPI are discussed. The Family Resources Survey (FRS) is described, and the need for the current project is explained.
Chapter 3 outlines the way in which the project has been undertaken. The chapter outlines the design principles which have underpinned the development of the online documentation for the FRS. The current documentation is described in more detail, and the structure and content of the documentation website are outlined.
Chapter 4 provides some preliminary analysis of the study, based on the development of the pilot documentation, and drawing on the academic literature on hypertext.
1 Change or continuity: the information society or informatization?
In his exhaustive review of theories of the information society, Webster (1995) draws attention to what he sees as a 'major divide' in current theories of the impact on society of new information and communication technologies:
'On the one side are subscribers to the notion of an "information society", while on the other are those who insist that we have only had the "informatisation" of established relationships... On the one hand there are those who subscribe to the notion that in recent times we have seen emerge "information societies" which are marked by their differences from hitherto existing societies... On the other hand there are scholars who, while happy to concede that information has taken on a special significance in the modern era, insist that the central feature of the present is its continuities with the past' [pp. 4-5].
This seems, therefore, to be a straightforward, if polarized, debate between those who emphasize that as a result of new ICTs we are seeing a profound change in the organization of society, and those who suggest that these ICTs in fact do not alter and often even reinforce existing social relationships. The question is one of continuity or change. However, as Webster goes on to argue, this is not just a simple divide, but reflects deeper fault lines in social theory:
'[T]his is not a mere academic division, since the different terminology reveals how one is best to understand what is happening in the informational realm' [p. 4].
The idea that there is a technical realm within society which, in important ways, remains unaffected by the realm of values and beliefs - yet is the bedrock on which society is founded - has stayed with us as new information and communication technologies (ICTs) have emerged over the last twenty years. Writers such as Manuel Castells and Daniel Bell, leading proponents of theories of the current emergence of new forms of society transformed by information technology, have 'the same emphasis on the transformational, indeed foundational, characteristics of changes in techniques of production throughout history and, most recently, in the role of information and knowledge' [Webster, 1995: 196]. Webster characterizes such thinking, particularly from Castells, as a form of Althusserian Scientific Marxism which draws a distinction between relations of production (classes) and forces of production (techniques). This creates a fundamental dichotomy between the realm of values and social organization, and that of technique and technology.
This idea has not gone uncontested, from within both Marxist theory and sociological perspectives. Gouldner (1980) suggests there are 'Two Marxisms': Scientific Marxism, as described above, and a tradition of Critical Marxism, present in writers such as Theodor Adorno, Herbert Marcuse, and, in the UK, Raymond Williams and E.P. Thompson. These writers are marked by:
'a characteristic refusal to privilege technique in examination of social change, either by regarding it as the primum mobile of change or by presenting it as something set apart from the social world' [Webster, 1995: 198].
It is argued instead that technology and technique are part of a whole arrangement of relationships under capitalism, which have to be understood in their historical context, and in ways which mean that social values are present in the process of technological development itself. From a social constructionist tradition, various sociologists have argued that neither science nor technology is developed independently of social pressures and values (e.g. Latour and Woolgar, 1979; Mackenzie and Wajcman, 1985).
In this chapter I will use Webster's distinction between theories of the information society and informatization to organize my review of the literature on new technology and society. I shall begin by critiquing the idea of the 'information society' in two main ways. Firstly, drawing on Robins and Webster (1999), I shall argue that such an approach is technologically deterministic: it prioritizes technology as the prime impetus for the perceived change in society and thus implicitly removes technology from the 'realm of values and beliefs'. Associated with this is the fact that the view of historical change offered in these models tends to be simplistically teleological, implying a straightforward trajectory which society has followed since pre-industrial times. Secondly, I shall argue that this approach fails to address the substantive issue of how technologies are constructed and implemented, and instead provokes a broadly sterile debate between believers in utopian and dystopian views of the potential of the 'information society'. The rhetoric of optimism and pessimism which marks these debates carries with it an implicit assumption that technology impacts upon rather than is influenced by the social realm; technology is thus withdrawn from the 'realm of values and beliefs'.
This opens up my discussion to those who have contributed to the literature on the social shaping of technology, who are deeply critical of 'black box' views which divorce technology from societal influences. This body of work can be broadly characterized as a social constructionist approach which emphasizes user experiences and social influences on the innovation process to highlight continuities in our current experience with ICTs. A discussion of the appeal to historical evidence as a rhetorical tool for supporters of informatization leads into my critique of the theoretical arguments used to support the notion that what we see is more of the same, rather than a decisive break with the past. I shall broadly agree with these arguments but, nevertheless, I shall argue that they contain a fallacy in their implicit association of this line of argument with a valorization of the notion of 'modernity'. In invoking this concept, theorists of informatization reinforce the idea of the possibility of an epochal break in society by providing the rhetoric to support the opposing case for continuity.
In so arguing, I wish to suggest that a broadly social constructionist approach to technology, as shown in the social shaping of technology literature, can have validity without falling into the trap of placing technological advances on one side of the continuity/change divide. This leads me into the second section of this introductory chapter, in which I draw to the fore the hypertext technologies which are being used for my current research. After reviewing the history of hypertext, and its confluence with the Internet which has made it a particularly social technology, I shall suggest that these technologies have been the source of the same types of rhetoric which I have discussed in more general terms at the start of this chapter: i.e. a form of technological determinism which supports a simplistic and teleological conceptualization of the trajectory of change in society, and an almost inevitable degeneration into opposing optimistic and pessimistic visions of the impact of technology.
Moreover, and in an interesting twist, I shall argue that hypertext technologies have been seized upon by the humanities as a means to provide a technological justification for emphasizing the social and contextualized nature of knowledge. In so doing, however, this has led to a conceptualization of hypertext by the humanities which, whilst emphasizing the contextualizing capabilities of hypertext, paradoxically, reinforces the notion of technology as something which stands beyond social influence. Drawing on the work of Hunter (1999), I wish to argue the case for a view of technology which situates it within its social and historical context, but does not make appeals to grand narratives of historical epochs. Technologies do not automatically usher in new eras nor do they necessarily reinforce the dynamics of old ones; they are neither intrinsically good nor inevitably bad - they are what we make them.
Theories of the emergence of a new society based on changes in the technical organization of information have taken a number of forms over the past thirty years. In the mid 1970s, Daniel Bell (1976) argued that a new system was emerging: a 'post-industrial society'; this was marked by the decline of industrial and agricultural sectors of the economy and the growth of the service sector. The post-industrial society was distinguished by a heightened presence and significance of information, both quantitatively and qualitatively, i.e. new forms of information, which he termed 'theoretical knowledge', were becoming more important. Bell's thesis contains a number of significant flaws, not the least being, Webster suggests [p. 31], its unsustainability as anything other than an ideal type construct, its teleology, and its endorsement of a convergence theory of development (i.e. that all societies are set on the same developmental journey, from pre-industrial, to industrial, to post-industrial). During the 1980s, this hypothesis found new expression in the work of Piore and Sabel (1984) who suggested we were 'living through a second industrial divide' which was comparable to the one which brought about mass production at the end of the nineteenth century. This involved a shift to 'flexible specialization', a change from the repetitive forms of labour epitomized by Fordism and scientific management towards a form of labour which emphasized the skills of workers and greater variety in goods. Flexible specialization, it was argued, was particularly found in the sorts of small, high-tech firms which had begun to emerge in Silicon Valley from the late 1970s onwards. It is emerging information technologies which are seen as the major facilitator of this flexibility.
The most significant variation on this theme in recent years has been Castells' articulation of the 'informational mode of development' (Castells, 1996); i.e. a new form of economic organization in society predicated on advancements in information technology. New developments and applications of information technology are causing a new form of social organization, the 'rise of the network society'. As he argues:
'A new economy has emerged in the last two decades on a worldwide scale. I call it informational and global to identify its fundamental distinctive features and to emphasize their intertwining. It is informational because the productivity and competitiveness of units or agents in this economy... fundamentally depend upon their capacity to generate, process, and apply efficiently knowledge-based information. It is global because the core activities of production, consumption, and circulation... are organized on a global scale, either directly or through a network of linkages between economic agents... The Information Technology Revolution provides the indispensable, material basis for such a new economy... [T]he evolution of technology has indeed largely determined the productive capacity of society and standards of living, as well as social forms of economic organization. Yet... we are witnessing a point of historical discontinuity' [pp. 66-67].
One would be hard pressed to find a more unequivocal expression of the special transformational capabilities of technology. Robins and Webster [1999: 68-73] provide a robust argument against the discourse of the emergence of an information society. The argument challenges the information society hypothesis on the grounds of its technological determinism, i.e. that it prioritizes technology as a prime impetus for change in society - in its neo-McLuhanite forms, as the primary determining factor. This removes technology from its social context and treats it as an isolated phenomenon. Once desocialized, technology is also seen as neutral, a tool to be used either appropriately or not, depending on the motives of a society. This, they suggest, can have potentially deleterious political effects: 'if technology is socially neutral and leaves political choices to the public, then on what reasonable grounds can it be suspected?' they ask rhetorically [p. 69]. Moreover, they argue, this conception of technology constructs a particular view of history, one in which history is seen as the process of technological advance, and which carries an underlying inevitabilism [p. 69]. This perception, they believe, construes technology as 'a hidden hand in development apart from the social issues of power and control' [p. 70]. The view of historical change offered in these theories is a simplistically teleological one, bound to notions of inevitable technological (and hence societal) progress.
Golding (2000) collects evidence from empirical research to suggest that there are four fallacies associated with the information age thesis. The first fallacy relates to identity. He argues that much writing on identity in the information age, particularly from cyberculture studies (e.g. Jones, 1995 and Turkle, 1997) contains what he calls the 'fallacy of the postmodern subject' [p. 172]; i.e. that stable identities are being eroded. He counters this by pointing to the resilience of expressions of national identities (Billig, 1995). The second fallacy relates to inequality; it is claimed that ICTs will lead to the end of deprivation and need [p. 174]; against this he sets ONS statistics which suggest 'a settling pattern of high users and excluded non-users which will provide a digital underpinning to structures of material inequality that are more likely to become self-replicating than abating' [p. 175]. The third fallacy, he suggests, relates to power; a fallacy of interactivity. It is suggested that new ICTs will create more democratic forms of political organization and stronger communities. Golding argues that instead individualization, unequal access, and disenfranchisement could as easily be the result of net politics [p. 176]. The fourth fallacy concerns change: that there is a fundamental shift in the organization of society related to the compression of time and space: Cairncross' (1998) 'death of distance' - 'probably the single most important force shaping society in the first half of the next century' [p. 1]. Golding draws on studies of travel statistics and the slow take-up of homeworking to argue that this notion is fallacious.
Conceptualizing new technology as the main driver of social change fails to address the substantive issue of how technologies are constructed and implemented. Instead it provokes a sterile debate between believers in utopian and dystopian visions of the potential of the 'information society'. For example, contemporary debate over the impact of the Internet and digital communication technologies is underscored by a great deal of ambivalence about what to make of these innovations. Women are portrayed as either disadvantaged by technologies which reward a 'masculine' sense of mastery (Turkle, 1988), or as appropriating such technologies to their own ends (Spender, 1995). The fortunes made by dot-com millionaires are discussed in the same breath as the bursting of the net bubble. In the contemporary debates over crime and new technology, 'the Internet' seems almost to be synonymous with pornography, or else to provide the key to successful surveillance of criminal activity. Whilst failing to capture the dystopic element which plays such an important role in the construction of this deterministically based discussion of new technology, Robins and Webster (1999) excoriate:
'this wishful marketing discourse, with its magical vision of new technologies as the solution to our social ills - promoting participatory politics, material comfort, improved pedagogy, better communications, restored community, and whatever else you may think of' [p. 5].
An analysis of the way in which the Internet has been conceptualized as impacting on political action provides a good example. Rheingold (1994) suggests that there are two visions of the Net running through contemporary discourse [pp. 14-15]. Enthusiasts see the Net as the new 'agora', i.e. the Athenian marketplace where citizens met to talk and debate. In this vision, the Internet has the potential to revitalize democracy, and enables people to form communities across gender, class, race and national boundaries. Rheingold's own description of his experiences of the WELL online community focuses primarily on the positive aspects of cyberspace: he describes numerous occasions in which community members provide each other with information and emotional support. For example, when the young son of one member became seriously ill, the community rallied round with practical information and advice on his condition from a doctor in the community, and well-wishes and emotional support from other community members. Rheingold also describes to great effect the impact on the WELL community of the death of one of its more eccentric members.
Conversely, Rheingold suggests, pessimists see the emergence of a 'panopticon', a term devised by Bentham and popularized by Foucault to describe an environment in which people act as if they were under surveillance all the time. Samarajiva (1996) suggests that electronic environments by their nature are more open to surveillance than physical environments, since tracking and storage procedures can be built into their design:
'Being relatively more malleable than physical environments, electronic environments are more conducive to dynamic and continuous exercise of control through… technical features… [E]lectronic environments can be designed to enable pervasive and transparent surveillance through the tracking of usage patterns and long-term storage of such information' [p. 133].
Whilst Samarajiva suggests that such surveillance can be evaded by users either unaware of it or actively subverting it, other commentators remain more pessimistic. Shields (1996) describes the potentially deleterious effects of new technologies on our existing communities:
'The neglect of face to face communities has also raised fears about the decline of the public sphere into a virtual world controlled by telecommunications corporations where only the privileged have access and the body is disdained as an embarrassing and imperfect support for minds infatuated with virtual, representational bodies' [p. 1].
It appears, therefore, that we can conceptualize the impact of the Internet on political action in two ways: as a means of breaking down community barriers and revitalizing social interaction, or else as a means for furthering the interest of government and big business at the expense of an already moribund public sphere. However, as Mansell and Silverstone (1996) somewhat dryly point out:
'Simplistic utopian or dystopian visions of the future provide us neither with an understanding of how these changes come about nor with an understanding of the longer-term implications' [p. 3].
The rhetoric of optimism or pessimism which permeates contemporary discussion about new ICTs carries with it an implicit assumption that technology impacts upon rather than is influenced by the social realm; technology is thus withdrawn from the 'realm of values and beliefs'. In the following section I shall review the various models which have been offered to counter such technological determinism.
The social shaping of technology
Various writers, and particularly those associated with the Programme on Information and Communication Technologies (PICT) studying the 'social shaping of technology', have challenged determinism in discussions about technology. As Edge (1995) argues in criticizing deterministic accounts:
'such approaches imply technological determinism; a linear model of the innovation process which treats technology as a "black box", and is preoccupied with the "social impacts" of a largely pre-determined technological trajectory' [p. 1].
Edge and others suggest a different view of technology: a broadly social constructionist approach which asks questions about the origin and evolution of the technology itself, and which focuses on the flexibility of the innovation process and the choices made during it. The emphasis is thus on the socially embedded nature of technological development and the social factors which can shape the innovation process.
Users, communities and identities
Mackay (1995) provides an overview of two main theoretical approaches to the study of the social shaping of technology. The first approach outlined by Mackay draws on media and cultural theorists such as Stuart Hall and David Morley, and is primarily concerned with consumption. This perspective emphasizes how design and development processes may encode preferred forms of deployment which are reinforced through marketing; in this semiological sense, one might propose that a technology is a text. This, in turn, puts a greater emphasis on the role of the decoder of the text; that is, the user. Combined with Hall's notion of the polysemic nature of texts, i.e. that texts always have several possible readings, this brings to the fore the ways in which users may (or may not) appropriate technologies for ends other than those intended by their creators. As Mackay summarizes:
'Technologies facilitate, they do not determine, and they may be used in a variety of ways… The subjective, social appropriation of a technology is thus a crucial force in the social shaping of technology - one which cannot be "read off" from either the physical technology or the social forces behind its development' [p. 45].
Literature devoted to the user traditionally emphasized the pathological nature of computer use. Turkle (1997) points out how the word 'user' is also applied to other pathological attachments (e.g. drug use). Psychological studies have emphasized forms of technophobia (Brosnan, 1998) and addiction (Shotton, 1989; Orleans and Walters, 1996). Drawing on Gilligan's work (Gilligan, 1982) on the different ways in which men and women view their social worlds, writers such as Turkle (1988) have emphasized the ways in which women become 'reticent' towards computer technology 'because the computer becomes a personal and cultural symbol of what a woman is not' [p. 41]. She argues elsewhere (Turkle, 1984) that there are two styles of mastering technology: one is an orderly, rational and systematic approach aimed at achieving precisely defined goals; the other is concerned more with the aesthetics of the final result than a precise blueprint, and skills are learnt through trial and error. The second style, Turkle suggests, is associated more with girls and women, and is traditionally unrewarded in the acquisition of IT skills, where mastery of specific programming tools is more valued. The difference that Turkle suggests is based on Gilligan's argument that men perceive their social world as a 'hierarchy' of autonomous positions whilst women perceive a 'web' of interconnections between people. As Turkle goes on to note, computers may become more attractive to women when perceived as supporting communication through networks (see Spender, 1995).
With the emergence of cyberculture studies in the early 1990s, the literature on users of technology became substantially focused on use of the Internet. Silver (2000) describes how cyberculture studies became focused on the 'twin pillars' of investigating collective communities and online identities. These two areas are respectively epitomized by two classic texts in the field: Howard Rheingold's study The Virtual Community (1994) and Sherry Turkle's Life on the Screen (1997). Silver goes on to suggest that a new field of 'critical cyberculture studies' is beginning to emerge, which seeks to offer 'more complex, more problematized findings' [p. 24] about using the Internet.
Social influences on the innovation process
The second theoretical perspective within the social shaping of technology literature, which Mackay terms a 'neo-Marxist' approach and associates with workplace theorists such as Harry Braverman, argues that technological change cannot be fully understood solely by reference to individual inventions, and that there is a need to examine how broader social and economic forces affect the nature of technological problems and solutions. Edge (1995) suggests various ways in which social factors may act in the 'shaping' process. Social factors may influence selection between available technological possibilities; they may permit only one area of 'possible' technological development to be explored, to the extent that it becomes difficult to talk of 'alternatives'; they may operate by creating a particular environment (e.g. market) or intellectual climate where only certain technological configurations succeed; they may shape technological development by the specific embodiment of social models into the technology.
As Mackay points out, from this point of view it is a small leap to seeing technology as an implicitly political social phenomenon. Such an approach draws on the 'political economy' approach of Herbert Schiller which, whilst acknowledging the increased importance of information technologies in our current era, also stresses their centrality to ongoing developments, and argues that communications technologies are foundational elements of established and familiar capitalist endeavour. For example, a 'political economy' of the Internet would look behind the information presented on websites to look at the structural features: e.g. patterns of ownership, or sources of advertising revenue, arguing that such factors constrain what information is presented. An area of interest might be the ways in which organizations can pay to increase the likelihood that their site will be listed first in a particular search engine, or the ways in which certain groups in society are excluded from access to the Net. Underpinning this perspective is an assumption that even with all the additional information and new, virtuoso technologies, the priorities and pressures of capitalism remain the same (e.g. Slevin, 2000).
In seeking to emphasize continuity with the past, writers in this field frequently offer examples which emphasize the long histories of apparently new technological developments, and also similarities between the reception of earlier technologies and 'new' ICTs. For example, Winston (1998), in what Golding (2000) calls 'a book-length assault on technological determinism' [p. 171], traces the history of the Internet back to the mid 1800s and the 'first wired network' - the telegraph system. He argues:
'In order to provide a context for outlining the development of the Internet we need to go back to the beginning, to the start of electronic communications, to show how central the building of networks has been to their success and how much the current networking of computers conforms to these historical patterns' [p. 243].
Winston goes on to develop a model of what he calls 'the "law" of the suppression of radical potential', i.e. a process whereby 'general social constraints coalesce to limit the potential of [technologies] radically to disrupt pre-existing social formations' [p. 11] or what Golding calls 'the solidity and endurance of social and economic formations in the face of technical novelty' [2000: 171]. In the case of the Internet, Winston emphasizes the incremental nature of its inception rather than it being a radical invention:
'The Internet emerges in the US in the 1970s as a species of spin-off from a (largely still classified) national security project rather than any sort of "discrete" invention' [p. 325].
From the late 1980s, control of the Internet's backbone was handed over from the US government-funded National Science Foundation to private telecommunications giants Sprint, Ameritech and Pacific Bell (a transition completed in 1995). Winston suggests that:
'Those who seriously believed they were in a brave new world of free and democratic communications were simply ignoring the reality of their situation... a straightforwardly classic expression of the suppression of radical potential whereby the new technology is distributed among the established players to minimise the threat to their business' [p. 334].
Winston carefully apostrophizes the word 'law' and insists that 'although the phenomenon under discussion can be found in the histories of all telecommunications technologies it is not so regular as always to manifest itself in the same form with equal force at the same point of development' [p. 12]. Nonetheless, it is difficult to see how he can avoid precisely the charges of inevitability and determinism which he himself is laying at the door of those who speak of radical change.
Historical studies which emphasize the similarities between the reception of new technologies in the past and the reception of ICTs today seek to show how the claims made for earlier technologies often mirror those made for ICTs now. For example, Marvin (1988), in her study of the social impact of the telephone, cites a newspaper report enthusiastically outlining a vision of a society changed by the recent arrival of a new technology:
'[N]othing less than a new organization of society - a state of things in which every individual, however secluded, will have at call every other individual in the community, to the saving of no end of society and business complications, of needless goings to and fro, of disappointments, delays, and a countless host of those great and little evils and annoyances which go so far under present conditions to make life laborious and unsatisfactory.'
This does not refer to the Internet, or to email, but to the telephone, and it appeared in the Scientific American in January 1880. These sorts of claims are instantly recognizable to those tracking the reception of and debates surrounding the impact of new communication technologies 120 years later. Marvin's discussion of the way in which the telephone and the telegraph were received and discussed in contemporary discourse makes for familiar reading. She describes widespread discussion of how fortunes could be made through the use of these technologies; how women's use of new technology was singled out for particular discussion; how the new communications techniques were brought to the fore in debates over fighting crime; and how 'an association between sensational crime and the new electric media was strong in popular and expert literature' [p. 92].
The use of history as a rhetorical device to counter technological determinism is a deliberate tactic on the part of some writers. As Robins and Webster (1999) write:
'Without history, the new technologies become an unstoppable force which, though incomprehensible to [ordinary people], is understood sufficiently for them to realise that they must change their whole lives' [p. 74].
They argue that treating technology ahistorically creates:
'a general sense of acquiescence to innovation. We believe this happens because technology, without discernible origins, is something that ordinary people cannot understand' [p. 74].
In their discussion of the continuities between our current experience of new technology and those of the past, Robins and Webster appropriate Luddism as a 'motif'; they argue Luddism was a response to 'the unfolding logic of the Enclosures movement... [and t]he logic of enclosure was the logic of the new capitalist order' [p. 7]. Their refusal to accept the notion of an information society is thus an explicitly political stance. They argue that new ICTs do not justify the utopian (or, indeed, dystopian) rhetoric which surrounds them and suggest they often work to reproduce 'conservative' social practices:
'[W]hat is unfolding now is the continuation of what was set in motion in the early-nineteenth century: what we now call the global information economy is, we argue, the most recent expression of the capitalist mobilization of society... [T]here is much about the "information revolution" that is just business as usual (if the technologies are new, the social visions that they generate tend to be surprisingly conservative)' [pp. 6-7].
Golding (2000) also associates the impact of new technologies with the continuation of modernity:
'In assessing the impact, both recent and immanent, of these technologies, we find, above all, the abiding fault lines of modernity' [p. 179].
These fault lines he sees as being associated with three broad trends: convergence, i.e. the merging of ICTs as big entertainment corporations from different spheres (TV, video, film, internet etc.) engage in takeovers and mergers; the deregulation of state intervention into communications industries; and differentiation, i.e. a translation of income inequalities into ICT stratification - the process of social exclusion of the poorer segments of society from new ICTs [p. 179]. He goes on to invoke a specific debate within social theory, between those who are suggesting society has moved into a postmodern era and those who, whilst acknowledging flaws in modernity, still perceive value in its 'project':
'In part this [his approach] is an insistence on the endurance of modernity and the intellectual and political baggage that comes with it. In part it is a plea for a stay of execution of the core tools and methods of the sociological imagination, and a reminder that the basis of prediction lies in examining social dynamics rather than technological innovation' [p. 166].
However, advocating a continuation of the 'project of modernity' is an entirely different issue from taking a broadly social constructionist view of the relationship between new technology and society. One does not necessarily entail the other, as Golding implies. Moreover, in invoking the concept of modernity, theorists of informatization reinforce the idea of the possibility of a decisive break in society by providing the rhetoric to support the opposing case for continuity. Robins and Webster's complaint of the tendency in information society theories to herald the arrival of a new epoch [1999: 75] rings false: the discourse of modernity invokes the discourse of postmodernity; both are grand narratives of historical epochs which carry with them implicit notions of inevitabilism, progress and historicism - precisely those deterministic notions which these theorists attempt to criticize.
A social constructionist approach to the technology can have validity without falling into the trap of placing technological advances on one side of the continuity/change divide. In the following chapter, I shall draw on the work of Hunter (1999) to argue the case for a conceptualization of technology which situates it within its social and historical context, but does not make appeals to grand narratives of historical epochs. Golding (2000) gives a short-hand taxonomy of technologies which gives away the residual deterministic tendencies of his argument:
'We can conceive of two forms of technological innovation. Technology One allows existing social action and process to occur more speedily, more efficiently, or conveniently (though equally possibly, with negative consequences, such as pollution or risk). Technology Two enables wholly new forms of activity previously impracticable or even inconceivable' [p. 171].
Developments in biotechnology, he suggests, might constitute a 'Technology Two'; they may 'presage real change in what human action and activity might obtain and pursue' [p. 172]. But, he argues, '[i]n essence, many new ICTs are more obviously Technology One than Technology Two' [p. 171].
In the following section, I draw to the fore the hypertext technologies which are being used for the current research. After reviewing the history of hypertext and its confluence with the Internet, I shall illustrate how hypertext has been the source of the same types of rhetoric which I have discussed in this chapter: i.e. a form of technological determinism which supports a simplistic and teleological conceptualization of the trajectory of change in society, and an almost inevitable degeneration into opposing optimistic and pessimistic visions of technology. Despite its usefulness, I believe that Golding's distinction between types of technology still emphasizes distinctive features of the technology itself. I intend to offer a way of conceptualizing hypertext which moves away from imbuing the technology with a fixed nature and which, instead, highlights the different ways in which the technology is applied.
The development and reception of hypertext
Hypertext technologies (such as HTML and XML), when used across the Internet or other computer networks, are technical means of communication: as such they are significant in discussions related to the social impact of new technology and, also, the epistemological impact of different modes of representation on information. This section will focus on the emergence and reception of hypertext. An overview of the historical development of hypertext will be given, and an account given of the reception of hypertext applications by the humanities. It will be suggested that hypertext offers a good example of the rhetoric which surrounds the reception of both new technologies and new forms of communication, and which reinforces the idea that hypertext technology has a fixed nature which encourages the organization of knowledge in a certain way (flexible, non-hierarchical, relative). This idea will be criticized, and an alternative way of conceptualizing hypertext outlined.
History of hypertext and the World Wide Web
The term hypertext describes a method of presenting information in which text, images, sounds, and actions become linked together in a complex, nonsequential web of associations that permit the user to browse through related topics, regardless of the presented order of the topics. These links are often established both by the author of a hypertext document and by the user, depending on the intent of the hypertext document.
The term hypertext itself was coined in 1965 by Ted Nelson to describe documents, as presented by a computer, that express the non-linear structure of ideas, as opposed to the linear format of books, films, and speech. However, the concept of a mechanical web of information linked by association rather than selection predates Nelson by two decades. In July 1945, Vannevar Bush, a professor at MIT who had been associated with the development of the computer, outlined his vision of a machine that would allow access to the sum of human knowledge. This microfilm/audio recording device, which he called a memex, would allow '[s]election by association rather than by indexing' which, he argued, more closely mirrored the workings of the human mind:
'When data of any sort are placed in storage, they are filed alphabetically or numerically, and information is found (when it is) by tracing it down from subclass to subclass. It can only be in one place... The human mind does not work that way. It operates by association. With one item in its grasp, it snaps instantly to the next that is suggested by the association of thought, in accordance with some intricate web of trails carried by the cells of the brain' [Bush, 1945: 105].
The memex would thus provide a mechanical means of mirroring the associative selection and indexing patterns of the human mind:
'[The memex] affords an immediate step, however, to associative indexing, the basic idea of which is a provision whereby any item may be caused at will to select immediately and automatically another. This is the essential feature of the memex. The process of tying two items together is the important thing... When the user is building a trail... [i]t is exactly as though the physical items had been gathered together to form a new book. It is more than this, for any item can be joined to numerous trails' [1945: 106].
Although hypertext-based software applications were developed and became increasingly popular during the 1970s and 1980s, it was not until the early 1990s that something approaching Bush's vision emerged. This depended on two distinct but related developments: firstly, the expansion of the Internet and, secondly, the creation of an easy-to-use graphical user interface (GUI).
The Internet started in the late 1960s as a US military experiment into the construction of robust computer networks that could withstand nuclear attack. This network, called the ARPANET, was a great technical success, but limited in scope. In 1983, the military felt the technology was stable and mature enough to be useful, and separated the ARPANET into two halves: MILNET, an operational network used by the military, and a remaining half left for research by universities. In 1984, since connection to the ARPANET was dependent on being part of a military-funded research project, the US National Science Foundation started building a successor to the ARPANET, called NSFNET. During the 1980s, the ARPANET's role as a backbone linking other networks was gradually taken over by NSFNET, and the ARPANET was decommissioned in 1990 (Slevin, 2000).
NSFNET's initial exclusion of commercial traffic encouraged the growth of competitive private backbone networks, and it was not until the National Science Foundation's decision in the late 1980s to privatize key parts of its network operation that the Internet was opened to these commercial networks. In 1995, NSFNET was shut down completely, and most Internet traffic is now carried by commercial networks (Slevin, 2000). However, use of the Internet in the late 1980s remained limited: the software to access and navigate one's way around the information available was arcane, the information itself disorganized (Musciano and Kennedy, 1998).
The real explosion of the Internet came in the early 1990s when a user-friendly means of accessing and navigating one's way around the network finally became available (Musciano and Kennedy, 1998). Physicists at CERN (Europe's advanced atomic particle accelerator), notably Tim Berners-Lee, released an authoring language, called Hypertext Markup Language (HTML), and a distribution system which they had developed for creating and sharing multimedia-enabled, integrated electronic documents over the Internet. HTML presented to the user unified text, pictures and sound that previously had appeared as fragmented items. Significantly, the Web allowed hypertext linking, so that documents located anywhere in the world could be connected together (Cailliau, 2000). Vannevar Bush's dream of a mechanical web which linked information by association seemed to have become a reality (Winston, 1998).
Hype and hypertext: the reception of hypertext by the humanities
The fascination and enthusiasm with which hypertext has been greeted by the humanities lies in the apparent solutions it offers to the analytical constraints of formal computer applications, by virtue of being a communications - and hence social - medium. Where earlier software had dealt in organization, hierarchy, and categorization, hypertext appears to offer users the chance to organize texts in a less formalized, more associative fashion, contextualize information, and emphasize the relationships between bodies of knowledge. It is no surprise, therefore, that hypertext applications have been enthusiastically received by qualitative researchers and literary theorists, for whom textual analysis is at the heart of the research process. Hypertext, it is suggested, at last offers a technical means of scrutinizing and ordering information in a way which emphasizes the situated nature of knowledge.
The discussion of the potential of hypertext applications for qualitative social research has taken place within the context of a broader debate about the use of computer applications in qualitative research and, as such, follows a recognizable pattern of debate in the face of new technology: an optimistic tone which seeks to embrace the potential of the new tool, countered by a sceptical fear characterized by Hunter (1999) as accompanying the inception of all technologies that 'wisdom will be superseded by a patina of information' [1999: 94]. Fielding and Lee (1998) are enthusiastic about the potential of computer software in assisting the research process, and sum up the main advantages in terms of better managing data in multi-stranded, multi-sourced forms. They are, however, wary that 'technologies can interpose themselves in ways which serve to distance the researcher from the data' [1998: 1]. Fischer [1994: 199] notes that computers can encourage the use of research procedures because they are easy rather than because they are appropriate.
Weaver and Atkinson [1994: 10-11] combine a pessimism about the general use of analytical software with a proselytizing enthusiasm for hypertext. Speaking generally about software for qualitative analysis, they suggest that such applications do not reflect 'messy' reality, and argue that:
'[T]he uncritical adoption and implementation of microcomputer software - or indeed the wholesale endorsement of the general approach - may commit the researcher to an implicit and uncritical adoption of particular analytic strategies' [1994: 1].
Moreover, they go on to suggest that the introduction of computers into qualitative research may be tied up with issues related to the legitimization of data: it is bound up with attempts to 'clean up' the tradition, making it more systematic, standardized, and generally making its knowledge claims more acceptable to the scientific community [1994: 16].
Nevertheless, they remain very optimistic about the possibilities of hypertext software, which they see as the best analytical software available to qualitative researchers. Other forms of software, such as coding software, lose contextual information, whereas hypertext, they suggest, is more flexible and dynamic, and encourages reflexive modes of thinking. Echoing Vannevar Bush, they draw a specific link between the way knowledge is constructed in a hypertext database and the way it is organized in the human mind, arguing that unlike traditional print media, where information is ordered in a linear, unidimensional way, hypertext systems complement the human thinking process. They conclude:
'Our research suggests that, because the ideas and trails of a researcher themselves become "objects" in the same way as data, hypertext encourages thinking that is much more reflexive than that encouraged by other strategies. Researchers are encouraged to analyze and question their own ideas, and the emerging construction of knowledge, in the same way that they do their data. The issue of reflexivity is bound up with the broader question of how different transformations of data make researchers think differently about the text' [1994: 160].
It is left to Barry (1998) to provide the contrary and expected pessimistic view, that hypertext's lack of structure and too much flexibility could lead to cognitive overdrive, a 'patina of information' superseding wisdom. But Barry's note of caution does not recast the terms of the debate: hypertext technology, it is assumed, produces information which is multilinear and non-hierarchical: the technology is imbued with an essential nature.
Moreover, since hypertext is both a technology and a communications medium concerned with the organization of knowledge, hypertext is given the capability to fix the nature of the knowledge which it represents, and is thus positioned as a driving force in social change. George Landow, one of the most influential writers on the convergence between critical and literary theory and hypertext, explicitly aligns the emergence of hypertext with a form of theory which emphasizes a decentralized, de-authored vision of the text:
'[M]any... who write on hypertext or literary theory argue that we must abandon conceptual systems founded on ideas of center, margin, hierarchy, and linearity and replace them with ones of multilinearity, nodes, links, and networks' [1992: 2].
The assumption is thus made that hypertext has the potential significantly to alter our culture and society; that this new technology can, in some way, act as a means of social change. Hypertext is positioned as the next agent of epistemological change in a trajectory which began with oral communication and has passed through the printed book to the current position. Landow sums up:
'Electronic text processing marks the next major step in information technology after the development of the printed book. It promises (or threatens) to produce effects on our culture, particularly on our literature, education, criticism and scholarship, just as radical as those produced by Gutenberg's movable type' [1992: 19; my emphasis].
The emphasis draws attention to the way, once again, that debates about the social impact of new technologies are cast in terms of optimism and pessimism - this imbues these technologies with a fixed nature, rather than emphasizing the extent to which social influences can impact upon both the development and the application of new technology. Similarly Bolter (1991) offers a somewhat teleological overview of the history of writing from the papyrus roll to hypertext that owes a great deal to McLuhan, and sees a profound shift taking place in our concept of the text, which moves between optimism and pessimism, and assumes electronic modes of representation will have specific effects:
'[The] shift to the computer will make writing more flexible, but it will also threaten the definitions of good writing and careful reading that have been fostered by the technique of printing... [T]he printing press encouraged us to think of a written text as an unchanging artifact, a monument to its author and its age... [it] also tended to magnify the distance between the author and the reader... Electronic writing emphasizes the impermanence and changeability of text, and it tends to reduce the distance between author and reader' [1991: 2-3].
His central point is that hypertext and the shift to electronic publishing necessarily imply a very different form from the printed book and challenge some of our basic assumptions about the organization and presentation of text:
'a hypertext is like a printed book that the author has attacked with a pair of scissors... the difference is that the electronic hypertext does not... dissolve into a disordered bundle, as the book would, because the author defines a scheme of connections to indicate relationships' [1991: 24].
Bolter's enthusiasm is tempered by a recognition of the potential hazards of hypertext which calls to mind the points made by Barry (1998). He notes [p. 67] that the impermanence of the electronic image discourages attention to detail. In a somewhat prescient remark (bearing in mind that the World Wide Web was only invented towards the end of 1990, and was not widely used or known), Bolter notes that:
'there is an inevitable degeneration in the quality of typography and graphics in the new electronic writing space, because the computer encourages the democratic feeling among its users that they can be their own designers' [1991: 66].
Mitra and Cohen (1999) outline a six-fold classification of the distinguishing features of hypertext, with specific reference to the Web.
They conclude:
'In summary, the WWW text has a set of specific characteristics predicated by its hypertextuality' [1999: 192; my emphasis].
These analyses are therefore dependent on imbuing hypertext technologies with a fixed nature. Such a conception of hypertext is, however, significantly flawed. By focusing primarily on the impact hypertext can have on the construction of knowledge, rather than on the designer's role in constructing applications, these analyses have essentialized the contextualizing capabilities of hypertext technology. Moreover, hypertext applications have been valorized as providing a means of accessing and creating information which, because of its contextualized nature is - paradoxically - seen as somehow more authentic. The humanities, chronically mindful of the truth-claims of scientific discourse which emphasize objectivity, have turned to hypertext to construct a defence of context - but it is a defence in which technology is seen as a legitimating force, and which reinforces a basic opposition in which we are invited to see the associative and the contextualized as somehow more 'authentic' than the hierarchical and the objective. Moreover, by granting hypertext technology this essential nature, the door is left open to technological determinism: hypertext is given the ability to fix the nature of knowledge; it is removed from the social sphere and granted special status as an agent of social and epistemological change.
It is this assumption of the fixed nature of hypertext applications which I intend to critique, by emphasizing the constructed nature of hypertext applications, and the significant part played by external influences in developing applications. As Lynette Hunter argues:
'[N]o technique is enclosing, isolating and reductive, or exploratory, contextualising and flexible, in itself; nor is either authenticity or self-reflexiveness in itself enabling. Communicative texts from all disciplines need a rhetorical analysis of stance, which will position the techniques and strategies historically, politically and socially. Such an analysis situates the textuality, and in so doing situates the knowledge' [1999: 6].
Hunter has developed a significant criticism of such technological determinism, with particular reference to hypertext, based on her own experiences of developing and using hypertext for primarily literary and historical applications. She specifically seeks to counter the notion that hypertext is somehow implicitly flexible, relative, and non-hierarchical [1999: 110-11], arguing that hypertext applications are often based on information chosen by their designer and ordered in a structured and often hierarchical fashion. To this end, she has developed a useful classification of hypertext applications, using case studies of projects as examples. This contrasts with the taxonomy developed by Mitra and Cohen by focusing on the application (which necessarily implies a designer or designers and emphasizes the socially constructed nature of hypertext) rather than on seeking to classify distinctive features of the technology itself.
Hunter suggests that hypertext projects can be grouped under four approaches: topic driven hypermedia texts, central text hypermedia, multidocument hypermedia, and hypermedia nests [p. 113].
The next chapter will outline the context of the case study, providing an overview of the FRS, the use of CAPI in social survey research, and specifying the problems of documenting CAPI-based surveys. Chapter 3 will provide fuller details of the pilot documentation for the FRS.
Chapter 4 presents a discussion of the pilot documentation informed by the academic literature on hypertext. Returning to Hunter's taxonomy and her emphasis on the situated nature of knowledge, I shall suggest, contra Landow, that rather than enabling a participatory creation of knowledge which is limitless in scope, hypertext technologies are bounded and informed by specific social constraints. These might include the purpose of the project; the institutional setting in which the project is embedded; the nature of the material to be presented; and the broader academic and professional community in which the material will be assessed. Whilst it will be shown that the electronic presentation of this material has many advantages, the documentation - in the process of its construction - has been substantially influenced by many and varied external criteria.
2 The use of CAPI in social survey research
This chapter provides information on the context in which the study is being undertaken. The concept of computer assisted interviewing (CAI) is introduced and explained. In particular, the history and impact of computer assisted personal interviewing (CAPI) in social survey research are reviewed and the problems of producing documentation for CAPI are discussed. The Family Resources Survey (FRS) is described, and the need for the current project is explained.
Computer assisted interviewing
Benefits of computer assisted interviewing (CAI)
Computer assisted data collection (CADAC) methods have become ever more popular in survey data collection, increasingly replacing pen and paper methods in academic, government and commercial environments (de Leeuw and Nicholls, 1996). Initial enthusiasm for these new methods stemmed from their potential for revolutionizing the data collection process. Manners (1990) and Saris (1991) identified some of the expected benefits of introducing computer assisted interviewing (CAI) into large scale surveys:
Types of CAI
Many authors offer a taxonomy of the various forms of CAI (e.g. Snijkers, 1992; de Leeuw and Nicholls, 1996), and a variety of acronyms describe the various types:
Computer assisted telephone interviewing (CATI): The interviewer is seated behind a computer terminal and asks the questions which appear on screen; the respondent's answer is typed directly into the computer. The most usual CATI setup uses a network, with supervisors present for quality control and to assist with problems; however, technological change makes it possible for a decentralized CATI survey to be carried out, for instance from interviewers' own homes. In terms of problems with CATI, Martin et al. (1993) point to the higher number of households in the UK which do not have telephones (roughly 12%) as compared to the US, and suggest that this could lead to some sampling problems. CATI has been, in general, more popular in the USA than in the UK and Europe (de Leeuw and Nicholls, 1996).
Computer assisted personal interviewing (CAPI): Interviewers visit respondents in their own homes with a laptop computer, and conduct a face-to-face interview. After the interview, the data is sent back to a central location, either by disk or by modem. New interviewer instructions and sample addresses can also be sent this way (Baker, 1992; Martin and Manners, 1995).
Computer assisted self interviewing (CASI): Respondents themselves read and answer the questions on screen; the program guides the respondent through the questionnaire. An interviewer need not be present, although scenarios are emerging in which a CASI element is part of a broader CAPI interview, for example, when answering sensitive questions (Couper and Rowe, 1996).
Introduction of CAI
The historical development of the introduction and use of computers in survey research is well documented (see Baker, 1992; Manners, 1990). In the United States, which has, in general, placed more emphasis on telephone interviewing (de Leeuw and Nicholls, 1996), computerized versions of telephone interviews were in use from the 1970s. De Leeuw and Nicholls note that computer assisted 'mail' surveys, including using the Internet for data collection, are more prominent in the USA.
In Europe, the greater emphasis on face-to-face interviewing (de Leeuw and Nicholls, 1996) meant more advancement took place in the area of computer assisted personal interviewing (CAPI). An early adopter of CAPI was Statistics Sweden in the early 1980s, followed by considerable research into the effects of the new technology on the survey process in both the private and public sectors. The increasing availability of laptops made CAPI a feasible option from the 1980s. The Netherlands Central Bureau of Statistics (CBS) was the first to establish a large, full-scale ongoing CAPI survey in 1987, with the Netherlands Labour Force Survey. In the UK, the Labour Force Survey was the first survey to use CAPI from 1990; the Family Resources Survey (FRS – started in 1992) was the first full-scale, ongoing survey in the UK to be conceived initially as a purely computer-based survey.
De Leeuw and Nicholls (1996) point out that 'whether computer assisted data collection [CADAC] methods should be used for survey data collection is no longer an issue. Most professional research organizations… are adopting these new methods with enthusiasm.' Nonetheless, a substantial amount of work has been conducted to satisfy users of the usefulness and reliability of CADAC methods compared to traditional survey techniques. Concerns have broadly surrounded three main areas:
Effects on data quality
Nicholls et al. (1997) provide a review of the main research conducted on the effects of the transition to new data collection technologies on survey data quality.
In the case of unit non-response (i.e. failure to obtain the requisite information from a designated sample unit), there had been some concern that CATI or CAPI respondents would object to having their information stored on a computer. However, several studies comparing refusal rates in both CAPI and CATI with those of equivalent paper and pencil control groups have typically found no significant differences (Baker et al., 1995; Manners, 1990). Indeed, Baker et al. found a greater respondent willingness to disclose sensitive information (see also below on respondent and interviewer acceptance).
Item non-response concerns the extent to which respondents give poor answers, or interviewers fail to ask questions. Nicholls et al. report that 'one of the most consistent conclusions of the CAI literature is that CAI can eliminate virtually all respondent and interviewer omissions of application items, but provides little or no reduction in rates of explicit refusals'. Baker et al. concur, in that they found lower item non-response for CAPI compared to PAPI (paper and pencil interviewing). This is attributed to the automation of branching by CAI programmes, which eradicates the possibility of interviewer branching errors, particularly in very complex questionnaires with multiple branching dependent on respondents' answers to previous questions. Sebestik et al. (1988, cited in Weeks, 1992) suggest that 90% of errors made by PAPI interviewers were failures to record a required response, an error which is impossible to make with CAPI. Nonetheless, Baker et al. point out that the elimination of interviewing errors in executing branching instructions assumes that the CAPI programme has been adequately designed and tested.
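The automation of branching described above can be illustrated with a minimal sketch. The Python below is not BLAISE code, and the question names (`employed`, `hours`, `benefit`) and routing rules are hypothetical. It shows only how the program, rather than the interviewer, decides which question comes next, so that a routed question cannot be skipped or asked in error.

```python
# Illustrative only: hypothetical questions and routing, not taken from the FRS.
QUESTIONS = {
    "employed": "Are you currently in paid work? (yes/no)",
    "hours": "How many hours do you usually work per week?",
    "benefit": "Are you receiving any state benefit? (yes/no)",
}


def next_question(current, answers):
    """Return the name of the next question, or None when the interview ends."""
    if current is None:
        return "employed"
    if current == "employed":
        # Routing depends on the previous answer; the interviewer never chooses.
        return "hours" if answers["employed"] == "yes" else "benefit"
    if current == "hours":
        return "benefit"
    return None


def run_interview(get_answer):
    """Ask questions in routed order; get_answer(name, text) supplies each reply."""
    answers = {}
    current = next_question(None, answers)
    while current is not None:
        answers[current] = get_answer(current, QUESTIONS[current])
        current = next_question(current, answers)
    return answers
```

As the surrounding discussion notes, such a scheme is only as reliable as the routing rules themselves: an error in `next_question` would be applied consistently to every interview.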
There was also concern that whilst automation of the process would remove the possibility of error in following the questionnaire, the software or hardware itself might introduce new error, for example, through keying errors and poor typing skills (Baker, 1992). Baker's article concludes that these concerns remain hypothetical and have not been found in empirical study. Dielman and Couper's (1995) empirical study of key presses when compared to audio tapes of the same interviews reports a 0.095% error rate, somewhat alleviating concern about the potential of typing errors introducing significant error into a CAPI survey.
Effects on costs and timescales
The introduction of CAI into the survey process theoretically offers the potential for speeding up the process by removing editing and inputting stages towards the end of the process and hence, also, reducing the cost of the process. However, additional costs emerge at the start of the process as a result of investing in new technology (laptop computers, etc.) and training interviewers to manage the new system. Sebestik et al. (1988, cited in Weeks, 1992), in an early study of costs of introducing CAPI, found that training costs for CAPI were on average 18% higher than a comparable pen and paper study, and field data costs were 17% higher.
The significant point here is that costs primarily surrounded the transition from a paper-based interview to a CAPI system. Once the infrastructure is in place, the benefits of a CAPI system emerge. As Baker et al. (1995) sum up the effect of CAPI on costs: 'CAPI may initially be more expensive, but the cost differential is likely to narrow as organizations and interviewers gain experience using CAPI, especially as laptop costs decline.' Nevertheless, Poynter (2000) is pessimistic about the future of CAPI, predicting that by 2005 the high initial investment costs will lead to it being sidelined in survey research. He does, however, note that one exception will most likely be an increase in the use of handheld computers, because of their portability and the fact that they can also be connected to the Internet. At the moment, however, limitations in terms of memory mean that handheld computers cannot yet cope with large scale social surveys.
Respondent and interviewer acceptance
Baker (1992) outlines the potential barriers which could affect respondent and interviewer acceptance of new technology: concerns about confidentiality and seeing the computer as intrusive; the pace of the interview being controlled by the speed of the computer leading to a loss of rapport and eye contact between interviewer and respondent. His own studies reported that whilst a large majority of respondents were enthusiastic, a steady minority of 5% preferred paper and pencil versions of the interview. Most respondents thought that the interviewer appeared more professional with the computer.
Baker et al. (1995) reported a greater respondent willingness to disclose sensitive information, i.e. that computerized interviewing led to more accurate reporting of sensitive data. In general, however, studies have tended to support the claim that computerization per se has less effect on the reporting of sensitive behaviour than whether or not the interview is self-administered (Jobe and Pratt, 1997; Wright et al., 1998; Tourangeau and Smith, 1996). Wright et al. also investigated the effect of the age of the respondent on acceptability of computerized self-administered interviews, finding that younger respondents tended to have more positive attitudes towards and familiarity with computers than older respondents. This finding was supported by Couper and Rowe (1996), who found in a study that willingness to self-complete an interview rather than insisting on interviewer assisted completion was related to age, level of education, and computer experience. Buetow et al.'s (1996) study of the use of CAPI among Australian GPs supported the finding that older patients were more likely to regard the computer with suspicion, and to prefer not to operate the computer independently. For many, this was related to problems in reading the screen.
In terms of interviewer acceptance, Wojcik and Baker (1995, cited in Nicholls et al., 1997) describe broad interviewer acceptance of CAPI achieved by a combination of lighter and more powerful hardware, enhanced software, improved data management and communications, and carefully developed interviewer training. Once trained, most interviewers preferred to use CAPI. Martin et al. (1993) report that interviewers reacted well to using the computer and handled assignments without major problems. Some non-typists found the keyboard difficult, but it was argued that these problems would disappear with experience (Couper et al., 1997).
Couper and Burt's (1994) study of interviewer acceptance of CAPI contextualizes the impact of introducing a new form of working onto an experienced workforce. They note that 'computer anxiety is generally associated with age, and with women' and that, in many cases, CAPI involves the imposition of a new method onto a workforce which tends to fit the profile of people expected to be anxious about using computers. Their study of attitudes before and after using CAPI found that, in general, users were positive about the new technology, and that the key factor in determining attitudes was experience rather than interviewer attributes.
Computer assisted interviewing is now a given in the survey research process, and considerable research has been undertaken to investigate the potential effects of such new technology on the quality of the data and the research process. De Leeuw and Collins (1997) note that the focus is now shifting from data collection techniques and sum up that it is 'safe to assume that with well-trained interviewers and the same well-constructed questionnaire, both CAPI and CATI will perform well, and differences in data quality will be extremely small'. They suggest, however, that the 'human factors' of CAI have been neglected, under which heading they include such issues as: whether reading from a screen and typing require different perceptual and motor skills than going through a paper questionnaire; or, the significance that needs to be attached to the reports of interviewers that it is harder to grasp the overall structure of the questionnaire. Similarly, Couper and Nicholls (1998) conclude that the first effects of the change to computer-based survey research have been primarily operational and are already well-documented, i.e. the speed and efficiency with which surveys are conducted, and the completeness and consistency of data collection. They suggest that more important consequences surround issues such as the roles humans play in the collection process, and the nature of survey documentation.
Bateson and Hunter (1991a and 1991b) discuss the changing roles of professionals involved in survey research as a result of the move to new technologies, emphasizing the greater role played by the interviewee in achieving quality data, and the extent to which researchers now take on some of the role of expert programmers in being able to specify questionnaires and consistency checks. This is a more optimistic view than is generally taken: other researchers have emphasized that 'it has become more and more difficult for developers, interviewers, supervisors, and managers to keep control of the content and structure of CAI instruments' (Kelly, 1999). It is easy, and hence tempting, to add new functions to the instruments which rapidly make the program and its associated documentation more complicated.
In addition, CAI questionnaires are programmed rather than written: for example, in the case of the FRS the program is produced in BLAISE, a variant of the PASCAL programming language (Manners, 1990), which is not broadly understood or accessible to non-specialists. The process thus becomes less transparent, even to analysts with statistical expertise and familiarity with the survey. Clark and Maynard (1998) reflect on the needs of secondary analysts who are one more step removed from the creation of the questionnaire. Whereas the computerization of the survey process has made raw data files more easily accessible via the Internet, analysts also need access to the questionnaires on which the survey was based in order to track question routing and the context in which questions are asked. The lack of a formal or easily understood questionnaire is a bar to this process (Bethlehem and Manners, 1998).
Current solutions to documentation problems
As Martin and Manners (1995) point out, in a PAPI survey the questionnaire itself is a vital document for researchers and others to use as a record of what the survey covers. CAPI software, however, varies a great deal in the type of questionnaire documentation which can be produced. As Kelly (1999) points out: 'the documentation of CAI instruments [becomes] a separate task, one that was not necessary when paper questionnaires were used.' Kelly outlines the three most common current approaches to documenting questionnaires: producing separate questionnaire specifications independent of the CAI program; manual editing of the program; and, semi-automated documentation of the electronic questionnaire. These approaches are problematic to the extent that they either remain as opaque as the questionnaire they are trying to replace, or else have the potential to introduce error into the process. In addition, many continuous surveys face problems when updating their documentation from year to year as new questions are introduced or old ones dropped.
There are currently a number of projects aiming to document electronic CAPI questionnaires. Current solutions have tended to emphasize documenting either the questionnaire, or the survey metadata (metadata is data about data; this information allows analysts to use datasets more effectively).
The Data Documentation Initiative (DDI) is an international consortium of academic researchers and government statistical offices (including the ESRC Data Archive, which holds records of FRS data). The DDI aims to establish an international criterion and methodology for the content, presentation, transport, and preservation of metadata about datasets in order to produce codebooks which are uniform, highly structured, and easily searchable on the Web. To this end, the DDI has developed what is known as a Document Type Definition (DTD) for the markup of social science codebooks. The DTD employs XML (the Extensible Markup Language), a 'next-generation' markup language which is a dialect of a more general markup language, SGML (Standard Generalized Markup Language). However, XML is a technology still in flux, whose own standards are currently in the process of being set. As a consequence of this, not all web browsers are capable of processing XML easily.
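As a rough illustration of what structured, machine-searchable codebook markup makes possible, the sketch below parses a small XML fragment with Python's standard library. The element and attribute names (`codebook`, `variable`, `label`, `table`) and the variable names are invented for the example and are not taken from the DDI DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: element names are illustrative, not the DDI DTD.
CODEBOOK_XML = """
<codebook survey="example">
  <variable name="VAR1" table="ADULT">
    <label>Age of respondent</label>
  </variable>
  <variable name="VAR2" table="HOUSEHOL">
    <label>Number of rooms</label>
  </variable>
</codebook>
"""


def labels_by_variable(xml_text):
    """Map each variable name to its (table, label) pair."""
    root = ET.fromstring(xml_text.strip())
    return {
        var.get("name"): (var.get("table"), var.findtext("label"))
        for var in root.iter("variable")
    }


codebook = labels_by_variable(CODEBOOK_XML)
```

Because every variable is marked up in the same way, a single short function can retrieve any label, which is the practical sense in which a uniform codebook is 'easily searchable'.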
The TADEQ project (Bethlehem and Manners, 1998) is a collaborative R&D project funded by the European Union ESPRIT Programme, involving National Statistical Institutes, research institutes, and commercial marketing research organizations (TADEQ: a Tool for the Analysis and Documentation of Electronic Questionnaires). The project proposes to develop a tool to make a human-readable presentation (on paper or electronically in hypertext format) of the electronic questionnaire. The project has thus far focused on the development of its own XML DTD: the Questionnaire Definition Language (QDL).
A questionnaire generator called qgen, which processes questionnaire information automatically into HTML or XML, has been developed at the MRC Biostatistics Unit, University of Cambridge; however, this currently can handle only very small scale questionnaires (Walker, 2000). In terms of documenting metadata, Card (2000; see http://www.socio.com) offers a means of providing information on and searching for variable metadata from multiple survey datasets; however, searchable forms of the survey questionnaires are not offered.
The Family Resources Survey (FRS) is a large and complex multi-level survey which collects information on the incomes and circumstances of around 22-24,000 private households in Great Britain. This includes household characteristics; income and receipt of Social Security benefits; tenure and housing costs; assets and savings; carers and those needing care; and, employment. The FRS contract is currently held by ONS Social Survey Division and the National Centre for Social Research (formerly known as Social and Community Planning Research), and has been since the first full survey year (1993/94). The project is managed by the UK Department of Social Security (DSS) Analytical Services Division 3E (ASD3E).
National benchmark surveys in the UK, such as the General Household Panel Survey, the Family Expenditure Survey, the British Crime Survey, and the British Social Attitude Survey, are all now CAPI based, having been paper and pencil questionnaires when they started. In contrast, the FRS has been a CAPI based survey since its inception in 1993. The FRS uses the CAPI program BLAISE, the program developed by Statistics Netherlands used by ONS and NCSR for almost all public sector surveys.
The current project aims to provide full documentation for the Family Resources Survey, for both the survey questionnaire and the metadata. The main constraint put on the project at the outset was that it would be accessible via the ASD intranet, and that it should be compatible with possible eventual use at the ESRC Data Archive, the main repository for government social survey documentation for external users. This meant that the documentation would be developed using HTML and, possibly, XML. The following chapter describes in detail the way in which the project has been undertaken.
3 Documenting the Family Resources Survey
This chapter outlines the design principles which have underpinned the development of the online documentation for the FRS. The current documentation is described in more detail, and the structure and content of the website are outlined.
Three main principles have informed the development of the online documentation for the FRS.
1. Accessible via the Web. The current FRS documentation exists in many disparate forms (documents, spreadsheets, graphics etc.), and one of the main goals of the project has been to present this information in a single place, on the FRS website on the DSS intranet (this also means that the documentation can, potentially, be made available to users external to the DSS, via the Web). The decision to use the website as the primary means of dissemination of this information meant that HTML (and, later, XML) has been the primary tool for development.
2. Anticipating users. Girill and Luk (1992) suggest that the most significant feature in the design of hypertext-based documentation is that it should not confound user expectations. This principle has informed the FRS documentation on two levels. Firstly, in terms of the design of the pages, the site has been developed in accordance with other ASD3E pages available on the DSS intranet. Secondly, this principle underlies the organization of the information on the site: users should not be forced to 'hunt out' information - it should follow a pattern of organization which is familiar and expected.
3. Updating documentation. Since the FRS is an annual survey, and changes to the survey questionnaire are made each year, the documentation cannot remain static. It has to be designed so that it can be updated easily. Moreover, it could not be assumed that those people who would be updating the documentation would necessarily be proficient in HTML.
Current FRS documentation falls into three categories, and exists in both paper and electronic form. There is general information on the survey, metadata related to variables on the hierarchical dataset, and the survey questionnaire. The following section describes each of these in more detail.
Background information on the FRS is currently documented in a Guide produced each year by ASD3E. This provides preliminary information on the background of the survey (history of the survey, response rates); a description of the structure of the dataset; and, programming information for SAS.
The hierarchical dataset of the FRS is made up of 24 tables, with each of almost 1500 variables associated with one of the tables. Table 1 lists the 24 tables and the number of variables associated with each.
Table 1 Tables in the FRS hierarchical dataset (FRS survey 1998-99)
| Table name | No. of variables |
|---|---|
| ACCOUNTS | 9 |
| ADMIN | 32 |
| ADULT | 433 |
| ASSETS | 17 |
| BENEFITS | 30 |
| BENUNIT | 84 |
| CARE | 45 |
| CHILD | 166 |
| DSSPAY | 6 |
| ENDOWMNT | 8 |
| EXTCHILD | 11 |
| HOUSEHOL | 221 |
| INSURANC | 29 |
| JOB | 158 |
| MAINT | 25 |
| MORTCONT | 8 |
| MORTGAGE | 53 |
| ODDJOB | 9 |
| OWNER | 23 |
| PENAMT | 7 |
| PENSION | 28 |
| RENTCONT | 7 |
| RENTER | 53 |
| VEHICLE | 5 |
| Total: | 1440 |
Each individual variable has associated with it information (metadata), which includes:
This information is currently held on an Excel file of more than 6000 lines, which is searchable, but not easily navigable. One of the main tasks for the online documentation project was to develop a way of presenting this information in a more user-friendly way, whilst retaining the search functionality of the original Excel file.
In the case of the FRS, the electronic questionnaire, written in BLAISE code, is viewed by the interviewer on the laptop as a series of screens presenting the questions to be put; the routing through the questionnaire is not visible, but is written into the programme. This is to ensure that only the right people are asked the right questions, and can involve the nesting of questions (for example, between the household and the individual level). The paper questionnaire has been replaced by a printout, in relatively raw form, of the questions and, separately, the routing instructions in BLAISE. The latest version of the FRS 'questionnaire', known as the BLAISE automatic documentation (BAD), produced by one of the survey agencies, is over 1000 pages long, and is not regarded as particularly user-friendly by those who have examined it.
The pilot documentation has been based on the 1998-99 survey (known within ASD as FRS 35). This was the most complete survey at the time the documentation was initiated. Figure 1 shows the structure of the current FRS documentation website. All documentation presented on the website is accessed through a single front page, and links from that page take the user on to further documentation. This further documentation, in turn, is presented in two sections: one of which provides a general introduction to the FRS; the other giving more detailed documentation related to the hierarchical dataset.
The introductory documentation is made up of five main sections linked from the FRS home page. These sections provide information on: the background to the survey; the structure of the database; programming examples; information on imputation (i.e. the processes whereby missing data are computed); and, an introduction to the paper-based questionnaire.
The detailed documentation presents an online version of metadata related to each variable in the hierarchical dataset, organized by table. This section of the website also provides links to two methods of searching the hierarchical dataset: by topic, and by variable name.
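A minimal sketch of the two search routes, by topic and by variable name, over metadata organized by table. The records, field names and topics below are hypothetical stand-ins for rows of the metadata spreadsheet; only the table names are taken from Table 1.

```python
from collections import defaultdict

# Hypothetical metadata records; real FRS variable names and topics may differ.
RECORDS = [
    {"name": "VAR_A", "table": "ADULT", "topic": "employment", "label": "Example label A"},
    {"name": "VAR_B", "table": "JOB", "topic": "employment", "label": "Example label B"},
    {"name": "VAR_C", "table": "RENTER", "topic": "housing costs", "label": "Example label C"},
]


def group_by_table(records):
    """Organize variable records under the table they belong to."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["table"]].append(record)
    return dict(grouped)


def search_by_topic(records, topic):
    """Return names of variables whose topic matches, ignoring case."""
    return [r["name"] for r in records if r["topic"].lower() == topic.lower()]


def search_by_name(records, fragment):
    """Return records whose variable name contains the fragment, ignoring case."""
    return [r for r in records if fragment.lower() in r["name"].lower()]
```

Both searches run over the same records, so the by-table view and the search results stay consistent when a record is added or changed.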
The next chapter presents a discussion of the development of the pilot documentation informed by Hunter's taxonomy and her emphasis on the situated nature of knowledge, as discussed in chapter 1.

4 Writing hypertext: the construction of online documentation
This section will draw on Hunter's classification of hypertext applications (discussed in chapter 1) to provide a preliminary analysis of the development of online documentation for the FRS. To recap, in her attempt to counter the notion that hypertext is implicitly flexible, relative, and non-hierarchical [1999: 110-11], Hunter has developed a useful classification of hypertext applications, using case studies of projects as examples. She suggests that these projects can be grouped under four approaches: topic driven hypermedia texts, central text hypermedia, multidocument hypermedia, and hypermedia nests [p. 113].
The remainder of this chapter presents a discussion of the development of the current online FRS documentation in the light of three of these approaches: multidocument hypermedia; topic driven hypermedia texts; and central text hypermedia.
The information currently presented on the FRS site has been derived from various sources, including Word documents, Excel files, and graphics. Presenting this varied information in HTML has involved a number of challenges, which illustrate some of the constraints and contextual limitations imposed on, and the choices made in, the construction of hypertext applications. In each of these cases, it has not been assumed that an electronic representation of material is necessarily better than a paper-based version. There are, however, many examples where this has proven to be the case, and these shall be described.
The section of the website which presents introductory and background information to the FRS has much in common with Hunter's category of hypertext application which she calls 'multidocument hypermedia'. She describes this as 'a multiple document archive, using hypertext predominantly as a presentation device' [1999: 117]. Such hypermedia are primarily used for learning purposes.
The information provided on this section of the FRS website is primarily derived from the key introductory point for ASD staff to the FRS: the Guide to the Family Resources Survey. This is a 23-page booklet, updated for each release of the FRS, which provides preliminary information on the background and structure of the survey. The information contained within the Guide covers a variety of topics, from the history of the survey to SAS programming examples; viewing the Guide as a collection of documents or chapters on various themes related to the FRS makes it more easily understandable as a multiple information source.
The various sections of the Guide are presented as separate web pages. In summary, these pages contain information on:
Whilst several sections of the Guide are simply descriptive material, translating information between the computer and the written page is not simply a form of transcription. Other sections of the Guide benefit from having information presented in the non-linear format which hypertext can facilitate. For example, contrasting programming examples are more easily compared as two frames on a webpage than across several pages in a paper document.
Figure 2 shows how the survey dataset is produced in two different forms: a hierarchical dataset which has a more complex structure but which is easier to use once learnt, and a flatfile dataset which is more immediately accessible to the new user. New users are presented with programs which enable comparison of the two methods of programming in order to learn by example. In the paper-based documentation, these examples stretch across five pages, necessitating movement backwards and forwards through the text. When these examples are presented electronically, in a web browser, it is possible to present the two examples simultaneously so that a user can examine the different examples of programming more easily, using frames and scrolling within the browser to align similar and contrasting parts of the two programs.
Figure 2 Example of non-linear presentation: programming examples
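The difference between the two dataset forms can be sketched in a few lines. The example is in Python rather than SAS, and the record layout, field names and values are invented for illustration; the FRS flatfile itself may be organized differently. It shows only how nested household and adult records flatten into one row per adult.

```python
# Hypothetical hierarchical data: each household record holds its adult records.
households = [
    {"household_id": 1, "region": "North", "adults": [
        {"person": 1, "age": 34},
        {"person": 2, "age": 36},
    ]},
    {"household_id": 2, "region": "South", "adults": [
        {"person": 1, "age": 71},
    ]},
]


def flatten(households):
    """Produce one flat row per adult, repeating the household-level fields."""
    rows = []
    for household in households:
        for adult in household["adults"]:
            rows.append({
                "household_id": household["household_id"],
                "region": household["region"],
                **adult,
            })
    return rows


flat_rows = flatten(households)
```

The flat form repeats household-level values on every row, which makes it simpler to read at first sight but more redundant than the hierarchical form.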

Another example relates to the presentation of the structure of the hierarchical dataset in electronic format (see Figure 3). In this example, the information in the paper-based Guide takes up four pages across which, again, the user is obliged to move backwards and forwards. On a web page, a table is set out at the top with each of the 24 table names, which provide hyperlinks to the section further down the page providing information on that particular table (in Figure 3, the table name 'MORTCONT' is in red to show a selected hyperlink).
Figure 3 Example of non-linear presentation: table descriptions for the hierarchical dataset
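The layout described above, an index of table names at the top of the page linking to sections further down, can be sketched as follows. The function name and the placeholder descriptions are assumptions made for the example, and only three of the 24 table names are used.

```python
from html import escape


def build_page(descriptions):
    """Build one HTML page: a linked index of table names, then one section each."""
    index = " | ".join(
        f'<a href="#{escape(name)}">{escape(name)}</a>' for name in descriptions
    )
    sections = "\n".join(
        f'<h2 id="{escape(name)}">{escape(name)}</h2>\n<p>{escape(text)}</p>'
        for name, text in descriptions.items()
    )
    return f"<html><body>\n<p>{index}</p>\n{sections}\n</body></html>"


# Placeholder descriptions; the real text would come from the Guide.
page = build_page({
    "MORTCONT": "Description of the MORTCONT table.",
    "MORTGAGE": "Description of the MORTGAGE table.",
    "OWNER": "Description of the OWNER table.",
})
```

Generating the index and the sections from the same mapping means that every link has a matching target, and a table added in a later survey year appears in both places.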

These examples provide very simple instances of the different ways in which paper and hypertext can be constructed to show the same information, and also show clearly that electronic documents are not necessarily dependent on linear presentation. However, the significant point is that this non-linear nature is a direct result of choices made by the designer of the site. It would have been equally possible to have presented the programming example as separate web pages, but the use of a technical capacity (frames) which HTML offers meant that a more effective presentation method could be used instead.
Hunter describes this approach as effectively providing 'a multidimensional filing system' for existing information which is already highly categorized and hierarchical. This approach has underpinned the bulk of the current documentation for the FRS. The information presented is rigorously categorized and completely pre-determined, and hence this approach offers a real challenge to the essentializing assumptions made about hypertext. In addition, the user has limited to no impact on the content. As Hunter summarizes:
'Topic-driven hypertexts are less problematic to construct than many other applications because by definition they count on a stable approach to material that does not unduly disrupt the expected order. In many ways they are more flexible and speedy enumerative bibliographic systems, and have no necessary connection to any social materiality that would require their users to engage with and assess the choice of material or the kinds of links that had been established... Such hypertexts are set up... for educated specialist users, to better present the formal, corporately held directions of information and make more effective the sense of a necessary answer or conclusion' [1999: 114; my emphasis].
These applications are therefore primarily aimed at presenting organized information for the use of specialists who want their needs to be anticipated. The metadata related to variables on the FRS hierarchical dataset represent such a body of information: as the name of the dataset implies, the information is already highly formalized and hierarchically organized. The information is thus intended for specialist users who approach it with specific expectations as to how it will be presented. As Hunter summarizes about her own project:
'[T]he mere pre-existence of the information to be put into the hypertext meant that substantial categorising and hierarchising had already taken place, which could not be disrupted without unhelpfully disordering the expectations of users' [1999: 113; author's emphasis].
From the literature on developing online documentation, Girill and Luk (1992) confirm this rule. They argue [p. 573] that freeing information from its 'traditional constraints' leads to its own problems; that users can become disorientated, and even 'lost in hyperspace':
'if the underlying information has a stable, well-understood, and well-known structure, allow the network to reflect that structure... We believe that documentation systems designers can preserve the benefits of hierarchical access yet avoid its known weaknesses not by forsaking the use of text structure (as pure hypertext does) but rather by exploiting it in a different way' [pp. 574-6].
The remainder of this section will outline how the variable metadata have been presented on the FRS website. Until now these metadata have been stored in an Excel file: the two modes of representing the information will be compared and the benefits of the hypertext version will be outlined. Some of the problems which were found with the website structure will be discussed. Finally, the section will review the search facilities which have been designed for finding specific variables.
Structure of the hierarchical dataset website
The hierarchical dataset is made up of 24 tables, with each of nearly 1500 variables related to one of these tables. Metadata for each of these variables are held in a single Excel file of nearly 6000 lines. The decision was made to dedicate a single web page to each variable, with links provided from the appropriate table page (see Card (2000) for a similar example of producing online variable metadata). Figure 4 outlines more fully the structure of this part of the website, showing the path for reaching one variable (ADCH) off the ACCOUNTS table; it can be seen how this structure resembles Hunter's 'multidimensional filing system'.

Figure 5 shows the ACCOUNTS table web page, with the link to the variable ADCH selected, and shown in red. Links are also provided to pages dedicated to the other variables associated with this table (ACCINT, ACCOUNT, etc.) and this pattern is repeated for all the tables in the hierarchical dataset (links to the other 23 table pages are provided on the left hand side of this page). Stylesheets were used to ensure that each page remained uniform. From each variable page, users can link to definitions of the various categories used, and also information on benefits and how the variables are used (see also Figure 6).
Figure 5 ACCOUNTS table web page, with link to variable ADCH shown

Variable metadata: comparing the spreadsheet and HTML versions
This section outlines the benefits of presenting variable metadata in HTML. Each column in the original spreadsheet contains a specific type of metadata (the column headings are shown in Table 2).
Individual variables make up a handful of rows in the spreadsheet; e.g., the spreadsheet entry for the variable ADCH under the ACCOUNTS table is made up of three lines of text (in a larger document of nearly 6000 lines). The user must locate the appropriate variable by searching for text, and then must refer to a paper document which lists which type of information is contained in each lettered column. Table 2 shows the lines in the spreadsheet for the variable ADCH.
Table 2 Spreadsheet information for variable ADCH
| | A | B | C | D | E | F | G | H | J | L | M | N | O | Q |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Details | TABLE | VARIABLE | VAR_FMT | LABEL | FRSVALUE | FMTVALUE | MINVAL | MAXVAL | DERIVED | BENEFIT | QUESTION | TYPE | BLOCK | USAGE |
| Metadata held | ACCOUNTS | ADCH | ACS_302X | Child or adult a/c? | 1 | Adult | 1 | 2 | 0 | 0 | ADCH | (I) | BINTREST | C |
| | | | | | 2 | Child | | | | | | | | |
For example, to discover what type of variable ADCH is, the user of the spreadsheet would have to search for the variable, locate the appropriate column (using the FRS Guide to find out which column contains the appropriate information), and then check the definition of '(I)'. In contrast, Figure 6 presents all this information on a single page in HTML. As the selected hypertext link (in red) shows, users can click to access definitions of each type of metadata.
Figure 6 HTML information for variable ADCH
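The move from spreadsheet rows to a single page per variable can be illustrated with a short JavaScript sketch. It assumes that the first row for a variable carries the variable-level metadata and that continuation rows carry only further value codes, as Table 2 suggests for ADCH. The function names and the HTML layout are hypothetical and do not reproduce the actual FRS pages.

```javascript
// Rows for ADCH as they appear in Table 2, keyed by column heading.
const rows = [
  { TABLE: "ACCOUNTS", VARIABLE: "ADCH", VAR_FMT: "ACS_302X",
    LABEL: "Child or adult a/c?", FRSVALUE: "1", FMTVALUE: "Adult",
    MINVAL: "1", MAXVAL: "2", DERIVED: "0", BENEFIT: "0",
    QUESTION: "ADCH", TYPE: "(I)", BLOCK: "BINTREST", USAGE: "C" },
  { FRSVALUE: "2", FMTVALUE: "Child" },
];

// Collapse a variable's rows into one record with a list of value codes.
function collapseRows(variableRows) {
  const { FRSVALUE, FMTVALUE, ...meta } = variableRows[0];
  const values = variableRows.map((r) => ({ code: r.FRSVALUE, label: r.FMTVALUE }));
  return { ...meta, values };
}

// Render the record as one page (hypothetical layout, no HTML escaping).
function renderVariablePage(record) {
  const { values, ...meta } = record;
  const fields = Object.entries(meta)
    .map(([key, value]) => `<dt>${key}</dt><dd>${value}</dd>`)
    .join("\n");
  const codes = values
    .map((v) => `<li>${v.code} = ${v.label}</li>`)
    .join("\n");
  return `<h1>${record.VARIABLE}</h1>\n<dl>\n${fields}\n</dl>\n<ul>\n${codes}\n</ul>`;
}

const adch = collapseRows(rows);
console.log(renderVariablePage(adch));
```

The point of the sketch is that the user no longer needs a separate key to the lettered columns: each item of metadata is labelled where it is displayed.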
Problems with the HTML version
Whilst moving to an electronic version of the documentation brings substantial benefits, new and unforeseen problems can emerge. Two main problems emerged during the construction of the HTML version of the metadata. The first was that the statistical package (SAS) which generated the Excel version of the metadata automatically could not generate HTML pages automatically. This meant that each page had to be constructed individually. This problem will not exist with the implementation of the new version of SAS at the DSS, which does generate HTML automatically.
Another problem emerged when it became apparent that a handful of variables shared a name with a table (e.g. there is an ACCOUNTS table and an ACCOUNTS variable; or there is a CARE table and a CARE variable). Since all the HTML files were being stored in the same location and were being named according to their variable or table name, this would lead to duplication in file names. The solution was to give precedence to table files, and to indicate variable files with the suffix '_var': thus, the file containing the information for the ACCOUNTS table was named 'accounts.htm'; the file for the ACCOUNTS variable was named 'accounts_var.htm'.
Another, related problem emerged when it became apparent that some variable names were duplicated under different tables. Some of these variables contained identical information (e.g. _MONTH_) and it was apparent that a single variable page would suffice: the ability of hypertext to link to a single file from multiple locations assisted in avoiding duplication here. However, others held information specific to the table under which they appeared. Again, this would lead to duplication in file names. The solution was to give the variable name a suffix indicating the table with which it was associated. For example, a variable named ANYMON is associated with both the ADULT and the ASSETS tables. The file names were thus 'anymon_adult.htm' and 'anymon_assets.htm' respectively. These common variable names are being removed through various iterations of the survey.
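The two naming rules described above can be expressed as a small JavaScript sketch. The file names it produces are those given in the text; the function names and the two lookup sets are assumptions made for illustration, and in practice the sets would be derived from the metadata rather than typed in by hand.

```javascript
// Table names mentioned in the text (the full dataset has 24).
const tableNames = new Set(["ACCOUNTS", "CARE", "ADULT", "ASSETS"]);
// Variable names that occur under several tables with different content.
const ambiguousVariables = new Set(["ANYMON"]);

// Table files take precedence and use the plain table name.
function tableFile(table) {
  return `${table.toLowerCase()}.htm`;
}

// Variable files get a suffix only when a clash would otherwise occur.
// The table argument is only needed for the ambiguous-variable rule.
function variableFile(variable, table) {
  const base = variable.toLowerCase();
  if (ambiguousVariables.has(variable)) {
    return `${base}_${table.toLowerCase()}.htm`;
  }
  if (tableNames.has(variable)) {
    return `${base}_var.htm`;
  }
  return `${base}.htm`;
}

console.log(tableFile("ACCOUNTS"));
console.log(variableFile("ACCOUNTS"));
console.log(variableFile("ANYMON", "ADULT"));
console.log(variableFile("ANYMON", "ASSETS"));
```

Variables such as _MONTH_, which hold identical information under every table, need no suffix: every table page can simply link to the same file.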
Search facilities
In providing search facilities for the metadata, I have drawn on Hunter's suggestion that the user should be anticipated [1999: 114]. In addition, Card (2000) indicates two types of search facility which users of social science data are likely to want: by variable name, and according to topic. Two search facilities have therefore been designed for finding specific variables. The first is a simple search on variable name: this is a front page which provides links to pages listing all variables beginning with the same letter. Again, this is a simple example of allowing hypertext to link to a single file from multiple locations so that applications are flexible.
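The variable-name search amounts to grouping names by their initial letter and linking each letter to a listing page. A minimal JavaScript sketch follows. The variable names are taken from the text, but the function names and the page-naming scheme ('vars_a.htm' and so on) are hypothetical.

```javascript
// Group variable names under their first letter, in alphabetical order.
function groupByInitial(variableNames) {
  const groups = new Map();
  for (const name of [...variableNames].sort()) {
    const letter = name[0].toUpperCase();
    if (!groups.has(letter)) groups.set(letter, []);
    groups.get(letter).push(name);
  }
  return groups;
}

// Front page: one link per letter to a page listing that letter's variables.
function frontPageLinks(groups) {
  return [...groups.keys()]
    .map((letter) => `<a href="vars_${letter.toLowerCase()}.htm">${letter}</a>`)
    .join(" ");
}

const groups = groupByInitial(["ADCH", "ACCINT", "ANYMON", "CARE"]);
console.log(frontPageLinks(groups));
```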
The second search facility enables users to find variables associated with various topics. Users enter their search terms into a standard HTML form and are linked to the appropriate pages on the website. In order to make the search successful, a two-level classification system was devised, in consultation with ASD3E staff. Table 3 shows the topic classification system which has been designed for the variable topic search. Twenty top-level topic categories have been isolated, some of which contain subcategories. This classification system reflects specific topic interests which users are likely to have (variables related to the council tax, to employment, to assets and savings, to benefits etc.). Each variable on the 1998-99 FRS dataset has been classified according to this system.
Table 3 Topic classification for variable topic search
| | Top level classification | Second level (where applicable) |
|---|---|---|
| 1. | Assets and savings | Accounts and investments held; Capital value; Interest and dividends |
| 2. | Care | Childcare; Informal care |
| 3. | Consumer durables | |
| 4. | Council Tax | |
| 5. | Demographic characteristics | |
| 6. | Employment | Earnings; Employment Status; Job description; Self-employment income |
| 7. | Health | Children's health; Health restrictions on work |
| 8. | Housing Benefit | |
| 9. | Housing costs | Charges on property; Mortgages; Rent; Structure and contents insurance; Water and sewerage |
| 10. | Informal care | |
| 11. | Insurance policies | |
| 12. | Maintenance | Maintenance paid; Maintenance received |
| 13. | NHS services | |
| 14. | Non-state pensions | Income from pensions, trusts and annuities; Pension scheme membership |
| 15. | Other income | Children's earnings; Educational grants/loans; Income from property; Non-state benefits; Odd jobs; Royalties and allowances; Welfare/school milk/meals |
| 16. | Qualifications | |
| 17. | Social Security Benefits/Tax Credits | |
| 18. | Tenure | |
| 19. | Travel to work | |
| 20. | Unclassified | |
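One way of turning this classification into search terms is sketched below in JavaScript. It is an illustration under stated assumptions rather than the procedure actually followed: only a fragment of Table 3 is included, the function name `searchTerms` is hypothetical, and the assignment of the variable ACCINT to 'Interest and dividends' is assumed for the purposes of the example.

```javascript
// A fragment of the two-level classification in Table 3.
const topics = {
  "Assets and savings": [
    "Accounts and investments held",
    "Capital value",
    "Interest and dividends",
  ],
  "Care": ["Childcare", "Informal care"],
  "Council Tax": [],
};

// Combine the classification labels and the variable name into a
// lowercase search string, dropping the connective "and".
function searchTerms(variable, topLevel, secondLevel) {
  if (!(topLevel in topics)) throw new Error(`Unknown topic: ${topLevel}`);
  if (secondLevel && !topics[topLevel].includes(secondLevel)) {
    throw new Error(`Unknown subtopic: ${secondLevel}`);
  }
  return [topLevel, secondLevel || "", variable]
    .join(" ")
    .toLowerCase()
    .split(/\s+/)
    .filter((word) => word && word !== "and")
    .join(" ");
}

console.log(searchTerms("ACCINT", "Assets and savings", "Interest and dividends"));
```

Rejecting labels that are not in the classification keeps the search vocabulary closed: a variable can only be found through terms that the designer has already sanctioned.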
The search engine is a JavaScript program. Variable names and classification topics are entered into the program, and this information is associated with the appropriate HTML variable page. Figure 7 shows how this information is associated in the JavaScript program for five different variables. Lines beginning 'title' show the search terms entered for each variable (compare with the topic classification system for 'Assets and Savings'); lines marked 'desc' give the variable description (the label from the metadata); lines marked 'links' point to the variable HTML file.
Figure 7 JavaScript showing how search terms are entered for each variable
title[2]="assets savings interest dividends accint"
This illustrates how what would appear to be an open-ended facility, in which users can enter search terms, can be constructed in a highly controlled fashion, using a hierarchical classification system. The process is tightly controlled by the application designer.
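The association between search terms, descriptions and files described for Figure 7 can be sketched as three parallel arrays and a lookup function. This is a hedged reconstruction, not the FRS search program itself: only the 'title' string for ACCINT and the ADCH label come from the text, while the remaining entries, the file names and the function `searchVariables` are hypothetical.

```javascript
// Parallel arrays indexed by entry number, as described for Figure 7.
const title = [];
const desc = [];
const links = [];

title[2] = "assets savings interest dividends accint"; // from the text
desc[2] = "(variable label from the metadata)";        // placeholder
links[2] = "accint.htm";                               // assumed file name

title[3] = "assets savings accounts investments held adch"; // hypothetical
desc[3] = "Child or adult a/c?";                             // label from Table 2
links[3] = "adch.htm";                                       // assumed file name

// Return every entry whose search terms contain all words in the query.
function searchVariables(query) {
  const wanted = query.toLowerCase().split(/\s+/).filter(Boolean);
  const results = [];
  title.forEach((terms, i) => {
    const words = terms.split(" ");
    if (wanted.length > 0 && wanted.every((w) => words.includes(w))) {
      results.push({ desc: desc[i], link: links[i] });
    }
  });
  return results;
}

console.log(searchVariables("interest"));
```

A query can only succeed if its words were placed in a 'title' string by the designer, which is the sense in which the apparently open-ended search is in fact closed.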
Summary
In summary, the work done on documenting the variable metadata provides an example of Hunter's category of topic-driven hypermedia text. The metadata are highly formalized and hierarchically organized, and are meant for specialist users who are approaching the information with specific expectations as to how it will be presented and what kind of information they want. These principles are carried into constructing the topic search facility, where a hierarchical classification based on expected user needs was devised.
Central text hypermedia
The most recent work on the FRS website is developing in a fashion similar to another of Hunter's approaches to hypertext applications. 'Central text hypermedia' provide 'an information shell surrounding the central text, a text which may be a person/writer or a literary artefact' [1999: 115]. In this case the central text is the FRS questionnaire.
At the moment, it seems that two constraints will affect the development of this hypertext application. Firstly, the issue of standards will become significant. Standards for developing computer applications are one of the most significant ways in which social constraints can impact upon the development of new technology. As Hawkins (1996) puts it in his overview of the social impact on the process of developing standards:
'[T]he crucial question for social scientists is: "Given the critical role of standards in providing for operational functionality in electronic communications networks, how are the technical and non-technical criteria synthesized in the process of selecting and applying standards?"' [1996: 158].
In this case, two XML markup standards for electronic questionnaire documentation are in the process of being developed, both of which could be appropriate for the FRS. The first is the Data Documentation Initiative Document Type Definition (DDI DTD: see chapter 2). The DDI consortium includes the ESRC Data Archive, who hold records of FRS data. The second XML DTD, the Questionnaire Definition Language (QDL), is emerging from the TADEQ project (see chapter 2). A main player in the TADEQ project is the Social Survey Division of the Office of National Statistics (ONS), who have conducted the field research for the FRS since its inception.
Moreover, XML is a technology still in flux, whose own standards are currently in the process of being set. As a consequence of this, not all web browsers are capable of processing XML easily. This is a potential second constraint on the development of the online documentation, in that ways of working within either the DDI's or the TADEQ project's standards which can also be used by existing web browsers will need to be developed.
Conclusion
To summarize: the current online documentation for the Family Resources Survey reflects several of the approaches outlined by Hunter in her classification of hypertext applications.
The various forms which the online documentation has already taken go some way to questioning the idea that hypertext applications are necessarily flexible, non-hierarchical, and organized in a relative fashion. In addition, it is clear that hypertext applications are bounded and informed by specific social constraints. These might include the expectations of the users; the institutional setting in which the project is embedded; the nature of the material to be presented; and the broader academic and professional community in which the material will be assessed. Whilst the electronic presentation of this material has many advantages, the documentation - in the process of its construction - has been substantially influenced by many and varied external criteria.
References
Baker, R. P. (1992) New Technology in Survey Research: Computer Assisted Personal Interviewing (CAPI). Social Science Computer Review 10: 145-157.
Baker, R. P., Bradburn, N.M. and Johnson, R.A. (1995) Computer-Assisted Personal Interviewing: An Experimental Evaluation of Data Quality and Costs. Journal of Official Statistics 11: 413-431.
Barry, C. A. (1998) Choosing Qualitative Data Analysis Software: Atlas/ti and Nudist Compared. Sociological Research Online 3(3). <http://www.socresonline.org.uk/socresonline/3/3/4.html>
Bateson, N. and Hunter, P. (1991a) The Use of CAPI for Official British Surveys. Bulletin de methodologie sociologique 30: 16-26.
Bateson, N. and Hunter, P. (1991b) The use of Computer Assisted Personal Interviewing for official British surveys. Survey Methodology Bulletin 28: 26-33.
Bell, D. (1976) The Coming of Post-Industrial Society. Harmondsworth: Penguin.
Bethlehem, J. and Manners, T. (1998) TADEQ: A Tool for Analysing and Documenting Electronic Questionnaires. Paper presented at the 5th International BLAISE Users' Conference. Lillehammer, November 1998.
Billig, M. (1995) Banal Nationalism. London: Sage.
Bolter, J.D. (1991) Writing Space: The Computer, Hypertext, and the History of Writing. Hillsdale, NJ: Lawrence Erlbaum Associates.
Brosnan, M.J. (1998) Technophobia: The psychological impact of information technology. London: Routledge.
Buetow, S.A., Douglas, M., Harris, P. and McCulloch, C. (1996) Computer-Assisted Interviewing. Social Science Computer Review 14: 205-212.
Bush, V. (1945) As we may think. The Atlantic Monthly. July, 1945.
Cailliau, R. (2000) A Little History of the World Wide Web. W3C website. <http://www.w3.org/History.html>
Cairncross, F. (1998) The Death of Distance: How the Communications Revolution Will Change Our Lives. London: Orion Books.
Card, J.J. (2000) Development and Dissemination of an Electronic Library of Social Science Data. Social Science Computer Review 18: 82-86.
Castells, M. (1996) The Rise of the Network Society. Oxford: Blackwell.
Clark, R. and Maynard, M. (1998) Research Methodology: Using online technology for secondary analysis of survey research data - 'Act globally, think locally'. Social Science Computer Review 16: 58-71.
Couper, M.P. and Rowe, B. (1996) Evaluation of a computer-assisted self-interview component in a computer-assisted personal interview survey. Public Opinion Quarterly 60: 89-105.
Couper, M.P. and Burt, G. (1994) Interviewer Attitudes Toward Computer-Assisted Personal Interviewing (CAPI). Social Science Computer Review 12: 38-53.
Couper, M.P. and Nicholls, W.L. (1998) The History and Development of Computer Assisted Survey Information Collection Methods. In: Couper, M.P., Baker, R.P., Bethlehem, J., Clark, C.Z.F., Martin, J., Nicholls, W.L. and O'Reilly, J.M., (Eds.) Computer Assisted Survey Information Collection. New York: Wiley.
Couper, M.P., Hansen, S.E. and Sadosky, S.A. (1997) Evaluating Interviewer Use of CAPI Technology. In: Lyberg, L., Biemer, P., Collins, M., de Leeuw, E., Dippo, C., Schwarz, N. and Trewin, D., (Eds.) Survey Measurement and Process Quality. New York: Wiley.
de Leeuw, E. and Collins, M. (1997) Data Collection Methods and Survey Quality: An Overview. In: Lyberg, L., Biemer, P., Collins, M., de Leeuw, E., Dippo, C., Schwarz, N. and Trewin, D., (Eds.) Survey Measurement and Process Quality. New York: Wiley.
de Leeuw, E. and Nicholls, W. L. (1996) Technological Innovations in Data Collections: Acceptance, Data Quality and Costs. Sociological Research Online 1(4). <http://www.socresonline.org.uk/socresonline/1/4/leeuw.html>
Dielman, L. and Couper, M.P. (1995) Data Quality in a CAPI Survey: Keying Errors. Journal of Official Statistics 11: 141-146.
Edge, D. (1995) The Social Shaping of Technology. In: Heap, N., Thomas, R., Einon, G., Mason, R. and Mackay, H., (Eds.) IT and Society: A Reader. London: Sage.
Fielding, N.G. and Lee, R.M. (1998) Computer Analysis and Qualitative Research. London: Sage.
Fischer, M. (1994) Applications in computing for social anthropologists. London: Routledge.
Gilligan, C. (1982) In a Different Voice. London: Harvard University Press.
Girill, T. and Luk, C. (1992) Hierarchical search support for hypertext online documentation. International Journal of Man-Machine Studies 36: 571-585.
Golding, P. (2000) Forthcoming Features: Information and Communications Technologies and the Sociology of the Future. Sociology 34: 165-184.
Gouldner, A.W. (1980) The Two Marxisms: Contradictions and Anomalies in the Development of Theory. London: Macmillan.
Hawkins, R. (1996) Standards for Communication Technologies: Negotiating Industrial Biases in Network Design. In: Mansell, R. and Silverstone, R., (Eds.) Communication by Design: The Politics of Information and Communication Technologies. Oxford: Oxford University Press.
Hunter, L. (1999) Critiques of Knowing: Situated textualities in science, computing and the arts. London: Routledge.
Jobe, J.B. and Pratt, W.F. (1997) Effects of Interview Mode on Sensitive Questions in a Fertility Survey. In: Lyberg, L., Biemer, P., Collins, M., de Leeuw, E., Dippo, C., Schwarz, N. and Trewin, D., (Eds.) Survey Measurement and Process Quality. New York: Wiley.
Jones, S. (1995) CyberSociety: Computer-Mediated Communication and Community. London: Sage.
Kelly, M. (1999) What users want from a tool for analysing and documenting electronic questionnaires: the user requirements for the TADEQ project. Paper presented at the Association for Survey Computing 3rd International Conference. Edinburgh, September 1999.
Landow, G. P. (1992) Hypertext: The Convergence of Contemporary Critical Theory and Computing. London: The Johns Hopkins University Press.
Latour, B. and Woolgar, S. (1979) Laboratory Life: The Social Construction of Scientific Facts. London: Sage.
Mackay, H. (1995) Theorising the IT/Society Relationship. In: Heap, N., Thomas, R., Einon, G., Mason, R. and Mackay, H., (Eds.) IT and Society: A Reader. London: Sage.
Mackenzie, D. and Wajcman, J. (1985) Introductory Essay. In: Mackenzie, D. and Wajcman, J., (Eds.) The Social Shaping of Technology. Milton Keynes: Open University Press.
Manners, T. (1990) The development of Computer Assisted Interviewing (CAI) for Household Surveys: The Case of the British Labour Force Survey. Survey Methodology Bulletin 27: 1-5.
Mansell, R. and Silverstone, R. (1996) Introduction. In: Mansell, R. and Silverstone, R., (Eds.) Communication by Design: The Politics of Information and Communication Technologies. Oxford: Oxford University Press.
Martin, J. and Manners, T. (1995) Computer assisted personal interviewing in survey research. In: Lee, R.M., (Ed.) Information Technology for the Social Scientist. London: UCL Press.
Marvin, C. (1988) When Old Technologies Were New. Oxford: Oxford University Press.
Mitra, A. and Cohen, E. (1999) Analyzing the Web: Directions and Challenges. In: Jones, S., (Ed.) Doing Internet Research: Critical Issues and Methods. London: Sage.
Musciano, C. and Kennedy, B. (1998) HTML: The Definitive Guide. Sebastopol: O'Reilly.
Nicholls, W.L., Baker, R.P. and Martin, J. (1997) The effect of New Data Collection Technologies on Survey Data Quality. In: Lyberg, L., Biemer, P., Collins, M., de Leeuw, E., Dippo, C., Schwarz, N. and Trewin, D., (Eds.) Survey Measurement and Process Quality. New York: Wiley.
Orleans, M. and Walters, G.T. (1996) Human-computer enmeshment: Identity Diffusion Through Mastery. Social Science Computer Review 14: 144-156.
Piore, M. and Sabel, C. (1984) The Second Industrial Divide. New York: Basic Books.
Poynter, R. (2000) We've got five years. Paper presented at the Association for Survey Computing Conference on 'Survey Research on the Internet - the Honeymoon is Over'. London, September 2000.
Rheingold, H. (1994) The Virtual Community: Finding Connection in a Computerized World. London: Secker and Warburg.
Robins, K. and Webster, F. (1999) Times of the Technoculture: From the information society to the virtual life. London: Routledge.
Samarajiva, R. (1996) Surveillance by Design: Public Networks and the Control of Consumption. In: Mansell, R. and Silverstone, R., (Eds.) Communication by Design: The Politics of Information and Communication Technologies. Oxford: Oxford University Press.
Saris, W.E. (1991) Computer-Assisted Interviewing. Newbury Park: Sage.
Shields, R. (1996) Introduction: Virtual Spaces, Real Histories and Living Bodies. In: Shields, R., (Ed.) Cultures of Internet: Virtual Spaces, Real Histories, Living Bodies. London: Sage.
Shotton, M. A. (1989) Computer addiction?: a study of computer dependency. London: Taylor & Francis.
Silver, D. (2000) Looking Backwards, Looking Forwards: Cyberculture Studies 1990-2000. In: Gauntlett, D., (Ed.) Web.Studies: Rewiring media studies for the digital age. London: Arnold.
Slevin, J. (2000) The Internet and Society. Cambridge: Polity.
Snijkers, G.J.M.E. (1992) Computer Assisted Interviewing: Telephone or Personal? In: Westlake, A., Banks, R., Payne, C. and Orchard, T., (Eds.) Survey and Statistical Computing. Amsterdam: Elsevier.
Spender, D. (1995) Nattering on the Net. Melbourne: Spinifex Press.
Tourangeau, R. and Smith, T.W. (1996) Asking Sensitive Questions: The impact of data collection mode, question format and question content. Public Opinion Quarterly 60: 275-304.
Turkle, S. (1984) The Second Self: Computers and the Human Spirit. New York: Simon and Schuster.
Turkle, S. (1988) Computational Reticence: why women fear the intimate machine. In: Kramarae, C., (Ed.) Technology and Women's Voices: Keeping in Touch. London: Routledge.
Turkle, S. (1997) Life on the Screen: Identity in the Age of the Internet. London: Phoenix.
Walker, N. (2000) Automated Study Documentation: The Web and XML. Paper presented at the Association for Survey Computing Conference on 'Automatically Better? The impact of automation on the survey process'. London, April 2000.
Weaver, A. and Atkinson, P. (1994) Microcomputing and Qualitative Data Analysis. Aldershot: Avebury.
Webster, F. (1995) Theories of the Information Society. London: Routledge.
Weeks, M.F. (1992) Computer-Assisted Survey Information Collection: A Review of CASIC Methods and Their Implications for Survey Operations. Journal of Official Statistics 8: 445-465.
Winston, B. (1998) Media Technology and Society A History: From the Telegraph to the Internet. London: Routledge.
Wright, D.L., Aquilino, W.S. and Supple, A.J. (1998) A comparison of computer-assisted and pencil and paper self-administered questionnaires in a survey on smoking, alcohol, and drug use. Public Opinion Quarterly 62: 331-353.