The use of statistical and data processing equipment in the GRO

Edward Higgs

The analysis of the data collected in the census-taking process in the nineteenth and early twentieth centuries fell into two stages. First the information gathered in the census enumerators' books had to be placed into various categories — so many people aged 21–30, so many people employed as innkeepers, so many people residing in London who had been born in Lancashire, and so on. Secondly, there were various computations necessary to sum these figures, to create percentages for the Census Reports, to create various ratios, and so on.

The first process, data abstraction, was the main clerical task undertaken by the census officials in London. There is little information on how John Rickman, who was responsible for the censuses of 1801 to 1831, went about this work. Since the schedules he sent out to overseers of the poor, clergymen, and schoolmasters in Scotland asked them to place the numbers of people in their parishes in various categories, the census data were already categorised when they reached him (Higgs, 1989, 5–7, 114–19). Presumably all he needed to do was to sum up the results.

However, from 1841 onwards details about the characteristics of every person in Britain were collected, and this vast collection of individual items of information had to be categorised in the Census Office run by the General Register Office (GRO). Until 1911 all the tables in the Census Reports were created using tabling sheets and the 'ticking' method. In the case of occupational abstraction, for example, the tabling sheets were large pieces of paper with occupational headings down one side and age ranges across the top. These headings were ruled across the sheet, creating boxes into which the census clerks put a tick for an occurrence in the census enumerators' returns of a person of the relevant age and occupation. The ticks in the columns were then added up, and the results placed in another series of columns on another sheet, giving the numbers of people under particular occupational headings within particular age groups. Sheets were created in this manner for each registration sub-district. In order to create tables by registration districts, the sheets for sub-districts had to be folded at the column to be totalled and then lined up so that they overlapped, and the figures were then read off on to district sheets. Figures were also transferred from district to county sheets in a similar manner. This was all very cumbersome, and was one of the reasons why the Victorian GRO was so reluctant to increase the scope of the census questions (Higgs, 1996a, 155–6, 206–8).
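The ticking method described above amounts to a cross-tabulation followed by a roll-up of totals. The following is a minimal sketch of that logic in Python, not a reconstruction of the GRO's actual forms: the age bands, the occupations, the sample records and the function names are all invented for illustration.

```python
from collections import Counter

# Hypothetical age bands; the real tabling sheets used their own ranges.
AGE_BANDS = [(0, 20), (21, 30), (31, 40), (41, 120)]


def age_band(age):
    """Return the label of the age band that contains `age`."""
    for low, high in AGE_BANDS:
        if low <= age <= high:
            return f"{low}-{high}"
    raise ValueError(f"age out of range: {age}")


def tick_sheet(records):
    """Build one tabling sheet: one 'tick' per person in the matching box."""
    sheet = Counter()
    for occupation, age in records:
        sheet[(occupation, age_band(age))] += 1
    return sheet


def combine_sheets(sheets):
    """Sum box totals across sheets, as when sub-district figures
    were read off on to a district sheet."""
    total = Counter()
    for sheet in sheets:
        total.update(sheet)  # Counter.update adds counts together
    return total


# Invented returns for two sub-districts of one registration district.
sub_district_a = tick_sheet([("innkeeper", 25), ("innkeeper", 34), ("weaver", 22)])
sub_district_b = tick_sheet([("innkeeper", 29), ("weaver", 45)])
district = combine_sheets([sub_district_a, sub_district_b])
```

In the sketch a new table, such as occupation by birthplace, would require every record to be ticked again on a fresh sheet, which is the constraint the manual system imposed on the clerks.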

This manual technology was coming under increased strain in the early years of the twentieth century, as new questions were added to the census. In 1911 the GRO was anxious to collect new information about marital fertility (Census of England and Wales, 1911. Vol. XIII. Fertility of marriage. Part I; Census of England and Wales, 1911. Vol. XIII. Fertility of marriage. Part II; Szreter, 1996). In order to analyse the fertility data collected in the 1911 census, and that gathered by other new census enquiries, the Office introduced Hollerith machine tabulators. These had been developed by Herman Hollerith in the 1880s for use in the US census of 1890, and were being introduced across Europe. Their spread was a consequence of the increasing size and complexity of national census enumerations across the developed world in a period of widening state intervention in society (Higgs, 1996b).

Machine tabulation broke data analysis down into two stages. First, information on individuals was punched on cards; secondly, the information on the cards was read electrically. Pads with spring-loaded pins were brought down on individual cards, and if the pins passed through a punched hole they completed a circuit through which electricity passed to move the dial of a counter. This separated data capture from data analysis, since the cards could be analysed in differing ways, and as many times as required. At a stroke the bottlenecks in the GRO's manual system of data processing were removed, opening up whole new possibilities for statistical manipulation. The invention of the 'database' thus preceded that of the electronic computer by more than half a century (Higgs, 2004, 156–78).
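The separation of data capture from data analysis can be illustrated with a short sketch in Python. It is a simplified model under stated assumptions, not a description of the Hollerith machines' actual card layout or wiring: the hole positions, attribute names and sample deck are hypothetical, and counting a combination of holes is represented here by a simple subset test.

```python
# Hypothetical hole positions; a real card layout would differ.
HOLES = {"male": 0, "female": 1, "married": 2, "age_21_30": 3}


def punch_card(*attributes):
    """Data capture: represent one person as the set of punched positions."""
    return frozenset(HOLES[a] for a in attributes)


def run_deck(deck, wired):
    """Data analysis: pass every card under the pins and advance the
    counter whenever all the wired positions find a hole."""
    required = {HOLES[a] for a in wired}
    counter = 0
    for card in deck:
        if required <= card:  # every wired pin passed through a hole
            counter += 1
    return counter


# An invented deck of three cards, punched once.
deck = [
    punch_card("female", "married", "age_21_30"),
    punch_card("male", "married"),
    punch_card("female", "age_21_30"),
]

# The same deck is read twice with different 'wiring'.
married = run_deck(deck, ["married"])
married_women_21_30 = run_deck(deck, ["female", "married", "age_21_30"])
```

The deck is built once and then passed through `run_deck` as often as required, whereas under the ticking method each new table meant returning to the enumerators' books.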

The general reluctance of the GRO to expand its data-processing operations in the Victorian period stands in marked contrast to its willingness to experiment with new forms of computational technology. It did not try to find new ways to reduce the mass of individual survey returns to quantitative results, but it did attempt to find ways of manipulating those results more easily. For example, as early as 1858 the GRO had purchased one of Georg Scheutz's versions of Charles Babbage's difference engine. The machine's capacity to replace both human 'computers' and compositors, and thus produce reliable tables, was seen as its most important advantage. In practice the machine proved less than reliable, although it was still used in the Office for computation for the next 20 years (Higgs, 2003, 222–4).

Perhaps the real innovation in the GRO's computational arrangements came with the introduction of commercial mechanical calculators. The first commercially manufactured calculating machine, the 'arithmometer', had been designed by Thomas de Colmar in 1820. However, the GRO's purchase of an arithmometer in 1870 for £20 was an extremely early example of the practical application of this technology in Britain. William Farr, the GRO's Superintendent of Statistics, informed the Treasury that the machine had doubled the number of calculations a clerk could make in a set time, whilst enhancing their accuracy. In the course of the next 40 years the GRO purchased a number of other such machines. In the 1890s the GRO also began to employ various forms of slide rule, which the Office found both more useful and cheaper than arithmometers (Higgs, 2003, 224–5).

This technology helped the GRO to analyse ever more complex and voluminous information in the census returns, with only a modest increase in resources.

REFERENCES

Census of England and Wales, 1911, Vol. XIII. Fertility of marriage. Part I. BPP 1917–18 XXXV.

Census of England and Wales, 1911, Vol. XIII. Fertility of marriage. Part II (London: HMSO, 1923).

Edward Higgs, Making sense of the census. The manuscript returns for England and Wales, 1801–1901 (London, 1989).

Edward Higgs, A clearer sense of the census: the Victorian census and historical research (London, 1996a).

Edward Higgs, 'The statistical Big Bang of 1911: ideology, technological innovation and the production of medical statistics', Social History of Medicine, 9 (1996b), 409–26.

Edward Higgs, 'The General Register Office and the tabulation of data, 1837–1939' in Martin Campbell-Kelly, Mary Croarken, John Fauvel and Raymond Flood, eds, From Sumer to the spreadsheets: the curious history of tables (Oxford, 2003), 209–34.

Edward Higgs, Life, death and statistics: civil registration, censuses and the work of the General Register Office, 1837–1952 (Hatfield, 2004).

Simon Szreter, Fertility, class and gender in Britain 1860–1940 (Cambridge, 1996).