File under random: non-binary sex data and imputation in the 2021 census

The Scottish Government has proposed that the sex question in the 2021 census should no longer ask whether people are male or female, but instead should ask whether a person identifies as male, female or another category, based on self-declared gender identity. While the Stage One report on the bill has recommended retaining the existing binary sex question, and introducing a separate voluntary question on transgender identity, the format of the sex question is still to be decided. With this in mind, this blog looks at the data handling proposals, should an additional non-binary response option be introduced.

The proposed change to the sex question is not intended to provide published data on the number of people who identify as neither male nor female. Instead, officials have agreed that any non-binary responses would be randomly assigned to the standard male/female sex response options, in order to maintain consistency with the data in all previous censuses since 1801. In practice, this means that officials would treat any non-binary responses to the sex question as ‘non-responses’, or missing data.

As with all social surveys, non-responses in the census are not unusual. For instance, respondents may fail to answer certain questions, provide inconsistent answers that don’t make sense in relation to other responses, or tick all of the response options. For the purposes of data quality, it is nonetheless important to correct inconsistencies and, where appropriate, provide estimates for missing data. Non-responses in the census are usually dealt with using imputation, a statistical technique designed to minimise the risks associated with missing data. Imputation methods usually adjust for non-response bias; that is, they take into account the fact that non-responses are not randomly distributed across the population. In the census, this involves imputing answers from ‘donor’ records that otherwise resemble the incomplete record as closely as possible, an approach also known as nearest neighbour imputation.
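As a rough illustration of the donor idea, the Python sketch below fills a missing value from the most similar complete record. It is not the census method itself: the records, the field names and the distance rule are all hypothetical, chosen only to show the principle.

```python
# Hypothetical complete ("donor") records; fields and values are illustrative only.
donors = [
    {"id": 1, "age": 34, "household_size": 2, "occupation": "nurse", "sex": "F"},
    {"id": 2, "age": 36, "household_size": 2, "occupation": "electrician", "sex": "M"},
    {"id": 3, "age": 71, "household_size": 1, "occupation": "retired", "sex": "F"},
]

# Hypothetical incomplete record with the sex question unanswered.
incomplete = {"id": 4, "age": 35, "household_size": 2,
              "occupation": "electrician", "sex": None}


def distance(a, b):
    """Assumed similarity rule: scaled age gap plus one point per mismatched field."""
    d = abs(a["age"] - b["age"]) / 10
    d += 0 if a["occupation"] == b["occupation"] else 1
    d += 0 if a["household_size"] == b["household_size"] else 1
    return d


def impute_from_nearest_donor(record, donor_records, target="sex"):
    """Copy the target value from the donor with the smallest distance."""
    donor = min(donor_records, key=lambda d: distance(record, d))
    return {**record, target: donor[target], "imputed_from": donor["id"]}


imputed = impute_from_nearest_donor(incomplete, donors)
```

Here the incomplete record takes its value from donor 2, the record closest to it in age, household size and occupation. The point is that the imputed value reflects what is known about similar respondents.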

However, a different approach has been proposed for dealing with any non-binary sex data in the 2021 census. Instead of using donor records, officials plan to use random imputation. This is based on the assumption that the number of non-binary responses will be too small for a random approach to have any significant effect on overall accuracy, together with stakeholder wishes that sex should not be imputed based on other similar responses.
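By contrast, random imputation ignores everything else on the record. A minimal Python sketch follows, assuming that each non-binary response is assigned to male or female with equal probability; the proposals describe the assignment only as random, so the 50/50 split, the response codes and the function name are assumptions.

```python
import random

BINARY = ("M", "F")


def impute_sex_at_random(responses, rng):
    """Keep binary responses; assign any other response to M or F at random
    (equal probability is an assumption, not a documented rule)."""
    return [r if r in BINARY else rng.choice(BINARY) for r in responses]


rng = random.Random(2021)  # seeded only so the sketch is repeatable
responses = ["F", "M", "other", "F", "other", "M"]  # hypothetical responses
imputed = impute_sex_at_random(responses, rng)
```

Nothing about the respondent's age, household or other answers enters the calculation, which is what distinguishes this from the donor-based approach.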

There is nonetheless some risk in relation to data quality, simply because the results are unpredictable. While there is little data on trans and non-binary populations, in a recent large-scale UK government LGBT survey (N=108,100) trans respondents made up 13% of the sample, with non-binary respondents making up over half of all those identifying as transgender (p. 16, Figure 3.1). Across the full sample, 7% of respondents identified as non-binary. This proportion was higher among younger age groups, with 11.6% of those aged 16 or 17, and 9.4% of those aged 18 to 24, describing themselves in this way (Annex 3: Gender identity Q.1).

These differences suggest that the risk to data quality will be higher at the sub-population level, because any non-binary responses are unlikely to be randomly distributed across the population. As suggested above, the proportion of non-binary responses seems likely to be higher among younger age groups. Natal sex may also be a relevant factor, given the significantly higher proportion of referrals of natal girls to gender identity services, as may socio-economic status.
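The sub-population point can be made concrete with a little arithmetic. The Python sketch below is illustrative only: the function name and every input share are hypothetical, not figures from the census or the LGBT survey. It also assumes, purely for the sake of the example, that results are compared with a count by natal sex and that random assignment is 50/50.

```python
def expected_shift_pp(nonbinary_share, female_share_of_nonbinary, p_female=0.5):
    """Expected change, in percentage points, in a group's recorded female share
    when non-binary responses are assigned at random (probability p_female)
    rather than counted by natal sex. All inputs are hypothetical."""
    return 100 * nonbinary_share * (p_female - female_share_of_nonbinary)


# Hypothetical inputs: non-binary responses are 0.1% of the whole population
# but 2% of a younger age group, and 70% of them are natal females.
whole_population = expected_shift_pp(0.001, 0.7)
young_age_group = expected_shift_pp(0.02, 0.7)
```

With these made-up inputs, the expected shift is 20 times larger in the group where non-binary responses are 20 times more common. An effect that is negligible nationally need not be negligible for a particular age group.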

The census is a major, costly administrative exercise that provides a once-in-a-decade opportunity to get an accurate, comprehensive and consistent picture of Scotland’s population. This raises questions as to why officials have proposed a departure from established statistical techniques designed to minimise the risks associated with non-response bias, and why stakeholder wishes have been allowed to shape the approach in this way.