Symposium: Demonstrating Risks Is Not the Same as Estimating Prevalence
It is a great pleasure to have the chance to comment on the paper by Pablo Diego-Rosell and Jacqueline Joudo Larsen. They identify strong correlations between various risk factors and the estimated prevalence of modern slavery, and they provide evidence for vulnerabilities both for countries and for individuals. They themselves recognize the policy and operational significance of these vulnerabilities: knowing them allows resources and interventions to be focused appropriately, and helps us develop a deeper understanding of this terrible crime.
My main concern is with a different issue: prevalence estimates for individual countries. Can the risk model be used for prevalence estimation in any particular country? The risk-factor model is good for explanation, but that does not mean it can be used reliably for prediction or estimation. Contrary to the authors’ assertion that the model can be used to accurately predict prevalence at the country level, the prevalence estimates for individual countries are extremely imprecise.
This imprecision is set out, partially, in the authors’ Appendix D. For the United States, for example, the estimate of total prevalence is 0.51 per cent with a standard deviation of 0.33 per cent. Section 3.4 of the paper suggests that to get a 95 per cent prediction interval you would use 0.51 per cent ± 0.66 per cent, in other words that the actual value could be anywhere between −0.15 per cent and 1.17 per cent. This translates, roughly speaking, to a number of victims between −0.5 million and 4 million. Obviously a negative prevalence is not possible, and some sort of transformation is appropriate to make the posterior distribution more symmetrical, but the principle is the same: the model cannot be used to make individual country predictions to any useful degree of accuracy. This is not surprising, nor does it detract from the value of identifying risk factors.
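The arithmetic behind these figures can be sketched as follows. This is only an illustration of the mean ± 2 standard deviations calculation described above, not the authors' own code. The function names are hypothetical, and the population figure (roughly 327 million for the United States in 2018) is an assumption that does not come from the paper.

```python
def prediction_interval(mean_pct, sd_pct, multiplier=2.0):
    """Return (lower, upper) for mean +/- multiplier * sd, all in per cent."""
    half_width = multiplier * sd_pct
    return mean_pct - half_width, mean_pct + half_width


def to_people(prevalence_pct, population):
    """Convert a prevalence in per cent into a head count."""
    return prevalence_pct / 100.0 * population


# Values quoted above from Appendix D for the United States.
MEAN_PCT = 0.51
SD_PCT = 0.33

# Assumption: approximate 2018 US population, not taken from the paper.
US_POPULATION = 327_000_000

lower_pct, upper_pct = prediction_interval(MEAN_PCT, SD_PCT)
lower_people = to_people(lower_pct, US_POPULATION)
upper_people = to_people(upper_pct, US_POPULATION)

print(f"Interval: {lower_pct:.2f} to {upper_pct:.2f} per cent")
print(f"Victims:  {lower_people / 1e6:.1f} to {upper_people / 1e6:.1f} million")
```

The negative lower bound is an artefact of treating the interval as symmetric on the raw percentage scale, which is exactly why some transformation would be needed in practice.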
A further demonstration is given by looking in detail at Figure 6 (bottom right).
Even if the single point with very large prediction and prevalence (Uganda) is left out, the remaining points have a correlation of about 0.95, so an R² value of about 0.9. But the residuals from the diagonal line (which for those points is virtually identical to the line fitted by linear regression) have a standard deviation of 0.2 per cent. So, even on the countries for which the data have been fitted, if a country has mean predicted prevalence z per cent, an approximate 95 per cent interval for the actual prevalence is (z ± 0.4) per cent. This calculation does not take account of all the details of the model, but the accuracy, or inaccuracy, is in the same ballpark as in Appendix D. Although the correlation is high, the individual predictions are very inaccurate.
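A short sketch may help show why a high correlation and imprecise individual predictions can coexist. It uses only the two figures quoted above (a correlation of about 0.95 and a residual standard deviation of 0.2 per cent). The helper name and the example predicted value of 0.5 per cent are hypothetical, and the "unexplained spread" line assumes an ordinary least-squares fit.

```python
import math

# Figures quoted above for Figure 6 (bottom right), outlier excluded.
CORRELATION = 0.95
RESIDUAL_SD_PCT = 0.2

# Share of variance explained by the fitted line.
r_squared = CORRELATION ** 2

# Under a least-squares fit, the residual spread is this fraction of the
# spread of the observed prevalences themselves.
unexplained_sd_fraction = math.sqrt(1.0 - r_squared)

# Approximate 95 per cent half-width: two residual standard deviations.
half_width_pct = 2.0 * RESIDUAL_SD_PCT


def interval_for(predicted_pct):
    """Approximate 95 per cent interval around a predicted prevalence."""
    return predicted_pct - half_width_pct, predicted_pct + half_width_pct


# Hypothetical country with a predicted prevalence of 0.5 per cent.
low, high = interval_for(0.5)

print(f"R squared: {r_squared:.2f}")
print(f"Residual spread as a share of total spread: {unexplained_sd_fraction:.2f}")
print(f"Interval for a 0.5 per cent prediction: {low:.2f} to {high:.2f} per cent")
```

Even with about 90 per cent of the variance explained, nearly a third of the original spread remains in the residuals, which is what drives the wide interval for any single country.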
It is unfortunate that the Global Slavery Index (GSI) 2018 report itself is silent on the precision of the individual estimates. It is poor statistical practice to give an estimate without also saying how accurate you think it is. I am therefore delighted that the authors’ background paper contains more information about precision.
However, there are two other matters that further increase the imprecision of the individual prevalence estimates. Firstly, none of the 48 countries surveyed is in developed Asia, Western Europe or North America. Predictions for such countries are based on extrapolation from countries in different parts of the world, and therefore are subject to an additional layer of unquantifiable uncertainty. It may well be that such developed countries have higher, or lower, levels of resilience to certain risks; we simply cannot tell from these data. Similarly, at the other end of the spectrum, certain “high-risk” countries also have specific characteristics not present in the training set.
Secondly, if I understand correctly, the surveys asked about interviewees and their immediate family, and so each interview essentially yielded information on a small group of people. It is not clear if the obvious dependencies between individuals in a family were accounted for in the model fitting, but if they were not then there is another source of imprecision.
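To give a sense of how much within-family dependence could matter, here is a minimal sketch using the standard design-effect approximation for equal-sized clusters, 1 + (m − 1)ρ, where m is the cluster size and ρ the intra-cluster correlation. Every number below is hypothetical and chosen purely for illustration; none comes from the surveys or from the authors' model, and it is not known whether or how the model already adjusts for this.

```python
import math


def design_effect(cluster_size, intra_cluster_correlation):
    """Design-effect approximation for equal-sized clusters."""
    return 1.0 + (cluster_size - 1.0) * intra_cluster_correlation


# Hypothetical values, for illustration only.
PERSONS_REPORTED_ON = 4000  # respondents plus family members covered
FAMILY_SIZE = 4             # people covered by each interview
ICC = 0.5                   # similarity of outcomes within a family

deff = design_effect(FAMILY_SIZE, ICC)

# Number of independent observations the clustered data are worth.
effective_sample_size = PERSONS_REPORTED_ON / deff

# Factor by which a naive standard error would need to be inflated.
se_inflation = math.sqrt(deff)

print(f"Design effect: {deff:.2f}")
print(f"Effective sample size: {effective_sample_size:.0f}")
print(f"Standard error inflation: {se_inflation:.2f}")
```

Under these made-up inputs, treating family members as independent would understate standard errors by more than half again; the real effect depends on the true family sizes and within-family correlation.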
The authors generously gave me the exact coordinates of the points in Figure 6, but in closing may I make a plea for a much fuller release of data and methodology. I very much hope that they will release the actual program scripts in Stata and R, and also the original data, suitably anonymized.
The basic principle behind open data and open research is that anyone should be able to reproduce the published results. This is essential to verify the research itself and would put the work on the sort of rigorous level that is nowadays standard. More to the point, it would provide a rich resource for others to build upon the foundation that the authors have given—and it would also set a welcome example for other work in this important field.
This piece has been prepared as part of the Delta 8.7 Modelling the Risk of Modern Slavery symposium.
Bernard W. Silverman is Professor of Modern Slavery Statistics, The Rights Lab, University of Nottingham, UK.
This article has been prepared by Bernard W. Silverman as a contributor to Delta 8.7. As provided for in the Terms and Conditions of Use of Delta 8.7, the opinions expressed in this article are those of the author and do not necessarily reflect those of UNU or its partners.