Code 8.7: Mining Government Data to Reach Target 8.7
Code 8.7 convened world leaders from the anti-slavery movement, technology and academia to determine the best methods of harnessing artificial intelligence, computational science, satellite imaging, machine learning and more in the fight against modern slavery. A critical component of this effort is optimizing the insights available from government data for law enforcement intervention, predictive vulnerability assessments and crime mapping to improve resource allocation.
The Mining Government Data session included speakers Luis Fabiano de Assis from SmartLab Brazil, Clare Gollop of the UK National Policing Modern Slavery Portfolio and Julia Kocis of the Lehigh County Regional Intelligence Center. These speakers presented their efforts to optimize government data to measure prevalence and model vulnerability to modern slavery from multiple sources including law enforcement, prosecution, social welfare, unemployment and regional socioeconomic vulnerability factors.
This session focused on four major areas of discussion:
- Structured and unstructured, temporal and spatial data and their coordination;
- Implementation of network analysis and graph theory;
- Communication to policymakers; and
- Analysis coordination.
In terms of the types of data required for these efforts, we addressed the challenges of integrating structured and unstructured data. Unstructured data includes free-form and narrative information from law enforcement and police reports that provides key contextual information for investigations but is resource- and time-intensive to mine by hand. Similarly, the temporal and spatial dimensions of data must be addressed when integrating them into existing or new models. Coordinating and sharing data across multiple disparate law enforcement entities, in some cases as many as 40 local law enforcement agencies plus state and federal groups, requires rigorous organization and concerted effort.
Anjali Mazumder (The Alan Turing Institute) and Clare Gollop (UK National Policing Modern Slavery Portfolio).
Some presenters spoke directly to their use of network analysis and graph theory to optimize the analysis of the data they have obtained, and to facilitate communication to policymakers. Each of the presenters and our audience responded to concerns about communicating these often complex findings accurately and succinctly to policy and other stakeholders. Some of our panelists recommended embedding graphs and data in stories and broader narratives to contextualize findings for policymakers. Finally, coordination of multi-tiered analytical teams and their findings was identified as a critical next step to ensure that data analyses are operationalized. Without a coordinated strategy for analytical teams and the integration of data findings, the effort made to conduct these analyses could be defeated from the outset.
In the truly collaborative nature of these sessions, the audience and panelists together developed eight major takeaways from the session: collect data for a purpose; apply boundaries to network analysis; answer a few questions well; even a basic or rudimentary model is better than nothing; create space for feedback loops; do not limit efforts to a coalition of the willing; avoid implicit bias in AI models; and integrate survivor perspectives.
Purposeful data collection based on what you need, rather than just what may already be available, is critical for meaningful analysis. Clare Gollop shared this advice from her time developing multi-tiered analytical teams. The data already being collected through national policing reports were not sufficient to answer the questions they had, and so specifically defining their own questions had a substantial impact on their ability to meaningfully analyse this issue.
Network analysis needs boundaries. It is important to establish limits on what information will and will not be considered in an analysis. This is particularly important for network analyses of law enforcement data, where a modern slavery case offers a near-endless supply of relevant connections. Julia Kocis found that law enforcement case records contain so much detail that, without established boundaries, researchers could follow relevant connections to an operation almost indefinitely.
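One way to picture such a boundary is as a hop limit on a graph traversal: instead of following every connection in the case records, the analyst fixes a maximum number of hops from the entity of interest. The sketch below is purely illustrative; the toy graph, entity names and two-hop limit are invented for this example and do not reflect any real case data.

```python
from collections import deque

def bounded_network(graph, start, max_hops):
    """Return every node reachable from `start` within `max_hops` edges.

    The hard hop limit acts as the 'boundary' on the analysis:
    connections beyond it are deliberately left out of scope.
    """
    seen = {start: 0}  # node -> distance from start
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # boundary reached: do not expand further
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen[neighbour] = seen[node] + 1
                queue.append(neighbour)
    return set(seen)

# Hypothetical case graph: entities linked by shared phones, addresses, etc.
case_graph = {
    "suspect_A": ["phone_1", "address_1"],
    "phone_1": ["suspect_A", "suspect_B"],
    "suspect_B": ["phone_1", "vehicle_1"],
    "vehicle_1": ["suspect_B", "suspect_C"],
    "address_1": ["suspect_A"],
    "suspect_C": ["vehicle_1"],
}

within_two_hops = bounded_network(case_graph, "suspect_A", max_hops=2)
```

With a two-hop boundary, `vehicle_1` and `suspect_C` fall outside the analysis even though they are connected to the operation, which is exactly the trade-off a boundary makes explicit.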
Our session emphasized that we need to invest in answering a small number of specific questions really well. All presenters acknowledged the challenges they faced in prioritizing these research questions in the context of high-pressure and immediate-delivery work environments. Luis Fabiano de Assis ensured that his work in Brazil focused on key socioeconomic indicators to identify at-risk municipalities for modern slavery, and Clare Gollop clearly defined the questions she was analysing prior to commencing her research project.
Often, we found that even a basic or rudimentary model is preferable to allowing law enforcement interventions and resource allocations to be directed by anecdotal evidence, political alliances, priorities or rhetoric. Even using broad risk and vulnerability factors to assess optimal regions and municipalities for intervention, risk prediction and resource allocation is an ideal first step. Luis Fabiano de Assis explained that his models are appropriate and necessary first steps to ensure that resource allocation decisions and law enforcement interventions are based on some relatively objective sociological data as opposed to intuition or unfounded assumptions.
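A rudimentary vulnerability model of this kind can be as simple as a weighted sum of normalized socioeconomic indicators, used to rank regions for attention. The indicators, weights and municipality figures below are invented purely to show the shape of such a model; they are not drawn from SmartLab's actual methodology.

```python
# Hypothetical indicator weights (higher weight = stronger assumed link
# to vulnerability); illustrative only, not empirically derived.
WEIGHTS = {
    "unemployment_rate": 0.40,
    "informal_labour_share": 0.35,
    "school_dropout_rate": 0.25,
}

def risk_score(indicators):
    """Weighted sum of indicators, each expressed on a 0-1 scale."""
    return sum(WEIGHTS[name] * value for name, value in indicators.items())

# Invented figures for three hypothetical municipalities.
municipalities = {
    "municipality_A": {"unemployment_rate": 0.12,
                       "informal_labour_share": 0.45,
                       "school_dropout_rate": 0.08},
    "municipality_B": {"unemployment_rate": 0.22,
                       "informal_labour_share": 0.60,
                       "school_dropout_rate": 0.15},
    "municipality_C": {"unemployment_rate": 0.05,
                       "informal_labour_share": 0.20,
                       "school_dropout_rate": 0.03},
}

# Rank municipalities from highest to lowest estimated risk.
ranked = sorted(municipalities,
                key=lambda m: risk_score(municipalities[m]),
                reverse=True)
```

Crude as it is, a transparent scoring rule like this gives decision-makers something objective to interrogate and refine, which anecdote and rhetoric do not.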
Our audience emphasized the importance of incorporating feedback loops into predictive models to ensure that they become as useful and accurate as possible. As each of these models are operationalized and tested, the audience encouraged the panelists to formalize feedback mechanisms whereby the tools themselves could be improved by findings on the ground.
Some of our panelists also discussed the challenge of extending efforts beyond “coalitions of the willing”. This describes a situation where we may have stakeholders gathered that wish to be involved in these efforts, but these stakeholders may not necessarily be the only ones you require for effective intervention. It is important to cultivate all willing participants, but also to put significant effort into bringing critical actors to the table, even if they are resistant.
When developing machine learning algorithms or beginning to implement AI in models, it is important not to build in implicit bias, for example by hand-selecting training or test cases. All models are only as good as the data and assumptions underlying their development. We must be aware of our own biases to avoid perpetuating cycles of misinformation and harmful or inaccurate bias, which could defeat the utility of these more advanced tools.
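One concrete guard against hand-selection bias is to let a seeded random split, rather than the analyst, decide which cases go into training and which into testing. A generic sketch, with placeholder case identifiers:

```python
import random

def random_split(cases, test_fraction=0.2, seed=42):
    """Randomly partition cases into training and test sets.

    Shuffling with a fixed seed keeps the split reproducible while
    removing the analyst's discretion over which cases land where,
    one simple safeguard against hand-selection bias.
    """
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

# Placeholder case records standing in for real law enforcement data.
cases = [f"case_{i:03d}" for i in range(100)]
train, test = random_split(cases)
```

This does not remove bias already present in the underlying data, which is why awareness of how the data were collected remains essential, but it does stop one avoidable source of it.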
Finally, involving survivor perspectives in the analyses of case data is critical to fully understanding the context of these situations. Survivors provide important insight into testable hypotheses and can contextualize the findings and component data that comprise these models.
Dr Davina Durgana is a Senior Statistician at the Walk Free Foundation.
This article has been prepared by Davina Durgana as a contributor to Delta 8.7. As provided for in the Terms and Conditions of Use of Delta 8.7, the opinions expressed in this article are those of the author and do not necessarily reflect those of UNU or its partners.