2020•08•05 Kuala Lumpur
by Aisling Murray (academic intern)
This post is based on the UNDP Global Centre for Technology, Innovation and Sustainable Development Singapore’s Smart City Training Series webinar (watch here) by Claudia Lopes and Shumin Liu.
In the context of increasing urbanisation and digital connectivity, and considering the 2030 Agenda’s call for a data revolution for sustainable development, discussions of how we can harness Big Data to offer insights and solutions to complex and interlinked global problems are more relevant than ever.
At a time when Big Data-driven practices have come to the forefront of disease surveillance and case tracking efforts, the COVID-19 pandemic provided an apt backdrop for the UNDP Global Centre for Technology, Innovation and Sustainable Development’s webinar, ‘Introduction to Big Data for Smart Cities’. The panel of experts, including United Nations University – International Institute for Global Health’s Research Fellow, Dr Claudia Lopes, UNDP Global Centre Singapore’s Smart Cities and Digitalisation Advisor, Calum Handforth, and UNDP Bangkok Regional Hub’s Data Impact and Management Consultant, Shumin Liu, convened to share insights and experiences in leading Big Data-driven projects to improve lives and livelihoods in the urban environment.
The webinar drew attention to critical considerations of data quality, and how the creators, purposes, accessibility and insights of Big Data may serve to perpetuate pre-existing inequalities, especially in the context of gender. In striving to use Big Data to monitor and accelerate progress towards the SDGs, the panellists reflected on the importance of data collaboratives, transparent algorithms and open-source data in ensuring that women or minority groups are not left out of Big Data-informed policies and decision-making, and therefore are not ‘left behind’ on the path to sustainable development.
Dr Lopes defined Big Data as “data sources that require data science and AI tools or methods to capture, curate, manage, and process data in an efficient way”. It is a by-product of digital behaviour, produced organically as users interact with systems and machines. As Mr Handforth highlighted, in the urban and smart city context it can take many forms: social media networks, crowdsourcing, SMS and mobile data, citizen-reporting platforms and news sources. Dr Lopes pointed to the value of Big Data in bringing together many different forms of data, often incorporating traditional data sources, to produce insights that can inform decision-making for organisations, businesses and governments.
However, the panellists each addressed the critical question of access: with most datasets held by the private sector or governments, there is a need to consider which communities are involved in the generation, governance, and use of data (citizens, governments, data activists, privacy experts), and what impact this may have on improving people’s lives. As Dr Lopes asked rhetorically, “who has built the systems, who is capturing the data, who benefits from the data insights [and] who can potentially be harmed by these data insights?”
Big Data boasts several benefits over traditional data sources: it is large in size, allowing regional and time trends to be observed; it can capture real-time data and unexpected events; and it is less reactive than traditional research data, in that people’s behaviour is observed in natural settings. However, Big Data can fall short in that it is both influenced by and serves to influence pre-existing gender inequality when women are not represented in each stage of the data management process: platform creation, data collection, and data analysis. This can result in data that is incomplete and not representative of whole populations. When algorithms are trained on biased data, the results will apply only to a segment of the population. Often, that segment consists of groups with more access to digital platforms, with better skills to provide or analyse data, and with more power to voice their opinions without fear of retaliation (on social media, for example).
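To make the idea of non-representative data concrete, the following is a minimal sketch, not a method presented in the webinar, of how one might compare each group's share of a dataset with its share of the population. The function name, the counts, the population shares and the 0.8 cut-off are all hypothetical and chosen only for illustration.

```python
def representation_ratio(sample_counts, population_shares):
    """Return each group's share of the sample divided by its population share.

    A ratio of 1.0 means the group appears in the data in proportion to the
    population; a ratio below 1.0 means the group is under-represented.
    """
    total = sum(sample_counts.values())
    if total == 0:
        raise ValueError("sample_counts must contain at least one record")
    return {
        group: (sample_counts.get(group, 0) / total) / share
        for group, share in population_shares.items()
    }


# Hypothetical numbers, for illustration only (not from the webinar).
sample_counts = {"women": 300, "men": 700}       # records in the dataset
population_shares = {"women": 0.5, "men": 0.5}   # shares in the population

ratios = representation_ratio(sample_counts, population_shares)

# Flag groups below an arbitrary, assumed threshold of 0.8.
under_represented = [g for g, r in ratios.items() if r < 0.8]
print(ratios, under_represented)
```

A check like this only reveals gaps along the dimensions that were recorded in the first place; it says nothing about who was excluded from the platform that generated the data.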
Dr Lopes highlighted this lack of neutrality among the actors and institutions involved in curating and producing Big Data; algorithms, therefore, have the potential to reproduce societal biases and discrimination, whether that be sexism, racism, classism, or discrimination in any form (D’Ignazio and Klein, 2020). Using the example of Google’s speech recognition, Dr Lopes reported on evidence that this function is 13% more accurate for male than for female voices (Tatman, 2017). Similarly, computer software for facial recognition has higher rates of error for women, and women of colour in particular, compared to men. “If we are using data that is already biased, we are training algorithms that are already biased”, and in turn, the conclusions drawn from the data to inform policies may a) exclude groups not represented in the data, and b) serve to amplify these biases linked to pre-existing inequalities.
Gender, Dr Lopes therefore pointed out, must be taken into consideration in the design and implementation of user platforms, data analysis and data impact. Within the urban environment, human-sourced social media data is often a key tool for obtaining opinions and cultural beliefs of certain groups. However, many women face restricted access to these platforms, especially in low- and lower-middle income settings; the gender gap in internet access is as high as 70% in parts of South Asia (Sey and Hafkin, 2019). Even if women have access to digital platforms, they may have to use them in a restricted way due to risks of both real-world and online harassment and violence. With this risk of only certain women being represented in data outcomes, Dr Lopes drew attention to the need to consider data quality, accuracy, and validity before pursuing Big Data to inform policy-making.
Guided by the ethical principles of Big Data, the panellists unanimously agreed that high-quality, open-source data curated in an inclusive and collaborative way is key to ensuring transparency and accountability, and to ensuring that women are not excluded from the use, results, analysis and conclusions of Big Data.
Data transparency is at the heart of creating this inclusive environment; for example, algorithms should be classified in terms of their fairness based on data accuracy for certain demographics. Responding to a question on data privatisation, Ms Liu asserted that open-source data is “the culture we should embrace”, in order to ensure that the producers of data (i.e., the public) can benefit from its results and hold governments accountable. Dr Lopes echoed this sentiment – “knowledge should be more widely available” – citing algorithm transparency as a vital first step to breaking the cycle of harmful biased algorithms perpetuating gender inequality. In understanding the risks of biased data and algorithms, people will be more motivated to engage with the technical aspects of data analysis, Dr Lopes noted.
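As a rough illustration of what classifying an algorithm by its accuracy for different demographics could involve, the sketch below computes accuracy separately for each group and reports the gap between the best- and worst-served groups. It is an assumption-laden example rather than anything shown by the panellists: the helper names and the records are made up, and real fairness audits typically look at more than accuracy alone.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Compute accuracy per demographic group.

    `records` is an iterable of (group, true_label, predicted_label) tuples.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, truth, predicted in records:
        total[group] += 1
        if truth == predicted:
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}


def accuracy_gap(per_group):
    """Difference between the highest and lowest per-group accuracy."""
    return max(per_group.values()) - min(per_group.values())


# Made-up records, for illustration only: (group, true label, predicted label).
records = [
    ("women", 1, 1), ("women", 0, 1), ("women", 1, 0), ("women", 0, 0),
    ("men", 1, 1), ("men", 0, 0), ("men", 1, 1), ("men", 0, 1),
]

per_group = accuracy_by_group(records)
gap = accuracy_gap(per_group)
print(per_group, gap)
```

Publishing per-group figures like these alongside an overall accuracy score is one simple way to make an algorithm's uneven performance visible to the people affected by it.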
In keeping with the panellists’ focus on the power of collaboration and inclusivity to ensure sustainable and gender-responsive use of Big Data, their final takeaways shared a spirit of cooperation. Dr Lopes’s final note that “data can perpetuate inequalities” provided the critical antecedent to Mr Handforth’s assertion that “conversations about Big Data need to be as open as possible”, as only through understanding the former can we ensure the latter. The COVID-19 pandemic continues to demonstrate the power of Big Data in producing publicly available, real-time data on an ever-changing situation. Conversations like this are therefore a critical reminder of the need to question what such data can and cannot tell us, and how we can become more gender-responsive in our approach to this fast-evolving source of knowledge. Data transparency and collaborative action are fundamental in order to deconstruct inherent gender biases in Big Data; foster a data revolution that dismantles, not perpetuates, gender inequality; and ‘leave no one behind’ as we accelerate progress towards sustainable development.
D’Ignazio, C. and Klein, L. (2020). Data feminism. Cambridge, MA: MIT Press.
Grother, P., Ngan, M. and Hanaoka, K. (2019). Face Recognition Vendor Test (FRVT) Part 3: Demographic Effects. US National Institute of Standards and Technology.
Sey, A. and Hafkin, N. (2019). Taking stock: Data and evidence on gender equality in digital access, skills and leadership. United Nations University Institute on Computing and Society/International Telecommunications Union.
Tatman, R. (2017). Proceedings of the First Workshop on Ethics in Natural Language Processing, pages 53–59. Valencia, Spain, 4 April 2017. Association for Computational Linguistics.