The Facebook Mess
At a time when almost every person has a smartphone and a social media account to connect with people who are physically distant, Facebook has been the go-to platform for many new and established internet users. As of the second quarter of 2018, Facebook had 2.23 billion monthly active users out of the 3.58 billion total internet users around the globe. This simple statistic shows how far and wide Facebook reaches, and it makes any data leak from the data giant all the more alarming.
In March 2018, The Guardian reported on a massive data leak involving Cambridge Analytica, first estimated to have affected 50 million users and later revised to 87 million. A little background on Cambridge Analytica shows how problematic this is. The company, based in Britain, was a political consulting firm that combined data mining, data brokerage, and data analysis with strategic communication during electoral processes. It was started in 2013 as an offshoot of the SCL Group and closed operations in 2018, although related firms continued in existence. Christopher Wylie, a former Cambridge Analytica employee, told the Observer that the company, owned by the hedge fund billionaire Robert Mercer and headed at the time by Trump’s key adviser Steve Bannon, used personal information taken without authorisation in early 2014 to build a system that could profile individual US voters in order to target them with personalised political advertisements, and spent millions exploiting that data without explicit consent.
In the case of Cambridge Analytica, the company was able to harvest personally identifiable information through a personality quiz app called thisisyourdigitallife, based on the OCEAN personality model. Information gathered via this app is useful in building a “psychographic” profile of users (the OCEAN acronym stands for openness, conscientiousness, extraversion, agreeableness, and neuroticism). Adding the app to a Facebook account to take the quiz gave the creator of the app access to the profile information and user history of the person taking the quiz, as well as of all the friends that person had on Facebook. This data included all of the items that users and their friends had liked on Facebook.
Researchers associated with Cambridge University claimed in a paper that Facebook likes “can be used to automatically and accurately predict a range of highly sensitive personal attributes including: sexual orientation, ethnicity, religious and political views, personality traits, intelligence, happiness, use of addictive substances, parental separation, age, and gender …”. The model they developed uses a combination of dimensionality reduction and logistic/linear regression to infer this information about users.
The model, according to the researchers, is effective because of the relationship between likes and a given attribute. However, most likes are not explicitly indicative of the attributes they help predict. Although no single like is conclusive, data of this kind is a tool whose potential, especially in the hands of people who wish to sway your opinion, is unmatched even in today’s digital age.
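The combination of dimensionality reduction and logistic regression described above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the researchers' actual model: the function names, the made-up "likes" matrix, the use of a truncated SVD for the reduction step, and every parameter value are assumptions chosen for the example.

```python
import numpy as np


def make_synthetic_likes(n_users=400, n_pages=60, seed=0):
    """Build a hypothetical user-by-page like matrix and a binary attribute."""
    rng = np.random.default_rng(seed)
    attribute = rng.integers(0, 2, size=n_users)  # hypothetical binary trait
    probs = np.full((n_users, n_pages), 0.1)      # baseline chance of a like
    # Assumption: users with the trait like the first 10 pages more often.
    probs[attribute == 1, :10] = 0.4
    likes = (rng.random((n_users, n_pages)) < probs).astype(float)
    return likes, attribute


def fit_svd(likes, k):
    """Dimensionality reduction: keep the top-k right singular vectors."""
    mean = likes.mean(axis=0)
    _, _, vt = np.linalg.svd(likes - mean, full_matrices=False)
    return mean, vt[:k]


def project(likes, mean, components):
    """Map each user's likes onto the k retained components."""
    return (likes - mean) @ components.T


def add_intercept(x):
    return np.hstack([np.ones((x.shape[0], 1)), x])


def fit_logistic(x, y, lr=0.5, steps=2000):
    """Logistic regression fitted by plain gradient descent."""
    xb = add_intercept(x)
    w = np.zeros(xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(xb @ w)))
        w -= lr * xb.T @ (p - y) / len(y)
    return w


def predict(x, w):
    """Return 1 where the predicted probability is at least 0.5."""
    p = 1.0 / (1.0 + np.exp(-(add_intercept(x) @ w)))
    return (p >= 0.5).astype(int)


if __name__ == "__main__":
    likes, attribute = make_synthetic_likes()
    train, test = slice(0, 300), slice(300, 400)
    mean, components = fit_svd(likes[train], k=5)
    w = fit_logistic(project(likes[train], mean, components), attribute[train])
    guesses = predict(project(likes[test], mean, components), attribute[test].shape and w)
    print("held-out accuracy:", (guesses == attribute[test]).mean())
```

Even in this toy setting no single page reveals the attribute, yet the reduced like profile as a whole does, which is the point the researchers make about likes in aggregate.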
The Ethics in Question
Ethics refers to standards of right and wrong that prescribe what we ought to do, typically guided by duties, rights, costs and benefits. In research ethics, these standards govern the relationships among researchers, participants, and the public. Many guides exist, such as the ESRC’s 2016 Framework for Research Ethics. There are also more general codes, such as the 1978 Belmont Report, which identifies the core principles of respect for persons, beneficence and justice in human subjects research, and the broader European Convention on Human Rights, or ECHR, which entered into force in 1953.
What are the principal ethical issues in social research with big data?
Privacy is recognised as a human right under numerous declarations and treaties, and most recently in the Puttaswamy case in India. The privacy of research subjects can be protected by a combination of approaches: limiting what data are collected; altering data to be less disclosing of private information; and regulating access to data. But big data can challenge these existing procedures:
- The definitions of “private” and “privacy” are ambiguous or contested in many big data research contexts.
- Are social media spaces public or private? Some, such as Twitter, seem more public by default, whereas Facebook is more private.
- Many users believe, and act as if, the setting is more private than it is, at least as specified in the user agreements of many social media platforms. Is compliance with formal agreements sufficient in such cases?
- Some approaches to ethical research depend on being able to unambiguously distinguish public and private users or usages. However, data costs and analytical complexity are driving closer collaborations between public and private organisations, blurring these distinctions.
- There is debate as to whether data science should be classified as human subjects research at all, and hence exempted from concerns—such as privacy—that are grounded in human rights.
The ethical issue of consent arises because, in big data analytics, very little may be known about the intended future uses of data when it is collected. With such uncertainty, neither benefits nor risks can be meaningfully understood. Thus, it is unlikely that one-off consent obtained at the point of data collection would meet a strict definition of “informed consent”. For example, procedures exist for “broad” and “generic” consent to share genomic data, but they are criticised on the grounds that such consent cannot be meaningful in light of the risks of unknown future genetic technologies.
- Obtaining informed consent may be impossible or prohibitively costly due to factors such as scale, or the inability to privately contact data subjects.
- The validity of consent obtained by agreement to terms and conditions is debatable, especially when agreement is mandatory to access a service.
Emerging Issues in Research Ethics
Alternatives to individual informed consent are being tested, e.g., “social consent”, whereby sufficient protections are in place to ethically permit data use without individual informed consent.
- There is growing recognition of the need to respect the source and provenance of data—and more broadly its “contextual integrity”—when deciding what, if any, reuse is permissible.
- Most research ethics are based on the assumption that the entity at risk is an individual, hence de-identification offers protection. If harms can be inflicted, for example, denial of health care, based on group membership with no need for individual identification, then the protection of de-identification is no longer adequate.
- If it is no longer possible to neatly divide public and private, then some suggest assessing data use based on outcomes, and permitting uses with “public benefit” or in the “public interest”.
However, definitions are often vague, and such benefits accrue long after the decision about data use has been made. How can data users be held accountable for delivering the promised public well-being? Is there a method to hold companies to rules that ensure ethical use of the data we ourselves surrendered? These are questions that neither scholars nor informed citizens are able to answer at the moment. There are, however, a few tools that do take your data and privacy seriously, such as the DuckDuckGo search engine, which says it does not track you or sell your data to advertisers, unlike Google and Facebook, or the new 1.1.1.1 DNS resolver by Cloudflare, which is meant to let its users browse the internet with less worry about being tracked or having their information sold to the highest bidder. Unfortunately, we still have a long way to go in raising awareness among our own peers of the value of the data they so freely give up for cat memes and GIFs on Instagram or Facebook.
In the end, I’d like to leave you with something to ponder.
Is your identity the price you pay for the ‘free’ services and connections you get from social networking sites? You already know the answer.