Data mining is the process of analysing large amounts of data to identify patterns and trends that can be used to make predictions or decisions. Machine learning algorithms are typically used to process the vast quantities of information gathered, providing intelligence for a wide range of industries including marketing, healthcare and finance. As the field continues to grow, the practice raises serious ethical concerns about the invasion of users’ privacy, biased outcomes and a lack of data transparency.
Many individuals are unaware that their data and behaviours are being recorded, and some online platforms fail to clearly inform users or gain their consent. Even when user data is supposedly anonymised, individuals can often be re-identified by cross-referencing patterns across data sets, exposing sensitive information.
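The cross-referencing risk can be sketched in a few lines. This is a minimal, illustrative linkage attack using entirely fabricated records and made-up field names: an "anonymised" data set that still contains quasi-identifiers (postcode, birth year, gender) is joined against a public data set that includes names.

```python
# Hypothetical "anonymised" records: no names, but quasi-identifiers remain.
anonymised_records = [
    {"postcode": "OX1 2JD", "birth_year": 1985, "gender": "F", "diagnosis": "asthma"},
    {"postcode": "SW1A 1AA", "birth_year": 1990, "gender": "M", "diagnosis": "diabetes"},
]

# Hypothetical public data set (e.g. an electoral roll) that includes names.
public_records = [
    {"name": "Jane Example", "postcode": "OX1 2JD", "birth_year": 1985, "gender": "F"},
    {"name": "John Sample", "postcode": "SW1A 1AA", "birth_year": 1990, "gender": "M"},
]

def reidentify(anon, public):
    """Match records on shared quasi-identifiers, linking names to sensitive data."""
    matches = []
    for a in anon:
        for p in public:
            if (a["postcode"], a["birth_year"], a["gender"]) == \
               (p["postcode"], p["birth_year"], p["gender"]):
                matches.append({"name": p["name"], "diagnosis": a["diagnosis"]})
    return matches

# Each match attaches a real name to a supposedly anonymous diagnosis.
print(reidentify(anonymised_records, public_records))
```

Even though neither data set alone identifies anyone, the combination does, which is why simply removing names is not sufficient anonymisation.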
An example of this was during the Cambridge Analytica Scandal of 2018, when Facebook’s data was mined to create user profiles to influence political campaigns without user consent or awareness. One of the earliest controversies in 2012 involved Target, a US retailer. They developed an algorithm to predict when female customers became pregnant based on their purchase habits in order to send them relevant advertising. In one case, this led to a father receiving pregnancy ads before his daughter had made him aware that she was pregnant.
Algorithms trained on historical data that contain biases can perpetuate those biases. Such historical bias can, for example, lead to lower credit scores for individuals from marginalised communities. Bias can also arise from incomplete or inaccurate data, which can produce oversimplified results, or from unforeseen algorithmic bias, where the program’s own logic discriminates.
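One simple way such bias can be surfaced is by comparing outcome rates between groups in a system's historical decisions. The sketch below uses fabricated records and illustrative group labels; a large gap between groups does not prove discrimination on its own, but it flags something worth investigating.

```python
# Fabricated decision records for illustration only.
decisions = [
    {"group": "A", "approved": True},
    {"group": "A", "approved": True},
    {"group": "A", "approved": False},
    {"group": "B", "approved": True},
    {"group": "B", "approved": False},
    {"group": "B", "approved": False},
]

def selection_rates(records):
    """Compute the approval rate for each group."""
    totals, approved = {}, {}
    for r in records:
        g = r["group"]
        totals[g] = totals.get(g, 0) + 1
        approved[g] = approved.get(g, 0) + (1 if r["approved"] else 0)
    return {g: approved[g] / totals[g] for g in totals}

rates = selection_rates(decisions)
print(rates)  # a large gap between groups may signal bias in the training data
```

Auditing a deployed system in this way is one of the bias-detection practices the article goes on to mention.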
The outcome of these issues can be discrimination in industries such as hiring, education and finance. One such example was Amazon’s AI hiring tool, developed in 2014 to identify the best talent by analysing job applicants’ resumes. Because the algorithm was trained on historical data in which the majority of hires were men, it favoured applicants in male-dominated fields while downgrading resumes from female applicants.
Given the large amounts of user data collected, transparency helps build trust while protecting individual rights. Users should have the right to be informed, in clear terms, that their data is being collected, how it will be used and whether it will be shared with third parties. Algorithms should also follow clear guidelines about the criteria they use to make decisions, and organisations should be able to understand and interpret their algorithms’ decisions. This can be challenging, given the complexity and ‘black box’ nature of many current algorithms.
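In contrast to a black box, a transparent decision process can report which criteria drove each outcome. The sketch below is a toy loan-assessment function; the thresholds and field names are illustrative assumptions, not any real lender's policy.

```python
def assess_loan(income, existing_debt, missed_payments):
    """Return a decision together with the reasons behind it."""
    reasons = []
    # Illustrative thresholds, not a real lending policy.
    if income < 20000:
        reasons.append("income below 20,000 threshold")
    if existing_debt > income * 0.5:
        reasons.append("debt exceeds half of income")
    if missed_payments > 2:
        reasons.append("more than two missed payments")
    decision = "declined" if reasons else "approved"
    return {"decision": decision, "reasons": reasons}

print(assess_loan(income=25000, existing_debt=5000, missed_payments=0))
print(assess_loan(income=18000, existing_debt=12000, missed_payments=3))
```

Returning the reasons alongside the decision gives users and regulators something to interpret and contest, which complex opaque models often cannot offer directly.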
Data mining also raises the question of who owns the data collected and who is therefore responsible for safeguarding it. Large stores of sensitive data are a natural target for hackers, and a breach can lead to the theft of personally identifiable information. In 2021, 700 million LinkedIn user records were exposed, revealing names, email addresses and phone numbers and raising the potential for identity theft and phishing.
As data mining becomes more sophisticated, demands for better privacy and protection will place pressure on organisations to follow practices such as those set out in GDPR, under which only the minimum data required is collected and kept for a finite period, and to adopt stronger encryption alongside increased regulation and compliance. Bias detection and the incorporation of ethical frameworks will also be critical to ensuring fairness and diversity.
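Two of the GDPR-inspired practices mentioned above, collecting only the minimum data required and keeping it for a finite period, can be sketched as follows. The field names and the 30-day retention window are illustrative assumptions.

```python
from datetime import datetime, timedelta, timezone

# Illustrative policy: only these fields are needed for the stated purpose.
REQUIRED_FIELDS = {"email", "order_id"}
RETENTION = timedelta(days=30)  # illustrative finite retention period

def minimise(record):
    """Data minimisation: keep only the fields the purpose requires."""
    return {k: v for k, v in record.items() if k in REQUIRED_FIELDS}

def purge_expired(store, now=None):
    """Storage limitation: drop records older than the retention period."""
    now = now or datetime.now(timezone.utc)
    return [r for r in store if now - r["collected_at"] <= RETENTION]

raw = {"email": "user@example.com", "order_id": 42,
       "browsing_history": ["page1", "page2"], "device_id": "abc123"}
slim = minimise(raw)  # browsing_history and device_id are never stored
print(slim)
```

Enforcing these rules at the point of collection, rather than relying on later clean-up, keeps the stored footprint (and the damage from any breach) as small as possible.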
If you are interested in studying Economics or Sociology, Oxford Home Schooling offers the chance to study these subjects at several levels, listed below. You can also Contact Us.