Let’s start with the basics, Carl. What tends to happen when you hand over a bit of personal information online?
Well, I looked into this during an investigation I did for the BBC and I was surprised by what I found. Thanks to GDPR, which you might remember coming in a couple of years ago, and something called a ‘subject access request’, each of us has the right to see a copy of all the personal data that companies hold about us. I asked every company I could think of and about 20 came back to me. These companies weren’t the likes of Google or Facebook; they came from a shadow world of data brokers, data enrichers and data sellers. One of the misconceptions people have is that the primary data gatherers are companies they’ve heard of and have a relationship with – Google, Amazon, Uber and so on – but what I learnt was that most of the data about me was held and traded by companies I’d never heard of before.
What sort of data do these shadowy companies hold and what do they do with it?
From those companies, I got back around 7,000 pages’ worth of data about myself – and remember, that’s only 20 companies; I requested data from around 80 in total. You might think this is simply the data collected in the course of you using a service. Actually, a lot of it is a next level of data: data generated from what you’ve handed over as you use that service, used to predict things about you that you haven’t revealed. Data companies are modelling you, trying to segment you, to establish probabilities and likelihoods about you. This is where the data can go off in some extremely weird directions. I’m looking at my data now and it’s telling me there’s a 23% chance I’m interested in gardening, yet my animal and nature awareness level is low. There’s a prediction about the age of my boiler in here. The data even told me I had no regular interest in book reading, which was a little hard to take as I was writing a book. On the basis of my online browsing habits, I was also included in a segmentation for a Netmums ‘Women Trying To Conceive’ event.
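The kind of segmentation described here – turning browsing signals into probabilities like a ‘23% chance of interest in gardening’ – can be sketched as a toy propensity model. Everything below (the segment names, the signals, the weights) is invented for illustration; real brokers’ models and features are proprietary.

```python
import math

# Invented logistic "propensity" weights mapping browsing signals to
# the probability a user belongs to a marketing segment. Illustrative
# only - real data brokers' features and weights are not public.
SEGMENT_WEIGHTS = {
    "gardening": {"visited_garden_centre_site": 1.2,
                  "searched_plant_names": 0.8,
                  "bias": -2.5},
    "online_gambling": {"visited_betting_site": 2.0,
                        "late_night_browsing": 0.5,
                        "bias": -3.0},
}

def propensity(signals, weights):
    """Logistic score: estimated probability the user fits a segment."""
    z = weights["bias"] + sum(w * signals.get(feature, 0.0)
                              for feature, w in weights.items()
                              if feature != "bias")
    return 1.0 / (1.0 + math.exp(-z))

def segment(signals, threshold=0.5):
    """Score every segment and keep those that clear the threshold."""
    scores = {name: propensity(signals, w)
              for name, w in SEGMENT_WEIGHTS.items()}
    return scores, [name for name, p in scores.items() if p >= threshold]

# A user who has shown two gardening-related signals and nothing else.
user = {"visited_garden_centre_site": 1.0, "searched_plant_names": 1.0}
scores, segments = segment(user)
```

Even this toy version shows why the outputs read like probabilities rather than facts: the model never observes an interest directly, it only infers a likelihood from whatever signals happen to have been collected.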
If the predictions are wildly inaccurate, is there anything to worry about?
What I found was a very distorted, Alice in Wonderland version of myself. It was frequently very funny, and it says something about how these companies are often selling garbage rather than amazing insights about us. But some of the analysis was a bit sharper. I saw probabilities for me using the internet to gamble and spending money on alcohol. These are things you probably don’t want unscrupulous companies using to target potentially vulnerable people.
Where do they get the original data from?
Just have a look at the number of cookies that are tracking you on the average news website – there are a lot of different actors collecting your data. There are also a lot of companies whose business models aren’t what they seem. Free apps on your phone are often monetised by their owners passing on the data they gather. Navigation apps, for example, often exist primarily to track large numbers of people travelling around, say, London and to identify travel hotspots. None of this is necessarily that sinister, but my biggest gripe about the whole industry is its secrecy. What’s actually happening is hidden from most people, and we have very little opportunity to understand it, challenge it or opt out of it. In Europe, GDPR now governs a lot of what’s going on and makes this kind of trade a lot riskier for companies, but the situation’s much worse for consumers in the US, China and Russia.
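The ‘count the trackers on a news site’ exercise can be sketched offline: given the page’s URL and the URLs of the resources it loads, classify each request as first- or third-party by comparing registrable domains. The domains below are made up, and the domain heuristic is deliberately naive (it ignores multi-part suffixes like `co.uk`, which real tools handle with the Public Suffix List).

```python
from urllib.parse import urlparse

def base_domain(host):
    # Naive eTLD+1 heuristic: keep the last two labels of the hostname.
    # Real tracker-blockers use the Public Suffix List instead.
    parts = host.split(".")
    return ".".join(parts[-2:]) if len(parts) >= 2 else host

def count_third_party(page_url, request_urls):
    """Count distinct third-party domains contacted while loading a page."""
    page = base_domain(urlparse(page_url).hostname)
    contacted = {base_domain(urlparse(u).hostname) for u in request_urls}
    contacted.discard(page)  # first-party requests don't count
    return len(contacted)

# Hypothetical requests observed while loading one article page.
requests_seen = [
    "https://news.example/article",
    "https://cdn.news.example/style.css",
    "https://ads.tracker-one.com/pixel.gif",
    "https://analytics.tracker-two.net/collect",
]
n = count_third_party("https://news.example/article", requests_seen)
```

On a real news site the equivalent count – which you can see in your browser’s developer tools under the network tab – routinely runs into the dozens.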
Also, right now the Internet of Things (from smart fridges to fitness trackers) is creating vastly more data sources. During the investigation, I spoke a lot to the Systems & Algorithms Lab at Imperial College London. They were studying a robot hoover, which was running around the lab, collecting information about the floor plan and sending it off to China – but you only knew it was sending data to China if you had dug into the software, as they had in the lab. One of the more sinister things they showed me was a webcam that was hard-coded (i.e. it couldn’t be altered) to send streams to a location in China.