Clearview, a new facial recognition system, has been grabbing headlines recently. The company provides an app that, given a photograph of a person’s face, can fairly reliably find more photos of the same person, along with a name and other identifying information. The attention is well-deserved. Systems that can recognize a face in a small-to-medium database (a collection of mug shots, for example) have been around for some time. But Clearview has amassed a database of, reportedly, 3 billion facial images. We do not know how many individuals these images come from, but it’s likely a significant proportion of the internet-using world. In one example, a journalist found that the Clearview database held 7 images of her. If Clearview has, on average, 7 photos of each person in its database, the database covers something like 10% of all internet users in the world.
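The 10% figure is a back-of-the-envelope estimate, and it’s easy to check. All numbers below are rough assumptions (the reported database size, the 7-photos-per-person example, and a ballpark global internet population), not Clearview’s actual figures:

```python
# Rough sanity check of the "10% of internet users" claim.
# Every number here is an assumption, not a confirmed figure.
database_images = 3_000_000_000   # reported size of the Clearview database
images_per_person = 7             # as in the journalist's example
internet_users = 4_500_000_000    # approximate global internet users

people_in_database = database_images // images_per_person
share = people_in_database / internet_users
print(f"~{people_in_database:,} people, ~{share:.0%} of internet users")
```

Roughly 430 million people, or about 10% of internet users, under those assumptions.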
Being able to search for a face in such a large database is a technical achievement, but not an implausible one. If there is any “secret sauce”, it won’t stay secret for long. Perhaps the more impressive achievement is having acquired such a large database of face images in the first place. To do so, Clearview crawled the web – social media sites, company directories, personal websites, any publicly accessible web content – collecting every photo found and, as I understand it, also collecting personally identifying information (especially full names) where available. This sort of “web scraping” requires some technical ability and significant computational resources, but it is something anyone with sufficient funding could pull off.
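To make the “anyone could pull this off” point concrete, here is a minimal sketch of the image-harvesting step of such a crawler, using only the Python standard library. The page content is a made-up example; a real crawler would also fetch pages, follow links, and store results at scale:

```python
# A minimal sketch of harvesting images (and nearby identifying text)
# from a web page. The HTML below is a hypothetical example.
from html.parser import HTMLParser

class ImageCollector(HTMLParser):
    """Collect the src of every <img> tag, plus any alt text,
    which sometimes carries a person's name."""
    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            a = dict(attrs)
            if "src" in a:
                self.images.append((a["src"], a.get("alt", "")))

page = '<html><body><img src="/staff/jane.jpg" alt="Jane Doe"></body></html>'
collector = ImageCollector()
collector.feed(page)
print(collector.images)  # [('/staff/jane.jpg', 'Jane Doe')]
```

The hard part is not the parsing, which is trivial, but doing it billions of times and storing the results – a funding problem, not a technical one.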
Whether or not this technology is good for society is something I won’t discuss here. But, suffice it to say, there are many calls to ban the technology. Bans may be somewhat effective in preventing use in cases where transparency is required – a court will not accept facial recognition results as evidence if facial recognition is illegal – but preventing individuals or private organizations from making use of this sort of software is probably extremely difficult.
This sort of software is an inevitable consequence of the internet age – specifically, the persistence and searchability of personal data. As an analogy, consider high school yearbooks, which typically include the name and photo of every student in the school. These yearbooks were not thought of as privacy invasions in the past. Technology has made it possible to build a virtual collection of every high school yearbook in the world and to search them all for a particular face or name in a fraction of a second. Data that was previously harmless becomes a threat.
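The “fraction of a second” part is worth unpacking: face search systems typically reduce each photo to a fixed-length embedding vector, so matching becomes a nearest-neighbour lookup over vectors rather than a comparison of images. The sketch below shows the idea with tiny made-up vectors (real face embeddings have hundreds of dimensions, and large systems use approximate nearest-neighbour indexes rather than a linear scan):

```python
# Sketch of face search as nearest-neighbour lookup over embeddings.
# The vectors are tiny, made-up examples, not real face embeddings.
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

database = {              # name -> embedding of a known photo
    "alice": (0.9, 0.1, 0.3),
    "bob":   (0.2, 0.8, 0.5),
}
query = (0.88, 0.15, 0.28)  # embedding of the photo being searched with

best = max(database, key=lambda name: cosine(query, database[name]))
print(best)  # alice
```

Because the expensive step (computing the embedding) happens once per photo at indexing time, the query-time work is just vector arithmetic, which is what makes searching billions of faces feasible.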
So the Clearview database is not going away. Even if Clearview were compelled to destroy it, it could be reconstructed – once something is available on the internet, it’s nearly impossible to delete every copy of it. And the only way the database won’t continue to grow is if people stop posting facial photographs in publicly available places. That means no more photo-sharing beyond a strict group of real (and trusted) friends and family, no more photos in company directories, no more names and faces in media articles, and so on. The list goes on. I think it’s highly unlikely that sharing of photographs is going to stop.
Ultimately, Clearview AI is a symptom of a fundamental shift in our notion of privacy. Previously harmless information can be exploited once it becomes persistent and searchable. This is something we need to get used to.