Privacy risks of large-scale facial search and data collection

By Erman Ayday

Phone numbers, as evidenced by the wide adoption of two-factor authentication, are reliable tokens for uniquely identifying individuals (users). Many social media applications also use them to let users find friends without searching by full names (or usernames), which may not identify the same person across platforms. Phone number search and contact discovery are two such features. They are desirable to both users and services because they make migration to a platform easier, especially at sign-up, which increases the chances of user adoption.

However, the convenience of contact discovery comes at a cost: large-scale, unmonitored use of this feature may let attackers obtain corpora of phone numbers registered with a service, along with the accounts they are attributed to. Brute-force phone number verification is possible in popular online services such as WhatsApp, Facebook Messenger and Twitter, and it can be used to identify the active social network accounts of individuals in a given region. Numerous studies have established the risks of large-scale phone number validation, which can lead to phishing attacks and illegal advertising practices such as robocalling. Using this feature, an attacker can create a massive dataset of active phone numbers belonging to real individuals who reside in a particular geographic region (e.g., a city or country).

Furthermore, by collecting the personal information available from the queried social networks, the attacker can add several attributes of each individual (including face photos, occupation, biography, relationship status, etc.) to this dataset alongside their phone number. One serious consequence of possessing such a dataset is the possibility of linking a total stranger (for example, someone the attacker comes across in public) to one of the records in the dataset via facial search. This is possible if the dataset contains facial photos and associated records for a high fraction of individuals. A facial matching capability would allow the attacker to immediately learn a vast amount of personal information about the person from the records. This may have serious consequences, including (but not limited to) discrimination or exploitation of the person’s sensitive information in the dataset (e.g., salary, occupation, or relationship status). This privacy risk is also illustrated in the popular computer game Watch Dogs, in which the main character takes photos of strangers in public to instantly observe their personal information (e.g., occupation or financial records).

Through our research, we have shown that accurate facial search is possible in the constructed dataset and that an attacker can link a randomly taken photo (i.e., a single facial photo) of an individual to their profile with 67% accuracy. This means that an attacker can, on a large scale, create a search engine that is capable of identifying individuals’ records efficiently and accurately from just a single facial photo.
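
To make the notion of linking accuracy concrete, the following is a minimal sketch of how a top-1 matching rate can be measured in general. It is not the method used in our research. It assumes each face has already been reduced to a numeric feature vector (an embedding), and it stands in for real embeddings with synthetic random vectors. The names cosine_similarity, best_match and top1_accuracy, the record identifiers, and the noise level are all hypothetical choices made for illustration, so the number it prints says nothing about the 67% figure above.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def best_match(query, records):
    """Return (record_id, similarity) for the record most similar to the query."""
    return max(
        ((rid, cosine_similarity(query, emb)) for rid, emb in records.items()),
        key=lambda pair: pair[1],
    )


def top1_accuracy(queries, records):
    """Fraction of (true_id, vector) queries whose best match is the true record."""
    hits = sum(1 for true_id, q in queries if best_match(q, records)[0] == true_id)
    return hits / len(queries)


# Synthetic stand-ins (assumption): random vectors play the role of face
# embeddings, and each query is a noisy copy of one stored vector.
rng = random.Random(0)
DIM = 16
records = {
    f"record-{i}": [rng.gauss(0, 1) for _ in range(DIM)] for i in range(200)
}
queries = [
    (rid, [x + rng.gauss(0, 0.8) for x in emb]) for rid, emb in records.items()
]

accuracy = top1_accuracy(queries, records)
print(f"top-1 accuracy on synthetic data: {accuracy:.2f}")
```

In this sketch, accuracy is simply the share of query vectors whose nearest stored vector belongs to the correct record. Raising the noise level or the number of records would be expected to lower it, which mirrors why linking from a single photo does not succeed every time.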