Twitter Metadata Privacy: Why It Matters


Public Twitter metadata can identify users with striking accuracy

Researchers at University College London showed that an individual Twitter user can be identified from metadata linked to their tweets. The finding still matters for privacy on Twitter/X, because metadata can reveal behaviour even when the message itself looks harmless.

Metadata is everywhere: in what we post, the photos we take and the status updates we publish. On social networks, metadata can be used to infer identity, location, habits and relationships. In some cases, metadata attached to selfies or posts can even contradict a person’s account of where they were or what they were doing.

On Twitter/X, metadata can also be used to identify users with a very high degree of precision. Tweets that may appear anonymous can still leave patterns that point back to a specific account.

The researchers analysed tweets and their associated metadata and could pick out an individual user from a group of 10,000 Twitter accounts with 96.7% accuracy. Even when up to 60% of the metadata was blurred or degraded, the model still identified the right person with more than 95% accuracy.

Metadata can be far more revealing than the actual content of a tweet.

Most people would not give a stranger their address, yet metadata can give away comparable details: routines, habits and contextual clues. When combined with other information, it may reveal when someone is at home, when they are travelling or how they normally behave online.

The average user often underestimates how easily they can be identified through metadata. Twitter has historically exposed many metadata fields through its API, and those fields can become powerful identifiers when they are analysed together.

Why anonymity is difficult on social networks

The researchers analysed millions of users and tested a set of tweet metadata points with several machine-learning algorithms. These included information such as when an account was created, when a tweet was published and the number of favourites, followers and followed accounts.
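
To make the idea of "metadata points" concrete, the sketch below turns the kinds of fields listed above into a numeric vector that a machine-learning algorithm could consume. It is only an illustration: the class name, field names and derived features are assumptions made for this example, not the researchers' actual feature set or the Twitter API schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class TweetMetadata:
    # Hypothetical field names mirroring the kinds of metadata described above.
    account_created_at: datetime
    posted_at: datetime
    favourites_count: int
    followers_count: int
    following_count: int


def to_features(meta: TweetMetadata) -> list[float]:
    """Turn one tweet's metadata into a numeric feature vector."""
    # Age of the account at posting time, in days.
    account_age_days = (meta.posted_at - meta.account_created_at).total_seconds() / 86400
    # Time of day the tweet was posted, as a fractional hour (0 to 24).
    hour_of_day = meta.posted_at.hour + meta.posted_at.minute / 60
    return [
        account_age_days,
        hour_of_day,
        float(meta.favourites_count),
        float(meta.followers_count),
        float(meta.following_count),
    ]


# Invented example values, not real account data.
example = TweetMetadata(
    account_created_at=datetime(2012, 3, 1, tzinfo=timezone.utc),
    posted_at=datetime(2018, 3, 1, 18, 30, tzinfo=timezone.utc),
    favourites_count=420,
    followers_count=150,
    following_count=300,
)
features = to_features(example)
```

None of these numbers says anything about what the tweet contained, which is the point: the vector describes the account and its habits, not the message.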

One of the simplest machine-learning models was also one of the most effective. It showed that a person can be identified with very high accuracy using only a small set of metadata points.
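
As a rough illustration of how a simple model can do this, the sketch below uses a one-nearest-neighbour match: it compares the metadata of an unlabelled tweet with metadata from known users and returns the closest one. This is a stand-in chosen for brevity, not necessarily the model the researchers used, and the user names and numbers are synthetic.

```python
import math

# Synthetic training data: user -> metadata vectors
# [followers, following, favourites, posting hour]. All values are invented.
KNOWN_TWEETS = {
    "user_a": [[150, 300, 420, 8.0], [151, 300, 425, 9.5]],
    "user_b": [[9800, 120, 35, 22.0], [9810, 121, 35, 23.0]],
    "user_c": [[40, 900, 5100, 13.0], [40, 905, 5130, 12.5]],
}


def _scales(rows):
    """Per-feature range, so that large counts do not swamp small ones."""
    columns = list(zip(*rows))
    return [(max(c) - min(c)) or 1.0 for c in columns]


def identify(sample, known=KNOWN_TWEETS):
    """Return the user whose known tweet metadata is closest to `sample`."""
    all_rows = [row for rows in known.values() for row in rows]
    scales = _scales(all_rows)

    def distance(a, b):
        # Scaled Euclidean distance; the posting hour is treated as a plain
        # number here, so wrap-around at midnight is ignored.
        return math.sqrt(sum(((x - y) / s) ** 2 for x, y, s in zip(a, b, scales)))

    best_user, _ = min(
        ((user, distance(sample, row)) for user, rows in known.items() for row in rows),
        key=lambda pair: pair[1],
    )
    return best_user


# An "anonymous" tweet whose metadata resembles user_b's habits.
guess = identify([9805, 120, 36, 22.5])
```

With only three invented users the match is trivial; the research finding is that this kind of matching still works when the candidate pool is far larger.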

The model is trained on a dataset of known users and learns from their metadata how each of them behaves on Twitter/X. It can then match new, unlabelled metadata to the user whose behaviour it most resembles. This is why stripping names from a social media dataset is not always enough. It is very difficult to anonymise a dataset completely, because triangulation with one or more other datasets can reverse many attempts to remove identifying information.
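
The triangulation risk can be sketched in a few lines. In the example below, a release with the handles removed is joined against a second dataset on the metadata fields they share. Both datasets, the handles and the field names are invented for illustration; real linkage attacks usually need fuzzier matching than the exact comparison used here.

```python
# An "anonymised" release: handles removed, but metadata left intact.
anonymised_release = [
    {"record": 1, "account_created": "2012-03-01", "followers": 150},
    {"record": 2, "account_created": "2015-07-19", "followers": 9800},
]

# A second, public dataset that still carries handles.
public_profiles = [
    {"handle": "@alice_example", "account_created": "2012-03-01", "followers": 150},
    {"handle": "@bob_example", "account_created": "2015-07-19", "followers": 9800},
    {"handle": "@carol_example", "account_created": "2015-07-19", "followers": 12},
]


def reidentify(release, profiles, keys=("account_created", "followers")):
    """Link records across datasets when the shared fields match exactly one profile."""
    index = {}
    for profile in profiles:
        index.setdefault(tuple(profile[k] for k in keys), []).append(profile["handle"])

    links = {}
    for row in release:
        candidates = index.get(tuple(row[k] for k in keys), [])
        if len(candidates) == 1:  # a unique match re-identifies the record
            links[row["record"]] = candidates[0]
    return links


links = reidentify(anonymised_release, public_profiles)
```

Matching on the creation date alone leaves record 2 ambiguous between two profiles. Adding the follower count makes every record unique, which shows how each extra field left in a release narrows the set of candidates.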

What the GDPR adds to the discussion

The GDPR strengthened the legal framework around these practices. Article 25 requires data protection by design and by default, while the principle of data minimisation in Article 5(1)(c) requires organisations to process only the personal data that is necessary for a specific purpose.
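
In code, data minimisation often comes down to an allow-list applied before anything is stored. The sketch below assumes a hypothetical purpose (counting tweets per day) and hypothetical field names. Deciding which fields are actually necessary is a legal judgement that the code cannot make.

```python
# Hypothetical purpose: counting tweets per day for an aggregate report.
FIELDS_NEEDED = {"posted_date"}


def minimise(record: dict, allowed=FIELDS_NEEDED) -> dict:
    """Keep only the fields required for the stated purpose; drop the rest."""
    return {key: value for key, value in record.items() if key in allowed}


# Invented record, not real account data.
raw = {
    "posted_date": "2018-03-01",
    "account_created": "2012-03-01",
    "followers": 150,
    "favourites": 420,
}
stored = minimise(raw)
```

Fields that are never stored cannot later be combined with another dataset to identify someone.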

The deeper question is not only whether companies should hold so much identifying information about users, but also whether users understand how much their metadata can reveal. They have good reason to care, yet in practice many still do not realise how much it exposes.

Post based on information originally reported by Wired.