The Experiment of Identifying Sensitive URLs

In our opinion, being tracked (spied on) when visiting web pages that contain sensitive content, e.g., content related to health or sexual preference, is the “Elephant in the Room” of privacy. Several data protection regulations, such as the GDPR in Europe, safeguard online content that contains sensitive data.

In our recent article (S. Matic, C. Iordanou, G. Smaragdakis, N. Laoutaris, “Identifying Sensitive URLs at Web-Scale,” ACM IMC’20 [pdf]), we showed that such spying is taking place on hundreds of millions of web pages. We are currently developing technologies to warn users when such tracking is taking place. To do this, we are asking for YOUR help.

In this experiment, we will show you URLs from the internet and ask you to classify them as sensitive or non-sensitive from your perspective. Below, you will find detailed instructions on how to classify the URLs. We expect the experiment to take less than 10 minutes. Once you have completed it, you can safely uninstall the add-on if you do not wish to keep it.

To help you understand what counts as sensitive content from a legal point of view, we include here the definition of sensitive information given by the General Data Protection Regulation (GDPR), which is enforced in all EU countries.

ARTICLE 9 EU GDPR: "Processing of special categories of personal data"
Processing of personal data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs, or trade union membership, and the processing of genetic data, biometric data for the purpose of uniquely identifying a natural person, data concerning health or data concerning a natural person's sex life or sexual orientation shall be prohibited.

Data Privacy Policy:
We will collect the new labels without collecting any Personally Identifiable Information (PII) about users other than their email address. This is possible because our browser extension generates a random identifier at installation time and associates the collected labels with that identifier. In addition, we hold a database of users and the websites they are asked to annotate. We hold users' email addresses only to reward them accordingly; the raffle conditions are communicated to users before participation on the main site set up by ourselves here. We therefore minimise the indirect risks to privacy by not collecting any website labels related to users' browsing habits or any other personal information. Note that communication between the browser and our back-end server, which collects the labels, is secured over an HTTPS connection using SSL/TLS encryption and a server certificate.
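
For readers who want a concrete picture of the approach described above, the following Python sketch shows how a label can be tied to a random installation identifier rather than to any personal attribute. It is only an illustration under our own assumptions and is not the code of the extension: the names new_install_id, LabelRecord and to_payload, as well as the field names and the example URL, are hypothetical.

```python
import json
import uuid
from dataclasses import asdict, dataclass


def new_install_id() -> str:
    """Hypothetical helper: create a random identifier once, at installation time.

    uuid4() is drawn from random bits, so it is not derived from the user's
    name, email address, or device.
    """
    return str(uuid.uuid4())


@dataclass(frozen=True)
class LabelRecord:
    """One answer given by a participant (field names are assumptions)."""

    install_id: str  # random identifier generated at installation
    url: str         # URL the participant was asked to classify
    sensitive: bool  # True if the participant marked the URL as sensitive


def to_payload(record: LabelRecord) -> str:
    """Serialise a record to the JSON body that would be sent over HTTPS."""
    return json.dumps(asdict(record), sort_keys=True)


if __name__ == "__main__":
    install_id = new_install_id()
    record = LabelRecord(install_id, "https://example.org/some-page", True)
    print(to_payload(record))
```

In this sketch the payload contains only the random identifier, the URL shown to the participant, and the chosen label; the email address used for the raffle would be stored separately.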