Killing in the name of Privacy

Tags: #web,#linux,#foss,#selfhosting

Reading time: ~11min


An ad blocker killed a project that I had been working on for 4 months. One commit adding 2 lines killed 419 commits with more than 2000 total lines of code.

The script that powers my project was placed on the filter list "EasyPrivacy". But is the project that I designed with privacy as the first feature trying to invade your privacy?

This brings up the topic of telemetry in FOSS projects. Is telemetry inherently bad? Is there a way to collect telemetry without compromising privacy?

Landscape mode recommended on mobile devices

Motivation

I invest a lot of time in writing posts on this blog. We are talking about days for each post, not only hours. I want to offer quality content to people interested in Rust, Linux, FOSS and self-hosting.

After investing so much time in a product that you share for free with the public, you want to find out how people receive it. As Johannes Gross said in German:

"Der Applaus ist das Brot des Künstlers"

Its translation is:

"Applause is the artist's bread and butter" or "Applause is what keeps artists going"

I am not an artist. I keep my blog posts technical and don't want to open a discussion about what art is. But I want to see that I don't invest that much time only for a couple of people. Although it is fun to write posts about topics I am passionate about, if I don't help enough people with my writing, I would rather do something more effective.

Therefore, I was looking for a tool to find out (1) how many people read my posts, (2) how much time they spend on them and (3) on which sites my posts are shared.

(1) tells me that my writing is effective and helps many. In addition, if a post has an estimated reading time of 30 minutes and the average time reported by (2) is much lower than 30 minutes, then I know that either the post was not well received and people left the page early or that I should write shorter posts. Finally, (3) tells me where I should promote my posts. If I see that I have many visits coming from Reddit, then maybe I should promote my future posts on Reddit although I try to avoid proprietary platforms.

My search for such a tool didn't lead to any good results. The self-hosted FOSS projects that I found collect too much data for my taste.

I don't want to log the IP of my visitors. I don't want to know which country they are from or what OS, browser etc. they use. All of this can be used for fingerprinting. I wanted a tool that doesn't collect any data that can help identify users.

And, of course, I wanted a tool written in Rust to not waste resources on my home server 🦀

So I went ahead and started the project OxiTraffic. It won't take a long time, I thought… Maybe two weeks, I thought…

Well, here we are 4 months later 😅 But hey, at least I consider the project to be almost done now and could just release version 1.0 in the next few days.

BUT THEN…

Silent block

I went to my website 3 days ago just to see that OxiTraffic is not working anymore because the script that powers it is blocked. Did I fuck up something related to CORS again?

Nope, the network panel in Firefox shows that the ad-blocking extension uBlock Origin blocks the script. WHAT? WHY? 😧

It turns out that OxiTraffic has been silently added to the EasyPrivacy list. This filter list is activated by default in uBlock Origin to protect your privacy. But is OxiTraffic bad for privacy? I thought that I designed it from the ground up to be privacy-preserving…

Before judging, let's talk about what OxiTraffic does and collects first.

OxiTraffic

To start with, OxiTraffic doesn't collect ANY personal data. No IP, no cookies, no User-Agent, nothing is touched that could be used to identify a user on the web, not even indirectly.

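All that OxiTraffic collects is what the requests described below reveal: that a page was visited, how much time was spent on it in the foreground, and the referrer origin.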

Note

You can skip to the next section if you are not interested in the technical details of how OxiTraffic is designed to protect the privacy of its visitors.

The tiny JS script that powers OxiTraffic in the visitor's browser (client) sends only 2-3 requests:

  1. The first request is sent at the beginning to tell the backend that a visitor arrived. The backend responds with a temporary ID to be used for the next two requests. This ID is deleted when the page is left.
  2. The second request is sent after spending at least 20 seconds on the page in the foreground (not when the tab is hidden). This request leads to counting the visit. The 20 seconds of delay is required to filter out bot traffic and only count valuable visits.
  3. The third and last request is sent when the tab is closed or the page is changed. This is not reliable because of the many ways a browser can be closed. This is why I said 2-3 requests. But when this request is actually sent, it tells the backend how much time the user spent on the page.
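To make the "20 seconds in the foreground" rule concrete, here is a minimal sketch in Python. It is not the actual OxiTraffic script (which is JS); the class `ForegroundTimer`, its method names and the injected `now` timestamps are hypothetical and only illustrate the idea that time counts while the tab is visible and pauses while it is hidden.

```python
from typing import Optional

MIN_FOREGROUND_SECONDS = 20.0  # threshold taken from the description above


class ForegroundTimer:
    """Accumulates time only while the page is visible (hypothetical helper)."""

    def __init__(self, now: float, visible: bool = True) -> None:
        self._total = 0.0
        # Start of the current visible period, or None while the tab is hidden.
        self._visible_since: Optional[float] = now if visible else None

    def on_visibility_change(self, now: float, visible: bool) -> None:
        if visible and self._visible_since is None:
            # Tab came back to the foreground: start a new visible period.
            self._visible_since = now
        elif not visible and self._visible_since is not None:
            # Tab was hidden: bank the visible period that just ended.
            self._total += now - self._visible_since
            self._visible_since = None

    def foreground_seconds(self, now: float) -> float:
        running = 0.0
        if self._visible_since is not None:
            running = now - self._visible_since
        return self._total + running


# Visible for 5 s, hidden for 55 s, then visible again.
timer = ForegroundTimer(now=0.0)
timer.on_visibility_change(now=5.0, visible=False)
timer.on_visibility_change(now=60.0, visible=True)

# 70 s of wall-clock time have passed, but only 15 s in the foreground.
ready_at_70 = timer.foreground_seconds(70.0) >= MIN_FOREGROUND_SECONDS
# 5 s later the foreground time reaches 20 s and the second request may be sent.
ready_at_75 = timer.foreground_seconds(75.0) >= MIN_FOREGROUND_SECONDS
```

In this sketch the value returned by `foreground_seconds` is also what the optional third request would report when the page is left.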

What the backend sees from these three requests:

  1. The first request tells the backend that a random client called a page on the website. It can be a bot, it can be a human, the backend knows nothing about that client. But it sends a temporary ID for the next two requests.
  2. The second request tells the backend that the client spent at least 20 seconds on the page. The backend verifies that claim by checking the arrival time corresponding to the temporary ID. If the claim holds, the backend counts the visit in the hope that this was a human. This request also sends the referrer origin (basically the domain with the scheme http/https). This means that if I click on the link to that page from https://www.reddit.com/path?key=value, only https://www.reddit.com is sent.
  3. The third request tells the backend that the client left the page and claims to have spent a given amount of time on the page in the foreground. The backend doesn't know when this time in the foreground was spent. It doesn't know when the tab was in the background. It doesn't know which page the client possibly went to afterwards. The temporary ID is deleted on the change of the page because it is a simple variable in the script. It is not stored in a cookie or anything similar. In fact, if the visitor just reloads the page, the backend can't differentiate them from a completely new visitor.
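The three steps above can be sketched as a tiny in-memory model. This is a Python illustration under stated assumptions, not the real backend (which is written in Rust): the names `Backend`, `arrive`, `count_visit` and `leave` are hypothetical, and so are the choices to count each temporary ID at most once and to forget the ID on the backend side after the third request.

```python
import secrets
from collections import Counter
from dataclasses import dataclass
from typing import Optional

MIN_SECONDS = 20.0  # minimum time since arrival before a visit is counted


@dataclass
class Arrival:
    path: str
    arrived_at: float
    counted: bool = False


class Backend:
    """Hypothetical in-memory model of the three requests."""

    def __init__(self) -> None:
        self.arrivals = {}          # temporary ID -> Arrival
        self.visits = Counter()     # page path -> counted visits
        self.referrers = Counter()  # referrer origin -> counted visits
        self.time_spent = []        # (page path, claimed foreground seconds)

    def arrive(self, path: str, now: float) -> str:
        # Request 1: remember only the page and the arrival time.
        temp_id = secrets.token_hex(16)
        self.arrivals[temp_id] = Arrival(path, now)
        return temp_id

    def count_visit(self, temp_id: str, referrer_origin: Optional[str], now: float) -> bool:
        # Request 2: verify the 20 second claim against the stored arrival time.
        arrival = self.arrivals.get(temp_id)
        if arrival is None or arrival.counted:
            return False
        if now - arrival.arrived_at < MIN_SECONDS:
            return False
        arrival.counted = True
        self.visits[arrival.path] += 1
        if referrer_origin:
            self.referrers[referrer_origin] += 1
        return True

    def leave(self, temp_id: str, claimed_seconds: float) -> bool:
        # Request 3: store the claimed foreground time, then forget the ID
        # (forgetting it here is an assumption of this sketch).
        arrival = self.arrivals.pop(temp_id, None)
        if arrival is None or not arrival.counted:
            return False
        self.time_spent.append((arrival.path, claimed_seconds))
        return True
```

Nothing in this model ties two page views together: a reload simply calls `arrive` again and gets a fresh, unrelated ID.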

Note

It is important to note that browsers send the Referer header automatically which can contain the full referrer URL instead of only containing the origin. OxiTraffic only sends the origin and respects the no-referrer policy. This means that OxiTraffic collects less information about the referrer than what browsers send anyway.
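Reducing a full referrer URL to its origin can be illustrated with Python's standard `urllib.parse`. The function name `referrer_origin` is hypothetical and the real script does this in the browser, so treat this as a sketch of the idea rather than the project's code.

```python
from typing import Optional
from urllib.parse import urlsplit


def referrer_origin(referrer: str) -> Optional[str]:
    """Return only scheme and host (plus port, if any) of a referrer URL."""
    parts = urlsplit(referrer)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        # Empty referrer (e.g. no-referrer policy) or a non-web scheme.
        return None
    # Drop any "user:password@" prefix; path and query are never included.
    host = parts.netloc.rpartition("@")[2]
    return f"{parts.scheme}://{host}"


origin = referrer_origin("https://www.reddit.com/path?key=value")
```

With the example from above, `origin` is `https://www.reddit.com`: the path and the query string never leave the browser.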

You can verify my claims by taking a look at the commented source code of the script.

Communication

When I found out about being blocked, I thought that this had to be a misunderstanding. Maybe someone thought that this was yet another piece of tracking software like almost all alternatives.

No problem, I thought. I just have to explain that OxiTraffic is not like the common alternatives and isn't a privacy concern.

So I opened an issue on GitHub and asked for the block to be reverted while explaining the details that I mentioned above. I addressed some concerns of a person who isn't a maintainer and offered to answer any further questions regarding privacy.

In the end, a maintainer closed the issue with only this reply:

"I think that it can be left blocked."

You might say, "So what? It is his software and he has the final decision. He doesn't even have to take the issue seriously and communicate the reason for the blocking."

Yes, you are right. This applies to every FOSS project. But I really think that we have to question the power of ad blockers.

You normally install an ad blocker just to get rid of ads and make the internet tolerable. Additionally blocking requests that are privacy-invasive is a feature that I appreciate in an ad blocker like uBlock Origin. But what if it also blocks requests not related to privacy?

I would understand if someone wants to block the telemetry of OxiTraffic for himself out of minimalism, but not privacy! And most importantly, don't block it on a list called EasyPrivacy that is activated by default on every installation of uBlock Origin (and probably other ad blockers using this list).

They killed my project in the name of privacy, although we "believe in the same privacy". Just swap "Privacy" with "God" and think about parallels…

I am a user of Linux, Firefox, uBlock Origin, etc. because privacy is one of my main concerns in our digital world. But if someone with enough power over the internet decides that I violate his definition of privacy, then I am doomed to despair.

Telemetry

I am just one victim of this almost religious fight against telemetry in the FOSS world. You might be able to remember a couple of cases where anonymous telemetry was criticized to death in projects related to Linux and FOSS.

What even is software telemetry about?

It is about collecting data on remote points and automatically sending it to a central point for monitoring.

Is telemetry inherently bad?

Sadly, it tends to be bad because of how it is often misused, especially by proprietary software. But just like any other tool: It depends on its usage!

Knives as a tool can be harmful. Some people misuse them to harm others. But this doesn't mean that we should ban knives and treat everyone trying to safely use them as a serial killer.

If the design of a telemetry system has privacy as one of the highest priorities as I explained with OxiTraffic, then it is a tool that can highly benefit the project using it without harming users. It gives insight into what creators/developers should focus on and how their product is received. If you are interested in that project and want to support it, then you should support its attempts to collect anonymous telemetry!

Don't just blindly send all data that a project asks for, even if it is a FOSS project that you want to support. Someone should question every piece of data that is collected, and privacy concerns have to be raised. But also don't approach it with the naive mentality "telemetry = bad"!

If you think that something is privacy-invasive, then saying only that is not constructive. You are claiming that, so you have the burden of proof! Prove how the collected data can be used to track or identify a user. And maybe start with a definition of privacy to put everyone on the same page.

Protest

Is OxiTraffic really dead? It is dead only on my website for users of ad blockers. But it can still deliver valuable statistics from users who don't have an ad blocker activated on my website.

It is also dead on the website of @moanos because he posted about self-hosting it on his blog.

I could use my reverse proxy to point the blocked script to another URL and then redirect that request. But I don't want to start such a ping-pong war with EasyList. And I also don't want to risk having my whole domain blocked instead of just the script. Do you feel their power?

You could protest with me by self-hosting OxiTraffic on your website. Instructions can be found in the README. It is one binary that you can host in a container or directly on the host system. I would love to help you host it so that you can benefit from its anonymous statistics. Contact me if you need any help 🥰

As a demo, you can find a presentation of the evil data that OxiTraffic collects about my website on the dashboard.

After hosting OxiTraffic, you could additionally send EasyList a pull request to let them block your evil script too and kill more in the name of their privacy.

You can suggest improvements on the website's repository

Content license: CC BY-NC-SA 4.0