How probable are 3 compromised Tor relays?

I am not sure if I did the math wrong but it seems to me that 3 compromised Tor relays are not so unlikely.

Under the assumption that 33% of Tor relays are compromised, I made the following calculations:

Probability of getting 3 compromised relays in a circuit: 3.59% (0.33 ^ 3)

The probability of at least one compromised circuit in 100 circuits is 97.43% (calculation below).
If each circuit has a 3.5937% chance of being fully compromised (from 0.33^3), the probability a single circuit is not compromised is 1 − 0.035937 = 0.964063.
For 100 independent circuits, the probability none are compromised is 0.964063^100 ≈ 0.0257.
So the probability of at least one compromised circuit in 100 attempts is 1 − 0.964063^100 ≈ 0.9743 (about 97.43%).
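Under the same (illustrative) independence assumption, the arithmetic above can be checked with a few lines of Python. The 33% and 20% figures are the assumptions used in this thread, not measured values:

```python
# Probability model from the thread: a fraction p of relays is compromised,
# relays are chosen independently, and a circuit is fully compromised only
# when all 3 of its relays are. Real Tor path selection is bandwidth-weighted
# and role-aware, so this is a rough sketch, not a real model.

def p_circuit(p: float) -> float:
    """Probability that all 3 relays in one circuit are compromised."""
    return p ** 3

def p_at_least_one(p: float, n: int) -> float:
    """Probability that at least one of n independent circuits is compromised."""
    return 1 - (1 - p_circuit(p)) ** n

print(f"{p_circuit(0.33):.4%}")            # 3.5937% per circuit
print(f"{p_at_least_one(0.33, 100):.2%}")  # ≈ 97.43% over 100 circuits
print(f"{p_at_least_one(0.20, 100):.2%}")  # ≈ 55.21% with the 20% assumption
```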

I don’t think this is a reasonable assumption, and you also aren’t taking into account guard nodes, which are meant to reduce this exact probability.


Even if we assume a low percentage of compromised relays, like 20%, we still get alarming probabilities of compromised circuits:

Probability of a compromised single Tor circuit: 0.80%
Probability of at least one compromised circuit in 100 Tor circuits: 55.21%

20% is not low. Where are you getting these numbers from?

Again, you are not taking into account guard nodes which reduce this probability.

Even so, you are making a lot of assumptions, the biggest two being that 20% of relays are compromised in the first place, and that they are all operated by the same attacker.

There is a whole selection algorithm that differentiates between guards, middles, and exits, and takes into account things like whether the relays in a circuit are in neighboring IP space. You can’t increase your consensus weight by simply running more nodes, and the Tor team monitors for Sybil attacks on their end and manually removes groups of relays to prevent this as well.


Couple of things to digest here.

First and foremost, I have to preface everything by saying the chances of you being caught by a correlation or other advanced attack that compromises Tor in general are astronomically low. It is much easier for your adversary to find OpSec mistakes you make than to compromise an anonymity network. Even high-level adversaries with basically infinite funding would much rather find a few OpSec mistakes you’ve made throughout the years than spend hundreds of thousands or even millions of dollars to undermine an anonymity network that they themselves use. That being said, it’s not impossible.

While there is no solid evidence of a correlation attack ever being performed on the Tor network alone, there have been some past cases where it looked very suspicious and likely that one was performed. Again, it is very unlikely but NOT impossible. If you hear about a major drug market being busted, the chances that the Tor network was compromised are far, far lower than the chances that an admin used an outdated application, failed to update a program, or was hit with a zero-day in a specific piece of software they use.

While the “20% of nodes being compromised” figure is completely baseless, I will run with it. IF 20% of Tor nodes were malicious, the time it would take for your circuit to be deanonymized would be roughly as follows:

Note: The Tor Browser picks a guard node at first connection, along with a few backup guards. If the current guard becomes unusable, a new one from that initial pool is used. Guards cycle about every month, or sooner if one becomes unusable.

Math

Definitions

  • pG = fraction of guard bandwidth compromised (e.g., 0.2)

  • pE = fraction of exit bandwidth compromised (e.g., 0.2)

  • N = number of circuits you build while on the same primary guard

  • Δt = average time between new circuits that matter (minutes)

Core results

  • Chance of at least one deanonymized circuit after N circuits:

    • P = pG × [1 − (1 − pE)^N]
  • Circuits to first deanonymization if you have a bad guard:

    • E[N | bad guard] = 1 / pE
  • Time to first deanonymization if you have a bad guard:

    • E[T | bad guard] = Δt / pE

Quick plug-in example (pG = pE = 0.2, Δt = 10 min)

  • P after N circuits: 0.2 × [1 − 0.8^N]

  • E[N | bad] = 5 circuits

  • E[T | bad] = 50 minutes
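The core results above can be sketched in a few lines of Python, under the same assumptions as the post (independence, fixed pG/pE, a single adversary; none of this reflects real bandwidth-weighted selection):

```python
# Guard/exit model from the post: a circuit is deanonymized only if BOTH the
# guard and the exit are malicious. Given a bad guard, circuits-to-first-hit
# is geometric in pE, so its expectation is 1/pE.

def p_deanon(pG: float, pE: float, N: int) -> float:
    """Chance of at least one deanonymized circuit after N circuits."""
    return pG * (1 - (1 - pE) ** N)

def expected_circuits(pE: float) -> float:
    """Expected circuits to first deanonymization, given a bad guard."""
    return 1 / pE

def expected_minutes(pE: float, dt: float) -> float:
    """Expected time to first deanonymization, given a bad guard."""
    return dt * expected_circuits(pE)

# Plug-in example from the post: pG = pE = 0.2, dt = 10 minutes
print(p_deanon(0.2, 0.2, 10))     # 0.2 * (1 - 0.8**10) ≈ 0.1785
print(expected_circuits(0.2))     # 5 circuits
print(expected_minutes(0.2, 10))  # 50 minutes
```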

Takeaway: Most users will get safe guards, which makes correlation much, much more difficult. I won’t break down that math, as it’s even more unpredictable than this. IF a user is assigned a malicious guard node, it would take about 50 minutes on average for a circuit to be fully compromised.

With it only taking an average of 50 minutes to deanonymize someone if 20% of nodes were compromised, that alone should tell you it’s unlikely. You would see a lot more arrests if that were the case. A lot more nuance is needed to calculate an actual figure, but you get the idea.

You must understand that to become a guard or exit, you need sustained presence as a middle relay on the Tor network. An adversary can’t just spin up 400 servers at the same hosting company, as the Tor directory authorities would spot that instantly and disable the nodes. A lot of people throw around a number like “They could easily spend $5k and spin up hundreds of malicious servers.” This vastly oversimplifies things. An adversary would need to spin up only a few servers at each of a large number of ASNs/hosting providers to avoid being caught almost instantly. Even then, The Tor Project and volunteers actively look out for malicious nodes, and suspicious relays can be reported to the Directory Authority. Here is a VERY rough breakdown of the spending/infra necessary for control of a specific portion of the network.

Cost/Infra Breakdown
  1. About 2% of guard bandwidth
    • Infra: 8–12 relays × 100–300 Mbit/s each across 6–10 providers

    • Spend: roughly $3k–$7k/month

  2. About 5% of guard bandwidth
    • Infra: 20–40 relays across 15–25 providers

    • Spend: roughly $6k–$18k/month

  3. About 2% of exit bandwidth
    • Exits are pricier (abuse, turnover)

    • Infra: 8–15 exits across 8–12 providers

    • Spend: roughly $5k–$12k/month

  4. About 5% of exit bandwidth
    • Infra: 20–40 exits across 20–35 providers

    • Spend: roughly $12k–$30k/month

  5. Ambitious: ~10% guards + 10% exits
    • 50–150 relays overall; heavy diversity and churn budget

    • Spend: roughly $60k–$200k/month

That is money that could very well just be a waste if discovered by the Tor Directory Authority. It is VERY unlikely that an adversary is controlling anywhere NEAR 20% of the network. If anything, it would be closer to 1%, but even that is unlikely.


The “Cost/Infra Breakdown” was interesting. To most people, that looks like a lot of money; however, to a nation state (or corporation) with an unlimited war chest to use against its own citizens, that is nothing.

And those are only the documented ones; we can assume the iceberg goes a lot deeper than what we can see.

I’ll try to add this to the algorithm later.

I see what you’re saying, but slightly disagree.

This is a small price to a nation-state, but this traffic analysis isn’t targeted. It’s a dragnet that will probably capture a ton of innocent people and wouldn’t be justified by many departments. You are paying $200k per month - and it takes a few months to even be considered to be a guard/exit node. They would be out anywhere from $600k - $1 million just to be at a point where they could be considered a guard/exit node. From there, you would be paying $200k/month and hoping that the Tor Directory Authority doesn’t catch you.

Volunteers as well as the Tor Directory Authority constantly scan the network for suspicious nodes. Also, keep in mind that they would need ALL of their relays to become guard/exit nodes for that math to add up, which is itself mathematically unlikely.

Large adversaries would see this and maybe try, but would likely be thwarted after seeing how difficult it is to sustain. Members of the Tor Directory Authority also personally know many guard/exit relay operators. Many institutions, nonprofits, and citizens run safe guard/exit relays that we can use, making it even more difficult for adversaries to obtain a significant hold on the network.


A more important question is how likely it is that your threat or its allies can purchase data from, or coordinate the efforts of, multiple autonomous systems (ASes). Even for a nation-state entity, running enough Tor nodes to consistently de-anonymize users would be troublesome, especially considering guards. But purchasing (or otherwise acquiring) flow data from the ISPs themselves would be much more realistic and effective. And we already know that at least the US government contracts with a corporation that collects this data in order to deanonymize VPN users. An AS-level adversary, which Tor is not designed to counter, does not necessarily have to be a government.

an adversary can increase her opportunities of performing flow correlation by controlling/wiretapping autonomous systems (ASes) or Internet exchange points (IXPs), and recording the traffic features of the Tor connections that they transit. Several studies [20, 49, 69] demonstrate that specific ASes and IXPs intercept a significant fraction of Tor traffic, therefore are capable of performing flow correlation on Tor at large scale. Others [18, 36, 37, 67, 69] show that an AS-level adversary can further increase her chances of flow correlation by performing various routing manipulations that reroute a larger fraction of Tor connections through her adversarial ASes and IXPs. For instance, Starov et al. [67] recently show that approximately 40% of Tor circuits are vulnerable to flow correlation attacks by a single malicious AS, and Sun et al. [69] show that churn in BGP as well as active manipulation of BGP updates can amplify an adversarial AS’s visibility on Tor connections. This has led to various proposals on deploying AS-aware path selection mechanisms for Tor.

Source: DeepCorr: Strong Flow Correlation Attacks on Tor Using Deep Learning https://arxiv.org/pdf/1808.07285

Surprisingly, close to 30% of all relays are hosted in only 6 ASes and 70 prefixes. Together, these relays represent almost 40% of the bandwidth in the entire Tor network (see Table 5). As such, these few prefixes constitute extremely attractive targets.

Source: RAPTOR: Routing Attacks on Privacy in Tor https://www.princeton.edu/~pmittal/publications/raptor-USENIX15.pdf

A more in-depth scholarly analysis of this issue can be found here: An Extended View on Measuring Tor AS-level Adversaries https://arxiv.org/pdf/2403.08517

Also see my posts here: PSA: VPNs can and probably do compromise your privacy - #13 by Factorial

__

A special note about guard relays: guards rotate approximately every few months, which puts a hard cap on the chance of being fully deanonymized within each guard rotation period. But what’s kind of alarming is that 50.8% of guard selection probability comes from just 7 ASes.
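The "cap per rotation period" idea can be made concrete with a tiny sketch. Here pG (the chance of drawing a malicious guard in one rotation) and the plug-in numbers are purely hypothetical, chosen only to show the shape of the math:

```python
# Why guard rotation caps exposure: within one rotation period you either
# drew a bad guard (probability pG, in reality weighted by guard bandwidth)
# or you didn't, so risk accumulates per rotation period, not per circuit.

def p_bad_guard_ever(pG: float, periods: int) -> float:
    """Chance of selecting at least one malicious guard across rotation periods."""
    return 1 - (1 - pG) ** periods

# Hypothetical example: 5% malicious guard-selection probability, 12 rotations
print(f"{p_bad_guard_ever(0.05, 12):.2%}")  # ≈ 45.96%
```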

__

So all of this depends on your threat model. If your threat is a western government that’s willing to involve its intelligence agencies to deanonymize your traffic, I would assume that said threat could probably correlate traffic from hundreds of Tor nodes at a minimum, primarily focused on the most popular German guard and exit nodes. I would guess that picking a guard node in a very unpopular and not western-aligned AS, using an undiscovered bridge, or using Snowflake would help mitigate the possibility of being deanonymized by an AS-level threat.


$1 million is nothing compared to the budgets they have.

How would the Tor Directory Authority know whether certain nodes are compromised if the adversary isn’t doing something obvious?


Budgets aren’t the bottleneck - covertness is. The hard part is quietly acquiring and maintaining meaningful guard/exit share across diverse ASNs without being clustered or flagged. On top of that, ongoing cost and churn matter. It’s not just a one-time $1M; it’s a sustained cost. Furthermore, the longer and larger the footprint, the more telemetry and anomalies researchers or the Directory Authority can correlate. Scale also increases detection surface. Multiple malicious relay clusters have been identified and removed, which shows that it’s feasible but not trivial to stay undetected at scale.

There is no bulletproof way. A few approaches are behavioral and network heuristics, network topology analysis, active and passive measurements, community intelligence, etc. For example, KAX-17 was discovered by a security researcher who noticed a large number of nodes that didn’t include contact information. This violated Tor’s operational guidelines, and when it was investigated, the nodes were linked together. It’s just stuff like this that happens: you correlate specific nodes and take them all down so they can’t compromise the network.


I am pretty sure that three-letter agencies can open anonymous mail accounts and use them as contact information.
