How probable are 3 compromised Tor relays?

I am not sure if I did the math wrong but it seems to me that 3 compromised Tor relays are not so unlikely.

Under the assumption that 33% of Tor relays are compromised, I made the following calculations:

Probability of getting 3 compromised relays in a circuit: 3.59% (0.33 ^ 3)

The probability of at least one compromised circuit in 100 circuits is about 97.43% (calculation below)
If each circuit has a 3.5937% chance of being fully compromised (from 0.33^3), the probability a single circuit is not compromised is 1 − 0.035937 = 0.964063.
For 100 independent circuits, the probability none are compromised is 0.964063^100 ≈ 0.0257.
So the probability of at least one compromised circuit in 100 attempts is 1 − 0.964063^100 ≈ 0.9743 (about 97.43%).
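
Under the same naive model (every relay independently compromised with probability p, circuits drawn uniformly, no guard pinning), the arithmetic can be checked with a short script:

```python
# Toy model of the calculation above -- NOT how Tor actually selects relays.
def p_circuit_compromised(p: float) -> float:
    """Probability that all three relays in one circuit are compromised."""
    return p ** 3

def p_any_compromised(p: float, n_circuits: int) -> float:
    """Probability that at least one of n independent circuits is compromised."""
    return 1 - (1 - p_circuit_compromised(p)) ** n_circuits

print(f"{p_circuit_compromised(0.33):.4%}")   # 3.5937%
print(f"{p_any_compromised(0.33, 100):.2%}")  # ~97.43%
print(f"{p_any_compromised(0.20, 100):.2%}")  # ~55.21%
```

The independence assumption is exactly what guard pinning breaks, which is the objection raised in the replies.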

I don’t think 33% is a reasonable assumption, and you also aren’t taking into account guard nodes, which are meant to reduce this exact probability.

5 Likes

Even if we assume a low percentage of compromised relays, like 20%, we still get alarming probabilities of compromised circuits:

Probability of a single Tor circuit being compromised: 0.80% (0.2 ^ 3)
Probability of at least one compromised circuit in 100 Tor circuits: 55.21%

20% is not low. Where are you getting these numbers from?

Again, you are not taking into account guard nodes which reduce this probability.

Even so, you are making a lot of assumptions, the biggest two being that 20% of relays are compromised in the first place and that they are all operated by the same attacker.

There is a whole selection algorithm that differentiates between guards, middles, and exits, and takes into account things like whether the relays in a circuit are in neighboring IP space. You can’t increase your consensus weight by simply running more nodes, and the Tor team monitors for Sybil attacks on their end and manually removes groups of relays to prevent this as well.

4 Likes

Couple of things to digest here.

First and foremost, I have to preface everything by saying the chances of you being caught by a correlation or other advanced attack that compromises Tor in general are astronomically low. It is much easier for your adversary to find OpSec mistakes you make than to compromise an anonymity network. Even high-level adversaries with practically infinite funding would much rather find a few OpSec mistakes you’ve made over the years than spend hundreds of thousands or even millions of dollars to undermine an anonymity network that they themselves use. That being said, it’s not impossible.

While there is no solid evidence of a correlation attack ever being performed against the Tor network alone, there have been some cases in the past that looked very suspicious, where it is likely one was performed. Again, it is very unlikely but NOT impossible. If you hear about a major drug market being busted, the chances they compromised the Tor network are far, far lower than the chances of an admin using an outdated application, failing to update a program, or being hit with a zero-day in a specific piece of software they use.

While the “20% of nodes are compromised” figure is completely baseless, I will run with it. IF 20% of Tor nodes were malicious, the time it would take for your circuit to be deanonymized would be roughly as follows:

Note: Tor Browser picks a guard node at first connection, along with a few backup guards. If the current guard goes down, a new node from that initial pool is used. Guards cycle about every month, or sooner if one becomes unusable.

Math

Definitions

  • pG = fraction of guard bandwidth compromised (e.g., 0.2)

  • pE = fraction of exit bandwidth compromised (e.g., 0.2)

  • N = number of circuits you build while on the same primary guard

  • Δt = average time between new circuits that matter (minutes)

Core results

  • Chance of at least one deanonymized circuit after N circuits:

    • P = pG × [1 − (1 − pE)^N]
  • Circuits to first deanonymization if you have a bad guard:

    • E[N | bad guard] = 1 / pE
  • Time to first deanonymization if you have a bad guard:

    • E[T | bad guard] = Δt / pE

Quick plug-in example (pG = pE = 0.2, Δt = 10 min)

  • P after N circuits: 0.2 × [1 − 0.8^N]

  • E[N | bad] = 5 circuits

  • E[T | bad] = 50 minutes
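
The formulas above can be sketched directly. pG, pE, and Δt are the post’s assumed values, not measured network figures:

```python
# Sketch of the deanonymization formulas above, using the post's assumptions.
def p_deanon(pG: float, pE: float, n: int) -> float:
    """Chance of at least one deanonymized circuit after n circuits:
    P = pG * [1 - (1 - pE)^n]."""
    return pG * (1 - (1 - pE) ** n)

def expected_circuits_if_bad_guard(pE: float) -> float:
    """E[N | bad guard] = 1 / pE (mean of a geometric distribution)."""
    return 1 / pE

def expected_minutes_if_bad_guard(pE: float, dt_minutes: float) -> float:
    """E[T | bad guard] = dt / pE."""
    return dt_minutes / pE

print(round(p_deanon(0.2, 0.2, 10), 4))        # ~0.1785 after 10 circuits
print(expected_circuits_if_bad_guard(0.2))     # 5.0 circuits
print(expected_minutes_if_bad_guard(0.2, 10))  # 50.0 minutes
```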

Takeaway: Most users will get safe guards, which makes correlation much, much more difficult. I won’t break down that math, as it’s even more unpredictable than this. IF a user is assigned a malicious guard node, it would take about 50 minutes on average for a circuit to be fully compromised.

With it only taking an average of 50 minutes to deanonymize someone given 20% of nodes are compromised, that alone should tell you it’s unlikely. You would see a lot more arrests of people if that was the case. A lot more nuance is needed to calculate an actual amount, but you get the idea.

You must understand that to become a guard or exit, you need sustained presence as a middle relay on the Tor network. An adversary can’t just spin up 400 servers at the same hosting company, as the Tor directory authorities would spot that instantly and disable the nodes. A lot of people throw around a number like “They could easily spend $5k and spin up hundreds of malicious servers.” This is vastly taken out of context. An adversary would need to spin up only a few servers at each of a large number of ASNs/hosting providers to avoid being caught almost instantly. Even then, the Tor Project and volunteers actively look out for malicious nodes, which can be reported to the Directory Authority. Here is a VERY rough breakdown of the spending/infra necessary for control of a specific portion of the network.

Cost/Infra Breakdown
  1. About 2% of guard bandwidth
    • Infra: 8–12 relays × 100–300 Mbit/s each across 6–10 providers

    • Spend: roughly $3k–$7k/month

  2. About 5% of guard bandwidth
    • Infra: 20–40 relays across 15–25 providers

    • Spend: roughly $6k–$18k/month

  3. About 2% of exit bandwidth
    • Exits are pricier (abuse, turnover)

    • Infra: 8–15 exits across 8–12 providers

    • Spend: roughly $5k–$12k/month

  4. About 5% of exit bandwidth
    • Infra: 20–40 exits across 20–35 providers

    • Spend: roughly $12k–$30k/month

  5. Ambitious: ~10% guards + 10% exits
    • 50–150 relays overall; heavy diversity and churn budget

    • Spend: roughly $60k–$200k/month

That is money that could very well just be a waste if discovered by the Tor Directory Authority. It is VERY unlikely that an adversary is controlling anywhere NEAR 20% of the network. If anything, it would be closer to 1%, but even that is unlikely.

3 Likes

The “Cost/Infra Breakdown” was interesting. To most people, that looks like a lot of money; however, to a nation state (or corporation) with an unlimited war chest to use against its own citizens, that is nothing.

And those are only the documented ones; we can assume the iceberg goes a lot deeper than what we can see.

I’ll try to add this to the algorithm later.

1 Like

I see what you’re saying, but slightly disagree.

This is a small price to a nation-state, but this traffic analysis isn’t targeted. It’s a dragnet that will probably capture a ton of innocent people and wouldn’t be justifiable for many departments. You are paying $200k per month, and it takes a few months for a relay to even be considered for the guard/exit flags. They would be out anywhere from $600k to $1 million just to reach the point where their relays could serve as guard/exit nodes. From there, they would be paying $200k/month and hoping that the Tor Directory Authority doesn’t catch them.

Volunteers as well as the Tor Directory Authority constantly scan the network for suspicious nodes. Also, keep in mind that they would need ALL of their relays to become guard/exit nodes for that math to add up, which is itself mathematically unlikely.

Large adversaries would see this and maybe try, but would likely be thwarted after seeing how difficult it is to sustain. Members of the Tor Directory Authority also personally know many guard/exit relay operators. Many institutions, nonprofits, and citizens run safe guard/exit relays that we can use and make it even more difficult for adversaries to obtain significant hold of the network.

1 Like

A more important question is how likely it is that your threat or its allies can purchase data from, or coordinate the efforts of, multiple autonomous systems (ASes). Even for a nation-state entity, running enough Tor nodes to consistently de-anonymize users would be troublesome, especially considering guards. But purchasing (or otherwise acquiring) flow data from the ISPs themselves would be much more realistic and effective. And we already know the US government contracts with at least one corporation that collects this data in order to deanonymize VPN users. An AS-level adversary, which Tor is not designed to counter, does not necessarily have to be a government.

an adversary can increase her opportunities of performing flow correlation by controlling/wiretapping autonomous systems (ASes) or Internet exchange points (IXPs), and recording the traffic features of the Tor connections that they transit. Several studies [20, 49, 69] demonstrate that specific ASes and IXPs intercept a significant fraction of Tor traffic, therefore are capable of performing flow correlation on Tor at large scale. Others [18, 36, 37, 67, 69] show that an AS-level adversary can further increase her chances of flow correlation by performing various routing manipulations that reroute a larger fraction of Tor connections through her adversarial ASes and IXPs. For instance, Starov et al. [67] recently show that approximately 40% of Tor circuits are vulnerable to flow correlation attacks by a single malicious AS, and Sun et al. [69] show that churn in BGP as well as active manipulation of BGP updates can amplify an adversarial AS’s visibility on Tor connections. This has led to various proposals on deploying AS-aware path selection mechanisms for Tor.

Source: DeepCorr: Strong Flow Correlation Attacks on Tor
Using Deep Learning https://arxiv.org/pdf/1808.07285

Surprisingly, close to 30% of all relays are hosted in only 6 ASes and 70 prefixes. Together, these relays represent almost 40% of the bandwidth in the entire Tor network (see Table 5). As such, these few prefixes constitute extremely attractive targets.

Source: RAPTOR: Routing Attacks on Privacy in Tor https://www.princeton.edu/~pmittal/publications/raptor-USENIX15.pdf

A more in-depth scholarly analysis of this issue can be found here: An Extended View on Measuring Tor AS-level Adversaries https://arxiv.org/pdf/2403.08517

Also see my posts here: PSA: VPNs can and probably do compromise your privacy - #13 by Factorial

__

A special note about guard relays. Guards rotate approximately every few months, which puts a hard cap on the chance of being fully deanonymized within each guard rotation period. But what’s kind of alarming is that 50.8% of guard selection probability comes from just 7 ASes.
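
To illustrate that hard cap with a toy model of my own (not from this thread): if each rotation independently picks a compromised guard with probability pG, cumulative exposure after k rotations is 1 − (1 − pG)^k:

```python
# Toy model: guard picks are independent and uniform per rotation, which
# ignores bandwidth weighting and AS concentration. pG is an assumption.
def p_bad_guard_ever(pG: float, k: int) -> float:
    """Chance of having drawn at least one compromised guard in k rotations."""
    return 1 - (1 - pG) ** k

# e.g. pG = 0.01 (1% of guard selection probability malicious),
# assuming roughly 3 rotations per year at a few months per guard:
for years in (1, 2, 5):
    k = years * 3
    print(years, round(p_bad_guard_ever(0.01, k), 4))  # 0.0297, 0.0585, 0.1399
```

The point of long-lived guards is visible here: exposure grows with the number of rotations, so rotating rarely keeps the cumulative risk low.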

__

So all of this depends on your threat model. If your threat is a western government that’s willing to involve its intelligence agencies to deanonymize your traffic, I would assume that said threat could probably correlate traffic from hundreds of Tor nodes at a minimum, primarily focused on the most popular German guard and exit nodes. I would guess that picking a guard node in a very unpopular and non-western-aligned AS, using an undiscovered bridge, or using Snowflake would help mitigate the possibility of being deanonymized by an AS-level threat.

3 Likes

$1 million is nothing compared to the budgets they have.

How would the Tor Directory Authority know certain nodes are compromised if the adversary isn’t doing anything obvious?

1 Like

Budgets aren’t the bottleneck - covertness is. The hard part is quietly acquiring and maintaining meaningful guard/exit share across diverse ASNs without being clustered or flagged. On top of that, ongoing cost and churn matter. It’s not just a one-time $1M; it’s a sustained cost. Furthermore, the longer and larger the footprint, the more telemetry and anomalies researchers or the Directory Authority can correlate. Scale also increases detection surface. Multiple malicious relay clusters have been identified and removed, which shows that staying undetected at scale is feasible but not trivial.

There is no bulletproof way. A few ways could be behavioral and network heuristics, network topology, active and passive measurements, community intelligence, etc. For example, KAX-17 was discovered by a security researcher who noticed a large number of nodes that didn’t include contact information. This violated Tor’s operational guidelines and when they investigated it, they linked them together. It’s just stuff like this that happens and you correlate specific nodes and take them all down to not let them compromise the network.

2 Likes

I am pretty sure that 3 letter agencies can open anonymous mail accounts and use them for contact information.

1 Like

Sure they could. I never said they couldn’t. I am saying that when you have a large number of nodes whose ownership is unknown and hundreds pop up in a mere week, it’s obvious something fishy is going on. They would need to slowly introduce nodes while staying under the radar. Please do more research before assuming the NSA can just magically stay undercover the whole time.

The simple truth of the matter is that they don’t even need to compromise Tor nodes to analyze traffic. Especially with the use of AI-assisted traffic analysis, the days of any agency needing to compromise the network are over. They can simply monitor ASes and IXPs and correlate traffic. No need to spend millions a month or anything. Do research on that and dive into that rabbit hole.

They usually never need to go this far. Compromising the Tor network would be very difficult, even for the NSA which people presume have magic powers to catch anyone they want. Even with AI-assisted traffic analysis and passive monitoring of popular ASes and IXPs, the easiest way is, believe it or not, OpSec mistakes. People talking too much, leaving too much information in what they share or upload, or being careless.

I think these discussions are important, but never forget the silent but deadly killer. OpSec mistakes will always be the #1 way people get caught. You’re human, of course you will mess up. Make sure you have things in place to control that mistake. Compartmentalize. Don’t ruminate over if the NSA controls hundreds of Tor nodes, because they most likely do not. You can link every publicly available document from DN arrests to simple and careless OpSec mistakes. There is no credible evidence that ANY arrest was due to Tor compromise. If you believe there is, go ahead and link it.

Parallel construction was used to hide the fact that the Enigma machine encryption had been cracked. Operations based on intelligence recovered this way were authorized if there was a plausible alternate means for the data to have been obtained.

5 Likes

In the case where the child porn forum “BoysTown” was taken down, it was directly noted in the news that the admins were arrested after German police performed traffic analysis.

The “Liberty Lane” string of deanonymizations of CP viewers also seems to have been a case of deanonymization via malicious guard nodes, but the details are murky since it’s “TOP SECRET”.

If I’m not mistaken - both happened around the KAX-17 incident.

Your argument is based on the assumption that any adversary that tries to compromise Tor has to do it in a small amount of time.
But any adversary could just put randomized delays between their node onboarding to prevent any correlation between them.

We have data and time padding for this.
And of course it helps when the user pool is bigger.

2 Likes

This is a possibility. I cannot completely rule out the use of parallel construction, but I would like to make a few points.

  • Several high-profile cases in the US and other countries have shown governments don’t necessarily try to hide their novel methods. In the proceedings after Operation Pacifier, the FBI stated they used a NIT to obtain information. This led to open suppression battles, case dismissals, and other problems when trying to prosecute those charged. If it was a parallel construction, it was not a very clean one.
  • The Enigma machine coverup was a wartime military-intelligence effort, not a peacetime criminal-justice proceeding. I think the two are vastly different, and many would agree. A more fitting comparison would be the DOJ/DEA using parallel construction to arrest people based on illegally obtained information, staging “random” searches and seizures with drug dogs on cars they already knew contained drugs while hiding their prior knowledge.

Back to the main point though, the reason you see so many seemingly obvious ways people got caught isn’t because it’s parallel construction, it’s because people really mess up and get caught. As stated earlier, it’s entirely possible there have been cases in the past that have been constructed in a way to obscure the original data source, but again, there is no credible evidence of this either.

Correct, but with a few caveats.

Traffic analysis is not a Tor compromise; it is a side-effect of what’s possible in any low-latency anonymity network, just as a server seizure isn’t a “Tor compromise” and a NIT deployed on an onion site isn’t a “Tor compromise”.

Going deeper, you must also look at the entire case and how they investigated it.

The Boystown admins were using Ricochet, which lacked several traffic-correlation mitigations later used by Ricochet Refresh. Investigators leveraged this by sending Ricochet messages to the admins at specific times. While doing this, they monitored a large set of Tor middle relays to watch for traffic bursts with the characteristic size and timing of the Ricochet message.

When a monitored relay saw the matching burst, investigators could infer that this relay was on the admin’s circuit and, from there, identify the admin’s entry guard. This is a form of guard discovery via active correlation.

With the guard identified, authorities then subpoenaed the ISP and obtained records of which subscribers were connected to that guard at the precise message times. That yielded a short list of candidate IP addresses. From there, traditional tactics were likely used, such as surveillance, subpoenaing accounts and other services, and search warrants to confirm which suspect controlled the admin account.
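
The guard-discovery idea described above can be caricatured in a few lines. Everything here (relay names, timestamps, the `match_score` helper) is invented for illustration; the real attack correlates traffic volume and timing signatures, not clean timestamps:

```python
# Hypothetical sketch: send messages at known times, then check which
# monitored relay saw a matching traffic burst shortly after each send.
def match_score(send_times, burst_times, tolerance=2.0):
    """Count sends followed by a burst within `tolerance` seconds."""
    return sum(
        any(0 <= b - s <= tolerance for b in burst_times)
        for s in send_times
    )

sends = [100.0, 250.0, 400.0]            # times we messaged the target
relays = {
    "relay_A": [101.2, 251.0, 400.9],    # bursts line up with every send
    "relay_B": [130.0, 300.0],           # unrelated traffic
}
scores = {name: match_score(sends, bursts) for name, bursts in relays.items()}
print(scores)  # relay_A scores 3, relay_B scores 0
```

A relay that matches every probe is very likely on the target’s circuit, which is what lets investigators walk backwards toward the guard.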

None of this screams “Tor compromise”. It was a lengthy investigation, tiring and time-consuming for the investigators, that only led to 4 arrests. According to reports, there were over 400,000 members and only 4 got nabbed in this massive investigation. Don’t you think that if they had broken Tor, they would have grabbed a few more and said they were caught in it too? Even just an extra handful of producers or high-level contributors?

I remember hearing about Liberty Lane before it even hit Reddit, and at the time I was skeptical; now I am even more so. I remember seeing the original Google Sheet of cases people speculated were linked to Liberty Lane, and I kept a close eye on it as it expanded with more and more names and cases.

One thing you’ll notice if you get your hands on that Google Sheet (it should be in the Reddit comments of the original Liberty Lane post, iirc) is that all the dates were in or before 2019. This was before v3 onions were widely adopted and long before v2 onions were deprecated. People at the time speculated that it had something to do with v2 onions being compromised. While we know v2 onions were obviously weaker than v3, saying they were compromised is a long reach. v2 onions were much easier to perform correlation attacks on; this could’ve resulted in a few arrests, but not a network-wide compromise. v3 onions, the ones we use now, are much harder to perform analysis on, though not impossible. Vanguards is also now built in, further strengthening defenses against guard discovery.

I think it’s good to be wary, but you can only worry so much about the unknown. I do not believe any government has a golden key to the Tor network. It has always been the case that if you’re a big enough target, there is a chance you get caught: one mistake that leads to your guard node being compromised or discovered can lead to the IP you use at home, and you’re caught. Take precautions. The same applies here. I think what happened with Liberty Lane was similar to what happened with BoysTown: they used traffic correlation at ISPs/IXPs, confirmed who was using which nodes to connect to illegal sites, and nabbed them after further investigation confirmed it.

Correct. KAX-17 gets its name from being recognized as originating in 2017. In 2019, the Tor Directory Authority (DA) removed many suspicious nodes (not originally attributed to KAX-17 but later identified as such) and continued removing them up until 2021, when a massive network purge was attributed to KAX-17 nodes.

Most of these relays were middle relays, and very few of them had the guard flag, meaning it wasn’t possible to get one of them as a guard. From this, it is likely that if a government was running this operation, it would be akin to what happened in the BoysTown case: they would use the middle relays to try to find your guard node and then narrow down who you are using regular investigative techniques. It is not something that is all-encompassing, and it would still take a lot of work to deanonymize someone. Furthermore, it is not targeted, so if you were looking for a specific someone, you would need to wait until they connected to your nodes, if ever.

Sidenote: It is believed that BTCMITM20 could be the same threat actor as KAX-17. While the motive for KAX-17 spinning up a bunch of middle relays is unknown, the motive for BTCMITM20, as the name implies, was to modify BTC addresses by running a bunch of exit relays: if you entered a BTC address while connected to one of these exits and attempted to send money, it would go to the threat actor’s wallet. Whether the two are related is still debated, though.

1 Like

That was just an example. I was trying to point out that you would NEED to be very covert and that it’s not a simple process. You are just restating my exact point; reread the second-to-last sentence: “They would need to slowly introduce nodes while staying under the radar.” But again, this doesn’t guarantee they stay under wraps. They could be removed at any time by the Tor DA, so they would want to act quickly or risk losing all their nodes. They would need to find a sweet spot between moving as quickly as possible and not being caught.

Yes, there is padding in the Tor network, but it is minimal in order to maintain the network’s low latency. Every low-latency network is susceptible to end-to-end correlation, including Tor even with these protections. They only protect against very simple time correlation and would not thwart a highly skilled adversary that can see traffic near both the client and the destination.