How NAT Traversal Works

An easy-to-understand and comprehensive introduction to NAT.

2023/07/03

Origin

This post is my summary of "How NAT traversal works", originally published by Tailscale. See the original post: https://tailscale.com/blog/how-nat-traversal-works/. A Simplified Chinese translation is available here: https://arthurchiao.art/blog/how-nat-traversal-works-zh/

Tailscale is a company that mainly offers secure remote access solutions for enterprises. Its product uses WireGuard to establish secure connections, and for individuals it is an excellent way to reach remote devices as directly as possible.

Reason

I have wanted stable, fast, and unlimited access to the devices in my home ever since I entered college. However, most remote access solutions require a relay machine with a public IP address that accepts inbound connections, which is hard to come by for students in Mainland China: VPSes hosted in China have little bandwidth and expensive traffic, while those hosted overseas usually suffer from high latency and low speed because of the GFW (Great Firewall).

With the relay route impractical, I had to look for alternatives. NAT is a significant obstacle to direct connectivity, particularly when several NATs sit between the machines, as they do between a college and a home network (even more so once the routers at home or in the dormitory are counted). That is when I started studying NAT and how it works, which led me to discover and then use Tailscale.

Summary

NAT traversal is unnecessary when either end has a public IP address (IPv4 or IPv6) and accepts inbound connections (TCP or UDP, on any port), because the other end can connect to it directly without any sophisticated steps. WireGuard, which runs over UDP, can be useful when only inbound UDP is allowed.

In other cases, we can usually use STUN (Session Traversal Utilities for NAT) to discover the public IP address and port that each end's NAT maps it to. If the NAT keeps using that same public address and port when the end later sends to its peer instead of the STUN server, the traversal succeeds. Otherwise, it fails.
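The discovery step can be sketched in a few lines of Python. This is a minimal illustration of the STUN Binding message layout from RFC 5389, not the code Tailscale uses; the function names are my own, and `encode_binding_success` is a hypothetical stand-in for a server so the example runs without network access.

```python
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442       # fixed value defined by RFC 5389
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020


def build_binding_request(txn_id=None):
    """Return a 20-byte STUN Binding Request with no attributes."""
    if txn_id is None:
        txn_id = os.urandom(12)
    # Header: type, body length, magic cookie, 96-bit transaction ID.
    return struct.pack("!HHI12s", BINDING_REQUEST, 0, MAGIC_COOKIE, txn_id)


def parse_xor_mapped_address(message):
    """Extract the (ip, port) the server saw from a Binding success response."""
    if len(message) < 20:
        raise ValueError("message shorter than a STUN header")
    msg_type, length, cookie = struct.unpack_from("!HHI", message, 0)
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE:
        raise ValueError("not a STUN Binding success response")
    offset, end = 20, min(20 + length, len(message))
    while offset + 4 <= end:
        attr_type, attr_len = struct.unpack_from("!HH", message, offset)
        value = message[offset + 4:offset + 4 + attr_len]
        # Only the IPv4 family (0x01) is handled in this sketch.
        if attr_type == XOR_MAPPED_ADDRESS and len(value) >= 8 and value[1] == 0x01:
            xport, xaddr = struct.unpack_from("!HI", value, 2)
            port = xport ^ (MAGIC_COOKIE >> 16)
            ip = socket.inet_ntoa(struct.pack("!I", xaddr ^ MAGIC_COOKIE))
            return ip, port
        # Attributes are padded to a multiple of 4 bytes.
        offset += 4 + attr_len + (-attr_len % 4)
    raise ValueError("no IPv4 XOR-MAPPED-ADDRESS attribute found")


def encode_binding_success(txn_id, ip, port):
    """Hypothetical server side: build the reply a STUN server would send."""
    xport = port ^ (MAGIC_COOKIE >> 16)
    xaddr = struct.unpack("!I", socket.inet_aton(ip))[0] ^ MAGIC_COOKIE
    attr = struct.pack("!HHBBHI", XOR_MAPPED_ADDRESS, 8, 0, 0x01, xport, xaddr)
    header = struct.pack("!HHI12s", BINDING_SUCCESS, len(attr), MAGIC_COOKIE, txn_id)
    return header + attr
```

To try it against a real server, send the output of `build_binding_request()` from a UDP socket and pass the reply to `parse_xor_mapped_address`. A real client should also check that the transaction ID in the reply matches the request, which this sketch leaves out.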

Picky NAT

Now that we’ve discovered that not all NAT devices behave in the same way, we should talk terminology. If you’ve done anything related to NAT traversal before, you might have heard of “Full Cone”, “Restricted Cone”, “Port-Restricted Cone” and “Symmetric” NATs. These are terms that come from early research into NAT traversal.

That terminology is honestly quite confusing. I always look up what a Restricted Cone NAT is supposed to be. Empirically, I’m not alone in this, because most of the internet calls “easy” NATs Full Cone, when these days they’re much more likely to be Port-Restricted Cone.

More recent research and RFCs have come up with a much better taxonomy. First of all, they recognize that there are many more varying dimensions of behavior than the single “cone” dimension of earlier research, so focusing on the cone-ness of your NAT isn’t necessarily helpful. Second, they came up with words that more plainly convey what the NAT is doing.

The “easy” and “hard” NATs above differ in a single dimension: whether or not their NAT mappings depend on what the destination is. RFC 4787 calls the easy variant “Endpoint-Independent Mapping” (EIM for short), and the hard variant “Endpoint-Dependent Mapping” (EDM for short). There’s a subcategory of EDM that specifies whether the mapping varies only on the destination IP, or on both the destination IP and port. For NAT traversal, the distinction doesn’t matter. Both kinds of EDM NATs are equally bad news for us.

In the grand tradition of naming things being hard, endpoint-independent NATs still depend on an endpoint: each source ip:port gets a different mapping, because otherwise your packets would get mixed up with someone else’s packets, and that would be chaos. Strictly speaking, we should say “Destination Endpoint Independent Mapping” (DEIM?), but that’s a mouthful, and since “Source Endpoint Independent Mapping” would be another way to say “broken”, we don’t specify. Endpoint always means “Destination Endpoint.”
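The difference between the two mapping behaviors is easy to see in a toy model. The sketch below is only an illustration: `ToyNat`, its port counter starting at 40000, and the addresses are all made up, and a real NAT also expires mappings and filters inbound traffic. The only point is which key the mapping table uses.

```python
import itertools


class ToyNat:
    """Toy NAT that only models how outbound mappings are chosen."""

    def __init__(self, public_ip, endpoint_independent):
        self.public_ip = public_ip
        self.endpoint_independent = endpoint_independent
        self._next_port = itertools.count(40000)  # arbitrary starting port
        self._mappings = {}

    def map_outbound(self, src, dst):
        """Return the public (ip, port) used for a packet from src to dst."""
        # EIM: the key is the source ip:port only.
        # EDM: the key also includes the destination, so each new
        # destination gets a fresh public port.
        key = src if self.endpoint_independent else (src, dst)
        if key not in self._mappings:
            self._mappings[key] = next(self._next_port)
        return (self.public_ip, self._mappings[key])


src = ("192.168.1.10", 5000)
stun_server = ("203.0.113.5", 3478)
peer = ("203.0.113.99", 41641)

easy = ToyNat("198.51.100.1", endpoint_independent=True)
hard = ToyNat("198.51.100.2", endpoint_independent=False)

# Easy NAT: the address learned via STUN is also the one the peer will see.
easy_same = easy.map_outbound(src, stun_server) == easy.map_outbound(src, peer)
# Hard NAT: the address learned via STUN is useless for reaching the peer.
hard_same = hard.map_outbound(src, stun_server) == hard.map_outbound(src, peer)
print(easy_same, hard_same)
```

Even in the endpoint-independent case, a second source such as `("192.168.1.11", 5000)` still gets its own public port, which is the "still depend on an endpoint" caveat from the paragraph above.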

In the original post, Tailscale splits the whole process into two parts: NAT and firewall. The NAT handles the outbound mapping, while the firewall enforces the inbound rules. This division makes it clear that the inbound rules are what decide whether traversal succeeds, so the cone-style classification of NAT types is not important.
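The firewall half of that split can be modelled separately from the mapping. Here is a rough sketch, assuming a typical stateful firewall that admits an inbound packet only if the local side has already sent something to that exact remote ip:port. The class name and addresses are hypothetical, and real firewalls also time out idle entries.

```python
class ToyStatefulFirewall:
    """Toy stateful firewall: inbound is allowed only as a 'reply'."""

    def __init__(self):
        self._seen = set()  # (local, remote) pairs we have sent to

    def send(self, local, remote):
        """Record an outbound packet from local to remote."""
        self._seen.add((local, remote))

    def allows_inbound(self, remote, local):
        """Allow inbound only if local has already sent to this remote."""
        return (local, remote) in self._seen


a = ("198.51.100.1", 40000)   # public endpoint of peer A
b = ("203.0.113.9", 41000)    # public endpoint of peer B

fw_a = ToyStatefulFirewall()
blocked_before = fw_a.allows_inbound(b, a)   # B's first packet is dropped
fw_a.send(a, b)                              # A sends to B's endpoint
allowed_after = fw_a.allows_inbound(b, a)    # now B looks like a reply
print(blocked_before, allowed_after)
```

When both peers send to each other's public endpoint at roughly the same time, each firewall sees the other side's packets as replies, which is why knowing the peer's public ip:port in advance matters.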

The rest of the post covers other topics, including brute-force port guessing, NAT64/DNS64, ICE, and more. These rarely matter in ordinary setups, so I leave them out here.