First, a short recap of what this project is doing: the Wi-Fi stack for the ESP32 (a popular, cheap microcontroller) is provided as a binary blob. We’re trying to reverse engineer the software and hardware to build our own open-source Wi-Fi stack. This will enable features that the current, closed-source ESP32 Wi-Fi implementation does not have, for example 802.11s mesh networking. It will also improve the auditability of the code. We’re currently writing a new Wi-Fi stack for the ESP32 from scratch, in Rust. For more details about how we reverse engineered the hardware, see previous articles.

In this article, we’ll talk about what mesh networking is, the existing options for it on the ESP32, why those are not ideal, and what we’re doing about it.

What is a mesh network?

A standard Wi-Fi network has two types of wireless nodes: clients (think phone, laptop, …) and an access point. Clients associate with the access point, after which they can exchange data packets with it. The problem with this setup is that a client that moves out of range of the access point loses connectivity. Mesh networking was invented to help solve this: in a mesh, clients can act as a sort of repeater that relays packets for other clients. That way, even clients that are not in range of the original access point can still have internet connectivity.
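To make the difference concrete, here is a minimal sketch that counts how many hops each node is away from the access point. Everything in it is an assumption for illustration: node 0 stands for the access point, and `links[i]` is a hypothetical list of the nodes that node `i` exchanges packets with. It models no real Wi-Fi behaviour, only reachability.

```rust
use std::collections::VecDeque;

/// Breadth-first search starting at the access point (node 0).
/// `links[i]` lists the nodes that node `i` exchanges packets with.
/// Returns the hop count for every node, or `None` if it is unreachable.
fn hops_from_ap(links: &[Vec<usize>]) -> Vec<Option<u32>> {
    let mut hops: Vec<Option<u32>> = vec![None; links.len()];
    if links.is_empty() {
        return hops;
    }
    let mut queue: VecDeque<usize> = VecDeque::new();
    hops[0] = Some(0);
    queue.push_back(0);
    while let Some(node) = queue.pop_front() {
        // Every node in the queue already has a hop count assigned.
        let next_hop = hops[node].unwrap() + 1;
        for &neighbour in &links[node] {
            if hops[neighbour].is_none() {
                hops[neighbour] = Some(next_hop);
                queue.push_back(neighbour);
            }
        }
    }
    hops
}
```

With `links = [[1], [0], []]` (client 2 is out of range and nobody relays for it), node 2 stays unreachable. Once client 1 relays for client 2, `links = [[1], [0, 2], [1]]`, node 2 becomes reachable at two hops.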

Regular Wi-Fi network

Regular Wi-Fi network

Mesh Wi-Fi network

Mesh Wi-Fi network

Existing solutions

Espressif (the company behind the ESP32) already provides several mesh implementations that use the radio peripheral on the ESP32:

ESP-MESH (= ESP-WIFI-MESH)
ESP-MESH-LITE
ESP-BLE-MESH

ESP-MESH

ESP-MESH (which they also call ESP-WIFI-MESH) is based on each ESP32 having two interfaces: one SoftAP (access point mode) interface for downstream connections, and one station interface for the upstream connection.

Topology of ESP-MESH

As you can see, the nodes in ESP-MESH form a tree topology. This is not ideal: every packet leaving the mesh has to go through the root node, even if other nodes are themselves within reach of the AP.

ESP-MESH does not provide real IP connectivity to nodes. Nodes can only use a limited set of protocols to communicate with outside servers: they can send an HTTP, MQTT, JSON or raw binary payload. This message is then sent through the tree until it reaches the root node, which initiates the actual HTTP/MQTT/… request with the data in the packet. This is quite restrictive: only the root node has an active TCP/IP stack, so the child nodes can’t use arbitrary TCP/IP protocols (for example, SSH); they can only communicate in terms of HTTP/MQTT/…
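The upward forwarding in such a tree can be sketched in a few lines. This is only an illustration of the topology, not Espressif's implementation: `parent` is a hypothetical table where `parent[i]` holds the upstream node of node `i`, and the root has no parent.

```rust
/// Follows the upstream links from `start` until the root is reached.
/// `parent[i]` is the upstream node of node `i`; the root has `None`.
/// Returns every node the message passes through, starting with `start`.
fn path_to_root(parent: &[Option<usize>], start: usize) -> Vec<usize> {
    let mut path = vec![start];
    let mut current = start;
    while let Some(up) = parent[current] {
        // A valid tree never needs more hops than there are nodes;
        // stop instead of looping forever on malformed input.
        if path.len() >= parent.len() {
            break;
        }
        path.push(up);
        current = up;
    }
    path
}
```

For `parent = [None, Some(0), Some(0), Some(1), Some(3)]`, a message from node 4 travels through nodes 3 and 1 before it arrives at the root, node 0. There is exactly one such path per node, which is why the root becomes a bottleneck.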

Mesh bidirectional stream

A third drawback of ESP-MESH is that you can’t easily connect to nodes in the mesh from another client connected to the access point.

ESP-MESH-LITE

To solve some of these drawbacks, Espressif created ESP-MESH-LITE, which confusingly has a better feature set than ESP-MESH. I suppose the -LITE suffix is there because it uses fewer resources?

ESP-MESH-LITE also uses the two-interface solution, and it also has a tree topology. However, instead of only allowing certain types of payload over the mesh, every node has an active IP stack and uses NAPT (Network Address and Port Translation, basically NAT) to translate IP addresses on each node. This lets nodes in the mesh initiate arbitrary TCP/IP connections to the outside, but you still can’t easily initiate connections from outside the mesh to nodes in the mesh. What you can do is set up NAT port forwarding rules on every node to forward incoming connections from outside the mesh to the correct nodes below. That is tedious, and you also need to decide beforehand which ports go to which host/port combination.
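The asymmetry between outgoing and incoming connections follows directly from how a NAPT table works. Below is a minimal sketch of such a table; it is not the implementation used by ESP-MESH-LITE, and the `Napt` type, its method names and the starting port 40000 are all made up for this example.

```rust
use std::collections::HashMap;
use std::net::Ipv4Addr;

/// An (IP address, port) pair.
type Endpoint = (Ipv4Addr, u16);

/// Hypothetical NAPT table of a single mesh node.
struct Napt {
    external_ip: Ipv4Addr,
    next_port: u16,
    /// External port -> endpoint below this node.
    by_external_port: HashMap<u16, Endpoint>,
    /// Endpoint below this node -> external port.
    by_internal: HashMap<Endpoint, u16>,
}

impl Napt {
    fn new(external_ip: Ipv4Addr) -> Self {
        Napt {
            external_ip,
            next_port: 40000, // arbitrary start of the dynamic range
            by_external_port: HashMap::new(),
            by_internal: HashMap::new(),
        }
    }

    /// Outgoing packet: the source is rewritten to this node's own address.
    /// A mapping is created on first use and reused afterwards.
    /// Sketch only: no expiry, no port exhaustion handling, and no check
    /// against collisions with static port forwards.
    fn translate_outbound(&mut self, src: Endpoint) -> Endpoint {
        if let Some(&port) = self.by_internal.get(&src) {
            return (self.external_ip, port);
        }
        let port = self.next_port;
        self.next_port += 1;
        self.by_internal.insert(src, port);
        self.by_external_port.insert(port, src);
        (self.external_ip, port)
    }

    /// Static rule: incoming traffic on `external_port` goes to `target`.
    fn add_port_forward(&mut self, external_port: u16, target: Endpoint) {
        self.by_external_port.insert(external_port, target);
    }

    /// Incoming packet: `None` means there is no mapping and it is dropped.
    fn translate_inbound(&self, dst_port: u16) -> Option<Endpoint> {
        self.by_external_port.get(&dst_port).copied()
    }
}
```

An outgoing connection creates its own mapping, so replies find their way back. An incoming connection to, say, port 22 finds no mapping and is dropped unless `add_port_forward` was called for it beforehand, and in a multi-level tree that has to be done on every node along the path.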

Mesh bidirectional stream

However, recently, Espressif posted the following note in the ESP-MESH-LITE repository:

Dear Developer Friends, Thank you sincerely for your continued support and interest in the ESP-Mesh-Lite solution. In response to current technology development needs, we would like to make the following announcement: Starting today, the ESP-Mesh-Lite open-source repository will enter Maintenance Mode, during which only essential security patches and critical bug fixes will be provided. Future technical iterations and commercial support will be transitioned to our private solution repository, designed specifically for enterprise users. The private repository will offer:
Access to tailored feature development to meet your specific project requirements
Early access to pre-release technical updates for faster integration and validation
Dedicated one-on-one technical support to ensure smooth development and timely issue resolution
To ensure uninterrupted project continuity and access to technical support, we recommend contacting us via email at sales@espressif.com to join our private solution repository program.

ESP-BLE-MESH

Espressif also provides Bluetooth meshing for the ESP32. This is not that interesting for this project since, as far as I can see, they don’t really do TCP/IP over Bluetooth. The esp32-open-mac project itself is currently also only concerned with Wi-Fi, not Bluetooth.

Why 802.11s is better

ESP-MESH and ESP-MESH-LITE are both Espressif-proprietary protocols, so other devices likely won’t be able to interoperate in their mesh networks without significant engineering work. Some third-party devices might be able to associate with the SoftAP interface, but they likely won’t be able to provide downstream connectivity for other nodes.

The Wi-Fi standard (IEEE 802.11) actually specifies a standard for mesh communication: 802.11s. This standard has several advantages over using ESP-MESH or ESP-MESH-LITE:

It’s specified in a standard, so other mesh devices can interoperate, as long as they follow the standard
It does its meshing on the MAC layer instead of on the IP-layer (ESP-MESH-LITE) or on the application layer (ESP-MESH). This makes the mesh basically transparent for other devices connected to the root access point: these other devices can directly connect to devices in the mesh and vice versa.
It does not force a tree topology: each node in the mesh can have multiple paths to the router. That way, the mesh network can cope better with nodes going down.
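The last point can be illustrated with a small reachability check. This is a toy model under stated assumptions, not an implementation of 802.11s path selection: node 0 stands for the router, `links` is a hypothetical list of bidirectional radio links, and `failed` is a node that has gone down.

```rust
/// Returns true if `node` can still reach the router (node 0)
/// while the node `failed` is down. Links are bidirectional.
fn reaches_router(
    links: &[(usize, usize)],
    node_count: usize,
    node: usize,
    failed: usize,
) -> bool {
    if node == failed || failed == 0 {
        return false;
    }
    let mut visited = vec![false; node_count];
    let mut stack = vec![node];
    visited[node] = true;
    // Depth-first search over all links that avoid the failed node.
    while let Some(current) = stack.pop() {
        if current == 0 {
            return true;
        }
        for &(a, b) in links {
            // Find the other end of the link, if `current` is on it.
            let other = if a == current {
                b
            } else if b == current {
                a
            } else {
                continue;
            };
            if other != failed && !visited[other] {
                visited[other] = true;
                stack.push(other);
            }
        }
    }
    false
}
```

In the tree `[(0, 1), (0, 2), (1, 3)]`, node 3 is cut off as soon as node 1 fails. Adding a single extra link `(2, 3)`, which a mesh is free to use, keeps node 3 connected through node 2.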

The proprietary Wi-Fi stack on the ESP32 unfortunately does not (yet) implement standards-compliant 802.11s meshing. Fortunately, we’ve reverse engineered enough of the Wi-Fi hardware on the ESP32 to be able to start implementing this ourselves. This will be the first feature of the 802.11 standard that is implemented in the open-source stack but not in the proprietary stack.

Progress report on implementing 802.11s

We’ve started implementing 802.11s in Ferris-on-Air (FoA), an experimental IEEE 802.11 stack written in Rust for the ESP32. It already supported client mode and Apple AWDL, and we have now added a third interface mode: mesh mode. Note that the mesh networking support is not complete yet: we currently only support associating with a mesh network, but don’t handle any data yet.

Implemented (see https://github.com/esp32-open-mac/FoA/pull/4):

Sending / handling mesh beacon frames
Handling probe requests
Open mesh peering:
- The finite state machine to open/confirm connections to other Mesh STAs
- Timers required in the open/confirm FSM
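To give an idea of what the open/confirm state machine looks like, here is a heavily simplified sketch. It is not the code from FoA: the state names loosely follow the peering states described in the 802.11 standard, while the event and action names are made up for this example. All retry and confirm timers are collapsed into a single `Timeout` event, and frame validation is assumed to have happened before `step` is called.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PeerState {
    Idle,
    OpenSent,
    ConfirmReceived,
    OpenReceived,
    Established,
    Holding,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PeerEvent {
    /// We decided to start peering with a candidate.
    ActiveOpen,
    /// An acceptable peering open frame was received.
    OpenAccepted,
    /// An acceptable peering confirm frame was received.
    ConfirmAccepted,
    /// The peer closed the link, or one of its frames was rejected.
    Close,
    /// A retry or confirm timer gave up.
    Timeout,
    /// The holding timer expired.
    HoldingTimeout,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Action {
    Nothing,
    SendOpen,
    SendConfirm,
    SendOpenAndConfirm,
    SendClose,
}

/// One transition of the simplified peering state machine.
fn step(state: PeerState, event: PeerEvent) -> (PeerState, Action) {
    use PeerEvent::*;
    use PeerState::*;
    match (state, event) {
        (Idle, ActiveOpen) => (OpenSent, Action::SendOpen),
        (Idle, OpenAccepted) => (OpenReceived, Action::SendOpenAndConfirm),
        (OpenSent, OpenAccepted) => (OpenReceived, Action::SendConfirm),
        (OpenSent, ConfirmAccepted) => (ConfirmReceived, Action::Nothing),
        (OpenReceived, ConfirmAccepted) => (Established, Action::Nothing),
        (ConfirmReceived, OpenAccepted) => (Established, Action::SendConfirm),
        // A repeated open means our confirm was probably lost: resend it.
        (OpenReceived | Established, OpenAccepted) => (state, Action::SendConfirm),
        (OpenSent | OpenReceived | ConfirmReceived, Timeout) => (Holding, Action::SendClose),
        (OpenSent | OpenReceived | ConfirmReceived | Established, Close) => {
            (Holding, Action::SendClose)
        }
        (Holding, Close | HoldingTimeout) => (Idle, Action::Nothing),
        // Everything else is ignored in this sketch.
        _ => (state, Action::Nothing),
    }
}
```

Keeping `step` a pure function that returns the next state plus an action makes the transitions easy to test without any radio hardware. The caller is then responsible for actually sending frames and arming timers.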

Still to implement:

Sending/routing data frames
Encryption of data frames
Authenticated mesh peering
Beacon timing adjustment

There is still a lot of work to be done before mesh networking fully works. I believe that most of this work can be done without additional hardware reverse engineering, so if you like writing no_std, embedded async Rust, like reading a 7000-page standard, and are up for a challenge, feel free to join the effort.

All images used are from Espressif’s documentation and licensed under Apache 2.0, see here and here.