Introducing the LibreQoS Bakery

• Herbert Wolverson

Prior to version 1.5-RC1, the LibreQoS.py script read ShapedDevices.csv and network.json to determine the network hierarchy and device configuration, and to design the traffic control (TC) setup. Once the files were validated, the script would generate a large set of tc commands, clear the current TC state, and apply the new configuration.

This approach worked, but it had limitations:

  • Users would experience a delay while the tree was rebuilt, with some devices not being shaped during this time. This was usually very brief (and we spent a lot of time optimizing it), but it was still noticeable. It was made worse by TC acquiring a lock for each command.
  • Users across the network would experience a tiny “burp” in network traffic as TC dropped their queues. In most cases this wasn’t noticeable.
  • Even if nothing had changed, the script would still clear and rebuild the entire TC tree.
  • Rapid changes (such as adding hotspot users) would each cause the script to rebuild the entire tree.
  • You paid the RAM (and small CPU) cost of creating TC classes for every device/circuit, even if they were not in use.

The Bakery is a new feature that improves the way LibreQoS handles creating and managing traffic control configurations. It’s not complete: the next version will feature even more goodies. The new version helps in a number of ways.

Configuring the Bakery

In /etc/lqos.conf (or via the web interface), you can now set up the Bakery. It’s off by default. In the queues section of the configuration, you can add two new options:

lazy_queues = "Htb"
lazy_expire_seconds = 300

Your options for lazy_queues are:

  • Htb - The HTB (Hierarchical Token Bucket) tree is created, including speed limits for all devices. No CAKE qdiscs are created on start-up. This is the best option for most users; HTB queues are cheap, CAKE queues use more RAM and CPU.
  • Full - Only the network hierarchy is created, with no entries for devices. This is a good option for users with tight resource constraints, but it means that all devices get around 1 second of unshaped traffic when they first connect.
  • No (the default) - The dynamic bakery is not used, and all queues are created at start-up. Queues are not expired. This mode still benefits from the bakery’s incremental updates!

The lazy_expire_seconds option sets the time after which unused queues are removed. This is useful for devices that are not always connected, such as mobile phones or laptops.
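
As a rough illustration of the three modes, here is a minimal Python sketch of which queue layers exist right after start-up. The names (LazyQueues, queues_built_at_startup, and the dictionary keys) are hypothetical and are not taken from the LibreQoS source; only the mode names and their described behavior come from the list above.

```python
from enum import Enum


class LazyQueues(Enum):
    """The three documented values of the lazy_queues option."""
    NO = "No"
    HTB = "Htb"
    FULL = "Full"


def queues_built_at_startup(mode: LazyQueues) -> dict:
    """Return which queue layers are created immediately at start-up.

    Hypothetical helper: it only restates the mode descriptions as data.
    """
    return {
        # The network hierarchy is created in every mode.
        "node_htb": True,
        # Per-device HTB classes (speed limits) are skipped only in Full mode.
        "device_htb": mode in (LazyQueues.NO, LazyQueues.HTB),
        # CAKE qdiscs are created up front only when lazy queues are off.
        "device_cake": mode is LazyQueues.NO,
    }
```

For example, queues_built_at_startup(LazyQueues("Htb")) reports the per-device HTB classes as present and the CAKE qdiscs as deferred until traffic is seen.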

What Happens Next?

So how does the new bakery differ?

Tree Building

Once the bakery is configured, the scheduler runs as normal. LibreQoS.py still reads network.json and ShapedDevices.csv. Instead of directly executing tc commands, it builds a command buffer describing the network. The tree-building algorithm also has some modifications:

  • Each tree node now has “gaps” in between the queues. This allows new queues to be added without rebuilding the entire tree.
  • Each node is sorted by circuit ID, greatly reducing the likelihood of “this customer was 03:04 and is unchanged - but is now 03:05”.
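
The two changes above can be sketched in a few lines of Python. This is an illustration only, not the LibreQoS algorithm: the stride of 4, the midpoint placement, and the function names are all assumptions made for the example.

```python
from bisect import bisect_left

STRIDE = 4  # hypothetical gap size; the real spacing is not stated in this post


def assign_minors(circuit_ids, stride=STRIDE):
    """Give each circuit a class minor ID, sorted by circuit ID, with gaps."""
    return {cid: (i + 1) * stride for i, cid in enumerate(sorted(circuit_ids))}


def insert_circuit(assigned, new_id, stride=STRIDE):
    """Place new_id in the gap between its sorted neighbours.

    Returns the chosen minor ID, or None when the gap is exhausted
    (in this sketch, that is the case where a rebuild would be needed).
    Existing assignments are never changed.
    """
    if new_id in assigned:
        return assigned[new_id]
    ordered = sorted(assigned)
    pos = bisect_left(ordered, new_id)
    low = assigned[ordered[pos - 1]] if pos > 0 else 0
    high = assigned[ordered[pos]] if pos < len(ordered) else low + 2 * stride
    if high - low < 2:
        return None  # no free ID between the two neighbours
    minor = (low + high) // 2
    assigned[new_id] = minor
    return minor
```

Because IDs follow the sorted circuit order, an unchanged customer keeps the same ID from one run to the next, and a new customer takes a free slot between its neighbours instead of shifting everyone after it.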

The TC buffer includes all of the CAKE and HTB commands, regardless of the lazy_queues setting.

Submission

The batched commands are submitted to the LibreQoS command bus, and picked up by lqosd - the core Rust portion of LibreQoS. They are then dispatched to the bakery subsystem. This is a very fast operation across a local UNIX domain socket.

State Update

Upon arrival, the bakery compares the received commands with its map of the current shaping state. If the commands are identical, it does nothing - no changes are made to the TC state (first optimization!). If the commands differ, behavior varies:

  • If the only changes are network.json speed limits, the bakery updates the HTB queues with tc change and no runtime impact.
  • If the only changes are network.json network node removals, the bakery removes the HTB queues with tc delete and no runtime impact.
  • If circuits were added, the bakery adds the new queue to the state tree (and adds it immediately if in No mode). No runtime impact to other circuits.
  • If circuits were removed, the bakery removes the queue from the state tree (and removes it immediately if in No mode). No runtime impact to other circuits.
  • Circuits that have had their limits changed currently trigger a full reload of the tree. This is a target for the next version, but we’re running into limitations with Linux’s tc system.
  • Large changes to the HTB tree (such as moving nodes around) will also trigger a full reload of the tree.
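
The decision rules above can be summarized as a small diff routine. The sketch below is a simplified Python model, not the bakery's Rust code: the Node and Circuit types, the action names, and the plan_update function are hypothetical. It also assumes that mixed change sets can be applied incrementally and that anything not covered by the list (such as a newly added node) falls back to a full reload; the post does not say either way.

```python
from dataclasses import dataclass

FULL_RELOAD = ("full_reload", None)


@dataclass(frozen=True)
class Node:
    parent: str
    mbps: int


@dataclass(frozen=True)
class Circuit:
    node: str
    mbps: int


def plan_update(old_nodes, new_nodes, old_circuits, new_circuits):
    """Compare the stored state with a new submission and list the actions."""
    if old_nodes == new_nodes and old_circuits == new_circuits:
        return []  # identical: leave the TC state untouched
    actions = []
    for name, prev in old_nodes.items():
        cur = new_nodes.get(name)
        if cur is None:
            actions.append(("delete_node", name))       # tc delete
        elif cur.parent != prev.parent:
            return [FULL_RELOAD]                        # node moved in the tree
        elif cur.mbps != prev.mbps:
            actions.append(("change_node_rate", name))  # tc change
    if any(name not in old_nodes for name in new_nodes):
        return [FULL_RELOAD]  # assumption: node additions are a "large change"
    for cid, prev in old_circuits.items():
        cur = new_circuits.get(cid)
        if cur is None:
            actions.append(("remove_circuit", cid))
        elif cur != prev:
            return [FULL_RELOAD]  # changed circuit limits currently force a reload
    for cid in new_circuits:
        if cid not in old_circuits:
            actions.append(("add_circuit", cid))
    return actions
```

An empty action list corresponds to the "do nothing" case, and a single full_reload entry corresponds to the old clear-and-rebuild behavior.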

Dynamic Queues

If the bakery is in Htb or Full mode, once per second lqosd’s traffic monitor sends the bakery a list of circuits that have had traffic in the last second (just quick hash IDs, so again this is very fast). The bakery then checks the state tree for those circuits. In Htb mode, if a circuit doesn’t have a CAKE queue, the bakery creates one. In Full mode, it creates both the HTB and CAKE queues for that circuit.

Every circuit in the tree has a “last seen” timestamp. Any circuits whose last seen timestamp is older than lazy_expire_seconds will be removed from the state tree, and tc resources will be freed. (Note that 0 means “no expiration”.)
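
A minimal Python sketch of this "last seen" bookkeeping follows. The class and method names are hypothetical, and timestamps are passed in as plain seconds to keep the example deterministic; the real bakery lives inside lqosd and is written in Rust.

```python
class LazyQueueTracker:
    """Tracks when each circuit last had traffic and expires idle ones."""

    def __init__(self, expire_seconds):
        # Mirrors lazy_expire_seconds; 0 means "no expiration".
        self.expire_seconds = expire_seconds
        self.last_seen = {}  # circuit hash -> time of last observed traffic

    def on_traffic(self, circuit_hashes, now):
        """Record activity; return circuits that need their queues created."""
        unique = list(dict.fromkeys(circuit_hashes))  # drop duplicates, keep order
        created = [h for h in unique if h not in self.last_seen]
        for h in unique:
            self.last_seen[h] = now
        return created

    def expire(self, now):
        """Return circuits idle for longer than expire_seconds and forget them."""
        if self.expire_seconds == 0:
            return []
        stale = [h for h, seen in self.last_seen.items()
                 if now - seen > self.expire_seconds]
        for h in stale:
            del self.last_seen[h]
        return stale
```

In this model, on_traffic would be called once per second with the active circuit hashes, and expire would be called periodically; the returned lists are the queues to create and to tear down.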

This is a much more efficient way of managing traffic control. You only pay (in RAM and CPU) for what you’re using, and avoiding a full tree rebuild every time a change is made can improve performance significantly. In my testing at iZones, the bakery has reduced the active circuit count by about 20%, and reloads have gone from being a regular event to once or twice a business day (when changes are made).

The Future

The bakery is a big step forward for LibreQoS, but it’s not the end of the road. Some full reloads are inevitable, but they are now rare. We’re working on reducing them further:

  • Circuits that move - maybe because smart Insight-enhanced binpacking realizes that they are likely to become busy - can be handled by creating a new queue and then (after a time delay) removing the old one (a transaction).
  • Circuits that change speed are problematic. If you run tc change on an HTB class that has children, the entire Linux traffic shaping subsystem deadlocks and needs to be restarted. We’re hoping to use the transaction system to create a new class, move the ID mapping, and then delete the old class.

Even with these limitations, the bakery represents a big improvement. Combined with StormGuard (I’ll write about it soon!), LibreQoS is becoming very robust and efficient.

Thanks to NLnet!