Introducing Lion: A Self-Hostable IoT Server for Zephyr Devices

By Jared Wolff 8 min read

📺 Prefer to watch? This post is the companion writeup to the Updates! livestream, which includes a live tour of the Lion dashboard.

For a long time, I’ve wanted an IoT backend that actually fits the way I build with Zephyr. Self-hostable. Open source. CoAP + DTLS. Sensible OTA. Logs that are affordable to send over cellular. Designed around the realities of sleepy NB-IoT devices, not always-on Wi-Fi gadgets.

Nothing out there hit all of those at once.

So I built one. It’s called Lion, and the preview is open for signups at lioniot.dev.

This post is the engineering story — the design decisions worth talking about, and why I made them. If you just want the product page and the signup form, head over to lioniot.dev. If you want the why, keep reading.

Three Iterations

Lion isn’t my first attempt at this. The first one was MQTT-based, built well before AI tooling existed, and I spent months on block-wise OTA transfers and various plumbing. It worked, but it never felt polished enough to ship. So it sat.

The second pass got further but had the same problem — too many rough edges, too much surface area to maintain solo.

This third attempt is different for two reasons: AI-assisted iteration is genuinely a 10x productivity boost when used well, and I’d accumulated enough Zephyr work over the years (plus a lot of inspiration from Golioth, who paved the way for a lot of these patterns) that I knew exactly what I wanted to build.

Along the way I also ended up writing a brand new Rust crate and contributing PSK support to the DTLS crate I’m using. Standing on the shoulders of giants, etc.

What It Does

Lion is the server side of the equation. It pairs with a Zephyr client library that runs on your device. The headline features:

  • Device management — projects, devices, roles, user permissions
  • OTA firmware updates — upload images, target devices, track rollout
  • Real-time telemetry — per-device and project-wide views
  • Webhook export — forward telemetry into your own systems
  • Console log streaming — your device’s logs, in the dashboard, without breaking your cellular bill

Figure: Lion telemetry chart for a single device.

That’s the bullet-list version. The interesting part is how a few of those things work.

Device Shadow as the Source of Truth

I borrowed the device shadow idea from AWS. A shadow is a JSON document representing the device’s state, shared between server and device. The device pushes updates to its own half (the reported state); the server pushes updates to the desired half; they reconcile.
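A minimal sketch of that reconcile step, in illustrative Python. It assumes the shadow is split into `desired` and `reported` sections as in AWS's model; the field names and the `shadow_delta` helper are hypothetical, not Lion's actual schema.

```python
def shadow_delta(desired: dict, reported: dict) -> dict:
    """Return the desired keys whose values the device has not yet reported."""
    return {
        key: value
        for key, value in desired.items()
        if key not in reported or reported[key] != value
    }


# Hypothetical shadow: the server writes "desired", the device writes "reported".
shadow = {
    "desired": {"report_interval_s": 300, "led": "off"},
    "reported": {"report_interval_s": 600, "led": "off", "battery_mv": 3912},
}

# The device works out what changed, applies it, and reports the result back.
delta = shadow_delta(shadow["desired"], shadow["reported"])
shadow["reported"].update(delta)
remaining = shadow_delta(shadow["desired"], shadow["reported"])
```

Once the device has reported the new values, the delta is empty and both halves agree.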

I like it because it gives you one mental model for everything: state, configuration, and — here’s the part with a twist — commands.

Figure: Lion device detail page showing shadow, actions, and info.

Commands live inside the shadow

The conventional approach to “send a command to a device” is to build a queue. Pending commands sit somewhere, the device pulls them, ack’d ones get cleared. That’s a bunch of infrastructure for what’s really a simple problem.

Lion puts commands directly in the shadow with a generation counter. To reboot a device, the server bumps the reboot counter from 9 to 10. The device sees the new value on its next sync, runs the command, and stores 10 as the last-acted generation.

I’ll be honest: when I first saw AWS doing this, I thought it was silly. Why am I incrementing a number? It only clicked when I ran into the problem myself. The counter is the indicator. No queue infrastructure required, no “is this command stale” logic, no separate ack path. The shadow is already the thing being synced — let it carry commands too.
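The counter logic is small enough to sketch. This is illustrative Python rather than the real Zephyr client; `CommandRunner`, the `reboot` key, and the handler table are hypothetical names, and it assumes the device keeps its last-acted generations somewhere that survives a reboot.

```python
from typing import Callable, Dict, List, Optional


class CommandRunner:
    """Device-side handling of generation-counted commands (illustrative only)."""

    def __init__(
        self,
        handlers: Dict[str, Callable[[], None]],
        last_acted: Optional[Dict[str, int]] = None,
    ):
        self.handlers = handlers
        # Assumed to be loaded from persistent storage on a real device;
        # otherwise a reboot command would run again after every reboot.
        self.last_acted: Dict[str, int] = dict(last_acted or {})

    def sync(self, commands: Dict[str, int]) -> List[str]:
        """Run each known command whose counter moved past the last-acted value."""
        ran = []
        for name, generation in commands.items():
            if name in self.handlers and generation > self.last_acted.get(name, 0):
                self.last_acted[name] = generation  # record first, then act
                self.handlers[name]()
                ran.append(name)
        return ran


log: List[str] = []
runner = CommandRunner({"reboot": lambda: log.append("reboot")}, last_acted={"reboot": 9})
first = runner.sync({"reboot": 9})    # counter unchanged: nothing to do
second = runner.sync({"reboot": 10})  # server bumped 9 -> 10: run once
third = runner.sync({"reboot": 10})   # same generation seen again: ignored
```

One consequence of this design: if the server bumps a counter several times between two syncs, the device still runs the command only once.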

I’m sure people will have opinions on this. I’d love to hear them — the community is the place.

Config polling, not CoAP observe

CoAP supports a beautiful thing called observe — the server can push state changes to the device, and the device gets them in real time over the existing connection.

I’m not using it for the main config sync. Here’s why: the nRF9151, like most NB-IoT/LTE-M devices, is a sleepy device. CoAP runs over UDP, so the “connection” is really a DTLS session plus a NAT mapping, and NAT timeouts on cellular networks are aggressive enough that you can’t reasonably keep that mapping alive long enough for “real-time” push to be meaningful. The mapping expires, you re-handshake, and you’ve burned battery and data for the privilege of maybe getting a push notification a couple seconds faster.

Instead, the device polls config when it’s already publishing telemetry. The connection is open anyway. You’re already paying for it. Pull the latest shadow, apply it, done.
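As a rough sketch of that wake cycle (illustrative Python; `FakeSession`, the `telemetry` and `shadow` paths, and the returned config are stand-ins, not Lion's real resources):

```python
class FakeSession:
    """Stand-in for one CoAP/DTLS session; counts handshakes and requests."""

    def __init__(self):
        self.handshakes = 0
        self.requests = []
        self.is_open = False

    def connect(self):
        if not self.is_open:
            self.handshakes += 1  # the expensive part on cellular
            self.is_open = True

    def post(self, path, payload):
        self.requests.append(("POST", path))

    def get(self, path):
        self.requests.append(("GET", path))
        return {"report_interval_s": 300}  # hypothetical shadow content

    def close(self):
        self.is_open = False


def wake_cycle(session, telemetry):
    """One wakeup: publish telemetry, then poll config over the same session."""
    session.connect()
    session.post("telemetry", telemetry)
    config = session.get("shadow")  # piggybacks on the session that is already open
    session.close()
    return config


session = FakeSession()
config = wake_cycle(session, {"temp_c": 21.5})
```

The config poll costs one extra request, but no extra handshake.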

The dashboard does use observe for the “real-time” UI, where it’s appropriate. But for sleepy devices, polling on existing wakeups is the right call.

Console Log Streaming Without Going Broke

This is the design choice I’m most pleased with.

Sending Zephyr LOG_INF() output over cellular sounds prohibitively expensive — and it is, if you send the strings. A handful of verbose logs per minute, multiplied across a fleet, will eat your data budget alive.

Zephyr has a built-in dictionary mode for logging that solves this. At compile time it extracts all your log strings into a dictionary file. At runtime, the device sends only short tokens — basically “log line 1, log line 2, log line 3” — and the host side expands them back into the human-readable strings using the dictionary.
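To make the idea concrete, here is a toy version of dictionary logging in Python. It is not Zephyr's actual wire format or tooling: the frame layout (2-byte ID, 1-byte argument count, 32-bit arguments) and the example strings are assumptions chosen to keep the sketch short.

```python
import struct

# Build-time step: every format string in the firmware gets a small integer ID.
FORMATS = [
    "Boot complete",
    "Battery at %d mV",
    "Sent %d bytes in %d ms",
]
DICTIONARY = dict(enumerate(FORMATS))            # ID -> format string (host side)
IDS = {fmt: i for i, fmt in DICTIONARY.items()}  # format string -> ID (device side)


def encode(fmt: str, *args: int) -> bytes:
    """Device side: emit the ID, the argument count, then the raw arguments."""
    return struct.pack(f"<HB{len(args)}i", IDS[fmt], len(args), *args)


def decode(frame: bytes) -> str:
    """Host side: look the ID up in the dictionary and re-apply the arguments."""
    fmt_id, argc = struct.unpack_from("<HB", frame)
    args = struct.unpack_from(f"<{argc}i", frame, 3)
    return DICTIONARY[fmt_id] % args


frame = encode("Sent %d bytes in %d ms", 512, 840)
text = decode(frame)
```

In this toy format the frame is 11 bytes, while the expanded line is 24 characters, and the saving grows with the length of the format string.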

If you’ve used defmt from Ferrous Systems / Knurling, this is the same concept they apply to RTT/SWD-based logging in embedded Rust. They use it to push way more log volume over the wire than would otherwise fit. Lion uses the same idea, but over CoAP/DTLS instead of SWD.

The result: I can leave logging on for fielded devices and actually get useful diagnostics back without dreading the data bill. I’ve personally been on the wrong side of “the device is failing in the field, can you send me logs?” too many times. Now I can.

The dashboard expands the dictionary on the server side and shows you the readable log stream. The device just sends bytes.

Figure: Lion console showing dictionary-decoded log stream from a device.

OTA: Block2 + MCUboot

Standard ingredients, used the standard way:

  • Block2 CoAP transfer — device requests the firmware, learns the total size, downloads it in chunks. Resumable, low overhead, works fine on flaky cellular.
  • MCUboot swap — chunks land in the secondary slot. On reboot, MCUboot validates and swaps. Standard double-buffered upgrade.
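The chunked download can be sketched as follows. The Block2 option layout (block number, "more" bit, size exponent) follows RFC 7959; everything else, including the in-memory `serve_block` server and the `download` helper, is a hypothetical stand-in in illustrative Python, not Lion's client code.

```python
def encode_block2(num: int, more: bool, szx: int) -> int:
    """Pack a Block2 option value: NUM in the high bits, then M, then SZX."""
    if not 0 <= szx <= 6:
        raise ValueError("SZX must be 0..6 (block sizes 16..1024 bytes)")
    return (num << 4) | (int(more) << 3) | szx


def decode_block2(value: int):
    """Unpack a Block2 option value into (num, more, block_size_bytes)."""
    return value >> 4, bool(value & 0x08), 1 << ((value & 0x07) + 4)


def serve_block(image: bytes, num: int, szx: int):
    """Stand-in server: return one block of the image and its Block2 value."""
    size = 1 << (szx + 4)
    chunk = image[num * size:(num + 1) * size]
    more = (num + 1) * size < len(image)
    return chunk, encode_block2(num, more, szx)


def download(image: bytes, szx: int = 6, start_block: int = 0, have: bytes = b"") -> bytes:
    """Device side: request blocks in order until the M (more) bit clears.

    Passing start_block plus the bytes already stored resumes a transfer.
    """
    data, num = bytearray(have), start_block
    while True:
        chunk, option = serve_block(image, num, szx)
        _, more, _ = decode_block2(option)
        data += chunk
        if not more:
            return bytes(data)
        num += 1


image = bytes(range(256)) * 10                                # 2560-byte fake firmware
full = download(image)                                        # blocks 0, 1, 2
resumed = download(image, start_block=1, have=image[:1024])   # pick up after block 0
```

Resuming is just a matter of asking for the next block number, which is why an interrupted transfer does not have to start over.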

Nothing exotic here. Worth saying out loud because the boring choices are usually the right ones.

Firmware Versioning That’s Actually Useful

I’ve shipped firmware with SemVer. SemVer is fine, but for OTA tracking it’s basically whatever number the developer felt like typing. It doesn’t tell you which commit, which branch, or whether two devices “on 1.4.2” are actually running the same code.

Lion’s client uses a different scheme:

<branch>-<count>-<hash>
  • branch — which branch the build came from. So main-63-dca3856 is unmistakably different from feature-x-12-abc1234.
  • count — total commits in the repo at build time. Monotonically increases.
  • hash — short commit hash. Pin-points the exact code.

The CMake integration pulls this from your application’s git repository — not from lion-client itself. That’s the right choice, because the lion-client version is irrelevant — what matters is what’s running on the device, and that’s your app code.
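A sketch of how such a string could be built and parsed, in illustrative Python. The specific git commands are an assumption about one way to get these values, not necessarily what Lion's CMake integration runs; the helper names are hypothetical.

```python
import subprocess


def git(repo: str, *args: str) -> str:
    """Run a git command in the given repository and return its trimmed output."""
    result = subprocess.run(
        ["git", "-C", repo, *args], check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


def format_version(branch: str, count: int, commit: str) -> str:
    """Join the three parts into <branch>-<count>-<hash>."""
    return f"{branch}-{count}-{commit}"


def parse_version(version: str):
    """Split from the right, because branch names may themselves contain hyphens."""
    branch, count, commit = version.rsplit("-", 2)
    return branch, int(count), commit


def version_from_git(repo: str = ".") -> str:
    """Build the version from the application's repository, not the client library's."""
    return format_version(
        git(repo, "rev-parse", "--abbrev-ref", "HEAD"),  # current branch name
        int(git(repo, "rev-list", "--count", "HEAD")),   # commits reachable from HEAD
        git(repo, "rev-parse", "--short", "HEAD"),       # abbreviated commit hash
    )


example = format_version("main", 63, "dca3856")
parsed = parse_version("feature-x-12-abc1234")
```

Parsing from the right keeps `feature-x` intact as the branch name instead of splitting it at its own hyphen.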

You can still ship a SemVer string alongside it if you want one for your release notes. But for the “what’s actually on this device” question, the branch/count/hash format has been reliable for me.

Bootstrap PSK with Dual-Key Rotation

Provisioning uses a bootstrap PSK — a pre-shared key the device uses on first contact to request its own per-device PSK from the server. After that, the device-specific PSK is what’s used for all DTLS traffic, and the bootstrap PSK is no longer required for that device.

Rotating a bootstrap PSK is a multi-step dance because all your fielded devices need to learn the new one before you can revoke the old one. Lion supports this directly:

  1. Provision a new bootstrap PSK alongside the old one.
  2. Push a firmware update to your fleet that knows about the new key.
  3. Once you’ve confirmed everyone’s on the new firmware, revoke the old PSK.

PSK isn’t the top of the security pyramid, but it beats clear text by a wide margin and it’s a good fit for resource-constrained devices. The dual-key rotation flow makes the operational story workable.
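The rotation steps above can be sketched on the server side like this. It is illustrative Python with hypothetical names (`Provisioner`, the `bootstrap-v1`/`bootstrap-v2` key IDs), and it flattens the DTLS handshake into a simple key comparison; the real server is Rust and authenticates during the handshake itself.

```python
import secrets


class Provisioner:
    """Illustrative bookkeeping for bootstrap and per-device PSKs."""

    def __init__(self):
        self.bootstrap = {}  # key id -> bootstrap PSK (several may be active)
        self.devices = {}    # device id -> per-device PSK

    def add_bootstrap(self, key_id: str) -> bytes:
        self.bootstrap[key_id] = secrets.token_bytes(32)
        return self.bootstrap[key_id]

    def revoke_bootstrap(self, key_id: str) -> None:
        del self.bootstrap[key_id]

    def provision(self, device_id: str, key_id: str, psk: bytes) -> bytes:
        """Exchange a valid bootstrap PSK for a fresh per-device PSK."""
        expected = self.bootstrap.get(key_id)
        if expected is None or not secrets.compare_digest(expected, psk):
            raise PermissionError("unknown or revoked bootstrap PSK")
        self.devices[device_id] = secrets.token_bytes(32)
        return self.devices[device_id]


server = Provisioner()
old = server.add_bootstrap("bootstrap-v1")
new = server.add_bootstrap("bootstrap-v2")       # step 1: both keys are valid
server.provision("dev-a", "bootstrap-v1", old)   # old firmware can still onboard
server.provision("dev-b", "bootstrap-v2", new)   # step 2: updated firmware uses the new key
server.revoke_bootstrap("bootstrap-v1")          # step 3: retire the old key

try:
    server.provision("dev-c", "bootstrap-v1", old)
    rejected = False
except PermissionError:
    rejected = True
```

Devices that were already provisioned keep their per-device PSKs after the old bootstrap key is revoked; only new onboarding with the old key is refused.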

Per-device PSKs can also be rotated independently from the device side, and the server tracks last-rotated timestamps so you can audit “when did this device last refresh its key?”

Licensing

  • Zephyr client libraries: Apache 2.0. Real shoutout to Golioth for the inspiration here — they did a lot of thinking about how a Zephyr-side IoT client should be structured, and that thinking shaped a lot of decisions in the lion-client.
  • Rust server: BUSL-1.1. Self-hostable, source-available, with the standard BUSL grace period before transitioning to a permissive license.

What’s Next

  • Public preview signups are open at lioniot.dev — get on the list to hear when devices start onboarding.
  • CoAP over Thread support is on the near-term list. The shadow/command/OTA stack should map cleanly onto Thread devices.
  • More device targets beyond the nRF91 family.
  • Real-world deployment testing across more network conditions.

Want to Talk About It?

If any of these design decisions made you go “wait, why?” — that’s a good sign and I want to hear about it. The shadow-as-command-queue choice and the polling-vs-observe call are both ones I expect people will have opinions on.

Last updated April 23, 2026.