letters90 8 months ago

I used nspawn to get a system running in the most ridiculous way.

A debian aarch64 vm on kvm starting a systemd-nspawn for an unpacked raspberry pi 3 iso.

It works way too well judging by how ridiculous it was.

Still saved me a few days instead of setting things up myself.

I actually liked how easy it is to spin up nspawn as a systemd service:

  [Unit]
  Description=Raspberry Image Machine
  After=multi-user.target

  [Service]
  Type=simple
  User=root

  ExecStart=/usr/bin/systemd-nspawn -D /mnt/ /sbin/init

  [Install]
  WantedBy=multi-user.target
  • vaylian 8 months ago

    You might want to look into .nspawn files instead. Then you can also manage your nspawn-containers with the machinectl command.

    See man 5 systemd.nspawn

    And many commands like systemctl and journalctl accept the -M parameter, which allows you to query systemd units inside your nspawn-containers from the host.

    edit: The article actually explains all of these things in more detail.
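
    As a rough sketch of the .nspawn approach: the snippet below writes a minimal settings file for a hypothetical machine named "raspi" into a scratch directory. The machine name, the scratch path, and the chosen settings are assumptions for illustration; Boot= and VirtualEthernet= are described in systemd.nspawn(5).

    ```shell
    # Sketch: generate a minimal .nspawn settings file for a hypothetical
    # machine called "raspi". It is written to a scratch directory, so
    # nothing on the system is touched.
    machine="raspi"
    conf_dir="${NSPAWN_CONF_DIR:-./nspawn-demo}"
    mkdir -p "$conf_dir"

    # [Exec] Boot=yes boots the container's init, like passing --boot.
    # [Network] VirtualEthernet=no asks for no veth link (host networking);
    # that choice is an assumption about what you want.
    printf '%s\n' \
        '[Exec]' \
        'Boot=yes' \
        '' \
        '[Network]' \
        'VirtualEthernet=no' \
        > "$conf_dir/$machine.nspawn"

    cat "$conf_dir/$machine.nspawn"
    ```

    To use it for real, the file would go to /etc/systemd/nspawn/raspi.nspawn with the root tree under /var/lib/machines/raspi, after which `machinectl start raspi` and `journalctl -M raspi` should pick it up.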

  • i_v 8 months ago

    I used to use qemu-user-static to run ARM Linux distros like Buildroot, Yocto, and Raspbian on x86_64. It worked surprisingly well! Outside of some minor bugs here and there, it was perfect for local development, emulating an embedded system I was working on.

  • Vilian 8 months ago

    Why run the Debian VM? Just use nspawn directly

  • Imustaskforhelp 8 months ago

    hmm this is very interesting.

    I am wondering though? Is there something like systemd-nspawn that doesn't require root?

    • vlowrian 8 months ago

      If file system level isolation is enough for you, take a look at schroot (https://linux.die.net/man/1/schroot) which allows root-less chroot. You can use something like debootstrap to get a complete userland into a user-controlled directory and use schroot to chroot into it without root level access.

      • Imustaskforhelp 8 months ago

        this is crazy , trying this out right now.

        But is there a way to also run OCI-compatible images directly on this as well?

        • mst 8 months ago

          You could use docker export to slurp the container contents (see the article for an example).

      • Imustaskforhelp 8 months ago

        EDIT: it seems that for creating a chroot you still require root.

        I don't have root on that system, so I can't create a chroot. There is fakeroot, but it doesn't work since it uses qemu on that locked system.

        Are there any other alternatives?

        • NekkoDroid 8 months ago

          > it seems that for creating a chroot you still require root.

          You actually don't as long as you have user namespaces.

          One thing I am working on I use chroot (rather unshare --root=) to minimally sandbox a subprocess. At the beginning of the script I have this little snippet:

              # Re-exec this script as "root" inside a new user + mount namespace
              if [ "$(id --user)" -ne 0 ]; then
                  exec unshare --map-root-user --mount -- "$0" "$@"
              fi
          
          Though you can probably just do something roughly like `unshare --map-root-user --root=<PATH>`.
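
          A quick way to check whether unprivileged user namespaces are usable on a given box is sketched below. The function name userns_available is made up for this example; it only relies on `unshare --map-root-user` from util-linux.

          ```shell
          # Sketch: report whether this user can create a user namespace.
          # "unshare --map-root-user" maps the current UID to root inside a
          # new user namespace; it fails if the kernel or a sysctl forbids
          # that, or if unshare itself is missing.
          userns_available() {
              if unshare --map-root-user true 2>/dev/null; then
                  echo "available"
              else
                  echo "unavailable"
              fi
          }

          userns_available
          ```

          If this prints "unavailable", the unshare-based chroot trick above will most likely not work on that machine either.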
        • ttyprintk 8 months ago

          Fakeroot is good for the debootstrap step, and then schroot runs unprivileged.

        • igor47 8 months ago

          fakeroot has nothing to do with qemu -- it simply uses LD preload to make commands think they're uid 0
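
          A small sketch of what that looks like in practice (the helper name fakeroot_uid is made up here, and it falls back to a message when fakeroot is missing or cannot start):

          ```shell
          # Sketch: show the UID a command sees when run under fakeroot.
          # fakeroot preloads a library that fakes uid/gid-related calls,
          # so "id -u" should report 0 without any real privileges.
          fakeroot_uid() {
              if command -v fakeroot >/dev/null 2>&1; then
                  fakeroot id -u 2>/dev/null || echo "fakeroot unavailable"
              else
                  echo "fakeroot unavailable"
              fi
          }

          fakeroot_uid
          ```

          Nothing here grants actual root: files "owned by root" under fakeroot are still owned by your user on disk.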

    • derobert 8 months ago

      It looks like systemd-nspawn is gaining rootless support, see https://github.com/systemd/systemd/issues/30239

      Until then, I'm not sure if there is anything lightweight. If you don't need lightweight, there is Podman.

      • NekkoDroid 8 months ago

        Do note that the current support is limited to signed disk images, though it recently (still not in a release) gained the ability to use any directory that resides inside a signed disk image (instead of just the entire disk image).

      • Imustaskforhelp 8 months ago

        Podman requires one time root for installation though.

        I am on a completely rootless client at one of my servers.

        • tobwen 8 months ago

          Nope, you can compile/download and run it completely from unprivileged userspace.

    • 1oooqooq 8 months ago

      all containers require root.

      docker and the rootless nonsense is just root daemons and suid.

      ...would never have believed marketing lies would reach linux tools if anyone told me this before 2018.

      • yjftsjthsd-h 8 months ago

        Linux user namespaces can be used to create containers without having root access, see ex. https://unix.stackexchange.com/questions/66084/simulate-chro...

        There's also https://github.com/termux/proot-distro which may or may not count as containers depending on how you define the word but I think it does count

        • 1oooqooq 8 months ago

          you can't detach your username from a process, nor the network ns... etc, etc, etc.

          yeah you can do some smaller fakechroot and maybe some bind mounts... if you call that a "container" good for you.

          • yjftsjthsd-h 8 months ago

            > you can't detach your username from a process, nor the network ns... etc, etc, etc.

            Sure looks like it works?

              $ unshare -i -n -p -u -T -r -f
              # ls
              # id
              uid=0(root) gid=0(root) groups=0(root),65534(nogroup)
              # ip -br a
              lo               DOWN
            
            > yeah you can do some smaller fakechroot and maybe some bind mounts... if you call that a "container" good for you.

            Why are you being condescending about what constitutes a container?

      • Imustaskforhelp 8 months ago

        you can theoretically run a virtual machine like libriscv5 which doesn't require root. or qemu doesn't require root as well. But qemu is blocked for my usecase. There is flatpak theoretically as well

        There is podman but it requires one time root.

        • 1oooqooq 8 months ago

          qemu is great but it's a VM, not a container.

josteink 8 months ago

I've used lots of different container-types over the years to replace VMs with lightweight containers, but right now I'm running systemd-nspawn, and I really, really like it.

The way it integrates with systemd, both inside and outside the container makes it a no-brainer for app-isolation when the app in question is a bit too complex for just being a service-unit in itself, and you don't want to lose observability by hiding everything behind some obscure docker wall.

The way everything integrates into systemctl and you can get aggregated stats for your entire machine and all its sub-containers... Amazingly nice.

I just can't imagine any better way of managing containers on a Linux system than this.

Only thing I would complain about is the name. They really could have come up with something a bit more catchy or self-descriptive. This is probably the only systemd type service which does not immediately shout out what it's about, so most people are probably not even aware that systemd can manage containers for you.

trurl42 8 months ago

> Unfortunately, though, most developers don’t even know that there are options outside of Docker, or that they’re not as “convenient”.

> Hopefully, this article has disabused some of that notion.

If that was the goal, it seems terribly complicated when compared with podman.

  • throwaway894345 8 months ago

    I was thinking similarly. All of those steps to circumvent the OCI image infrastructure just to use systemd…

    • josteink 8 months ago

      OCI is for running prepackaged software in black boxes from the internet, where you have no interest or ownership of the container internals.

      Most of my containers are not like that. Well, actually none are.

      systemd-nspawn is for running your own containers, with a VM-like usage pattern (ie not immutable), deployed as part of your overall systemd based infrastructure for when the thing you need to manage is "too big" to be deployed as its own systemd-service unit, but you still want to be able "to systemd" it.

      This fits my use-case perfectly.

      • fburnaby 8 months ago

        This distinction is a more useful one than the article made. I love dockerfiles and immutability, but there are good cases for mutable containers, too.

        • orbisvicis 8 months ago

          You can also do some neat things with "--ephemeral" and "--volatile" to basically overlay the image (or a subset) with tmpfs; any changes to those overlays will be lost when the container is brought down. The specific mount points can be controlled in greater detail via "--tmpfs" and "--overlay".

          https://0pointer.net/blog/running-an-container-off-the-host-...

          I'm not sure how easy that is to customize in Podman.
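
          For reference, the two modes might be invoked roughly as below. This is a sketch that only prints the command lines instead of running them, since nspawn normally needs root; /var/lib/machines/demo is a placeholder tree, not something from the article.

          ```shell
          # Sketch: build (and print, not run) nspawn invocations that keep
          # no persistent changes. The container path is a placeholder.
          root="/var/lib/machines/demo"

          # --ephemeral: boot a temporary snapshot of the tree, removed on exit.
          ephemeral_cmd="systemd-nspawn --directory=$root --ephemeral --boot"

          # --volatile=state: keep the OS tree read-only and mount /var as tmpfs.
          volatile_cmd="systemd-nspawn --directory=$root --volatile=state --boot"

          printf '%s\n' "$ephemeral_cmd" "$volatile_cmd"
          ```

          Printing first makes it easy to review the flags before running either line with sudo.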

        • throwaway894345 8 months ago

          Containers already are mutable on all popular runtimes. “Immutability” comes from destroying and recreating them from their image, but there’s nothing forcing you to delete/recreate them, and indeed that’s not even the default behavior.

      • throwaway894345 8 months ago

        I think you’re conflating packaging and runtime. OCI images are a packaging format while systemd-nspawn is a runtime. Runtimes and package formats are orthogonal.

        > systemd-nspawn is for running your own containers, with a VM-like usage pattern (ie not immutable)

        Containers aren’t immutable (OCI or otherwise). Again, I think you’re conflating images (the package formats are orthogonal) with their runtime instantiation, the container. OCI images like VM images are immutable, but containers and VMs are mutable.

        My main objection to systemd-nspawn (at least as described in the article) is that it lacks a complementary package manager (or rather, that there’s no remotely convenient way to run software packages with it) and so you have to create your containers with manual changes and dodgy bash scripts. Regardless of what runtime you use, that seems like a not-very-maintainable way to manage software.

  • moondev 8 months ago

    Author should consider running it inside Docker for more convenient setup.

    • exceptione 8 months ago

      Never. If he wanted to go the containers route, Podman is there. There is no reason to use Docker anymore. (Only a satellite tool like docker-compose is not 1-1 compatible with podman-compose, but Podman has other ways to orchestrate with systemd as part of its vision for orchestration.)

proxysna 8 months ago

Used nomad in my homelab to run nspawn containers with nspawn driver[1]

Surprisingly simple and low footprint solution and genuinely pleasant to work with, since it is very similar to managing a systemd service.

[1]https://github.com/JanMa/nomad-driver-nspawn

  • JanMa 8 months ago

    Happy to hear you like the project :-)

orbisvicis 8 months ago

I use nspawn but many of the helpers featured here are new, so I appreciate this article. I've only ever booted from directories rather than images, and wasn't aware that an image could mount its own partitions, even swap!

Also I'm a little unclear on the security implications of "--private-users=id". Yes the user IDs are the same, but it is technically running in a separate user namespace. In terms of security is this mode equivalent to privileged containers, or is it safer?

romaniitedomum 8 months ago

Redhat's Leapp, for upgrading between major releases of RHEL, uses systemd-nspawn to create a container where it can test installing the packages without interfering with the running OS.

arminiusreturns 8 months ago

It's really one of those little gems not very many people know about or use, but it seems from the responses that is changing.

As Brendan Gregg said: "Containers are just processes, cgroups, and namespaces."

  • robertlagrant 8 months ago

    Dockerfiles are just a really nice, standard way of specifying them, along with ports, networks and persistent storage.

egorfine 8 months ago

On an unrelated note, is there a way to share some negative feedback on systemd projects without incurring significant hit to karma?

  • abenga 8 months ago

    Do novel issues get a negative reaction? Retreading old grievances is pointless, but I think if you have a reasonable new gripe (that's not dae hate systemd like me?) you would be just fine.

    • egorfine 8 months ago

      Thing is, old grievances keep recurring here and there in systemd projects without being resolved. But I agree, whining for ten years is futile.

MrDrMcCoy 8 months ago

Nspawn would be perfect if it exposed the full suite of service file security and resource control features. Due to their absence, I've been exploding containers into directories and writing my own service units to manage the pseudocontainers.
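
For illustration only, a unit of that kind might look roughly like the one generated below. The unit name myapp.service, the root path /srv/containers/myapp, and the binary path are hypothetical; the directives themselves are documented in systemd.exec(5) and systemd.resource-control(5).

```shell
# Sketch: write a service unit that runs a binary inside an unpacked
# container tree, using ordinary unit sandboxing and resource controls.
# All names and paths are placeholders, not taken from the thread.
unit_dir="${UNIT_DIR:-./unit-demo}"
mkdir -p "$unit_dir"

# RootDirectory= chroots the service into the exploded tree, so the
# ExecStart= path is resolved inside that tree.
printf '%s\n' \
    '[Unit]' \
    'Description=myapp in an exploded container tree' \
    '' \
    '[Service]' \
    'RootDirectory=/srv/containers/myapp' \
    'ExecStart=/usr/bin/myapp' \
    'PrivateTmp=yes' \
    'NoNewPrivileges=yes' \
    'ProtectKernelTunables=yes' \
    'MemoryMax=512M' \
    'CPUQuota=50%' \
    '' \
    '[Install]' \
    'WantedBy=multi-user.target' \
    > "$unit_dir/myapp.service"

cat "$unit_dir/myapp.service"
```

Once installed under /etc/systemd/system, something like `systemd-analyze security myapp.service` can give a rough idea of how much of the sandboxing surface is in use.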

exabrial 8 months ago

There are a lot of ridiculous things in systemd (I'll avoid mentioning specific things to avoid a flame war), but auto containerization of services is by far the most useful thing they've ever come out with. It's a far easier workflow than docker or anything else and is built in "for free".

  • MrDrMcCoy 8 months ago

    I agree for the most part, but would love to see more of the security and resource control features from service units, as well as some better tooling around image management / importing from existing registries. Hopefully this will come in the not too distant future.

INTPenis 8 months ago

I recently ran into them and honestly they seem unnecessarily complicated compared to using Podman and OCI images.

nesarkvechnep 8 months ago

systemd-nspawn is great! It's well integrated with the init system, works as expected.

kragen 8 months ago

This is very interesting! I only heard about systemd-nspawn last night.

  • josteink 8 months ago

    Most systemd-projects have a name which immediately shouts out what it does, so you can easily tell if it is relevant for your needs or not.

    systemd-nspawn is probably the only project without such a name, so most people don't know about it, nor what it does, and therefore never look any further into it.

    And that's a shame really, because it's fantastic technology.

    • NekkoDroid 8 months ago

      > systemd-nspawn is probably the only project without such a name

      Add sd-tmpfiles to the list IMO. While it still creates and manages temporary files, it's more about managing almost any type of system file, from creating them to managing their permissions or making symlinks when needed.

      I am a strong advocate of renaming it systemd-sysfiles to match the systemd-sysusers which is somewhat related (e.g. tmpfiles using users created from sysusers). But it probably won't happen for a while if at all due to backwards compat.

    • Ferret7446 8 months ago

      How so? nspawn means spawn a process in a new namespace, which is... exactly what it does. The problem isn't with systemd-nspawn, the problem is with containers, because the vast majority of devs have no idea that containers are just scripts to set up Linux namespaces.

      • josteink 8 months ago

        > because the vast majority of devs have no idea that containers are just scripts to set up Linux namespaces.

        That’s IMO framing things a bit backwards.

        That’s how containers are implemented… today. on Linux. On Windows it’s completely different. On MacOS it’s completely different again.

        And what makes you think «namespaces» are a term unique to containers? It’s used throughout tech for a million other platforms too.

        Containers however is a well defined concept, regardless of how they are implemented today, and on one platform only.

        systemd would probably see more use of their container-platform if they put «container» in the name, that much seems obvious to me.

houzi 8 months ago

Does breaking out of the container give you root?

  • josteink 8 months ago

    > Does breaking out of the container give you root?

    You can run unprivileged containers, and in that case, no.

  • 1oooqooq 8 months ago

    that means terminating the process, so good luck with that.

  • kennysoona 8 months ago

    I would think so, that would seem in line with systemd's architectural design decisions.