Early on in my career it was pretty common to use clusters comprised of a pile of heterogeneous unix systems: some Sun boxes, some Linux machines, maybe IRIX and AIX in there too.
The thing that made them into a single cluster was that your user account existed on them all, and you had the same home directory on them all: make files on one machine, and they're visible on the other machines; but you still had access to the machine-specific features of each host.
The technology of the time often used Network File System (NFS) and Network Information Service (NIS) (formerly known as Yellow Pages, with that name living on in the yp prefix of commands like yppasswd).
Fast-forward a decade or two and things look different: virtual machines are a thing now, and more recently, containers. It's now very common to custom build a virtual machine or a container, both with something approximating an entire OS, specifically for running one application, or for running just a single piece of one application.
So maybe you'd connect these pieces - virtual machines or containers - with some kind of socket connection: a web front end exposing HTTP and talking to a PostgreSQL database in another container with no shared files between them.
I did a bunch of stuff this way, and it was great: you can install separate package stacks in isolation from each other. Want this weird version of a library or compiler? Or to run some curl | sudo script without messing up the rest of your system? Or stick with an old distribution of your OS just for one tool? Easy.
But it was a pain getting files between different places. Got my text editor and version control set up in one place, but need to compile in another? There are all sorts of different ways to get files between those places: for example, commit regularly to version control; or rsync.
Docker running on one host has options for mounting pieces of the host file system inside containers; but I lacked a good idea of what to mount where.
It was all so simple before: you had ~ everywhere, and nothing else.
So I started using the unix cluster model, described at the top of this post, to guide how I set up a lot of my containers and virtual machines.
The actual technology (NFS, docker volume mounts, YP, LDAP, HESIOD, ...) isn't massively relevant: I've used different mechanisms in different places.
What really matters is: all the (regular human) users get their home directory, mounted at the same place (under /home).
With most ways of sharing files, that means the unix user id for that user should be the same everywhere too.
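A quick way to check the "same user id everywhere" requirement is to compare account listings from each machine. The following is only a sketch: it assumes you have collected passwd(5)-format text from every host somehow (for example the contents of /etc/passwd, or the output of getent passwd), and the function names and host names are made up for illustration.

```python
from collections import defaultdict


def parse_passwd(text):
    """Return {username: uid} parsed from passwd(5)-format text."""
    users = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) < 3:
            continue
        name, uid = fields[0], fields[2]
        # Skip entries without a numeric uid (e.g. NIS compat lines like "+::::::").
        if not uid.isdigit():
            continue
        users[name] = int(uid)
    return users


def find_uid_conflicts(hosts):
    """Given {hostname: passwd text}, return {username: {hostname: uid}}
    for every user whose uid is not the same on all hosts that know them."""
    seen = defaultdict(dict)
    for host, text in hosts.items():
        for name, uid in parse_passwd(text).items():
            seen[name][host] = uid
    return {
        name: by_host
        for name, by_host in seen.items()
        if len(set(by_host.values())) > 1
    }


if __name__ == "__main__":
    # Hypothetical hosts and accounts, purely for illustration.
    hosts = {
        "vm-a": "alice:x:1000:1000::/home/alice:/bin/sh\n",
        "vm-b": "alice:x:1001:1001::/home/alice:/bin/sh\n",
    }
    print(find_uid_conflicts(hosts))
```

An empty result means every user that appears on more than one host has a matching user id; it says nothing about users that exist on only one host.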
I've implemented this basic model in a few different ways: for a couple of VMs inside the same physical server, a traditional NFS and OpenLDAP setup (NFS for file sharing, LDAP for distributing account details) which is a more modern replacement for NFS/NIS; on my laptop and some of my physical servers, I've got a wrapper around docker called cue which creates exactly one user (the invoking user) inside the container, and mounts their home directory appropriately; I have some ad-hoc docker server containers (eg inbound SMTP) where the whole of /home is volume-mounted, and then OpenLDAP is used to share accounts.
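To make the single-user wrapper idea concrete, here is a rough sketch of how such a wrapper might build a docker run command line. This is not how cue itself works (its implementation isn't shown here); the function names are hypothetical, and the sketch takes a shortcut: it runs the container process under the invoking user's numeric uid and gid with --user rather than creating an account inside the container.

```python
import os
import pwd


def invoking_user():
    """Return (uid, gid, home) for the user running this script."""
    entry = pwd.getpwuid(os.getuid())
    return entry.pw_uid, entry.pw_gid, entry.pw_dir


def docker_run_args(image, command, uid, gid, home):
    """Build an argument list for `docker run` that mounts `home` at the
    same path inside the container and runs `command` as uid:gid."""
    return [
        "docker", "run", "--rm", "--interactive", "--tty",
        # Same numeric ids inside and outside, so file ownership lines up.
        "--user", f"{uid}:{gid}",
        # Home directory appears at the same path as on the host.
        "--volume", f"{home}:{home}",
        "--env", f"HOME={home}",
        "--workdir", home,
        image, *command,
    ]


if __name__ == "__main__":
    uid, gid, home = invoking_user()
    # "debian:stable" is just an example image name.
    print(" ".join(docker_run_args("debian:stable", ["bash"], uid, gid, home)))
```

The sketch only prints the command; passing the list to subprocess.run would actually start the container. Because no account is created inside the container, tools that look up the current user by name may complain, which is one reason a real wrapper might do more work here.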
There are plenty of downsides: for example, your whole home directory is accessible in more places than it needs to be and so is more vulnerable; you can't access files outside your home directory, so ~ is now a specially magic directory; POSIX filesystems work badly in distributed systems. For lots of what I want, these downsides are outweighed by the upsides.
One twist that doesn't happen so much with a cluster of physical machines: a server such as a mail server is now a container which has a mail queue which I want to persist across rebuilds. Rebuilds are unusual in the physical machine model, because you don't rebuild physical servers often. So where should that persistent data go? Inside a specific /home directory? In a /container-data directory that is mounted too, like an alternate version of /home? What user-id should own the queue? Different builds of a container might assign different user-ids to the mail server.
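The user-id drift at least is easy to detect, even if that doesn't settle where the data should live. As a small sketch (the function name and the example path are hypothetical), a container could compare the owner of its persistent queue directory with the user id its current build assigned to the service before starting up:

```python
import os


def queue_owner_mismatch(queue_dir, service_uid):
    """Return True if `queue_dir` is owned by a different uid than the one
    the service will run as in this build of the container."""
    return os.stat(queue_dir).st_uid != service_uid


if __name__ == "__main__":
    # Hypothetical path; the effective uid stands in for the mail server's uid.
    path = "/container-data/mailqueue"
    if os.path.isdir(path) and queue_owner_mismatch(path, os.geteuid()):
        print(f"{path} is owned by a different uid than this build uses")
```

This only reports the mismatch; whether to fail, chown the directory, or pin the service's user id at build time is a separate decision.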