29 January, 2018

Yellow Pages for Modern Times

Early on in my career it was pretty common to use clusters comprised of a pile of heterogeneous unix systems: some Sun boxes, some Linux machines, maybe IRIX and AIX in there too.

The thing that made them into a single cluster was that your user account existed on them all, and you had the same home directory on them all: make files on one machine, and they're visible on the other machines; but you still had access to the machine specific features of each host.

The technology of the time often used Network File System (NFS) and Network Information Service (NIS) (formerly known as Yellow Pages, with that name living on in the yp prefix of commands like yppasswd).

Fast-forward a decade or two and things look different: virtual machines are thing now, and more recently, containers. It's now very common to custom build a virtual machine or a container, both with something approximating an entire OS, specifically for running one application, or for running just a single piece of one application.

So maybe you'd connect these pieces - virtual machines or containers - with some kind of socket connection: a web front end exposing HTTP and talking to a PostgreSQL database in another container with no shared files between them.

I did a bunch of stuff this way, and it was great: you can install separate package stacks in isolation from each other. Want this weird version of a library or compiler? Or to run some curl | sudo script without messing up the rest of your system? Or stick with an old distribution of your OS just for one tool? Easy.

But it was a pain getting getting files between different places. Got my text editor and version control set up in one place, but need to compile in another? There are all sorts of different ways to get files between those places: for example, commit regularly to version control; or rsync.

Docker running on one host has options for mounting pieces of the host file system inside containers; but I lacked a good idea of what to mount where.

It was all so simple before: you had ~ everywhere, and nothing else.

So I started using the unix cluster model, described at the top of this post, to guide how I set up a lot of my containers and virtual machines.

The actual technology (NFS, docker volume mounts, YP, LDAP, HESIOD, ...) isn't massively relevant: I've used different mechanisms in different places.

What really matters is: all the (regular human) users get their home directory, mounted at the same place (under /home).

Pretty much with most ways of sharing files, that means the unix user id for that user should be the same everywhere too.

I've implemented this basic model in a few different ways: for a couple of VMs inside the same physical server, a traditional NFS and OpenLDAP setup (NFS for file sharing, LDAP for distributing account details) which is a more modern replacement for NFS/NIS; on my laptop and some of my physical servers, I've got a wrapper around docker called cue which creates exactly one user (the invoking user) inside the container, and mounts their home directory appropriately; I have some ad-hoc docker server containers (eg inbound SMTP) where the whole of /home is volume-mounted, and then OpenLDAP is used to share accounts.

There are plenty of downsides: for example, your whole home directory is accessible in more places than it needs to be and so is more vulnerable; you can't access files outside your home directory, so ~ is now a specially magic directory; posix filesystems work badly in distributed systems. For lots of what I want, these downsides are outweighed by the upsides.

One twist that doesn't happen so much with a cluster of physical machines: a server such as a mail server is now a container which has a mail queue which I want to persist across rebuilds. Rebuilding would be unusual in the physcial machine model because you don't usually rebuild physical servers often. So where should that persistent data go? Inside a specific /home directory? in a /container-data directory that is mounted too, like an alternate version of /home? What user-id should own the queue? Different builds of a container might assign different user-ids to the mail server.

21 January, 2018

A string of DNS protocol bugs.

I went to turn on DNSSEC for cqx.ltd.uk today - the server that signed it broken right before my Christmas busy period so I disabled DNSSEC on that zone until I got round to fixing it.

I've encountered three different apparent protocol implementation bugs in the space of a few hours:

  • Andrews and Arnold's web based control panel accepts DS records as generated by BIND's dnssec-keygen tool but then throws a complicated looking error when talking to Nominet, the UK domain registry, to put those records where they need to be. As far as I can tell, this is because the BIND output has whitespace in the middle of a hex string, something RFC 4034 s5.3 seems to think is acceptable. Why is installing crypto keys always so hard?
  • For a while, Hetzner's recursive resolvers were unable to verify (and therefore refused to answer) results for my zone. I have a suspicion (but I don't have much to go on other than a hunch) that this was something to do with DS records and the actual zone having some kind of mismatch - although Google Public DNS at 8.8.8.8, and Verisign's DNSSEC checker both worked ok.
  • I discovered an implementation quirk in the Haskell dns library, which I use inside a debugging tool I'm slowly building. This is to do with the mechanism which DNS uses to compress replies: where a domain name would be repeated in a response, it can be replaced by a pointer to another occurence of that name in the reply. It looks like in this case that the dns library will only accept those pointers if they point to regions of the reply that have specifically already been parsed by the domain name parsing code, rather than pointers to arbitrary bytes in the reply. This is frustratingly familiar to another bug I encountered (at Campus London) where their (not-so) transparent firewall was reordering DNS replies; giving a bug that only manifested when I was sitting in their cafe. (github issue #103)

18 January, 2018

findmnt

One of my customers uses cPanel to administer their internet facing servers.

It has an interesting virtual filesystem setup for sandboxing user accounts: cPanel creates, per user, a new root filesystem for that user, and then chroot into that before running user code (shells, php, ...).

To create that file system, cPanel uses bind mounts to make the root-jail file system look very much like the real root file system.

bind mounts are a thing that appeared after the 1997-era of me spending a lot of time learning Linux. (if it was invented post 2000, lol I've never heard of it)

In the intervening years, isolation techniques like this have been becoming more mainstream - for example, my main use of docker (via my tool cue) has been to prepare and use different root file systems.

Anyway, back to cPanel. I was trying to figure out how this virtual filesystem was constructed. Bind mounts don't appear in the output of mount or df or /proc/mounts with all the information I wanted: the mount just shows are being from the same device that its target is, without saying where that target is.

For example, I can see that /home/virtfs/x/usr/sbin goes to somewhere in the filesystem on s_os-lv_root but not where. (I can guess it's /usr/sbin in this case).

/dev/mapper/vg_os-lv_root   50G   21G   30G  41% /home/virtfs/x/usr/sbin
/dev/mapper/vg_os-lv_root   50G   21G   30G  41% /home/virtfs/x/var/spool
/dev/mapper/vg_os-lv_root   50G   21G   30G  41% /home/virtfs/x/etc/apache2
/dev/mapper/vg_os-lv_root   50G   21G   30G  41% /home/virtfs/y/usr/sbin

Anyway, surely this can't be the way things are??

So along comes findmnt, which gives me this info:

...
│ │ ├─/home/virtfs/x/usr/sbin     /dev/mapper/vg_os-lv_root[/usr/sbin]
      xfs         ro,relatime,seclabel,attr2,inode64,sunit=512,swidth=512,usrquota
...

... which tells me that yes that really is mounted from /usr/sbin.

Anyway, a nice new command to discover, around since only util-linux v2.18 in mid 2010.