01 December, 2014

dive computer subtitles for gopro videos

On a couple of dives recently, I had my own dive computer and wore my GoPro in head-mounted mode.

I thought it would be nice to have the dive computer info displayed on the GoPro video, so I hacked up https://github.com/benclifford/subsurface2srt which pulls data from subsurface and makes it into a VLC subtitles file.

One problem I have is that both the GoPro and the dive computer have manually set clocks, which can be set only to the nearest minute. So guessing a start offset between the video and the dive computer file is a bit hazy.
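
For the curious, the subtitle-writing side of this is small. Here's a minimal Haskell sketch of the idea - illustrative, not subsurface2srt's actual code - assuming the dive log has already been flattened into (seconds, depth) samples, with a manually guessed offset to line the subtitles up with the video:

import Text.Printf (printf)

-- a dive sample: (seconds since the start of the dive, depth in metres)
type Sample = (Int, Double)

-- SRT timestamps look like 00:12:34,000
srtTime :: Int -> String
srtTime s = printf "%02d:%02d:%02d,000" (s `div` 3600) ((s `div` 60) `mod` 60) (s `mod` 60)

-- one subtitle per sample, shown until the next sample comes along
entry :: Int -> Int -> (Sample, Sample) -> String
entry offset n ((t0, depth), (t1, _)) =
  printf "%d\n%s --> %s\n%.1fm\n\n" n (srtTime (t0 + offset)) (srtTime (t1 + offset)) depth

toSrt :: Int -> [Sample] -> String
toSrt offset samples = concat (zipWith (entry offset) [1 ..] (zip samples (tail samples)))

main :: IO ()
main = putStr (toSrt 30 [(0, 0.0), (20, 4.1), (40, 6.3), (60, 5.8)])  -- 30s: the guessed clock offset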

04 November, 2014

plane wifi

For the first time, I was on a plane that had wifi. I think it was a 777-200 or something like it.

I didn't have much battery power left on my laptop and I didn't want to pay USD16 for just a few minutes; but I did have a poke around the network.

My laptop could see 2 access points with ESSID United_Wi-Fi and 10 with a blank ESSID.

I connected to one of the United_Wi-Fi APs.

They used NAT (I expect) and allocated me an RFC1918 address in a /23 subnet, which gives 2^9 - 2 = 510 usable IPs.

inet addr:172.19.248.97  Bcast:172.19.249.255  Mask:255.255.254.0
With each passenger carrying at least one wifi device, I wonder if they'll get near address space exhaustion. A 777 is supposed to be able to carry up to about 450 passengers in some configurations.

The default gateway is down at 172.19.248.1

There is a suggestion that DNS paywall-tunnelling hacks might work, though I didn't try: some hostname lookups gave me an IP address, and others gave NXDOMAIN, which suggests some off-plane communication was happening even though the paywall was still in place.

$ host www.google.com
www.google.com has address 74.125.225.51
[...]
$ host blahfkskfdhs.com
Host blahfkskfdhs.com not found: 3(NXDOMAIN)

http GETs were all redirected to www.unitedwifi.com, hosted on-plane at 172.19.248.2.

An nmap of the 172.19.248.0/23 subnet gave 19 addresses responding to pings - mostly passengers, I guess, but probably some crew too, plus servers/routers.

The three interesting nmap results were:

Nmap scan report for ns.unitedwifi.com (172.19.248.1)
Host is up (0.0020s latency).
Not shown: 997 filtered ports
PORT    STATE  SERVICE
53/tcp  open   domain
80/tcp  open   http
443/tcp closed https
MAC Address: 00:0D:2E:00:40:01 (Matsushita Avionics Systems)

Nmap scan report for www.unitedwifi.com (172.19.248.2)
Host is up (0.0014s latency).
Not shown: 993 filtered ports
PORT      STATE  SERVICE
80/tcp    open   http
443/tcp   open   https
8080/tcp  closed http-proxy
16001/tcp closed fmsascon
16012/tcp closed unknown
16016/tcp closed unknown
16018/tcp closed unknown
MAC Address: 00:0D:2E:00:00:A8 (Matsushita Avionics Systems)

Nmap scan report for 172.19.248.3
Host is up (0.0019s latency).
Not shown: 999 filtered ports
PORT   STATE SERVICE
53/tcp open  domain
MAC Address: 00:0D:2E:00:40:01 (Matsushita Avionics Systems)

I didn't probe any more as my battery had run out.

30 September, 2014

Payment Wristband on the London Underground

I previously blogged about making a paytag sticker into a wristband. Later Barclays Bank released a variation: bpay, a prepay mastercard already in a wristband.

The wristband holder is pretty shitty and falloffable: it is bulky and I know two people (one being myself) who have lost their bands accidentally. I've rehoused mine on a woven bracelet.

Being a pre-pay band, this chip does an online authorisation for every transaction, making it sometimes a little slower. But for the same reason, they expose authorisations (not just cleared transactions) in their live online statement.

I recently made my first journey on the London Overground using bpay (I've been on TfL's contactless payment trial for 6 months, but using a different card) and I got to see an initial authorisation that I hadn't seen before with my previous (post-paid) card:

0908 Enter train system at Wapping station
0915 bpay sees this authorization:
    Auth: TfL Travel Charge,TFL.gov.uk/CP,GB 29/09/2014 9:14:50 Posted On: 29/09/2014 GBP 0.10
0922 Leave train system at Shoreditch High Street
then around close of business on day+1, that Auth gets replaced with the actual charge:
    Fin: TFL.GOV.UK/CP,VICTORIA,TFL TRAVEL C   30/09/2014  18:07:58  Posted On: 29/09/2014  GBP 7.20

Interesting that they authorise 10p rather than the minimum single fare. Also note that the description of the transaction changes (to something less readable, IMO) - that seems to happen with other merchants too. Weirdos.

02 September, 2014

Boris bike tidal flow

Docking status information is available in XML for the London bike hire scheme ("Boris bikes").

I made this video (AVI) (animated GIF) of tidal flow as areas get busy or empty during the day, an animated version of the image below using data from Saturday evening until Tuesday lunchtime.

Each point represents a docking station. You can see how the shape of this cloud sits over London on this Google map. Blue means empty docking station. Red means full docking station. Light blue and light red mean almost empty and almost full, respectively.
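
The colouring is just a threshold on how full each station is. A minimal Haskell sketch, assuming the XML feed has already been parsed into (bikes docked, total docks) counts per station - the thresholds here are illustrative guesses, not necessarily the ones the animation uses:

data Colour = Blue | LightBlue | Neutral | LightRed | Red
  deriving Show

colour :: Int -> Int -> Colour
colour bikes docks
  | docks == 0   = Neutral   -- out-of-service station
  | frac <= 0.05 = Blue      -- empty: no bikes to take
  | frac <= 0.25 = LightBlue -- almost empty
  | frac >= 0.95 = Red       -- full: nowhere to dock
  | frac >= 0.75 = LightRed  -- almost full
  | otherwise    = Neutral
  where frac = fromIntegral bikes / fromIntegral docks :: Double

main :: IO ()
main = mapM_ (print . uncurry colour) [(0, 24), (3, 24), (12, 24), (20, 24), (24, 24)]
-- Blue, LightBlue, Neutral, LightRed, Red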

Not so much on Saturday and Sunday, but clearly (to me) on Monday you can see a 9am rush hour move of bikes into the centre, and a 5pm move of bikes back out to the edges again.

27 August, 2014

ffmpeg X video capture...

It turns out ffmpeg can video-record an X server. I'm using this to capture video of a set of web browsers running tests inside Xvfb virtual frame buffers.
# record X display :1 (the Xvfb) at 1024x768, 4 frames/sec, in the background
# (-sameq is deprecated and removed in newer ffmpeg builds)
ffmpeg -f x11grab -s 1024x768 -r 4 -i :1 -sameq screencast.flv &
VIDEOPID=$!
xeyes # or some other X-based automated testing program
kill $VIDEOPID # stop recording once the test has finished

05 August, 2014

ping reverse dns

Slightly unexpected hostname lookup on a CNAME.

maven.ops is a CNAME to lulu; the reverse DNS points only to lulu.

You can already see the lulu hostname in the first line of output, because it's shown there. But for the first few seconds the ping lines show the name I gave, and then they switch to the "real" hostname (perhaps when a reverse DNS lookup completes? rather than continuing to use the name that was looked up forwards to begin with?)

No big deal, but slightly unexpected.

benc@utsire:~$ ping maven.ops.xeus.co.uk
PING lulu.xeus.co.uk (46.4.100.47) 56(84) bytes of data.
64 bytes from maven.ops.xeus.co.uk (46.4.100.47): icmp_req=1 ttl=51 time=466 ms
64 bytes from maven.ops.xeus.co.uk (46.4.100.47): icmp_req=2 ttl=51 time=51.1 ms
64 bytes from maven.ops.xeus.co.uk (46.4.100.47): icmp_req=3 ttl=51 time=51.9 ms
64 bytes from lulu.xeus.co.uk (46.4.100.47): icmp_req=4 ttl=51 time=60.1 ms
64 bytes from lulu.xeus.co.uk (46.4.100.47): icmp_req=5 ttl=51 time=96.5 ms
64 bytes from lulu.xeus.co.uk (46.4.100.47): icmp_req=6 ttl=51 time=50.9 ms
64 bytes from lulu.xeus.co.uk (46.4.100.47): icmp_req=7 ttl=51 time=49.5 ms
64 bytes from lulu.xeus.co.uk (46.4.100.47): icmp_req=8 ttl=51 time=50.6 ms

benc@utsire:~$ ping -V
ping utility, iputils-sss20101006

15 July, 2014

containerisation of my own environment

I've encountered docker in a couple of work-related projects, and for a bit more experimentation I've begun containerising a chunk of my own infrastructure.

Previously I've had a few servers around which over the years have ended up with me being a bit scared to upgrade them: too many dependencies between what should be separate services. Last time I rebuilt my main server I looked at Xen, but that machine was a bit too out of date to do decent virtualisation, even though a bunch of separate VMs looked at the time like the best way of keeping services apart.

I've ended up with an LDAP server for unix accounts, and a /home shared between all the containers that need home directory access (which is done with docker volume mounts at the moment, but there is scope for adding NFS onto that if/when I spread to more than one machine).

I've got separate containers for each of: inbound smtp, outbound smtp, an apache proxy that redirects to other containers based on URL, imap, webmail, ldap server, ssh server (so you are ssh-ing into a container, not the base OS).

The plan is that each of these is built and restarted automatically every week or so; along with a whole machine reboot at least once a month. I'm hoping that keeps stuff fairly up to date and helps me discover upgrade-related breakages around the time they happen rather than years later. It also forces me to pay attention to documenting how I set something up: all the stuff that is torn down and rebuilt each time needs to be documented properly in machine-readable form, so that the rebuild works. In that sense, it is a bit like automated testing of documentation.

I've also tried to set up things like port forwarding and HTTP forwarding so that it's not too reliant on docker specifically - so that I can spread onto other machines or use a different virtualisation technology. That is, for example, how I intend to deal with upgrading the base OS in a few years' time: by starting a new VM, moving the services across one by one, and then killing the old one.

08 July, 2014

balancing tests between 4 workers by external observation

We have a bunch of functional tests that are run by our buildbot every time someone makes a git push.

These tests were taking a long time. Obvious answer: run them in parallel.

This brought up some problems, the biggest of which was how to split the tests between the four test runners. Naively chopping the test list into four pieces turned out to be pretty imbalanced: one test runner was taking 4 minutes, another was taking 14 minutes.

Initially this sounded like something a task farm would be good for. But I didn't want to get into digging around in the Groovy test-running code and making the test runners talk back to the buildbot to collect tasks.

So I took a coarser approach with a simpler interface: a balance program picks some tests for each of the four test runners, runs the tests, measures the wall-clock time of each runner's whole run, and then iteratively updates its knowledge so that it will hopefully pick a better-balanced distribution next time round.

I had a quick play with genetic algorithms but that didn't seem to be going anywhere. Next I implemented this model:

Assume there is a startup cost k, and for each test i, a time t_i that the test takes to run. These cannot directly be measured by the balance program.

Keep an estimate of k and t_i in a state file.

Make a distribution of tests over the four runners based on the estimates.

When each runner finishes, if it took longer than the estimate, nudge up the estimates for k and for the tests that were on that runner; similarly nudge down if the run time was less.

Run this lots of times.
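
Here's a toy Haskell reconstruction of the shape of that - invented names, and a simple greedy longest-first placement, not necessarily what the real program (linked at the end) does:

import Data.List (sortBy)
import Data.Ord (comparing, Down (..))
import qualified Data.Map as M

-- the persistent state: estimated time t_i for each test, by name
type Estimates = M.Map String Double

-- give each test (largest estimate first) to the least-loaded runner
distribute :: Estimates -> [String] -> [[String]]
distribute est tests = map snd (foldl place (replicate 4 (0, [])) ordered)
  where
    ordered = sortBy (comparing (Down . time)) tests
    time t = M.findWithDefault 60 t est  -- never-seen test: guess 60s
    place runners t =
      let ((load, ts) : rest) = sortBy (comparing fst) runners
      in (load + time t, t : ts) : rest

-- a runner finished: if it took longer than predicted, nudge its tests'
-- estimates up; if quicker, nudge down (k would be nudged the same way)
nudge :: Double -> Double -> [String] -> Estimates -> Estimates
nudge predicted actual ts est = foldr (M.adjust (* factor)) est ts
  where factor = 1 + 0.1 * ((actual - predicted) / predicted)

main :: IO ()
main = print (distribute M.empty ["t1", "t2", "t3", "t4", "t5", "t6"])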

After a while this rearranges the tests so that the runners take about 10 minutes each (compared to the 4..14 minutes with a naive distribution).

So we've saved a few minutes on the tests and are hopefully in a position where as we get more tests we can scale up the number of runners and still preserve reasonable balance.

This also copes with converging to a new balance when tests are added or removed, or when test times change (either because the test itself changed, or because the behaviour being tested did).

(The other problem was that loads of our tests turned out to be secretly dependent on each other, failing when run in a different order - this occasionally caused problems with the naive distribution, but was much more of a problem with the reordering that happens under this balancing approach.)

Source code is https://github.com/benclifford/weightbal

01 July, 2014

fascist push master

I've been working with a customer to get some development infrastructure and procedures in place.

A problem we had was that, although we had development branches, some developers were supremely confident that their changes definitely wouldn't break things, and so would regularly put some of their commits directly on the master branch; then a few minutes later, the build robot would discover that master was broken. This was (is) bad because we were trying to keep master always ready to deploy, and to keep it so that other developers could always base their new work on the latest master.

So we invented a new policy, and a tool to go with it, the fascist-push-master.

Basically, you can only put something on master if it has already passed the build robot's tests (for example, on a development branch).

This doesn't mean "your dev branch passes so now you're allowed to merge it to master", because breakages often happen because of the merge of "independent" work. Instead it means that master can only be updated to a git commit that has actually passed the tests: you have to merge master into your dev branch, check that it works there, and then that exact commit becomes the new master commit.

I wasn't so mean as to make the git repository reject unapproved commits - if someone is so disobedient as to ignore the policy, they can still make whatever commits they want to master. Rejection only happens in the client side tool. This was a deliberate decision.

The merge-to-master workflow previously looked like this:

git checkout mybranch
# do some work
git commit -a -m "my work"
git checkout master
git merge mybranch
git push origin master
# now the build robot tests master, with mybranch changes added

but now it looks like this:

git checkout mybranch
# do some work
git commit -a -m "my work"
git merge master
git push origin mybranch
# now the build robot tests mybranch with master merged in
fascist-push-master
# which will either push this successfully tested
# mybranch to master or fail

The output looks like this:

$ fascist-push-master
You claim ba5810edfc67be5118be6c02ab3ffbe215bbe898 on branch mybranch is tested and ready for master push
Checking that story with the build robot.
Umm. Liar. Go merge master to your branch, get jenkins to test it, and come back when it works.

INFORMATION ABOUT YOUR LIES: BRANCHID=mybranch
INFORMATION ABOUT YOUR LIES: COMMITID=ba5810edfc67be5118be6c02ab3ffbe215bbe898

The script is pretty short, and pasted below:

#!/bin/bash

# the branch we're on, and the commit we are claiming is tested
export BRANCHID=$(git rev-parse --abbrev-ref HEAD)
export COMMITID=$(git rev-parse HEAD)

echo You claim $COMMITID on branch $BRANCHID is tested and ready for master push

echo Checking that story with the build robot.
# the build robot keeps a list of every commit that has passed its tests
ssh buildrobot.example.com grep $COMMITID /var/lib/jenkins/ok-commits
RES=$?

if [ "$RES" == "0" ]; then
  echo OK, you appear to be telling the truth.
  # fast-forward-only merges: if master has moved on since this commit
  # was tested, refuse rather than create an untested merge here
  (git fetch && git checkout master && git merge --ff-only origin/master && git merge --ff-only $COMMITID && git push origin master && git checkout $BRANCHID) || echo "SOMETHING WENT WRONG. CONTACT A GROWN UP"
else
  echo Umm. Liar. Go merge master to your branch, get jenkins to test it, and come back when it works.
  echo
  echo INFORMATION ABOUT YOUR LIES: BRANCHID=$BRANCHID
  echo INFORMATION ABOUT YOUR LIES: COMMITID=$COMMITID
  exit 1
fi

26 May, 2014

D'Hondt results table for London MEP elections

There was just an election in the UK for the European Parliament (to elect MEPs), and it was interesting in the news because of the rise of one particular party, UKIP. The European elections use a proportional representation method, which UK general elections don't, so this doesn't mean that there will be a corresponding rise in UKIP members of parliament (at present, 0) at the next general election.

I voted in the London constituency, which has 8 MEPs elected in one ballot.

Wikipedia has a list of number of votes and number of MEPs elected, for each party.

Both the Green Party and UKIP got the same number of MEPs: 1 each. But their vote counts were very different (371,133 for UKIP vs 196,419 for the Greens), and the Liberal Democrats, traditionally the third party in UK politics, got no seats at all.

I found myself playing with some "what-if" scenarios to better understand how the results came out.

The vote works like this: each elector chooses one party from the list on the ballot paper - there were 17 parties on the paper, mostly small, fairly irrelevant ones.

The votes for each party are tallied, giving a vote count for each party.

Then, it is necessary to convert those vote counts into a set of 8 MEPs that broadly reflects the proportions of the votes. This is done here with the D'Hondt method, which as an intermediate step needs a two-dimensional table. I'm going to omit the smaller parties here because they have no effect on my scenarios.

Party         Count   /1           /2          /3           /4          /5           /6      /7      /8      /9     /10    /11    /12
Labour        806959  806959 (1)   403479 (3)  268986 (5)   201739 (7)  161391 (11)  134493  115279  100869  89662  80695  73359  67246
Conservative  495639  495639 (2)   247819 (6)  165213 (10)  123909      99127        82606   70805   61954   55071  49563  45058  41303
UKIP          371133  371133 (4)   185566 (9)  123711       92783       74226        61855   53019   46391   41237  37113  33739  30927
Green         196419  196419 (8)   98209       65473        49104       39283        32736   28059   24552   21824  19641  17856  16368
LibDem        148013  148013 (12)  74006       49337        37003       29602        24668   21144   18501   16445  14801  13455  12334

So the 8 seats went to the top 8 "votes / seats" quotients - numbered (1) to (8) in the table. I've also numbered the next 4 in line, (9) to (12), in order of "votes / seats".
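
The core of the method is tiny. Here's a minimal Haskell sketch (the code that actually produced these tables is linked at the end of the post; this isn't it):

import Data.List (sortBy)
import Data.Ord (comparing, Down (..))

parties :: [(String, Int)]
parties =
  [ ("Labour", 806959), ("Conservative", 495639), ("UKIP", 371133)
  , ("Green", 196419), ("LibDem", 148013) ]

-- every cell of the table: each party's votes divided by 1, 2, 3, ...
quotients :: [(String, Int)]
quotients = [ (p, v `div` d) | (p, v) <- parties, d <- [1 .. 12] ]

-- the top n quotients, in winning order; each one is a seat
allocate :: Int -> [String]
allocate n = map fst (take n (sortBy (comparing (Down . snd)) quotients))

main :: IO ()
main = print (allocate 8)
-- ["Labour","Conservative","Labour","UKIP","Labour","Conservative","Labour","Green"]

allocate 12 answers the "what if there were more seats" question that comes up further down.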

So there's a difference there between the Greens and UKIP: UKIP was chosen 4th, with a solid block of votes behind its single seat. The Greens were chosen last, and only just got a seat.

What do I mean by "only just"? Well, the next seat allocated if not for the Greens would have been a second UKIP seat (numbered 9 in the table), and to get that UKIP would have needed to beat 2 * 196419 = 392838 votes: 21706 votes more than they actually got.

Or conversely, if the Greens had got fewer than 185566 votes (10854 fewer than they really got), they would have taken 9th place, behind a 2nd UKIP seat taking the 8th. In that case the table would have looked like this:

Party         Count   /1           /2          /3           /4          /5           /6      /7      /8      /9     /10    /11    /12
Labour        806959  806959 (1)   403479 (3)  268986 (5)   201739 (7)  161391 (11)  134493  115279  100869  89662  80695  73359  67246
Conservative  495639  495639 (2)   247819 (6)  165213 (10)  123909      99127        82606   70805   61954   55071  49563  45058  41303
UKIP          371133  371133 (4)   185566 (8)  123711       92783       74226        61855   53019   46391   41237  37113  33739  30927
Green         185565  185565 (9)   92782       61855        46391       37113        30927   26509   23195   20618  18556  16869  15463
LibDem        148013  148013 (12)  74006       49337        37003       29602        24668   21144   18501   16445  14801  13455  12334

Another thing that I think is interesting is just how badly the Lib Dems did. To get a seat, all other votes being equal, they'd have needed to beat the Greens' 196419 to steal the 8th place: 196419 - 148013 = 48406 more votes, 32% more than they actually got.

A different way of looking at that is considering if there were more than 8 seats, how many more seats would there need to be for the Lib Dems to get a seat to represent their proportion (about 6%) of the electorate?

The numbers in the first table already show the answer: the Lib Dems are 12th in line, so there would need to be 12 seats - 4 more - before they got one. The table below shows the 11 seats ahead of the Lib Dems: beyond the 8 actually allocated, one more seat each for Labour, the Conservatives, and UKIP.

Party         Count   /1           /2          /3           /4          /5           /6      /7      /8      /9     /10    /11    /12
Labour        806959  806959 (1)   403479 (3)  268986 (5)   201739 (7)  161391 (11)  134493  115279  100869  89662  80695  73359  67246
Conservative  495639  495639 (2)   247819 (6)  165213 (10)  123909      99127        82606   70805   61954   55071  49563  45058  41303
UKIP          371133  371133 (4)   185566 (9)  123711       92783       74226        61855   53019   46391   41237  37113  33739  30927
Green         196419  196419 (8)   98209       65473        49104       39283        32736   28059   24552   21824  19641  17856  16368
LibDem        148013  148013 (12)  74006       49337        37003       29602        24668   21144   18501   16445  14801  13455  12334

So many election tea-leaves for staring into in this election!

You can get the Haskell code on GitHub that produced the above tables, if you want to fiddle yourself.

01 May, 2014

paytag wristband

A year or so ago, one of my credit cards sent me a PayTag: a sticker with the contactless payment bits of a regular credit card, but without the other stuff (contact chip, embossed number, etc).

Their stated usecase was for sticking on your phone, as a sort-of low tech upgrade for phones which don't have NFC.

I didn't find that use case particularly compelling, and aside from comedy ideas like putting it inside a fairy-wand, I've been waiting for a use.

A few days ago I made it into a payment wristband: my right wrist already has loads of bracelets on it, and I took one of those, some plastic packaging, and some superglue, and attached the PayTag.

I was a little wary of using this at first. My initial test was deliberately in a stationer's shop which had unattended self-checkout terminals. (As a teenager I used that same branch for RFID fun, with an anti-shoplifting coil in my pocket, setting the alarms off every weekend.) For 75p I ended up with a new card case and a successful initial test.

Next I went to Waitrose to buy my groceries. I was wary here because the daytime staff are angry dinner ladies who have confiscated stuff off me in the past(!). They didn't seem to bat an eyelid at me waving my jacket arm at their payment terminal.

Thirdly I went to buy coffee. This was more awkward. Their payment terminal was stuffed under a shelf and looked like it was probably quite awkward to use even with a regular contactless card. The dude was a bit confused at me putting my empty hand towards him and waving.

My final test was in a pub. The contactless user experience is a bit different in most pubs: they use hand-held terminals and usually you hand over your card, they notice it is contactless and then use that. So I had to work around that a bit. I handed over my regular payment card and when he noticed it was contactless and went to swipe I drunkenly shouted "WAIT! LET ME USE MY WRIST!" which he did. At that point the card reader decided it needed to do dialup verification so there was a tense few seconds where I hoped I didn't look like a knob. But it worked.

I'm surprised at how unsurprised staff are at seeing this. I need to figure out the right way to start a contactless payment in a hand-held reader environment. I'm looking forward to being able to use this on the London Underground ticket gates later in the year too...

later: someone sent me this article about a contactless payment suit.

26 February, 2014

Haskell numbers

I just put up the (still being worked on) slides for a talk I'm doing in a few hours at the London Haskell User Group, on the subject of "Numbers".

09 February, 2014

Range-weighted average-score voting at battle of the bands.

Went to a "Battle of the Bands" contest. It was in a bar and the audience got to vote. Everyone was given cards numbered 1 point, 3 points and 5 points. Voting happened after each band played. We'd all gone to see a friend of a friend so the obvious strategy was to vote 5 for that band, and not vote for the others.

Turned out they did the voting differently here: they computed the average score rather than the total score. In a fixed audience, both average and total would produce the same ranking. But in this situation the audience changed. Battle of the bands is really "who has the most friends?" and people turn up only to see and vote for their friend.

So normally you'd vote 5 points for your band, and ignore all the others (who might be playing when you haven't even arrived yet or after you've left) and that would be equivalent to voting 0.

Using averages defeats that strategy, though: your absence doesn't count as a zero vote, because you were never in that band's tally. To downvote a band that isn't your friends', you have to be physically present.
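
A toy Haskell illustration of the difference, with invented numbers:

-- band A's 20 friends all vote 5 and then leave; 50 other people
-- are in the room for band B and vote 3 each
main :: IO ()
main = do
  let bandA = replicate 20 5 :: [Int]
      bandB = replicate 50 3 :: [Int]
      avg vs = fromIntegral (sum vs) / fromIntegral (length vs) :: Double
  print (sum bandA, sum bandB)  -- totals: (100,150), band B wins
  print (avg bandA, avg bandB)  -- averages: (5.0,3.0), band A wins

Under totals, the bigger crowd wins; under averages, the small devoted crowd wins, and the only way to pull band A's score down is to turn up and vote 1.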

That seemed a neat way for the bar to make you stay and drink more beer.