Ben Clifford Technical Blog: May 2011

29 May, 2011

Panicker's guide to World IPv6 Day

8th June 2011 is World IPv6 Day. Maybe you haven't done anything about getting your website IPv6-enabled. It's taken the world 15 years to develop IPv6, so sure it seems *totally* reasonable that you can get it deployed in 10 days.

What I'm going to tell you about here will get basic IPv6 access to your site. It won't do it in a particularly pretty way, and it's probably not the long term way to do it. Also I just hammered this out this afternoon (though it's based on years of IPv6 use). (I hope) it should work.

First some DOs and DON'Ts:

* DON'T deploy IPv6 on your production servers. If you don't know much about IPv6, then blindly sticking it on your real production resources is probably a good way to put even your IPv4 (read: the real IP that everyone actually uses) connectivity at risk (for various reasons that you'll understand when you understand...)

* DO deploy a http proxy server on its own machine, and have that proxy your IPv6 traffic. You shouldn't need to modify *anything* on your production machines.

* DON'T put the IPv6 address record (AAAA, as opposed to the IPv4 A record) in your normal DNS. If you do, then users who have both IPv4 and IPv6 will usually try to connect over IPv6. You don't know how well this is going to scale, or how good your IPv6 connectivity is going to be (or even how good your users' IPv6 connectivity is going to be, if everyone is going to be fucking around that day).

* DO put a DNS name (eg www.6.example.com, if your main site is on www.example.com) with the AAAA record. That way, users can choose to try using IPv6, and if it's broken can easily get back to your main site. You'll need to publicise this, though, because it's not going to get users connecting via IPv6 automatically, and at the same time you should provide some way to send feedback: for example, an email address or a forum.
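A quick way to sanity-check the DNS side of those DOs and DON'Ts is to look at which address families each name resolves to. This is only a minimal sketch in Python: the function names `address_families` and `dns_split_ok` are made up for illustration, and the answer reflects whatever your local resolver returns, which may not be what everyone else sees.

```python
import socket

def address_families(host, resolver=socket.getaddrinfo):
    """Return the set of address families ("IPv4", "IPv6") that host resolves to."""
    names = {socket.AF_INET: "IPv4", socket.AF_INET6: "IPv6"}
    try:
        results = resolver(host, 80, 0, socket.SOCK_STREAM)
    except socket.gaierror:
        # name does not resolve at all
        return set()
    return {names[family] for family, *_ in results if family in names}

def dns_split_ok(main_host, v6_host, resolver=socket.getaddrinfo):
    """True when the main name has no AAAA record and the IPv6 name has only AAAA."""
    return ("IPv6" not in address_families(main_host, resolver)
            and address_families(v6_host, resolver) == {"IPv6"})
```

Something like `dns_split_ok("www.example.com", "www.6.example.com")` would then tell you whether the two names are split the way the list above suggests. The `resolver` parameter is only there so the lookup can be swapped out when trying the code without a network.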

So what do you need to do:

* Get a dedicated server (either a physical hardware server, or a VPS) running a recent version of Linux. (Ubuntu 10.x would be enough)

* Connect that server to the IPv6 internet. If it's on a network with native IPv6, then your host will probably give you connection details. If not, then use Hurricane Electric's free tunnel broker which will connect you over a regular internet connection.

* However you connect, you'll end up with an IPv6 address for your machine. It will be a string something like 2001:470:1f09:1288::2 that you can get out of ifconfig (if you have a choice, choose the one that begins with a 2, not the one that begins with an f). Put that IPv6 address into an AAAA record in DNS (better hope your DNS hosting provider does AAAA records - the good ones do...) under a new DNS name. Don't put IPv4 addresses in there too. In my example, I'm going to configure:
blog.6.hawaga.org.uk AAAA 2001:470:1f09:1288::2

* Put apache httpd on your server: apt-get install apache2

* Now you'll need a client machine with IPv6 to try connecting to your new server. If you have a Windows PC, you can probably turn on Teredo in the network configuration - it comes built in. On OS X, Linux or BSD, you can install miredo which is a Teredo client. Or you can set up another Hurricane Electric tunnel for your client machine. You can use test-ipv6.com to get a score for how well your new client machine is connected to the IPv6 internet.

* You should now be able to use a web browser to reach the hostname you configured in DNS back there - you should see apache's welcome/default page.

* Now, configure apache to forward all requests it receives onwards to your production website over IPv4. In the following example, my production IPv4 website is the one you are reading right now, benctechnicalblog.blogspot.com. Enable mod_proxy and mod_proxy_http, and then set up a virtual host directive like this (or put it in the base of your server config, seeing as this is a host dedicated to forwarding IPv6 traffic):

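<VirtualHost *:80>
  ServerName blog.6.hawaga.org.uk
  # forward every request to the production site over IPv4
  ProxyPass / http://benctechnicalblog.blogspot.com/
  # rewrite redirects sent by the production site so they point back at this proxy
  ProxyPassReverse / http://benctechnicalblog.blogspot.com/
  <Proxy http://benctechnicalblog.blogspot.com/>
    Allow from All
  </Proxy>
</VirtualHost>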

Once you've done that, visiting your IPv6 hostname (eg blog.6.hawaga.org.uk) should serve you the content of your real production website.
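Going back to the step where you pick an address out of ifconfig ("the one that begins with a 2, not the one that begins with an f"): that rule of thumb can be written down as a check against the global unicast range 2000::/3. A rough sketch in Python, where `pick_global_ipv6` is a hypothetical helper name and the input is assumed to be the address strings copied out of ifconfig, with or without the /prefix on the end:

```python
import ipaddress

# Global unicast space: everything whose first hex digit is 2 or 3
GLOBAL_UNICAST = ipaddress.ip_network("2000::/3")

def pick_global_ipv6(candidates):
    """Return the first candidate that is a global unicast IPv6 address, or None."""
    for text in candidates:
        # ifconfig shows "address/prefixlen"; drop the prefix length
        addr = ipaddress.ip_address(text.split("/")[0])
        if addr.version == 6 and addr in GLOBAL_UNICAST:
            return str(addr)
    return None
```

Given the link-local fe80:: address and the tunnel address from the example above, this picks the 2001:470:1f09:1288::2 one, which is the address that belongs in the AAAA record.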

Now publish the new hostname in a news item and make it seem like you know what you're doing...

Some stuff will not work, for sure: if you have anything that does things based on the IPv4 address of the client, that's all going to be based on the address of the proxy machine, not the real client IPv6 address. Things this might affect include localisation (eg language) and rate/load limiting based on IP address.
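If the production site is code you control, one possible workaround is the X-Forwarded-For header, which Apache's mod_proxy_http should add to requests it forwards as a reverse proxy. The sketch below is an assumption-laden illustration in Python, not anything from the setup above: `real_client_ip` and `proxy_addrs` are hypothetical names, and the header is only trusted when the connection really came from your own proxy machine, because anyone else can send whatever header they like.

```python
def real_client_ip(remote_addr, headers, proxy_addrs):
    """Best guess at the real client address for a request that may have come via the proxy.

    remote_addr: address the production server saw the connection come from
    headers:     dict of request headers
    proxy_addrs: set of addresses belonging to your own IPv6 proxy machine(s)
    """
    forwarded = headers.get("X-Forwarded-For", "")
    if remote_addr not in proxy_addrs or not forwarded:
        # direct connection, or nothing to go on: use what the socket says
        return remote_addr
    # The header is a comma-separated chain; the last entry is the one
    # appended by the proxy closest to us, i.e. our own.
    return forwarded.split(",")[-1].strip()
```

Bear in mind the result may now be an IPv6 address, so anything downstream that assumes dotted-quad IPv4 (geolocation tables, rate limiters keyed on /24s) would still need looking at.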

So, please ask questions in the comments and I'll see about answering them...

27 May, 2011

printing a message to stdout in fortran

write (*,'(''I wonder how someone kept a straight face when they invented this syntax for writing a string to the console'')')

21 May, 2011

ruby cloudwatch -> mrtg interface

I use mrtg to gather historical data on some of my servers. One of those servers lives in Amazon's Elastic Compute Cloud (EC2) and so is also monitored by Amazon CloudWatch.

Can I get cloudwatch data into mrtg?

mrtg has a fairly straightforward interface for plugging in arbitrary unix executables to collect data, so my first attempt was to use the main Java-based cloudwatch client to get data. That attempt started up one JVM for each metric collected, which massively overloaded my EC2 micro instance, keeping the load average around 3. Pretty lame.

Amazon also provides a ruby interface. I had never programmed in ruby before, but it's often interesting to learn a new language.

Here's what I ended up with.

First the config block for mrtg, which calls out to the ruby-mrtg-cloudwatch3 program that I wrote:

Target[cloudwatch_network]: `/home/mrtg/ruby-mrtg-cloudwatch/ruby-mrtg-cloudwatch3 NetworkIn NetworkOut AWS/EC2 InstanceId=i-26bcaf51`
Title[cloudwatch_network]: Network traffic according to cloudwatch
options[cloudwatch_network]: growright,absolute,logscale
MaxBytes[cloudwatch_network]: 100000000

This gives a graph of network traffic according to cloudwatch. I can compare that alongside the network traffic graph for eth0 gathered from the local interface statistics. They should roughly match up, and they do (well hopefully they still do by the time you read this - these are live images):

According to the on-host network interface:

According to cloudwatch:

Now the actual ruby code:

#!/usr/bin/ruby1.8

require 'rubygems'
require 'time'   # provides Time.parse, used below
require 'AWS'

The two cloudwatch metric names, one that measures output data, one that measures input data, are given on the command line:

metrico=ARGV[0]
metrici=ARGV[1]

My code has hardcoded access keys at the moment which is a bit shitty:

ACCESS_KEY_ID='foo'
SECRET_ACCESS_KEY='bar'

Using the above credentials, a new cloudwatch object is made, @cw.

@cw = AWS::Cloudwatch::Base.new(:access_key_id => ACCESS_KEY_ID, :secret_access_key => SECRET_ACCESS_KEY, :server => "eu-west-1.monitoring.amazonaws.com" )

Each of the two metrics will be probed with the probe function. This uses a state file based on the metric name to get only readings which have not already been seen by this script. The two metrics use separate state files because cloudwatch doesn't give an atomic read for multiple metrics at once. The state file stores the time of the last seen reading. If there is no state file, we have to invent a time. There is a subtlety here: data does not appear in cloudwatch until around 5 minutes after its time stamp, so using the current time as an initial value results in not seeing any results. Instead, I go back about 15 minutes the first time, which seems to be far enough back to get something.

def probe(metric)

  et = Time.now()

  statusfn="cloudwatch-"+ARGV[3]+"-"+metric+".status"
  if FileTest.exist?(statusfn) then

    f = File.new(statusfn, "r")
    tstring = f.gets
    ts = Time.parse(tstring)
    f.close
  else
    ts = et - 900 # needs to be more than 5 mins because otherwise we never get any data.
  end

  res = @cw.get_metric_statistics(:measure_name => metric,  :statistics => 'Average,Sum', :namespace => ARGV[2], :period => 300, :start_time => ts, :end_time => et, :dimensions=> ARGV[3])

Now we're going to look at the rows that come back. Usually only one row will come back, if we're running this at about the same rate that cloudwatch is adding readings, but sometimes there will be more, or fewer.

In the case of network traffic, I want to return the sum of all readings for this metric. In other cases, such as disk usage, I would want to return the mean. This distinction is the same as default vs gauge measurements in MRTG.

  samples = 0
  sum = 0
  avgsum = 0

  datapoints = res["GetMetricStatisticsResult"]["Datapoints"]

  lt = ts
  if datapoints.nil? then
   # nop
  else
    rows = datapoints["member"]

    rows.each { |r|
      nlt = Time.parse(r["Timestamp"])
      if(nlt < ts) then
        # nop - time was before requested start
      else
        samples += Float(r["Samples"])
        avgsum += Float(r["Average"])
        sum += Float(r["Sum"])
        nlt += 1
        if(nlt > lt) then
          lt = nlt
        end
      end
    }

Now we can write out the new state file:

    f = File.new(statusfn, "w")
    f.puts(lt)
    f.close
  end
  return sum
end
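Pulled out on its own, the state-file idea in probe (resume from the last seen reading, or go back 15 minutes when there is no state yet) might look something like the following. This is a sketch in Python rather than ruby, and `load_start_time` and `save_last_seen` are hypothetical names, not part of the script above; it also stores the time in ISO 8601 format, which is an assumption of the sketch rather than what the ruby code writes.

```python
import os
from datetime import datetime, timedelta, timezone

def load_start_time(state_path, now, fallback=timedelta(minutes=15)):
    """Return the time to query from: the last seen reading, or now minus fallback."""
    if os.path.exists(state_path):
        with open(state_path) as f:
            return datetime.fromisoformat(f.readline().strip())
    # No state yet. The fallback has to be comfortably more than the roughly
    # 5 minute delay before data shows up, or the first query returns nothing.
    return now - fallback

def save_last_seen(state_path, last_seen):
    """Record the time of the newest reading handled so far."""
    with open(state_path, "w") as f:
        f.write(last_seen.isoformat() + "\n")
```

One state file per metric, as in the ruby version, keeps the two reads independent of each other.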

and finally output the MRTG format information:

sumo=probe(metrico)
sumi=probe(metrici)

# output mrtg format
puts sumo
puts sumi
puts 0
puts "cloudwatch: "+metrico+" and "+metrici
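For reference, MRTG reads four lines from an external script: the first value (which it graphs as "in"), the second value ("out"), an uptime string and a target name. Here is a small Python sketch of that last step together with the sum-versus-mean choice mentioned earlier. The names `summarise` and `mrtg_report` are invented for this example, the rows are assumed to be dicts shaped like the cloudwatch members above, the mean is a plain unweighted mean of the per-row averages, and rounding to whole numbers is an assumption of the sketch.

```python
def summarise(rows, gauge=False):
    """Combine cloudwatch-style rows into one number for MRTG.

    gauge=False: add up the Sum of every row (traffic counters).
    gauge=True:  take the mean of the row Averages (things like disk usage).
    """
    if not rows:
        return 0
    if gauge:
        return sum(float(r["Average"]) for r in rows) / len(rows)
    return sum(float(r["Sum"]) for r in rows)

def mrtg_report(first, second, name, uptime=0):
    """Build the four lines MRTG expects from an external script."""
    return "\n".join([str(int(first)), str(int(second)), str(uptime), name])
```

With the absolute option set, as in the config block above, MRTG divides each value by the time since the previous run instead of differencing against it, which is why returning the sum of the new readings gives a sensible rate.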

The end.

14 May, 2011

UK electric sockets vs chopsticks

My father taught me this trick when I was far too young to be taught this trick - how to connect a Europlug to a UK (actually HK in this case) electrical socket.

07 May, 2011

PAM python module for out-of-band one-time tokens

I previously wrote about integrating OpenID with unix shell logins for my public shell server barwen.ch. This post talks about the out-of-band PAM token module that I wrote as part of that - where I'm generating the tokens when the user is authenticated by OpenID. But I guess it could also be of use when there's some other out-of-band mechanism (such as sending something by SMS?)

Subject to some other non-PAM web-based authentication (openid in this case, but it could be anything really), I want to issue a token value to the user in-band with respect to that other authentication (i.e. in a web page shown to the user), which is out-of-band with respect to PAM. That token should then be usable for a short period of time to make a single PAM authorisation to sshd.

That is, if you can log into the web-based authentication, you should then be able to log into the ssh system.

So on one side I need a PAM module (which I will write in Python) to check the tokens, and on the other side I need something (a command line tool) to issue the tokens. To complete the loop, I need some database (the filesystem) to store the tokens on the server side.

So here's the code. The PAM module comes complete with a documented security vulnerability which allows anyone to delete certain files on your file system. ho ho.

First the token creator:

#!/usr/bin/python

import base64
import os
import pickle
import sys
import time

VALIDTIME = 60

tokenbits = os.urandom(8)
token = base64.b64encode(tokenbits, "+-")

print token

fn = token + ".token"
fh = open(fn, 'w')

obj = (sys.argv[1], time.time() + VALIDTIME)

pickle.dump(obj, fh)
fh.close()
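The script above is Python 2. For anyone trying the same idea on Python 3, a rough equivalent could look like the sketch below. It is not a drop-in replacement: `issue_token` is a hypothetical function name, the token directory is passed in rather than being the current directory, and it writes JSON instead of a pickle, so the checking side would have to read JSON too.

```python
import base64
import json
import os
import time

def issue_token(user, directory, valid_seconds=60):
    """Create a one-time token file for user and return the token string."""
    # 8 random bytes, base64 with "-" standing in for "/" so the token is safe in a file name
    token = base64.b64encode(os.urandom(8), altchars=b"+-").decode("ascii")
    path = os.path.join(directory, token + ".token")
    # O_EXCL refuses to overwrite an existing token file; 0o600 keeps other users out
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as fh:
        json.dump({"user": user, "expires": time.time() + valid_seconds}, fh)
    return token
```

The returned token is what would be shown to the user on the web page after the OpenID login succeeds.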

and secondly the PAM module, which is based on the PAM module in my previous post:

import os
import syslog
import pickle
import time

def pam_sm_authenticate(pamh, flags, argv):
  syslog.syslog("start benc")
  if pamh.authtok is None:
    syslog.syslog("got no password in authtok - trying through conversation")
    passmsg = pamh.Message(pamh.PAM_PROMPT_ECHO_OFF, "Monkeyballs?")
    rsp = pamh.conversation(passmsg)
    # don't log the response itself: a mistyped real password would end up in syslog
    syslog.syslog("got response from conversation")
    pamh.authtok = rsp.resp
  # so we should at this point have the password either through the
  # prompt or from previous module
  syslog.syslog("got password")

  # now look for token
  # SECURITY BUG TODO: we're using this to make a path so we need to make sure
  # we're not being directed out into some other directory. We know the
  # range of characters that can be used in a token so we can reject if
  # anything other than those exists.
  # Especially as we delete the token file on use - otherwise could delete
  # arbitrary files on the system...
  tfn = "/root/" + pamh.authtok + ".token"

  if os.path.exists(tfn):
    fh = open(tfn)
    tokendata = pickle.load(fh)
    tokenuser = tokendata[0]
    tokentime = tokendata[1]
    fh.close()
    os.remove(tfn)

    # will remove the token even if it was for the wrong user
    # not sure if there's any security difference wrt leaving it there if
    # it's the wrong user?

    if tokentime < time.time():
      syslog.syslog("token time expired")
      return pamh.PAM_AUTH_ERR

    if tokenuser != pamh.user:
      syslog.syslog("token user "+tokenuser+" does not match requested user "+pamh.user)
      return pamh.PAM_AUTH_ERR
    

    return pamh.PAM_SUCCESS

  return pamh.PAM_AUTH_ERR

def pam_sm_setcred(pamh, flags, argv):
  return pamh.PAM_SUCCESS
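The SECURITY BUG TODO in the module suggests its own fix: tokens only ever contain a known set of characters, so anything else can be rejected before it gets near a file path. A minimal sketch of that check, written for Python 3 (the module above is Python 2, where re.match with a pattern ending in \Z would do the same job as fullmatch). `TOKEN_RE` and `token_path` are hypothetical names, and the pattern assumes tokens made exactly as in the creator script: 8 random bytes, base64 with "+-" as the alternative characters, which comes out as 11 symbols plus one "=" of padding.

```python
import os
import re

# 11 base64 symbols (with "-" in place of "/") followed by a single "=" pad
TOKEN_RE = re.compile(r"[A-Za-z0-9+-]{11}=")

def token_path(authtok, directory="/root"):
    """Return the token file path for authtok, or None if it isn't a well-formed token."""
    if not isinstance(authtok, str) or TOKEN_RE.fullmatch(authtok) is None:
        # contains "/", "..", or anything else a real token never has
        return None
    return os.path.join(directory, authtok + ".token")
```

In pam_sm_authenticate the idea would be to build tfn with something like this and return PAM_AUTH_ERR when it comes back as None, so a "password" like ../../etc/something never reaches os.remove.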
