
20 May, 2018

temporary merge tool

I've been an enthusiastic user of stgit for some time. This lets you work on a series of commits at the same time, so that you can edit them to look good before sending them out into the world (i.e. pushing upstream). You can move up and down a series of git commits making changes to them, rather than adding new commits onto the end of a branch.

One use I often make of this is preparing a bunch of orthogonal patches: patches which don't interact with each other or need to be applied in a strict order, but that I want all applied at once while I'm making day to day use of my software, so that I can test the changes in real use.

It's pretty awkward (i.e. impossible) to do this collaboratively though: all of git's support for collaborative work is by making new commits onto the end of branches, not editing earlier commits.

So I have been trying a new workflow: lots of feature branches at once, instead of lots of stg patches; with a script I wrote, tmt, which makes your checkout look like the merge of all of those feature branches, but lets you commit your changes onto one specific feature branch.

Here's the repo, with a README. Yes, it's Haskell. Of course. https://github.com/benclifford/tmt

16 February, 2018

Build a Crap Web Form in Haskell in 28 days.

I've been writing an informal series of posts about a small scout camp registration system that I've been building:

Build a Crap Web Form in Haskell in 28 days

21 January, 2018

A string of DNS protocol bugs.

I went to turn on DNSSEC for cqx.ltd.uk today - the server that signed it broke right before my Christmas busy period, so I disabled DNSSEC on that zone until I got round to fixing it.

I've encountered three different apparent protocol implementation bugs in the space of a few hours:

  • Andrews and Arnold's web based control panel accepts DS records as generated by BIND's dnssec-keygen tool but then throws a complicated looking error when talking to Nominet, the UK domain registry, to put those records where they need to be. As far as I can tell, this is because the BIND output has whitespace in the middle of a hex string, something RFC 4034 s5.3 seems to think is acceptable. Why is installing crypto keys always so hard?
  • For a while, Hetzner's recursive resolvers were unable to verify (and therefore refused to answer) results for my zone. I have a suspicion (but I don't have much to go on other than a hunch) that this was something to do with DS records and the actual zone having some kind of mismatch - although Google Public DNS at 8.8.8.8, and Verisign's DNSSEC checker both worked ok.
  • I discovered an implementation quirk in the Haskell dns library, which I use inside a debugging tool I'm slowly building. This is to do with the mechanism which DNS uses to compress replies: where a domain name would be repeated in a response, it can be replaced by a pointer to another occurrence of that name in the reply (sketched after this list). It looks like the dns library will only accept those pointers if they point to regions of the reply that have already been parsed by the domain name parsing code, rather than to arbitrary bytes in the reply. This is frustratingly similar to another bug I encountered (at Campus London), where their (not-so) transparent firewall was reordering DNS replies, giving a bug that only manifested when I was sitting in their cafe. (github issue #103)
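
For the curious, here is a minimal sketch (mine, not the dns library's code) of the pointer mechanism: a length byte with the top two bits set carries a 14-bit offset from the start of the message, and a decoder has to be prepared to jump to an arbitrary earlier byte (real code also needs a guard against pointer loops, omitted here):

import qualified Data.ByteString as BS
import Data.Bits ((.&.), shiftL)

-- Decode the labels of a (possibly compressed) domain name starting
-- at byte offset `off` of the whole reply `msg`.
parseName :: BS.ByteString -> Int -> [BS.ByteString]
parseName msg off
  | len == 0    = []                                 -- root: end of name
  | len >= 0xC0 = parseName msg ptr                  -- compression pointer
  | otherwise   = BS.take len (BS.drop (off + 1) msg)
                    : parseName msg (off + 1 + len)  -- ordinary label
  where
    len = fromIntegral (BS.index msg off) :: Int
    ptr = ((len .&. 0x3F) `shiftL` 8) + fromIntegral (BS.index msg (off + 1))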

01 September, 2017

Pattern matching in Idris `do` notation has surprising reliance on type checking the action.

Idris is syntactically quite Haskell-like, and especially it has do notation for sequencing "actions".

Like (traditional) Haskell, do blocks are desugared to a sequence of >>= (bind) operators. But, unlike (traditional) Haskell, that >>= is not always the monadic >>= : m a -> (a -> m b) -> m b. (This can also happen in Haskell using the RebindableSyntax extension - see the sketch below.)
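
For comparison, here is a minimal sketch of the Haskell version of this behaviour under RebindableSyntax (my example, not from the Idris docs): the do block desugars to whatever >>= and return are in scope, here hand-rolled list operations rather than the Monad class methods.

{-# LANGUAGE RebindableSyntax #-}

import Prelude hiding (return, (>>=))

-- do notation below uses these definitions, not Monad's methods.
(>>=) :: [a] -> (a -> [b]) -> [b]
xs >>= f = concatMap f xs

return :: a -> [a]
return x = [x]

pairs :: [(Int, Char)]
pairs = do
  x <- [1, 2]
  y <- "ab"
  return (x, y)

main :: IO ()
main = print pairs  -- [(1,'a'),(1,'b'),(2,'a'),(2,'b')]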

In Idris, you can use different "better-than-monad" types to statically (at compile time) reason about a computation beyond using the monad laws. For example, an effect system might track which effects are in scope.

Total parsing

In the case of Text.Parser (in the Idris contrib/ package), the type signature of actions (Grammar _ _ _) indicates whether a parser consumes any characters, so that the compiler can tell if a parser might loop forever. (see http://www.cse.chalmers.se/~nad/publications/danielsson-parser-combinators.html)

I was trying to write a new JSON parser for idris-todaybot using Text.Parser. Previously JSON was parsed using lightyear, but Text.Parser has more gratuitous dependent types so was an obvious way to proceed.

A problem.

I ran into a surprising compile error which initially made no sense to me at all.

This code compiles:

objectValuePair : Grammar Char True (List Char, ())
-- definition elided

jsonObject : Grammar Char True (List Char, ())
jsonObject = do
  llll <- objectValuePair
  pure llll

where llll is a tuple; but the following version of jsonObject, which deconstructs that tuple and reassembles it, does not compile:

jsonObject : Grammar Char True (List Char, ())
jsonObject = do
  (k,v) <- objectValuePair
  pure (k,v)

It gives this error:

When checking right hand side of Main.case block
in jsonObject at bug-bind.idr:55:14 with expected type
        Grammar Char c2 (List Char, ())

Type mismatch between
        Grammar Char False (List Char, ()) (Type of pure (k, v))
and
        Grammar Char c2 (List Char, ()) (Expected type)

Specifically:
        Type mismatch between
                False
        and
                c2

Another attempt to deconstruct llll also fails:

jsonObject : Grammar Char True (List Char, ())
jsonObject = do
  llll <- objectValuePair
  let (k,v) = llll
  pure (k,v)

but the following deconstruction by function application rather than pattern matching succeeds:

jsonObject : Grammar Char True (List Char, ())
jsonObject = do
  llll <- objectValuePair
  let k = fst llll
  let v = snd llll
  pure (k,v)

That type error

Let's dig into that type error:

Type mismatch between
        Grammar Char False (List Char, ()) (Type of pure (k, v))
and
        Grammar Char c2 (List Char, ()) (Expected type)

Grammar _ _ _ is the type of parser actions, where the first parameter (Char) is the type of symbols we're consuming, and the final parameter ((List Char, ())) is the type that the parser returns on success. The middle parameter (False or c2 above) represents whether the parser definitely consumes input (True) or might succeed without consuming anything (False - for example, a parser which removes whitespace, or pure, which never even looks at the input stream).

This "consumes" parameter contains the main novelty in Text.Parser beyond monadic parser combinators: Text.Parser combinators manipulate and use this value at compile time to help check that parsers really will consume things: for example, a parser that definitely consumes followed by a parser that might not, results in a parser that definitely consumes; while sequencing two parsers that might not consume results in a parser that might not consume. (See: the source)

So what on earth has this parameter, manipulated by >>=, got to do with pattern matching pure values after they've already been returned by an action?

Desugaring

It turns out we can forget that our troublesome tuple is being returned from an action; let (a,b) = (1,2) breaks in the same way when run inside a Text.Parser do block.

Let's (very roughly) desugar some of the examples above, and then look at the types involved:

jsonObject : Grammar Char True (List Char, ())
jsonObject = do
  llll <- objectValuePair
  pure llll

-- becomes:
jsonObject = objectValuePair >>= (\v => pure v)

jsonObject = do
  (k,v) <- objectValuePair
  pure (k,v)
-- becomes:
jsonObject = objectValuePair >>= (\(k,v) => pure (k,v))
-- becomes:
jsonObject = objectValuePair >>= (\llll => case llll of
  (k,v) => pure (k,v)
 )

So in the second fragment, there's an extra case expression in there to deconstruct llll using pattern matching.

Apparently that gets in the way of type inference/checking:

  • On the one hand, that pure has type Grammar Char False (List Char, ()) - False because it may (actually, will always) succeed without consuming input.
  • On the other hand, >>= doesn't care whether the right hand side consumes or not - it will take either, as shown by the compile-time variable c2 appearing in the error message.
In between, Idris doesn't manage to unify c2 with False.

With further pinning of types using `the`, an uglier form of pattern matching does work:


export
jsonObject : Grammar Char True (List Char, ())
jsonObject = do
  llll <- objectValuePair
  the (Grammar Char False (List Char, ())) $ do
    let (k,v) = llll
    pure (k, v)

Ugh

Thanks

Thanks to Melvar on #idris for explaining this.

25 November, 2016

smuggling things in a dirty bottom

The Haskell unit type, (), has just one value, also written (), right?


smuggle :: Typeable t => t -> ()
discover :: Typeable t => () -> Maybe t

x :: ()
x = smuggle "hello world"

> discover x :: Maybe String
Just "hello world"

These allow you to inject an arbitrary (Typeable) Haskell value into unit and retrieve it later. Just don't try to inspect the resulting () value.

Rather than Haskell 98, you'll need unsafePerformIO and extensible exceptions, put together in a way that lets you hide arbitrary stuff in a thunk, and force evaluation at just the right time.


import Control.Exception (throw, try)
import Data.Dynamic (Dynamic, toDyn, fromDynamic)
import Data.Typeable (Typeable)
import System.IO.Unsafe (unsafePerformIO)

-- hide the value in an exception inside an unevaluated thunk...
smuggle :: Typeable t => t -> ()
smuggle v = unsafePerformIO $ throw (toDyn v)

-- ...then force the thunk and catch what falls out
discover :: Typeable t => () -> Maybe t
discover v = either fromDynamic (const Nothing)
  $ unsafePerformIO
  $ try
  $ case v of () -> return ()

I could write more. But it's Friday night and I want to drink my wine.

Edit: This is now available on hackage as acme-smuggler.

08 February, 2016

Porting todaybot to use extensible-effects

I wrote todaybot (blog, github) a while ago, and it is chugging along just fine, with the occasional bugfix (e.g. due to people writing Swedish).

I've been meaning to play with the Haskell extensible-effects package, so on a lazy Sunday afternoon I started hacking away porting todaybot. My goal was more to get a hands-on feel for extensible-effects rather than actually replace the existing todaybot implementation.

The existing non-effect-based code is pretty rough-and-ready/script-like. It all runs in the IO monad, and state is threaded round manually in a couple of places.

I started by replacing all the IO actions with calls to the (Lift IO) effect. The problems I encountered here were:

  • type checking problems, with initially impenetrable type errors (reminiscent of what happens when you work with Lens). The constraint-based style of effect types gives verbose type errors in a form that I was not used to.
  • Some interaction that I haven't understood between type signatures/type inference on effects and on lens types and parsec types(!). I needed to add type signatures on top level lens and parser definitions, which I got using typed holes.
  • Handling IO exceptions - I was unsure how exceptions would work so for this first cut I ignored exception handling and let the whole bot die if anything goes wrong.

So now I had a bot with a single effect, Lift IO, with worse error handling than before. I wanted to get this exception handling back in so I wasn't losing functionality. I didn't (and still don't) know of the idiomatic way to handle IO exceptions in (Lift IO).

extensible-effects has exceptions (Exc) already, and I wanted IO exceptions to be handled using those, with the Exc IOError effect. I made a wrapper for lift, called lift', which calls an IO action and translates IO errors into Exc IOError exceptions (sketched below). IO errors could then be handled straightforwardly. Later on it turned out this wasn't enough: code was throwing errors other than IOError which needed to be caught (although maybe this is also a bug in the mainline todaybot).
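
Roughly, the wrapper looked like this (a from-memory sketch rather than the real code: the exact extensible-effects names and constraints - Member, SetMember, throwError - vary between versions of the library, so treat the details as assumptions):

import Control.Eff
import Control.Eff.Exception
import Control.Eff.Lift
import Control.Exception (try)

-- Run an IO action under the Lift IO effect, rethrowing any IOError
-- it raises as an Exc IOError effect.
lift' :: (Member (Exc IOError) r, SetMember Lift (Lift IO) r)
      => IO a -> Eff r a
lift' act = do
  r <- lift (try act)
  case r of
    Left e  -> throwError (e :: IOError)
    Right v -> return v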

Next, I wanted to start breaking down the use of the IO effect into more focused effects. The one people talk about a lot is logging. There's a Writer effect in extensible-effects already, and logging by writing a string seemed pretty obvious. There's also a Trace effect which is pretty similar. Neither had effect handlers that did what I wanted: translate the logging effect into IO effects (which in turn would be handled by the IO effect handler). This was pretty straightforward to write, though. There was some messing around with type signatures, but I got it worked out in the end.

And the last bit I did before I went to bed was put in a couple of Reader effects: one for configuration (which comes from a YAML file), and one for the authentication bearer token, which is requested at runtime. Writing the handlers for these had a similar feel to writing the handler for logging - a few lines of code inserted into boilerplate, messing around with type errors, and raising more questions that I might get round to writing about.

Next I would like to implement effects to handle the last two pieces of IO, which are access to the current time, and HTTP GET/POST calls; and see if I can use a choice effect instead of mapM_ to iterate.

The (very messy) code is on the exteff branch on github - at the time of writing, up to commit 7ccc0a92....

06 July, 2015

A Haskell reddit bot.

I am one of many many moderators on reddit's r/LondonSocialClub. This is a place for organising social gatherings in London.

Post titles usually take the form [DD/MM/YY] Event @ Place. Other moderators have fiddled with the CSS for this subreddit to give us a big red TODAY sticker next to today's events, and grey out events that are in the past. This uses reddit's flair mechanism, which allows assigning of labels to posts, and CSS styling based on a post's flair.

Unfortunately, this was not entirely automated - some sucker or other had to go in each day and adjust flair on the relevant posts to match up with reality. This bothered me as being a manual process that should be fairly easily automated. Eventually it bothered me enough that I wrote a bot, lsc-todaybot, to do it. Now the moderation logs make it look like I come home from the pub every day and move everything around before going to sleep.

Another motivation for writing this bot was it seemed small enough in scope that it would be achievable, but give me a chance to learn a few new APIs: several new Haskell libraries, and the reddit REST API.

HTTP: I've previously used HTTP when hacking at cabal. This doesn't do HTTPS (I think) and the maintainer told me not to use it. So I tried wreq. It was easy enough to get going and there was a tutorial for me to rip off.

Configuration: I used yaml to parse a YAML configuration file.

Lenses: I still haven't got a good grasp on what is happening with lenses but I used them in a few places, and it has developed my understanding a little bit: lsc-todaybot extracts fields from reddit's JSON responses using aeson-lens. yaml exposes the parsed configuration file as JSON, so the same lenses can be used for extracting configuration details. wreq also uses lenses for setting HTTP header values and the like.
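
For flavour, here's roughly what that JSON field extraction looks like (a sketch in the lens-aeson style; lsc-todaybot used aeson-lens, whose combinator names differ a little, so treat this as illustrative):

{-# LANGUAGE OverloadedStrings #-}

import Control.Lens ((^..))
import Data.Aeson.Lens (key, values, _String)
import Data.Text (Text)
import Network.Wreq (get, responseBody)

-- Pull the post titles out of a subreddit listing.
postTitles :: IO [Text]
postTitles = do
  r <- get "https://www.reddit.com/r/LondonSocialClub/new.json"
  return $ r ^.. responseBody
             . key "data" . key "children" . values
             . key "data" . key "title" . _String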

Strings: I seem to have ended up using several different string types, which is icky - ByteString, Text and String at least. I've made the source code a bit more generic by using the monoid <> operator to concatenate them, which makes things a bit less horrible looking.

--

28 May, 2015

10 minute Haskell talk: An awkward interaction between lazy ByteStrings and a misbehaving (non-)transparent HTTP middlebox

The slides for a lightning talk I gave at the London Haskell User Group are here. Press 'a' in the browser and you'll get some explanatory notes with the slides; otherwise they're a bit sparse.

08 July, 2014

balancing tests between 4 workers by external observation

We have a bunch of functional tests that are run by our buildbot every time someone makes a git push.

These tests were taking a long time. Obvious answer: run them in parallel.

This brought up some problems, the biggest of which was how to split the tests between the four test runners. Naively chopping the test list into four pieces turned out to be pretty imbalanced: one test runner was taking 4 minutes, another was taking 14 minutes.

Initially this sounded like something a task farm would be good for. But I didn't want to get into digging round in the groovy test running code, and making the test runners talk back to the buildbot to collect tasks.

So I took a looser approach with a simpler interface: a balance program picks some tests for each of the four test runners, runs the tests, times the whole run for each runner, and then iteratively updates its knowledge so that it will hopefully pick a more balanced distribution next time round.

I had a quick play with genetic algorithms but that didn't seem to be going anywhere. Next I implemented this model:

Assume there is a startup cost k, and for each test i, a time t_i that the test takes to run. These cannot directly be measured by the balance program.

Keep an estimate of k and t_i in a state file.

Make a distribution of tests over the four runners based on the estimates.

When each runner finishes, if it took longer than the estimate, nudge up the scores on k and the tests that were on that runner; similarly nudge down if the run time was less.

Run this lots of times.
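
In code, the update step might look something like this (an illustrative sketch, not the actual weightbal source - the learning rate and default estimates are made up):

import Data.Map (Map)
import qualified Data.Map as Map
import Data.Maybe (fromMaybe)

-- Estimated startup cost k, plus a time estimate per test.
data Estimates = Estimates { startupCost :: Double
                           , testTimes   :: Map String Double }

-- After a runner finishes: compare observed wall-clock time against
-- the prediction, then nudge k and each test that ran on that runner
-- a small step in the direction of the error.
nudge :: Double -> [String] -> Double -> Estimates -> Estimates
nudge rate tests observed (Estimates k ts) = Estimates (bump k) ts'
  where
    predicted = k + sum [fromMaybe 1 (Map.lookup t ts) | t <- tests]
    step      = rate * (observed - predicted) / fromIntegral (1 + length tests)
    bump v    = max 0 (v + step)
    ts'       = foldr (Map.alter (Just . bump . fromMaybe 1)) ts tests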

After a while this rearranges the tests so that each runner takes about 10 minutes (compared to the 4 to 14 minutes with a naive distribution).

So we've saved a few minutes on the tests and are hopefully in a position where as we get more tests we can scale up the number of runners and still preserve reasonable balance.

This also copes with converging to a new balance when tests are added/removed; or when test time changes (either due to the test itself changing, or the behaviour being tested)

(The other problem was that it turned out loads of our tests were secretly dependent on each other and failed when run in a different order - this would occasionally cause problems with the naive distribution, but was much more of a problem with the reordering that happens with this balancing approach.)

Source code is https://github.com/benclifford/weightbal

26 February, 2014

Haskell numbers

I just put up the (still being worked on) slides for a talk I'm giving in a few hours at the London Haskell User Group, on the subject of "Numbers".

14 October, 2013

list monad character to html entity encoding

This isn't particularly fancy, but it struck me as interesting from the "trying to abstract more" perspective.

Given a string with some funny characters in it, change the characters into HTML entities.

Here's a puffing-out function that, given a character c :: Char, returns a string (a list of characters, [Char]) for it to be replaced with. In the default case, the returned list contains only the original character.

import Data.Char (ord)

generalPuffer :: Char -> String
generalPuffer c = case c of
  c | ord c == 163 -> "&pound;"
  c | ord c == 174 -> "&reg;"
  c | ord c == 153 -> "&trade;"
  c | ord c == 188 -> "&frac14;"
  c | ord c > 128 -> error $ "unknown high character code " ++ (show $ ord c) ++ " encountered."
  c -> [c]

So how to use this? concatMap will handle it, and that was my first implementation (after fusing map and concat in my head):

concatMap generalPuffer "£1 Shop™"

But concatMap is just monadic bind with its arguments flipped, restricted to the list monad...

So equivalently you could write:

"£1 Shop™" >>= generalPuffer
or
generalPuffer =<< "£1 Shop™"
or
do { l <- "£1 Shop™" ; generalPuffer l }
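
For what it's worth, a quick check in ghci (leaving the trademark sign out, because in a Unicode-aware GHC, ord '™' is 8482 rather than the Windows-1252-style 153 that the table above assumes):

> concatMap generalPuffer "£1 Shop"
"&pound;1 Shop"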

24 January, 2013

pronunciation of Haskell applicative operators.

I gave a talk on parsers in Haskell yesterday, and asked the audience how they thought I should pronounce some operators.

They came up with:

<* left looking sparrow
*> right looking sparrow
<*> goatse

If you don't know what goatse is, you don't want to know. Really. But it made me laugh.

20 November, 2012

columns

(Sorry, I'm making up the code in this post rather than actually distilling down the real implementations - it probably doesn't run, but you get the idea)

A few times recently I've wanted to output column-like data from haskell: HTML tables in two cases, and CSV in another.

In both kinds of case, I wanted column headings (<th> tags in the HTML case, and a heading line in the CSV case).

Previously I've written code that looks roughly like:

mapM_ putHeading ["heading1","heading2","heading3"]
forM_ rows $ \(entry1, entry2, entry3) -> do
  putCell entry1
  putCell entry2
  putCell entry3

The annoyance here was that nothing ties together the headings and the data values: although in the case of two or three columns, it is relatively simple to see the correspondence, it was getting hard in some wider cases.

Vaguely inspired by lenses, I rewrote some of this code to look like this:

cols = [("heading 1", \(entry1, _, _) -> entry1),
        ("heading 2", \(_, entry2, _) -> entry2),
        ("heading 3", \(_, _, entry3) -> entry3)
       ]
mapM_ putHeading (map fst cols)
forM_ rows $ \row -> forM_ cols $ \col -> putCell ((snd col) row)

What this does is package up the column heading and the code to generate (for each row) the appropriate content. This makes it easier (I hope) to keep the headings and the data aligned. Also, all the boilerplate that you don't see here (putHeading and putCell disguise it) can be shared, with only a new cols defined for each different table.
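
To make that concrete, here's a minimal runnable version of the same idea (invented column names and rows, emitting CSV rather than HTML):

import Control.Monad (forM_)
import Data.List (intercalate)

type Row = (String, String, String)

-- Each column packages a heading together with the code that
-- extracts that column's value from a row.
cols :: [(String, Row -> String)]
cols = [ ("name", \(n, _, _) -> n)
       , ("role", \(_, r, _) -> r)
       , ("city", \(_, _, c) -> c)
       ]

main :: IO ()
main = do
  let rows = [("Alice", "dev", "London"), ("Bob", "ops", "Leeds")]
  putStrLn $ intercalate "," (map fst cols)
  forM_ rows $ \row ->
    putStrLn $ intercalate "," (map (\(_, render) -> render row) cols)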

13 November, 2012

functor

One of the first cool things you encounter in functional programming is map.

Say you have a function length :: String -> Int which gives the length of a string:

> length "hello"
5
(in real Haskell, that's not actually the type of length, but it's close enough for now)

Now you can apply that to a list of strings, like this:

> map length ["hello","goodbye"]
[5,7]

For most of the time I've known about map, I've thought of this as meaning "apply the function length to each element of the list ["hello", "goodbye"]".

But there's a slightly different interpretation, one that feels a bit more "functional", that I've come across recently.

Consider applying map to only its first argument (you can do that in Haskell...):

map length

What's the type of this expression? It is [String] -> [Int]. So what it's done is convert a function from strings to ints into a new function from lists-of-strings to lists-of-ints.

And now that we have a function that converts a list of strings to a list of ints, we can apply it to a list of strings:

> (map length) ["hello","goodbye"]
[5,7]

So the different reading that I see now is "lift this function to work on lists" first, followed by application to a list.

The same new intuition applies to functors in general and fmap, and it's from thinking more about category theory that this view of things starts to appeal to me.
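
The same lifting reading, spelled out for a couple of other functors (a trivial sketch):

-- fmap lifts length :: String -> Int to work inside any Functor:
lengths :: Maybe String -> Maybe Int
lengths = fmap length

-- > lengths (Just "hello")
-- Just 5
-- > fmap length (Right "goodbye" :: Either () String)
-- Right 7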

25 October, 2012

london HUG

well I just got back from the 1st meeting of v2 of the london haskell users group (apparently it used to exist before; and the ghosts of its former incarnation floated around the room in the form of code kata people)
dude (derek) gave a talk on why do monads matter? - a brave thing to do, given how many have tried their own take on a monad tutorial (myself included). nothing spectacular but certainly another take on monads, and it did tickle my brain in the right areas enough into realising that <$> only needs a Functor so it certainly paid off in the OH! sense; even though that leap was personal to me and wouldn't be apparent if you were at the talk - there was no mention of functors at all, really
Turnout was better than the average dutchhug turnout (sorry Shaun)
It also turns out theres a regular Haskell coding dojo in London, hoodlums, already happening (apparently a spinoff or somehow related to v1 of the london hug)
Went to pub afterwards. room booked (or at least some upstairs space that was otherwise empty). chocolate orange beer, which was less disgusting than it sounds. it was cool to meet a bunch of people using haskell for $ (although I count myself in their ranks these days).
after rapidly throwing down a few of those chocolate orange beers (hence the incoherency and lack of case), I shouted out suggestions for future talks on: agda; quickcheck; parsec; and functors/monads/arrows/applicative (turned out some fucker already had a talk on that...)
next meeting 28th nov 2012. i'll probably be there.
ps also at the pub I met another programmer also called Ben - I asked him if he's going to BenConf but although he'd heard of it, it hadn't suckered him in.

08 October, 2012

yield zipper

Oleg wrote about converting an arbitrary traversable into a zipper.

His code uses delimited continuations, and I puzzled a while (years...) before starting to understand what was going on.

I just read Yield: mainstream delimited continuations.

It looked to me like Oleg's zipper could easily be expressed using "yield", which gives a different view that I think I might have understood more easily - because I know yield from other languages, and don't properly have my head around continuations (which is basically the point of the "Yield" paper, I think).

So then, my altered version of the zipper on Oleg's page, using yield:

>  import Data.Traversable as T


>  type Zipper t a = Iterator (Maybe a) a (t a)

>  make_zipper :: T.Traversable t => t a -> Zipper t a
>  make_zipper t = run $ T.mapM f t
>   where
>   f a = do
>     r <- yield a
>     return $ maybe a id r

This is run and yield pretty much as defined on page 10 of the yield paper:

>  data Iterator i o r = Result r | Susp o (i -> Iterator i o r)
>  yield x = shift (\k -> return $ Susp x k)
>  run x = reset $ x >>= return . Result

and some test code:

>  sample = [1,2,3]

>  main = do
>    let (Susp a1 k1) = make_zipper sample
>    print a1
>    let (Susp a2 k2) = k1 Nothing
>    print a2
>    let (Susp a3 k3) = k2 $ Just 100
>    print a3
>    let (Result end) = k3 Nothing
>    print end

and below, to make this posting properly executable, here's Oleg's library code for shift/reset:

> -- The Cont monad for delimited continuations, implemented here to avoid
> -- importing conflicting monad transformer libraries

>  newtype Cont r a = Cont{runCont :: (a -> r) -> r}


>  -- Functor and Applicative instances included so this compiles on
>  -- GHC 7.10 and later, which require them alongside Monad:

>  instance Functor (Cont r) where
>     fmap f m = Cont $ \k -> runCont m (k . f)

>  instance Applicative (Cont r) where
>     pure x    = Cont $ \k -> k x
>     mf <*> mv = Cont $ \k -> runCont mf (\f -> runCont mv (k . f))

>  instance Monad (Cont r) where
>     return = pure
>     m >>= f  = Cont $ \k -> runCont m (\v -> runCont (f v) k)

>  reset :: Cont r r -> r
>  reset m = runCont m id

>  shift :: ((a -> r) -> Cont r r) -> Cont r a
>  shift e = Cont (\k -> reset (e k))

Update 1: Changed types from Oleg's Z | ZDone to the yield paper's Susp | Result

23 July, 2012

autogenerating reverse DNS for ipv6

I was getting annoyed by manually configuring an IPv6 reverse domain.

For reverse DNS, you need to break the IP address up into pieces (bytes for IPv4, nibbles for IPv6), reverse them, and put dots between the pieces, to get a domain name. Then at that domain name, you put a reference to the hostname for that IP.

So an IP address like 2001:8b0:7c:1:216:76ff:fe16:755a turns into a domain name a.5.5.7.6.1.e.f.f.f.6.7.6.1.2.0.1.0.0.0.c.7.0.0.0.b.8.0.1.0.0.2.ip6.arpa., and there you can find a PTR record pointing to the hostname dildano.hawaga.org.uk
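
The mechanical part is easy enough (a sketch: it takes the address as 32 already-expanded hex nibbles, matching the --prefix format below, so handling the :: abbreviation is left out):

import Data.List (intercalate)

-- Reverse the nibbles and dot-join them under ip6.arpa.
reverseName :: String -> String
reverseName nibbles =
  intercalate "." (map (: []) (reverse nibbles)) ++ ".ip6.arpa."

-- > reverseName "200108b0007c0001021676fffe16755a"
-- "a.5.5.7.6.1.e.f.f.f.6.7.6.1.2.0.1.0.0.0.c.7.0.0.0.b.8.0.1.0.0.2.ip6.arpa."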

Forming those long domain names was/is quite awkward, and it's a task well suited to automation. All of the hosts already have forward DNS entries, so there's not much additional information needed to generate the reverse zone.

I wrote a tool (in an unholy alliance of Haskell and dig) which queries a bunch of forward zones and outputs the appropriate reverse DNS records ready for pasting into a zone file.

You specify zones (and appropriate servers) that will be asked for AAAA records; then all of the AAAA records which refer to IPv6 addresses on the specified network will be converted into PTR records and sent to stdout, ready to paste into a zone file.

$ dnsrz hawaga.org.uk@dildano.hawaga.org.uk clifford.ac@malander.clifford.ac charlottevrinten.org@dildano.hawaga.org.uk mrsclifford.eu@malander.clifford.ac --prefix=200108b0007c0001
 
3.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0 PTR clifford.ac.
3.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0 PTR malander.clifford.ac.
3.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0 PTR malander.mrsclifford.eu.
4.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0 PTR fecolith.clifford.ac.
4.1.2.0.0.f.e.f.f.f.3.9.d.0.2.0 PTR pomade.clifford.ac.
a.5.5.7.6.1.e.f.f.f.6.7.6.1.2.0 PTR dildano.hawaga.org.uk.
c.0.2.a.4.c.e.f.f.f.3.6.b.1.2.0 PTR newsnowdrop.mrsclifford.eu.
0.a.0.c.b.a.e.f.f.f.3.6.1.2.2.0 PTR tenesmus.clifford.ac.
7.2.f.0.1.9.e.f.f.f.b.4.5.2.2.0 PTR coprolith.clifford.ac.
c.2.5.d.b.f.e.f.f.f.b.e.7.2.a.b PTR pygar.hawaga.org.uk.
c.2.5.d.b.f.e.f.f.f.b.e.7.2.a.b PTR pygar-6.hawaga.org.uk.
b.6.5.8.2.f.e.f.f.f.8.c.c.b.a.c PTR laptop.hawaga.org.uk.

I wanted to use the Haskell dns package, which I've used a bit before; but it didn't have enough features - no zone transfer capability, for a start... so I invoke dig and parse its output.

The command line syntax is <zonename>@<DNS server>, where zonename is a forward zone and the specified server will answer AXFRs for that zone. That's quite icky, but it gets around needing a full Haskell DNS implementation.

The code is on github under benclifford/dnsrz.

(later: as fits my tradition of writing a tool and then finding someone has done something similar first, bind comes with a tool, arpaname, which will convert an IP address into a reverse name. It doesn't do all the other stuff above, but does work for ipv4 too: http://ftp.isc.org/isc/bind9/cur/9.9/doc/arm/man.arpaname.html)

12 June, 2012

fcgi, haskell, cpanel, php, drupal

I played with fastcgi, which is like CGI but doesn't have to spawn a new process each time.

The initial motivation for this was a server which has a bunch of drupal websites. It was previously running in plain CGI mode, which forks a PHP process for every page request (about 15 spawns per second on this server), with each site's PHP running under a different user account. (The other mode we've tried is mod_php, which runs things much faster but doesn't provide as much isolation between web sites: everything runs as the www-data unix user, rather than as a per-site user.)

I thought I'd have to do more compiling, but it turns out fastcgi support for both apache and for PHP was already available. On my dev server I needed to apt-get the fastcgi apache module; on the production server, which uses cpanel, fastcgi support was already installed and switching it on was a single mouse click.

Here's a plot of the server CPU load before and after the switch:

There's a clearly visible daily cycle, using up almost 8 cores worth of CPU before the change. At the end of the 30th, I switched on fastcgi, and woo, the load drops right down and stays down. That's just what I wanted.

Reading more, cpanel recommends against using fastcgi, and recommends something else - ruid2 - which looks like it does something similar but different. That recommendation seems to be mostly because fastcgi has a lot of tweakables that are hard to get right. See this thread.

caveats

I discovered a few interesting things during deployment:

Firstly, a potential attack on directories that have the ExecCGI option enabled - this is discussed in the context of the nginx web server here.

Another was a bug with a specific version of mod_fcgid and the specific configuration I set up, which resulted in a new PHP process being spawned for every page request, and then staying resident (!). Other people have experienced this and it was straightforward to tweak it so that it didn't happen.

haskell

I have a few apps for my own use written in Haskell, and one (a photo ranking app) struggles when called through the regular CGI interface, due to loading the vote/photo database each time. I've considered putting that into snap, a haskell framework, but it seemed interesting to see if I could get fastcgi running under Haskell.

apt-get install libfcgi-dev; cabal install fcgi got me the modules installed. I had some trouble running the hello-world app here; that came down to me not compiling with the -threaded option.
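
For reference, the shape of a hello-world FastCGI responder in Haskell looks something like this (assuming the fcgi package's Network.FastCGI module; remember to build it with ghc -threaded):

import Network.FastCGI

-- Answer every request with a plain-text greeting.
main :: IO ()
main = runFastCGI $ do
  setHeader "Content-Type" "text/plain"
  output "hello from haskell fastcgi\n"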

(I also tried the haskell direct-fastcgi module, but the home page for it is gone and there is no example code, so I rapidly gave up)

barwen.ch

I made an fcgi-bin directory available to all barwen.ch users, running FastCGI code under user accounts. There isn't much CGI going on on barwen.ch, but it seemed easy enough to deploy and make available, and is one more feature for the feature list.

10 April, 2012

commandline RSS->text tool using Haskell arrows

I wanted barwen.ch to display news updates at login. I already have an RSS feed from the drupal installation on the main page; and that RSS feed is already gatewayed into the IRC channel. So that seemed an obvious place to get news updates.

I wrote a tool, rsstty, to output the headlines to stdout. Then I wired it into the existing update-motd installation to fire every time someone logs in.

So you can say:

$ rsstty http://s0.barwen.ch/rss.xml
 * ZNC hosting(Thu, 01 Mar 2012 10:09:15 +0000)
 * finger server with cgi-like functionaity(Wed, 22 Feb 2012 18:43:08 +0000)
 * Welcome, people who are reading the login MOTD(Fri, 17 Feb 2012 23:56:44 +0000)
 * resized and rebooted(Wed, 25 Jan 2012 12:23:39 +0000)
 * One time passwords (HOTP/TOTP)(Wed, 18 Jan 2012 11:33:45 +0000)

I wrote the code in Haskell, using the arrow-xml package.

arrow-xml is a library for munging XML data. Programming using it is vaguely reminiscent of XSLT, but it is embedded inside Haskell, so you get to use Haskell syntax and Haskell libraries.

The interesting arrow bit of the code is this. Arrow syntax is kinda awkward to get used to: it's sufficiently different from regular syntax and monad syntax that, even if you know those, you still have to get used to it. If you want to get even more confused, try to figure out how it ties into category theory - possibly the worst possible way to learn arrows ever.

But basically, the definition below makes a Haskell arrow which turns a URL (to an RSS feed) into a stream of one-line text headlines with title and date (as above).

> arrow1 urlstring =
>  proc x -> do
>   url <- (arr $ const urlstring) -< x

This turns the supplied URL into a stream containing just that single URL. (i.e. awkward plumbing)

>   rss <- readFromDocument [withValidate no, withCurl []] -< url

This uses that unixy favourite, curl (which already has Haskell bindings), to convert a stream of URLs into a stream of XML documents retrieved from those URLs - for each URL, there will be one corresponding XML document.

>   item <- deep (hasName "item" <<< isElem) -< rss

Now convert a stream of XML documents into a stream of <item> XML elements. Each XML document might have multiple item elements (and probably will - each RSS news item is supplied as an <item>) so there will be more things in the output stream than in the input stream.

>   title <- textOfChild "title" -< item
>   pubdate <- textOfChild "pubDate" -< item

Next, I'm going to pull out the text of the <title> and <pubDate> child elements of the items - there should be one of each per item.

>   returnA -< " * " ++ title ++ "(" ++ pubdate ++ ")\n"

When we get to this point, we should have a stream of items, a stream of titles corresponding to each item, and a stream of pubdates corresponding to each title. So now I can return (using the arrow-specific returnA) what I want using regular Haskell string operations: a stream of strings describing each item.

The above arrow is wrapped in code which feeds in the URL from the command line, and displays the stream of one-line news items on stdout.

The other interesting bit is a helper arrow, textOfChild, which extracts the text content of a named child of each element coming through an XML stream. Each part of this helper arrow is another arrow, and they're wired together using <<<. To read it, imagine feeding in XML elements at the right hand side, with each arrow taking that stream and outputting a different stream: first each element is converted into a stream of its children; then only the element children are allowed through; then only the elements with the supplied name; then all of the children of any elements so selected; and then the text content of those. (it's quite a long chain, but that's what the XML infoset looks like...)

> textOfChild name =
>  textNodeToString <<< getChildren <<< hasName name <<< isElem <<< getChildren
--