
01 September, 2017

Pattern matching in Idris `do` notation has a surprising reliance on type checking the action.

Idris is syntactically quite Haskell-like, and in particular it has do notation for sequencing "actions".

As in (traditional) Haskell, do blocks are desugared to a sequence of >>= (bind) operators. But, unlike in (traditional) Haskell, that >>= is not always the monadic >>= : m a -> (a -> m b) -> m b. (The same thing can happen in Haskell with the RebindableSyntax extension.)

In Idris, you can use different "better-than-monad" types to reason statically (at compile time) about a computation, beyond what the monad laws give you. For example, an effect system might track which effects are in scope.

Total parsing

In the case of Text.Parser (in the Idris contrib/ package), the type of an action (Grammar _ _ _) indicates whether the parser consumes any characters, so that the compiler can tell whether a parser might loop forever (see http://www.cse.chalmers.se/~nad/publications/danielsson-parser-combinators.html).
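
The shape of the type is roughly this (paraphrased from Text.Parser.Core; treat it as a sketch of the declaration rather than the exact source):

-- Paraphrased from Text.Parser.Core: a Grammar is indexed by the token
-- type, by a Bool recording whether the parser is guaranteed to consume
-- input, and by its result type.
data Grammar : (tok : Type) -> (consumes : Bool) -> Type -> Type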

I was trying to write a new JSON parser for idris-todaybot using Text.Parser. Previously JSON was parsed using lightyear, but Text.Parser has more gratuitous dependent types, so it was an obvious way to proceed.

A problem

I ran into a surprising compile error which initially made no sense to me at all.

This code compiles:

objectValuePair : Grammar Char True (List Char, ())
-- definition elided

jsonObject : Grammar Char True (List Char, ())
jsonObject = do
  llll <- objectValuePair
  pure llll

where llll is a tuple; but the following version of jsonObject, which deconstructs that tuple and reassembles it, does not compile:

jsonObject : Grammar Char True (List Char, ())
jsonObject = do
  (k,v) <- objectValuePair
  pure (k,v)

It gives this error:

When checking right hand side of Main.case block
in jsonObject at bug-bind.idr:55:14 with expected type
        Grammar Char c2 (List Char, ())

Type mismatch between
        Grammar Char False (List Char, ()) (Type of pure (k, v))
and
        Grammar Char c2 (List Char, ()) (Expected type)

Specifically:
        Type mismatch between
                False
        and
                c2

Another attempt to deconstruct llll also fails:

jsonObject : Grammar Char True (List Char, ())
jsonObject = do
  llll <- objectValuePair
  let (k,v) = llll
  pure (k,v)

but the following deconstruction by function application rather than pattern matching succeeds:

jsonObject : Grammar Char True (List Char, ())
jsonObject = do
  llll <- objectValuePair
  let k = fst llll
  let v = snd llll
  pure (k,v)

That type error

Let's dig into that type error:

Type mismatch between
        Grammar Char False (List Char, ()) (Type of pure (k, v))
and
        Grammar Char c2 (List Char, ()) (Expected type)

Grammar _ _ _ is the type of parser actions, where the first parameter Char is the type of symbols we're consuming, the final parameter (List Char, ()) is the type that the parser will return on success, and the middle parameter (False or c2) records whether the parser will definitely consume input (True) or might succeed without consuming anything (False - for example, a parser which removes whitespace, or pure, which never even looks at the input stream).

This "consumes" parameter contains the main novelty in Text.Parser beyond monadic parser combinators: Text.Parser combinators manipulate and use this value at compile time to help check that parsers really will consume things: for example, a parser that definitely consumes followed by a parser that might not, results in a parser that definitely consumes; while sequencing two parsers that might not consume results in a parser that might not consume. (See: the source)

So what on earth has this parameter, manipulated by >>=, got to do with pattern matching pure values after they've already been returned by an action?

Desugaring

It turns out we can forget that our troublesome tuple is being returned from an action; let (a,b) = (1,2) breaks in the same way when it appears inside a Text.Parser do block.

Let's (very roughly) desugar some of the examples above, and then look at the types involved:

jsonObject : Grammar Char True (List Char, ())
jsonObject = do
  llll <- objectValuePair
  pure llll

-- becomes:
jsonObject = objectValuePair >>= (\llll => pure llll)

jsonObject = do
  (k,v) <- objectValuePair
  pure (k,v)

-- becomes:
jsonObject = objectValuePair >>= (\(k,v) => pure (k,v))

-- becomes:
jsonObject = objectValuePair >>= (\llll => case llll of
                                             (k,v) => pure (k,v))

So in the second fragment, there's an extra case expression in there to deconstruct llll using pattern matching.

Apparently that gets in the way of type inference/checking:

  • On the one hand, that pure has type Grammar Char False (List Char, ()) - False because it may (in fact, will always) succeed without consuming input.
  • On the other hand, >>= doesn't care whether the right hand side consumes or not - it will take either, as shown by the compile-time variable c2 appearing in the error message.
With the generated case block in the way, Idris doesn't manage to unify c2 with False.
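
For comparison, pure's signature is roughly this (paraphrased; the False in the error message above comes from here):

-- Paraphrased: pure never consumes input, so its consumes index is
-- pinned to False, while the right hand side of >>= is checked against
-- an unconstrained index c2.
pure : a -> Grammar tok False a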

With further pinning of types using `the`, an uglier form of pattern matching does work:

export jsonObject : Grammar Char True (List Char, ())
jsonObject = do
  llll <- objectValuePair
  the (Grammar Char False (List Char, ())) $ do
    let (k,v) = llll
    pure (k, v)

Ugh

Thanks

Thanks to Melvar on #idris for explaining this.

14 June, 2017

An Idris implementation of a reddit bot.

I wrote and host lsc-todaybot, a bot that moves the [TODAY] flair around on reddit.com/r/LondonSocialClub, a subreddit dedicated to social activities in London. I wrote it in Haskell, and over the last year or so have used it as a simple application for experimenting with different Haskell libraries and techniques. For example, I used it to learn about extensible effects (and gave a talk about it at the London Haskell meetup).

I've been interested in the dependently typed programming language Idris and someone claimed it was pacman-complete - that is, you can write Pacman in it. So rather than playing round with proofs and all that dependent-type gubbins, I set out to rewrite todaybot in Idris.

It's here: https://github.com/benclifford/idris-todaybot

It's not beautiful. But it works well enough to be running alongside the Haskell implementation.

I've littered the source with QUESTION/DISCUSSION blocks, but basically I had to patch up an existing JSON library, interface an HTTP library (I chose libcurl), and struggle with a whole new style of error message (possibly the hardest bit as I'm used to error messages really leading you to the right answer).

There's almost no use of dependent types, but I did find a straightforward place to make use of them, in the parameters to this libcurl function:

CURLcode curl_easy_setopt(CURL *handle, CURLoption option, parameter);

where the type of the supplied parameter depends on the particular option chosen: for example a pointer to a callback function, or a boolean verbosity level.
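
The general idea, as a sketch (the names below are invented for illustration rather than taken from idris-todaybot):

-- Hypothetical sketch: compute the type of the "parameter" argument
-- from the option chosen, in the spirit of curl_easy_setopt.
data CurlOption = CurlOptVerbose | CurlOptWriteFunction

-- The parameter type depends on the particular option.
OptionParamType : CurlOption -> Type
OptionParamType CurlOptVerbose       = Bool
OptionParamType CurlOptWriteFunction = String -> IO ()

-- A setter whose second argument's type is computed from the first.
setOption : (opt : CurlOption) -> OptionParamType opt -> IO ()
setOption CurlOptVerbose       v = putStrLn ("verbose: " ++ show v)
setOption CurlOptWriteFunction f = f "example chunk"

With this, setOption CurlOptVerbose True and setOption CurlOptWriteFunction putStrLn both type check, while passing a callback where a Bool is expected is rejected at compile time.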

08 February, 2016

Porting todaybot to use extensible-effects

I wrote todaybot (blog, github) a while ago, and it is chugging along just fine, with the occasional bugfix (e.g. due to people writing Swedish).

I've been meaning to play with the Haskell extensible-effects package, so on a lazy Sunday afternoon I started hacking away porting todaybot. My goal was more to get a hands-on feel for extensible-effects rather than actually replace the existing todaybot implementation.

The existing non-effect-based code is pretty rough-and-ready/script-like. It all runs in the IO monad, and state is threaded round manually in a couple of places.

I started by replacing all the IO actions with calls to the (Lift IO) effect. The problems I encountered here were:

  • Type signatures and type checking: initially impenetrable type errors (reminiscent of what happens when you work with lens). The constraint-based style of effect types gives verbose errors in a form that I was not used to.
  • Some interaction, which I haven't understood, between type inference on effects and on lens and parsec types(!). I needed to add type signatures to top-level lens and parser definitions, which I got using typed holes.
  • Handling IO exceptions - I was unsure how exceptions would work so for this first cut I ignored exception handling and let the whole bot die if anything goes wrong.

So now I had a bot with a single effect, Lift IO, and with worse error handling than before. I wanted to get exception handling back in so I wasn't losing functionality, but I didn't (and still don't) know of an idiomatic way to handle IO exceptions in (Lift IO).

extensible-effects has exceptions (Exc) already, and I wanted IO exceptions to be handled using those, via the Exc IOError effect. I made a wrapper for lift, called lift', which called an IO action and translated IO errors into Exc IOError exceptions. IO errors could then be handled straightforwardly. Later on it turned out this wasn't enough: code was throwing errors other than IOError which also need to be caught (although maybe that is a bug in the mainline todaybot too).

Next, I wanted to start breaking down the use of the IO effect into more focused effects. The one people talk about a lot is logging. There's a Writer effect in extensible-effects already, and logging by writing a string seemed pretty obvious. There's also a Trace effect which is pretty similar. Neither had effect handlers that did what I wanted: translate the logging effect into IO effects (which in turn would be handled by the IO effect handler). This was pretty straightforward to write, though. There was some messing round with type signatures, but I got it worked out in the end.

And the last bit I did before I went to bed was put in a couple of Reader effects: one for configuration (which comes from a YAML file), and one for the authentication bearer token, which is requested during runtime. Writing the handlers for these had a similar feel to writing the handler for logging - a few lines of code inserted into boilerplate, messing round with type errors, raising more questions that I might get round to writing about.

Next I would like to implement effects to handle the last two pieces of IO, which are access to the current time and HTTP GET/POST calls, and to see if I can use a choice effect instead of mapM_ to iterate.

The (very messy) code is on the exteff branch on github - at the time of writing, up to commit 7ccc0a92....

06 July, 2015

A Haskell reddit bot.

I am one of many many moderators on reddit's r/LondonSocialClub. This is a place for organising social gatherings in London.

Post titles usually take the form [DD/MM/YY] Event @ Place. Other moderators have fiddled with the CSS for this subreddit to give us a big red TODAY sticker next to today's events, and grey out events that are in the past. This uses reddit's flair mechanism, which allows assigning of labels to posts, and CSS styling based on a post's flair.

Unfortunately, this was not entirely automated - some sucker or other had to go in each day and adjust flair on the relevant posts to match up with reality. This bothered me as being a manual process that should be fairly easily automated. Eventually it bothered me enough that I wrote a bot, lsc-todaybot, to do it. Now the moderation logs make it look like I come home from the pub every day and move everything around before going to sleep.

Another motivation for writing this bot was it seemed small enough in scope that it would be achievable, but give me a chance to learn a few new APIs: several new Haskell libraries, and the reddit REST API.

HTTP: I've previously used the HTTP package when hacking at cabal. It doesn't do HTTPS (I think) and the maintainer told me not to use it. So I tried wreq. It was easy enough to get going, and there was a tutorial for me to rip off.

Configuration: I used yaml to parse a YAML configuration file.

Lenses: I still haven't got a good grasp on what is happening with lenses but I used them in a few places, and it has developed my understanding a little bit: lsc-todaybot extracts fields from reddit's JSON responses using aeson-lens. yaml exposes the parsed configuration file as JSON, so the same lenses can be used for extracting configuration details. wreq also uses lenses for setting HTTP header values and the like.

Strings: I seem to have ended up using several different string types, which is icky - ByteString, Text and String at least. I've made the code that handles them a bit more generic by using the monoid <> operator to concatenate them, which makes things a bit less horrible looking.
