01 September, 2017

Pattern matching in Idris `do` notation has surprising reliance on type checking the action.

Idris is syntactically quite Haskell-like, and especially it has do notation for sequencing "actions".

Like (traditional) Haskell, do blocks are desugared to a sequence of >>= (bind) operators. But, unlike (traditional) Haskell, that >>= is not always the monadic >>= : m a -> (a -> m b) -> m b. (This can also happen in Haskell using rebindable syntax)

In Idris, you can use different "better-than-monad" types to statically (at compile time) reason about a computation beyond using the monad laws. For example, an effect system might track which effects are in scope.

Total parsing

In the case of Text.Parser (in the Idris contrib/ package) the type signature of actions (Grammar _ _ _ indicates whether a parser consumes any characters so that the compiler can tell if a parser might loop forever. (see http://www.cse.chalmers.se/~nad/publications/danielsson-parser-combinators.html)

I was trying to write a new JSON parser for idris-todaybot using Text.Parser. Previously JSON was parsed using lightyear, but Text.Parser has more gratuitous dependent types so was an obvious way to proceed.

A problem.

I ran into a surprising compile error which initially made no sense to me at all.

This code compiles:

objectValuePair : Grammar Char True (List Char, ())
-- definition elided

jsonObject : Grammar Char True (List Char, ())
jsonObject = do
  llll <- objectValuePair
  pure llll

where llll is a tuple; but the following version of jsonObject, which desconstructs that tuple and reassembles it, does not compile:

jsonObject : Grammar Char True (List Char, ())
jsonObject = do
  (k,v) <- objectValuePair
  pure (k,v)

It gives this error:

When checking right hand side of Main.case block
in jsonObject at bug-bind.idr:55:14 with expected type
        Grammar Char c2 (List Char, ())

Type mismatch between
        Grammar Char False (List Char, ()) (Type of pure (k, v))
and
        Grammar Char c2 (List Char, ()) (Expected type)

Specifically:
        Type mismatch between
                False
        and
                c2

Another attempt to deconstruct llll also fails:

  jsonObject : Grammar Char True (List Char, ())
  jsonObject = do
    llll <- objectValuePair
    let (k,v) = llll
    pure (k,v)

but the following deconstruction by function application rather than pattern matching succeeds:

jsonObject : Grammar Char True (List Char, ())
  jsonObject = do
    llll <- objectValuePair
    let k = fst llll
    let v = snd llll
    pure (k,v)

That type error

Let's dig into that type error:

Type mismatch between
        Grammar Char False (List Char, ()) (Type of pure (k, v))
and
        Grammar Char c2 (List Char, ()) (Expected type)

Grammer _ _ _ is the type of parser actions, where the first parameter Char is the type of symbols we're consuming, the final parameter (List Char, ()) is the type that the parser will return on success, and the middle parameter (False or c2) represents whether the parser will definitely consume input (True) or might succeed without consuming anything (False - for example, a parser which removes whitespace, or pure which never even looks at the input stream).

This "consumes" parameter contains the main novelty in Text.Parser beyond monadic parser combinators: Text.Parser combinators manipulate and use this value at compile time to help check that parsers really will consume things: for example, a parser that definitely consumes followed by a parser that might not, results in a parser that definitely consumes; while sequencing two parsers that might not consume results in a parser that might not consume. (See: the source)

So what on earth has this parameter, manipulated by >>=, got to do with pattern matching pure values after they've already been returned by an action?

Desugaring

It turns out we can forget that our troublesome tuple is being returned from an action; let (a,b) = (1,2) breaks in the same way when run inside a Text.Parser do block.

Let's (very roughly) desugar some of the examples above, and then look at the types involved:

jsonObject : Grammar Char True (List Char, ())
jsonObject = do
  llll <- objectValuePair
  pure llll

-- becomes:
jsonObject = objectValuePair >>= (\v => pure v)

jsonObject = do
  (k,v) <- objectValuePair
-- becomes: 
    objectValuePair >>= (\(k,v) => pure (k,v))
-- becomes: 
    objectValuePair >>= (\llll => case llll of
      (k,v) => pure (k,v)
     )

So in the second fragment, there's an extra case expression in there to deconstruct llll using pattern matching.

Apparently that gets in the way of type inference/checking:

  • On the one hand, that pure has type: Grammar Char False (List Char, ()) - false because it may (actually, will always) succeed without consuming input.
  • On the other hand, >>= doesn't care whether the right hand side consumes or not - it will take either, as shown by the compile time variable c2 appearing in the error message.
Idris doesn't manage to unify c2 = False.

With further pinning of types using the, an uglier form of pattern matching does work:


export jsonObject : Grammar Char True (List Char, ())
  jsonObject = 
   do
    llll <- objectValuePair
    the (Grammar Char False (List Char, ())) $ do
        let (k,v) = llll
        pure (k, v)

Ugh

Thanks

Thanks to Melvar on #idris for explaining this.

No comments:

Post a Comment