26 March, 2012

email addresses are half case sensitive

The left hand part of an email address, the bit before the @, is case sensitive in email in general. I've known that for a while; it seems to be an obscure-ish part of SMTP folklore.

Individual mail domains are perfectly at liberty to fold multiple distinct addresses into one, in their own domain, which is what most mail systems do: BENC and benc and bEnC all go to the same place @hawaga.org.uk. This leads many people to think that the left hand side is case insensitive.

They are just as much at liberty to fold addresses in other ways: for example, Gmail ignores . in addresses, so b.clifford@gmail.com and bclifford@gmail.com both reach me, as does b.c.l.i.ff.o.r.d@gmail.com.
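To make the distinction concrete, here is a rough Python sketch of two comparison keys. The function names are my own invention, and the second function only models the kind of folding described above (case folded, dots ignored); it is not a statement of what any particular provider actually implements.

```python
def split_address(address: str) -> tuple[str, str]:
    """Split at the last @ so the local part is kept whole."""
    local, _, domain = address.rpartition("@")
    return local, domain


def generic_key(address: str) -> str:
    """Key that assumes nothing about the receiving domain.

    Only the domain is lowercased; the local part is left untouched,
    because only the receiving domain knows how it folds addresses.
    """
    local, domain = split_address(address)
    return f"{local}@{domain.lower()}"


def dot_ignoring_key(address: str) -> str:
    """Hypothetical key for a domain that folds case and ignores dots."""
    local, domain = split_address(address)
    return f"{local.replace('.', '').lower()}@{domain.lower()}"
```

With the generic key, BENC@hawaga.org.uk and benc@hawaga.org.uk stay distinct; with the dot-ignoring key, b.c.l.i.ff.o.r.d@gmail.com and bclifford@gmail.com collapse to the same string. Only the owner of the domain is entitled to apply the second kind of rule.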

This came up on a mailing list (for browserid) that I watch, and I ended up being challenged in private email to cite a source. Luckily there's plenty of stuff around. RFC2821 section 2.4 seems to be the authority: "The local-part of a mailbox MUST BE treated as case sensitive."

In the preparation of this blog post, I discovered something I didn't know before. It seems you cannot have multiple dots in a row in a plain email address. The RFC2821 production rules are:

      Local-part = Dot-string / Quoted-string
            ; MAY be case-sensitive

      Dot-string = Atom *("." Atom)

      Atom = 1*atext

where atext comes from the companion RFC2822, section 3.2.4. So b...clifford@gmail.com is not a valid address. Shame.
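The Dot-string rule translates fairly directly into a regular expression. This is a minimal sketch, not a full address validator: the atext character set is copied from RFC2822 section 3.2.4, the names ATEXT, DOT_STRING and is_dot_string are mine, and it only checks the unquoted form of the local part.

```python
import re

# atext from RFC2822 section 3.2.4: letters, digits and these specials.
ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"

# Dot-string = Atom *("." Atom), with Atom = 1*atext
DOT_STRING = re.compile(rf"{ATEXT}+(?:\.{ATEXT}+)*")


def is_dot_string(local_part: str) -> bool:
    """True if local_part matches the Dot-string production."""
    return DOT_STRING.fullmatch(local_part) is not None
```

Because every dot has to be followed by at least one atext character, b.c.l.i.ff.o.r.d passes while b...clifford, a leading dot and a trailing dot are all rejected.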

1 comment:

  1. Note that Local-part can also be a Quoted-String, defined by RFC2821 just below the definition for Atom:

          Quoted-string = DQUOTE *qcontent DQUOTE

          String = Atom / Quoted-string

    and qcontent is defined in RFC2822 as follows:

          qtext           =       NO-WS-CTL /     ; Non white space controls

                                  %d33 /          ; The rest of the US-ASCII
                                  %d35-91 /       ;  characters not including "\"
                                  %d93-126        ;  or the quote character
          
          qcontent        =       qtext / quoted-pair
          
          quoted-string   =       [CFWS]
                                  DQUOTE *([FWS] qcontent) [FWS] DQUOTE
                                  [CFWS]

    So it seems to me that a local-part CAN be a string of almost any characters, including two or more consecutive dots, as long as it is written as a quoted string. Go ahead and use ellipses in your email address. :)
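The comment's point can be checked with a similarly rough sketch of the RFC2821 Quoted-string form. It is deliberately simplified: it accepts only the printable qtext ranges quoted above (%d33, %d35-91, %d93-126) and leaves out NO-WS-CTL and quoted-pair, and the names are again my own.

```python
import re

# Printable qtext only: %d33, %d35-91, %d93-126.
# Simplification: NO-WS-CTL and quoted-pair are not handled here.
QTEXT = r"[\x21\x23-\x5b\x5d-\x7e]"

# Quoted-string = DQUOTE *qcontent DQUOTE (RFC2821 form, no folding whitespace)
QUOTED_STRING = re.compile(rf'"{QTEXT}*"')


def is_simple_quoted_string(local_part: str) -> bool:
    """True if local_part is a double-quoted run of printable qtext."""
    return QUOTED_STRING.fullmatch(local_part) is not None
```

Under this sketch "b...clifford" (with the quotes) is accepted, while the bare b...clifford is not. Whether any given mail system will actually deliver to such an address is a separate question.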
