E-mail Address Validating Regular Expressions – a Warning

This page has been floating around in links over the past couple of weeks, as a collection of test cases to compare e-mail address validating regular expressions. However, watch out: it’s wrong.

RFC 822/2822 requires an e-mail address whose domain part is a bare IP address to use the bracketed domain-literal form:

  domain-literal  =       [CFWS] "[" *([FWS] dcontent) [FWS] "]" [CFWS]

In other words, this test case is not valid at all:

  [email protected]

Instead, it should be:

  [email protected]

Ditto for the other test addresses that use an IP address in the domain part. They’re rare, but the non-bracketed form is definitely not legal and should not be treated as valid in the test cases.
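To make the distinction concrete, here is a rough Python sketch that separates a bracketed domain-literal from a bare dotted-quad in the domain part. It is not a full RFC 2822 parser: it ignores CFWS/FWS and quoted-pairs, it does not range-check the octets, and the helper name `classify_domain` and the sample addresses in the usage note are made up for illustration (they are not the test cases from the page in question).

```python
import re

# Bare dotted-quad with no brackets (illustrative only; octet ranges are not checked).
BARE_IPV4 = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}")

# Simplified domain-literal: "[" followed by dcontent characters and "]".
# Assumption: CFWS/FWS and quoted-pairs from the real grammar are left out,
# so dcontent is any character except "[", "]", "\" and whitespace.
DOMAIN_LITERAL = re.compile(r"\[[^\[\]\\\s]*\]")


def classify_domain(address: str) -> str:
    """Classify the domain part of an address (hypothetical helper)."""
    # Split on the last "@" so an "@" inside a quoted local part is not used.
    _, sep, domain = address.rpartition("@")
    if not sep:
        return "no-domain"
    if DOMAIN_LITERAL.fullmatch(domain):
        return "domain-literal"
    if BARE_IPV4.fullmatch(domain):
        return "bare-ip"
    return "other"
```

With these assumptions, `classify_domain("user@[192.0.2.1]")` reports a domain-literal, while `classify_domain("user@192.0.2.1")` is flagged as the bare form that should not be accepted as an IP-address domain.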

I sent a mail to the author a few days ago without response, hence this post.


5 Comments

  1. Posted June 1, 2010 at 18:26 | Permalink

    Probably none of those exotic but legal addresses work in the real world anyway. Expect numerous problems when you use anything other than an alphanumeric local part and a real domain with a 2/3-letter TLD.

  2. Posted June 1, 2010 at 19:22 | Permalink

    God, even I knew that. ;)

  3. Posted June 2, 2010 at 01:05 | Permalink

    Is someone claiming that a new regexp is more accurate than the ancient monster used in such places as Perl’s Email::Valid? I’m sceptical – with or without your correction.

  4. jamie
    Posted June 2, 2010 at 04:51 | Permalink

    Niq –

    No, they’re just replaying the usual “how do I” game, in the same way it always plays out every couple of years –

    (1) Some developer somewhere underestimates how tricky the problem is, and starts writing a regexp. (2) As jwz famously noted, now they have two problems. (3) They search around, find other examples, discussion and comparison ensues. (4) At some point, the original developer declares one of them “good enough”, moves on, and now yet another incorrect regexp ranks highly in the googles for inexperienced developers to trip over until the cycle of life repeats.

    I’ve been watching this game since the Usenet days. The only evolution I can see is that Twitter has developed a communications platform in which discussing regexps reasonably is basically impossible. I personally consider this a feature.

  5. Posted June 2, 2010 at 12:03 | Permalink

    @niq: actually, it turns out Email::Valid 0.179 (i.e. the version I have installed) doesn’t get all the test cases right either. Here are the ones it gets wrong:

        isinvalid [email protected]
        isinvalid [email protected]
        isinvalid [email protected]tersss.org
        isinvalid [email protected]
        isvalid [email protected]
    

    so even Email::Valid fails the domain-literal test!

    here’s the script, fwiw: http://taint.org/x/2010/test-email-valid.txt