[wp-hackers] is_email
Dougal Campbell
dougal at gunters.org
Fri Mar 25 17:00:08 GMT 2005
Nikolay Bachiyski wrote:
> Hello,
>
> Here is the regular expression found in the is_email() function:
>
> $chars = "/^([a-z0-9+_]|\\-|\\.)+@(([a-z0-9_]|\\-)+\\.)+[a-z]{2,6}\$/i";
>
> Some questions arose when I was looking at it:
> - why is it possible to have a '+' in the username
> - why is it possible to have '-' in both the username and the host
Because those characters are valid in the username and hostname
portions of email addresses. In particular, it has long been a convention
that many mail servers accept addresses like "user+whatever@example.com"
and automagically alias them to "user@example.com". This lets you
generate your own dynamic aliases, which is useful for tracking who is
sharing your address. I often use that trick when supplying registration
information. For example, if I registered my email address with the New
York Times as "dougal+nytimes at gunters.org", then if I get spam to that
address later, I know that the Times shared my address (and I can
blackhole further email to that address if I want).

And '-' has always been a valid character in domain and host names, and in
usernames on most systems.
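
To make the point concrete, here is a minimal sketch that runs the quoted pattern against a plain address, a plus-alias, and an address with hyphens on both sides. The pattern is copied from the quoted line (rewritten in single quotes). The helper strip_plus_tag() is hypothetical, not part of WordPress, and only imitates what a mail server that supports plus-aliases might do.

```php
<?php
// Pattern copied from the quoted is_email() line, written here in single quotes.
$chars = '/^([a-z0-9+_]|\-|\.)+@(([a-z0-9_]|\-)+\.)+[a-z]{2,6}$/i';

// Hypothetical helper (not part of WordPress): drop a "+tag" suffix from the
// local part, roughly what a server that supports plus-aliases would do.
function strip_plus_tag($address) {
    return preg_replace('/\+[^@]*(?=@)/', '', $address);
}

$samples = array(
    'user@example.com',
    'user+whatever@example.com',
    'first-last@mail-host.example.com',
);

foreach ($samples as $address) {
    $ok = preg_match($chars, $address) === 1;
    echo $address, ' => ', ($ok ? 'accepted' : 'rejected'),
         ', base mailbox: ', strip_plus_tag($address), "\n";
}
```

All three samples should pass the check, and only the plus-alias changes when the tag is stripped.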
> - is there any difference between ([a-z0-9_]|\\-|\\.)+ and [a-z0-9_\-.]+
> - isn't it better to put it into single quotes and save some backslash
> escaping
>
> Here is a suggestion:
>
> $email_regex = '/^[a-z0-9_\-.]+\@([a-z0-9\-]{1,255}\.)+[a-z]{2,6}$/i';
I'm not sure why '.' was separated out, but I've seen people put '-'
outside of a character class due to bugs in some regex implementations.
I can't remember if PHP had that bug or not. If not, then I think your
suggestion would be a good one.
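
A quick way to check the two remaining questions is to run both forms side by side. The sketch below is only an illustration: the probe strings are made up, and the sub-patterns are anchored with ^ and $ so they have to match the whole string. It also compares the double-quoted and single-quoted spellings of the current pattern, and tries the suggested pattern on a plus-alias, since the suggestion as posted leaves '+' out of the username class.

```php
<?php
// Username sub-patterns from the question, anchored to the whole string.
$alternation = '/^([a-z0-9_]|\-|\.)+$/i';
$char_class  = '/^[a-z0-9_\-.]+$/i';

// The current pattern in double quotes (extra escaping) and in single quotes.
$double_quoted = "/^([a-z0-9+_]|\\-|\\.)+@(([a-z0-9_]|\\-)+\\.)+[a-z]{2,6}\$/i";
$single_quoted = '/^([a-z0-9+_]|\-|\.)+@(([a-z0-9_]|\-)+\.)+[a-z]{2,6}$/i';
var_dump($double_quoted === $single_quoted);

// Made-up probe strings: both forms should agree on every one of them.
$probes = array('john.doe', 'a-b_c', 'user+tag', 'bad name', '');
foreach ($probes as $probe) {
    $a = preg_match($alternation, $probe);
    $b = preg_match($char_class, $probe);
    echo var_export($probe, true), ': alternation=', $a, ' class=', $b, "\n";
}

// The suggested pattern, exactly as posted, tried on a plus-alias.
$email_regex = '/^[a-z0-9_\-.]+\@([a-z0-9\-]{1,255}\.)+[a-z]{2,6}$/i';
var_dump(preg_match($email_regex, 'user+whatever@example.com'));
```

The two username forms accept the same strings. The practical difference is that the alternation form adds a capturing group and more work for the regex engine. If plus-aliases should keep working, the suggested character class would need a '+' added to it.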
--
Dougal Campbell <dougal at gunters.org>
http://dougal.gunters.org/