Saturday, February 24, 2007

Another handy MySQL function

This time it's a quick function to validate email addresses, based on regexp.
It can be used in a trigger to add data validation, or to check data already in your database that needs a clean up ... really a simple wrapper around a simple regexp query, but it can be helpful.
Here it is:

  1. DELIMITER $$
  2. DROP FUNCTION IF EXISTS `test`.`is_valid_email` $$
  3. CREATE DEFINER=`root`@`localhost` FUNCTION `is_valid_email`(p_email varchar(64)) RETURNS tinyint(1)
  4. BEGIN
  5. CASE
  6. WHEN NOT (SELECT p_email REGEXP '^[A-Z0-9._%-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$')
  7. THEN
  8. -- bad data
  9. RETURN FALSE;
  10. ELSE
  11. -- good email
  12. RETURN TRUE;
  13. END CASE;
  14. END $$
  15. DELIMITER ;

As per Mushu's comment, this is much cleaner, oops:

  1. DELIMITER $$
  2. DROP FUNCTION IF EXISTS `test`.`is_email_valid` $$
  3. CREATE FUNCTION `test`.`is_email_valid` (p_email varchar(64)) RETURNS tinyint(1)
  4. BEGIN
  5. CASE
  6. WHEN p_email REGEXP '^[A-Z0-9._%-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$'
  7. THEN
  8. -- good email
  9. RETURN TRUE;
  10. ELSE
  11. -- bad data
  12. RETURN FALSE;
  13. END CASE;
  14. END $$
  15. DELIMITER ;


Also, if you can move such checks into the application code, it may be a wise thing to do as executing regular expressions can easly consume all CPU resources.

Of course doing checks in the application code is cleaner, but the ability to add them later at database level, or to run them in the database without interaction with external apps is also handy ;-)

7 comments:

jim said...

a common complaint about email validation on the web is that people forget the plus sign, '+', can be a valid character on the left-hand side. to really support the full range of valid email addresses, the regex is really quite complex. (see jeffrey friedl's mastering regular expressions.)

Anonymous said...

Also doesn't support .museum or IDN domains. To be honest the best form of validation is to check for an MX record.

pabloj said...

You are both right, it's just a quick hack which uses a regexp taken from web.
It's by no means RFC compliant and I agree with Jim, the real regexp would be quite complex.
Checking for an MX record in another story as it would require an UDF.

Unknown said...

I would rather put REGEXP into IF condition, which is more natural construction than doing CASE of SELECT negation.

Also, if you can move such checks into the application code, it may be a wise thing to do as executing regular expressions can easly consume all CPU resources.

pabloj said...

mushu,
thanks for your comments, I enhanced the function a bit :-)

Anonymous said...

It also doesn't work for a mail address at a top level domain (they do exist).

pabloj said...

Hi all,
thanks for your feedback, you'll notice that I cleaned up the function a lot but didn't change the regex, this is because I'm an regexp illiterate and there are many good ones available on the web.
Feel free to share your good one here, I'll add it to the original post.