Larry Ullman's Book Forums

Validation And Filtering Ctype Vs Regular Expressions


Hi Larry,


Greg again. Once more, my apologies for posting questions on the comments forum of your book. At your suggestion I moved them here:


I’ve noticed that you suggest using ctype and Filter. However, many people have reported problems with them when it comes to internationalization. I am personally inclined to use regular expressions to blacklist, whitelist and validate. I (recursively) strip the control characters from all user data, then resolve the Magic Quotes issue, then validate, then typecast the result, then escape it with mysqli_real_escape_string() if it has to go to a database. Is this a redundant effort?


For example, if no control characters are allowed from the beginning, there is no chance of mail injection (like your “scrubber” function in the other book that deals with mail injection).


Then a \p{L} will assure me, for example, that I have only letters in any language, or only characters from a particular script, like \p{Thai}. Is this approach bad? What benefits could ctype and Filter possibly give me compared with a class that performs what I mentioned? Basically, it is regular expressions against new (and at this time still buggy) PHP functions. Thank you, and I’d appreciate it if you’d be kind enough to answer.
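For anyone following along, here is a rough sketch of the approach described above: strip control characters first, then whitelist with Unicode properties. The function names are made up for illustration, and the sketch assumes the input is UTF-8 and that PCRE was built with Unicode support. Note that \p{L} covers letters only; scripts that rely on combining marks also need \p{M}.

```php
<?php
// Hypothetical helper: remove control characters (Unicode category Cc),
// which includes the CR and LF used in mail header injection.
function strip_control_chars(string $value): string
{
    // The /u modifier makes PCRE treat pattern and subject as UTF-8.
    // preg_replace() returns null when the subject is not valid UTF-8.
    $clean = preg_replace('/\p{Cc}+/u', '', $value);
    return $clean ?? '';
}

// Hypothetical helper: true if the string holds only letters, in any script.
function is_letters_only(string $value): bool
{
    // \A and \z anchor to the real start and end of the string.
    return preg_match('/\A\p{L}+\z/u', $value) === 1;
}

// Hypothetical helper: true if every character belongs to the Thai script.
function is_thai_only(string $value): bool
{
    return preg_match('/\A\p{Thai}+\z/u', $value) === 1;
}
```

With this in place, a value such as "Bob\r\nBcc: x@example.com" loses its line breaks before it can reach a mail header.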


Hey Greg. The short answer is that if you're more comfortable using regular expressions, then by all means use regular expressions. The main problem with regular expressions is that they are hard to get right, so for many people, the regular expression will be "buggy", more so than ctype or Filter.


As for control characters, if I understand what you're talking about, those have nothing to do with Magic Quotes and all string data should be run through a database escaping function.


Is it redundant? Most likely. Be strict when needed, but don't overdo it. Regular expressions are brute force, and therefore slow. Do enough to keep data integrity and prevent security holes, but don't use regular expressions on things like a forum post or a blog comment.


Use them to check REQUIRED patterns, like email addresses, password requirements and such.


Security is often simple, because it has to be. mysql_real_escape_string() practically makes queries safe. No need to bring in the artillery (regular expressions) to do a trivial task. :)


  • 2 weeks later...

Thank you for your replies.


I guess there are several ways of doing the same thing, or at least there are several tools in PHP that may be used for the same thing. Here I was thinking of validation.

We have the ctype functions, validation filters (like FILTER_VALIDATE_EMAIL), regular expressions, and we can also use typecasting where needed.

Now, if I use typecasting for integers, as with a request for a primary key, I see no point in using something like FILTER_VALIDATE_INT or ctype_digit.
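One difference worth keeping in mind here: a cast accepts any string and keeps only the leading digits, while FILTER_VALIDATE_INT and ctype_digit() reject input that is not entirely numeric. A minimal sketch, where parse_id() is a made-up helper name and the min_range of 1 is an assumption about what a primary key looks like:

```php
<?php
// Hypothetical helper: returns a positive integer id, or null if the
// raw value (for example $_GET['id']) is not a clean integer.
function parse_id($raw): ?int
{
    $id = filter_var($raw, FILTER_VALIDATE_INT, ['options' => ['min_range' => 1]]);
    return $id === false ? null : $id;
}

$raw = '12abc'; // hypothetical user input

$cast    = (int) $raw;        // 12: the cast silently drops "abc"
$checked = parse_id($raw);    // null: the filter rejects the whole value
$digits  = ctype_digit($raw); // false: not every character is a digit
```

Whether the silent cleanup of the cast is acceptable depends on whether you want to repair bad input or refuse it.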

If I have to validate something like an email address, I can use FILTER_VALIDATE_EMAIL or a regular expression. The first is not particularly faster than the second. So there is no need for FILTER_VALIDATE_EMAIL; the regular expression will do the job.

Now FILTER_VALIDATE_REGEXP is way slower for any decent regular expression than preg_match on the same regular expression. So it makes no sense to use FILTER_VALIDATE_REGEXP.
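For reference, the two ways of applying the same pattern look like this. The pattern and the function names are invented for the example, and the sketch makes no claim about speed; it only shows that both calls answer the same question, with FILTER_VALIDATE_EMAIL alongside for comparison.

```php
<?php
// Hypothetical format: two capital letters followed by four digits.
const CODE_PATTERN = '/\A[A-Z]{2}\d{4}\z/';

function matches_with_preg(string $value): bool
{
    // preg_match() returns 1 on a match, 0 on no match, false on error.
    return preg_match(CODE_PATTERN, $value) === 1;
}

function matches_with_filter(string $value): bool
{
    // FILTER_VALIDATE_REGEXP returns the value on a match, false otherwise.
    $result = filter_var($value, FILTER_VALIDATE_REGEXP, [
        'options' => ['regexp' => CODE_PATTERN],
    ]);
    return $result !== false;
}

function is_email(string $value): bool
{
    return filter_var($value, FILTER_VALIDATE_EMAIL) !== false;
}
```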


Two issues here:


First: I fail to understand why ctype and validation filters... Almost everything falls between the cracks.


Secondly, what is the best practice for the following real-world example? A field like First Name can contain characters like ' and -, and there may be people who want to use . as well, as in P. J. D'Alberto-Johnson. Addresses may contain even more, like # or ( and ), and a phone number may contain + for the country code. What is one supposed to do in such a situation? Validate with something very general like ctype_print, at the risk of having names that wrongly contain + and phone numbers that wrongly contain ' because of the user input?


In almost no real situation do ctype and the validation filters resolve the problem. My question is: what is the best practice under the circumstances? Using many regular expressions to account for (international) names and addresses, for example, or just making sure there is something there and that it is not a security problem?
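To make the "many regular expressions" option concrete, here is one possible pair of per-field patterns. Both are only a sketch: the allowed punctuation and the length limits are assumptions, the names are invented, and real-world names can still fall outside them (a typographic apostrophe, for instance).

```php
<?php
// Hypothetical pattern: starts with a letter or mark, then letters, marks,
// spaces, periods, apostrophes and hyphens, 100 characters at most.
const NAME_PATTERN = '/\A[\p{L}\p{M}][\p{L}\p{M} .\'\-]{0,99}\z/u';

// Hypothetical pattern: optional leading +, then digits with spaces,
// parentheses, periods or hyphens as separators.
const PHONE_PATTERN = '/\A\+?[0-9][0-9 ().\-]{5,19}\z/';

function is_plausible_name(string $value): bool
{
    return preg_match(NAME_PATTERN, $value) === 1;
}

function is_plausible_phone(string $value): bool
{
    return preg_match(PHONE_PATTERN, $value) === 1;
}
```

With these, "P. J. D'Alberto-Johnson" passes as a name but "Anna+" does not, and a phone number containing ' is rejected.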


Currently I am using regular expressions just for email and password; all the other fields I run through ctype_print, and I perform the heavy validation with Ajax. If data goes into the database, I escape it. Is this a good approach, or are there better options out there that I overlooked?


Thanks again for your time.


For me:

I'd use a regex for first and second names and for passwords.


For telephone numbers, specify a format up front, i.e. "numbers only", or perhaps again a regex that allows for spaces and/or hyphens.

FILTER_VALIDATE_INT or a typecast when checking $_GET['id'].

Regexes are expensive, so where there is a decent alternative I'd use that.

I'd also use them with prepared statements, leaving the escaping to the database.
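A small sketch of what "leaving the escaping to the database" can look like with PDO. It assumes the pdo_sqlite extension so that it runs without a server; the table, column and function names are all hypothetical, and the same prepare/execute calls apply to a MySQL connection.

```php
<?php
// Hypothetical setup: an in-memory SQLite database with one table.
function make_demo_db(): PDO
{
    $pdo = new PDO('sqlite::memory:');
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $pdo->exec('CREATE TABLE users (id INTEGER PRIMARY KEY, last_name TEXT)');
    return $pdo;
}

function add_user(PDO $pdo, string $lastName): int
{
    // The value travels as a bound parameter, never as part of the SQL text.
    $stmt = $pdo->prepare('INSERT INTO users (last_name) VALUES (:last_name)');
    $stmt->execute([':last_name' => $lastName]);
    return (int) $pdo->lastInsertId();
}

function find_last_name(PDO $pdo, int $id): ?string
{
    $stmt = $pdo->prepare('SELECT last_name FROM users WHERE id = :id');
    $stmt->execute([':id' => $id]);
    $name = $stmt->fetchColumn();
    return $name === false ? null : $name;
}
```

A name like D'Alberto-Johnson goes in and comes back unchanged, with no manual escaping step.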


Don't force too many rules when they're not necessary. If someone wants to use something stupid as a name, allow them. It will not break your application.


Take the telephone example. If you want country codes, define another input field for them. As numbers differ between countries, just make sure it is a plausible number. If the number is truly important, make them validate it by using a code sent by SMS. If not, allow some slack.


You cannot check everything.


I tend to use Filter, when it's supported, for validating email addresses and numbers. For names, streets, and addresses I normally just check the length and run them through strip_tags(). The same goes for comments and other text fields. I may or may not use a regular expression for phone numbers, but I'd use preg_match(), not Filter.


Generally, you need to first make it secure (both to go into the database and to be displayed back on the Web page) and then you need to validate that necessary information is correct. For example, use a registration activation script to confirm a valid email address. As Antonio says, if someone wants to do something stupid, there's no harm in letting them, so long as it's still secure.
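As a rough illustration of the "check the length, run it through strip_tags(), keep it safe for display" routine, something along these lines would do. The function names and the 100-byte limit are assumptions made for the example; strlen() counts bytes, so mb_strlen() may be the better fit for multibyte text when the mbstring extension is available.

```php
<?php
// Hypothetical helper: returns the cleaned text, or null if the field is
// empty after cleaning or longer than the allowed number of bytes.
function clean_text_field(string $raw, int $maxLength = 100): ?string
{
    $text = trim(strip_tags($raw));
    $length = strlen($text); // bytes, not characters
    if ($length === 0 || $length > $maxLength) {
        return null;
    }
    return $text;
}

// Hypothetical helper: escape a stored value before echoing it into HTML.
function for_html(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}
```

Escaping for the database still happens separately, at the point where the value goes into a query.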
