Larry Ullman's Book Forums

Ch. 9 Sanitizing Main Text Field Input/Output


Hey Larry, I am building a site using this part of the book to allow users to upload their own articles, and I use the $page->getContent() method to output the content, as shown on page 300.


Is the only way to run this type of system, where an admin does not have to approve every article, to use TinyMCE (or a similar plugin that uses its own tag filters)? I ask because the main text field has no tag filter (strip_tags, htmlspecialchars, etc.) applied to it, either on input to the database or on output. Is there a way to filter out all tags except benign ones like <p> for long articles?


I was also about to ask about SQL injection attacks, but then I saw you used prepared statements, which solves that threat :)


He covers this in the 4th edition of the book (and I believe the 3rd edition as well), but you can make an array of all the "bad" strings, and then search through all user input for those strings and remove/change them as need be.

For example, you could make a list of all the HTML tags that are bad, and remove them as need be.

Conversely, if you're only going to allow a few valid tags, you might want to run htmlspecialchars on all the user input and then convert the valid tags back as needed.


Would any of those ideas solve your problem?
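
A minimal sketch of that second idea (escape everything, then turn a short list of tags back into real HTML) might look like the following. The function name escapeExceptAllowed() and the default tag list are made up for illustration and are not from the book. Only bare tags such as <p> or </p> are restored, so anything carrying attributes stays escaped.

```php
<?php
// Hypothetical helper (not from the book): escape all input, then restore
// a small whitelist of attribute-free tags.
function escapeExceptAllowed(string $input, array $allowedTags = ['p', 'br', 'em', 'strong']): string
{
    // Step 1: escape every special character so nothing is live HTML.
    $escaped = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

    // Step 2: restore only bare opening/closing tags from the whitelist,
    // e.g. &lt;p&gt; or &lt;/p&gt;. Tags with attributes do not match.
    $names = implode('|', array_map('preg_quote', $allowedTags));
    return preg_replace('~&lt;(/?)(' . $names . ')&gt;~i', '<$1$2>', $escaped);
}

echo escapeExceptAllowed('<p>Hello <script>alert(1)</script></p>');
// The <p> tags come back as markup; the <script> tags stay escaped as text.
```

Because attributes are never restored, this approach cannot support links or images without further (and much more careful) work.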


First, thank you for the quick reply. Second, I think your second idea would be better, since the 'bad' tags may be harder to pin down than the benign ones.


That said, strip_tags apparently has a feature I didn't know about: it takes an optional second argument. The first argument is the data to strip the tags from, and the second is a string of allowable tags (an array also works as of PHP 7.4). That is exactly what I was looking for. Where did I find it? Where else but Larry's Effortless Ecommerce, Example 1, where he uses TinyMCE and runs the content through strip_tags, building a string like so:

$allowed = '<div><p><span><br><a><img><h1><h2><h3><h4><ul><ol><li><blockquote>';

and using that string as the second argument. The filtered data is then sent to the database.
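
For anyone following along, here is a small sketch of how that call behaves. The wrapper function filterArticle() and the sample input are invented for illustration; only the $allowed string comes from the post above. Note that strip_tags removes the disallowed tags themselves but keeps the text between them.

```php
<?php
// Hypothetical wrapper around strip_tags() using the $allowed string quoted above.
function filterArticle(string $html): string
{
    $allowed = '<div><p><span><br><a><img><h1><h2><h3><h4><ul><ol><li><blockquote>';
    // Tags not listed in $allowed are removed; their inner text is kept.
    return strip_tags($html, $allowed);
}

echo filterArticle('<p>Intro</p><script>alert("x")</script><h2>Title</h2>');
// The <script> tags are gone, but the text alert("x") remains as plain text.
```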


  • 2 months later...

Good stuff, but I've discovered a hole in strip_tags() when it is given allowed tags (htmlspecialchars() and htmlentities() don't close it either if the allowed tags are converted back): you can still inject JavaScript into the acceptable tags through their attributes. The script doesn't need to start with "<script>". This is explained here: http://www.deepshiftlabs.com/dev_blog/?p=1885&lang=en-us
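
A quick way to see the problem, as a rough sketch with a made-up payload: strip_tags() only looks at tag names, so attributes on an allowed tag pass through untouched.

```php
<?php
// Sample payload (invented for illustration): no <script> tag anywhere.
$allowed = '<p><a>';
$payload = '<p onmouseover="alert(document.cookie)">Hover me</p>'
         . '<a href="javascript:alert(1)">click</a>';

$result = strip_tags($payload, $allowed);

// Both the event-handler attribute and the javascript: URL are still present.
var_dump(strpos($result, 'onmouseover') !== false);
var_dump(strpos($result, 'javascript:') !== false);
```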


There seem to be two solutions.


1. Simply don't allow any submitted HTML to go on your site until you (or an authorized user) approve it. This is fairly easy with an extra column in the table called "approved"; the PDO/mysqli query then only returns rows where that column holds a true/yes value. XSS does nothing while stored in a database (not to be confused with SQL injection, which prepared statements take care of), but it can do damage once it is output on your site.


2. Sanitize the HTML. The most popular tool seems to be something called "HTML Purifier". I would imagine that's what this site, and the thousands of other sites that allow certain submitted HTML on their pages, use.
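
The first option could be sketched roughly as below. This is only an illustration: the table and column names (pages, title, content, approved) are assumptions, and an in-memory SQLite database (which needs the pdo_sqlite extension) stands in for the real MySQL one so the snippet runs on its own.

```php
<?php
// In-memory SQLite stands in for the real database (assumption for the demo).
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Hypothetical schema: new rows default to approved = 0.
$pdo->exec('CREATE TABLE pages (
    id INTEGER PRIMARY KEY,
    title TEXT,
    content TEXT,
    approved INTEGER NOT NULL DEFAULT 0
)');

// Two user submissions arrive; both start out unapproved.
$insert = $pdo->prepare('INSERT INTO pages (title, content) VALUES (:title, :content)');
$insert->execute([':title' => 'Pending', ':content' => '<p onclick="evil()">x</p>']);
$insert->execute([':title' => 'Reviewed', ':content' => '<p>Safe</p>']);

// An admin reviews and approves the second submission.
$approve = $pdo->prepare('UPDATE pages SET approved = 1 WHERE id = :id');
$approve->execute([':id' => 2]);

// The public page only ever selects approved rows.
$stmt = $pdo->prepare('SELECT title, content FROM pages WHERE approved = :approved');
$stmt->execute([':approved' => 1]);
$visible = $stmt->fetchAll(PDO::FETCH_ASSOC);

foreach ($visible as $row) {
    echo $row['title'], "\n";
}
```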


P.S. Hartley, what you had described is a blacklist. Wouldn't a whitelist be better, since people are always coming up with new "bad" code and you would need to constantly update your blacklist? :)
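
For what it's worth, HTML Purifier works as a whitelist too. Here is a minimal sketch, assuming the library is already installed and loaded (for example through Composer's autoloader or the library's own HTMLPurifier.auto.php). The function name purifyArticle() and the particular list of allowed elements are my own choices, not anything from the book.

```php
<?php
// Assumes HTML Purifier has been loaded beforehand, e.g.
//   require_once 'vendor/autoload.php';
// Hypothetical helper: keep only whitelisted elements and attributes.
function purifyArticle(string $dirtyHtml): string
{
    $config = HTMLPurifier_Config::createDefault();
    // Whitelist of elements, with permitted attributes in brackets.
    $config->set('HTML.Allowed', 'p,br,em,strong,a[href],ul,ol,li,blockquote');

    $purifier = new HTMLPurifier($config);
    return $purifier->purify($dirtyHtml);
}

// Usage (requires the library to be loaded):
// echo purifyArticle('<p onclick="evil()">Hello</p><script>alert(1)</script>');
```

With a configuration like this, attributes that are not on the list (such as onclick) and elements that are not on the list (such as script) should be dropped rather than passed through.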
