Larry Ullman's Book Forums


Hello, Larry and forumers,

 

I'm still trying to better understand functions, and I have two questions regarding the user-defined spam_scrubber() function.

 

On page 365, you begin defining it:

function spam_scrubber($value)
{
   $very_bad = array('to:', 'cc:', 'bcc:', 'content-type:', 'mime-version:', 'multipart-mixed:', 'content-transfer-encoding:');

   foreach ($very_bad as $v)
   {
       if (stripos($value, $v) !== false) return '';
   }
   ...
}

 

and then explain: "The first time that any of these items is found in the submitted value, the function will return an empty string and terminate (functions automatically stop executing once they hit a return)."

 

Does that mean that it will replace any "bad item" with an empty string, or that it will empty the whole form field? For instance, if bcc: is found in $body, will $body be completely empty after spam_scrubber() has returned an empty string?

 

Also, in your first tip, on page 367, you say: "Using the array_map() function is convenient but not without its downsides. […] Any multidimensional arrays within $_POST will be lost."

 

What does "lost" mean? I understand that array_map() will indiscriminately apply the spam_scrubber() function to the entire $_POST array, but if there are "clean" rows from a database inside, for instance, with none of the "bad items", what will it do to the data?

 

With thanks for your help,


I don't have the book in front of me, nor do I recall this example specifically, but I can make the following general statements:

 

If any of the strings in $very_bad is found in the $value passed to this function, the function returns a single empty string in place of the whole value; it does not just strip out the offending text. In other words, if anything bad is found, the following $result variable is set to an empty string:

 

$result = spam_scrubber($someValue);

 

I'm assuming that $result is probably the body of the email, but I don't know. If that's the case, then the body is set to an empty string if any of the "bad" things are found in the body.

 

And like Larry said, as soon as a return statement is hit, the function exits immediately, even if the return statement is inside a loop in the function, as is the case above.
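
To make the early return concrete, here is a minimal sketch built from the excerpt of spam_scrubber() quoted at the top of the thread. The part of the book's function elided with "..." is not reproduced; the final `return $value;` is an assumption added only so that the sketch runs, and the sample strings are made up.

```php
<?php
// Sketch based on the excerpt quoted earlier in this thread.
function spam_scrubber($value)
{
    $very_bad = array('to:', 'cc:', 'bcc:', 'content-type:', 'mime-version:', 'multipart-mixed:', 'content-transfer-encoding:');

    foreach ($very_bad as $v) {
        // The first match ends the loop AND the function.
        if (stripos($value, $v) !== false) return '';
    }

    // Assumption: stands in for the part of the function elided with "...".
    return $value;
}

$body = "Hello there\nbcc: someone@example.com";
$result = spam_scrubber($body);

var_dump($result);                      // string(0) ""
var_dump(spam_scrubber('Hello there')); // string(11) "Hello there"
```

So the whole value is discarded, not just the "bcc:" part of it.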

 

The array_map function is a built-in PHP function. I recommend checking out the array_map page on PHP.net for more information. Also, in the code snippet provided above, array_map is not used, so I don't think that array_map has anything to do with the above function (at least, not that I can see).

 

In general though, I think what Larry is trying to say is this: array_map() is not recursive, so if you have an array within an array, the inner array is handed to the callback function as one single value. Because spam_scrubber() is written to take a string and return a string, the inner array cannot come back as an array; whatever it contained is "lost" from the result, even if every value in it was perfectly clean.
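
A small sketch of that behaviour, under stated assumptions: scrub_string() is a hypothetical string-only callback standing in for spam_scrubber(), and it explicitly returns '' for anything that is not a string. How the book's actual function reacts to an array argument depends on the PHP version, so this only illustrates the general idea.

```php
<?php
// Hypothetical string-only callback standing in for spam_scrubber().
function scrub_string($value)
{
    // A nested array is not a string, so this sketch collapses it to ''.
    if (!is_string($value)) {
        return '';
    }
    return trim($value);
}

// Made-up form data with one nested array.
$post = array(
    'name'    => ' Josée ',
    'choices' => array('a', 'b'), // e.g. from inputs named choices[]
);

// array_map() calls the callback once per top-level element only.
$scrubbed = array_map('scrub_string', $post);

var_dump($scrubbed['name']);    // the trimmed string "Josée"
var_dump($scrubbed['choices']); // string(0) "" : the inner array is gone
```

If nested arrays need to survive, the callback itself would have to detect arrays and process them recursively.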

 

Well, hope that helps.


Thank you, HartleySan. You clarified things for me by writing "if anything bad is found, the following $result variable is set to an empty string".

 

As for the array_map() function (it's used to apply the spam_scrubber() function to the whole POST array), I had indeed read the PHP manual page before posting here, but I still don't really understand "lost". I suppose I'll suddenly see the light after further testing!

 

I did test spam_scrubber() quite intensively, on a very simple example (not mail, since that's not the easiest thing to test), and I'm still confused. Using the script below, if I type anything from the $very_bad array, the script works fine and returns an empty string, whatever else I may have typed. But the str_replace() function only partially works, or at least not as I expected: it removes %0a and %0d, but it doesn't remove \n or \r.

 

For instance, if I type:

 

I remember him looking round the cover and whistling to himself as he did so, and then breaking out in that old sea-song that he sang so often afterwards:\n "\n" \r "\r" %0a "%0a" %0d "%0d" ‘Fifteen men on the dead man’s chest— Yo-ho-ho, and a bottle of rum!’

I get

 

I remember him looking round the cover and whistling to himself as he did so, and then breaking out in that old sea-song that he sang so often afterwards:\n "\n" \r "\r" " " " " ‘Fifteen men on the dead man’s chest— Yo-ho-ho, and a bottle of rum!’

 

I don't understand why. I'm using the str_replace() function quite a lot on my website, and it has never failed to replace strings, so I don't understand why it's not the case here.

 

If anyone has an explanation, I'll be glad to read it!

 

With thanks for your help once again,

 

Josée

 

-----

 

The script I'm using for testing the spam_scrubber() function.

 

 

<?php
date_default_timezone_set('Europe/Paris');
setlocale(LC_ALL, 'fr_FR.UTF-8');
?>

<h1>Test</h1>
<p></p>
<?php
if (isset($_POST['submitted']))
{
$before = (get_magic_quotes_gpc() ? stripslashes($_POST['example']) : $_POST['example']);

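function spam_scrubber($value)
{
    // Undo magic quotes if the server still adds them.
    if (get_magic_quotes_gpc())
    {
        $value = stripslashes($value);
    }

    // Replace CR and LF characters, and the literal strings %0a and %0d, with a space.
    $value = str_replace(array("\r", "\n", "%0a", "%0d"), ' ', $value);

    $very_bad = array('to:', 'cc:', 'bcc:', 'content-type:', 'mime-version:', 'multipart-mixed:', 'content-transfer-encoding:');

    // Return an empty string as soon as one forbidden string is found.
    foreach ($very_bad as $v)
    {
        if (stripos($value, $v) !== false) return '';
    }

    return trim($value);
}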

$after = spam_scrubber($_POST['example']);

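// Escape both values so that typed markup is shown as text rather than rendered.
$before_html = htmlspecialchars($before, ENT_QUOTES, 'UTF-8');
$after_html = htmlspecialchars($after, ENT_QUOTES, 'UTF-8');

echo "<h2>Before</h2>
<p>$before_html</p>
<hr />
<h2>After</h2>
<p>$after_html</p>
<hr />";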
}
?>

<form action="spam_scrubber.php5" method="post">
<p><label for="example">Example:</label><textarea name="example" id="example" rows="5" cols="70"></textarea></p>
<p><input type="submit" name="submit" value="Submit" /></p>
<input type="hidden" name="submitted" value="TRUE" />
</form>


I cannot respond to your question about \r\n vs. %0d%0a at the moment. I'll try and play around with it when I get home tonight and find an answer. It might be something to do with your specific browser though. Are you using Safari on your Mac OS? I suspect that that has something to do with it.

 

There are a couple other small things I noticed though:

- If you're going to place a form in your script, you should include the other appropriate HTML tags (i.e., a doctype, html, head, body, etc.) in the file directly, or via header and footer include files.

- Your ternary operation syntax does not seem valid (but I'm not 100% sure). I think the closing parenthesis before the semicolon should go right after get_magic_quotes_gpc() to make the following:

 

$before = (get_magic_quotes_gpc()) ? stripslashes($_POST['example']) : $_POST['example'];

 

- You said that you're using the array_map function on the entire POST array, but nowhere in your script are you using the array_map function or using the entire POST array as an argument.

- In regards to str_replace, why are %0a and %0d (the ones without double quotation marks) replaced with " (double quotation marks)? According to your str_replace call, they should be replaced with a space (i.e., ' ') and nothing more.

 

I look forward to a response. Thank you.


Thank you for your answer, HartleySan. Sorry for the delay in answering you; it always takes me a while to post a message on this forum because I have to struggle with 2 foreign languages: English, and PHP/MySQL.

 

HartleySan wrote: "I cannot respond to your question about \r\n vs. %0d%0a at the moment. I'll try and play around with it when I get home tonight and find an answer. It might be something to do with your specific browser though. Are you using Safari on your Mac OS? I suspect that that has something to do with it."

 

I usually use either Camino or Firefox (still the 3.6 version). But I haven't noticed any difference between the two for this script.

 

Since my previous post, I've been thinking that one reason for str_replace() failing to replace \r and \n with a space may be that I start the file with this line:

setlocale(LC_ALL, 'fr_FR.UTF-8');

That's because I usually apply the multibyte family of functions since my websites use mainly French but also other European languages.

I'll have to test Larry's script in another file, without this setlocale() line.

 

HartleySan wrote: "If you're going to place a form in your script, you should include the other appropriate HTML tags (i.e., a doctype, html, head, body, etc.) in the file directly, or via header and footer include files."

 

Yes, of course. I was just trying to post what was really useful from my example.

 

HartleySan wrote: "Your ternary operation syntax does not seem valid (but I'm not 100% sure). I think the closing parenthesis before the semicolon should go right after get_magic_quotes_gpc() to make the following:"

 

$before = (get_magic_quotes_gpc()) ? stripslashes($_POST['example']) : $_POST['example'];

 

From what I've read in the PHP manual (http://us2.php.net/manual/en/language.operators.comparison.php), you are right… but I was not really wrong. Since the first part is the equivalent of an if conditional, it should be wrapped in parentheses; but I read on some other page too that you should wrap the whole ternary operator in parentheses. So, at least for more complex conditionals than this one, the best would apparently be:

$before = ((get_magic_quotes_gpc()) ? stripslashes($_POST['example']) : $_POST['example']);
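
For what it's worth, the parenthesization discussed here can be compared directly. The sketch below assumes a hypothetical $magic_quotes_on flag in place of get_magic_quotes_gpc() (so that it runs on any PHP version) and a made-up input string; all four spellings are valid PHP and give the same result, because the ternary operator binds more tightly than the assignment.

```php
<?php
// Hypothetical stand-in for get_magic_quotes_gpc().
$magic_quotes_on = true;

// Double quotes: \\ is one backslash, so the string is It\'s (5 characters).
$input = "It\\'s";

$a = ($magic_quotes_on ? stripslashes($input) : $input);   // whole expression wrapped
$b = ($magic_quotes_on) ? stripslashes($input) : $input;   // condition wrapped
$c = (($magic_quotes_on) ? stripslashes($input) : $input); // both wrapped
$d = $magic_quotes_on ? stripslashes($input) : $input;     // no parentheses at all

var_dump($a === $b && $b === $c && $c === $d); // bool(true)
var_dump($a);                                   // string(4) "It's"
```

Which form to use is therefore a readability choice rather than a correctness one.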

 

HartleySan wrote: "You said that you're using the array_map function on the entire POST array, but nowhere in your script are you using the array_map function or using the entire POST array as an argument."

 

Sorry for not being clearer. In fact Larry uses the array_map() function to apply the spam_scrubber function to the whole POST array. In my example, I had no need for array_map(), so I left it out in order to focus on the str_replace function.

 

HartleySan wrote: "In regards to str_replace, why are %0a and %0d (the ones without double quotation marks) replaced with " (double quotation marks)? According to your str_replace call, they should be replaced with a space (i.e., ' ') and nothing more."

 

They are not replaced with double quotation marks; the ones you see at the end were in the text I typed, they surrounded the second occurrences of %0a and %0d. So they are left behind, with a space between them. That's what I expected… From my point of view, str_replace() should behave just the same with \r and \n!

 

I'll go on playing with these scripts a bit, and if I see the light, I'll let you know!


I looked into the %0a/%0d issue rather extensively, and after researching the matter and testing various things, I think the following link explains the situation best:

 

http://bytes.com/topic/php/answers/498761-replace-cr-lf-textarea-str_replace

 

I particularly liked the response by Iván Sánchez Ortega and the link he provided.

 

Hopefully that sheds some light on things. If you have any further questions, don't hesitate.


  • 2 months later...

I ran into a few problems with this one:

 

1) Line 32 -- str_replace() code.

After much hair-pulling, I discovered that the only way I could get this to work was by preceding the backslash r and backslash n items with an additional backslash, i.e.:

 

$value = str_replace(array("\\r", "\\n", "%0a", "%0d"), ' ', $value);

 

This was true for my own computer, as well as 2 different web hosts.
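
One caveat about the doubled backslashes: "\\r" and "\\n" match a backslash followed by a letter, as typed into a form, but no longer match real carriage return or line feed characters. A minimal sketch that covers both cases might look like this; strip_line_breaks() is a hypothetical helper, not the book's code, and the sample strings are made up.

```php
<?php
// Hypothetical helper, not from the book.
function strip_line_breaks($value)
{
    $search = array(
        "\r", "\n",   // real carriage return / line feed characters
        '\r', '\n',   // a backslash followed by a letter, as typed into a form
        '%0a', '%0d', // URL-encoded forms, lowercase only
    );

    return str_replace($search, ' ', $value);
}

$typed = 'one\ntwo'; // single quotes: 8 characters, no real newline
$real  = "one\ntwo"; // double quotes: 7 characters, one real newline

var_dump(strip_line_breaks($typed)); // string(7) "one two"
var_dump(strip_line_breaks($real));  // string(7) "one two"
```

Note that str_replace() is case-sensitive, so an uppercase %0A would slip through this list; str_ireplace() is the case-insensitive counterpart.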

 

2) Mail delays.

I put the code on one of my sites and it worked perfectly. But when I copied it onto another site with a different webhost (with the necessary changes), it didn't APPEAR to work at all...

 

The next morning however, I found that the mail from the second site did indeed get sent...1 or 2 hours after I sent it. Something to do with the caching system of the mail server of that particular web host.

 

Moral of the story: if you hit "Submit" and the email doesn't get sent right away, it may not be your code's fault.

 

3) Magic quotes.

Both my web hosts have 'Magic quotes' on, so I used the following to remove the added slashes:

 

if (get_magic_quotes_gpc()) {
    $_POST['name'] = stripslashes($_POST['name']);
    $_POST['email'] = stripslashes($_POST['email']);
    $_POST['comments'] = stripslashes($_POST['comments']);
}
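
The same idea can be written as a loop over the field names. The sketch below is only an illustration: strip_magic_slashes() and its parameters are hypothetical, and the magic-quotes flag is passed in as an argument because get_magic_quotes_gpc() was deprecated in PHP 7.4 and removed in PHP 8.0, so calling it directly would fail on current versions.

```php
<?php
// Hypothetical helper: $magic_quotes_on would come from get_magic_quotes_gpc()
// on the older PHP versions that still provide that function.
function strip_magic_slashes(array $data, array $fields, $magic_quotes_on)
{
    if ($magic_quotes_on) {
        foreach ($fields as $field) {
            // Only touch fields that exist and hold a string.
            if (isset($data[$field]) && is_string($data[$field])) {
                $data[$field] = stripslashes($data[$field]);
            }
        }
    }

    return $data;
}

// Made-up values as they would arrive with magic quotes switched on.
$post = array(
    'name'     => "O\\'Brien",
    'email'    => 'ob@example.com',
    'comments' => "It\\'s fine",
);

$clean = strip_magic_slashes($post, array('name', 'email', 'comments'), true);

var_dump($clean['name']);     // string(7) "O'Brien"
var_dump($clean['comments']); // string(9) "It's fine"
```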


  • 7 months later...

I have no idea why this works, but it does - credit goes to someone named Psychopsia....

 

"If you want to replace a \r generated by the system, use "\r" (double quotes), but if it is typed in by the user, use '\r' (single quotes)."


That's actually a fairly well documented "issue" in PHP.

Actually, it's not really an issue: PHP is designed to interpret nearly everything within single quotes literally (only \' and \\ are treated as escapes there), whereas within double quotes, escape sequences such as \r and \n are converted into the control characters they stand for.

 

Please see the following:

http://php.net/manual/en/language.types.string.php
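
A short sketch of the difference, using a made-up fragment of the kind of test input posted earlier in the thread. It is meant only as an illustration of the quoting rules, not as a complete explanation of every case.

```php
<?php
$double = "\n"; // one character: a line feed (ASCII 10)
$single = '\n'; // two characters: a backslash and the letter n

var_dump(strlen($double)); // int(1)
var_dump(strlen($single)); // int(2)
var_dump(ord($double));    // int(10)

// Text typed into a form field as backslash + n arrives as those two characters.
$typed = 'afterwards:\n next';

// Searching for a real line feed finds nothing, so the string is unchanged.
var_dump(str_replace("\n", ' ', $typed) === $typed); // bool(true)

// Searching for the two typed characters replaces them with a space.
var_dump(str_replace('\n', ' ', $typed) === 'afterwards:  next'); // bool(true)
```

This would be consistent with the earlier test result, where the typed %0a and %0d were replaced (they are ordinary characters in either kind of quotes) while the typed \n and \r were left alone.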


  • 1 month later...

If the user input is properly sanitized from the beginning, I do not see how the form could be abused. Why would somebody have access to the email's headers in the first place? Why would somebody be allowed to enter control characters as input in the form? If it's a contact form, then the first argument of the mail() function is hidden, the second takes the subject, which I would restrict to alphanumeric characters, and then comes the body. I see no point in hunting for CC and BCC...

 

$string = "\n \r";

echo filter_var($string, FILTER_SANITIZE_SPECIAL_CHARS);


In the corresponding section of the book I explain how the header is not as separate from the body as you may believe. As for why someone could enter control characters: because it's really easy. Or, if someone wanted, they could create their own form on their own computer that submits to your contact form, in the hope of sending spam.
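
A rough illustration of the risk, under stated assumptions: the book's contact script is not quoted in this thread, so build_headers() below is hypothetical and simply assumes that the submitted address ends up in the additional headers. No mail is sent; the sketch only counts the header lines that would result.

```php
<?php
// Hypothetical form values; the second one carries an injected header line.
$honest  = 'visitor@example.com';
$hostile = "visitor@example.com\r\nBcc: victim@example.org";

// Assumption: the script builds its additional headers from the submitted address.
function build_headers($email)
{
    return 'From: ' . $email;
}

// Headers are separated by CRLF, so each piece here is one header line.
$honest_lines  = explode("\r\n", build_headers($honest));
$hostile_lines = explode("\r\n", build_headers($hostile));

var_dump(count($honest_lines));  // int(1)
var_dump(count($hostile_lines)); // int(2)
print_r($hostile_lines);         // a From: line followed by a Bcc: line
```

Replacing "\r" and "\n" with spaces, as the scrubber does, would collapse the hostile value back onto a single line.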

 

If you search online, you'll very quickly see plenty about how easy it is to make use of lax contact forms.


True, but the root of the problem is the same thing that we are debating in the other thread: properly sanitizing the user input. If the input is properly sanitized, I do not see how anybody would benefit from a function that searches for the presence of keywords such as "CC" and "BCC".

 

By the same token, we should create functions that hunt for keywords to prevent SQL injection, yet we do not do that. As a defense mechanism, we prefer sanitizing the user data. I would prefer not to use a function such as the scrubber, but to have a high degree of certainty that the user input is properly sanitized.

 

I expressed myself poorly; clearly there is no "body" isolated from the "header", and you are right to point that out.

 

bool mail ( string $to , string $subject , string $message [, string $additional_headers [, string $additional_parameters ]] )
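
As a sketch of the validation-first approach argued for here, the following checks the two values most likely to reach the headers before they would be passed to mail(). Both helper functions are hypothetical and not from the book; they use the built-in filter_var() with FILTER_VALIDATE_EMAIL and a simple preg_match() test for line breaks.

```php
<?php
// Hypothetical validation helpers, not from the book.
function valid_email($email)
{
    // filter_var() returns the address on success and false on failure.
    return filter_var($email, FILTER_VALIDATE_EMAIL) !== false;
}

function valid_subject($subject)
{
    // Reject any subject containing a carriage return or line feed.
    return preg_match('/[\r\n]/', $subject) === 0;
}

var_dump(valid_email('visitor@example.com'));                            // bool(true)
var_dump(valid_email("visitor@example.com\r\nBcc: victim@example.org")); // bool(false)
var_dump(valid_subject('Question about chapter 12'));                    // bool(true)
var_dump(valid_subject("Hello\r\nBcc: victim@example.org"));             // bool(false)
```

Whether checks like these replace the scrubber or sit alongside it is exactly the judgment call being debated in this thread.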


Right. You need to properly sanitize all user input. The point of this section of the book and this thread is that with emails, there are different kinds of sanitizing than with, say, database queries. The scrubber gives you a high degree of security; not sure why you would imply otherwise. But, most importantly, as I always say, find the right amount of security for the application at hand.

 

On that other note, I completely disagree with the idea that we should create functions that are hunting keywords.
