Larry Ullman's Book Forums

How/Why Does This Regex Work The Way It Does?


I was checking out the str_split function on php.net (http://php.net/manual/en/function.str-split.php), and the very first user comment from "atolia at gmail----- dot com" offers up a way to write your own str_split function with the preg_split function. Unfortunately, I cannot understand the regex pattern used for the function and why the regex works the way it does.

 

Specifically, the following is the alternative str_split function the user provides:

function str_split($str){
  return preg_split('//',$str);
}

 

And here is my specific question:

Why does the regex '//' split the string on every character?

 

Thank you.


I would guess it is because the expression is empty but still a valid regular expression (the two '/' characters are only the delimiters). As a consequence, I guess everything is a match, and it will then split at every character. While I guess this is the functional behavior, I cannot explain why that would be.

 

The second example on preg_split() in the manual uses the same approach, btw.
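
One way to check the "everything is a match" guess is to ask the regex engine where the empty pattern actually matches. The snippet below is only a sketch: the sample string 'abc' and the variable names are assumptions for illustration, not anything from the manual comment. As far as I can tell, preg_match_all reports a zero-length match at every position in the string, including the one after the last character, and preg_split cuts at each of those positions, so each piece between two cuts is a single character.

```php
<?php
// Ask where the empty pattern '//' matches in a sample string.
// PREG_OFFSET_CAPTURE turns each match into [matched text, byte offset].
$count = preg_match_all('//', 'abc', $matches, PREG_OFFSET_CAPTURE);

foreach ($matches[0] as $match) {
    // The matched text is always the empty string; only the offset changes.
    printf("matched '%s' at offset %d\n", $match[0], $match[1]);
}

echo "total matches: $count\n";
```

So the pattern does not match every character; it matches the empty gap before, between, and after the characters.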


Yeah, I noticed the preg_split example as well.

And yeah, I'm in the same boat as you in that I'm just assuming that when the regex is empty, it matches every character.

I suppose that's all we can really assume without knowing more about the inner workings of regex engines.

 

I suppose it makes sense to some degree though, as regexes do typically just loop through every character in a given string. If the regex for a string is nothing, then I guess the default is to assume that every character is a match, like you said.

 

Thanks for your input, Antonio.

We'll see if Larry or someone else can provide any more details.


Me neither, which is exactly why I asked here.

It definitely works though, so I can only assume that when an empty regex is used, every character is matched (as Antonio suggested).

 

Playing around a bit, I also noticed that unless you pass the PREG_SPLIT_NO_EMPTY flag to preg_split, the result also contains empty strings for the zero-width matches at the very start and very end of the string. In other words, the positions that ^ and $ stand for in a regex are matched too, and an empty string is returned for each of them.
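
The flag in question appears to be PREG_SPLIT_NO_EMPTY, which is the one the manual's preg_split example passes. Here is a minimal sketch of the difference, using a made-up sample string and a hypothetical wrapper name (split_chars, because str_split is already built into current PHP versions and cannot be redeclared). Without the flag the result carries an empty string at each end, since the empty pattern also matches before the first character and after the last one. With the flag, those empty pieces are dropped.

```php
<?php
// Hypothetical helper name; this is not the function from the php.net comment.
function split_chars(string $str, bool $dropEmpty = true): array
{
    // PREG_SPLIT_NO_EMPTY tells preg_split to return only non-empty pieces.
    $flags = $dropEmpty ? PREG_SPLIT_NO_EMPTY : 0;

    // A limit of -1 means "no limit", which is needed to reach the flags argument.
    return preg_split('//', $str, -1, $flags);
}

var_dump(split_chars('abc', false)); // includes an empty string at both ends
var_dump(split_chars('abc'));        // only 'a', 'b', 'c'
```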

 

But yeah, like you said, Larry, I was hoping to find some concrete information in some documentation somewhere, but no real information seems to readily exist.
