Hi, I was working through the search engine code that accompanies this book:
http://www.peachpit.com/articles/article.aspx?p=1802754&seqNum=4
and ran into an issue in the se_index.php file.
Everything worked fine until I tried Cyrillic characters. The
preg_match_all('/\b\w+\b/', $content, $output)
call didn't let them through.
After searching the net I found a solution: adding the u modifier (for Unicode), like this:
preg_match_all('/\b\w+\b/u', $content, $output);
and it worked on localhost with LAMPP.
Recently I tried to index a website on shared hosting that runs PHP 5.2.17, and the same pattern that worked on localhost didn't match Cyrillic there. I also tried
preg_match_all('/\b\[a-zA-Z\p{Cyrillic}0-9]+\b/u', $content, $output);
and other combinations of regular expressions with \p{Cyrillic}, but nothing has worked so far.
If anybody knows how to solve this issue, please let me know. Thanks.
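
One thing that may be worth checking on the shared host is what its PCRE build actually supports. On older PCRE versions, \w and \b can stay ASCII-only even with the u modifier, in which case \b never finds a boundary next to a Cyrillic letter and the pattern matches nothing. The sketch below is only a rough diagnostic under that assumption, not a confirmed fix: pcre_unicode_report() and extract_words() are hypothetical helper names, and the pattern drops \b and relies on \p{L} and \p{N}, which need a PCRE build with Unicode property support. It also assumes $content is UTF-8.

```php
<?php
// Hypothetical helper: report what the PCRE build behind preg_* supports.
function pcre_unicode_report()
{
    $sample = "\xD0\xB0"; // UTF-8 bytes of the Cyrillic letter "a" (U+0430)

    return array(
        // Version string of the PCRE library PHP was built against
        'pcre_version'       => PCRE_VERSION,
        // Does the u modifier work at all (UTF-8 mode compiled in)?
        'utf8_mode'          => @preg_match('/^.$/u', $sample) === 1,
        // Are \p{...} Unicode properties available?
        'unicode_properties' => @preg_match('/^\p{L}$/u', $sample) === 1,
        // Does \w cover non-ASCII letters when u is set?
        'w_matches_cyrillic' => @preg_match('/^\w$/u', $sample) === 1,
    );
}

// Hypothetical helper: collect words without relying on \w or \b.
// The greedy character class already returns maximal runs of letters,
// digits and underscores, so no word-boundary assertion is needed.
function extract_words($content)
{
    $count = @preg_match_all('/[\p{L}\p{N}_]+/u', $content, $output);
    if ($count === false) {
        // Pattern not supported by this PCRE build, or $content is not valid UTF-8
        return false;
    }
    return $output[0];
}

print_r(pcre_unicode_report());
```

If 'unicode_properties' comes back false on the host, extract_words() will return false there too, and the problem is the PCRE build rather than the regular expression. If it comes back true but 'w_matches_cyrillic' is false, replacing '/\b\w+\b/u' in se_index.php with the pattern used in extract_words() may be worth trying.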