Josee Posted November 6, 2011 Share Posted November 6, 2011 Hello, Could someone help me with a regular expression in MySQL? If I test the following pattern in PHP: \b(\')?((eau){1}|(eaux){1})\b the results are fine since I get a match for eaul'eau OR l'eau coule d'eau OR cours d'eau les eaux OR les eaux ruissellent but no match for carreaux (the word ends with 'eaux', but "eau" or "eaux" is not the beginning of the word, and doesn't occur just after an apostrophe)beaucoup ('eau' or 'eaux' is neither the beginning nor the end of the word) Yet if I try the very same pattern in the database, using phpMyAdmin, I get no results at all although all the above examples are in the database. Could someone explain why this is happening, and what I need to do in MySQL? In case you wonder why I'm not using a FULLTEXT search, I plan to use regular expressions for words that are 1, 2 or 3 characters long, and a FULLTEXT search for longer words or expressions. With thanks for your help, Link to comment Share on other sites More sharing options...
Larry Posted November 7, 2011 Share Posted November 7, 2011 Can you confirm this isn't a collation or character set issue? Link to comment Share on other sites More sharing options...
Josee Posted November 7, 2011 Author Share Posted November 7, 2011 Thank you for answering, Larry. No, apparently it isn't a collation or character set issue (for once!). It's apparently a problem with word-boundaries since this query returns no result: SELECT id_syntagme, syntagme_c FROM syntagmes WHERE syntagme_c REGEXP '\b(\')?((eau){1}|(eaux){1})\b' whereas this query works… except that it returns all the entries containing "eau" or "eaux", including "carreaux" or "beaucoup" (which I need to exclude): SELECT id_syntagme, syntagme_c FROM syntagmes WHERE syntagme_c REGEXP '(\')?((eau){1}|(eaux){1})' Is there another way of specifying word-boundaries in a MySQL query? With thanks for your help, Link to comment Share on other sites More sharing options...
Josee Posted November 10, 2011 Author Share Posted November 10, 2011 I've just been looking at the MySQL 5.1 Manual, and I found the explanation here: http://dev.mysql.com.../en/regexp.html In the MySQL version of regular expressions, \b means a backspace. And word-boundaries are instead shown like this: REGEXP '[[:<:]]word[[:>:]]' So, rewriting my previous regular expression like this: REGEXP '[[:<:]](\')?((eau){1}|(eaux){1})[[:>:]]' returned just the results I wanted. 1 Link to comment Share on other sites More sharing options...
Larry Posted November 11, 2011 Share Posted November 11, 2011 Ah, excellent. Kudos for figuring it out and thanks for posting the solution. Apologies for not getting an answer to you sooner myself. Link to comment Share on other sites More sharing options...
Josee Posted November 12, 2011 Author Share Posted November 12, 2011 Hello, Larry, Please don't apologize. There's no reason why you should do the research if you don't know the answer off the top of your head. Best wishes, Link to comment Share on other sites More sharing options...
Recommended Posts