Larry Ullman's Book Forums

Converting non-English characters to ASCII


Hi Larry and everyone,


I'm still using your book as PHP is still reducing me to tears. Here is the problem:

I have created a large database in English and Spanish. In spite of setting the MySQL db to UTF-8, it still returns nonsense characters for, e.g., an accented 'a', so I have been entering all of the accented text as HTML entities (for example á = &aacute;, Ú = &Uacute;, ñ = &ntilde;) as the only way to guarantee that they print correctly when called.

The problem is that when a Spanish speaker does a search, they will search for e.g. 'Estación', not 'Estaci&oacute;n'. The PHP if conditional doesn't recognise accented characters, so I need to parse each word letter by letter, convert any accented character to its decimal character code, and then convert that to the HTML entity (i.e. 'á' => &#225; => &aacute;) to present to MySQL.

I can't find a converter in the turgid php manual so am stuck.  I have re-read your chapter 'Making Universal Sites' but have now resorted to coming home to daddy.







I have come across htmlentities and htmlspecialchars but neither seems to work.


Here is a Q&D test:

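    <?php
    // char_test.php: quick check of what htmlentities() returns
    if (isset($_POST['char'])) {
        $char = $_POST['char'];
        $char2 = htmlentities($char);
        echo 'Answer:  ' . $char . '<br /><br />';
        echo $char2;
    }
    // Print the form; the original snippet never closed the string or the form
    echo '<form action="char_test.php" method="post">
    <input type="text" name="char" size="6" />
    <input type="submit" name="submit" value="Search!" />
    </form>';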

If I enter á, I would expect the answer to be:

Answer: &aacute;


but I get:

Answer: á


This goes for all of the characters that I have tried. Surely PHP wouldn't put something in their manual that doesn't work?




This kind of problem is a PITA to debug b/c it could come from several places: the value stored in the database, the value retrieved from the database, the value put into the HTML, or the value displayed in the browser. I don't think you want to go to some converter method; it's best to solve the actual problem. I'd start by confirming how the values are stored in the database. Make sure the database is using UTF8 everywhere, especially on the specific table and column. Make sure you're using UTF8 when connecting to the database, both directly and from PHP. And then make sure your HTML page uses UTF8, both in the HTML encoding and in the encoding used by your IDE/text editor when saving the file. 
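As a rough starting point for the "confirm how the values are stored" step, here is a minimal sketch that dumps the raw bytes of a value so you can tell UTF-8, Latin-1, and HTML entities apart. The helper name `describe_encoding()` is made up for illustration, and the type hints assume PHP 7 or later.

```php
<?php
// Hypothetical helper (not from the thread): report how a string is encoded.
function describe_encoding(string $value): array
{
    return [
        // Raw bytes as hex; a UTF-8 "á" is c3a1, a Latin-1 "á" is e1
        'hex' => bin2hex($value),
        // An empty pattern with the /u flag only matches valid UTF-8 input
        'valid_utf8' => preg_match('//u', $value) === 1,
        // True if the text contains something shaped like &name; or &#123;
        'has_entity' => preg_match('/&#?[A-Za-z0-9]+;/', $value) === 1,
    ];
}

// Sample values written as explicit bytes so the file's own encoding doesn't matter
$samples = [
    'utf8'   => "\xC3\xA1",   // "á" in UTF-8
    'latin1' => "\xE1",       // "á" in ISO-8859-1
    'entity' => '&aacute;',   // "á" stored as an HTML entity
];

foreach ($samples as $label => $value) {
    echo $label . ': ';
    var_export(describe_encoding($value));
    echo "\n";
}
```

Run something like this on a value straight out of a query result. If the connection itself is the weak link, `mysqli_set_charset()` is the usual way to set its encoding from PHP, though whether that applies depends on how your script connects.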


Hi Larry,


HTML:

<meta charset="utf-8">
<!--[if lt IE 9]>
    <script src="http://html5shiv.googlecode.com/svn/trunk/html5.js"></script>
<![endif]-->

MySQL: The collation on all relevant columns and tables is utf8_unicode_ci

When I use phpMyAdmin the accented characters show as HTML entities (i.e. á = &aacute;), which is what I want. It all works fine:

Inauguraci&oacute;n. Bendici&oacute;n de las locomotoras - &Aacute;guilas. Marzo 1890. - from phpMyAdmin


 Inauguración. Bendición de las locomotoras - Águilas. Marzo 1890.

The entities in MySQL have been converted to accented text as wanted.

The problem seems to be that PHP isn't recognising the accented characters and thus isn't converting them to HTML entities for the search engine that I am trying to create.

Thanks as always for your time and help.  


  • 2 weeks later...

Hi all,


Well, finally solved it. What really complicated matters was that my browser was rendering the (correct) HTML entities from PHP back as accented characters! This led me to believe that PHP wasn't actually doing anything.


So it has all come down to two lines of code:


    $search = trim($_POST['Search']);
    $search = htmlentities($search);

Now PHP can run the MySQL search and it finds matches.
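For anyone following along, those two lines can be wrapped up roughly like this. It is only a sketch: the function name `to_entity_search_term()` is hypothetical, and passing `ENT_COMPAT` and `'UTF-8'` explicitly is my assumption so the result doesn't depend on the PHP version's defaults (before PHP 5.4 the default encoding for `htmlentities()` was ISO-8859-1).

```php
<?php
// Hypothetical helper: turn user input into the entity form stored in the database.
function to_entity_search_term(string $input): string
{
    // trim() drops stray whitespace; htmlentities() maps á, ó, ñ etc. to named entities
    return htmlentities(trim($input), ENT_COMPAT, 'UTF-8');
}

echo to_entity_search_term('  Estación '); // Estaci&oacute;n
```

The result still needs to be escaped or bound as a parameter before it goes into the query.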


To get round the problem on the test script I added as the first line:

    header('Content-Type: text/plain');

so now my test script works as well.
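Another way to see the entities while keeping the page as HTML is to escape the result a second time, so the browser prints the entity instead of rendering it. A minimal sketch, with `show_entities()` as a made-up name and the explicit `ENT_COMPAT` / `'UTF-8'` arguments as my own assumption:

```php
<?php
// Hypothetical helper: make entities visible in an HTML page instead of rendered.
function show_entities(string $input): string
{
    // First pass: á becomes &aacute;
    $entities = htmlentities($input, ENT_COMPAT, 'UTF-8');
    // Second pass: the & itself becomes &amp;, so the browser displays "&aacute;"
    return htmlspecialchars($entities, ENT_COMPAT, 'UTF-8');
}

echo show_entities('á'); // sends &amp;aacute; which the browser displays as &aacute;
```

This is for debugging output only; the value used in the search should go through `htmlentities()` once.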


Best regards and thank you, Larry.




Edited by Max
Accidentally posted it.

