Larry Ullman's Book Forums

Chapter 15: Message Board, The General Subject Of Unicode Safe Data



The book, especially Chapter 14: Making Universal Sites and Chapter 15: Message Board Example, makes several recommendations for processing Unicode-safe data.

 

We learn that UTF-8 supports a wide range of languages. There was so much information to digest that I could not recall whether every action is required when building a universal/multilingual site. Are any of these set by default? Particularly the database character set.

 

My MySQL collation is already set to the following (copied straight from phpMyAdmin): utf8_unicode_ci, Unicode (multilingual), case-insensitive

 

Would I then still have to establish a charset and collation for the database?

 

Are these actions needed each time? Here is the list I gathered while reading.

Between the head tags:
<meta http-equiv="content-type" content="text/html; charset=utf-8"> -> Pages 416-417

Before any HTML is output (the page must be a PHP script):
header('Content-Type: text/html; charset=UTF-8'); -> Page 452 (and more information in Chapter 14)

In the mysql client:
CREATE DATABASE forum2 CHARACTER SET utf8; -> Page 444

After connecting to MySQL from PHP:
mysqli_set_charset($dbc, 'utf8'); or mysqli_query($dbc, 'SET NAMES utf8'); -> Page 450

 

 

 

Any insight or comments on this would be greatly appreciated.

 

-Mark


I've made a number of Japanese DBs, and here's what I've observed:

1) MySQL DBs will not have their charset/collation set to UTF-8 by default. Whenever I create a new DB, I always manually set the collation to utf8_general_ci.

2) If the DB collation is set to utf8_general_ci, then all text fields in the tables in the DB should automatically default to the same collation. If this isn't the case, just manually override any fields that you need UTF-8 for.

3) For the browser to treat the page as UTF-8, and so send form data to your PHP script as UTF-8, you must either use the HTML <meta charset="UTF-8"> element or call header('Content-Type: text/html; charset=UTF-8'). The alternative is to edit your php.ini file so that the default charset is UTF-8.

4) In order to get the DB connection to properly send UTF-8 data back and forth, you need to call mysqli_set_charset($dbc, 'utf8'). Be careful with the argument: MySQL's name for the charset is 'utf8', with no hyphen. I believe it's also possible to set the default charset in the php.ini file, but I'm not sure about that.

 

If you do those four things, then all of your non-English/multilingual scripts/DBs should work fine.
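The PHP side of that list (steps 3 and 4) can be sketched roughly as below. This is only a minimal sketch: the connect_utf8() function name is hypothetical and not from the book, and 'utf8' is simply the charset name used in this thread.

```php
<?php
// Step 3: declare the page encoding. This must run before any output is sent.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=UTF-8');
}

/**
 * Hypothetical helper (not from the book): open a MySQL connection
 * and switch that connection to UTF-8 (step 4).
 */
function connect_utf8(string $host, string $user, string $pass, string $db): mysqli
{
    $dbc = mysqli_connect($host, $user, $pass, $db);
    if (!$dbc) {
        throw new RuntimeException('Connection failed: ' . mysqli_connect_error());
    }

    // 'utf8' is MySQL's spelling of the charset name (no hyphen).
    if (!mysqli_set_charset($dbc, 'utf8')) {
        throw new RuntimeException('Could not set charset: ' . mysqli_error($dbc));
    }

    return $dbc;
}
```

With placeholder credentials you would call something like connect_utf8('localhost', 'username', 'password', 'forum2'), and mysqli_character_set_name($dbc) then reports which charset the connection is actually using.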

 

Edit: It was kinda vague in my original post, but I should differentiate between charset and collation. For a clear explanation, please see the following:

http://stackoverflow.com/questions/341273/what-does-character-set-and-collation-mean-exactly


Many thanks for your notes, Hartley.

I had to read that great Stack Overflow explanation twice, but I finally got it.

 

Hartley wrote: "3) In order for a PHP script to be able to handle UTF-8 data sent to it, you must either use the HTML <meta charset="UTF-8"> element or execute the header('Content-Type: text/html; charset=UTF-8') command."

 

Pages 452-453. Or using both is fine, I guess.

 

Either way, this satisfies my question.


Hartley, on page 448 there are instructions for inserting languages.

 

How do we add French, Greek, Portuguese, and Japanese to the table? It's not taking them, apparently because they contain non-English characters. I copied and pasted them off the web; I don't know if that's the reason.

 

I have taken these steps in my mysql client (shell).

 

(1) mysql> CHARSET utf8;

(2) Altered all my tables to use the utf8 character set.

 

 

INSERT INTO languages (lang, lang_eng) VALUES
('English', 'English'),
('Português', 'Portuguese'),
('Français', 'French'),
('Norsk', 'Norwegian'),
('Romanian', 'Romanian'),
('ελληνικά', 'Greek'),
('Deutsch', 'German'),
('Srpski', 'Serbian'),
('日本国', 'Japanese'),
('Nederlands', 'Dutch');

 

 

Thanks,

Mark


I'm actually using the mysql client through the terminal command line.

 

Like so:

 

mysql> INSERT INTO languages (lang, lang_eng) VALUES
('English', 'English'),
('Português', 'Portuguese'),
('Français', 'French'),
('Norsk', 'Norwegian'),
('Romanian', 'Romanian'),
('ελληνικά', 'Greek'),
('Deutsch', 'German'),
('Srpski', 'Serbian'),
('日本国', 'Japanese'),
('Nederlands', 'Dutch');

 

But when I get to the first accented (non-ASCII) character, it doesn't paste correctly.


Here I've taken some screenshots of the database and table. It is strange that Larry has an example with the mysql client using the terminal. I am not sure how he got the accents to work, or whether special keys were required (or a copy and paste?). Page 448 shows in plain sight that he used this method.

 

I hope the screenshots help. I am trying to confirm that the tables are in fact using UTF-8, but I couldn't find where that is shown. Thanks.

 

tableview.png

 

dbview.png
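One way to check which collation (and therefore which character set) each table uses is to ask MySQL's information_schema. A rough sketch, assuming $dbc is an open mysqli connection with a database already selected; table_collations() is a made-up helper name, not something from the book.

```php
<?php
/**
 * Hypothetical helper: map each table in the current database
 * to its default collation.
 * Assumes $dbc is an open mysqli connection with a database selected.
 */
function table_collations(mysqli $dbc): array
{
    // Aliases keep the result keys predictable across MySQL versions.
    $q = 'SELECT TABLE_NAME AS tbl, TABLE_COLLATION AS coll
          FROM information_schema.TABLES
          WHERE TABLE_SCHEMA = DATABASE()';

    $r = mysqli_query($dbc, $q);
    if (!$r) {
        throw new RuntimeException('Query failed: ' . mysqli_error($dbc));
    }

    $out = [];
    while ($row = mysqli_fetch_assoc($r)) {
        $out[$row['tbl']] = $row['coll'];
    }
    mysqli_free_result($r);

    return $out;
}
```

A table whose default charset is utf8 shows a collation beginning with utf8_, such as utf8_general_ci. For individual columns, SHOW FULL COLUMNS FROM languages lists the collation of each field.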


Hey Hartley, well, I resorted to just entering the languages straight into phpMyAdmin by copying and pasting, starting with Portuguese. This worked fine. I also ran a command to list the rows in the table, and the accented Portuguese displayed fine. Not sure how Larry did it; it might have to do with a locale configuration within SSH.

 

See below:

 

tencode.png

 

encoding.png
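Another route that avoids pasting into the terminal altogether is to run the insert from a PHP script saved as UTF-8. This is just a sketch under a few assumptions: insert_languages() is a hypothetical helper, $dbc is an open mysqli connection already switched to utf8, and the mbstring extension is available for the encoding check.

```php
<?php
/**
 * Hypothetical helper: insert (lang, lang_eng) pairs using a prepared statement.
 * Assumes this file is saved as UTF-8 and $dbc already uses the utf8 charset.
 * Returns the number of rows inserted.
 */
function insert_languages(mysqli $dbc, array $rows): int
{
    $stmt = mysqli_prepare($dbc, 'INSERT INTO languages (lang, lang_eng) VALUES (?, ?)');
    if (!$stmt) {
        throw new RuntimeException('Prepare failed: ' . mysqli_error($dbc));
    }

    $count = 0;
    foreach ($rows as [$lang, $langEng]) {
        // Refuse anything that is not valid UTF-8 (needs the mbstring extension).
        if (!mb_check_encoding($lang, 'UTF-8')) {
            throw new InvalidArgumentException("Not valid UTF-8: $langEng");
        }
        mysqli_stmt_bind_param($stmt, 'ss', $lang, $langEng);
        if (mysqli_stmt_execute($stmt)) {
            $count++;
        }
    }
    mysqli_stmt_close($stmt);

    return $count;
}

// A few of the rows from the INSERT earlier in the thread.
$rows = [
    ['Português', 'Portuguese'],
    ['Français', 'French'],
    ['ελληνικά', 'Greek'],
];
```

Calling insert_languages($dbc, $rows) then sends the values to MySQL as UTF-8 bytes, so the terminal's own encoding never comes into play.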
