Asked  7 Months ago    Answers:  5   Viewed   44 times

I have some texts in French (containing accented characters such as "é"), stored in a MySQL table whose collation is utf8_unicode_ci (both the table and the columns), that I want to output on an HTML5 page.

The HTML page charset is UTF-8 (<meta charset="utf-8" />) and the PHP files themselves are encoded as "UTF-8 without BOM" (I use Notepad++ on Windows). I use PHP5 to query the database and generate the HTML.

However, on the output page, the special characters (such as "é") appear garbled and are replaced by "?".

When I browse the database (via phpMyAdmin) those same accented characters display just fine.

What am I missing here?

(Note: changing the page encoding to ISO-8859-1 (through Firefox's "Web Developer" menu) solves the problem... except for the special characters that appear directly in the PHP files, which then become corrupted. In any case, I'd rather understand why it doesn't work as UTF-8 than change the encoding without understanding why that works. ^^;)
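The symptoms in the question are consistent with the database text reaching the page as Latin-1 bytes while the literals in the PHP files are UTF-8 bytes. The sketch below reproduces that byte-level mismatch in Python, used here only as a neutral illustration. The variable names are hypothetical, and the Latin-1 connection is an assumption, not something the question confirms.

```python
# Assumption: the MySQL connection delivers "é" as one Latin-1 byte,
# while a literal typed in the UTF-8 source file is two UTF-8 bytes.
from_db = "é".encode("latin-1")      # b'\xe9'
from_source = "é".encode("utf-8")    # b'\xc3\xa9'

# Page declared as UTF-8: the lone 0xE9 byte is not valid UTF-8, so it is
# shown as a replacement character; the source literal decodes correctly.
db_as_utf8 = from_db.decode("utf-8", errors="replace")
source_as_utf8 = from_source.decode("utf-8")

# Page switched to ISO-8859-1: the database text now reads correctly, but the
# two-byte source literal is interpreted as two separate characters.
db_as_latin1 = from_db.decode("latin-1")
source_as_latin1 = from_source.decode("latin-1")

print(repr(db_as_utf8), repr(source_as_utf8))
print(repr(db_as_latin1), repr(source_as_latin1))
```

If this is what is happening, the fix is to make the connection deliver UTF-8 rather than to change the page encoding.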

 Answers

54

I experienced the same problem before. Here is what I did:

1) Use Notepad++ (which handles almost any encoding) or Eclipse, and make sure every file is opened and saved as UTF-8 without BOM.

2) Set the encoding in the HTTP header from PHP, using header('Content-type: text/html; charset=UTF-8');

3) Remove any extra whitespace at the start and end of the PHP files.

4) Set the collation of all tables and columns to utf8mb4_general_ci or utf8mb4_unicode_ci via phpMyAdmin or any other MySQL client. A comparison of the two collations is available here

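5) Set the MySQL connection charset to UTF-8 (I use PDO for my database connection), using one of these driver options:

  PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES utf8"
  PDO::MYSQL_ATTR_INIT_COMMAND => "SET CHARACTER SET utf8"

or simply execute one of these SQL statements on the connection before fetching any data. If the tables use utf8mb4 as in step 4, use utf8mb4 here as well.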

6) Use a meta tag: <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>

7) Declare the content language, here French: <meta http-equiv="Content-language" content="fr" />

8) Set the lang attribute of the html element to the desired language:

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="fr" lang="fr">

(I will keep updating this list; I had a hard time solving this problem when I was dealing with Japanese characters in past projects.)

9) Some fonts may not be installed on the client PC; in that case, load a web font (for example from Google Fonts) in your CSS.

10) Don't end your PHP source files with ?>, since any whitespace after the closing tag is sent to the browser as output.

NOTE:

If none of the above works, adjust the encoding to match the character set you actually need to display. In my case I set everything to Shift-JIS to display all my Japanese characters, and it works fine. UTF-8 should still be your first choice.
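Steps 1, 3 and 10 all guard against stray bytes that PHP would send to the browser before or after the code. Such bytes also prevent header() from working, because output has already started. Below is a rough sketch of a check for them; `stray_output` is a hypothetical helper that assumes the file is a single PHP block and inspects only its very start and end.

```python
import codecs


def stray_output(source: bytes) -> dict:
    """Report bytes around the PHP code that would be emitted as output.

    Hypothetical helper for illustration only: it assumes one <?php ... ?>
    block and does not parse the PHP code itself.
    """
    has_bom = source.startswith(codecs.BOM_UTF8)
    body = source[len(codecs.BOM_UTF8):] if has_bom else source

    # Whitespace before the opening tag is plain output.
    leading = body[: len(body) - len(body.lstrip())]

    # Whitespace after a closing tag is output too, except that PHP itself
    # swallows a single newline directly after "?>".
    trailing = b""
    stripped = body.rstrip()
    if stripped.endswith(b"?>"):
        trailing = body[len(stripped):]
        if trailing.startswith(b"\r\n"):
            trailing = trailing[2:]
        elif trailing.startswith(b"\n"):
            trailing = trailing[1:]

    return {"bom": has_bom, "leading": leading, "trailing": trailing}


sample = codecs.BOM_UTF8 + b" \n<?php echo 1; ?>\n\n"
print(stray_output(sample))
```

A file with no closing tag never reports trailing output, which is the reason for step 10.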

Wednesday, March 31, 2021
 
ioleo
answered 7 Months ago
60

This works like a charm:

<?php
// utf8_decode() converts the UTF-8 path to ISO-8859-1 before the filesystem check.
$dir = 'D:wampwwwtestdataFolderé';
var_dump(file_exists(utf8_decode($dir)));
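The snippet above presumably works because the file-system call expects the path in a single-byte encoding rather than UTF-8. That explanation is an assumption, since the answer does not give one. As a rough illustration of what the conversion does to the bytes, here is a Python stand-in; `utf8_to_latin1` is a hypothetical name, not a PHP or Python API.

```python
def utf8_to_latin1(data: bytes) -> bytes:
    # Hypothetical stand-in for PHP's utf8_decode(): UTF-8 bytes in,
    # ISO-8859-1 bytes out, with "?" for anything Latin-1 cannot represent.
    text = data.decode("utf-8", errors="replace")
    return text.encode("latin-1", errors="replace")


utf8_name = "Folderé".encode("utf-8")
print(utf8_name)                  # "é" takes two bytes in UTF-8
print(utf8_to_latin1(utf8_name))  # "é" takes one byte in ISO-8859-1
```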
Wednesday, March 31, 2021
 
LOKESH
answered 7 Months ago
69

You probably can't.

Exchange doesn't seem to implement charset-aware searching for IMAP, and doing so is not a requirement of RFC 3501 (only US-ASCII must be supported). UTF-8 is usually supported by IMAP servers, but this does not seem to be the case for Exchange.

You would have to switch protocols (EAS, EWS, REST services, etc.) or pull down the information, decode it yourself, and search it. If you cache it, this isn't even too bad long term. Since you are searching headers, you can get them all in one fetch. If you need to search bodies, it is much harder.
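The "decode it yourself, and search it" route can be sketched as follows, using Python's standard `email.header` module to decode RFC 2047 encoded words. `header_contains` and the sample subject are hypothetical; fetching the headers over IMAP is left out.

```python
from email.header import decode_header, make_header


def header_contains(raw_header: str, needle: str) -> bool:
    """Decode an RFC 2047 encoded header value and search it client-side."""
    decoded = str(make_header(decode_header(raw_header)))
    # casefold() gives a case-insensitive comparison that also works
    # for non-ASCII letters.
    return needle.casefold() in decoded.casefold()


# Hypothetical Subject value as it might come back from a header fetch.
raw_subject = "=?UTF-8?Q?R=C3=A9union_de_projet?="
print(header_contains(raw_subject, "réunion"))
```

Decoded values can be cached per message, so each header only has to be decoded once.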

Saturday, May 29, 2021
 
Arman
answered 5 Months ago
57

Quote your ambiguous or "special" table names with backticks:

INSERT INTO `e!` ...

Or better, don't use special characters in table names to avoid such problems.
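If such names cannot be avoided and the SQL is built in code, the quoting can be done by a small helper. This is a minimal sketch: `quote_identifier` is a hypothetical function, and it relies on the MySQL rule that a backtick inside a quoted identifier is written twice.

```python
def quote_identifier(name: str) -> str:
    # Wrap the identifier in backticks and double any backtick it contains.
    return "`" + name.replace("`", "``") + "`"


print("INSERT INTO " + quote_identifier("e!") + " ...")
```

This applies to identifiers only; values should still go through prepared statements.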

Friday, June 11, 2021
 
scessor
answered 5 Months ago
25

The below answers are basically taken from elsewhere. The key is getting your unwanted_array in the right format. You might want it as a list:

unwanted_array = list(    'Š'='S', 'š'='s', 'Ž'='Z', 'ž'='z', 'À'='A', 'Á'='A', 'Â'='A', 'Ã'='A', 'Ä'='A', 'Å'='A', 'Æ'='A', 'Ç'='C', 'È'='E', 'É'='E',
                            'Ê'='E', 'Ë'='E', 'Ì'='I', 'Í'='I', 'Î'='I', 'Ï'='I', 'Ñ'='N', 'Ò'='O', 'Ó'='O', 'Ô'='O', 'Õ'='O', 'Ö'='O', 'Ø'='O', 'Ù'='U',
                            'Ú'='U', 'Û'='U', 'Ü'='U', 'Ý'='Y', 'Þ'='B', 'ß'='Ss', 'à'='a', 'á'='a', 'â'='a', 'ã'='a', 'ä'='a', 'å'='a', 'æ'='a', 'ç'='c',
                            'è'='e', 'é'='e', 'ê'='e', 'ë'='e', 'ì'='i', 'í'='i', 'î'='i', 'ï'='i', 'ð'='o', 'ñ'='n', 'ò'='o', 'ó'='o', 'ô'='o', 'õ'='o',
                            'ö'='o', 'ø'='o', 'ù'='u', 'ú'='u', 'û'='u', 'ý'='y', 'ý'='y', 'þ'='b', 'ÿ'='y' )

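You can do this easily with iconv or chartr. Note that chartr maps characters one-to-one by position, so it requires every replacement to be a single character (the 'ß'='Ss' entry above would shift all later mappings):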

> iconv(string, to='ASCII//TRANSLIT')
[1] "Holmer"

> chartr(paste(names(unwanted_array), collapse=''),
         paste(unwanted_array, collapse=''),
         string)
[1] "Holmer"

Otherwise you have to loop through all of the replacements, because mapply or similar wouldn't account for symbols already replaced by previous gsub operations:

# the loop:
out <- string
for(i in seq_along(unwanted_array))
    out <- gsub(names(unwanted_array)[i],unwanted_array[i],out)

The result:

> out
[1] "Holmer"
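For comparison, the same idea can be sketched in Python. An explicit table handles letters that do not decompose into a base letter plus an accent, and Unicode normalization strips the remaining accents. The table is a small subset of the list above, and "Hølmer" is a hypothetical input, since the original string is not shown.

```python
import unicodedata

# Letters with no base-letter-plus-accent decomposition need explicit
# entries; values follow the list above (hypothetical subset).
EXPLICIT = str.maketrans({"ø": "o", "Ø": "O", "ß": "Ss", "æ": "a", "Æ": "A"})


def strip_accents(text: str) -> str:
    # str.translate allows multi-character replacements such as "Ss".
    text = text.translate(EXPLICIT)
    # NFD splits "é" into "e" plus a combining accent, which is then dropped.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_accents("Hølmer"))
```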
Wednesday, July 28, 2021
 
mnagel
answered 3 Months ago