Asked  7 Months ago    Answers:  5   Viewed   46 times

So I have an array of strings, and all of the strings are using the system default ANSI encoding and were pulled from a SQL database. So there are 256 different possible character byte values (single byte encoding).
Is there a way I can get json_encode() to work and display these characters instead of having to use utf8_encode() on all of my strings and ending up with stuff like u0082?

Or is that the standard for JSON?

 Answers

35

Is there a way I can get json_encode() to work and display these characters instead of having to use utf8_encode() on all of my strings and ending up with stuff like "u0082"?

If you have an ANSI encoded string, using utf8_encode() is the wrong function to deal with this. You need to properly convert it from ANSI to UTF-8 first. That will certainly reduce the number of Unicode escape sequences like u0082 from the json output, but technically these sequences are valid for json, you must not fear them.

Converting ANSI to UTF-8 with PHP

json_encode works with UTF-8 encoded strings only. If you need to create valid json successfully from an ANSI encoded string, you need to re-encode/convert it to UTF-8 first. Then json_encode will just work as documented.

To convert an encoding from ANSI (more correctly I assume you have a Windows-1252 encoded string, which is popular but wrongly referred to as ANSI) to UTF-8 you can make use of the mb_convert_encoding() function:

$str = mb_convert_encoding($str, "UTF-8", "Windows-1252");

Another function in PHP that can convert the encoding / charset of a string is called iconv based on libiconv. You can use it as well:

$str = iconv("CP1252", "UTF-8", $str);

Note on utf8_encode()

utf8_encode() does only work for Latin-1, not for ANSI. So you will destroy part of your characters inside that string when you run it through that function.


Related: What is ANSI format?


For a more fine-grained control of what json_encode() returns, see the list of predifined constants (PHP version dependent, incl. PHP 5.4, some constants remain undocumented and are available in the source code only so far).

Changing the encoding of an array/iteratively (PDO comment)

As you wrote in a comment that you have problems to apply the function onto an array, here is some code example. It's always needed to first change the encoding before using json_encode. That's just a standard array operation, for the simpler case of pdo::fetch() a foreach iteration:

while($row = $q->fetch(PDO::FETCH_ASSOC))
{
  foreach($row as &$value)
  {
    $value = mb_convert_encoding($value, "UTF-8", "Windows-1252");
  }
  unset($value); # safety: remove reference
  $items[] = array_map('utf8_encode', $row );
}
Wednesday, March 31, 2021
 
Terry
answered 7 Months ago
43

I managed to figure it out. It's not really the solution I wanted but it works. I had to adjust my query to look like:

CONVERT(CAST(langselect as BINARY) USING latin1) as langselect
Wednesday, March 31, 2021
 
Viralk
answered 7 Months ago
62

This is an encoding issue. It looks like at some point, the data gets represented as ISO-8859-1.

Every part of your process needs to be UTF-8 encoded.

  • The database connection

  • The database tables

  • Your PHP file (if you are using special characters inside that file as shown in your example above)

  • The content-type headers that you output

Wednesday, June 23, 2021
 
Juriy
answered 4 Months ago
36

That depends on where you use it...

The name of the encoding is UTF-8.

A dash is not valid to use everywhere, so for example in .NET framework the property of the System.Text.Encoding class that returns an instance of the UTF8Encoding class that handles the UTF-8 encoding is named UTF8.

Monday, August 2, 2021
 
Skipper
answered 3 Months ago
100

Check out Joel Spolsky's The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)

EDIT 20140523: Also, watch Characters, Symbols and the Unicode Miracle by Tom Scott on YouTube - it's just under ten minutes, and a wonderful explanation of the brilliant 'hack' that is UTF-8

Saturday, September 4, 2021
 
Xun Yang
answered 2 Months ago
Only authorized users can answer the question. Please sign in first, or register a free account.
Not the answer you're looking for? Browse other questions tagged :