Asked  7 Months ago    Answers:  5   Viewed   40 times

In the "PHP Cookbook", they say (p.589) that to properly set the char encoding of outgoing data to utf-8 it is necessary to edit the default_encoding configuration to utf-8.

However, I cannot find this configuration in php.ini. Should I simply add a line that would say default_encoding = "utf-8"?

I do have a line ;default_charset = "iso-8859-1". As the leading semicolon shows, it is currently commented out. Should I remove the semicolon and set it to "utf-8"? Does that take care of the default encoding?

I also found other encoding directives that I don't know what to do about:

[iconv]
;iconv.input_encoding = ISO-8859-1
;iconv.internal_encoding = ISO-8859-1
;iconv.output_encoding = ISO-8859-1
...
; http://php.net/exif.encode-unicode
;exif.encode_unicode = ISO-8859-15
...
;mssql.charset = "ISO-8859-1"
...
;exif.encode_unicode = ISO-8859-15

Is there any reason why I shouldn't simply replace them all with utf-8?

 Answers

69

You should set your default_charset to UTF-8:

default_charset = "utf-8"

(PHP Cookbook may have a typo in it if it asks you to change default_encoding; I've never heard of that directive.)

You'll also want to make sure that your web server is set to output UTF-8 if you're going to be outputting UTF-8 encoded characters. In Apache, this can be set in the httpd.conf file:

AddDefaultCharset UTF-8

As for the iconv, exif, and mssql encoding settings, you probably don't need to set them (they are commented out in your php.ini), but it's a good idea to change them all to UTF-8 anyway.

Wednesday, March 31, 2021
 
aurelijusv
answered 7 Months ago
30

Use quoted_printable_decode("Your string to decode") or imap_qprint("Your string to decode").

Description: quoted_printable_decode converts a quoted-printable string to an 8-bit string.

This function returns an 8-bit binary string corresponding to the decoded quoted-printable string (according to RFC 2045, section 6.7, not RFC 2821, section 4.5.2, so additional periods are not stripped from the beginning of the line).

Wednesday, March 31, 2021
 
nhunston
answered 7 Months ago
10

In httpd.conf add (or change if it's already there):

AddDefaultCharset utf-8

Tuesday, June 1, 2021
 
fillobotto
answered 5 Months ago
25

Unfortunately encodings.aliases.aliases.keys() is NOT an appropriate answer.

aliases (as one would expect) contains several cases where different keys map to the same value, e.g. 1252 and windows_1252 both map to cp1252. You could save time by using set(aliases.values()) instead of aliases.keys().

BUT THERE'S A WORSE PROBLEM: aliases doesn't contain codecs that don't have aliases (like cp856, cp874, cp875, cp737, and koi8_u).

>>> from encodings.aliases import aliases
>>> def find(q):
...     return [(k,v) for k, v in aliases.items() if q in k or q in v]
...
>>> find('1252') # multiple aliases
[('1252', 'cp1252'), ('windows_1252', 'cp1252')]
>>> find('856') # no codepage 856 in aliases
[]
>>> find('koi8') # no koi8_u in aliases
[('cskoi8r', 'koi8_r')]
>>> 'x'.decode('cp856') # but cp856 is a valid codec
u'x'
>>> 'x'.decode('koi8_u') # but koi8_u is a valid codec
u'x'
>>>

It's also worth noting that however you obtain a full list of codecs, it may be a good idea to ignore the codecs that aren't about encoding/decoding character sets, but do some other transformation e.g. zlib, quopri, and base64.
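One way to work around both problems is sketched below. It assumes Python 3 (the session above is Python 2) and a standard CPython layout in which every codec is a module in the encodings package. The helper names all_codec_names and text_codecs are made up for illustration. The idea is to combine the alias targets with the module names found on disk, then keep only the names that str.encode accepts, since Python 3 rejects non-text codecs such as base64_codec there.

```python
import encodings
import pkgutil
from encodings.aliases import aliases


def all_codec_names():
    """Collect codec names from the alias table and from the package itself."""
    names = set(aliases.values())  # de-duplicated alias targets
    # Modules in the encodings package include codecs that have no alias.
    for module in pkgutil.iter_modules(encodings.__path__):
        if not module.ispkg and module.name != "aliases":
            names.add(module.name)
    return sorted(names)


def text_codecs(names):
    """Keep only names usable as character encodings on this platform."""
    usable = []
    for name in names:
        try:
            "a".encode(name)
        except (LookupError, UnicodeError):
            # LookupError: unknown here, or not a text encoding (zlib, base64...).
            # UnicodeError: e.g. the deliberately failing "undefined" codec.
            continue
        usable.append(name)
    return usable


names = all_codec_names()
text = text_codecs(names)
print(len(names), "candidate names,", len(text), "usable text encodings")
```

The exact counts depend on the Python version and platform (for example, mbcs only resolves on Windows), so treat the resulting list as a starting point rather than a definitive inventory.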

Which brings us to the question of WHY you want to "try encoding bytes into many different encodings". If we know that, we may be able to steer you in the right direction.

For a start, that's ambiguous. One DEcodes bytes into unicode, and one ENcodes unicode into bytes. Which do you want to do?

What are you really trying to achieve? Are you trying to determine which codec to use to decode some incoming bytes, and planning to attempt this with all possible codecs? [Note: latin1 will decode anything.] Or are you trying to determine the language of some unicode text by trying to encode it with all possible codecs? [Note: utf8 will encode anything.]
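To illustrate why trying every codec rarely identifies the right one, here is a minimal Python 3 sketch. The helper name decodable_with and the candidate list are made up for the example. It reports which codecs decode a byte string without raising an error, which is not the same as decoding it correctly.

```python
def decodable_with(data, candidates):
    """Return the candidate codecs that decode `data` without raising."""
    ok = []
    for name in candidates:
        try:
            data.decode(name)
        except (UnicodeDecodeError, LookupError):
            continue
        ok.append(name)
    return ok


sample = "é".encode("utf-8")  # text -> bytes: b'\xc3\xa9'
print(decodable_with(sample, ["ascii", "utf_8", "latin_1", "cp1252"]))
# prints ['utf_8', 'latin_1', 'cp1252']

# latin_1 maps every byte value to a character, so it never fails:
every_byte = bytes(range(256))
assert len(every_byte.decode("latin_1")) == 256
```

Three of the four candidates succeed on the sample, but only utf_8 gives back the original text. Success alone therefore says little about which encoding the bytes were actually written in.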

Wednesday, June 2, 2021
 
cbcp
answered 5 Months ago
60

That last character just isn't in the file (try viewing the source), which is why you don't see it.

I think you might be better off saving the PHP file as UTF-8 (in Notepad++ that option is available in Format -> Encode in UTF-8 without BOM), and inserting the actual characters in your PHP file (i.e. in Notepad++), rather than hacking around with inserting à everywhere. You may find Windows Character Map useful for inserting Unicode characters.

Friday, October 1, 2021
 
docaholic
answered 3 Weeks ago