Asked  7 Months ago    Answers:  5   Viewed   26 times

I am able use UTF-8 characters just fine in my scripts.

As a matter of fact it is possible to have names of variables and functions contain Unicode characters.

There is also the mb_string extension which deals with multi-byte strings, yet in countless articles PHP is criticized for its lack of Unicode support.

I don't get it; why is PHP said to not support Unicode?



When PHP was started several years ago, UTF-8 was not really supported. We are talking about a time when non-Unicode OS like Windows 98/Me was still current and when other big languages like Delphi were also non-Unicode. Not all languages were designed with Unicode in mind from day 1, and completely changing your language to Unicode without breaking a lot of stuff is hard. Delphi only became Unicode compatible a year or two ago for example, while other languages like Java or C# were designed in Unicode from Day 1.

So when PHP grew and became PHP 3, PHP 4 and now PHP 5, simply no one decided to add Unicode. Why? Presumably to keep compatible with existing scripts or because utf8_de/encode and mb_string already existed and work. I do not know for sure, but I strongly believe that it has something to do with organic growth. Features do not simply exist by default, they have to be written by someone, and that simply did not happen for PHP yet.

Edit: Ok, I read the question wrong. The question is: How are strings stored internally? If I type in "Währung" or "Écriture", which Encoding is used to create the bytes used? In case of PHP, it is ASCII with a Codepage. That means: If I encode the string using ISO-8859-15 and you decode it with some chinese codepage, you will get weird results. The alternative is in languages like C# or Java where everything is stored as Unicode, which means: There is no codepage anymore, and theoretically you cannot mess up. I recommend Joel's article about Unicode and Character Sets, but essentially it boils down to: How are strings stored internally, and the answer with PHP is "Not in Unicode", which means that you have to be very careful and explicit when processing strings to make sure to always keep the string in the proper encoding during input, storage (database) and output, which is very errorprone.

Wednesday, March 31, 2021
answered 7 Months ago

It can't currently be done on Windows (possibly PHP 5.4 will support this scenario). In PHP, you can only write filenames using the Windows set codepage. If the codepage, does not include the character ?, you cannot use it. Worse, if you have a file on Windows with such character in its filename, you'll have trouble accessing it.

In Linux, at least with ext*, it's a different story. You can use whatever filenames you want, the OS doesn't care about the encoding. So if you consistently use filenames in UTF-8, you should be OK. UTF-16 is however excluded because filenames cannot include bytes with value 0.

Tuesday, July 6, 2021
answered 4 Months ago

Try mb_convert_encoding() with the "to" encoding as 'HTML-ENTITIES', and (if necessary) the "from" encoding set to 'UTF-8' or whichever Unicode encoding you're using.

Thursday, July 29, 2021
answered 3 Months ago

It reads the existing .config file that was used for an old kernel and prompts the user for options in the current kernel source that are not found in the file. This is useful when taking an existing configuration and moving it to a new kernel.

Saturday, July 31, 2021
answered 3 Months ago

In (at least) Ubuntu when using bash, it tells you what package you need to install if you type in a command and its not found in your path. My terminal says you need to install 'texinfo' package.

sudo apt-get install texinfo
Friday, September 3, 2021
answered 2 Months ago
Only authorized users can answer the question. Please sign in first, or register a free account.
Not the answer you're looking for? Browse other questions tagged :