Asked  7 Months ago    Answers:  5   Viewed   45 times

I have encountered a similar problem described here (and in other places) - where as on an ajax callback I get a xmlhttp.responseText that seems ok (when I alert it - it shows the right text) - but when using an 'if' statement to compare it to the string - it returns false.

(I am also the one who wrote the server-side code returning that string) - after much studying the string - I've discovered that the string had an "invisible character" as its first character. A character that was not shown. If I copied it to Notepad - then deleted the first character - it won't delete until pressing Delete again.

I did a charCodeAt(0) for the returned string in xmlhttp.responseText. And it returned 65279.

Googling it reveals that it is some sort of a UTF-8 control character that is supposed to set "big-endian" or "small-endian" encoding.

So, now I know the cause of the problem - but... why does that character is being echoed? In the source php I simply use

echo 'the string'...

and it apparently somehow outputs [chr(65279)]the string...

Why? And how can I avoid it?

 Answers

50

To conclude, and specify the solution:

Windows Notepad adds the BOM character (the 3 bytes: EF BB BF) to files saved with utf-8 encoding.

PHP doesn't seem to be bothered by it - unless you include one php file into another - then things get messy and strings gets displayed with character(65279) prepended to them.

You can edit the file with another text editor such as Notepad++ and use the encoding
"Encode in UTF-8 without BOM",
and this seems to fix the problem.

Also, you can save the other php file with ANSI encoding in notepad - and this also seem to work (that is, in case you actually don't use any extended characters in the file, I guess...)

Wednesday, March 31, 2021
 
pwaring
answered 7 Months ago
44
  • mb_internal_encoding('UTF-8') doesn't do anything by itself, it only sets the default encoding parameter for each mb_ function. If you're not using any mb_ function, it doesn't make any difference. If you are, it makes sense to set it so you don't have to pass the $encoding parameter each time individually.
  • IMO mb_detect_encoding is mostly useless since it's fundamentally impossible to accurately detect the encoding of unknown text. You should either know what encoding a blob of text is in because you have a specification about it, or you need to parse appropriate meta data like headers or meta tags where the encoding is specified.
  • Using mb_check_encoding to check if a blob of text is valid in the encoding you expect it to be in is typically sufficient. If it's not, discard it and throw an appropriate error.
  • Regarding:

    does this mean I have to use all multi byte functions instead of its core functions

    If you are manipulating strings that contain multibyte characters, then yes, you need to use the mb_ functions to avoid getting wrong results. The core string functions only work on a byte level, not a character level, which is what you typically want when working with strings.

  • utf8_general_ci vs. utf8_bin only makes a difference when collating, i.e. sorting and comparing strings. With utf8_bin data is treated in binary form, i.e. only identical data is identical. With utf8_general_ci some logic is applied, e.g. "é" sorts together with "e" and upper case is considered equal to lower case.
Wednesday, March 31, 2021
 
ammezie
answered 7 Months ago
90

I did a little more googling and came across this page

The command doesn't seem to make sense but I tried it anyway:

In the file /usr/share/phpmyadmin/libraries/dbi/mysqli.dbi.lib.php at the end of function PMA_DBI_connect() just before the return statement I added:

mysqli_query($link, "SET SESSION CHARACTER_SET_RESULTS =latin1;");
mysqli_query($link, "SET SESSION CHARACTER_SET_CLIENT =latin1;");

And it works! I now see Japanese characters in phpMyAdmin. WTF? Why does this work?

Friday, June 4, 2021
 
matthy
answered 5 Months ago
30

You need to authenticate the user somehow.

Your user needs to be authenticated with a username and a password.

PHP session can be used to remember, and you should use a database table or a text file on the server to store file ownership information.

Then, before unlinking anything, your logic should make sure that the currently "authenticated" user is the owner of the file.

Tuesday, August 17, 2021
 
Nickool
answered 2 Months ago
11

Encoding is more than specifying the meta tag and content type - the files themselves must really be in the encoding you specify, or you'll get mojibake.

Check that everything is using UTF-8, your database, database connection, table columns. Check that any static files you are including are also encoded in UTF-8.

Sunday, August 22, 2021
 
FWH
answered 2 Months ago
FWH
Only authorized users can answer the question. Please sign in first, or register a free account.
Not the answer you're looking for? Browse other questions tagged :