Asked  7 Months ago    Answers:  5   Viewed   45 times

I am generating XML using PHP's DOM library as below:

$dom = new DOMDocument("1.0","utf-8");

Doing so produces a page that shows this message above the output:

This page contains the following errors: error on line 16 at column 274505: PCDATA invalid Char value 27. Below is a rendering of the page up to the first error.

I have tried to fix this with the Tidy library, and I used iconv to convert the Chinese characters to UTF-8.

 Answers

10

A useful function for getting rid of that error is suggested on this website: http://www.phpwact.org/php/i18n/charsets#common_problem_areas_with_utf-8

When you put UTF-8 encoded strings in an XML document, remember that not every valid UTF-8 character is allowed in XML: http://www.w3.org/TR/REC-xml/#charsets

So you should strip the disallowed characters; otherwise you'll get a fatal XML parsing error such as the one above.

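/**
 * Replaces each run of characters outside the listed ranges
 * (for example ESC, char value 27) with a single space.
 */
function utf8_for_xml($string)
{
    return preg_replace('/[^\x{0009}\x{000a}\x{000d}\x{0020}-\x{D7FF}\x{E000}-\x{FFFD}]+/u', ' ', $string);
}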

Hope that saves someone else some time..
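
As a rough usage sketch, the filter can be applied to each string before it goes into the DOMDocument. The helper name strip_invalid_xml_chars, the element name item, and the sample string are made up for illustration. The character class is the one from the answer, so it also drops characters above U+FFFF, which XML 1.0 itself permits.

```php
<?php
// Hypothetical helper: same character class as utf8_for_xml() in the answer.
function strip_invalid_xml_chars(string $string): string
{
    $clean = preg_replace(
        '/[^\x{0009}\x{000a}\x{000d}\x{0020}-\x{D7FF}\x{E000}-\x{FFFD}]+/u',
        ' ',
        $string
    );
    // With the /u modifier, preg_replace() returns null if the input is not valid UTF-8.
    if ($clean === null) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $clean;
}

// Sample data (made up): "\x1B" is ESC, the "Char value 27" from the error message.
$raw   = "price\x1B[0m: 10";
$clean = strip_invalid_xml_chars($raw);

// Build a small document from the cleaned text.
$dom  = new DOMDocument('1.0', 'utf-8');
$item = $dom->createElement('item');
$item->appendChild($dom->createTextNode($clean));
$dom->appendChild($item);
$xml = $dom->saveXML();

// Re-parse the serialized document to confirm it is well-formed.
$check  = new DOMDocument();
$parsed = $check->loadXML($xml);
```

Throwing on a null result is a design choice for this sketch: it keeps invalid UTF-8 from silently turning into an empty string further down the line.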

Wednesday, March 31, 2021
 
DilbertDave
answered 7 Months ago
90
header('Content-Type: text/html; charset=UTF-8');

/**
 * Encodes HTML safely for UTF-8. Use instead of htmlentities. 
 *
 * @param string $var 
 * @return string 
 */
function html_encode($var)
{
    return htmlentities($var, ENT_QUOTES, 'UTF-8');
}

Those two rescued me, and I think it is now working. I'll come back if I continue to encounter problems. Should I store it in the DB as "&amp;" or as "&"?

Wednesday, March 31, 2021
 
ManojGeek
answered 7 Months ago
38

If you are 100% sure $message contains ISO-8859-1, you can use utf8_encode as David says (note that utf8_encode is deprecated as of PHP 8.2). Otherwise use mb_detect_encoding and mb_convert_encoding on $message.
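
A minimal sketch of the second suggestion, assuming the mbstring extension is loaded. The function name to_utf8 and the fallback list are assumptions for illustration. Detection is only a guess, so pass the real source encoding when you know it.

```php
<?php
/**
 * Hypothetical helper: returns $message as UTF-8.
 * $fallbacks lists the legacy encodings to try when the input is not valid UTF-8.
 */
function to_utf8(string $message, array $fallbacks = ['ISO-8859-1']): string
{
    // Already valid UTF-8: leave it alone to avoid double encoding.
    if (mb_check_encoding($message, 'UTF-8')) {
        return $message;
    }
    // Strict detection among the fallback candidates only.
    $from = mb_detect_encoding($message, $fallbacks, true);
    if ($from === false) {
        $from = $fallbacks[0]; // assumption: fall back to the first candidate
    }
    return mb_convert_encoding($message, 'UTF-8', $from);
}

// "\xE9" is "é" in ISO-8859-1; in UTF-8 it becomes the two bytes "\xC3\xA9".
$latin1 = "caf\xE9";
$utf8   = to_utf8($latin1);
```

Checking for valid UTF-8 first matters because running an ISO-8859-1 conversion on text that is already UTF-8 would garble it.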

Also take note that

$mail -> charSet = "UTF-8"; 

Should be replaced by:

$mail->CharSet = 'UTF-8';

And placed after the instantiation of the class (after the new). The properties are case sensitive! See the PHPMailer docs for the list and exact spelling.

Also, the default encoding of PHPMailer is 8bit, which can be problematic with UTF-8 data. To fix this you can do:

$mail->Encoding = 'base64';

Take note that 'quoted-printable' would probably work too in these cases (and maybe even 'binary'). For more details you can read RFC1341 - Content-Transfer-Encoding Header Field.
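
To see why these transfer encodings help, the sketch below uses PHP's built-in base64_encode() and quoted_printable_encode() instead of PHPMailer itself, an assumption made so the example runs without the library. The helper is_7bit and the sample text are made up for illustration. Both encodings turn UTF-8 bytes into plain 7-bit ASCII, which survives mail servers that do not handle 8-bit data.

```php
<?php
// Hypothetical helper: true if every byte of $s is in the 7-bit ASCII range.
function is_7bit(string $s): bool
{
    return preg_match('/\A[\x00-\x7F]*\z/', $s) === 1;
}

// "Grüße" written as explicit UTF-8 bytes so the source file encoding does not matter.
$body = "Gr\xC3\xBC\xC3\x9Fe";

// Quoted-printable keeps ASCII readable and escapes other bytes as =XX.
$qp = quoted_printable_encode($body);

// Base64 encodes everything; chunk_split() wraps the result at 76 characters per line.
$b64 = chunk_split(base64_encode($body));

// Both encodings are reversible.
$qpDecoded  = quoted_printable_decode($qp);
$b64Decoded = base64_decode($b64);
```

Quoted-printable tends to suit text that is mostly ASCII, while base64 has a fixed overhead regardless of content.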

Wednesday, March 31, 2021
 
TMichel
answered 7 Months ago
67

The file that you've posted has a single space character before the PHPExcel output. Check your script to see where this might be sent to the php://output stream. Check that there's no space before your initial <?php opening tag, and watch out in particular for ?> <?php or similar closing/opening tag pairs. Also check any files that might be included by your script.
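
These checks can be partly automated. The sketch below is a hypothetical helper (the name stray_output_problems and its messages are not part of PHPExcel or PHP) that inspects a file's source text for the two patterns mentioned above, plus a UTF-8 byte order mark, which is also sent as output before anything else. It assumes the file is meant to start with a <?php tag.

```php
<?php
/**
 * Hypothetical helper: lists likely causes of stray output in PHP source text.
 * It only looks at text around the tags and does not parse the code.
 */
function stray_output_problems(string $source): array
{
    $problems = [];

    if (strncmp($source, "\xEF\xBB\xBF", 3) === 0) {
        $problems[] = 'UTF-8 BOM before the opening tag';
    } elseif (preg_match('/\A\s+<\?php/', $source) === 1) {
        $problems[] = 'whitespace before the opening <?php tag';
    }

    // PHP swallows one newline directly after ?>, so that case is not reported.
    if (preg_match('/\?>(?!\R<\?php)\s+<\?php/', $source) === 1) {
        $problems[] = 'whitespace between ?> and <?php';
    }

    return $problems;
}

// To scan a running script and its includes, something like:
// foreach (get_included_files() as $file) {
//     print_r(stray_output_problems(file_get_contents($file)));
// }
$leading = stray_output_problems(" <?php echo 1;");
$between = stray_output_problems("<?php echo 1; ?> <?php echo 2;");
$clean   = stray_output_problems("<?php echo 1;");
```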

Wednesday, March 31, 2021
 
Novalirium
answered 7 Months ago
14

XML can handle just about any character, but there are ranges, such as most control codes, that it won't accept.

Your best bet, if you can't get them to fix their output, is to sanitize the raw data you're receiving. You need to replace illegal characters with the character reference format you noted.

(You can't even resort to CDATA, as there is no way to escape these characters there.)
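
A minimal sketch of that kind of sanitizing, under a stated assumption: XML 1.0 also rejects numeric character references to these code points (for example &#27;), so the sketch writes a plain-text marker such as \u001B instead, and whoever consumes the data has to agree on that convention. The function name mark_invalid_xml_chars and the marker format are made up for illustration. The mbstring extension and PHP 7.4 or later are assumed.

```php
<?php
/**
 * Hypothetical helper: replaces each character that XML 1.0 does not allow
 * with a visible text marker of the form \uXXXX.
 */
function mark_invalid_xml_chars(string $s): string
{
    $out = preg_replace_callback(
        // Negated XML 1.0 Char production, including the supplementary planes.
        '/[^\x{0009}\x{000A}\x{000D}\x{0020}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u',
        static fn(array $m): string => sprintf('\\u%04X', mb_ord($m[0], 'UTF-8')),
        $s
    );
    // With the /u modifier, a null result means the input was not valid UTF-8.
    if ($out === null) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $out;
}

// "\x1B" is ESC (char value 27); tab ("\t") is allowed and stays as it is.
$marked = mark_invalid_xml_chars("a\x1Bb\tc");
```

Unlike replacing with a space, a marker keeps a record of what was removed, at the cost of needing a decoding step on the receiving side.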

Thursday, July 29, 2021
 
Lorav
answered 3 Months ago