Asked  7 Months ago    Answers:  5   Viewed   38 times
$ser = 'a:2:{i:0;s:5:"héllö";i:1;s:5:"wörld";}'; // fails
$ser2 = 'a:2:{i:0;s:5:"hello";i:1;s:5:"world";}'; // works
$out = unserialize($ser);
$out2 = unserialize($ser2);
print_r($out);
print_r($out2);
echo "<hr>";

But why?
Should I encode before serializing, then? How?

I am using JavaScript to write the serialized string to a hidden field, which PHP then reads from $_POST.
In JS I have something like:

function writeImgData() {
    // Collect the alt text of every image in the album.
    var caption_arr = [];
    $('.album img').each(function() {
        caption_arr.push($(this).attr('alt'));
    });
    // serializeArray() is a user-defined helper (not shown here).
    $("#hidden-field").val(serializeArray(caption_arr));
}

 Answers

42

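unserialize() fails with:

$ser = 'a:2:{i:0;s:5:"héllö";i:1;s:5:"wörld";}';

because the declared lengths for héllö and wörld are wrong. The s:N prefix must give the string's length in bytes, not in characters, and in UTF-8 each accented character here takes two bytes. PHP's strlen() counts bytes, so it reports the values unserialize() expects: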

echo strlen('héllö'); // 7
echo strlen('wörld'); // 6

However if you try to unserialize() the following correct string:

$ser = 'a:2:{i:0;s:7:"héllö";i:1;s:6:"wörld";}';

echo '<pre>';
print_r(unserialize($ser));
echo '</pre>';

It works:

Array
(
    [0] => héllö
    [1] => wörld
)

If you build the string with PHP's serialize(), it computes the byte lengths of multi-byte strings correctly.
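If the string has to be built in JavaScript, as in the question, the length prefix needs to be the UTF-8 byte count rather than String.length. A minimal sketch, assuming the form is submitted as UTF-8 and that TextEncoder is available; utf8ByteLength and phpSerializeStrings are hypothetical helper names, and only flat arrays of strings are handled:

```javascript
// Hypothetical helper: number of bytes the string occupies when encoded as UTF-8.
function utf8ByteLength(str) {
    return new TextEncoder().encode(str).length;
}

// Hypothetical helper: serializes a flat array of strings in PHP's serialize() format.
// Assumes the data reaches PHP as UTF-8, so lengths are counted in UTF-8 bytes.
function phpSerializeStrings(arr) {
    var out = 'a:' + arr.length + ':{';
    arr.forEach(function (value, index) {
        out += 'i:' + index + ';s:' + utf8ByteLength(value) + ':"' + value + '";';
    });
    return out + '}';
}

console.log(phpSerializeStrings(['héllö', 'wörld']));
// a:2:{i:0;s:7:"héllö";i:1;s:6:"wörld";}
```

The strings are not escaped because the serialize() format relies on the byte count, not on quoting, to find the end of each value.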

On the other hand, if you want to exchange serialized data between multiple programming languages, you should drop PHP's format and move to something like JSON, which is far more standardized.
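For the JavaScript side of the question, the JSON route could look roughly like the sketch below. encodeCaptions is a hypothetical name; on the PHP side the posted value would then be read with json_decode() instead of unserialize(), assuming the form is submitted as UTF-8.

```javascript
// Sketch: encode the captions as JSON instead of PHP's serialize() format.
// JSON carries no length prefixes, so multi-byte characters need no special handling.
function encodeCaptions(captions) {
    return JSON.stringify(captions);
}

var payload = encodeCaptions(['héllö', 'wörld']);
console.log(payload);                // ["héllö","wörld"]
console.log(JSON.parse(payload)[1]); // wörld
```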

Wednesday, March 31, 2021
 
KHM
answered 7 Months ago
38

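If you are 100% sure that $message contains ISO-8859-1, you can use utf8_encode as David says (note that utf8_encode is deprecated as of PHP 8.2; mb_convert_encoding($message, 'UTF-8', 'ISO-8859-1') does the same job). Otherwise use mb_detect_encoding and mb_convert_encoding on $message.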

Also take note that

$mail -> charSet = "UTF-8"; 

Should be replaced by:

$mail->CharSet = 'UTF-8';

And placed after the instantiation of the class (after the new). Property names are case-sensitive! See the PHPMailer documentation for the full list and exact spelling.

Also, PHPMailer's default encoding is 8bit, which can be problematic with UTF-8 data. To fix this you can do:

$mail->Encoding = 'base64';

Note that 'quoted-printable' would probably work too in these cases (and maybe even 'binary'). For more details, see RFC 1341, section "Content-Transfer-Encoding Header Field".

Wednesday, March 31, 2021
 
TMichel
answered 7 Months ago
53

It seems that those kinds of characters aren't allowed in the "local part" of an email address: http://en.wikipedia.org/wiki/E-mail_address#Local_part.

Friday, May 28, 2021
 
PedroKTFC
answered 5 Months ago
69

You probably can't.

Exchange doesn't seem to implement charset-aware searching for IMAP, and doing so is not a requirement of RFC 3501 (only US-ASCII must be supported). Most servers support UTF-8, but this does not seem to be the case for Exchange.

You would have to switch protocols (EAS, EWS, REST services, etc.) or pull down the information, decode it yourself, and search it. If you cache it, this isn't even too bad long term. Since it's headers, you can get this all in one fetch. If you need to search bodies, the case is much harder.
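The "decode it yourself and search it" route can be sketched as follows. This is only an illustration under stated assumptions: the headers have already been fetched as raw strings, non-ASCII text in them is stored as RFC 2047 encoded-words, the charset is one that TextDecoder understands (such as UTF-8), and atob is available. decodeEncodedWords and searchHeaders are hypothetical names, and whitespace between adjacent encoded-words is not collapsed.

```javascript
// Sketch: decode RFC 2047 encoded-words ("=?charset?B|Q?text?=") in a header value.
function decodeEncodedWords(header) {
    return header.replace(/=\?([^?]+)\?([bBqQ])\?([^?]*)\?=/g, function (match, charset, encoding, text) {
        var binary;
        if (encoding.toUpperCase() === 'B') {
            // B encoding is plain base64.
            binary = atob(text);
        } else {
            // Q encoding: "_" stands for a space, "=XX" is a hex-encoded byte.
            binary = text.replace(/_/g, ' ').replace(/=([0-9A-Fa-f]{2})/g, function (m, hex) {
                return String.fromCharCode(parseInt(hex, 16));
            });
        }
        // Turn the byte string into real bytes, then decode with the declared charset.
        var bytes = Uint8Array.from(binary, function (ch) { return ch.charCodeAt(0); });
        return new TextDecoder(charset).decode(bytes);
    });
}

// Return the decoded headers that contain the search term (case-insensitive).
function searchHeaders(headers, term) {
    var needle = term.toLowerCase();
    return headers.map(decodeEncodedWords).filter(function (value) {
        return value.toLowerCase().includes(needle);
    });
}

console.log(searchHeaders(['=?UTF-8?Q?h=C3=A9ll=C3=B6?=', 'plain subject'], 'héllö'));
```

Caching the decoded values, as suggested above, would avoid repeating the decode step on every search.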

Saturday, May 29, 2021
 
Arman
answered 5 Months ago
26

You can write a function that builds a regular expression from your txt_search, replacing each vowel (plain or accented) with a character class that matches all of its variants, like this:

function search_term($txt_search) {
    // Escape regex metacharacters, including the "/" delimiter.
    $search = preg_quote($txt_search, '/');

    // Replace each vowel, plain or accented, with a class of all its variants.
    $search = preg_replace('/[aàáâãåäæ]/iu', '[aàáâãåäæ]', $search);
    $search = preg_replace('/[eèéêë]/iu', '[eèéêë]', $search);
    $search = preg_replace('/[iìíîï]/iu', '[iìíîï]', $search);
    $search = preg_replace('/[oòóôõöø]/iu', '[oòóôõöø]', $search);
    $search = preg_replace('/[uùúûü]/iu', '[uùúûü]', $search);
    // add any other character

    return $search;
}

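Then use the result as the pattern in your preg_replace call, wrapped in delimiters and with the u flag, for example '/' . search_term($txt_search) . '/iu'.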
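The same idea carries over to client-side code. The following is a rough JavaScript sketch of the technique, not a port of any library function: searchTerm is a hypothetical name, the vowel groups are copied from the PHP version, and input is assumed to use precomposed characters (é as a single code point).

```javascript
// Sketch: build an accent-insensitive regex source string from a search term.
function searchTerm(txtSearch) {
    // Escape regex metacharacters (JavaScript has no built-in preg_quote()).
    var search = txtSearch.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

    // Each group lists a base vowel together with its accented variants.
    var groups = ['aàáâãåäæ', 'eèéêë', 'iìíîï', 'oòóôõöø', 'uùúûü'];
    groups.forEach(function (group) {
        // Replace any member of the group with a class matching the whole group.
        search = search.replace(new RegExp('[' + group + ']', 'giu'), '[' + group + ']');
    });
    return search;
}

var pattern = new RegExp(searchTerm('hello'), 'iu');
console.log(pattern.test('héllö')); // true
console.log(pattern.test('hallo')); // false
```

Because the groups do not overlap, applying the replacements one after another never rewrites a class inserted by an earlier pass.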

Thursday, August 5, 2021
 
bsd
answered 3 Months ago