Asked  7 Months ago    Answers:  5   Viewed   60 times

I've got this very simple thing that just outputs some stuff in CSV format, but it's got to be UTF-8. I open this file in TextEdit or TextMate or Dreamweaver and it displays UTF-8 characters properly, but if I open it in Excel it's doing this silly íÄ kind of thing instead. Here's what I've got at the head of my document:

header("content-type:application/csv;charset=UTF-8");
header("Content-Disposition:attachment;filename=\"CHS.csv\"");

This all seems to have the desired effect, except that Excel (Mac, 2008) doesn't want to import it properly. There are no options in Excel for me to "open as UTF-8" or anything, so … I'm getting a little annoyed.

I can't seem to find any clear solutions to this anywhere, despite a lot of people having the same problem. The thing I see the most is to include the BOM, but I can't exactly figure out how to do that. As you can see above, I'm just echoing this data, I'm not writing any file. I can do that if I need to, I'm just not because there doesn't seem to be a need for it at this point. Any help?

Update: I tried echoing the BOM as echo pack("CCC", 0xef, 0xbb, 0xbf); which I just pulled from a site that was trying to detect the BOM. But Excel just appends those three characters to the very first cell when it imports, and still messes up the special characters.

 Answers

97

To quote a Microsoft support engineer,

Excel for Mac does not currently support UTF-8

Update, 2017: This is true of all versions of Microsoft Excel for Mac before Office 2016. Newer versions (from Office 365) do now support UTF-8.

To output UTF-8 content that Excel on both Windows and OS X can read successfully, you will need to do two things:

  1. Convert your UTF-8 CSV text to UTF-16LE:

    $csv = mb_convert_encoding($csv, 'UTF-16LE', 'UTF-8');
    
  2. Prepend the UTF-16LE byte order mark to the start of the file:

    $csv = chr(255) . chr(254) . $csv;
    

The next problem appears only with Excel on OS X (not Windows): when you open a CSV file with comma-separated values, Excel does not split the rows into columns. Each row ends up in a single cell, commas and all.

The way to avoid this is to use tabs as the separator.

I used this function from the PHP comments (using tabs "\t" instead of commas) and it worked perfectly in Excel on both OS X and Windows.

Note that to fix an issue with an empty column at the end of each row, I had to change the line of code that says:

    $field_cnt = count($fields);

to

    $field_cnt = count($fields)-1;
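
The whole recipe (tab-separated values, converted to UTF-16LE, with the byte order mark in front) can be sketched in a few lines. This is a minimal illustration in Python rather than PHP, not the function referred to above; the name `rows_to_excel_bytes` and the sample rows are made up for the example.

```python
import csv
import io

# Same two bytes as chr(255) . chr(254) in the PHP snippet above.
UTF16LE_BOM = b"\xff\xfe"


def rows_to_excel_bytes(rows):
    """Serialize rows as tab-separated text, encoded as UTF-16LE with a BOM."""
    buffer = io.StringIO(newline="")
    # Tabs instead of commas, as recommended above for Excel on OS X.
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\r\n")
    writer.writerows(rows)
    return UTF16LE_BOM + buffer.getvalue().encode("utf-16-le")


# Hypothetical sample data with non-ASCII characters.
payload = rows_to_excel_bytes([["name", "city"], ["José", "Zürich"]])
```

The resulting bytes are what you would send as the response body or write to the file in binary mode.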

As some of the other comments on this page say, other spreadsheet apps such as OpenOffice Calc, Apple's own Numbers and Google Docs spreadsheets have no issues with UTF-8 files with commas.

See the table in this question for what works and doesn't work for Unicode CSV files in Excel


As a side note, if you are using Composer, you should have a look at adding League\Csv to your requires. League\Csv has a really nice API for building CSV files.

To use League\Csv with this method of creating CSV files, check out this example

Tuesday, June 1, 2021
 
michele
answered 7 Months ago
69

Well, I assume it's because the raw binary data includes the BOM. You could always remove the BOM yourself after decoding if you don't want it, but you should consider whether the byte array should include the BOM to start with.

EDIT: Alternatively, you could use a StreamReader to perform the decoding. Here's an example, showing the same byte array being converted into two characters using Encoding.GetString or one character via a StreamReader:

using System;
using System.IO;
using System.Text;

class Test
{
    static void Main()
    {
        byte[] withBom = { 0xef, 0xbb, 0xbf, 0x41 };
        string viaEncoding = Encoding.UTF8.GetString(withBom);
        Console.WriteLine(viaEncoding.Length); // 2: the BOM is kept as U+FEFF

        string viaStreamReader;
        using (StreamReader reader = new StreamReader
               (new MemoryStream(withBom), Encoding.UTF8))
        {
            viaStreamReader = reader.ReadToEnd();           
        }
        Console.WriteLine(viaStreamReader.Length); // 1: StreamReader skips the BOM
    }
}
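
The same distinction exists outside .NET. As a rough Python analogue (not part of the original answer), the plain `utf-8` codec keeps the BOM as a character, while the `utf-8-sig` codec consumes it:

```python
import codecs

# The same four bytes as in the C# example: EF BB BF 41.
with_bom = codecs.BOM_UTF8 + b"A"

plain = with_bom.decode("utf-8")         # BOM survives as U+FEFF
stripped = with_bom.decode("utf-8-sig")  # BOM is consumed by the codec

print(len(plain))     # 2
print(len(stripped))  # 1
```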
Wednesday, June 9, 2021
 
Domiik
answered 7 Months ago
76

You've got a Unicode UTF-8 BOM at the start of the file:

http://en.wikipedia.org/wiki/Byte_order_mark

A text editor or web browser interpreting the text as ISO-8859-1 or CP1252 will display the characters ï»¿ for this.

R is giving you the ï and then converting the other two into dots as they are non-alphanumeric characters.
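
To see where those three characters come from, you can decode the BOM bytes with the wrong codec yourself. A small Python sketch, purely for illustration:

```python
import codecs

bom = codecs.BOM_UTF8  # the three bytes EF BB BF

# Decoding the UTF-8 BOM as a single-byte encoding yields one character per byte.
as_cp1252 = bom.decode("cp1252")
as_latin1 = bom.decode("latin-1")

print(as_cp1252)
```

Both codecs map the three bytes to the same three characters, which is why the junk looks identical in most tools.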

Here:

http://r.789695.n4.nabble.com/Writing-Unicode-Text-into-Text-File-from-R-in-Windows-td4684693.html

Duncan Murdoch suggests:

You can declare a file to be in encoding "UTF-8-BOM" if you want to ignore a BOM on input

So try your read.csv with fileEncoding="UTF-8-BOM" or persuade your SQL wotsit to not output a BOM.

Otherwise you may as well test if the first name starts with ï.. and strip it with substr (as long as you know you'll never have a column that does start like that genuinely...)
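
For comparison, here is what the polluted first column name and the fix look like in Python, where the `utf-8-sig` codec plays roughly the role that "UTF-8-BOM" plays in R. The sample data and variable names are invented for this sketch.

```python
import codecs
import csv
import io

# Hypothetical export: a UTF-8 BOM followed by a tiny CSV file.
raw = codecs.BOM_UTF8 + "name,city\nJosé,Zürich\n".encode("utf-8")

# Naive decode: the BOM sticks to the first header name.
naive_header = next(csv.reader(io.StringIO(raw.decode("utf-8"))))

# BOM-aware decode: the header comes out clean.
clean_header = next(csv.reader(io.StringIO(raw.decode("utf-8-sig"))))

print(naive_header, clean_header)
```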

Friday, June 11, 2021
 
SheppardDigital
answered 6 Months ago
82

This isn't really the job of phpMyAdmin, a GUI for MySQL beginners.

Put the query in a script, in a loop that runs 1,000,000 times.

Though that's not a very good benchmark of anything. If you're trying to simulate real demand, you need to have some concurrent activity, not just 1,000,000 queries issued and returned one at a time.
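
A rough sketch of both approaches, one query at a time versus several workers at once, might look like the following. It is written in Python and uses an in-memory SQLite connection with a placeholder `SELECT 1` as a stand-in for your real MySQL connection and query; the function names and the worker count are assumptions for the example.

```python
import sqlite3
import time
from concurrent.futures import ThreadPoolExecutor


def run_queries(count):
    """Open a private connection and run a placeholder query `count` times."""
    conn = sqlite3.connect(":memory:")  # stand-in for a real MySQL connection
    try:
        for _ in range(count):
            conn.execute("SELECT 1").fetchone()
    finally:
        conn.close()
    return count


def timed(fn, *args):
    """Return fn's result together with the elapsed wall-clock seconds."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def run_concurrently(total, workers):
    """Split the queries evenly across worker threads, one connection each."""
    per_worker = total // workers
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(run_queries, [per_worker] * workers))


TOTAL = 10_000  # the answer suggests 1,000,000; kept small for the sketch
sequential_done, sequential_secs = timed(run_queries, TOTAL)
concurrent_done, concurrent_secs = timed(run_concurrently, TOTAL, 4)
print(f"sequential: {sequential_done} queries in {sequential_secs:.3f}s")
print(f"concurrent: {concurrent_done} queries in {concurrent_secs:.3f}s")
```

The numbers only mean something once the stand-in is replaced with the real query against the real server.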

Saturday, August 28, 2021
 
Giorgi Peikrishvili
answered 4 Months ago
72

You can use struct.pack for this (shown here for Python 3):

>>> import struct
>>> a = [67, 97, 102, -61, -87, 32, 70, 108, 111, 114, 97]
>>> struct.pack("b" * len(a), *a)
b'Caf\xc3\xa9 Flora'
>>> print(struct.pack("b" * len(a), *a).decode('utf8'))
Café Flora
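
An alternative that avoids struct, offered here as a sketch rather than as part of the original answer, is to mask each signed value into the 0..255 range and build the bytes directly (Python 3):

```python
signed = [67, 97, 102, -61, -87, 32, 70, 108, 111, 114, 97]

# value & 0xFF maps a signed byte (-128..127) to its unsigned form (0..255).
raw = bytes(value & 0xFF for value in signed)
text = raw.decode("utf-8")

print(text)
```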
Sunday, November 14, 2021
 
Luis Masuelli
answered 4 Weeks ago