Asked  7 Months ago    Answers:  5   Viewed   53 times

How can I load a CSV file into a System.Data.DataTable, creating the datatable based on the CSV file?

Does the regular ADO.net functionality allow this?

 Answers

46

Here's an excellent class that will copy CSV data into a datatable using the structure of the data to create the DataTable:

A portable and efficient generic parser for flat files

It's easy to configure and easy to use. I urge you to take a look.

Tuesday, June 1, 2021
 
laurent
answered 7 Months ago
44

When testing for performance, I would include a test case for:

std::vector<BYTE> readFile(const char* filename)
{
    // open the file:
    std::ifstream file(filename, std::ios::binary);

    // Stop eating new lines in binary mode!!!
    file.unsetf(std::ios::skipws);

    // get its size:
    std::streampos fileSize;

    file.seekg(0, std::ios::end);
    fileSize = file.tellg();
    file.seekg(0, std::ios::beg);

    // reserve capacity
    std::vector<BYTE> vec;
    vec.reserve(fileSize);

    // read the data:
    vec.insert(vec.begin(),
               std::istream_iterator<BYTE>(file),
               std::istream_iterator<BYTE>());

    return vec;
}

My thinking is that the constructor of Method 1 touches the elements in the vector, and then the read touches each element again.

Method 2 and Method 3 look most promising, but could suffer one or more resize's. Hence the reason to reserve before reading or inserting.

I would also test with std::copy:

...
std::vector<byte> vec;
vec.reserve(fileSize);

std::copy(std::istream_iterator<BYTE>(file),
          std::istream_iterator<BYTE>(),
          std::back_inserter(vec));

In the end, I think the best solution will avoid operator >> from istream_iterator (and all the overhead and goodness from operator >> trying to interpret binary data). But I don't know what to use that allows you to directly copy the data into the vector.

Finally, my testing with binary data is showing ios::binary is not being honored. Hence the reason for noskipws from <iomanip>.

Thursday, June 3, 2021
 
francadaval
answered 7 Months ago
27

As it says in the documentation,

In order to make a for loop the most efficient way of looping over the lines of a file (a very common operation), the next() method uses a hidden read-ahead buffer.

And you can see by looking at the implementation of the csv module (line 784) that csv.reader calls the next() method of the underlyling iterator (via PyIter_Next).

So if you really want unbuffered reading of CSV files, you need to convert the file object (here sys.stdin) into an iterator whose next() method actually calls readline() instead. This can easily be done using the two-argument form of the iter function. So change the code in test_reader.py to something like this:

for row in csv.reader(iter(sys.stdin.readline, '')):
    print("Read: ({}) {!r}".format(time.time(), row))

For example,

$ python test_writer.py | python test_reader.py
Read: (1388776652.964925) ['R0', '$']
Read: (1388776653.466134) ['R1', '$$']
Read: (1388776653.967327) ['R2', '$$$']
Read: (1388776654.468532) ['R3', '$$$$']
[etc]

Can you explain why you need unbuffered reading of CSV files? There might be a better solution to whatever it is you are trying to do.

Sunday, August 1, 2021
 
jasu
answered 5 Months ago
56

I was able to get this to work by adding a DataTable row and filling it in explicitly, instead of trying to add a CsvHelper record as a row.

I used the following part instead of the similar part that is shown above:

foreach (var record in records)
{
    DataRow row = dt.NewRow();
    record.CompanyId = company.Id;
    row["Date"] = record.Date;
    row["Close"] = record.Close;
    row["AdjClose"] = record.AdjClose;
    row["High"] = record.High;
    row["Low"] = record.Low;
    row["Open"] = record.Open;
    row["Volume"] = record.Volume;
    row["CompanyId"] = record.CompanyId;
    dt.Rows.Add(row);
}

If you can solve the issue without so much hard coding, I will accept your answer as the answer.

Thursday, August 12, 2021
 
MikeMcClatchie
answered 4 Months ago
23

I don't know if that what are you looking for :

string s = "Id,Name ,Deptrn1,Mike,ITrn2,Joe,HRrn3,Peter,ITrn";
        DataTable dt = new DataTable();

        string[] tableData = s.Split("rn".ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
        var col = from cl in tableData[0].Split(",".ToCharArray())
                  select new DataColumn(cl);
        dt.Columns.AddRange(col.ToArray());

        (from st in tableData.Skip(1)
         select dt.Rows.Add(st.Split(",".ToCharArray()))).ToList();
Tuesday, November 9, 2021
 
talkhabi
answered 4 Weeks ago
Only authorized users can answer the question. Please sign in first, or register a free account.
Not the answer you're looking for? Browse other questions tagged :
 
Share