Asked  6 Months ago    Answers:  5   Viewed   37 times

In order to test some functionality I would like to create a DataFrame from a string. Let's say my test data looks like:

TESTDATA="""col1;col2;col3
1;4.4;99
2;4.5;200
3;4.7;65
4;3.2;140
"""

What is the simplest way to read that data into a Pandas DataFrame?

 Answers

70

A simple way to do this is to wrap the string in StringIO.StringIO (Python 2) or io.StringIO (Python 3) and pass the resulting file-like object to pandas.read_csv. For example:

import sys
if sys.version_info[0] < 3: 
    from StringIO import StringIO
else:
    from io import StringIO

import pandas as pd

# Keep the rows flush left: leading spaces would become part of the first field
TESTDATA = StringIO("""col1;col2;col3
1;4.4;99
2;4.5;200
3;4.7;65
4;3.2;140
""")

df = pd.read_csv(TESTDATA, sep=";")
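If only Python 3 needs to be supported, the version check can be dropped. A minimal sketch, assuming the test string from the question is kept as a plain str and wrapped only at call time:

```python
from io import StringIO

import pandas as pd

# Test data exactly as given in the question
TESTDATA = """col1;col2;col3
1;4.4;99
2;4.5;200
3;4.7;65
4;3.2;140
"""

# StringIO makes the string look like an open file to read_csv
df = pd.read_csv(StringIO(TESTDATA), sep=";")

print(df.dtypes)
print(df.shape)
```

Keeping TESTDATA as a str means it can be wrapped again for each test; a StringIO object is exhausted once it has been read.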
Tuesday, June 1, 2021
 
mdevils
answered 6 Months ago
92
import pandas as pd

df = pd.read_csv('filex.csv')
df['A'] = df['A'].astype('str')
df['B'] = df['B'].astype('str')
mask = (df['A'].str.len() == 10) & (df['B'].str.len() == 10)
df = df.loc[mask]
print(df)

Applied to filex.csv:

A,B
123,abc
1234,abcd
1234567890,abcdefghij

the code above prints

            A           B
2  1234567890  abcdefghij
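The same filter can be tried without a file on disk. This is a sketch under two assumptions: the CSV content is supplied as an in-memory string instead of filex.csv, and the columns are read as strings directly with dtype=str rather than converted afterwards:

```python
from io import StringIO

import pandas as pd

# In-memory stand-in for filex.csv (same rows as shown above)
csv_text = """A,B
123,abc
1234,abcd
1234567890,abcdefghij
"""

# dtype=str reads every column as text, so no astype call is needed
df = pd.read_csv(StringIO(csv_text), dtype=str)

# Keep only rows where both fields are exactly 10 characters long
mask = (df["A"].str.len() == 10) & (df["B"].str.len() == 10)
filtered = df.loc[mask]
print(filtered)
```

Reading with dtype=str also preserves leading zeros, which would be lost if the column were parsed as an integer first and converted to a string later.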
Wednesday, June 30, 2021
 
peixotorms
answered 5 Months ago
25

You can split the string manually:

>>> df['Tags'] = df.Tags.apply(lambda x: x[1:-1].split(','))
>>> df.Tags[0]
['Tag1', 'Tag2']
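A vectorised variant of the same idea uses the .str accessor instead of apply. The sample frame below is hypothetical; it assumes each Tags value is a string such as "[Tag1,Tag2]" with no spaces after the commas:

```python
import pandas as pd

# Hypothetical data: tags stored as bracketed, comma-separated strings
df = pd.DataFrame({"Tags": ["[Tag1,Tag2]", "[Tag3]"]})

# Strip the surrounding brackets, then split on commas
df["Tags"] = df["Tags"].str.strip("[]").str.split(",")

print(df["Tags"][0])
```

Unlike the slice x[1:-1], str.strip("[]") removes only bracket characters, so a value that happens to lack brackets is left intact.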
Wednesday, July 21, 2021
 
Alix
answered 5 Months ago
42

Your current iteration overwrites x twice every time it runs: the for loop assigns a customer name to x, and then you assign a dataframe to it.

To be able to call each dataframe later by name, try storing them in a dictionary:

df_dict = {name: df.loc[df['customer name'] == name] for name in customerNames}

df_dict['Name3']
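To see the dictionary approach end to end, here is a small sketch. The frame, the amount column and the customer names are made up for illustration; the groupby variant is an alternative, not something the original question used:

```python
import pandas as pd

# Hypothetical data with a 'customer name' column
df = pd.DataFrame({
    "customer name": ["Name1", "Name3", "Name3", "Name2"],
    "amount": [10, 20, 30, 40],
})
customerNames = df["customer name"].unique()

# One sub-frame per customer, keyed by name
df_dict = {name: df.loc[df["customer name"] == name] for name in customerNames}

# Alternative: let groupby do the splitting in a single pass
grouped = {name: group for name, group in df.groupby("customer name")}

print(df_dict["Name3"])
```

With many customers the groupby form scans the frame once rather than once per name.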
Thursday, August 12, 2021
 
krs8785
answered 4 Months ago
27

The zip_longest function from itertools does this:

>>> import itertools, pandas
>>> pandas.DataFrame((_ for _ in itertools.zip_longest(*nest)), columns=['aa', 'bb', 'cc'])
    aa    bb    cc
0  aa1   bb1   cc1
1  aa2   bb2   cc2
2  aa3   bb3   cc3
3  aa4   bb4  None
4  aa5  None  None

On older versions of pandas you may need to wrap the zip_longest call in list(...) so that the constructor receives a list rather than an iterator. On Python 2 the function is named izip_longest instead of zip_longest.
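For a self-contained run, the sketch below assumes nest is a list of three lists of unequal length, chosen to match the output shown above; the real nest from the question is not given here:

```python
from itertools import zip_longest

import pandas as pd

# Hypothetical input: three "columns" of different lengths
nest = [
    ["aa1", "aa2", "aa3", "aa4", "aa5"],
    ["bb1", "bb2", "bb3", "bb4"],
    ["cc1", "cc2", "cc3"],
]

# zip_longest pads the shorter lists with None, yielding one tuple per row
rows = list(zip_longest(*nest))
df = pd.DataFrame(rows, columns=["aa", "bb", "cc"])
print(df)
```

zip_longest also accepts a fillvalue argument if something other than None should fill the gaps.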

Monday, August 23, 2021
 
Daniel H
answered 3 Months ago