Asked  6 Months ago    Answers:  5   Viewed   40 times

How can I convert a DataFrame column of strings (in dd/mm/yyyy format) to datetimes?

 Answers

87

The easiest way is to use to_datetime:

df['col'] = pd.to_datetime(df['col'])

It also offers a dayfirst argument for European-style (day-first) dates, but beware that this isn't strict: a value that cannot be day-first may still be parsed as month-first.

Here it is in action:

In [11]: pd.to_datetime(pd.Series(['05/23/2005']))
Out[11]:
0   2005-05-23 00:00:00
dtype: datetime64[ns]

You can pass a specific format:

In [12]: pd.to_datetime(pd.Series(['05/23/2005']), format="%m/%d/%Y")
Out[12]:
0   2005-05-23
dtype: datetime64[ns]
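
For the dd/mm/yyyy strings in the question, an explicit day-first format is probably the safest route. The following is only a minimal sketch: the column name `col` and the sample values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data in dd/mm/yyyy format (not from the question)
df = pd.DataFrame({'col': ['23/05/2005', '01/02/1988']})

# With an explicit format, every string must match it exactly
df['col'] = pd.to_datetime(df['col'], format='%d/%m/%Y')

print(df['col'].dt.month.tolist())  # [5, 2]
```

Unlike dayfirst=True, an explicit format raises an error on a string that does not match, rather than quietly reinterpreting it.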
Tuesday, June 1, 2021
 
VieStar
answered 6 Months ago
29

Use ggplot and aes_string. Something like this:

ggplot(data = df, aes_string(x = colname)) + geom_histogram()

aes_string was written precisely for this purpose.

Tuesday, August 3, 2021
 
leetwinski
answered 4 Months ago
92

It means you have an extra space. Though pd.to_datetime is very good at parsing dates normally without any format specified, when you actually specify a format, it has to match EXACTLY.

You can likely solve your issue by adding .str.strip() to remove the extra whitespace before converting.

import pandas as pd
df['Time stamp'] = pd.to_datetime(df['Time stamp'].str.strip(), format='%d/%m/%Y')

Alternatively, you can take advantage of its ability to parse various date formats by using the dayfirst=True argument:

df['Time stamp'] = pd.to_datetime(df['Time stamp'], dayfirst=True)

Example:

import pandas as pd
df = pd.DataFrame({'Time stamp': ['01/02/1988', '01/02/1988 ']})

pd.to_datetime(df['Time stamp'], format= '%d/%m/%Y')

ValueError: unconverted data remains:

pd.to_datetime(df['Time stamp'].str.strip(), format='%d/%m/%Y')
#0   1988-02-01
#1   1988-02-01
#Name: Time stamp, dtype: datetime64[ns]

pd.to_datetime(df['Time stamp'], dayfirst=True)
#0   1988-02-01
#1   1988-02-01
#Name: Time stamp, dtype: datetime64[ns]
Monday, August 23, 2021
 
xosp7tom
answered 4 Months ago
82

Use:

df['date'] = pd.to_datetime(df['date'].str[:-2] + '19' + df['date'].str[-2:])

Another solution with replace:

df['date'] = pd.to_datetime(df['date'].str.replace(r'-(\d+)$', r'-19\1', regex=True))

Sample:

print (df)
       date
0  01-06-70
1  01-06-69
2  01-06-68
3  01-06-67

df['date'] = pd.to_datetime(df['date'].str.replace(r'-(\d+)$', r'-19\1', regex=True))
print (df)
        date
0 1970-01-06
1 1969-01-06
2 1968-01-06
3 1967-01-06
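
A self-contained version of the replace approach might look like the sketch below. It assumes the strings are month-day-year, as the sample output suggests, and that every year belongs to the 1900s. The explicit format argument is an addition, not part of the original answer.

```python
import pandas as pd

# Sample data from above: two-digit years that should all fall in the 1900s
df = pd.DataFrame({'date': ['01-06-70', '01-06-69', '01-06-68', '01-06-67']})

# Insert '19' in front of the trailing two-digit year
fixed = df['date'].str.replace(r'-(\d+)$', r'-19\1', regex=True)

# Parse the now four-digit years with an explicit month-day-year format
df['date'] = pd.to_datetime(fixed, format='%m-%d-%Y')

print(df['date'].dt.year.tolist())  # [1970, 1969, 1968, 1967]
```

Expanding the year before parsing avoids leaving the choice of century to the parser.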
Sunday, August 29, 2021
 
juherr
answered 3 Months ago
39

You need to remove the times first, using one of these solutions:

df = df[df.index.normalize().isin(['2016-04-25', '2016-04-26'])]

df = df[df.index.floor('D').isin(['2016-04-25', '2016-04-26'])]

Another solution is to compare DatetimeIndex.date, but then you need numpy.in1d instead of isin, because .date returns a plain NumPy array of date objects:

df = df[np.in1d(df.index.date, pd.to_datetime(['2016-04-25', '2016-04-26']).date)]

Or compare the strings created by DatetimeIndex.strftime:

df = df[np.in1d(df.index.strftime('%Y-%m-%d'), ['2016-04-25', '2016-04-26'])]

print (df)
                              A           B
2016-04-25 18:50:06  440.967796  201.049600
2016-04-25 18:50:13  441.054995  200.767034
2016-04-25 18:50:20  441.142337  200.484475
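
For reference, the normalize approach can be tried on a small made-up frame. This is only a sketch: the index values and column `A` below are hypothetical and are not the data printed above.

```python
import pandas as pd

# Hypothetical frame with a DatetimeIndex that includes times of day
idx = pd.to_datetime(['2016-04-25 18:50:06',
                      '2016-04-26 09:00:00',
                      '2016-04-27 12:30:00'])
df = pd.DataFrame({'A': [1.0, 2.0, 3.0]}, index=idx)

# normalize() resets each timestamp to midnight, so whole dates can be compared
wanted = pd.to_datetime(['2016-04-25', '2016-04-26'])
mask = df.index.normalize().isin(wanted)

filtered = df[mask]
print(filtered)
```

The original timestamps are kept in the result. Only the mask is computed on the normalized index.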
Friday, October 15, 2021
 
Nicolas Le Thierry d'Ennequin
answered 2 Months ago