Asked  7 Months ago    Answers:  5   Viewed   28 times

In both of the cases below:

import pandas

d = {'col1': 2, 'col2': 2.5}
df = pandas.DataFrame(data=d, index=[0])

print(df['col2'])
print(df.col2)

Both methods can be used to index on a column and yield the same result, so is there any difference between them?

 Answers

63

The "dot notation", i.e. df.col2 is the attribute access that's exposed as a convenience.

You may access an index on a Series, a column on a DataFrame, and (in older pandas versions that still had it) an item on a Panel directly as an attribute.

df['col2'] does the same: it returns a pd.Series of the column.

A few caveats about attribute access:

  • you cannot add a column this way: df.new_col = x does not create a column. Worse, it can silently create a new attribute on the DataFrame object instead (think monkey-patching here)
  • it won't work if you have spaces in the column name or if the column name is an integer.
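The caveats above can be checked with a small sketch. The column names "my col", new_col and real_col are made up for illustration and are not part of the question; the warning filter is there because some pandas versions warn on attribute assignment.

```python
import warnings

import pandas as pd

# Hypothetical frame: two identifier-like names and one name with a space.
df = pd.DataFrame({"col1": [2], "col2": [2.5], "my col": [7]})

# Reading an existing column: both spellings give an equal Series.
same = df["col2"].equals(df.col2)

# Assigning to a name that is not a column sets a plain Python attribute.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # some pandas versions warn here
    df.new_col = 1
has_new_col = "new_col" in df.columns    # no column was created

# Bracket assignment does create the column.
df["real_col"] = 1
has_real_col = "real_col" in df.columns

# A name containing a space is only reachable with brackets.
spaced = df["my col"]
```

Here same and has_real_col are True while has_new_col is False, which is the reason bracket notation is the safer default for anything other than quick reads.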
Tuesday, June 1, 2021
 
medhybrid
answered 7 Months ago
96

You can use boolean indexing:

import pandas as pd

df = pd.DataFrame({'A':[3,4,7,6,1], 'Sales':[10,20,30,40,50]})
print (df)
   A  Sales
0  3     10
1  4     20
2  7     30
3  6     40
4  1     50

s = 30

df1 = df[df['Sales'] >= s]
print (df1)
   A  Sales
2  7     30
3  6     40
4  1     50

df2 = df[df['Sales'] < s]
print (df2)
   A  Sales
0  3     10
1  4     20

It's also possible to invert mask by ~:

mask = df['Sales'] >= s
df1 = df[mask]
df2 = df[~mask]
print (df1)
   A  Sales
2  7     30
3  6     40
4  1     50

print (df2)
   A  Sales
0  3     10
1  4     20

print (mask)
0    False
1    False
2     True
3     True
4     True
Name: Sales, dtype: bool

print (~mask)
0     True
1     True
2    False
3    False
4    False
Name: Sales, dtype: bool
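
The mask and its inverse can be wrapped in a small helper when both halves are needed together. This is only a sketch: split_by_threshold is a hypothetical name, not a pandas function.

```python
import pandas as pd


def split_by_threshold(frame, column, threshold):
    """Return (rows where column >= threshold, all remaining rows)."""
    mask = frame[column] >= threshold
    # ~mask flips every boolean, so the two parts never overlap.
    return frame[mask], frame[~mask]


df = pd.DataFrame({"A": [3, 4, 7, 6, 1], "Sales": [10, 20, 30, 40, 50]})
high, low = split_by_threshold(df, "Sales", 30)
```

Because a comparison with a missing value evaluates to False, rows with NaN in the chosen column end up in the second part.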
Friday, June 4, 2021
 
PandemoniumSyndicate
answered 7 Months ago
89

In the following situations, [] and .loc behave the same:

  1. Selecting a single column (df['A'] is the same as df.loc[:, 'A'] -> selects column A)
  2. Selecting a list of columns (df[['A', 'B', 'C']] is the same as df.loc[:, ['A', 'B', 'C']] -> selects columns A, B and C)
  3. Slicing by rows (df[1:3] is the same as df.iloc[1:3] -> selects rows 1 and 2. Note, however, that if you slice rows with loc instead of iloc, you'll get rows 1, 2 and 3, assuming you have a RangeIndex.)
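
The three equivalences listed above can be verified on a toy frame. The data is invented for illustration, and a default RangeIndex is assumed.

```python
import pandas as pd

# Hypothetical data with the default RangeIndex 0..3.
df = pd.DataFrame(
    {"A": [1, 2, 3, 4], "B": [5, 6, 7, 8], "C": [9, 10, 11, 12]}
)

one_col_same = df["A"].equals(df.loc[:, "A"])
many_cols_same = df[["A", "B"]].equals(df.loc[:, ["A", "B"]])
rows_same = df[1:3].equals(df.iloc[1:3])

# Positional slicing excludes the end point; label slicing includes it.
positional = df.iloc[1:3].index.tolist()  # [1, 2]
by_label = df.loc[1:3].index.tolist()     # [1, 2, 3]
```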

However, [] does not work in the following situations:

  1. You can select a single row with df.loc[row_label]
  2. You can select a list of rows with df.loc[[row_label1, row_label2]]
  3. You can slice columns with df.loc[:, 'A':'C']

These three cannot be done with []. More importantly, if your selection involves both rows and columns, then assignment becomes problematic.

df[1:3]['A'] = 5

This selects rows 1 and 2, then selects column 'A' of the returned object and assigns the value 5 to it. The problem is that the returned object might be a copy, so this may not change the actual DataFrame. pandas raises SettingWithCopyWarning in this case. The correct way of making this assignment is:

df.loc[1:2, 'A'] = 5

(.loc slices by label and includes the end point, so with a default RangeIndex 1:2 covers the same rows 1 and 2 as df[1:3].) With .loc, you are guaranteed to modify the original DataFrame. It also allows you to slice columns (df.loc[:, 'C':'F']), select a single row (df.loc[5]), and select a list of rows (df.loc[[1, 2, 5]]).
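
As a rough illustration of these .loc forms, the sketch below uses a made-up frame with string row labels, so that labels cannot be confused with positions.

```python
import pandas as pd

# Hypothetical frame; the row labels w..z are chosen for illustration only.
df = pd.DataFrame(
    {"A": [0, 0, 0, 0], "B": [1, 1, 1, 1], "C": [2, 2, 2, 2]},
    index=["w", "x", "y", "z"],
)

# Rows and column are given in one indexing call, so the assignment
# is applied to df itself rather than to an intermediate object.
df.loc[["x", "y"], "A"] = 5

single_row = df.loc["x"]         # one row, returned as a Series
row_list = df.loc[["w", "z"]]    # several rows, returned as a DataFrame
col_slice = df.loc[:, "A":"B"]   # label slice over columns, end included
```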

Also note that these two were not included in the API at the same time. .loc was added much later as a more powerful and explicit indexer. See unutbu's answer for more detail.


Note: Getting columns with [] vs . is a completely different topic. The . form is only there for convenience. It only allows accessing columns whose names are valid Python identifiers (i.e. they cannot contain spaces, cannot start with a digit, and so on). It cannot be used when the names conflict with Series/DataFrame methods or attributes. It also cannot be used to create columns (i.e. the assignment df.a = 1 won't add a column if there is no column a). Other than that, . and [] are the same.
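
The name-conflict case is easy to miss, so here is a minimal sketch. The column names count, size and 0 are chosen on purpose to collide with DataFrame members or to be non-identifiers; they are not from the original question.

```python
import pandas as pd

# Hypothetical columns: two clash with DataFrame members, one is an integer.
df = pd.DataFrame({"count": [1, 2, 3], "size": [4, 5, 6], 0: [7, 8, 9]})

# Dot access resolves to the DataFrame member, not the column.
attr_is_method = callable(df.count)  # the DataFrame.count method
size_attr = df.size                  # number of cells in the frame

# Brackets always reach the column, whatever its name.
count_col = df["count"]
size_col = df["size"]
int_col = df[0]
```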

Friday, June 11, 2021
 
jsuissa
answered 6 Months ago
55

std::pair provides pre-written constructors and comparison operators. This also allows them to be stored in containers like std::map without you needing to write, for example, the copy constructor or strict weak ordering via operator < (such as required by std::map). If you don't write them you can't make a mistake (remember how strict weak ordering works?) so it's more reliable just to use std::pair.

Thursday, June 24, 2021
 
Jauco
answered 6 Months ago
39

BrenBarn's answer works.

The following, found via this thread, also worked. It is less a troubleshooting step than a plain statement of how to reset the index:

test = test.reset_index(drop=True)
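
For context, a small sketch of what that call does, assuming a hypothetical frame named test whose index has gaps after filtering:

```python
import pandas as pd

# Hypothetical data; filtering leaves the index as 1, 2, 3.
test = pd.DataFrame({"value": [10, 20, 30, 40]})
test = test[test["value"] > 15]
before = test.index.tolist()

# drop=True discards the old index instead of keeping it as a column.
test = test.reset_index(drop=True)
after = test.index.tolist()
```

Without drop=True, the old index would be kept as an extra column named index.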
Wednesday, July 28, 2021
 
Sufi
answered 5 Months ago