pandas

Misc. notes

value_counts() omits NaN values by default; pass dropna=False to include them.
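
A minimal sketch of the difference, on a made-up Series (the data and variable names are only for illustration):

```python
import numpy as np
import pandas as pd

# Made-up example data: three real values and two NaNs.
s = pd.Series([1.0, 2.0, np.nan, 1.0, np.nan])

# Default (dropna=True): NaN is left out of the counts.
default_counts = s.value_counts()

# dropna=False adds a row counting the NaN entries.
counts_with_nan = s.value_counts(dropna=False)

print(default_counts)
print(counts_with_nan)
```

The default counts sum to the number of non-missing values, so a total smaller than len(s) is a quick sign that NaNs were dropped.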

Bad pandas

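groupby.apply() omits the group index when the function returns each group with its index unchanged, i.e. when the indices of the rows being returned are identical to the indices of the input. (This is the behavior of pandas before 2.0; from 2.0 on, group_keys=True is respected in this case as well.)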

Take the following 10-row DataFrame:

In [1]: import pandas as pd
In [2]: import numpy as np

In [3]: F = pd.DataFrame({ "a" : np.r_[0:10], "b" : np.r_[0:10]//5 })

   a  b
0  0  0
1  1  0
2  2  0
3  3  0
4  4  0
5  5  1
6  6  1
7  7  1
8  8  1
9  9  1

If we group F by column b and take only the first two rows from each group, we get the group index b as the outer index, as expected:

In [4]: F.groupby("b").apply(lambda x : x.iloc[:2])
Out[4]:
     a  b
b
0 0  0  0
  1  1  0
1 5  5  1
  6  6  1

However, if we take all rows from each group, the outer group index is missing!

In [5]: F.groupby("b").apply(lambda x : x.iloc[:])
Out[5]:
   a  b
0  0  0
1  1  0
2  2  0
3  3  0
4  4  0
5  5  1
6  6  1
7  7  1
8  8  1
9  9  1

In [6]: F.groupby("b").apply(lambda x : x)
Out[6]:
   a  b
0  0  0
1  1  0
2  2  0
3  3  0
4  4  0
5  5  1
6  6  1
7  7  1
8  8  1
9  9  1

It seems that any function that returns each group with its index unchanged causes the outer group index to disappear, even if it changes the values:

In [7]: F.groupby("b").apply(lambda x : x + 1)
Out[7]:
    a  b
0   1  1
1   2  1
2   3  1
3   4  1
4   5  1
5   6  2
6   7  2
7   8  2
8   9  2
9  10  2
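
The pattern so far can be tested per group without looking at the shape of the apply() result. The sketch below is only an illustration of the rule as observed here, not pandas's internal check; preserves_group_index is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

F = pd.DataFrame({"a": np.r_[0:10], "b": np.r_[0:10] // 5})

def preserves_group_index(frame, key, func):
    """Hypothetical helper: True if func returns every group with its index unchanged."""
    return all(func(group).index.equals(group.index)
               for _, group in frame.groupby(key))

# The three functions used so far.
same_identity = preserves_group_index(F, "b", lambda x: x)
same_plus_one = preserves_group_index(F, "b", lambda x: x + 1)
same_first_two = preserves_group_index(F, "b", lambda x: x.iloc[:2])

print(same_identity, same_plus_one, same_first_two)
```

The two functions that lost the outer group index above leave each group's index untouched, while x.iloc[:2] does not.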

Only functions that change a group's index, for example by reordering its rows, cause the outer group index to appear:

In [8]: F.groupby("b").apply(lambda x : x.iloc[np.r_[0, 4, 1, 2, 3]])
Out[8]:
     a  b
b
0 0  0  0
  4  4  0
  1  1  0
  2  2  0
  3  3  0
1 5  5  1
  9  9  1
  6  6  1
  7  7  1
  8  8  1
  
In [9]: F.groupby("b").apply(lambda x : x.iloc[::-1])
Out[9]:
     a  b
b
0 4  4  0
  3  3  0
  2  2  0
  1  1  0
  0  0  0
1 9  9  1
  8  8  1
  7  7  1
  6  6  1
  5  5  1
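
If the shape of the result must not depend on what the function returns, one possible workaround is to make the choice explicit. The sketch below shows both directions under stated assumptions: apply_with_group_index is a hypothetical helper that builds the result with pd.concat so the group key is always the outer index level, and group_keys=False is the groupby() option that never adds it. Column a is selected in the second case only to keep the sketch independent of how a given pandas version treats the grouping column inside apply().

```python
import numpy as np
import pandas as pd

F = pd.DataFrame({"a": np.r_[0:10], "b": np.r_[0:10] // 5})

def apply_with_group_index(frame, key, func):
    """Hypothetical helper: apply func per group and always prepend the group key."""
    # Iterating over a groupby yields (group key, sub-frame) pairs.
    pieces = {k: func(group) for k, group in frame.groupby(key)}
    # The dict keys become the outer index level, named after the grouping column.
    return pd.concat(pieces, names=[key])

# Always has the outer index, even when every row is returned unchanged.
kept_all = apply_with_group_index(F, "b", lambda x: x)
first_two = apply_with_group_index(F, "b", lambda x: x.iloc[:2])

# The opposite choice: never add the group key, even for a reordering function.
flat = F.groupby("b", group_keys=False)[["a"]].apply(lambda x: x.iloc[::-1])

print(kept_all.index.nlevels, first_two.index.nlevels, flat.index.nlevels)
```

Either way, downstream code can rely on a fixed number of index levels instead of special-casing functions that happen to leave each group's index intact.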