Group sum and count with two unique columns in Python

I have a dataset that I would like to group by two columns, summing one column and counting the rows in each group.


source  ex  pw  role    date
aa          10  hello   q222
aa          10  hello   q222
        bb  15  ok      q422
        bb  5   no      q422
        bb  1   sure    q422
        bb  4   yes     q422


source  ex  pw  count   date
aa          20  2       q222
        bb  25  4       q422



However, with my current approach I then have to perform a concatenation to merge the two outputs. Any suggestions are appreciated.

Submitted August 16th 2021 by Admin


Try groupby with a new key created with fillna:

out = (
    df.groupby([df.source.fillna(df.ex), df.date])
      .agg({'source': 'first', 'ex': 'first', 'pw': 'sum', 'role': 'count', 'date': 'first'})
      .reset_index(drop=True)
)

Output:

  source    ex  pw  role  date
0     aa  None  20     2  q222
1   None    bb  25     4  q422
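For anyone who wants to try the combined-key idea end to end, here is a minimal, self-contained sketch. It assumes the blank cells in the question are NaN. The Series name "key" and the output column name "count" are illustrative choices, not part of the original answer. It also uses named aggregation and drops the helper key afterwards, which differs slightly from the dict form above.

```python
import numpy as np
import pandas as pd

# Sample data from the question; blank cells are assumed to be NaN.
df = pd.DataFrame({
    "source": ["aa", "aa", np.nan, np.nan, np.nan, np.nan],
    "ex": [np.nan, np.nan, "bb", "bb", "bb", "bb"],
    "pw": [10, 10, 15, 5, 1, 4],
    "role": ["hello", "hello", "ok", "no", "sure", "yes"],
    "date": ["q222", "q222", "q422", "q422", "q422", "q422"],
})

# Combined grouping key: use source where present, otherwise fall back to ex.
# The name "key" is hypothetical and only serves to avoid clashing with a column.
key = df["source"].fillna(df["ex"]).rename("key")

out = (
    df.groupby([key, "date"])
      .agg(
          source=("source", "first"),
          ex=("ex", "first"),
          pw=("pw", "sum"),          # total pw per group
          count=("role", "count"),   # number of non-null role values per group
      )
      .droplevel("key")              # discard the helper key
      .reset_index()                 # bring date back as a column
)

# Reorder the columns to match the desired layout in the question.
out = out[["source", "ex", "pw", "count", "date"]]
print(out)
```

Because the count column is named directly in the aggregation, no separate rename or concatenation step is needed.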

Admin | 2 months ago


Use groupby() with dropna=False (available since pandas 1.1), followed by rename():

out = (
    df.groupby(['source', 'ex', 'date'], dropna=False)['pw']
      .agg(['count', 'sum'])
      .reset_index()
      .rename(columns={'sum': 'pw'})
)


Or use groupby() with dropna=False and named aggregation (tuples of column and function):

out = (
    df.groupby(['source', 'ex'], dropna=False)
      .agg(pw=('pw', 'sum'), count=('pw', 'count'), date=('date', 'first'))
      .reset_index()
)

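Output of out (from the first snippet; the second returns the same values with the columns ordered source, ex, pw, count, date):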

  source   ex  date  count  pw
0     aa  NaN  q222      2  20
1    NaN   bb  q422      4  25

Note: If the empty values are empty strings ('') rather than NaN, convert them first with df = df.replace('', np.nan) (this needs import numpy as np).
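As a rough illustration of the note above combined with dropna=False, the sketch below builds a hypothetical variant of the question's data in which the blanks are empty strings, converts them to NaN, and then groups. The DataFrame construction is an assumption about how the data might look; only the replace and groupby calls come from this answer.

```python
import numpy as np
import pandas as pd

# Hypothetical variant of the question's data: blanks stored as empty strings.
df = pd.DataFrame({
    "source": ["aa", "aa", "", "", "", ""],
    "ex": ["", "", "bb", "bb", "bb", "bb"],
    "pw": [10, 10, 15, 5, 1, 4],
    "role": ["hello", "hello", "ok", "no", "sure", "yes"],
    "date": ["q222", "q222", "q422", "q422", "q422", "q422"],
})

# Turn empty strings into NaN so they are treated as missing values.
df = df.replace("", np.nan)

# dropna=False keeps the groups whose key contains NaN (needs pandas >= 1.1).
out = (
    df.groupby(["source", "ex", "date"], dropna=False)
      .agg(pw=("pw", "sum"), count=("pw", "count"))
      .reset_index()
)
print(out)
```

Without dropna=False, both groups here would be dropped, because every row has a NaN in either source or ex.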

Admin | 2 months ago
