Group sum and count with two unique columns in Python

I have a dataset that I would like to group by two columns, then take the sum and the count of the values in each group.

Data

source  ex  pw  role    date
aa          10  hello   q222
aa          10  hello   q222
        bb  15  ok      q422
        bb  5   no      q422
        bb  1   sure    q422
        bb  4   yes     q422

Desired

source  ex  pw  count   date
aa          20  2       q222
        bb  25  4       q422

Doing

#df.groupby(['source','date'])['pw'].agg(['count','sum'])
df.groupby(['ex','date'])['pw'].agg(['count','sum'])

However, with this approach I now have to concatenate the two outputs to merge them. Any suggestions are appreciated.

Submitted August 16th 2021 by Admin

Answers

Try groupby with a new key created with fillna:

out = df.groupby([df.source.fillna(df.ex),df.date]).agg({'source':'first', 'ex':'first', 'pw':'sum', 'role':'count', 'date':'first'}).reset_index(drop=True)
Out[489]:
  source    ex  pw  role  date
0     aa  None  20     2  q222
1   None    bb  25     4  q422
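
For anyone who wants to run this end to end, here is a minimal sketch of the same fillna-key idea. It assumes the blank cells in the question are NaN. The sample frame is rebuilt from the question, and the helper name key and the output column name count are my own choices, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question; blank cells are assumed to be NaN.
df = pd.DataFrame({
    "source": ["aa", "aa", np.nan, np.nan, np.nan, np.nan],
    "ex": [np.nan, np.nan, "bb", "bb", "bb", "bb"],
    "pw": [10, 10, 15, 5, 1, 4],
    "role": ["hello", "hello", "ok", "no", "sure", "yes"],
    "date": ["q222", "q222", "q422", "q422", "q422", "q422"],
})

# Coalesce the two sparse columns into a single grouping key.
# "key" is a hypothetical name chosen so it cannot clash with real columns.
key = df["source"].fillna(df["ex"]).rename("key")

out = (
    df.groupby([key, "date"], sort=False)
      .agg(
          source=("source", "first"),  # keep the original value (missing for bb rows)
          ex=("ex", "first"),          # keep the original value (missing for aa rows)
          pw=("pw", "sum"),            # total of pw per group
          count=("pw", "count"),       # number of non-null pw values per group
      )
      .reset_index()
      .drop(columns="key")
      [["source", "ex", "pw", "count", "date"]]
)
print(out)
```

Using named aggregation here gives the count column its final name directly, so no rename of role is needed afterwards.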

Admin | 2 months ago



Use groupby() with dropna=False (available from pandas 1.1) + rename():

out=(df.groupby(['source','ex','date'],dropna=False)['pw'].agg(['count','sum']).reset_index().rename(columns={'sum':'pw'}))

OR

groupby() with dropna=False and named aggregation:

out=(df.groupby(['source','ex'],dropna=False).agg(pw=('pw','sum'),count=('pw','count'),date=('date','first')).reset_index())

output of out:

  source   ex  date  count  pw
0     aa  NaN  q222      2  20
1    NaN   bb  q422      4  25

Note: If the empty values are empty strings ('') rather than NaN, convert them first with df=df.replace('',np.nan)
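
Putting the note and the dropna=False approach together, a self-contained sketch might look like the following. It assumes the blanks arrive as empty strings and that pandas is at least 1.1 (where dropna was added to groupby). The sample frame is rebuilt from the question, and the final column reordering is only there to match the desired layout.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question; here blanks are assumed to be empty strings.
df = pd.DataFrame({
    "source": ["aa", "aa", "", "", "", ""],
    "ex": ["", "", "bb", "bb", "bb", "bb"],
    "pw": [10, 10, 15, 5, 1, 4],
    "role": ["hello", "hello", "ok", "no", "sure", "yes"],
    "date": ["q222", "q222", "q422", "q422", "q422", "q422"],
})

# Turn empty strings into real missing values so dropna=False treats them as a group key.
df = df.replace("", np.nan)

out = (
    df.groupby(["source", "ex", "date"], dropna=False)  # keep groups whose key contains NaN
      .agg(pw=("pw", "sum"), count=("pw", "count"))     # named aggregation: sum and count of pw
      .reset_index()
      [["source", "ex", "pw", "count", "date"]]         # reorder to the desired layout
)
print(out)
```

Without dropna=False every row would be dropped here, since each row has NaN in either source or ex.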

Admin | 2 months ago


