1

how to merge dataframe based on two columns without order of columns?

I am running into the following issue. I want to merge two data frames on multiple columns. Example data frames are as follows: df1:

df1:
playerA   playerB  Weight
kim         lee      1
jackson     kim      3
dan         lee      4


df2:
name1   name2      score
dan         lee      11
dan         kim      23
jackson     kim      44
lee         kim      35
Jackson     lee      26
lee         dan      27

I want to merge this two dataframe based on two columns. df1[['playerA','playerB']],df2[['name1','name2']]

But the problem is, I want to merge this two data frame without considering the order of columns.

I tried

result = pd.merge(df1, df2, on =df1[['playerA','playerB']],df2[['name1','name2']])

But it didn't work.

Here is what I Want,

df1:
playerA   playerB  Weight score
kim         lee      1    
jackson     kim      3
dan         lee      4

I want to merge df1 and df2 by using two columns from each data frame without considering the order 
(df1[['playerA','playerB']],df2[['name1','name2']])

df2_merge:
name1   name2      score  weight
dan         lee      11    4
kim         lee      23    1
jackson     kim      44    3
lee         kim      35    1
kim        jacson     26   3
lee         dan      27    4

And then I want to delete the row which has the same name list.

Thus, here is my final wishes

df2_merge_ final:
name1   name2      score  weight
dan         lee      11    4
kim         lee      23    1
jackson     kim      44    3

I am not sure what else to try. Any advice? Thanks :)

Submitted May 21st 2021 by Admin

Answers
0

You can add new columns filled by sorted values and then use merge with remove unnecessary columns:

df1[['a','b']] = np.sort(df1[['playerA','playerB']], axis=1)
df2[['a','b']] = np.sort(df2[['name1','name2']], axis=1) result = pd.merge(df1, df2, on=['a','b']).drop(['a','b','name1','name2'], axis=1)
print (result) playerA playerB Weight score
0 kim lee 1 35
1 jackson kim 3 44
2 dan lee 4 11
3 dan lee 4 27

If need first unique combination add DataFrame.drop_duplicates:

df1[['a','b']] = np.sort(df1[['playerA','playerB']], axis=1)
df2[['a','b']] = np.sort(df2[['name1','name2']], axis=1) df2 = df2.drop_duplicates(['a','b']) result = pd.merge(df1, df2, on=['a','b']).drop(['a','b','name1','name2'], axis=1)
print (result) playerA playerB Weight score
0 kim lee 1 35
1 jackson kim 3 44
2 dan lee 4 11

Admin | 3 months ago



Relevant Questions