R: subset to remove rows with a duplicate Id and a given condition

Suppose this is my dataset:

Id   Weight   Category
1    10.2     Pre
1    12.1     Post
2    11.3     Post
3    12.9     Pre
4    10.3     Post
4    12.3     Pre
5    11.8     Pre

How do I get rid of the rows whose Id is duplicated and whose Category is Pre? My expected final dataset would be:

Id   Weight   Category
1    12.1     Post
2    11.3     Post
3    12.9     Pre
4    10.3     Post
5    11.8     Pre

Submitted October 10th 2021 by Admin

Answers

You may arrange the data and then use distinct.

library(dplyr)

df %>%
  arrange(Id, Category) %>%
  distinct(Id, .keep_all = TRUE)
#   Id Weight Category
# 1  1   12.1     Post
# 2  2   11.3     Post
# 3  3   12.9      Pre
# 4  4   10.3     Post
# 5  5   11.8      Pre

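This works because 'Post' sorts before 'Pre' alphabetically ('Pre' > 'Post'), so after arrange() the Post row comes first within each Id, and distinct() keeps only the first row per Id.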
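
If you would rather not depend on the sort order of the category labels, the rule from the question can also be stated directly. The following is only a sketch, assuming dplyr is installed and that df holds the question's data (it is rebuilt here so the snippet runs on its own); the name result is just illustrative.

```r
library(dplyr)

# Data from the question, rebuilt so the example is self-contained
df <- data.frame(
  Id = c(1, 1, 2, 3, 4, 4, 5),
  Weight = c(10.2, 12.1, 11.3, 12.9, 10.3, 12.3, 11.8),
  Category = c("Pre", "Post", "Post", "Pre", "Post", "Pre", "Pre")
)

# Drop a row only when its Id occurs more than once AND it is a Pre row
result <- df %>%
  group_by(Id) %>%
  filter(!(n() > 1 & Category == "Pre")) %>%
  ungroup()

result
```

Note that under this rule an Id with several Pre rows and no Post row would lose all of its rows, so it fits data shaped like the example, where a duplicated Id always has one Pre and one Post.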

Admin | 1 week ago


Using by(), split dat by Id, keep only the Post row wherever an Id has two rows, then rbind the results.

do.call(rbind, by(dat, dat$Id, function(x) {
  # an Id with two rows has one Pre and one Post: keep the Post row
  if (nrow(x) == 2) x[x$Category == 'Post', ] else x
}))
#   Id Weight Category
# 1  1   12.1     Post
# 2  2   11.3     Post
# 3  3   12.9      Pre
# 4  4   10.3     Post
# 5  5   11.8      Pre

Data:

dat <- read.table(header=TRUE, text='Id Weight Category
1 10.2 Pre
1 12.1 Post
2 11.3 Post
3 12.9 Pre
4 10.3 Post
4 12.3 Pre
5 11.8 Pre')
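
The same rule can also be written in base R without splitting the data, by finding the Ids that occur more than once and dropping their Pre rows. This is a minimal sketch rather than a definitive solution; dup_ids and res are names made up for the example, and the data is repeated so the snippet runs on its own.

```r
# Data from the question
dat <- read.table(header = TRUE, text = "Id Weight Category
1 10.2 Pre
1 12.1 Post
2 11.3 Post
3 12.9 Pre
4 10.3 Post
4 12.3 Pre
5 11.8 Pre")

# Ids that occur more than once
dup_ids <- unique(dat$Id[duplicated(dat$Id)])

# Keep everything except Pre rows that belong to a duplicated Id
res <- dat[!(dat$Id %in% dup_ids & dat$Category == "Pre"), ]
res
```

Unlike the by() version, this does not assume a duplicated Id has exactly two rows, and the original row names of dat are kept in the result.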

Admin | 1 week ago


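We could group by Id, arrange by Category, and then use filter() with first() to keep the rows whose Category matches the first one in each group. Since Post sorts before Pre, the Post row wins whenever an Id has both: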

df %>%
  group_by(Id) %>%
  arrange(Id, Category) %>%
  filter(Category == first(Category))

output:

     Id Weight Category
  <int>  <dbl> <chr>
1     1   12.1 Post
2     2   11.3 Post
3     3   12.9 Pre
4     4   10.3 Post
5     5   11.8 Pre
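
One thing to be aware of: filter(Category == first(Category)) keeps every row that matches the first category in a group, whereas distinct(Id, .keep_all = TRUE) keeps exactly one row per Id. The two agree on the question's data but can differ when an Id has repeated categories. The sketch below shows this on df2, a hypothetical dataset that is not from the question, assuming dplyr is installed.

```r
library(dplyr)

# Hypothetical data (not from the question): Id 1 has two Post rows
df2 <- data.frame(
  Id = c(1, 1, 1, 2),
  Weight = c(10.2, 12.1, 12.4, 11.3),
  Category = c("Pre", "Post", "Post", "Post")
)

# Keeps all rows matching the first Category within each Id
kept_filter <- df2 %>%
  group_by(Id) %>%
  arrange(Id, Category) %>%
  filter(Category == first(Category)) %>%
  ungroup()

# Keeps exactly one row per Id
kept_distinct <- df2 %>%
  arrange(Id, Category) %>%
  distinct(Id, .keep_all = TRUE)

nrow(kept_filter)   # 3: both Post rows of Id 1, plus Id 2
nrow(kept_distinct) # 2: one row per Id
```

Which behaviour is wanted depends on whether repeated Post measurements for the same Id should all be retained.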

Admin | 1 week ago


