Trying out dplyr 1.2.0

tutorials
rstats
data cleaning
documentation
Testing out integrating some of the new dplyr functions into my workflow. Image from Posit.
Author

Crystal Lewis

Published

February 9, 2026

Last week Posit released a new version of {dplyr}, the powerhouse of the {tidyverse}. This release brings big changes to two heavily used processes: filtering and recoding. In this blog post I review some of these changes and then try them out by replacing some of my old data wrangling code with the new functions.

If you read the original post from Davis Vaughan, this blog post does not provide any new information. It is simply a chance for me to experiment with these new functions, to get more used to using them in my own work, and to provide some supplemental examples of how they work.

Filtering

One of the biggest issues that people (or at least I) have had with filter() is that it is optimized to keep rows. However, we often want to drop rows and the function forces you to add complex logic if you want to remove rows while also retaining your missing values.

Let’s review an example of what this looked like before.

Say we have this dataset

df
# A tibble: 5 × 5
  stu_id    q1    q2    q3    q4
   <dbl> <dbl> <dbl> <dbl> <dbl>
1    100     1     2    10   205
2    101  -999     0    11   220
3    103     3  -999    12   250
4    105     4     0    13   217
5    109    NA    NA    NA    NA

If I want to remove any row where q1 == -999, I would expect the code to look like this.

df |>
  filter(q1 != -999)
# A tibble: 3 × 5
  stu_id    q1    q2    q3    q4
   <dbl> <dbl> <dbl> <dbl> <dbl>
1    100     1     2    10   205
2    103     3  -999    12   250
3    105     4     0    13   217

But unfortunately that does not give us the output we were expecting. In addition to removing rows where q1 == -999, it also removes any rows where q1 is NA. To understand this more, see Section 12.3.1 of R for Data Science.
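The underlying behavior is easy to see outside of dplyr. As a minimal base R sketch, using only the q1 values from the example data, a comparison involving NA returns NA rather than TRUE or FALSE, and filter() only keeps rows where the condition is TRUE.

```r
# q1 values copied from the example dataset above
q1 <- c(1, -999, 3, 4, NA)

# A comparison involving NA returns NA, not TRUE or FALSE
keep <- q1 != -999
print(keep)
# [1]  TRUE FALSE  TRUE  TRUE    NA

# Keeping only the TRUE positions mimics what filter() does,
# which is why the NA row disappears along with the -999 row
kept_values <- q1[which(keep)]
print(kept_values)
# [1] 1 3 4
```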

One way around this is to explicitly say you want to keep values where is.na(q1).

df |>
  filter(q1 != -999 | is.na(q1))
# A tibble: 4 × 5
  stu_id    q1    q2    q3    q4
   <dbl> <dbl> <dbl> <dbl> <dbl>
1    100     1     2    10   205
2    103     3  -999    12   250
3    105     4     0    13   217
4    109    NA    NA    NA    NA

Or you can use the %in% operator which follows different rules for NA compared to the == operator. See Section 12.3.3 in R for Data Science.

df |>
  filter(!q1 %in% -999)
# A tibble: 4 × 5
  stu_id    q1    q2    q3    q4
   <dbl> <dbl> <dbl> <dbl> <dbl>
1    100     1     2    10   205
2    103     3  -999    12   250
3    105     4     0    13   217
4    109    NA    NA    NA    NA
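To see why this works, here is a small base R sketch with the same q1 values. Unlike ==, the %in% operator never returns NA: a missing value is either in the set on the right-hand side or it is not.

```r
# q1 values copied from the example dataset above
q1 <- c(1, -999, 3, 4, NA)

# == propagates the missing value
eq_result <- q1 == -999
print(eq_result)
# [1] FALSE  TRUE FALSE FALSE    NA

# %in% returns FALSE for NA because NA is not in the set being matched
in_result <- q1 %in% -999
print(in_result)
# [1] FALSE  TRUE FALSE FALSE FALSE

# Negating gives TRUE for the NA position, so filter() keeps that row
drop_999 <- !q1 %in% -999
print(drop_999)
# [1]  TRUE FALSE  TRUE  TRUE  TRUE
```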

However, this kind of logic gets much more complicated when you start needing to filter using more than one variable. Here we want to remove any row where q1 == -999 OR q2 == -999. As you can see, this is starting to look a little overwhelming.

df |>
  filter((q1 != -999 | is.na(q1)) & (q2 != -999 | is.na(q2)))
# A tibble: 3 × 5
  stu_id    q1    q2    q3    q4
   <dbl> <dbl> <dbl> <dbl> <dbl>
1    100     1     2    10   205
2    105     4     0    13   217
3    109    NA    NA    NA    NA

In comes dplyr 1.2.0, with a new function to improve the readability of removing rows, filter_out().

We can now do this.

df |>
  filter_out(q1 == -999 | q2 == -999)
# A tibble: 3 × 5
  stu_id    q1    q2    q3    q4
   <dbl> <dbl> <dbl> <dbl> <dbl>
1    100     1     2    10   205
2    105     4     0    13   217
3    109    NA    NA    NA    NA
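For comparison, here is a rough sketch of how the same result could be written with the older filter(). It assumes, as the output above suggests, that filter_out() drops rows where the condition is TRUE and keeps rows where it is FALSE or NA. The data frame is a trimmed copy of the example data, and using coalesce() to turn NA conditions into FALSE is my own choice for illustration, not how dplyr implements filter_out().

```r
library(dplyr)

# Trimmed copy of the example data (q3 and q4 omitted)
df <- tibble(
  stu_id = c(100, 101, 103, 105, 109),
  q1 = c(1, -999, 3, 4, NA),
  q2 = c(2, 0, -999, 0, NA)
)

# Treat an NA condition as "not a match", then negate to keep the non-matches
kept <- df |>
  filter(!coalesce(q1 == -999 | q2 == -999, FALSE))

print(kept$stu_id)
# [1] 100 105 109
```

Compared with this, filter_out() lets the condition read exactly like the rule for which rows to drop.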

I was curious whether filter_out() works with the if_any() and if_all() functions, and it does.

df |>
  filter_out(if_any(q1:q2, ~. == -999))
# A tibble: 3 × 5
  stu_id    q1    q2    q3    q4
   <dbl> <dbl> <dbl> <dbl> <dbl>
1    100     1     2    10   205
2    105     4     0    13   217
3    109    NA    NA    NA    NA

Recoding

Recoding received several new updates in this release. Three new functions were introduced to help you work with recoding in different ways, recode_values(), replace_values(), and replace_when().

Image from https://dplyr.tidyverse.org/articles/recoding-replacing.html


Let’s go through each of these.

case_when()

case_when() continues to be the go-to function when you need to recode values by matching with conditions. As an example from the type of work I do, maybe I am recoding an existing variable into a new dichotomous risk factor variable.

df
# A tibble: 4 × 2
  stu_id sum_score
   <dbl>     <dbl>
1    100        22
2    101        35
3    103        15
4    105        41

df |>
  mutate(risk =
    case_when(
      sum_score < 25 ~ 1,
      sum_score >= 25 ~ 0
    )
  )
# A tibble: 4 × 3
  stu_id sum_score  risk
   <dbl>     <dbl> <dbl>
1    100        22     1
2    101        35     0
3    103        15     1
4    105        41     0

replace_when()

Similar to case_when(), this function is best used when matching by conditions, BUT it is meant for replacing only some of the values. The benefit of using it in this specific case is that you do not have to supply a default value to prevent non-recoded values from becoming NA.

Say we have this dataset and we want to truncate any value above $100,000 to $100,000. We can do that using case_when() but we would need to supply a default value. Otherwise, anything I do not recode will be coded to NA.

df
# A tibble: 4 × 2
     id income
  <dbl>  <dbl>
1   100  25000
2   101  42000
3   105 672000
4   109  83000

df |>
  mutate(income =
      case_when(
        income > 100000 ~ 100000,
        .default = income
      )
  )
# A tibble: 4 × 2
     id income
  <dbl>  <dbl>
1   100  25000
2   101  42000
3   105 100000
4   109  83000
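To make the NA behavior concrete, here is the same recode with the default left out, sketched on a bare vector of the income values above rather than inside mutate(). Every value that matches no condition comes back as NA.

```r
library(dplyr)

# Income values copied from the example dataset above
income <- c(25000, 42000, 672000, 83000)

# No .default supplied, so values that match no condition become NA
truncated <- case_when(income > 100000 ~ 100000)

# Only the third value is recoded; the other three are NA
print(truncated)
```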

replace_when(), however, assumes you are only recoding some of the values, so a default is not necessary. Wherever no condition matches, it retains the original values of the variable you provide.

df |>
  mutate(income = 
      replace_when(income, # This is where the default values are coming from
        income > 100000 ~ 100000
      )
  )
# A tibble: 4 × 2
     id income
  <dbl>  <dbl>
1   100  25000
2   101  42000
3   105 100000
4   109  83000
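As a further sketch, replace_when() could also be used to bound a variable on both sides. This assumes it accepts several formulas in sequence the way case_when() does, and the lower bound of 30,000 is a hypothetical value I made up for illustration. It needs dplyr 1.2.0 or later.

```r
library(dplyr) # replace_when() requires dplyr >= 1.2.0

# Income values copied from the example dataset above
income <- c(25000, 42000, 672000, 83000)

# Hypothetical bounds: raise values below 30,000 and cap values above 100,000.
# Values that match neither condition are left as they are.
bounded <- replace_when(
  income,
  income > 100000 ~ 100000,
  income < 30000 ~ 30000
)

print(bounded)
```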

recode_values()

If you are recoding by matching values, however, it is no longer recommended to use case_when() OR case_match() (which I had only just gotten used to using). case_match() is now deprecated.

Here is one way I might previously have recoded these course names into updated names.

df
# A tibble: 4 × 2
  course_name subject
  <chr>       <chr>  
1 Course A    Math   
2 Course D    English
3 Course E    Science
4 Course F    Math   

df |>
  mutate(new_course_name = 
    case_when(
      course_name ==  "Course A" ~ "New Course A",
      course_name ==  "Course D" ~ "New Course D",
      course_name ==  "Course E" ~ "New Course E",
      course_name ==  "Course F" ~ "New Course F"
    )
  )
# A tibble: 4 × 3
  course_name subject new_course_name
  <chr>       <chr>   <chr>          
1 Course A    Math    New Course A   
2 Course D    English New Course D   
3 Course E    Science New Course E   
4 Course F    Math    New Course F   

Obviously that got super repetitive so I switched to case_match().

df |>
  mutate(new_course_name = 
    case_match(
      course_name,
        "Course A" ~ "New Course A",
        "Course D" ~ "New Course D",
        "Course E" ~ "New Course E",
        "Course F" ~ "New Course F"
    )
  )
Warning: There was 1 warning in `mutate()`.
ℹ In argument: `new_course_name = case_match(...)`.
Caused by warning:
! `case_match()` was deprecated in dplyr 1.2.0.
ℹ Please use `recode_values()` instead.
# A tibble: 4 × 3
  course_name subject new_course_name
  <chr>       <chr>   <chr>          
1 Course A    Math    New Course A   
2 Course D    English New Course D   
3 Course E    Science New Course E   
4 Course F    Math    New Course F   

But as you can see, case_match() is now deprecated in favor of recode_values(), which is a drop-in replacement.

df |>
  mutate(new_course_name =
    recode_values(course_name,
        "Course A" ~ "New Course A",
        "Course D" ~ "New Course D",
        "Course E" ~ "New Course E",
        "Course F" ~ "New Course F"
    )
  )
# A tibble: 4 × 3
  course_name subject new_course_name
  <chr>       <chr>   <chr>          
1 Course A    Math    New Course A   
2 Course D    English New Course D   
3 Course E    Science New Course E   
4 Course F    Math    New Course F   

And while it is difficult to see the benefit of recode_values() in the previous example, it really does shine if you plan to use a lookup table for recoding, as I often do. I particularly like to use an external lookup table, such as a data dictionary.

Here is an example lookup table.

dictionary_long
# A tibble: 4 × 2
  old_value new_value   
  <chr>     <chr>       
1 Course A  New Course A
2 Course D  New Course D
3 Course E  New Course E
4 Course F  New Course F

Previously, I would have applied a lookup table using recode().

I would have first had to create a named vector from my dictionary.
Then I would have had to use !!! to splice the vector into the argument of a quoting expression.
Not the most intuitive code.

# Create named vector

dict_long <- dictionary_long |>
  deframe()

# Recode values using that named vector

df |>
  mutate(new_course_name = recode(course_name, !!!dict_long))
# A tibble: 4 × 3
  course_name subject new_course_name
  <chr>       <chr>   <chr>          
1 Course A    Math    New Course A   
2 Course D    English New Course D   
3 Course E    Science New Course E   
4 Course F    Math    New Course F   
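If the named vector step feels opaque, this sketch shows what deframe() (from the tibble package) returns for a two-row stand-in for the dictionary. The first column becomes the names and the second becomes the values, which is the old = new shape that recode() expects once the vector is spliced in with !!!.

```r
library(tibble)

# Two-row stand-in for the dictionary above
dictionary_long <- tibble(
  old_value = c("Course A", "Course D"),
  new_value = c("New Course A", "New Course D")
)

# First column -> names, second column -> values
dict_long <- deframe(dictionary_long)

print(names(dict_long))
# [1] "Course A" "Course D"

print(unname(dict_long))
# [1] "New Course A" "New Course D"
```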

But recode_values() takes from and to arguments, so you can supply columns of the original data dictionary directly.

df |>
  mutate(new_course_name = 
    recode_values(
      course_name, 
      from = dictionary_long$old_value,
      to = dictionary_long$new_value
    )
  )
# A tibble: 4 × 3
  course_name subject new_course_name
  <chr>       <chr>   <chr>          
1 Course A    Math    New Course A   
2 Course D    English New Course D   
3 Course E    Science New Course E   
4 Course F    Math    New Course F   

What I especially appreciate is that if you expected all values to be recoded during this process, you can add a check to make sure that assumption holds true.

So here I’ve taken our original dictionary_long and I’ve removed a recode for “Course F” just to test this. We can see it throws us a helpful error message.

df |>
  mutate(new_course_name = 
    recode_values(
      course_name, 
      from = dictionary_long$old_value,
      to = dictionary_long$new_value,
      unmatched = "error"
    )
  )
Error in `mutate()`:
ℹ In argument: `new_course_name = recode_values(...)`.
Caused by error in `recode_values()`:
! Each location must be matched.
✖ Location 4 is unmatched.
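The error reports the position of the unmatched row but not its value. One way to list the values that lack a dictionary entry before recoding is a quick set difference, sketched here in base R with plain vectors standing in for df$course_name and dictionary_long$old_value.

```r
# Stand-ins for df$course_name and dictionary_long$old_value
course_name <- c("Course A", "Course D", "Course E", "Course F")
old_value <- c("Course A", "Course D", "Course E") # "Course F" removed

# Values present in the data but absent from the dictionary
missing_from_dictionary <- setdiff(course_name, old_value)
print(missing_from_dictionary)
# [1] "Course F"
```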

But as usual, if we don’t need to recode every value (say some course names were not updated), we can also set a default value for those. Note that this argument is named default, without the leading period of the “.default” argument used in case_when().

df |>
  mutate(new_course_name = 
    recode_values(
      course_name, 
      from = dictionary_long$old_value,
      to = dictionary_long$new_value,
      default = course_name
    )
  )
# A tibble: 4 × 3
  course_name subject new_course_name
  <chr>       <chr>   <chr>          
1 Course A    Math    New Course A   
2 Course D    English New Course D   
3 Course E    Science New Course E   
4 Course F    Math    Course F       

replace_values()

Similar to replace_when(), replace_values() is most useful when your lookup table only supplies replacements for some of the variable’s values. Using this function allows you to skip the default argument.

df |>
  mutate(new_course_name = 
    replace_values(
      course_name, 
      from = dictionary_long$old_value,
      to = dictionary_long$new_value
    )
  )
# A tibble: 4 × 3
  course_name subject new_course_name
  <chr>       <chr>   <chr>          
1 Course A    Math    New Course A   
2 Course D    English New Course D   
3 Course E    Science New Course E   
4 Course F    Math    Course F       
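For intuition about what a partial lookup like this does, here is a base R sketch of the same idea using match(). It is only an illustration of the lookup logic with plain stand-in vectors, not a description of how dplyr implements replace_values().

```r
# Stand-ins for the data column and the dictionary columns
course_name <- c("Course A", "Course D", "Course E", "Course F")
old_value <- c("Course A", "Course D", "Course E") # "Course F" removed
new_value <- c("New Course A", "New Course D", "New Course E")

# Position of each course name in the dictionary (NA when there is no entry)
idx <- match(course_name, old_value)

# Use the replacement where an entry exists, otherwise keep the original value
new_course_name <- ifelse(is.na(idx), course_name, new_value[idx])
print(new_course_name)
# [1] "New Course A" "New Course D" "New Course E" "Course F"
```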

Lists of vectors as input

One other nice thing you can do with recode_values() or replace_values() is give lists of vectors as input.

Say for instance, you have a file that was manually entered and there were a variety of ways people entered “Yes” and “No”.

df
# A tibble: 4 × 3
  q1    q2    q3   
  <chr> <chr> <chr>
1 y     No    No   
2 yes   Y     N    
3 N     N     Y    
4 No    N     No   

Previously, if I wanted to recode q1, I might have done something like this, which worked just fine.

df |>
  mutate(q1 = 
    case_when(
      q1 %in% c("yes", "y") ~ "Yes",
      q1 %in% c("N", "No") ~ "No"
    )
  )
# A tibble: 4 × 3
  q1    q2    q3   
  <chr> <chr> <chr>
1 Yes   No    No   
2 Yes   Y     N    
3 No    N     Y    
4 No    N     No   

But if I already have this information in a lookup table, I can now pass it to recode_values() or replace_values().

# Create lookup table

lookup <- tribble(
  ~old_value, ~new_value,
  c("yes", "y", "Y"), "Yes",
  c("N", "No"), "No"
)

# Lookup table

lookup
# A tibble: 2 × 2
  old_value new_value
  <list>    <chr>    
1 <chr [3]> Yes      
2 <chr [2]> No       

# Recode q1 values

df |>
  mutate(q1 = 
    recode_values(
      q1, 
      from = lookup$old_value,
      to = lookup$new_value   
    )
  )
# A tibble: 4 × 3
  q1    q2    q3   
  <chr> <chr> <chr>
1 Yes   No    No   
2 Yes   Y     N    
3 No    N     Y    
4 No    N     No   
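One way to think about a list-column lookup is as a compact form of a longer one-row-per-value table. The base R sketch below flattens the list into parallel vectors and applies them with match(). The vector names are my own, and this is only meant to illustrate the mapping, not how dplyr handles list input internally.

```r
# Same lookup content as above, as plain vectors
old_value <- list(c("yes", "y", "Y"), c("N", "No"))
new_value <- c("Yes", "No")

# Flatten: one old value per position, with its new value repeated to match
flat_from <- unlist(old_value)
flat_to <- rep(new_value, times = lengths(old_value))

# q1 values copied from the example dataset above
q1 <- c("y", "yes", "N", "No")

recoded <- flat_to[match(q1, flat_from)]
print(recoded)
# [1] "Yes" "Yes" "No"  "No"
```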

And I can still apply these recodes across multiple variables if I want to.

# Recode q1, q2, and q3 values

df |>
  mutate(across(q1:q3,
    \(x) recode_values(x,
      from = lookup$old_value,
      to = lookup$new_value)
    )
  )
# A tibble: 4 × 3
  q1    q2    q3   
  <chr> <chr> <chr>
1 Yes   No    No   
2 Yes   Yes   No   
3 No    No    Yes  
4 No    No    No   

These new functions will take some getting used to, but once I do, I hope they will reduce confusion by offering a unified set of functions that are easier to implement and interpret. Last, as always, there are many ways to accomplish these same outcomes (e.g., using joins to update course names rather than recoding). Still, it is good to have lots of options in your toolbox so that you can use what works best for each scenario.
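As one illustration of the join approach, here is a sketch using left_join() with small stand-in tables based on the course example. The dictionary deliberately has no entry for “Course F”, and falling back to the original name with coalesce() is my own choice for handling that row.

```r
library(dplyr)

# Stand-in data and dictionary ("Course F" has no dictionary entry)
df <- tibble(
  course_name = c("Course A", "Course D", "Course E", "Course F"),
  subject = c("Math", "English", "Science", "Math")
)

dictionary_long <- tibble(
  old_value = c("Course A", "Course D", "Course E"),
  new_value = c("New Course A", "New Course D", "New Course E")
)

joined <- df |>
  # Bring in the new name wherever the old name has a dictionary entry
  left_join(dictionary_long, by = c("course_name" = "old_value")) |>
  # Courses without an entry keep their original name
  mutate(new_course_name = coalesce(new_value, course_name)) |>
  select(-new_value)

print(joined$new_course_name)
# [1] "New Course A" "New Course D" "New Course E" "Course F"
```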

Citation

BibTeX citation:
@online{lewis2026,
  author = {Lewis, Crystal},
  title = {Trying Out Dplyr 1.2.0},
  date = {2026-02-09},
  url = {https://cghlewis.com/blog/dplyr_update/},
  langid = {en}
}
For attribution, please cite this work as:
Lewis, Crystal. 2026. “Trying Out Dplyr 1.2.0.” February 9, 2026. https://cghlewis.com/blog/dplyr_update/.