Testing out integrating some of the new dplyr functions into my workflow. Image from Posit.
Author
Crystal Lewis
Published
February 9, 2026
Last week Posit released a new version of {dplyr}, the powerhouse of the {tidyverse}. This new release brought big changes to two heavily used processes: filtering and recoding. In this blog post I review some of these changes and then try them out by replacing some of my old data wrangling code with the new functions.
If you have read the original post from Davis Vaughan, this blog post does not provide any new information. This is simply a chance for me to experiment with these new functions, to get more comfortable using them in my own work, and to provide some supplemental examples of how they work.
Filtering
One of the biggest issues that people (or at least I) have had with filter() is that it is optimized to keep rows. However, we often want to drop rows, and the function forces you to add complex logic if you want to remove rows while also retaining your missing values.
Let’s review an example of what this looked like before.
Say we have this dataset
df
# A tibble: 5 × 5
stu_id q1 q2 q3 q4
<dbl> <dbl> <dbl> <dbl> <dbl>
1 100 1 2 10 205
2 101 -999 0 11 220
3 103 3 -999 12 250
4 105 4 0 13 217
5 109 NA NA NA NA
If I want to remove any row where q1 == -999, you would think it would look like this.
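That naive attempt would look something like this (a sketch, using the `df` shown above):

```r
library(dplyr)

# Naive attempt: keep only rows where q1 is not -999
df |> filter(q1 != -999)
```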
But unfortunately that does not give us the output we were expecting. In addition to removing rows where q1 == -999, it also removes any rows where q1 is NA. To understand this more, see Section 12.3.1 of R for Data Science.
One way around this is to explicitly say you want to keep values where is.na(q1).
df |> filter(q1 != -999 | is.na(q1))
# A tibble: 4 × 5
stu_id q1 q2 q3 q4
<dbl> <dbl> <dbl> <dbl> <dbl>
1 100 1 2 10 205
2 103 3 -999 12 250
3 105 4 0 13 217
4 109 NA NA NA NA
Or you can use the %in% operator which follows different rules for NA compared to the == operator. See Section 12.3.3 in R for Data Science.
df |> filter(!q1 %in% -999)
# A tibble: 4 × 5
stu_id q1 q2 q3 q4
<dbl> <dbl> <dbl> <dbl> <dbl>
1 100 1 2 10 205
2 103 3 -999 12 250
3 105 4 0 13 217
4 109 NA NA NA NA
However, this kind of logic gets much more complicated when you start needing to filter using more than one variable. Here we want to remove any row where q1 == -999 OR q2 == -999. As you can see, this is starting to look a little overwhelming.
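Extending the keep-the-NAs pattern from above to two variables, the pre-1.2.0 code might look something like this sketch:

```r
library(dplyr)

# Drop rows where q1 or q2 is -999, while still keeping rows with NA
df |> filter((q1 != -999 | is.na(q1)) & (q2 != -999 | is.na(q2)))
```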
# A tibble: 3 × 5
stu_id q1 q2 q3 q4
<dbl> <dbl> <dbl> <dbl> <dbl>
1 100 1 2 10 205
2 105 4 0 13 217
3 109 NA NA NA NA
In comes dplyr 1.2.0, with a new function to improve the readability of removing rows, filter_out().
We can now do this.
df |> filter_out(q1 == -999 | q2 == -999)
# A tibble: 3 × 5
stu_id q1 q2 q3 q4
<dbl> <dbl> <dbl> <dbl> <dbl>
1 100 1 2 10 205
2 105 4 0 13 217
3 109 NA NA NA NA
I was curious whether filter_out() would still work with the if_any() and if_all() helpers, and it does.
df |> filter_out(if_any(q1:q2, ~ . == -999))
# A tibble: 3 × 5
stu_id q1 q2 q3 q4
<dbl> <dbl> <dbl> <dbl> <dbl>
1 100 1 2 10 205
2 105 4 0 13 217
3 109 NA NA NA NA
Recoding
Recoding received several updates in this release. Three new functions were introduced to help you recode in different ways: recode_values(), replace_values(), and replace_when().
Image from https://dplyr.tidyverse.org/articles/recoding-replacing.html
Let’s go through each of these.
case_when()
case_when() continues to be the go-to function when you need to recode values by matching with conditions. As an example from the type of work I do, maybe I am recoding an existing variable into a new dichotomous risk factor variable.
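As a sketch of what that might look like (the variable names here, attendance_rate and risk_flag, are hypothetical):

```r
library(dplyr)

# Hypothetical example: derive a dichotomous risk factor variable
# from an existing attendance_rate variable
df |> mutate(risk_flag = case_when(
  attendance_rate < 0.90 ~ "at-risk",
  .default = "not at-risk"
))
```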
replace_when()
Similar to case_when(), this function is best used when matching by conditions, BUT it shines when you are only replacing some of the values. The benefit of using this function in that case is that you do not have to supply a default value to prevent non-recoded values from becoming NA.
Say we have this dataset and we want to truncate any value above $100,000 to $100,000. We can do that using case_when(), but we would need to supply a default value; otherwise, anything we do not recode will be set to NA.
df
# A tibble: 4 × 2
id income
<dbl> <dbl>
1 100 25000
2 101 42000
3 105 672000
4 109 83000
df |> mutate(income = case_when(
  income > 100000 ~ 100000,
  .default = income
))
# A tibble: 4 × 2
id income
<dbl> <dbl>
1 100 25000
2 101 42000
3 105 100000
4 109 83000
However, with replace_when(), it knows that you are not recoding all of the values, so a default is not necessary. It retains all original values from the variable you provide.
df |> mutate(income = replace_when(
  income, # This is where the default values are coming from
  income > 100000 ~ 100000
))
# A tibble: 4 × 2
id income
<dbl> <dbl>
1 100 25000
2 101 42000
3 105 100000
4 109 83000
recode_values()
If you are recoding by matching values, however, it is no longer recommended to use case_when() OR case_match() (which I had actually just recently gotten used to using). case_match() is now deprecated.
Here is one way I might have previously recoded these course names into newly updated names.
df
# A tibble: 4 × 2
course_name subject
<chr> <chr>
1 Course A Math
2 Course D English
3 Course E Science
4 Course F Math
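A sketch of that repetitive case_when() approach:

```r
library(dplyr)

# Recode each course name by matching conditions, one per course
df |> mutate(new_course_name = case_when(
  course_name == "Course A" ~ "New Course A",
  course_name == "Course D" ~ "New Course D",
  course_name == "Course E" ~ "New Course E",
  course_name == "Course F" ~ "New Course F"
))
```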
# A tibble: 4 × 3
course_name subject new_course_name
<chr> <chr> <chr>
1 Course A Math New Course A
2 Course D English New Course D
3 Course E Science New Course E
4 Course F Math New Course F
Obviously that got super repetitive, so I switched to case_match().
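The case_match() version might have looked something like this sketch:

```r
library(dplyr)

# case_match() matches on values rather than full logical conditions
df |> mutate(new_course_name = case_match(
  course_name,
  "Course A" ~ "New Course A",
  "Course D" ~ "New Course D",
  "Course E" ~ "New Course E",
  "Course F" ~ "New Course F"
))
```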
Warning: There was 1 warning in `mutate()`.
ℹ In argument: `new_course_name = case_match(...)`.
Caused by warning:
! `case_match()` was deprecated in dplyr 1.2.0.
ℹ Please use `recode_values()` instead.
# A tibble: 4 × 3
course_name subject new_course_name
<chr> <chr> <chr>
1 Course A Math New Course A
2 Course D English New Course D
3 Course E Science New Course E
4 Course F Math New Course F
But as you can see, case_match() is now deprecated in favor of recode_values(), which serves as a drop-in replacement.
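Based on the from/to interface described in the release, the recode_values() version might look something like this sketch:

```r
library(dplyr)

# Recode by supplying parallel vectors of old and new values
df |> mutate(new_course_name = recode_values(
  course_name,
  from = c("Course A", "Course D", "Course E", "Course F"),
  to = c("New Course A", "New Course D", "New Course E", "New Course F")
))
```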
# A tibble: 4 × 3
course_name subject new_course_name
<chr> <chr> <chr>
1 Course A Math New Course A
2 Course D English New Course D
3 Course E Science New Course E
4 Course F Math New Course F
And while it is difficult to see the benefit of recode_values() in the previous example, it really does shine if you plan to use a lookup table for recoding, as I often do. I particularly like to use an external lookup table, such as a data dictionary.
Here is an example lookup table.
dictionary_long
# A tibble: 4 × 2
old_value new_value
<chr> <chr>
1 Course A New Course A
2 Course D New Course D
3 Course E New Course E
4 Course F New Course F
Previously, I would have applied a lookup table using recode().
I would have first had to create a named vector from my dictionary.
Then I would have had to use !!! to splice the vector into the argument of a quoting expression.
Not the most intuitive code.
# Create named vector
dict_long <- dictionary_long |> deframe()

# Recode values using that named vector
df |> mutate(new_course_name = recode(course_name, !!!dict_long))
# A tibble: 4 × 3
course_name subject new_course_name
<chr> <chr> <chr>
1 Course A Math New Course A
2 Course D English New Course D
3 Course E Science New Course E
4 Course F Math New Course F
But now recode_values() takes from and to arguments, where you can easily supply columns of the original data dictionary.
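Using the dictionary_long lookup table from above, that might look something like:

```r
library(dplyr)

# Feed the lookup table's columns directly into from and to
df |> mutate(new_course_name = recode_values(
  course_name,
  from = dictionary_long$old_value,
  to = dictionary_long$new_value
))
```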
# A tibble: 4 × 3
course_name subject new_course_name
<chr> <chr> <chr>
1 Course A Math New Course A
2 Course D English New Course D
3 Course E Science New Course E
4 Course F Math New Course F
What I especially appreciate is that if you expected all values to be recoded during this process, you can add a check to make sure that assumption holds true.
So here I’ve taken our original dictionary_long and removed the recode for “Course F” just to test this. We can see it throws a helpful error message.
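A sketch of that check, where dictionary_short (a hypothetical name) is the trimmed lookup table, and the strict-matching behavior is requested via an unmatched argument (the argument name here is my assumption):

```r
library(dplyr)

# Trim the lookup table so "Course F" has no recode (to trigger the error)
dictionary_short <- dictionary_long |> filter(old_value != "Course F")

df |> mutate(new_course_name = recode_values(
  course_name,
  from = dictionary_short$old_value,
  to = dictionary_short$new_value,
  unmatched = "error" # assumed argument name for the strict-matching check
))
```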
Error in `mutate()`:
ℹ In argument: `new_course_name = recode_values(...)`.
Caused by error in `recode_values()`:
! Each location must be matched.
✖ Location 4 is unmatched.
But as usual, if we don’t need to recode every value (say some course names were not updated), we can also set a default value for those. Note that this default argument does not have a leading period, unlike the .default used in case_when().
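A sketch of supplying that default, again using a trimmed lookup table (dictionary_short is a hypothetical name) and keeping the original course_name when there is no match:

```r
library(dplyr)

# Lookup table with no recode for "Course F"
dictionary_short <- dictionary_long |> filter(old_value != "Course F")

df |> mutate(new_course_name = recode_values(
  course_name,
  from = dictionary_short$old_value,
  to = dictionary_short$new_value,
  default = course_name # unmatched values keep their original course name
))
```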
# A tibble: 4 × 3
course_name subject new_course_name
<chr> <chr> <chr>
1 Course A Math New Course A
2 Course D English New Course D
3 Course E Science New Course E
4 Course F Math Course F
replace_values()
Similar to replace_when(), replace_values() will be most useful when you are supplying a lookup table that covers only some of the variable’s values. Using this function allows you to skip the default argument.
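A sketch with the same trimmed lookup table (dictionary_short is a hypothetical name), where replace_values() leaves unmatched values as-is without needing a default:

```r
library(dplyr)

# Lookup table with no recode for "Course F"
dictionary_short <- dictionary_long |> filter(old_value != "Course F")

# Unmatched values ("Course F") pass through unchanged
df |> mutate(new_course_name = replace_values(
  course_name,
  from = dictionary_short$old_value,
  to = dictionary_short$new_value
))
```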
# A tibble: 4 × 3
course_name subject new_course_name
<chr> <chr> <chr>
1 Course A Math New Course A
2 Course D English New Course D
3 Course E Science New Course E
4 Course F Math Course F
Lists of vectors as input
One other nice thing you can do with recode_values() or replace_values() is give lists of vectors as input.
Say for instance, you have a file that was manually entered and there were a variety of ways people entered “Yes” and “No”.
df
# A tibble: 4 × 3
q1 q2 q3
<chr> <chr> <chr>
1 y No No
2 yes Y N
3 N N Y
4 No N No
Previously, if I wanted to recode these columns, I might have done something like this, which worked just fine.
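A sketch of that older approach, using across() with case_when() and %in%:

```r
library(dplyr)

# Standardize the messy Yes/No entries across all three question columns
df |> mutate(across(q1:q3, ~ case_when(
  .x %in% c("y", "yes", "Y", "Yes") ~ "Yes",
  .x %in% c("n", "N", "No") ~ "No"
)))
```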
# A tibble: 4 × 3
q1 q2 q3
<chr> <chr> <chr>
1 Yes No No
2 Yes Yes No
3 No No Yes
4 No No No
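With recode_values(), the same cleanup might look something like this sketch, passing a list of vectors to from so that each vector maps to one replacement value:

```r
library(dplyr)

# Each vector in `from` collapses to the corresponding value in `to`
df |> mutate(across(q1:q3, ~ recode_values(
  .x,
  from = list(c("y", "yes", "Y"), c("n", "N", "No")),
  to = c("Yes", "No")
)))
```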
These new functions will take some getting used to, but hopefully they will reduce confusion and provide a unified set of functions that are easier to implement and interpret. Lastly, as always, there are many ways to accomplish these same outcomes (e.g., using joins to update course names rather than recoding), but it is good to have lots of options in your toolbox so that you can use what works best for each scenario.
Citation
BibTeX citation:
@online{lewis2026,
author = {Lewis, Crystal},
title = {Trying Out Dplyr 1.2.0},
date = {2026-02-09},
url = {https://cghlewis.com/blog/dplyr_update/},
langid = {en}
}