How to use map() instead of if_else() sandwich?

2022-02-01 17:03:18

1. Why do I care?

In R, we often use the mutate() function to create a new column. Things get a little more complex when the new column is based on existing columns, because we have to work out our logic first.

One common solution is the if_else() function. For example:

library(tidyverse)
(tips <- read_csv("tips.csv"))

 

# A tibble: 244 × 7
   total_bill   tip sex    smoker day   time    size
        <dbl> <dbl> <chr>  <chr>  <chr> <chr>  <dbl>
 1      17.0   1.01 Female No     Sun   Dinner     2
 2      10.3   1.66 Male   No     Sun   Dinner     3
 3      21.0   3.5  Male   No     Sun   Dinner     3
 4      23.7   3.31 Male   No     Sun   Dinner     2
 5      24.6   3.61 Female No     Sun   Dinner     4
 6      25.3   4.71 Male   No     Sun   Dinner     4
 7       8.77  2    Male   No     Sun   Dinner     2
 8      26.9   3.12 Male   No     Sun   Dinner     4
 9      15.0   1.96 Male   No     Sun   Dinner     2
10      14.8   3.23 Male   No     Sun   Dinner     2
# … with 234 more rows

When the logic is simple, if_else() is convenient.

# a common workflow is...
tips %>%
  mutate(tips_type = if_else(tip >= total_bill * 0.20, "well paid", "under paid"))

 

# A tibble: 244 × 8
   total_bill   tip sex    smoker day   time    size tips_type 
        <dbl> <dbl> <chr>  <chr>  <chr> <chr>  <dbl> <chr>     
 1      17.0   1.01 Female No     Sun   Dinner     2 under paid
 2      10.3   1.66 Male   No     Sun   Dinner     3 under paid
 3      21.0   3.5  Male   No     Sun   Dinner     3 under paid
 4      23.7   3.31 Male   No     Sun   Dinner     2 under paid
 5      24.6   3.61 Female No     Sun   Dinner     4 under paid
 6      25.3   4.71 Male   No     Sun   Dinner     4 under paid
 7       8.77  2    Male   No     Sun   Dinner     2 well paid 
 8      26.9   3.12 Male   No     Sun   Dinner     4 under paid
 9      15.0   1.96 Male   No     Sun   Dinner     2 under paid
10      14.8   3.23 Male   No     Sun   Dinner     2 well paid 
# … with 234 more rows

But a disadvantage of this method is that as the logic grows, the code becomes chaotic. There will be many layers of nested if_else() calls, like a sandwich.

Needless to say, if you are going to use more than just two columns, you will confuse yourself easily.

For example:

# more layers of if_else() are difficult to write and to read:
tips %>%
  mutate(tips_type = if_else(tip >= total_bill * 0.2, "well paid", 
                     if_else(tip >= total_bill * 0.15, "fare paid",
                     if_else(tip >= total_bill * 0.1, "acceptable", "under paid")))) # many layers of if_else() overlapping, like a sandwich

# needless to say, if we are using more columns than just two:
#tips %>%
#  mutate(tips_type = if_else((tip >= total_bill * 0.2) & (day %in% c("Sat", "Sun") & time == "Dinner"), "well paid", ...))

That is, we need a separate place to arrange our business logic and prepare our function, instead of building a huge if_else() sandwich. Luckily, R is a functional programming language and it has a convenient tool called map(), from the purrr package, which is part of the tidyverse.

 

2. Before map()

First, in base R there are functions called apply(), lapply() and sapply(). They serve the same purpose as map(), but we won't talk about them in this article. I don't use them because, like many other base R functions, they lack consistency.
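
As a small aside on the consistency point, here is a minimal sketch of one difference in return types. The helper label() is a made-up function used only for illustration, and the sketch assumes the purrr package is installed.

```r
library(purrr)

# hypothetical helper, used only for this illustration
label <- function(x) "tip"

# sapply() simplifies its result when it can, so the return type depends on the input
non_empty_base <- sapply(1:3, label)         # character vector of length 3
empty_base     <- sapply(integer(0), label)  # an empty list, not character(0)

# map_chr() returns a character vector for any input length
non_empty_purrr <- map_chr(1:3, label)
empty_purrr     <- map_chr(integer(0), label)

class(empty_base)
class(empty_purrr)
```

With map_chr() the output type is fixed by the function name, so code further down the pipeline does not need to handle a surprise list.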

Second, map() is not magic: it is essentially a wrapper around a for-loop. A for-loop is a good choice for running the same process over multiple inputs, which is exactly what we need. However, a for-loop is hard to write inside a mutate() call, and don't forget that our problem is creating a new column. So we can use the wrapper of the for-loop, map().

Of course, you can still use a for-loop, but that will break your data-flow pipeline. For example:
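
To make the "wrapper of a for-loop" idea concrete, here is a minimal sketch that computes the same result both ways. The bills vector holds made-up values and is not taken from tips.csv.

```r
library(purrr)

bills <- c(10, 20, 30)  # made-up values for illustration

# hand-written loop: pre-allocate the result, iterate, fill it in
tip_loop <- vector("double", length(bills))
for (i in seq_along(bills)) {
  tip_loop[[i]] <- bills[[i]] * 0.2
}

# the same iteration expressed with map_dbl()
tip_map <- map_dbl(bills, function(bill) bill * 0.2)

identical(tip_loop, tip_map)
```

The map_dbl() call is a single expression, which is what lets it sit inside mutate() where a multi-line loop cannot.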

# you can still use a for-loop, but it breaks your data-flow pipeline:
tips %>%
  filter(time == "Dinner") %>%
  mutate(tips_type = "oh wait a minute, I first have to write a for-loop to find the result")

# (joke)mY cOoL foR-lO0p
# loop over the same filtered rows, so the result has one entry per Dinner row
dinner_tips <- tips %>% filter(time == "Dinner")
tips_type_result <- vector("character", nrow(dinner_tips))
for (i in seq_along(dinner_tips$tip)) {
  if (dinner_tips$tip[[i]] >= dinner_tips$total_bill[[i]] * 0.2) {
    tips_type_result[[i]] <- "well paid"
  } else {
    tips_type_result[[i]] <- "under paid"
  }
}

# (joke)lEt's gO bAck to mUtaTe
tips %>%
  filter(time == "Dinner") %>%
  mutate(tips_type = tips_type_result) %>%
  summarise()...

# you will not want to do things like this. That's why we need map().

Lastly, when I say map(), I actually mean a family of functions, such as map_chr(), whose output is a character vector, map_dbl(), whose output is a double (floating-point) vector, and so on. You can look them up with ?map at the R console.
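
A minimal sketch of a few members of the family, applied to made-up values (the bills vector and the "large"/"small" labels are illustrative assumptions, not part of the tips data):

```r
library(purrr)

bills <- c(10, 20, 30)  # made-up values for illustration

# map() always returns a list
as_list <- map(bills, function(b) b * 0.2)

# the suffixed variants return an atomic vector of the named type
as_dbl <- map_dbl(bills, function(b) b * 0.2)
as_chr <- map_chr(bills, function(b) if (b >= 20) "large" else "small")
as_lgl <- map_lgl(bills, function(b) b >= 20)
```

The suffix tells the reader what type comes back, and the variant fails if the function returns something else.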

 

3. A workflow of using map()

As discussed above, one of the biggest advantages map() gives us is a calm place to write our logic and function separately.

After that, we can use this function with map() inside mutate(), which also keeps our data-flow pipeline going to the next step.

Using map() increases the readability and robustness of our code. If our business logic changes, we can change the independent function instead of changing the mutate() clause.

Note that I actually use map2_chr(). It means we take two variables as input, and the output is a character vector. You can use map_dbl() or map2_dbl() if your result is a double (floating-point) number.

tip_type_judge <- function(tip, total_bill) {
  if (tip >= total_bill * 0.2) {
    return("well paid")
  } else if (tip >= total_bill * 0.15) {
    return("fare paid")
  } else if (tip >= total_bill * 0.1) {
    return("acceptable")
  } else {
    return("under paid")
  }
}

tips %>%
  mutate(tip_type = map2_chr(tip, total_bill, tip_type_judge))

  

# A tibble: 244 × 8
   total_bill   tip sex    smoker day   time    size tip_type  
        <dbl> <dbl> <chr>  <chr>  <chr> <chr>  <dbl> <chr>     
 1      17.0   1.01 Female No     Sun   Dinner     2 under paid
 2      10.3   1.66 Male   No     Sun   Dinner     3 fare paid 
 3      21.0   3.5  Male   No     Sun   Dinner     3 fare paid 
 4      23.7   3.31 Male   No     Sun   Dinner     2 acceptable
 5      24.6   3.61 Female No     Sun   Dinner     4 acceptable
 6      25.3   4.71 Male   No     Sun   Dinner     4 fare paid 
 7       8.77  2    Male   No     Sun   Dinner     2 well paid 
 8      26.9   3.12 Male   No     Sun   Dinner     4 acceptable
 9      15.0   1.96 Male   No     Sun   Dinner     2 acceptable
10      14.8   3.23 Male   No     Sun   Dinner     2 well paid 
# … with 234 more rows
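
Because the logic now lives in a standalone function, it can be checked on single values before it goes anywhere near the pipeline. The sketch below repeats tip_type_judge() from above so that it runs on its own. The small tibble holds made-up values, one per branch, and is not taken from tips.csv.

```r
library(dplyr)
library(purrr)

# same function as above, repeated so this block is self-contained
tip_type_judge <- function(tip, total_bill) {
  if (tip >= total_bill * 0.2) {
    return("well paid")
  } else if (tip >= total_bill * 0.15) {
    return("fare paid")
  } else if (tip >= total_bill * 0.1) {
    return("acceptable")
  } else {
    return("under paid")
  }
}

# try the function on single values first
tip_type_judge(25, 100)
tip_type_judge(5, 100)

# then apply it row by row to a small made-up table, one row per branch
small <- tibble(tip = c(25, 17, 12, 5), total_bill = c(100, 100, 100, 100))
result <- small %>%
  mutate(tip_type = map2_chr(tip, total_bill, tip_type_judge))
```

If a threshold changes later, only tip_type_judge() needs editing and the same checks can be rerun.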

If you have more than two columns as input, you can use pmap_*(). It takes a list as input, and in that list we can use as many columns as we need. Note that I use pmap_chr() below. You can use pmap_dbl() if your result is a double (floating-point) number.

# if you have more than 2 columns as input, use pmap_*() instead
tip_type_judge_v2 <- function(tip, total_bill, day) {
  # each input is a single value here, so use the scalar && rather than the vectorized &
  if ((tip >= total_bill * 0.2) && (day %in% c("Sun", "Sat"))) {
    return("well paid")
  } else if ((tip >= total_bill * 0.15) && !(day %in% c("Sun", "Sat"))) {
    return("well paid")
  } else if (tip >= total_bill * 0.1) {
    return("acceptable")
  } else {
    return("under paid")
  }
}

tips %>%
  mutate(tip_type = pmap_chr(list(tip, total_bill, day), tip_type_judge_v2))



# A tibble: 244 × 8
   total_bill   tip sex    smoker day   time    size tip_type  
        <dbl> <dbl> <chr>  <chr>  <chr> <chr>  <dbl> <chr>     
 1      17.0   1.01 Female No     Sun   Dinner     2 under paid
 2      10.3   1.66 Male   No     Sun   Dinner     3 acceptable
 3      21.0   3.5  Male   No     Sun   Dinner     3 acceptable
 4      23.7   3.31 Male   No     Sun   Dinner     2 acceptable
 5      24.6   3.61 Female No     Sun   Dinner     4 acceptable
 6      25.3   4.71 Male   No     Sun   Dinner     4 acceptable
 7       8.77  2    Male   No     Sun   Dinner     2 well paid 
 8      26.9   3.12 Male   No     Sun   Dinner     4 acceptable
 9      15.0   1.96 Male   No     Sun   Dinner     2 acceptable
10      14.8   3.23 Male   No     Sun   Dinner     2 well paid 
# … with 234 more rows
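
With three or more inputs it is easy to pass the columns in the wrong order. One way to guard against that is to name the list elements, since pmap_*() matches named elements to the function's arguments by name. The sketch below uses a simplified, hypothetical helper tip_type_by_day() and a small made-up table, not the function or data above.

```r
library(dplyr)
library(purrr)

# hypothetical helper, simplified for illustration:
# weekends need 20% to count as "well paid", other days need 15%
tip_type_by_day <- function(tip, total_bill, day) {
  threshold <- if (day %in% c("Sat", "Sun")) 0.2 else 0.15
  if (tip >= total_bill * threshold) "well paid" else "under paid"
}

# made-up rows for illustration
small <- tibble(
  tip = c(18, 18, 25),
  total_bill = c(100, 100, 100),
  day = c("Sat", "Thur", "Sun")
)

result <- small %>%
  mutate(tip_type = pmap_chr(
    # named elements are matched by name, so their order does not matter
    list(day = day, total_bill = total_bill, tip = tip),
    tip_type_by_day
  ))
```

An unnamed list, as in the example above, is matched by position instead, so its order must follow the function's argument order.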

Source: https://www.cnblogs.com/drvongoosewing/p/15859210.html
