6.7 Merging Data

merge() combines two data frames by matching rows on one or more key columns.

authors <- data.frame(
  ## I(*) : use character columns of names to get sensible sort order
  surname = I(c("Tukey", "Venables", "Tierney", "Ripley", "McNeil")),
  nationality = c("US", "Australia", "US", "UK", "Australia"),
  deceased = c("yes", rep("no", 4))
)
authorN <- within(authors, {
  name <- surname
  rm(surname)
})
books <- data.frame(
  name = I(c(
    "Tukey", "Venables", "Tierney",
    "Ripley", "Ripley", "McNeil", "R Core"
  )),
  title = c(
    "Exploratory Data Analysis",
    "Modern Applied Statistics ...",
    "LISP-STAT",
    "Spatial Statistics", "Stochastic Simulation",
    "Interactive Data Analysis",
    "An Introduction to R"
  ),
  other.author = c(
    NA, "Ripley", NA, NA, NA, NA,
    "Venables & Smith"
  )
)

authors
##    surname nationality deceased
## 1    Tukey          US      yes
## 2 Venables   Australia       no
## 3  Tierney          US       no
## 4   Ripley          UK       no
## 5   McNeil   Australia       no
authorN
##   nationality deceased     name
## 1          US      yes    Tukey
## 2   Australia       no Venables
## 3          US       no  Tierney
## 4          UK       no   Ripley
## 5   Australia       no   McNeil
books
##       name                         title     other.author
## 1    Tukey     Exploratory Data Analysis             <NA>
## 2 Venables Modern Applied Statistics ...           Ripley
## 3  Tierney                     LISP-STAT             <NA>
## 4   Ripley            Spatial Statistics             <NA>
## 5   Ripley         Stochastic Simulation             <NA>
## 6   McNeil     Interactive Data Analysis             <NA>
## 7   R Core          An Introduction to R Venables & Smith

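By default, merge() uses the columns whose names appear in both data frames as the key, pairs up rows whose key values match, and drops every row that has no match. This is why the "R Core" row of books is missing from the result below.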

merge(authorN, books)
##       name nationality deceased                         title other.author
## 1   McNeil   Australia       no     Interactive Data Analysis         <NA>
## 2   Ripley          UK       no            Spatial Statistics         <NA>
## 3   Ripley          UK       no         Stochastic Simulation         <NA>
## 4  Tierney          US       no                     LISP-STAT         <NA>
## 5    Tukey          US      yes     Exploratory Data Analysis         <NA>
## 6 Venables   Australia       no Modern Applied Statistics ...       Ripley

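You can also name the key columns explicitly with by.x and by.y. Here authors is the first argument, so the key column in the result keeps its name, surname.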

merge(authors, books, by.x = "surname", by.y = "name")
##    surname nationality deceased                         title other.author
## 1   McNeil   Australia       no     Interactive Data Analysis         <NA>
## 2   Ripley          UK       no            Spatial Statistics         <NA>
## 3   Ripley          UK       no         Stochastic Simulation         <NA>
## 4  Tierney          US       no                     LISP-STAT         <NA>
## 5    Tukey          US      yes     Exploratory Data Analysis         <NA>
## 6 Venables   Australia       no Modern Applied Statistics ...       Ripley

With the arguments swapped, books comes first, so the key column in the result is called name.

merge(books, authors, by.x = "name", by.y = "surname")
##       name                         title other.author nationality deceased
## 1   McNeil     Interactive Data Analysis         <NA>   Australia       no
## 2   Ripley            Spatial Statistics         <NA>          UK       no
## 3   Ripley         Stochastic Simulation         <NA>          UK       no
## 4  Tierney                     LISP-STAT         <NA>          US       no
## 5    Tukey     Exploratory Data Analysis         <NA>          US      yes
## 6 Venables Modern Applied Statistics ...       Ripley   Australia       no

To see the differences among the join types more clearly, see the animations at https://github.com/gadenbuie/tidyexplain

For details on the five join types (inner, outer, left, right, cross), see https://stackoverflow.com/questions/1299871
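As a rough sketch, the five join types can be written with base R's merge() through its all, all.x, all.y, and by arguments. The data frames x and y below are hypothetical toy data made up for this illustration; they are not the authors and books tables used earlier.

```r
# Hypothetical toy data: ids 2 and 3 appear in both frames,
# id 1 only in x, id 4 only in y.
x <- data.frame(id = c(1, 2, 3), a = c("x1", "x2", "x3"))
y <- data.frame(id = c(2, 3, 4), b = c("y2", "y3", "y4"))

# Inner join (the default): keep only ids present in both frames.
inner <- merge(x, y, by = "id")

# Full outer join: keep every id from either frame, filling gaps with NA.
outer <- merge(x, y, by = "id", all = TRUE)

# Left join: keep every row of x, matched or not.
left <- merge(x, y, by = "id", all.x = TRUE)

# Right join: keep every row of y, matched or not.
right <- merge(x, y, by = "id", all.y = TRUE)

# Cross join: by = NULL gives the Cartesian product of the rows.
# The shared column name is disambiguated as id.x and id.y.
cross <- merge(x, y, by = NULL)

sapply(list(inner = inner, outer = outer, left = left,
            right = right, cross = cross), nrow)
```

In the left join, the row with id 1 has NA in column b because y has no matching row; the right join does the same for id 4 in column a.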

cbind() and rbind() combine data frames by column and by row, respectively.
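A minimal sketch of both functions follows. The data frames df1, df2, and extra are hypothetical and exist only for this illustration. Unlike merge(), neither function matches rows on a key: rbind() stacks rows and lines up columns by name, while cbind() places columns side by side and pairs rows purely by position.

```r
# Hypothetical toy data with identical column names.
df1 <- data.frame(id = 1:2, x = c("a", "b"))
df2 <- data.frame(id = 3:4, x = c("c", "d"))

# A one-column frame with the same number of rows as df1.
extra <- data.frame(score = c(90, 85))

# rbind: stack df2 under df1; the column names must match.
stacked <- rbind(df1, df2)

# cbind: append the columns of extra to df1; rows are paired by position.
wide <- cbind(df1, extra)

dim(stacked)  # rows, columns after stacking
names(wide)   # columns after binding side by side
```

Because cbind() ignores keys, it is only safe when both data frames are already in the same row order; otherwise merge() is the better choice.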