Question:
Hi,
I have a database in which I used tidyr::spread which resulted in information spread over several lines. I would like to merge the different lines into one. Example:
df <- data.frame(obs = c(1,1,1,2),
msgA = c("A", NA, NA, "A" ),
msgB = c(NA,"B",NA, NA),
msgC = c(NA,NA,"C", NA))
#O que eu tentei:
library(dplyr)
df %>%
group_by(obs) %>%
mutate(msg = paste(msgA, msgB, msgC, sep=", ")) %>%
select(-c(2:4))
# Resultado que eu queria:
#
# > df
# obs msg
# 1 A, B, C
# 2 A
Thanks for the help!
Answer:
Tidyr has a unite()
function that can be used for this.
df %>%
tidyr::unite(msg, msgA:msgC, sep = ", ")
obs msg
1 1 A, NA, NA
2 1 NA, B, NA
3 1 NA, NA, C
4 2 A, NA, NA
It so happens that unite()
is designed for data.frame tidy , so the result it generates is not quite what you wanted.
To generate the expected result it is better to use data.frame
before spread()
and use group_by()
combined with summarise()
.
library(tidyverse)
tidy <- df %>%
gather(tipo, msg, -obs) %>%
filter(!is.na(msg))
tidy %>%
group_by(obs) %>%
summarise(msg = paste(msg, collapse = ", "))
## # A tibble: 2 x 2
## obs msg
## <dbl> <chr>
## 1 1 A, B, C
## 2 2 A