Question:
Suppose I have the following dataset:
vendas<-c(100,140,200,300,20,1000,200,3000)
vendedor<-c("A","B","A","B","C","C","D","A")
regiao<-c("Norte","Sul","Leste","Norte","Sul","Norte","Leste","Sul")
df<-data.frame(vendedor,regiao,vendas)
I would like to see total sales by seller and by seller/region.
How do I generate a new data frame with this aggregated data for analysis?
Answer:
Hadley Wickham recently created dplyr, a much faster version of plyr with a more intuitive syntax (links to CRAN and RStudio's blog post).
In dplyr it would look like this:
library(dplyr)
group_by(df,vendedor)%>%summarise(Total=sum(vendas))
  vendedor Total
1        A  3300
2        B   440
3        C  1020
4        D   200
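Since the question asks for a new data frame rather than printed output, here is a minimal sketch of storing the result in an object. The name total_por_vendedor is hypothetical, chosen only for illustration, and the data is repeated so the snippet runs on its own.

```r
library(dplyr)

# Same data as in the question
vendas   <- c(100, 140, 200, 300, 20, 1000, 200, 3000)
vendedor <- c("A", "B", "A", "B", "C", "C", "D", "A")
regiao   <- c("Norte", "Sul", "Leste", "Norte", "Sul", "Norte", "Leste", "Sul")
df <- data.frame(vendedor, regiao, vendas)

# total_por_vendedor is a hypothetical name: one row per seller,
# with that seller's summed sales in the Total column
total_por_vendedor <- group_by(df, vendedor) %>%
  summarise(Total = sum(vendas))

# The stored result can then be used like any other data frame
print(total_por_vendedor)
```

If a plain data.frame is needed instead of dplyr's own table class, wrapping the result in as.data.frame() should do it.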
And grouping by seller and region:
group_by(df,vendedor, regiao)%>%summarise(Total=sum(vendas))
  vendedor regiao Total
1        A  Leste   200
2        A  Norte   100
3        A    Sul  3000
4        B  Norte   300
5        B    Sul   140
6        C  Norte  1000
7        C    Sul    20
8        D  Leste   200
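When grouping by more than one column, the result of summarise() may itself still be grouped (by vendedor here), which can surprise later steps of an analysis. The sketch below assumes dplyr 1.0.0 or later, where summarise() accepts a .groups argument; the name total_por_vendedor_regiao is hypothetical.

```r
library(dplyr)

# Same data as in the question
vendas   <- c(100, 140, 200, 300, 20, 1000, 200, 3000)
vendedor <- c("A", "B", "A", "B", "C", "C", "D", "A")
regiao   <- c("Norte", "Sul", "Leste", "Norte", "Sul", "Norte", "Leste", "Sul")
df <- data.frame(vendedor, regiao, vendas)

# Assumption: dplyr >= 1.0.0, which provides the .groups argument.
# .groups = "drop" returns an ungrouped table with one row per
# seller/region combination.
total_por_vendedor_regiao <- group_by(df, vendedor, regiao) %>%
  summarise(Total = sum(vendas), .groups = "drop")

print(total_por_vendedor_regiao)
```

On older versions of dplyr, calling ungroup() on the result is an alternative way to get the same ungrouped table.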
Edit: The latest version of dplyr uses the %>% operator from magrittr.
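For readers new to the %>% operator mentioned in the edit: it passes the value on its left as the first argument of the call on its right. The following sketch illustrates this with the data from the question; the names com_pipe and sem_pipe are hypothetical.

```r
library(dplyr)  # dplyr makes the %>% operator from magrittr available

# Same data as in the question
vendas   <- c(100, 140, 200, 300, 20, 1000, 200, 3000)
vendedor <- c("A", "B", "A", "B", "C", "C", "D", "A")
regiao   <- c("Norte", "Sul", "Leste", "Norte", "Sul", "Norte", "Leste", "Sul")
df <- data.frame(vendedor, regiao, vendas)

# Piped form: the grouped table is fed into summarise() as its first argument
com_pipe <- group_by(df, vendedor) %>% summarise(Total = sum(vendas))

# Nested form: the same two calls written without the pipe
sem_pipe <- summarise(group_by(df, vendedor), Total = sum(vendas))

# Both forms should hold the same totals per seller
print(all.equal(as.data.frame(com_pipe), as.data.frame(sem_pipe)))
```

The piped form reads left to right in the order the steps happen, which tends to help once more steps are chained.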