Add rownames as column using dplyr

Question:

I would like to do something that is quite simple using common R syntax, but using the dplyr package.

The task is basically to add the row.names of a data.frame object as a column in that same object. Using mtcars as an example, this could be done like this:

dados <- mtcars
dados$nomes <- row.names(mtcars)

I would like to do something like

dados <- mtcars %>% mutate(nomes=row.names(.))

But this code fails with Error: unsupported type for column 'nomes' (NILSXP), so I am clearly doing something wrong.

I would like to know if there is a way to solve this "problem".

Answer:

Note: update in magrittr 1.5

As of magrittr 1.5, the dot (.) of the %>% operator works inside nested calls. It therefore correctly replaces the dot within row.names(.), and the example now works without any modification.

dados <- mtcars %>% mutate(nomes=row.names(.))
head(dados)
   mpg cyl disp  hp drat    wt  qsec vs am gear carb             nomes
1 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4         Mazda RX4
2 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4     Mazda RX4 Wag
3 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1        Datsun 710
4 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1    Hornet 4 Drive
5 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2 Hornet Sportabout
6 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1           Valiant

Answer given before magrittr 1.5

Complementing Rogério's answer.

What is %>% doing?

If you look at the source code of %>%, roughly speaking, it creates a new environment and stores the left side in that environment. It then takes the call on the right side, modifies it, and evaluates the modified call inside this new environment.

For example, if you run mtcars %>% mutate(., nomes = row.names(.)), the left side is mtcars and the right side is mutate(., nomes = row.names(.)):

lhs <- substitute(mtcars)
rhs <- substitute(mutate(., nomes = row.names(.)))

A new environment and a name for the left side are created:

env <- new.env(parent = parent.frame())
nm <- paste(deparse(lhs), collapse = "")

The left side is saved in the new environment with the created name:

env[[nm]] <- eval(lhs, env)

# To check that the object was created:
head(env$mtcars)

Now the dots in the right-side call have to be replaced. The part that identifies where the dots are is:

dots <- c(FALSE, vapply(rhs[-1], identical, quote(.), 
                              FUN.VALUE = logical(1)))

But note that it only goes through the first level of the call.

dots
            nomes 
FALSE  TRUE FALSE 

When replacing, therefore, only the first dot is replaced:

rhs[dots] <- rep(list(as.name(nm)), sum(dots))
e <- rhs
e
# note that only the first dot was replaced
mutate(mtcars, nomes = row.names(.))

So, when you run the function in the env environment, as there is no object named ".", the error will occur:

eval(e, env)
Error in row.names(.) : object '.' not found

The solution would be for the replacement to happen at every level of the call. For example, if we change the other dot in e manually:

e[[3]][[2]] <- as.name("mtcars")

Now it works:

eval(e, env)
# output omitted because it is long
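
The steps above can be put together in one script that runs without any packages. This is only a sketch: transform() from base R stands in for dplyr's mutate() (an assumption made to avoid dependencies), and the names first_try and fixed are illustrative.

```r
# Base-R sketch of the walk-through; transform() stands in for mutate().
lhs <- quote(mtcars)
rhs <- quote(transform(., nomes = row.names(.)))

# New environment holding the left side under its own name
env <- new.env(parent = globalenv())
nm <- paste(deparse(lhs), collapse = "")
env[[nm]] <- eval(lhs, env)

# Only the top-level arguments of the call are compared with `.`
dots <- c(FALSE, vapply(as.list(rhs)[-1], identical, quote(.),
                        FUN.VALUE = logical(1)))

# Replace the dots that were found (first level only)
e <- rhs
for (i in which(dots)) e[[i]] <- as.name(nm)
print(e)

# The nested dot is still there, so evaluation fails; keep the message
first_try <- tryCatch(eval(e, env), error = function(err) conditionMessage(err))
print(first_try)

# Replacing the nested dot by hand makes the call work
e[[3]][[2]] <- as.name(nm)
fixed <- eval(e, env)
print(head(fixed$nomes))
```

The error message in first_try depends on the session's language, which is why the sketch captures it rather than hard-coding it.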

Why did it work with %.% using `__prev`?

The function behind %.% is chain_q. To see its code, type dplyr:::chain_q.

function (calls, env = parent.frame()) 
{
    if (length(calls) == 0) 
        return()
    if (length(calls) == 1) 
        return(eval(calls[[1]], env))
    e <- new.env(parent = env)
    e$`__prev` <- eval(calls[[1]], env)
    for (call in calls[-1]) {
        new_call <- as.call(c(call[[1]], quote(`__prev`), as.list(call[-1])))
        e$`__prev` <- eval(new_call, e)
    }
    e$`__prev`
}

Note that the function creates a new environment called e and stores the result of the first call in the chain under the name `__prev` (e$`__prev` <- eval(calls[[1]], env)). Every later call is evaluated inside e, which is why the result of the previous command can be accessed through `__prev`.
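
To see this mechanism in isolation, here is a minimal sketch that copies the loop from chain_q into a hypothetical function, chain_sketch, and feeds it base-R calls (transform() and head() are stand-ins chosen so that no packages are needed). Because every step is evaluated in e, a nested reference to `__prev` resolves to the previous result.

```r
# Hypothetical helper mirroring the core of dplyr:::chain_q shown above
chain_sketch <- function(calls, env = parent.frame()) {
  e <- new.env(parent = env)
  # Result of the first call is stored under the name `__prev`
  e$`__prev` <- eval(calls[[1]], env)
  for (call in calls[-1]) {
    # Insert `__prev` as the first argument of each subsequent call
    new_call <- as.call(c(call[[1]], quote(`__prev`), as.list(call[-1])))
    e$`__prev` <- eval(new_call, e)
  }
  e$`__prev`
}

calls <- list(
  quote(mtcars),
  quote(transform(nomes = row.names(`__prev`))),  # nested use of `__prev`
  quote(head(3))
)
res <- chain_sketch(calls)
print(res[, c("mpg", "cyl", "nomes")])
```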

Hacking %>% (for illustration only)

If we write a function that replaces every dot, like this one (based on a question on the English Stack Overflow):

convert.call <- function(x, replacement) {
  if (is.call(x)) as.call(lapply(x, convert.call, replacement=replacement)) else
    if (identical(x, quote(.))) as.name(replacement) else
      x
}
# testing
expr <- substitute(mean(exp(sqrt(.)), .))
convert.call(expr, "x")
# mean(exp(sqrt(x)), x)
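
Applied to the call from the earlier walk-through, the same function also replaces the nested dot. The sketch below repeats the definition of convert.call so that it runs on its own, and again uses base R's transform() as a stand-in for mutate() (an assumption made only to avoid package dependencies).

```r
# Recursive replacement of `.` at every level of a call
convert.call <- function(x, replacement) {
  if (is.call(x)) {
    as.call(lapply(x, convert.call, replacement = replacement))
  } else if (identical(x, quote(.))) {
    as.name(replacement)
  } else {
    x
  }
}

rhs <- quote(transform(., nomes = row.names(.)))
e2 <- convert.call(rhs, "mtcars")
print(e2)

# Both dots now refer to mtcars, so the call can be evaluated directly
res2 <- eval(e2)
print(head(res2$nomes))
```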

Then we can hack the definition of %>% so that all the dots are replaced:

`%>%` <- function (lhs, rhs) 
{
  convert.call <- function(x, replacement) {
    if (is.call(x)) as.call(lapply(x, convert.call, replacement=replacement)) else
      if (identical(x, quote(.))) as.name(replacement) else
        x
  }
  
  lhs <- substitute(lhs)
  rhs <- substitute(rhs)
  if (is.call(rhs) && identical(rhs[[1]], quote(`(`))) 
    rhs <- eval(rhs, parent.frame(), parent.frame())
  if (!any(is.symbol(rhs), is.call(rhs), is.function(rhs))) 
    stop("RHS should be a symbol, a call, or a function.")
  env <- new.env(parent = parent.frame())
  nm <- paste(deparse(lhs), collapse = "")
  nm <- if (nchar(nm) < 9900 && (is.call(lhs) || is.name(lhs))) 
    nm
  else "__LHS"
  env[[nm]] <- eval(lhs, env)
  if (is.function(rhs)) {
    res <- withVisible(rhs(env[[nm]]))
  }
  else if (is.call(rhs) && deparse(rhs[[1]]) == "function") {
    res <- withVisible(eval(rhs, parent.frame(), parent.frame())(eval(lhs, 
                                                                      parent.frame(), parent.frame())))
  }
  else {
    if (is.symbol(rhs)) {
      if (!exists(deparse(rhs), parent.frame(), mode = "function")) 
        stop("RHS appears to be a function name, but it cannot be found.")
      e <- call(as.character(rhs), as.name(nm))
    }
    else {
      e <- convert.call(rhs, nm)
    }
    res <- withVisible(eval(e, env))
  }
  if (res$visible) 
    res$value
  else invisible(res$value)
}

Note that mtcars %>% mutate(., nomes = row.names(.)) now works. I include this only to explain what is going on; I would not recommend using the hacked version of %>%, since it can cause bugs elsewhere. For example, as written it no longer inserts the left side as the first argument, so you have to write the dots explicitly every time, as in mtcars %>% filter(., cyl==4) %>% mutate(., nomes = row.names(.)).

dplyr does not necessarily keep row.names across operations

One last note: neither dplyr nor data.table keeps row.names intact during operations. Notice that dplyr resets the row.names in filter, and data.table resets them when you convert the data.frame:

mt_dplyr <- filter(mtcars, cyl==4)
row.names(mt_dplyr)
 [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10" "11"

mt_dt <- data.table(mtcars)
row.names(mt_dt)
 [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10" "11" "12" "13" "14" "15" "16" "17" "18" "19" "20" "21" "22"
[23] "23" "24" "25" "26" "27" "28" "29" "30" "31" "32"

So, in the end, if row.names contains relevant information, it seems safer to turn it into a column before manipulating the data further.
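
As a small illustration of that advice, the sketch below uses only base R: it copies the row names into a column first and filters afterwards, so the names survive even though the row names themselves are reset. The object name quatro is just illustrative.

```r
dados <- mtcars
dados$nomes <- row.names(dados)   # keep the names as an ordinary column
row.names(dados) <- NULL          # optional: reset to the default 1..n row names

# Any later subsetting keeps the nomes column intact
quatro <- dados[dados$cyl == 4, ]
print(head(quatro[, c("nomes", "mpg", "cyl")]))
```

If the tibble package is available, tibble::rownames_to_column(mtcars, var = "nomes") should do the same conversion in a single pipe-friendly call.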

An alternative "workaround": creating your own mutate function that has a local row_names

One possible workaround is to write your own mutate that stores a row_names vector in its parent environment and then runs dplyr's mutate in that environment. Inside a pipe, that parent environment is the one created by %>%; if you call the function on its own, it will be the global environment, so be careful, because row_names will be created or overwritten there. To use the row names, just refer to the row_names object. Let's call our version mutate2:

mutate2 <- function(x, ...){
  assign("row_names", row.names(x), parent.frame())
  eval(substitute(mutate(x, ...)), parent.frame())
}

mtcars %>% mutate2(z = cyl^2, nomes=row_names) %>% filter(z==36)

   mpg cyl  disp  hp drat    wt  qsec vs am gear carb  z          nomes
1 21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4 36      Mazda RX4
2 21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4 36  Mazda RX4 Wag
3 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1 36 Hornet 4 Drive
4 18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1 36        Valiant
5 19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4 36       Merc 280
6 17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4 36      Merc 280C
7 19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6 36   Ferrari Dino