Inconsistent numeric format

Question:

I am an experienced SAS programmer, but new to R. I am working with RStudio Version 0.99.903 – © 2009-2016 RStudio, Inc. and Windows 8. I have the following question:

  1. The file "a_us" has 4 numeric and 3 alphanumeric fields as follows:

    str(a_us) # command to show the file structure

    'data.frame': 1039992 obs. of 7 variables:
     $ 'dsSisOriginario'             : chr "Construcard" "Construcard" "Construcard" "Construcard" ...
     $ 'nrContrato'                  : chr "000002160000023630," "000002160000116565," "000002160000225267," ...
     $ 'vlCredInadimplenciaLancadoCa': num 9570 4455 6791 2678 4483 ...
     $ 'dtCredInadimplenciaEntradaCa': chr "03/11/2002" "17/10/2004" "25/03/2007" "15/12/2006" ...
     $ 'vlCredFcvsCessao'            : num 271 216 329 130 217 ...
     $ PercentPagoCarteira           : num 0.0283 0.0484 0.0484 0.0484 ...
     $ QtdCredDiasAtraso             : int 5110 4396 3507 3607 2768 2407 2640 ...

  2. Using summary(a_us), the result is as expected: the statistics for the numeric variables are correct.

  3. However, when I try to compute, for example, the mean (mean()) or run any other quantitative procedure, such as hist(), on these same numeric variables ('vlCredInadimplenciaLancadoCa', 'vlCredFcvsCessao', PercentPagoCarteira, QtdCredDiasAtraso), it works only for PercentPagoCarteira and QtdCredDiasAtraso. For the others ('vlCredInadimplenciaLancadoCa', 'vlCredFcvsCessao'), I get the message:

> mean(a_us$'vlCredFcvsCessao')
[1] NA
Warning message:
In mean.default(a_us$vlCredFcvsCessao) :
  argument is not numeric or logical: returning NA

Even though the variable is numeric, I get this error message!

Can anyone give me a hint of what is going on and how to resolve it?

Answer:

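When your data was imported, some column names ended up with literal single quotes as part of the name. This prevents the $ operator from working the way you'd expect: a_us$'vlCredFcvsCessao' looks for a column called vlCredFcvsCessao (without quotes), finds none, and returns NULL, so mean() returns NA with a warning. The best way to fix it is to re-import the data. But you can also refer to the column like this: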

mean(a_us$`'vlCredFcvsCessao'`)

Note the backticks (grave accents) surrounding the column name, including its single quotes.

See this simple example:

> df <- dplyr::data_frame("'colunacomaspas'" = 1, colunasemaspas = 1)
> str(df)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   1 obs. of  2 variables:
 $ 'colunacomaspas': num 1
 $ colunasemaspas  : num 1
> mean(df$`'colunacomaspas'`)
[1] 1
> mean(df$'colunacomaspas')
[1] NA
Warning messages:
1: Unknown column 'colunacomaspas' 
2: In mean.default(df$colunacomaspas) :
  argument is not numeric or logical: returning NA

Note that the str output in your question makes the same distinction: the column names that contain quotes are printed with them, and the others are printed without.
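To see why the failing call returns NA rather than an error, it can help to look at what `$` hands to mean(). The following is a minimal sketch in base R; `a_us_demo` and its values are a hypothetical miniature of the real table, built with check.names = FALSE so the quotes stay in the name. It also shows `[[ ]]` with the full name as a string, which should work as an alternative to backticks.

```r
# Hypothetical miniature of a_us: one column name contains literal single quotes.
a_us_demo <- data.frame(
  "'vlCredFcvsCessao'" = c(271, 216, 329),
  PercentPagoCarteira  = c(0.0283, 0.0484, 0.0484),
  check.names = FALSE  # keep the quotes in the name instead of sanitising it
)

# The first name includes the quote characters themselves.
print(names(a_us_demo))

# No column has the bare name, so $ returns NULL and mean(NULL) gives NA.
print(is.null(a_us_demo$'vlCredFcvsCessao'))

# Both of these match the full name, quotes included.
print(mean(a_us_demo$`'vlCredFcvsCessao'`))
print(mean(a_us_demo[["'vlCredFcvsCessao'"]]))
```

The `[[ ]]` form is handy when the column name is held in a variable, since backticks only work with a name typed literally in the code.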

Another way to fix it would be to rename the columns by removing these quotes. Example:

> names(df) <- gsub("'", "", names(df))
> mean(df$colunacomaspas)
[1] 1
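As for re-importing, the question does not say how the file was read, so the following is only a sketch under the assumption that the source is a CSV file whose header wraps some names in single quotes and that it is read with read.csv. The temporary file and its contents are hypothetical. The idea is to declare the single quote as a quoting character so that it is stripped from the names on import.

```r
# Hypothetical CSV whose header wraps one name in single quotes.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("'vlCredFcvsCessao',PercentPagoCarteira",
             "271,0.0283",
             "216,0.0484"), csv_path)

# With the default quote character (double quote only), the single quotes
# are kept as part of the column name.
kept <- read.csv(csv_path, check.names = FALSE)
print(names(kept))

# Declaring the single quote as a quoting character strips it on import.
stripped <- read.csv(csv_path, quote = "\"'", check.names = FALSE)
print(names(stripped))
print(mean(stripped$vlCredFcvsCessao))

unlink(csv_path)  # remove the temporary file
```

If the real file was imported with a different function, the equivalent option (if any) will have a different name, so check that function's documentation before relying on this.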