Read file with non-ascii format [à = "<U+00E0>"]

Question:

I'm reading a file in R called roubo2.rds. It's in R's own serialization format, so I can't open it in Excel. I can import the data into a variable, but inside the records the text contains non-ASCII codes (Unicode? UTF-8?). I've searched to find out what these codes are, and I've also tried exporting to CSV, but it doesn't work. Can anyone shed some light on this? I need text that appears with codes such as "<U+00E0>" (for example, in "Armed Assault") to display the proper accented characters instead.

The R code I'm using to read it is:

dados <- readRDS("roubo2.rds")

The file can be downloaded here: https://www.dropbox.com/s/yp9r0tln0vwdvej/robo2.rds?dl=0. I'm running RStudio on a Mac; sessionInfo() output below.

sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.12.1 (Sierra)

Answer:

To export to .csv in the correct encoding, just add the fileEncoding argument to write.csv() (or to write.csv2(), which uses ";" as the separator and "," as the decimal mark).

The code would look like this:

dados <- readRDS('roubo2.rds')

write.csv2(dados, 'roubo2.csv', fileEncoding = 'UTF-8')
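If the exported file still shows text like "<U+00E0>", the strings may contain those escape sequences as literal text rather than as real characters, and fileEncoding cannot fix that. A minimal sketch for that case follows; decode_u_escapes is a hypothetical helper written for illustration (not part of base R or any package), and the sample string is made up. It is worth confirming with grepl() first that the escapes really are literal text.

```r
# Hypothetical helper (not from any package): replaces literal "<U+XXXX>" text
# with the Unicode character it names. Only useful if the strings really
# contain those escape sequences as text, which grepl() can confirm first.
decode_u_escapes <- function(x) {
  x <- as.character(x)  # also accepts factor columns
  m <- gregexpr("<U\\+[0-9A-Fa-f]{4,6}>", x)
  regmatches(x, m) <- lapply(regmatches(x, m), function(codes) {
    hex <- substr(codes, 4, nchar(codes) - 1)  # "<U+00E0>" -> "00E0"
    # Convert each hex code point to its UTF-8 character
    vapply(strtoi(hex, 16L), intToUtf8, character(1))
  })
  x
}

# Made-up sample string with literal escape sequences
amostra <- "Assalto <U+00E0> m<U+00E3>o armada"
grepl("<U+", amostra, fixed = TRUE)  # TRUE: the escape is literal text
decoded <- decode_u_escapes(amostra)
```

It could be applied to a column with something like dados$tipo <- decode_u_escapes(dados$tipo) before calling write.csv2().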

I also suggest converting factor variables to character, since you are working with text. To do this, just use as.character(). Example:

dados$tipo <- as.character(dados$tipo)
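With several factor columns, converting them one at a time gets tedious. A minimal sketch that converts every factor column in one pass; factors_to_character is a hypothetical helper, and the small data frame is made up to stand in for dados.

```r
# Hypothetical helper: converts every factor column of a data frame to
# character and leaves all other columns untouched.
factors_to_character <- function(df) {
  is_fac <- vapply(df, is.factor, logical(1))
  df[is_fac] <- lapply(df[is_fac], as.character)
  df
}

# Small made-up example standing in for `dados`
exemplo <- data.frame(tipo = factor(c("a", "b")), n = 1:2)
exemplo <- factors_to_character(exemplo)
str(exemplo)
```

On the real data this would be dados <- factors_to_character(dados).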

When reading a .csv file, you can do this directly by passing stringsAsFactors = FALSE to read.csv() (or to read.csv2() for files written with write.csv2()).
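A sketch of the reading side, assuming the CSV was written with write.csv2() (so read.csv2(), which expects ";" as the separator, is the matching reader). The file is a temporary one with made-up contents.

```r
# Write a small made-up table the same way the answer does, then read it back
path <- tempfile(fileext = ".csv")
write.csv2(data.frame(tipo = c("furto", "roubo"), n = 1:2), path,
           row.names = FALSE, fileEncoding = "UTF-8")

# stringsAsFactors = FALSE keeps text columns as character vectors;
# fileEncoding tells R how the file's bytes are encoded
lido <- read.csv2(path, stringsAsFactors = FALSE, fileEncoding = "UTF-8")
str(lido)
```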


Finally, it would be good to use version 3.2 of R, since the vast majority of packages are developed for this version.
