Question:
Why do I get NA when I do this character to POSIXlt conversion?
library(bReeze)
data(winddata)
tempo <- winddata[,1]
tempo[1:6] # Preview
# [1] "06.05.2009 11:20" "06.05.2009 11:30" "06.05.2009 11:40"
tempo_POSIX <- strptime(tempo, format = "%d.%m.%Y %H:%M")
sum(is.na(tempo_POSIX))
# [1] 6
valores_NA <- which(is.na(tempo_POSIX))
tempo[valores_NA]
# [1] "18.10.2009 00:00" "18.10.2009 00:10" "18.10.2009 00:20"
# [4] "18.10.2009 00:30" "18.10.2009 00:40" "18.10.2009 00:50"
As you can see, the values that were converted to NA look perfectly normal: they follow the same format as the others.
Interestingly, the error DOES NOT OCCUR if I pass a value to the tz argument:
tempo_POSIX <- strptime(tempo, format = "%d.%m.%Y %H:%M", tz = "GMT")
sum(is.na(tempo_POSIX))
# [1] 0
My system information is:
> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=Portuguese_Brazil.1252 LC_CTYPE=Portuguese_Brazil.1252
[3] LC_MONETARY=Portuguese_Brazil.1252 LC_NUMERIC=C
[5] LC_TIME=Portuguese_Brazil.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] bReeze_0.4-0
loaded via a namespace (and not attached):
[1] tools_3.0.2
Answer:
The as.POSIXlt help contains the following passage, which points out that converting between the date-time classes requires a time zone, that the conversion validates the time in that zone, and that this can cause problems around daylight saving time (DST) transitions:
Character input is first converted to class "POSIXlt" by strptime: numeric input is first converted to "POSIXct". Any conversion that needs to go between the two date-time classes requires a time zone: conversion from "POSIXlt" to "POSIXct" will validate times in the selected time zone. One issue is what happens at transitions to and from DST
When you do strptime(tempo, format = "%d.%m.%Y %H:%M"), you are converting the object to the POSIXlt class.
class(tempo_POSIX)
[1] "POSIXlt" "POSIXt"
But when you do is.na(), you are converting to POSIXct. Note that the is.na.POSIXlt method uses the as.POSIXct function:
is.na.POSIXlt
function (x)
is.na(as.POSIXct(x))
<bytecode: 0x26519a14>
<environment: namespace:base>
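To reproduce this without the bReeze data, here is a minimal sketch. It assumes your system time zone behaves like "America/Sao_Paulo" (my guess, it is not stated in the question) and passes it explicitly so the result does not depend on the machine. According to the help page, what as.POSIXct returns for a nonexistent time is OS-specific, so you may see NA (as in the question) or a shifted time.

```r
# Two of the timestamps that came back as NA in the question.
x   <- c("18.10.2009 00:00", "18.10.2009 00:30")
fmt <- "%d.%m.%Y %H:%M"

# strptime() returns a POSIXlt object.
# "America/Sao_Paulo" is an assumption standing in for the asker's system zone.
lt_sp  <- strptime(x, format = fmt, tz = "America/Sao_Paulo")
lt_gmt <- strptime(x, format = fmt, tz = "GMT")
class(lt_sp)

# The conversion to POSIXct is the step that validates the times in the zone.
# This is the same conversion that is.na() performs on a POSIXlt object
# in the version shown above.
ct_sp  <- as.POSIXct(lt_sp)
ct_gmt <- as.POSIXct(lt_gmt)

is.na(ct_sp)   # OS-specific: NA is expected where the nonexistent time is rejected
is.na(ct_gmt)  # FALSE FALSE, because GMT has no DST gap
```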
Daylight saving time in Brazil in 2009 started on October 18th at 00:00. That is, under daylight saving time there is no 00:00 in Brazil on October 18, 2009: after 23:59 on the 17th, the clock jumped straight to 01:00.
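You can see the jump directly by stepping through that night in 10-minute intervals. This is a small sketch, assuming the "America/Sao_Paulo" zone (not stated in the question) and a time zone database that includes the 2009 Brazilian rules:

```r
# Start from a valid local time shortly before the transition.
# "America/Sao_Paulo" is an assumed zone, chosen for illustration.
start <- as.POSIXct("2009-10-17 23:50", tz = "America/Sao_Paulo")

# Add 10 minutes of real elapsed time at each step.
steps <- seq(start, by = "10 min", length.out = 3)

# The local wall clock skips the whole 00:00 to 00:59 hour.
format(steps, "%H:%M")
# [1] "23:50" "01:00" "01:10"
```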
So when you do is.na(), you are converting the date to POSIXct, and this conversion validates the supplied date against your time zone (probably Brazil/São Paulo: since you didn't specify a time zone, the system's is used). And since there is no 00:00 on October 18th in this time zone, the result is (correctly, but unexpectedly) NA. When you pass GMT, or any other time zone in which that date and time exist (like London), the conversion works normally. That's why it worked with tz = "GMT" (and why it worked for Djongs, whose system must be set to a different time zone).
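If the timestamps were recorded on a clock that never switches to DST (an assumption on my part, the question doesn't say how the data were logged), one way around the problem is to parse them in a zone that has no DST at all. The sketch below uses "GMT", as in the question, and "Etc/GMT+3", which is a fixed UTC-3 offset (the sign in the Etc/ names is inverted). Which zone is right depends on how the data were actually recorded.

```r
# Timestamps around the 2009 transition, in the question's format.
x   <- c("17.10.2009 23:50", "18.10.2009 00:00", "18.10.2009 00:10")
fmt <- "%d.%m.%Y %H:%M"

# Zones without DST: every wall-clock time exists, so nothing becomes NA.
t_gmt   <- as.POSIXct(x, format = fmt, tz = "GMT")
t_fixed <- as.POSIXct(x, format = fmt, tz = "Etc/GMT+3")  # fixed UTC-3, assumed

anyNA(t_gmt)    # FALSE
anyNA(t_fixed)  # FALSE

# The series stays evenly spaced at 10 minutes across midnight.
diff(t_fixed)
```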