2

I am having difficulty converting dates from Excel (read from a CSV file) to R. Any help is much appreciated.

Here is what I'm doing:

df$date = as.Date(df$excel.date, format = "%d/%m/%Y")

However, only some of the dates get converted; the rest become NA. Here is the output of:

head(df$date)
[1] NA           NA           NA           "0006-01-05" NA           NA 

The first 5 entries imported from the CSV file are as follows:

7/28/05
7/28/05
12/16/05
5/1/06
4/21/05

and here is the output of:

head(df$excel.date)
[1] 7/28/05  7/28/05  12/16/05 5/1/06   4/21/05  1/25/07 
1079 Levels: 1/1/00 1/1/02 1/1/97 1/10/96 1/10/99 1/11/04 1/11/94 1/11/96 1/11/97 1/11/98 ... 9/9/99

str(df)
.
.
$ excel.date   : Factor w/ 1079 levels "1/1/00","1/1/02",..: 869 869 288 618 561 48 710 1022 172 241 ...
3
  • First convert the column to character with df$date = as.character(df$excel.date), and after that run df$date = as.Date(df$date, format = "%m/%d/%y"). Commented Apr 12, 2014 at 22:57
  • Thanks, I did that, but the result is still wrong: df = read.csv("df.csv", as.is=TRUE) > df$date = as.character(df$excel.date) > head(df$date) [1] "7/28/05" "7/28/05" "12/16/05" "5/1/06" "4/21/05" "1/25/07" > df$date = as.Date(df$date, format = "%d/%m/%y") > head(df$date) [1] NA NA NA "2006-01-05" NA NA Commented Apr 12, 2014 at 23:15
  • It should be "%m/%d/%y", not "%d/%m/%y": 7/28/05 is 28 July. Commented Apr 13, 2014 at 0:26

2 Answers

2

First of all, make sure the dates in your file are in an unambiguous format with full four-digit years (not just the last two digits). %Y means "year with century" (see ?strptime), but your dates don't seem to include the century. So you can either use %y (at your own risk, see ?strptime again) or reformat the dates in Excel.

It is also a good idea to use as.is=TRUE with read.csv when reading in these data -- otherwise character vectors are converted to factors which can lead to unexpected results.
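As a rough illustration of the difference, here is a minimal sketch that reads the same values both ways. The inline `csv` string is only a stand-in for the real file, and the names `asFactor` and `asText` are made up for this example.

```r
# Inline stand-in for the CSV file; the column name follows the question.
csv <- "excel.date\n7/28/05\n7/28/05\n12/16/05\n5/1/06\n4/21/05\n1/25/07"

# Same data read twice: once as a factor, once as plain character strings.
asFactor <- read.csv(text = csv, stringsAsFactors = TRUE)
asText   <- read.csv(text = csv, as.is = TRUE)

class(asFactor$excel.date)  # "factor"
class(asText$excel.date)    # "character"

# A typical surprise with factors: numeric conversion returns the
# internal level codes, not anything related to the dates themselves.
as.integer(asFactor$excel.date)

# With character data (and the month-first, 2-digit-year format),
# the conversion is straightforward.
as.Date(asText$excel.date, format = "%m/%d/%y")
```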

And on Windows it may be easier to use RODBC to read the dates directly from the xls or xlsx file.

(edit)

The following may give a hint:

> as.Date("13/04/2014", format= "%d/%m/%Y")
[1] "2014-04-13"
> as.Date(factor("13/04/2014"), format= "%d/%m/%Y")
[1] "2014-04-13"
> as.Date(factor("13/04/14"), format= "%d/%m/%Y")
[1] "14-04-13"
> as.Date(factor("13/04/14"), format= "%d/%m/%y")
[1] "2014-04-13"

(So as.Date can actually handle factors. The magic happens in the as.Date.factor method, defined as:

function (x, ...)  as.Date(as.character(x), ...)

It is not a good idea to represent dates as factors, but in this case it is not a problem either. I think the problem is Excel, which saves your years as 2-digit numbers in the CSV file without asking you.)

-

The ?strptime help file says that using %y is platform specific, so you can get different results on different machines. If there is no way to go back to the source and save the CSV in a better way, you might use something like the following:

x <- c("7/28/05", "7/28/05", "12/16/05", "5/1/06", "4/21/05", "1/25/07")

repairExcelDates <- function(x, yearcol = 3, fmt = "%m/%d/%Y") {
  # split each date into its numeric parts: one row per date
  x <- do.call(rbind, lapply(strsplit(as.character(x), "/"), as.numeric))
  year <- x[, yearcol]
  if (any(year > 99)) stop("don't know what to do")
  # compare against the current 2-digit year ("%y", not "%Y"):
  # if year <= current 2-digit year then add 2000, otherwise add 1900
  currentYear <- as.numeric(format(Sys.Date(), "%y"))
  x[, yearcol] <- ifelse(year <= currentYear, year + 2000, year + 1900)
  x <- apply(x, 1, paste, collapse = "/")
  as.Date(x, format = fmt)
}

repairExcelDates(x)
# [1] "2005-07-28" "2005-07-28" "2005-12-16" "2006-05-01" "2005-04-21"
# [6] "2007-01-25"
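An alternative sketch, in case splitting the strings feels heavy: let %y do the parsing and then shift any date that lands in the future back by one century. R's documentation for %y describes values 00 to 68 as being prefixed by 20 and 69 to 99 by 19, so only the 20xx guesses can overshoot. The helper name `parseTwoDigitYear` and its `cutoff` argument are made up for this example, and it assumes none of the real dates lie after `cutoff`.

```r
# Hypothetical helper: parse 2-digit-year dates with %y, then move any
# date that falls after `cutoff` back by 100 years.
parseTwoDigitYear <- function(x, fmt = "%m/%d/%y", cutoff = Sys.Date()) {
  d <- as.Date(as.character(x), format = fmt)
  inFuture <- !is.na(d) & d > cutoff
  if (any(inFuture)) {
    lt <- as.POSIXlt(d[inFuture])
    lt$year <- lt$year - 100   # POSIXlt stores years since 1900
    d[inFuture] <- as.Date(lt)
  }
  d
}

# A fixed cutoff keeps the example reproducible.
parsed <- parseTwoDigitYear(c("7/28/05", "1/10/96", "9/9/99", "1/1/30"),
                            cutoff = as.Date("2014-04-13"))
format(parsed)
# [1] "2005-07-28" "1996-01-10" "1999-09-09" "1930-01-01"
```

Whether a future-dated value should really be pushed back a century depends on the data, so the cutoff is worth choosing deliberately.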

3 Comments

The dates seem to be unambiguous in the Excel file (4-digit year), and I also added as.is = TRUE; still, here is the result: df = read.csv("df.csv", as.is=TRUE) > df$date = as.character(df$excel.date) > head(df$date) [1] "7/28/05" "7/28/05" "12/16/05" "5/1/06" "4/21/05" "1/25/07" > df$date = as.Date(df$date, format = "%d/%m/%y") > head(df$date) [1] NA NA NA "2006-01-05" NA NA
Check your CSV file in Notepad: is the year a 4-digit number? It is probably the way Excel saved the CSV file; if there were 4 digits there, R would read them. I don't know how to change the way Excel saves dates in CSV format; there might be something about it in the Excel help. Or try RODBC, e.g. milanor.net/blog/?p=779
I ended up importing the file directly from excel (data.xlsx), thanks to the link you suggested. And now it converts the dates just fine. Thanks.
1

Your data is formatted as Month/Day/Year so

df$date = as.Date(df$excel.date, format = "%d/%m/%Y")

should be (month first, and %y rather than %Y because the years have only two digits)

df$date = as.Date(df$excel.date, format = "%m/%d/%y")
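To see why the order of %d and %m matters, here is a small sketch using the six values from the question. It uses %y since the sample years have two digits; the variable names `dayFirst` and `monthFirst` are made up for this example.

```r
x <- c("7/28/05", "7/28/05", "12/16/05", "5/1/06", "4/21/05", "1/25/07")

# Day-first: the second field is read as a month, so 28, 16, 21 and 25 are
# invalid and yield NA. Only "5/1/06" parses, and as 5 January, not 1 May.
dayFirst <- as.Date(x, format = "%d/%m/%y")
format(dayFirst)
# [1] NA           NA           NA           "2006-01-05" NA           NA

# Month-first: every value parses.
monthFirst <- as.Date(x, format = "%m/%d/%y")
format(monthFirst)
# [1] "2005-07-28" "2005-07-28" "2005-12-16" "2006-05-01" "2005-04-21"
# [6] "2007-01-25"
```

The day-first result matches the NA pattern reported in the comments under the question.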
