In R, I am reading a tab-separated file that contains comment lines, using

read.data.raw = read.csv(inputfile, sep='\t', header=F, comment.char='')

The file looks like this:

#comment line 1
data 1<tab>x<tab>y
#comment line 2
data 2<tab>x<tab>y
data 3<tab>x<tab>y

Now I extract the uncommented lines using

comment_ind = grep( '^#.*', read.data.raw[[1]])
read.data = read.data.raw[-comment_ind,]

Which leaves me:

 data 1<tab>x<tab>y
 data 2<tab>x<tab>y
 data 3<tab>x<tab>y

I then modify this data with a separate script that preserves the number of rows and columns. I would like to put the modified rows back into the originally read data (keeping the user's comments) and return it to the user like this:

#comment line 1
modified data 1<tab>x<tab>y
#comment line 2
modified data 2<tab>x<tab>y
modified data 3<tab>x<tab>y

Since the extracted read.data keeps the original row names (row.names(read.data)), I tried

original.read.data[as.numeric(row.names(read.data)),] = read.data

But that didn't work, and I got a bunch of NAs.

Any ideas?

  • How exactly did it change the data? If it turned factors into characters, or similar changes in data types, that would account for the NAs. Commented Aug 27, 2012 at 19:52
  • Also, you're going to get NAs after the comment line in any column if you force the column to be numeric. R wasn't really meant to read in comment data along with the data frame, though you could find ways around it. In any case, you'd have to be more specific about the type of data you read in and how you modified it. Commented Aug 27, 2012 at 19:58
  • The data I'm reading in is 5-column formatted data: columns 1-3 are numeric, columns 4-5 are character strings. In most cases I am replacing values in specific cells of the data frame (example data[5,8]=NA) and sometimes replacing a whole column (example data[[3]]=1:100). I forced R to read the comment lines because, when I set comment.char to '#', I lost them. By getting R to read the file that way, I can extract the uncommented lines and leave the commented lines behind. At least that was the logic behind my choices. Commented Aug 27, 2012 at 20:22
  • Why not edit your original question to include a fully reproducible example? Commented Aug 27, 2012 at 20:24
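
The data-type point raised in the comments can be shown with a small sketch. It assumes the columns were read as factors, which was the default for read.csv before R 4.0.0; the toy vector and the names f, g and h are made up for illustration and are not from the question's file.

```r
# Hypothetical toy column standing in for the first column of the file.
f <- factor(c("#comment line 1", "data 1", "data 2"))

# Assigning a value that is not an existing level produces NA
# (R also emits an "invalid factor level" warning, silenced here).
g <- f
suppressWarnings(g[2] <- "modified data 1")
is.na(g[2])

# Converting to character first avoids the problem.
h <- as.character(f)
h[2] <- "modified data 1"
h[2]
```

If this is what happened, reading the file with stringsAsFactors = FALSE (or converting the affected columns with as.character) before assigning back would be one way to avoid the NAs.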

1 Answer

Does this do what you want?

read.data.raw <- structure(list(V1 = structure(c(1L, 3L, 2L, 4L, 5L),
   .Label = c("#comment line 1", "#comment line 2", "data 1", "data 2", 
   "data 3"), class = "factor"), V2 = structure(c(1L, 2L, 1L, 2L, 2L), 
   .Label = c("", "x"), class = "factor"), V3 = structure(c(1L, 2L, 1L,
   2L, 2L), .Label = c("", "y"), class = "factor")), .Names = c("V1", 
   "V2", "V3"), class = "data.frame", row.names = c(NA, -5L))

# row indices of the comment lines
comment_ind <- grep('^#.*', read.data.raw[[1]])
read.data <- read.data.raw[-comment_ind, ]
# modify V1 (gsub() returns a character vector, so V1 is no longer a factor)
read.data$V1 <- gsub("data", "DATA", read.data$V1)
# rbind() the comment rows back on, then order() by the original row names
new.data <- rbind(read.data.raw[comment_ind, ], read.data)
new.data <- new.data[order(as.numeric(rownames(new.data))), ]
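
As an alternative to rbind() plus order(), the modified rows can also be written straight back into their original positions with a logical index. This is only a sketch under assumptions: the input text txt is a toy reconstruction of the file layout in the question, the names raw, dat and is_comment are made up here, and the paste() call stands in for the separate modification script.

```r
# Toy input mirroring the layout in the question (an assumption; tabs are "\t").
txt <- "#comment line 1\ndata 1\tx\ty\n#comment line 2\ndata 2\tx\ty\ndata 3\tx\ty"

# Read everything, comments included, keeping strings as character vectors.
raw <- read.csv(text = txt, sep = "\t", header = FALSE,
                comment.char = "", stringsAsFactors = FALSE)

# Logical mask of comment rows; the data rows are its complement.
is_comment <- grepl("^#", raw[[1]])
dat <- raw[!is_comment, ]

# Stand-in for the separate modification step (same number of rows/columns).
dat$V1 <- paste("modified", dat$V1)

# Write the modified rows back into their original positions.
raw[!is_comment, ] <- dat
```

The result could then be written out with write.table(raw, outputfile, sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE), where outputfile is a hypothetical path. Note that the comment lines would probably come out with trailing tabs for their empty fields, so the output may not be byte-identical to the input.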

1 Comment

Ah! Sorting by the row names, I didn't think of that! Works like a charm! THANKS!
