
I have a CSV file with 1.5 million rows and two columns, name and email. I want to write a program so that, after reading the file into R, the data is split into separate CSV files of 5000 rows each.

Maybe I can do this with a loop: take rows 1 to 5000 and save them as project1.csv, then rows 5001 to 10000 as project2.csv, then rows 10001 to 15000 as project3.csv, and so on, all in my working directory. Any suggestions?
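The loop described in the question can be written directly in base R. This is a minimal sketch, assuming the data has already been read into a data frame (for example with read.csv()); the helper name write_chunks and its arguments are hypothetical, and the demo uses made-up data so that it runs on its own.

```r
# Hypothetical helper (not from either answer): write `df` in chunks of `n` rows
# to <prefix>1.csv, <prefix>2.csv, ... inside `dir`, using an explicit for loop.
write_chunks <- function(df, n = 5000, prefix = "project", dir = ".") {
  if (nrow(df) == 0) return(invisible(character(0)))  # nothing to write
  starts <- seq(1, nrow(df), by = n)                   # 1, n + 1, 2n + 1, ...
  files <- character(length(starts))
  for (k in seq_along(starts)) {
    last <- min(starts[k] + n - 1, nrow(df))           # the final chunk may be shorter
    files[k] <- file.path(dir, paste0(prefix, k, ".csv"))
    write.csv(df[starts[k]:last, , drop = FALSE], files[k], row.names = FALSE)
  }
  invisible(files)                                     # paths of the files written
}

# Usage on a small made-up data frame: 12 rows in chunks of 5 gives 3 files
# (rows 1-5, 6-10 and 11-12).
df_demo <- data.frame(name  = paste0("user", 1:12),
                      email = paste0("user", 1:12, "@example.com"))
out <- write_chunks(df_demo, n = 5, dir = tempdir())
```

With the real data, calling write_chunks(df1) after reading the file would write project1.csv, project2.csv, ... to the working directory. The split()-based answers avoid the manual index bookkeeping, but the loop makes the row ranges explicit.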

2 Answers


Assuming that df1 is the data.frame to be split every 5000 rows, with each piece saved to a new file: we create a grouping index from the row sequence and split the dataset into a list (lst). We then loop over the list elements with lapply(...) and write each one to a new file with write.csv.

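n <- 5000
# Row i goes to group (i - 1) %/% n + 1: rows 1-5000 -> 1, rows 5001-10000 -> 2, ...
lst <- split(df1, (seq_len(nrow(df1)) - 1) %/% n + 1L)
# Write each chunk to project1.csv, project2.csv, ... in the working directory;
# invisible() suppresses printing the list of NULLs returned by lapply()
invisible(lapply(seq_along(lst), function(i)
   write.csv(lst[[i]], file = paste0('project', i, '.csv'), row.names = FALSE)))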
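To see how the grouping index in the code above assigns rows to chunks, here is a small worked example with toy sizes (5 rows, n = 2) on made-up data. The arithmetic is the same expression the answer uses; only the sizes differ.

```r
# Worked example of the grouping index used with split(), on toy sizes:
# 5 rows, n = 2. Row i is assigned to group (i - 1) %/% n + 1.
n <- 2
idx <- (seq_len(5) - 1) %/% n + 1L
print(idx)
#> [1] 1 1 2 2 3

df_demo <- data.frame(name  = letters[1:5],
                      email = paste0(letters[1:5], "@example.com"))
lst <- split(df_demo, idx)
print(sapply(lst, nrow))   # rows per chunk, named by group
#> 1 2 3
#> 2 2 1

# With the question's 1.5 million rows and n = 5000 the index runs from 1 to 300,
# so 300 files are written.
n_files <- max((seq_len(1.5e6) - 1) %/% 5000 + 1L)
print(n_files)
#> [1] 300
```

Subtracting 1 before the integer division makes rows 1 to n land in group 1 rather than splitting off row n on its own; the last group simply holds whatever rows remain.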

3 Comments

Thanks, awesome, it worked... you saved my day, almost!
What if I have 7 columns instead of 2?
@Sandy2511 The same code works, because the split depends only on the rows, not on the columns.

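An alternative using purrr and readr:

library(purrr)  # provides iwalk() and re-exports the %>% pipe
n <- 5000
# iwalk() passes each chunk as .x and its list name ("1", "2", ...) as .y
split(df1, (seq_len(nrow(df1)) - 1) %/% n + 1L) %>%
  iwalk(~ readr::write_csv(.x, paste0("project", .y, ".csv")))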
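Whichever approach is used, a quick way to confirm that nothing was lost is to read the chunk files back and compare the row total with the original. This is a sketch under stated assumptions: check_chunks is a hypothetical helper, the files are assumed to follow the project<number>.csv naming used above, and the demo writes made-up data into a fresh temporary directory using base R (so purrr and readr are not required to run it).

```r
# Round-trip check: count the chunk files in a directory and the rows they hold.
# `check_chunks` is a hypothetical helper; it assumes files are named <prefix><number>.csv.
check_chunks <- function(out_dir, expected_rows, prefix = "project") {
  files <- list.files(out_dir, pattern = paste0("^", prefix, "[0-9]+\\.csv$"),
                      full.names = TRUE)
  rows <- vapply(files, function(f) nrow(read.csv(f)), integer(1))
  list(n_files = length(files), total_rows = sum(rows),
       ok = sum(rows) == expected_rows)
}

# Demo in a fresh temporary directory with made-up data (12 rows, n = 5),
# written the same way as the first answer.
out_dir <- tempfile("chunks_")
dir.create(out_dir)
df1 <- data.frame(name  = paste0("user", 1:12),
                  email = paste0("user", 1:12, "@example.com"))
n <- 5
lst <- split(df1, (seq_len(nrow(df1)) - 1) %/% n + 1L)
invisible(lapply(seq_along(lst), function(i)
  write.csv(lst[[i]], file.path(out_dir, paste0("project", i, ".csv")),
            row.names = FALSE)))
res <- check_chunks(out_dir, expected_rows = nrow(df1))
str(res)
```

For the question's data, expected_rows would be 1500000 and n_files should come out as 300. Note that list.files() sorts names as text, so project10.csv comes before project2.csv; this does not affect the counts, but matters if the files are later processed in order.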
