
I have created a series of commands in R that get a job done using a specific URL. I would like to iterate the series of commands over a list of URLs that reside in a separate text file. How do I call the list into the commands one at a time?

I do not know what the proper terminology for this programming action is. I've looked into scripting and batch programming, but that is not what I want to do.

# Packages used
library(RCurl)
library(XML)
library(rlist)

# URL that comes from list
URL <- "http://www.urlfromlist.com"

# Load URL
theurl <- getURL(URL,.opts = list(ssl.verifypeer = FALSE) )

# Read the tables
tables <- readHTMLTable(theurl)

# Create a list
tables <- list.clean(tables, fun = is.null, recursive = FALSE)

# Convert the list to a data frame
df <- do.call(rbind.data.frame, tables)

# Save dataframe out as a csv file ('dynamicname' is a placeholder for the output filename)
write.csv(df, file = dynamicname, row.names = FALSE)

The above code is what I am doing. The first variable needs to take a different URL from the list each time, rinse and repeat. Thanks!
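To illustrate, something like the sketch below is what I am after: each URL from the file fed through the same commands in turn (urls.txt is a hypothetical file with one URL per line, and the numbered output filenames are only an illustration).

# Sketch: iterate the commands above over every URL in a text file.
# 'urls.txt' is a hypothetical file with one URL per line.
urls <- readLines("urls.txt")

for (i in seq_along(urls)) {
  theurl <- getURL(urls[i], .opts = list(ssl.verifypeer = FALSE))
  tables <- readHTMLTable(theurl)
  tables <- list.clean(tables, fun = is.null, recursive = FALSE)
  df <- do.call(rbind.data.frame, tables)

  # One csv per URL; the naming scheme here is only an illustration.
  write.csv(df, file = paste0("table_", i, ".csv"), row.names = FALSE)
}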

UPDATED CODE: this runs, but it is still not writing out any files.

# Function to pull tables from list of URLs
URLfunction<- function(x){
  # URL that comes from list
  URL <- x

  # Load URL
  theurl <- RCurl::getURL(URL,.opts = list(ssl.verifypeer = FALSE) )

  # Read the tables
  tables <- XML::readHTMLTable(theurl)

  # Create a list
  tables <- rlist::list.clean(tables, fun = is.null, recursive = FALSE)

  # Convert the list to a data frame
  df <- do.call(rbind,tables)

  # Split date and time column out
  df2 <- separate(df, "Date / Time", c("Date", "Time"), sep = " ")

  # Fill the missing column with text, in this case shapename
  shapename <- qdapRegex::ex_between(URL, "ndxs", ".html")
  df2$Shape <- shapename

  # Save dataframe out as a csv file
  write.csv(result, paste0(shapename, '.csv', row.names=FALSE))

  return(df2)
}

URL <- read.csv("PATH", header = FALSE)
purrr::map_df(URL, URLfunction) ## Also tried purrr::map_df(URL[,1], URLfunction) 
  • Is the list of URLs in a text document on your local computer or is it at a URL? Commented May 15, 2019 at 0:19
  • Hi Andrew, yes, the URLs are in a csv. Commented May 15, 2019 at 11:09

1 Answer


If I understand your question correctly, my answer should work for your problem.

Libraries used

library(RCurl)
library(XML)
library(rlist)
library(purrr)

Define the function

URLfunction<- function(x){
# URL that comes from list
URL <- x

# Load URL
theurl <- RCurl::getURL(URL,.opts = list(ssl.verifypeer = FALSE) )

# Read the tables
tables <- XML::readHTMLTable(theurl)

# Create a list
tables <- rlist::list.clean(tables, fun = is.null, recursive = FALSE)

# Convert the list to a data frame
df <- do.call(rbind,tables)

# Return the data frame (write.csv is handled outside the function; see below)
return(df)
}

Assume you have data like below (I am not sure exactly what your data looks like):

URL <- c("https://stackoverflow.com/questions/56139810/how-to-call-a-script-in-another-script-in-r",
         "https://stackoverflow.com/questions/56122052/labelling-points-on-a-highcharter-scatter-chart/56123057?noredirect=1#comment98909916_56123057")

result<- purrr::map(URL, URLfunction) 
result <- do.call(rbind, result)
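Note that purrr::map_df() collapses these two steps, the map and the row bind, into a single call; the list and csv versions below use it that way.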

write.csv is the last step:

write.csv(result, file = dynamicname, row.names = FALSE)

If you want one csv file per URL, move the write.csv into URLfunction.
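For example, moved inside the function it could look like the sketch below (deriving the output filename from the URL is an assumption; any scheme that yields one name per URL works):

URLfunction <- function(x){
  theurl <- RCurl::getURL(x, .opts = list(ssl.verifypeer = FALSE))
  tables <- XML::readHTMLTable(theurl)
  tables <- rlist::list.clean(tables, fun = is.null, recursive = FALSE)
  df <- do.call(rbind, tables)

  # One csv per URL; building the filename from the URL is an assumption.
  dynamicname <- paste0(gsub("[^A-Za-z0-9]", "_", x), ".csv")
  write.csv(df, file = dynamicname, row.names = FALSE)

  return(df)
}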

Additional

List version

URL <- list("https://stackoverflow.com/questions/56139810/how-to-call-a-script-in-another-script-in-r",
        "https://stackoverflow.com/questions/56122052/labelling-points-on-a-highcharter-scatter-chart/56123057?noredirect=1#comment98909916_56123057")


result<- purrr::map_df(URL, URLfunction) 

>result

   asked    today yesterday
1 viewed 35 times      <NA>
2 active    today      <NA>
3 viewed     <NA>  34 times
4 active     <NA>     today

CSV

URL <- read.csv("PATH",header = FALSE)

result<- purrr::map_df(URL[,1], URLfunction) 

>result

   asked    today yesterday
1 viewed 35 times      <NA>
2 active    today      <NA>
3 viewed     <NA>  34 times
4 active     <NA>     today
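One caveat with the csv route: on R versions before 4.0, read.csv() defaults to stringsAsFactors = TRUE, so the URL column may arrive as a factor rather than character (an assumption about the R version in use). Converting explicitly is safer:

URL <- read.csv("PATH", header = FALSE, stringsAsFactors = FALSE)
result <- purrr::map_df(as.character(URL[,1]), URLfunction)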

Here is an edited version of your code:


URLfunction<- function(x){
  # URL that comes from list
  URL <- x
  
  # Load URL
  theurl <- RCurl::getURL(URL,.opts = list(ssl.verifypeer = FALSE) )
  
  # Read the tables
  tables <- XML::readHTMLTable(theurl)
  
  # Create a list
  tables <- rlist::list.clean(tables, fun = is.null, recursive = FALSE)
  
  # Convert the list to a data frame
  df <- do.call(rbind,tables)
  
  # Split date and time column out
  df2 <- tidyr::separate(df, "Date / Time", c("Date", "Time"), sep = " ")
  
  # Fill the missing column with text, in this case shapename

  shapename <- unlist(qdapRegex::ex_between(URL, "ndxs", ".html"))
  # qdapRegex::ex_between returns a list; as a list column it could not be
  # saved to csv, so unlist() is added.

  df2$Shape <- shapename
  
  # Save dataframe out as a csv file
  write.csv(df2, paste0(shapename, '.csv'), row.names=FALSE)
# There were two errors here.
# First, the data frame is named 'df2', not 'result', so 'result' was changed to 'df2'.
# Second, row.names is an argument of write.csv, not of paste0.
  return(df2)
}

After defining the above function:

URL = c("nuforc.org/webreports/ndxsRectangle.html",
        "nuforc.org/webreports/ndxsRound.html")

RESULT = purrr::map_df(URL, URLfunction) ## Also tried purrr::map_df(URL[,1], URLfunction) 

Finally, I get the results below:

1. Rectangle.csv and Round.csv files in the save path (e.g. your working directory).
2. A row-bound data frame (2011 x 8), shown below:
> RESULT[1,]
    Date  Time     City State     Shape  Duration
1 5/2/19 00:20 Honolulu    HI Rectangle 3 seconds
                                                                                                                             Summary
1 Several of rectangles connected in different LED like colors.  Such as red, green, blue, etc. ;above Waikiki. ((anonymous report))
  Posted
1 5/9/19

5 Comments

Thanks. I have a csv file of all the URLs, but I could try to use the vector shown above.
Thanks, I am able to run the function, but it is not achieving the result I want: to run the function over each URL separately and then write the file out. Please see the edited code above. Any ideas?
Sure, it's just a list of URLs; see the first two below. However, I think the code is compiling all the URLs into one data frame instead of processing each URL separately and writing out one CSV per URL. I need to make changes to each URL's data individually before writing it out, and that cannot happen while URL holds all the links at once. Let me know. nuforc.org/webreports/ndxsRectangle.html, nuforc.org/webreports/ndxsRound.html
From your reply, I added a new edited version (changed a little from your code). It worked well for me!
Thanks for those edits. It worked for me too. I really appreciate it!
