
I have a list of URLs in column A.

I am trying to extract a date and time from each URL. I have been using the IMPORTXML formula, but I hit the maximum number of formulas I can use, because each URL needs its own IMPORTXML formula.

I want the date and time to populate column B, adjacent to the URLs.

Here is an example URL:

https://www.punters.com.au/form-guide/bunbury_171923/nti-maiden_987656/#Overview

Here is the date and time field I am interested in.

Any help is highly appreciated.

[Image: date and time location]

  • About one importxml formula per url, can you provide the sample Spreadsheet including the sample URLs? Because if the HTML structure is different for each URL, the script is required to be modified for each URL. I worry about this. Commented Mar 6, 2020 at 23:56
  • I tried importxml and it says xml cannot be parsed. The html structure should not change. Here are a few more urls you can try. Commented Mar 7, 2020 at 4:58
  • punters.com.au/form-guide/doomben_171972/… punters.com.au/form-guide/doomben_171972/… Commented Mar 7, 2020 at 4:59
  • Thank you for replying. I proposed a sample script. Could you please confirm it? If I misunderstood your question and that was not the result you want, I apologize. Commented Mar 7, 2020 at 6:09

1 Answer

  • You want to retrieve the date and time from the URLs like https://www.punters.com.au/form-guide/doomben_171972/millers-swim-school-maiden-plate_987944/ and https://www.punters.com.au/form-guide/doomben_171972/paddyfest-march-14-qtis-three-years-old-maiden-plate_987945/.
  • The HTML structure is constant for each URL.
  • You want to achieve this using Google Apps Script.

If my understanding is correct, how about this answer? Please think of this as just one of several possible answers.

Unfortunately, a display value like "Thursday, 13 Feb at 4:57pm" cannot be retrieved directly. Instead, the script first retrieves the unix time from the page and then converts that value to a string in this format.
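As a minimal sketch of that two-step idea in plain JavaScript: the helper names `extractUnixSeconds` and `toDate` and the HTML fragment are made up for illustration, and the sketch assumes the page exposes the timestamp as seconds in a `data-utime` attribute, as the regular expression in the sample script suggests.

```javascript
// Hypothetical helper: pull the unix time (in seconds) out of a page's HTML.
// Assumes the timestamp sits in a data-utime attribute.
function extractUnixSeconds(html) {
  var match = /data-utime="(\d+)"/.exec(html);
  return match ? Number(match[1]) : null; // null when no timestamp is found
}

// Convert unix seconds to a JavaScript Date, which counts milliseconds.
function toDate(unixSeconds) {
  return new Date(unixSeconds * 1000);
}

// Made-up HTML fragment, for illustration only.
var sampleHtml = '<abbr class="form-header__timestamp timestamp time12" data-utime="1581573420">x</abbr>';
var seconds = extractUnixSeconds(sampleHtml);
var date = toDate(seconds);
```

In Apps Script, the resulting `Date` can then be passed to `Utilities.formatDate` together with the spreadsheet time zone to build the display string.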

Sample script:

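Before you run the script, please set the sheet name. This script assumes that the URLs are in column "B" and writes the results to column "C". If your URLs are in column "A", adjust the column numbers passed to getRange accordingly.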

function myFunction() {
  var sheetName = "Sheet1";  // Please set the sheet name.

  var ss = SpreadsheetApp.getActiveSpreadsheet();
  var sheet = ss.getSheetByName(sheetName);
  var urls = sheet.getRange(1, 2, sheet.getLastRow(), 1).getValues();
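  // muteHttpExceptions lets a failed request return its response instead of throwing, so the status check below can handle it.
  var requests = urls.map(([url]) => ({url: url, muteHttpExceptions: true}));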
  var res = UrlFetchApp.fetchAll(requests);
  var timezone = ss.getSpreadsheetTimeZone();
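  var dateTimes = res.map(e => {
    if (e.getResponseCode() == 200) {
      // Capture only the digits of the unix time (in seconds) from the data-utime attribute.
      var m = /<abbr class="form-header__timestamp timestamp time12" data-utime="(\d+)"/.exec(e.getContentText());
      if (!m) return [""];  // The timestamp element was not found in this page.
      var date = new Date(Number(m[1]) * 1000);
      // "h:mm" gives an unpadded hour, e.g. 4:57pm rather than 04:57pm.
      return [Utilities.formatDate(date, timezone, "EEEE, dd MMM 'at' h:mm") + Utilities.formatDate(date, timezone, "a").toLowerCase()];
    }
    return [""];
  });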
  sheet.getRange(1, 3, dateTimes.length, 1).setValues(dateTimes);
}

Result:

When you run the script, the retrieved date values are put in column "C".

Note:

  • If fetchAll cannot be used in your situation, I think that it is required to use fetch in the loop.
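A rough sketch of that loop-based fallback follows. The helper name `fetchOneByOne` is hypothetical, and the fetch function is passed in as a parameter so that the loop itself does not depend on Apps Script; in a real script it could be `url => UrlFetchApp.fetch(url, {muteHttpExceptions: true})`.

```javascript
// Hypothetical helper: fetch each URL in turn instead of calling fetchAll.
// fetchFn is supplied by the caller; in Apps Script it would wrap UrlFetchApp.fetch.
function fetchOneByOne(urls, fetchFn) {
  var responses = [];
  for (var i = 0; i < urls.length; i++) {
    try {
      responses.push(fetchFn(urls[i]));
    } catch (err) {
      // Keep one entry per URL so the results stay aligned with the rows.
      responses.push(null);
    }
  }
  return responses;
}
```

Each non-null response can then go through the same status check and timestamp extraction as in the sample script. Sequential requests are slower than fetchAll, so this is only worth using when fetchAll is not an option.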


7 Comments

You sir, are a legend! Worked perfectly. had to change timezone of the spreadsheet to get exact time
I have been having a few issues. The code only works if there are blank columns after the URL. For example, I had urls in col B and Col C was left blank for dates and had some other calculations in Col D and so on. But this caused issues with the code when run threw error showing unknown error http://. Can you please improvise the code. I made a few changes to suit my need as below.
@Joe Abraham Thank you for replying. I have to apologize for my poor English skill. Unfortunately, I cannot understand about your new question. But I would like to resolve it. So, can you explain about it using a sample Spreadsheet? By this, I would like to think of your new question and solution. If you can cooperate to resolve it, I'm glad.
No problem. Appreciate you trying to help. Link here: - docs.google.com/spreadsheets/d/…
@Joe Abraham Thank you for replying and adding the information. About the issue of your new question, the reason of issue is that there are the values which have no URLs. And in your shared Spreadsheet, 1st row is not required to be used. So please modify var urls and var requests and the last line in my script to var urls = sheet.getRange(2, 2, sheet.getLastRow() - 1, 1).getValues();, var requests = urls.map(([url]) => ({url: url})).filter(e => e.url); and sheet.getRange(2, 5, dateTimes.length, 1).setValues(dateTimes);.
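One caveat with filtering out the empty cells: if blank cells can occur between URLs, the filtered results no longer line up with the original rows when they are written back. A possible way around this, sketched in plain JavaScript with the hypothetical helpers `collectUrls` and `alignToRows`, is to remember each URL's row position and spread the results back over all rows afterwards.

```javascript
// Hypothetical helper: collect non-empty URLs and remember their row positions.
// values is a 2D array such as the one returned by getValues() for one column.
function collectUrls(values) {
  var urls = [];
  var positions = [];
  values.forEach(function (row, i) {
    var url = String(row[0]).trim();
    if (url) {
      urls.push(url);
      positions.push(i);
    }
  });
  return {urls: urls, positions: positions};
}

// Hypothetical helper: spread the results back over all rows,
// leaving an empty string wherever the row had no URL.
function alignToRows(rowCount, positions, results) {
  var out = [];
  for (var i = 0; i < rowCount; i++) out.push([""]);
  positions.forEach(function (pos, k) {
    out[pos] = [results[k]];
  });
  return out;
}
```

The array returned by `alignToRows` has one row per original cell, so it can be written with a single setValues call over the same row range that the URLs were read from.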