
I am trying to loop over an array of URLs, fetch each page, and then check the URLs found in each response. I want the outer loop to move on to the next page only after all the inner requests for the current page have completed, so that the output looks like this:

Checking Urls in : https://stackoverflow.com
status 200
done
....
....
Checking Urls in : https://example.com
.....
.....
.....

Total links #20

But in my code the outer loop finishes before the requests are done.

const getHrefs = require('get-hrefs');
const async = require("async");
var req = require('request-promise');
var errors = require('request-promise/errors');

var pageUrls = ['https://stackoverflow.com','https://www.exsample.com'];
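var testUrls = '';
// Counters used below; they must be declared before the loops run.
var linkCount = 0;
var brokenLinks = 0;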

async.map(pageUrls, function(pageUrl,callback){
    //process itemA
    req(pageUrl, function (err, response, body) {
        console.log(pageUrl, " STATUS: ", response.statusCode);
        if ( err){
            return callback(err);
        } 
        else {
        testUrls= getHrefs(response.body);

        async.map(testUrls, function(testUrl,callback1){
             linkCount++;
               req(testUrl).catch(errors.StatusCodeError, function (reason) {
                        brokenLinks++;
                        console.log("URL: "+ testUrl+ "reason: "+ reason.statusCode);
                    })
                    .catch(errors.RequestError, function (reason) {

                    }).finally(function () {


                    });

                return  callback1();
             },function(err){

                 callback();

              }) ;
        }
    })

} ,function(err){
    console.log("OuterLoopFinished");
    console.log('*************************************************************' + '\n');
    console.log('Check complete! || Total Links: ' + linkCount + ' || Broken Links: ' + brokenLinks);
    console.log('*************************************************************');

});
  • You're calling the callback1 too early. Try putting it inside the finally block. Commented Jul 21, 2017 at 2:32
  • Thanks for your reply. In that case it never reaches the final callback function (OuterLoopFinished). Commented Jul 21, 2017 at 4:01
  • The call of callback1 marks the end of the inner async.map execution (for each testUrls items). The completion of async.map(testUrls... calls callback which marks the end of the outer async.map execution (for each pageUrl items). The completion of async.map(pageUrl... calls the outerloopfinished function... Commented Jul 21, 2017 at 4:45
  • One thing to note is the async.map callback function (the function(err){ "Outerloopfinished".. } and the function(err) { callback(); }) is called when all iteratee functions have finished, OR AN ERROR OCCURS (see: caolan.github.io/async/docs.html#map). It's possible that your first IF branch (if ( err){ return callback(err); }) invoked an early termination... Commented Jul 21, 2017 at 4:50
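The point made in the comments (signal completion only after each inner request has settled) can be sketched with native promises instead of the async library. This is a minimal sketch, not the asker's code: `pages`, `statuses`, `fetchStatus` and `checkPages` are hypothetical names, and the in-memory "web" stands in for real HTTP calls so the example runs offline.

```javascript
// Hypothetical in-memory "web": each page lists its links, each link has a status.
const pages = {
  'https://a.test': ['https://a.test/ok', 'https://a.test/missing'],
  'https://b.test': ['https://b.test/ok'],
};
const statuses = {
  'https://a.test/ok': 200,
  'https://a.test/missing': 404,
  'https://b.test/ok': 200,
};

// Stand-in for an HTTP request: resolves with a status code after a short delay.
const fetchStatus = (url) =>
  new Promise((resolve) => setTimeout(() => resolve(statuses[url]), 10));

async function checkPages(pageUrls, log = console.log) {
  let linkCount = 0;
  let brokenLinks = 0;
  for (const pageUrl of pageUrls) {
    log('Checking Urls in : ' + pageUrl);
    // The inner requests run in parallel, but the outer loop does not
    // advance until every one of them has settled.
    const results = await Promise.all(pages[pageUrl].map(fetchStatus));
    for (const status of results) {
      linkCount++;
      if (status >= 400) brokenLinks++;
    }
    log('done');
  }
  return { linkCount, brokenLinks };
}

checkPages(Object.keys(pages)).then(({ linkCount, brokenLinks }) =>
  console.log('Total Links: ' + linkCount + ' || Broken Links: ' + brokenLinks));
```

With the async library the equivalent change would be to invoke `callback1` from inside the promise chain rather than right after starting the request.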

1 Answer


I think you should rethink your approach. This makes 400 URLs to check. You should fire all the sublink requests in parallel, and then you can track the count of broken URIs per host URL. This will make your script complete faster.

const pageUrls = ['https://stackoverflow.com','https://www.google.com'];
const rp = require('request-promise');
const allRequestPromises = [];
const getHrefs = require('get-hrefs');

const checkBrokenCount = (url, host) => {
  // Return the promise so callers can wait for the check to finish.
  return rp(url).then((response) => {
    console.log('valid url', url, host);
    // other code
  })
  .catch((error) => {
    console.log('invalid url', url, host);
  });
}

pageUrls.forEach((pageUrl)=> {
  // Call all the base URLs in parallel, assuming none of them are invalid.
  allRequestPromises.push(rp({uri: pageUrl, resolveWithFullResponse: true}));
});
Promise.all(allRequestPromises).then((responses) => {
  responses.forEach((response, index) => {
    // Promise.all guarantees the order of result.
    console.log(pageUrls[index], response.statusCode);
    const testUrls= getHrefs(response.body);
    testUrls.forEach((testUrl) => {
      checkBrokenCount(testUrl, pageUrls[index]);
    });
  });
});
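The code above logs each result but never tallies them, so a final "Total Links / Broken Links" line needs the individual checks to be collected and awaited. The following is a rough sketch of one way to do that. `probe`, `isBroken` and `countBroken` are hypothetical names, and `probe` is a stub standing in for `rp(url)` so the example runs without network access.

```javascript
// Hypothetical stand-in for rp(url): rejects for URLs containing "broken".
const probe = (url) =>
  url.includes('broken')
    ? Promise.reject(new Error('404 for ' + url))
    : Promise.resolve('ok');

// Resolves to true when the URL is broken, false otherwise; never rejects,
// so one bad link cannot short-circuit Promise.all below.
const isBroken = (url) => probe(url).then(() => false, () => true);

// Checks all links of one host in parallel and tallies the results.
const countBroken = (host, links) =>
  Promise.all(links.map(isBroken)).then((flags) => ({
    host,
    total: flags.length,
    broken: flags.filter(Boolean).length,
  }));

const report = countBroken('https://a.test', [
  'https://a.test/ok',
  'https://a.test/broken-1',
  'https://a.test/broken-2',
]);
report.then((r) => console.log(r.host, 'total:', r.total, 'broken:', r.broken));
```

Because each check converts its own failure into a value, the totals are available only once every request has settled, which is the ordering the question asks for.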

2 Comments

Thanks for your response, it sounds good. But how do I skip the invalid base URLs in Promise.all?
Promise.all is fail-fast (developer.mozilla.org/en/docs/Web/JavaScript/Reference/…). If you have invalid base URLs, you cannot use Promise.all; instead, use the function we already created and call it twice: once for the base URLs, and once more if the response of the first request is 200. If this helps, don't forget to upvote/accept the answer :)
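As an alternative to the approach in the comment above, `Promise.allSettled` (available in Node.js 12.9 and later) waits for every promise and reports each outcome, so failed base URLs can be filtered out. This is only a sketch under assumptions: `fetchPage` is a hypothetical stub standing in for `rp({uri, resolveWithFullResponse: true})`, and `loadValidPages` is an illustrative name.

```javascript
// Hypothetical stand-in for rp({ uri, resolveWithFullResponse: true }).
const fetchPage = (url) =>
  url.startsWith('https://')
    ? Promise.resolve({ statusCode: 200, body: '<a href="/x"></a>' })
    : Promise.reject(new Error('Invalid URI: ' + url));

// Requests every base URL in parallel and keeps only those that succeeded.
const loadValidPages = (pageUrls) =>
  Promise.allSettled(pageUrls.map(fetchPage)).then((results) =>
    results
      // allSettled preserves input order, so index maps back to the URL.
      .map((result, index) => ({ url: pageUrls[index], result }))
      .filter(({ result }) => result.status === 'fulfilled')
      .map(({ url, result }) => ({ url, response: result.value })));

const loaded = loadValidPages(['https://a.test', 'not-a-url', 'https://b.test']);
loaded.then((pages) =>
  pages.forEach((p) => console.log(p.url, p.response.statusCode)));
```

The rejected entries are still present in the `allSettled` results (with `status === 'rejected'` and a `reason`), so they could be logged instead of silently dropped.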
