-1

I have a Javascript in my HTML code. I want navigate through the links ending with "Doc". In this HTML there is only one link, called SunnydataDoc. So I want search this string on this page and if there are existing any links ending with "Doc", I want to navigate further down in those pages. Could you please help me out in this? I've heard I can use regex and match methods in combination with Jsoup. Here my code.

<script>
    var data = {"totalRecords": 2, "sort": "name", "startIndex": 0, "dir": "asc", "records": [{"raw_name": "samia/export/sunnydata", "last_changeset": "\n  <div>\n      <pre><a title=\"ownerID:\n\nAdded tag V2.11.d50.mkt.001 for changeset 56e10a4864ff\" class=\"tooltip\" href=\"/samia/export/sunnydata/changeset/f602409eba261d749d23dc75551b2959425dfa8d\">r17:f602409eba26</a></pre>\n  </div>\n", "atom": "\n    <a title=\"Subscribe to samia/export/sunnydata atom feed\" href=\"/samia/export/sunnydata/feed/atom?api_key=e214ebea2335318bee1460a1fd33725ab3e1002e\"><i class=\"icon-rss-sign\"  style=\"color: #fa9b39\"></i></a>\n", "owner": "ownerID (Owner)", "rss": "\n    <a title=\"Subscribe to samia/export/sunnydata rss feed\" href=\"/samia/export/sunnydata/feed/rss?api_key=e214ebea2335318bee1460a1fd33725ab3e1002e\"><i class=\"icon-rss-sign\" style=\"color: #fa9b39\"></i></a>\n", "name": "\n    \n  <div style=\"white-space: nowrap; }\">\n        <a href=\"/samia/export/sunnydata\">\n\n        <span title=\"Mercurial repository\"><i class=\"icon-hg\" style=\"color: #316293; font-size: 14px;\"></i></span>\n\n      <span style=\"margin: 0px 8px 0px 8px\"></span>\n    Sunnydata\n    </a>\n  </div>\n", "last_rev_raw": 17, "state": "\n  <div>\n        <div class=\"btn btn-mini btn-success disabled\">Created</div>\n  </div>\n", "menu": "\n  <ul class=\"menu_items hidden\">\n\n    <li style=\"border-top:1px solid #003367;margin-left:18px;padding-left:-99px\"></li>\n    <li>\n       <a title=\"Summary\" href=\"/samia/export/sunnydata\">\n       <span class=\"icon\">\n           <i class=\"icon-file-text\"></i>\n       </span>\n       <span>Summary</span>\n       </a>\n    </li>\n    <li>\n       <a title=\"Changelog\" href=\"/samia/export/sunnydata/changelog\">\n       <span class=\"icon\">\n           <i class=\"icon-list-alt\"></i>\n       </span>\n       <span>Changelog</span>\n       </a>\n    </li>\n    <li>\n       <a title=\"Files\" href=\"/samia/export/sunnydata/files/tip/\">\n       <span class=\"icon\">\n           <i class=\"icon-file-alt\"></i>\n       </span>\n       <span>Files</span>\n       </a>\n    </li>\n    <li>\n       <a title=\"Fork\" href=\"/samia/export/sunnydata/fork\">\n       <span class=\"icon\">\n           <i class=\"icon-code-fork\"></i>\n       </span>\n       <span>Fork</span>\n       </a>\n    </li>\n  </ul>\n", "desc": "GHU Sunnydataimport", "last_change": "\n  <span class=\"tooltip\" date=\"2014-08-21 18:49:50\" title=\"Thu, 21 Aug 2014 18:49:50\">10 days and 16 hours ago</span>\n"}, {"raw_name": "samia/export/sunnydatadoc", "last_changeset": "\n  <div>\n      <pre><a title=\"ownerID;lt;owneremail;gt;:\n\nChangedokumentation\" class=\"tooltip\" href=\"/samia/export/sunnydataDoc/changeset/9ed1679c7a35b76e1402b540cee38000461fdfdd\">r0:9ed1679c7a35</a></pre>\n  </div>\n", "atom": "\n    <a title=\"Subscribe to samia/export/sunnydataDoc atom feed\" href=\"/samia/export/sunnydataDoc/feed/atom?api_key=e214ebea2335318bee1460a1fd33725ab3e1002e\"><i class=\"icon-rss-sign\"  style=\"color: #fa9b39\"></i></a>\n", "owner": "ownerID (Owner)", "rss": "\n    <a title=\"Subscribe to samia/export/sunnydataDoc rss feed\" href=\"/samia/export/sunnydataDoc/feed/rss?api_key=e214ebea2335318bee1460a1fd33725ab3e1002e\"><i class=\"icon-rss-sign\" style=\"color: #fa9b39\"></i></a>\n", "name": "\n    \n  <div style=\"white-space: nowrap; }\">\n        <a href=\"/samia/export/sunnydataDoc\">\n\n        <span title=\"Mercurial repository\"><i class=\"icon-hg\" style=\"color: #316293; font-size: 14px;\"></i></span>\n\n      <span style=\"margin: 0px 8px 0px 8px\"></span>\n    SunnydataDoc\n    </a>\n  </div>\n", "last_rev_raw": 0, "state": "\n  <div>\n        <div class=\"btn btn-mini btn-success disabled\">Created</div>\n  </div>\n", "menu": "\n  <ul class=\"menu_items hidden\">\n\n    <li style=\"border-top:1px solid #003367;margin-left:18px;padding-left:-99px\"></li>\n    <li>\n       <a title=\"Summary\" href=\"/samia/export/sunnydataDoc\">\n       <span class=\"icon\">\n           <i class=\"icon-file-text\"></i>\n       </span>\n       <span>Summary</span>\n       </a>\n    </li>\n    <li>\n       <a title=\"Changelog\" href=\"/samia/export/sunnydataDoc/changelog\">\n       <span class=\"icon\">\n           <i class=\"icon-list-alt\"></i>\n       </span>\n       <span>Changelog</span>\n       </a>\n    </li>\n    <li>\n       <a title=\"Files\" href=\"/samia/export/sunnydataDoc/files/tip/\">\n       <span class=\"icon\">\n           <i class=\"icon-file-alt\"></i>\n       </span>\n       <span>Files</span>\n       </a>\n    </li>\n    <li>\n       <a title=\"Fork\" href=\"/samia/export/sunnydataDoc/fork\">\n       <span class=\"icon\">\n           <i class=\"icon-code-fork\"></i>\n       </span>\n       <span>Fork</span>\n       </a>\n    </li>\n  </ul>\n", "desc": "GHU Sunnydataimport (Dokumentation)", "last_change": "\n  <span class=\"tooltip\" date=\"2014-04-25 11:03:45\" title=\"Fri, 25 Apr 2014 11:03:45\">4 months and 6 days ago</span>\n"}]};
    var myDataSource = new YAHOO.util.DataSource(data);
    myDataSource.responseType = YAHOO.util.DataSource.TYPE_JSON;

So in this example I have this link: href=\"/samia/export/sunnydataDoc\". I want to take this link and go in there with my code.

And this is my Java code.

public class JScripttest {

public static void main(String[] args) throws IOException {

    Response res = Jsoup
            .connect(
                    "url")
            .data("username", "username", "password", "password")
            .method(Method.POST).execute();
    Map<String, String> loginCookies = res.cookies();
    Document doc = Jsoup.connect("url")
            .cookies(loginCookies).get();


    Element script = doc.select("href").last();

    Pattern p = Pattern.compile("href\s=\s"([^"]+Doc)""); // Regex for the value of the href
    Matcher m = p.matcher(script.html()); // you have to use html here and NOT text! Text will drop the 'href' part

    while( m.find() )
    {
        System.out.println(m.group()); 
        System.out.println(m.group(1));
    }

    }



private static void print(String msg, Object... args) {
    System.out.println(String.format(msg, args));
}

So I get errors in "Pattern...."line

Thanks for looking.

1 Answer 1

0

This regex gets you the links that end in Doc. I'm unsure what you mean with "go in" but this should help you on your way. Group 1 contains the URL.

href\s*=\s*"([^"]+Doc)"

Regular expression visualization

Debuggex Demo

Properly escaped " in Java:

Pattern p = Pattern.compile("href\\s*=\\s*\"([^\"]+Doc)\"");
Sign up to request clarification or add additional context in comments.

11 Comments

I am using the pattern and matcher for this. So is that correct? I get errors. ` Pattern p = Pattern.compile(href\s*=\s*"([^"]+Doc)"); Matcher m = p.matcher(script.html());
You seem to be confusing Java and JavaScript. Despite the unfortunate similarity in name, they have little to do with each other. Your <script> tag makes me think you're working with JavaScript on a website, but Pattern p = Pattern.compile() is Java. Take a look here for help with regexes in JavaScript: w3schools.com/jsref/jsref_obj_regexp.asp
No I am not confusing. I program with Jsoup which is Java and having the Javascript code in the HTML code, where I parse and fetch with Jsoup. I take an example of this stackoverflow.com/questions/14904776/… and wanted to build your approach in my code.
Ah, that explains. Could you update your question with the Java code you have? It looks like you forgot tot put the regex in " and escape the " inside the regex, you can see in the link you gave, take a good look at where " and \" is placed: Pattern p = Pattern.compile("(?is)key=\"(.+?)\"");
updated. To your question with "go in", I mean I want to go inside of that "href=http:/..../...Doc" link and do more things.
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.