5

I am trying to parse HTML in order to extract all the links in it. To avoid dead links, I remove commented-out code that begins with <!-- and ends with -->. Here comes the problem: the HTML may contain some JavaScript code, for example:

<html>
<HEAD>
<SCRIPT LANGUAGE="JavaScript">
<!-- Begin
if (document.images) {
  var pic2 = new Image(); // for the inactive image
  pic2.src = "pic2.jpg";
  var title2 = new Image();
  title2.src = "title2.jpg";
  }
...
-->

and the weird thing is that the js code is commented but it still works. So, if I remove that code, the page won't behave as expected. What should I do in order to identify when I'm facing unused commented code and when that commented code is functional?

3 Answers

6

the weird thing is that the js code is commented but it still works

Those aren't comments. It is just syntax allowed inside script (and style) elements that mimics the comment syntax, so that browsers which predate script and style don't render the code as text.

What should I do in order to identify when I'm facing unused commented code and when that commented code is functional?

Write a real HTML parser, following the parsing specification, and then remove any comment nodes from the generated DOM.
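If writing a parser from scratch is too much, an existing one can stand in for it. The following is a minimal sketch, assuming Python and its standard-library `html.parser` module (the question does not say which language is being used, and this module is not a full implementation of the parsing specification). The names `LinkExtractor` and `extract_links` are made up for illustration. The parser treats the contents of script and style elements as raw text, so a `<!--` inside a script is never reported as a comment, while markup inside a real comment never produces a start tag.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Hypothetical helper: collects href/src values from real tags only."""

    def __init__(self):
        super().__init__()
        self.links = []      # attribute values found on actual elements
        self.comments = []   # text of genuine HTML comments, kept for inspection

    def handle_starttag(self, tag, attrs):
        # Called only for tags the parser actually sees as markup,
        # so anything inside <!-- ... --> never arrives here.
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)

    def handle_comment(self, data):
        # Real comments end up here; script contents do not.
        self.comments.append(data)


def extract_links(html):
    """Return (links, comments) found in the given HTML string."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links, parser.comments
```

Feeding it a page that has both a "commented" script and a commented-out link should return only the live link, with the dead one showing up in the comments list instead.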


As a dirty (but possibly quick) solution, you could just ignore comments inside elements marked as containing CDATA in the HTML 4.01 DTD.
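A rough sketch of that quick approach, assuming Python and regular expressions (the function name `strip_real_comments` and both pattern names are invented for this example): script and style elements are copied through untouched, and `<!-- ... -->` sequences are removed only from the text between them. It is deliberately naive. For instance, a comment that wraps an entire script element is left in place, and a script with no closing tag is not recognised at all.

```python
import re

# script and style are the elements whose content is CDATA in the HTML 4.01 DTD.
RAW_TEXT = re.compile(r"<(script|style)\b[^>]*>.*?</\1\s*>",
                      re.IGNORECASE | re.DOTALL)
COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def strip_real_comments(html):
    """Remove comments everywhere except inside script/style elements."""
    out = []
    pos = 0
    for match in RAW_TEXT.finditer(html):
        # Text before this script/style element: comments here are real.
        out.append(COMMENT.sub("", html[pos:match.start()]))
        # The element itself is kept verbatim, "<!--" and all.
        out.append(match.group(0))
        pos = match.end()
    # Whatever follows the last script/style element.
    out.append(COMMENT.sub("", html[pos:]))
    return "".join(out)
```

The links can then be extracted from the returned string as before, since the only `<!--` sequences left are the ones inside scripts and styles.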


1 Comment

Ok. Now things are clear to me. Thanx very much for your answer. I'll look for the best strategy.
0

the weird thing is that the js code is commented but it still works

There is nothing weird about it. The <!-- --> comments only work in HTML, not JavaScript. Your code above will still work since you've put these comments within the <script> tags. The only difference it makes is that if the user has disabled JavaScript in their browser, they won't see the code printed on the page (since HTML will parse those comments in the absence of JavaScript).

1 Comment

This doesn't answer the question (which is about identifying which <!-- and --> are comments and which are not). You're also wrong: browsers which support JS but have it disabled (as well as any browser written since about 1998 which doesn't support JS) won't render the text inside scripts. Only browsers which predate the addition of script to HTML will.
-1

You need to comment out the whole <script> block. e.g.

<!-- <script>
  ...some javascript code...
</script> -->

2 Comments

The question is asking how to identify <!--/--> sequences which are comments and which are not. It isn't asking how to comment out a script.
sorry, completely missed that part.
