I am trying to parse a page with HtmlUnit, but the HTML has a defect: table cells are closed with <?td> instead of </td>. Unfortunately I can't fix the HTML on the server side because I don't own the project, so I need to work around it.
I noticed something when I save the page to my hard drive from Chrome (right-click -> Save as), open the saved file and view its source (right-click -> View page source). Chrome has magically fixed the error in the actual HTML. In the saved page's source I see <td> <!--?td--> </td>, so it seems Chrome detected the error, commented out the bad tag and replaced it with the correct one.
Is it possible to get HtmlUnit to do something similar, either automatically or through a filter I write myself that replaces every <?td> with </td> before the response is parsed into an HtmlPage? I see that I can register my own IncorrectnessListener on the WebClient. Could something in there help? I haven't been able to figure it out, so any help would be appreciated.
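A minimal sketch of the "filter" half, under some assumptions: the class name `MalformedTdFixer` is hypothetical, and the regex assumes the broken tag may vary in case or carry trailing whitespace (for example `<?TD >`). The helper is plain JDK code with no HtmlUnit dependency. As far as I know, `IncorrectnessListener` only receives notifications and cannot rewrite markup. The usual HtmlUnit approach to altering a response before it is parsed is to subclass `WebConnectionWrapper`, override `getResponse(WebRequest)`, run the body through a helper like this, and return a new `WebResponse`. Treat that wiring as something to check against your HtmlUnit version, because the package is `com.gargoylesoftware.htmlunit` in 2.x and `org.htmlunit` in 3.x.

```java
import java.nio.charset.Charset;
import java.util.regex.Pattern;

// Hypothetical helper: rewrites the malformed closing tag "<?td>" to "</td>".
public class MalformedTdFixer {

    // Matches "<?td>" case-insensitively, allowing whitespace before ">"
    // (assumption: the server may not always emit exactly "<?td>").
    private static final Pattern BROKEN_TD_CLOSE =
            Pattern.compile("<\\?td\\s*>", Pattern.CASE_INSENSITIVE);

    /** Returns the HTML with every broken "<?td>" replaced by "</td>". */
    public static String fix(String html) {
        return BROKEN_TD_CLOSE.matcher(html).replaceAll("</td>");
    }

    /** Byte-level variant for a raw response body in the given charset. */
    public static byte[] fix(byte[] body, Charset charset) {
        return fix(new String(body, charset)).getBytes(charset);
    }

    public static void main(String[] args) {
        String broken = "<table><tr><td>a<?td><td>b<?td></tr></table>";
        // prints <table><tr><td>a</td><td>b</td></tr></table>
        System.out.println(fix(broken));
    }
}
```

`<?td>` and `</td>` are both five ASCII characters, so the substitution itself does not change the body length. That matters if the original `Content-Length` header is copied onto the rewritten response.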