I am trying to write a program that parses a file containing simple HTML into a tree showing the hierarchy of the tags. Ultimately, each node would contain a tag (e.g. p, h, ul) and its text. Much of this is pretty simple, and I am planning on using JTree to show the final output. What I am having difficulty with is walking through the markup and building an initial tree of the tags without losing the parent/child relations. My idea is to treat the entire file as one long string. The program finds a '<' whose next char is not a '/' and considers that a new tag/node. It then moves on and checks the following chars for another '<', which would indicate a child tag. If a '/' directly follows the '<', it is a closing tag, so the code moves on to the next node on the same level.
Hopefully you get what I am trying to do. Unfortunately, my attempt was less than successful: it only showed the child nodes of the root tag. For now I am only trying to get the tags into a tree; the text and so on I can figure out later. To test the code, I used a string "test" containing some basic sample HTML. Each of the top-level nodes is shown under the root when the JTree is created, but the child nodes in node2 never show up. I am so confused and cannot wrap my head around this. Also, is there a simpler or more efficient way of doing this?
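For reference, the scanning idea described above could be sketched with a stack of currently open tags, so that a closing tag steps back up to the parent instead of losing it. This is only a minimal sketch under assumptions that are not part of my original attempt: the markup is well formed, attribute values contain no '>', and `TagTreeSketch` and `buildTree` are hypothetical names.

```java
import javax.swing.tree.DefaultMutableTreeNode;
import java.util.ArrayDeque;
import java.util.Deque;

public class TagTreeSketch {

    // Hypothetical helper: builds a tree of tag names from simple, well-formed markup.
    // Assumes every opening tag has a matching closing tag (or is written as <tag/>).
    static DefaultMutableTreeNode buildTree(String html) {
        DefaultMutableTreeNode root = new DefaultMutableTreeNode("document");
        // Stack of tags that are currently open; the top is the parent of the next tag.
        Deque<DefaultMutableTreeNode> open = new ArrayDeque<>();
        open.push(root);

        int pos = 0;
        while (true) {
            int lt = html.indexOf('<', pos);
            if (lt < 0) break;
            int gt = html.indexOf('>', lt);
            if (gt < 0) break;
            String inside = html.substring(lt + 1, gt).trim();
            pos = gt + 1;

            if (inside.startsWith("/")) {
                // Closing tag: go back up one level (never pop the artificial root).
                if (open.size() > 1) open.pop();
                continue;
            }
            if (inside.isEmpty() || inside.startsWith("!")) {
                continue; // skip "<>" and declarations such as <!DOCTYPE ...>
            }
            boolean selfClosing = inside.endsWith("/");
            if (selfClosing) {
                inside = inside.substring(0, inside.length() - 1).trim();
                if (inside.isEmpty()) continue;
            }
            // The tag name is everything before the first whitespace (attributes are ignored).
            String name = inside.split("\\s+")[0];
            DefaultMutableTreeNode node = new DefaultMutableTreeNode(name);
            open.peek().add(node);
            if (!selfClosing) open.push(node); // later tags become children of this one
        }
        return root;
    }

    // Prints the tree with two spaces of indentation per level.
    static void print(DefaultMutableTreeNode node, int depth) {
        StringBuilder indent = new StringBuilder();
        for (int i = 0; i < depth; i++) indent.append("  ");
        System.out.println(indent + String.valueOf(node.getUserObject()));
        for (int i = 0; i < node.getChildCount(); i++) {
            print((DefaultMutableTreeNode) node.getChildAt(i), depth + 1);
        }
    }

    public static void main(String[] args) {
        String test = "<html><head><title></title></head>"
                + "<body><p>Parsed HTML into a doc.</p></body></html>";
        print(buildTree(test), 0);
    }
}
```

The returned root could be passed straight to `new JTree(root)`. Real-world HTML (unclosed tags, comments, scripts) would break this scanner, which is one reason a parsing library may be the simpler route.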
**EDIT:** I modified the code to use JSoup and managed to get it to work. However, for some reason all but the first child tag of the head tag get moved under the body tag, so body now has three children instead of one and head has only one instead of three. Also, how would I modify the recursive getChildren() function so that it works for each child layer within the previous child? For example, to get the h3 tag within the title tag?
package weboqltree_converter;

import javax.swing.JFrame;
import javax.swing.JTree;
import javax.swing.SwingUtilities;
import javax.swing.tree.DefaultMutableTreeNode;
import java.util.ArrayList;
import java.awt.Dimension;
import java.util.List;
import javax.swing.tree.TreeNode;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Node;

public class GUI extends JFrame
{
    private JTree tree;
    private String test = "<html>"
            + "<head>"
            + "<title><h3>First parse<h3></title>"
            + "<a></a>"
            + "<h3></h3>"
            + "</head>"
            + "<body>"
            + "<p>Parsed HTML into a doc.</p>"
            + "</body>"
            + "</html>";
    private int parentNode;

    public static void main(String[] args)
    {
        SwingUtilities.invokeLater(new Runnable() {
            public void run() {
                new GUI();
            }
        });
    }

    public GUI()
    {
        DefaultMutableTreeNode html = new DefaultMutableTreeNode("html");
        Document doc = Jsoup.parse(test);
        int children = doc.childNodes().get(0).childNodes().size();
        for(int i=0; i < children; i++){
            String tag = doc.childNodes().get(0).childNodes().get(i).nodeName();
            String text = "N/A"; //doc.childNodes().get(0).childNodes().get(i).toString();
            html.add(new DefaultMutableTreeNode("Tag: " + tag+ ", Text: " + text));
            System.out.println(tag+" : "+doc.childNodes().get(0).childNodes().get(i).childNodeSize());
            if(doc.childNodes().get(0).childNodes().get(i).childNodeSize() > 0){
                getChildren(html.getLastLeaf(), doc.childNodes().get(0).childNodes().get(i),0, doc.childNodes().get(0).childNodes().get(i).childNodeSize());
            }
        }
        System.out.println("tag: " + children);
        //System.out.println(Tree.get(2) +" "+Tree.get(2).getChildCount());
        tree = new JTree(html);
        add(tree);
        this.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
        this.setTitle("JTree Example");
        this.setMinimumSize(new Dimension(300, 400));
        this.setExtendedState(3);
        this.pack();
        this.setVisible(true);
    }

    public void getChildren(DefaultMutableTreeNode tree, Node doc, int start, int size){
        tree.add(new DefaultMutableTreeNode("Tag: " + doc.childNodes().get(start).nodeName()));
        start++;
        if(start < size){
            getChildren(tree, doc, start, size);
        }
    }
}
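To illustrate the kind of recursion I am asking about (one call per node, which then calls itself for every child), here is a rough sketch. To keep it runnable without extra libraries it swaps in the JDK's built-in XML parser (`org.w3c.dom`) in place of JSoup, so it only accepts well-formed markup; `RecursiveTreeSketch`, `toTreeNode` and `parse` are hypothetical names. I would expect the same shape to carry over to JSoup by looping over `childNodes()` of each `Node`, but I have not verified that here.

```java
import javax.swing.tree.DefaultMutableTreeNode;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

public class RecursiveTreeSketch {

    // Mirrors one DOM element as a tree node, then recurses into each child element,
    // so every nesting level is visited rather than only the first one.
    static DefaultMutableTreeNode toTreeNode(Node domNode) {
        DefaultMutableTreeNode treeNode =
                new DefaultMutableTreeNode("Tag: " + domNode.getNodeName());
        NodeList children = domNode.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            // Text nodes are skipped; only tags become tree nodes in this sketch.
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                treeNode.add(toTreeNode(child));
            }
        }
        return treeNode;
    }

    // Hypothetical helper: parses well-formed markup and converts its root element.
    static DefaultMutableTreeNode parse(String xhtml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
            return toTreeNode(doc.getDocumentElement());
        } catch (Exception e) {
            throw new IllegalArgumentException("Markup is not well formed", e);
        }
    }

    public static void main(String[] args) {
        // Same sample as above, with the inner h3 closed so that it is well formed.
        String test = "<html><head><title><h3>First parse</h3></title><a></a><h3></h3></head>"
                + "<body><p>Parsed HTML into a doc.</p></body></html>";
        DefaultMutableTreeNode html = parse(test);
        System.out.println("children of html: " + html.getChildCount());
        System.out.println("children of head: " + html.getChildAt(0).getChildCount());
    }
}
```

A strict XML parser keeps the nesting exactly as written, so head keeps all of its children in this sketch. An HTML parser applies HTML's own content rules while parsing, which may be related to the tags moving under body in my JSoup version.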