I sometimes need to parse data like this:
<tr>
<td data-th="Name">
John Smith
</td>
<td data-th="Phone">
1234567
</td>
<td data-th="Postal">
16803
</td>
<td data-th="Office Number">
12345678
</td>
<td data-th="Remarks">
Hello
</td>
</tr>
<tr>
<td data-th="Name">
Mary Smith
</td>
<td data-th="Phone">
1234589
</td>
<td data-th="Postal">
16801
</td>
<td data-th="Office Number">
2385234
</td>
<td data-th="Remarks">
Hi There
</td>
</tr>
I would do something like loading this into a TStringList and looping over its lines:
for i := 0 to oStringList.Count - 1 do
begin
  if oStringList[i].Trim = '<tr>' then
  begin
    // start of record
  end
  else if oStringList[i].Trim = '</tr>' then
  begin
    // end of record
  end
  else
  begin
    // part of record data
  end;
end;
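To make the idea concrete, here is a minimal sketch of how that loop could be filled in. It assumes the layout shown above: each tag sits on its own line, tags are lowercase, and each cell's text fits on one line. The names `ExtractDataTh` and `ParseRows` and the `Field=Value|Field=Value` record format are hypothetical choices for illustration, not part of any library.

```pascal
program ParseRowsDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Hypothetical helper: returns the data-th value from a line such as
// <td data-th="Name">, or '' when the attribute is absent.
function ExtractDataTh(const Line: string): string;
const
  Marker = 'data-th="';
var
  StartPos, EndPos: Integer;
begin
  Result := '';
  StartPos := Pos(Marker, Line);
  if StartPos = 0 then
    Exit;
  Inc(StartPos, Length(Marker));
  EndPos := StartPos;
  // Scan forward to the closing quote (or the end of the line).
  while (EndPos <= Length(Line)) and (Line[EndPos] <> '"') do
    Inc(EndPos);
  Result := Copy(Line, StartPos, EndPos - StartPos);
end;

// Adds one 'Field=Value|Field=Value' string to Records per <tr> block.
// Assumes one tag per line and one text line per cell.
procedure ParseRows(Source, Records: TStrings);
var
  i: Integer;
  Line, Field, Row: string;
  InRow: Boolean;
begin
  InRow := False;
  Field := '';
  Row := '';
  for i := 0 to Source.Count - 1 do
  begin
    Line := Trim(Source[i]);
    if Line = '<tr>' then
    begin
      InRow := True;               // start of record
      Row := '';
    end
    else if Line = '</tr>' then
    begin
      if InRow then
        Records.Add(Row);          // end of record
      InRow := False;
    end
    else if InRow then
    begin
      if Copy(Line, 1, 3) = '<td' then
        Field := ExtractDataTh(Line)   // opening cell tag: remember its name
      else if Line = '</td>' then
        Field := ''                    // closing cell tag
      else if (Line <> '') and (Field <> '') then
      begin
        // cell text: append it to the current record
        if Row <> '' then
          Row := Row + '|';
        Row := Row + Field + '=' + Line;
      end;
    end;
  end;
end;

var
  Source, Records: TStringList;
  i: Integer;
begin
  Source := TStringList.Create;
  Records := TStringList.Create;
  try
    Source.Add('<tr>');
    Source.Add('<td data-th="Name">');
    Source.Add('John Smith');
    Source.Add('</td>');
    Source.Add('<td data-th="Phone">');
    Source.Add('1234567');
    Source.Add('</td>');
    Source.Add('</tr>');
    ParseRows(Source, Records);
    for i := 0 to Records.Count - 1 do
      WriteLn(Records[i]);   // prints: Name=John Smith|Phone=1234567
  finally
    Records.Free;
    Source.Free;
  end;
end.
```

This only works while the markup stays exactly this regular. Anything like two tags on one line, attributes in a different case, or entities in the text would break it, which is why I am asking about a proper parser.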
Is there a better way to do this, either with some very efficient code or with an existing Delphi component (preferably free/open source)? I saw a thread on Stack Overflow (more than 3 years old) that mentioned a component, and I am wondering whether something better has appeared since.
Thanks.
Update: I am trying the htmlp component. How do I adapt the code to parse the data above? The sketchy example did not help. I want to loop through each <tr>...</tr> and get each <td>...</td> in it.
var
  HtmlParser: THtmlParser;
  HtmlDoc: TDocument;
  x: Integer;
  body, el: TElement;
  node: TNode;
begin
  HtmlParser := THtmlParser.Create;
  try
    HtmlDoc := HtmlParser.parseString(memo1.Text);
    try
      body := GetDocBody(HtmlDoc);
      if Assigned(body) then
        for x := 0 to body.childNodes.length - 1 do
        begin
          node := body.childNodes.item(x);
          if (node is TElement) then
          begin
            el := node as TElement;
            if (el.tagName = 'td') then //and (el.GetAttribute('data-th') = 'Name') then
            begin
              // iterate el.childNodes here...
              //ShowMessage(IntToStr(el.childNodes.length));
              memo1.Lines.Add(IntToStr(el.childNodes.length));
            end
            else
            begin
            end;
          end
          else
          begin
            memo1.Lines.Add('node is not element');
          end;
        end;
    finally
      HtmlDoc.Free;
    end;
  finally
    HtmlParser.Free;
  end;
end;
... in a row) that could be easy for you as a human to adapt to, but on the other hand it also most likely supports entities (&lt;) and whatnot that must be expected with HTML.
May I ask if anyone has experience with the htmlp parser and knows how to parse the data in this case?