I want to extract the string KLE3KAN918D429 from the following HTML:

<td class="Labels"> CODE (Sp Number): </td><td width="40.0%"> KLE3KAN918D429</td>

Is there a method in C# that takes the source text, a start string, and an end string, and returns the text between start and end?

3 Answers

As the comments say, you are probably better off using a parsing library to walk the DOM structure. But if you can make some assumptions about the HTML you'll be parsing, you could do something like this:

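var html = "<td class=\"Labels\"> CODE (Sp Number): </td><td width=\"40.0%\"> KLE3KAN918D429</td>";
// Find the label cell, then the '%' inside the next cell's width attribute.
var labelIndex = html.IndexOf("<td class=\"Labels\">");
var pctIndex = html.IndexOf("%", labelIndex);
// The first '<' after the '%' opens the closing </td> of the value cell.
var closeIndex = html.IndexOf("<", pctIndex);
// Skip the three characters %"> to land on the start of the cell text.
var key = html.Substring(pctIndex + 3, closeIndex - pctIndex - 3).Trim();
System.Diagnostics.Debug.WriteLine(key);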

This is likely quite brittle, but sometimes quick and dirty is all that is required.
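To get closer to the "source text, start string, end string" shape the question asks for, the same index-based approach can be wrapped in a small helper. This is only a sketch: the names StringExtraction and GetBetween are made up for illustration, and it assumes the start and end markers appear literally in the input, so it is just as brittle as the snippet above.

```csharp
using System;

static class StringExtraction
{
    // Hypothetical helper (not part of the .NET libraries): returns the text
    // between the first occurrence of `start` and the next occurrence of `end`
    // after it, or null if either marker is missing.
    public static string GetBetween(string source, string start, string end)
    {
        int startIndex = source.IndexOf(start, StringComparison.Ordinal);
        if (startIndex < 0) return null;
        startIndex += start.Length; // move past the start marker

        int endIndex = source.IndexOf(end, startIndex, StringComparison.Ordinal);
        if (endIndex < 0) return null;

        return source.Substring(startIndex, endIndex - startIndex);
    }

    static void Main()
    {
        var html = "<td class=\"Labels\"> CODE (Sp Number): </td><td width=\"40.0%\"> KLE3KAN918D429</td>";
        var key = GetBetween(html, "<td width=\"40.0%\">", "</td>")?.Trim();
        Console.WriteLine(key); // KLE3KAN918D429
    }
}
```

The caller still has to choose markers that are unique enough in the page, which is where this breaks down on real-world HTML.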

2 Comments

Thank you, it works fine. I thought there would be a one-line solution to this problem :)
There is, if you wrap this in a function, which is all any one-liner really is ;)

As others have already suggested, you should use something like HtmlAgilityPack for parsing HTML. Don't use regular expressions or other hacks for parsing HTML.

You have several td nodes in your HTML string. Getting the last one is easy with the td[last()] XPath expression:

string html = "<td class=\"Labels\"> CODE (Sp Number): </td><td width=\"40.0%\"> KLE3KAN918D429</td>";
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(html);
var td = doc.DocumentNode.SelectSingleNode("td[last()]");
var result = td.InnerText.Trim(); // "KLE3KAN918D429"

I really suggest using HtmlAgilityPack for this.

It's as easy as:

var doc = new HtmlDocument();
doc.LoadHtml(@"<td class=""Labels""> CODE (Sp Number): </td><td width=""40.0%""> KLE3KAN918D429</td>");

var tdNode = doc.DocumentNode.SelectSingleNode("//td[@class='Labels' and text()=' CODE (Sp Number): ']/following-sibling::td[1]");
Console.WriteLine(tdNode.InnerText.Trim());

Before you start, add HtmlAgilityPack from NuGet:

Install-Package HtmlAgilityPack
