0

For personal use, I am trying to parse a small HTML page that shows the results of the French soccer championship in a simple grid.

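using System.IO;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

var url = "http://www.lfp.fr/mobile/ligue1/resultat.asp?code_jr_tr=J01";
WebRequest req = WebRequest.Create(url);
// Dispose the response, stream and reader when done.
using (WebResponse result = req.GetResponse())
using (Stream receiveStream = result.GetResponseStream())
using (var sr = new StreamReader(receiveStream, Encoding.GetEncoding(0))) // 0 = system default code page
{
    string line;
    // ReadLine() returns null at end of stream. Testing sr.Read() != -1 first
    // would consume (and lose) the first character of every line.
    while ((line = sr.ReadLine()) != null)
    {
        line = Regex.Replace(line, @"<(.|\n)*?>", " "); // replace each tag with a space
        line = line.Replace("&nbsp;", "");
        line = line.Trim();
        // This is where I am stuck: how to pick out the team names and scores.
    }
}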

And then I really don't have a clue: should I process the page line by line or read the whole stream at once, and how do I retrieve only each team's name together with the number that follows it, which would be the score?

In the end I want to put both teams with their scores in a list or in XML, to use it with a phone application.

If anyone has an idea, it would be great. Thanks!

4 Answers

7

Take a look at the Html Agility Pack.
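
A minimal sketch of what that could look like, assuming the HtmlAgilityPack NuGet package is referenced. The row layout is a guess (one `<tr>` per match with three `<td>` cells: home team, score, away team), and `ResultsParser` and `ParseMatches` are hypothetical names, so the XPath will likely need adjusting to the real page.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack"

public static class ResultsParser
{
    // Assumed (hypothetical) layout: <tr><td>home</td><td>score</td><td>away</td></tr>
    public static List<string[]> ParseMatches(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var matches = new List<string[]>();
        // SelectNodes returns null (not an empty collection) when nothing matches.
        var rows = doc.DocumentNode.SelectNodes("//tr");
        if (rows == null) return matches;

        foreach (HtmlNode row in rows)
        {
            var cells = row.SelectNodes("./td");
            if (cells == null || cells.Count < 3) continue; // skip header or spacer rows

            matches.Add(new[]
            {
                Clean(cells[0].InnerText), // home team
                Clean(cells[1].InnerText), // score
                Clean(cells[2].InnerText)  // away team
            });
        }
        return matches;
    }

    private static string Clean(string text)
    {
        // DeEntitize turns &nbsp; into U+00A0, which Trim() removes as white space.
        return HtmlEntity.DeEntitize(text).Trim();
    }
}
```

The page text can be downloaded as in the question (or with `WebClient.DownloadString`) and passed to `ResultsParser.ParseMatches`; each returned array then holds the two team names and the score for one match.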


2 Comments

I was just about to suggest this.
+1 sixth Don't Parse HTML With Regex question of the day bonus
1

You could put the stream into an XmlDocument, allowing you to query via something like XPath. Or you could use LINQ to XML with an XDocument.

It's not perfect though, because HTML files aren't always well-formed XML (don't we know it!), but it's a simple solution using stuff already available in the framework.
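
As a rough illustration of the LINQ to XML route, here is a sketch that only works if the markup really is well-formed XML. The three-cell row layout (home team, score, away team) and the names `XmlResults` and `ToXml` are assumptions made for the example. Note that an entity such as `&nbsp;` is not declared in plain XML, so it would have to be replaced before parsing.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class XmlResults
{
    // Builds <matches><match home=".." score=".." away=".."/></matches>
    // from rows assumed (hypothetically) to hold home team, score, away team.
    public static XElement ToXml(string wellFormedHtml)
    {
        // Throws System.Xml.XmlException if the input is not well-formed XML.
        XDocument page = XDocument.Parse(wellFormedHtml);

        return new XElement("matches",
            from row in page.Descendants("tr")
            let cells = row.Elements("td").Select(td => td.Value.Trim()).ToList()
            where cells.Count >= 3
            select new XElement("match",
                new XAttribute("home", cells[0]),
                new XAttribute("score", cells[1]),
                new XAttribute("away", cells[2])));
    }
}
```

The resulting `XElement` can be written out with `Save` or `ToString()`, which would cover the "list or XML" output the question asks for.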

3 Comments

This assumes the HTML is well-formed XML, which is a long shot.
Ha! I just edited to make a note of that, and when the screen came back - I saw this comment!
Our edits crossed paths like two ships passing in the night... =P
0

You'll need an SgmlReader, which provides an XML-like API over any SGML document (which an HTML document really is).
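
A sketch of how this might be wired up, assuming the SgmlReader library (namespace `Sgml`, available as a NuGet package) is referenced. `SgmlLoader` and `LoadHtml` are hypothetical names. The reader is an `XmlReader`, so the result can be loaded into an `XDocument` and queried like any other XML, even when the HTML has unclosed tags.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Linq;
using Sgml; // assumption: the SgmlReader library is referenced

public static class SgmlLoader
{
    // Reads possibly malformed HTML and exposes it as an XDocument.
    public static XDocument LoadHtml(string html)
    {
        using (var sgml = new SgmlReader())
        {
            sgml.DocType = "HTML";                            // use the built-in HTML DTD
            sgml.CaseFolding = CaseFolding.ToLower;           // normalise tag names to lower case
            sgml.WhitespaceHandling = WhitespaceHandling.All;
            sgml.InputStream = new StringReader(html);
            return XDocument.Load(sgml);
        }
    }
}
```

After that, something like `SgmlLoader.LoadHtml(pageText).Descendants("tr")` should give the table rows to walk through.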


0

You could use the Regex.Match method to pull out the team name and score. Examine the html to see how each row is built up. This is a common technique in screen scraping.
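
For example, if each row turned out to look like `<td>Lyon</td><td>2 - 1</td><td>Paris</td>` (an assumed layout, not taken from the real page), a sketch with named groups could be the following. `RegexResults` and `Extract` are hypothetical names, and the pattern is brittle by design: it breaks as soon as the markup changes.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RegexResults
{
    // Three consecutive cells: home team, "n - m" score, away team (assumed layout).
    private static readonly Regex Row = new Regex(
        @"<td[^>]*>\s*(?<home>[^<]+?)\s*</td>\s*" +
        @"<td[^>]*>\s*(?<homeGoals>\d+)\s*-\s*(?<awayGoals>\d+)\s*</td>\s*" +
        @"<td[^>]*>\s*(?<away>[^<]+?)\s*</td>",
        RegexOptions.IgnoreCase);

    public static List<string> Extract(string html)
    {
        var results = new List<string>();
        // Regex.Match finds the first row; NextMatch continues from where it ended.
        for (Match m = Row.Match(html); m.Success; m = m.NextMatch())
        {
            results.Add(string.Format("{0} {1} - {2} {3}",
                m.Groups["home"].Value,
                m.Groups["homeGoals"].Value,
                m.Groups["awayGoals"].Value,
                m.Groups["away"].Value));
        }
        return results;
    }
}
```

Running this against the whole page text rather than line by line avoids missing rows that are split across lines.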

2 Comments

And smoking is a common technique for relieving stress. It doesn't mean it's a good idea, or that it works in the long term. ;)
Well, smoking is always bad for your health, but I wouldn't say the Match method is always bad in a case like this; I'm not sure of his needs. It's nice to know what all the options are, bad or good, before you make a choice.
