I'm currently in the process of converting an old bash script of mine into a Python script with added functionality. I've been able to do most things, but I'm having a lot of trouble with Python pattern matching.
In my previous script, I downloaded a web page and used sed to get the elemented I wanted. The matching was done like so (for one of the values I wanted):
PM_NUMBER=`cat um.htm | LANG=sv_SE.iso88591 sed -n 's/.*ol.st.*pm.*count..\([0-9]*\).*/\1/p'`
It would match the number wrapped in <span class="count"></span> after the phrase "olästa pm". The markup I'm running this against is:
<td style="padding-left: 11px;">
<a href="/abuse_list.php">
<img src="/gfx/abuse_unread.png" width="15" height="12" alt="" title="9 anmälningar" />
</a>
</td>
<td align="center">
<a class="page_login_text" href="/pm.php" title="Du har 3 olästa pm.">
<span class="count">3</span>
</td>
<td style="padding-left: 11px;" align="center">
<a class="page_login_text" href="/blogg_latest.php" title="Du har 1 ny bloggkommentar">
<span class="count">1</span>
</td>
<td style="padding-left: 11px;" align="center">
<a class="page_login_text" href="/user_guestbook.php" title="Min gästbok">
<span class="count">1</span>
</td>
<td style="padding-left: 11px;" align="center">
<a class="page_login_text" href="/forum.php?view=3" title="Du har 1 ny forumkommentar">
<span class="count">1</span>
</td>
<td style="padding-left: 11px;" align="center">
<a class="page_login_text" href="/user_images.php?user_id=162005&func=display_new_comments" title="Du har 1 ny albumkommentar">
<span class="count">1</span>
</td>
<td style="padding-left: 11px;" align="center">
<a class="page_login_text" href="/forum_favorites.php" title="Du har 2 uppdaterade trådar i "bevakade trådar"">
<span class="count">2</span>
</td>
I'm hesitant to post this, because it seems like I'm asking for a lot, but could someone please help me with a way to parse this in Python? I've been pulling my hair trying to do this, but regular expressions and I just don't match (pardon the pun). I've spent the last couple of hours experimenting and reading the Python manual on regular expressions, but I can't seem to figure it out.
Just to make it clear, what I need are 7 different expressions for matching the number within <span class="count"></span>. I need to, for example, be able to find the number of unread PMs ("olästa pm").