Here is a sample of the HTML I am working with:
<tr>
<td>
<div class="VBChap"></div>
<a href="/testing/1">Sample Textbook Chapter 1</a> : Introduction to VB.net
</td>
<td>09/24/2013</td>
</tr>
The document consists of entries like this one, repeated over and over.
I would like to extract the following:
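- The partial URL inside the href attribute (here /testing/1)
- The chapter text (here Sample Textbook Chapter 1)
- The chapter name (here Introduction to VB.net)
- The date (here 09/24/2013)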
Currently I am using two separate queries to get the data.
Query 1:
(?<=^|>)[^><]+?(?=<|$)
This extracts items 2, 3 and 4 from the list above (the chapter text, the chapter name and the date).
Query 2:
(?<=<a href=")[^"]+
This extracts item 1 (the partial URL).
I want a single query that can extract all four.
Regex is something I am not good at. It took me 2 hours of trial and error to get this far.
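
To make the goal concrete, here is a rough sketch of the kind of single pattern being asked for, written in Python purely for illustration. The names SAMPLE, ROW and extract_rows and the group names url, chapter, name and date are hypothetical, and the pattern assumes every entry has exactly the layout of the sample above (one link, a colon, then the name, followed by a date cell). It is a sketch under those assumptions, not a general HTML parser.

```python
import re

# The sample entry from the question, used as test input.
SAMPLE = """<tr>
<td>
<div class="VBChap"></div>
<a href="/testing/1">Sample Textbook Chapter 1</a> : Introduction to VB.net
</td>
<td>09/24/2013</td>
</tr>"""

# One pattern with a named group per field (group names are illustrative).
ROW = re.compile(
    r'<a href="(?P<url>[^"]+)">'         # partial URL inside href="..."
    r'(?P<chapter>[^<]+)</a>'            # link text, i.e. the chapter text
    r'\s*:\s*(?P<name>[^<]+?)\s*</td>'   # chapter name after the colon
    r'\s*<td>(?P<date>[^<]+)</td>'       # date in the following cell
)

def extract_rows(html):
    """Return one dict of the four fields per matching entry."""
    return [m.groupdict() for m in ROW.finditer(html)]

for row in extract_rows(SAMPLE):
    print(row["url"], row["chapter"], row["name"], row["date"], sep=" | ")
```

Each match carries all four values at once, so the two separate queries collapse into one pass over the document. If the real markup varies (extra attributes on the link, missing colon, nested tags), the pattern would need adjusting, and a proper HTML parser is likely the more robust choice.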