I'm trying to use a regex in VBScript to replace the contents of an HTML tag that has the class 'candidate' with the text 'PLACEHOLDER'. However, it doesn't always work.
<[^\>]*class=""[^\>]*candidate[^\>]*""[^\>]*>([\s\S]*?)</[^\>]*>
Flags: IgnoreCase = True, Multiline = True, Global = True
The first issue is that I don't know which type of HTML tag will carry this class (e.g. it might be a <div> tag or a <p> tag). Secondly, the regex doesn't cope well with inner HTML tags.
Subject HTML:
<div class="outer">
<div class="normal">
<p><strong><em>Test</em></strong></p>
</div>
<div class="candidate">
<p>Test 1:</p>
<ul>
<li>Test 2</li>
<li>Test 3 </li>
<li>Test 4 </li>
</ul>
<p>Test 5</p>
</div>
<p>Test 6</p>
<div class="normal">
<p><strong>Test 7</strong></p>
</div>
</div>
Expected:
<div class="outer">
<div class="normal">
<p><strong><em>Test</em></strong></p>
</div>
<div class="candidate">
PLACEHOLDER
</div>
<p>Test 6</p>
<div class="normal">
<p><strong>Test 7</strong></p>
</div>
</div>
Actual:
<div class="outer">
<div class="normal">
<p><strong><em>Test</em></strong></p>
</div>
<div class="candidate">
PLACEHOLDER
<li>Test 2</li>
<li>Test 3 </li>
<li>Test 4 </li>
</ul>
<p>Test 5</p>
</div>
<p>Test 6</p>
<div class="normal">
<p><strong>Test 7</strong></p>
</div>
</div>
The candidate tag may also contain inner tags of the same type but with different classes, and this case currently works only sporadically.
e.g.:
<div class="candidate">Test<div class="normal">Test</div></div>
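To show what the pattern actually captures, here is a minimal sketch that lists the matches instead of replacing them. The variable names and the two shortened sample strings are only illustrative, not my real code or data, and it assumes the script is run with cscript so that WScript.Echo prints to the console.

```vbscript
Option Explicit

Dim re, samples, html, m

Set re = New RegExp
' Inside a VBScript string literal, "" stands for one literal double quote.
re.Pattern = "<[^\>]*class=""[^\>]*candidate[^\>]*""[^\>]*>([\s\S]*?)</[^\>]*>"
re.IgnoreCase = True
re.Multiline = True
re.Global = True

' Hypothetical, shortened versions of the two problem cases.
samples = Array( _
    "<div class=""candidate""><p>Test 1:</p><ul><li>Test 2</li></ul></div>", _
    "<div class=""candidate"">Test<div class=""normal"">Test</div></div>")

For Each html In samples
    ' Print every span the pattern matches in this sample.
    For Each m In re.Execute(html)
        WScript.Echo "Matched: " & m.Value
    Next
Next
```

As far as I can tell, the lazy ([\s\S]*?) group stops at the first closing tag of any kind, so the first sample should match only up to </p> and the second only up to the inner </div>, leaving the outer </div> behind in both cases.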
Any help would be greatly appreciated.