I want to display more than one substring from a string.

Raw string: <td><strong></strong></td><td><strong></strong></td><td><strong></strong></td><td><strong></strong></td><td><strong>Mar08</strong></td><td><strong>Mar09</strong></td><td><strong>Mar10</strong></td><td><strong>Mar11</strong></td><td><strong>Mar12</strong></td><td><strong>Mar13</strong></td></tr>

Expected output (the substrings):

Mar08 Mar09 Mar10 Mar11 Mar12 Mar13

I've tried this code:

def parseyear(list):
    sfind = "<strong>"
    efind = "</strong>"
    i = 0
    while i < len(list):
        s =  list.find(sfind,i,len(list))
        e = list.find(efind,s,len(list))
        v = list[s+len(sfind):e]
        i =  i + s
        print v

But it doesn't give the expected result.

  • This looks like HTML. Consider using an HTML parser? Commented Sep 7, 2015 at 6:07
  • I don't see any difference between input and output Commented Sep 7, 2015 at 6:07
  • @AhsanulHaque please find the edited version . Commented Sep 7, 2015 at 6:08
  • Oops, was just trying to adjust formatting a little. Sorry! Commented Sep 7, 2015 at 6:09
  • @ChrisMartin Thank you, no problem Commented Sep 7, 2015 at 6:10

3 Answers

Use a regex:

>>> import re
>>> for m in re.findall(r'<strong>([^<]+)</strong>', raw_string):
...     print(m)
... 
Mar08
Mar09
Mar10
Mar11
Mar12
Mar13
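
For a self-contained version of the same idea, here is a minimal sketch. The function name `extract_strong` and the shortened sample `raw_string` are illustrative assumptions, not part of the original answer; the pattern is the one used above.

```python
import re

# Hypothetical sample input, shortened from the question's raw string.
raw_string = (
    "<td><strong></strong></td><td><strong>Mar08</strong></td>"
    "<td><strong>Mar09</strong></td></tr>"
)

def extract_strong(text):
    """Return the non-empty contents of every <strong>...</strong> pair."""
    # [^<]+ needs at least one character, so empty tags are skipped.
    return re.findall(r"<strong>([^<]+)</strong>", text)

# Join the matches with spaces to get them on a single line.
print(" ".join(extract_strong(raw_string)))
```

The same pattern also matches numeric cells such as `<strong>0.00</strong>`, since `[^<]+` accepts any characters other than `<`.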

4 Comments

When I tried the same for the following raw text, it didn't work: <td><strong>0.00</strong></td><td><strong>0.00</strong></td><td><strong>0.00</strong></td><td><strong>0.21</strong></td><td><strong>0.23</strong></td><td><strong>1.23</strong></td><td><strong>1.30</strong></td><td><strong>1.74</strong></td><td><strong>0.87</strong></td><td><strong>0.98</strong></td></tr>
Now you have two problems.
Just refine the regex, @jOSe. See my edited answer.

If you do not want to use regex:

def find_substrings(s, delim_start, delim_end):
    """Print each non-empty substring delimited by two different strings."""
    # length of the start delimiter, used to skip past it
    len_delim_start = len(delim_start)
    start = s.find(delim_start)
    while start != -1:
        end = s.find(delim_end, start + len_delim_start)
        if end == -1:
            # no closing delimiter left, so stop instead of looping forever
            break
        # strip whitespace so line breaks inside the tags are not printed
        substring = s[start + len_delim_start:end].strip()
        # print only if substring is not empty
        if substring:
            print(substring)
        start = s.find(delim_start, end + len(delim_end))

html = """
<td><strong></strong></td><td><strong></strong></td><td><strong></strong></td><td><strong></strong>
</td><td><strong>Mar08</strong></td><td><strong>Mar09</strong></td><td><strong>Mar10</strong></td>
<td><strong>Mar11</strong></td><td><strong>Mar12</strong></td><td><strong>Mar13</strong></td></tr>
"""

html2 = """
<td><strong>0.00</strong></td><td><strong>0.00</strong></td><td><strong>0.00</strong></td><td>
<strong>0.21</strong></td><td><strong>0.23</strong></td><td><strong>1.23</strong></td><td><strong>
1.30</strong></td><td><strong>1.74</strong></td><td><strong>0.87</strong></td><td><strong>
0.98</strong></td></tr>
"""

find_substrings(html2, "<strong>", "</strong>")

# output:
# 0.00
# 0.00
# 0.00
# 0.21
# 0.23
# 1.23
# 1.30
# 1.74
# 0.87
# 0.98
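
The question asks for the values on one line rather than one per line. A minimal sketch of a generator variant that returns the values instead of printing them might look like this; `iter_substrings` and the shortened `row` sample are hypothetical names introduced only for illustration.

```python
def iter_substrings(s, delim_start, delim_end):
    """Yield each non-empty, stripped substring between the two delimiters."""
    pos = s.find(delim_start)
    while pos != -1:
        begin = pos + len(delim_start)
        end = s.find(delim_end, begin)
        if end == -1:
            break  # unmatched opening delimiter: stop searching
        value = s[begin:end].strip()
        if value:
            yield value
        # continue searching after the closing delimiter
        pos = s.find(delim_start, end + len(delim_end))

# Hypothetical sample, shortened from the question's raw string.
row = (
    "<td><strong></strong></td><td><strong>Mar08</strong></td>"
    "<td><strong>\nMar09</strong></td></tr>"
)
print(" ".join(iter_substrings(row, "<strong>", "</strong>")))
```

Returning the values leaves the formatting decision (joining, printing, further parsing) to the caller.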


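Simply use an XML parser, given that the data has a known XML structure. Note that the string must be well-formed, so the opening <tr> missing from the question's raw string is added here.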

import xml.etree.ElementTree 
s = "<tr><td><strong></strong></td><td><strong></strong></td><td><strong></strong></td><td><strong></strong></td><td><strong>Mar08</strong></td><td><strong>Mar09</strong></td><td><strong>Mar10</strong></td><td><strong>Mar11</strong></td><td><strong>Mar12</strong></td><td><strong>Mar13</strong></td></tr>"
parsed_xml = xml.etree.ElementTree.fromstring(s)
values = [e.text for e in parsed_xml.findall("./td/strong") if e.text]
assert values == ['Mar08', 'Mar09', 'Mar10', 'Mar11', 'Mar12', 'Mar13']
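
If the fragment cannot be made well-formed first, the standard library's `html.parser.HTMLParser` is more forgiving. The following is a rough Python 3 sketch under that assumption; the class name `StrongTextParser` and the shortened `fragment` are hypothetical and only serve the example.

```python
from html.parser import HTMLParser

class StrongTextParser(HTMLParser):
    """Collect the text found inside <strong> elements."""

    def __init__(self):
        super().__init__()
        self.in_strong = False
        self.values = []

    def handle_starttag(self, tag, attrs):
        if tag == "strong":
            self.in_strong = True

    def handle_endtag(self, tag):
        if tag == "strong":
            self.in_strong = False

    def handle_data(self, data):
        # keep only non-blank text that appears inside <strong>
        text = data.strip()
        if self.in_strong and text:
            self.values.append(text)

# Hypothetical fragment without the opening <tr>, as in the question.
fragment = (
    "<td><strong></strong></td><td><strong>Mar08</strong></td>"
    "<td><strong>Mar09</strong></td></tr>"
)
parser = StrongTextParser()
parser.feed(fragment)
parser.close()
print(" ".join(parser.values))
```

Unlike `xml.etree.ElementTree.fromstring`, this does not raise on the stray closing `</tr>`.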
