1

I'm fetching a value from a URL.

import urllib2
response = urllib2.urlopen('url')    
response.read()

It gives me a very long string as output, so I have included only the part I have an issue with.

STRING TYPE OUTPUT:

'<p>Dear Customer,</p>
<p>This notice serves as proof of delivery for the shipment listed below.</p>
<dl class="outHozFixed clearfix"><label>Weight:</label></dt><dd>18.00 lbs</dd>
<dt><label>Shipped&#047;Billed On:</label></dt><dd>09/11/2015</dd>
<dt><label>Delivered On:</label></dt><dd>09/14/2015 11:07 A.M.</dd>
<dt><label for="">Signed By:</label></dt><dd>Odedra</dd></dt>
<dt><label>Left At:</label></dt>
<dd>Office</dd></dl><p>Thank you for giving us this opportunity to serve you.</p>'

QUESTION:

How can I extract the date (09/14/2015 11:07 A.M.) that is assigned to Delivered On?

5
  • If the time format has a constant length, you could use something like re.search('Delivered On:</label></dt><dd>(.*)$',a).group(1)[:20], where a is the string. Commented Sep 25, 2015 at 8:04
  • @Vineesh, thank you so much for your comment. Your code works fine, but it fails when Delivered On: is empty. Here is the error: AttributeError: 'NoneType' object has no attribute 'group'. Commented Sep 25, 2015 at 13:20
  • Can you add a check for it? For example, "data = re.search('Delivered On:</label></dt><dd>(.*)$',a)" and then "if data: data.group(1)[:20]". This should handle the NoneType case. Commented Sep 25, 2015 at 13:29
  • I added it, but it gives me this output: '</dd><dt><label for=' Commented Sep 25, 2015 at 13:33
  • The code is written in the answer box. Commented Sep 25, 2015 at 13:47

5 Answers

6

You could start by using something like Beautiful Soup or some other html parser. It might look something like this:

from bs4 import BeautifulSoup
import urllib2
response = urllib2.urlopen('url')    
html = response.read()
soup = BeautifulSoup(html, "html.parser")  # name the parser explicitly so bs4 does not have to guess
datestr = soup.find("label", text="Delivered On:").find_parent("dt").find_next_sibling("dd").string

And if you need to, once you have a hold of the date string, you can use strptime to convert it to a datetime object.

import datetime
# %p expects AM/PM without periods, so strip them from 'A.M.' first
date = datetime.datetime.strptime(datestr.replace(".", ""), "%m/%d/%Y %I:%M %p")
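
A minimal sketch of that conversion, assuming the string looks like the sample in the question (09/14/2015 11:07 A.M.). The %p directive matches AM/PM without periods, so the periods are stripped first. The helper name parse_delivered_on and the choice to return None for a blank value are illustrative assumptions, not part of the original answer.

```python
import datetime


def parse_delivered_on(datestr):
    """Convert a string such as '09/14/2015 11:07 A.M.' to a datetime.

    Returns None for a missing or blank value (hypothetical convention).
    """
    if not datestr or not datestr.strip():
        return None
    # %p matches 'AM'/'PM', so drop the periods from 'A.M.'/'P.M.' first.
    cleaned = datestr.strip().replace(".", "")
    return datetime.datetime.strptime(cleaned, "%m/%d/%Y %I:%M %p")
```

Returning None for an empty field lets the caller decide how to treat a shipment that has not been delivered yet.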

Remember - you generally should not find yourself parsing HTML or XML with regexes...
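
If installing Beautiful Soup is not an option, the same lookup can be sketched with the standard library's event-based html.parser. This is an illustration only, written for Python 3 (the question uses Python 2's urllib2); the class name DeliveredOnParser and the helper find_delivered_on are hypothetical.

```python
from html.parser import HTMLParser


class DeliveredOnParser(HTMLParser):
    """Collect the text of the first <dd> that follows a 'Delivered On:' label."""

    def __init__(self):
        super().__init__()
        self.in_label = False   # currently inside a <label> element
        self.armed = False      # the target label has been seen
        self.capturing = False  # inside the <dd> that belongs to the label
        self.value = None

    def handle_starttag(self, tag, attrs):
        if tag == "label":
            self.in_label = True
        elif tag == "dd" and self.armed:
            self.capturing = True
            self.value = ""

    def handle_endtag(self, tag):
        if tag == "label":
            self.in_label = False
        elif tag == "dd" and self.capturing:
            # Stop after the first <dd> so later fields are ignored.
            self.capturing = False
            self.armed = False

    def handle_data(self, data):
        if self.capturing:
            self.value += data
        elif self.in_label and data.strip() == "Delivered On:":
            self.armed = True


def find_delivered_on(html):
    """Return the 'Delivered On:' text, or None if it is missing or empty."""
    parser = DeliveredOnParser()
    parser.feed(html)
    parser.close()
    value = (parser.value or "").strip()
    return value or None
```

Because the parser only reacts to tag events, the stray closing tags in the sample markup do not raise errors, and an empty Delivered On field yields None instead of an exception.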


8 Comments

"Never Say Never Again". If you want to parse 1B letters, it's better to write your own tool to parse HTML instead of using BeautifulSoup, because Beautiful Soup is a tool for analyzing HTML and does a lot of work that you (probably) don't need. It is also not memory efficient.
Haha, okay, yes, you're right... never say never. I was just thinking of this famous question (and top answer): stackoverflow.com/questions/1732348/…
Now it's much better ;) Here is your +1 :) Btw, look at the second answer in that topic :)
@Jimilian: no, regex is even less of an answer with larger masses of XML. There are fast tools to parse XML that are not BeautifulSoup. Doesn't mean regex is the only alternative.
In general, there are XML parsers that build up a usable DOM representation (like BS), and then there are parsers that read XML as a tokenized stream, usually used only when the input XML doesn't fit into memory.
1

Try this code:

import re

text = '''<p>Dear Customer,</p>
          <p>This notice serves as proof of delivery for the shipment listed below.</p>
          <dl class="outHozFixed clearfix"><label>Weight:</label></dt>
          <dd>18.00 lbs</dd>
          <dt><label>Shipped&#047;Billed On:</label></dt>
          <dd>09/11/2015</dd>
          <dt><label>Delivered On:</label></dt><dd>09/14/2015 11:07 A.M.</dd>
          <dt><label for="">Signed By:</label></dt><dd>Odedra</dd></dt>
          <dt><label>Left At:</label></dt>
          <dd>Office</dd></dl><p>Thank you for giving us this opportunity to serve you.</p>'''

re.findall(r'<dt><label>Delivered On:<\/label><\/dt><dd>([0-9\.\/\s:APM]+)', text)

OUTPUT:

['09/14/2015 11:07 A.M.']


1

Based on that output only, I would use re and re.search. Create a regex for finding a date with time, like this:

import re

output = '''<p>Dear Customer,</p>
            <p>This notice serves as proof of delivery for the shipment listed below.</p>
            <dl class="outHozFixed clearfix"><label>Weight:</label></dt><dd>18.00 lbs</dd>
            <dt><label>Shipped&#047;Billed On:</label></dt><dd>09/11/2015</dd>
            <dt><label>Delivered On:</label></dt><dd>09/14/2015 11:07 A.M.</dd>
            <dt><label for="">Signed By:</label></dt><dd>Odedra</dd></dt>
            <dt><label>Left At:</label></dt>
            <dd>Office</dd></dl><p>Thank you for giving us this opportunity to serve you.</p>'''

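# Raw string; [AP] matches A or P (a '|' inside [...] would be a literal pipe)
pattern = r'\d{2}/\d{2}/\d{4} \d{1,2}:\d{2} [AP]\.M\.'

# The sample HTML is stored in `output`; re.MULTILINE is not needed without ^ or $
result = re.search(pattern, output).group(0)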
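
re.search returns None when nothing matches, so chaining .group(0) raises AttributeError on a page without a delivery time. Here is a guarded sketch, additionally anchored on the Delivered On: label so that a timestamp from another field is not picked up. The function name extract_delivered_on is an illustrative assumption, and the pattern assumes the markup shown in the question.

```python
import re

# Assumes the label markup from the question; \s* tolerates line breaks
# or indentation between </dt> and <dd>.
DELIVERED_RE = re.compile(
    r"Delivered On:</label></dt>\s*<dd>\s*"
    r"(\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2} [AP]\.M\.)"
)


def extract_delivered_on(html):
    """Return the delivery timestamp string, or None if it is absent."""
    match = DELIVERED_RE.search(html)
    return match.group(1) if match else None
```

Checking the match object before calling group() is what avoids the 'NoneType' object has no attribute 'group' error.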

1 Comment

Thank you so much. Your code works fine, but it fails when Delivered On: is empty. Here is the error: AttributeError: 'NoneType' object has no attribute 'group'
1

If you don't like regular expressions or third-party libraries, you can always use an old-school, hardcoded one-line solution:

import datetime

text_date = [item.strip() for item in input_text.split('\n') if "Delivered On:" in item][0][41:-5]
datetime.datetime.strptime(text_date.replace(".",""), "%m/%d/%Y %I:%M %p")

For the case where the input is a single line:

start_index = input_text.index("Delivered On:")+len("Delivered On:</label></dt><dd>")
stop_index = start_index + 21
text_date = input_text[start_index:stop_index]

After all, any solution to your question will be hardcoded in one way or another :(
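
The fixed offsets (41, -5, 21) stop working as soon as the date is shorter or longer, for example 12/4/2015. Here is a sketch that still avoids regular expressions and third-party libraries but slices between the surrounding markers instead; between is a hypothetical helper name, and the markers assume no whitespace between </dt> and <dd>, as in the single-line response.

```python
def between(text, start_marker, end_marker):
    """Return the text between the first start_marker and the next end_marker.

    Returns None if either marker is missing or nothing lies between them.
    """
    start = text.find(start_marker)
    if start == -1:
        return None
    start += len(start_marker)
    end = text.find(end_marker, start)
    if end == -1:
        return None
    return text[start:end].strip() or None


# Example call on a fragment of the response from the question.
input_text = '<dt><label>Delivered On:</label></dt><dd>09/14/2015 11:07 A.M.</dd>'
text_date = between(input_text, "Delivered On:</label></dt><dd>", "</dd>")
```

The markers are still hardcoded, but the length of the date no longer matters, and a missing or empty field yields None rather than a wrong slice.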

6 Comments

Thank you for your answer, but this code does not fetch the date.
If you test @Alexandr Faizullin's code, I get what I want, but with yours I don't.
That sounds fair enough, but what output do you get? Can you show it? I'm just curious.
Yes, I will. FYI, the input text is on one line, without "\n"; that might be the issue. You are testing line by line, and I get the response from the server as a single line.
@Odedra, yep, the single-line case needs a different solution :)
1

Try this code:

import re
a = """<p>Dear Customer,</p><p>This notice serves as proof of delivery for the shipment listed below.</p><dl class="outHozFixed clearfix"><label>Weight:</label></dt><dd>18.00 lbs</dd><dt><label>Shipped&#047;Billed On:</label></dt><dd>09/11/2015</dd><dt><label>Delivered On:</label></dt><dd>12/4/2015 11:07 A.M.</dd><dt><label for="">Signed By:</label></dt><dd>Odedra</dd></dt><dt><label>Left At:</label></dt><dd>Office</dd></dl><p>Thank you for giving us this opportunity to serve you.</p>"""
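# Non-greedy match up to the closing </dd>, so the date length does not matter
data = re.search(r'Delivered On:</label></dt><dd>(.*?)</dd>', a)
# Keep the value only if it starts with a digit, i.e. the field is not empty
if data and data.group(1)[:1].isdigit():
    delivered_on = data.group(1)
    print(delivered_on)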

2 Comments

I added it, but it gives me this output: '</dd><dt><label for='
@Odedra, I have added one more check to the answer. Can you please try it?
