How to extract a date value from a string in Python?

Question

I'm fetching a value from a URL.

import urllib2
response = urllib2.urlopen('url')    
response.read()

It gives me a very long string as output, so I have only included the part I am having trouble with.

STRING TYPE OUTPUT:

'<p>Dear Customer,</p>
<p>This notice serves as proof of delivery for the shipment listed below.</p>
<dl class="outHozFixed clearfix"><label>Weight:</label></dt><dd>18.00 lbs</dd>
<dt><label>Shipped&#047;Billed On:</label></dt><dd>09/11/2015</dd>
<dt><label>Delivered On:</label></dt><dd>09/14/2015 11:07 A.M.</dd>
<dt><label for="">Signed By:</label></dt><dd>Odedra</dd></dt>
<dt><label>Left At:</label></dt>
<dd>Office</dd></dl><p>Thank you for giving us this opportunity to serve you.</p>'

QUESTION:

How can I extract the date (09/14/2015 11:07 A.M.) that is given for Delivered On?

If the time format has a constant length, you could use something like re.search('Delivered On:</label></dt><dd>(.*)$',a).group(1)[:20], where a is the string.
– Vineesh, Commented Sep 25, 2015 at 8:04
@Vineesh, thank you so much for your comment. Your code works fine, but it fails when Delivered On: is empty. Here is the error: AttributeError: 'NoneType' object has no attribute 'group'
– Bhavesh Odedra, Commented Sep 25, 2015 at 13:20
Can you add a check for it? For example, "data = re.search('Delivered On:</label></dt><dd>(.*)$',a)" and then "if data: data.group(1)[:20]". This should handle the NoneType case.
– Vineesh, Commented Sep 25, 2015 at 13:29
I added it, but it gives me this output => '</dd><dt><label for='
– Bhavesh Odedra, Commented Sep 25, 2015 at 13:33
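
The output '</dd><dt><label for=' suggests that, when the field is empty, the response contains <dd></dd> right after the label, so the greedy (.*)$ runs on into the following tags. A minimal sketch of a guarded search that stops at the next tag instead of slicing a fixed width. The names DELIVERED_RE and delivered_on, and the shape of the empty sample, are assumptions made for illustration:

```python
import re

# Capture only the text inside the <dd> that follows the "Delivered On:" label.
# [^<]* stops at the next tag, so an empty <dd></dd> yields '' rather than markup.
DELIVERED_RE = re.compile(r'Delivered On:</label></dt>\s*<dd>([^<]*)</dd>')


def delivered_on(html):
    """Return the 'Delivered On' text, or None if it is missing or empty."""
    match = DELIVERED_RE.search(html)
    if match is None:
        return None  # label not found at all
    value = match.group(1).strip()
    return value or None  # treat an empty <dd></dd> like a missing value


filled = ('<dt><label>Delivered On:</label></dt><dd>09/14/2015 11:07 A.M.</dd>'
          '<dt><label for="">Signed By:</label></dt>')
# Assumed shape of a response in which the delivery date is empty.
empty = ('<dt><label>Delivered On:</label></dt><dd></dd>'
         '<dt><label for="">Signed By:</label></dt>')

print(delivered_on(filled))  # 09/14/2015 11:07 A.M.
print(delivered_on(empty))   # None
```

This still depends on the exact tag layout around the label, so it is only as robust as that layout is stable.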

jfs · Accepted Answer · 2015-09-25 16:24:19Z

6

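You could start by using something like Beautiful Soup or some other HTML parser. It might look something like this:

from bs4 import BeautifulSoup
import urllib2

response = urllib2.urlopen('url')
html = response.read()
# Name the parser explicitly so the result does not depend on which parsers are installed.
soup = BeautifulSoup(html, "html.parser")
# Find the "Delivered On:" label, then read the <dd> that follows its enclosing <dt>.
label = soup.find("label", text="Delivered On:")
datestr = label.find_parent("dt").find_next_sibling("dd").string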

And if you need to, once you have a hold of the date string, you can use strptime to convert it to a datetime object.

import datetime
# %p expects "AM"/"PM", so remove the dots from "A.M." before parsing.
date = datetime.datetime.strptime(datestr.replace(".", ""), "%m/%d/%Y %I:%M %p")
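
The sample notice contains two date shapes: a date with a 12-hour time ("09/14/2015 11:07 A.M.") and a bare date ("09/11/2015"). A small sketch of a helper that tries both and returns None when neither fits. The names FORMATS and parse_notice_date are made up for this example, and it assumes the default C locale, in which %p matches AM/PM:

```python
import datetime

# Formats seen in the sample notice: date with 12-hour time, and date only.
FORMATS = ("%m/%d/%Y %I:%M %p", "%m/%d/%Y")


def parse_notice_date(datestr):
    """Return a datetime for '09/14/2015 11:07 A.M.' or '09/11/2015', else None."""
    # %p expects 'AM'/'PM', so drop the dots from 'A.M.'/'P.M.' first.
    cleaned = datestr.replace(".", "").strip()
    for fmt in FORMATS:
        try:
            return datetime.datetime.strptime(cleaned, fmt)
        except ValueError:
            continue  # try the next known format
    return None  # no known format matched


print(parse_notice_date("09/14/2015 11:07 A.M."))  # 2015-09-14 11:07:00
print(parse_notice_date("09/14/2015 11:07 P.M."))  # 2015-09-14 23:07:00
print(parse_notice_date("09/11/2015"))             # 2015-09-11 00:00:00
print(parse_notice_date(""))                       # None
```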

Remember - you generally should not find yourself parsing HTML or XML with regexes...

edited Sep 25, 2015 at 16:24

jfs

answered Sep 25, 2015 at 8:04

stett

8 Comments

Jimilian Over a year ago

"Never Say Never Again". If you want to parse 1B letters, it's better to write your own tool to parse HTML instead of using BeautifulSoup, because Soup is a tool for HTML analysis. It does a lot of work that you (probably) don't need. Also, Soup is not memory efficient.

stett Over a year ago

haha okay yes you're right... never say never. I just was thinking about this famous question (and top answer): stackoverflow.com/questions/1732348/…

Jimilian Over a year ago

Now it's much better ;) Here is your +1 :) Btw, look at the second answer in that topic :)

mike3996 Over a year ago

@Jimilian: no, regex is even less of an answer with larger masses of XML. There are fast tools to parse XML that are not BeautifulSoup. Doesn't mean regex is the only alternative.

mike3996 Over a year ago

In general, there are XML parsers that build up a usable DOM representation (like BS), and then there are parsers that read the XML as a tokenized stream, usually used only when the input XML doesn't fit into memory.
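
To illustrate the second kind of parser with nothing but the standard library, here is a rough sketch built on the event-based HTMLParser class. It keeps only a few flags instead of a whole tree. The class name DeliveredOnParser and its attributes are invented for this example, and the two chunks stand in for a response that arrives piece by piece:

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser   # Python 2


class DeliveredOnParser(HTMLParser):
    """Collect the text of the <dd> that follows the 'Delivered On:' label."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.in_label = False     # True while inside a <label> element
        self.label_seen = False   # True once the wanted label text was read
        self.in_value = False     # True while inside the <dd> after that label
        self.value = None         # stays None if the label never appears

    def handle_starttag(self, tag, attrs):
        if tag == "label":
            self.in_label = True
        elif tag == "dd" and self.label_seen and self.value is None:
            self.in_value = True
            self.value = ""

    def handle_endtag(self, tag):
        if tag == "label":
            self.in_label = False
        elif tag == "dd" and self.in_value:
            self.in_value = False
            self.label_seen = False

    def handle_data(self, data):
        if self.in_label and data.strip() == "Delivered On:":
            self.label_seen = True
        elif self.in_value:
            self.value += data  # text may arrive in several pieces


parser = DeliveredOnParser()
# feed() can be called repeatedly, so the input may be supplied in chunks.
for chunk in ('<dt><label>Delivered On:</label></dt>',
              '<dd>09/14/2015 11:07 A.M.</dd>'):
    parser.feed(chunk)
parser.close()
print(parser.value)  # 09/14/2015 11:07 A.M.
```

An empty <dd></dd> leaves value as an empty string, which makes the "label present but no date" case easy to tell apart from "label not found" (None).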

Bhavesh Odedra · Accepted Answer · 2015-09-25 13:30:25Z

1

Try this code:

import re

text = '''<p>Dear Customer,</p>
          <p>This notice serves as proof of delivery for the shipment listed below.</p>
          <dl class="outHozFixed clearfix"><label>Weight:</label></dt>
          <dd>18.00 lbs</dd>
          <dt><label>Shipped&#047;Billed On:</label></dt>
          <dd>09/11/2015</dd>
          <dt><label>Delivered On:</label></dt><dd>09/14/2015 11:07 A.M.</dd>
          <dt><label for="">Signed By:</label></dt><dd>Odedra</dd></dt>
          <dt><label>Left At:</label></dt>
          <dd>Office</dd></dl><p>Thank you for giving us this opportunity to serve you.</p>'''

re.findall(r'<dt><label>Delivered On:<\/label><\/dt><dd>([0-9\.\/\s:APM]+)', text)

OUTPUT:

['09/14/2015 11:07 A.M.']

edited Sep 25, 2015 at 13:30

Bhavesh Odedra

answered Sep 25, 2015 at 8:03

Alexandr Faizullin

makeMonday · Accepted Answer · 2015-09-25 08:08:58Z

1

Based on that output only, I would use re and re.search. Create a regex for finding a date with time, like this:

import re

output = '''<p>Dear Customer,</p>
            <p>This notice serves as proof of delivery for the shipment listed below.</p>
            <dl class="outHozFixed clearfix"><label>Weight:</label></dt><dd>18.00 lbs</dd>
            <dt><label>Shipped&#047;Billed On:</label></dt><dd>09/11/2015</dd>
            <dt><label>Delivered On:</label></dt><dd>09/14/2015 11:07 A.M.</dd>
            <dt><label for="">Signed By:</label></dt><dd>Odedra</dd></dt>
            <dt><label>Left At:</label></dt>
            <dd>Office</dd></dl><p>Thank you for giving us this opportunity to serve you.</p>'''

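# Raw string; [AP] instead of [A|P], which would also match a literal "|".
pattern = r'\d{2}/\d{2}/\d{4} \d{1,2}:\d{2} [AP]\.M\.'

# The sample string above is named output, not text.
result = re.search(pattern, output).group(0)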
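
A possible variation on this idea, sketched under two assumptions: the month, day and hour may have one or two digits (another answer's sample uses 12/4/2015), and the date may be absent altogether. The names DATETIME_RE and first_datetime are illustrative only:

```python
import re

# One- or two-digit month, day and hour, followed by a literal 'A.M.' or 'P.M.'.
DATETIME_RE = re.compile(r'\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2} [AP]\.M\.')


def first_datetime(text):
    """Return the first date-time found in text, or None when there is none."""
    match = DATETIME_RE.search(text)
    return match.group(0) if match else None


print(first_datetime('<dd>09/11/2015</dd><dd>09/14/2015 11:07 A.M.</dd>'))  # 09/14/2015 11:07 A.M.
print(first_datetime('<dd>12/4/2015 11:07 A.M.</dd>'))                       # 12/4/2015 11:07 A.M.
print(first_datetime('<dd>09/11/2015</dd><dd></dd>'))                        # None
```

Because the pattern is not tied to the Delivered On label, it returns the first date-time anywhere in the response. That is fine for the sample, where only one field carries a time, but it would pick the wrong value if another timed field came first.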

answered Sep 25, 2015 at 8:08

makeMonday

1 Comment

Bhavesh Odedra Over a year ago

Thank you so much. Your code works fine, but it fails when Delivered On: is empty. Here is the error: AttributeError: 'NoneType' object has no attribute 'group'

Jimilian · Accepted Answer · 2015-09-25 14:21:51Z

1

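If you don't like regexps and third-party libraries, you can always use an old-school, hard-coded one-line solution:

import datetime

# input_text holds the response, with the "Delivered On:" <dt>/<dd> pair on a line of its own.
# [41:-5] drops the leading '<dt><label>Delivered On:</label></dt><dd>' and the trailing '</dd>'.
text_date = [item.strip() for item in input_text.split('\n') if "Delivered On:" in item][0][41:-5]
datetime.datetime.strptime(text_date.replace(".",""), "%m/%d/%Y %I:%M %p")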

For the one-line case:

# 21 is the length of a value such as '09/14/2015 11:07 A.M.'
start_index = input_text.index("Delivered On:")+len("Delivered On:</label></dt><dd>")
stop_index = start_index + 21
text_date = input_text[start_index:stop_index]

Any solution to your question will be hard-coded in one way or another :(
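
The fixed width of 21 characters can be avoided by searching for the closing </dd> instead. A rough sketch using only string methods; the function name text_after_label is hypothetical, and the code still assumes the exact </label></dt><dd> layout from the sample:

```python
def text_after_label(html, label="Delivered On:"):
    """Return the text of the <dd> following label, or None if absent or empty."""
    marker = label + "</label></dt><dd>"
    start = html.find(marker)
    if start == -1:
        return None  # label not present in this exact layout
    start += len(marker)
    stop = html.find("</dd>", start)
    if stop == -1:
        return None  # unterminated <dd>
    value = html[start:stop].strip()
    return value or None  # empty <dd></dd> counts as missing


one_line = ('<dt><label>Shipped&#047;Billed On:</label></dt><dd>09/11/2015</dd>'
            '<dt><label>Delivered On:</label></dt><dd>09/14/2015 11:07 A.M.</dd>')

print(text_after_label(one_line))                             # 09/14/2015 11:07 A.M.
print(text_after_label(one_line, "Shipped&#047;Billed On:"))  # 09/11/2015
print(text_after_label(one_line, "Signed By:"))               # None
```

It works the same whether the response is on one line or many, as long as no whitespace separates the tags around the label.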

edited Sep 25, 2015 at 14:21

answered Sep 25, 2015 at 8:29

Jimilian

6 Comments

Bhavesh Odedra Over a year ago

Thank you for your answer, but this code does not fetch the date.

Bhavesh Odedra Over a year ago

If you test @Alexandr Faizullin's code, I get what I want, but with your code I don't.

Jimilian Over a year ago

Fair enough, but what output do you get? Can you show it? I'm just curious.

Bhavesh Odedra Over a year ago

Yes, I will. JFI, the input text is on one line, without any "\n", and that might be the issue. You are testing line by line, while I get the response from the server on one line.

Jimilian Over a year ago

@Odedra, yep, for the one-line case the solution should be different :)

Vineesh · Accepted Answer · 2015-09-25 14:31:28Z

1

Try this code:

import re
a = """<p>Dear Customer,</p><p>This notice serves as proof of delivery for the shipment listed below.</p><dl class="outHozFixed clearfix"><label>Weight:</label></dt><dd>18.00 lbs</dd><dt><label>Shipped&#047;Billed On:</label></dt><dd>09/11/2015</dd><dt><label>Delivered On:</label></dt><dd>12/4/2015 11:07 A.M.</dd><dt><label for="">Signed By:</label></dt><dd>Odedra</dd></dt><dt><label>Left At:</label></dt><dd>Office</dd></dl><p>Thank you for giving us this opportunity to serve you.</p>"""
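data = re.search('Delivered On:</label></dt><dd>(.*)$', a)
# The group runs to the end of the string, so keep only its first 20 characters,
# and only when they start with a digit (that is, when the field is not empty).
if data and data.group(1)[:1].isdigit():
    date_text = data.group(1)[:20]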

edited Sep 25, 2015 at 14:31

answered Sep 25, 2015 at 13:40

Vineesh

2 Comments

Bhavesh Odedra Over a year ago

I added it, but it gives me this output => '</dd><dt><label for='

Vineesh Over a year ago

@Odedra, I have added one more check to the answer. Can you please try it?
