
I need to extract the phone bill due date from an SMS using Python 3.4. I have tried dateutil.parser and datefinder, but neither works for my use case.

Example: sms_text = "Your phone bill for Jun'17 of Rs.72.23 due on 15-07-2017 has been sent to your regd email ID [email protected]. Pls check Inbox"

Code 1:

import datefinder
due_dates = datefinder.find_dates(sms_text)
for match in due_dates:
    print(match)

Result: 2017-07-17 00:00:00

Code 2:

import dateutil.parser as dparser
due_date = dparser.parse(sms_text,fuzzy=True)
print(due_date)

Result: ValueError probably because of multiple dates in the text

How can I pick the due date from such texts? The date format is not fixed, but there will be two dates in the text: the month for which the bill was generated and the due date, in that order. A regular expression that parses the text would also be great.

More sample texts:

  1. Hello! Your phone billed outstanding is 293.72 due date is 03rd Jul.
  2. Bill dated 06-JUN-17 for Rs 219 is due today for your phone No. 1234567890
  3. Bill dated 06-JUN-17 for Rs 219 is due on Jul 5 for your phone No. 1234567890
  4. Bill dated 27-Jun-17 for your operator fixedline/broadband ID 1234567890 has been sent at [email protected] from [email protected]. Due amount: Rs 3,764.53, due date: 16-Jul-17.
  5. Details of bill dated 21-JUN-2017 for phone no. 1234567890: Total Due: Rs 374.12, Due Date: 09-JUL-2017, Bill Delivery Date: 25-Jun-2017,
  6. Greetings! Bill for your mobile 1234567890, dtd 18-Jun-17, payment due date 06-Jul-17 has been sent on [email protected]
  7. Dear customer, your phone bill of Rs.191.24 was due on 25-Jun-2017
  8. Hi! Your phone bill for Rs. 560.41 is due on 03-07-2017.
  • If your strings are as simple as this, you can just use regex. Commented Jul 13, 2017 at 12:14
  • @cᴏʟᴅsᴘᴇᴇᴅ I'd love to sir... the strings are simple but date format may vary. Also, I am not very good with regex. If the result is extracting the due_date, a regex would also be perfect for me. Commented Jul 13, 2017 at 12:17
  • When you say the date format may vary, that rings a few alarm bells. What possible date formats would you encounter? There's no point having a regex that works for one format but fails for everything else. Commented Jul 13, 2017 at 12:18
  • Due dates can be: YYYY-MM-DD, DD-MM-YYYY, MMMDD, DDMMM. Bill month can be: MMM-YY, MMM'YY, MMM YYYY. These are a few examples I have encountered. As the format is not fixed, I was trying to solve it with Python 3.x utilities that can detect different date formats. Commented Jul 13, 2017 at 12:21
  • My apologies. I'm not sure regex can handle so many formats. Commented Jul 13, 2017 at 12:28

4 Answers

3

An idea for using dateutil.parser:

from dateutil.parser import parse

for s in sms_text.split():
    try:
        print(parse(s))
    except ValueError:
        pass

1 Comment

It gets messed up when parsing other numeric values, such as the amount.
2

There are two things that prevent datefinder from parsing your samples correctly:

  1. the bill amount: numbers are interpreted as years, so a 3- or 4-digit amount produces a date
  2. characters that datefinder treats as delimiters can stop it from finding a suitable date format (in this case ':')

The idea is to first sanitize the text by removing the parts that prevent datefinder from identifying all the dates. Unfortunately, this involves some trial and error, as the regex used by this package is too big for me to analyze thoroughly.

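import re
import datefinder

def extract_duedate(text):
    # Sanitize the text for datefinder by replacing the tricky parts
    # (colons and "Rs <amount>") with a non-delimiter character
    text = re.sub(r':|Rs[\d,\. ]+', '|', text, flags=re.IGNORECASE)

    # The due date is assumed to be the last date found in the text
    return list(datefinder.find_dates(text))[-1]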

Rs[\d,\. ]+ will remove the bill amount so it is not mistaken as part of a date. It will match strings of the form 'Rs[.][ ][12,]345[.67]' (actually more variations but this is just to illustrate).
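To see what the substitution does before datefinder gets involved, here is a small standalone sketch that applies the same pattern to part of sample 4. The helper name `sanitize` is hypothetical and only there for illustration; it needs nothing beyond the standard `re` module.

```python
import re

# Same pattern as in extract_duedate, compiled once as a raw string.
AMOUNT_OR_COLON = re.compile(r':|Rs[\d,\. ]+', flags=re.IGNORECASE)

def sanitize(text):
    """Replace colons and 'Rs <amount>' runs with '|' (hypothetical helper)."""
    return AMOUNT_OR_COLON.sub('|', text)

sample = "Due amount: Rs 3,764.53, due date: 16-Jul-17."
print(sanitize(sample))  # Due amount| |due date| 16-Jul-17.
```

Because the match is case-insensitive and not anchored to a word boundary, a word ending in "rs" followed by a space, period, or digit (for example "customers.") would also be replaced. That does not happen in the samples above, but a leading `\b` may be worth adding for other texts.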

Obviously, this is a raw example function. Here are the results I get:

1 : 2017-07-03 00:00:00
2 : 2017-06-06 00:00:00 # Wrong result: first date instead of today
3 : 2017-07-05 00:00:00
4 : 2017-07-16 00:00:00
5 : 2017-06-25 00:00:00
6 : 2017-07-06 00:00:00
7 : 2017-06-25 00:00:00
8 : 2017-03-07 00:00:00

There is one problem with sample 2: datefinder does not recognize 'today' on its own.

Example:

>>> list(datefinder.find_dates('Rs 219 is due today'))
[datetime.datetime(219, 7, 13, 0, 0)]
>>> list(datefinder.find_dates('is due today'))
[]

So, to handle this case, we can simply replace the token 'today' with the current date as a first step. This gives the following function:

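import datetime
import re
import datefinder

def extract_duedate(text):
    # datefinder does not recognize 'today' on its own, so substitute the current date
    if 'today' in text:
        text = text.replace('today', datetime.date.today().isoformat())

    # Sanitize the text for datefinder by replacing the tricky parts
    # (colons and "Rs <amount>") with a non-delimiter character
    text = re.sub(r':|Rs[\d,\. ]+', '|', text, flags=re.IGNORECASE)

    # The due date is assumed to be the last date found in the text
    return list(datefinder.find_dates(text))[-1]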

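Now sample 2 is handled as well. Note that the function returns the last date in the text, so sample 5 yields the bill delivery date rather than the due date (09-JUL-2017), and sample 8 is read month-first (7 March rather than 3 July):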

1 : 2017-07-03 00:00:00
2 : 2017-07-18 00:00:00 # Well, this is the date of my test
3 : 2017-07-05 00:00:00
4 : 2017-07-16 00:00:00
5 : 2017-06-25 00:00:00
6 : 2017-07-06 00:00:00
7 : 2017-06-25 00:00:00
8 : 2017-03-07 00:00:00

If you need, you can let the function return all dates and they should all be correct.

11 Comments

Agreed. I am working along similar lines, trying to split the text and check individual words for a date. In your solution, I think regex might need some modification as datetime.datetime(2017, 7, 17, 0, 0) is no date in the text. It is still somehow referring to something else
@DrunkKnight Actually, I think the year is filled as of today by dateutil. Anyway, I'm testing with your added examples, it almost works for all the cases.
Does the regex work, or are you using a Python utility?
@DrunkKnight I'm using the same method as in my answer trying to find a better regex. The idea would be to find the parts that cause issues so you can sanitize the string first.
@DrunkKnight I updated my answer, this is not perfect but I hope this will help.
0

Why not just use a regex? If your input strings always contain the substrings 'due on' and 'has been' around the date, you can do something like this:

import re
from datetime import datetime

string = """Your phone bill for Jun'17 of Rs.72.23 due on 15-07-2017 has been
 sent to your regd email ID [email protected]. Pls check Inbox"""

match_obj = re.search(r'due on (.*) has been', string)

if match_obj:
    date_str = match_obj.group(1)
    # Try each format in turn: DD-MM-YYYY, YYYY-MM-DD, MM-DD, ...
    for fmt in ("%d-%m-%Y", "%Y-%m-%d", "%m-%d"):
        try:
            print(datetime.strptime(date_str, fmt))
            break
        except ValueError:
            continue  # try another format
    else:
        print("No known date format matched")
else:
    print("No match!!")
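The try-each-format idea can be stretched to cover the month-name variants in the question's other samples. The following is only a sketch under stated assumptions: the format lists, the helper name `parse_date_token`, and the `default_year` parameter are all made up for illustration, `%b` is assumed to run under an English locale, and numeric dates are assumed to be day-first.

```python
import re
from datetime import datetime

# Formats taken from the question's samples (assumed list; extend as needed).
FORMATS_WITH_YEAR = ("%d-%m-%Y", "%Y-%m-%d", "%d-%b-%Y", "%d-%b-%y")
FORMATS_NO_YEAR = ("%d %b", "%b %d")

def parse_date_token(token, default_year):
    """Return a datetime for `token`, or None if no known format fits."""
    # Drop ordinal suffixes: '03rd Jul' -> '03 Jul'
    cleaned = re.sub(r'(?<=\d)(st|nd|rd|th)\b', '', token.strip(),
                     flags=re.IGNORECASE)
    for fmt in FORMATS_WITH_YEAR:
        try:
            return datetime.strptime(cleaned, fmt)
        except ValueError:
            pass
    for fmt in FORMATS_NO_YEAR:
        try:
            # Year-less tokens get the caller's year appended before parsing
            return datetime.strptime("{} {}".format(cleaned, default_year),
                                     fmt + " %Y")
        except ValueError:
            pass
    return None

print(parse_date_token("15-07-2017", 2017))  # 2017-07-15 00:00:00
print(parse_date_token("03rd Jul", 2017))    # 2017-07-03 00:00:00
```

Listing the day-first format before any month-first one is a deliberate choice here: a token such as 03-07-2017 is then read as 3 July, which matches the DD-MM-YYYY convention the question describes.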

3 Comments

Using dateutil.parser would be easier than trying all the possible formats at the end.
@Alexey It could be due on, due date is, is due for month <Jun'17> on <date>. Format is not fixed for text or date.
@Gall I am trying to work with that, but I am not able to understand the results I am getting using it.
0

Given a text message like the example you provided:

sms_text = "Your phone bill for Jun'17 of Rs.72.23 due on 15-07-2017 has been sent to your regd email ID [email protected]. Pls check Inbox"

You could use Python's built-in re module to split on the 'due on' and 'has been' parts of the string.

import re

sms_text = "Your phone bill for Jun'17 of Rs.72.23 due on 15-07-2017 has been sent to your regd email ID [email protected]. Pls check Inbox"

# Keep what lies between 'due on' and 'has been', without the surrounding spaces
due_date = re.split('due on', re.split('has been', sms_text)[0])[1].strip()

print(due_date)

Result: 15-07-2017

With this approach the date format does not matter, but the words you split the string on must be consistent.
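If the surrounding words vary, one option is to anchor on the word "due" alone and let a pattern describe what a date looks like. This is a rough sketch, not a general solution: the date shapes are only those seen in the question's samples, and the names `DATE`, `DUE_RE`, and `find_due_date_text` are made up for illustration.

```python
import re

# One alternative per date shape seen in the samples (assumed, not exhaustive).
DATE = (
    r"\d{4}-\d{1,2}-\d{1,2}"                      # 2017-07-15
    r"|\d{1,2}-\d{1,2}-\d{4}"                     # 15-07-2017
    r"|\d{1,2}-[A-Za-z]{3}-\d{2,4}"               # 16-Jul-17, 09-JUL-2017
    r"|\d{1,2}(?:st|nd|rd|th)?\s+[A-Za-z]{3}\b"   # 03rd Jul
    r"|[A-Za-z]{3}\s+\d{1,2}\b"                   # Jul 5
)

# 'due', an optional 'date', an optional connector, then the date itself.
DUE_RE = re.compile(
    r"\bdue(?:\s+date)?\s*(?:on|is|:)?\s*(" + DATE + r")",
    flags=re.IGNORECASE,
)

def find_due_date_text(sms_text):
    """Return the raw date string that follows a 'due' keyword, or None."""
    match = DUE_RE.search(sms_text)
    return match.group(1) if match else None

print(find_due_date_text("Total Due: Rs 374.12, Due Date: 09-JUL-2017, "
                         "Bill Delivery Date: 25-Jun-2017,"))  # 09-JUL-2017
```

The function returns the date as text, so it still has to be parsed afterwards. A message such as "is due today" yields None and needs separate handling.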

3 Comments

It is similar to what Alexey has posted, but the text is not consistent wherein lies the entire problem.
Could you maybe add the restrictions to the question? Since it is valuable information for people trying to help you.
I am sincerely hoping that someone will provide a solution without the restrictions, as this is essentially date extraction, which Python provides tools for. Adding restrictions will neither help me nor get us a foolproof solution.
