
I have to parse a list of simple strings with a known structure, but I'm finding it unnecessarily clunky. I feel I'm missing a trick; perhaps there is some simple regex that would make this trivial?

Each string refers to some number of years/months in the future, and I want to convert it into decimal years.

Generic format: "aYbM"

Here a is the number of years and b is the number of months. Both are integers, and each is optional (along with its identifier).

Test cases:

5Y3M == 5.25
5Y == 5.0
6M == 0.5
10Y11M == 10.91666...
3Y14M -> raise ValueError("string '%s' cannot be parsed" % input_string)
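In other words (assuming, from the 3Y14M case, that a valid month count runs from 0 to 11), the conversion is:

$$
\text{decimal years} = a + \frac{b}{12}, \qquad 0 \le b \le 11
$$

so, for example, 5Y3M gives $5 + \tfrac{3}{12} = 5.25$ and 10Y11M gives $10 + \tfrac{11}{12} = 10.91\overline{6}$.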

My attempts so far have involved string splitting and have been pretty cumbersome, though they do produce the correct results:

def parse_aYbM(maturity_code):
    maturity = 0
    if "Y" in maturity_code:
        maturity += float(maturity_code.split("Y")[0])
        if "M" in maturity_code:
            maturity += float(maturity_code.split("Y")[1].split("M")[0]) / 12
        return maturity
    elif "M" in maturity_code:
        return float(maturity_code[:-1]) / 12
    else:
        return 0 


You could use the regex pattern

(?:(\d+)Y)?(?:(\d+)M)?

which means

(?:        start a non-capturing group
  (\d+)    match 1 or more digits, captured as a group
  Y        followed by a literal Y
)?         end the non-capturing group; match it 0 or 1 times
(?:        start another non-capturing group
  (\d+)    match 1 or more digits, captured as a group
  M        followed by a literal M
)?         end the non-capturing group; match it 0 or 1 times
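The same annotations can also live inside the pattern itself. A minimal sketch using Python's re.VERBOSE flag, which ignores whitespace in the pattern and treats # as a comment; the name AYBM_VERBOSE is just illustrative:

```python
import re

# Same pattern as above, written in verbose mode so each piece is annotated.
AYBM_VERBOSE = re.compile(r"""
    (?:          # start a non-capturing group
      (\d+)      # one or more digits, captured as group 1 (years)
      Y          # followed by a literal Y
    )?           # the whole group is optional
    (?:          # start another non-capturing group
      (\d+)      # one or more digits, captured as group 2 (months)
      M          # followed by a literal M
    )?           # also optional
""", re.VERBOSE)

print(AYBM_VERBOSE.match('5Y3M').groups())  # ('5', '3')
print(AYBM_VERBOSE.match('3M').groups())    # (None, '3')
```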

When used in

re.match(r'(?:(\d+)Y)?(?:(\d+)M)?', text).groups()

the groups() method returns the text matched by each capturing group, with None for any group that did not take part in the match. For example,

In [220]: re.match(r'(?:(\d+)Y)?(?:(\d+)M)?', '5Y3M').groups()
Out[220]: ('5', '3')

In [221]: re.match(r'(?:(\d+)Y)?(?:(\d+)M)?', '3M').groups()
Out[221]: (None, '3')
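As a side note, Match.groups() accepts an optional default that is used in place of None for groups that did not participate, which can stand in for a separate None-to-0 conversion. A small sketch (PATTERN is just a local name for the regex above):

```python
import re

PATTERN = r'(?:(\d+)Y)?(?:(\d+)M)?'

# groups() takes a default for groups that did not participate in the match
print(re.match(PATTERN, '3M').groups())     # (None, '3')
print(re.match(PATTERN, '3M').groups('0'))  # ('0', '3')

years, months = map(int, re.match(PATTERN, '5Y3M').groups('0'))
print(years + months / 12.0)  # 5.25
```

Note that with a default of '0' you can no longer tell "no years given" apart from "0Y" by looking at the groups alone.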

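import re
def parse_aYbM(text):
    # both groups are optional, so re.match always succeeds (possibly on '')
    a, b = re.match(r'(?:(\d+)Y)?(?:(\d+)M)?', text).groups()
    if a is None and b is None:
        raise ValueError('input does not match aYbM')
    # treat a missing years or months part as 0
    a, b = [int(item) if item is not None else 0 for item in (a, b)]
    return a + b/12.0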

tests = [
    ('5Y3M', 5.25),
    ('5Y', 5.0),
    ('6M', 0.5),
    ('10Y11M', 10.917),
    ('3Y14M', 4.167),
]

for test, expected in tests:
    result = parse_aYbM(test)
    status = 'Failed'
    if abs(result - expected) < 0.001:
        status = 'Passed'
    print('{}: {} --> {}'.format(status, test, result))

yields

Passed: 5Y3M --> 5.25
Passed: 5Y --> 5.0
Passed: 6M --> 0.5
Passed: 10Y11M --> 10.9166666667
Passed: 3Y14M --> 4.16666666667

Note that it's not clear what should happen if the input to parse_aYbM does not match the pattern. With the code above, input in which neither group matches raises ValueError:

In [227]: parse_aYbM('foo')
ValueError: input does not match aYbM

but a partial match may return a value:

In [229]: parse_aYbM('0Yfoo')
Out[229]: 0.0
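If partial matches like that should be rejected, one option (a sketch, not part of the answer above) is re.fullmatch (Python 3.4+), which only succeeds when the pattern consumes the whole string. The extra month check enforces the question's rule that 3Y14M is invalid; parse_aYbM_strict is a hypothetical name:

```python
import re

AYBM = re.compile(r'(?:(\d+)Y)?(?:(\d+)M)?')

def parse_aYbM_strict(text):
    # fullmatch returns None unless the entire string is consumed,
    # so trailing junk such as '0Yfoo' is rejected
    m = AYBM.fullmatch(text)
    # the pattern can still fully match '', leaving both groups empty
    if m is None or m.groups() == (None, None):
        raise ValueError("string '%s' cannot be parsed" % text)
    years, months = (int(g) if g is not None else 0 for g in m.groups())
    # assumption taken from the question: 12 or more months is an error
    if months >= 12:
        raise ValueError("string '%s' cannot be parsed" % text)
    return years + months / 12.0
```

With this version '5Y3M' gives 5.25, while '3Y14M', '0Yfoo', 'foo' and '' all raise ValueError.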

3 Comments

Strictly speaking, your "non-match" is actually matching the empty string, since both pieces are optional. This returns the groups() as (None, None). It is your code that is raising the ValueError, not the re module. Nice solution, though.
You can guard against a number of months >= 12 (as indicated in the original question) with r"(?:(\d+)Y)?(?:(0?\d|1[01])M)?\b" - the OP wasn't clear on whether leading zeros might be present or not. And the trailing \b guards against matching a leading year with invalid month.
Thanks for the detailed answer, breaking out what the regex expression actually does! I've found documentation on regex assumes a certain level of knowledge and is almost impossible to read if you're not there so this is really helpful.
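A sketch of the pattern from the comment suggesting \b, plugged into the same kind of function (parse_aYbM_guarded is a hypothetical name). For rejected input such as 3Y14M, the regex engine ends up matching the empty string at position 0, so both groups come back as None; for an empty input string re.match returns None outright, so the sketch checks for that too:

```python
import re

# Pattern from the comment: months limited to 0-11 (optionally
# zero-padded), plus a trailing word boundary.
GUARDED = re.compile(r'(?:(\d+)Y)?(?:(0?\d|1[01])M)?\b')

def parse_aYbM_guarded(text):
    m = GUARDED.match(text)
    # None for '', and (None, None) groups when only the empty match survives
    if m is None or m.groups() == (None, None):
        raise ValueError("string '%s' cannot be parsed" % text)
    years, months = (int(g) if g is not None else 0 for g in m.groups())
    return years + months / 12.0
```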

You may use re.findall

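>>> import re
>>> def parse(m):
...     s = 0
...     # findall returns every non-overlapping token such as '10Y' or '11M'
...     for token in re.findall(r'\d+Y|\d+M', m):
...         if token.endswith('Y'):
...             s += float(token[:-1])
...         elif token.endswith('M'):
...             s += float(token[:-1]) / 12
...     return s
...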


>>> parse('5Y')
5.0
>>> parse('6M')
0.5
>>> parse('10Y11M')
10.916666666666666
>>> parse('3Y14M')
4.166666666666667
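A variation on the same idea: when the pattern has more than one capturing group, re.findall returns a list of tuples, one per match, so the number and unit can be captured separately instead of slicing off the last character. A sketch (parse_findall is a hypothetical name):

```python
import re

def parse_findall(m):
    # two groups, so findall yields tuples: '10Y11M' -> [('10', 'Y'), ('11', 'M')]
    total = 0.0
    for num, unit in re.findall(r'(\d+)([YM])', m):
        total += int(num) if unit == 'Y' else int(num) / 12.0
    return total
```

Like the loop above, this skips any text the pattern cannot match, so input such as '5Yfoo3M' still yields 5.25 and '3Y14M' yields about 4.167 without an error.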


I'm not familiar with Python regex, but something like (?<year>[^Y])\D(?<month>[^M]*)\D might just do the trick.
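For reference, Python's re module documents named groups as (?P<name>...). A sketch applying that syntax to the aYbM pattern from the first answer (the group names years and months are illustrative):

```python
import re

# Python spells named groups (?P<name>...)
NAMED = re.compile(r'(?:(?P<years>\d+)Y)?(?:(?P<months>\d+)M)?')

m = NAMED.match('10Y11M')
print(m.groupdict())       # {'years': '10', 'months': '11'}
print(m.group('months'))   # 11
print(NAMED.match('6M').groupdict())  # {'years': None, 'months': '6'}
```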
