
I need to turn phone numbers into international format. I have a list of phone number examples:

rows = [
  (datetime.time(20, 35, 30), '0707262078',),
  (datetime.time(20, 38, 18), '+46706332602',),
  (datetime.time(20, 56, 35), '065017063'),
  (datetime.time(21, 45, 1), '+46730522807',),
  (datetime.time(22, 13, 47), '0046733165812')
]

I need to replace all numbers starting with e.g. 07 with +467, all starting with 06 with +466, and all starting with 00 with +. For the example above, 0707262078 should become +46707262078, 065017063 should become +4665017063, and 0046733165812 should become +46733165812.
I don't know if it's possible to do this with regex alone or if I need other code as well.

I've been trying re.sub combined with a lambda. My idea is to make a dictionary of the matching replacements like this:

repl_dict = {
  '01': '+461',
  '02': '+462',
  '03': '+463',
  '04': '+464',
  '05': '+465',
  '06': '+466',
  '07': '+467',
  '08': '+468',
  '09': '+469',
  '00': '+'
}

My attempt so far:

import re
rows = [
  (datetime.time(20, 35, 30), '0707262078',),
  (datetime.time(20, 38, 18), '+46706332602Ring via Mitel ',),
  (datetime.time(20, 56, 35), '065017063'),
  (datetime.time(21, 45, 1), '+46730522807Ring via Mitel ',),
  (datetime.time(22, 13, 47), '0046733165812')
]

repl_dict = {
  '01': '+461',
  '02': '+462',
  '03': '+463',
  '04': '+464',
  '05': '+465',
  '06': '+466',
  '07': '+467',
  '08': '+468',
  '09': '+469',
  '00': '+'
}

for row in rows:
    regex = re.compile(r'^\d{1}[0-9](\d*)'), re.S
    DialedNumber = regex.sub(lambda match: repl_dict.get(match.group(0), row[1]), row[1], row[1])  
  • Can I note that your for loop is not indented properly? Fix that first (it may just be a copy-and-paste error). Commented Feb 13, 2018 at 15:07

4 Answers


Your regex, ending in \d*, will match the entire number, so no entry is found in the dict. Also, re.S ends up outside the re.compile(...) call (which makes regex a tuple), and there is one row[1] too many in the call to sub.
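A quick way to see the first point is to check what the question's pattern actually matches. A minimal sketch, using one sample number from the question:

```python
import re

# The pattern from the question: one digit, one more digit, then any number of digits.
pattern = re.compile(r'^\d{1}[0-9](\d*)')

match = pattern.match('0707262078')
# group(0) is the whole match, i.e. the entire number rather than just '07',
# so a lookup in a dict keyed by two-digit prefixes finds nothing.
print(match.group(0))                    # 0707262078
print(match.group(0) in {'07': '+467'})  # False
```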

You can simplify your regex to ^00? and your replacement dict to {'00': '+', '0': '+46'}. This checks whether the number starts with one or two 0s (the ? is greedy, so 00 is matched whenever it is present), making the replacement dict much simpler and less repetitive.

import datetime
import re
rows = [(datetime.time(20, 35, 30), '0707262078',), (datetime.time(20, 38, 18), '+46706332602Ring via Mitel ',), (datetime.time(20, 56, 35), '065017063'), (datetime.time(21, 45, 1), '+46730522807Ring via Mitel ',), (datetime.time(22, 13, 47), '0046733165812')]
repl_dict = {'00': '+', '0': '+46'}
regex = re.compile(r'^00?')
for date, number in rows:
    print(regex.sub(lambda match: repl_dict.get(match.group(0)), number))

Output:

+46707262078
+46706332602Ring via Mitel 
+4665017063
+46730522807Ring via Mitel 
+46733165812

If you only want the numeric part, you can pre- or postprocess the numbers with a second regex like [0-9+]*.
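A minimal sketch of that extra step, assuming the unwanted text always comes after the number (as with "Ring via Mitel " here); numeric_part is a hypothetical helper name:

```python
import re

# Leading run of digits and '+'; anything after it (e.g. "Ring via Mitel ") is ignored.
numeric_prefix = re.compile(r'[0-9+]*')

def numeric_part(raw):
    # match() anchors at the start; '*' means it always matches, possibly with ''.
    return numeric_prefix.match(raw).group(0)

print(numeric_part('+46706332602Ring via Mitel '))  # +46706332602
print(numeric_part('0707262078'))                   # 0707262078
```

It can be applied either before the prefix substitution or to its result; the output is the same here because the substitution only touches the leading 0 or 00.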


1 Comment

[(tm, re.sub('^(00|0)', lambda m: {'00': '+', '0': '+46'}[m.group(0)], ph)) for (tm, ph) in rows] (Admittedly making it a one-liner does make it less nice, but you shortened both the pattern and the replacement dictionary so much that I wanted to show it.)
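On the question of whether regex alone is enough: with the simplified rules from this answer, two plain substitutions applied in order also work, with no lambda or dict. This is a swapped-in variant, not taken from the answers; it assumes, as the question does, that every number with a single leading 0 is Swedish. to_international is a hypothetical name:

```python
import datetime
import re

rows = [
    (datetime.time(20, 35, 30), '0707262078'),
    (datetime.time(20, 38, 18), '+46706332602'),
    (datetime.time(20, 56, 35), '065017063'),
    (datetime.time(21, 45, 1), '+46730522807'),
    (datetime.time(22, 13, 47), '0046733165812'),
]

def to_international(number):
    # Order matters: handle '00' first, otherwise '^0' would turn '0046...' into '+46046...'.
    number = re.sub(r'^00', '+', number)
    # Any remaining single leading 0 becomes +46 (assumption: Swedish numbers only).
    return re.sub(r'^0', '+46', number)

converted = [(tm, to_international(ph)) for tm, ph in rows]
for tm, ph in converted:
    print(ph)
```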

This is the naive approach based on repl_dict as given in your question.

import re

def repl(match):
    # Replace the matched prefix with its entry in repl_dict
    return repl_dict[match.group(0)]

pat = '^(' + '|'.join(repl_dict) + ')'
new_rows = [(tm, re.sub(pat, repl, ph)) for (tm, ph) in rows]

tobias_k's answer gives a better approach by improving your repl_dict and pattern.


Regex: ^[0-9]{2}

Details:

  • ^ Asserts position at the start of the string (or of each line with re.MULTILINE)
  • [] Matches a single character present in the set
  • {n} Matches the preceding token exactly n times
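To illustrate the pieces above, a small sketch of what ^[0-9]{2} does and does not match, using numbers from the question:

```python
import re

regex = re.compile(r'^[0-9]{2}')

# The first two characters are matched only if both are digits.
print(regex.match('0707262078').group())     # 07
print(regex.match('0046733165812').group())  # 00
# '+46...' starts with '+', so nothing matches and regex.sub() leaves it as is.
print(regex.match('+46730522807'))           # None
```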

Python code:

As @tobias_k suggested, you can use repl_dict.get(m.group(), m.group()) instead of repl_dict.get(m.group()) or m.group().

regex = re.compile(r'^[0-9]{2}')

for i, (tm, ph) in enumerate(rows):
    rows[i] = (tm, regex.sub(lambda m: repl_dict.get(m.group()) or m.group(), ph))

Output:

[(datetime.time(20, 35, 30), '+46707262078'), (datetime.time(20, 38, 18), '+46706332602Ring via Mitel '), (datetime.time(20, 56, 35), '+4665017063'), (datetime.time(21, 45, 1), '+46730522807Ring via Mitel '), (datetime.time(22, 13, 47), '+46733165812')]


3 Comments

This might fail if, for whatever reason, there are numbers starting with a digit other than 0. For example, if the number is "12345", then get returns None and you end up with "345".
@tobias_k Fixed it!
You could also use repl_dict.get(m.group(), m.group()), i.e. using the matched group itself as a default.
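A minimal sketch of that default in action; the dict comprehension is just a compact way to rebuild the question's repl_dict, and convert is a hypothetical name:

```python
import re

# Same mapping as repl_dict in the question: '01' -> '+461', ..., '09' -> '+469', '00' -> '+'.
repl_dict = {f'0{d}': f'+46{d}' for d in range(1, 10)}
repl_dict['00'] = '+'

regex = re.compile(r'^[0-9]{2}')

def convert(number):
    # Default to the matched text itself, so unknown prefixes such as '12' survive.
    return regex.sub(lambda m: repl_dict.get(m.group(), m.group()), number)

print(convert('12345'))          # 12345
print(convert('0707262078'))     # +46707262078
print(convert('0046733165812'))  # +46733165812
```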

You can do it without a regex:

for row in rows:
    for repl in repl_dict:
        if row[1].startswith(repl):
            print(repl_dict[repl] + row[1][len(repl):])
            break
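The loop above prints nothing for numbers that match no prefix (such as the ones already starting with +46). A minimal sketch that returns those unchanged instead; convert_prefix is a hypothetical name:

```python
# Same mapping as repl_dict in the question.
repl_dict = {f'0{d}': f'+46{d}' for d in range(1, 10)}
repl_dict['00'] = '+'

def convert_prefix(number):
    for prefix, replacement in repl_dict.items():
        if number.startswith(prefix):
            return replacement + number[len(prefix):]
    # No prefix matched (e.g. the number is already '+46...'): keep it unchanged.
    return number

for number in ['0707262078', '+46730522807', '0046733165812']:
    print(convert_prefix(number))
```

All keys here are two characters long, so iteration order does not matter. With a shorter dict such as {'00': '+', '0': '+46'}, '00' has to be checked before '0'; dicts preserve insertion order in Python 3.7+, so listing '00' first is enough.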
